Message ID | 1606123198-6230-1-git-send-email-moshe@mellanox.com |
---|---|
Headers | show |
Series | Add support for DSFP transceiver type | expand |
On Mon, Nov 23, 2020 at 11:19:56AM +0200, Moshe Shemesh wrote: > Add support for new cable module type DSFP (Dual Small Form-Factor Pluggable > transceiver). DSFP EEPROM memory layout is compatible with CMIS 4.0 spec. Add > CMIS 4.0 module type to UAPI and implement DSFP EEPROM dump in mlx5. So the patches themselves look O.K. But we are yet again kicking the can down the road and not fixing the underlying inflexibility of the API. Do we want to keep kicking the can, or is now the time to do the work on this API? Andrew
On 11/24/2020 11:16 PM, Jakub Kicinski wrote: > On Tue, 24 Nov 2020 02:14:59 +0100 Andrew Lunn wrote: >> On Mon, Nov 23, 2020 at 11:19:56AM +0200, Moshe Shemesh wrote: >>> Add support for new cable module type DSFP (Dual Small Form-Factor Pluggable >>> transceiver). DSFP EEPROM memory layout is compatible with CMIS 4.0 spec. Add >>> CMIS 4.0 module type to UAPI and implement DSFP EEPROM dump in mlx5. >> So the patches themselves look O.K. >> >> But we are yet again kicking the can down the road and not fixing the >> underlying inflexibility of the API. >> >> Do we want to keep kicking the can, or is now the time to do the work >> on this API? > This is hardly rocket science. Let's do it right. OK, we will add API options to select bank and page to read any specific page the user selects. So advanced user will use it get the optional pages he needs, but what about non advanced user who wants to use the current API with a current script for DSFP EEPROM. Isn't it better that he will get the 5 mandatory pages then keep it not supported ?
> OK, we will add API options to select bank and page to read any specific > page the user selects. So advanced user will use it get the optional pages > he needs, but what about non advanced user who wants to use the current API > with a current script for DSFP EEPROM. Isn't it better that he will get the > 5 mandatory pages then keep it not supported ? Users using ethtool will not see a difference. They get a dump of what ethtool knows how to decode. It should try the netlink API first, and then fall back to the old ioctl interface. If i was implementing the ethtool side of it, i would probably do some sort of caching system. We know page 0 should always exist, so pre-load that into the cache. Try the netlink API first. If that fails, use the ioctl interface. If the ioctl is used, put everything returned into the cache. The decoder can then start decoding, see what bits are set indicating other pages should be available. Ask for them from the cache. The netlink API can go fetch them and load them into the cache. If they cannot be loaded return ENODEV, and the decoder has to skip what it wanted to decode. If you do it correctly, the decoder should not care about ioctl vs netlink. I can do a follow up patch for the generic SFP code in drivers/net/phy, once you have done the first implementation. But i only have a limited number of SFPs and most are 1G only. Russell King can hopefully test with his collection. Andrew
On 11/25/2020 4:18 PM, Andrew Lunn wrote: > External email: Use caution opening links or attachments > > >> OK, we will add API options to select bank and page to read any specific >> page the user selects. So advanced user will use it get the optional pages >> he needs, but what about non advanced user who wants to use the current API >> with a current script for DSFP EEPROM. Isn't it better that he will get the >> 5 mandatory pages then keep it not supported ? > Users using ethtool will not see a difference. They get a dump of what > ethtool knows how to decode. It should try the netlink API first, and > then fall back to the old ioctl interface. Yes, it makes sense that whenever command not supported by netlink API it falls back to old ioctl interface. As I see it we want here to add bank and page options to netlink API to get data from specific page. > > If i was implementing the ethtool side of it, i would probably do some > sort of caching system. We know page 0 should always exist, so > pre-load that into the cache. Try the netlink API first. If that > fails, use the ioctl interface. If the ioctl is used, put everything > returned into the cache. I am not sure what you mean by cache here. Don't you want to read page 0 once you got the ethtool command to read from the module ? If not, then at what stage ? > The decoder can then start decoding, see what > bits are set indicating other pages should be available. Ask for them > from the cache. The netlink API can go fetch them and load them into > the cache. If they cannot be loaded return ENODEV, and the decoder has > to skip what it wanted to decode. So the decoder should read page 0 and check according to page 0 and specification which pages should be present, right ? What about the global offset that we currently got when user doesn't specify a page, do you mean that this global offset goes through the optional and non optional pages that exist and skip the ones that are missing according to the specific EEPROM ? > If you do it correctly, the decoder > should not care about ioctl vs netlink. > > I can do a follow up patch for the generic SFP code in > drivers/net/phy, once you have done the first implementation. But i > only have a limited number of SFPs and most are 1G only. Russell King > can hopefully test with his collection. > > Andrew
> > If i was implementing the ethtool side of it, i would probably do some > > sort of caching system. We know page 0 should always exist, so > > pre-load that into the cache. Try the netlink API first. If that > > fails, use the ioctl interface. If the ioctl is used, put everything > > returned into the cache. > I am not sure what you mean by cache here. Don't you want to read page 0 > once you got the ethtool command to read from the module ? If not, then at > what stage ? At the beginning, you try the netlink API and ask for pager 0, bytes 0-127. If you get a page, put it into the cache. If not, use the ioctl interface, which could return one page, or multiple pages. Put them all into the cache. > > The decoder can then start decoding, see what > > bits are set indicating other pages should be available. Ask for them > > from the cache. The netlink API can go fetch them and load them into > > the cache. If they cannot be loaded return ENODEV, and the decoder has > > to skip what it wanted to decode. > > So the decoder should read page 0 and check according to page 0 and > specification which pages should be present, right ? Yes. It ask the cache, give me a pointer to page 0, bytes 0-127. It then decodes that, looking at the enumeration data to indicate what other pages should be available. Maybe it decides, page 0, bytes 128-255 should exist, so it asks the cache for a pointer to that. If using netlink, it would ask the kernel for that data, put it into the cache, and return a pointer. If using ioctl, it already knows if it has that data, so it just returns a pointer, so says sorry, not available. > What about the global offset that we currently got when user doesn't specify > a page, do you mean that this global offset goes through the optional and > non optional pages that exist and skip the ones that are missing according > to the specific EEPROM ? ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off] [hex on|off] [offset N] [length N] So you mean [offset N] [length N]. That is going to be hard, but the API is broken for complex SFPs with optional pages. And it is not well defined exactly what offset means. You can keep backwards compatibility by identifying the SFP from page 0, and then reading the pages in the order the ioctl would do. Let user space handle it, for those SFPs which the kernel already supports. For SFPs which the kernel does not support, i would just return not supported. You can do the same for raw. However, for new SFPs, for raw you can run the decoder but output to /dev/null. That loads into the cache all the pages which the decoder knows about. You can then dump the cache. You probably need a new format, to give an indication of what each page actually is. Maybe you want to add new options [page N] [ bank N] to allow arbitrary queries to be made? Again, you can answer these from the cache, so the old ioctl interface could work if asked for pages which the old API had. Andrew
On 11/26/2020 5:21 PM, Andrew Lunn wrote: >>> If i was implementing the ethtool side of it, i would probably do some >>> sort of caching system. We know page 0 should always exist, so >>> pre-load that into the cache. Try the netlink API first. If that >>> fails, use the ioctl interface. If the ioctl is used, put everything >>> returned into the cache. >> I am not sure what you mean by cache here. Don't you want to read page 0 >> once you got the ethtool command to read from the module ? If not, then at >> what stage ? > At the beginning, you try the netlink API and ask for pager 0, bytes > 0-127. If you get a page, put it into the cache. If not, use the ioctl > interface, which could return one page, or multiple pages. Put them > all into the cache. OK, but if the caching system is checking one time netlink and one time ioctl, it means this cache should be in user space, or did you mean to have this cache in kernel ? >>> The decoder can then start decoding, see what >>> bits are set indicating other pages should be available. Ask for them >>> from the cache. The netlink API can go fetch them and load them into >>> the cache. If they cannot be loaded return ENODEV, and the decoder has >>> to skip what it wanted to decode. >> So the decoder should read page 0 and check according to page 0 and >> specification which pages should be present, right ? > Yes. It ask the cache, give me a pointer to page 0, bytes 0-127. It > then decodes that, looking at the enumeration data to indicate what > other pages should be available. Maybe it decides, page 0, bytes > 128-255 should exist, so it asks the cache for a pointer to that. If > using netlink, it would ask the kernel for that data, put it into the > cache, and return a pointer. If using ioctl, it already knows if it > has that data, so it just returns a pointer, so says sorry, not > available. > >> What about the global offset that we currently got when user doesn't specify >> a page, do you mean that this global offset goes through the optional and >> non optional pages that exist and skip the ones that are missing according >> to the specific EEPROM ? > ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off] [hex on|off] [offset N] [length N] > > So you mean [offset N] [length N]. Yes, that's the current options and we can either try coding new implementation for that or just call the current ioctl implementation. The new code can be triggered once options [bank N] and [Page N] are used. > > That is going to be hard, but the API is broken for complex SFPs with > optional pages. And it is not well defined exactly what offset means. > You can keep backwards compatibility by identifying the SFP from page > 0, and then reading the pages in the order the ioctl would do. Let > user space handle it, for those SFPs which the kernel already > supports. For SFPs which the kernel does not support, i would just > return not supported. You can do the same for raw. However, for new > SFPs, for raw you can run the decoder but output to /dev/null. That > loads into the cache all the pages which the decoder knows about. You > can then dump the cache. You probably need a new format, to give an > indication of what each page actually is. OK, if I got it right on current API [offset N] [length N] just call ioctl current implementation, while using the option [raw on] will call new implementation for new SFPs (CMIS 4). Also using [bank N] and [page N] will call new implementation for new SFPs. > Maybe you want to add new options [page N] [ bank N] to allow > arbitrary queries to be made? Exactly what I meant, I actually thought of letting the user ask for the page he wants, he should know what he needs. > Again, you can answer these from the > cache, so the old ioctl interface could work if asked for pages which > the old API had. Yes, for the simple EEPROM types that have 1 or 4 pages, ioctl read should be enough to get the data. > > Andrew
> OK, but if the caching system is checking one time netlink and one time > ioctl, it means this cache should be in user space, or did you mean to have > this cache in kernel ? This is all in userspace, in the ethtool code. > > > What about the global offset that we currently got when user doesn't specify > > > a page, do you mean that this global offset goes through the optional and > > > non optional pages that exist and skip the ones that are missing according > > > to the specific EEPROM ? > > ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off] [hex on|off] [offset N] [length N] > > > > So you mean [offset N] [length N]. > > > Yes, that's the current options and we can either try coding new > implementation for that or just call the current ioctl implementation. The > new code can be triggered once options [bank N] and [Page N] are used. You cannot rely on the ioctl being available. New drivers won't implement it, if they have the netlink code. Drivers will convert from get_module_info to whatever new ndo call you add for netlink. > OK, if I got it right on current API [offset N] [length N] just call ioctl > current implementation, while using the option [raw on] will call new > implementation for new SFPs (CMIS 4). Also using [bank N] and [page N] will > call new implementation for new SFPs. Not just CMIS. All SFPs. Andrew
On 27-Nov-20 17:56, Andrew Lunn wrote: >> OK, but if the caching system is checking one time netlink and one time >> ioctl, it means this cache should be in user space, or did you mean to have >> this cache in kernel ? > This is all in userspace, in the ethtool code. > >>>> What about the global offset that we currently got when user doesn't specify >>>> a page, do you mean that this global offset goes through the optional and >>>> non optional pages that exist and skip the ones that are missing according >>>> to the specific EEPROM ? >>> ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off] [hex on|off] [offset N] [length N] >>> >>> So you mean [offset N] [length N]. >> >> Yes, that's the current options and we can either try coding new >> implementation for that or just call the current ioctl implementation. The >> new code can be triggered once options [bank N] and [Page N] are used. > You cannot rely on the ioctl being available. New drivers won't > implement it, if they have the netlink code. Drivers will convert from > get_module_info to whatever new ndo call you add for netlink. > >> OK, if I got it right on current API [offset N] [length N] just call ioctl >> current implementation, while using the option [raw on] will call new >> implementation for new SFPs (CMIS 4). Also using [bank N] and [page N] will >> call new implementation for new SFPs. > Not just CMIS. All SFPs. > > Andrew Hi Andrew, Following this conversation, I wrote some pseudocode checking if I'm on right path here. Please review: struct eeprom_page { u8 page_number; u8 bank_number; u16 offset; u16 data_length; u8 *data; } Check every requested page, offset and length availability in userspace, depending on module standard and EEPROM data. ================================================================================ cache_fetch_add(page_number, bank_number, offset, length) { ethnla_put_u8 (msgbuff, ETHTOOL_A_EEPROM_PAGE_NUMBER, page_number); ethnla_put_u8 (msgbuff, ETHTOOL_A_EEPROM_BANK_NUMBER, bank_number); ethnla_put_u16(msgbuff, ETHTOOL_A_EEPROM_PAGE_OFFSET, offset); ethnla_put_u16(msgbuff, ETHTOOL_A_EEPROM_PAGE_LENGTH, length); err = nlsock_send_get_request(eeprom_page_reply_cb) // caches a page if (err < 0) return err; } // Any call should be a pair of cache_fetch_add() with error check and only // then a cache_get(), but for pseudocode this will do cache_get_page(page_number, bank_number, offset, length) { if (!cache_contains(page_number, bank_number, offset, length)) cache_fetch_add(page_number, bank_number, offset, length); return cache_get(page_number, bank_number); } print_cmis_y() { page_data = cache_get_page(0, 0, 0, 256)->data; print_page_zero(page_data); page_data = cache_get_page(1, 0, 128, 128)->data; print_page_one(page_data); } print_human_readable() { spec_id = cache_get_page(0, 0, 0, 128)->data[0]; switch (spec_id) { case sff_xxxx: print_sff_xxxx(); case cmis_y: print_cmis_y(); default: print_hex(); } } getmodule_reply_cb() { if (offset || hex || bank_number || page number) print_hex(); else // if _human_readable() decoder needs more than page 00, it will // fetch it on demand print_human_readable(); } eeprom_page_reply_cb() { cache_add(new_page); } page_available(page_number) { spec_id = cache_get_page(0, 0, 0, 128)->data[0]; // check bits indicating page availability switch (spec_id) { case sff_xxxx: return is_page_available_sff_xxxx(page_number); case cmis_y: return is_page_available_cmis_y(page_number, bank_number); default: return -EINVAL; } } nl_getmodule() { msg_init(); // request first low page ret = cache_fetch_add(0, 0, 0, 128); if (ret < 0) // netlink unsupported, fall back to ioctl return ret; netlink_cmd_check(); nl_parser() // check bits indicating page availability according to used spec if (page_available(page_number, bank_number)) { if (cache_contains(page_number, bank_number, offset, length)) print_page(page_number, bank_number, offset, length) else { ret = cache_fetch_add(); if (ret < 0) return ret; print_page(page_number, bank_number, offset, length); } } else { return -EINVAL; } } Or just proceed to request any page and depend on kernel parameter validation ================================================================================ getmodule_reply_cb() { if (offset || hex || bank_number || page_number) print_hex(); else // if _human_readable() decoder needs more than page 00, it will // fetch it on demand print_human_readable(); } nl_getmodule() { // request page straight away netlink_cmd_check(); nl_parser(); /* * nl_parser() sets page_number, page_bank, offset and length and * prepares to pass them to the kernel. See eeprom_page_parse_request() */ ret = nlsock_sendmsg(getmodule_reply_cb); if (ret < 0) retrun ret; } Kernel This part is the same for both userspace variants ================================================================================ kernel_fallback(request, data) { struct ethtool_eeprom eeprom; if (request->bank_number) return -EINVAL; // also take into account page 00 size eeprom.offset = request->offset + page_size * request->page_number; eeprom.len = request->length; eeprom.cmd = ETHTOOL_GMODULEEEPROM; eeprom.data = data; err = ops->get_module_eeprom(dev, &eeprom, eeprom.data) if (err < 0) return err; // prepare data for response via netlink like for supported ndo } eeprom_page_parse_request() { struct eeprom_page_req_info *request; request->offset = nla_get_u16(tb[ETHTOOL_A_EEPROM_PAGE_OFFSET]); request->length = nla_get_u16(tb[ETHTOOL_A_EEPROM_PAGE_LENGTH]); request->page_number = nla_get_u8(tb[ETHTOOL_A_EEPROM_PAGE_NUMBER]); request->bank_number = nla_get_u8(tb[ETHTOOL_A_EEPROM_PAGE_BANK_NUMBER]); } eeprom_page_prepare_data(request) { u8 data[128]; u8 page_zero_data[128]; if (!ops->get_module_eeprom_page) { if (ops->get_module_info && ops->get_module_eeprom) return kernel_fallback(request, &data); else return -EOPNOTSUPP; } // fetch page 00 for further checks err = ops->get_module_eeprom_page(dev, 0, 128, 0, 0, &page_zero_data); if (err < 0) return err; // Checks according to spec, similar to page_available() in userspace. // May be ommited and passed directly to the driver for it to validate if (is_request_valid(request, &page_zero_data)) err = ops->get_module_eeprom_page(dev, request->offset, request->length, request->page_number, request->bank_number, &data); else return -EINVAL; } Driver It is required to implement get_module_eeprom_page() ndo, where it queries its EEPROM and copies to u8 *data array allocated by the kernel previously. The ndo has the following prototype: int get_module_eeprom_page(struct net_device *, u16 offset, u16 length, u8 page_number, u8 bank_number, u8 *data); The driver may test whatever it may need in this ndo implementation. Its parameters may also be passed using a special struct. =============================================================================== driver_get_module_eeprom_page() { vendor_query_module_eeprom(page_number, bank_number, offset, length, data); } Thanks, Vlad
> Hi Andrew, > > Following this conversation, I wrote some pseudocode checking if I'm on > right path here. > Please review: > > struct eeprom_page { > u8 page_number; > u8 bank_number; > u16 offset; > u16 data_length; > u8 *data; > } I'm wondering about offset and data_length, in this context. I would expect you always ask the kernel for the full page, not part of it. Even when user space asks for just part of a page. That keeps you cache management simpler. But maybe some indicator of low/high is needed, since many pages are actually 1/2 pages? The other thing to consider is SFF-8472 and its use of two different i2c addresses, A0h and A2h. These are different to pages and banks. > print_human_readable() > { > spec_id = cache_get_page(0, 0, 0, 128)->data[0]; > switch (spec_id) { > case sff_xxxx: > print_sff_xxxx(); > case cmis_y: > print_cmis_y(); > default: > print_hex(); > } > } You want to keep as much of the existing ethtool code as you can, but the basic idea looks O.K. > getmodule_reply_cb() > { > if (offset || hex || bank_number || page number) > print_hex(); > else > // if _human_readable() decoder needs more than page 00, it > will > // fetch it on demand > print_human_readable(); > } Things get interesting here. Say this is page 0, and print_human_readable() finds a bit indicating page 1 is valid. So it requests page 1. We go recursive. While deep down in print_human_readable(), we send the next netlink message and call getmodule_reply_cb() when the answer appears. I've had problems with some of the netlink code not liking recursive calls. So i suggest you try to find a different structure for the code. Try to complete the netlink call before doing the decoding. So add the page to the cache and then return. Do the decode after nlsock_sendmsg() has returned. > Driver > It is required to implement get_module_eeprom_page() ndo, where it queries > its EEPROM and copies to u8 *data array allocated by the kernel previously. > The ndo has the following prototype: > int get_module_eeprom_page(struct net_device *, u16 offset, u16 length, > u8 page_number, u8 bank_number, u8 *data); I would include extack here, so we can get better error messages. Andrew
On 29-Dec-20 18:25, Andrew Lunn wrote: >> Hi Andrew, >> >> Following this conversation, I wrote some pseudocode checking if I'm on >> right path here. >> Please review: >> >> struct eeprom_page { >> u8 page_number; >> u8 bank_number; >> u16 offset; >> u16 data_length; >> u8 *data; >> } > I'm wondering about offset and data_length, in this context. I would > expect you always ask the kernel for the full page, not part of > it. Even when user space asks for just part of a page. That keeps you > cache management simpler. As far as I know, there may be bytes, which may change on read. For example, clear on read values in CMIS 4.0. Then, retrieving whole page every time may be incorrect. So I kept these for cases, when user asks for specific few bytes. > But maybe some indicator of low/high is > needed, since many pages are actually 1/2 pages? I was planning to use offset and data_length fields to indicate the available page. For example, high page will have offset 128 and data_length 128. > The other thing to consider is SFF-8472 and its use of two different > i2c addresses, A0h and A2h. These are different to pages and banks. I wasn't aware of that. It complicates things a bit, should we add a parameter of i2c address? So in this case page 0 will be with i2c address A0h. And if user needs page 0 from i2c address A2h, he will specify it in command line. And for other specs, this parameter will not be supported. >> print_human_readable() >> { >> spec_id = cache_get_page(0, 0, 0, 128)->data[0]; >> switch (spec_id) { >> case sff_xxxx: >> print_sff_xxxx(); >> case cmis_y: >> print_cmis_y(); >> default: >> print_hex(); >> } >> } > You want to keep as much of the existing ethtool code as you can, but > the basic idea looks O.K. Yes, under print_sff_xxxx(), for example, I meant using existing functions. Either as is, or refactoring according to cache requirements. >> getmodule_reply_cb() >> { >> if (offset || hex || bank_number || page number) >> print_hex(); >> else >> // if _human_readable() decoder needs more than page 00, it >> will >> // fetch it on demand >> print_human_readable(); >> } > Things get interesting here. Say this is page 0, and > print_human_readable() finds a bit indicating page 1 is valid. So it > requests page 1. We go recursive. While deep down in > print_human_readable(), we send the next netlink message and call > getmodule_reply_cb() when the answer appears. I've had problems with > some of the netlink code not liking recursive calls. > > So i suggest you try to find a different structure for the code. Try > to complete the netlink call before doing the decoding. So add the > page to the cache and then return. Do the decode after > nlsock_sendmsg() has returned. I'm thinking about a standard-specific function, which will prefetch pages needed by print_human_readable(). It will check the standard ID, and go request pages and add them to the cache. Then, decoder kicks with already cached pages. This will eliminate recursive netlink calls. >> Driver >> It is required to implement get_module_eeprom_page() ndo, where it queries >> its EEPROM and copies to u8 *data array allocated by the kernel previously. >> The ndo has the following prototype: >> int get_module_eeprom_page(struct net_device *, u16 offset, u16 length, >> u8 page_number, u8 bank_number, u8 *data); > > I would include extack here, so we can get better error messages. OK, I will add extack. Thanks, Vlad
On Wed, Dec 30, 2020 at 03:55:02PM +0200, Vladyslav Tarasiuk wrote: > > On 29-Dec-20 18:25, Andrew Lunn wrote: > > > Hi Andrew, > > > > > > Following this conversation, I wrote some pseudocode checking if I'm on > > > right path here. > > > Please review: > > > > > > struct eeprom_page { > > > u8 page_number; > > > u8 bank_number; > > > u16 offset; > > > u16 data_length; > > > u8 *data; > > > } > > I'm wondering about offset and data_length, in this context. I would > > expect you always ask the kernel for the full page, not part of > > it. Even when user space asks for just part of a page. That keeps you > > cache management simpler. > As far as I know, there may be bytes, which may change on read. > For example, clear on read values in CMIS 4.0. Ah, i did not know there were such bits. I will go read the spec. But it should not really matter. If the SFP driver is interested in these bits, it will have to intercept the read and act on the values. > I wasn't aware of that. It complicates things a bit, should we add a > parameter of i2c address? So in this case page 0 will be with i2c > address A0h. And if user needs page 0 from i2c address A2h, he will > specify it in command line. Not on the command line. You should be able to determine from reading page 0 at A0h is the diagnostics are at A2h or a page of A0h. That is the whole point of this API, we decode the first page, and that tells us what other pages should be available. So adding the i2c address to the netlink message would be sensible. And i would not be too surprised if there are SFPs with proprietary registers on other addresses, which could be interesting to dump, if you can get access to the needed datasheets. Andrew
On 30-Dec-20 17:36, Andrew Lunn wrote: > On Wed, Dec 30, 2020 at 03:55:02PM +0200, Vladyslav Tarasiuk wrote: >> On 29-Dec-20 18:25, Andrew Lunn wrote: >>>> Hi Andrew, >>>> >>>> Following this conversation, I wrote some pseudocode checking if I'm on >>>> right path here. >>>> Please review: >>>> >>>> struct eeprom_page { >>>> u8 page_number; >>>> u8 bank_number; >>>> u16 offset; >>>> u16 data_length; >>>> u8 *data; >>>> } >>> I'm wondering about offset and data_length, in this context. I would >>> expect you always ask the kernel for the full page, not part of >>> it. Even when user space asks for just part of a page. That keeps you >>> cache management simpler. >> As far as I know, there may be bytes, which may change on read. >> For example, clear on read values in CMIS 4.0. > Ah, i did not know there were such bits. I will go read the spec. But > it should not really matter. If the SFP driver is interested in these > bits, it will have to intercept the read and act on the values. But in case user requests a few bytes from a page with clear-on-read values, reading full page will clear all such bytesfrom user perspective even if they were not requested. Driver may intercept the read, but for user it will look like those bytes were not set. Current user interface allows arbitrary reads, so I wanted to keep this behavior pretty much exactly like it is now - request specific part of a page, get this part without any extra data. >> I wasn't aware of that. It complicates things a bit, should we add a >> parameter of i2c address? So in this case page 0 will be with i2c >> address A0h. And if user needs page 0 from i2c address A2h, he will >> specify it in command line. > Not on the command line. You should be able to determine from reading > page 0 at A0h is the diagnostics are at A2h or a page of A0h. That is > the whole point of this API, we decode the first page, and that tells > us what other pages should be available. So adding the i2c address to > the netlink message would be sensible. And i would not be too > surprised if there are SFPs with proprietary registers on other > addresses, which could be interesting to dump, if you can get access > to the needed datasheets. Without command line argument user will not be able to request a single A2h page, for example. He will see it only in some kind of general dump - with human-readable decoder usage or multiple page dump. And same goes forpages on other i2c addresses. How to know what to dump, if user does not provide i2c address and there is no way to know what to request from proprietary SFPs? Thanks, Vlad
On Mon, Jan 04, 2021 at 05:24:11PM +0200, Vladyslav Tarasiuk wrote: > > On 30-Dec-20 17:36, Andrew Lunn wrote: > > On Wed, Dec 30, 2020 at 03:55:02PM +0200, Vladyslav Tarasiuk wrote: > > > On 29-Dec-20 18:25, Andrew Lunn wrote: > > > > > Hi Andrew, > > > > > > > > > > Following this conversation, I wrote some pseudocode checking if I'm on > > > > > right path here. > > > > > Please review: > > > > > > > > > > struct eeprom_page { > > > > > u8 page_number; > > > > > u8 bank_number; > > > > > u16 offset; > > > > > u16 data_length; > > > > > u8 *data; > > > > > } > > > > I'm wondering about offset and data_length, in this context. I would > > > > expect you always ask the kernel for the full page, not part of > > > > it. Even when user space asks for just part of a page. That keeps you > > > > cache management simpler. > > > As far as I know, there may be bytes, which may change on read. > > > For example, clear on read values in CMIS 4.0. > > Ah, i did not know there were such bits. I will go read the spec. But > > it should not really matter. If the SFP driver is interested in these > > bits, it will have to intercept the read and act on the values. > > But in case user requests a few bytes from a page with clear-on-read > values, reading full page will clear all such bytesfrom user perspective > even if they were not requested. Driver may intercept the read, but for > user it will look like those bytes were not set. Yes, O.K. Reading individual words does make sense. > Without command line argument user will not be able to request a single > A2h page, for example. He will see it only in some kind of general dump - > with human-readable decoder usage or multiple page dump. > > And same goes forpages on other i2c addresses. How to know what to dump, > if user does not provide i2c address and there is no way to know what to > request from proprietary SFPs? So we should look at this from the perspective of use cases. Currently we have: ethtool -m|--dump-module-eeprom|--module-info devname [raw on|off] [hex on|off] [offset N] [length N] If you use it without any of [raw on|off] [hex on|off] [offset N] [length N] it decodes what it finds. As soon as you pass any of these options, the decoding is disabled and is just dumps values, either raw or hex. I would say, i2c address as a parameter can be added, but again, passing it disables decoding, is just dumps raw or hex. When not passed, and decoding is enabled, the decoder should decide if A2 is available based on what is finds in page 0, and ask for it. We also need to clearly define what offset and length means in this context. It has to be within a specific page if page, bank or i2c address has been passed, unlike what it currently means which is offset into the current blob returned by the kernel. I also took a look at CMIS. It has interesting semantics for address wrap around when doing multiple byte reads. A read which reaches 127 wraps around to 0. A read which reached 255 wraps around to 128. So for the kernel API, we probably do not want to allow offset/length to cause a wrap around. You can only read within the low 128 bytes, or the upper 128 bytes. Andrew