Message ID | 20240823030242.3083528-1-jiawenwu@trustnetic.com |
---|---|
Headers | show |
Series | Add I2C bus lock for Wangxun | expand |
On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote: > Sometimes the driver can not get the SFP information because the I2C bus > is accessed by the firmware at the same time. Please could you explain this some more. What firmware? There some registers which are clear on read. They don't work when you have multiple entities reading them. Andrew
On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote: > On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote: > > Sometimes the driver can not get the SFP information because the I2C bus > > is accessed by the firmware at the same time. > > Please could you explain this some more. What firmware? It's the firmware of our ethernet devices. > There some registers which are clear on read. They don't work when you > have multiple entities reading them. I'm not trying to multiple access the I2C registers, but these registers cannot be accessed by other interfaces in the process of reading complete information each time. So there is a semaphore needed that locks up the entire read process.
On Mon, Aug 26, 2024 at 10:04:42AM +0800, Jiawen Wu wrote: > On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote: > > On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote: > > > Sometimes the driver can not get the SFP information because the I2C bus > > > is accessed by the firmware at the same time. > > > > Please could you explain this some more. What firmware? > > It's the firmware of our ethernet devices. > > > There some registers which are clear on read. They don't work when you > > have multiple entities reading them. > > I'm not trying to multiple access the I2C registers, but these registers cannot > be accessed by other interfaces in the process of reading complete information > each time. So there is a semaphore needed that locks up the entire read process. More details please. Linux assume it is driving the hardware. Your firmware cannot be touching any registers which will clear on read. QSFP states that registers 3-31 of page 0 are all clear on read, for example. The firmware should also not be setting any registers, otherwise you can confuse Linux which assumes registers it set stay set, because it is controlling the hardware. Your firmware also needs to handle that Linux can change the page. If the firmware changes the page, it must restore it back to whatever page Linux selected, etc. The fact you are submitting this for net suggests you have seen real issues. Please describe what those issues are. Andrew
On Mon, Aug 26, 2024 10:34 AM, Andrew Lunn wrote: > On Mon, Aug 26, 2024 at 10:04:42AM +0800, Jiawen Wu wrote: > > On Mon, Aug 26, 2024 9:33 AM, Andrew Lunn wrote: > > > On Fri, Aug 23, 2024 at 11:02:39AM +0800, Jiawen Wu wrote: > > > > Sometimes the driver can not get the SFP information because the I2C bus > > > > is accessed by the firmware at the same time. > > > > > > Please could you explain this some more. What firmware? > > > > It's the firmware of our ethernet devices. > > > > > There some registers which are clear on read. They don't work when you > > > have multiple entities reading them. > > > > I'm not trying to multiple access the I2C registers, but these registers cannot > > be accessed by other interfaces in the process of reading complete information > > each time. So there is a semaphore needed that locks up the entire read process. > > More details please. > > Linux assume it is driving the hardware. Your firmware cannot be > touching any registers which will clear on read. QSFP states that > registers 3-31 of page 0 are all clear on read, for example. The > firmware should also not be setting any registers, otherwise you can > confuse Linux which assumes registers it set stay set, because it is > controlling the hardware. > > Your firmware also needs to handle that Linux can change the page. If > the firmware changes the page, it must restore it back to whatever > page Linux selected, etc. > > The fact you are submitting this for net suggests you have seen real > issues. Please describe what those issues are. The error log shows: [257681.367827] sfp sfp.1025: Host maximum power 1.0W [257681.370813] txgbe 0000:04:00.1: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:02:02.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link) [257681.373364] txgbe 0000:04:00.1 enp4s0f1: renamed from eth0 [257681.434719] txgbe 0000:04:00.1 enp4s0f1: configuring for inband/10gbase-r link mode [257681.676747] sfp sfp.1025: EEPROM base structure checksum failure: 0x63 != 0x1f [257681.676755] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00 ............g... [257681.676757] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 64 ....FiberStore d [257681.676759] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53 ...!SFP-10GS [257681.676760] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f R-85 A .R.. [257681.676762] sfp EE: 00000040: 00 81 cd 5b df 25 0a bd 40 f6 c6 ce 47 8e ff ff ...[.%..@...G... [257681.676763] sfp EE: 00000050: 10 d8 24 33 44 8e ff ff 10 41 b0 9a ff ff ff ff ..$3D....A...... It looks like some fields are read incorrectly. For comparison, I printed the SFP info when it loaded correctly: [260908.194533] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00 ............g... [260908.194536] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 20 ....FiberStore [260908.194538] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53 ...!SFP-10GS [260908.194540] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f R-85 A .R.. [260908.194541] sfp EE: 00000040: 40 63 bd df 40 8e ff ff 10 41 b0 9a ff ff ff ff @c..@....A...... [260908.194543] sfp EE: 00000050: 10 58 5b 29 41 8e ff ff 10 41 b0 9a ff ff ff ff .X[)A....A...... [260908.198205] sfp sfp.1025: module FiberStore SFP-10GSR-85 rev A sn G1804125607 dc 180605 Since the read mechanism of I2C is to write the offset and read command first, and then read the target address. I think it's possible that the different offsets be written at the same time, from Linux and firmware.
On Fri, Aug 23, 2024 7:05 PM, Jarkko Nikula wrote: > Hi > Hi > > On 8/23/24 6:02 AM, Jiawen Wu wrote: > > Sometimes the driver can not get the SFP information because the I2C bus > > is accessed by the firmware at the same time. So we need to add the lock > > on the I2C bus access. The hardware semaphores perform this lock. > > > > Jiawen Wu (3): > > net: txgbe: add IO address in I2C platform device data > > i2c: designware: add device private data passing to lock functions > > i2c: designware: support hardware lock for Wangxun 10Gb NIC > > > I was wondering the Fixes tag use in the series. Obviously patchset is > not fixing a regression so question is what happens when issue occurs? > > I don't think e.g. failing I2C transfer with an error code yet qualifies > backporting into Linux stable? Ah, I think this was a leftover bug from when I added Wangxun NIC support in I2C designware, so I added a fix tag.
> > More details please. > > > > Linux assume it is driving the hardware. Your firmware cannot be > > touching any registers which will clear on read. QSFP states that > > registers 3-31 of page 0 are all clear on read, for example. The > > firmware should also not be setting any registers, otherwise you can > > confuse Linux which assumes registers it set stay set, because it is > > controlling the hardware. > > > > Your firmware also needs to handle that Linux can change the page. If > > the firmware changes the page, it must restore it back to whatever > > page Linux selected, etc. > > > > The fact you are submitting this for net suggests you have seen real > > issues. Please describe what those issues are. > > The error log shows: > > [257681.367827] sfp sfp.1025: Host maximum power 1.0W > [257681.370813] txgbe 0000:04:00.1: 31.504 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x4 link at 0000:02:02.0 (capable > of 63.008 Gb/s with 8.0 GT/s PCIe x8 link) > [257681.373364] txgbe 0000:04:00.1 enp4s0f1: renamed from eth0 > [257681.434719] txgbe 0000:04:00.1 enp4s0f1: configuring for inband/10gbase-r link mode > [257681.676747] sfp sfp.1025: EEPROM base structure checksum failure: 0x63 != 0x1f > [257681.676755] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00 ............g... > [257681.676757] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 64 ....FiberStore d > [257681.676759] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53 ...!SFP-10GS > [257681.676760] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f R-85 A .R.. > [257681.676762] sfp EE: 00000040: 00 81 cd 5b df 25 0a bd 40 f6 c6 ce 47 8e ff ff ...[.%..@...G... > [257681.676763] sfp EE: 00000050: 10 d8 24 33 44 8e ff ff 10 41 b0 9a ff ff ff ff ..$3D....A...... > > It looks like some fields are read incorrectly. For comparison, I printed the > SFP info when it loaded correctly: > > [260908.194533] sfp EE: 00000000: 03 04 07 10 00 00 01 00 00 00 00 06 67 02 00 00 ............g... > [260908.194536] sfp EE: 00000010: 1e 0f 00 00 46 69 62 65 72 53 74 6f 72 65 20 20 ....FiberStore > [260908.194538] sfp EE: 00000020: 20 20 20 20 00 00 1b 21 53 46 50 2d 31 30 47 53 ...!SFP-10GS > [260908.194540] sfp EE: 00000030: 52 2d 38 35 20 20 20 20 41 20 20 20 03 52 00 1f R-85 A .R.. > [260908.194541] sfp EE: 00000040: 40 63 bd df 40 8e ff ff 10 41 b0 9a ff ff ff ff @c..@....A...... > [260908.194543] sfp EE: 00000050: 10 58 5b 29 41 8e ff ff 10 41 b0 9a ff ff ff ff .X[)A....A...... > [260908.198205] sfp sfp.1025: module FiberStore SFP-10GSR-85 rev A sn G1804125607 dc 180605 > > Since the read mechanism of I2C is to write the offset and read command > first, and then read the target address. I think it's possible that the different > offsets be written at the same time, from Linux and firmware. O.K, that is bad. The SFP is totally unreliable... You however have still not answered my question. What is the firmware accessing? How does it handle pages? The hack you have put in place is per i2c transaction. But accessing pages is likely to be multiple transactions. One to change the page, followed by a few reads/writes in the new page, then maybe followed by a transactions to return to page 0. I think your best solution is to simply take the mutex and never release it. Block your firmware from accessing the SFP. Andrew