diff mbox series

i2c: omap: fix IRQ storms

Message ID 20250207185435.751878-1-andreas@kemnade.info
State New
Headers show
Series i2c: omap: fix IRQ storms | expand

Commit Message

Andreas Kemnade Feb. 7, 2025, 6:54 p.m. UTC
On the GTA04A5 writing a reset command to the gyroscope causes IRQ
storms because NACK IRQs are enabled and therefore triggered but not
acked.

Sending a reset command to the gyroscope by
i2cset 1 0x69 0x14 0xb6
with an additional debug print in the ISR (not the thread) itself
causes

[ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
[ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
[ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
[ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
[ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
[ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
repeating till infinity
[...]
(0x2 = NACK, 0x100 = Bus free, which is not enabled)
Apparently no other IRQ bit gets set, so this stalls.

Do not ignore enabled interrupts and make sure they are acked.
If the NACK IRQ is not needed, it should simply not enabled, but
according to the above log, caring about it is necessary unless
the Bus free IRQ is enabled and handled. The assumption that is
will always come with a ARDY IRQ, which was the idea behind
ignoring it, proves wrong.
It is true for simple reads from an unused address.

So revert
commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").

The offending commit was used to reduce the false detections in
i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
rare false detections (I have never seen such on my systems) is the
lesser devil than having basically the system hanging completely.

No more details came to light in the corresponding email thread since
several months:
https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
so no better fix to solve both problems can be developed right now.

Fixes: c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
CC: <stable@kernel.org>
Signed-off-by: Andreas Kemnade <andreas@kemnade.info>
---
 drivers/i2c/busses/i2c-omap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Andi Shyti Feb. 19, 2025, 7:22 p.m. UTC | #1
Hi,

On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:
> On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> storms because NACK IRQs are enabled and therefore triggered but not
> acked.
> 
> Sending a reset command to the gyroscope by
> i2cset 1 0x69 0x14 0xb6
> with an additional debug print in the ISR (not the thread) itself
> causes
> 
> [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> repeating till infinity
> [...]
> (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> Apparently no other IRQ bit gets set, so this stalls.
> 
> Do not ignore enabled interrupts and make sure they are acked.
> If the NACK IRQ is not needed, it should simply not enabled, but
> according to the above log, caring about it is necessary unless
> the Bus free IRQ is enabled and handled. The assumption that is
> will always come with a ARDY IRQ, which was the idea behind
> ignoring it, proves wrong.
> It is true for simple reads from an unused address.
> 
> So revert
> commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> 
> The offending commit was used to reduce the false detections in
> i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> rare false detections (I have never seen such on my systems) is the
> lesser devil than having basically the system hanging completely.
> 
> No more details came to light in the corresponding email thread since
> several months:
> https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> so no better fix to solve both problems can be developed right now.

I need someone from TI or someone who can test to ack here.

Can someone help?

Thanks,
Andi
H. Nikolaus Schaller Feb. 20, 2025, 8:43 a.m. UTC | #2
Hi,

> Am 19.02.2025 um 20:22 schrieb Andi Shyti <andi.shyti@kernel.org>:
> 
> Hi,
> 
> On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:
>> On the GTA04A5 writing a reset command to the gyroscope causes IRQ
>> storms because NACK IRQs are enabled and therefore triggered but not
>> acked.
>> 
>> Sending a reset command to the gyroscope by
>> i2cset 1 0x69 0x14 0xb6
>> with an additional debug print in the ISR (not the thread) itself
>> causes
>> 
>> [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
>> [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
>> [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
>> [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
>> [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
>> [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
>> [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
>> repeating till infinity
>> [...]
>> (0x2 = NACK, 0x100 = Bus free, which is not enabled)
>> Apparently no other IRQ bit gets set, so this stalls.
>> 
>> Do not ignore enabled interrupts and make sure they are acked.
>> If the NACK IRQ is not needed, it should simply not enabled, but
>> according to the above log, caring about it is necessary unless
>> the Bus free IRQ is enabled and handled. The assumption that is
>> will always come with a ARDY IRQ, which was the idea behind
>> ignoring it, proves wrong.
>> It is true for simple reads from an unused address.
>> 
>> So revert
>> commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
>> 
>> The offending commit was used to reduce the false detections in
>> i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
>> rare false detections (I have never seen such on my systems) is the
>> lesser devil than having basically the system hanging completely.
>> 
>> No more details came to light in the corresponding email thread since
>> several months:
>> https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
>> so no better fix to solve both problems can be developed right now.
> 
> I need someone from TI or someone who can test to ack here.
> 
> Can someone help?

Well, I think since this is simply a full revert of

commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings")

to the status before. Therefore the status after revert was tested for many years
until c770657bd261 arrived to try to solve one issue but it apparently introduced
another one which is more severe and difficult to work around.

For real world tests please see also:

https://lore.kernel.org/linux-omap/664241E0-8D6B-4783-997B-2D8510ADAEA3@goldelico.com/
https://lore.kernel.org/linux-omap/ad0fe7ca-fb6c-4c19-b4b3-0f29ddaa92c3@jm0.eu/

BR and thanks,
Nikolaus
Andreas Kemnade Feb. 20, 2025, 9:08 a.m. UTC | #3
Am Wed, 19 Feb 2025 20:22:13 +0100
schrieb Andi Shyti <andi.shyti@kernel.org>:

> Hi,
> 
> On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:
> > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > storms because NACK IRQs are enabled and therefore triggered but not
> > acked.
> > 
> > Sending a reset command to the gyroscope by
> > i2cset 1 0x69 0x14 0xb6
> > with an additional debug print in the ISR (not the thread) itself
> > causes
> > 
> > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > repeating till infinity
> > [...]
> > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > Apparently no other IRQ bit gets set, so this stalls.
> > 
> > Do not ignore enabled interrupts and make sure they are acked.
> > If the NACK IRQ is not needed, it should simply not enabled, but
> > according to the above log, caring about it is necessary unless
> > the Bus free IRQ is enabled and handled. The assumption that is
> > will always come with a ARDY IRQ, which was the idea behind
> > ignoring it, proves wrong.
> > It is true for simple reads from an unused address.
> > 
> > So revert
> > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > 
> > The offending commit was used to reduce the false detections in
> > i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> > rare false detections (I have never seen such on my systems) is the
> > lesser devil than having basically the system hanging completely.
> > 
> > No more details came to light in the corresponding email thread since
> > several months:
> > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > so no better fix to solve both problems can be developed right now.  
> 
> I need someone from TI or someone who can test to ack here.
> 
> Can someone help?
>
The original (IMHO minor) problem which should be fixed by c770657bd261
is hard to test, I have never seen that on any system (and as a
platform maintainer have a bunch of them) I have access to.
There is not much description anywhere about the system in which the
original system occured, and no reaction since several months from the
author, so I do not see anything which can be done.
Maybe it was just faulty hardware.

As said in the commit message, reverting it should be the lesser devil.
And that state was tested for many years.

Regards,
Andreas
Nishanth Menon Feb. 27, 2025, 2:20 p.m. UTC | #4
On 10:08-20250220, Andreas Kemnade wrote:
> Am Wed, 19 Feb 2025 20:22:13 +0100
> schrieb Andi Shyti <andi.shyti@kernel.org>:
> 
> > Hi,
> > 
> > On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:
> > > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > > storms because NACK IRQs are enabled and therefore triggered but not
> > > acked.
> > > 
> > > Sending a reset command to the gyroscope by
> > > i2cset 1 0x69 0x14 0xb6
> > > with an additional debug print in the ISR (not the thread) itself
> > > causes
> > > 
> > > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > repeating till infinity
> > > [...]
> > > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > > Apparently no other IRQ bit gets set, so this stalls.
> > > 
> > > Do not ignore enabled interrupts and make sure they are acked.
> > > If the NACK IRQ is not needed, it should simply not enabled, but
> > > according to the above log, caring about it is necessary unless
> > > the Bus free IRQ is enabled and handled. The assumption that is
> > > will always come with a ARDY IRQ, which was the idea behind
> > > ignoring it, proves wrong.
> > > It is true for simple reads from an unused address.
> > > 
> > > So revert
> > > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > > 
> > > The offending commit was used to reduce the false detections in
> > > i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> > > rare false detections (I have never seen such on my systems) is the
> > > lesser devil than having basically the system hanging completely.
> > > 
> > > No more details came to light in the corresponding email thread since
> > > several months:
> > > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > > so no better fix to solve both problems can be developed right now.  
> > 
> > I need someone from TI or someone who can test to ack here.
> > 
> > Can someone help?
> >
> The original (IMHO minor) problem which should be fixed by c770657bd261
> is hard to test, I have never seen that on any system (and as a
> platform maintainer have a bunch of them) I have access to.
> There is not much description anywhere about the system in which the
> original system occured, and no reaction since several months from the
> author, so I do not see anything which can be done.
> Maybe it was just faulty hardware.
> 
> As said in the commit message, reverting it should be the lesser devil.
> And that state was tested for many years.

Can we not handle this slightly differently? leave the fix based on
compatible? we know that the i2c controller changed over time. the
i2cdetect bug fixed by c770657bd261 esp hard to find and fix.
Andreas Kemnade Feb. 27, 2025, 3:08 p.m. UTC | #5
Hi,

Am Thu, 27 Feb 2025 08:20:55 -0600
schrieb Nishanth Menon <nm@ti.com>:

> On 10:08-20250220, Andreas Kemnade wrote:
> > Am Wed, 19 Feb 2025 20:22:13 +0100
> > schrieb Andi Shyti <andi.shyti@kernel.org>:
> >   
> > > Hi,
> > > 
> > > On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:  
> > > > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > > > storms because NACK IRQs are enabled and therefore triggered but not
> > > > acked.
> > > > 
> > > > Sending a reset command to the gyroscope by
> > > > i2cset 1 0x69 0x14 0xb6
> > > > with an additional debug print in the ISR (not the thread) itself
> > > > causes
> > > > 
> > > > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > > > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > > > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > > > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > > > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > repeating till infinity
> > > > [...]
> > > > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > > > Apparently no other IRQ bit gets set, so this stalls.
> > > > 
> > > > Do not ignore enabled interrupts and make sure they are acked.
> > > > If the NACK IRQ is not needed, it should simply not enabled, but
> > > > according to the above log, caring about it is necessary unless
> > > > the Bus free IRQ is enabled and handled. The assumption that is
> > > > will always come with a ARDY IRQ, which was the idea behind
> > > > ignoring it, proves wrong.
> > > > It is true for simple reads from an unused address.
> > > > 
> > > > So revert
> > > > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > > > 
> > > > The offending commit was used to reduce the false detections in
> > > > i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> > > > rare false detections (I have never seen such on my systems) is the
> > > > lesser devil than having basically the system hanging completely.
> > > > 
> > > > No more details came to light in the corresponding email thread since
> > > > several months:
> > > > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > > > so no better fix to solve both problems can be developed right now.    
> > > 
> > > I need someone from TI or someone who can test to ack here.
> > > 
> > > Can someone help?
> > >  
> > The original (IMHO minor) problem which should be fixed by c770657bd261
> > is hard to test, I have never seen that on any system (and as a
> > platform maintainer have a bunch of them) I have access to.
> > There is not much description anywhere about the system in which the
> > original system occured, and no reaction since several months from the
> > author, so I do not see anything which can be done.
> > Maybe it was just faulty hardware.
> > 
> > As said in the commit message, reverting it should be the lesser devil.
> > And that state was tested for many years.  
> 
> Can we not handle this slightly differently? leave the fix based on
> compatible? we know that the i2c controller changed over time. the
> i2cdetect bug fixed by c770657bd261 esp hard to find and fix.
> 
Yes, if there are nicer solutions, then I agree. But if there is a case
where NACK should be ignored, then we should either

a) not set it in OMAP_I2C_IE_REG

or 

b) do something more sophisticated in omap_i2c_isr_thread() to not do
nonsense in that case.

Even just not setting NACK in IE should improve things, so maybe the
i2c get stuck but no IRQ storms. Of course conditions to do such could
depend on compatible. I will investigate what you have sent me.

Regards,
Andreas
Andreas Kemnade Feb. 27, 2025, 9:47 p.m. UTC | #6
Am Thu, 27 Feb 2025 08:20:55 -0600
schrieb Nishanth Menon <nm@ti.com>:

> On 10:08-20250220, Andreas Kemnade wrote:
> > Am Wed, 19 Feb 2025 20:22:13 +0100
> > schrieb Andi Shyti <andi.shyti@kernel.org>:
> >   
> > > Hi,
> > > 
> > > On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:  
> > > > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > > > storms because NACK IRQs are enabled and therefore triggered but not
> > > > acked.
> > > > 
> > > > Sending a reset command to the gyroscope by
> > > > i2cset 1 0x69 0x14 0xb6
> > > > with an additional debug print in the ISR (not the thread) itself
> > > > causes
> > > > 
> > > > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > > > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > > > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > > > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > > > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > repeating till infinity
> > > > [...]
> > > > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > > > Apparently no other IRQ bit gets set, so this stalls.
> > > > 
> > > > Do not ignore enabled interrupts and make sure they are acked.
> > > > If the NACK IRQ is not needed, it should simply not enabled, but
> > > > according to the above log, caring about it is necessary unless
> > > > the Bus free IRQ is enabled and handled. The assumption that is
> > > > will always come with a ARDY IRQ, which was the idea behind
> > > > ignoring it, proves wrong.
> > > > It is true for simple reads from an unused address.
> > > > 
> > > > So revert
> > > > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > > > 
> > > > The offending commit was used to reduce the false detections in
> > > > i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> > > > rare false detections (I have never seen such on my systems) is the
> > > > lesser devil than having basically the system hanging completely.
> > > > 
> > > > No more details came to light in the corresponding email thread since
> > > > several months:
> > > > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > > > so no better fix to solve both problems can be developed right now.    
> > > 
> > > I need someone from TI or someone who can test to ack here.
> > > 
> > > Can someone help?
> > >  
> > The original (IMHO minor) problem which should be fixed by c770657bd261
> > is hard to test, I have never seen that on any system (and as a
> > platform maintainer have a bunch of them) I have access to.
> > There is not much description anywhere about the system in which the
> > original system occured, and no reaction since several months from the
> > author, so I do not see anything which can be done.
> > Maybe it was just faulty hardware.
> > 
> > As said in the commit message, reverting it should be the lesser devil.
> > And that state was tested for many years.  
> 
> Can we not handle this slightly differently? leave the fix based on
> compatible? we know that the i2c controller changed over time. the
> i2cdetect bug fixed by c770657bd261 esp hard to find and fix.
> 
looking a bit more deeper in:
Why do we have omap_i2c_isr at all? Can there any case that
stat & mask == 0 there (without c770657bd261 applied)?

I looked at omap_i2c_xfer_data() and nothing interesting seems to
happen without other bits besides OMAP_I2C_STAT_NACK. 
Looking again, things get interesting when that loop is left.

Maybe just acking NACK, setting cmd_err and return -EAGAIN if no other
bits are set. That should not cause changes to scenarios where NACK
comes with other bits set. Lets check whether that fixes the
mess I see here. Well, everything is better then having that IRQ going
mad.

For reference, the sensor involved was the BMG160. Because it is not
enabled in omap2plus_defconfig, the issue did not show up early.
From my understanding, that there is a NACK after the reset command
data byte is sent. @Nikolaus: are there any nice and simple test points
for a scope?

Do you have any chance to test such a scenario on any device requiring
the c770657bd261 applied?

Regards,
Andreas
H. Nikolaus Schaller Feb. 28, 2025, 9:49 a.m. UTC | #7
Hi Andreas,

> Am 27.02.2025 um 22:47 schrieb Andreas Kemnade <andreas@kemnade.info>:
> 
> For reference, the sensor involved was the BMG160. Because it is not
> enabled in omap2plus_defconfig, the issue did not show up early.
> From my understanding, that there is a NACK after the reset command
> data byte is sent. @Nikolaus: are there any nice and simple test points
> for a scope?

According to schematics [1] you can either tap the I2C bus at the camera connector
(pins 3 and 5) or R206 and R207.

For the camera connector you would need a flex cable that fits the SD-52437-2471
connector.

The resistors are located near the pogo pin connector for the right speaker (all
are near the antenna connector). If you need a photo please let me know.
It would be better to solder tiny wires...

BR,
Nikolaus

[1]: https://projects.goldelico.com/p/gta04-main/downloads/48/
Andreas Kemnade Feb. 28, 2025, 10:17 a.m. UTC | #8
Am Thu, 27 Feb 2025 08:20:55 -0600
schrieb Nishanth Menon <nm@ti.com>:

> On 10:08-20250220, Andreas Kemnade wrote:
> > Am Wed, 19 Feb 2025 20:22:13 +0100
> > schrieb Andi Shyti <andi.shyti@kernel.org>:
> >   
> > > Hi,
> > > 
> > > On Fri, Feb 07, 2025 at 07:54:35PM +0100, Andreas Kemnade wrote:  
> > > > On the GTA04A5 writing a reset command to the gyroscope causes IRQ
> > > > storms because NACK IRQs are enabled and therefore triggered but not
> > > > acked.
> > > > 
> > > > Sending a reset command to the gyroscope by
> > > > i2cset 1 0x69 0x14 0xb6
> > > > with an additional debug print in the ISR (not the thread) itself
> > > > causes
> > > > 
> > > > [ 363.353515] i2c i2c-1: ioctl, cmd=0x720, arg=0xbe801b00
> > > > [ 363.359039] omap_i2c 48072000.i2c: addr: 0x0069, len: 2, flags: 0x0, stop: 1
> > > > [ 363.366180] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x1110)
> > > > [ 363.371673] omap_i2c 48072000.i2c: IRQ (ISR = 0x0010)
> > > > [ 363.376892] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.382263] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > [ 363.387664] omap_i2c 48072000.i2c: IRQ LL (ISR = 0x0102)
> > > > repeating till infinity
> > > > [...]
> > > > (0x2 = NACK, 0x100 = Bus free, which is not enabled)
> > > > Apparently no other IRQ bit gets set, so this stalls.
> > > > 
> > > > Do not ignore enabled interrupts and make sure they are acked.
> > > > If the NACK IRQ is not needed, it should simply not enabled, but
> > > > according to the above log, caring about it is necessary unless
> > > > the Bus free IRQ is enabled and handled. The assumption that is
> > > > will always come with a ARDY IRQ, which was the idea behind
> > > > ignoring it, proves wrong.
> > > > It is true for simple reads from an unused address.
> > > > 
> > > > So revert
> > > > commit c770657bd261 ("i2c: omap: Fix standard mode false ACK readings").
> > > > 
> > > > The offending commit was used to reduce the false detections in
> > > > i2cdetect. i2cdetect warns for confusing the I2C bus, so having some
> > > > rare false detections (I have never seen such on my systems) is the
> > > > lesser devil than having basically the system hanging completely.
> > > > 
> > > > No more details came to light in the corresponding email thread since
> > > > several months:
> > > > https://lore.kernel.org/linux-omap/20230426194956.689756-1-reidt@ti.com/
> > > > so no better fix to solve both problems can be developed right now.    
> > > 
> > > I need someone from TI or someone who can test to ack here.
> > > 
> > > Can someone help?
> > >  
> > The original (IMHO minor) problem which should be fixed by c770657bd261
> > is hard to test, I have never seen that on any system (and as a
> > platform maintainer have a bunch of them) I have access to.
> > There is not much description anywhere about the system in which the
> > original system occured, and no reaction since several months from the
> > author, so I do not see anything which can be done.
> > Maybe it was just faulty hardware.
> > 
> > As said in the commit message, reverting it should be the lesser devil.
> > And that state was tested for many years.  
> 
> Can we not handle this slightly differently? leave the fix based on
> compatible? we know that the i2c controller changed over time. the
> i2cdetect bug fixed by c770657bd261 esp hard to find and fix.
> 
Looking again more deeply, I do not understand how c770657bd261
should reliably prevent doing stuff without ARDY:
omap_i2c_xfer_data() is called as a reaction the the IRQ but also in
a polling loop. So it must deal with situations were only NACK is set
there.
So just disabling the IRQ cannot help. If that is just tested via
i2cdetect, the polling mode is not tested.

Regards,
Andreas
diff mbox series

Patch

diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index 92faf03d64cf..b54d4120899f 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -1057,7 +1057,7 @@  omap_i2c_isr(int irq, void *dev_id)
 	u16 stat;
 
 	stat = omap_i2c_read_reg(omap, OMAP_I2C_STAT_REG);
-	mask = omap_i2c_read_reg(omap, OMAP_I2C_IE_REG) & ~OMAP_I2C_STAT_NACK;
+	mask = omap_i2c_read_reg(omap, OMAP_I2C_IE_REG);
 
 	if (stat & mask)
 		ret = IRQ_WAKE_THREAD;