diff mbox series

[3/3] ath9k: Fix potential hw interrupt resume during reset

Message ID 20210914192515.9273-4-linus.luessing@c0d3.blue
State New
Headers show
Series ath9k: interrupt fixes on queue reset | expand

Commit Message

Linus Lüssing Sept. 14, 2021, 7:25 p.m. UTC
From: Linus Lüssing <ll@simonwunderlich.de>

There is a small risk of the ath9k hw interrupts being reenabled in the
following way:

1) ath_reset_internal()
   ...
   -> disable_irq()
      ...
      <- returns

                      2) ath9k_tasklet()
                         ...
                         -> ath9k_hw_resume_interrupts()
                         ...

1) ath_reset_internal() continued:
   -> tasklet_disable(&sc->intr_tq); (= ath9k_tasklet() off)

By first disabling the ath9k interrupt there is a small window
afterwards which allows ath9k hw interrupts being reenabled through
the ath9k_tasklet() before we disable this tasklet in
ath_reset_internal(). Leading to having the ath9k hw interrupts enabled
during the reset, which we should avoid.

Fixing this by first disabling all ath9k tasklets. And only after
they are not running anymore also disabling the overall ath9k interrupt.

Either ath9k_queue_reset()->ath9k_kill_hw_interrupts() or
ath_reset_internal()->disable_irq()->ath_isr()->ath9k_kill_hw_interrupts()
should then have ensured that no ath9k hw interrupts are running during
the actual ath9k reset.

We could reproduce this issue with two Lima boards from 8devices
(QCA4531) on OpenWrt 19.07 while sending UDP traffic between the two and
triggering an ath9k_queue_reset() and with added msleep()s between
disable_irq() and tasklet_disable() in ath_reset_internal().

Cc: Sven Eckelmann <sven@narfation.org>
Cc: Simon Wunderlich <sw@simonwunderlich.de>
Cc: Linus Lüssing <linus.luessing@c0d3.blue>
Fixes: e3f31175a3ee ("ath9k: fix race condition in irq processing during hardware reset")
Signed-off-by: Linus Lüssing <ll@simonwunderlich.de>
---
 drivers/net/wireless/ath/ath9k/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Felix Fietkau Sept. 15, 2021, 9:48 a.m. UTC | #1
On 2021-09-14 21:25, Linus Lüssing wrote:
> From: Linus Lüssing <ll@simonwunderlich.de>
> 
> There is a small risk of the ath9k hw interrupts being reenabled in the
> following way:
> 
> 1) ath_reset_internal()
>    ...
>    -> disable_irq()
>       ...
>       <- returns
> 
>                       2) ath9k_tasklet()
>                          ...
>                          -> ath9k_hw_resume_interrupts()
>                          ...
> 
> 1) ath_reset_internal() continued:
>    -> tasklet_disable(&sc->intr_tq); (= ath9k_tasklet() off)
> 
> By first disabling the ath9k interrupt there is a small window
> afterwards which allows ath9k hw interrupts being reenabled through
> the ath9k_tasklet() before we disable this tasklet in
> ath_reset_internal(). Leading to having the ath9k hw interrupts enabled
> during the reset, which we should avoid.
I don't see a way in which interrupts can be re-enabled through the
tasklet. disable_irq disables the entire PCI IRQ (not through ath9k hw
registers), and they will only be re-enabled by the corresponding
enable_irq call.

- Felix
Linus Lüssing Sept. 15, 2021, 7:18 p.m. UTC | #2
On Wed, Sep 15, 2021 at 11:48:55AM +0200, Felix Fietkau wrote:
> 

> On 2021-09-14 21:25, Linus Lüssing wrote:

> > From: Linus Lüssing <ll@simonwunderlich.de>

> > 

> > There is a small risk of the ath9k hw interrupts being reenabled in the

> > following way:

> > 

> > 1) ath_reset_internal()

> >    ...

> >    -> disable_irq()

> >       ...

> >       <- returns

> > 

> >                       2) ath9k_tasklet()

> >                          ...

> >                          -> ath9k_hw_resume_interrupts()

> >                          ...

> > 

> > 1) ath_reset_internal() continued:

> >    -> tasklet_disable(&sc->intr_tq); (= ath9k_tasklet() off)

> > 

> > By first disabling the ath9k interrupt there is a small window

> > afterwards which allows ath9k hw interrupts being reenabled through

> > the ath9k_tasklet() before we disable this tasklet in

> > ath_reset_internal(). Leading to having the ath9k hw interrupts enabled

> > during the reset, which we should avoid.

> I don't see a way in which interrupts can be re-enabled through the

> tasklet. disable_irq disables the entire PCI IRQ (not through ath9k hw

> registers), and they will only be re-enabled by the corresponding

> enable_irq call.


Ah, okay, then I think I misunderstood the previous fixes to the
ath9k interrupt shutdown during reset here. I had only tested the
following diff and assumed that it were not okay to have the ath9k
hw interrupt registers enabled within the spinlock'd section:

```
@@ -299,11 +299,23 @@ static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
 
        __ath_cancel_work(sc);
 
        disable_irq(sc->irq);
+       for (r = 0; r < 200; r++) {
+               msleep(5);
+
+               if (REG_READ(ah, AR_INTR_SYNC_CAUSE) ||
+                   REG_READ(ah, AR_INTR_ASYNC_CAUSE)) {
+                       break;
+               }
+       }
        tasklet_disable(&sc->intr_tq);
        tasklet_disable(&sc->bcon_tasklet);
        spin_lock_bh(&sc->sc_pcu_lock);
 
+       if (REG_READ(ah, AR_INTR_SYNC_CAUSE) ||
+           REG_READ(ah, AR_INTR_ASYNC_CAUSE))
+               ATH_DBG_WARN(1, "%s: interrupts are enabled", __func__);
+
        if (!sc->cur_chan->offchannel) {
                fastcc = false;
                caldata = &sc->cur_chan->caldata;
```

And yes, the general ath9k interrupt should still be disabled. Didn't
test for that but seems like it.


(And now I noticed that actually
 ath_reset_internal()
 -> ath_prepare_reset()
   -> ath9k_hw_disable_interrupts()
     -> ath9k_hw_kill_interrupts()
 disables the ath9k hw interrupt registers before the
 ath_reset_internal()->ath9k_hw_reset() call anyway.)


What would you prefer, should this patch just be dropped or should
I resend it without the "Fixes:" line as a cosmetic change? (e.g. to
have the order in reverse to reenablement at the end of
ath_reset_internal(), to avoid confusion in the future?)

Regards, Linus
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 98090e40e1cf..b9f9a8ae3b56 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -292,9 +292,9 @@  static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
 
 	__ath_cancel_work(sc);
 
-	disable_irq(sc->irq);
 	tasklet_disable(&sc->intr_tq);
 	tasklet_disable(&sc->bcon_tasklet);
+	disable_irq(sc->irq);
 	spin_lock_bh(&sc->sc_pcu_lock);
 
 	if (!sc->cur_chan->offchannel) {