Message ID | 20240816173529.17873-15-nbd@nbd.name |
---|---|
State | Superseded |
Series | [01/16] mt76: mt7603: fix mixed declarations and code |
On 8/16/24 10:35, Felix Fietkau wrote:
> On MT7915, MCU hangs do not trigger watchdog interrupts, so they can only
> be detected through MCU message timeouts. Ensure that the hardware gets
> restarted when that happens in order to prevent a permanent stuck state.

We applied this to our hacked upon 6.10 kernel, and this patch appears
to cause NPE down in debugfs file removal during radio restart. We didn't
investigate this closely, but removing this patch fixes the problem.

Also of note, we see the radio have a timeout, but then recover, often
(without this patch).

Did you force/fake this situation to happen and see it actually work?

Thanks,
Ben
On 23.08.24 20:32, Ben Greear wrote:
> On 8/16/24 10:35, Felix Fietkau wrote:
>> On MT7915, MCU hangs do not trigger watchdog interrupts, so they can only
>> be detected through MCU message timeouts. Ensure that the hardware gets
>> restarted when that happens in order to prevent a permanent stuck state.
>
> We applied this to our hacked upon 6.10 kernel, and this patch appears
> to cause NPE down in debugfs file removal during radio restart. We didn't
> investigate this closely, but removing this patch fixes the problem.
>
> Also of note, we see the radio have a timeout, but then recover, often
> (without this patch).
>
> Did you force/fake this situation to happen and see it actually work?

I found some issues in a few patches of this series in the last few days
and will send v2 soon.

- Felix
diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
index 068523561f5e..7c98d9ba9152 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
@@ -157,12 +157,19 @@ static int
 mt7915_mcu_parse_response(struct mt76_dev *mdev, int cmd,
 			  struct sk_buff *skb, int seq)
 {
+	struct mt7915_dev *dev = container_of(mdev, struct mt7915_dev, mt76);
 	struct mt76_connac2_mcu_rxd *rxd;
 	int ret = 0;
 
 	if (!skb) {
 		dev_err(mdev->dev, "Message %08x (seq %d) timeout\n",
 			cmd, seq);
+		dev->recovery.restart = true;
+		set_bit(MT76_MCU_RESET, &dev->mphy.state);
+		wake_up(&dev->mt76.mcu.wait);
+		queue_work(dev->mt76.wq, &dev->reset_work);
+		wake_up(&dev->reset_wait);
+
 		return -ETIMEDOUT;
 	}
 
On MT7915, MCU hangs do not trigger watchdog interrupts, so they can only
be detected through MCU message timeouts. Ensure that the hardware gets
restarted when that happens in order to prevent a permanent stuck state.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
---
 drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 7 +++++++
 1 file changed, 7 insertions(+)
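
For readers who want to reason about the control flow outside the kernel, the following is a minimal userspace sketch of the idea in the commit message: a missing response (timeout) is the only hang signal, so it marks a restart as requested and queues the reset work. It is not the mt76 code. Every name here (sim_dev, sim_msg, sim_parse_response, sim_queue_reset, sim_run_reset_work, sim_demo) is hypothetical, the wait queues and wake_up() calls from the patch are not modelled, and the only kernel behaviour it leans on is that queueing work which is already pending does not queue it a second time.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the mt76 data structures. */
struct sim_dev {
	bool restart_requested;  /* plays the role of dev->recovery.restart */
	bool mcu_reset_pending;  /* plays the role of the MT76_MCU_RESET state bit */
	bool reset_work_queued;  /* true while reset work is queued but not yet run */
	int resets_run;          /* number of simulated restarts completed */
};

struct sim_msg {
	int len;                 /* payload length of a simulated MCU response */
};

/* Queue the reset work; returns false if it was already pending. */
static bool sim_queue_reset(struct sim_dev *dev)
{
	if (dev->reset_work_queued)
		return false;
	dev->reset_work_queued = true;
	return true;
}

/* A NULL message models an MCU message timeout, like the !skb case in the patch. */
static int sim_parse_response(struct sim_dev *dev, const struct sim_msg *msg)
{
	if (!msg) {
		dev->restart_requested = true;
		dev->mcu_reset_pending = true;
		sim_queue_reset(dev);
		return -ETIMEDOUT;
	}
	return msg->len;
}

/* Simulated reset worker: performs one restart and clears the pending state. */
static void sim_run_reset_work(struct sim_dev *dev)
{
	if (!dev->reset_work_queued)
		return;
	dev->reset_work_queued = false;
	if (dev->restart_requested) {
		dev->restart_requested = false;
		dev->mcu_reset_pending = false;
		dev->resets_run++;
	}
}

/* Usage: two back-to-back timeouts result in a single simulated restart. */
static int sim_demo(void)
{
	struct sim_dev dev = {0};

	assert(sim_parse_response(&dev, NULL) == -ETIMEDOUT);
	assert(sim_parse_response(&dev, NULL) == -ETIMEDOUT);
	sim_run_reset_work(&dev);
	return dev.resets_run;
}
```

Passing NULL to sim_parse_response() is also a cheap way to fake a timeout in this model. Whether an equivalent forced timeout on real hardware exercises the restart path cleanly is exactly the open question raised in the thread, and this sketch does not answer it.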