Message ID | 20250610100208.2282442-1-dmantipov@yandex.ru |
---|---|
State | New |
Headers | show |
Series | wifi: carl9170: do not ping device which has failed to load firmware | expand |
On 6/10/25 12:02 PM, Dmitry Antipov wrote: > Syzkaller reports [1, 2] crashes caused by an attempts to ping > the device which has failed to load firmware. Since such a device > doesn't pass 'ieee80211_register_hw()', an internal workqueue > managed by 'ieee80211_queue_work()' is not yet created and an > attempt to queue work on it causes null-ptr-deref. > > [1] https://syzkaller.appspot.com/bug?extid=9a4aec827829942045ff hmm, the sample crash report didn't show any carl9170 involvement. But the provided console log did have it: <https://syzkaller.appspot.com/text?tag=CrashLog&x=12cf580c580000> |[ 144.671347][ C1] Call Trace: |[ 144.674634][ C1] <TASK> |[ 144.677574][ C1] ? do_raw_spin_unlock+0x122/0x240 |[ 144.682819][ C1] queue_work_on+0x181/0x270 |[ 144.687414][ C1] ? __pfx_queue_work_on+0x10/0x10 |[ 144.692525][ C1] ? carl9170_usb_submit_rx_urb+0x198/0x1d0 |[ 144.698424][ C1] ? carl9170_usb_rx_complete+0x207/0x280 |[ 144.704149][ C1] __usb_hcd_giveback_urb+0x41a/0x690 |[ 144.709555][ C1] ? usb_hcd_unlink_urb_from_ep+0x2c/0x110 |[ 144.715455][ C1] ? __pfx___usb_hcd_giveback_urb+0x10/0x10 > [2] https://syzkaller.appspot.com/bug?extid=0d8afba53e8fb2633217 > Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency") > Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Christian Lamparter <chunkeey@gmail.com> Cc: <stable@vger.kernel.org>
It's better to be back to the original thread. Sorry, I misclicked and forgot to put In-Reply-To firstly. That part of the broken thread: https://lore.kernel.org/linux-wireless/y4ufvifcearf75qds5hlro3rfiadwfwlixz5xg3w6jjozk5sdg@7yyfsdvyehon/T/#u Quoting your last reply there.. On Sat, 14 Jun 2025 18:33:28 +0200, Christian Lamparter wrote: > Hi, > > On 6/13/25 10:19 PM, Fedor Pchelkin wrote: > > Dmitry Antipov wrote: > >> Syzkaller reports [1, 2] crashes caused by an attempts to ping > >> the device which has failed to load firmware. Since such a device > >> doesn't pass 'ieee80211_register_hw()', an internal workqueue > >> managed by 'ieee80211_queue_work()' is not yet created and an > >> attempt to queue work on it causes null-ptr-deref. > >> > >> [1] https://syzkaller.appspot.com/bug?extid=9a4aec827829942045ff > >> [2] https://syzkaller.appspot.com/bug?extid=0d8afba53e8fb2633217 > >> Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency") > >> Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> > >> --- > >> drivers/net/wireless/ath/carl9170/usb.c | 15 ++++++++++----- > >> 1 file changed, 10 insertions(+), 5 deletions(-) > >> > >> diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c > >> index a3e03580cd9f..a0bfa0c477ee 100644 > >> --- a/drivers/net/wireless/ath/carl9170/usb.c > >> +++ b/drivers/net/wireless/ath/carl9170/usb.c > >> @@ -438,13 +438,18 @@ static void carl9170_usb_rx_complete(struct urb *urb) > >> > >> if (atomic_read(&ar->rx_anch_urbs) == 0) { > >> /* > >> - * The system is too slow to cope with > >> - * the enormous workload. We have simply > >> - * run out of active rx urbs and this > >> - * unfortunately leads to an unpredictable > >> - * device. > >> + * At this point, either the system is too slow to > >> + * cope with the enormous workload (so we have simply > >> + * run out of active rx urbs and this unfortunately > >> + * leads to an unpredictable device), or the device > >> + * is not fully functional after an unsuccessful > >> + * firmware loading attempts (so it doesn't pass > >> + * ieee80211_register_hw() and there is no internal > >> + * workqueue at all). > >> */ > >> > >> + if (WARN_ON_ONCE(!ar->registered)) > >> + return; > > > > Is WARN justifiable here if it concerns handling a predefined error > > condition? > > The driver has many more WARN_ON_(ONCE). Most of them are from "back in the day". I think > carl9170 predates Syzkaller by something like 5 years or less. Just the fact of presence of other WARNs in the driver is not usually a pretext to add the new ones. WARNs should be used to point out that something is going wrong on the kernel level, i.e. be indicators of kernel bugs (BUG and BUG_ONs are discouraged for the new code, if I remember correctly). > > In this case, it would be good to know if this only happens with syzkaller, or with some > dogy device (be it the hci, or maybe the ar9170 device itself - they are getting old by now). > I mean Garbage In => Garbage Out. But yes, it shouldn't crash. The patch description lacks details on why Syzkaller does ever hit such a situation, even taking into account that fuzzers love to exaggerate the likelihood of hitting some weird and impossible-to-hit-in-practice stuff. Dmitry, if you'd like to, please give some comments on the following.. The path in question is executed when carl9170_usb_submit_rx_urb() fails. I didn't check the repro, only judging by driver code and crash report, but presumably it fails due to usb_submit_urb() returning some -ENODEV because the device is disconnected concurrently. usb_submit_urb() errors lead to &ar->rx_anch_urbs still remaining to have a zero value. I think the key reason of the crash is that the device disconnection and the URB completion callback - carl9170_usb_rx_complete() where the crash occurs - have a race. The logs say it's disconnected just before the crash: [ 87.100458][ T2990] usb 4-1: USB disconnect, device number 11 [ 87.101369][ C1] Oops: general protection fault, probably for non-canonical address 0xdffffc0000000038: 0000 [#1] SMP KASAN PTI [ 87.101407][ C1] KASAN: null-ptr-deref in range [0x00000000000001c0-0x00000000000001c7] So it looks like ar->registered being false here is a "correct" failure condition, i.e. it can be expected when the certain phase of the driver initialization fails and should be handled without any WARNs. > > > I mean, yeah, it avoids a crash in the completion handler but kernels > > with panic_on_warn - the ones which Syzkaller runs - will still stumble > > here for no reason. > > I bet there is already a precedence for this, if someone knows it or has previous decisions: > Please join in! > > Regards, > Christian
diff --git a/drivers/net/wireless/ath/carl9170/usb.c b/drivers/net/wireless/ath/carl9170/usb.c index a3e03580cd9f..a0bfa0c477ee 100644 --- a/drivers/net/wireless/ath/carl9170/usb.c +++ b/drivers/net/wireless/ath/carl9170/usb.c @@ -438,13 +438,18 @@ static void carl9170_usb_rx_complete(struct urb *urb) if (atomic_read(&ar->rx_anch_urbs) == 0) { /* - * The system is too slow to cope with - * the enormous workload. We have simply - * run out of active rx urbs and this - * unfortunately leads to an unpredictable - * device. + * At this point, either the system is too slow to + * cope with the enormous workload (so we have simply + * run out of active rx urbs and this unfortunately + * leads to an unpredictable device), or the device + * is not fully functional after an unsuccessful + * firmware loading attempts (so it doesn't pass + * ieee80211_register_hw() and there is no internal + * workqueue at all). */ + if (WARN_ON_ONCE(!ar->registered)) + return; ieee80211_queue_work(ar->hw, &ar->ping_work); } } else {
Syzkaller reports [1, 2] crashes caused by an attempts to ping the device which has failed to load firmware. Since such a device doesn't pass 'ieee80211_register_hw()', an internal workqueue managed by 'ieee80211_queue_work()' is not yet created and an attempt to queue work on it causes null-ptr-deref. [1] https://syzkaller.appspot.com/bug?extid=9a4aec827829942045ff [2] https://syzkaller.appspot.com/bug?extid=0d8afba53e8fb2633217 Fixes: e4a668c59080 ("carl9170: fix spurious restart due to high latency") Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> --- drivers/net/wireless/ath/carl9170/usb.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)