Message ID | 20210515024227.2159311-1-briannorris@chromium.org |
---|---|
State | Accepted |
Commit | 1f9482aa8d412b4ba06ce6ab8e333fb8ca29a06e |
Series | [5.13] mwifiex: bring down link before deleting interface |
On 5/15/21 4:42 AM, Brian Norris wrote:
> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the link
> for us, ... which then grab the same wiphy lock.
> ...
> Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
> Reported-by: dave@bewaar.me
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Thanks!

Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
On 2021-05-15 04:42, Brian Norris wrote:
> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the
> link for us, ... which then grab the same wiphy lock.
> ...
> Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
> Reported-by: Dave Olsthoorn <dave@bewaar.me>

Thanks! The firmware still seems to crash quicker than previously, but
that's an unrelated problem.

Tested-by: Dave Olsthoorn <dave@bewaar.me>

> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ...
On Fri, May 14, 2021 at 7:45 PM Brian Norris <briannorris@chromium.org> wrote:
>
> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the link
> for us, ... which then grab the same wiphy lock.
...
> Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
> Cc: stable@vger.kernel.org
> Link: https://lore.kernel.org/linux-wireless/98392296-40ee-6300-369c-32e16cff3725@gmail.com/
> Link: https://lore.kernel.org/linux-wireless/ab4d00ce52f32bd8e45ad0448a44737e@bewaar.me/
> Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
> Reported-by: dave@bewaar.me
> Cc: Johannes Berg <johannes@sipsolutions.net>
> Signed-off-by: Brian Norris <briannorris@chromium.org>

Ping - is this going to get merged? It's a 5.12 regression, and we
have multiple people complaining about it (and they tested the fix
too!).

Thanks,
Brian
Brian Norris <briannorris@chromium.org> writes:

> On Fri, May 14, 2021 at 7:45 PM Brian Norris <briannorris@chromium.org> wrote:
>>
>> We can deadlock when rmmod'ing the driver or going through firmware
>> reset, because the cfg80211_unregister_wdev() has to bring down the link
>> for us, ... which then grab the same wiphy lock.
> ...
>> Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
>> Cc: stable@vger.kernel.org
>> Link:
>> https://lore.kernel.org/linux-wireless/98392296-40ee-6300-369c-32e16cff3725@gmail.com/
>> Link:
>> https://lore.kernel.org/linux-wireless/ab4d00ce52f32bd8e45ad0448a44737e@bewaar.me/
>> Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
>> Reported-by: dave@bewaar.me
>> Cc: Johannes Berg <johannes@sipsolutions.net>
>> Signed-off-by: Brian Norris <briannorris@chromium.org>
>
> Ping - is this going to get merged? It's a 5.12 regression, and we
> have multiple people complaining about it (and they tested the fix
> too!).

Thanks for the ping, this got piled up under all the -next patches and I
missed it. I'll look at it now.

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Brian Norris <briannorris@chromium.org> wrote:

> We can deadlock when rmmod'ing the driver or going through firmware
> reset, because the cfg80211_unregister_wdev() has to bring down the link
> for us, ... which then grab the same wiphy lock.
> ...
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> Tested-by: Maximilian Luz <luzmaximilian@gmail.com>
> Tested-by: Dave Olsthoorn <dave@bewaar.me>

Patch applied to wireless-drivers.git, thanks.

1f9482aa8d41 mwifiex: bring down link before deleting interface

--
https://patchwork.kernel.org/project/linux-wireless/patch/20210515024227.2159311-1-briannorris@chromium.org/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 529dfd8b7ae8..17399d4aa129 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1445,11 +1445,18 @@ static void mwifiex_uninit_sw(struct mwifiex_adapter *adapter)
 		if (!priv)
 			continue;
 		rtnl_lock();
-		wiphy_lock(adapter->wiphy);
 		if (priv->netdev &&
-		    priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED)
+		    priv->wdev.iftype != NL80211_IFTYPE_UNSPECIFIED) {
+			/*
+			 * Close the netdev now, because if we do it later, the
+			 * netdev notifiers will need to acquire the wiphy lock
+			 * again --> deadlock.
+			 */
+			dev_close(priv->wdev.netdev);
+			wiphy_lock(adapter->wiphy);
 			mwifiex_del_virtual_intf(adapter->wiphy, &priv->wdev);
-		wiphy_unlock(adapter->wiphy);
+			wiphy_unlock(adapter->wiphy);
+		}
 		rtnl_unlock();
 	}
We can deadlock when rmmod'ing the driver or going through firmware
reset, because the cfg80211_unregister_wdev() has to bring down the link
for us, ... which then grab the same wiphy lock.

nl80211_del_interface() already handles a very similar case, with a nice
description:

        /*
         * We hold RTNL, so this is safe, without RTNL opencount cannot
         * reach 0, and thus the rdev cannot be deleted.
         *
         * We need to do it for the dev_close(), since that will call
         * the netdev notifiers, and we need to acquire the mutex there
         * but don't know if we get there from here or from some other
         * place (e.g. "ip link set ... down").
         */
        mutex_unlock(&rdev->wiphy.mtx);
...

Do similarly for mwifiex teardown, by ensuring we bring the link down
first.

Sample deadlock trace:

[ 247.103516] INFO: task rmmod:2119 blocked for more than 123 seconds.
[ 247.110630]       Not tainted 5.12.4 #5
[ 247.115796] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 247.124557] task:rmmod state:D stack: 0 pid: 2119 ppid: 2114 flags:0x00400208
[ 247.133905] Call trace:
[ 247.136644]  __switch_to+0x130/0x170
[ 247.140643]  __schedule+0x714/0xa0c
[ 247.144548]  schedule_preempt_disabled+0x88/0xf4
[ 247.149714]  __mutex_lock_common+0x43c/0x750
[ 247.154496]  mutex_lock_nested+0x5c/0x68
[ 247.158884]  cfg80211_netdev_notifier_call+0x280/0x4e0 [cfg80211]
[ 247.165769]  raw_notifier_call_chain+0x4c/0x78
[ 247.170742]  call_netdevice_notifiers_info+0x68/0xa4
[ 247.176305]  __dev_close_many+0x7c/0x138
[ 247.180693]  dev_close_many+0x7c/0x10c
[ 247.184893]  unregister_netdevice_many+0xfc/0x654
[ 247.190158]  unregister_netdevice_queue+0xb4/0xe0
[ 247.195424]  _cfg80211_unregister_wdev+0xa4/0x204 [cfg80211]
[ 247.201816]  cfg80211_unregister_wdev+0x20/0x2c [cfg80211]
[ 247.208016]  mwifiex_del_virtual_intf+0xc8/0x188 [mwifiex]
[ 247.214174]  mwifiex_uninit_sw+0x158/0x1b0 [mwifiex]
[ 247.219747]  mwifiex_remove_card+0x38/0xa0 [mwifiex]
[ 247.225316]  mwifiex_pcie_remove+0xd0/0xe0 [mwifiex_pcie]
[ 247.231451]  pci_device_remove+0x50/0xe0
[ 247.235849]  device_release_driver_internal+0x110/0x1b0
[ 247.241701]  driver_detach+0x5c/0x9c
[ 247.245704]  bus_remove_driver+0x84/0xb8
[ 247.250095]  driver_unregister+0x3c/0x60
[ 247.254486]  pci_unregister_driver+0x2c/0x90
[ 247.259267]  cleanup_module+0x18/0xcdc [mwifiex_pcie]

Fixes: a05829a7222e ("cfg80211: avoid holding the RTNL when calling the driver")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/linux-wireless/98392296-40ee-6300-369c-32e16cff3725@gmail.com/
Link: https://lore.kernel.org/linux-wireless/ab4d00ce52f32bd8e45ad0448a44737e@bewaar.me/
Reported-by: Maximilian Luz <luzmaximilian@gmail.com>
Reported-by: dave@bewaar.me
Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 drivers/net/wireless/marvell/mwifiex/main.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

--
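
For readers who want to see the lock ordering outside the kernel, the following is a rough user-space sketch of the problem the commit message describes. It is not the driver code: `toy_wiphy`, `toy_netdev`, `toy_dev_close()`, `toy_unregister()` and the two `teardown_*()` helpers are all hypothetical stand-ins invented here. It also swaps the kernel mutex for a pthread mutex of type PTHREAD_MUTEX_ERRORCHECK, so that re-locking by the owner returns EDEADLK where the real code would simply hang as in the trace above.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the wiphy and the netdev; not kernel types. */
struct toy_wiphy {
	pthread_mutex_t mtx;		/* plays the role of wiphy.mtx */
};

struct toy_netdev {
	struct toy_wiphy *wiphy;
	bool up;			/* link state */
};

static void toy_wiphy_init(struct toy_wiphy *w)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	assert(ret == 0);
	/* Assumption for the demo: report a self-deadlock instead of hanging. */
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	assert(ret == 0);
	ret = pthread_mutex_init(&w->mtx, &attr);
	assert(ret == 0);
	(void)ret;
	pthread_mutexattr_destroy(&attr);
}

/* Models closing the link: the "notifier" needs the wiphy lock. */
static int toy_dev_close(struct toy_netdev *dev)
{
	int ret;

	if (!dev->up)
		return 0;		/* already down, nothing to notify */

	ret = pthread_mutex_lock(&dev->wiphy->mtx);
	if (ret)
		return ret;		/* EDEADLK if the caller holds the lock */
	dev->up = false;
	pthread_mutex_unlock(&dev->wiphy->mtx);
	return 0;
}

/* Models unregistering the interface: closes the link if it is still up. */
static int toy_unregister(struct toy_netdev *dev)
{
	return toy_dev_close(dev);
}

/* Old ordering: take the lock, then let unregister close the link. */
static int teardown_lock_first(struct toy_netdev *dev)
{
	int ret;

	pthread_mutex_lock(&dev->wiphy->mtx);
	ret = toy_unregister(dev);
	pthread_mutex_unlock(&dev->wiphy->mtx);
	return ret;
}

/* New ordering: close the link first, then lock and unregister. */
static int teardown_close_first(struct toy_netdev *dev)
{
	int ret = toy_dev_close(dev);

	if (ret)
		return ret;
	pthread_mutex_lock(&dev->wiphy->mtx);
	ret = toy_unregister(dev);
	pthread_mutex_unlock(&dev->wiphy->mtx);
	return ret;
}
```

In this toy model, calling teardown_lock_first() on an interface that is still up fails with EDEADLK and leaves the link up, while teardown_close_first() succeeds because the link is already down by the time the lock is held. That is only meant to mirror the reasoning in the patch; the real notifier path has more going on.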