Message ID | 20230108211324.442823-2-martin.blumenstingl@googlemail.com |
---|---|
State | New |
Series | wifi: rtw88: Three locking fixes for existing code |
Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:

> USB and (upcoming) SDIO support may sleep in the read/write handlers.
> Shrink the RCU critical section so it only covers the call to
> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> found station. This moves the chip's BFEE configuration outside the
> rcu_read_lock section and thus prevents "scheduling while atomic" or
> "Voluntary context switch within RCU read-side critical section!"
> warnings when accessing the registers using an SDIO card (which is
> where this issue has been spotted in the real world - but it also
> affects USB cards).
>
> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

3 patches applied to wireless-next.git, thanks.

8a1e2fd8e2da wifi: rtw88: Move register access from rtw_bf_assoc() outside the RCU
313f6dc7c5ed wifi: rtw88: Use rtw_iterate_vifs() for rtw_vif_watch_dog_iter()
2931978cd74f wifi: rtw88: Use non-atomic sta iterator in rtw_ra_mask_info_update()
On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> USB and (upcoming) SDIO support may sleep in the read/write handlers.
> Shrink the RCU critical section so it only covers the call to
> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> found station. This moves the chip's BFEE configuration outside the
> rcu_read_lock section and thus prevents "scheduling while atomic" or
> "Voluntary context switch within RCU read-side critical section!"
> warnings when accessing the registers using an SDIO card (which is
> where this issue has been spotted in the real world - but it also
> affects USB cards).

Unfortunately this introduces a regression on my RTW8821CU chip. With
this it constantly loses connection to the AP and reconnects shortly
after:

[ 199.771143] wlan0: authenticate with b0:be:76:5e:7b:34
[ 201.447301] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[ 201.456789] wlan0: authenticated
[ 201.462356] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[ 201.477263] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[ 201.512995] wlan0: associated
[ 213.790399] wlan0: authenticate with b0:be:76:5e:7b:34
[ 215.467302] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[ 215.470532] wlan0: authenticated
[ 215.490355] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[ 215.503777] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[ 215.539608] wlan0: associated
[ 227.770596] wlan0: authenticate with b0:be:76:5e:7b:34
[ 229.443302] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[ 229.451209] wlan0: authenticated
[ 229.462487] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[ 229.476077] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[ 229.513499] wlan0: associated
[ 241.738494] wlan0: authenticate with b0:be:76:5e:7b:34
[ 243.407301] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[ 243.411207] wlan0: authenticated
[ 243.423213] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[ 243.439822] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[ 243.476731] wlan0: associated

I don't have any further information yet; I only noticed this when I
rebased my own RTW88 bugfix series from v6.2.2 to v6.3-rc4 before
sending it.

RTW8723D and RTW8822CU seem unaffected, though.

Sascha

>
> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> ---
> v1 -> v2:
> - Added Ping-Ke's Reviewed-by (thank you!)
>
> v2 -> v3:
> - Added Sascha's Tested-by (thank you!)
> - added "wifi" prefix to the subject and reworded the title accordingly
>
>
>  drivers/net/wireless/realtek/rtw88/bf.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/wireless/realtek/rtw88/bf.c b/drivers/net/wireless/realtek/rtw88/bf.c
> index 038a30b170ef..c827c4a2814b 100644
> --- a/drivers/net/wireless/realtek/rtw88/bf.c
> +++ b/drivers/net/wireless/realtek/rtw88/bf.c
> @@ -49,19 +49,23 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>
>  	sta = ieee80211_find_sta(vif, bssid);
>  	if (!sta) {
> +		rcu_read_unlock();
> +
>  		rtw_warn(rtwdev, "failed to find station entry for bss %pM\n",
>  			 bssid);
> -		goto out_unlock;
> +		return;
>  	}
>
>  	ic_vht_cap = &hw->wiphy->bands[NL80211_BAND_5GHZ]->vht_cap;
>  	vht_cap = &sta->deflink.vht_cap;
>
> +	rcu_read_unlock();
> +
>  	if ((ic_vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE) &&
>  	    (vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE)) {
>  		if (bfinfo->bfer_mu_cnt >= chip->bfer_mu_max_num) {
>  			rtw_dbg(rtwdev, RTW_DBG_BF, "mu bfer number over limit\n");
> -			goto out_unlock;
> +			return;
>  		}
>
>  		ether_addr_copy(bfee->mac_addr, bssid);
> @@ -75,7 +79,7 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>  		   (vht_cap->cap & IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE)) {
>  		if (bfinfo->bfer_su_cnt >= chip->bfer_su_max_num) {
>  			rtw_dbg(rtwdev, RTW_DBG_BF, "su bfer number over limit\n");
> -			goto out_unlock;
> +			return;
>  		}
>
>  		sound_dim = vht_cap->cap &
> @@ -98,9 +102,6 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>
>  		rtw_chip_config_bfee(rtwdev, rtwvif, bfee, true);
>  	}
> -
> -out_unlock:
> -	rcu_read_unlock();
>  }
>
>  void rtw_bf_init_bfer_entry_mu(struct rtw_dev *rtwdev,
> --
> 2.39.0
Hi Sascha,

On Fri, Mar 31, 2023 at 2:59 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
>
> On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> > USB and (upcoming) SDIO support may sleep in the read/write handlers.
> > Shrink the RCU critical section so it only covers the call to
> > ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> > found station. This moves the chip's BFEE configuration outside the
> > rcu_read_lock section and thus prevents "scheduling while atomic" or
> > "Voluntary context switch within RCU read-side critical section!"
> > warnings when accessing the registers using an SDIO card (which is
> > where this issue has been spotted in the real world - but it also
> > affects USB cards).
>
> Unfortunately this introduces a regression on my RTW8821CU chip. With
> this it constantly loses connection to the AP and reconnects shortly
> after:

Sorry to hear this! This is odd, and unfortunately I don't understand
the reason for it.
rtw_bf_assoc() is only called from
drivers/net/wireless/realtek/rtw88/mac80211.c with rtwdev->mutex held,
so I don't think that it's a race condition.

There's a module parameter which lets you enable/disable BF support:
$ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf, rtw_bf_support, bool, 0644);

Have you tried disabling BF support?

Also +Cc Jernej in case he has an idea.

Best regards,
Martin
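The grep output only shows where the parameter is declared. Because it is registered with permission 0644, it should also be visible through sysfs. The following is a minimal sketch for checking its current value from userspace, assuming main.c is built into the rtw88_core module so that the file lives at /sys/module/rtw88_core/parameters/support_bf (the module name and path are assumptions; verify them on your system). The helpers parse_param_bool() and read_support_bf() are hypothetical. The only kernel behaviour relied on is that bool parameters read back as "Y" or "N".

```c
#include <stdio.h>

/* Assumed path: bool module parameters with non-zero permissions appear
 * under /sys/module/<module>/parameters/. The module name is a guess. */
#define SUPPORT_BF_PATH "/sys/module/rtw88_core/parameters/support_bf"

/* Interpret a bool parameter string. The kernel prints bool params as
 * "Y" or "N"; 1/0 and lowercase letters are also accepted here.
 * Returns 1 for true, 0 for false and -1 if the text is unrecognised. */
int parse_param_bool(const char *s)
{
	switch (s[0]) {
	case 'Y': case 'y': case '1':
		return 1;
	case 'N': case 'n': case '0':
		return 0;
	default:
		return -1;
	}
}

/* Read and parse the parameter file; returns -1 if it cannot be read
 * (for example, when the module is not loaded). */
int read_support_bf(const char *path)
{
	char buf[8] = { 0 };
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_param_bool(buf);
}
```

Calling read_support_bf(SUPPORT_BF_PATH) returns 1 while beamforming support is enabled. To test with it disabled, the parameter would normally be set when the owning module is loaded (support_bf=N). This thread does not establish whether writing the sysfs file at runtime affects an existing association.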
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 31.03.23 14:59, Sascha Hauer wrote:
> On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
>> USB and (upcoming) SDIO support may sleep in the read/write handlers.
>> Shrink the RCU critical section so it only covers the call to
>> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
>> found station. This moves the chip's BFEE configuration outside the
>> rcu_read_lock section and thus prevents "scheduling while atomic" or
>> "Voluntary context switch within RCU read-side critical section!"
>> warnings when accessing the registers using an SDIO card (which is
>> where this issue has been spotted in the real world - but it also
>> affects USB cards).
>
> Unfortunately this introduces a regression on my RTW8821CU chip. With
> this it constantly loses connection to the AP and reconnects shortly
> after:

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced c7eca79def44
#regzbot title net: wifi: rtw88: RTW8821CU constantly looses connection to the AP and reconnects shortly after
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already being
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See the page linked in the
footer for details.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
Hi Martin,

On Sat, Apr 01, 2023 at 11:30:40PM +0200, Martin Blumenstingl wrote:
> Hi Sascha,
>
> On Fri, Mar 31, 2023 at 2:59 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
> >
> > On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> > > USB and (upcoming) SDIO support may sleep in the read/write handlers.
> > > Shrink the RCU critical section so it only covers the call to
> > > ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> > > found station. This moves the chip's BFEE configuration outside the
> > > rcu_read_lock section and thus prevents "scheduling while atomic" or
> > > "Voluntary context switch within RCU read-side critical section!"
> > > warnings when accessing the registers using an SDIO card (which is
> > > where this issue has been spotted in the real world - but it also
> > > affects USB cards).
> >
> > Unfortunately this introduces a regression on my RTW8821CU chip. With
> > this it constantly loses connection to the AP and reconnects shortly
> > after:
> Sorry to hear this! This is odd, and unfortunately I don't understand
> the reason for it.
> rtw_bf_assoc() is only called from
> drivers/net/wireless/realtek/rtw88/mac80211.c with rtwdev->mutex held,
> so I don't think that it's a race condition.
>
> There's a module parameter which lets you enable/disable BF support:
> $ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
> drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf,
> rtw_bf_support, bool, 0644);

I was a bit too fast reporting this. Yes, there seems to be a problem
with the RTW8821CU, but it doesn't seem to be related to this patch.

Sorry for the noise.

The chipset seems to have problems with one of my access points, and I
can see these problems with or without the patch. Maybe NetworkManager
decided to connect to another access point without me noticing,
making it look as if this patch was to blame.

Sascha
Hi Sascha,

On Mon, Apr 3, 2023 at 12:00 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
[...]
> > There's a module parameter which lets you enable/disable BF support:
> > $ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
> > drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf,
> > rtw_bf_support, bool, 0644);
>
> I was a bit too fast reporting this. Yes, there seems to be a problem
> with the RTW8821CU, but it doesn't seem to be related to this patch.
>
> Sorry for the noise.

Thanks for investigating further and confirming that this is not the cause!
And don't worry: we're all human, and with complex drivers that can be
impacted by so many things (other APs, phones, antennas, ...) it's easy
to miss a tiny detail (I've been there before).

Best regards,
Martin
diff --git a/drivers/net/wireless/realtek/rtw88/bf.c b/drivers/net/wireless/realtek/rtw88/bf.c
index 038a30b170ef..c827c4a2814b 100644
--- a/drivers/net/wireless/realtek/rtw88/bf.c
+++ b/drivers/net/wireless/realtek/rtw88/bf.c
@@ -49,19 +49,23 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 
 	sta = ieee80211_find_sta(vif, bssid);
 	if (!sta) {
+		rcu_read_unlock();
+
 		rtw_warn(rtwdev, "failed to find station entry for bss %pM\n",
 			 bssid);
-		goto out_unlock;
+		return;
 	}
 
 	ic_vht_cap = &hw->wiphy->bands[NL80211_BAND_5GHZ]->vht_cap;
 	vht_cap = &sta->deflink.vht_cap;
 
+	rcu_read_unlock();
+
 	if ((ic_vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE) &&
 	    (vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE)) {
 		if (bfinfo->bfer_mu_cnt >= chip->bfer_mu_max_num) {
 			rtw_dbg(rtwdev, RTW_DBG_BF, "mu bfer number over limit\n");
-			goto out_unlock;
+			return;
 		}
 
 		ether_addr_copy(bfee->mac_addr, bssid);
@@ -75,7 +79,7 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 		   (vht_cap->cap & IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE)) {
 		if (bfinfo->bfer_su_cnt >= chip->bfer_su_max_num) {
 			rtw_dbg(rtwdev, RTW_DBG_BF, "su bfer number over limit\n");
-			goto out_unlock;
+			return;
 		}
 
 		sound_dim = vht_cap->cap &
@@ -98,9 +102,6 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 
 		rtw_chip_config_bfee(rtwdev, rtwvif, bfee, true);
 	}
-
-out_unlock:
-	rcu_read_unlock();
 }
 
 void rtw_bf_init_bfer_entry_mu(struct rtw_dev *rtwdev,
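The patch follows a common RCU pattern. It performs the protected station lookup, drops the read lock, and only then does register accesses that may sleep on USB or SDIO. Below is a minimal userspace sketch of that control flow. Everything in it is a hypothetical stand-in, not a kernel or rtw88 API: fake_rcu_read_lock(), fake_rcu_read_unlock(), fake_reg_write(), fake_find_sta(), bf_assoc() and the CAP_* bits. A nesting counter plays the role of the kernel's check for sleeping inside an RCU read-side section. The lookup-failure path unlocks before returning, while the later limit checks can use a plain return, which mirrors how the patch replaces goto out_unlock with return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock() and a bus
 * register write. The counter records any "sleep" inside the read-side
 * section, which the kernel would warn about. */
static int rcu_depth;
static int sleeps_in_rcu;
static int reg_writes;

static void fake_rcu_read_lock(void)
{
	rcu_depth++;
}

static void fake_rcu_read_unlock(void)
{
	assert(rcu_depth > 0); /* catches an unbalanced unlock */
	rcu_depth--;
}

/* Models a USB/SDIO register write, which may block on the transfer. */
static void fake_reg_write(void)
{
	if (rcu_depth > 0)
		sleeps_in_rcu++;
	reg_writes++;
}

/* Illustrative capability bits, NOT the real IEEE80211_VHT_CAP_* values. */
#define CAP_MU_BFEE 0x1u
#define CAP_MU_BFER 0x2u

struct fake_sta {
	unsigned int vht_cap;
};

/* Stand-in for ieee80211_find_sta(): the returned pointer is only valid
 * while the caller stays inside the RCU read-side section. */
static struct fake_sta *fake_find_sta(struct fake_sta *sta, bool present)
{
	return present ? sta : NULL;
}

void bf_assoc(struct fake_sta *table, bool present, unsigned int ic_cap,
	      int bfer_cnt, int bfer_max)
{
	struct fake_sta *sta;
	unsigned int sta_cap;

	fake_rcu_read_lock();
	sta = fake_find_sta(table, present);
	if (!sta) {
		fake_rcu_read_unlock(); /* this exit must unlock itself */
		return;
	}
	sta_cap = sta->vht_cap; /* copy the value while still protected */
	fake_rcu_read_unlock();

	/* Outside the critical section: sleeping work is allowed here. */
	if ((ic_cap & CAP_MU_BFEE) && (sta_cap & CAP_MU_BFER)) {
		if (bfer_cnt >= bfer_max)
			return; /* nothing left to unlock */
		fake_reg_write();
	}
}
```

The sketch can be exercised on each path in turn: station not found, over the limit, and successful configuration. In every case rcu_depth returns to 0 and sleeps_in_rcu stays 0. One deliberate difference from the patch is that the sketch copies the station's capability word by value before unlocking. This is a conservative way to avoid touching RCU-protected memory after rcu_read_unlock(). The patch itself keeps the ic_vht_cap and vht_cap pointers.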