diff mbox series

[v3,1/3] wifi: rtw88: Move register access from rtw_bf_assoc() outside the RCU

Message ID 20230108211324.442823-2-martin.blumenstingl@googlemail.com
State New
Series wifi: rtw88: Three locking fixes for existing code

Commit Message

Martin Blumenstingl Jan. 8, 2023, 9:13 p.m. UTC
USB and (upcoming) SDIO support may sleep in the read/write handlers.
Shrink the RCU critical section so it only covers the call to
ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
found station. This moves the chip's BFEE configuration outside the
rcu_read_lock section and thus prevents "scheduling while atomic" or
"Voluntary context switch within RCU read-side critical section!"
warnings when accessing the registers using an SDIO card (which is
where this issue has been spotted in the real world - but it also
affects USB cards).

Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
---
v1 -> v2:
- Added Ping-Ke's Reviewed-by (thank you!)

v2 -> v3:
- Added Sascha's Tested-by (thank you!)
- added "wifi" prefix to the subject and reworded the title accordingly


 drivers/net/wireless/realtek/rtw88/bf.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)
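
The locking pattern described in the commit message can be illustrated with a small userspace sketch. This is not the driver code: every mock_* name, the MOCK_CAP_BEAMFORMER value and the station table are hypothetical stand-ins, and a nesting counter replaces real RCU. The sketch also copies the capability value inside the read-side section, whereas the patch keeps the vht_cap pointer; that is a choice made here for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock read-side section: a nesting counter standing in for RCU. */
static int mock_rcu_depth;

static void mock_rcu_read_lock(void)
{
	mock_rcu_depth++;
}

static void mock_rcu_read_unlock(void)
{
	assert(mock_rcu_depth > 0);
	mock_rcu_depth--;
}

static int reg_writes;        /* number of mock register writes */
static int atomic_violations; /* writes issued inside the section */

/* Stand-in for a bus write that may sleep (as on USB or SDIO). */
static void mock_sleeping_reg_write(uint32_t value)
{
	(void)value;
	if (mock_rcu_depth > 0)
		atomic_violations++;
	reg_writes++;
}

/* Arbitrary bit for this sketch, not the real mac80211 constant. */
#define MOCK_CAP_BEAMFORMER 0x800u

struct mock_sta {
	uint32_t vht_cap;
};

static struct mock_sta mock_table[] = {
	{ .vht_cap = MOCK_CAP_BEAMFORMER },
};

/* Hypothetical lookup, standing in for ieee80211_find_sta(). */
static struct mock_sta *mock_find_sta(int id)
{
	assert(mock_rcu_depth > 0); /* must be called inside the section */
	return id == 0 ? &mock_table[0] : NULL;
}

/* Look up under the lock, unlock, then do the work that may sleep. */
static bool mock_bf_assoc(int id)
{
	struct mock_sta *sta;
	uint32_t cap;

	mock_rcu_read_lock();
	sta = mock_find_sta(id);
	if (!sta) {
		mock_rcu_read_unlock();
		return false;
	}
	cap = sta->vht_cap; /* copy what is needed while protected */
	mock_rcu_read_unlock();

	if (!(cap & MOCK_CAP_BEAMFORMER))
		return false;

	mock_sleeping_reg_write(cap); /* now outside the section */
	return true;
}
```

Calling mock_bf_assoc(0) performs one mock register write with the counter back at zero, while mock_bf_assoc(1) takes the early-return path and still drops the lock, which mirrors the two rcu_read_unlock() calls added by the patch.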

Comments

Kalle Valo Jan. 16, 2023, 4:27 p.m. UTC | #1
Martin Blumenstingl <martin.blumenstingl@googlemail.com> wrote:

> USB and (upcoming) SDIO support may sleep in the read/write handlers.
> Shrink the RCU critical section so it only covers the call to
> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> found station. This moves the chip's BFEE configuration outside the
> rcu_read_lock section and thus prevents "scheduling while atomic" or
> "Voluntary context switch within RCU read-side critical section!"
> warnings when accessing the registers using an SDIO card (which is
> where this issue has been spotted in the real world - but it also
> affects USB cards).
> 
> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>

3 patches applied to wireless-next.git, thanks.

8a1e2fd8e2da wifi: rtw88: Move register access from rtw_bf_assoc() outside the RCU
313f6dc7c5ed wifi: rtw88: Use rtw_iterate_vifs() for rtw_vif_watch_dog_iter()
2931978cd74f wifi: rtw88: Use non-atomic sta iterator in rtw_ra_mask_info_update()
Sascha Hauer March 31, 2023, 12:59 p.m. UTC | #2
On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> USB and (upcoming) SDIO support may sleep in the read/write handlers.
> Shrink the RCU critical section so it only covers the call to
> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> found station. This moves the chip's BFEE configuration outside the
> rcu_read_lock section and thus prevents "scheduling while atomic" or
> "Voluntary context switch within RCU read-side critical section!"
> warnings when accessing the registers using an SDIO card (which is
> where this issue has been spotted in the real world - but it also
> affects USB cards).

Unfortunately this introduces a regression on my RTW8821CU chip. With
this it constantly loses connection to the AP and reconnects shortly
after:

[  199.771143] wlan0: authenticate with b0:be:76:5e:7b:34
[  201.447301] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[  201.456789] wlan0: authenticated
[  201.462356] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[  201.477263] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[  201.512995] wlan0: associated
[  213.790399] wlan0: authenticate with b0:be:76:5e:7b:34
[  215.467302] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[  215.470532] wlan0: authenticated
[  215.490355] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[  215.503777] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[  215.539608] wlan0: associated
[  227.770596] wlan0: authenticate with b0:be:76:5e:7b:34
[  229.443302] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[  229.451209] wlan0: authenticated
[  229.462487] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[  229.476077] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[  229.513499] wlan0: associated
[  241.738494] wlan0: authenticate with b0:be:76:5e:7b:34
[  243.407301] wlan0: send auth to b0:be:76:5e:7b:34 (try 1/3)
[  243.411207] wlan0: authenticated
[  243.423213] wlan0: associate with b0:be:76:5e:7b:34 (try 1/3)
[  243.439822] wlan0: RX AssocResp from b0:be:76:5e:7b:34 (capab=0x431 status=0 aid=2)
[  243.476731] wlan0: associated

I don't have any further information yet; I only noticed this when I
rebased my own RTW88 bugfix series from v6.2.2 to v6.3-rc4 before
sending it.

RTW8723D and RTW8822CU seem unaffected though.

Sascha

> 
> Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
> Tested-by: Sascha Hauer <s.hauer@pengutronix.de>
> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
> ---
> v1 -> v2:
> - Added Ping-Ke's Reviewed-by (thank you!)
> 
> v2 -> v3:
> - Added Sascha's Tested-by (thank you!)
> - added "wifi" prefix to the subject and reworded the title accordingly
> 
> 
>  drivers/net/wireless/realtek/rtw88/bf.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/wireless/realtek/rtw88/bf.c b/drivers/net/wireless/realtek/rtw88/bf.c
> index 038a30b170ef..c827c4a2814b 100644
> --- a/drivers/net/wireless/realtek/rtw88/bf.c
> +++ b/drivers/net/wireless/realtek/rtw88/bf.c
> @@ -49,19 +49,23 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>  
>  	sta = ieee80211_find_sta(vif, bssid);
>  	if (!sta) {
> +		rcu_read_unlock();
> +
>  		rtw_warn(rtwdev, "failed to find station entry for bss %pM\n",
>  			 bssid);
> -		goto out_unlock;
> +		return;
>  	}
>  
>  	ic_vht_cap = &hw->wiphy->bands[NL80211_BAND_5GHZ]->vht_cap;
>  	vht_cap = &sta->deflink.vht_cap;
>  
> +	rcu_read_unlock();
> +
>  	if ((ic_vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE) &&
>  	    (vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE)) {
>  		if (bfinfo->bfer_mu_cnt >= chip->bfer_mu_max_num) {
>  			rtw_dbg(rtwdev, RTW_DBG_BF, "mu bfer number over limit\n");
> -			goto out_unlock;
> +			return;
>  		}
>  
>  		ether_addr_copy(bfee->mac_addr, bssid);
> @@ -75,7 +79,7 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>  		   (vht_cap->cap & IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE)) {
>  		if (bfinfo->bfer_su_cnt >= chip->bfer_su_max_num) {
>  			rtw_dbg(rtwdev, RTW_DBG_BF, "su bfer number over limit\n");
> -			goto out_unlock;
> +			return;
>  		}
>  
>  		sound_dim = vht_cap->cap &
> @@ -98,9 +102,6 @@ void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
>  
>  		rtw_chip_config_bfee(rtwdev, rtwvif, bfee, true);
>  	}
> -
> -out_unlock:
> -	rcu_read_unlock();
>  }
>  
>  void rtw_bf_init_bfer_entry_mu(struct rtw_dev *rtwdev,
> -- 
> 2.39.0
> 
>
Martin Blumenstingl April 1, 2023, 9:30 p.m. UTC | #3
Hi Sascha,

On Fri, Mar 31, 2023 at 2:59 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
>
> On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> > USB and (upcoming) SDIO support may sleep in the read/write handlers.
> > Shrink the RCU critical section so it only covers the call to
> > ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> > found station. This moves the chip's BFEE configuration outside the
> > rcu_read_lock section and thus prevents "scheduling while atomic" or
> > "Voluntary context switch within RCU read-side critical section!"
> > warnings when accessing the registers using an SDIO card (which is
> > where this issue has been spotted in the real world - but it also
> > affects USB cards).
>
> Unfortunately this introduces a regression on my RTW8821CU chip. With
> this it constantly loses connection to the AP and reconnects shortly
> after:
Sorry to hear this! This is odd and unfortunately I don't understand
the reason for this.
rtw_bf_assoc() is only called from
drivers/net/wireless/realtek/rtw88/mac80211.c with rtwdev->mutex held.
So I don't think that it's a race condition.

There's a module parameter which lets you enable/disable BF support:
$ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf,
rtw_bf_support, bool, 0644);

Have you tried disabling BF support?
Also +Cc Jernej in case he has an idea.


Best regards,
Martin
Linux regression tracking (Thorsten Leemhuis) April 2, 2023, 11:30 a.m. UTC | #4
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 31.03.23 14:59, Sascha Hauer wrote:
> On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
>> USB and (upcoming) SDIO support may sleep in the read/write handlers.
>> Shrink the RCU critical section so it only covers the call to
>> ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
>> found station. This moves the chip's BFEE configuration outside the
>> rcu_read_lock section and thus prevents "scheduling while atomic" or
>> "Voluntary context switch within RCU read-side critical section!"
>> warnings when accessing the registers using an SDIO card (which is
>> where this issue has been spotted in the real world - but it also
>> affects USB cards).
> 
> Unfortunately this introduces a regression on my RTW8821CU chip. With
> this it constantly loses connection to the AP and reconnects shortly
> after:

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced c7eca79def44
#regzbot title net: wifi: rtw88: RTW8821CU constantly loses connection
to the AP and reconnects shortly after
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
Sascha Hauer April 3, 2023, 10 a.m. UTC | #5
Hi Martin,

On Sat, Apr 01, 2023 at 11:30:40PM +0200, Martin Blumenstingl wrote:
> Hi Sascha,
> 
> On Fri, Mar 31, 2023 at 2:59 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
> >
> > On Sun, Jan 08, 2023 at 10:13:22PM +0100, Martin Blumenstingl wrote:
> > > USB and (upcoming) SDIO support may sleep in the read/write handlers.
> > > Shrink the RCU critical section so it only covers the call to
> > > ieee80211_find_sta() and finding the ic_vht_cap/vht_cap based on the
> > > found station. This moves the chip's BFEE configuration outside the
> > > rcu_read_lock section and thus prevents "scheduling while atomic" or
> > > "Voluntary context switch within RCU read-side critical section!"
> > > warnings when accessing the registers using an SDIO card (which is
> > > where this issue has been spotted in the real world - but it also
> > > affects USB cards).
> >
> > Unfortunately this introduces a regression on my RTW8821CU chip. With
> > this it constantly loses connection to the AP and reconnects shortly
> > after:
> Sorry to hear this! This is odd and unfortunately I don't understand
> the reason for this.
> rtw_bf_assoc() is only called from
> drivers/net/wireless/realtek/rtw88/mac80211.c with rtwdev->mutex held.
> So I don't think that it's a race condition.
> 
> There's a module parameter which lets you enable/disable BF support:
> $ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
> drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf,
> rtw_bf_support, bool, 0644);

I was a bit too fast reporting this. Yes, there seems to be a problem
with the RTW8821CU, but it doesn't seem to be related to this patch.

Sorry for the noise.

The chipset seems to have problems with one access point that I have and
I can see these problems with or without the patch. Maybe NetworkManager
decided to connect to another access point without me noticing, making
it look to me as if this patch was guilty.

Sascha
Martin Blumenstingl April 3, 2023, 7:41 p.m. UTC | #6
Hi Sascha,

On Mon, Apr 3, 2023 at 12:00 PM Sascha Hauer <s.hauer@pengutronix.de> wrote:
[...]
> > There's a module parameter which lets you enable/disable BF support:
> > $ git grep rtw_bf_support drivers/net/wireless/realtek/rtw88/ | grep param
> > drivers/net/wireless/realtek/rtw88/main.c:module_param_named(support_bf,
> > rtw_bf_support, bool, 0644);
>
> I was a bit too fast reporting this. Yes, there seems to be a problem
> with the RTW8821CU, but it doesn't seem to be related to this patch.
>
> Sorry for the noise.
Thanks for investigating further and confirming that this is not the cause!
And don't worry: we're all human and with complex drivers that can be
impacted by so many things (other APs, phones, antennas, ...) it's
easy to miss a tiny detail (I've been there before).


Best regards,
Martin

Patch

diff --git a/drivers/net/wireless/realtek/rtw88/bf.c b/drivers/net/wireless/realtek/rtw88/bf.c
index 038a30b170ef..c827c4a2814b 100644
--- a/drivers/net/wireless/realtek/rtw88/bf.c
+++ b/drivers/net/wireless/realtek/rtw88/bf.c
@@ -49,19 +49,23 @@  void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 
 	sta = ieee80211_find_sta(vif, bssid);
 	if (!sta) {
+		rcu_read_unlock();
+
 		rtw_warn(rtwdev, "failed to find station entry for bss %pM\n",
 			 bssid);
-		goto out_unlock;
+		return;
 	}
 
 	ic_vht_cap = &hw->wiphy->bands[NL80211_BAND_5GHZ]->vht_cap;
 	vht_cap = &sta->deflink.vht_cap;
 
+	rcu_read_unlock();
+
 	if ((ic_vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMEE_CAPABLE) &&
 	    (vht_cap->cap & IEEE80211_VHT_CAP_MU_BEAMFORMER_CAPABLE)) {
 		if (bfinfo->bfer_mu_cnt >= chip->bfer_mu_max_num) {
 			rtw_dbg(rtwdev, RTW_DBG_BF, "mu bfer number over limit\n");
-			goto out_unlock;
+			return;
 		}
 
 		ether_addr_copy(bfee->mac_addr, bssid);
@@ -75,7 +79,7 @@  void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 		   (vht_cap->cap & IEEE80211_VHT_CAP_SU_BEAMFORMER_CAPABLE)) {
 		if (bfinfo->bfer_su_cnt >= chip->bfer_su_max_num) {
 			rtw_dbg(rtwdev, RTW_DBG_BF, "su bfer number over limit\n");
-			goto out_unlock;
+			return;
 		}
 
 		sound_dim = vht_cap->cap &
@@ -98,9 +102,6 @@  void rtw_bf_assoc(struct rtw_dev *rtwdev, struct ieee80211_vif *vif,
 
 		rtw_chip_config_bfee(rtwdev, rtwvif, bfee, true);
 	}
-
-out_unlock:
-	rcu_read_unlock();
 }
 
 void rtw_bf_init_bfer_entry_mu(struct rtw_dev *rtwdev,