diff mbox series

wifi: rtw88: reduce failed to flush queue severity

Message ID 20240413230030.390563-2-lewis.robbins2@gmail.com
State New
Headers show
Series wifi: rtw88: reduce failed to flush queue severity | expand

Commit Message

Lewis Robbins April 13, 2024, 10:59 p.m. UTC
Reduce the log message severity when we fail to flush device priority
queue. If a system has a lot of traffic, we may fail to flush the queue
in time. This generates a lot of messages in the kernel ring buffer. As
this is a common occurrence, we should use dev_info instead of dev_warn.

Signed-off-by: Lewis Robbins <lewis.robbins2@gmail.com>
---
 drivers/net/wireless/realtek/rtw88/mac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ping-Ke Shih April 15, 2024, 1:40 a.m. UTC | #1
Lewis Robbins <lewis.robbins2@gmail.com> wrote:
> 
> Reduce the log message severity when we fail to flush device priority
> queue. If a system has a lot of traffic, we may fail to flush the queue
> in time. This generates a lot of messages in the kernel ring buffer. As
> this is a common occurrence, we should use dev_info instead of dev_warn.
> 
> Signed-off-by: Lewis Robbins <lewis.robbins2@gmail.com>

Acked-by: Ping-Ke Shih <pkshih@realtek.com>

I'd like to know situations of " If a system has a lot of traffic...". 
Did you scan or do something during traffic?
Kalle Valo April 15, 2024, 4:16 p.m. UTC | #2
Ping-Ke Shih <pkshih@realtek.com> writes:

> Lewis Robbins <lewis.robbins2@gmail.com> wrote:
>> 
>> Reduce the log message severity when we fail to flush device priority
>> queue. If a system has a lot of traffic, we may fail to flush the queue
>> in time. This generates a lot of messages in the kernel ring buffer. As
>> this is a common occurrence, we should use dev_info instead of dev_warn.
>> 
>> Signed-off-by: Lewis Robbins <lewis.robbins2@gmail.com>
>
> Acked-by: Ping-Ke Shih <pkshih@realtek.com>
>
> I'd like to know situations of " If a system has a lot of traffic...". 
> Did you scan or do something during traffic?

The driver shouldn't print any warnings in normal usage, even using info
level. If this is expected scenario then maybe change it to debug print?
Or if is this an actual bug then it's better try to investigate and fix
it.
Lewis Robbins April 15, 2024, 11:28 p.m. UTC | #3
Ping-Ke Shih <pkshih@realtek.com> writes:

> Lewis Robbins <lewis.robbins2@gmail.com> wrote:
>> 
>> Reduce the log message severity when we fail to flush device priority
>> queue. If a system has a lot of traffic, we may fail to flush the queue
>> in time. This generates a lot of messages in the kernel ring buffer. As
>> this is a common occurrence, we should use dev_info instead of dev_warn.
>> 
>> Signed-off-by: Lewis Robbins <lewis.robbins2@gmail.com>
>
> Acked-by: Ping-Ke Shih <pkshih@realtek.com>
>
> I'd like to know situations of " If a system has a lot of traffic...". 
> Did you scan or do something during traffic?

So, after digging a bit more, it seems you're right this only happens during a
scan. The log message itself is repeated about 5-10x.

Kalle Valo <kvalo@kernel.org> writes:

> The driver shouldn't print any warnings in normal usage, even using info
> level. If this is expected scenario then maybe change it to debug print?
> Or if is this an actual bug then it's better try to investigate and fix
> it.

I have the stack-trace:

[23838.633664] rtw_8821ce 0000:02:00.0: timed out to flush queue 2
[23838.633685] CPU: 1 PID: 363059 Comm: kworker/u8:1 Tainted G 6.8.5
[23838.633698] Hardware name:  /, BIOS 5.26 09/26/2023
[23838.633704] Workqueue: events_unbound cfg80211_wiphy_work [cfg80211]
[23838.633881] Call Trace:
[23838.633889]  <TASK>
[23838.633898]  dump_stack_lvl+0x47/0x60
[23838.633918]  rtw_mac_flush_queues+0x148/0x190 [rtw88_core 0d7ad2d9d6116c633c0aab4e7bc6016d572d75d4]
[23838.633993]  rtw_ops_flush+0x5a/0x70 [rtw88_core 0d7ad2d9d6116c633c0aab4e7bc6016d572d75d4]
[23838.634056]  __ieee80211_flush_queues+0x10b/0x2e0 [mac80211 5d0b446baffe1290bc56d55aa496e941688b7b40]
[23838.634309]  ieee80211_scan_work+0x3e3/0x520 [mac80211 5d0b446baffe1290bc56d55aa496e941688b7b40]
[23838.634494]  cfg80211_wiphy_work+0xa7/0xe0 [cfg80211 b36d5437ba649ace42ea92e8f83a3ec499e0d5b7]
[23838.634646]  process_one_work+0x178/0x350
[23838.634660]  worker_thread+0x30f/0x450
[23838.634670]  ? __pfx_worker_thread+0x10/0x10
[23838.634678]  kthread+0xe5/0x120
[23838.634691]  ? __pfx_kthread+0x10/0x10
[23838.634702]  ret_from_fork+0x31/0x50
[23838.634714]  ? __pfx_kthread+0x10/0x10
[23838.634724]  ret_from_fork_asm+0x1b/0x30
[23838.634736]  </TASK>

I'm not sure as to the cause. If the flush operation takes a long time do we 
need to release any mutexes etc? And if this is just a hardware issue, then we
can do a debug print as you say.

BugZilla: https://bugzilla.kernel.org/show_bug.cgi?id=218697
Ping-Ke Shih April 16, 2024, 2:57 a.m. UTC | #4
Lewis Robbins <lewis.robbins2@gmail.com> wrote:
> 
> Ping-Ke Shih <pkshih@realtek.com> writes:
> 
> > Lewis Robbins <lewis.robbins2@gmail.com> wrote:
> >>
> >> Reduce the log message severity when we fail to flush device priority
> >> queue. If a system has a lot of traffic, we may fail to flush the queue
> >> in time. This generates a lot of messages in the kernel ring buffer. As
> >> this is a common occurrence, we should use dev_info instead of dev_warn.
> >>
> >> Signed-off-by: Lewis Robbins <lewis.robbins2@gmail.com>
> >
> > Acked-by: Ping-Ke Shih <pkshih@realtek.com>
> >
> > I'd like to know situations of " If a system has a lot of traffic...".
> > Did you scan or do something during traffic?
> 
> So, after digging a bit more, it seems you're right this only happens during a
> scan. The log message itself is repeated about 5-10x.

That is the same as my test before. 

> 
> I'm not sure as to the cause. If the flush operation takes a long time do we
> need to release any mutexes etc? And if this is just a hardware issue, then we
> can do a debug print as you say.

The cause is because packets in hardware TX queue that can't be sent out in time,
and flush ops with 'drop = false', so driver throws one warning. I don't have
good idea for now. Maybe, we can add a special debug mask to replace this kind of
verbose warning with uncertain solution.
Ping-Ke Shih April 17, 2024, 1:42 a.m. UTC | #5
> 
> The cause is because packets in hardware TX queue that can't be sent out in time,
> and flush ops with 'drop = false', so driver throws one warning. I don't have
> good idea for now. Maybe, we can add a special debug mask to replace this kind of
> verbose warning with uncertain solution.

I have made a patch [1] as mentioned before. Lewis, please give it a try.

[1] https://lore.kernel.org/linux-wireless/20240417014036.11234-1-pkshih@realtek.com/T/#u
diff mbox series

Patch

diff --git a/drivers/net/wireless/realtek/rtw88/mac.c b/drivers/net/wireless/realtek/rtw88/mac.c
index 0c1c1ff31085..800f4eabae13 100644
--- a/drivers/net/wireless/realtek/rtw88/mac.c
+++ b/drivers/net/wireless/realtek/rtw88/mac.c
@@ -1040,7 +1040,7 @@  static void __rtw_mac_flush_prio_queue(struct rtw_dev *rtwdev,
 	 * And it requires like ~2secs to flush the full priority queue.
 	 */
 	if (!drop)
-		rtw_warn(rtwdev, "timed out to flush queue %d\n", prio_queue);
+		rtw_info(rtwdev, "timed out to flush queue %d\n", prio_queue);
 }
 
 static void rtw_mac_flush_prio_queues(struct rtw_dev *rtwdev,