diff mbox series

[v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled

Message ID 20210514071452.25220-1-kai.heng.feng@canonical.com
State New
Headers show
Series [v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled | expand

Commit Message

Kai-Heng Feng May 14, 2021, 7:14 a.m. UTC
Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
stops working:
[  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
[  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
[  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  545.660350] usb 3-10: device descriptor read/64, error -110
[  561.283530] usb 3-10: device descriptor read/64, error -110
[  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
[  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
[  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
[  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
[  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
[  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)

Or kernel panics because other workqueues already freed skb:
[ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 2048.663775] #PF: supervisor read access in kernel mode
[ 2048.663779] #PF: error_code(0x0000) - not-present page
[ 2048.663782] PGD 0 P4D 0
[ 2048.663787] Oops: 0000 [#1] SMP NOPTI
[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
[ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
07 83 f8 01
[ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
[ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
[ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
[ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
[ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
[ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
[ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
[ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
[ 2048.663856] PKRU: 55555554
[ 2048.663859] Call Trace:
[ 2048.663865]  ? skb_release_head_state+0x5e/0x80
[ 2048.663873]  kfree_skb+0x2f/0xb0
[ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
[ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
[ 2048.663954]  ? __cond_resched+0x1a/0x50
[ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
[ 2048.664007]  rfkill_set_block+0x98/0x170
[ 2048.664016]  rfkill_fop_write+0x136/0x1e0
[ 2048.664022]  vfs_write+0xc7/0x260
[ 2048.664030]  ksys_write+0xb1/0xe0
[ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
[ 2048.664042]  __x64_sys_write+0x1a/0x20
[ 2048.664048]  do_syscall_64+0x40/0xb0
[ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 2048.664060] RIP: 0033:0x7fe02ac23c27
[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
[ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
[ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
[ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
[ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff

So move the shutdown callback to a place where workqueues are either
flushed or cancelled to resolve the issue.

Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
---
v2:
 - Rebased on bluetooth-next.

 net/bluetooth/hci_core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

Comments

Mattijs Korpershoek July 28, 2021, 1:50 p.m. UTC | #1
Hi Kai-Heng, Marcel,

Marcel Holtmann <marcel@holtmann.org> writes:

> Hi Kai-Heng,

>

>> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it

>> stops working:

>> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed

>> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on

>> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd

>> [  545.660350] usb 3-10: device descriptor read/64, error -110

>> [  561.283530] usb 3-10: device descriptor read/64, error -110

>> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd

>> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500

>> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)

>> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)

>> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)

>> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)

>> 

>> Or kernel panics because other workqueues already freed skb:

>> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000

>> [ 2048.663775] #PF: supervisor read access in kernel mode

>> [ 2048.663779] #PF: error_code(0x0000) - not-present page

>> [ 2048.663782] PGD 0 P4D 0

>> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI

>> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20

>> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020

>> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50

>> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1

>> 07 83 f8 01

>> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286

>> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000

>> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000

>> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0

>> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000

>> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000

>> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000

>> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033

>> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0

>> [ 2048.663856] PKRU: 55555554

>> [ 2048.663859] Call Trace:

>> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80

>> [ 2048.663873]  kfree_skb+0x2f/0xb0

>> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]

>> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]

>> [ 2048.663954]  ? __cond_resched+0x1a/0x50

>> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]

>> [ 2048.664007]  rfkill_set_block+0x98/0x170

>> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0

>> [ 2048.664022]  vfs_write+0xc7/0x260

>> [ 2048.664030]  ksys_write+0xb1/0xe0

>> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0

>> [ 2048.664042]  __x64_sys_write+0x1a/0x20

>> [ 2048.664048]  do_syscall_64+0x40/0xb0

>> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae

>> [ 2048.664060] RIP: 0033:0x7fe02ac23c27

>> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24

>> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001

>> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27

>> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003

>> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440

>> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003

>> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff

>> 

>> So move the shutdown callback to a place where workqueues are either

>> flushed or cancelled to resolve the issue.

>> 

>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>

>> ---

>> v2:

>> - Rebased on bluetooth-next.

>> 

>> net/bluetooth/hci_core.c | 16 ++++++++--------

>> 1 file changed, 8 insertions(+), 8 deletions(-)

>

> patch has been applied to bluetooth-next tree.


This patch seems to introduce a regression in the btmtksdio driver.
With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:

i500-pumpkin login: root
root@i500-pumpkin:~# uname -a
Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
root@i500-pumpkin:~# hciconfig hci0 up
Can't init device hci0: Connection timed out (110)
root@i500-pumpkin:~# hciconfig hci0 down
root@i500-pumpkin:~# hciconfig hci0 up
Can't init device hci0: Input/output error (5)

Reverting it fixes the above issue.
Any suggestion on how to fix this without touching hci_core ?
Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.

Thanks,
Mattijs Korpershoek


>

> Regards

>

> Marcel
Mattijs Korpershoek July 30, 2021, 11:40 a.m. UTC | #2
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng, Marcel,
>>
>> Marcel Holtmann <marcel@holtmann.org> writes:
>>
>> > Hi Kai-Heng,
>> >
>> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
>> >> stops working:
>> >> [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
>> >> [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
>> >> [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [  545.660350] usb 3-10: device descriptor read/64, error -110
>> >> [  561.283530] usb 3-10: device descriptor read/64, error -110
>> >> [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
>> >> [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
>> >> [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
>> >> [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >> [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
>> >> [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
>> >>
>> >> Or kernel panics because other workqueues already freed skb:
>> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
>> >> [ 2048.663775] #PF: supervisor read access in kernel mode
>> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page
>> >> [ 2048.663782] PGD 0 P4D 0
>> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
>> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
>> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
>> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
>> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
>> >> 07 83 f8 01
>> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
>> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
>> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
>> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
>> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
>> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
>> >> [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
>> >> [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
>> >> [ 2048.663856] PKRU: 55555554
>> >> [ 2048.663859] Call Trace:
>> >> [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
>> >> [ 2048.663873]  kfree_skb+0x2f/0xb0
>> >> [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
>> >> [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
>> >> [ 2048.663954]  ? __cond_resched+0x1a/0x50
>> >> [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
>> >> [ 2048.664007]  rfkill_set_block+0x98/0x170
>> >> [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
>> >> [ 2048.664022]  vfs_write+0xc7/0x260
>> >> [ 2048.664030]  ksys_write+0xb1/0xe0
>> >> [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
>> >> [ 2048.664042]  __x64_sys_write+0x1a/0x20
>> >> [ 2048.664048]  do_syscall_64+0x40/0xb0
>> >> [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
>> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27
>> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
>> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
>> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
>> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
>> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
>> >>
>> >> So move the shutdown callback to a place where workqueues are either
>> >> flushed or cancelled to resolve the issue.
>> >>
>> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
>> >> ---
>> >> v2:
>> >> - Rebased on bluetooth-next.
>> >>
>> >> net/bluetooth/hci_core.c | 16 ++++++++--------
>> >> 1 file changed, 8 insertions(+), 8 deletions(-)
>> >
>> > patch has been applied to bluetooth-next tree.
>>
>> This patch seems to introduce a regression in the btmtksdio driver.
>> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# uname -a
>> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>> root@i500-pumpkin:~# hciconfig hci0 down
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Input/output error (5)
>>
>> Reverting it fixes the above issue.
>> Any suggestion on how to fix this without touching hci_core ?
>> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips.
>
> Can you please attach dmesg?  Also, full ftrace log on btmtksdio can
> also be helpful.
Thank you for your help. Sorry I did not post the logs previously.

dmesg: https://pastebin.com/tpWDNyQr
ftrace on btmtksdio: https://pastebin.com/jmhvmwUw

Mattijs
>
> Kai-Heng
>
>>
>> Thanks,
>> Mattijs Korpershoek
>>
>>
>> >
>> > Regards
>> >
>> > Marcel
Kai-Heng Feng Aug. 3, 2021, 6:42 a.m. UTC | #3
Hi Mattijs,

On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>

> Hi Kai-Heng,


[snipped]

> Thank you for your help. Sorry I did not post the logs previously.

>

> dmesg: https://pastebin.com/tpWDNyQr

> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw


Seems like btmtksdio needs shudown() to be called before flush().
Since the order was there for a very long time, changing the calling
order indeed can break what driver expects.
Can you please test the following patch:
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..a61e610a400c 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        aosp_do_close(hdev);
        msft_do_close(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (hdev->flush)
                hdev->flush(hdev);

@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

Kai-Heng

>

> Mattijs

> >

> > Kai-Heng

> >

> >>

> >> Thanks,

> >> Mattijs Korpershoek

> >>

> >>

> >> >

> >> > Regards

> >> >

> >> > Marcel
Kai-Heng Feng Aug. 3, 2021, 6:45 a.m. UTC | #4
On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote:
> > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> > stops working:
> > [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> > [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> > [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [  545.660350] usb 3-10: device descriptor read/64, error -110
> > [  561.283530] usb 3-10: device descriptor read/64, error -110
> > [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> > [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> > [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> > [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> >
> > Or kernel panics because other workqueues already freed skb:
> > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > [ 2048.663775] #PF: supervisor read access in kernel mode
> > [ 2048.663779] #PF: error_code(0x0000) - not-present page
> > [ 2048.663782] PGD 0 P4D 0
> > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> > 07 83 f8 01
> > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> > [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> > [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> > [ 2048.663856] PKRU: 55555554
> > [ 2048.663859] Call Trace:
> > [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> > [ 2048.663873]  kfree_skb+0x2f/0xb0
> > [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> > [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> > [ 2048.663954]  ? __cond_resched+0x1a/0x50
> > [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> > [ 2048.664007]  rfkill_set_block+0x98/0x170
> > [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> > [ 2048.664022]  vfs_write+0xc7/0x260
> > [ 2048.664030]  ksys_write+0xb1/0xe0
> > [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> > [ 2048.664042]  __x64_sys_write+0x1a/0x20
> > [ 2048.664048]  do_syscall_64+0x40/0xb0
> > [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> >
> > So move the shutdown callback to a place where workqueues are either
> > flushed or cancelled to resolve the issue.
> >
> > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > ---
> > v2:
> >  - Rebased on bluetooth-next.
> >
> >  net/bluetooth/hci_core.c | 16 ++++++++--------
> >  1 file changed, 8 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 7baf93eda936..6eedf334f943 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >
> >       BT_DBG("%s %p", hdev->name, hdev);
> >
> > -     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > -         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > -         test_bit(HCI_UP, &hdev->flags)) {
> > -             /* Execute vendor specific shutdown routine */
> > -             if (hdev->shutdown)
> > -                     hdev->shutdown(hdev);
> > -     }
> > -
> >       cancel_delayed_work(&hdev->power_off);
> >       cancel_delayed_work(&hdev->ncmd_timer);
> >
> > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >               clear_bit(HCI_INIT, &hdev->flags);
> >       }
> >
> > +     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > +         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +         test_bit(HCI_UP, &hdev->flags)) {
> > +             /* Execute vendor specific shutdown routine */
> > +             if (hdev->shutdown)
> > +                     hdev->shutdown(hdev);
> > +     }
> > +
> >       /* flush cmd  work */
> >       flush_work(&hdev->cmd_work);
> >
> > --
> > 2.30.2
>
> btusb_shutdown_intel_new
>   __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
>     __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout);
>       hci_req_run_skb(&req, hci_req_sync_complete);
>
> hci_req_sync_complete
>   if (skb)
>         hdev->req_skb = skb_get(skb);
>
> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> driver side, I doubt this patch is the correct fix as it is.

Some workqueues are still active.
The shutdown() should be called at least after hci_request_cancel_all().

Kai-Heng
Kai-Heng Feng Aug. 4, 2021, 2:35 p.m. UTC | #5
On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote:
> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
> >>
> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> >> driver side, I doubt this patch is the correct fix as it is.
> >
> >Some workqueues are still active.
> >The shutdown() should be called at least after hci_request_cancel_all().
>
> What is muddy then is how active workqueues prevent skb_get from protecting
> kfree_skb. Can you spot what workqueue it is?

I managed to reproduce the issue with another kernel splat:
------------[ cut here ]------------
kernel BUG at mm/slub.c:321!
invalid opcode: 0000 [#1] SMP NOPTI
CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16
Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver.
01.03.04 01/07/2021
Workqueue: hci0 discov_update [bluetooth]
RIP: 0010:__slab_free+0x20c/0x3a0
Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24
1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b
49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24
RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246
RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600
RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500
RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e
R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80
R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600
FS:  0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0
PKRU: 55555554
Call Trace:
 ? psi_task_switch+0xc3/0x1e0
 ? __switch_to_asm+0x36/0x70
 ? skb_free_head+0x67/0x80
 kmem_cache_free+0x370/0x3d0
 ? kfree_skbmem+0x4e/0x90
 kfree_skbmem+0x4e/0x90
 kfree_skb+0x47/0xb0
 __hci_req_sync+0x134/0x2a0 [bluetooth]
 ? wait_woken+0x70/0x70
 discov_update+0x2ae/0x310 [bluetooth]
 process_one_work+0x21d/0x3c0
 worker_thread+0x53/0x420
 ? process_one_work+0x3c0/0x3c0
 kthread+0x127/0x150
 ? set_kthread_struct+0x50/0x50
 ret_from_fork+0x1f/0x30
Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep
nls_iso8859_1 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common
snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek
snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl
snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci snd_sof
snd_sof_xtensa_dsp snd_sof_intel_hda snd_hda_ext_core
snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_soc_core
snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core
snd_pcm snd_seq i915 snd_seq_device snd_timer hp_wmi intel_tcc_cooling
x86_pkg_temp_thermal intel_powerclamp i2c_algo_bit coretemp joydev
kvm_intel ttm mei_hdcp intel_rapl_msr platform_profile wmi_bmof kvm
uvcvideo crct10dif_pclmul btusb videobuf2_vmalloc videobuf2_memops
drm_kms_helper btrtl videobuf2_v4l2 crc32_pclmul btbcm
ghash_clmulni_intel input_leds videobuf2_common btintel snd videodev
syscopyarea sysfillrect sysimgblt aesni_intel mc serio_raw fb_sys_fops
bluetooth crypto_simd
 cec cryptd intel_cstate ecdh_generic efi_pstore ecc rc_core
hid_multitouch processor_thermal_device_pci_legacy mei_me
intel_soc_dts_iosf processor_thermal_device mei soundcore
processor_thermal_rfim ee1004 processor_thermal_mbox
processor_thermal_rapl intel_pmt_telemetry intel_rapl_common
intel_pmt_class ucsi_acpi typec_ucsi typec wmi soc_button_array
int3403_thermal int340x_thermal_zone video int3400_thermal
acpi_thermal_rel acpi_pad mac_hid intel_hid sparse_keymap sch_fq_codel
msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4
hid_generic nvme nvme_core intel_lpss_pci e1000e intel_lpss i2c_i801
idma64 i2c_smbus xhci_pci xhci_pci_renesas vmd intel_pmt i2c_hid_acpi
i2c_hid hid pinctrl_tigerlake
---[ end trace c09445d4697039ed ]---

So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update)
and can prevent the race from happening.

And the kernel splat is just one symptom of the issue, most of the
time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in
dmesg.

Kai-Heng
Kai-Heng Feng Aug. 4, 2021, 2:42 p.m. UTC | #6
On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,
>
> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>
> > Hi Mattijs,
> >
> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
> > <mkorpershoek@baylibre.com> wrote:
> >>
> >> Hi Kai-Heng,
> >
> > [snipped]
> >
> >> Thank you for your help. Sorry I did not post the logs previously.
> >>
> >> dmesg: https://pastebin.com/tpWDNyQr
> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
> >
> > Seems like btmtksdio needs shudown() to be called before flush().
> > Since the order was there for a very long time, changing the calling
> > order indeed can break what driver expects.
> > Can you please test the following patch:
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 2560ed2f144d..a61e610a400c 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >         aosp_do_close(hdev);
> >         msft_do_close(hdev);
> >
> > +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > +           test_bit(HCI_UP, &hdev->flags)) {
> > +               /* Execute vendor specific shutdown routine */
> > +               if (hdev->shutdown)
> > +                       hdev->shutdown(hdev);
> > +       }
> > +
> >         if (hdev->flush)
> >                 hdev->flush(hdev);
> >
> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >                 clear_bit(HCI_INIT, &hdev->flags);
> >         }
> >
> > -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > -           test_bit(HCI_UP, &hdev->flags)) {
> > -               /* Execute vendor specific shutdown routine */
> > -               if (hdev->shutdown)
> > -                       hdev->shutdown(hdev);
> > -       }
> > -
> >         /* flush cmd  work */
> >         flush_work(&hdev->cmd_work);
>
> Thanks for the patch and your help.
> I've tried it, but it seems that it does not improve for me.
> I'm still observing:
>
> i500-pumpkin login: root
> root@i500-pumpkin:~# hciconfig hci0 up
> Can't init device hci0: Connection timed out (110)
>
> Logs for this session:
> dmesg:   https://pastebin.com/iAFk5Tzi
> ftrace:  https://pastebin.com/kEMWSYrE

Thanks for the testing!
What about moving the shutdown() part right after hci_req_sync_lock()
so tx/rx can still work:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d4..be3113fb7d4b0 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        hci_request_cancel_all(hdev);
        hci_req_sync_lock(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
                cancel_delayed_work_sync(&hdev->cmd_timer);
                hci_req_sync_unlock(hdev);
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);





>
>
> >
> > Kai-Heng
> >
> >>
> >> Mattijs
> >> >
> >> > Kai-Heng
> >> >
> >> >>
> >> >> Thanks,
> >> >> Mattijs Korpershoek
> >> >>
> >> >>
> >> >> >
> >> >> > Regards
> >> >> >
> >> >> > Marcel
Hsin-Yi Wang Aug. 5, 2021, 6:12 a.m. UTC | #7
On Tue, Aug 3, 2021 at 2:45 PM Kai-Heng Feng
<kai.heng.feng@canonical.com> wrote:
>
> On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote:
> >
> > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote:
> > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it
> > > stops working:
> > > [  509.691509] Bluetooth: hci0: HCI reset during shutdown failed
> > > [  514.897584] Bluetooth: hci0: MSFT filter_enable is already on
> > > [  530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > > [  545.660350] usb 3-10: device descriptor read/64, error -110
> > > [  561.283530] usb 3-10: device descriptor read/64, error -110
> > > [  561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd
> > > [  566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500
> > > [  568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113)
> > > [  578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > > [  586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113)
> > > [  596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110)
> > >
> > > Or kernel panics because other workqueues already freed skb:
> > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > > [ 2048.663775] #PF: supervisor read access in kernel mode
> > > [ 2048.663779] #PF: error_code(0x0000) - not-present page
> > > [ 2048.663782] PGD 0 P4D 0
> > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI
> > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G        W         5.13.0-rc1-next-20210510+ #20
> > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020
> > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50
> > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1
> > > 07 83 f8 01
> > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286
> > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000
> > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000
> > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0
> > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000
> > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000
> > > [ 2048.663843] FS:  00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000
> > > [ 2048.663849] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0
> > > [ 2048.663856] PKRU: 55555554
> > > [ 2048.663859] Call Trace:
> > > [ 2048.663865]  ? skb_release_head_state+0x5e/0x80
> > > [ 2048.663873]  kfree_skb+0x2f/0xb0
> > > [ 2048.663881]  btusb_shutdown_intel_new+0x36/0x60 [btusb]
> > > [ 2048.663905]  hci_dev_do_close+0x48c/0x5e0 [bluetooth]
> > > [ 2048.663954]  ? __cond_resched+0x1a/0x50
> > > [ 2048.663962]  hci_rfkill_set_block+0x56/0xa0 [bluetooth]
> > > [ 2048.664007]  rfkill_set_block+0x98/0x170
> > > [ 2048.664016]  rfkill_fop_write+0x136/0x1e0
> > > [ 2048.664022]  vfs_write+0xc7/0x260
> > > [ 2048.664030]  ksys_write+0xb1/0xe0
> > > [ 2048.664035]  ? exit_to_user_mode_prepare+0x37/0x1c0
> > > [ 2048.664042]  __x64_sys_write+0x1a/0x20
> > > [ 2048.664048]  do_syscall_64+0x40/0xb0
> > > [ 2048.664055]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27
> > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27
> > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003
> > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440
> > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003
> > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff
> > >
> > > So move the shutdown callback to a place where workqueues are either
> > > flushed or cancelled to resolve the issue.
> > >
> > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
> > > ---
> > > v2:
> > >  - Rebased on bluetooth-next.
> > >
> > >  net/bluetooth/hci_core.c | 16 ++++++++--------
> > >  1 file changed, 8 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > > index 7baf93eda936..6eedf334f943 100644
> > > --- a/net/bluetooth/hci_core.c
> > > +++ b/net/bluetooth/hci_core.c
> > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > >
> > >       BT_DBG("%s %p", hdev->name, hdev);
> > >
> > > -     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > > -         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > -         test_bit(HCI_UP, &hdev->flags)) {
> > > -             /* Execute vendor specific shutdown routine */
> > > -             if (hdev->shutdown)
> > > -                     hdev->shutdown(hdev);
> > > -     }
> > > -
> > >       cancel_delayed_work(&hdev->power_off);
> > >       cancel_delayed_work(&hdev->ncmd_timer);
> > >
> > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
> > >               clear_bit(HCI_INIT, &hdev->flags);
> > >       }
> > >
> > > +     if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> > > +         !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> > > +         test_bit(HCI_UP, &hdev->flags)) {
> > > +             /* Execute vendor specific shutdown routine */
> > > +             if (hdev->shutdown)
> > > +                     hdev->shutdown(hdev);
> > > +     }
> > > +
> > >       /* flush cmd  work */
> > >       flush_work(&hdev->cmd_work);
> > >
> > > --
> > > 2.30.2
> >
> > btusb_shutdown_intel_new
> >   __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT);
> >     __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout);
> >       hci_req_run_skb(&req, hci_req_sync_complete);
> >
> > hci_req_sync_complete
> >   if (skb)
> >         hdev->req_skb = skb_get(skb);
> >
> > Given the skb_get in hci_req_sync_complete makes it safe to free skb on
> > driver side, I doubt this patch is the correct fix as it is.
>
> Some workqueues are still active.
> The shutdown() should be called at least after hci_request_cancel_all().
>
Hello,

The original patch 60789afc02f592b8d91217b60930e7a76271ae07
("Bluetooth: Shutdown controller after workqueues are flushed or
cancelled") is causing the tx fail when downloading fw on mediatek
mt8183 device using QCA bluetooth:

[  225.205061] Bluetooth: qca_download_firmware() hci0: QCA
Downloading qca/rampatch_00440302.bin
[  227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout
...
follows by a lot of:
[  223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)
[  223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84)

After applying the fixup to allow tx, the issue is solved.


> Kai-Heng
Mattijs Korpershoek Aug. 5, 2021, 6:55 a.m. UTC | #8
Hi Kai-Heng,

Thanks for your patch,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek
> <mkorpershoek@baylibre.com> wrote:
>>
>> Hi Kai-Heng,
>>
>> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>>
>> > Hi Mattijs,
>> >
>> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek
>> > <mkorpershoek@baylibre.com> wrote:
>> >>
>> >> Hi Kai-Heng,
>> >
>> > [snipped]
>> >
>> >> Thank you for your help. Sorry I did not post the logs previously.
>> >>
>> >> dmesg: https://pastebin.com/tpWDNyQr
>> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw
>> >
>> > Seems like btmtksdio needs shudown() to be called before flush().
>> > Since the order was there for a very long time, changing the calling
>> > order indeed can break what driver expects.
>> > Can you please test the following patch:
>> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> > index 2560ed2f144d..a61e610a400c 100644
>> > --- a/net/bluetooth/hci_core.c
>> > +++ b/net/bluetooth/hci_core.c
>> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >         aosp_do_close(hdev);
>> >         msft_do_close(hdev);
>> >
>> > +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > +           test_bit(HCI_UP, &hdev->flags)) {
>> > +               /* Execute vendor specific shutdown routine */
>> > +               if (hdev->shutdown)
>> > +                       hdev->shutdown(hdev);
>> > +       }
>> > +
>> >         if (hdev->flush)
>> >                 hdev->flush(hdev);
>> >
>> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >                 clear_bit(HCI_INIT, &hdev->flags);
>> >         }
>> >
>> > -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
>> > -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
>> > -           test_bit(HCI_UP, &hdev->flags)) {
>> > -               /* Execute vendor specific shutdown routine */
>> > -               if (hdev->shutdown)
>> > -                       hdev->shutdown(hdev);
>> > -       }
>> > -
>> >         /* flush cmd  work */
>> >         flush_work(&hdev->cmd_work);
>>
>> Thanks for the patch and your help.
>> I've tried it, but it seems that it does not improve for me.
>> I'm still observing:
>>
>> i500-pumpkin login: root
>> root@i500-pumpkin:~# hciconfig hci0 up
>> Can't init device hci0: Connection timed out (110)
>>
>> Logs for this session:
>> dmesg:   https://pastebin.com/iAFk5Tzi
>> ftrace:  https://pastebin.com/kEMWSYrE
>
> Thanks for the testing!
> What about moving the shutdown() part right after hci_req_sync_lock()
> so tx/rx can still work:
>
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 2560ed2f144d4..be3113fb7d4b0 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
>         hci_request_cancel_all(hdev);
>         hci_req_sync_lock(hdev);
>
> +       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> +           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> +           test_bit(HCI_UP, &hdev->flags)) {
> +               /* Execute vendor specific shutdown routine */
> +               if (hdev->shutdown)
> +                       hdev->shutdown(hdev);
> +       }
> +
>         if (!test_and_clear_bit(HCI_UP, &hdev->flags)) {
>                 cancel_delayed_work_sync(&hdev->cmd_timer);
>                 hci_req_sync_unlock(hdev);
> @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
>                 clear_bit(HCI_INIT, &hdev->flags);
>         }
>
> -       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
> -           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
> -           test_bit(HCI_UP, &hdev->flags)) {
> -               /* Execute vendor specific shutdown routine */
> -               if (hdev->shutdown)
> -                       hdev->shutdown(hdev);
> -       }
> -
>         /* flush cmd  work */
>         flush_work(&hdev->cmd_work);
I confirm this diff works for me:

root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0 down
root@i500-pumpkin:~# hciconfig hci0 up
root@i500-pumpkin:~# hciconfig hci0
hci0:   Type: Primary  Bus: SDIO
        BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
        UP RUNNING 
        RX bytes:11268 acl:0 sco:0 events:829 errors:0
        TX bytes:182569 acl:0 sco:0 commands:829 errors:0

root@i500-pumpkin:~# hcitool scan 
Scanning ...
        <redacted>       Pixel 3 XL

Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>
>
>
>
>
>
>>
>>
>> >
>> > Kai-Heng
>> >
>> >>
>> >> Mattijs
>> >> >
>> >> > Kai-Heng
>> >> >
>> >> >>
>> >> >> Thanks,
>> >> >> Mattijs Korpershoek
>> >> >>
>> >> >>
>> >> >> >
>> >> >> > Regards
>> >> >> >
>> >> >> > Marcel
Kai-Heng Feng Aug. 5, 2021, 3:50 p.m. UTC | #9
Hi Mattijs,

On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek
<mkorpershoek@baylibre.com> wrote:
>
> Hi Kai-Heng,
>
> Thanks for your patch,
>
> Kai-Heng Feng <kai.heng.feng@canonical.com> writes:
>

[snipped]

> I confirm this diff works for me:
>
> root@i500-pumpkin:~# hciconfig hci0 up
> root@i500-pumpkin:~# hciconfig hci0 down
> root@i500-pumpkin:~# hciconfig hci0 up
> root@i500-pumpkin:~# hciconfig hci0
> hci0:   Type: Primary  Bus: SDIO
>         BD Address: 00:0C:E7:55:FF:12  ACL MTU: 1021:8  SCO MTU: 244:4
>         UP RUNNING
>         RX bytes:11268 acl:0 sco:0 events:829 errors:0
>         TX bytes:182569 acl:0 sco:0 commands:829 errors:0
>
> root@i500-pumpkin:~# hcitool scan
> Scanning ...
>         <redacted>       Pixel 3 XL
>
> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com>

I found that btmtksdio_flush() only cancels the work instead of doing
flush_work(). That probably explains why putting ->shutdown right
before ->flush doesn't work.
So can you please test the following again:
diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c
index 9872ef18f9fea..b33c05ad2150b 100644
--- a/drivers/bluetooth/btmtksdio.c
+++ b/drivers/bluetooth/btmtksdio.c
@@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev)
 {
        struct btmtksdio_dev *bdev = hci_get_drvdata(hdev);

-       skb_queue_purge(&bdev->txq);
+       flush_work(&bdev->tx_work);

-       cancel_work_sync(&bdev->tx_work);
+       skb_queue_purge(&bdev->txq);

        return 0;
 }
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d4..a61e610a400cb 100644

--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
        aosp_do_close(hdev);
        msft_do_close(hdev);

+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        if (hdev->flush)
                hdev->flush(hdev);

@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }

-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

Kai-Heng
Mattijs Korpershoek Aug. 9, 2021, 9:19 a.m. UTC | #10
Hi Kai-Heng,

Kai-Heng Feng <kai.heng.feng@canonical.com> writes:

> Hi Mattijs,
>
>> [snipped]
>
> Thanks for your testing. I think I finally got it:
> btmtksdio_shutdown()
> -> mtk_hci_wmt_sync()
>  -> __hci_cmd_send()
>   then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in
> btmtksdio_recv_event():
> btmtksdio_recv_event()
> -> hci_recv_frame()
>  -> queue_work(hdev->workqueue, &hdev->rx_work);
>
> That means it has to be done before the following drain_workqueue() call.
> Can you please see if moving the ->shutdown() part right before
> drain_workqueue() can fix the issue?
I've tested the following patch:

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 2560ed2f144d..131e69a9a66a 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1757,6 +1757,14 @@ int hci_dev_do_close(struct hci_dev *hdev)
                        cancel_delayed_work_sync(&adv_instance->rpa_expired_cb);
        }
 
+       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+           test_bit(HCI_UP, &hdev->flags)) {
+               /* Execute vendor specific shutdown routine */
+               if (hdev->shutdown)
+                       hdev->shutdown(hdev);
+       }
+
        /* Avoid potential lockdep warnings from the *_flush() calls by
         * ensuring the workqueue is empty up front.
         */
@@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev)
                clear_bit(HCI_INIT, &hdev->flags);
        }
 
-       if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-           !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-           test_bit(HCI_UP, &hdev->flags)) {
-               /* Execute vendor specific shutdown routine */
-               if (hdev->shutdown)
-                       hdev->shutdown(hdev);
-       }
-
        /* flush cmd  work */
        flush_work(&hdev->cmd_work);

It does not seem to fix the issue.
Adding the bits in btmtksdio_flush() does not change the result of the
above patch.

Here are the logs. These are just with the above patch (not with the
btmtksdio_flush() changes.

dmesg: https://pastebin.com/FZZBkqGC
btmtksdio_ftrace: https://pastebin.com/JQ0UWenY

Mattijs
>
> Kai-Heng
>
>>
>> Mattijs
>>
>> >
>> > Kai-Heng
diff mbox series

Patch

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 7baf93eda936..6eedf334f943 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1716,14 +1716,6 @@  int hci_dev_do_close(struct hci_dev *hdev)
 
 	BT_DBG("%s %p", hdev->name, hdev);
 
-	if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
-	    !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
-	    test_bit(HCI_UP, &hdev->flags)) {
-		/* Execute vendor specific shutdown routine */
-		if (hdev->shutdown)
-			hdev->shutdown(hdev);
-	}
-
 	cancel_delayed_work(&hdev->power_off);
 	cancel_delayed_work(&hdev->ncmd_timer);
 
@@ -1801,6 +1793,14 @@  int hci_dev_do_close(struct hci_dev *hdev)
 		clear_bit(HCI_INIT, &hdev->flags);
 	}
 
+	if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) &&
+	    !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) &&
+	    test_bit(HCI_UP, &hdev->flags)) {
+		/* Execute vendor specific shutdown routine */
+		if (hdev->shutdown)
+			hdev->shutdown(hdev);
+	}
+
 	/* flush cmd  work */
 	flush_work(&hdev->cmd_work);