Message ID | 20210514071452.25220-1-kai.heng.feng@canonical.com |
---|---|
State | Accepted |
Commit | 0ea9fd001a14ebc294f112b0361a4e601551d508 |
Headers | show |
Series | [v2] Bluetooth: Shutdown controller after workqueues are flushed or cancelled | expand |
This is automated email and please do not reply to this email! Dear submitter, Thank you for submitting the patches to the linux bluetooth mailing list. This is a CI test results with your patch series: PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=482277 ---Test result--- Test Summary: CheckPatch PASS 0.75 seconds GitLint FAIL 0.11 seconds BuildKernel PASS 523.24 seconds TestRunner: Setup PASS 352.71 seconds TestRunner: l2cap-tester PASS 2.68 seconds TestRunner: bnep-tester PASS 1.88 seconds TestRunner: mgmt-tester PASS 27.21 seconds TestRunner: rfcomm-tester PASS 2.03 seconds TestRunner: sco-tester PASS 2.04 seconds TestRunner: smp-tester PASS 2.18 seconds TestRunner: userchan-tester PASS 1.90 seconds Details ############################## Test: CheckPatch - PASS - 0.75 seconds Run checkpatch.pl script with rule in .checkpatch.conf ############################## Test: GitLint - FAIL - 0.11 seconds Run gitlint with rule in .gitlint Bluetooth: Shutdown controller after workqueues are flushed or cancelled 23: B1 Line exceeds max length (102>80): "[ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20" 24: B1 Line exceeds max length (104>80): "[ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020" 26: B1 Line exceeds max length (202>80): "[ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1" 34: B1 Line exceeds max length (91>80): "[ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000" 54: B1 Line exceeds max length (214>80): "[ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24" 55: B1 Line exceeds max length (85>80): "[ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001" ############################## Test: BuildKernel - PASS - 523.24 seconds Build Kernel with minimal configuration supports Bluetooth ############################## Test: TestRunner: Setup - PASS - 352.71 seconds Setup environment for running Test Runner ############################## Test: TestRunner: l2cap-tester - PASS - 2.68 seconds Run test-runner with l2cap-tester Total: 40, Passed: 40 (100.0%), Failed: 0, Not Run: 0 ############################## Test: TestRunner: bnep-tester - PASS - 1.88 seconds Run test-runner with bnep-tester Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0 ############################## Test: TestRunner: mgmt-tester - PASS - 27.21 seconds Run test-runner with mgmt-tester Total: 416, Passed: 401 (96.4%), Failed: 0, Not Run: 15 ############################## Test: TestRunner: rfcomm-tester - PASS - 2.03 seconds Run test-runner with rfcomm-tester Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0 ############################## Test: TestRunner: sco-tester - PASS - 2.04 seconds Run test-runner with sco-tester Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0 ############################## Test: TestRunner: smp-tester - PASS - 2.18 seconds Run test-runner with smp-tester Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0 ############################## Test: TestRunner: userchan-tester - PASS - 1.90 seconds Run test-runner with userchan-tester Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0 --- Regards, Linux Bluetooth
Hi Kai-Heng, > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > stops working: > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > [ 545.660350] usb 3-10: device descriptor read/64, error -110 > [ 561.283530] usb 3-10: device descriptor read/64, error -110 > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > Or kernel panics because other workqueues already freed skb: > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > [ 2048.663775] #PF: supervisor read access in kernel mode > [ 2048.663779] #PF: error_code(0x0000) - not-present page > [ 2048.663782] PGD 0 P4D 0 > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > 07 83 f8 01 > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > [ 2048.663856] PKRU: 55555554 > [ 2048.663859] Call Trace: > [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > [ 2048.663873] kfree_skb+0x2f/0xb0 > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > [ 2048.663954] ? __cond_resched+0x1a/0x50 > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > [ 2048.664007] rfkill_set_block+0x98/0x170 > [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > [ 2048.664022] vfs_write+0xc7/0x260 > [ 2048.664030] ksys_write+0xb1/0xe0 > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > [ 2048.664042] __x64_sys_write+0x1a/0x20 > [ 2048.664048] do_syscall_64+0x40/0xb0 > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > > So move the shutdown callback to a place where workqueues are either > flushed or cancelled to resolve the issue. > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > --- > v2: > - Rebased on bluetooth-next. > > net/bluetooth/hci_core.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) patch has been applied to bluetooth-next tree. Regards Marcel
On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, Marcel, > > Marcel Holtmann <marcel@holtmann.org> writes: > > > Hi Kai-Heng, > > > >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > >> stops working: > >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > >> [ 545.660350] usb 3-10: device descriptor read/64, error -110 > >> [ 561.283530] usb 3-10: device descriptor read/64, error -110 > >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > >> > >> Or kernel panics because other workqueues already freed skb: > >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > >> [ 2048.663775] #PF: supervisor read access in kernel mode > >> [ 2048.663779] #PF: error_code(0x0000) - not-present page > >> [ 2048.663782] PGD 0 P4D 0 > >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > >> 07 83 f8 01 > >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > >> [ 2048.663856] PKRU: 55555554 > >> [ 2048.663859] Call Trace: > >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > >> [ 2048.663873] kfree_skb+0x2f/0xb0 > >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > >> [ 2048.663954] ? __cond_resched+0x1a/0x50 > >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > >> [ 2048.664007] rfkill_set_block+0x98/0x170 > >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > >> [ 2048.664022] vfs_write+0xc7/0x260 > >> [ 2048.664030] ksys_write+0xb1/0xe0 > >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > >> [ 2048.664042] __x64_sys_write+0x1a/0x20 > >> [ 2048.664048] do_syscall_64+0x40/0xb0 > >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > >> > >> So move the shutdown callback to a place where workqueues are either > >> flushed or cancelled to resolve the issue. > >> > >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > >> --- > >> v2: > >> - Rebased on bluetooth-next. > >> > >> net/bluetooth/hci_core.c | 16 ++++++++-------- > >> 1 file changed, 8 insertions(+), 8 deletions(-) > > > > patch has been applied to bluetooth-next tree. > > This patch seems to introduce a regression in the btmtksdio driver. > With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin: > > i500-pumpkin login: root > root@i500-pumpkin:~# uname -a > Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux > root@i500-pumpkin:~# hciconfig hci0 up > Can't init device hci0: Connection timed out (110) > root@i500-pumpkin:~# hciconfig hci0 down > root@i500-pumpkin:~# hciconfig hci0 up > Can't init device hci0: Input/output error (5) > > Reverting it fixes the above issue. > Any suggestion on how to fix this without touching hci_core ? > Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips. Can you please attach dmesg? Also, full ftrace log on btmtksdio can also be helpful. Kai-Heng > > Thanks, > Mattijs Korpershoek > > > > > > Regards > > > > Marcel
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > On Wed, Jul 28, 2021 at 9:50 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, Marcel, >> >> Marcel Holtmann <marcel@holtmann.org> writes: >> >> > Hi Kai-Heng, >> > >> >> Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it >> >> stops working: >> >> [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed >> >> [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on >> >> [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> >> [ 545.660350] usb 3-10: device descriptor read/64, error -110 >> >> [ 561.283530] usb 3-10: device descriptor read/64, error -110 >> >> [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd >> >> [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 >> >> [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) >> >> [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> >> [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) >> >> [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) >> >> >> >> Or kernel panics because other workqueues already freed skb: >> >> [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 >> >> [ 2048.663775] #PF: supervisor read access in kernel mode >> >> [ 2048.663779] #PF: error_code(0x0000) - not-present page >> >> [ 2048.663782] PGD 0 P4D 0 >> >> [ 2048.663787] Oops: 0000 [#1] SMP NOPTI >> >> [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 >> >> [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 >> >> [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 >> >> [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 >> >> 07 83 f8 01 >> >> [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 >> >> [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 >> >> [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 >> >> [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 >> >> [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 >> >> [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 >> >> [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 >> >> [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> >> [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 >> >> [ 2048.663856] PKRU: 55555554 >> >> [ 2048.663859] Call Trace: >> >> [ 2048.663865] ? skb_release_head_state+0x5e/0x80 >> >> [ 2048.663873] kfree_skb+0x2f/0xb0 >> >> [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] >> >> [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] >> >> [ 2048.663954] ? __cond_resched+0x1a/0x50 >> >> [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] >> >> [ 2048.664007] rfkill_set_block+0x98/0x170 >> >> [ 2048.664016] rfkill_fop_write+0x136/0x1e0 >> >> [ 2048.664022] vfs_write+0xc7/0x260 >> >> [ 2048.664030] ksys_write+0xb1/0xe0 >> >> [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 >> >> [ 2048.664042] __x64_sys_write+0x1a/0x20 >> >> [ 2048.664048] do_syscall_64+0x40/0xb0 >> >> [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae >> >> [ 2048.664060] RIP: 0033:0x7fe02ac23c27 >> >> [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 >> >> [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 >> >> [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 >> >> [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 >> >> [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 >> >> [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 >> >> [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff >> >> >> >> So move the shutdown callback to a place where workqueues are either >> >> flushed or cancelled to resolve the issue. >> >> >> >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> >> >> --- >> >> v2: >> >> - Rebased on bluetooth-next. >> >> >> >> net/bluetooth/hci_core.c | 16 ++++++++-------- >> >> 1 file changed, 8 insertions(+), 8 deletions(-) >> > >> > patch has been applied to bluetooth-next tree. >> >> This patch seems to introduce a regression in the btmtksdio driver. >> With this patch applied, I can't enable the hci0 interface anymore on mt8183-pumpkin: >> >> i500-pumpkin login: root >> root@i500-pumpkin:~# uname -a >> Linux i500-pumpkin 5.14.0-rc3 #94 SMP PREEMPT Wed Jul 28 11:58:20 CEST 2021 aarch64 aarch64 aarch64 GNU/Linux >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Connection timed out (110) >> root@i500-pumpkin:~# hciconfig hci0 down >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Input/output error (5) >> >> Reverting it fixes the above issue. >> Any suggestion on how to fix this without touching hci_core ? >> Maybe the btmtksdio driver needs some rework. As I'm not familiar with the code, I would appreciate any tips. > > Can you please attach dmesg? Also, full ftrace log on btmtksdio can > also be helpful. Thank you for your help. Sorry I did not post the logs previously. dmesg: https://pastebin.com/tpWDNyQr ftrace on btmtksdio: https://pastebin.com/jmhvmwUw Mattijs > > Kai-Heng > >> >> Thanks, >> Mattijs Korpershoek >> >> >> > >> > Regards >> > >> > Marcel
Hi Mattijs, On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, [snipped] > Thank you for your help. Sorry I did not post the logs previously. > > dmesg: https://pastebin.com/tpWDNyQr > ftrace on btmtksdio: https://pastebin.com/jmhvmwUw Seems like btmtksdio needs shudown() to be called before flush(). Since the order was there for a very long time, changing the calling order indeed can break what driver expects. Can you please test the following patch: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d..a61e610a400c 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) aosp_do_close(hdev); msft_do_close(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (hdev->flush) hdev->flush(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); Kai-Heng > > Mattijs > > > > Kai-Heng > > > >> > >> Thanks, > >> Mattijs Korpershoek > >> > >> > >> > > >> > Regards > >> > > >> > Marcel
On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote: > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > > stops working: > > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > [ 545.660350] usb 3-10: device descriptor read/64, error -110 > > [ 561.283530] usb 3-10: device descriptor read/64, error -110 > > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > > Or kernel panics because other workqueues already freed skb: > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > > [ 2048.663775] #PF: supervisor read access in kernel mode > > [ 2048.663779] #PF: error_code(0x0000) - not-present page > > [ 2048.663782] PGD 0 P4D 0 > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > > 07 83 f8 01 > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > > [ 2048.663856] PKRU: 55555554 > > [ 2048.663859] Call Trace: > > [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > > [ 2048.663873] kfree_skb+0x2f/0xb0 > > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > > [ 2048.663954] ? __cond_resched+0x1a/0x50 > > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > > [ 2048.664007] rfkill_set_block+0x98/0x170 > > [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > > [ 2048.664022] vfs_write+0xc7/0x260 > > [ 2048.664030] ksys_write+0xb1/0xe0 > > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > > [ 2048.664042] __x64_sys_write+0x1a/0x20 > > [ 2048.664048] do_syscall_64+0x40/0xb0 > > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > > > > So move the shutdown callback to a place where workqueues are either > > flushed or cancelled to resolve the issue. > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > --- > > v2: > > - Rebased on bluetooth-next. > > > > net/bluetooth/hci_core.c | 16 ++++++++-------- > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index 7baf93eda936..6eedf334f943 100644 > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > > BT_DBG("%s %p", hdev->name, hdev); > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - test_bit(HCI_UP, &hdev->flags)) { > > - /* Execute vendor specific shutdown routine */ > > - if (hdev->shutdown) > > - hdev->shutdown(hdev); > > - } > > - > > cancel_delayed_work(&hdev->power_off); > > cancel_delayed_work(&hdev->ncmd_timer); > > > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > clear_bit(HCI_INIT, &hdev->flags); > > } > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > + test_bit(HCI_UP, &hdev->flags)) { > > + /* Execute vendor specific shutdown routine */ > > + if (hdev->shutdown) > > + hdev->shutdown(hdev); > > + } > > + > > /* flush cmd work */ > > flush_work(&hdev->cmd_work); > > > > -- > > 2.30.2 > > btusb_shutdown_intel_new > __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT); > __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout); > hci_req_run_skb(&req, hci_req_sync_complete); > > hci_req_sync_complete > if (skb) > hdev->req_skb = skb_get(skb); > > Given the skb_get in hci_req_sync_complete makes it safe to free skb on > driver side, I doubt this patch is the correct fix as it is. Some workqueues are still active. The shutdown() should be called at least after hci_request_cancel_all(). Kai-Heng
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > Hi Mattijs, > > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, > > [snipped] > >> Thank you for your help. Sorry I did not post the logs previously. >> >> dmesg: https://pastebin.com/tpWDNyQr >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw > > Seems like btmtksdio needs shudown() to be called before flush(). > Since the order was there for a very long time, changing the calling > order indeed can break what driver expects. > Can you please test the following patch: > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index 2560ed2f144d..a61e610a400c 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > aosp_do_close(hdev); > msft_do_close(hdev); > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > + test_bit(HCI_UP, &hdev->flags)) { > + /* Execute vendor specific shutdown routine */ > + if (hdev->shutdown) > + hdev->shutdown(hdev); > + } > + > if (hdev->flush) > hdev->flush(hdev); > > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > clear_bit(HCI_INIT, &hdev->flags); > } > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > - test_bit(HCI_UP, &hdev->flags)) { > - /* Execute vendor specific shutdown routine */ > - if (hdev->shutdown) > - hdev->shutdown(hdev); > - } > - > /* flush cmd work */ > flush_work(&hdev->cmd_work); Thanks for the patch and your help. I've tried it, but it seems that it does not improve for me. I'm still observing: i500-pumpkin login: root root@i500-pumpkin:~# hciconfig hci0 up Can't init device hci0: Connection timed out (110) Logs for this session: dmesg: https://pastebin.com/iAFk5Tzi ftrace: https://pastebin.com/kEMWSYrE > > Kai-Heng > >> >> Mattijs >> > >> > Kai-Heng >> > >> >> >> >> Thanks, >> >> Mattijs Korpershoek >> >> >> >> >> >> > >> >> > Regards >> >> > >> >> > Marcel
On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote: > > On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote: > >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > >> > >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on > >> driver side, I doubt this patch is the correct fix as it is. > > > >Some workqueues are still active. > >The shutdown() should be called at least after hci_request_cancel_all(). > > What is muddy then is how active workqueues prevent skb_get from protecting > kfree_skb. Can you spot what workqueue it is? I managed to reproduce the issue with another kernel splat: ------------[ cut here ]------------ kernel BUG at mm/slub.c:321! invalid opcode: 0000 [#1] SMP NOPTI CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16 Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver. 01.03.04 01/07/2021 Workqueue: hci0 discov_update [bluetooth] RIP: 0010:__slab_free+0x20c/0x3a0 Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24 1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b 49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24 RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246 RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600 RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500 RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80 R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600 FS: 0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: ? psi_task_switch+0xc3/0x1e0 ? __switch_to_asm+0x36/0x70 ? skb_free_head+0x67/0x80 kmem_cache_free+0x370/0x3d0 ? kfree_skbmem+0x4e/0x90 kfree_skbmem+0x4e/0x90 kfree_skb+0x47/0xb0 __hci_req_sync+0x134/0x2a0 [bluetooth] ? wait_woken+0x70/0x70 discov_update+0x2ae/0x310 [bluetooth] process_one_work+0x21d/0x3c0 worker_thread+0x53/0x420 ? process_one_work+0x3c0/0x3c0 kthread+0x127/0x150 ? set_kthread_struct+0x50/0x50 ret_from_fork+0x1f/0x30 Modules linked in: rfcomm cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_pci snd_sof snd_sof_xtensa_dsp snd_sof_intel_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi ledtrig_audio snd_soc_core snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq i915 snd_seq_device snd_timer hp_wmi intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp i2c_algo_bit coretemp joydev kvm_intel ttm mei_hdcp intel_rapl_msr platform_profile wmi_bmof kvm uvcvideo crct10dif_pclmul btusb videobuf2_vmalloc videobuf2_memops drm_kms_helper btrtl videobuf2_v4l2 crc32_pclmul btbcm ghash_clmulni_intel input_leds videobuf2_common btintel snd videodev syscopyarea sysfillrect sysimgblt aesni_intel mc serio_raw fb_sys_fops bluetooth crypto_simd cec cryptd intel_cstate ecdh_generic efi_pstore ecc rc_core hid_multitouch processor_thermal_device_pci_legacy mei_me intel_soc_dts_iosf processor_thermal_device mei soundcore processor_thermal_rfim ee1004 processor_thermal_mbox processor_thermal_rapl intel_pmt_telemetry intel_rapl_common intel_pmt_class ucsi_acpi typec_ucsi typec wmi soc_button_array int3403_thermal int340x_thermal_zone video int3400_thermal acpi_thermal_rel acpi_pad mac_hid intel_hid sparse_keymap sch_fq_codel msr parport_pc ppdev lp drm parport ip_tables x_tables autofs4 hid_generic nvme nvme_core intel_lpss_pci e1000e intel_lpss i2c_i801 idma64 i2c_smbus xhci_pci xhci_pci_renesas vmd intel_pmt i2c_hid_acpi i2c_hid hid pinctrl_tigerlake ---[ end trace c09445d4697039ed ]--- So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update) and can prevent the race from happening. And the kernel splat is just one symptom of the issue, most of the time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in dmesg. Kai-Heng
On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, > > Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > > > Hi Mattijs, > > > > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek > > <mkorpershoek@baylibre.com> wrote: > >> > >> Hi Kai-Heng, > > > > [snipped] > > > >> Thank you for your help. Sorry I did not post the logs previously. > >> > >> dmesg: https://pastebin.com/tpWDNyQr > >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw > > > > Seems like btmtksdio needs shudown() to be called before flush(). > > Since the order was there for a very long time, changing the calling > > order indeed can break what driver expects. > > Can you please test the following patch: > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index 2560ed2f144d..a61e610a400c 100644 > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > aosp_do_close(hdev); > > msft_do_close(hdev); > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > + test_bit(HCI_UP, &hdev->flags)) { > > + /* Execute vendor specific shutdown routine */ > > + if (hdev->shutdown) > > + hdev->shutdown(hdev); > > + } > > + > > if (hdev->flush) > > hdev->flush(hdev); > > > > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > clear_bit(HCI_INIT, &hdev->flags); > > } > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - test_bit(HCI_UP, &hdev->flags)) { > > - /* Execute vendor specific shutdown routine */ > > - if (hdev->shutdown) > > - hdev->shutdown(hdev); > > - } > > - > > /* flush cmd work */ > > flush_work(&hdev->cmd_work); > > Thanks for the patch and your help. > I've tried it, but it seems that it does not improve for me. > I'm still observing: > > i500-pumpkin login: root > root@i500-pumpkin:~# hciconfig hci0 up > Can't init device hci0: Connection timed out (110) > > Logs for this session: > dmesg: https://pastebin.com/iAFk5Tzi > ftrace: https://pastebin.com/kEMWSYrE Thanks for the testing! What about moving the shutdown() part right after hci_req_sync_lock() so tx/rx can still work: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d4..be3113fb7d4b0 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev) hci_request_cancel_all(hdev); hci_req_sync_lock(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (!test_and_clear_bit(HCI_UP, &hdev->flags)) { cancel_delayed_work_sync(&hdev->cmd_timer); hci_req_sync_unlock(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); > > > > > > Kai-Heng > > > >> > >> Mattijs > >> > > >> > Kai-Heng > >> > > >> >> > >> >> Thanks, > >> >> Mattijs Korpershoek > >> >> > >> >> > >> >> > > >> >> > Regards > >> >> > > >> >> > Marcel
On Thu, Aug 5, 2021 at 11:00 AM Hillf Danton <hdanton@sina.com> wrote: > > On Wed, 4 Aug 2021 22:35:43 +0800 Kai-Heng Feng wrote: > >On Tue, Aug 3, 2021 at 3:47 PM Hillf Danton <hdanton@sina.com> wrote: > >> > >> On Tue, 3 Aug 2021 14:45:07 +0800 Kai-Heng Feng wrote: > >> >On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > >> >> > >> >> Given the skb_get in hci_req_sync_complete makes it safe to free skb on > >> >> driver side, I doubt this patch is the correct fix as it is. > >> > > >> >Some workqueues are still active. > >> >The shutdown() should be called at least after hci_request_cancel_all(). > >> > >> What is muddy then is how active workqueues prevent skb_get from protecting > >> kfree_skb. Can you spot what workqueue it is? > > > >I managed to reproduce the issue with another kernel splat: > >------------[ cut here ]------------ > >kernel BUG at mm/slub.c:321! > >invalid opcode: 0000 [#1] SMP NOPTI > >CPU: 2 PID: 2208 Comm: kworker/u9:3 Not tainted 5.14.0-rc4+ #16 > >Hardware name: HP HP ProBook 650 G8 Notebook PC/87ED, BIOS T74 Ver. > >01.03.04 01/07/2021 > >Workqueue: hci0 discov_update [bluetooth] > >RIP: 0010:__slab_free+0x20c/0x3a0 > >Code: 00 44 0f b6 54 24 1a 8b 74 24 14 44 0f b6 4c 24 1b 44 8b 44 24 > >1c 48 89 44 24 08 48 8b 54 24 20 48 8b 7c 24 28 e9 ad fe ff ff <0f> 0b > >49 3b 54 24 28 0f 85 6b ff ff ff 49 89 5c 24 20 49 89 4c 24 > >RSP: 0018:ffffaa0e4164fc50 EFLAGS: 00010246 > >RAX: ffff9cc9a217e668 RBX: ffff9cc9a217e600 RCX: ffff9cc9a217e600 > >RDX: 000000008010000e RSI: ffffd09044885f80 RDI: ffff9cc980e96500 > >RBP: ffffaa0e4164fd00 R08: 0000000000000001 R09: ffffffff885b3a4e > >R10: ffff9cc999aab800 R11: ffff9cc9a217e600 R12: ffffd09044885f80 > >R13: ffff9cc9a217e600 R14: ffff9cc980e96500 R15: ffff9cc9a217e600 > >FS: 0000000000000000(0000) GS:ffff9cca2b900000(0000) knlGS:0000000000000000 > >CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >CR2: 00007ffe164d5b98 CR3: 000000013f410002 CR4: 0000000000770ee0 > >PKRU: 55555554 > >Call Trace: > > ? psi_task_switch+0xc3/0x1e0 > > ? __switch_to_asm+0x36/0x70 > > ? skb_free_head+0x67/0x80 > > kmem_cache_free+0x370/0x3d0 > > ? kfree_skbmem+0x4e/0x90 > > kfree_skbmem+0x4e/0x90 > > kfree_skb+0x47/0xb0 > > __hci_req_sync+0x134/0x2a0 [bluetooth] > > ? wait_woken+0x70/0x70 > > discov_update+0x2ae/0x310 [bluetooth] > > process_one_work+0x21d/0x3c0 > > worker_thread+0x53/0x420 > > ? process_one_work+0x3c0/0x3c0 > > kthread+0x127/0x150 > > ? set_kthread_struct+0x50/0x50 > > ret_from_fork+0x1f/0x30 > > [...] > > > > >So hci_request_cancel_all() -> cancel_work_sync(&hdev->discov_update) > >and can prevent the race from happening. > > Given > > __hci_req_sync > err = hci_req_run_skb(&req, hci_req_sync_complete); > kfree_skb(hdev->req_skb); > > hci_req_sync_complete > if (skb) > hdev->req_skb = skb_get(skb); > > once more skb_get makes the above race hard to understand, though it should > better check error before freeing skb there to avoid blind free. > > > > >And the kernel splat is just one symptom of the issue, most of the > > It fails to support your reasoning so far. How? Most of the time the BT controller can't be brought up again after shutdown(), and we need to stop other activities before that. What other reasoning is expected? Kai-Heng > > >time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in > >dmesg. > > > >Kai-Heng > > > +++ x/net/bluetooth/hci_request.c > @@ -257,8 +257,10 @@ int __hci_req_sync(struct hci_dev *hdev, > break; > } > > - kfree_skb(hdev->req_skb); > - hdev->req_skb = NULL; > + if (!err) { > + kfree_skb(hdev->req_skb); > + hdev->req_skb = NULL; > + } > hdev->req_status = hdev->req_result = 0; > > bt_dev_dbg(hdev, "end: err %d", err);
On Tue, Aug 3, 2021 at 2:45 PM Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > On Mon, Aug 2, 2021 at 11:05 AM Hillf Danton <hdanton@sina.com> wrote: > > > > On Fri, 14 May 2021 15:14:52 +0800 Kai-Heng Feng wrote: > > > Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it > > > stops working: > > > [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed > > > [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on > > > [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > > [ 545.660350] usb 3-10: device descriptor read/64, error -110 > > > [ 561.283530] usb 3-10: device descriptor read/64, error -110 > > > [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd > > > [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 > > > [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) > > > [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) > > > [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) > > > > > > Or kernel panics because other workqueues already freed skb: > > > [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 > > > [ 2048.663775] #PF: supervisor read access in kernel mode > > > [ 2048.663779] #PF: error_code(0x0000) - not-present page > > > [ 2048.663782] PGD 0 P4D 0 > > > [ 2048.663787] Oops: 0000 [#1] SMP NOPTI > > > [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 > > > [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 > > > [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 > > > [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 > > > 07 83 f8 01 > > > [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 > > > [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 > > > [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 > > > [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 > > > [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 > > > [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 > > > [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 > > > [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 > > > [ 2048.663856] PKRU: 55555554 > > > [ 2048.663859] Call Trace: > > > [ 2048.663865] ? skb_release_head_state+0x5e/0x80 > > > [ 2048.663873] kfree_skb+0x2f/0xb0 > > > [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] > > > [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] > > > [ 2048.663954] ? __cond_resched+0x1a/0x50 > > > [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] > > > [ 2048.664007] rfkill_set_block+0x98/0x170 > > > [ 2048.664016] rfkill_fop_write+0x136/0x1e0 > > > [ 2048.664022] vfs_write+0xc7/0x260 > > > [ 2048.664030] ksys_write+0xb1/0xe0 > > > [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 > > > [ 2048.664042] __x64_sys_write+0x1a/0x20 > > > [ 2048.664048] do_syscall_64+0x40/0xb0 > > > [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae > > > [ 2048.664060] RIP: 0033:0x7fe02ac23c27 > > > [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 > > > [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 > > > [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 > > > [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 > > > [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 > > > [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 > > > [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff > > > > > > So move the shutdown callback to a place where workqueues are either > > > flushed or cancelled to resolve the issue. > > > > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > > --- > > > v2: > > > - Rebased on bluetooth-next. > > > > > > net/bluetooth/hci_core.c | 16 ++++++++-------- > > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > > index 7baf93eda936..6eedf334f943 100644 > > > --- a/net/bluetooth/hci_core.c > > > +++ b/net/bluetooth/hci_core.c > > > @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > > > > BT_DBG("%s %p", hdev->name, hdev); > > > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > - test_bit(HCI_UP, &hdev->flags)) { > > > - /* Execute vendor specific shutdown routine */ > > > - if (hdev->shutdown) > > > - hdev->shutdown(hdev); > > > - } > > > - > > > cancel_delayed_work(&hdev->power_off); > > > cancel_delayed_work(&hdev->ncmd_timer); > > > > > > @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > > clear_bit(HCI_INIT, &hdev->flags); > > > } > > > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > > + test_bit(HCI_UP, &hdev->flags)) { > > > + /* Execute vendor specific shutdown routine */ > > > + if (hdev->shutdown) > > > + hdev->shutdown(hdev); > > > + } > > > + > > > /* flush cmd work */ > > > flush_work(&hdev->cmd_work); > > > > > > -- > > > 2.30.2 > > > > btusb_shutdown_intel_new > > __hci_cmd_sync(hdev, HCI_OP_RESET, 0, NULL, HCI_INIT_TIMEOUT); > > __hci_cmd_sync_ev(hdev, opcode, plen, param, 0, timeout); > > hci_req_run_skb(&req, hci_req_sync_complete); > > > > hci_req_sync_complete > > if (skb) > > hdev->req_skb = skb_get(skb); > > > > Given the skb_get in hci_req_sync_complete makes it safe to free skb on > > driver side, I doubt this patch is the correct fix as it is. > > Some workqueues are still active. > The shutdown() should be called at least after hci_request_cancel_all(). > Hello, The original patch 60789afc02f592b8d91217b60930e7a76271ae07 ("Bluetooth: Shutdown controller after workqueues are flushed or cancelled") is causing the tx fail when downloading fw on mediatek mt8183 device using QCA bluetooth: [ 225.205061] Bluetooth: qca_download_firmware() hci0: QCA Downloading qca/rampatch_00440302.bin [ 227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout ... follows by a lot of: [ 223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) [ 223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) After applying the fixup to allow tx, the issue is solved. > Kai-Heng
Hi Kai-Heng, Thanks for your patch, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > On Tue, Aug 3, 2021 at 4:21 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, >> >> Kai-Heng Feng <kai.heng.feng@canonical.com> writes: >> >> > Hi Mattijs, >> > >> > On Fri, Jul 30, 2021 at 7:40 PM Mattijs Korpershoek >> > <mkorpershoek@baylibre.com> wrote: >> >> >> >> Hi Kai-Heng, >> > >> > [snipped] >> > >> >> Thank you for your help. Sorry I did not post the logs previously. >> >> >> >> dmesg: https://pastebin.com/tpWDNyQr >> >> ftrace on btmtksdio: https://pastebin.com/jmhvmwUw >> > >> > Seems like btmtksdio needs shudown() to be called before flush(). >> > Since the order was there for a very long time, changing the calling >> > order indeed can break what driver expects. >> > Can you please test the following patch: >> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c >> > index 2560ed2f144d..a61e610a400c 100644 >> > --- a/net/bluetooth/hci_core.c >> > +++ b/net/bluetooth/hci_core.c >> > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) >> > aosp_do_close(hdev); >> > msft_do_close(hdev); >> > >> > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && >> > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && >> > + test_bit(HCI_UP, &hdev->flags)) { >> > + /* Execute vendor specific shutdown routine */ >> > + if (hdev->shutdown) >> > + hdev->shutdown(hdev); >> > + } >> > + >> > if (hdev->flush) >> > hdev->flush(hdev); >> > >> > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) >> > clear_bit(HCI_INIT, &hdev->flags); >> > } >> > >> > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && >> > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && >> > - test_bit(HCI_UP, &hdev->flags)) { >> > - /* Execute vendor specific shutdown routine */ >> > - if (hdev->shutdown) >> > - hdev->shutdown(hdev); >> > - } >> > - >> > /* flush cmd work */ >> > flush_work(&hdev->cmd_work); >> >> Thanks for the patch and your help. >> I've tried it, but it seems that it does not improve for me. >> I'm still observing: >> >> i500-pumpkin login: root >> root@i500-pumpkin:~# hciconfig hci0 up >> Can't init device hci0: Connection timed out (110) >> >> Logs for this session: >> dmesg: https://pastebin.com/iAFk5Tzi >> ftrace: https://pastebin.com/kEMWSYrE > > Thanks for the testing! > What about moving the shutdown() part right after hci_req_sync_lock() > so tx/rx can still work: > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index 2560ed2f144d4..be3113fb7d4b0 100644 > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -1727,6 +1727,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > hci_request_cancel_all(hdev); > hci_req_sync_lock(hdev); > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > + test_bit(HCI_UP, &hdev->flags)) { > + /* Execute vendor specific shutdown routine */ > + if (hdev->shutdown) > + hdev->shutdown(hdev); > + } > + > if (!test_and_clear_bit(HCI_UP, &hdev->flags)) { > cancel_delayed_work_sync(&hdev->cmd_timer); > hci_req_sync_unlock(hdev); > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > clear_bit(HCI_INIT, &hdev->flags); > } > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > - test_bit(HCI_UP, &hdev->flags)) { > - /* Execute vendor specific shutdown routine */ > - if (hdev->shutdown) > - hdev->shutdown(hdev); > - } > - > /* flush cmd work */ > flush_work(&hdev->cmd_work); I confirm this diff works for me: root@i500-pumpkin:~# hciconfig hci0 up root@i500-pumpkin:~# hciconfig hci0 down root@i500-pumpkin:~# hciconfig hci0 up root@i500-pumpkin:~# hciconfig hci0 hci0: Type: Primary Bus: SDIO BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 UP RUNNING RX bytes:11268 acl:0 sco:0 events:829 errors:0 TX bytes:182569 acl:0 sco:0 commands:829 errors:0 root@i500-pumpkin:~# hcitool scan Scanning ... <redacted> Pixel 3 XL Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> > > > > > >> >> >> > >> > Kai-Heng >> > >> >> >> >> Mattijs >> >> > >> >> > Kai-Heng >> >> > >> >> >> >> >> >> Thanks, >> >> >> Mattijs Korpershoek >> >> >> >> >> >> >> >> >> > >> >> >> > Regards >> >> >> > >> >> >> > Marcel
On Thu, Aug 5, 2021 at 3:01 PM Hillf Danton <hdanton@sina.com> wrote: > > On Thu, 5 Aug 2021 14:12:10 +0800 Hsin-Yi Wang wrote: > > > >The original patch 60789afc02f592b8d91217b60930e7a76271ae07 > >("Bluetooth: Shutdown controller after workqueues are flushed or > >cancelled") is causing the tx fail when downloading fw on mediatek > >mt8183 device using QCA bluetooth: > > Want to know if reverting 60789afc02f59 is a cure. Reverting 60789afc02f592b8d91217b60930e7a76271ae07 can also fixed the issue. > > > > >[ 225.205061] Bluetooth: qca_download_firmware() hci0: QCA > >Downloading qca/rampatch_00440302.bin > >[ 227.252653] Bluetooth: hci_cmd_timeout() hci0: command 0xfc00 tx timeout > >... > >follows by a lot of: > >[ 223.604971] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) > >[ 223.605027] Bluetooth: qca_recv() hci0: Frame reassembly failed (-84) > > > >After applying the fixup to allow tx, the issue is solved. > > Feel free to specify the fixup and its link to 60789afc02f59. The fixup is pasted by Kai-Heng here: https://lkml.org/lkml/2021/8/3/103
[snipped] > >How? Most of the time the BT controller can't be brought up again > >after shutdown(), and we need to stop other activities before that. > >What other reasoning is expected? > > See below. > > > >Kai-Heng > > > >> > >> >time it's just "Bluetooth: hci0: HCI reset during shutdown failed" in > >> >dmesg. > > In drivers/bluetooth/btusb.c, there are three cases of > > bt_dev_err(hdev, "HCI reset during shutdown failed"); > > and in btusb_shutdown_intel_new() it has nothing to do with kfree_skb() > because of IS_ERR(skb). No, kfree_skb() doesn't gets called in this case. But when that happens the BT controller won't work anymore. > > Feel free to specify why an skb error links to the race you are trying to fix. The race here is that the btusb_shutdown_intel_new() is trying to reset the controller while other works like scanning or discovering are still underway. So the patch is to ensure that shutdown() callback is invoked after other works are cancelled. I think I understand what you are trying to ask, you want to know where the double kfree_skb() race happens. I didn't really investigate that because quiesce the other activities then call shutdown() is the right thing to do and I haven't seen the kernel splat since. Kai-Heng > > >> > > >> >Kai-Heng > >> > >> > >> +++ x/net/bluetooth/hci_request.c > >> @@ -257,8 +257,10 @@ int __hci_req_sync(struct hci_dev *hdev, > >> break; > >> } > >> > >> - kfree_skb(hdev->req_skb); > >> - hdev->req_skb = NULL; > >> + if (!err) { > >> + kfree_skb(hdev->req_skb); > >> + hdev->req_skb = NULL; > >> + } > >> hdev->req_status = hdev->req_result = 0; > >> > >> bt_dev_dbg(hdev, "end: err %d", err);
Hi Mattijs, On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, > > Thanks for your patch, > > Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > [snipped] > I confirm this diff works for me: > > root@i500-pumpkin:~# hciconfig hci0 up > root@i500-pumpkin:~# hciconfig hci0 down > root@i500-pumpkin:~# hciconfig hci0 up > root@i500-pumpkin:~# hciconfig hci0 > hci0: Type: Primary Bus: SDIO > BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 > UP RUNNING > RX bytes:11268 acl:0 sco:0 events:829 errors:0 > TX bytes:182569 acl:0 sco:0 commands:829 errors:0 > > root@i500-pumpkin:~# hcitool scan > Scanning ... > <redacted> Pixel 3 XL > > Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> I found that btmtksdio_flush() only cancels the work instead of doing flush_work(). That probably explains why putting ->shutdown right before ->flush doesn't work. So can you please test the following again: diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c index 9872ef18f9fea..b33c05ad2150b 100644 --- a/drivers/bluetooth/btmtksdio.c +++ b/drivers/bluetooth/btmtksdio.c @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev) { struct btmtksdio_dev *bdev = hci_get_drvdata(hdev); - skb_queue_purge(&bdev->txq); + flush_work(&bdev->tx_work); - cancel_work_sync(&bdev->tx_work); + skb_queue_purge(&bdev->txq); return 0; } diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d4..a61e610a400cb 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) aosp_do_close(hdev); msft_do_close(hdev); + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + if (hdev->flush) hdev->flush(hdev); @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); Kai-Heng
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > Hi Mattijs, > > On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek > <mkorpershoek@baylibre.com> wrote: >> >> Hi Kai-Heng, >> >> Thanks for your patch, >> >> Kai-Heng Feng <kai.heng.feng@canonical.com> writes: >> > > [snipped] > >> I confirm this diff works for me: >> >> root@i500-pumpkin:~# hciconfig hci0 up >> root@i500-pumpkin:~# hciconfig hci0 down >> root@i500-pumpkin:~# hciconfig hci0 up >> root@i500-pumpkin:~# hciconfig hci0 >> hci0: Type: Primary Bus: SDIO >> BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 >> UP RUNNING >> RX bytes:11268 acl:0 sco:0 events:829 errors:0 >> TX bytes:182569 acl:0 sco:0 commands:829 errors:0 >> >> root@i500-pumpkin:~# hcitool scan >> Scanning ... >> <redacted> Pixel 3 XL >> >> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> > > I found that btmtksdio_flush() only cancels the work instead of doing > flush_work(). That probably explains why putting ->shutdown right > before ->flush doesn't work. > So can you please test the following again: > diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c > index 9872ef18f9fea..b33c05ad2150b 100644 > --- a/drivers/bluetooth/btmtksdio.c > +++ b/drivers/bluetooth/btmtksdio.c > @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev) > { > struct btmtksdio_dev *bdev = hci_get_drvdata(hdev); > > - skb_queue_purge(&bdev->txq); > + flush_work(&bdev->tx_work); > > - cancel_work_sync(&bdev->tx_work); > + skb_queue_purge(&bdev->txq); > > return 0; > } > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > index 2560ed2f144d4..a61e610a400cb 100644 > > --- a/net/bluetooth/hci_core.c > +++ b/net/bluetooth/hci_core.c > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > aosp_do_close(hdev); > msft_do_close(hdev); > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > + test_bit(HCI_UP, &hdev->flags)) { > + /* Execute vendor specific shutdown routine */ > + if (hdev->shutdown) > + hdev->shutdown(hdev); > + } > + > if (hdev->flush) > hdev->flush(hdev); > > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > clear_bit(HCI_INIT, &hdev->flags); > } > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > - test_bit(HCI_UP, &hdev->flags)) { > - /* Execute vendor specific shutdown routine */ > - if (hdev->shutdown) > - hdev->shutdown(hdev); > - } > - > /* flush cmd work */ > flush_work(&hdev->cmd_work); I've tried this but I have the same (broken) symptoms as before. Here are some logs of v3: dmesg: https://pastebin.com/1x4UHkzy ftrace: https://pastebin.com/Lm1d6AWy Mattijs > > Kai-Heng
Hi Mattijs, On Fri, Aug 6, 2021 at 4:51 PM Mattijs Korpershoek <mkorpershoek@baylibre.com> wrote: > > Hi Kai-Heng, > > Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > > > Hi Mattijs, > > > > On Thu, Aug 5, 2021 at 2:55 PM Mattijs Korpershoek > > <mkorpershoek@baylibre.com> wrote: > >> > >> Hi Kai-Heng, > >> > >> Thanks for your patch, > >> > >> Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > >> > > > > [snipped] > > > >> I confirm this diff works for me: > >> > >> root@i500-pumpkin:~# hciconfig hci0 up > >> root@i500-pumpkin:~# hciconfig hci0 down > >> root@i500-pumpkin:~# hciconfig hci0 up > >> root@i500-pumpkin:~# hciconfig hci0 > >> hci0: Type: Primary Bus: SDIO > >> BD Address: 00:0C:E7:55:FF:12 ACL MTU: 1021:8 SCO MTU: 244:4 > >> UP RUNNING > >> RX bytes:11268 acl:0 sco:0 events:829 errors:0 > >> TX bytes:182569 acl:0 sco:0 commands:829 errors:0 > >> > >> root@i500-pumpkin:~# hcitool scan > >> Scanning ... > >> <redacted> Pixel 3 XL > >> > >> Tested-by: Mattijs Korpershoek <mkorpershoek@baylibre.com> > > > > I found that btmtksdio_flush() only cancels the work instead of doing > > flush_work(). That probably explains why putting ->shutdown right > > before ->flush doesn't work. > > So can you please test the following again: > > diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c > > index 9872ef18f9fea..b33c05ad2150b 100644 > > --- a/drivers/bluetooth/btmtksdio.c > > +++ b/drivers/bluetooth/btmtksdio.c > > @@ -649,9 +649,9 @@ static int btmtksdio_flush(struct hci_dev *hdev) > > { > > struct btmtksdio_dev *bdev = hci_get_drvdata(hdev); > > > > - skb_queue_purge(&bdev->txq); > > + flush_work(&bdev->tx_work); > > > > - cancel_work_sync(&bdev->tx_work); > > + skb_queue_purge(&bdev->txq); > > > > return 0; > > } > > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c > > index 2560ed2f144d4..a61e610a400cb 100644 > > > > --- a/net/bluetooth/hci_core.c > > +++ b/net/bluetooth/hci_core.c > > @@ -1785,6 +1785,14 @@ int hci_dev_do_close(struct hci_dev *hdev) > > aosp_do_close(hdev); > > msft_do_close(hdev); > > > > + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > + test_bit(HCI_UP, &hdev->flags)) { > > + /* Execute vendor specific shutdown routine */ > > + if (hdev->shutdown) > > + hdev->shutdown(hdev); > > + } > > + > > if (hdev->flush) > > hdev->flush(hdev); > > > > @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) > > clear_bit(HCI_INIT, &hdev->flags); > > } > > > > - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && > > - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && > > - test_bit(HCI_UP, &hdev->flags)) { > > - /* Execute vendor specific shutdown routine */ > > - if (hdev->shutdown) > > - hdev->shutdown(hdev); > > - } > > - > > /* flush cmd work */ > > flush_work(&hdev->cmd_work); > I've tried this but I have the same (broken) symptoms as before. > > Here are some logs of v3: > dmesg: https://pastebin.com/1x4UHkzy > ftrace: https://pastebin.com/Lm1d6AWy Thanks for your testing. I think I finally got it: btmtksdio_shutdown() -> mtk_hci_wmt_sync() -> __hci_cmd_send() then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in btmtksdio_recv_event(): btmtksdio_recv_event() -> hci_recv_frame() -> queue_work(hdev->workqueue, &hdev->rx_work); That means it has to be done before the following drain_workqueue() call. Can you please see if moving the ->shutdown() part right before drain_workqueue() can fix the issue? Kai-Heng > > Mattijs > > > > > Kai-Heng
Hi Kai-Heng, Kai-Heng Feng <kai.heng.feng@canonical.com> writes: > Hi Mattijs, > >> [snipped] > > Thanks for your testing. I think I finally got it: > btmtksdio_shutdown() > -> mtk_hci_wmt_sync() > -> __hci_cmd_send() > then waiting for BTMTKSDIO_TX_WAIT_VND_EVT, which is cleared in > btmtksdio_recv_event(): > btmtksdio_recv_event() > -> hci_recv_frame() > -> queue_work(hdev->workqueue, &hdev->rx_work); > > That means it has to be done before the following drain_workqueue() call. > Can you please see if moving the ->shutdown() part right before > drain_workqueue() can fix the issue? I've tested the following patch: diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 2560ed2f144d..131e69a9a66a 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1757,6 +1757,14 @@ int hci_dev_do_close(struct hci_dev *hdev) cancel_delayed_work_sync(&adv_instance->rpa_expired_cb); } + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + /* Avoid potential lockdep warnings from the *_flush() calls by * ensuring the workqueue is empty up front. */ @@ -1798,14 +1806,6 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - /* flush cmd work */ flush_work(&hdev->cmd_work); It does not seem to fix the issue. Adding the bits in btmtksdio_flush() does not change the result of the above patch. Here are the logs. These are just with the above patch (not with the btmtksdio_flush() changes. dmesg: https://pastebin.com/FZZBkqGC btmtksdio_ftrace: https://pastebin.com/JQ0UWenY Mattijs > > Kai-Heng > >> >> Mattijs >> >> > >> > Kai-Heng
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 7baf93eda936..6eedf334f943 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -1716,14 +1716,6 @@ int hci_dev_do_close(struct hci_dev *hdev) BT_DBG("%s %p", hdev->name, hdev); - if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && - !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && - test_bit(HCI_UP, &hdev->flags)) { - /* Execute vendor specific shutdown routine */ - if (hdev->shutdown) - hdev->shutdown(hdev); - } - cancel_delayed_work(&hdev->power_off); cancel_delayed_work(&hdev->ncmd_timer); @@ -1801,6 +1793,14 @@ int hci_dev_do_close(struct hci_dev *hdev) clear_bit(HCI_INIT, &hdev->flags); } + if (!hci_dev_test_flag(hdev, HCI_UNREGISTER) && + !hci_dev_test_flag(hdev, HCI_USER_CHANNEL) && + test_bit(HCI_UP, &hdev->flags)) { + /* Execute vendor specific shutdown routine */ + if (hdev->shutdown) + hdev->shutdown(hdev); + } + /* flush cmd work */ flush_work(&hdev->cmd_work);
Rfkill block and unblock Intel USB Bluetooth [8087:0026] may make it stops working: [ 509.691509] Bluetooth: hci0: HCI reset during shutdown failed [ 514.897584] Bluetooth: hci0: MSFT filter_enable is already on [ 530.044751] usb 3-10: reset full-speed USB device number 5 using xhci_hcd [ 545.660350] usb 3-10: device descriptor read/64, error -110 [ 561.283530] usb 3-10: device descriptor read/64, error -110 [ 561.519682] usb 3-10: reset full-speed USB device number 5 using xhci_hcd [ 566.686650] Bluetooth: hci0: unexpected event for opcode 0x0500 [ 568.752452] Bluetooth: hci0: urb 0000000096cd309b failed to resubmit (113) [ 578.797955] Bluetooth: hci0: Failed to read MSFT supported features (-110) [ 586.286565] Bluetooth: hci0: urb 00000000c522f633 failed to resubmit (113) [ 596.215302] Bluetooth: hci0: Failed to read MSFT supported features (-110) Or kernel panics because other workqueues already freed skb: [ 2048.663763] BUG: kernel NULL pointer dereference, address: 0000000000000000 [ 2048.663775] #PF: supervisor read access in kernel mode [ 2048.663779] #PF: error_code(0x0000) - not-present page [ 2048.663782] PGD 0 P4D 0 [ 2048.663787] Oops: 0000 [#1] SMP NOPTI [ 2048.663793] CPU: 3 PID: 4491 Comm: rfkill Tainted: G W 5.13.0-rc1-next-20210510+ #20 [ 2048.663799] Hardware name: HP HP EliteBook 850 G8 Notebook PC/8846, BIOS T76 Ver. 01.01.04 12/02/2020 [ 2048.663801] RIP: 0010:__skb_ext_put+0x6/0x50 [ 2048.663814] Code: 8b 1b 48 85 db 75 db 5b 41 5c 5d c3 be 01 00 00 00 e8 de 13 c0 ff eb e7 be 02 00 00 00 e8 d2 13 c0 ff eb db 0f 1f 44 00 00 55 <8b> 07 48 89 e5 83 f8 01 74 14 b8 ff ff ff ff f0 0f c1 07 83 f8 01 [ 2048.663819] RSP: 0018:ffffc1d105b6fd80 EFLAGS: 00010286 [ 2048.663824] RAX: 0000000000000000 RBX: ffff9d9ac5649000 RCX: 0000000000000000 [ 2048.663827] RDX: ffffffffc0d1daf6 RSI: 0000000000000206 RDI: 0000000000000000 [ 2048.663830] RBP: ffffc1d105b6fd98 R08: 0000000000000001 R09: ffff9d9ace8ceac0 [ 2048.663834] R10: ffff9d9ace8ceac0 R11: 0000000000000001 R12: ffff9d9ac5649000 [ 2048.663838] R13: 0000000000000000 R14: 00007ffe0354d650 R15: 0000000000000000 [ 2048.663843] FS: 00007fe02ab19740(0000) GS:ffff9d9e5f8c0000(0000) knlGS:0000000000000000 [ 2048.663849] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2048.663853] CR2: 0000000000000000 CR3: 0000000111a52004 CR4: 0000000000770ee0 [ 2048.663856] PKRU: 55555554 [ 2048.663859] Call Trace: [ 2048.663865] ? skb_release_head_state+0x5e/0x80 [ 2048.663873] kfree_skb+0x2f/0xb0 [ 2048.663881] btusb_shutdown_intel_new+0x36/0x60 [btusb] [ 2048.663905] hci_dev_do_close+0x48c/0x5e0 [bluetooth] [ 2048.663954] ? __cond_resched+0x1a/0x50 [ 2048.663962] hci_rfkill_set_block+0x56/0xa0 [bluetooth] [ 2048.664007] rfkill_set_block+0x98/0x170 [ 2048.664016] rfkill_fop_write+0x136/0x1e0 [ 2048.664022] vfs_write+0xc7/0x260 [ 2048.664030] ksys_write+0xb1/0xe0 [ 2048.664035] ? exit_to_user_mode_prepare+0x37/0x1c0 [ 2048.664042] __x64_sys_write+0x1a/0x20 [ 2048.664048] do_syscall_64+0x40/0xb0 [ 2048.664055] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 2048.664060] RIP: 0033:0x7fe02ac23c27 [ 2048.664066] Code: 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 [ 2048.664070] RSP: 002b:00007ffe0354d638 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 2048.664075] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe02ac23c27 [ 2048.664078] RDX: 0000000000000008 RSI: 00007ffe0354d650 RDI: 0000000000000003 [ 2048.664081] RBP: 0000000000000000 R08: 0000559b05998440 R09: 0000559b05998440 [ 2048.664084] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000003 [ 2048.664086] R13: 0000000000000000 R14: ffffffff00000000 R15: 00000000ffffffff So move the shutdown callback to a place where workqueues are either flushed or cancelled to resolve the issue. Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- v2: - Rebased on bluetooth-next. net/bluetooth/hci_core.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)