mbox series

[v3,0/2] HID: use spinlocks to safely schedule led workers

Message ID 20230125-hid-unregister-leds-v3-0-0a52ac225e00@diag.uniroma1.it
Headers show
Series HID: use spinlocks to safely schedule led workers | expand

Message

Pietro Borrello Feb. 9, 2023, 11:58 p.m. UTC
I noticed a recurring pattern is present in multiple hid devices in the
Linux tree, where the LED controller of a device schedules a work_struct
to interact with the hardware.
The work_struct is embedded in the device structure and thus, is freed
at device removal.

The issue is that a LED worker may be scheduled by a timer concurrently
with device removal, causing the work_struct to be accessed after having
been freed.
I was able to trigger the issue in hid-bigbenff.c and hid-asus.c 
where the work_structs may be scheduled by the LED controller
while the device is disconnecting, triggering use-after-frees.
I can attach the reproducer, but it's very simple USB configuration, 
using the /dev/raw-gadget interface with some more USB interactions 
to manage LEDs configuration and pass checks in asus_kbd_init() 
and asus_kbd_get_functions() in case of hid-asus.c.
I triggered the issue by connecting a device and immediately 
disconnecting it, so that the remove function runs before the LED one
which remains pending.

I am attaching multiple patches for asus and bigben drivers. 
The proposed patches introduce safe wrappers to schedule the workers
safely with several spinlocks checks.

I attach the (partial for brevity) ODEBUG dumps:

```hid-bigbenff.c
[   37.803135][ T1170] usb 1-1: USB disconnect, device number 2
[   37.827979][ T1170] ODEBUG: free active (active state 0) object
type: work_struct hint: bigben_worker+0x0/0x860
[   37.829634][ T1170] WARNING: CPU: 0 PID: 1170 at
lib/debugobjects.c:505 debug_check_no_obj_freed+0x43a/0x630
[   37.830904][ T1170] Modules linked in:
[   37.831413][ T1170] CPU: 0 PID: 1170 Comm: kworker/0:3 Not tainted
6.1.0-rc4-dirty #43
[   37.832465][ T1170] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   37.833751][ T1170] Workqueue: usb_hub_wq hub_event
[   37.834409][ T1170] RIP: 0010:debug_check_no_obj_freed+0x43a/0x630
[   37.835218][ T1170] Code: 48 89 ef e8 28 82 58 ff 49 8b 14 24 4c 8b
45 00 48 c7 c7 40 5f 09 87 48 c7 c6 60 5b 09 87 89 d9 4d 89 f9 31 c0
e8 46 25 ef fe <0f> 0b 4c 8b 64 24 20 48 ba 00 00 00 00 00 fc ff df ff
05 4f 7c 17
[   37.837667][ T1170] RSP: 0018:ffffc900006fee60 EFLAGS: 00010246
[   37.838503][ T1170] RAX: 0d2d19ffcded3d00 RBX: 0000000000000000
RCX: ffff888117fc9b00
[   37.839519][ T1170] RDX: 0000000000000000 RSI: 0000000000000000
RDI: 0000000000000000
[   37.840570][ T1170] RBP: ffffffff86e88380 R08: ffffffff8130793b
R09: fffff520000dfd85
[   37.841618][ T1170] R10: fffff520000dfd85 R11: 0000000000000000
R12: ffffffff87095fb8
[   37.842649][ T1170] R13: ffff888117770ad8 R14: ffff888117770acc
R15: ffffffff852b7420
[   37.843728][ T1170] FS:  0000000000000000(0000)
GS:ffff8881f6600000(0000) knlGS:0000000000000000
[   37.844877][ T1170] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.845749][ T1170] CR2: 00007f992eaab380 CR3: 000000011834b000
CR4: 00000000001006f0
[   37.846794][ T1170] Call Trace:
[   37.847245][ T1170]  <TASK>
[   37.847643][ T1170]  slab_free_freelist_hook+0x89/0x160
[   37.848409][ T1170]  ? devres_release_all+0x262/0x350
[   37.849156][ T1170]  __kmem_cache_free+0x71/0x110
[   37.849829][ T1170]  devres_release_all+0x262/0x350
[   37.850478][ T1170]  ? devres_release+0x90/0x90
[   37.851118][ T1170]  device_release_driver_internal+0x5e5/0x8a0
[   37.851944][ T1170]  bus_remove_device+0x2ea/0x400
[   37.852611][ T1170]  device_del+0x64f/0xb40
[   37.853212][ T1170]  ? kill_device+0x150/0x150
[   37.853831][ T1170]  ? print_irqtrace_events+0x1f0/0x1f0
[   37.854564][ T1170]  hid_destroy_device+0x66/0x100
[   37.855226][ T1170]  usbhid_disconnect+0x9a/0xc0
[   37.855887][ T1170]  usb_unbind_interface+0x1e1/0x890
```

``` hid-asus.c
[   77.409878][ T1169] usb 1-1: USB disconnect, device number 2
[   77.423606][ T1169] ODEBUG: free active (active state 0) object
type: work_struct hint: asus_kbd_backlight_work+0x0/0x2c0
[   77.425222][ T1169] WARNING: CPU: 0 PID: 1169 at
lib/debugobjects.c:505 debug_check_no_obj_freed+0x43a/0x630
[   77.426599][ T1169] Modules linked in:
[   77.427322][ T1169] CPU: 0 PID: 1169 Comm: kworker/0:3 Not tainted
6.1.0-rc4-dirty #43
[   77.428404][ T1169] Hardware name: QEMU Standard PC (i440FX + PIIX,
1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[   77.429644][ T1169] Workqueue: usb_hub_wq hub_event
[   77.430296][ T1169] RIP: 0010:debug_check_no_obj_freed+0x43a/0x630
[   77.431142][ T1169] Code: 48 89 ef e8 28 82 58 ff 49 8b 14 24 4c 8b
45 00 48 c7 c7 40 5f 09 87 48 c7 c6 60 5b 09 87 89 d9 4d 89 f9 31 c0
e8 46 25 ef fe <0f> 0b 4c 8b 64 24 20 48 ba 00 00 00 00 00 fc ff df ff
05 4f 7c 17
[   77.433691][ T1169] RSP: 0018:ffffc9000069ee60 EFLAGS: 00010246
[   77.434470][ T1169] RAX: b85d2b40c12d7600 RBX: 0000000000000000
RCX: ffff888117a78000
[   77.435507][ T1169] RDX: 0000000000000000 RSI: 0000000080000000
RDI: 0000000000000000
[   77.436521][ T1169] RBP: ffffffff86e88380 R08: ffffffff8130793b
R09: ffffed103ecc4ed6
[   77.437582][ T1169] R10: ffffed103ecc4ed6 R11: 0000000000000000
R12: ffffffff87095fb8
[   77.438593][ T1169] R13: ffff88810e348fe0 R14: ffff88810e348fd4
R15: ffffffff852b5780
[   77.439667][ T1169] FS:  0000000000000000(0000)
GS:ffff8881f6600000(0000) knlGS:0000000000000000
[   77.440842][ T1169] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   77.441688][ T1169] CR2: 00007ffc05495ff0 CR3: 000000010cdf0000
CR4: 00000000001006f0
[   77.442720][ T1169] Call Trace:
[   77.443167][ T1169]  <TASK>
[   77.443555][ T1169]  slab_free_freelist_hook+0x89/0x160
[   77.444302][ T1169]  ? devres_release_all+0x262/0x350
[   77.444990][ T1169]  __kmem_cache_free+0x71/0x110
[   77.445638][ T1169]  devres_release_all+0x262/0x350
[   77.446309][ T1169]  ? devres_release+0x90/0x90
[   77.446978][ T1169]  device_release_driver_internal+0x5e5/0x8a0
[   77.447748][ T1169]  bus_remove_device+0x2ea/0x400
[   77.448421][ T1169]  device_del+0x64f/0xb40
[   77.448976][ T1169]  ? kill_device+0x150/0x150
[   77.449577][ T1169]  ? print_irqtrace_events+0x1f0/0x1f0
[   77.450307][ T1169]  hid_destroy_device+0x66/0x100
[   77.450938][ T1169]  usbhid_disconnect+0x9a/0xc0
```

Signed-off-by: Pietro Borrello <borrello@diag.uniroma1.it>
---
Changes in v3:
- use spinlocks to prevent workers scheduling
- drop patches on sony & playstation hid drivers
- Link to v2: https://lore.kernel.org/r/20230125-hid-unregister-leds-v2-0-689cc62fc878@diag.uniroma1.it

Changes in v2:
- dualshock4: Clarify UAF
- dualsense:  Clarify UAF
- dualsense:  Unregister multicolor led controller
- Link to v1: https://lore.kernel.org/r/20230125-hid-unregister-leds-v1-0-9a5192dcef16@diag.uniroma1.it

---
Pietro Borrello (2):
      HID: bigben: use spinlock to safely schedule workers
      HID: asus: use spinlock to safely schedule workers

 drivers/hid/hid-asus.c     | 24 +++++++++++++++++++++---
 drivers/hid/hid-bigbenff.c | 34 +++++++++++++++++++++++++++++-----
 2 files changed, 50 insertions(+), 8 deletions(-)
---
base-commit: 2241ab53cbb5cdb08a6b2d4688feb13971058f65
change-id: 20230125-hid-unregister-leds-4cbf67099e1d

Best regards,

Comments

Benjamin Tissoires Feb. 10, 2023, 2:11 p.m. UTC | #1
On Fri, Feb 10, 2023 at 2:24 PM Hillf Danton <hdanton@sina.com> wrote:
>
> On Thu, 09 Feb 2023 23:58:55 +0000 Pietro Borrello <borrello@diag.uniroma1.it>
> > Use spinlocks to deal with workers introducing a wrapper
> > bigben_schedule_work(), and several spinlock checks.
> > Otherwise, bigben_set_led() may schedule bigben->worker after the
> > structure has been freed, causing a use-after-free.
> >
> > Fixes: 4eb1b01de5b9 ("HID: hid-bigbenff: fix race condition for scheduled work during removal")
>
> Given the flag added in 4eb1b01de5b9 and the spinlock added in this
> patchset, devm_led_classdev_register() looks to not work for you.

Actually, looking at the code now, it is clear that we need that lock.
The current code is happily changing the struct bigben_device from
multiple contexts, and pulls that without any barrier in the work
struct which should produce some interesting results :)

And we can probably abuse that lock to prevent scheduling a new work
as it is done in hid-playstation.c

I'll comment in the patch which parts need to be changed, because it
is true that this patch is definitely not mergeable as such and will
need another revision.

>
> How about replacing the advanced devm_ method with the traditional plain
> pair of led_classdev_un/register(), with the flag mentioned cut off but
> without bothering to add another lock?
>

As mentioned above, the lock is needed anyway, and will probably need
to be added in a separate patch.
Reverting to a non devm version of the led class would complexify the
driver for the error paths, and is probably not the best move IMO.

Cheers,
Benjamin
Benjamin Tissoires Feb. 13, 2023, 8:25 a.m. UTC | #2
On Sat, Feb 11, 2023 at 3:01 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Fri, 10 Feb 2023 15:11:26 +0100 Benjamin Tissoires <benjamin.tissoires@redhat.com>
> > On Fri, Feb 10, 2023 at 2:24 PM Hillf Danton <hdanton@sina.com> wrote:
> > > On Thu, 09 Feb 2023 23:58:55 +0000 Pietro Borrello <borrello@diag.uniroma1.it>
> > > > Use spinlocks to deal with workers introducing a wrapper
> > > > bigben_schedule_work(), and several spinlock checks.
> > > > Otherwise, bigben_set_led() may schedule bigben->worker after the
> > > > structure has been freed, causing a use-after-free.
> > > >
> > > > Fixes: 4eb1b01de5b9 ("HID: hid-bigbenff: fix race condition for scheduled work during removal")
> > >
> > > Given the flag added in 4eb1b01de5b9 and the spinlock added in this
> > > patchset, devm_led_classdev_register() looks to not work for you.
> >
> > Actually, looking at the code now, it is clear that we need that lock.
> > The current code is happily changing the struct bigben_device from
> > multiple contexts, and pulls that without any barrier in the work
> > struct which should produce some interesting results :)
> >
> > And we can probably abuse that lock to prevent scheduling a new work
> > as it is done in hid-playstation.c
> >
> > I'll comment in the patch which parts need to be changed, because it
> > is true that this patch is definitely not mergeable as such and will
> > need another revision.
> >
> > >
> > > How about replacing the advanced devm_ method with the traditional plain
> > > pair of led_classdev_un/register(), with the flag mentioned cut off but
> > > without bothering to add another lock?
> > >
> >
> > As mentioned above, the lock is needed anyway, and will probably need
> > to be added in a separate patch.
> > Reverting to a non devm version of the led class would complexify the
> > driver for the error paths, and is probably not the best move IMO.
>
> After this patch,
>
>         cpu 0                   cpu 2
>         ---                     ---
>         bigben_remove()
>         spin_lock_irqsave(&bigben->lock, flags);
>         bigben->removed = true;
>         spin_unlock_irqrestore(&bigben->lock, flags);
>
>                                 spin_lock_irqsave(&bigben->lock, flags);
>
> what makes it safe for cpu2 to acquire lock after the removed flag is true?

The remove flag is just a way to prevent any other workqueue from
starting. It doesn't mean that the struct bigben has been freed, so
acquiring a lock at that point is fine.
We then rely on 2 things:
- devm_class_led to be released before struct bigben, because it was
devm-allocated *after* the struct bigben was devm-allocated
- we prevent any new workqueue to start and we guarantee that any
running workqueue is terminated before leaving the .remove function.

Given that the ledclass is gracefully shutting down all of its
potential queues, we don't have any other possibility to have an
unsafe call AFAIU.

Cheers,
Benjamin
Benjamin Tissoires Feb. 13, 2023, 10:47 a.m. UTC | #3
On Mon, Feb 13, 2023 at 11:37 AM Hillf Danton <hdanton@sina.com> wrote:
>
> On Mon, 13 Feb 2023 09:25:37 +0100 Benjamin Tissoires <benjamin.tissoires@redhat.com>
> >
> > The remove flag is just a way to prevent any other workqueue from
> > starting. It doesn't mean that the struct bigben has been freed, so
> > acquiring a lock at that point is fine.
> > We then rely on 2 things:
> > - devm_class_led to be released before struct bigben, because it was
> > devm-allocated *after* the struct bigben was devm-allocated
>
> This is the current behavior and it is intact after this patch.
>
> > - we prevent any new workqueue to start and we guarantee that any
> > running workqueue is terminated before leaving the .remove function.
>
> If spinlock is added for not scheduling new workqueue then it is not
> needed, because the removed flag is set before running workqueue is
> terminated. Checking the flag is enough upon queuing new work.
>

I tend to disagree (based on Pietro's v4:
- no worker is running
- a led sysfs call is made
- the line "if (!bigben->removed)" is true
  - this gets interrupted/or another CPU kicks in for the next one
-> .remove gets called
- bigben->removed is set to false
- cancel_work_sync() called

the led call continues, and schedules the work

.removes terminates, and devm kicks in, killing led_class and struct
bigben while the workqueue is running.

So having a simple spinlocks ensures the atomicity between checking
for bigben->removed and scheduling a workqueue.

Cheers,
Benjamin