diff mbox series

usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate()

Message ID 20230117131839.1138208-1-maze@google.com
State New
Headers show
Series usb: gadget: f_ncm: fix potential NULL ptr deref in ncm_bitrate() | expand

Commit Message

Maciej Żenczykowski Jan. 17, 2023, 1:18 p.m. UTC
In Google internal bug 265639009 we've received an (as yet) unreproducible
crash report from an aarch64 GKI 5.10.149-android13 running device.

AFAICT the source code is at:
  https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13-5.10

The call stack is:
  ncm_close() -> ncm_notify() -> ncm_do_notify()
with the crash at:
  ncm_do_notify+0x98/0x270
Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)

Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):

  // halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification)
  0B 0D 00 79    strh w11, [x8, #6]

  // word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request)
  6C 0A 00 B9    str  w12, [x19, #8]

  // x10 (NULL) was read here from offset 0 of valid pointer x9
  // IMHO we're reading 'cdev->gadget' and getting NULL
  // gadget is indeed at offset 0 of struct usb_composite_dev
  2A 01 40 F9    ldr  x10, [x9]

  // loading req->buf pointer, which is at offset 0 of struct usb_request
  69 02 40 F9    ldr  x9, [x19]

  // x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed
  4B 5D 40 B9    ldr  w11, [x10, #0x5c]

which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:

  event->wLength = cpu_to_le16(8);
  req->length = NCM_STATUS_BYTECOUNT;

  /* SPEED_CHANGE data is up/down speeds in bits/sec */
  data = req->buf + sizeof *event;
  data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));

My analysis of registers and NULL ptr deref crash offset
  (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c)
heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing:
  data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
which calls:
  ncm_bitrate(NULL)
which then calls:
  gadget_is_superspeed(NULL)
which reads
  ((struct usb_gadget *)NULL)->max_speed
and hits a panic.

AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C.
(remember there's a GKI KABI reservation of 16 bytes in struct work_struct)

It's not at all clear to me how this is all supposed to work...
but returning 0 seems much better than panic-ing...

Cc: Felipe Balbi <balbi@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Lorenzo Colitti <lorenzo@google.com>
Cc: Carlos Llamas <cmllamas@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 drivers/usb/gadget/function/f_ncm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

John Keeping Jan. 20, 2023, 7:27 p.m. UTC | #1
On Tue, Jan 17, 2023 at 05:18:39AM -0800, Maciej Żenczykowski wrote:
> In Google internal bug 265639009 we've received an (as yet) unreproducible
> crash report from an aarch64 GKI 5.10.149-android13 running device.
> 
> AFAICT the source code is at:
>   https://android.googlesource.com/kernel/common/+/refs/tags/ASB-2022-12-05_13-5.10
> 
> The call stack is:
>   ncm_close() -> ncm_notify() -> ncm_do_notify()
> with the crash at:
>   ncm_do_notify+0x98/0x270
> Code: 79000d0b b9000a6c f940012a f9400269 (b9405d4b)
> 
> Which I believe disassembles to (I don't know ARM assembly, but it looks sane enough to me...):
> 
>   // halfword (16-bit) store presumably to event->wLength (at offset 6 of struct usb_cdc_notification)
>   0B 0D 00 79    strh w11, [x8, #6]
> 
>   // word (32-bit) store presumably to req->Length (at offset 8 of struct usb_request)
>   6C 0A 00 B9    str  w12, [x19, #8]
> 
>   // x10 (NULL) was read here from offset 0 of valid pointer x9
>   // IMHO we're reading 'cdev->gadget' and getting NULL
>   // gadget is indeed at offset 0 of struct usb_composite_dev
>   2A 01 40 F9    ldr  x10, [x9]
> 
>   // loading req->buf pointer, which is at offset 0 of struct usb_request
>   69 02 40 F9    ldr  x9, [x19]
> 
>   // x10 is null, crash, appears to be attempt to read cdev->gadget->max_speed
>   4B 5D 40 B9    ldr  w11, [x10, #0x5c]
> 
> which seems to line up with ncm_do_notify() case NCM_NOTIFY_SPEED code fragment:
> 
>   event->wLength = cpu_to_le16(8);
>   req->length = NCM_STATUS_BYTECOUNT;
> 
>   /* SPEED_CHANGE data is up/down speeds in bits/sec */
>   data = req->buf + sizeof *event;
>   data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
> 
> My analysis of registers and NULL ptr deref crash offset
>   (Unable to handle kernel NULL pointer dereference at virtual address 000000000000005c)
> heavily suggests that the crash is due to 'cdev->gadget' being NULL when executing:
>   data[0] = cpu_to_le32(ncm_bitrate(cdev->gadget));
> which calls:
>   ncm_bitrate(NULL)
> which then calls:
>   gadget_is_superspeed(NULL)
> which reads
>   ((struct usb_gadget *)NULL)->max_speed
> and hits a panic.
> 
> AFAICT, if I'm counting right, the offset of max_speed is indeed 0x5C.
> (remember there's a GKI KABI reservation of 16 bytes in struct work_struct)
> 
> It's not at all clear to me how this is all supposed to work...
> but returning 0 seems much better than panic-ing...
> 
> Cc: Felipe Balbi <balbi@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Cc: Carlos Llamas <cmllamas@google.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>  drivers/usb/gadget/function/f_ncm.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
> index c36bcfa0e9b4..424bb3b666db 100644
> --- a/drivers/usb/gadget/function/f_ncm.c
> +++ b/drivers/usb/gadget/function/f_ncm.c
> @@ -83,7 +83,9 @@ static inline struct f_ncm *func_to_ncm(struct usb_function *f)
>  /* peak (theoretical) bulk transfer rate in bits-per-second */
>  static inline unsigned ncm_bitrate(struct usb_gadget *g)
>  {
> -	if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS)
> +	if (!g)
> +		return 0;
> +	else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS)
>  		return 4250000000U;
>  	else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER)
>  		return 3750000000U;

This looks like the wrong place to fix things - if this case is hit,
don't we go on to call usb_eq_queue() which can't be valid if the gadget
has been destroyed?

I don't see how cdev->gadget can be set to null without cdev being freed
so is this actually a use-after-free not a simple null-dereference?

My guess is that somehow the gadget is being destroyed while the network
interface is held open (we've seen similar issues in other, non-network,
gadget functions), but I don't know enough about the network side of
things to know how to cause that from userspace.


John
Maciej Żenczykowski Jan. 20, 2023, 8:11 p.m. UTC | #2
> This looks like the wrong place to fix things - if this case is hit,
> don't we go on to call usb_eq_queue() which can't be valid if the gadget
> has been destroyed?
>
> I don't see how cdev->gadget can be set to null without cdev being freed
> so is this actually a use-after-free not a simple null-dereference?
>
> My guess is that somehow the gadget is being destroyed while the network
> interface is held open (we've seen similar issues in other, non-network,
> gadget functions), but I don't know enough about the network side of
> things to know how to cause that from userspace.

I'm still waiting on confirmation of whether this fixes things.
So far we've seen it crash twice without the fix...

I don't know what triggers it - it's being triggered in some huge
automated test framework.
Whether the issue is lack of bind, or a too early gadget unbind... or
something else...

As mentioned in the patch, I'm not entirely sure if this is the right fix...
I certainly don't claim to understand the usb/gadget stack.

It does seem like usb_ep_queue() at least has some protections in place...
though no idea if they're enough - and whether we'll hit a
WARN_ON_ONCE now instead?

Honestly I don't even understand *why* we're sending out this speed
notification out of ncm_close...
(and if we do send speed notification of out ncm_close()... shouldn't
it always just say speed 0?)
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index c36bcfa0e9b4..424bb3b666db 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -83,7 +83,9 @@  static inline struct f_ncm *func_to_ncm(struct usb_function *f)
 /* peak (theoretical) bulk transfer rate in bits-per-second */
 static inline unsigned ncm_bitrate(struct usb_gadget *g)
 {
-	if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS)
+	if (!g)
+		return 0;
+	else if (gadget_is_superspeed(g) && g->speed >= USB_SPEED_SUPER_PLUS)
 		return 4250000000U;
 	else if (gadget_is_superspeed(g) && g->speed == USB_SPEED_SUPER)
 		return 3750000000U;