diff mbox series

[v2] usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs

Message ID 20240131150332.1326523-1-quic_kriskura@quicinc.com
State New
Headers show
Series [v2] usb: gadget: ncm: Avoid dropping datagrams of properly parsed NTBs | expand

Commit Message

Krishna Kurapati Jan. 31, 2024, 3:03 p.m. UTC
It is observed sometimes when tethering is used over NCM with Windows 11
as host, at some instances, the gadget_giveback has one byte appended at
the end of a proper NTB. When the NTB is parsed, unwrap call looks for
any leftover bytes in SKB provided by u_ether and if there are any pending
bytes, it treats them as a separate NTB and parses it. But in case the
second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that
were parsed properly in the first NTB and saved in rx_list are dropped.

Adding a few custom traces showed the following:

[002] d..1  7828.532866: dwc3_gadget_giveback: ep1out:
req 000000003868811a length 1025/16384 zsI ==> 0
[002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025
[002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
[002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67
[002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400
[002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10
[002] d..1  7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames

In this case, the giveback is of 1025 bytes and block length is 1024.
The rest 1 byte (which is 0x00) won't be parsed resulting in drop of
all datagrams in rx_list.

Same is case with packets of size 2048:
[002] d..1  7828.557948: dwc3_gadget_giveback: ep1out:
req 0000000011dfd96e length 2049/16384 zsI ==> 0
[002] d..1  7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
[002] d..1  7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800

Lecroy shows one byte coming in extra confirming that the byte is coming
in from PC:

Transfer 2959 - Bytes Transferred(1025)  Timestamp((18.524 843 590)
- Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590)
--- Packet 4063861
      Data(1024 bytes)
      Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590)
--- Packet 4063863
      Data(1 byte)
      Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)

According to Windows driver, no ZLP is needed if wBlockLength is non-zero,
because the non-zero wBlockLength has already told the function side the
size of transfer to be expected. However, there are in-market NCM devices
that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize.
To deal with such devices, it pads an extra 0 at end so the transfer is no
longer multiple of wMaxPacketSize.

Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
---
 drivers/usb/gadget/function/f_ncm.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Maciej Żenczykowski Jan. 31, 2024, 5:03 p.m. UTC | #1
On Wed, Jan 31, 2024 at 7:03 AM Krishna Kurapati
<quic_kriskura@quicinc.com> wrote:
>
> It is observed sometimes when tethering is used over NCM with Windows 11
> as host, at some instances, the gadget_giveback has one byte appended at
> the end of a proper NTB. When the NTB is parsed, unwrap call looks for
> any leftover bytes in SKB provided by u_ether and if there are any pending
> bytes, it treats them as a separate NTB and parses it. But in case the
> second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that
> were parsed properly in the first NTB and saved in rx_list are dropped.
>
> Adding a few custom traces showed the following:
>
> [002] d..1  7828.532866: dwc3_gadget_giveback: ep1out:
> req 000000003868811a length 1025/16384 zsI ==> 0
> [002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025
> [002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
> [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67
> [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400
> [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10
> [002] d..1  7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
>
> In this case, the giveback is of 1025 bytes and block length is 1024.
> The rest 1 byte (which is 0x00) won't be parsed resulting in drop of
> all datagrams in rx_list.
>
> Same is case with packets of size 2048:
> [002] d..1  7828.557948: dwc3_gadget_giveback: ep1out:
> req 0000000011dfd96e length 2049/16384 zsI ==> 0
> [002] d..1  7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
> [002] d..1  7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
>
> Lecroy shows one byte coming in extra confirming that the byte is coming
> in from PC:
>
> Transfer 2959 - Bytes Transferred(1025)  Timestamp((18.524 843 590)
> - Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590)
> --- Packet 4063861
>       Data(1024 bytes)
>       Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590)
> --- Packet 4063863
>       Data(1 byte)
>       Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
>
> According to Windows driver, no ZLP is needed if wBlockLength is non-zero,
> because the non-zero wBlockLength has already told the function side the
> size of transfer to be expected. However, there are in-market NCM devices
> that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize.
> To deal with such devices, it pads an extra 0 at end so the transfer is no
> longer multiple of wMaxPacketSize.
>
> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> ---
>  drivers/usb/gadget/function/f_ncm.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
> index ca5d5f564998..8c314dc98952 100644
> --- a/drivers/usb/gadget/function/f_ncm.c
> +++ b/drivers/usb/gadget/function/f_ncm.c
> @@ -1338,11 +1338,17 @@ static int ncm_unwrap_ntb(struct gether *port,
>              "Parsed NTB with %d frames\n", dgram_counter);
>
>         to_process -= block_len;
> -       if (to_process != 0) {
> +
> +       if (to_process == 1 &&
> +           (block_len % 512 == 0) &&
> +           (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {

I'm unconvinced this (block_len % 512) works right...
imagine you have 2 ntbs back to back (for example 400 + 624) that
together add up to 1024,
and then a padding null byte.
I think block_len will be 624 then and not 1024.

perhaps this actually needs to be:
  (ntp_ptr + block_len - skb->data) % 512 == 0

The point is that AFAICT the multiple of 512/1024 that matters is in
the size of the USB transfer,
not the NTB block size itself.

> +               goto done;
> +       } else if (to_process > 0) {
>                 ntb_ptr = (unsigned char *)(ntb_ptr + block_len);

note that ntb_ptr moves forward by block_len here,
so it's *not* always the start of the packet, so block_len is not
necessarily the actual on the wire size.

>                 goto parse_ntb;
>         }
>
> +done:
>         dev_consume_skb_any(skb);
>
>         return 0;
> --
> 2.34.1
>

But it would perhaps be worth confirming in the MS driver source what
exactly they're doing...
(maybe they never even generate multiple ntbs per usb segment...)

You may also want to mention in the commit message that 512 comes from
USB2 block size, and also works for USB3 block size of 1024.

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski Jan. 31, 2024, 5:18 p.m. UTC | #2
On Wed, Jan 31, 2024 at 9:14 AM Maciej Żenczykowski <maze@google.com> wrote:
>
> On Wed, Jan 31, 2024 at 9:03 AM Maciej Żenczykowski <maze@google.com> wrote:
> >
> > On Wed, Jan 31, 2024 at 7:03 AM Krishna Kurapati
> > <quic_kriskura@quicinc.com> wrote:
> > >
> > > It is observed sometimes when tethering is used over NCM with Windows 11
> > > as host, at some instances, the gadget_giveback has one byte appended at
> > > the end of a proper NTB. When the NTB is parsed, unwrap call looks for
> > > any leftover bytes in SKB provided by u_ether and if there are any pending
> > > bytes, it treats them as a separate NTB and parses it. But in case the
> > > second NTB (as per unwrap call) is faulty/corrupt, all the datagrams that
> > > were parsed properly in the first NTB and saved in rx_list are dropped.
> > >
> > > Adding a few custom traces showed the following:
> > >
> > > [002] d..1  7828.532866: dwc3_gadget_giveback: ep1out:
> > > req 000000003868811a length 1025/16384 zsI ==> 0
> > > [002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb toprocess: 1025
> > > [002] d..1  7828.532867: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
> > > [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb seq: 0xce67
> > > [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x400
> > > [002] d..1  7828.532868: ncm_unwrap_ntb: K: ncm_unwrap_ntb ndp_len: 0x10
> > > [002] d..1  7828.532869: ncm_unwrap_ntb: K: Parsed NTB with 1 frames
> > >
> > > In this case, the giveback is of 1025 bytes and block length is 1024.
> > > The rest 1 byte (which is 0x00) won't be parsed resulting in drop of
> > > all datagrams in rx_list.
> > >
> > > Same is case with packets of size 2048:
> > > [002] d..1  7828.557948: dwc3_gadget_giveback: ep1out:
> > > req 0000000011dfd96e length 2049/16384 zsI ==> 0
> > > [002] d..1  7828.557949: ncm_unwrap_ntb: K: ncm_unwrap_ntb nth: 1751999342
> > > [002] d..1  7828.557950: ncm_unwrap_ntb: K: ncm_unwrap_ntb blk_len: 0x800
> > >
> > > Lecroy shows one byte coming in extra confirming that the byte is coming
> > > in from PC:
> > >
> > > Transfer 2959 - Bytes Transferred(1025)  Timestamp((18.524 843 590)
> > > - Transaction 8391 - Data(1025 bytes) Timestamp(18.524 843 590)
> > > --- Packet 4063861
> > >       Data(1024 bytes)
> > >       Duration(2.117us) Idle(14.700ns) Timestamp(18.524 843 590)
> > > --- Packet 4063863
> > >       Data(1 byte)
> > >       Duration(66.160ns) Time(282.000ns) Timestamp(18.524 845 722)
> > >
> > > According to Windows driver, no ZLP is needed if wBlockLength is non-zero,
> > > because the non-zero wBlockLength has already told the function side the
> > > size of transfer to be expected. However, there are in-market NCM devices
> > > that rely on ZLP as long as the wBlockLength is multiple of wMaxPacketSize.
> > > To deal with such devices, it pads an extra 0 at end so the transfer is no
> > > longer multiple of wMaxPacketSize.
> > >
> > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com>
> > > ---
> > >  drivers/usb/gadget/function/f_ncm.c | 8 +++++++-
> > >  1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
> > > index ca5d5f564998..8c314dc98952 100644
> > > --- a/drivers/usb/gadget/function/f_ncm.c
> > > +++ b/drivers/usb/gadget/function/f_ncm.c
> > > @@ -1338,11 +1338,17 @@ static int ncm_unwrap_ntb(struct gether *port,
> > >              "Parsed NTB with %d frames\n", dgram_counter);
> > >
> > >         to_process -= block_len;
> > > -       if (to_process != 0) {
> > > +
> > > +       if (to_process == 1 &&
> > > +           (block_len % 512 == 0) &&
> > > +           (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
> >
> > I'm unconvinced this (block_len % 512) works right...
> > imagine you have 2 ntbs back to back (for example 400 + 624) that
> > together add up to 1024,
> > and then a padding null byte.
> > I think block_len will be 624 then and not 1024.
> >
> > perhaps this actually needs to be:
> >   (ntp_ptr + block_len - skb->data) % 512 == 0
> >
> > The point is that AFAICT the multiple of 512/1024 that matters is in
> > the size of the USB transfer,
> > not the NTB block size itself.
> >
> > > +               goto done;
> > > +       } else if (to_process > 0) {
> > >                 ntb_ptr = (unsigned char *)(ntb_ptr + block_len);
> >
> > note that ntb_ptr moves forward by block_len here,
> > so it's *not* always the start of the packet, so block_len is not
> > necessarily the actual on the wire size.
> >
> > >                 goto parse_ntb;
> > >         }
> > >
> > > +done:
> > >         dev_consume_skb_any(skb);
> > >
> > >         return 0;
> > > --
> > > 2.34.1
> > >
> >
> > But it would perhaps be worth confirming in the MS driver source what
> > exactly they're doing...
> > (maybe they never even generate multiple ntbs per usb segment...)
> >
> > You may also want to mention in the commit message that 512 comes from
> > USB2 block size, and also works for USB3 block size of 1024.
>
> Also since this is a fix, it should probably have a Fixes tag
> (possibly on some original sha1 that added the driver, since I think
> it's always been broken) or at least a commit title that more clearly
> flags it as a 'fix' or cc stable...
>
> (something like 'usb: gadget: ncm: fix rare win11 packet discard')
>
> We definitely want this to hit LTS...

One more thing: the 'goto done' and 'done' label are not needed -
that's already the natural code flow...
So the 'goto' could just be replaced with a comment or perhaps with
--to_process.
Krishna Kurapati Jan. 31, 2024, 6:08 p.m. UTC | #3
On 1/31/2024 10:33 PM, Maciej Żenczykowski wrote:
>>
>> diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
>> index ca5d5f564998..8c314dc98952 100644
>> --- a/drivers/usb/gadget/function/f_ncm.c
>> +++ b/drivers/usb/gadget/function/f_ncm.c
>> @@ -1338,11 +1338,17 @@ static int ncm_unwrap_ntb(struct gether *port,
>>               "Parsed NTB with %d frames\n", dgram_counter);
>>
>>          to_process -= block_len;
>> -       if (to_process != 0) {
>> +
>> +       if (to_process == 1 &&
>> +           (block_len % 512 == 0) &&
>> +           (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
> 

Hi Maciej,

> I'm unconvinced this (block_len % 512) works right...
> imagine you have 2 ntbs back to back (for example 400 + 624) that
> together add up to 1024,
> and then a padding null byte.
> I think block_len will be 624 then and not 1024.
>
> perhaps this actually needs to be:
>    (ntp_ptr + block_len - skb->data) % 512 == 0
> 
> The point is that AFAICT the multiple of 512/1024 that matters is in
> the size of the USB transfer,
> not the NTB block size itself.
> 

Ahh !!  So you mean, since this comes because the host doesn't want to 
send packets of size wMaxPacketSize on the wire, we need to consider the 
length of data parsed so far to be checked against 512/1024 and not 
block length.

But either ways, the implementation in the patch also the same thing in 
a different way right ? Process all NTB's present iteratively and while 
doing so, check if there is one byte left. So if there are multiple 
NTB's mixed up to form a packet of size 1024, even then, both the logics 
would converge onto the point where they find that one byte is left.

>> +               goto done;
>> +       } else if (to_process > 0) {
>>                  ntb_ptr = (unsigned char *)(ntb_ptr + block_len);
> 
> note that ntb_ptr moves forward by block_len here,
> so it's *not* always the start of the packet, so block_len is not
> necessarily the actual on the wire size.
> 

Apologies, I didn't understand this comment. By moving the ntb_ptr by 
block length, we are pointing to the (last byte/ start of the next NTB) 
right as we are fixing in [1] ?

>>                  goto parse_ntb;
>>          }
>>
>> +done:
>>          dev_consume_skb_any(skb);
>>
>>          return 0;
>> --
>> 2.34.1
>>
> 
> But it would perhaps be worth confirming in the MS driver source what
> exactly they're doing...
> (maybe they never even generate multiple ntbs per usb segment...)
> 

Yes they do and we made a fix for that in [1].


> You may also want to mention in the commit message that 512 comes from
> USB2 block size, and also works for USB3 block size of 1024.
> 

Sure. Will update the commit message accordingly.

[1]: 
https://lore.kernel.org/all/20230927105858.12950-1-quic_kriskura@quicinc.com/

Regards,
Krishna,
Krishna Kurapati Feb. 1, 2024, 5:44 a.m. UTC | #4
On 2/1/2024 12:24 AM, Maciej Żenczykowski wrote:

> 
> I believe so, yes.
> 
> I believe they are trying to avoid having to send ZLPs.
> That's determined *purely* by the size of things as they show up on
> the usb cable (ie. the size of the usb xfer).
> ie. that's where things that are a multiple of 512 (USB2) or 1024
> (USB3) need an extra 0 byte sized packet to prevent ZLP.
> 
> The actual size of the NTB doesn't matter.
> 
> That said... maybe we're overcomplicating this...
> Maybe it's enough to just remove this modulo check entirely (I know I
> asked for it before).
> 
> Ultimately if we just do:
> 
> // Windows NCM driver avoids USB ZLPs by adding a 1-byte zero pad as needed
> if (to_process == 1 && !*(u8*)(ntb_ptr + block_len)) --to_process;
> 
> it'll fix the problem too, and perhaps be easier to understand?
> 
I agree. This will simplify the check and also cover all cases.
To keep diff minimal from tested version (because issue comes up easily 
with a particular test here), I can remove the second check in the if 
block in the v2 and push it as v3.

Regards,
Krishna,
diff mbox series

Patch

diff --git a/drivers/usb/gadget/function/f_ncm.c b/drivers/usb/gadget/function/f_ncm.c
index ca5d5f564998..8c314dc98952 100644
--- a/drivers/usb/gadget/function/f_ncm.c
+++ b/drivers/usb/gadget/function/f_ncm.c
@@ -1338,11 +1338,17 @@  static int ncm_unwrap_ntb(struct gether *port,
 	     "Parsed NTB with %d frames\n", dgram_counter);
 
 	to_process -= block_len;
-	if (to_process != 0) {
+
+	if (to_process == 1 &&
+	    (block_len % 512 == 0) &&
+	    (*(unsigned char *)(ntb_ptr + block_len) == 0x00)) {
+		goto done;
+	} else if (to_process > 0) {
 		ntb_ptr = (unsigned char *)(ntb_ptr + block_len);
 		goto parse_ntb;
 	}
 
+done:
 	dev_consume_skb_any(skb);
 
 	return 0;