diff mbox series

usb: dwc3: ep0: prevent dwc3_request from being queued twice

Message ID 20250327141630.2085029-1-fisaksen@baylibre.com
State New
Headers show
Series usb: dwc3: ep0: prevent dwc3_request from being queued twice | expand

Commit Message

Frode Isaksen March 27, 2025, 2:15 p.m. UTC
From: Frode Isaksen <frode@meta.com>

Prevent dwc3_request from being queued twice, by checking
req->status.
Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
dwc3_request from being queued twice") for non-ep0 endpoints.
Crash log:
list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
 next=ffffff87ab485b60.
kernel BUG at lib/list_debug.c:35!
Call trace:
__list_add_valid+0x70/0xc0
__dwc3_gadget_ep0_queue+0x70/0x224
dwc3_ep0_handle_status+0x118/0x200
dwc3_ep0_inspect_setup+0x144/0x32c
dwc3_ep0_interrupt+0xac/0x340
dwc3_process_event_entry+0x90/0x724
dwc3_process_event_buf+0x7c/0x33c
dwc3_thread_interrupt+0x58/0x8c

Signed-off-by: Frode Isaksen <frode@meta.com>
---
This bug was discovered, tested and fixed (no more crashes seen) on 
Meta Quest 3 device. Also tested on T.I. AM62x board.

 drivers/usb/dwc3/ep0.c    | 5 +++++
 drivers/usb/dwc3/gadget.c | 1 +
 2 files changed, 6 insertions(+)

Comments

Thinh Nguyen April 1, 2025, 11:49 p.m. UTC | #1
On Thu, Mar 27, 2025, Frode Isaksen wrote:
> From: Frode Isaksen <frode@meta.com>
> 
> Prevent dwc3_request from being queued twice, by checking
> req->status.
> Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
> dwc3_request from being queued twice") for non-ep0 endpoints.
> Crash log:
> list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
>  next=ffffff87ab485b60.
> kernel BUG at lib/list_debug.c:35!
> Call trace:
> __list_add_valid+0x70/0xc0
> __dwc3_gadget_ep0_queue+0x70/0x224
> dwc3_ep0_handle_status+0x118/0x200
> dwc3_ep0_inspect_setup+0x144/0x32c
> dwc3_ep0_interrupt+0xac/0x340
> dwc3_process_event_entry+0x90/0x724
> dwc3_process_event_buf+0x7c/0x33c
> dwc3_thread_interrupt+0x58/0x8c
> 
> Signed-off-by: Frode Isaksen <frode@meta.com>
> ---
> This bug was discovered, tested and fixed (no more crashes seen) on 
> Meta Quest 3 device. Also tested on T.I. AM62x board.
> 
>  drivers/usb/dwc3/ep0.c    | 5 +++++
>  drivers/usb/dwc3/gadget.c | 1 +
>  2 files changed, 6 insertions(+)
> 
> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> index 666ac432f52d..e26c3a62d470 100644
> --- a/drivers/usb/dwc3/ep0.c
> +++ b/drivers/usb/dwc3/ep0.c
> @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
>  {
>  	struct dwc3		*dwc = dep->dwc;
>  
> +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,

Let's not use WARN. Perhaps dev_warn?

> +				"%s: request %pK already in flight\n",
> +				dep->name, &req->request))
> +		return -EINVAL;
> +
>  	req->request.actual	= 0;
>  	req->request.status	= -EINPROGRESS;
>  	req->epnum		= dep->number;
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 89a4dc8ebf94..c34446d8c54f 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
>  	dwc->ep0_bounced = false;
>  	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
>  	dwc->delayed_status = false;
> +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
>  	dwc3_ep0_out_start(dwc);
>  
>  	dwc3_gadget_enable_irq(dwc);
> -- 
> 2.48.1
> 

I'm still not clear how this can happen. Are you testing this against
mainline? Can you provide the dwc3 tracepoints?

Thanks,
Thinh
Thinh Nguyen April 7, 2025, 11:44 p.m. UTC | #2
On Thu, Apr 03, 2025, Frode Isaksen wrote:
> On 4/3/25 12:21 AM, Thinh Nguyen wrote:
> > On Wed, Apr 02, 2025, Frode Isaksen wrote:
> > > On 4/2/25 1:49 AM, Thinh Nguyen wrote:
> > > > On Thu, Mar 27, 2025, Frode Isaksen wrote:
> > > > > From: Frode Isaksen <frode@meta.com>
> > > > > 
> > > > > Prevent dwc3_request from being queued twice, by checking
> > > > > req->status.
> > > > > Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
> > > > > dwc3_request from being queued twice") for non-ep0 endpoints.
> > > > > Crash log:
> > > > > list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
> > > > >    next=ffffff87ab485b60.
> > > > > kernel BUG at lib/list_debug.c:35!
> > > > > Call trace:
> > > > > __list_add_valid+0x70/0xc0
> > > > > __dwc3_gadget_ep0_queue+0x70/0x224
> > > > > dwc3_ep0_handle_status+0x118/0x200
> > > > > dwc3_ep0_inspect_setup+0x144/0x32c
> > > > > dwc3_ep0_interrupt+0xac/0x340
> > > > > dwc3_process_event_entry+0x90/0x724
> > > > > dwc3_process_event_buf+0x7c/0x33c
> > > > > dwc3_thread_interrupt+0x58/0x8c
> > > > > 
> > > > > Signed-off-by: Frode Isaksen <frode@meta.com>
> > > > > ---
> > > > > This bug was discovered, tested and fixed (no more crashes seen) on
> > > > > Meta Quest 3 device. Also tested on T.I. AM62x board.
> > > > > 
> > > > >    drivers/usb/dwc3/ep0.c    | 5 +++++
> > > > >    drivers/usb/dwc3/gadget.c | 1 +
> > > > >    2 files changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> > > > > index 666ac432f52d..e26c3a62d470 100644
> > > > > --- a/drivers/usb/dwc3/ep0.c
> > > > > +++ b/drivers/usb/dwc3/ep0.c
> > > > > @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
> > > > >    {
> > > > >    	struct dwc3		*dwc = dep->dwc;
> > > > > +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
> > > > Let's not use WARN. Perhaps dev_warn?
> > > I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
> > > prevent
> > > 
> > > dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
> > We can just use dev_warn here, we don't need 2 separate patches for this
> > change. The other place can be reworked in a separate patch.
> OK
> > 
> > > > > +				"%s: request %pK already in flight\n",
> > > > > +				dep->name, &req->request))
> > > > > +		return -EINVAL;
> > > > > +
> > > > >    	req->request.actual	= 0;
> > > > >    	req->request.status	= -EINPROGRESS;
> > > > >    	req->epnum		= dep->number;
> > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > index 89a4dc8ebf94..c34446d8c54f 100644
> > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
> > > > >    	dwc->ep0_bounced = false;
> > > > >    	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
> > > > >    	dwc->delayed_status = false;
> > > > > +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
> > > > >    	dwc3_ep0_out_start(dwc);
> > > > >    	dwc3_gadget_enable_irq(dwc);
> > > > > -- 
> > > > > 2.48.1
> > > > > 
> > > > I'm still not clear how this can happen. Are you testing this against
> > > > mainline? Can you provide the dwc3 tracepoints?
> > > I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
> > > and 6.6.
> > > 
> > Without understanding how this can happen and why it is needed, we
> > should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
> > We have a lot of fixes since then. Please see if this is reproducible
> > using mainline.
> 
> We have applied all relevant patches from mainline to ep0.c, in order to try
> to fix this crash:

What causes the crash? I'm still not clear whether you were able to
reproduced this on mainline, or any of the new stable tree.

> 
> 566bc793bdd7 usb: dwc3: ep0: Don't clear ep0 DWC3_EP_TRANSFER_STARTED
> 371d3fc577db usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
> d819a0bbb493 usb: dwc3: ep0: Don't reset resource alloc flag
> 2eb3ba38bf88 usb: dwc3: gadget: Rewrite endpoint allocation flow
> cbfcf517dc42 FROMGIT: usb: dwc: ep0: Update request status in
> dwc3_ep0_stall_restart
> 4fc39328579e UPSTREAM: usb: dwc3: Avoid unmapping USB requests if endxfer is
> not complete
> 5dc06419d8a6 UPSTREAM: usb: dwc3: Do not service EP0 and conndone events if
> soft disconnected
> 75a4f0b5e1f4 UPSTREAM: usb: dwc3: ep0: Properly handle setup_packet_pending
> scenario in data stage
> a79e848e5299 UPSTREAM: usb: dwc3: ep0: Don't prepare beyond Setup stage
> 910e9e60492a UPSTREAM: usb: dwc3: Fix ep0 handling when getting reset while
> doing control transfer
> fe513e1c2685 UPSTREAM: usb: dwc3: gadget: Wait for ep0 xfers to complete
> during dequeueThan
> 

Cherry-picking the upstream patch like this will be difficult to debug
as I can't determine the corresponding changes related to the usb gadget
core API and the dwc3.

BR,
Thinh
Frode Isaksen April 8, 2025, 7:38 a.m. UTC | #3
On 4/8/25 1:44 AM, Thinh Nguyen wrote:
> On Thu, Apr 03, 2025, Frode Isaksen wrote:
>> On 4/3/25 12:21 AM, Thinh Nguyen wrote:
>>> On Wed, Apr 02, 2025, Frode Isaksen wrote:
>>>> On 4/2/25 1:49 AM, Thinh Nguyen wrote:
>>>>> On Thu, Mar 27, 2025, Frode Isaksen wrote:
>>>>>> From: Frode Isaksen <frode@meta.com>
>>>>>>
>>>>>> Prevent dwc3_request from being queued twice, by checking
>>>>>> req->status.
>>>>>> Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
>>>>>> dwc3_request from being queued twice") for non-ep0 endpoints.
>>>>>> Crash log:
>>>>>> list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
>>>>>>     next=ffffff87ab485b60.
>>>>>> kernel BUG at lib/list_debug.c:35!
>>>>>> Call trace:
>>>>>> __list_add_valid+0x70/0xc0
>>>>>> __dwc3_gadget_ep0_queue+0x70/0x224
>>>>>> dwc3_ep0_handle_status+0x118/0x200
>>>>>> dwc3_ep0_inspect_setup+0x144/0x32c
>>>>>> dwc3_ep0_interrupt+0xac/0x340
>>>>>> dwc3_process_event_entry+0x90/0x724
>>>>>> dwc3_process_event_buf+0x7c/0x33c
>>>>>> dwc3_thread_interrupt+0x58/0x8c
>>>>>>
>>>>>> Signed-off-by: Frode Isaksen <frode@meta.com>
>>>>>> ---
>>>>>> This bug was discovered, tested and fixed (no more crashes seen) on
>>>>>> Meta Quest 3 device. Also tested on T.I. AM62x board.
>>>>>>
>>>>>>     drivers/usb/dwc3/ep0.c    | 5 +++++
>>>>>>     drivers/usb/dwc3/gadget.c | 1 +
>>>>>>     2 files changed, 6 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>> index 666ac432f52d..e26c3a62d470 100644
>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>> @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
>>>>>>     {
>>>>>>     	struct dwc3		*dwc = dep->dwc;
>>>>>> +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
>>>>> Let's not use WARN. Perhaps dev_warn?
>>>> I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
>>>> prevent
>>>>
>>>> dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
>>> We can just use dev_warn here, we don't need 2 separate patches for this
>>> change. The other place can be reworked in a separate patch.
>> OK
>>>>>> +				"%s: request %pK already in flight\n",
>>>>>> +				dep->name, &req->request))
>>>>>> +		return -EINVAL;
>>>>>> +
>>>>>>     	req->request.actual	= 0;
>>>>>>     	req->request.status	= -EINPROGRESS;
>>>>>>     	req->epnum		= dep->number;
>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>> index 89a4dc8ebf94..c34446d8c54f 100644
>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>> @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
>>>>>>     	dwc->ep0_bounced = false;
>>>>>>     	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
>>>>>>     	dwc->delayed_status = false;
>>>>>> +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
>>>>>>     	dwc3_ep0_out_start(dwc);
>>>>>>     	dwc3_gadget_enable_irq(dwc);
>>>>>> -- 
>>>>>> 2.48.1
>>>>>>
>>>>> I'm still not clear how this can happen. Are you testing this against
>>>>> mainline? Can you provide the dwc3 tracepoints?
>>>> I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
>>>> and 6.6.
>>>>
>>> Without understanding how this can happen and why it is needed, we
>>> should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
>>> We have a lot of fixes since then. Please see if this is reproducible
>>> using mainline.
>> We have applied all relevant patches from mainline to ep0.c, in order to try
>> to fix this crash:
> What causes the crash? I'm still not clear whether you were able to
> reproduced this on mainline, or any of the new stable tree.
I was not able to reproduce this bug locally in any version.
>
>> 566bc793bdd7 usb: dwc3: ep0: Don't clear ep0 DWC3_EP_TRANSFER_STARTED
>> 371d3fc577db usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
>> d819a0bbb493 usb: dwc3: ep0: Don't reset resource alloc flag
>> 2eb3ba38bf88 usb: dwc3: gadget: Rewrite endpoint allocation flow
>> cbfcf517dc42 FROMGIT: usb: dwc: ep0: Update request status in
>> dwc3_ep0_stall_restart
>> 4fc39328579e UPSTREAM: usb: dwc3: Avoid unmapping USB requests if endxfer is
>> not complete
>> 5dc06419d8a6 UPSTREAM: usb: dwc3: Do not service EP0 and conndone events if
>> soft disconnected
>> 75a4f0b5e1f4 UPSTREAM: usb: dwc3: ep0: Properly handle setup_packet_pending
>> scenario in data stage
>> a79e848e5299 UPSTREAM: usb: dwc3: ep0: Don't prepare beyond Setup stage
>> 910e9e60492a UPSTREAM: usb: dwc3: Fix ep0 handling when getting reset while
>> doing control transfer
>> fe513e1c2685 UPSTREAM: usb: dwc3: gadget: Wait for ep0 xfers to complete
>> during dequeueThan
>>
> Cherry-picking the upstream patch like this will be difficult to debug
> as I can't determine the corresponding changes related to the usb gadget
> core API and the dwc3.

There were first a WARNING:

<4>[ 341.860109] WARNING: CPU: 0 PID: 8548 at 
drivers/usb/dwc3/ep0.c:1077 dwc3_ep0_interrupt+0x1e8/0x340

Here:

static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct 
dwc3_ep *dep) { WARN_ON(dwc3_ep0_start_control_status(dep)); } and then 
the crash:

<3>[ 351.674418] list_add double add: new=ffffff87ab2c7950, 
prev=ffffff87ab2c7950, next=ffffff87ab485b60.

<6>[ 351.674437] ------------[ cut here ]------------

<2>[ 351.674439] kernel BUG at lib/list_debug.c:35!

<0>[ 351.674447] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP

<6>[ 351.675047] Call trace:

<6>[ 351.675052] __list_add_valid+0x70/0xc0

<6>[ 351.675060] __dwc3_gadget_ep0_queue+0x70/0x224

<6>[ 351.675064] dwc3_ep0_handle_status+0x118/0x200

<6>[ 351.675068] dwc3_ep0_inspect_setup+0x144/0x32c

<6>[ 351.675073] dwc3_ep0_interrupt+0xac/0x340

<6>[ 351.675078] dwc3_process_event_entry+0x90/0x724

<6>[ 351.675082] dwc3_process_event_buf+0x7c/0x33c

<6>[ 351.675086] dwc3_thread_interrupt+0x58/0x8c

<6>[ 351.675092] irq_thread_fn+0x54/0xd4

This is all I have..

Thanks,

Frode

>
> BR,
> Thinh
Thinh Nguyen April 9, 2025, 1:16 a.m. UTC | #4
On Tue, Apr 08, 2025, Frode Isaksen wrote:
> On 4/8/25 1:44 AM, Thinh Nguyen wrote:
> > On Thu, Apr 03, 2025, Frode Isaksen wrote:
> > > On 4/3/25 12:21 AM, Thinh Nguyen wrote:
> > > > On Wed, Apr 02, 2025, Frode Isaksen wrote:
> > > > > On 4/2/25 1:49 AM, Thinh Nguyen wrote:
> > > > > > On Thu, Mar 27, 2025, Frode Isaksen wrote:
> > > > > > > From: Frode Isaksen <frode@meta.com>
> > > > > > > 
> > > > > > > Prevent dwc3_request from being queued twice, by checking
> > > > > > > req->status.
> > > > > > > Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
> > > > > > > dwc3_request from being queued twice") for non-ep0 endpoints.
> > > > > > > Crash log:
> > > > > > > list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
> > > > > > >     next=ffffff87ab485b60.
> > > > > > > kernel BUG at lib/list_debug.c:35!
> > > > > > > Call trace:
> > > > > > > __list_add_valid+0x70/0xc0
> > > > > > > __dwc3_gadget_ep0_queue+0x70/0x224
> > > > > > > dwc3_ep0_handle_status+0x118/0x200
> > > > > > > dwc3_ep0_inspect_setup+0x144/0x32c
> > > > > > > dwc3_ep0_interrupt+0xac/0x340
> > > > > > > dwc3_process_event_entry+0x90/0x724
> > > > > > > dwc3_process_event_buf+0x7c/0x33c
> > > > > > > dwc3_thread_interrupt+0x58/0x8c
> > > > > > > 
> > > > > > > Signed-off-by: Frode Isaksen <frode@meta.com>
> > > > > > > ---
> > > > > > > This bug was discovered, tested and fixed (no more crashes seen) on
> > > > > > > Meta Quest 3 device. Also tested on T.I. AM62x board.
> > > > > > > 
> > > > > > >     drivers/usb/dwc3/ep0.c    | 5 +++++
> > > > > > >     drivers/usb/dwc3/gadget.c | 1 +
> > > > > > >     2 files changed, 6 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> > > > > > > index 666ac432f52d..e26c3a62d470 100644
> > > > > > > --- a/drivers/usb/dwc3/ep0.c
> > > > > > > +++ b/drivers/usb/dwc3/ep0.c
> > > > > > > @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
> > > > > > >     {
> > > > > > >     	struct dwc3		*dwc = dep->dwc;
> > > > > > > +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
> > > > > > Let's not use WARN. Perhaps dev_warn?
> > > > > I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
> > > > > prevent
> > > > > 
> > > > > dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
> > > > We can just use dev_warn here, we don't need 2 separate patches for this
> > > > change. The other place can be reworked in a separate patch.
> > > OK
> > > > > > > +				"%s: request %pK already in flight\n",
> > > > > > > +				dep->name, &req->request))
> > > > > > > +		return -EINVAL;
> > > > > > > +
> > > > > > >     	req->request.actual	= 0;
> > > > > > >     	req->request.status	= -EINPROGRESS;
> > > > > > >     	req->epnum		= dep->number;
> > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > index 89a4dc8ebf94..c34446d8c54f 100644
> > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
> > > > > > >     	dwc->ep0_bounced = false;
> > > > > > >     	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
> > > > > > >     	dwc->delayed_status = false;
> > > > > > > +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
> > > > > > >     	dwc3_ep0_out_start(dwc);
> > > > > > >     	dwc3_gadget_enable_irq(dwc);
> > > > > > > -- 
> > > > > > > 2.48.1
> > > > > > > 
> > > > > > I'm still not clear how this can happen. Are you testing this against
> > > > > > mainline? Can you provide the dwc3 tracepoints?
> > > > > I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
> > > > > and 6.6.
> > > > > 
> > > > Without understanding how this can happen and why it is needed, we
> > > > should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
> > > > We have a lot of fixes since then. Please see if this is reproducible
> > > > using mainline.
> > > We have applied all relevant patches from mainline to ep0.c, in order to try
> > > to fix this crash:
> > What causes the crash? I'm still not clear whether you were able to
> > reproduced this on mainline, or any of the new stable tree.
> I was not able to reproduce this bug locally in any version.
> > 
> > > 566bc793bdd7 usb: dwc3: ep0: Don't clear ep0 DWC3_EP_TRANSFER_STARTED
> > > 371d3fc577db usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
> > > d819a0bbb493 usb: dwc3: ep0: Don't reset resource alloc flag
> > > 2eb3ba38bf88 usb: dwc3: gadget: Rewrite endpoint allocation flow
> > > cbfcf517dc42 FROMGIT: usb: dwc: ep0: Update request status in
> > > dwc3_ep0_stall_restart
> > > 4fc39328579e UPSTREAM: usb: dwc3: Avoid unmapping USB requests if endxfer is
> > > not complete
> > > 5dc06419d8a6 UPSTREAM: usb: dwc3: Do not service EP0 and conndone events if
> > > soft disconnected
> > > 75a4f0b5e1f4 UPSTREAM: usb: dwc3: ep0: Properly handle setup_packet_pending
> > > scenario in data stage
> > > a79e848e5299 UPSTREAM: usb: dwc3: ep0: Don't prepare beyond Setup stage
> > > 910e9e60492a UPSTREAM: usb: dwc3: Fix ep0 handling when getting reset while
> > > doing control transfer
> > > fe513e1c2685 UPSTREAM: usb: dwc3: gadget: Wait for ep0 xfers to complete
> > > during dequeueThan
> > > 
> > Cherry-picking the upstream patch like this will be difficult to debug
> > as I can't determine the corresponding changes related to the usb gadget
> > core API and the dwc3.
> 
> There were first a WARNING:
> 
> <4>[ 341.860109] WARNING: CPU: 0 PID: 8548 at drivers/usb/dwc3/ep0.c:1077
> dwc3_ep0_interrupt+0x1e8/0x340
> 
> Here:
> 
> static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep
> *dep) { WARN_ON(dwc3_ep0_start_control_status(dep)); } and then the crash:
> 
> <3>[ 351.674418] list_add double add: new=ffffff87ab2c7950,
> prev=ffffff87ab2c7950, next=ffffff87ab485b60.
> 
> <6>[ 351.674437] ------------[ cut here ]------------
> 
> <2>[ 351.674439] kernel BUG at lib/list_debug.c:35!
> 
> <0>[ 351.674447] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> 
> <6>[ 351.675047] Call trace:
> 
> <6>[ 351.675052] __list_add_valid+0x70/0xc0
> 
> <6>[ 351.675060] __dwc3_gadget_ep0_queue+0x70/0x224
> 
> <6>[ 351.675064] dwc3_ep0_handle_status+0x118/0x200
> 
> <6>[ 351.675068] dwc3_ep0_inspect_setup+0x144/0x32c
> 
> <6>[ 351.675073] dwc3_ep0_interrupt+0xac/0x340
> 
> <6>[ 351.675078] dwc3_process_event_entry+0x90/0x724
> 
> <6>[ 351.675082] dwc3_process_event_buf+0x7c/0x33c
> 
> <6>[ 351.675086] dwc3_thread_interrupt+0x58/0x8c
> 
> <6>[ 351.675092] irq_thread_fn+0x54/0xd4
> 
> This is all I have..
> 

Hm... which kernel version was this reproduced?

Are you using the usb composite framework? The control transfers are
driven by the host, and the usb gadget composite framework should not
hit scenarios like this.

Can you capture the tracepoints* for dwc3?
[*] https://docs.kernel.org/driver-api/usb/dwc3.html#reporting-bugs

Thanks,
Thinh
Thinh Nguyen April 10, 2025, 12:09 a.m. UTC | #5
On Wed, Apr 09, 2025, Thinh Nguyen wrote:
> On Tue, Apr 08, 2025, Frode Isaksen wrote:
> > On 4/8/25 1:44 AM, Thinh Nguyen wrote:
> > > On Thu, Apr 03, 2025, Frode Isaksen wrote:
> > > > On 4/3/25 12:21 AM, Thinh Nguyen wrote:
> > > > > On Wed, Apr 02, 2025, Frode Isaksen wrote:
> > > > > > On 4/2/25 1:49 AM, Thinh Nguyen wrote:
> > > > > > > On Thu, Mar 27, 2025, Frode Isaksen wrote:
> > > > > > > > From: Frode Isaksen <frode@meta.com>
> > > > > > > > 
> > > > > > > > Prevent dwc3_request from being queued twice, by checking
> > > > > > > > req->status.
> > > > > > > > Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
> > > > > > > > dwc3_request from being queued twice") for non-ep0 endpoints.
> > > > > > > > Crash log:
> > > > > > > > list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
> > > > > > > >     next=ffffff87ab485b60.
> > > > > > > > kernel BUG at lib/list_debug.c:35!
> > > > > > > > Call trace:
> > > > > > > > __list_add_valid+0x70/0xc0
> > > > > > > > __dwc3_gadget_ep0_queue+0x70/0x224
> > > > > > > > dwc3_ep0_handle_status+0x118/0x200
> > > > > > > > dwc3_ep0_inspect_setup+0x144/0x32c
> > > > > > > > dwc3_ep0_interrupt+0xac/0x340
> > > > > > > > dwc3_process_event_entry+0x90/0x724
> > > > > > > > dwc3_process_event_buf+0x7c/0x33c
> > > > > > > > dwc3_thread_interrupt+0x58/0x8c
> > > > > > > > 
> > > > > > > > Signed-off-by: Frode Isaksen <frode@meta.com>
> > > > > > > > ---
> > > > > > > > This bug was discovered, tested and fixed (no more crashes seen) on
> > > > > > > > Meta Quest 3 device. Also tested on T.I. AM62x board.
> > > > > > > > 
> > > > > > > >     drivers/usb/dwc3/ep0.c    | 5 +++++
> > > > > > > >     drivers/usb/dwc3/gadget.c | 1 +
> > > > > > > >     2 files changed, 6 insertions(+)
> > > > > > > > 
> > > > > > > > diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> > > > > > > > index 666ac432f52d..e26c3a62d470 100644
> > > > > > > > --- a/drivers/usb/dwc3/ep0.c
> > > > > > > > +++ b/drivers/usb/dwc3/ep0.c
> > > > > > > > @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
> > > > > > > >     {
> > > > > > > >     	struct dwc3		*dwc = dep->dwc;
> > > > > > > > +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
> > > > > > > Let's not use WARN. Perhaps dev_warn?
> > > > > > I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
> > > > > > prevent
> > > > > > 
> > > > > > dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
> > > > > We can just use dev_warn here, we don't need 2 separate patches for this
> > > > > change. The other place can be reworked in a separate patch.
> > > > OK
> > > > > > > > +				"%s: request %pK already in flight\n",
> > > > > > > > +				dep->name, &req->request))
> > > > > > > > +		return -EINVAL;
> > > > > > > > +
> > > > > > > >     	req->request.actual	= 0;
> > > > > > > >     	req->request.status	= -EINPROGRESS;
> > > > > > > >     	req->epnum		= dep->number;
> > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > > index 89a4dc8ebf94..c34446d8c54f 100644
> > > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > > @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
> > > > > > > >     	dwc->ep0_bounced = false;
> > > > > > > >     	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
> > > > > > > >     	dwc->delayed_status = false;
> > > > > > > > +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
> > > > > > > >     	dwc3_ep0_out_start(dwc);
> > > > > > > >     	dwc3_gadget_enable_irq(dwc);
> > > > > > > > -- 
> > > > > > > > 2.48.1
> > > > > > > > 
> > > > > > > I'm still not clear how this can happen. Are you testing this against
> > > > > > > mainline? Can you provide the dwc3 tracepoints?
> > > > > > I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
> > > > > > and 6.6.
> > > > > > 
> > > > > Without understanding how this can happen and why it is needed, we
> > > > > should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
> > > > > We have a lot of fixes since then. Please see if this is reproducible
> > > > > using mainline.
> > > > We have applied all relevant patches from mainline to ep0.c, in order to try
> > > > to fix this crash:
> > > What causes the crash? I'm still not clear whether you were able to
> > > reproduced this on mainline, or any of the new stable tree.

<snip>

> > I was not able to reproduce this bug locally in any version.

Sorry, I totally missed this response...

<snip>

> > 
> > There were first a WARNING:
> > 
> > <4>[ 341.860109] WARNING: CPU: 0 PID: 8548 at drivers/usb/dwc3/ep0.c:1077
> > dwc3_ep0_interrupt+0x1e8/0x340
> > 
> > Here:
> > 
> > static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep
> > *dep) { WARN_ON(dwc3_ep0_start_control_status(dep)); } and then the crash:
> > 
> > <3>[ 351.674418] list_add double add: new=ffffff87ab2c7950,
> > prev=ffffff87ab2c7950, next=ffffff87ab485b60.
> > 
> > <6>[ 351.674437] ------------[ cut here ]------------
> > 
> > <2>[ 351.674439] kernel BUG at lib/list_debug.c:35!
> > 
> > <0>[ 351.674447] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > 
> > <6>[ 351.675047] Call trace:
> > 
> > <6>[ 351.675052] __list_add_valid+0x70/0xc0
> > 
> > <6>[ 351.675060] __dwc3_gadget_ep0_queue+0x70/0x224
> > 
> > <6>[ 351.675064] dwc3_ep0_handle_status+0x118/0x200
> > 
> > <6>[ 351.675068] dwc3_ep0_inspect_setup+0x144/0x32c
> > 
> > <6>[ 351.675073] dwc3_ep0_interrupt+0xac/0x340
> > 
> > <6>[ 351.675078] dwc3_process_event_entry+0x90/0x724
> > 
> > <6>[ 351.675082] dwc3_process_event_buf+0x7c/0x33c
> > 
> > <6>[ 351.675086] dwc3_thread_interrupt+0x58/0x8c
> > 
> > <6>[ 351.675092] irq_thread_fn+0x54/0xd4
> > 
> > This is all I have..
> > 
> 
> Hm... which kernel version was this reproduced?
> 
> Are you using the usb composite framework? The control transfers are
> driven by the host, and the usb gadget composite framework should not
> hit scenarios like this.
> 
> Can you capture the tracepoints* for dwc3?
> [*] https://docs.kernel.org/driver-api/usb/dwc3.html#reporting-bugs

Ignore this request since you can't reproduce this bug. But can you help
answer the question above about whether you used the usb composite
framework?

Thanks,
Thinh
Frode Isaksen April 10, 2025, 12:12 p.m. UTC | #6
On 4/10/25 2:09 AM, Thinh Nguyen wrote:
> On Wed, Apr 09, 2025, Thinh Nguyen wrote:
>> On Tue, Apr 08, 2025, Frode Isaksen wrote:
>>> On 4/8/25 1:44 AM, Thinh Nguyen wrote:
>>>> On Thu, Apr 03, 2025, Frode Isaksen wrote:
>>>>> On 4/3/25 12:21 AM, Thinh Nguyen wrote:
>>>>>> On Wed, Apr 02, 2025, Frode Isaksen wrote:
>>>>>>> On 4/2/25 1:49 AM, Thinh Nguyen wrote:
>>>>>>>> On Thu, Mar 27, 2025, Frode Isaksen wrote:
>>>>>>>>> From: Frode Isaksen <frode@meta.com>
>>>>>>>>>
>>>>>>>>> Prevent dwc3_request from being queued twice, by checking
>>>>>>>>> req->status.
>>>>>>>>> Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
>>>>>>>>> dwc3_request from being queued twice") for non-ep0 endpoints.
>>>>>>>>> Crash log:
>>>>>>>>> list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
>>>>>>>>>      next=ffffff87ab485b60.
>>>>>>>>> kernel BUG at lib/list_debug.c:35!
>>>>>>>>> Call trace:
>>>>>>>>> __list_add_valid+0x70/0xc0
>>>>>>>>> __dwc3_gadget_ep0_queue+0x70/0x224
>>>>>>>>> dwc3_ep0_handle_status+0x118/0x200
>>>>>>>>> dwc3_ep0_inspect_setup+0x144/0x32c
>>>>>>>>> dwc3_ep0_interrupt+0xac/0x340
>>>>>>>>> dwc3_process_event_entry+0x90/0x724
>>>>>>>>> dwc3_process_event_buf+0x7c/0x33c
>>>>>>>>> dwc3_thread_interrupt+0x58/0x8c
>>>>>>>>>
>>>>>>>>> Signed-off-by: Frode Isaksen <frode@meta.com>
>>>>>>>>> ---
>>>>>>>>> This bug was discovered, tested and fixed (no more crashes seen) on
>>>>>>>>> Meta Quest 3 device. Also tested on T.I. AM62x board.
>>>>>>>>>
>>>>>>>>>      drivers/usb/dwc3/ep0.c    | 5 +++++
>>>>>>>>>      drivers/usb/dwc3/gadget.c | 1 +
>>>>>>>>>      2 files changed, 6 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
>>>>>>>>> index 666ac432f52d..e26c3a62d470 100644
>>>>>>>>> --- a/drivers/usb/dwc3/ep0.c
>>>>>>>>> +++ b/drivers/usb/dwc3/ep0.c
>>>>>>>>> @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
>>>>>>>>>      {
>>>>>>>>>      	struct dwc3		*dwc = dep->dwc;
>>>>>>>>> +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
>>>>>>>> Let's not use WARN. Perhaps dev_warn?
>>>>>>> I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
>>>>>>> prevent
>>>>>>>
>>>>>>> dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
>>>>>> We can just use dev_warn here, we don't need 2 separate patches for this
>>>>>> change. The other place can be reworked in a separate patch.
>>>>> OK
>>>>>>>>> +				"%s: request %pK already in flight\n",
>>>>>>>>> +				dep->name, &req->request))
>>>>>>>>> +		return -EINVAL;
>>>>>>>>> +
>>>>>>>>>      	req->request.actual	= 0;
>>>>>>>>>      	req->request.status	= -EINPROGRESS;
>>>>>>>>>      	req->epnum		= dep->number;
>>>>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>>>>>>>>> index 89a4dc8ebf94..c34446d8c54f 100644
>>>>>>>>> --- a/drivers/usb/dwc3/gadget.c
>>>>>>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>>>>>>> @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
>>>>>>>>>      	dwc->ep0_bounced = false;
>>>>>>>>>      	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
>>>>>>>>>      	dwc->delayed_status = false;
>>>>>>>>> +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
>>>>>>>>>      	dwc3_ep0_out_start(dwc);
>>>>>>>>>      	dwc3_gadget_enable_irq(dwc);
>>>>>>>>> -- 
>>>>>>>>> 2.48.1
>>>>>>>>>
>>>>>>>> I'm still not clear how this can happen. Are you testing this against
>>>>>>>> mainline? Can you provide the dwc3 tracepoints?
>>>>>>> I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
>>>>>>> and 6.6.
>>>>>>>
>>>>>> Without understanding how this can happen and why it is needed, we
>>>>>> should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
>>>>>> We have a lot of fixes since then. Please see if this is reproducible
>>>>>> using mainline.
>>>>> We have applied all relevant patches from mainline to ep0.c, in order to try
>>>>> to fix this crash:
>>>> What causes the crash? I'm still not clear whether you were able to
>>>> reproduced this on mainline, or any of the new stable tree.
> <snip>
>
>>> I was not able to reproduce this bug locally in any version.
> Sorry, I totally missed this response...
>
> <snip>
>
>>> There were first a WARNING:
>>>
>>> <4>[ 341.860109] WARNING: CPU: 0 PID: 8548 at drivers/usb/dwc3/ep0.c:1077
>>> dwc3_ep0_interrupt+0x1e8/0x340
>>>
>>> Here:
>>>
>>> static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep
>>> *dep) { WARN_ON(dwc3_ep0_start_control_status(dep)); } and then the crash:
>>>
>>> <3>[ 351.674418] list_add double add: new=ffffff87ab2c7950,
>>> prev=ffffff87ab2c7950, next=ffffff87ab485b60.
>>>
>>> <6>[ 351.674437] ------------[ cut here ]------------
>>>
>>> <2>[ 351.674439] kernel BUG at lib/list_debug.c:35!
>>>
>>> <0>[ 351.674447] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
>>>
>>> <6>[ 351.675047] Call trace:
>>>
>>> <6>[ 351.675052] __list_add_valid+0x70/0xc0
>>>
>>> <6>[ 351.675060] __dwc3_gadget_ep0_queue+0x70/0x224
>>>
>>> <6>[ 351.675064] dwc3_ep0_handle_status+0x118/0x200
>>>
>>> <6>[ 351.675068] dwc3_ep0_inspect_setup+0x144/0x32c
>>>
>>> <6>[ 351.675073] dwc3_ep0_interrupt+0xac/0x340
>>>
>>> <6>[ 351.675078] dwc3_process_event_entry+0x90/0x724
>>>
>>> <6>[ 351.675082] dwc3_process_event_buf+0x7c/0x33c
>>>
>>> <6>[ 351.675086] dwc3_thread_interrupt+0x58/0x8c
>>>
>>> <6>[ 351.675092] irq_thread_fn+0x54/0xd4
>>>
>>> This is all I have..
>>>
>> Hm... which kernel version was this reproduced?
>>
>> Are you using the usb composite framework? The control transfers are
>> driven by the host, and the usb gadget composite framework should not
>> hit scenarios like this.
>>
>> Can you capture the tracepoints* for dwc3?
>> [*] https://docs.kernel.org/driver-api/usb/dwc3.html#reporting-bugs
> Ignore this request since you can't reproduce this bug. But can you help
> answer the question above about whether you used the usb composite
> framework?
>
> Thanks,
> Thinh

Yes, I thinks it's using the composite framework, as there are two 
interfaces XRSP and ADB.

Thanks,

Frode
Thinh Nguyen April 11, 2025, 1:27 a.m. UTC | #7
On Thu, Apr 10, 2025, Frode Isaksen wrote:
> On 4/10/25 2:09 AM, Thinh Nguyen wrote:
> > On Wed, Apr 09, 2025, Thinh Nguyen wrote:
> > > On Tue, Apr 08, 2025, Frode Isaksen wrote:
> > > > On 4/8/25 1:44 AM, Thinh Nguyen wrote:
> > > > > On Thu, Apr 03, 2025, Frode Isaksen wrote:
> > > > > > On 4/3/25 12:21 AM, Thinh Nguyen wrote:
> > > > > > > On Wed, Apr 02, 2025, Frode Isaksen wrote:
> > > > > > > > On 4/2/25 1:49 AM, Thinh Nguyen wrote:
> > > > > > > > > On Thu, Mar 27, 2025, Frode Isaksen wrote:
> > > > > > > > > > From: Frode Isaksen <frode@meta.com>
> > > > > > > > > > 
> > > > > > > > > > Prevent dwc3_request from being queued twice, by checking
> > > > > > > > > > req->status.
> > > > > > > > > > Similar to commit b2b6d601365a ("usb: dwc3: gadget: prevent
> > > > > > > > > > dwc3_request from being queued twice") for non-ep0 endpoints.
> > > > > > > > > > Crash log:
> > > > > > > > > > list_add double add: new=ffffff87ab2c7950, prev=ffffff87ab2c7950,
> > > > > > > > > >      next=ffffff87ab485b60.
> > > > > > > > > > kernel BUG at lib/list_debug.c:35!
> > > > > > > > > > Call trace:
> > > > > > > > > > __list_add_valid+0x70/0xc0
> > > > > > > > > > __dwc3_gadget_ep0_queue+0x70/0x224
> > > > > > > > > > dwc3_ep0_handle_status+0x118/0x200
> > > > > > > > > > dwc3_ep0_inspect_setup+0x144/0x32c
> > > > > > > > > > dwc3_ep0_interrupt+0xac/0x340
> > > > > > > > > > dwc3_process_event_entry+0x90/0x724
> > > > > > > > > > dwc3_process_event_buf+0x7c/0x33c
> > > > > > > > > > dwc3_thread_interrupt+0x58/0x8c
> > > > > > > > > > 
> > > > > > > > > > Signed-off-by: Frode Isaksen <frode@meta.com>
> > > > > > > > > > ---
> > > > > > > > > > This bug was discovered, tested and fixed (no more crashes seen) on
> > > > > > > > > > Meta Quest 3 device. Also tested on T.I. AM62x board.
> > > > > > > > > > 
> > > > > > > > > >      drivers/usb/dwc3/ep0.c    | 5 +++++
> > > > > > > > > >      drivers/usb/dwc3/gadget.c | 1 +
> > > > > > > > > >      2 files changed, 6 insertions(+)
> > > > > > > > > > 
> > > > > > > > > > diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
> > > > > > > > > > index 666ac432f52d..e26c3a62d470 100644
> > > > > > > > > > --- a/drivers/usb/dwc3/ep0.c
> > > > > > > > > > +++ b/drivers/usb/dwc3/ep0.c
> > > > > > > > > > @@ -91,6 +91,11 @@ static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
> > > > > > > > > >      {
> > > > > > > > > >      	struct dwc3		*dwc = dep->dwc;
> > > > > > > > > > +	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
> > > > > > > > > Let's not use WARN. Perhaps dev_warn?
> > > > > > > > I copied the coding style from commit b2b6d601365a ("usb: dwc3: gadget:
> > > > > > > > prevent
> > > > > > > > 
> > > > > > > > dwc3_request from being queued twice"), so maybe a follow-up patch to change from WARN to dev_warn for the two patches ?
> > > > > > > We can just use dev_warn here, we don't need 2 separate patches for this
> > > > > > > change. The other place can be reworked in a separate patch.
> > > > > > OK
> > > > > > > > > > +				"%s: request %pK already in flight\n",
> > > > > > > > > > +				dep->name, &req->request))
> > > > > > > > > > +		return -EINVAL;
> > > > > > > > > > +
> > > > > > > > > >      	req->request.actual	= 0;
> > > > > > > > > >      	req->request.status	= -EINPROGRESS;
> > > > > > > > > >      	req->epnum		= dep->number;
> > > > > > > > > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> > > > > > > > > > index 89a4dc8ebf94..c34446d8c54f 100644
> > > > > > > > > > --- a/drivers/usb/dwc3/gadget.c
> > > > > > > > > > +++ b/drivers/usb/dwc3/gadget.c
> > > > > > > > > > @@ -3002,6 +3002,7 @@ static int __dwc3_gadget_start(struct dwc3 *dwc)
> > > > > > > > > >      	dwc->ep0_bounced = false;
> > > > > > > > > >      	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
> > > > > > > > > >      	dwc->delayed_status = false;
> > > > > > > > > > +	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
> > > > > > > > > >      	dwc3_ep0_out_start(dwc);
> > > > > > > > > >      	dwc3_gadget_enable_irq(dwc);
> > > > > > > > > > -- 
> > > > > > > > > > 2.48.1
> > > > > > > > > > 
> > > > > > > > > I'm still not clear how this can happen. Are you testing this against
> > > > > > > > > mainline? Can you provide the dwc3 tracepoints?
> > > > > > > > I was not able to reproduce this locally. Was seen on 5.10, tested on 6.1
> > > > > > > > and 6.6.
> > > > > > > > 
> > > > > > > Without understanding how this can happen and why it is needed, we
> > > > > > > should not just apply this. Kernel v5.10, v6.1, and v6.6 are really old.
> > > > > > > We have a lot of fixes since then. Please see if this is reproducible
> > > > > > > using mainline.
> > > > > > We have applied all relevant patches from mainline to ep0.c, in order to try
> > > > > > to fix this crash:
> > > > > What causes the crash? I'm still not clear whether you were able to
> > > > > reproduced this on mainline, or any of the new stable tree.
> > <snip>
> > 
> > > > I was not able to reproduce this bug locally in any version.
> > Sorry, I totally missed this response...
> > 
> > <snip>
> > 
> > > > There were first a WARNING:
> > > > 
> > > > <4>[ 341.860109] WARNING: CPU: 0 PID: 8548 at drivers/usb/dwc3/ep0.c:1077
> > > > dwc3_ep0_interrupt+0x1e8/0x340
> > > > 
> > > > Here:
> > > > 
> > > > static void __dwc3_ep0_do_control_status(struct dwc3 *dwc, struct dwc3_ep
> > > > *dep) { WARN_ON(dwc3_ep0_start_control_status(dep)); } and then the crash:
> > > > 
> > > > <3>[ 351.674418] list_add double add: new=ffffff87ab2c7950,
> > > > prev=ffffff87ab2c7950, next=ffffff87ab485b60.
> > > > 
> > > > <6>[ 351.674437] ------------[ cut here ]------------
> > > > 
> > > > <2>[ 351.674439] kernel BUG at lib/list_debug.c:35!
> > > > 
> > > > <0>[ 351.674447] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> > > > 
> > > > <6>[ 351.675047] Call trace:
> > > > 
> > > > <6>[ 351.675052] __list_add_valid+0x70/0xc0
> > > > 
> > > > <6>[ 351.675060] __dwc3_gadget_ep0_queue+0x70/0x224
> > > > 
> > > > <6>[ 351.675064] dwc3_ep0_handle_status+0x118/0x200
> > > > 
> > > > <6>[ 351.675068] dwc3_ep0_inspect_setup+0x144/0x32c
> > > > 
> > > > <6>[ 351.675073] dwc3_ep0_interrupt+0xac/0x340
> > > > 
> > > > <6>[ 351.675078] dwc3_process_event_entry+0x90/0x724
> > > > 
> > > > <6>[ 351.675082] dwc3_process_event_buf+0x7c/0x33c
> > > > 
> > > > <6>[ 351.675086] dwc3_thread_interrupt+0x58/0x8c
> > > > 
> > > > <6>[ 351.675092] irq_thread_fn+0x54/0xd4
> > > > 
> > > > This is all I have..
> > > > 
> > > Hm... which kernel version was this reproduced?
> > > 
> > > Are you using the usb composite framework? The control transfers are
> > > driven by the host, and the usb gadget composite framework should not
> > > hit scenarios like this.
> > > 
> > > Can you capture the tracepoints* for dwc3?
> > > [*] https://urldefense.com/v3/__https://docs.kernel.org/driver-api/usb/dwc3.html*reporting-bugs__;Iw!!A4F2R9G_pg!a99eDyoLRAn8bMgn6L9hZr1e0Dv214xF-wMGKgw8j-n78id3zRmMhpfG_0sNYa9gHgqgOrtqpb2r6WPO3eR42lZL$
> > Ignore this request since you can't reproduce this bug. But can you help
> > answer the question above about whether you used the usb composite
> > framework?
> > 
> > Thanks,
> > Thinh
> 
> Yes, I thinks it's using the composite framework, as there are two
> interfaces XRSP and ADB.
> 

Ok... it's unfortunate that we can't reproduce this.

Base on your change in __dwc3_gadget_start for this:
+	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;

I suspect the problem occurs during bind/unbind where
soft-disconnect/connect occurs. We have a lot of fixes to dwc3 since the
kernel 5.10 related to soft-disconnect handling that may have resolved
the issue you reported.

We should not just override the request status and ignore the ep0 queue.
The control transfer state from the driver may not be matching with the
controller, things may already be broken here.

If you run into this issue again, we can take a look and properly
resolve it. But I don't think we should handle it this way.

Thanks,
Thinh
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index 666ac432f52d..e26c3a62d470 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -91,6 +91,11 @@  static int __dwc3_gadget_ep0_queue(struct dwc3_ep *dep,
 {
 	struct dwc3		*dwc = dep->dwc;
 
+	if (WARN(req->status < DWC3_REQUEST_STATUS_COMPLETED,
+				"%s: request %pK already in flight\n",
+				dep->name, &req->request))
+		return -EINVAL;
+
 	req->request.actual	= 0;
 	req->request.status	= -EINPROGRESS;
 	req->epnum		= dep->number;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 89a4dc8ebf94..c34446d8c54f 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3002,6 +3002,7 @@  static int __dwc3_gadget_start(struct dwc3 *dwc)
 	dwc->ep0_bounced = false;
 	dwc->link_state = DWC3_LINK_STATE_SS_DIS;
 	dwc->delayed_status = false;
+	dwc->ep0_usb_req.status = DWC3_REQUEST_STATUS_UNKNOWN;
 	dwc3_ep0_out_start(dwc);
 
 	dwc3_gadget_enable_irq(dwc);