diff mbox series

usb: dwc3: gadget: Stall and restart EP0 if host is unresponsive

Message ID 20230331232039.1407-1-quic_wcheng@quicinc.com
State New
Headers show
Series usb: dwc3: gadget: Stall and restart EP0 if host is unresponsive | expand

Commit Message

Wesley Cheng March 31, 2023, 11:20 p.m. UTC
It was observed that there are hosts that may complete pending SETUP
transactions before the stop active transfers and controller halt occurs,
leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
start iterations.

dwc3_gadget_ep_disable   name=ep8in flags=0x3009  direction=1
dwc3_gadget_ep_disable   name=ep4in flags=1  direction=1
dwc3_gadget_ep_disable   name=ep3out flags=1  direction=0
usb_gadget_disconnect   deactivated=0  connected=0  ret=0

The sequence shows that the USB gadget disconnect (dwc3_gadget_pullup(0))
routine completed successfully, allowing for the USB gadget to proceed with
a USB gadget connect.  However, if this occurs the system runs into an
issue where:

BUG: spinlock already unlocked on CPU
 spin_bug+0x0
 dwc3_remove_requests+0x278
 dwc3_ep0_out_start+0xb0
 __dwc3_gadget_start+0x25c

This is due to the pending endxfers, leading to gadget start (w/o lock
held) to execute the remove requests, which will unlock the dwc3 spinlock
as part of giveback.

To mitigate this, resolve the pending endxfers on the pullup disable path
by:
 1. Re-locating the SETUP phase check after stop active transfers, since
 that is where the DWC3_EP_DELAY_STOP is potentially set.  This also allows
 for handling of a host that may be unresponsive by using the completion
 timeout to trigger the stall and restart for EP0.

 2. Do not call gadget stop until the poll for controller halt is
 completed.  DEVTEN is cleared as part of gadget stop, so the intention to
 allow ep0 events to continue while waiting for controller halt is not
 happening.

Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage")
Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
---
 drivers/usb/dwc3/gadget.c | 101 ++++++++++++++++++++++----------------
 1 file changed, 58 insertions(+), 43 deletions(-)

Comments

Wesley Cheng April 4, 2023, 6:49 p.m. UTC | #1
Hi Thinh,

On 4/3/2023 6:11 PM, Thinh Nguyen wrote:
> On Fri, Mar 31, 2023, Wesley Cheng wrote:
>> It was observed that there are hosts that may complete pending SETUP
>> transactions before the stop active transfers and controller halt occurs,
>> leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
>> start iterations.
> 
> Can you clarify this a bit further? Even though the controller is
> halted, you still observed activity?
> 

Yes...I didn't understand how that was possible either, but traces 
clearly showed that the controller halt was successful even though there 
were no endxfers issued on some EPs.  Although, I can't say for certain 
if those EPs were actively being used at that time.

>>
>> dwc3_gadget_ep_disable   name=ep8in flags=0x3009  direction=1
>> dwc3_gadget_ep_disable   name=ep4in flags=1  direction=1
>> dwc3_gadget_ep_disable   name=ep3out flags=1  direction=0
>> usb_gadget_disconnect   deactivated=0  connected=0  ret=0
>>
>> The sequence shows that the USB gadget disconnect (dwc3_gadget_pullup(0))
>> routine completed successfully, allowing for the USB gadget to proceed with
>> a USB gadget connect.  However, if this occurs the system runs into an
>> issue where:
>>
>> BUG: spinlock already unlocked on CPU
>>   spin_bug+0x0
>>   dwc3_remove_requests+0x278
>>   dwc3_ep0_out_start+0xb0
>>   __dwc3_gadget_start+0x25c
>>
>> This is due to the pending endxfers, leading to gadget start (w/o lock
>> held) to execute the remove requests, which will unlock the dwc3 spinlock
>> as part of giveback.
>>
>> To mitigate this, resolve the pending endxfers on the pullup disable path
>> by:
>>   1. Re-locating the SETUP phase check after stop active transfers, since
>>   that is where the DWC3_EP_DELAY_STOP is potentially set.  This also allows
>>   for handling of a host that may be unresponsive by using the completion
>>   timeout to trigger the stall and restart for EP0.
>>
>>   2. Do not call gadget stop until the poll for controller halt is
>>   completed.  DEVTEN is cleared as part of gadget stop, so the intention to
>>   allow ep0 events to continue while waiting for controller halt is not
>>   happening.
>>
>> Fixes: c96683798e27 ("usb: dwc3: ep0: Don't prepare beyond Setup stage")
>> Signed-off-by: Wesley Cheng <quic_wcheng@quicinc.com>
>> ---
>>   drivers/usb/dwc3/gadget.c | 101 ++++++++++++++++++++++----------------
>>   1 file changed, 58 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
>> index 3c63fa97a680..9715de8e99bc 100644
>> --- a/drivers/usb/dwc3/gadget.c
>> +++ b/drivers/usb/dwc3/gadget.c
>> @@ -139,6 +139,24 @@ int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state)
>>   	return -ETIMEDOUT;
>>   }
>>   
>> +static void dwc3_ep0_reset_state(struct dwc3 *dwc)
>> +{
>> +	unsigned int	dir;
>> +
>> +	if (dwc->ep0state != EP0_SETUP_PHASE) {
>> +		dir = !!dwc->ep0_expect_in;
>> +		if (dwc->ep0state == EP0_DATA_PHASE)
>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[dir]);
>> +		else
>> +			dwc3_ep0_end_control_data(dwc, dwc->eps[!dir]);
>> +
>> +		dwc->eps[0]->trb_enqueue = 0;
>> +		dwc->eps[1]->trb_enqueue = 0;
>> +
>> +		dwc3_ep0_stall_and_restart(dwc);
>> +	}
>> +}
>> +
> 
> Can we separate refactoring changes other functional changes? It's
> difficult to review with too many things to keep track of.
> 

Sure I can split this into another patch.

Thanks
Wesley Cheng
Wesley Cheng April 4, 2023, 10:23 p.m. UTC | #2
Hi Thinh,

On 4/4/2023 3:16 PM, Thinh Nguyen wrote:
> On Tue, Apr 04, 2023, Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 4/3/2023 6:11 PM, Thinh Nguyen wrote:
>>> On Fri, Mar 31, 2023, Wesley Cheng wrote:
>>>> It was observed that there are hosts that may complete pending SETUP
>>>> transactions before the stop active transfers and controller halt occurs,
>>>> leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
>>>> start iterations.
>>>
>>> Can you clarify this a bit further? Even though the controller is
>>> halted, you still observed activity?
>>>
>>
>> Yes...I didn't understand how that was possible either, but traces clearly
>> showed that the controller halt was successful even though there were no
>> endxfers issued on some EPs.  Although, I can't say for certain if those EPs
>> were actively being used at that time.
>>
> 
> The controller should only be halted after the (non-ep0) endpoints are
> disabled.
> 
> "even though there were no endxferx issued on some EPs", which EPs are
> you referring to? If there's no End Transfer issued while the endpoints
> are active and started during disconnect, we need to fix that in the
> driver.
> 

Sorry let me clarify.  When I said there were no endxfers issued, I 
meant that they were pending (DWC3_EP_DELAY_STOP is set for those EPs). 
  However, since the host wasn't moving the EP0 state forward, we never 
moved back to the SETUP phase, which is where we flush any pending end 
transfers.

void dwc3_ep0_out_start(struct dwc3 *dwc)
{
...
	for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
		struct dwc3_ep *dwc3_ep;

		dwc3_ep = dwc->eps[i];
		if (!dwc3_ep)
			continue;

		if (!(dwc3_ep->flags & DWC3_EP_DELAY_STOP))
			continue;

		dwc3_ep->flags &= ~DWC3_EP_DELAY_STOP;
		if (dwc->connected)
			dwc3_stop_active_transfer(dwc3_ep, true, true);
		else
			dwc3_remove_requests(dwc, dwc3_ep, -ESHUTDOWN);
	}
}

This is part of the reason for moving the wait_for_completion() call 
until AFTER the stop active transfers, since that is the point at which 
we could potentially set the DWC3_EP_DELAY_STOP.  If there is a host not 
moving the EP0 state, then we can at least utilize the timeout path to 
force EP0 back to the setup phase.

Thanks
Wesley Cheng
Thinh Nguyen April 5, 2023, 6:49 p.m. UTC | #3
On Tue, Apr 04, 2023, Wesley Cheng wrote:
> Hi Thinh,
> 
> On 4/4/2023 3:16 PM, Thinh Nguyen wrote:
> > On Tue, Apr 04, 2023, Wesley Cheng wrote:
> > > Hi Thinh,
> > > 
> > > On 4/3/2023 6:11 PM, Thinh Nguyen wrote:
> > > > On Fri, Mar 31, 2023, Wesley Cheng wrote:
> > > > > It was observed that there are hosts that may complete pending SETUP
> > > > > transactions before the stop active transfers and controller halt occurs,
> > > > > leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
> > > > > start iterations.
> > > > 
> > > > Can you clarify this a bit further? Even though the controller is
> > > > halted, you still observed activity?
> > > > 
> > > 
> > > Yes...I didn't understand how that was possible either, but traces clearly
> > > showed that the controller halt was successful even though there were no
> > > endxfers issued on some EPs.  Although, I can't say for certain if those EPs
> > > were actively being used at that time.
> > > 
> > 
> > The controller should only be halted after the (non-ep0) endpoints are
> > disabled.
> > 
> > "even though there were no endxferx issued on some EPs", which EPs are
> > you referring to? If there's no End Transfer issued while the endpoints
> > are active and started during disconnect, we need to fix that in the
> > driver.
> > 
> 
> Sorry let me clarify.  When I said there were no endxfers issued, I meant
> that they were pending (DWC3_EP_DELAY_STOP is set for those EPs).  However,
> since the host wasn't moving the EP0 state forward, we never moved back to
> the SETUP phase, which is where we flush any pending end transfers.
> 
> void dwc3_ep0_out_start(struct dwc3 *dwc)
> {
> ...
> 	for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
> 		struct dwc3_ep *dwc3_ep;
> 
> 		dwc3_ep = dwc->eps[i];
> 		if (!dwc3_ep)
> 			continue;
> 
> 		if (!(dwc3_ep->flags & DWC3_EP_DELAY_STOP))
> 			continue;
> 
> 		dwc3_ep->flags &= ~DWC3_EP_DELAY_STOP;
> 		if (dwc->connected)
> 			dwc3_stop_active_transfer(dwc3_ep, true, true);
> 		else
> 			dwc3_remove_requests(dwc, dwc3_ep, -ESHUTDOWN);
> 	}
> }
> 
> This is part of the reason for moving the wait_for_completion() call until
> AFTER the stop active transfers, since that is the point at which we could
> potentially set the DWC3_EP_DELAY_STOP.  If there is a host not moving the
> EP0 state, then we can at least utilize the timeout path to force EP0 back
> to the setup phase.
> 

Thanks for the clarification. If I understand correctly, the issue here
is the missing handling of the timeout of the wait for control transfer
completion. This can happen if we enter flow control and/or if the host
delays polling for data. If it times out, then we'll have problems as
you mention above.

In that case, we can halt the control endpoint causing it to respond
with a STALL to the next host polling of data, cancelling the control
transfer. This is what you did with dwc3_ep0_reset_state() on timeout
right?

I think this should work as the End Transfer commands should complete
before the controller is halted.

Thanks,
Thinh
Wesley Cheng April 6, 2023, 12:10 a.m. UTC | #4
Hi Thinh,

On 4/5/2023 11:49 AM, Thinh Nguyen wrote:
> On Tue, Apr 04, 2023, Wesley Cheng wrote:
>> Hi Thinh,
>>
>> On 4/4/2023 3:16 PM, Thinh Nguyen wrote:
>>> On Tue, Apr 04, 2023, Wesley Cheng wrote:
>>>> Hi Thinh,
>>>>
>>>> On 4/3/2023 6:11 PM, Thinh Nguyen wrote:
>>>>> On Fri, Mar 31, 2023, Wesley Cheng wrote:
>>>>>> It was observed that there are hosts that may complete pending SETUP
>>>>>> transactions before the stop active transfers and controller halt occurs,
>>>>>> leading to lingering endxfer commands on DEPs on subsequent pullup/gadget
>>>>>> start iterations.
>>>>>
>>>>> Can you clarify this a bit further? Even though the controller is
>>>>> halted, you still observed activity?
>>>>>
>>>>
>>>> Yes...I didn't understand how that was possible either, but traces clearly
>>>> showed that the controller halt was successful even though there were no
>>>> endxfers issued on some EPs.  Although, I can't say for certain if those EPs
>>>> were actively being used at that time.
>>>>
>>>
>>> The controller should only be halted after the (non-ep0) endpoints are
>>> disabled.
>>>
>>> "even though there were no endxferx issued on some EPs", which EPs are
>>> you referring to? If there's no End Transfer issued while the endpoints
>>> are active and started during disconnect, we need to fix that in the
>>> driver.
>>>
>>
>> Sorry let me clarify.  When I said there were no endxfers issued, I meant
>> that they were pending (DWC3_EP_DELAY_STOP is set for those EPs).  However,
>> since the host wasn't moving the EP0 state forward, we never moved back to
>> the SETUP phase, which is where we flush any pending end transfers.
>>
>> void dwc3_ep0_out_start(struct dwc3 *dwc)
>> {
>> ...
>> 	for (i = 2; i < DWC3_ENDPOINTS_NUM; i++) {
>> 		struct dwc3_ep *dwc3_ep;
>>
>> 		dwc3_ep = dwc->eps[i];
>> 		if (!dwc3_ep)
>> 			continue;
>>
>> 		if (!(dwc3_ep->flags & DWC3_EP_DELAY_STOP))
>> 			continue;
>>
>> 		dwc3_ep->flags &= ~DWC3_EP_DELAY_STOP;
>> 		if (dwc->connected)
>> 			dwc3_stop_active_transfer(dwc3_ep, true, true);
>> 		else
>> 			dwc3_remove_requests(dwc, dwc3_ep, -ESHUTDOWN);
>> 	}
>> }
>>
>> This is part of the reason for moving the wait_for_completion() call until
>> AFTER the stop active transfers, since that is the point at which we could
>> potentially set the DWC3_EP_DELAY_STOP.  If there is a host not moving the
>> EP0 state, then we can at least utilize the timeout path to force EP0 back
>> to the setup phase.
>>
> 
> Thanks for the clarification. If I understand correctly, the issue here
> is the missing handling of the timeout of the wait for control transfer
> completion. This can happen if we enter flow control and/or if the host
> delays polling for data. If it times out, then we'll have problems as
> you mention above.
> 
> In that case, we can halt the control endpoint causing it to respond
> with a STALL to the next host polling of data, cancelling the control
> transfer. This is what you did with dwc3_ep0_reset_state() on timeout
> right?
> 

Yep, that is correct.  In addition, I also made sure to call gadget stop 
at the end of the soft disconnect API, since we wanted to still service 
EP0 events while polling for the controller halt, right?  If we call 
gadget stop, then we clear the event enable register, so no events are 
going to occur:

static void __dwc3_gadget_stop(struct dwc3 *dwc)
{
	dwc3_gadget_disable_irq(dwc);
	__dwc3_gadget_ep_disable(dwc->eps[0]);
	__dwc3_gadget_ep_disable(dwc->eps[1]);
}

static void dwc3_gadget_disable_irq(struct dwc3 *dwc)
{
	/* mask all interrupts */
	dwc3_writel(dwc->regs, DWC3_DEVTEN, 0x00);
}

I think I'll split this change up as well from the timeout related one.

> I think this should work as the End Transfer commands should complete
> before the controller is halted.
> 

Yes, anyway we poll for the halted status, and in the changes for the 
endxfer timeouts we increased the poll time to allow for them to complete.

Thanks
Wesley Cheng
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 3c63fa97a680..9715de8e99bc 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -139,6 +139,24 @@  int dwc3_gadget_set_link_state(struct dwc3 *dwc, enum dwc3_link_state state)
 	return -ETIMEDOUT;
 }
 
+static void dwc3_ep0_reset_state(struct dwc3 *dwc)
+{
+	unsigned int	dir;
+
+	if (dwc->ep0state != EP0_SETUP_PHASE) {
+		dir = !!dwc->ep0_expect_in;
+		if (dwc->ep0state == EP0_DATA_PHASE)
+			dwc3_ep0_end_control_data(dwc, dwc->eps[dir]);
+		else
+			dwc3_ep0_end_control_data(dwc, dwc->eps[!dir]);
+
+		dwc->eps[0]->trb_enqueue = 0;
+		dwc->eps[1]->trb_enqueue = 0;
+
+		dwc3_ep0_stall_and_restart(dwc);
+	}
+}
+
 /**
  * dwc3_ep_inc_trb - increment a trb index.
  * @index: Pointer to the TRB index to increment.
@@ -2528,29 +2546,17 @@  static int __dwc3_gadget_start(struct dwc3 *dwc);
 static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
 {
 	unsigned long flags;
+	int ret;
 
 	spin_lock_irqsave(&dwc->lock, flags);
 	dwc->connected = false;
 
 	/*
-	 * Per databook, when we want to stop the gadget, if a control transfer
-	 * is still in process, complete it and get the core into setup phase.
+	 * Attempt to end pending SETUP status phase, and not wait for the
+	 * function to do so.
 	 */
-	if (dwc->ep0state != EP0_SETUP_PHASE) {
-		int ret;
-
-		if (dwc->delayed_status)
-			dwc3_ep0_send_delayed_status(dwc);
-
-		reinit_completion(&dwc->ep0_in_setup);
-
-		spin_unlock_irqrestore(&dwc->lock, flags);
-		ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
-				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
-		spin_lock_irqsave(&dwc->lock, flags);
-		if (ret == 0)
-			dev_warn(dwc->dev, "timed out waiting for SETUP phase\n");
-	}
+	if (dwc->delayed_status)
+		dwc3_ep0_send_delayed_status(dwc);
 
 	/*
 	 * In the Synopsys DesignWare Cores USB3 Databook Rev. 3.30a
@@ -2560,9 +2566,28 @@  static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
 	 * bit.
 	 */
 	dwc3_stop_active_transfers(dwc);
-	__dwc3_gadget_stop(dwc);
 	spin_unlock_irqrestore(&dwc->lock, flags);
 
+	/*
+	 * Per databook, when we want to stop the gadget, if a control transfer
+	 * is still in process, complete it and get the core into setup phase.
+	 * In case the host is unresponsive to a SETUP transaction, forcefully
+	 * stall the transfer, and move back to the SETUP phase, so that any
+	 * pending endxfers can be executed.
+	 */
+	if (dwc->ep0state != EP0_SETUP_PHASE) {
+		reinit_completion(&dwc->ep0_in_setup);
+
+		ret = wait_for_completion_timeout(&dwc->ep0_in_setup,
+				msecs_to_jiffies(DWC3_PULL_UP_TIMEOUT));
+		if (ret == 0) {
+			dev_warn(dwc->dev, "wait for SETUP phase timed out\n");
+			spin_lock_irqsave(&dwc->lock, flags);
+			dwc3_ep0_reset_state(dwc);
+			spin_unlock_irqrestore(&dwc->lock, flags);
+		}
+	}
+
 	/*
 	 * Note: if the GEVNTCOUNT indicates events in the event buffer, the
 	 * driver needs to acknowledge them before the controller can halt.
@@ -2570,7 +2595,19 @@  static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc)
 	 * remaining event generated by the controller while polling for
 	 * DSTS.DEVCTLHLT.
 	 */
-	return dwc3_gadget_run_stop(dwc, false, false);
+	ret = dwc3_gadget_run_stop(dwc, false, false);
+
+	/*
+	 * Stop the gadget after controller is halted, so that if needed, the
+	 * events to update EP0 state can still occur while the run/stop
+	 * routine polls for the halted state.  DEVTEN is cleared as part of
+	 * gadget stop.
+	 */
+	spin_lock_irqsave(&dwc->lock, flags);
+	__dwc3_gadget_stop(dwc);
+	spin_unlock_irqrestore(&dwc->lock, flags);
+
+	return ret;
 }
 
 static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on)
@@ -3821,16 +3858,7 @@  static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc)
 	dwc->setup_packet_pending = false;
 	usb_gadget_set_state(dwc->gadget, USB_STATE_NOTATTACHED);
 
-	if (dwc->ep0state != EP0_SETUP_PHASE) {
-		unsigned int    dir;
-
-		dir = !!dwc->ep0_expect_in;
-		if (dwc->ep0state == EP0_DATA_PHASE)
-			dwc3_ep0_end_control_data(dwc, dwc->eps[dir]);
-		else
-			dwc3_ep0_end_control_data(dwc, dwc->eps[!dir]);
-		dwc3_ep0_stall_and_restart(dwc);
-	}
+	dwc3_ep0_reset_state(dwc);
 }
 
 static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
@@ -3884,20 +3912,7 @@  static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc)
 	 * phase. So ensure that EP0 is in setup phase by issuing a stall
 	 * and restart if EP0 is not in setup phase.
 	 */
-	if (dwc->ep0state != EP0_SETUP_PHASE) {
-		unsigned int	dir;
-
-		dir = !!dwc->ep0_expect_in;
-		if (dwc->ep0state == EP0_DATA_PHASE)
-			dwc3_ep0_end_control_data(dwc, dwc->eps[dir]);
-		else
-			dwc3_ep0_end_control_data(dwc, dwc->eps[!dir]);
-
-		dwc->eps[0]->trb_enqueue = 0;
-		dwc->eps[1]->trb_enqueue = 0;
-
-		dwc3_ep0_stall_and_restart(dwc);
-	}
+	dwc3_ep0_reset_state(dwc);
 
 	/*
 	 * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a