usb: dwc3: Prevent indefinite sleep in _dwc3_set_mode during suspend/resume

Message ID 1519730526-22274-1-git-send-email-rogerq@ti.com
State New
Series usb: dwc3: Prevent indefinite sleep in _dwc3_set_mode during suspend/resume

Commit Message

Roger Quadros Feb. 27, 2018, 11:22 a.m. UTC
In the following test, we get stuck sleeping forever in _dwc3_set_mode(),
after which dual-role switching no longer works.

On dra7-evm's dual-role port:
- Load the g_zero gadget driver and enumerate to the host
- Suspend to mem
- Disconnect the USB cable from the host and connect an OTG cable with a
  pen drive in it
- Resume the system
- We sleep indefinitely in _dwc3_set_mode() due to:
  dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
	dwc3_gadget_stop()->wait_event_lock_irq()

Let's clear the DWC3_EP_END_TRANSFER_PENDING flag on all endpoints
so we don't wait in dwc3_gadget_stop().
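As a sketch of why the stop path can hang, here is a small userspace model. It is not driver code. `struct model_ep`, `MODEL_ENDPOINTS_NUM` and the flag value are hypothetical stand-ins for `struct dwc3_ep`, `DWC3_ENDPOINTS_NUM` and `DWC3_EP_END_TRANSFER_PENDING`. The model assumes, as described above, that no End Transfer completion arrives once the host is gone. It also replaces the unbounded `wait_event_lock_irq()` with a capped poll, so it can report which endpoints would sleep forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real definitions live in drivers/usb/dwc3. */
#define MODEL_ENDPOINTS_NUM           8
#define MODEL_EP_END_TRANSFER_PENDING (1u << 4)

struct model_ep {
	unsigned int flags;
	bool host_connected; /* false: no EPCMDCMPLT will ever arrive */
};

/* Models the command complete handler: the only place the flag clears. */
void model_cmd_complete_irq(struct model_ep *dep)
{
	if (dep->host_connected)
		dep->flags &= ~MODEL_EP_END_TRANSFER_PENDING;
}

/*
 * Models the per-endpoint wait in dwc3_gadget_stop(). The real code
 * sleeps without a timeout; here each wait is capped at max_polls so
 * the model can count how many endpoints would hang.
 */
int model_gadget_stop(struct model_ep *eps[], int max_polls)
{
	int hung = 0;

	for (int epnum = 2; epnum < MODEL_ENDPOINTS_NUM; epnum++) {
		struct model_ep *dep = eps[epnum];
		int polls = 0;

		if (!dep)
			continue;
		while ((dep->flags & MODEL_EP_END_TRANSFER_PENDING) &&
		       polls++ < max_polls)
			model_cmd_complete_irq(dep);
		if (dep->flags & MODEL_EP_END_TRANSFER_PENDING)
			hung++;
	}
	return hung;
}

/* Models the patch: clear the flag on every endpoint before udc_stop(). */
void model_clear_pending(struct model_ep *eps[])
{
	for (int epnum = 2; epnum < MODEL_ENDPOINTS_NUM; epnum++)
		if (eps[epnum])
			eps[epnum]->flags &= ~MODEL_EP_END_TRANSFER_PENDING;
}
```

In the model, an endpoint whose host has gone never clears its flag, so the unpatched stop path reports it as hung. Clearing the flag beforehand, as the patch does in dwc3_gadget_exit(), turns the wait loop into a no-op.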

Signed-off-by: Roger Quadros <rogerq@ti.com>

---
 drivers/usb/dwc3/gadget.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

-- 
cheers,
-roger

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

Comments

(Exiting) Baolin Wang Feb. 28, 2018, 3:04 a.m. UTC | #1
Hi Roger,

On 27 February 2018 at 19:22, Roger Quadros <rogerq@ti.com> wrote:
> Let's clear the DWC3_EP_END_TRANSFER_PENDING flag on all endpoints
> so we don't wait in dwc3_gadget_stop().

I am curious: why was the DWC3_DEPEVT_EPCMDCMPLT event no longer triggered
after you issued the DWC3_DEPCMD_ENDTRANSFER command?

>
> Signed-off-by: Roger Quadros <rogerq@ti.com>
> ---
>  drivers/usb/dwc3/gadget.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 2bda4eb..0a360da 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -3273,6 +3273,20 @@ int dwc3_gadget_init(struct dwc3 *dwc)
>
>  void dwc3_gadget_exit(struct dwc3 *dwc)
>  {
> +       int epnum;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&dwc->lock, flags);
> +       for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
> +               struct dwc3_ep  *dep = dwc->eps[epnum];
> +
> +               if (!dep)
> +                       continue;
> +
> +               dep->flags &= ~DWC3_EP_END_TRANSFER_PENDING;
> +       }
> +       spin_unlock_irqrestore(&dwc->lock, flags);
> +
>         usb_del_gadget_udc(&dwc->gadget);
>         dwc3_gadget_free_endpoints(dwc);
>         dma_free_coherent(dwc->sysdev, DWC3_BOUNCE_SIZE, dwc->bounce,
> --

-- 
Baolin.wang
Best Regards
Felipe Balbi Feb. 28, 2018, 7:53 a.m. UTC | #2
Hi,

Roger Quadros <rogerq@ti.com> writes:
>  void dwc3_gadget_exit(struct dwc3 *dwc)
>  {
> +	int epnum;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&dwc->lock, flags);
> +	for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
> +		struct dwc3_ep  *dep = dwc->eps[epnum];
> +
> +		if (!dep)
> +			continue;
> +
> +		dep->flags &= ~DWC3_EP_END_TRANSFER_PENDING;
> +	}
> +	spin_unlock_irqrestore(&dwc->lock, flags);
> +
>  	usb_del_gadget_udc(&dwc->gadget);
>  	dwc3_gadget_free_endpoints(dwc);

dwc3_gadget_free_endpoints() is a better place for this. It's going to free
the memory anyway, so we might as well clear all the flags to 0 there.

-- 
balbi
Roger Quadros Feb. 28, 2018, 9:55 a.m. UTC | #3
Hi Baolin,

On 28/02/18 05:04, Baolin Wang wrote:
> I am curious: why was the DWC3_DEPEVT_EPCMDCMPLT event no longer triggered
> after you issued the DWC3_DEPCMD_ENDTRANSFER command?

In this particular case the USB gadget has been disconnected from the host,
so we shouldn't be expecting any command completion events.

-- 
cheers,
-roger

Felipe Balbi March 5, 2018, 8:49 a.m. UTC | #4
Hi,

Roger Quadros <rogerq@ti.com> writes:
>> dwc3_gadget_free_endpoints() is a better place for this. It's going to free
>> the memory anyway, so we might as well clear all the flags to 0 there.
>
> But it won't solve the deadlock issue, since dwc3_gadget_free_endpoints()
> is called after usb_del_gadget_udc(), and the deadlock happens when
>
> usb_del_gadget_udc()->udc_stop()->dwc3_gadget_stop()->wait_event_lock_irq()
>
> is called while the DWC3_EP_END_TRANSFER_PENDING flag is set.

Indeed. Iterating twice over the entire endpoint list seems
wasteful. Perhaps we just shouldn't wait when removing the UDC, since
that's essentially what this patch does, right? If you clear the flag
before calling ->udc_stop(), the loop in dwc3_gadget_stop() will do
nothing. Might as well remove it.

-- 
balbi
(Exiting) Baolin Wang March 5, 2018, 10:41 a.m. UTC | #5
Hi Roger,

On 5 March 2018 at 17:45, Roger Quadros <rogerq@ti.com> wrote:
> Felipe,
>
> On 05/03/18 10:49, Felipe Balbi wrote:
>> Indeed. Iterating twice over the entire endpoint list seems
>> wasteful. Perhaps we just shouldn't wait when removing the UDC, since
>> that's essentially what this patch does, right? If you clear the flag
>> before calling ->udc_stop(), the loop in dwc3_gadget_stop() will do
>> nothing. Might as well remove it.
>>
>
> This means that we will never wait for DWC3_EP_END_TRANSFER_PENDING to clear
> in dwc3_gadget_stop() like we used to. This is perfectly fine, right?
>
> It makes sense to me, as dwc3_gadget_stop() calls __dwc3_gadget_stop(), which
> masks all interrupts, and nobody will ever clear that flag if it was set.

I don't think so. It cannot mask the endpoint events; please check
which events the DEVTEN register masks. The reason we should wait for
DWC3_EP_END_TRANSFER_PENDING to clear is that the DWC3_DEPEVT_EPCMDCMPLT
event is sometimes triggered later than 100us, by which time we may
already have freed the gadget IRQ, which would cause a crash.

-- 
Baolin.wang
Best Regards
Felipe Balbi March 5, 2018, 11:06 a.m. UTC | #6
Hi,

Baolin Wang <baolin.wang@linaro.org> writes:
> I don't think so. It cannot mask the endpoint events; please check
> which events the DEVTEN register masks. The reason we should wait for
> DWC3_EP_END_TRANSFER_PENDING to clear is that the DWC3_DEPEVT_EPCMDCMPLT
> event is sometimes triggered later than 100us, by which time we may
> already have freed the gadget IRQ, which would cause a crash.

We could mask command complete events as soon as ->udc_stop() is called,
right? Hmm, actually, __dwc3_gadget_stop() already clears DEVTEN
completely.

/me goes to check the databook

At least in revision 2.60a of the databook, bit 10 is reserved. I wonder
if that's the start of all the problems. Does anybody have access to older
and newer databook revisions so we can cross-check?

best

-- 
balbi
Roger Quadros March 5, 2018, 11:14 a.m. UTC | #7
On 05/03/18 13:06, Felipe Balbi wrote:
> We could mask command complete events as soon as ->udc_stop() is called,
> right? Hmm, actually, __dwc3_gadget_stop() already clears DEVTEN
> completely.

But which bit in DEVTEN says Endpoint events are disabled?

>
> /me goes to check the databook
>
> At least in revision 2.60a of the databook, bit 10 is reserved. I wonder
> if that's the start of all the problems. Does anybody have access to older
> and newer databook revisions so we can cross-check?

I have access to the v2.40 and v3.10 databooks.

Bit 10 is reserved in both.

The differences between v2.40 and v3.10 are:

bit 8	reserved	vs	L1SUSPEN
bit 13	reserved	vs	StopOnDisconnectEn
bit 14	reserved	vs	L1WKUPEVTEN

-- 
cheers,
-roger

(Exiting) Baolin Wang March 5, 2018, 11:25 a.m. UTC | #8
On 5 March 2018 at 19:14, Roger Quadros <rogerq@ti.com> wrote:
> On 05/03/18 13:06, Felipe Balbi wrote:

>> We could mask command complete events as soon as ->udc_stop() is called,
>> right? Hmm, actually, __dwc3_gadget_stop() already clears DEVTEN
>> completely.
>
> But which bit in DEVTEN says Endpoint events are disabled?

When we set up the DWC3_DEPCMD_ENDTRANSFER command in
dwc3_stop_active_transfer(), we could skip setting DWC3_DEPCMD_CMDIOC;
then there would be no endpoint command complete interrupts, I think.

cmd |= DWC3_DEPCMD_CMDIOC;
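Here is a sketch of what this suggestion changes in the command word. The `MODEL_` macros are hypothetical copies. The CmdIOC position (bit 8) matches the databook text quoted later in this thread. The End Transfer opcode (0x08) and the resource-index shift (16) are assumptions about drivers/usb/dwc3/core.h and should be checked there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copies of DEPCMD fields; verify against core.h. */
#define MODEL_DEPCMD_ENDTRANSFER 0x08u                  /* assumed opcode */
#define MODEL_DEPCMD_CMDIOC      (1u << 8)              /* CmdIOC, "field 8" */
#define MODEL_DEPCMD_PARAM(x)    ((uint32_t)(x) << 16)  /* assumed shift */

/*
 * Builds an End Transfer command word. with_ioc = false models dropping
 * CMDIOC, so no Endpoint Command Complete event would be requested.
 */
uint32_t model_endxfer_cmd(uint8_t resource_index, bool with_ioc)
{
	uint32_t cmd = MODEL_DEPCMD_ENDTRANSFER;

	if (with_ioc)
		cmd |= MODEL_DEPCMD_CMDIOC;
	cmd |= MODEL_DEPCMD_PARAM(resource_index);
	return cmd;
}
```

The two variants differ only in bit 8. Whether the controller tolerates its absence is exactly what the v3.10 databook text quoted later in the thread argues against.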

-- 
Baolin.wang
Best Regards
Felipe Balbi March 5, 2018, 11:27 a.m. UTC | #9
Hi,

Baolin Wang <baolin.wang@linaro.org> writes:
> When we set up the DWC3_DEPCMD_ENDTRANSFER command in
> dwc3_stop_active_transfer(), we could skip setting DWC3_DEPCMD_CMDIOC;
> then there would be no endpoint command complete interrupts, I think.
>
> cmd |= DWC3_DEPCMD_CMDIOC;

I remember some part of the databook mandating that CMDIOC be set. We
could test without it and see if anything blows up. I would, however,
require a lengthy comment explaining that we're deviating from databook
revision x.yya, section foobar because $reasons. :-)

-- 
balbi
Felipe Balbi March 9, 2018, 9:23 a.m. UTC | #10
Hi,

Roger Quadros <rogerq@ti.com> writes:

<snip>

>> I remember some part of the databook mandating that CMDIOC be set. We
>> could test without it and see if anything blows up. I would, however,
>> require a lengthy comment explaining that we're deviating from databook
>> revision x.yya, section foobar because $reasons. :-)
>>
>
> This is what the v3.10 databook says:
>
> "When issuing an End Transfer command, software must set the CmdIOC
> bit (field 8) so that an Endpoint Command Complete event is generated
> after the transfer ends. This is necessary to synchronize the
> conclusion of system bus traffic before the End Transfer command is
> completed."
>
> with a note:
>
> "If GUCTL2[Rst_actbitlater] is set, Software can poll the completion
> of the End Transfer command by polling the command active bit to be
> cleared to 0."
>
> fyi.
>
> Rst_actbitlater - "Enable clearing of the command active bit for the
> ENDXFER command after the command execution is completed.  This bit is
> valid in device mode only."
>
> So I'd prefer not to clear CMDIOC for all cases.
>
> Could we somehow just tackle the dwc3_gadget_exit case like I did in
> this patch?

If you can send a version that doesn't iterate over all endpoints twice,
sure. We still need a comment somewhere, and I fear we may still get
interrupts later in some cases. How would we deal with that?
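The GUCTL2[Rst_actbitlater] note quoted above describes a way to learn that End Transfer has finished without an interrupt: poll the command-active bit. Below is a minimal sketch of a bounded poll, offered as an illustration rather than a proposal from the thread. It assumes the active bit sits at bit 10 of the endpoint command register (check core.h). It also uses a hypothetical fake register in place of the driver's register accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of the command-active (CmdAct) bit in DEPCMD. */
#define MODEL_DEPCMD_CMDACT (1u << 10)

/* Hypothetical register accessor, so the poll loop can be exercised. */
typedef uint32_t (*model_reg_read_fn)(void *ctx);

/*
 * Polls until CmdAct clears, giving up after max_polls reads.
 * Only meaningful when GUCTL2[Rst_actbitlater] is set.
 * Returns 0 on completion, -1 on timeout.
 */
int model_poll_endxfer_done(model_reg_read_fn read, void *ctx,
			    unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!(read(ctx) & MODEL_DEPCMD_CMDACT))
			return 0;
		/* A driver would delay briefly between reads here. */
	}
	return -1;
}

/* Fake register: reports CmdAct set for a fixed number of reads. */
struct model_fake_depcmd {
	unsigned int busy_reads;
};

uint32_t model_fake_read(void *ctx)
{
	struct model_fake_depcmd *reg = ctx;

	if (reg->busy_reads == 0)
		return 0;
	reg->busy_reads--;
	return MODEL_DEPCMD_CMDACT;
}
```

A bounded poll cannot hang the way wait_event_lock_irq() did. However, it only reflects real completion on cores where Rst_actbitlater exists and is set.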

-- 
balbi
Roger Quadros March 9, 2018, 9:49 a.m. UTC | #11
On 09/03/18 11:26, Roger Quadros wrote:
> On 09/03/18 11:23, Felipe Balbi wrote:

>> if you can send a version that doesn't iterate over all endpoints twice,

>> sure. We still need a comment somewhere, and I fear we may get

>> interrupts later in some cases. How would we deal with that?

>>

> 

> how about explicitly masking that interrupt? Is it possible?

> 


Another easy option is to use wait_event_interruptible_lock_irq_timeout()
instead of wait_event_lock_irq() in dwc3_gadget_stop().

Is a 200ms timeout sufficient? And after the first timeout we assume all
will time out, so there is no point in waiting 200ms for each endpoint.

-- 
cheers,
-roger

Felipe Balbi March 9, 2018, 10:36 a.m. UTC | #12
Hi,

Roger Quadros <rogerq@ti.com> writes:
>>> This is what the v3.10 databook says
>>>
>>> "When issuing an End Transfer command, software must set the CmdIOC
>>> bit (field 8) so that an Endpoint Command Complete event is generated
>>> after the transfer ends. This is necessary to synchronize the
>>> conclusion of system bus traffic before the End Transfer command is
>>> completed."
>>>
>>> with a note
>>>
>>> "If GUCTL2[Rst_actbitlater] is set, Software can poll the completion
>>> of the End Transfer command by polling the command active bit to be
>>> cleared to 0."
>>>
>>> fyi.
>>>
>>> Rst_actbitlater - "Enable clearing of the command active bit for the
>>> ENDXFER command after the command execution is completed.  This bit is
>>> valid in device mode only."
>>>
>>> So I'd prefer not to clear CMDIOC for all cases.
>>>
>>> Could we some how just tackle the dwc3_gadget_exit case like I did in
>>> this patch?
>> 
>> if you can send a version that doesn't iterate over all endpoints twice,
>> sure. We still need a comment somewhere, and I fear we may get
>> interrupts later in some cases. How would we deal with that?
>> 
>
> how about explicitly masking that interrupt? Is it possible?

I think I showed that the bit is reserved on recent dwc3 core releases
(anything 2.40a+, at least).

-- 
balbi
Felipe Balbi March 9, 2018, 10:39 a.m. UTC | #13
Hi,

Roger Quadros <rogerq@ti.com> writes:
>>>>>> When we set up the DWC3_DEPCMD_ENDTRANSFER command in
>>>>>> dwc3_stop_active_transfer(), we can do not set DWC3_DEPCMD_CMDIOC,
>>>>>> then there will no endpoint command complete interrupts I think.
>>>>>>
>>>>>> cmd |= DWC3_DEPCMD_CMDIOC;
>>>>>
>>>>> I remember some part of the databook mandating CMDIOC to be set. We
>>>>> could test it out without and see if anything blows up. I would,
>>>>> however, require a lengthy comment explaining that we're deviating from
>>>>> databook revision x.yya, section foobar because $reasons. :-)
>>>>>
>>>>
>>>> This is what the v3.10 databook says
>>>>
>>>> "When issuing an End Transfer command, software must set the CmdIOC
>>>> bit (field 8) so that an Endpoint Command Complete event is generated
>>>> after the transfer ends. This is necessary to synchronize the
>>>> conclusion of system bus traffic before the End Transfer command is
>>>> completed."
>>>>
>>>> with a note
>>>>
>>>> "If GUCTL2[Rst_actbitlater] is set, Software can poll the completion
>>>> of the End Transfer command by polling the command active bit to be
>>>> cleared to 0."
>>>>
>>>> fyi.
>>>>
>>>> Rst_actbitlater - "Enable clearing of the command active bit for the
>>>> ENDXFER command after the command execution is completed.  This bit is
>>>> valid in device mode only."
>>>>
>>>> So I'd prefer not to clear CMDIOC for all cases.
>>>>
>>>> Could we some how just tackle the dwc3_gadget_exit case like I did in
>>>> this patch?
>>>
>>> if you can send a version that doesn't iterate over all endpoints twice,
>>> sure. We still need a comment somewhere, and I fear we may get
>>> interrupts later in some cases. How would we deal with that?
>>>
>> 
>> how about explicitly masking that interrupt? Is it possible?
>> 
>
> Other easy option is to use wait_event_interruptible_lock_irq_timeout()
> instead of wait_event_lock_irq() in dwc3_gadget_stop().
>
> Is a 200ms timeout sufficient? And after the first timeout we assume all
> will timeout so no point in waiting 200ms for each endpoint.

We can do that. And I think some 5ms is more than enough :-) I'd be
surprised if it takes anything over some 200us for the EndTransfer
command to complete.

-- 
balbi
Roger Quadros March 16, 2018, 10:34 a.m. UTC | #14
Hi Felipe,

On 09/03/18 14:47, Roger Quadros wrote:
> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
> after which dual-role switching doesn't work.
> 
> On dra7-evm's dual-role port,
> - Load g_zero gadget driver and enumerate to host
> - suspend to mem
> - disconnect USB cable to host and connect otg cable with Pen drive in it.
> - resume system
> - we sleep indefinitely in _dwc3_set_mode due to.
>   dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
> 	dwc3_gadget_stop()->wait_event_lock_irq()
> 
> To fix this instead of waiting indefinitely with wait_event_lock_irq()
> we use wait_event_interruptible_lock_irq_timeout() and print
> and error message if there was a timeout.
> 
> Signed-off-by: Roger Quadros <rogerq@ti.com>

Thanks for picking this up for -next.
Would it be better to have this in the v4.16-rc fixes,
and also in stable (v4.12+)?

> ---
> 
> Changelog:
> 
> v2:
> - use wait_event_interruptible_lock_irq_timeout() instead of wait_event_lock_irq()
> 
>  drivers/usb/dwc3/gadget.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 2bda4eb..7c3a6e4 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -1950,6 +1950,7 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
>  	struct dwc3		*dwc = gadget_to_dwc(g);
>  	unsigned long		flags;
>  	int			epnum;
> +	u32			tmo_eps = 0;
>  
>  	spin_lock_irqsave(&dwc->lock, flags);
>  
> @@ -1960,6 +1961,7 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
>  
>  	for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
>  		struct dwc3_ep  *dep = dwc->eps[epnum];
> +		int ret;
>  
>  		if (!dep)
>  			continue;
> @@ -1967,9 +1969,24 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
>  		if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
>  			continue;
>  
> -		wait_event_lock_irq(dep->wait_end_transfer,
> -				    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
> -				    dwc->lock);
> +		ret = wait_event_interruptible_lock_irq_timeout(dep->wait_end_transfer,
> +			    !(dep->flags & DWC3_EP_END_TRANSFER_PENDING),
> +			    dwc->lock, msecs_to_jiffies(5));
> +
> +		if (ret <= 0) {
> +			/* Timed out or interrupted! There's nothing much
> +			 * we can do so we just log here and print which
> +			 * endpoints timed out at the end.
> +			 */
> +			tmo_eps |= 1 << epnum;
> +			dep->flags &= DWC3_EP_END_TRANSFER_PENDING;
> +		}
> +	}
> +
> +	if (tmo_eps) {
> +		dev_err(dwc->dev,
> +			"end transfer timed out on endpoints 0x%x [bitmap]\n",
> +			tmo_eps);
>  	}
>  
>  out:
> 


-- 
cheers,
-roger

Felipe Balbi March 16, 2018, 11 a.m. UTC | #15
Hi,

Roger Quadros <rogerq@ti.com> writes:

> Hi Felipe,
>
> On 09/03/18 14:47, Roger Quadros wrote:
>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>> after which dual-role switching doesn't work.
>> 
>> On dra7-evm's dual-role port,
>> - Load g_zero gadget driver and enumerate to host
>> - suspend to mem
>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>> - resume system
>> - we sleep indefinitely in _dwc3_set_mode due to.
>>   dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>> 	dwc3_gadget_stop()->wait_event_lock_irq()
>> 
>> To fix this instead of waiting indefinitely with wait_event_lock_irq()
>> we use wait_event_interruptible_lock_irq_timeout() and print
>> and error message if there was a timeout.
>> 
>> Signed-off-by: Roger Quadros <rogerq@ti.com>
>
> Thanks for picking this for -next.
> Is it better to have this in v4.16-rc fixes?
> and also stable? v4.12+

Well, there were no "Fixes: foobar" or "Cc: stable" lines in the commit
log ;-)

The best we can do now is wait for -rc1 and manually send the commit to
stable.

-- 
balbi
Roger Quadros March 16, 2018, 11:03 a.m. UTC | #16
On 16/03/18 13:00, Felipe Balbi wrote:
> 
> Hi,
> 
> Roger Quadros <rogerq@ti.com> writes:
> 
>> Hi Felipe,
>>
>> On 09/03/18 14:47, Roger Quadros wrote:
>>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>>> after which dual-role switching doesn't work.
>>>
>>> On dra7-evm's dual-role port,
>>> - Load g_zero gadget driver and enumerate to host
>>> - suspend to mem
>>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>>> - resume system
>>> - we sleep indefinitely in _dwc3_set_mode due to.
>>>   dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>>> 	dwc3_gadget_stop()->wait_event_lock_irq()
>>>
>>> To fix this instead of waiting indefinitely with wait_event_lock_irq()
>>> we use wait_event_interruptible_lock_irq_timeout() and print
>>> and error message if there was a timeout.
>>>
>>> Signed-off-by: Roger Quadros <rogerq@ti.com>
>>
>> Thanks for picking this for -next.
>> Is it better to have this in v4.16-rc fixes?
>> and also stable? v4.12+
> 
> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
> log ;-)
> 
> The best we can do now, is wait for -rc1 and manually send the commit to
> stable.
> 

That's fine. Thanks.

-- 
cheers,
-roger

Minas Harutyunyan March 16, 2018, 11:43 a.m. UTC | #17
Hi,

On 3/16/2018 3:03 PM, Roger Quadros wrote:
> On 16/03/18 13:00, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Roger Quadros <rogerq@ti.com> writes:
>>
>>> Hi Felipe,
>>>
>>> On 09/03/18 14:47, Roger Quadros wrote:
>>>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>>>> after which dual-role switching doesn't work.
>>>>
>>>> On dra7-evm's dual-role port,
>>>> - Load g_zero gadget driver and enumerate to host
>>>> - suspend to mem
>>>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>>>> - resume system
>>>> - we sleep indefinitely in _dwc3_set_mode due to.
>>>>    dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>>>> 	dwc3_gadget_stop()->wait_event_lock_irq()
>>>>
>>>> To fix this instead of waiting indefinitely with wait_event_lock_irq()
>>>> we use wait_event_interruptible_lock_irq_timeout() and print
>>>> and error message if there was a timeout.
>>>>
>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>
>>>
>>> Thanks for picking this for -next.
>>> Is it better to have this in v4.16-rc fixes?
>>> and also stable? v4.12+
>>
>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>> log ;-)
>>
>> The best we can do now, is wait for -rc1 and manually send the commit to
>> stable.
>>
> 
> That's fine. Thanks.
> 

The same issue is seen in dwc3_gadget_ep_dequeue(), which also uses
wait_event_lock_irq(); as a result we get an infinite loop.
To fix this issue, I changed the condition of the wait function
from:
!(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
to:
!(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
I'm not sure this fix is fully correct, because I'm not familiar with dwc3,
but it allows us to go forward with the request dequeue. I think the
infinite loop needs a deeper investigation to find its root cause before
any of the fixes is accepted.

Thanks,
Minas
Felipe Balbi March 16, 2018, 12:25 p.m. UTC | #18
Hi,

Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>> On 09/03/18 14:47, Roger Quadros wrote:
>>>>> In the following test we get stuck by sleeping forever in _dwc3_set_mode()
>>>>> after which dual-role switching doesn't work.
>>>>>
>>>>> On dra7-evm's dual-role port,
>>>>> - Load g_zero gadget driver and enumerate to host
>>>>> - suspend to mem
>>>>> - disconnect USB cable to host and connect otg cable with Pen drive in it.
>>>>> - resume system
>>>>> - we sleep indefinitely in _dwc3_set_mode due to.
>>>>>    dwc3_gadget_exit()->usb_del_gadget_udc()->udc_stop()->
>>>>> 	dwc3_gadget_stop()->wait_event_lock_irq()
>>>>>
>>>>> To fix this instead of waiting indefinitely with wait_event_lock_irq()
>>>>> we use wait_event_interruptible_lock_irq_timeout() and print
>>>>> and error message if there was a timeout.
>>>>>
>>>>> Signed-off-by: Roger Quadros <rogerq@ti.com>
>>>>
>>>> Thanks for picking this for -next.
>>>> Is it better to have this in v4.16-rc fixes?
>>>> and also stable? v4.12+
>>>
>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>> log ;-)
>>>
>>> The best we can do now, is wait for -rc1 and manually send the commit to
>>> stable.
>>>
>> 
>> That's fine. Thanks.
>> 
>
> Same issue seen in dwc3_gadget_ep_dequeue() function where also used
> wait_event_lock_irq() - as result infinite loop.

how did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
a gadget driver?

> Actually to fix this issue I updated condition of wait function
> from:
> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
> to:
> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)

you're not fixing anything. You're, essentially, removing the entire
end transfer pending logic. The whole idea of this is that we can
disable the endpoint and wait for the End Transfer interrupt. When you
add a check for the endpoint being enabled, then that code will never
run and, thus, never wait for the End Transfer IRQ.

If you manage to find a more reliable way of reproducing this, then make
sure to capture dwc3 tracepoints (see the documentation for details) and
let's start trying to figure out what's going on.

cheers

-- 
balbi
Felipe Balbi March 19, 2018, 8:54 a.m. UTC | #19
Hi,

Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>>>> Thanks for picking this for -next.
>>>>>> Is it better to have this in v4.16-rc fixes?
>>>>>> and also stable? v4.12+
>>>>>
>>>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>>>> log ;-)
>>>>>
>>>>> The best we can do now, is wait for -rc1 and manually send the commit to
>>>>> stable.
>>>>>
>>>>
>>>> That's fine. Thanks.
>>>>
>>>
>>> Same issue seen in dwc3_gadget_ep_dequeue() function where also used
>>> wait_event_lock_irq() - as result infinite loop.
>> 
>> how did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
>> a gadget driver?
>> 
> No, not during rmmod's.
> We using our internal USB testing tool. Test case; ISOC OUT, transfer
> size N frames. When host starts ISOC OUT traffic then the dwc3 based on
> "Transfer not ready" event in frame F starts transfers staring from
> frame F+4 (for bInterval=1) as result 4 requests, which already queued
> on device side, remain incomplete. Function driver on some timeout
> trying dequeue these 4 requests (without disabling EP) to complete test.
> For IN ISOC's these requests completed on MISSED ISOC event, but for
> ISOC OUT required call dequeue on some timeout.

okay

>>> Actually to fix this issue I updated condition of wait function
>>> from:
>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>>> to:
>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
>> 
>> you're not fixing anything. You're, essentially, removing the entire
>> end transfer pending logic.
> yes, you are right, but how to overcome this infinite loop? Replace
> wait_event_lock_irq() by  wait_event_interruptible_lock_irq_timeout()?

The best way here would be to figure out why we're missing the command
complete IRQ in those cases. According to documentation, we *should*
receive that interrupt, so why is it missing?

-- 
balbi
Minas Harutyunyan March 19, 2018, 11:36 a.m. UTC | #20
Hi,

On 3/19/2018 12:55 PM, Felipe Balbi wrote:
> 
> Hi,
> 
> Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>>>>> Thanks for picking this for -next.
>>>>>>> Is it better to have this in v4.16-rc fixes?
>>>>>>> and also stable? v4.12+
>>>>>>
>>>>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>>>>> log ;-)
>>>>>>
>>>>>> The best we can do now, is wait for -rc1 and manually send the commit to
>>>>>> stable.
>>>>>>
>>>>>
>>>>> That's fine. Thanks.
>>>>>
>>>>
>>>> Same issue seen in dwc3_gadget_ep_dequeue() function where also used
>>>> wait_event_lock_irq() - as result infinite loop.
>>>
>>> how did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
>>> a gadget driver?
>>>
>> No, not during rmmod's.
>> We using our internal USB testing tool. Test case; ISOC OUT, transfer
>> size N frames. When host starts ISOC OUT traffic then the dwc3 based on
>> "Transfer not ready" event in frame F starts transfers staring from
>> frame F+4 (for bInterval=1) as result 4 requests, which already queued
>> on device side, remain incomplete. Function driver on some timeout
>> trying dequeue these 4 requests (without disabling EP) to complete test.
>> For IN ISOC's these requests completed on MISSED ISOC event, but for
>> ISOC OUT required call dequeue on some timeout.
> 
> okay
> 
>>>> Actually to fix this issue I updated condition of wait function
>>>> from:
>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>>>> to:
>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
>>>
>>> you're not fixing anything. You're, essentially, removing the entire
>>> end transfer pending logic.
>> yes, you are right, but how to overcome this infinite loop? Replace
>> wait_event_lock_irq() by  wait_event_interruptible_lock_irq_timeout()?
> 
> The best way here would be to figure why we're missing command complete
> IRQ in those cases. According to documentation, we *should* receive that
> interrupt, so why is it missing?
> 

Additional info on the test: the core is configured in HS-only mode, the
test speed is HS, and the core version is v2.90a. Maybe this will help to
understand the cause of the issue.
BTW, currently, to pass the ISOC OUT test described above, we just commented
out wait_event_lock_irq() in dwc3_gadget_ep_dequeue() and successfully
received the request completion in the function driver.
Thanks,
Minas
Minas Harutyunyan March 19, 2018, 1:53 p.m. UTC | #21
Hi,

On 3/19/2018 3:36 PM, Minas Harutyunyan wrote:
> Hi,
> 
> On 3/19/2018 12:55 PM, Felipe Balbi wrote:
>>
>> Hi,
>>
>> Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>>>>>> Thanks for picking this for -next.
>>>>>>>> Is it better to have this in v4.16-rc fixes?
>>>>>>>> and also stable? v4.12+
>>>>>>>
>>>>>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>>>>>> log ;-)
>>>>>>>
>>>>>>> The best we can do now, is wait for -rc1 and manually send the commit to
>>>>>>> stable.
>>>>>>>
>>>>>>
>>>>>> That's fine. Thanks.
>>>>>>
>>>>>
>>>>> Same issue seen in dwc3_gadget_ep_dequeue() function where also used
>>>>> wait_event_lock_irq() - as result infinite loop.
>>>>
>>>> how did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
>>>> a gadget driver?
>>>>
>>> No, not during rmmod's.
>>> We using our internal USB testing tool. Test case; ISOC OUT, transfer
>>> size N frames. When host starts ISOC OUT traffic then the dwc3 based on
>>> "Transfer not ready" event in frame F starts transfers staring from
>>> frame F+4 (for bInterval=1) as result 4 requests, which already queued
>>> on device side, remain incomplete. Function driver on some timeout
>>> trying dequeue these 4 requests (without disabling EP) to complete test.
>>> For IN ISOC's these requests completed on MISSED ISOC event, but for
>>> ISOC OUT required call dequeue on some timeout.
>>
>> okay
>>
>>>>> Actually to fix this issue I updated condition of wait function
>>>>> from:
>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>>>>> to:
>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
>>>>
>>>> you're not fixing anything. You're, essentially, removing the entire
>>>> end transfer pending logic.
>>> yes, you are right, but how to overcome this infinite loop? Replace
>>> wait_event_lock_irq() by  wait_event_interruptible_lock_irq_timeout()?
>>
>> The best way here would be to figure why we're missing command complete
>> IRQ in those cases. According to documentation, we *should* receive that
>> interrupt, so why is it missing?
>>
> 
> Additional info on test. Core configuration is HS only mode, test speed
> HS, core version v2.90a. Maybe it will help to understand cause of issue.
> BTW, currently to pass above describe ISOC OUT test we just commented
> wait_event_lock_irq() in dwc3_gadget_ep_dequeue() function and
> successfully received request completion in function driver.
> Thanks,
> Minas
> 

One more piece of info: while the function driver calls dequeue, the host
periodically sends a control read command to get the status of the test
from the function (test In Progress or Finished).
Thanks,
Minas
Minas Harutyunyan April 10, 2018, 6:29 a.m. UTC | #22
Hi Felipe,

On 3/19/2018 5:53 PM, Minas Harutyunyan wrote:
> Hi,
> 
> On 3/19/2018 3:36 PM, Minas Harutyunyan wrote:
>> Hi,
>>
>> On 3/19/2018 12:55 PM, Felipe Balbi wrote:
>>>
>>> Hi,
>>>
>>> Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>>>>>>> Thanks for picking this for -next.
>>>>>>>>> Is it better to have this in v4.16-rc fixes?
>>>>>>>>> and also stable? v4.12+
>>>>>>>>
>>>>>>>> Well, there was no "Fixes: foobar" or "Cc: stable" lines in the commit
>>>>>>>> log ;-)
>>>>>>>>
>>>>>>>> The best we can do now, is wait for -rc1 and manually send the commit to
>>>>>>>> stable.
>>>>>>>>
>>>>>>>
>>>>>>> That's fine. Thanks.
>>>>>>>
>>>>>>
>>>>>> Same issue seen in dwc3_gadget_ep_dequeue() function where also used
>>>>>> wait_event_lock_irq() - as result infinite loop.
>>>>>
>>>>> how did this happen? During rmmod dwc3? Or, perhaps, after you unloaded
>>>>> a gadget driver?
>>>>>
>>>> No, not during rmmod's.
>>>> We using our internal USB testing tool. Test case; ISOC OUT, transfer
>>>> size N frames. When host starts ISOC OUT traffic then the dwc3 based on
>>>> "Transfer not ready" event in frame F starts transfers staring from
>>>> frame F+4 (for bInterval=1) as result 4 requests, which already queued
>>>> on device side, remain incomplete. Function driver on some timeout
>>>> trying dequeue these 4 requests (without disabling EP) to complete test.
>>>> For IN ISOC's these requests completed on MISSED ISOC event, but for
>>>> ISOC OUT required call dequeue on some timeout.
>>>
>>> okay
>>>
>>>>>> Actually to fix this issue I updated condition of wait function
>>>>>> from:
>>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>>>>>> to:
>>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
>>>>>
>>>>> you're not fixing anything. You're, essentially, removing the entire
>>>>> end transfer pending logic.
>>>> yes, you are right, but how to overcome this infinite loop? Replace
>>>> wait_event_lock_irq() by  wait_event_interruptible_lock_irq_timeout()?
>>>
>>> The best way here would be to figure why we're missing command complete
>>> IRQ in those cases. According to documentation, we *should* receive that
>>> interrupt, so why is it missing?
>>>
>>
>> Additional info on test. Core configuration is HS only mode, test speed
>> HS, core version v2.90a. Maybe it will help to understand cause of issue.
>> BTW, currently to pass above describe ISOC OUT test we just commented
>> wait_event_lock_irq() in dwc3_gadget_ep_dequeue() function and
>> successfully received request completion in function driver.
>> Thanks,
>> Minas
>>
> 
> One more info: while function driver call dequeue, host periodically
> send control read command to get status of test from function - test In
> Progress or Finished.
> Thanks,
> Minas
> 

Your latest dwc3 patch series allows us to successfully dequeue the remaining
requests without falling into an infinite loop.

Thank you,
Minas
Felipe Balbi April 10, 2018, 7:31 a.m. UTC | #23
Hi,

Minas Harutyunyan <Minas.Harutyunyan@synopsys.com> writes:
>>>>>>> Actually to fix this issue I updated condition of wait function
>>>>>>> from:
>>>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING)
>>>>>>> to:
>>>>>>> !(dep->flags & DWC3_EP_END_TRANSFER_PENDING & DWC3_EP_ENABLED)
>>>>>>
>>>>>> you're not fixing anything. You're, essentially, removing the entire
>>>>>> end transfer pending logic.
>>>>> yes, you are right, but how to overcome this infinite loop? Replace
>>>>> wait_event_lock_irq() by  wait_event_interruptible_lock_irq_timeout()?
>>>>
>>>> The best way here would be to figure why we're missing command complete
>>>> IRQ in those cases. According to documentation, we *should* receive that
>>>> interrupt, so why is it missing?
>>>>
>>>
>>> Additional info on test. Core configuration is HS only mode, test speed
>>> HS, core version v2.90a. Maybe it will help to understand cause of issue.
>>> BTW, currently to pass above describe ISOC OUT test we just commented
>>> wait_event_lock_irq() in dwc3_gadget_ep_dequeue() function and
>>> successfully received request completion in function driver.
>>> Thanks,
>>> Minas
>>>
>> 
>> One more info: while function driver call dequeue, host periodically
>> send control read command to get status of test from function - test In
>> Progress or Finished.
>> Thanks,
>> Minas
>> 
>
> Your last dwc3 patch series allow us to successfully dequeuing remaining
> requests without falling in to infinite loop.

that's cool, thanks :-) I'll just fix the documentation bug I introduced
heh :-)

-- 
balbi

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 2bda4eb..0a360da 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -3273,6 +3273,20 @@  int dwc3_gadget_init(struct dwc3 *dwc)
 
 void dwc3_gadget_exit(struct dwc3 *dwc)
 {
+	int epnum;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dwc->lock, flags);
+	for (epnum = 2; epnum < DWC3_ENDPOINTS_NUM; epnum++) {
+		struct dwc3_ep  *dep = dwc->eps[epnum];
+
+		if (!dep)
+			continue;
+
+		dep->flags &= ~DWC3_EP_END_TRANSFER_PENDING;
+	}
+	spin_unlock_irqrestore(&dwc->lock, flags);
+
 	usb_del_gadget_udc(&dwc->gadget);
 	dwc3_gadget_free_endpoints(dwc);
 	dma_free_coherent(dwc->sysdev, DWC3_BOUNCE_SIZE, dwc->bounce,