diff mbox series

usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0

Message ID 20230102050831.105499-1-jh0801.jung@samsung.com
State New
Headers show
Series usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0 | expand

Commit Message

정재훈 Jan. 2, 2023, 5:08 a.m. UTC
Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
It must not have these status. Because, It can make happen interrupt storming
status.
So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
It means "There are no interrupts to handle.".

(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
	(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
	(void *) cache = 0xFFFFFF8839F54080,
	(unsigned int) length = 0x1000,
	(unsigned int) lpos = 0x0,
	(unsigned int) count = 0x0,
	(unsigned int) flags = 0x00000001,
	(dma_addr_t) dma = 0x00000008BD7D7000,
	(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
	(u64) android_kabi_reserved1 = 0x0),

(time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
(time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
(time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),

Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
---
 drivers/usb/dwc3/gadget.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Felipe Balbi Jan. 3, 2023, 3:48 p.m. UTC | #1
Hi,

JaeHun Jung <jh0801.jung@samsung.com> writes:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.
> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".
>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> 	(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> 	(void *) cache = 0xFFFFFF8839F54080,
> 	(unsigned int) length = 0x1000,
> 	(unsigned int) lpos = 0x0,
> 	(unsigned int) count = 0x0,
> 	(unsigned int) flags = 0x00000001,
> 	(dma_addr_t) dma = 0x00000008BD7D7000,
> 	(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> 	(u64) android_kabi_reserved1 = 0x0),
>
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>
> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>

Please get in the habit of running patches through
scripts/get_maintainer.pl. If you did, it would tell you Thinh is the
new maintainer for dwc3.
Linyu Yuan Jan. 5, 2023, 3:29 a.m. UTC | #2
On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.

could you help explain without clear the flag, how interrupt storming 
happen ?

as your change didn't touch any hardware register, i don't know how it 
fix storming.

storming from software layer ?

> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".
>
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> 	(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> 	(void *) cache = 0xFFFFFF8839F54080,
> 	(unsigned int) length = 0x1000,
> 	(unsigned int) lpos = 0x0,
> 	(unsigned int) count = 0x0,
> 	(unsigned int) flags = 0x00000001,
> 	(dma_addr_t) dma = 0x00000008BD7D7000,
> 	(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> 	(u64) android_kabi_reserved1 = 0x0),
>
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>
> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
> ---
>   drivers/usb/dwc3/gadget.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 789976567f9f..5d2d5a9b9915 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>   	 * irq event handler completes before caching new event to prevent
>   	 * losing events.
>   	 */
> -	if (evt->flags & DWC3_EVENT_PENDING)
> +	if (evt->flags & DWC3_EVENT_PENDING) {
> +		if (!evt->count)
> +			evt->flags &= ~DWC3_EVENT_PENDING;
>   		return IRQ_HANDLED;
> +	}
>   
>   	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>   	count &= DWC3_GEVNTCOUNT_MASK;

how about below change ?

         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
         count &= DWC3_GEVNTCOUNT_MASK;
-       if (!count)
+       if (!count) {
+               evt->flags &= ~DWC3_EVENT_PENDING;
                 return IRQ_NONE;
+       }

         evt->count = count;
         evt->flags |= DWC3_EVENT_PENDING;
정재훈 Jan. 5, 2023, 9:54 a.m. UTC | #3
> -----Original Message-----
> From: Linyu Yuan [mailto:quic_linyyuan@quicinc.com]
> Sent: Thursday, January 5, 2023 12:35 PM
> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
> 
> 
> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> >> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
> >> status.
> >> It must not have these status. Because, It can make happen interrupt
> >> storming status.
> >
> > could you help explain without clear the flag, how interrupt storming
> > happen ?
> >
> > as your change didn't touch any hardware register, i don't know how it
> > fix storming.
> >
H/W interrupts are still occur on IP.
But. ISR doesn't handle it. Because of this
"if (evt->flags & DWC3_EVENT_PENDING)"

This is event values.
(struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 = kernel_size_le_lo32+0xFFFFFF883B391180 -> (
        (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000 -> ,
        (void *) cache = 0xFFFFFF8839F54080 = kernel_size_le_lo32+0xFFFFFF88376F4080 -> ,
        (unsigned int) length = 0x1000,
        (unsigned int) lpos = 0x0,
        (unsigned int) count = 0x0,
        (unsigned int) flags = 0x00100001,
        (dma_addr_t) dma = 0x00000008BD7D7000,

*((struct dwc3_event_type *) 0xFFFFFF8839F54080) = (
    is_devspec = 1,
    type = 0,
    reserved8_31 = 774)

*((struct dwc3_event_devt  *) 0xFFFFFF8839F54080) = (
    one_bit = 1,
    device_event = 0,
    type = 6, << DWC3_DEVICE_EVENT_SUSPEND
    reserved15_12 = 0,
    event_info = 3,
    reserved31_25 = 0)

Occurred interrupts are "DWC3_DEVICE_EVENT_SUSPEND".
If when "count has 0" and DWC3_EVENT_PENDING are set at the same time than
 ISR and bottom thread couldn't handle next interrupt.
We guessing connected with "dwc3_gadget_process_pending_events()" function.
But, We could not find reproduce way.


> > storming from software layer ?
> >
> >> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> >> It means "There are no interrupts to handle.".
> >>
> >> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> >>     (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> >>     (void *) cache = 0xFFFFFF8839F54080,
> >>     (unsigned int) length = 0x1000,
> >>     (unsigned int) lpos = 0x0,
> >>     (unsigned int) count = 0x0,
> >>     (unsigned int) flags = 0x00000001,
> >>     (dma_addr_t) dma = 0x00000008BD7D7000,
> >>     (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> >>     (u64) android_kabi_reserved1 = 0x0),
> >>
> >> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628931268, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628932383, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628932652, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3), (time = 47557628935152, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 1), (time = 47557628935460, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 3), (time = 47557628936575, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
> >> 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> >> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628938229, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628939345, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628939652, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3), (time = 47557628942152, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 1), (time = 47557628942422, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 3), (time = 47557628943537, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
> >> 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> >> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 1), (time = 47557628945229, irq = 165, fn = dwc3_interrupt,
> >> latency = 0, en = 3), (time = 47557628946345, irq = 165, fn =
> >> dwc3_interrupt, latency = 0, en = 1), (time = 47557628946614, irq =
> >> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
> >> 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> >> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0,
> >> en = 3),
> >>
> >> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
> >> ---
> >>   drivers/usb/dwc3/gadget.c | 5 ++++-
> >>   1 file changed, 4 insertions(+), 1 deletion(-) diff --git
> >> a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index
> >> 789976567f9f..5d2d5a9b9915 100644
> >> --- a/drivers/usb/dwc3/gadget.c
> >> +++ b/drivers/usb/dwc3/gadget.c
> >> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct
> >> dwc3_event_buffer *evt)
> >>        * irq event handler completes before caching new event to
> >> prevent
> >>        * losing events.
> >>        */
> >> -    if (evt->flags & DWC3_EVENT_PENDING)
> >> +    if (evt->flags & DWC3_EVENT_PENDING) {
> >> +        if (!evt->count)
> >> +            evt->flags &= ~DWC3_EVENT_PENDING;
> >>           return IRQ_HANDLED;
> >> +    }
> >>         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> >>       count &= DWC3_GEVNTCOUNT_MASK;
> >
> > how about below change ?
> >
> >         count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
> >         count &= DWC3_GEVNTCOUNT_MASK;
> > -       if (!count)
> > +       if (!count) {
> > +               evt->flags &= ~DWC3_EVENT_PENDING;
> >                 return IRQ_NONE;
> > +       }
> ignore my suggestion, add Thinh to comment.
> >
> >         evt->count = count;
> >         evt->flags |= DWC3_EVENT_PENDING;
> >
Linyu Yuan Jan. 6, 2023, 3:13 a.m. UTC | #4
On 1/5/2023 5:54 PM, 정재훈 wrote:
>
>> -----Original Message-----
>> From: Linyu Yuan [mailto:quic_linyyuan@quicinc.com]
>> Sent: Thursday, January 5, 2023 12:35 PM
>> To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
>> Cc: open list:USB XHCI DRIVER; open list; Seungchull Suh; Daehwan Jung
>> Subject: Re: [PATCH] usb: dwc3: Clear DWC3_EVENT_PENDING when count is 0
>>
>>
>> On 1/5/2023 11:29 AM, Linyu Yuan wrote:
>>> On 1/2/2023 1:08 PM, JaeHun Jung wrote:
>>>> Sometimes very rarely, The count is 0 and the DWC3 flag is set has
>>>> status.
>>>> It must not have these status. Because, It can make happen interrupt
>>>> storming status.
>>> could you help explain without clear the flag, how interrupt storming
>>> happen ?
>>>
>>> as your change didn't touch any hardware register, i don't know how it
>>> fix storming.
>>>
> H/W interrupts are still occur on IP.

I guess we should fix it from IP layer.


but when checking DWC3_EVENT_PENDING flag, it will auto clear in dwc3 
thread irq handler.

there is one possible root cause is it cleared only after irq enabled in 
dwc3_process_event_buf(),

we should move unmask irq operation at end of this function.


> But. ISR doesn't handle it. Because of this
> "if (evt->flags & DWC3_EVENT_PENDING)"
>
> This is event values.
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 = kernel_size_le_lo32+0xFFFFFF883B391180 -> (
>          (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000 -> ,
>          (void *) cache = 0xFFFFFF8839F54080 = kernel_size_le_lo32+0xFFFFFF88376F4080 -> ,
>          (unsigned int) length = 0x1000,
>          (unsigned int) lpos = 0x0,
>          (unsigned int) count = 0x0,
>          (unsigned int) flags = 0x00100001,
>          (dma_addr_t) dma = 0x00000008BD7D7000,
>
> *((struct dwc3_event_type *) 0xFFFFFF8839F54080) = (
>      is_devspec = 1,
>      type = 0,
>      reserved8_31 = 774)
>
> *((struct dwc3_event_devt  *) 0xFFFFFF8839F54080) = (
>      one_bit = 1,
>      device_event = 0,
>      type = 6, << DWC3_DEVICE_EVENT_SUSPEND
>      reserved15_12 = 0,
>      event_info = 3,
>      reserved31_25 = 0)
>
> Occurred interrupts are "DWC3_DEVICE_EVENT_SUSPEND".
> If when "count has 0" and DWC3_EVENT_PENDING are set at the same time than
>   ISR and bottom thread couldn't handle next interrupt.
> We guessing connected with "dwc3_gadget_process_pending_events()" function.
> But, We could not find reproduce way.
>
>
>>> storming from software layer ?
>>>
>>>> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
>>>> It means "There are no interrupts to handle.".
>>>>
>>>> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
>>>>      (void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
>>>>      (void *) cache = 0xFFFFFF8839F54080,
>>>>      (unsigned int) length = 0x1000,
>>>>      (unsigned int) lpos = 0x0,
>>>>      (unsigned int) count = 0x0,
>>>>      (unsigned int) flags = 0x00000001,
>>>>      (dma_addr_t) dma = 0x00000008BD7D7000,
>>>>      (struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
>>>>      (u64) android_kabi_reserved1 = 0x0),
>>>>
>>>> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628931268, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628932383, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628932652, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3), (time = 47557628935152, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 1), (time = 47557628935460, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 3), (time = 47557628936575, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
>>>> 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>>>> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628938229, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628939345, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628939652, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3), (time = 47557628942152, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 1), (time = 47557628942422, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 3), (time = 47557628943537, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 1), (time =
>>>> 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
>>>> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 1), (time = 47557628945229, irq = 165, fn = dwc3_interrupt,
>>>> latency = 0, en = 3), (time = 47557628946345, irq = 165, fn =
>>>> dwc3_interrupt, latency = 0, en = 1), (time = 47557628946614, irq =
>>>> 165, fn = dwc3_interrupt, latency = 0, en = 3), (time =
>>>> 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
>>>> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0,
>>>> en = 3),
>>>>
>>>> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
>>>> ---
>>>>    drivers/usb/dwc3/gadget.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-) diff --git
>>>> a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index
>>>> 789976567f9f..5d2d5a9b9915 100644
>>>> --- a/drivers/usb/dwc3/gadget.c
>>>> +++ b/drivers/usb/dwc3/gadget.c
>>>> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct
>>>> dwc3_event_buffer *evt)
>>>>         * irq event handler completes before caching new event to
>>>> prevent
>>>>         * losing events.
>>>>         */
>>>> -    if (evt->flags & DWC3_EVENT_PENDING)
>>>> +    if (evt->flags & DWC3_EVENT_PENDING) {
>>>> +        if (!evt->count)
>>>> +            evt->flags &= ~DWC3_EVENT_PENDING;
>>>>            return IRQ_HANDLED;
>>>> +    }
>>>>          count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>>        count &= DWC3_GEVNTCOUNT_MASK;
>>> how about below change ?
>>>
>>>          count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>>>          count &= DWC3_GEVNTCOUNT_MASK;
>>> -       if (!count)
>>> +       if (!count) {
>>> +               evt->flags &= ~DWC3_EVENT_PENDING;
>>>                  return IRQ_NONE;
>>> +       }
>> ignore my suggestion, add Thinh to comment.
>>>          evt->count = count;
>>>          evt->flags |= DWC3_EVENT_PENDING;
>>>
Thinh Nguyen Jan. 9, 2023, 6:35 p.m. UTC | #5
Hi,

On Mon, Jan 02, 2023, JaeHun Jung wrote:
> Sometimes very rarely, The count is 0 and the DWC3 flag is set has status.
> It must not have these status. Because, It can make happen interrupt storming
> status.
> So, It have to clean up DWC3_EVENT_PENDING flags set when count is 0.
> It means "There are no interrupts to handle.".

Can you reword to include the explanation from my response [*] and add a
fixes tag to 7441b273388b ("usb: dwc3: gadget: Fix event pending check")

Thanks,
Thinh

[*] https://lore.kernel.org/linux-usb/20230109182813.sle5h34wdgglnlph@synopsys.com/T/#md32deea7e6b3c6d309aadba3d1cd2800d173685e


> 
> (struct dwc3_event_buffer *) ev_buf = 0xFFFFFF883DBF1180 (
> 	(void *) buf = 0xFFFFFFC00DBDD000 = end+0x337D000,
> 	(void *) cache = 0xFFFFFF8839F54080,
> 	(unsigned int) length = 0x1000,
> 	(unsigned int) lpos = 0x0,
> 	(unsigned int) count = 0x0,
> 	(unsigned int) flags = 0x00000001,
> 	(dma_addr_t) dma = 0x00000008BD7D7000,
> 	(struct dwc3 *) dwc = 0xFFFFFF8839CBC880,
> 	(u64) android_kabi_reserved1 = 0x0),
> 
> (time = 47557628930999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628931268, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628932383, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628932652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628933768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628934037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628935152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628935460, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628936575, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628936845, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628937960, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628938229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628939345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628939652, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628940768, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628941037, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628942152, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628942422, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628943537, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628943806, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628944922, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628945229, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628946345, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628946614, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> (time = 47557628947729, irq = 165, fn = dwc3_interrupt, latency = 0, en = 1),
> (time = 47557628947999, irq = 165, fn = dwc3_interrupt, latency = 0, en = 3),
> 
> Signed-off-by: JaeHun Jung <jh0801.jung@samsung.com>
> ---
>  drivers/usb/dwc3/gadget.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
> index 789976567f9f..5d2d5a9b9915 100644
> --- a/drivers/usb/dwc3/gadget.c
> +++ b/drivers/usb/dwc3/gadget.c
> @@ -4355,8 +4355,11 @@ static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
>  	 * irq event handler completes before caching new event to prevent
>  	 * losing events.
>  	 */
> -	if (evt->flags & DWC3_EVENT_PENDING)
> +	if (evt->flags & DWC3_EVENT_PENDING) {
> +		if (!evt->count)
> +			evt->flags &= ~DWC3_EVENT_PENDING;
>  		return IRQ_HANDLED;
> +	}
>  
>  	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
>  	count &= DWC3_GEVNTCOUNT_MASK;
> -- 
> 2.31.1
>
Thinh Nguyen Jan. 11, 2023, midnight UTC | #6
On Tue, Jan 10, 2023, Linyu Yuan wrote:
> 
> On 1/10/2023 11:13 AM, Linyu Yuan wrote:
> > 
> > On 1/10/2023 11:05 AM, Linyu Yuan wrote:
> > > 
> > > On 1/10/2023 10:53 AM, Thinh Nguyen wrote:
> > > > On Tue, Jan 10, 2023, Linyu Yuan wrote:
> > > > > On 1/10/2023 2:28 AM, Thinh Nguyen wrote:
> > > > > > On Fri, Jan 06, 2023, Linyu Yuan wrote:
> > > > > > > On 1/5/2023 5:54 PM, 정재훈 wrote:
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Linyu Yuan [mailto:quic_linyyuan@quicinc.com]
> > > > > > > > > Sent: Thursday, January 5, 2023 12:35 PM
> > > > > > > > > To: JaeHun Jung; Felipe Balbi; Greg Kroah-Hartman; Thinh Nguyen
> > > > > > > > > Cc: open list:USB XHCI DRIVER; open list;
> > > > > > > > > Seungchull Suh; Daehwan Jung
> > > > > > > > > Subject: Re: [PATCH] usb: dwc3: Clear
> > > > > > > > > DWC3_EVENT_PENDING when count is 0
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > On 1/5/2023 11:29 AM, Linyu Yuan wrote:
> > > > > > > > > > On 1/2/2023 1:08 PM, JaeHun Jung wrote:
> > > > > > > > > > > Sometimes very rarely, The count is
> > > > > > > > > > > 0 and the DWC3 flag is set has
> > > > > > > > > > > status.
> > > > > > > > > > > It must not have these status.
> > > > > > > > > > > Because, It can make happen
> > > > > > > > > > > interrupt
> > > > > > > > > > > storming status.
> > > > > > > > > > could you help explain without clear the
> > > > > > > > > > flag, how interrupt storming
> > > > > > > > > > happen ?
> > > > > > > > > > 
> > > > > > > > > > as your change didn't touch any hardware
> > > > > > > > > > register, i don't know how it
> > > > > > > > > > fix storming.
> > > > > > > > > > 
> > > > > > > > H/W interrupts are still occur on IP.
> > > > > > > I guess we should fix it from IP layer.
> > > > > > > 
> > > > > > How are you certain the problem is from IP layer?
> > > > > I think all IRQ is from DWC3 controller IP. if it is not IP
> > > > > layer, could you
> > > > > share how to prevent from SW layer ?
> > > > > 
> > > > > seem IRQ can happen when event count is zero ,  why this can
> > > > > happen ? does
> > > > > it mean event count register is not trust ?
> > > > When the interrupt is unmasked, the controller will generate interrupts
> > > > as events are received. Normally, the flag checking for pending event
> > > > should be cleared before unmasking the interrupt, but we clear it after
> > > > to account for possible false interrupt due to the nature of legacy pci
> > > > interrupt. This exposes a gap where the interrupts can come but
> > > > the flag
> > > > isn't cleared. While it should be rare and shouldn't be too much of a
> > > > problem, we can avoid this scenario with some additional checks.
> > > > 
> > > > > > > but when checking DWC3_EVENT_PENDING flag, it will
> > > > > > > auto clear in dwc3 thread
> > > > > > > irq handler.
> > > > > > > 
> > > > > > > there is one possible root cause is it cleared only
> > > > > > > after irq enabled in
> > > > > > > dwc3_process_event_buf(),
> > > > > > > 
> > > > > > > we should move unmask irq operation at end of this function.
> > > > > > > 
> > > > > > This interrupt storm can happen because we clear the
> > > > > > evt->flags _after_
> > > > > > we unmask the interrupt. This was done to prevent false
> > > > > > interrupt from
> > > > > > delay interrupt deassertion, which can be a problem for legacy pci
> > > > > > interrupt.
> > > > > thanks for explain, i didn't know that.
> > > > > > see 7441b273388b ("usb: dwc3: gadget: Fix event pending check")
> > > > > > 
> > > > > > The change JaeHun Jung did should be fine.
> > > > > agree.
> > > > The change may still need some additional check as suggested in my
> > > > response:
> > > > https://urldefense.com/v3/__https://lore.kernel.org/linux-usb/20230109190914.3blihjfjdcszazdd@synopsys.com/T/*m7b907aa6da4023cb20fa00a57813d31fd84e081f__;Iw!!A4F2R9G_pg!dchc0UAeZnm7PcA-aav_58rsw7agMygwPSGC_kN8Ga_BS3oTbLMNh3DkfQ9B2njbva-tJzTj7Cb2dJnE5RbEW6KJnDmtFA$
> > > > 
> > >  do you think we need to read event count before checking
> > > DWC3_EVENT_PENDING  ?
> > sorry for this noise, may be i have a little understanding of the legcy
> > pci issue now.
> one more question, is it legacy PCIe device still exist in real world ? and
> any VID/PID info ?

Currently, all dwc3 PCIe devices are affected. Some setups are more
noticeable than others. The dwc3 driver is implemented to probe platform
devices. So, dwc3 PCIe devices are wrapped as platform devices for the
dwc3 driver. Since we're going through the platform device code path,
the pci layer falls back to using legacy interrupt instead of MSI (last
I check awhile ago).

A little more detail on this problem:
PCIe legacy interrupt will emulate interrupt line by sending an
interrupt assert and deassert messages. After the interrupt assert
message is sent, interrupts are continuously generated until the
deassert message is sent. If there's a register write to unmask/mask
interrupt or clearing events falls in between these messages, then there
may be a race.

Let's say we don't have event pending check, this can happen:

Normal scenario
---------------
    event_count += n # controller generates new events
  interrupt asserts
    write(mask irq)
    event_count -= n # dwc3 clears events
  interrupt deasserts
    write(unmask irq)


Race scenario
-------------
    event_count += n # new events
  interrupt asserts
    write(mask irq)
    event_count -= n # clear events
    event_count += n # more events come and hard irq handler gets called
		     # again as interrupt is generated, but cached
		     # events haven't been handled. This breaks
		     # serialization and causes lost events.
    write(mask irq)

    event_count -= n # clear events
  interrupt deasserts
    write(unmask irq) # events handled


For MSI, this won't be a problem because it's edge-triggered and the way
it sends interrupt is different.

BR,
Thinh
Thinh Nguyen Feb. 2, 2023, 8:06 p.m. UTC | #7
On Thu, Feb 02, 2023, Linyu Yuan wrote:
> 
> On 2/2/2023 2:57 AM, Thinh Nguyen wrote:
> > On Tue, Jan 31, 2023, Linyu Yuan wrote:
> > > hi Thinh,
> > > 
> > > 
> > > regarding your suggestion, assume it is not PCIe type,  still have one
> > > question,
> > > 
> > > 
> > > -       if (evt->flags & DWC3_EVENT_PENDING)
> > > +       if (evt->flags & DWC3_EVENT_PENDING) {
> > > +               if (!evt->count) {
> > > +                       u32 reg = dwc3_readl(dwc->regs, DWC3_GEVNTSIZ(0));
> > > +
> > > +                       if (!(reg & DWC3_GEVNTSIZ_INTMASK))
> > > +                               evt->flags &= ~DWC3_EVENT_PENDING;
> > > 
> > > do we need to return IRQ_WAKE_THREAD  ?
> > No, if evt->count is 0, but GEVNTCOUNT is > 0, the controller will
> > generate interrupt. The evt->count will be updated and the events will
> > be handled on the next interrupt.
> 
> 
> when will next interrupt happen ?

Immediately after. You can test this by just return IRQ_HANDLED and not
clear the GEVNTCOUNT to see its behavior.

> 
> as when enter here, i guess GEVENTCOUNT is already > 0, but we didn't read
> it.

GEVNTCOUNT is always updating as new events are generated. We only clear
however many events we process, but that doesn't stop it from
incrementing.

BR,
Thinh

> 
> 
> > 
> > > +               }
> > >                  return IRQ_HANDLED;
> > > 
> > > as here return IRQ HANDLED, how can we make sure a new IRQ will be handled
> > > after previous IRQ thread clean PENDING flag ?
> > If evt->count > 0, that means the bottom half is still running. So,
> > leave it be. If evt->count == 0, then the cached events are processed,
> > we're safe to clear the PENDING flag. New interrupt will be generated if
> > GEVNTCOUNT is > 0.
> > 
> > > +       }
> > > 
> > > 
> > > also for non-PCIe controller, consider IRQ mask register working correctly,
> > > 
> > > consider a case IRQ happen before IRQ thread exit,  here just return
> > > IRQ_HANDLED.
> > > 
> > > once IRQ thread exit, it will clean PENDING flag, so next IRQ event will run
> > > normally.
> > > 
> > > if 정재훈 saw PENDING flag is not cleared, does it mean IRQ thread have no
> > > chance to exit ?
> > The PENDING flag should be cleared eventually when the bottom half
> > completes. I don't expect the interrupt storm to block the IRQ thread
> > forever, but I can't guarantee the device behavor. 정재훈 can confirm.
> > This change should resolve the interrupt storm.
> > 
> > BR,
> > Thinh
diff mbox series

Patch

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 789976567f9f..5d2d5a9b9915 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -4355,8 +4355,11 @@  static irqreturn_t dwc3_check_event_buf(struct dwc3_event_buffer *evt)
 	 * irq event handler completes before caching new event to prevent
 	 * losing events.
 	 */
-	if (evt->flags & DWC3_EVENT_PENDING)
+	if (evt->flags & DWC3_EVENT_PENDING) {
+		if (!evt->count)
+			evt->flags &= ~DWC3_EVENT_PENDING;
 		return IRQ_HANDLED;
+	}
 
 	count = dwc3_readl(dwc->regs, DWC3_GEVNTCOUNT(0));
 	count &= DWC3_GEVNTCOUNT_MASK;