[4.19.y,v2,0/9] Fix scheduling while atomic in dwc3_gadget_ep_dequeue

Message ID 20190628182413.33225-1-john.stultz@linaro.org

Message

John Stultz June 28, 2019, 6:24 p.m. UTC
With recent changes in AOSP, adb is using asynchronous I/O, which
causes the following crash, usually on a reboot:

[  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
[  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
[  184.316034] Preemption disabled at:
[  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
[  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
[  184.334963] Hardware name: HiKey960 (DT)
[  184.338892] Call trace:
[  184.341352]  dump_backtrace+0x0/0x158
[  184.345025]  show_stack+0x14/0x20
[  184.348355]  dump_stack+0x80/0xa4
[  184.351685]  __schedule_bug+0x6c/0xc0
[  184.355363]  __schedule+0x64c/0x978
[  184.358863]  schedule+0x2c/0x90
[  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
[  184.367210]  usb_ep_dequeue+0x24/0xf8
[  184.370884]  ffs_aio_cancel+0x3c/0x80
[  184.374561]  free_ioctx_users+0x40/0x148
[  184.378500]  percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
[  184.383830]  rcu_process_callbacks+0x24c/0x5d8
[  184.388283]  __do_softirq+0x13c/0x398
[  184.391959]  run_ksoftirqd+0x3c/0x48
[  184.395549]  smpboot_thread_fn+0x220/0x288
[  184.399660]  kthread+0x12c/0x130
[  184.402901]  ret_from_fork+0x10/0x1c
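
For anyone reading the splat: the trailing 0x00000104 in the BUG line is
the preempt_count value at the time of the failed schedule. Below is a
small userspace sketch that splits it into its fields. The masks assume
the 4.19-era layout in include/linux/preempt.h (8 preempt bits, 8 softirq
bits, 4 hardirq bits, 1 NMI bit); struct pc_fields and
decode_preempt_count() are names made up for this illustration, not
kernel APIs.

```c
/* Field layout assumed from include/linux/preempt.h in 4.19-era kernels;
 * check your own tree before relying on these masks. */
#define PREEMPT_MASK 0x000000ffu
#define SOFTIRQ_MASK 0x0000ff00u
#define HARDIRQ_MASK 0x000f0000u
#define NMI_MASK     0x00100000u

/* Hypothetical helper type: one member per preempt_count field. */
struct pc_fields {
	unsigned int preempt;
	unsigned int softirq;
	unsigned int hardirq;
	unsigned int nmi;
};

/* Split a raw preempt_count value into its individual counters. */
static struct pc_fields decode_preempt_count(unsigned int pc)
{
	struct pc_fields f = {
		.preempt = pc & PREEMPT_MASK,
		.softirq = (pc & SOFTIRQ_MASK) >> 8,
		.hardirq = (pc & HARDIRQ_MASK) >> 16,
		.nmi     = (pc & NMI_MASK) >> 20,
	};
	return f;
}
```

Under those assumptions, decode_preempt_count(0x00000104) yields a
softirq field of 1 and a preempt field of 4, which is consistent with
the trace running under __do_softirq().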


This happens because usb_ep_dequeue() can be called in interrupt
context (here from softirq, via the RCU callback that ends up in
ffs_aio_cancel()), and dwc3_gadget_ep_dequeue() then calls
wait_event_lock_irq(), which can sleep.
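
To make the failure mode concrete, here is a toy userspace model of the
rule being broken: a path that may block is only legal when no atomic
section is active. Everything here (the model_ prefix, the counter, the
return codes) is invented for illustration and is not how the kernel
implements the check.

```c
#include <stdbool.h>

/* Toy stand-in for the per-CPU preempt_count. */
static unsigned int model_preempt_count;

/* Entering or leaving softirq processing bumps the softirq field. */
static void model_softirq_enter(void) { model_preempt_count += 0x100; }
static void model_softirq_exit(void)  { model_preempt_count -= 0x100; }

/* Blocking is only allowed when no atomic section is active. */
static bool model_may_sleep(void)
{
	return model_preempt_count == 0;
}

/*
 * Stand-in for a dequeue that waits for the hardware.
 * Returns 0 if the wait would be legal, -1 if it would amount to
 * "scheduling while atomic".
 */
static int model_blocking_dequeue(void)
{
	if (!model_may_sleep())
		return -1;
	return 0;
}
```

Called from process context the model succeeds; called between
model_softirq_enter() and model_softirq_exit() it fails, which mirrors
the call chain in the trace above.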

Upstream kernels are not affected due to the change
fec9095bdef4 ("usb: dwc3: gadget: remove wait_end_transfer"), which
removes the wait_event_lock_irq() code. Unfortunately that change
has a number of dependencies, which I'm submitting here.
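
Going by the patch titles in this series ("introduce cancelled_list",
"move requests to cancelled_list", "remove wait_end_transfer"), the
general idea is to stop waiting in the dequeue path and instead park the
request until the controller reports that the transfer has ended. The
following is only a userspace sketch of that pattern under that
assumption; the types and function names are hypothetical and do not
match the driver's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not the dwc3 structures. */
struct request {
	struct request *next;
	bool given_back;
};

struct endpoint {
	struct request *started;   /* requests handed to the controller */
	struct request *cancelled; /* dequeued, awaiting End Transfer completion */
	bool end_transfer_pending;
};

/* Non-blocking dequeue: unlink from 'started', park on 'cancelled'. */
static int ep_dequeue(struct endpoint *ep, struct request *req)
{
	struct request **pp;

	for (pp = &ep->started; *pp; pp = &(*pp)->next) {
		if (*pp == req) {
			*pp = req->next;
			req->next = ep->cancelled;
			ep->cancelled = req;
			/* Stand-in for issuing the End Transfer command. */
			ep->end_transfer_pending = true;
			return 0;
		}
	}
	return -1; /* request was not queued on this endpoint */
}

/* Modelled completion event: give back everything that was parked. */
static int ep_end_transfer_complete(struct endpoint *ep)
{
	int n = 0;

	while (ep->cancelled) {
		struct request *req = ep->cancelled;

		ep->cancelled = req->next;
		req->next = NULL;
		req->given_back = true;
		n++;
	}
	ep->end_transfer_pending = false;
	return n;
}
```

Because ep_dequeue() never waits, it is safe to call from any context in
this model; the giveback is deferred to the completion handler.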

Also, to match upstream, in this series I've reverted one
change that was backported to -stable, and replaced it with the
cherry-picked upstream commit (as the dependencies are now
there).

This issue also affects 4.14, 4.9, and I believe 4.4 kernels;
however, I don't know how best to backport this functionality
that far back. Help from the maintainers would be very much
appreciated!


New in v2:
* Reordered the patchset to put the revert patch first, which
  avoids any bisection build issues. (Thanks to Jack Pham for
  the suggestion!)


Feedback and comments would be welcome!

thanks
-john

Cc: Fei Yang <fei.yang@intel.com>
Cc: Sam Protsenko <semen.protsenko@linaro.org>
Cc: Felipe Balbi <balbi@kernel.org>
Cc: Jack Pham <jackp@codeaurora.org>
Cc: linux-usb@vger.kernel.org
Cc: stable@vger.kernel.org # 4.19.y


Felipe Balbi (7):
  usb: dwc3: gadget: combine unaligned and zero flags
  usb: dwc3: gadget: track number of TRBs per request
  usb: dwc3: gadget: use num_trbs when skipping TRBs on ->dequeue()
  usb: dwc3: gadget: extract dwc3_gadget_ep_skip_trbs()
  usb: dwc3: gadget: introduce cancelled_list
  usb: dwc3: gadget: move requests to cancelled_list
  usb: dwc3: gadget: remove wait_end_transfer

Jack Pham (1):
  usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup

John Stultz (1):
  Revert "usb: dwc3: gadget: Clear req->needs_extra_trb flag on cleanup"

 drivers/usb/dwc3/core.h   |  15 ++--
 drivers/usb/dwc3/gadget.c | 158 +++++++++++++-------------------------
 drivers/usb/dwc3/gadget.h |  15 ++++
 3 files changed, 75 insertions(+), 113 deletions(-)

-- 
2.17.1

Comments

Sasha Levin June 28, 2019, 10:58 p.m. UTC | #1
On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
>With recent changes in AOSP, adb is using asynchronous io, which
>causes the following crash usually on a reboot:
>
>[  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
>[  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
>[  184.316034] Preemption disabled at:
>[  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
>[  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
>[  184.334963] Hardware name: HiKey960 (DT)
>[  184.338892] Call trace:
>[  184.341352]  dump_backtrace+0x0/0x158
>[  184.345025]  show_stack+0x14/0x20
>[  184.348355]  dump_stack+0x80/0xa4
>[  184.351685]  __schedule_bug+0x6c/0xc0
>[  184.355363]  __schedule+0x64c/0x978
>[  184.358863]  schedule+0x2c/0x90
>[  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
>[  184.367210]  usb_ep_dequeue+0x24/0xf8
>[  184.370884]  ffs_aio_cancel+0x3c/0x80
>[  184.374561]  free_ioctx_users+0x40/0x148
>[  184.378500]  percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
>[  184.383830]  rcu_process_callbacks+0x24c/0x5d8
>[  184.388283]  __do_softirq+0x13c/0x398
>[  184.391959]  run_ksoftirqd+0x3c/0x48
>[  184.395549]  smpboot_thread_fn+0x220/0x288
>[  184.399660]  kthread+0x12c/0x130
>[  184.402901]  ret_from_fork+0x10/0x1c
>
>
>This happens as usb_ep_dequeue can be called in interrupt
>context, and dwc3_gadget_ep_dequeue() then calls
>wait_event_lock_irq() which can sleep.
>
>Upstream kernels are not affected due to the change
>fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
>removes the wait_even_lock_irq code. Unfortunately that change
>has a number of dependencies, which I'm submitting here.
>
>Also, to match upstream, in this series I've reverted one
>change that was backported to -stable, to replace it with the
>cherry-picked upstream commit (as the dependencies are now
>there)
>
>This issue also affects 4.14,4.9 and I believe 4.4 kernels,
>however I don't know how to best backport this functionality
>that far back. Help from the maintainers would be very much
>appreciated!
>
>
>New in v2:
>* Reordered the patchset to put the revert patch first, which
>  avoids any bisection build issues. (Thanks to Jack Pham for
>  the suggestion!)
>
>
>Feedback and comments would be welcome!


I've queued it up for 4.19.

Is it the case that for older kernels the dependency list is too long?

--
Thanks,
Sasha

Thinh Nguyen July 1, 2019, 11:36 p.m. UTC | #2
Hi,

John Stultz wrote:
> On Fri, Jun 28, 2019 at 3:58 PM Sasha Levin <sashal@kernel.org> wrote:
>> On Fri, Jun 28, 2019 at 06:24:04PM +0000, John Stultz wrote:
>>> With recent changes in AOSP, adb is using asynchronous io, which
>>> causes the following crash usually on a reboot:
>>>
>>> [  184.278302] BUG: scheduling while atomic: ksoftirqd/0/9/0x00000104
>>> [  184.284617] Modules linked in: wl18xx wlcore snd_soc_hdmi_codec wlcore_sdio tcpci_rt1711h tcpci tcpm typec adv7511 cec dwc3 phy_hi3660_usb3 snd_soc_simple_card snd_soc_a
>>> [  184.316034] Preemption disabled at:
>>> [  184.316072] [<ffffff8008081de4>] __do_softirq+0x64/0x398
>>> [  184.324953] CPU: 0 PID: 9 Comm: ksoftirqd/0 Tainted: G S                4.19.43-00669-g8e4970572c43-dirty #356
>>> [  184.334963] Hardware name: HiKey960 (DT)
>>> [  184.338892] Call trace:
>>> [  184.341352]  dump_backtrace+0x0/0x158
>>> [  184.345025]  show_stack+0x14/0x20
>>> [  184.348355]  dump_stack+0x80/0xa4
>>> [  184.351685]  __schedule_bug+0x6c/0xc0
>>> [  184.355363]  __schedule+0x64c/0x978
>>> [  184.358863]  schedule+0x2c/0x90
>>> [  184.362053]  dwc3_gadget_ep_dequeue+0x274/0x388 [dwc3]
>>> [  184.367210]  usb_ep_dequeue+0x24/0xf8
>>> [  184.370884]  ffs_aio_cancel+0x3c/0x80
>>> [  184.374561]  free_ioctx_users+0x40/0x148
>>> [  184.378500]  percpu_ref_switch_to_atomic_rcu+0x180/0x1c0
>>> [  184.383830]  rcu_process_callbacks+0x24c/0x5d8
>>> [  184.388283]  __do_softirq+0x13c/0x398
>>> [  184.391959]  run_ksoftirqd+0x3c/0x48
>>> [  184.395549]  smpboot_thread_fn+0x220/0x288
>>> [  184.399660]  kthread+0x12c/0x130
>>> [  184.402901]  ret_from_fork+0x10/0x1c
>>>
>>>
>>> This happens as usb_ep_dequeue can be called in interrupt
>>> context, and dwc3_gadget_ep_dequeue() then calls
>>> wait_event_lock_irq() which can sleep.
>>>
>>> Upstream kernels are not affected due to the change
>>> fec9095bdef4 ("dwc3: gadget: remove wait_end_transfer") which
>>> removes the wait_even_lock_irq code. Unfortunately that change
>>> has a number of dependencies, which I'm submitting here.
>>>
>>> Also, to match upstream, in this series I've reverted one
>>> change that was backported to -stable, to replace it with the
>>> cherry-picked upstream commit (as the dependencies are now
>>> there)
>>>
>>> This issue also affects 4.14,4.9 and I believe 4.4 kernels,
>>> however I don't know how to best backport this functionality
>>> that far back. Help from the maintainers would be very much
>>> appreciated!
>>>
>>>
>>> New in v2:
>>> * Reordered the patchset to put the revert patch first, which
>>>  avoids any bisection build issues. (Thanks to Jack Pham for
>>>  the suggestion!)
>>>
>>>
>>> Feedback and comments would be welcome!
>> I've queued it up for 4.19.
>>
>> Is it the case that for older kernels the dependency list is too long?
> Yea. It gets ugly and I'm not enough of an expert on the driver to
> feel comfortable knowing if I'm doing the right thing reworking this
> stack onto an even older tree.
>
> But I do see crashes on reboot w/ 4.14 and 4.9 (and I suspect 4.4 as
> well), so I'll need to figure out something eventually.
>
>

If you're backporting this series, then you also need to apply these
fixes for this series:

This fixes a race issue:
c5353b225df9 ("usb: dwc3: gadget: don't enable interrupt when disabling
endpoint")

This fixes incorrect TRB skip:
c7152763f02e ("usb: dwc3: Reset num_trbs after skipping")

BR,
Thinh