Message ID | 1615507142-23097-1-git-send-email-wcheng@codeaurora.org |
---|---|
State | Superseded |
Headers | show |
Series | [v3] usb: dwc3: gadget: Prevent EP queuing while stopping transfers | expand |
On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > > In the situations where the DWC3 gadget stops active transfers, once > calling the dwc3_gadget_giveback(), there is a chance where a function > driver can queue a new USB request in between the time where the dwc3 > lock has been released and re-aquired. This occurs after we've already > issued an ENDXFER command. When the stop active transfers continues > to remove USB requests from all dep lists, the newly added request will > also be removed, while controller still has an active TRB for it. > This can lead to the controller accessing an unmapped memory address. > > Fix this by ensuring parameters to prevent EP queuing are set before > calling the stop active transfers API. commit f09ddcfcb8c569675066337adac2ac205113471f Author: Wesley Cheng <wcheng@codeaurora.org> Date: Thu Mar 11 15:59:02 2021 -0800 usb: dwc3: gadget: Prevent EP queuing while stopping transfers effectively broke my gadget setup. The output of the kernel (followed by non responsive state of USB controller): [ 195.228586] using random self ethernet address [ 195.233104] using random host ethernet address [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready [ 195.780585] ------------[ cut here ]------------ [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in [ 195.790162] WARNING: CPU: 0 PID: 217 at drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 20 e9 80 fc ff ff 41 83 fe 92 [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) knlGS:0000000000000000 [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 [ 195.937300] Call Trace: [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 [ 195.954274] usb_ep_enable+0x27/0x80 [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 [ 195.997327] ? __schedule+0x237/0x6d0 [ 196.001005] ? disable_irq_nosync+0x10/0x10 [ 196.005200] irq_thread_fn+0x1b/0x60 [ 196.008789] irq_thread+0xd6/0x170 [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 [ 196.021373] kthread+0x116/0x130 [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 [ 196.029680] ret_from_fork+0x22/0x30 [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- Revert helps (I'm on v5.12-rc4 now with a revert). The script to enable gadget: #!/bin/sh -efu # Mounting CONFIGFS grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none /sys/kernel/config # Addresses and files readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" readonly USBDISK="/usbdisk.img" # Insert modules modprobe libcomposite # Create directory structure mkdir "${GADGET_BASE_DIR}" cd "${GADGET_BASE_DIR}" mkdir -p configs/c.1/strings/0x409 mkdir -p strings/0x409 # Ethernet device mkdir functions/eem.usb0 echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr ln -s functions/eem.usb0 configs/c.1/ # Composite Gadget Setup echo 0x1d6b > idVendor # Linux Foundation echo 0x0104 > idProduct # Multifunction Composite Gadget echo 0x0100 > bcdDevice # v1.0.0 echo 0x0200 > bcdUSB # USB2 echo "0123456789abcdef" > strings/0x409/serialnumber echo "USBArmory" > strings/0x409/manufacturer echo "USBArmory Gadget" > strings/0x409/product echo "Conf1" > configs/c.1/strings/0x409/configuration echo 120 > configs/c.1/MaxPower # Activate gadgets echo dwc3.0.auto > UDC Please, tell me how to fix this, otherwise I will have to send a revert. > Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") > Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> > --- > Changes since V2: > - Removed duplicate dwc->connected = false setting in pullup routine > > Changes since V1: > - Added Fixes tag to point to the commit this is addressing > > drivers/usb/dwc3/gadget.c | 11 +++++------ > 1 file changed, 5 insertions(+), 6 deletions(-) > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 4780983..2c94cc9 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -783,8 +783,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) > > trace_dwc3_gadget_ep_disable(dep); > > - dwc3_remove_requests(dwc, dep); > - > /* make sure HW endpoint isn't stalled */ > if (dep->flags & DWC3_EP_STALL) > __dwc3_gadget_ep_set_halt(dep, 0, false); > @@ -803,6 +801,8 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) > dep->endpoint.desc = NULL; > } > > + dwc3_remove_requests(dwc, dep); > + > return 0; > } > > @@ -1617,7 +1617,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) > { > struct dwc3 *dwc = dep->dwc; > > - if (!dep->endpoint.desc || !dwc->pullups_connected) { > + if (!dep->endpoint.desc || !dwc->pullups_connected || !dwc->connected) { > dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n", > dep->name); > return -ESHUTDOWN; > @@ -2247,6 +2247,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > if (!is_on) { > u32 count; > > + dwc->connected = false; > /* > * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a > * Section 4.1.8 Table 4-7, it states that for a device-initiated > @@ -2271,7 +2272,6 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) > dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) % > dwc->ev_buf->length; > } > - dwc->connected = false; > } else { > __dwc3_gadget_start(dwc); > } > @@ -3329,8 +3329,6 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) > { > u32 reg; > > - dwc->connected = true; > - > /* > * WORKAROUND: DWC3 revisions <1.88a have an issue which > * would cause a missing Disconnect Event if there's a > @@ -3370,6 +3368,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) > * transfers." > */ > dwc3_stop_active_transfers(dwc); > + dwc->connected = true; > > reg = dwc3_readl(dwc->regs, DWC3_DCTL); > reg &= ~DWC3_DCTL_TSTCTRL_MASK; > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project > -- With Best Regards, Andy Shevchenko
Hi Andy, On 3/22/2021 5:48 AM, Andy Shevchenko wrote: > On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >> >> In the situations where the DWC3 gadget stops active transfers, once >> calling the dwc3_gadget_giveback(), there is a chance where a function >> driver can queue a new USB request in between the time where the dwc3 >> lock has been released and re-aquired. This occurs after we've already >> issued an ENDXFER command. When the stop active transfers continues >> to remove USB requests from all dep lists, the newly added request will >> also be removed, while controller still has an active TRB for it. >> This can lead to the controller accessing an unmapped memory address. >> >> Fix this by ensuring parameters to prevent EP queuing are set before >> calling the stop active transfers API. > > > commit f09ddcfcb8c569675066337adac2ac205113471f > Author: Wesley Cheng <wcheng@codeaurora.org> > Date: Thu Mar 11 15:59:02 2021 -0800 > > usb: dwc3: gadget: Prevent EP queuing while stopping transfers > > effectively broke my gadget setup. > > The output of the kernel (followed by non responsive state of USB controller): > > [ 195.228586] using random self ethernet address > [ 195.233104] using random host ethernet address > [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 > [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 > # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready > [ 195.780585] ------------[ cut here ]------------ > [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in > [ 195.790162] WARNING: CPU: 0 PID: 217 at > drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 > [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite > brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps > s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt > snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec > spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci > extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m > mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel > [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 > [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, > BIOS 542 2015.01.21:18.19.48 > [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 > [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee > f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 > ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 > 20 e9 80 fc ff ff 41 83 fe 92 > [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 > [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff > [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 > [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb > [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 > [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 > [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) > knlGS:0000000000000000 > [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 > [ 195.937300] Call Trace: > [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 > [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 Odd that this change would affect the USB enablment path, as they were focused on the pullup disable path. Would you happen to have any downstream changes on top of v5.12-rc4 we could review to see if they are still required? (ie where is the dwc3_remove_requests() coming from during ep enable) > [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 > [ 195.954274] usb_ep_enable+0x27/0x80 > [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] > [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] > [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] > [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 > [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] > [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] > [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 > [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 > [ 195.997327] ? __schedule+0x237/0x6d0 > [ 196.001005] ? disable_irq_nosync+0x10/0x10 > [ 196.005200] irq_thread_fn+0x1b/0x60 > [ 196.008789] irq_thread+0xd6/0x170 > [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 > [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 > [ 196.021373] kthread+0x116/0x130 > [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 > [ 196.029680] ret_from_fork+0x22/0x30 > [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- > > Also, as I mentioned above, the changes should affect the pullup disable path, so when you 'echo "" > UDC' or something similar to that operation, did you see any errors? Can you provide a ftrace output w/ the DWC3 tracing enabled once removing the UDC? > Revert helps (I'm on v5.12-rc4 now with a revert). > > The script to enable gadget: > > #!/bin/sh -efu > > # Mounting CONFIGFS > grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none > /sys/kernel/config > > # Addresses and files > readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" > readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" > readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" > readonly USBDISK="/usbdisk.img" > > # Insert modules > modprobe libcomposite > > # Create directory structure > mkdir "${GADGET_BASE_DIR}" > cd "${GADGET_BASE_DIR}" > mkdir -p configs/c.1/strings/0x409 > mkdir -p strings/0x409 > > # Ethernet device > mkdir functions/eem.usb0 > echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr > echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr > ln -s functions/eem.usb0 configs/c.1/ > > # Composite Gadget Setup > echo 0x1d6b > idVendor # Linux Foundation > echo 0x0104 > idProduct # Multifunction Composite Gadget > echo 0x0100 > bcdDevice # v1.0.0 > echo 0x0200 > bcdUSB # USB2 > echo "0123456789abcdef" > strings/0x409/serialnumber > echo "USBArmory" > strings/0x409/manufacturer > echo "USBArmory Gadget" > strings/0x409/product > echo "Conf1" > configs/c.1/strings/0x409/configuration > echo 120 > configs/c.1/MaxPower > > # Activate gadgets > echo dwc3.0.auto > UDC > > Please, tell me how to fix this, otherwise I will have to send a revert. > This also fixes a potential SMMU fault on targets with that enabled. It causes the controller to access a stale TRB DMA address after it has already been unmapped. I think we should figure out what is causing the issue on your set up instead of reverting the entire change. Thanks Wesley Cheng >> Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") >> Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> >> --- >> Changes since V2: >> - Removed duplicate dwc->connected = false setting in pullup routine >> >> Changes since V1: >> - Added Fixes tag to point to the commit this is addressing >> >> drivers/usb/dwc3/gadget.c | 11 +++++------ >> 1 file changed, 5 insertions(+), 6 deletions(-) >> >> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c >> index 4780983..2c94cc9 100644 >> --- a/drivers/usb/dwc3/gadget.c >> +++ b/drivers/usb/dwc3/gadget.c >> @@ -783,8 +783,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) >> >> trace_dwc3_gadget_ep_disable(dep); >> >> - dwc3_remove_requests(dwc, dep); >> - >> /* make sure HW endpoint isn't stalled */ >> if (dep->flags & DWC3_EP_STALL) >> __dwc3_gadget_ep_set_halt(dep, 0, false); >> @@ -803,6 +801,8 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) >> dep->endpoint.desc = NULL; >> } >> >> + dwc3_remove_requests(dwc, dep); >> + >> return 0; >> } >> >> @@ -1617,7 +1617,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) >> { >> struct dwc3 *dwc = dep->dwc; >> >> - if (!dep->endpoint.desc || !dwc->pullups_connected) { >> + if (!dep->endpoint.desc || !dwc->pullups_connected || !dwc->connected) { >> dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n", >> dep->name); >> return -ESHUTDOWN; >> @@ -2247,6 +2247,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >> if (!is_on) { >> u32 count; >> >> + dwc->connected = false; >> /* >> * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a >> * Section 4.1.8 Table 4-7, it states that for a device-initiated >> @@ -2271,7 +2272,6 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) >> dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) % >> dwc->ev_buf->length; >> } >> - dwc->connected = false; >> } else { >> __dwc3_gadget_start(dwc); >> } >> @@ -3329,8 +3329,6 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) >> { >> u32 reg; >> >> - dwc->connected = true; >> - >> /* >> * WORKAROUND: DWC3 revisions <1.88a have an issue which >> * would cause a missing Disconnect Event if there's a >> @@ -3370,6 +3368,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) >> * transfers." >> */ >> dwc3_stop_active_transfers(dwc); >> + dwc->connected = true; >> >> reg = dwc3_readl(dwc->regs, DWC3_DCTL); >> reg &= ~DWC3_DCTL_TSTCTRL_MASK; >> -- >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, >> a Linux Foundation Collaborative Project >> > > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > > Hi Andy, > > On 3/22/2021 5:48 AM, Andy Shevchenko wrote: > > On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > >> > >> In the situations where the DWC3 gadget stops active transfers, once > >> calling the dwc3_gadget_giveback(), there is a chance where a function > >> driver can queue a new USB request in between the time where the dwc3 > >> lock has been released and re-aquired. This occurs after we've already > >> issued an ENDXFER command. When the stop active transfers continues > >> to remove USB requests from all dep lists, the newly added request will > >> also be removed, while controller still has an active TRB for it. > >> This can lead to the controller accessing an unmapped memory address. > >> > >> Fix this by ensuring parameters to prevent EP queuing are set before > >> calling the stop active transfers API. > > > > > > commit f09ddcfcb8c569675066337adac2ac205113471f > > Author: Wesley Cheng <wcheng@codeaurora.org> > > Date: Thu Mar 11 15:59:02 2021 -0800 > > > > usb: dwc3: gadget: Prevent EP queuing while stopping transfers > > > > effectively broke my gadget setup. > > > > The output of the kernel (followed by non responsive state of USB controller): > > > > [ 195.228586] using random self ethernet address > > [ 195.233104] using random host ethernet address > > [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 > > [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 > > # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready > > [ 195.780585] ------------[ cut here ]------------ > > [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in > > [ 195.790162] WARNING: CPU: 0 PID: 217 at > > drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 > > [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite > > brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps > > s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt > > snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec > > spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci > > extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m > > mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel > > [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 > > [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, > > BIOS 542 2015.01.21:18.19.48 > > [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 > > [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee > > f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 > > ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 > > 20 e9 80 fc ff ff 41 83 fe 92 > > [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 > > [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff > > [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 > > [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb > > [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 > > [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 > > [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) > > knlGS:0000000000000000 > > [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 > > [ 195.937300] Call Trace: > > [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 > > [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 > > Odd that this change would affect the USB enablment path, as they were > focused on the pullup disable path. Would you happen to have any > downstream changes on top of v5.12-rc4 we could review to see if they > are still required? (ie where is the dwc3_remove_requests() coming from > during ep enable) You may check my branch [1] on GH. Basically you may be interested in the commit: 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: skip endpoints ep[18]{in,out} Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY suspend fix (which also shouldn't affect this). But I don't believe it should have affected this. > > [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 > > [ 195.954274] usb_ep_enable+0x27/0x80 > > [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] > > [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] > > [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] > > [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 > > [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] > > [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] > > [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 > > [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 > > [ 195.997327] ? __schedule+0x237/0x6d0 > > [ 196.001005] ? disable_irq_nosync+0x10/0x10 > > [ 196.005200] irq_thread_fn+0x1b/0x60 > > [ 196.008789] irq_thread+0xd6/0x170 > > [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 > > [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 > > [ 196.021373] kthread+0x116/0x130 > > [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 > > [ 196.029680] ret_from_fork+0x22/0x30 > > [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- > > > > > Also, as I mentioned above, the changes should affect the pullup disable > path, so when you 'echo "" > UDC' or something similar to that > operation, did you see any errors? After your patch I see a warning as above. Before — no errors or warnings. > Can you provide a ftrace output w/ > the DWC3 tracing enabled once removing the UDC? Can you provide step-by-step instructions what should I do? > > Revert helps (I'm on v5.12-rc4 now with a revert). > > > > The script to enable gadget: > > > > #!/bin/sh -efu > > > > # Mounting CONFIGFS > > grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none > > /sys/kernel/config > > > > # Addresses and files > > readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" > > readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" > > readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" > > readonly USBDISK="/usbdisk.img" > > > > # Insert modules > > modprobe libcomposite > > > > # Create directory structure > > mkdir "${GADGET_BASE_DIR}" > > cd "${GADGET_BASE_DIR}" > > mkdir -p configs/c.1/strings/0x409 > > mkdir -p strings/0x409 > > > > # Ethernet device > > mkdir functions/eem.usb0 > > echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr > > echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr > > ln -s functions/eem.usb0 configs/c.1/ > > > > # Composite Gadget Setup > > echo 0x1d6b > idVendor # Linux Foundation > > echo 0x0104 > idProduct # Multifunction Composite Gadget > > echo 0x0100 > bcdDevice # v1.0.0 > > echo 0x0200 > bcdUSB # USB2 > > echo "0123456789abcdef" > strings/0x409/serialnumber > > echo "USBArmory" > strings/0x409/manufacturer > > echo "USBArmory Gadget" > strings/0x409/product > > echo "Conf1" > configs/c.1/strings/0x409/configuration > > echo 120 > configs/c.1/MaxPower > > > > # Activate gadgets > > echo dwc3.0.auto > UDC > > > > Please, tell me how to fix this, otherwise I will have to send a revert. > > > This also fixes a potential SMMU fault on targets with that enabled. It > causes the controller to access a stale TRB DMA address after it has > already been unmapped. I think we should figure out what is causing the > issue on your set up instead of reverting the entire change. If we find a cause and have a fix during this week, otherwise it's rc5:ish timing when we may not have more time to play. -- With Best Regards, Andy Shevchenko
Hi Andy, On 3/22/2021 12:34 PM, Andy Shevchenko wrote: > On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >> >> Hi Andy, >> >> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: >>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>> >>>> In the situations where the DWC3 gadget stops active transfers, once >>>> calling the dwc3_gadget_giveback(), there is a chance where a function >>>> driver can queue a new USB request in between the time where the dwc3 >>>> lock has been released and re-aquired. This occurs after we've already >>>> issued an ENDXFER command. When the stop active transfers continues >>>> to remove USB requests from all dep lists, the newly added request will >>>> also be removed, while controller still has an active TRB for it. >>>> This can lead to the controller accessing an unmapped memory address. >>>> >>>> Fix this by ensuring parameters to prevent EP queuing are set before >>>> calling the stop active transfers API. >>> >>> >>> commit f09ddcfcb8c569675066337adac2ac205113471f >>> Author: Wesley Cheng <wcheng@codeaurora.org> >>> Date: Thu Mar 11 15:59:02 2021 -0800 >>> >>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers >>> >>> effectively broke my gadget setup. >>> >>> The output of the kernel (followed by non responsive state of USB controller): >>> >>> [ 195.228586] using random self ethernet address >>> [ 195.233104] using random host ethernet address >>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 >>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>> [ 195.780585] ------------[ cut here ]------------ >>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in >>> [ 195.790162] WARNING: CPU: 0 PID: 217 at >>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite >>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps >>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt >>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec >>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci >>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m >>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel >>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 >>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>> BIOS 542 2015.01.21:18.19.48 >>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee >>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 >>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 >>> 20 e9 80 fc ff ff 41 83 fe 92 >>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 >>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff >>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 >>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb >>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 >>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 >>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) >>> knlGS:0000000000000000 >>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 >>> [ 195.937300] Call Trace: >>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 >>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 >> >> Odd that this change would affect the USB enablment path, as they were >> focused on the pullup disable path. Would you happen to have any >> downstream changes on top of v5.12-rc4 we could review to see if they >> are still required? (ie where is the dwc3_remove_requests() coming from >> during ep enable) > > You may check my branch [1] on GH. Basically you may be interested in > the commit: > 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: > skip endpoints ep[18]{in,out} > Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY > suspend fix (which also shouldn't affect this). Can you link your GH reference? > > But I don't believe it should have affected this. > >>> [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 >>> [ 195.954274] usb_ep_enable+0x27/0x80 >>> [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] >>> [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] >>> [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] >>> [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 >>> [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] >>> [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] >>> [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 >>> [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 >>> [ 195.997327] ? __schedule+0x237/0x6d0 >>> [ 196.001005] ? disable_irq_nosync+0x10/0x10 >>> [ 196.005200] irq_thread_fn+0x1b/0x60 >>> [ 196.008789] irq_thread+0xd6/0x170 >>> [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 >>> [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 >>> [ 196.021373] kthread+0x116/0x130 >>> [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 >>> [ 196.029680] ret_from_fork+0x22/0x30 >>> [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- >>> >>> >> Also, as I mentioned above, the changes should affect the pullup disable >> path, so when you 'echo "" > UDC' or something similar to that >> operation, did you see any errors? > > After your patch I see a warning as above. Before — no errors or warnings. > >> Can you provide a ftrace output w/ >> the DWC3 tracing enabled once removing the UDC? > > Can you provide step-by-step instructions what should I do? > Let me try with your kernel, and steps below first. Thanks Wesley Cheng >>> Revert helps (I'm on v5.12-rc4 now with a revert). >>> >>> The script to enable gadget: >>> >>> #!/bin/sh -efu >>> >>> # Mounting CONFIGFS >>> grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none >>> /sys/kernel/config >>> >>> # Addresses and files >>> readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" >>> readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" >>> readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" >>> readonly USBDISK="/usbdisk.img" >>> >>> # Insert modules >>> modprobe libcomposite >>> >>> # Create directory structure >>> mkdir "${GADGET_BASE_DIR}" >>> cd "${GADGET_BASE_DIR}" >>> mkdir -p configs/c.1/strings/0x409 >>> mkdir -p strings/0x409 >>> >>> # Ethernet device >>> mkdir functions/eem.usb0 >>> echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr >>> echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr >>> ln -s functions/eem.usb0 configs/c.1/ >>> >>> # Composite Gadget Setup >>> echo 0x1d6b > idVendor # Linux Foundation >>> echo 0x0104 > idProduct # Multifunction Composite Gadget >>> echo 0x0100 > bcdDevice # v1.0.0 >>> echo 0x0200 > bcdUSB # USB2 >>> echo "0123456789abcdef" > strings/0x409/serialnumber >>> echo "USBArmory" > strings/0x409/manufacturer >>> echo "USBArmory Gadget" > strings/0x409/product >>> echo "Conf1" > configs/c.1/strings/0x409/configuration >>> echo 120 > configs/c.1/MaxPower >>> >>> # Activate gadgets >>> echo dwc3.0.auto > UDC >>> >>> Please, tell me how to fix this, otherwise I will have to send a revert. >>> >> This also fixes a potential SMMU fault on targets with that enabled. It >> causes the controller to access a stale TRB DMA address after it has >> already been unmapped. I think we should figure out what is causing the >> issue on your set up instead of reverting the entire change. > > If we find a cause and have a fix during this week, otherwise it's > rc5:ish timing when we may not have more time to play. > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > > Hi Andy, > > On 3/22/2021 12:34 PM, Andy Shevchenko wrote: > > On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > >> > >> Hi Andy, > >> > >> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: > >>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>> > >>>> In the situations where the DWC3 gadget stops active transfers, once > >>>> calling the dwc3_gadget_giveback(), there is a chance where a function > >>>> driver can queue a new USB request in between the time where the dwc3 > >>>> lock has been released and re-aquired. This occurs after we've already > >>>> issued an ENDXFER command. When the stop active transfers continues > >>>> to remove USB requests from all dep lists, the newly added request will > >>>> also be removed, while controller still has an active TRB for it. > >>>> This can lead to the controller accessing an unmapped memory address. > >>>> > >>>> Fix this by ensuring parameters to prevent EP queuing are set before > >>>> calling the stop active transfers API. > >>> > >>> > >>> commit f09ddcfcb8c569675066337adac2ac205113471f > >>> Author: Wesley Cheng <wcheng@codeaurora.org> > >>> Date: Thu Mar 11 15:59:02 2021 -0800 > >>> > >>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers > >>> > >>> effectively broke my gadget setup. > >>> > >>> The output of the kernel (followed by non responsive state of USB controller): > >>> > >>> [ 195.228586] using random self ethernet address > >>> [ 195.233104] using random host ethernet address > >>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 > >>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 > >>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready > >>> [ 195.780585] ------------[ cut here ]------------ > >>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in > >>> [ 195.790162] WARNING: CPU: 0 PID: 217 at > >>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite > >>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps > >>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt > >>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec > >>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci > >>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m > >>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel > >>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 > >>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, > >>> BIOS 542 2015.01.21:18.19.48 > >>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee > >>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 > >>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 > >>> 20 e9 80 fc ff ff 41 83 fe 92 > >>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 > >>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff > >>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 > >>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb > >>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 > >>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 > >>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) > >>> knlGS:0000000000000000 > >>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 > >>> [ 195.937300] Call Trace: > >>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 > >>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 > >> > >> Odd that this change would affect the USB enablment path, as they were > >> focused on the pullup disable path. Would you happen to have any > >> downstream changes on top of v5.12-rc4 we could review to see if they > >> are still required? (ie where is the dwc3_remove_requests() coming from > >> during ep enable) > > > > You may check my branch [1] on GH. Basically you may be interested in > > the commit: > > 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: > > skip endpoints ep[18]{in,out} > > Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY > > suspend fix (which also shouldn't affect this). > > Can you link your GH reference? Oops, sorry. Here we are: [1]: https://github.com/andy-shev/linux/tree/eds-acpi > > > > But I don't believe it should have affected this. > > > >>> [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 > >>> [ 195.954274] usb_ep_enable+0x27/0x80 > >>> [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] > >>> [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] > >>> [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] > >>> [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 > >>> [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] > >>> [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] > >>> [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 > >>> [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 > >>> [ 195.997327] ? __schedule+0x237/0x6d0 > >>> [ 196.001005] ? disable_irq_nosync+0x10/0x10 > >>> [ 196.005200] irq_thread_fn+0x1b/0x60 > >>> [ 196.008789] irq_thread+0xd6/0x170 > >>> [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 > >>> [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 > >>> [ 196.021373] kthread+0x116/0x130 > >>> [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 > >>> [ 196.029680] ret_from_fork+0x22/0x30 > >>> [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- > >>> > >>> > >> Also, as I mentioned above, the changes should affect the pullup disable > >> path, so when you 'echo "" > UDC' or something similar to that > >> operation, did you see any errors? > > > > After your patch I see a warning as above. Before — no errors or warnings. > > > >> Can you provide a ftrace output w/ > >> the DWC3 tracing enabled once removing the UDC? > > > > Can you provide step-by-step instructions what should I do? > > > Let me try with your kernel, and steps below first. > > Thanks > Wesley Cheng > > >>> Revert helps (I'm on v5.12-rc4 now with a revert). > >>> > >>> The script to enable gadget: > >>> > >>> #!/bin/sh -efu > >>> > >>> # Mounting CONFIGFS > >>> grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none > >>> /sys/kernel/config > >>> > >>> # Addresses and files > >>> readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" > >>> readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" > >>> readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" > >>> readonly USBDISK="/usbdisk.img" > >>> > >>> # Insert modules > >>> modprobe libcomposite > >>> > >>> # Create directory structure > >>> mkdir "${GADGET_BASE_DIR}" > >>> cd "${GADGET_BASE_DIR}" > >>> mkdir -p configs/c.1/strings/0x409 > >>> mkdir -p strings/0x409 > >>> > >>> # Ethernet device > >>> mkdir functions/eem.usb0 > >>> echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr > >>> echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr > >>> ln -s functions/eem.usb0 configs/c.1/ > >>> > >>> # Composite Gadget Setup > >>> echo 0x1d6b > idVendor # Linux Foundation > >>> echo 0x0104 > idProduct # Multifunction Composite Gadget > >>> echo 0x0100 > bcdDevice # v1.0.0 > >>> echo 0x0200 > bcdUSB # USB2 > >>> echo "0123456789abcdef" > strings/0x409/serialnumber > >>> echo "USBArmory" > strings/0x409/manufacturer > >>> echo "USBArmory Gadget" > strings/0x409/product > >>> echo "Conf1" > configs/c.1/strings/0x409/configuration > >>> echo 120 > configs/c.1/MaxPower > >>> > >>> # Activate gadgets > >>> echo dwc3.0.auto > UDC > >>> > >>> Please, tell me how to fix this, otherwise I will have to send a revert. > >>> > >> This also fixes a potential SMMU fault on targets with that enabled. It > >> causes the controller to access a stale TRB DMA address after it has > >> already been unmapped. I think we should figure out what is causing the > >> issue on your set up instead of reverting the entire change. > > > > If we find a cause and have a fix during this week, otherwise it's > > rc5:ish timing when we may not have more time to play. > > > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project -- With Best Regards, Andy Shevchenko
Hi Andy, On 3/22/2021 2:14 PM, Andy Shevchenko wrote: > On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >> >> Hi Andy, >> >> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: >>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>> >>>> Hi Andy, >>>> >>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: >>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>> >>>>>> In the situations where the DWC3 gadget stops active transfers, once >>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function >>>>>> driver can queue a new USB request in between the time where the dwc3 >>>>>> lock has been released and re-aquired. This occurs after we've already >>>>>> issued an ENDXFER command. When the stop active transfers continues >>>>>> to remove USB requests from all dep lists, the newly added request will >>>>>> also be removed, while controller still has an active TRB for it. >>>>>> This can lead to the controller accessing an unmapped memory address. >>>>>> >>>>>> Fix this by ensuring parameters to prevent EP queuing are set before >>>>>> calling the stop active transfers API. >>>>> >>>>> >>>>> commit f09ddcfcb8c569675066337adac2ac205113471f >>>>> Author: Wesley Cheng <wcheng@codeaurora.org> >>>>> Date: Thu Mar 11 15:59:02 2021 -0800 >>>>> >>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers >>>>> >>>>> effectively broke my gadget setup. >>>>> >>>>> The output of the kernel (followed by non responsive state of USB controller): >>>>> >>>>> [ 195.228586] using random self ethernet address >>>>> [ 195.233104] using random host ethernet address >>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 >>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>>>> [ 195.780585] ------------[ cut here ]------------ >>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in >>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at >>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite >>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps >>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt >>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec >>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci >>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m >>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel >>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 >>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>>>> BIOS 542 2015.01.21:18.19.48 >>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee >>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 >>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 >>>>> 20 e9 80 fc ff ff 41 83 fe 92 >>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 >>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff >>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 >>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb >>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 >>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 >>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 >>>>> [ 195.937300] Call Trace: >>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 >>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 >>>> >>>> Odd that this change would affect the USB enablment path, as they were >>>> focused on the pullup disable path. Would you happen to have any >>>> downstream changes on top of v5.12-rc4 we could review to see if they >>>> are still required? (ie where is the dwc3_remove_requests() coming from >>>> during ep enable) >>> >>> You may check my branch [1] on GH. Basically you may be interested in >>> the commit: >>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: >>> skip endpoints ep[18]{in,out} >>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY >>> suspend fix (which also shouldn't affect this). >> >> Can you link your GH reference? > > Oops, sorry. > Here we are: > > [1]: https://github.com/andy-shev/linux/tree/eds-acpi > Thanks, I took a look and even tried it on my device running 5.12-rc4, but wasn't able to see the same problem. Could you help collect the ftrace after enabling the tracing KCONFIG and running the below sequence? 1. Mount debugfs 2. Set up tracing instance mkdir /sys/kernel/debug/tracing/instances/usb echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on 3. Run your test, which should include: - echo "" > /sys/kernel/config/usb_gadget/g1/UDC - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC 4. Collect the trace output: cat /sys/kernel/debug/tracing/instances/usb/trace Thanks Wesley Cheng >>> >>> But I don't believe it should have affected this. >>> >>>>> [ 195.949897] dwc3_gadget_ep_enable+0x5d/0x120 >>>>> [ 195.954274] usb_ep_enable+0x27/0x80 >>>>> [ 195.957869] gether_connect+0x32/0x1f0 [u_ether] >>>>> [ 195.962512] eem_set_alt+0x6d/0x140 [usb_f_eem] >>>>> [ 195.967061] composite_setup+0x224/0x1ba0 [libcomposite] >>>>> [ 195.972405] ? debug_dma_unmap_page+0x79/0x80 >>>>> [ 195.976782] ? configfs_composite_setup+0x6b/0x90 [libcomposite] >>>>> [ 195.982811] configfs_composite_setup+0x6b/0x90 [libcomposite] >>>>> [ 195.988668] dwc3_ep0_interrupt+0x459/0xa50 >>>>> [ 195.992869] dwc3_thread_interrupt+0x8e2/0xee0 >>>>> [ 195.997327] ? __schedule+0x237/0x6d0 >>>>> [ 196.001005] ? disable_irq_nosync+0x10/0x10 >>>>> [ 196.005200] irq_thread_fn+0x1b/0x60 >>>>> [ 196.008789] irq_thread+0xd6/0x170 >>>>> [ 196.012202] ? irq_thread_check_affinity+0x70/0x70 >>>>> [ 196.017004] ? irq_forced_thread_fn+0x70/0x70 >>>>> [ 196.021373] kthread+0x116/0x130 >>>>> [ 196.024617] ? kthread_create_worker_on_cpu+0x60/0x60 >>>>> [ 196.029680] ret_from_fork+0x22/0x30 >>>>> [ 196.033272] ---[ end trace 8dd104a950d8d248 ]--- >>>>> >>>>> >>>> Also, as I mentioned above, the changes should affect the pullup disable >>>> path, so when you 'echo "" > UDC' or something similar to that >>>> operation, did you see any errors? >>> >>> After your patch I see a warning as above. Before — no errors or warnings. >>> >>>> Can you provide a ftrace output w/ >>>> the DWC3 tracing enabled once removing the UDC? >>> >>> Can you provide step-by-step instructions what should I do? >>> >> Let me try with your kernel, and steps below first. >> >> Thanks >> Wesley Cheng >> >>>>> Revert helps (I'm on v5.12-rc4 now with a revert). >>>>> >>>>> The script to enable gadget: >>>>> >>>>> #!/bin/sh -efu >>>>> >>>>> # Mounting CONFIGFS >>>>> grep -q -w /sys/kernel/config /proc/mounts || mount -t configfs none >>>>> /sys/kernel/config >>>>> >>>>> # Addresses and files >>>>> readonly GADGET_BASE_DIR="/sys/kernel/config/usb_gadget/g1" >>>>> readonly DEV_ETH_ADDR="aa:bb:cc:dd:ee:f1" >>>>> readonly HOST_ETH_ADDR="aa:bb:cc:dd:ee:f2" >>>>> readonly USBDISK="/usbdisk.img" >>>>> >>>>> # Insert modules >>>>> modprobe libcomposite >>>>> >>>>> # Create directory structure >>>>> mkdir "${GADGET_BASE_DIR}" >>>>> cd "${GADGET_BASE_DIR}" >>>>> mkdir -p configs/c.1/strings/0x409 >>>>> mkdir -p strings/0x409 >>>>> >>>>> # Ethernet device >>>>> mkdir functions/eem.usb0 >>>>> echo "${DEV_ETH_ADDR}" > functions/eem.usb0/dev_addr >>>>> echo "${HOST_ETH_ADDR}" > functions/eem.usb0/host_addr >>>>> ln -s functions/eem.usb0 configs/c.1/ >>>>> >>>>> # Composite Gadget Setup >>>>> echo 0x1d6b > idVendor # Linux Foundation >>>>> echo 0x0104 > idProduct # Multifunction Composite Gadget >>>>> echo 0x0100 > bcdDevice # v1.0.0 >>>>> echo 0x0200 > bcdUSB # USB2 >>>>> echo "0123456789abcdef" > strings/0x409/serialnumber >>>>> echo "USBArmory" > strings/0x409/manufacturer >>>>> echo "USBArmory Gadget" > strings/0x409/product >>>>> echo "Conf1" > configs/c.1/strings/0x409/configuration >>>>> echo 120 > configs/c.1/MaxPower >>>>> >>>>> # Activate gadgets >>>>> echo dwc3.0.auto > UDC >>>>> >>>>> Please, tell me how to fix this, otherwise I will have to send a revert. >>>>> >>>> This also fixes a potential SMMU fault on targets with that enabled. It >>>> causes the controller to access a stale TRB DMA address after it has >>>> already been unmapped. I think we should figure out what is causing the >>>> issue on your set up instead of reverting the entire change. >>> >>> If we find a cause and have a fix during this week, otherwise it's >>> rc5:ish timing when we may not have more time to play. >>> >> >> -- >> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, >> a Linux Foundation Collaborative Project > > > -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
On Tue, Mar 23, 2021 at 1:19 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > > Hi Andy, > > On 3/22/2021 2:14 PM, Andy Shevchenko wrote: > > On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > >> > >> Hi Andy, > >> > >> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: > >>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>> > >>>> Hi Andy, > >>>> > >>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: > >>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>>>> > >>>>>> In the situations where the DWC3 gadget stops active transfers, once > >>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function > >>>>>> driver can queue a new USB request in between the time where the dwc3 > >>>>>> lock has been released and re-aquired. This occurs after we've already > >>>>>> issued an ENDXFER command. When the stop active transfers continues > >>>>>> to remove USB requests from all dep lists, the newly added request will > >>>>>> also be removed, while controller still has an active TRB for it. > >>>>>> This can lead to the controller accessing an unmapped memory address. > >>>>>> > >>>>>> Fix this by ensuring parameters to prevent EP queuing are set before > >>>>>> calling the stop active transfers API. > >>>>> > >>>>> > >>>>> commit f09ddcfcb8c569675066337adac2ac205113471f > >>>>> Author: Wesley Cheng <wcheng@codeaurora.org> > >>>>> Date: Thu Mar 11 15:59:02 2021 -0800 > >>>>> > >>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers > >>>>> > >>>>> effectively broke my gadget setup. > >>>>> > >>>>> The output of the kernel (followed by non responsive state of USB controller): > >>>>> > >>>>> [ 195.228586] using random self ethernet address > >>>>> [ 195.233104] using random host ethernet address > >>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 > >>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 > >>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready > >>>>> [ 195.780585] ------------[ cut here ]------------ > >>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in > >>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at > >>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite > >>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps > >>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt > >>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec > >>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci > >>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m > >>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel > >>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 > >>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, > >>>>> BIOS 542 2015.01.21:18.19.48 > >>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee > >>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 > >>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 > >>>>> 20 e9 80 fc ff ff 41 83 fe 92 > >>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 > >>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff > >>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 > >>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb > >>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 > >>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 > >>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) > >>>>> knlGS:0000000000000000 > >>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 > >>>>> [ 195.937300] Call Trace: > >>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 > >>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 > >>>> > >>>> Odd that this change would affect the USB enablment path, as they were > >>>> focused on the pullup disable path. Would you happen to have any > >>>> downstream changes on top of v5.12-rc4 we could review to see if they > >>>> are still required? (ie where is the dwc3_remove_requests() coming from > >>>> during ep enable) > >>> > >>> You may check my branch [1] on GH. Basically you may be interested in > >>> the commit: > >>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: > >>> skip endpoints ep[18]{in,out} > >>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY > >>> suspend fix (which also shouldn't affect this). > >> > >> Can you link your GH reference? > > > > Oops, sorry. > > Here we are: > > > > [1]: https://github.com/andy-shev/linux/tree/eds-acpi > > > Thanks, I took a look and even tried it on my device running 5.12-rc4, > but wasn't able to see the same problem. Could you help collect the > ftrace after enabling the tracing KCONFIG and running the below sequence? > > 1. Mount debugfs > 2. Set up tracing instance > > mkdir /sys/kernel/debug/tracing/instances/usb > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable > echo 1 > > /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable > echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on > > 3. Run your test, which should include: > - echo "" > /sys/kernel/config/usb_gadget/g1/UDC > - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC > > 4. Collect the trace output: > cat /sys/kernel/debug/tracing/instances/usb/trace Here we are (I cherry-picked again reverted patch, other stays the same) [2]. On top I put a warning, so you may see timestamps. Dunno how long it will stay there, please confirm that you got it. [2]: https://paste.ubuntu.com/p/jNF565ypPp/ -- With Best Regards, Andy Shevchenko
On 3/23/2021 10:27 AM, Andy Shevchenko wrote: > On Tue, Mar 23, 2021 at 1:19 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >> >> Hi Andy, >> >> On 3/22/2021 2:14 PM, Andy Shevchenko wrote: >>> On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>> >>>> Hi Andy, >>>> >>>> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: >>>>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>> >>>>>> Hi Andy, >>>>>> >>>>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: >>>>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>>>> >>>>>>>> In the situations where the DWC3 gadget stops active transfers, once >>>>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function >>>>>>>> driver can queue a new USB request in between the time where the dwc3 >>>>>>>> lock has been released and re-aquired. This occurs after we've already >>>>>>>> issued an ENDXFER command. When the stop active transfers continues >>>>>>>> to remove USB requests from all dep lists, the newly added request will >>>>>>>> also be removed, while controller still has an active TRB for it. >>>>>>>> This can lead to the controller accessing an unmapped memory address. >>>>>>>> >>>>>>>> Fix this by ensuring parameters to prevent EP queuing are set before >>>>>>>> calling the stop active transfers API. >>>>>>> >>>>>>> >>>>>>> commit f09ddcfcb8c569675066337adac2ac205113471f >>>>>>> Author: Wesley Cheng <wcheng@codeaurora.org> >>>>>>> Date: Thu Mar 11 15:59:02 2021 -0800 >>>>>>> >>>>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers >>>>>>> >>>>>>> effectively broke my gadget setup. >>>>>>> >>>>>>> The output of the kernel (followed by non responsive state of USB controller): >>>>>>> >>>>>>> [ 195.228586] using random self ethernet address >>>>>>> [ 195.233104] using random host ethernet address >>>>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>>>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 >>>>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>>>>>> [ 195.780585] ------------[ cut here ]------------ >>>>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in >>>>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at >>>>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite >>>>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps >>>>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt >>>>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec >>>>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci >>>>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m >>>>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel >>>>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 >>>>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>>>>>> BIOS 542 2015.01.21:18.19.48 >>>>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee >>>>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 >>>>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 >>>>>>> 20 e9 80 fc ff ff 41 83 fe 92 >>>>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 >>>>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff >>>>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 >>>>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb >>>>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 >>>>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 >>>>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) >>>>>>> knlGS:0000000000000000 >>>>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 >>>>>>> [ 195.937300] Call Trace: >>>>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 >>>>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 >>>>>> >>>>>> Odd that this change would affect the USB enablment path, as they were >>>>>> focused on the pullup disable path. Would you happen to have any >>>>>> downstream changes on top of v5.12-rc4 we could review to see if they >>>>>> are still required? (ie where is the dwc3_remove_requests() coming from >>>>>> during ep enable) >>>>> >>>>> You may check my branch [1] on GH. Basically you may be interested in >>>>> the commit: >>>>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: >>>>> skip endpoints ep[18]{in,out} >>>>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY >>>>> suspend fix (which also shouldn't affect this). >>>> >>>> Can you link your GH reference? >>> >>> Oops, sorry. >>> Here we are: >>> >>> [1]: https://github.com/andy-shev/linux/tree/eds-acpi >>> >> Thanks, I took a look and even tried it on my device running 5.12-rc4, >> but wasn't able to see the same problem. Could you help collect the >> ftrace after enabling the tracing KCONFIG and running the below sequence? >> >> 1. Mount debugfs >> 2. Set up tracing instance >> >> mkdir /sys/kernel/debug/tracing/instances/usb >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable >> echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on >> >> 3. Run your test, which should include: >> - echo "" > /sys/kernel/config/usb_gadget/g1/UDC >> - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC >> >> 4. Collect the trace output: >> cat /sys/kernel/debug/tracing/instances/usb/trace > > Here we are (I cherry-picked again reverted patch, other stays the same) [2]. > On top I put a warning, so you may see timestamps. > > Dunno how long it will stay there, please confirm that you got it. > > [2]: https://paste.ubuntu.com/p/jNF565ypPp/ > Hi Andy, Great, thanks, got access to it. Will take a look and update you. Thanks Wesley Cheng -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
On 3/23/2021 10:27 AM, Andy Shevchenko wrote: > On Tue, Mar 23, 2021 at 1:19 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >> >> Hi Andy, >> >> On 3/22/2021 2:14 PM, Andy Shevchenko wrote: >>> On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>> >>>> Hi Andy, >>>> >>>> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: >>>>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>> >>>>>> Hi Andy, >>>>>> >>>>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: >>>>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>>>> >>>>>>>> In the situations where the DWC3 gadget stops active transfers, once >>>>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function >>>>>>>> driver can queue a new USB request in between the time where the dwc3 >>>>>>>> lock has been released and re-aquired. This occurs after we've already >>>>>>>> issued an ENDXFER command. When the stop active transfers continues >>>>>>>> to remove USB requests from all dep lists, the newly added request will >>>>>>>> also be removed, while controller still has an active TRB for it. >>>>>>>> This can lead to the controller accessing an unmapped memory address. >>>>>>>> >>>>>>>> Fix this by ensuring parameters to prevent EP queuing are set before >>>>>>>> calling the stop active transfers API. >>>>>>> >>>>>>> >>>>>>> commit f09ddcfcb8c569675066337adac2ac205113471f >>>>>>> Author: Wesley Cheng <wcheng@codeaurora.org> >>>>>>> Date: Thu Mar 11 15:59:02 2021 -0800 >>>>>>> >>>>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers >>>>>>> >>>>>>> effectively broke my gadget setup. >>>>>>> >>>>>>> The output of the kernel (followed by non responsive state of USB controller): >>>>>>> >>>>>>> [ 195.228586] using random self ethernet address >>>>>>> [ 195.233104] using random host ethernet address >>>>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>>>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 >>>>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>>>>>> [ 195.780585] ------------[ cut here ]------------ >>>>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in >>>>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at >>>>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite >>>>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps >>>>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt >>>>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec >>>>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci >>>>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m >>>>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel >>>>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 >>>>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>>>>>> BIOS 542 2015.01.21:18.19.48 >>>>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee >>>>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 >>>>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 >>>>>>> 20 e9 80 fc ff ff 41 83 fe 92 >>>>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 >>>>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff >>>>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 >>>>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb >>>>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 >>>>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 >>>>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) >>>>>>> knlGS:0000000000000000 >>>>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 >>>>>>> [ 195.937300] Call Trace: >>>>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 >>>>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 >>>>>> >>>>>> Odd that this change would affect the USB enablment path, as they were >>>>>> focused on the pullup disable path. Would you happen to have any >>>>>> downstream changes on top of v5.12-rc4 we could review to see if they >>>>>> are still required? (ie where is the dwc3_remove_requests() coming from >>>>>> during ep enable) >>>>> >>>>> You may check my branch [1] on GH. Basically you may be interested in >>>>> the commit: >>>>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: >>>>> skip endpoints ep[18]{in,out} >>>>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY >>>>> suspend fix (which also shouldn't affect this). >>>> >>>> Can you link your GH reference? >>> >>> Oops, sorry. >>> Here we are: >>> >>> [1]: https://github.com/andy-shev/linux/tree/eds-acpi >>> >> Thanks, I took a look and even tried it on my device running 5.12-rc4, >> but wasn't able to see the same problem. Could you help collect the >> ftrace after enabling the tracing KCONFIG and running the below sequence? >> >> 1. Mount debugfs >> 2. Set up tracing instance >> >> mkdir /sys/kernel/debug/tracing/instances/usb >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable >> echo 1 > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable >> echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on >> >> 3. Run your test, which should include: >> - echo "" > /sys/kernel/config/usb_gadget/g1/UDC >> - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC >> >> 4. Collect the trace output: >> cat /sys/kernel/debug/tracing/instances/usb/trace > > Here we are (I cherry-picked again reverted patch, other stays the same) [2]. > On top I put a warning, so you may see timestamps. > > Dunno how long it will stay there, please confirm that you got it. > > [2]: https://paste.ubuntu.com/p/jNF565ypPp/ > Hi Andy, Would you be able to give the below change a try? diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 80606b8..cd58bd5 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -791,10 +791,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) reg &= ~DWC3_DALEPENA_EP(dep->number); dwc3_writel(dwc->regs, DWC3_DALEPENA, reg); - dep->stream_capable = false; - dep->type = 0; - dep->flags = 0; - /* Clear out the ep descriptors for non-ep0 */ if (dep->number > 1) { dep->endpoint.comp_desc = NULL; @@ -803,6 +799,10 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) dwc3_remove_requests(dwc, dep); + dep->stream_capable = false; + dep->type = 0; + dep->flags = 0; + return 0; } Thanks Wesley Cheng -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Hi Wesley, On 23.03.2021 22:53, Wesley Cheng wrote: > On 3/23/2021 10:27 AM, Andy Shevchenko wrote: >> On Tue, Mar 23, 2021 at 1:19 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>> On 3/22/2021 2:14 PM, Andy Shevchenko wrote: >>>> On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: >>>>>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: >>>>>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: >>>>>>>>> In the situations where the DWC3 gadget stops active transfers, once >>>>>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function >>>>>>>>> driver can queue a new USB request in between the time where the dwc3 >>>>>>>>> lock has been released and re-aquired. This occurs after we've already >>>>>>>>> issued an ENDXFER command. When the stop active transfers continues >>>>>>>>> to remove USB requests from all dep lists, the newly added request will >>>>>>>>> also be removed, while controller still has an active TRB for it. >>>>>>>>> This can lead to the controller accessing an unmapped memory address. >>>>>>>>> >>>>>>>>> Fix this by ensuring parameters to prevent EP queuing are set before >>>>>>>>> calling the stop active transfers API. >>>>>>>> >>>>>>>> commit f09ddcfcb8c569675066337adac2ac205113471f >>>>>>>> Author: Wesley Cheng <wcheng@codeaurora.org> >>>>>>>> Date: Thu Mar 11 15:59:02 2021 -0800 >>>>>>>> >>>>>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers >>>>>>>> >>>>>>>> effectively broke my gadget setup. >>>>>>>> >>>>>>>> The output of the kernel (followed by non responsive state of USB controller): >>>>>>>> >>>>>>>> [ 195.228586] using random self ethernet address >>>>>>>> [ 195.233104] using random host ethernet address >>>>>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 >>>>>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 >>>>>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready >>>>>>>> [ 195.780585] ------------[ cut here ]------------ >>>>>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in >>>>>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at >>>>>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite >>>>>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps >>>>>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt >>>>>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec >>>>>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci >>>>>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m >>>>>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel >>>>>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 >>>>>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, >>>>>>>> BIOS 542 2015.01.21:18.19.48 >>>>>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 >>>>>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee >>>>>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 >>>>>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 >>>>>>>> 20 e9 80 fc ff ff 41 83 fe 92 >>>>>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 >>>>>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff >>>>>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 >>>>>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb >>>>>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 >>>>>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 >>>>>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) >>>>>>>> knlGS:0000000000000000 >>>>>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 >>>>>>>> [ 195.937300] Call Trace: >>>>>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 >>>>>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 >>>>>>> Odd that this change would affect the USB enablment path, as they were >>>>>>> focused on the pullup disable path. Would you happen to have any >>>>>>> downstream changes on top of v5.12-rc4 we could review to see if they >>>>>>> are still required? (ie where is the dwc3_remove_requests() coming from >>>>>>> during ep enable) >>>>>> You may check my branch [1] on GH. Basically you may be interested in >>>>>> the commit: >>>>>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: >>>>>> skip endpoints ep[18]{in,out} >>>>>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY >>>>>> suspend fix (which also shouldn't affect this). >>>>> Can you link your GH reference? >>>> Oops, sorry. >>>> Here we are: >>>> >>>> [1]: https://github.com/andy-shev/linux/tree/eds-acpi >>>> >>> Thanks, I took a look and even tried it on my device running 5.12-rc4, >>> but wasn't able to see the same problem. Could you help collect the >>> ftrace after enabling the tracing KCONFIG and running the below sequence? >>> >>> 1. Mount debugfs >>> 2. Set up tracing instance >>> >>> mkdir /sys/kernel/debug/tracing/instances/usb >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable >>> echo 1 > >>> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable >>> echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on >>> >>> 3. Run your test, which should include: >>> - echo "" > /sys/kernel/config/usb_gadget/g1/UDC >>> - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC >>> >>> 4. Collect the trace output: >>> cat /sys/kernel/debug/tracing/instances/usb/trace >> Here we are (I cherry-picked again reverted patch, other stays the same) [2]. >> On top I put a warning, so you may see timestamps. >> >> Dunno how long it will stay there, please confirm that you got it. >> >> [2]: https://paste.ubuntu.com/p/jNF565ypPp/ >> > Hi Andy, > > Would you be able to give the below change a try? > > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 80606b8..cd58bd5 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -791,10 +791,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep > *dep) > reg &= ~DWC3_DALEPENA_EP(dep->number); > dwc3_writel(dwc->regs, DWC3_DALEPENA, reg); > > - dep->stream_capable = false; > - dep->type = 0; > - dep->flags = 0; > - > /* Clear out the ep descriptors for non-ep0 */ > if (dep->number > 1) { > dep->endpoint.comp_desc = NULL; > @@ -803,6 +799,10 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep > *dep) > > dwc3_remove_requests(dwc, dep); > > + dep->stream_capable = false; > + dep->type = 0; > + dep->flags = 0; > + > return 0; > } Feel free to add my: Tested-by: Marek Szyprowski <m.szyprowski@samsung.com> The above change fixed the issue I was about to complain, introduced (or revealed?) by the $subject commit. Before applying the above fix I've observed following warning during system power off: ------------[ cut here ]------------ dwc3 15400000.usb: No resource for ep2in WARNING: CPU: 0 PID: 162 at drivers/usb/dwc3/gadget.c:361 dwc3_send_gadget_ep_cmd+0x95c/0xae8 Modules linked in: cpufreq_powersave cpufreq_conservative brcmfmac brcmutil cfg80211 crct10dif_ce s3fwrn5_i2c s3fwrn5 nci nfc s5p_mfc hci_uart s5p_jpeg btqca btbcm exynos_gsc v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common bluetooth videodev mc panfrost gpu_sched ecdh_generic ecc rfkill ip_tables x_tables ipv6 CPU: 0 PID: 162 Comm: irq/151-dwc3 Not tainted 5.12.0-rc3+ #2713 Hardware name: Samsung TM2E board (DT) pstate: 40000085 (nZcv daIf -PAN -UAO -TCO BTYPE=--) pc : dwc3_send_gadget_ep_cmd+0x95c/0xae8 ... Call trace: dwc3_send_gadget_ep_cmd+0x95c/0xae8 __dwc3_gadget_ep_enable+0x40c/0x578 dwc3_gadget_ep_enable+0x5c/0xf0 usb_ep_enable+0x40/0x1d0 ecm_set_alt+0xa8/0x1b8 composite_setup+0x8b8/0x1840 dwc3_ep0_delegate_req+0x38/0x58 dwc3_ep0_interrupt+0xd2c/0x1048 dwc3_thread_interrupt+0xe7c/0x10f8 irq_thread_fn+0x28/0x98 irq_thread+0x17c/0x298 kthread+0x134/0x160 ret_from_fork+0x10/0x18 irq event stamp: 1220 hardirqs last enabled at (1219): [<ffff800010f96eac>] _raw_spin_unlock_irq+0x3c/0x80 hardirqs last disabled at (1220): [<ffff800010f97694>] _raw_spin_lock_irqsave+0xb4/0x148 softirqs last enabled at (924): [<ffff8000100104e8>] _stext+0x4e8/0x614 softirqs last disabled at (901): [<ffff800010095304>] irq_exit+0x19c/0x1a8 ---[ end trace 84804a1a38a2c9f7 ]--- Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland
On Tue, Mar 23, 2021 at 11:53 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > > > > On 3/23/2021 10:27 AM, Andy Shevchenko wrote: > > On Tue, Mar 23, 2021 at 1:19 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > >> > >> Hi Andy, > >> > >> On 3/22/2021 2:14 PM, Andy Shevchenko wrote: > >>> On Mon, Mar 22, 2021 at 10:06 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>> > >>>> Hi Andy, > >>>> > >>>> On 3/22/2021 12:34 PM, Andy Shevchenko wrote: > >>>>> On Mon, Mar 22, 2021 at 8:49 PM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>>>> > >>>>>> Hi Andy, > >>>>>> > >>>>>> On 3/22/2021 5:48 AM, Andy Shevchenko wrote: > >>>>>>> On Fri, Mar 12, 2021 at 2:01 AM Wesley Cheng <wcheng@codeaurora.org> wrote: > >>>>>>>> > >>>>>>>> In the situations where the DWC3 gadget stops active transfers, once > >>>>>>>> calling the dwc3_gadget_giveback(), there is a chance where a function > >>>>>>>> driver can queue a new USB request in between the time where the dwc3 > >>>>>>>> lock has been released and re-aquired. This occurs after we've already > >>>>>>>> issued an ENDXFER command. When the stop active transfers continues > >>>>>>>> to remove USB requests from all dep lists, the newly added request will > >>>>>>>> also be removed, while controller still has an active TRB for it. > >>>>>>>> This can lead to the controller accessing an unmapped memory address. > >>>>>>>> > >>>>>>>> Fix this by ensuring parameters to prevent EP queuing are set before > >>>>>>>> calling the stop active transfers API. > >>>>>>> > >>>>>>> > >>>>>>> commit f09ddcfcb8c569675066337adac2ac205113471f > >>>>>>> Author: Wesley Cheng <wcheng@codeaurora.org> > >>>>>>> Date: Thu Mar 11 15:59:02 2021 -0800 > >>>>>>> > >>>>>>> usb: dwc3: gadget: Prevent EP queuing while stopping transfers > >>>>>>> > >>>>>>> effectively broke my gadget setup. > >>>>>>> > >>>>>>> The output of the kernel (followed by non responsive state of USB controller): > >>>>>>> > >>>>>>> [ 195.228586] using random self ethernet address > >>>>>>> [ 195.233104] using random host ethernet address > >>>>>>> [ 195.245306] usb0: HOST MAC aa:bb:cc:dd:ee:f2 > >>>>>>> [ 195.249732] usb0: MAC aa:bb:cc:dd:ee:f1 > >>>>>>> # [ 195.773594] IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready > >>>>>>> [ 195.780585] ------------[ cut here ]------------ > >>>>>>> [ 195.785217] dwc3 dwc3.0.auto: No resource for ep2in > >>>>>>> [ 195.790162] WARNING: CPU: 0 PID: 217 at > >>>>>>> drivers/usb/dwc3/gadget.c:360 dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>>>>>> [ 195.799760] Modules linked in: usb_f_eem u_ether libcomposite > >>>>>>> brcmfmac brcmutil mmc_block pwm_lpss_pci pwm_lps > >>>>>>> s snd_sof_pci_intel_tng snd_sof_pci snd_sof_acpi_intel_byt > >>>>>>> snd_sof_intel_ipc snd_sof_acpi snd_sof snd_sof_nocodec > >>>>>>> spi_pxa2xx_platform snd_sof_xtensa_dsp spi_pxa2xx_pci > >>>>>>> extcon_intel_mrfld intel_mrfld_adc sdhci_pci cqhci sdhci m > >>>>>>> mc_core intel_mrfld_pwrbtn intel_soc_pmic_mrfld hci_uart btbcm btintel > >>>>>>> [ 195.835604] CPU: 0 PID: 217 Comm: irq/16-dwc3 Not tainted 5.12.0-rc4+ #60 > >>>>>>> [ 195.842403] Hardware name: Intel Corporation Merrifield/BODEGA BAY, > >>>>>>> BIOS 542 2015.01.21:18.19.48 > >>>>>>> [ 195.851191] RIP: 0010:dwc3_send_gadget_ep_cmd+0x4b9/0x670 > >>>>>>> [ 195.856608] Code: cd 00 00 00 44 89 44 24 20 48 89 4c 24 18 e8 ee > >>>>>>> f7 e4 ff 48 8b 4c 24 18 4c 89 f2 48 c7 c7 b9 > >>>>>>> ed 4f a0 48 89 c6 e8 ef 24 43 00 <0f> 0b 41 be ea ff ff ff 44 8b 44 24 > >>>>>>> 20 e9 80 fc ff ff 41 83 fe 92 > >>>>>>> [ 195.875381] RSP: 0000:ffffa53c00373ba8 EFLAGS: 00010086 > >>>>>>> [ 195.880617] RAX: 0000000000000000 RBX: 0000000000001387 RCX: 00000000ffffdfff > >>>>>>> [ 195.887755] RDX: 00000000ffffdfff RSI: 00000000ffffffea RDI: 0000000000000000 > >>>>>>> [ 195.894893] RBP: ffff9ce8c8f2b028 R08: ffffffffa0732288 R09: 0000000000009ffb > >>>>>>> [ 195.902034] R10: 00000000ffffe000 R11: 3fffffffffffffff R12: 0000000000041006 > >>>>>>> [ 195.909170] R13: ffffa53c00373c24 R14: ffff9ce8c11dadb0 R15: ffff9ce8c2861700 > >>>>>>> [ 195.916310] FS: 0000000000000000(0000) GS:ffff9ce8fe200000(0000) > >>>>>>> knlGS:0000000000000000 > >>>>>>> [ 195.924409] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > >>>>>>> [ 195.930161] CR2: 00000000f7f694a0 CR3: 0000000038e0c000 CR4: 00000000001006f0 > >>>>>>> [ 195.937300] Call Trace: > >>>>>>> [ 195.939755] __dwc3_gadget_ep_enable+0x2d4/0x4e0 > >>>>>>> [ 195.944393] ? dwc3_remove_requests.constprop.0+0x86/0x170 > >>>>>> > >>>>>> Odd that this change would affect the USB enablment path, as they were > >>>>>> focused on the pullup disable path. Would you happen to have any > >>>>>> downstream changes on top of v5.12-rc4 we could review to see if they > >>>>>> are still required? (ie where is the dwc3_remove_requests() coming from > >>>>>> during ep enable) > >>>>> > >>>>> You may check my branch [1] on GH. Basically you may be interested in > >>>>> the commit: > >>>>> 0f86df1294ee7523060cc16eafaf4898c693eab0 REVERTME: usb: dwc3: gadget: > >>>>> skip endpoints ep[18]{in,out} > >>>>> Otherwise it's a clean v5.12-rc4 with a revert and another USB PHY > >>>>> suspend fix (which also shouldn't affect this). > >>>> > >>>> Can you link your GH reference? > >>> > >>> Oops, sorry. > >>> Here we are: > >>> > >>> [1]: https://github.com/andy-shev/linux/tree/eds-acpi > >>> > >> Thanks, I took a look and even tried it on my device running 5.12-rc4, > >> but wasn't able to see the same problem. Could you help collect the > >> ftrace after enabling the tracing KCONFIG and running the below sequence? > >> > >> 1. Mount debugfs > >> 2. Set up tracing instance > >> > >> mkdir /sys/kernel/debug/tracing/instances/usb > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_complete_trb/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ctrl_req/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_dequeue/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_ep_queue/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_cmd/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_disable/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_ep_enable/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_gadget_giveback/enable > >> echo 1 > > >> /sys/kernel/debug/tracing/instances/usb/events/dwc3/dwc3_prepare_trb/enable > >> echo 1 > /sys/kernel/debug/tracing/instances/usb/tracing_on > >> > >> 3. Run your test, which should include: > >> - echo "" > /sys/kernel/config/usb_gadget/g1/UDC > >> - echo <UDC name> > /sys/kernel/config/usb_gadget/g1/UDC > >> > >> 4. Collect the trace output: > >> cat /sys/kernel/debug/tracing/instances/usb/trace > > > > Here we are (I cherry-picked again reverted patch, other stays the same) [2]. > > On top I put a warning, so you may see timestamps. > > > > Dunno how long it will stay there, please confirm that you got it. > > > > [2]: https://paste.ubuntu.com/p/jNF565ypPp/ > > > > Hi Andy, > > Would you be able to give the below change a try? Thanks! Reported-and-tested-by: Andy Shevchenko <andy.shevchenko@gmail.com> > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 80606b8..cd58bd5 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -791,10 +791,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep > *dep) > reg &= ~DWC3_DALEPENA_EP(dep->number); > dwc3_writel(dwc->regs, DWC3_DALEPENA, reg); > > - dep->stream_capable = false; > - dep->type = 0; > - dep->flags = 0; > - > /* Clear out the ep descriptors for non-ep0 */ > if (dep->number > 1) { > dep->endpoint.comp_desc = NULL; > @@ -803,6 +799,10 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep > *dep) > > dwc3_remove_requests(dwc, dep); > > + dep->stream_capable = false; > + dep->type = 0; > + dep->flags = 0; > + > return 0; > } > > Thanks > Wesley Cheng > > -- > The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, > a Linux Foundation Collaborative Project -- With Best Regards, Andy Shevchenko
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 4780983..2c94cc9 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -783,8 +783,6 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) trace_dwc3_gadget_ep_disable(dep); - dwc3_remove_requests(dwc, dep); - /* make sure HW endpoint isn't stalled */ if (dep->flags & DWC3_EP_STALL) __dwc3_gadget_ep_set_halt(dep, 0, false); @@ -803,6 +801,8 @@ static int __dwc3_gadget_ep_disable(struct dwc3_ep *dep) dep->endpoint.desc = NULL; } + dwc3_remove_requests(dwc, dep); + return 0; } @@ -1617,7 +1617,7 @@ static int __dwc3_gadget_ep_queue(struct dwc3_ep *dep, struct dwc3_request *req) { struct dwc3 *dwc = dep->dwc; - if (!dep->endpoint.desc || !dwc->pullups_connected) { + if (!dep->endpoint.desc || !dwc->pullups_connected || !dwc->connected) { dev_err(dwc->dev, "%s: can't queue to disabled endpoint\n", dep->name); return -ESHUTDOWN; @@ -2247,6 +2247,7 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) if (!is_on) { u32 count; + dwc->connected = false; /* * In the Synopsis DesignWare Cores USB3 Databook Rev. 3.30a * Section 4.1.8 Table 4-7, it states that for a device-initiated @@ -2271,7 +2272,6 @@ static int dwc3_gadget_pullup(struct usb_gadget *g, int is_on) dwc->ev_buf->lpos = (dwc->ev_buf->lpos + count) % dwc->ev_buf->length; } - dwc->connected = false; } else { __dwc3_gadget_start(dwc); } @@ -3329,8 +3329,6 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) { u32 reg; - dwc->connected = true; - /* * WORKAROUND: DWC3 revisions <1.88a have an issue which * would cause a missing Disconnect Event if there's a @@ -3370,6 +3368,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) * transfers." */ dwc3_stop_active_transfers(dwc); + dwc->connected = true; reg = dwc3_readl(dwc->regs, DWC3_DCTL); reg &= ~DWC3_DCTL_TSTCTRL_MASK;
In the situations where the DWC3 gadget stops active transfers, once calling the dwc3_gadget_giveback(), there is a chance where a function driver can queue a new USB request in between the time where the dwc3 lock has been released and re-aquired. This occurs after we've already issued an ENDXFER command. When the stop active transfers continues to remove USB requests from all dep lists, the newly added request will also be removed, while controller still has an active TRB for it. This can lead to the controller accessing an unmapped memory address. Fix this by ensuring parameters to prevent EP queuing are set before calling the stop active transfers API. Fixes: ae7e86108b12 ("usb: dwc3: Stop active transfers before halting the controller") Signed-off-by: Wesley Cheng <wcheng@codeaurora.org> --- Changes since V2: - Removed duplicate dwc->connected = false setting in pullup routine Changes since V1: - Added Fixes tag to point to the commit this is addressing drivers/usb/dwc3/gadget.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)