Message ID | 20250515185227.1507363-2-royluo@google.com |
---|---|
State | New |
Headers | show |
Series | Introduce XHCI_FULL_RESET_ON_REMOVE quirk for DWC3 | expand |
On Thu, May 15, 2025, Roy Luo wrote: > Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() > helper") introduced an optimization to xhci_reset() during xhci removal, > allowing it to bail out early without waiting for the reset to complete. > > This behavior can cause issues on SNPS DWC3 USB controller with dual-role > capability. When the DWC3 controller exits host mode and removes xhci > while a reset is still in progress, and then tries to configure its > hardware for device mode, the ongoing reset leads to register access > issues; specifically, all register reads returns 0. These issues extend > beyond the xhci register space (which is expected during a reset) and > affect the entire DWC3 IP block, causing the DWC3 device mode to > malfunction. > > To address this, introduce the `XHCI_FULL_RESET_ON_REMOVE` quirk. When this > quirk is set, xhci_reset() always completes its reset handshake, ensuring > the controller is in a fully reset state before proceeding. > > Cc: stable@vger.kernel.org > Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") > Signed-off-by: Roy Luo <royluo@google.com> > --- > drivers/usb/host/xhci-plat.c | 3 +++ > drivers/usb/host/xhci.c | 8 +++++++- > drivers/usb/host/xhci.h | 1 + > 3 files changed, 11 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c > index 3155e3a842da..19c5c26a8e63 100644 > --- a/drivers/usb/host/xhci-plat.c > +++ b/drivers/usb/host/xhci-plat.c > @@ -265,6 +265,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s > if (device_property_read_bool(tmpdev, "xhci-skip-phy-init-quirk")) > xhci->quirks |= XHCI_SKIP_PHY_INIT; > > + if (device_property_read_bool(tmpdev, "xhci-full-reset-on-remove-quirk")) > + xhci->quirks |= XHCI_FULL_RESET_ON_REMOVE; > + > device_property_read_u32(tmpdev, "imod-interval-ns", > &xhci->imod_interval); > } > diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c > index 90eb491267b5..4f091d618c01 100644 > --- a/drivers/usb/host/xhci.c > +++ b/drivers/usb/host/xhci.c > @@ -198,6 +198,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) > u32 command; > u32 state; > int ret; > + unsigned int exit_state; > > state = readl(&xhci->op_regs->status); > > @@ -226,8 +227,13 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) > if (xhci->quirks & XHCI_INTEL_HOST) > udelay(1000); > > + if (xhci->quirks & XHCI_FULL_RESET_ON_REMOVE) > + exit_state = 0; There's no state 0. Checking against that is odd. Couldn't we just use xhci_handshake() equivalent instead? In any case, this is basically a revert of this change: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") Can't we just revert or fix the above patch that causes a regression? Thanks, Thinh > + else > + exit_state = XHCI_STATE_REMOVING; > + > ret = xhci_handshake_check_state(xhci, &xhci->op_regs->command, > - CMD_RESET, 0, timeout_us, XHCI_STATE_REMOVING); > + CMD_RESET, 0, timeout_us, exit_state); > if (ret) > return ret; > > diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h > index 242ab9fbc8ae..ac65af788298 100644 > --- a/drivers/usb/host/xhci.h > +++ b/drivers/usb/host/xhci.h > @@ -1637,6 +1637,7 @@ struct xhci_hcd { > #define XHCI_WRITE_64_HI_LO BIT_ULL(47) > #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) > #define XHCI_ETRON_HOST BIT_ULL(49) > +#define XHCI_FULL_RESET_ON_REMOVE BIT_ULL(50) > > unsigned int num_active_eps; > unsigned int limit_active_eps; > -- > 2.49.0.1112.g889b7c5bd8-goog >
On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote: > In any case, this is basically a revert of this change: > 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() > helper") > > Can't we just revert or fix the above patch that causes a regression? Also note that 6ccb83d6c497 claimed to fix actual problems, so disabling it on selected hardware could bring the old bug back: > In some situations where xhci removal happens parallel to > xhci_handshake, we encounter a scenario where the xhci_handshake > can't succeed, and it polls until timeout. > > If xhci_handshake runs until timeout it can on some platforms result > in a long wait which might lead to a watchdog timeout. But on the other hand, xhci_handshake() has long timeouts because the handshakes themselves can take a surprisingly long time (and sometimes still succeed), so any reliance on handshake completing before timeout is frankly a bug in itself. Regards, Michal
> There's no state 0. Checking against that is odd. Couldn't we just use > xhci_handshake() equivalent instead? Ok, I will change it in the next version. On Thu, May 15, 2025 at 11:33 PM Michał Pecio <michal.pecio@gmail.com> wrote: > > On Thu, 15 May 2025 23:42:50 +0000, Thinh Nguyen wrote: > > In any case, this is basically a revert of this change: > > 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() > > helper") > > > > Can't we just revert or fix the above patch that causes a regression? > > Also note that 6ccb83d6c497 claimed to fix actual problems, so > disabling it on selected hardware could bring the old bug back: > > > In some situations where xhci removal happens parallel to > > xhci_handshake, we encounter a scenario where the xhci_handshake > > can't succeed, and it polls until timeout. > > > > If xhci_handshake runs until timeout it can on some platforms result > > in a long wait which might lead to a watchdog timeout. On top of this, xhci_handshake_check_state(XHCI_STATE_REMOVING) is also used elsewhere like xhci_abort_cmd_ring(), so a simple revert is off the table. Commit 6ccb83d6c497 did not specify which platform and in what circumstance would xhci handshake timeout, adding a quirk for DWC3 seems to be the better option here. > > But on the other hand, xhci_handshake() has long timeouts because > the handshakes themselves can take a surprisingly long time (and > sometimes still succeed), so any reliance on handshake completing > before timeout is frankly a bug in itself. This patch simply honors the contract between the software and hardware, allowing the handshake to complete. It doesn't assume the handshake will finish on time. If it times out, then it times out and returns a failure. Thanks, Roy
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c index 3155e3a842da..19c5c26a8e63 100644 --- a/drivers/usb/host/xhci-plat.c +++ b/drivers/usb/host/xhci-plat.c @@ -265,6 +265,9 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s if (device_property_read_bool(tmpdev, "xhci-skip-phy-init-quirk")) xhci->quirks |= XHCI_SKIP_PHY_INIT; + if (device_property_read_bool(tmpdev, "xhci-full-reset-on-remove-quirk")) + xhci->quirks |= XHCI_FULL_RESET_ON_REMOVE; + device_property_read_u32(tmpdev, "imod-interval-ns", &xhci->imod_interval); } diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 90eb491267b5..4f091d618c01 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -198,6 +198,7 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) u32 command; u32 state; int ret; + unsigned int exit_state; state = readl(&xhci->op_regs->status); @@ -226,8 +227,13 @@ int xhci_reset(struct xhci_hcd *xhci, u64 timeout_us) if (xhci->quirks & XHCI_INTEL_HOST) udelay(1000); + if (xhci->quirks & XHCI_FULL_RESET_ON_REMOVE) + exit_state = 0; + else + exit_state = XHCI_STATE_REMOVING; + ret = xhci_handshake_check_state(xhci, &xhci->op_regs->command, - CMD_RESET, 0, timeout_us, XHCI_STATE_REMOVING); + CMD_RESET, 0, timeout_us, exit_state); if (ret) return ret; diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 242ab9fbc8ae..ac65af788298 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1637,6 +1637,7 @@ struct xhci_hcd { #define XHCI_WRITE_64_HI_LO BIT_ULL(47) #define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48) #define XHCI_ETRON_HOST BIT_ULL(49) +#define XHCI_FULL_RESET_ON_REMOVE BIT_ULL(50) unsigned int num_active_eps; unsigned int limit_active_eps;
Commit 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") introduced an optimization to xhci_reset() during xhci removal, allowing it to bail out early without waiting for the reset to complete. This behavior can cause issues on SNPS DWC3 USB controller with dual-role capability. When the DWC3 controller exits host mode and removes xhci while a reset is still in progress, and then tries to configure its hardware for device mode, the ongoing reset leads to register access issues; specifically, all register reads returns 0. These issues extend beyond the xhci register space (which is expected during a reset) and affect the entire DWC3 IP block, causing the DWC3 device mode to malfunction. To address this, introduce the `XHCI_FULL_RESET_ON_REMOVE` quirk. When this quirk is set, xhci_reset() always completes its reset handshake, ensuring the controller is in a fully reset state before proceeding. Cc: stable@vger.kernel.org Fixes: 6ccb83d6c497 ("usb: xhci: Implement xhci_handshake_check_state() helper") Signed-off-by: Roy Luo <royluo@google.com> --- drivers/usb/host/xhci-plat.c | 3 +++ drivers/usb/host/xhci.c | 8 +++++++- drivers/usb/host/xhci.h | 1 + 3 files changed, 11 insertions(+), 1 deletion(-)