Message ID | 20231011100214.25720-1-quic_kriskura@quicinc.com |
---|---|
State | New |
Headers | show |
Series | [RFC] usb: dwc3: core: Fix RAM interface getting stuck during enumeration | expand |
On Wed, Oct 11, 2023, Krishna Kurapati wrote: > This implementation is to fix RAM interface getting stuck during > enumeration and controller not responding to any command. > > During plug-out test cases, it is sometimes seen that no events > are generated by the controller and all CSR register reads give "0" > and CSR_Timeout bit gets set indicating that CSR reads/writes are > timing out or timed out. > > The issue comes up on different instnaces of enumeration on different > platforms. On one platform, the debug log is as follows: > > Prepared a TRB on ep0out and did start transfer to get set > address request from host: > > <...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out: > cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> > status: Successful > > <...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040): > ep0out: Transfer Complete (sIL) [Setup Phase] > > <...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set > Address(Addr = 01) > > Then XFER NRDY received on ep0in for zero length status phase and > a Start Transfer was done on ep0in with 0-length packet in 2 Stage > status phase: > > <...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2): > ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase] > > <...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb > ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33 > sofn 00000000 (HLcs:SC:status2) > > <...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd > 'Start Transfer' [406] params 00000000 efffa000 00000000 -->status: > Successful > > Then a bus reset was received directly after 500 msec. Software never > got the cmd complete for the start transfer done in status phase. Here > the RAM interface is stuck. So host issues a bus reset as link is > idle for 500 msec: > > <...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101): > Reset [U0] > > Then software sees that it is in status phase and we issue an ENDXFER > on ep0in and it gets timedout waiting for the CMDACT to go '0': > > <...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd > 'End Transfer' [10508] params 00000000 00000000 00000000 --> status: > Timed Out > > Upon debug with Synopsys, it turns out that the root cause is as > follows: > > During any transfer, if the data is not successfully transmitted, > then a Done (with failure) handshake is returned, so that the BMU > can re-attempt the same data again by rewinding its data pointers. > > But, if the USB IN is a 0-length payload (which is what is happening > in this case - 2 stage status phase of set_address), then there is no > need to rewind the pointers and the Done (with failure) handshake is > not returned for failure case. This keeps the Request-Done interface > busy till the next Done handshake. The MAC sends the 0-length payload > again when the host requests. If the transmission is successful this > time, the Done (with success) handshake is provided back. Otherwise, > it repeats the same steps again. > > If the cable is disconnected or if the Host aborts the transfer on 3 > consecutive failed attempts, the Request-Done handshake is not > complete. This keeps the interface busy. > > The subsequent RAM access cannot proceed until the above pending > transfer is complete. This results in failure of any access to RAM > address locations. Many of the EndPoint commands need to access the > RAM and they would fail to complete successfully. > > Furthermore when cable removal happens, this would not generate a > disconnect event and the "connected" flag remains true always blockin > suspend. > > Synopsys confirmed that the issue is present on all USB3 devices and > as a workaround, suggested to re-initialize device mode. > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > --- > drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++ > drivers/usb/dwc3/core.h | 4 ++++ > drivers/usb/dwc3/drd.c | 5 +++++ > drivers/usb/dwc3/gadget.c | 6 ++++-- > 4 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > index 44ee8526dc28..d18b81cccdc5 100644 > --- a/drivers/usb/dwc3/core.c > +++ b/drivers/usb/dwc3/core.c > @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work) > unsigned long flags; > int ret; > u32 reg; > + u8 timeout = 100; > u32 desired_dr_role; > > mutex_lock(&dwc->mutex); > @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work) > if (!desired_dr_role) > goto out; > > + /* > + * STAR 5001544 - If cable disconnect doesn't generate > + * disconnect event in device mode, then re-initialize the > + * controller. > + */ > + if ((dwc->cable_disconnected == true) && > + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) { > + while (dwc->connected == true && timeout != 0) { > + mdelay(10); > + timeout--; > + } > + > + if (timeout == 0) { > + dwc3_gadget_soft_disconnect(dwc); > + udelay(100); > + dwc3_gadget_soft_connect(dwc); > + } > + } > + > if (desired_dr_role == dwc->current_dr_role) > goto out; > > diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h > index c6c87acbd376..7642330cf608 100644 > --- a/drivers/usb/dwc3/core.h > +++ b/drivers/usb/dwc3/core.h > @@ -1355,6 +1355,7 @@ struct dwc3 { > int last_fifo_depth; > int num_ep_resized; > struct dentry *debug_root; > + bool cable_disconnected; > }; > > #define INCRX_BURST_MODE 0 > @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc); > > int dwc3_core_soft_reset(struct dwc3 *dwc); > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc); > +int dwc3_gadget_soft_connect(struct dwc3 *dwc); > + > #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE) > int dwc3_host_init(struct dwc3 *dwc); > void dwc3_host_exit(struct dwc3 *dwc); > diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c > index 039bf241769a..593c023fc39a 100644 > --- a/drivers/usb/dwc3/drd.c > +++ b/drivers/usb/dwc3/drd.c > @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, > struct dwc3 *dwc = usb_role_switch_get_drvdata(sw); > u32 mode; > > + dwc->cable_disconnected = false; > + > switch (role) { > case USB_ROLE_HOST: > mode = DWC3_GCTL_PRTCAP_HOST; > @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, > mode = DWC3_GCTL_PRTCAP_DEVICE; > break; > default: > + if (role == USB_ROLE_NONE) > + dwc->cable_disconnected = true; > + > if (dwc->role_switch_default_mode == USB_DR_MODE_HOST) > mode = DWC3_GCTL_PRTCAP_HOST; > else > diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c > index 858fe4c299b7..a92df2e04cce 100644 > --- a/drivers/usb/dwc3/gadget.c > +++ b/drivers/usb/dwc3/gadget.c > @@ -2634,7 +2634,7 @@ static void dwc3_gadget_disable_irq(struct dwc3 *dwc); > static void __dwc3_gadget_stop(struct dwc3 *dwc); > static int __dwc3_gadget_start(struct dwc3 *dwc); > > -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > { > unsigned long flags; > int ret; > @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > return ret; > } > > -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) > +int dwc3_gadget_soft_connect(struct dwc3 *dwc) > { > int ret; > > @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) > dwc3_gadget_dctl_write_safe(dwc, reg); > > dwc->connected = false; > + dwc->cable_disconnected = true; > > dwc3_disconnect_gadget(dwc); > > @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) > */ > dwc3_stop_active_transfers(dwc); > dwc->connected = true; > + dwc->cable_disconnected = false; > > reg = dwc3_readl(dwc->regs, DWC3_DCTL); > reg &= ~DWC3_DCTL_TSTCTRL_MASK; > -- > 2.42.0 > We can just reset the controller when there's End Transfer command timeout as a failure recovery. No need to do what you're doing here. BR, Thinh
On 10/12/2023 11:29 PM, Thinh Nguyen wrote: >> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> { >> unsigned long flags; >> int ret; >> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >> return ret; >> } >> >> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) >> +int dwc3_gadget_soft_connect(struct dwc3 *dwc) >> { >> int ret; >> >> @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) >> dwc3_gadget_dctl_write_safe(dwc, reg); >> >> dwc->connected = false; >> + dwc->cable_disconnected = true; >> >> dwc3_disconnect_gadget(dwc); >> >> @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) >> */ >> dwc3_stop_active_transfers(dwc); >> dwc->connected = true; >> + dwc->cable_disconnected = false; >> >> reg = dwc3_readl(dwc->regs, DWC3_DCTL); >> reg &= ~DWC3_DCTL_TSTCTRL_MASK; >> -- >> 2.42.0 >> > > We can just reset the controller when there's End Transfer command > timeout as a failure recovery. No need to do what you're doing here. > Hi Thinh, That was what I initially wanted to do, but there were couple of reasons I wanted to take this approach: 1. We can't just reset the controller in midst of gadget_interrupt. We need to process it completely and then take action. 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on other platforms of same targets, the issue comes up at some other instances of code, at a point where no IRQ is running. In such cases its not possible to accurately find out code portions and reset the controller. The way I confirmed that both platforms are having the same issue is: a. During cable disconnect, I am not receiving disconnect interrupt b. The reg dump is exactly same in both cases (BMU as well) So I felt it was better to fix it during cable disconnect because even if we remove cable, we are still in device mode only and in this case we can unblock suspend and also bring back controller to a known state. Let me know your thoughts on the above. Regards, Krishna,
Hi Krishna, On 10/11/2023 3:02 AM, Krishna Kurapati wrote: > This implementation is to fix RAM interface getting stuck during > enumeration and controller not responding to any command. > > During plug-out test cases, it is sometimes seen that no events > are generated by the controller and all CSR register reads give "0" > and CSR_Timeout bit gets set indicating that CSR reads/writes are > timing out or timed out. > > The issue comes up on different instnaces of enumeration on different > platforms. On one platform, the debug log is as follows: > > Prepared a TRB on ep0out and did start transfer to get set > address request from host: > > <...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out: > cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> > status: Successful > > <...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040): > ep0out: Transfer Complete (sIL) [Setup Phase] > > <...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set > Address(Addr = 01) > > Then XFER NRDY received on ep0in for zero length status phase and > a Start Transfer was done on ep0in with 0-length packet in 2 Stage > status phase: > > <...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2): > ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase] > > <...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb > ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33 > sofn 00000000 (HLcs:SC:status2) > > <...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd > 'Start Transfer' [406] params 00000000 efffa000 00000000 -->status: > Successful > > Then a bus reset was received directly after 500 msec. Software never > got the cmd complete for the start transfer done in status phase. Here > the RAM interface is stuck. So host issues a bus reset as link is > idle for 500 msec: > > <...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101): > Reset [U0] > > Then software sees that it is in status phase and we issue an ENDXFER > on ep0in and it gets timedout waiting for the CMDACT to go '0': > > <...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd > 'End Transfer' [10508] params 00000000 00000000 00000000 --> status: > Timed Out > > Upon debug with Synopsys, it turns out that the root cause is as > follows: > > During any transfer, if the data is not successfully transmitted, > then a Done (with failure) handshake is returned, so that the BMU > can re-attempt the same data again by rewinding its data pointers. > > But, if the USB IN is a 0-length payload (which is what is happening > in this case - 2 stage status phase of set_address), then there is no > need to rewind the pointers and the Done (with failure) handshake is > not returned for failure case. This keeps the Request-Done interface > busy till the next Done handshake. The MAC sends the 0-length payload > again when the host requests. If the transmission is successful this > time, the Done (with success) handshake is provided back. Otherwise, > it repeats the same steps again. > > If the cable is disconnected or if the Host aborts the transfer on 3 > consecutive failed attempts, the Request-Done handshake is not > complete. This keeps the interface busy. > > The subsequent RAM access cannot proceed until the above pending > transfer is complete. This results in failure of any access to RAM > address locations. Many of the EndPoint commands need to access the > RAM and they would fail to complete successfully. > > Furthermore when cable removal happens, this would not generate a > disconnect event and the "connected" flag remains true always blockin > suspend. > > Synopsys confirmed that the issue is present on all USB3 devices and > as a workaround, suggested to re-initialize device mode. > > Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> > --- > drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++ > drivers/usb/dwc3/core.h | 4 ++++ > drivers/usb/dwc3/drd.c | 5 +++++ > drivers/usb/dwc3/gadget.c | 6 ++++-- > 4 files changed, 33 insertions(+), 2 deletions(-) > > diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c > index 44ee8526dc28..d18b81cccdc5 100644 > --- a/drivers/usb/dwc3/core.c > +++ b/drivers/usb/dwc3/core.c > @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work) > unsigned long flags; > int ret; > u32 reg; > + u8 timeout = 100; > u32 desired_dr_role; > > mutex_lock(&dwc->mutex); > @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work) > if (!desired_dr_role) > goto out; > > + /* > + * STAR 5001544 - If cable disconnect doesn't generate > + * disconnect event in device mode, then re-initialize the > + * controller. > + */ > + if ((dwc->cable_disconnected == true) && > + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) { > + while (dwc->connected == true && timeout != 0) { > + mdelay(10); > + timeout--; > + } > + > + if (timeout == 0) { > + dwc3_gadget_soft_disconnect(dwc); > + udelay(100); > + dwc3_gadget_soft_connect(dwc); > + } > + } > + > if (desired_dr_role == dwc->current_dr_role) > goto out; > > diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h > index c6c87acbd376..7642330cf608 100644 > --- a/drivers/usb/dwc3/core.h > +++ b/drivers/usb/dwc3/core.h > @@ -1355,6 +1355,7 @@ struct dwc3 { > int last_fifo_depth; > int num_ep_resized; > struct dentry *debug_root; > + bool cable_disconnected; > }; > > #define INCRX_BURST_MODE 0 > @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc); > > int dwc3_core_soft_reset(struct dwc3 *dwc); > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc); > +int dwc3_gadget_soft_connect(struct dwc3 *dwc); > + > #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE) > int dwc3_host_init(struct dwc3 *dwc); > void dwc3_host_exit(struct dwc3 *dwc); > diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c > index 039bf241769a..593c023fc39a 100644 > --- a/drivers/usb/dwc3/drd.c > +++ b/drivers/usb/dwc3/drd.c > @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, > struct dwc3 *dwc = usb_role_switch_get_drvdata(sw); > u32 mode; > > + dwc->cable_disconnected = false; > + > switch (role) { > case USB_ROLE_HOST: > mode = DWC3_GCTL_PRTCAP_HOST; > @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, > mode = DWC3_GCTL_PRTCAP_DEVICE; > break; > default: > + if (role == USB_ROLE_NONE) > + dwc->cable_disconnected = true; > + How do we handle cases where role switch isn't used? (ie extcon or maybe no cable connection notification at all) Thanks Wesley Cheng
On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote: > > > On 10/12/2023 11:29 PM, Thinh Nguyen wrote: > > > > -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > { > > > unsigned long flags; > > > int ret; > > > @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > return ret; > > > } > > > -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) > > > +int dwc3_gadget_soft_connect(struct dwc3 *dwc) > > > { > > > int ret; > > > @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) > > > dwc3_gadget_dctl_write_safe(dwc, reg); > > > dwc->connected = false; > > > + dwc->cable_disconnected = true; > > > dwc3_disconnect_gadget(dwc); > > > @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) > > > */ > > > dwc3_stop_active_transfers(dwc); > > > dwc->connected = true; > > > + dwc->cable_disconnected = false; > > > reg = dwc3_readl(dwc->regs, DWC3_DCTL); > > > reg &= ~DWC3_DCTL_TSTCTRL_MASK; > > > -- > > > 2.42.0 > > > > > > > We can just reset the controller when there's End Transfer command > > timeout as a failure recovery. No need to do what you're doing here. > > > Hi Thinh, > > That was what I initially wanted to do, but there were couple of reasons I > wanted to take this approach: > > 1. We can't just reset the controller in midst of gadget_interrupt. We need > to process it completely and then take action. You can flag the driver so you can do the teardown/soft-reset at the appropriate time. > > 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on > other platforms of same targets, the issue comes up at some other instances > of code, at a point where no IRQ is running. In such cases its not possible > to accurately find out code portions and reset the controller. The way I > confirmed that both platforms are having the same issue is: > > a. During cable disconnect, I am not receiving disconnect interrupt > b. The reg dump is exactly same in both cases (BMU as well) > > So I felt it was better to fix it during cable disconnect because even if we > remove cable, we are still in device mode only and in this case we can > unblock suspend and also bring back controller to a known state. > > Let me know your thoughts on the above. > This issue happens outside of disconnect right? Did you account for port reset? The symptom should be the same. At some point, a command will be issued. If a command timed out, then something is really wrong (especially End Transfer command). We can attempt to recover base on this symptom then. And you don't need to poll for timeout for this specific type of error. Just read some known register like GSNPSID to see if it's invalid. BR, Thinh
On 10/13/2023 12:55 AM, Wesley Cheng wrote: > Hi Krishna, > > On 10/11/2023 3:02 AM, Krishna Kurapati wrote: >> This implementation is to fix RAM interface getting stuck during >> Synopsys confirmed that the issue is present on all USB3 devices and >> as a workaround, suggested to re-initialize device mode. >> >> Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> >> --- >> drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++ >> drivers/usb/dwc3/core.h | 4 ++++ >> drivers/usb/dwc3/drd.c | 5 +++++ >> drivers/usb/dwc3/gadget.c | 6 ++++-- >> 4 files changed, 33 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c >> index 44ee8526dc28..d18b81cccdc5 100644 >> --- a/drivers/usb/dwc3/core.c >> +++ b/drivers/usb/dwc3/core.c >> @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work) >> unsigned long flags; >> int ret; >> u32 reg; >> + u8 timeout = 100; >> u32 desired_dr_role; >> mutex_lock(&dwc->mutex); >> @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct >> *work) >> if (!desired_dr_role) >> goto out; >> + /* >> + * STAR 5001544 - If cable disconnect doesn't generate >> + * disconnect event in device mode, then re-initialize the >> + * controller. >> + */ >> + if ((dwc->cable_disconnected == true) && >> + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) { >> + while (dwc->connected == true && timeout != 0) { >> + mdelay(10); >> + timeout--; >> + } >> + >> + if (timeout == 0) { >> + dwc3_gadget_soft_disconnect(dwc); >> + udelay(100); >> + dwc3_gadget_soft_connect(dwc); >> + } >> + } >> + >> if (desired_dr_role == dwc->current_dr_role) >> goto out; >> diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h >> index c6c87acbd376..7642330cf608 100644 >> --- a/drivers/usb/dwc3/core.h >> +++ b/drivers/usb/dwc3/core.h >> @@ -1355,6 +1355,7 @@ struct dwc3 { >> int last_fifo_depth; >> int num_ep_resized; >> struct dentry *debug_root; >> + bool cable_disconnected; >> }; >> #define INCRX_BURST_MODE 0 >> @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc); >> int dwc3_core_soft_reset(struct dwc3 *dwc); >> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc); >> +int dwc3_gadget_soft_connect(struct dwc3 *dwc); >> + >> #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || >> IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE) >> int dwc3_host_init(struct dwc3 *dwc); >> void dwc3_host_exit(struct dwc3 *dwc); >> diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c >> index 039bf241769a..593c023fc39a 100644 >> --- a/drivers/usb/dwc3/drd.c >> +++ b/drivers/usb/dwc3/drd.c >> @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct >> usb_role_switch *sw, >> struct dwc3 *dwc = usb_role_switch_get_drvdata(sw); >> u32 mode; >> + dwc->cable_disconnected = false; >> + >> switch (role) { >> case USB_ROLE_HOST: >> mode = DWC3_GCTL_PRTCAP_HOST; >> @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct >> usb_role_switch *sw, >> mode = DWC3_GCTL_PRTCAP_DEVICE; >> break; >> default: >> + if (role == USB_ROLE_NONE) >> + dwc->cable_disconnected = true; >> + > > How do we handle cases where role switch isn't used? (ie extcon or maybe > no cable connection notification at all) > Hi Wesley, Since I was considering fixing it during disconnect, I made it in role switch. So no cable connection notification case has been ruled out in this patch. But yes, extcon is a valid case. Will need to account for it. Regards, Krishna,
On 10/13/2023 3:47 AM, Thinh Nguyen wrote: > On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote: >> >> >> On 10/12/2023 11:29 PM, Thinh Nguyen wrote: >> >>>> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>>> { >>>> unsigned long flags; >>>> int ret; >>>> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>>> return ret; >>>> } >>>> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) >>>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc) >>>> { >>>> int ret; >>>> @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) >>>> dwc3_gadget_dctl_write_safe(dwc, reg); >>>> dwc->connected = false; >>>> + dwc->cable_disconnected = true; >>>> dwc3_disconnect_gadget(dwc); >>>> @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) >>>> */ >>>> dwc3_stop_active_transfers(dwc); >>>> dwc->connected = true; >>>> + dwc->cable_disconnected = false; >>>> reg = dwc3_readl(dwc->regs, DWC3_DCTL); >>>> reg &= ~DWC3_DCTL_TSTCTRL_MASK; >>>> -- >>>> 2.42.0 >>>> >>> >>> We can just reset the controller when there's End Transfer command >>> timeout as a failure recovery. No need to do what you're doing here. >>> >> Hi Thinh, >> >> That was what I initially wanted to do, but there were couple of reasons I >> wanted to take this approach: >> >> 1. We can't just reset the controller in midst of gadget_interrupt. We need >> to process it completely and then take action. > > You can flag the driver so you can do the teardown/soft-reset at the > appropriate time. > >> >> 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on >> other platforms of same targets, the issue comes up at some other instances >> of code, at a point where no IRQ is running. In such cases its not possible >> to accurately find out code portions and reset the controller. The way I >> confirmed that both platforms are having the same issue is: >> >> a. During cable disconnect, I am not receiving disconnect interrupt >> b. The reg dump is exactly same in both cases (BMU as well) >> >> So I felt it was better to fix it during cable disconnect because even if we >> remove cable, we are still in device mode only and in this case we can >> unblock suspend and also bring back controller to a known state. >> >> Let me know your thoughts on the above. >> > > This issue happens outside of disconnect right? Did you account for port > reset? > > The symptom should be the same. At some point, a command will be issued. > If a command timed out, then something is really wrong (especially End > Transfer command). We can attempt to recover base on this symptom then. > > And you don't need to poll for timeout for this specific type of error. > Just read some known register like GSNPSID to see if it's invalid. Hi Thinh, Yes. It actually happens before disconnect, but I was trying to handle it during disconnect. How about I add a check in gadget_ep_cmd saying that if cmd timeout happens, then read the snpsid and if it reads an invalid value, then I will call soft_disconnect followed by soft_connect. That must cover all cases ? Regards, Krishna,
On 10/13/2023 9:28 AM, Krishna Kurapati PSSNV wrote: > > > On 10/13/2023 3:47 AM, Thinh Nguyen wrote: >> On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote: >>> >>> >>> On 10/12/2023 11:29 PM, Thinh Nguyen wrote: >>> >>>>> -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>>>> +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) >>>>> { >>>>> unsigned long flags; >>>>> int ret; >>>>> @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct >>>>> dwc3 *dwc) >>>>> return ret; >>>>> } >>>>> -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) >>>>> +int dwc3_gadget_soft_connect(struct dwc3 *dwc) >>>>> { >>>>> int ret; >>>>> @@ -3963,6 +3963,7 @@ static void >>>>> dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) >>>>> dwc3_gadget_dctl_write_safe(dwc, reg); >>>>> dwc->connected = false; >>>>> + dwc->cable_disconnected = true; >>>>> dwc3_disconnect_gadget(dwc); >>>>> @@ -4038,6 +4039,7 @@ static void >>>>> dwc3_gadget_reset_interrupt(struct dwc3 *dwc) >>>>> */ >>>>> dwc3_stop_active_transfers(dwc); >>>>> dwc->connected = true; >>>>> + dwc->cable_disconnected = false; >>>>> reg = dwc3_readl(dwc->regs, DWC3_DCTL); >>>>> reg &= ~DWC3_DCTL_TSTCTRL_MASK; >>>>> -- >>>>> 2.42.0 >>>>> >>>> >>>> We can just reset the controller when there's End Transfer command >>>> timeout as a failure recovery. No need to do what you're doing here. >>>> >>> Hi Thinh, >>> >>> That was what I initially wanted to do, but there were couple of >>> reasons I >>> wanted to take this approach: >>> >>> 1. We can't just reset the controller in midst of gadget_interrupt. >>> We need >>> to process it completely and then take action. >> >> You can flag the driver so you can do the teardown/soft-reset at the >> appropriate time. >> >>> >>> 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on >>> other platforms of same targets, the issue comes up at some other >>> instances >>> of code, at a point where no IRQ is running. In such cases its not >>> possible >>> to accurately find out code portions and reset the controller. The way I >>> confirmed that both platforms are having the same issue is: >>> >>> a. During cable disconnect, I am not receiving disconnect interrupt >>> b. The reg dump is exactly same in both cases (BMU as well) >>> >>> So I felt it was better to fix it during cable disconnect because >>> even if we >>> remove cable, we are still in device mode only and in this case we can >>> unblock suspend and also bring back controller to a known state. >>> >>> Let me know your thoughts on the above. >>> >> >> This issue happens outside of disconnect right? Did you account for port >> reset? >> >> The symptom should be the same. At some point, a command will be issued. >> If a command timed out, then something is really wrong (especially End >> Transfer command). We can attempt to recover base on this symptom then. >> >> And you don't need to poll for timeout for this specific type of error. >> Just read some known register like GSNPSID to see if it's invalid. > > Hi Thinh, > > Yes. It actually happens before disconnect, but I was trying to handle > it during disconnect. How about I add a check in gadget_ep_cmd saying > that if cmd timeout happens, then read the snpsid and if it reads an > invalid value, then I will call soft_disconnect followed by > soft_connect. That must cover all cases ? > Hi Thinh, One more datapoint for putting forward the above opinion. There are two types of situations observed during bus reset when we try to do endxfer on ep0in followed by set_stall on ep0out: 1. If we issue endxfer for ep0in, and it times out, then the next command set_stall is successful (only because all reads give zero and since we check for CMDACT bit for ep0out after setting it, and since the read of DEPCMD register gives "0" while polling for it, sw indicated that set stall is successful. But when we check GEVTADDR at this point, both ADDRLO and ADDRHI read zero. 2. If ram access timeout occurs while polling for CMDACT bit after issuing endxfer on ep0in, then at that instant, polling returns true, sw indicates that endxfer is successful but inreality when checked, GEVTADDR LO/HI read zero. So I think it would be better to check for validity of GEVADDR registers at end of gadget_ep_cmd to identify id RAM access timeout happened and call soft_disconnect followed by soft_connect. Let me know your thoughts on this Regards, Krishna,
On Sun, Oct 15, 2023, Krishna Kurapati PSSNV wrote: > > > On 10/13/2023 9:28 AM, Krishna Kurapati PSSNV wrote: > > > > > > On 10/13/2023 3:47 AM, Thinh Nguyen wrote: > > > On Fri, Oct 13, 2023, Krishna Kurapati PSSNV wrote: > > > > > > > > > > > > On 10/12/2023 11:29 PM, Thinh Nguyen wrote: > > > > > > > > > > -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > > > > +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > > > > { > > > > > > unsigned long flags; > > > > > > int ret; > > > > > > @@ -2701,7 +2701,7 @@ static int > > > > > > dwc3_gadget_soft_disconnect(struct dwc3 *dwc) > > > > > > return ret; > > > > > > } > > > > > > -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) > > > > > > +int dwc3_gadget_soft_connect(struct dwc3 *dwc) > > > > > > { > > > > > > int ret; > > > > > > @@ -3963,6 +3963,7 @@ static void > > > > > > dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) > > > > > > dwc3_gadget_dctl_write_safe(dwc, reg); > > > > > > dwc->connected = false; > > > > > > + dwc->cable_disconnected = true; > > > > > > dwc3_disconnect_gadget(dwc); > > > > > > @@ -4038,6 +4039,7 @@ static void > > > > > > dwc3_gadget_reset_interrupt(struct dwc3 *dwc) > > > > > > */ > > > > > > dwc3_stop_active_transfers(dwc); > > > > > > dwc->connected = true; > > > > > > + dwc->cable_disconnected = false; > > > > > > reg = dwc3_readl(dwc->regs, DWC3_DCTL); > > > > > > reg &= ~DWC3_DCTL_TSTCTRL_MASK; > > > > > > -- > > > > > > 2.42.0 > > > > > > > > > > > > > > > > We can just reset the controller when there's End Transfer command > > > > > timeout as a failure recovery. No need to do what you're doing here. > > > > > > > > > Hi Thinh, > > > > > > > > That was what I initially wanted to do, but there were couple > > > > of reasons I > > > > wanted to take this approach: > > > > > > > > 1. We can't just reset the controller in midst of > > > > gadget_interrupt. We need > > > > to process it completely and then take action. > > > > > > You can flag the driver so you can do the teardown/soft-reset at the > > > appropriate time. > > > > > > > > > > > 2. The above log was seen on QRD variant of SM8550/SM8650 easily. But on > > > > other platforms of same targets, the issue comes up at some > > > > other instances > > > > of code, at a point where no IRQ is running. In such cases its > > > > not possible > > > > to accurately find out code portions and reset the controller. The way I > > > > confirmed that both platforms are having the same issue is: > > > > > > > > a. During cable disconnect, I am not receiving disconnect interrupt > > > > b. The reg dump is exactly same in both cases (BMU as well) > > > > > > > > So I felt it was better to fix it during cable disconnect > > > > because even if we > > > > remove cable, we are still in device mode only and in this case we can > > > > unblock suspend and also bring back controller to a known state. > > > > > > > > Let me know your thoughts on the above. > > > > > > > > > > This issue happens outside of disconnect right? Did you account for port > > > reset? > > > > > > The symptom should be the same. At some point, a command will be issued. > > > If a command timed out, then something is really wrong (especially End > > > Transfer command). We can attempt to recover base on this symptom then. > > > > > > And you don't need to poll for timeout for this specific type of error. > > > Just read some known register like GSNPSID to see if it's invalid. > > > > Hi Thinh, > > > > Yes. It actually happens before disconnect, but I was trying to handle > > it during disconnect. How about I add a check in gadget_ep_cmd saying > > that if cmd timeout happens, then read the snpsid and if it reads an > > invalid value, then I will call soft_disconnect followed by > > soft_connect. That must cover all cases ? > > > Hi Thinh, > > One more datapoint for putting forward the above opinion. There are two > types of situations observed during bus reset when we try to do endxfer on > ep0in followed by set_stall on ep0out: > > 1. If we issue endxfer for ep0in, and it times out, then the next command > set_stall is successful (only because all reads give zero and since we check > for CMDACT bit for ep0out after setting it, and since the read of DEPCMD > register gives "0" while polling for it, sw indicated that set stall is > successful. But when we check GEVTADDR at this point, both ADDRLO and ADDRHI > read zero. > > 2. If ram access timeout occurs while polling for CMDACT bit after issuing > endxfer on ep0in, then at that instant, polling returns true, sw indicates > that endxfer is successful but inreality when checked, GEVTADDR LO/HI read > zero. > > So I think it would be better to check for validity of GEVADDR registers at > end of gadget_ep_cmd to identify id RAM access timeout happened and call > soft_disconnect followed by soft_connect. > > Let me know your thoughts on this > In this case, the check can be simpler. You can just read back and check for the command type of DEPCMD or DGCMD when polling for CMDACT to see if it matches the command you initiated. If not, something is wrong and you can perform the soft disconnect connect to recover. Thanks, Thinh
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 44ee8526dc28..d18b81cccdc5 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -122,6 +122,7 @@ static void __dwc3_set_mode(struct work_struct *work) unsigned long flags; int ret; u32 reg; + u8 timeout = 100; u32 desired_dr_role; mutex_lock(&dwc->mutex); @@ -137,6 +138,25 @@ static void __dwc3_set_mode(struct work_struct *work) if (!desired_dr_role) goto out; + /* + * STAR 5001544 - If cable disconnect doesn't generate + * disconnect event in device mode, then re-initialize the + * controller. + */ + if ((dwc->cable_disconnected == true) && + (dwc->current_dr_role == DWC3_GCTL_PRTCAP_DEVICE)) { + while (dwc->connected == true && timeout != 0) { + mdelay(10); + timeout--; + } + + if (timeout == 0) { + dwc3_gadget_soft_disconnect(dwc); + udelay(100); + dwc3_gadget_soft_connect(dwc); + } + } + if (desired_dr_role == dwc->current_dr_role) goto out; diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c6c87acbd376..7642330cf608 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1355,6 +1355,7 @@ struct dwc3 { int last_fifo_depth; int num_ep_resized; struct dentry *debug_root; + bool cable_disconnected; }; #define INCRX_BURST_MODE 0 @@ -1568,6 +1569,9 @@ void dwc3_event_buffers_cleanup(struct dwc3 *dwc); int dwc3_core_soft_reset(struct dwc3 *dwc); +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc); +int dwc3_gadget_soft_connect(struct dwc3 *dwc); + #if IS_ENABLED(CONFIG_USB_DWC3_HOST) || IS_ENABLED(CONFIG_USB_DWC3_DUAL_ROLE) int dwc3_host_init(struct dwc3 *dwc); void dwc3_host_exit(struct dwc3 *dwc); diff --git a/drivers/usb/dwc3/drd.c b/drivers/usb/dwc3/drd.c index 039bf241769a..593c023fc39a 100644 --- a/drivers/usb/dwc3/drd.c +++ b/drivers/usb/dwc3/drd.c @@ -446,6 +446,8 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, struct dwc3 *dwc = usb_role_switch_get_drvdata(sw); u32 mode; + dwc->cable_disconnected = false; + switch (role) { case USB_ROLE_HOST: mode = DWC3_GCTL_PRTCAP_HOST; @@ -454,6 +456,9 @@ static int dwc3_usb_role_switch_set(struct usb_role_switch *sw, mode = DWC3_GCTL_PRTCAP_DEVICE; break; default: + if (role == USB_ROLE_NONE) + dwc->cable_disconnected = true; + if (dwc->role_switch_default_mode == USB_DR_MODE_HOST) mode = DWC3_GCTL_PRTCAP_HOST; else diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c index 858fe4c299b7..a92df2e04cce 100644 --- a/drivers/usb/dwc3/gadget.c +++ b/drivers/usb/dwc3/gadget.c @@ -2634,7 +2634,7 @@ static void dwc3_gadget_disable_irq(struct dwc3 *dwc); static void __dwc3_gadget_stop(struct dwc3 *dwc); static int __dwc3_gadget_start(struct dwc3 *dwc); -static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) +int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) { unsigned long flags; int ret; @@ -2701,7 +2701,7 @@ static int dwc3_gadget_soft_disconnect(struct dwc3 *dwc) return ret; } -static int dwc3_gadget_soft_connect(struct dwc3 *dwc) +int dwc3_gadget_soft_connect(struct dwc3 *dwc) { int ret; @@ -3963,6 +3963,7 @@ static void dwc3_gadget_disconnect_interrupt(struct dwc3 *dwc) dwc3_gadget_dctl_write_safe(dwc, reg); dwc->connected = false; + dwc->cable_disconnected = true; dwc3_disconnect_gadget(dwc); @@ -4038,6 +4039,7 @@ static void dwc3_gadget_reset_interrupt(struct dwc3 *dwc) */ dwc3_stop_active_transfers(dwc); dwc->connected = true; + dwc->cable_disconnected = false; reg = dwc3_readl(dwc->regs, DWC3_DCTL); reg &= ~DWC3_DCTL_TSTCTRL_MASK;
This implementation is to fix RAM interface getting stuck during enumeration and controller not responding to any command. During plug-out test cases, it is sometimes seen that no events are generated by the controller and all CSR register reads give "0" and CSR_Timeout bit gets set indicating that CSR reads/writes are timing out or timed out. The issue comes up on different instnaces of enumeration on different platforms. On one platform, the debug log is as follows: Prepared a TRB on ep0out and did start transfer to get set address request from host: <...>-7191 [000] D..1. 66.421006: dwc3_gadget_ep_cmd: ep0out: cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 --> status: Successful <...>-7191 [000] D..1. 66.421196: dwc3_event: event (0000c040): ep0out: Transfer Complete (sIL) [Setup Phase] <...>-7191 [000] D..1. 66.421197: dwc3_ctrl_req: Set Address(Addr = 01) Then XFER NRDY received on ep0in for zero length status phase and a Start Transfer was done on ep0in with 0-length packet in 2 Stage status phase: <...>-7191 [000] D..1. 66.421249: dwc3_event: event (000020c2): ep0in: Transfer Not Ready [00000000] (Not Active) [Status Phase] <...>-7191 [000] D..1. 66.421266: dwc3_prepare_trb: ep0in: trb ffffffc00fcfd000 (E0:D0) buf 00000000efffa000 size 0 ctrl 00000c33 sofn 00000000 (HLcs:SC:status2) <...>-7191 [000] D..1. 66.421387: dwc3_gadget_ep_cmd: ep0in: cmd 'Start Transfer' [406] params 00000000 efffa000 00000000 -->status: Successful Then a bus reset was received directly after 500 msec. Software never got the cmd complete for the start transfer done in status phase. Here the RAM interface is stuck. So host issues a bus reset as link is idle for 500 msec: <...>-7191 [000] D..1. 66.935603: dwc3_event: event (00000101): Reset [U0] Then software sees that it is in status phase and we issue an ENDXFER on ep0in and it gets timedout waiting for the CMDACT to go '0': <...>-7191 [000] D..1. 66.958249: dwc3_gadget_ep_cmd: ep0in: cmd 'End Transfer' [10508] params 00000000 00000000 00000000 --> status: Timed Out Upon debug with Synopsys, it turns out that the root cause is as follows: During any transfer, if the data is not successfully transmitted, then a Done (with failure) handshake is returned, so that the BMU can re-attempt the same data again by rewinding its data pointers. But, if the USB IN is a 0-length payload (which is what is happening in this case - 2 stage status phase of set_address), then there is no need to rewind the pointers and the Done (with failure) handshake is not returned for failure case. This keeps the Request-Done interface busy till the next Done handshake. The MAC sends the 0-length payload again when the host requests. If the transmission is successful this time, the Done (with success) handshake is provided back. Otherwise, it repeats the same steps again. If the cable is disconnected or if the Host aborts the transfer on 3 consecutive failed attempts, the Request-Done handshake is not complete. This keeps the interface busy. The subsequent RAM access cannot proceed until the above pending transfer is complete. This results in failure of any access to RAM address locations. Many of the EndPoint commands need to access the RAM and they would fail to complete successfully. Furthermore when cable removal happens, this would not generate a disconnect event and the "connected" flag remains true always blockin suspend. Synopsys confirmed that the issue is present on all USB3 devices and as a workaround, suggested to re-initialize device mode. Signed-off-by: Krishna Kurapati <quic_kriskura@quicinc.com> --- drivers/usb/dwc3/core.c | 20 ++++++++++++++++++++ drivers/usb/dwc3/core.h | 4 ++++ drivers/usb/dwc3/drd.c | 5 +++++ drivers/usb/dwc3/gadget.c | 6 ++++-- 4 files changed, 33 insertions(+), 2 deletions(-)