usb: host: xhci: fix HALTED endpoint handling

Message ID 1390250711-25840-1-git-send-email-balbi@ti.com
State New
Headers show

Commit Message

Felipe Balbi Jan. 20, 2014, 8:45 p.m.
If the HW reports the endpoint as Halted, there's
no need to start any URB for that endpoint, we can
simply return -EPIPE and this will tell upper layers
that the endpoint is, indeed, halted.

This patch fixes usbtest's set/clear halt test #13
on xHCI-based hosts.

Cc: <stable@vger.kernel.org>
Signed-off-by: Felipe Balbi <balbi@ti.com>
---

Hi Sarah,

seems like this has been broken for quite a long
time.

I tested the patch on an AM437x which has XHCI Host and
the same IP configured as device as well (two dwc3 instances,
basically).

It has been running for the past hour or so without any failures
when running testusb -t 13 -c 5000 -s 2048 -a in a loop, but
if you think there's a better way to fix this, let me know. I just
didn't understand why xhci-hcd still queues the URB even though
HW already told us the endpoint is halted. Any comments ?

cheers

 drivers/usb/host/xhci-ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Felipe Balbi Jan. 21, 2014, 5:55 p.m. | #1
On Tue, Jan 21, 2014 at 09:39:12AM -0800, Sarah Sharp wrote:
> On Mon, Jan 20, 2014 at 02:45:11PM -0600, Felipe Balbi wrote:
> > If the HW reports the endpoint as Halted, there's
> > no need to start any URB for that endpoint, we can
> > simply return -EPIPE and this will tell upper layers
> > that the endpoint is, indeed, halted.
> > 
> > This patch fixes usbtest's set/clear halt test #13
> > on xHCI-based hosts.
> 
> It looks like that test assumes that the xHCI driver should not allow
> URBs to be enqueued to the endpoint when the driver needs to call
> usb_clear_halt().  Or is it expecting that the host will attempt the
> transfer, and the device will stall the transfer?

I suppose it doesn't make a difference, considering it
wait_for_completion_interruptible(). As long as the transfer actually
completes within 10 seconds, it doesn't matter.

What I have seen, though, is the transfer never completes and the 10
second timeout always kicks in when trying two consecutive
usb_submit_urb().

Here's, in a nut-shell, what happens:

-> SetFeature(ENDPOINT_HALT)
-> GetStatus()
-> usb_submit_urb()
	-> this one completes with -EPIPE as it should
-> usb_submit_urb()
	-> this one always times out :-(

> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Felipe Balbi <balbi@ti.com>
> > ---
> > 
> > Hi Sarah,
> > 
> > seems like this has been broken for quite a long
> > time.
> > 
> > I tested the patch on an AM437x which has XHCI Host and
> > the same IP configured as device as well (two dwc3 instances,
> > basically).
> > 
> > It has been running for the past hour or so without any failures
> > when running testusb -t 13 -c 5000 -s 2048 -a in a loop, but
> > if you think there's a better way to fix this, let me know. I just
> > didn't understand why xhci-hcd still queues the URB even though
> > HW already told us the endpoint is halted. Any comments ?
> 
> The reason the driver still queues URBs is because the xHC halts the
> endpoint for events that are outside of stalls.  For instance, it will
> halt the endpoint on a babble error, a transfer error, or a split
> transaction error.  The upper layers don't call usb_clear_halt() for
> those cases, so the xHCI driver handles the halted endpoint internally.
> Look for calls to xhci_requires_manual_halt_cleanup().

alright.

> The xHC also halts the endpoint on a control transfer stall.  The upper
> layers don't ever call usb_clear_halt on the control endpoint, because
> the bus spec says the next control transfer will clear the halted
> condition.  Therefore the xHCI driver has to handle the halt manually.
> 
> The way the driver does that is to set a flag for the endpoint
> (EP_HALTED), issue a Reset Endpoint command, and then a Set TR Dequeue
> command to move the dequeue pointer past the transfer that failed.  The
> driver still accepts URBs submitted to the endpoint (because from the
> upper layer's perspective, we should in the cases where we manually
> handle the halt), but it does not ring the doorbell for that endpoint.
> Once the command to set the dequeue pointer completes, the EP_HALTED
> flag is cleared, the doorbell is rung for the endpoint, and the queued
> URBs are processed.
> 
> The end result is that the xHCI driver *needs* to allow URBs to be
> queued when the endpoint is halted and it needs to manual cleanup the
> ring.  Otherwise, the driver might queue an URB while the two commands
> are still being processed, the driver will reject the URBs, and the
> behavior in these corner cases will be different than other host
> controllers.
> 
> So, no, I don't think this is quite the right solution.
> 
> You could try to differentiate between an endpoint halt that requires a
> manual cleanup, and ones that the upper layer should handle, perhaps by
> adding a new EP_HALTED_MANUAL flag.  The driver could accept URBs for
> the manual cleanup case, and reject URBs with -EPIPE for the ones the
> upper layers should handle.
> 
> Code that refuses to ring the endpoint doorbell would have to look for
> both EP_HALTED and EP_HALTED_MANUAL.  There might be other implications
> as well; grep for EP_HALTED and see what code checks it.

will do.

So what's supposed to actually happen with that test ? I'd expect the
URB to actually be started (meaning we should ring the ep doorbell ?)
and the device would reply with a Stall which, in turn, would return
-EPIPE to upper layers.

Currently, that URB seems like it's never started to begin with, so we
never see a Stall from the device side and the test fails.

Surprisingly, this only happens on the second usb_submit_urb(). From
what I could gather yesterday, EP_STATE_HALTED isn't yet set when the
first usb_submit_urb() is called. Likely that because of that, the
doorbell is rung and the device has a chance to Stall the transfer.

Or am I missing something ?

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 53c2e29..e9df61a 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -2959,7 +2959,8 @@  static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring,
 		/* XXX not sure if this should be -ENOENT or not */
 		return -EINVAL;
 	case EP_STATE_HALTED:
-		xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n");
+		xhci_dbg(xhci, "WARN halted endpoint.\n");
+		return -EPIPE;
 	case EP_STATE_STOPPED:
 	case EP_STATE_RUNNING:
 		break;