[RFC/PATCH,v2] usb: host: xhci: issue 'Reset Endpoint' for !CONTROL EPs from finish_td()

Message ID 1415818273-22610-1-git-send-email-balbi@ti.com
State New
Headers show

Commit Message

Felipe Balbi Nov. 12, 2014, 6:51 p.m.
If class driver wants to SetFeature(ENDPOINT_HALT) and
later tries to talk to the stalled endpoint, xhci will
move endpoint to EP_STATE_STALL and subsequent usb_submit_urb()
will not cause a USB token to be shifted into the data lines.

Because of that, peripheral will never have any means of
STALLing a follow up token.

This is a known error at least with g_zero + testusb -t 13
and can be easily reproduced.

Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Felipe Balbi <balbi@ti.com>
---

this is a second version of patch sent originally in January this
year [1].

I suppose this is still not the best fix (as noted on code comment below),
but it'll serve to restart the thread on what's the best way to fix this.

I thought about clearing endpoint halt if !control from prepare_ring(),
but I'm not entirely of the implications there either.

I'm adding stable to Cc for v3.14+ (which is the earlier kernel I could easily
run to reproduce this) but it's likely a much older bug. Still, if nobody other
than me complained until now, perhaps we don't care as much ?

[1] http://marc.info/?l=linux-usb&m=139025060301432&w=2

 drivers/usb/host/xhci-ring.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

Comments

Alan Stern Nov. 12, 2014, 8:03 p.m. | #1
On Wed, 12 Nov 2014, Felipe Balbi wrote:

> If class driver wants to SetFeature(ENDPOINT_HALT) and
> later tries to talk to the stalled endpoint, xhci will
> move endpoint to EP_STATE_STALL and subsequent usb_submit_urb()
> will not cause a USB token to be shifted into the data lines.
> 
> Because of that, peripheral will never have any means of
> STALLing a follow up token.
> 
> This is a known error at least with g_zero + testusb -t 13
> and can be easily reproduced.

Can you elaborate this description a bit more?  I don't understand what 
the problem is.

For instance, if an endpoint is halted then there's no reason for the
controller to shift any USB tokens for it onto the data lines.  Doing
so would just be a waste of bandwidth, since the response is bound to
be another STALL.  And it doesn't matter that the peripheral has no
means to STALL any follow-up tokens, since the host controller already
knows the endpoint is halted.

Of course, control endpoints behave differently from other ones.  But 
that's not what you're talking about here, since test 13 is for bulk or 
interrupt endpoints.

The comment in the patch talks about moving the dequeue pointer past
the STALLed TD and then clearing the halt condition.  Moving the
dequeue pointer is fine -- there's no other way to take control of the
TD back from the hardware -- but why would you want to clear the halt?  
The HCD isn't supposed to do that; the class driver is.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Felipe Balbi Nov. 12, 2014, 8:17 p.m. | #2
Hi,

On Wed, Nov 12, 2014 at 03:03:10PM -0500, Alan Stern wrote:
> On Wed, 12 Nov 2014, Felipe Balbi wrote:
> 
> > If class driver wants to SetFeature(ENDPOINT_HALT) and
> > later tries to talk to the stalled endpoint, xhci will
> > move endpoint to EP_STATE_STALL and subsequent usb_submit_urb()
> > will not cause a USB token to be shifted into the data lines.
> > 
> > Because of that, peripheral will never have any means of
> > STALLing a follow up token.
> > 
> > This is a known error at least with g_zero + testusb -t 13
> > and can be easily reproduced.
> 
> Can you elaborate this description a bit more?  I don't understand what 
> the problem is.

Look at drivers/usb/misc/usbtest.c::test_halt(). Here's a dump of the
code:

static int verify_not_halted(struct usbtest_dev *tdev, int ep, struct urb *urb)
{
	int	retval;
	u16	status;

	/* shouldn't look or act halted */
	retval = usb_get_status(urb->dev, USB_RECIP_ENDPOINT, ep, &status);
	if (retval < 0) {
		ERROR(tdev, "ep %02x couldn't get no-halt status, %d\n",
				ep, retval);
		return retval;
	}
	if (status != 0) {
		ERROR(tdev, "ep %02x bogus status: %04x != 0\n", ep, status);
		return -EINVAL;
	}
	retval = simple_io(tdev, urb, 1, 0, 0, __func__);
	if (retval != 0)
		return -EINVAL;
	return 0;
}

static int verify_halted(struct usbtest_dev *tdev, int ep, struct urb *urb)
{
	int	retval;
	u16	status;

	/* should look and act halted */
	retval = usb_get_status(urb->dev, USB_RECIP_ENDPOINT, ep, &status);
	if (retval < 0) {
		ERROR(tdev, "ep %02x couldn't get halt status, %d\n",
				ep, retval);
		return retval;
	}
	if (status != 1) {
		ERROR(tdev, "ep %02x bogus status: %04x != 1\n", ep, status);
		return -EINVAL;
	}
	retval = simple_io(tdev, urb, 1, 0, -EPIPE, __func__);
	if (retval != -EPIPE)
		return -EINVAL;
	retval = simple_io(tdev, urb, 1, 0, -EPIPE, "verify_still_halted");
	if (retval != -EPIPE)
		return -EINVAL;
	return 0;
}

static int test_halt(struct usbtest_dev *tdev, int ep, struct urb *urb)
{
	int	retval;

	/* shouldn't look or act halted now */
	retval = verify_not_halted(tdev, ep, urb);
	if (retval < 0)
		return retval;

	/* set halt (protocol test only), verify it worked */
	retval = usb_control_msg(urb->dev, usb_sndctrlpipe(urb->dev, 0),
			USB_REQ_SET_FEATURE, USB_RECIP_ENDPOINT,
			USB_ENDPOINT_HALT, ep,
			NULL, 0, USB_CTRL_SET_TIMEOUT);
	if (retval < 0) {
		ERROR(tdev, "ep %02x couldn't set halt, %d\n", ep, retval);
		return retval;
	}
	retval = verify_halted(tdev, ep, urb);
	if (retval < 0) {
		int ret;

		/* clear halt anyways, else further tests will fail */
		ret = usb_clear_halt(urb->dev, urb->pipe);
		if (ret)
			ERROR(tdev, "ep %02x couldn't clear halt, %d\n",
			      ep, ret);

		return retval;
	}

	/* clear halt (tests API + protocol), verify it worked */
	retval = usb_clear_halt(urb->dev, urb->pipe);
	if (retval < 0) {
		ERROR(tdev, "ep %02x couldn't clear halt, %d\n", ep, retval);
		return retval;
	}
	retval = verify_not_halted(tdev, ep, urb);
	if (retval < 0)
		return retval;

	/* NOTE:  could also verify SET_INTERFACE clear halts ... */

	return 0;
}

Note, specially, what verify_halted() does. It will issue a GetStatus()
followed by two usb_submit_urb(). The first usb_submit_urb() will be
STALLed by the function (g_zero) and that will cause the endpoint to
move from EP_STATE_RUNNING to EP_STATE_HALTED. The following
usb_submit_urb() will trigger:

2815         case EP_STATE_HALTED:
2816                 xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n");

But, because EP_HALTED flag is set, xhci_ring_ep_doorbell() will return
early:

316 void xhci_ring_ep_doorbell(struct xhci_hcd *xhci,
317                 unsigned int slot_id,
318                 unsigned int ep_index,
319                 unsigned int stream_id)
320 {
321         __le32 __iomem *db_addr = &xhci->dba->doorbell[slot_id];
322         struct xhci_virt_ep *ep = &xhci->devs[slot_id]->eps[ep_index];
323         unsigned int ep_state = ep->ep_state;
324 
325         /* Don't ring the doorbell for this endpoint if there are pending
326          * cancellations because we don't want to interrupt processing.
327          * We don't want to restart any stream rings if there's a set dequeue
328          * pointer command pending because the device can choose to start any
329          * stream once the endpoint is on the HW schedule.
330          */
331         if ((ep_state & EP_HALT_PENDING) || (ep_state & SET_DEQ_PENDING) ||
332             (ep_state & EP_HALTED))
333                 return;
334         writel(DB_VALUE(ep_index, stream_id), db_addr);
335         /* The CPU has better things to do at this point than wait for a
336          * write-posting flush.  It'll get there soon enough.
337          */
338 }

and the doorbell will never rung. But even if I drop EP_HALTED from the
check below and let EP doorbell be rung, nothing will happen because,
according to XHCI spec 1.0, we *must* first issue a Reset Endpoint
command to get Endpoint to move to EP_STATE_STOPPED, in which case,
ringing that EP's doorbell will actually cause a transfer to happen.

Right now, what happens is that second usb_submit_urb() does nothing and
the 10 second timer expires, causing the URB to be canceled and test
failing with -ETIMEDOUT.

> For instance, if an endpoint is halted then there's no reason for the
> controller to shift any USB tokens for it onto the data lines.  Doing
> so would just be a waste of bandwidth, since the response is bound to
> be another STALL.  And it doesn't matter that the peripheral has no
> means to STALL any follow-up iens, since the host controller already
> knows the endpoint is halted.

Now you're claiming that this is a bug on usbtest which has been in tree
for many, many years and is known to work with EHCI, MUSB and UHCI (at
least, probably dummy too), which is a different statement from previous
thread [1].

> Of course, control endpoints behave differently from other ones.  But 
> that's not what you're talking about here, since test 13 is for bulk or 
> interrupt endpoints.
> 
> The comment in the patch talks about moving the dequeue pointer past
> the STALLed TD and then clearing the halt condition.  Moving the
> dequeue pointer is fine -- there's no other way to take control of the
> TD back from the hardware -- but why would you want to clear the halt?  
> The HCD isn't supposed to do that; the class driver is.

See what usbtest does. It wants to make sure that, even if we issue
several URBs for that endpoint, the function will always STALL. Sure,
it's a waste of bandwidth, but what's the probability that any class
driver will actually do this outside of a test environment ? I think
it's not up to the HCD to device and it should, rather, let the function
respond with the expected STALL again which will, once more, move the
endpoint back into EP_STATE_HALT.

The only thing we should be discussing here, is proper placement for
xhci_cleanup_halted_endpoint().

[1] http://marc.info/?l=linux-usb&m=139033307026763&w=2
Alan Stern Nov. 12, 2014, 9:54 p.m. | #3
On Wed, 12 Nov 2014, Felipe Balbi wrote:

> Hi,
> 
> On Wed, Nov 12, 2014 at 03:03:10PM -0500, Alan Stern wrote:
> > On Wed, 12 Nov 2014, Felipe Balbi wrote:
> > 
> > > If class driver wants to SetFeature(ENDPOINT_HALT) and
> > > later tries to talk to the stalled endpoint, xhci will
> > > move endpoint to EP_STATE_STALL and subsequent usb_submit_urb()
> > > will not cause a USB token to be shifted into the data lines.
> > > 
> > > Because of that, peripheral will never have any means of
> > > STALLing a follow up token.
> > > 
> > > This is a known error at least with g_zero + testusb -t 13
> > > and can be easily reproduced.
> > 
> > Can you elaborate this description a bit more?  I don't understand what 
> > the problem is.
> 
> Look at drivers/usb/misc/usbtest.c::test_halt(). Here's a dump of the
> code:

...

> Note, specially, what verify_halted() does. It will issue a GetStatus()
> followed by two usb_submit_urb(). The first usb_submit_urb() will be
> STALLed by the function (g_zero) and that will cause the endpoint to
> move from EP_STATE_RUNNING to EP_STATE_HALTED. The following
> usb_submit_urb() will trigger:
> 
> 2815         case EP_STATE_HALTED:
> 2816                 xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n");
> 
> But, because EP_HALTED flag is set, xhci_ring_ep_doorbell() will return
> early:

...

> and the doorbell will never rung. But even if I drop EP_HALTED from the
> check below and let EP doorbell be rung, nothing will happen because,
> according to XHCI spec 1.0, we *must* first issue a Reset Endpoint
> command to get Endpoint to move to EP_STATE_STOPPED, in which case,
> ringing that EP's doorbell will actually cause a transfer to happen.
> 
> Right now, what happens is that second usb_submit_urb() does nothing and
> the 10 second timer expires, causing the URB to be canceled and test
> failing with -ETIMEDOUT.

Okay, I see.  What usbcore and usbtest expect to happen is this:

    (1)	An URB fails because the endpoint sent a STALL.  The completion
	routine is called with status -EPIPE.

    (2)	When the completion routine returns, the next URB in the
	endpoint's queue should get handled by the hardware.  If the
	endpoint is still halted, this URB should fail with -EPIPE
	just like the first.

    (3)	Etc.  Eventually the endpoint queue empties out or the class
	driver calls usb_clear_halt().

So (1) works as desired, but (2) doesn't work because the doorbell
never gets rung.  And the easiest way to make (2) work is to issue a
Reset Endpoint command.

(There are other, more complicated ways to get the same result.  For 
instance, you could loop through the remaining queued URBs, giving them 
back one by one with -EPIPE.  And each time an URB is submitted, you 
could give it back right away.  But Reset Endpoint is simpler.)

In the patch, you talked about clearing the endpoint halt.  But that's 
not what you want to do; you want to issue a Reset Endpoint command, 
which affects only the host controller.  The endpoint's status in the 
peripheral device will remain unchanged -- no halt will be cleared.  
That contributed to my confusion on reading the patch.

By the way, does the same sort of thing happen after a transfer
error (such as a CRC mismatch)?  Does the xHCI controller change the 
state to EP_STATE_HALTED?  Or does it instead go directly to 
EP_STATE_STOPPED?  You probably want to treat that case and the STALL 
case as similarly as possible.

> > For instance, if an endpoint is halted then there's no reason for the
> > controller to shift any USB tokens for it onto the data lines.  Doing
> > so would just be a waste of bandwidth, since the response is bound to
> > be another STALL.  And it doesn't matter that the peripheral has no
> > means to STALL any follow-up iens, since the host controller already
> > knows the endpoint is halted.
> 
> Now you're claiming that this is a bug on usbtest which has been in tree
> for many, many years and is known to work with EHCI, MUSB and UHCI (at
> least, probably dummy too), which is a different statement from previous
> thread [1].

No, I simply failed to understood what you wanted to do.

> > The comment in the patch talks about moving the dequeue pointer past
> > the STALLed TD and then clearing the halt condition.  Moving the
> > dequeue pointer is fine -- there's no other way to take control of the
> > TD back from the hardware -- but why would you want to clear the halt?  
> > The HCD isn't supposed to do that; the class driver is.
> 
> See what usbtest does. It wants to make sure that, even if we issue
> several URBs for that endpoint, the function will always STALL. Sure,
> it's a waste of bandwidth, but what's the probability that any class
> driver will actually do this outside of a test environment ? I think
> it's not up to the HCD to device and it should, rather, let the function
> respond with the expected STALL again which will, once more, move the
> endpoint back into EP_STATE_HALT.
> 
> The only thing we should be discussing here, is proper placement for
> xhci_cleanup_halted_endpoint().

Right.  In theory you could do it any time up until the completion
routine returns.  Doing it when you process the failed TD seems like a
good choice -- advance the dequeue pointer and issue the command at the
same time.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index bc6fcbc..4e8c3cf 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -1826,13 +1826,45 @@  static int finish_td(struct xhci_hcd *xhci, struct xhci_td *td,
 		if (trb_comp_code == COMP_STALL) {
 			/* The transfer is completed from the driver's
 			 * perspective, but we need to issue a set dequeue
-			 * command for this stalled endpoint to move the dequeue
-			 * pointer past the TD.  We can't do that here because
-			 * the halt condition must be cleared first.  Let the
-			 * USB class driver clear the stall later.
+			 * command for this stalled endpoint to move the
+			 * dequeue pointer past the TD.
+			 *
+			 * Before we can move the dequeue pointer past the TD,
+			 * we must first clear the halt condition, however, we
+			 * can't clear such condition for control endpoints
+			 * since that can only be cleared through a USB Reset.
+			 *
+			 * Note that this isn't 100% safe because as soon as a
+			 * stalled TD is completed, we will clear the halt
+			 * condition on the endpoint from the host side, which
+			 * might not always be what we want.
+			 *
+			 * Perhaps a better approach would be to differentiate
+			 * SetFeature(ENDPOINT_HALT) from a STALL due to error
+			 * or Babble and only clear halt in that case.
+			 *
+			 * Another difficulty with this case is that if class
+			 * driver wants to talk to a halted endpoint to verify
+			 * SetFeature(ENDPOINT_HALT) was implemented correctly
+			 * by firmware (one such case is g_zero + testusb -t
+			 * 13), without clearing HALT on the host side, EP
+			 * doorbell will be rung but endpoint will not exit
+			 * EP_STATE_HALTED, because we *must* first issue a
+			 * Reset Endpoint command to move the EP to the
+			 * EP_STATE_STOPPED before EP doorbell works again.
+			 *
+			 * So, we will issue 'Reset Endpoint' here to make sure a
+			 * subsequent usb_submit_urb() will actually cause EP
+			 * doorbell to ring so a USB token is shifted into the
+			 * wire and peripheral has a chance of replying with
+			 * STALL again.
 			 */
 			ep->stopped_td = td;
 			ep->stopped_stream = ep_ring->stream_id;
+			if (!usb_endpoint_xfer_control(&td->urb->ep->desc))
+				xhci_cleanup_halted_endpoint(xhci,
+						slot_id, ep_index, ep_ring->stream_id,
+						td, event_trb);
 		} else if (xhci_requires_manual_halt_cleanup(xhci,
 					ep_ctx, trb_comp_code)) {
 			/* Other types of errors halt the endpoint, but the