From patchwork Wed Nov 12 22:28:53 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Felipe Balbi X-Patchwork-Id: 40695 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-la0-f71.google.com (mail-la0-f71.google.com [209.85.215.71]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id CD37524493 for ; Wed, 12 Nov 2014 22:28:38 +0000 (UTC) Received: by mail-la0-f71.google.com with SMTP id gq15sf8325822lab.6 for ; Wed, 12 Nov 2014 14:28:37 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:delivered-to:date:from:to:cc:subject:message-id :reply-to:references:mime-version:content-type:content-disposition :in-reply-to:user-agent:sender:precedence:list-id:x-original-sender :x-original-authentication-results:mailing-list:list-post:list-help :list-archive:list-unsubscribe; bh=IpztzabdatzCe/HNCmTFpApT4stDSTGVqElEP7dek+0=; b=EoeGF4G9lxHaRO4LHhjcZWl481/Ek1JMq16mofLyQz1Pqvv6fnW0EEMmb3emFwE0Kx jFl/HKc1Ehs8UlNb7EfjhCXWCLx1xjz1H2g95ciH8sgK83ZH4uk3+ckGXryi2qMok/SJ gyWA+bBkvj7zNZNEb2njSh2StiE2aXabWJHyMn59wvPX16IDiNozkLqLU205YHQ+uKpF BmFFuE28LodEFq14FpDnmT3Cw4zeVz1yOo2lJQtXMNwlFFggiqrVOPawkXvCx/njHGQZ ib3UqDfHly+gubhXTZ9I4hxMy7t1XauvSglk4oTrOW09i1xuHIFWk3+/qhziz5F4xcmG nHyQ== X-Gm-Message-State: ALoCoQnUkuvvlfIC8uchlXjpNM2/C3ZS4Z0UA5fG6mBMFj6L8s/EwF5Z0NEOPbmJ4hC3STw/i4eE X-Received: by 10.194.71.19 with SMTP id q19mr23322wju.5.1415831317802; Wed, 12 Nov 2014 14:28:37 -0800 (PST) X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.6.134 with SMTP id b6ls649194laa.77.gmail; Wed, 12 Nov 2014 14:28:37 -0800 (PST) X-Received: by 10.112.25.73 with SMTP id a9mr45636249lbg.10.1415831317159; Wed, 12 Nov 2014 14:28:37 -0800 (PST) Received: from mail-lb0-f172.google.com (mail-lb0-f172.google.com. [209.85.217.172]) by mx.google.com with ESMTPS id kz1si36436860lbc.63.2014.11.12.14.28.37 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 12 Nov 2014 14:28:37 -0800 (PST) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.217.172 as permitted sender) client-ip=209.85.217.172; Received: by mail-lb0-f172.google.com with SMTP id u10so2726534lbd.17 for ; Wed, 12 Nov 2014 14:28:37 -0800 (PST) X-Received: by 10.112.135.229 with SMTP id pv5mr44628932lbb.52.1415831317032; Wed, 12 Nov 2014 14:28:37 -0800 (PST) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.184.201 with SMTP id ew9csp503941lbc; Wed, 12 Nov 2014 14:28:35 -0800 (PST) X-Received: by 10.66.139.234 with SMTP id rb10mr20806536pab.146.1415831314353; Wed, 12 Nov 2014 14:28:34 -0800 (PST) Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s1si23908077pdf.200.2014.11.12.14.28.33 for ; Wed, 12 Nov 2014 14:28:34 -0800 (PST) Received-SPF: none (google.com: stable-owner@vger.kernel.org does not designate permitted sender hosts) client-ip=209.132.180.67; Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752545AbaKLW2c (ORCPT + 1 other); Wed, 12 Nov 2014 17:28:32 -0500 Received: from arroyo.ext.ti.com ([192.94.94.40]:35905 "EHLO arroyo.ext.ti.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752505AbaKLW2c (ORCPT ); Wed, 12 Nov 2014 17:28:32 -0500 Received: from dflxv15.itg.ti.com ([128.247.5.124]) by arroyo.ext.ti.com (8.13.7/8.13.7) with ESMTP id sACMSSXR017418; Wed, 12 Nov 2014 16:28:28 -0600 Received: from DFLE73.ent.ti.com (dfle73.ent.ti.com [128.247.5.110]) by dflxv15.itg.ti.com (8.14.3/8.13.8) with ESMTP id sACMSRhA024415; Wed, 12 Nov 2014 16:28:27 -0600 Received: from dlep32.itg.ti.com (157.170.170.100) by DFLE73.ent.ti.com (128.247.5.110) with Microsoft SMTP Server id 14.3.174.1; Wed, 12 Nov 2014 16:28:27 -0600 Received: from localhost (ileax41-snat.itg.ti.com [10.172.224.153]) by dlep32.itg.ti.com (8.14.3/8.13.8) with ESMTP id sACMSQSQ003456; Wed, 12 Nov 2014 16:28:27 -0600 Date: Wed, 12 Nov 2014 16:28:53 -0600 From: Felipe Balbi To: Alan Stern CC: Felipe Balbi , Mathias Nyman , Linux USB Mailing List , Greg KH , Subject: Re: [RFC/PATCH v2] usb: host: xhci: issue 'Reset Endpoint' for !CONTROL EPs from finish_td() Message-ID: <20141112222853.GO641@saruman> Reply-To: References: <20141112201732.GN641@saruman> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.23 (2014-03-12) Sender: stable-owner@vger.kernel.org Precedence: list List-ID: X-Mailing-List: stable@vger.kernel.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: balbi@ti.com X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.217.172 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , Hi, On Wed, Nov 12, 2014 at 04:54:06PM -0500, Alan Stern wrote: [...] > > and the doorbell will never rung. But even if I drop EP_HALTED from the > > check below and let EP doorbell be rung, nothing will happen because, > > according to XHCI spec 1.0, we *must* first issue a Reset Endpoint > > command to get Endpoint to move to EP_STATE_STOPPED, in which case, > > ringing that EP's doorbell will actually cause a transfer to happen. > > > > Right now, what happens is that second usb_submit_urb() does nothing and > > the 10 second timer expires, causing the URB to be canceled and test > > failing with -ETIMEDOUT. > > Okay, I see. What usbcore and usbtest expect to happen is this: > > (1) An URB fails because the endpoint sent a STALL. The completion > routine is called with status -EPIPE. > > (2) When the completion routine returns, the next URB in the > endpoint's queue should get handled by the hardware. If the > endpoint is still halted, this URB should fail with -EPIPE > just like the first. > > (3) Etc. Eventually the endpoint queue empties out or the class > driver calls usb_clear_halt(). perfect :-) > So (1) works as desired, but (2) doesn't work because the doorbell > never gets rung. right. > And the easiest way to make (2) work is to issue a Reset Endpoint > command. There's one extra bit of information here, see below. > (There are other, more complicated ways to get the same result. For > instance, you could loop through the remaining queued URBs, giving them > back one by one with -EPIPE. And each time an URB is submitted, you > could give it back right away. But Reset Endpoint is simpler.) IMO issuing a Reset Endpoint is the only correct way of implementing this. See, there are two sides to usbtest + g_zero love relationship: (a) it will help you test your UDC driver. (b) it will help you test your HCD driver. Currently, we have a bug with (b), but if iterate over the list of submitted URBs we can create two problems: (i) There's no way to synchronize class driver's usb_submit_urb() with the exact time when we iterate over the list of pending URBs in xhci-hcd. Which means that we could very well iterate over an empty list and think everything was fine. Heck, this would even happen with usbtest, note that simple_io() always waits 10 seconds for a transfer completion before returning control to the caller. (ii) We would fall into the possibility of not catching bugs with UDCs running g_zero because HCD would just return early -EPIPE for us without asking UDC to handle another transfer :-) Because of these two cases, I think the *only* way to solve this is by issuing a Reset Endpoint cmd so another token can be shifted onto the data lines. > In the patch, you talked about clearing the endpoint halt. But that's > not what you want to do; you want to issue a Reset Endpoint command, > which affects only the host controller. The endpoint's status in the this is exactly what xhci_cleanup_halted_endpoint() does. Sure, it's a misnamer and should probably be renamed to either xhci_reset_endpoint_cmd() or xhci_clear_endpoint_state_halted() to make it clearer as to what's happening. But that can happen later, we don't need to clean that up in order to fix the bug :-) > peripheral device will remain unchanged -- no halt will be cleared. > That contributed to my confusion on reading the patch. yeah, that got me for a while too. I had to keep reminding myself what xhci_cleanup_halted_endpoint() was doing ;-) > By the way, does the same sort of thing happen after a transfer > error (such as a CRC mismatch)? Does the xHCI controller change the > state to EP_STATE_HALTED? Or does it instead go directly to There are a few conditions in which XHCI will change EP state to EP_STATE_HALTED, one of them is a STALL token from the peripheral and the others would be really error conditions: Babble, CRC error, etc. The spec even has a note about it, which I quote: " A Halt condition or USB Transaction error detected on a USB pipe shall cause a Running Endpoint to transition to the Halted state. A Reset Endpoint Command shall be used to clear the Halt condition on the endpoint and transition the endpoint to the Stopped state. A Stop Endpoint Command received while an endpoint is in the Halted state shall have no effect and shall generate a Command Completion Event with the Completion Code set to Context State Error. " > EP_STATE_STOPPED? You probably want to treat that case and the STALL > case as similarly as possible. There's already code to deal with that, take a look at xhci_requires_manual_halt_cleanup() and its callers: 1754 static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, 1755 struct xhci_ep_ctx *ep_ctx, 1756 unsigned int trb_comp_code) 1757 { 1758 /* TRB completion codes that may require a manual halt cleanup */ 1759 if (trb_comp_code == COMP_TX_ERR || 1760 trb_comp_code == COMP_BABBLE || 1761 trb_comp_code == COMP_SPLIT_ERR) 1762 /* The 0.96 spec says a babbling control endpoint 1763 * is not halted. The 0.96 spec says it is. Some HW 1764 * claims to be 0.95 compliant, but it halts the control 1765 * endpoint anyway. Check if a babble halted the 1766 * endpoint. 1767 */ 1768 if ((ep_ctx->ep_info & cpu_to_le32(EP_STATE_MASK)) == 1769 cpu_to_le32(EP_STATE_HALTED)) 1770 return 1; 1771 1772 return 0; 1773 } In fact, there is a typo on that comment. It needs this patch: I think this has a higher probability of being correct. Class driver might not queue any URB to a particular after the first Stall, so why would be move the endpoint away from EP_STATE_HALTED prematurely ? What do you think ? Tested-by: Felipe Balbi diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4e8c3cf..a8bbacb 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1759,7 +1759,7 @@ static int xhci_requires_manual_halt_cleanup(struct xhci_hcd *xhci, if (trb_comp_code == COMP_TX_ERR || trb_comp_code == COMP_BABBLE || trb_comp_code == COMP_SPLIT_ERR) - /* The 0.96 spec says a babbling control endpoint + /* The 0.95 spec says a babbling control endpoint * is not halted. The 0.96 spec says it is. Some HW * claims to be 0.95 compliant, but it halts the control * endpoint anyway. Check if a babble halted the > > > For instance, if an endpoint is halted then there's no reason for the > > > controller to shift any USB tokens for it onto the data lines. Doing > > > so would just be a waste of bandwidth, since the response is bound to > > > be another STALL. And it doesn't matter that the peripheral has no > > > means to STALL any follow-up iens, since the host controller already > > > knows the endpoint is halted. > > > > Now you're claiming that this is a bug on usbtest which has been in tree > > for many, many years and is known to work with EHCI, MUSB and UHCI (at > > least, probably dummy too), which is a different statement from previous > > thread [1]. > > No, I simply failed to understood what you wanted to do. alright. > > > The comment in the patch talks about moving the dequeue pointer past > > > the STALLed TD and then clearing the halt condition. Moving the > > > dequeue pointer is fine -- there's no other way to take control of the > > > TD back from the hardware -- but why would you want to clear the halt? > > > The HCD isn't supposed to do that; the class driver is. > > > > See what usbtest does. It wants to make sure that, even if we issue > > several URBs for that endpoint, the function will always STALL. Sure, > > it's a waste of bandwidth, but what's the probability that any class > > driver will actually do this outside of a test environment ? I think > > it's not up to the HCD to device and it should, rather, let the function > > respond with the expected STALL again which will, once more, move the > > endpoint back into EP_STATE_HALT. > > > > The only thing we should be discussing here, is proper placement for > > xhci_cleanup_halted_endpoint(). > > Right. In theory you could do it any time up until the completion > routine returns. Doing it when you process the failed TD seems like a > good choice -- advance the dequeue pointer and issue the command at the > same time. My concern here is that this will happen when the first usb_submit_urb() completes. I wonder if moving this to when the following usb_submit_urb() is about to start would be better ? Something like below (it won't build, just trying to illustrate the situation): diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 4e8c3cf..a1dac2c 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2814,6 +2814,12 @@ static int prepare_ring(struct xhci_hcd *xhci, struct xhci_ring *ep_ring, return -EINVAL; case EP_STATE_HALTED: xhci_dbg(xhci, "WARN halted endpoint, queueing URB anyway.\n"); + if (!usb_endpoint_xfer_control(&td->urb->ep->desc)) { + xhci_dbg(xhci, "Resetting endpoint.\n"); + xhci_cleanup_halted_endpoint(xhci, + slot_id, ep_index, ep_ring->stream_id, + td, event_trb); + } case EP_STATE_STOPPED: case EP_STATE_RUNNING: break;