From patchwork Mon Nov 4 09:33:02 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michal Pecio X-Patchwork-Id: 841031 Received: from mail-lj1-f175.google.com (mail-lj1-f175.google.com [209.85.208.175]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id ED6906FC5 for ; Mon, 4 Nov 2024 09:33:08 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.175 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730712790; cv=none; b=mQGxWyj2fJdPmkCNSmAGT2Q+PqTdIuFTozx9hEITiPmtGFt3w5xTSWfUL+JPSlFUxth4ipdgf2cxuHDe2aDl57XGRLxhSFhf/JG2zYA6t7foJG2VBW4HZRGMwxMpUUOYG8X9G9rxU8Nse8Z3PCKCPQMrc2i6Y7iO7uRGBa1yil4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730712790; c=relaxed/simple; bh=uA3l/ZAsN+U9x0hp3zWWQ0Qvvx7VT3rJZooxKQ02R+Q=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=uMZlDMVawwKqGO7lWhmzoIeB4mwiaHBN+SjVSO3WXA0E1h83dy7U+gKjKbMgF5VJY3Xqy4FSPjNuDxvOEqTz1mm8fe6ew0IjgxvK9nsAkrFzWOlLVk8AESDAi6HBTBb5CbHSNWn02gNCT5VAmhoFFcn5fMhqTEAVq5kPs9MwNGw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=PO4Rm0B3; arc=none smtp.client-ip=209.85.208.175 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PO4Rm0B3" Received: by mail-lj1-f175.google.com with SMTP id 38308e7fff4ca-2fb4af0b6beso57436621fa.3 for ; Mon, 04 Nov 2024 01:33:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730712787; x=1731317587; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=Kci4jz+pzI7To/0XFg7fTr7ly41ccauv/ECZTS12fdY=; b=PO4Rm0B3atXu0n9EA1n0x/AH+gPnWzPlhTBd1yLq1ZtwuJX9g+Lx1khIefGZOLfQZ+ FnyRRIGfOIKhgSFpA9+7NBHqIXB0MmCZu/KIUX78IiwMBawgqVngEXz+EUOvhvxLi/p+ M5wI+T17PQY5xuUpsvodqC6C6IVyHn+Y2HID3yRjpGE5e/YoyP7KNq/ipGpoAuAO/kXc Do906J3EHdCpLjXTKbAE4FlV8LJiae456wmNpDI3YHKfmCsAUwJ78bN/9QArXhPteEUU CS79ZAIvqLfnsQiSO9qAICZ2ni4Lm3PynBVwtZmQAd2/qmDYGtWxEfcAUcRTr9czDVT4 LjCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730712787; x=1731317587; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Kci4jz+pzI7To/0XFg7fTr7ly41ccauv/ECZTS12fdY=; b=seSQdH0Wa5GzdJK8N2JL2qdVoenCWYyEasO9FMq9AsCh716cRLeFWjh+ObRoPQlFvn yOajyE7xoDQ5Dp4z1mVh5r6B3eiY82AGPXknpfZ5XdZoElCA/H5KKkFuhlUEZ+HPH8MP zXGAlxntat1ZqCI+Jgl+oESpO2hCY9aDT1QWAmu++j2tfyNZG+tC58r7SA3N+B8bEHEe n5TYcdBROmmCEC5XE9pHus4KWyzWSmiUbAVHGh9Kz7R5evOhn+lXBiHHQws4EnM2R0sm XNccVzjyDEOtAqmqQOC9ERLf239gQ2SjG9bK+vEOUsQyVHgfMvNEqdtNnKThzE+lpA7c 3wWg== X-Gm-Message-State: AOJu0YyMEK0xpgm/HC8APUdpqREX7mKxFA79OuHMC2OXxdPUc2XDLiie wXfAaX3PzMRle4otLfA7fk0E6I4AsktBhmL9WJitG3B/ROMbNMf7 X-Google-Smtp-Source: AGHT+IGertSGLysntXGxMaxnQpxPKhlh6eD7GxRCGVc1hYNLc32i8uzPn6uBYW9ntbg+VrkUaePRTg== X-Received: by 2002:a2e:a908:0:b0:2fb:2a1c:936d with SMTP id 38308e7fff4ca-2fedb7a1fd0mr80760331fa.10.1730712786783; Mon, 04 Nov 2024 01:33:06 -0800 (PST) Received: from foxbook (bhf124.neoplus.adsl.tpnet.pl. [83.28.95.124]) by smtp.gmail.com with ESMTPSA id 38308e7fff4ca-2fdef8c3871sm15811961fa.110.2024.11.04.01.33.04 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Mon, 04 Nov 2024 01:33:06 -0800 (PST) Date: Mon, 4 Nov 2024 10:33:02 +0100 From: Michal Pecio To: Mathias Nyman Cc: linux-usb@vger.kernel.org Subject: [PATCH v4 1/3] usb: xhci: Limit Stop Endpoint retries Message-ID: <20241104103302.731601c8@foxbook> In-Reply-To: <20241104103200.533fe1fb@foxbook> References: <20241104103200.533fe1fb@foxbook> Precedence: bulk X-Mailing-List: linux-usb@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Some host controllers fail to atomically transition an endpoint to the Running state on a doorbell ring and enter a hidden "Restarting" state, which looks very much like Stopped, with the important difference that it will spontaneously transition to Running anytime soon. A Stop Endpoint command queued in the Restarting state typically fails with Context State Error and the completion handler sees the Endpoint Context State as either still Stopped or already Running. Even a case of Halted was observed, when an error occurred right after the restart. The Halted state is already recovered from by resetting the endpoint. The Running state is handled by retrying Stop Endpoint. The Stopped state was recognized as a problem on NEC controllers and worked around also by retrying, because the endpoint soon restarts and then stops for good. But there is a risk: the command may fail if the endpoint is "stopped for good" already, and retries will fail forever. The possibility of this was not realized at the time, but a number of cases were discovered later and reproduced. Some proved difficult to deal with, and it is outright impossible to predict if an endpoint may fail to ever start at all due to a hardware bug. One such bug (albeit on ASM3142, not on NEC) was found to be reliably triggered simply by toggling an AX88179 NIC up/down in a tight loop for a few seconds. An endless retries storm is quite nasty. Besides putting needless load on the xHC and CPU, it causes URBs never to be given back, paralyzing the device and connection/disconnection logic for the whole bus if the device is unplugged. User processes waiting for URBs become unkillable, drivers and kworker threads lock up and xhci_hcd cannot be reloaded. For peace of mind, impose a timeout on Stop Endpoint retries in this case. If they don't succeed in 100ms, consider the endpoint stopped permanently for some reason and just give back the unlinked URBs. This failure case is rare already and work is under way to make it rarer. Start this work today by also handling one simple case of race with Reset Endpoint, because it costs just two lines to implement. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio --- drivers/usb/host/xhci-ring.c | 28 ++++++++++++++++++++++++---- drivers/usb/host/xhci.c | 2 ++ drivers/usb/host/xhci.h | 1 + 3 files changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index b6eb928e260f..98e12267bbf6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -52,6 +52,7 @@ * endpoint rings; it generates events on the event ring for these. */ +#include #include #include #include @@ -1151,16 +1152,35 @@ static void xhci_handle_cmd_stop_ep(struct xhci_hcd *xhci, int slot_id, return; case EP_STATE_STOPPED: /* - * NEC uPD720200 sometimes sets this state and fails with - * Context Error while continuing to process TRBs. - * Be conservative and trust EP_CTX_STATE on other chips. + * Per xHCI 4.6.9, Stop Endpoint command on a Stopped + * EP is a Context State Error, and EP stays Stopped. + * + * But maybe it failed on Halted, and somebody ran Reset + * Endpoint later. EP state is now Stopped and EP_HALTED + * still set because Reset EP handler will run after us. + */ + if (ep->ep_state & EP_HALTED) + break; + /* + * On some HCs EP state remains Stopped for some tens of + * us to a few ms or more after a doorbell ring, and any + * new Stop Endpoint fails without aborting the restart. + * This handler may run quickly enough to still see this + * Stopped state, but it will soon change to Running. + * + * Assume this bug on unexpected Stop Endpoint failures. + * Keep retrying until the EP starts and stops again, on + * chips where this is known to help. Wait for 100ms. */ if (!(xhci->quirks & XHCI_NEC_HOST)) break; + if (time_is_before_jiffies(ep->stop_time + msecs_to_jiffies(100))) + break; fallthrough; case EP_STATE_RUNNING: /* Race, HW handled stop ep cmd before ep was running */ - xhci_dbg(xhci, "Stop ep completion ctx error, ep is running\n"); + xhci_dbg(xhci, "Stop ep completion ctx error, ctx_state %d\n", + GET_EP_CTX_STATE(ep_ctx)); command = xhci_alloc_command(xhci, false, GFP_ATOMIC); if (!command) { diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 899c0effb5d3..1ed4dce7eeb7 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -8,6 +8,7 @@ * Some code borrowed from the Linux EHCI driver. */ +#include #include #include #include @@ -1777,6 +1778,7 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) ret = -ENOMEM; goto done; } + ep->stop_time = jiffies; ep->ep_state |= EP_STOP_CMD_PENDING; xhci_queue_stop_endpoint(xhci, command, urb->dev->slot_id, ep_index, 0); diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index f0fb696d5619..0bef6c8b51cf 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -690,6 +690,7 @@ struct xhci_virt_ep { /* Bandwidth checking storage */ struct xhci_bw_info bw_info; struct list_head bw_endpoint_list; + unsigned long stop_time; /* Isoch Frame ID checking storage */ int next_frame_id; /* Use new Isoch TRB layout needed for extended TBC support */ From patchwork Mon Nov 4 09:34:38 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Michal Pecio X-Patchwork-Id: 841030 Received: from mail-lf1-f43.google.com (mail-lf1-f43.google.com [209.85.167.43]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 15A4A6FC5 for ; Mon, 4 Nov 2024 09:34:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.167.43 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730712887; cv=none; b=Pl7QrGC9jUn9dSkDOUBk0uLDMhbP8xWxXUewcUeCIK2I/tK3dPi2xRhcfDcp5hcs5AEeNd+WjtIir0t13IQKG4rakgfyPZNDO8pgwbRdkudcq64YEc4Jyga3mjRAsz1VI+ScotRPgDDo9AJARaHp6KpiZSApkruYh65v4kszcvU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1730712887; c=relaxed/simple; bh=SwBw0u2oB9YWmUAWvLcSrdIztuafapwkYZlH6X3wVW4=; h=Date:From:To:Cc:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=U20ny6ZVIWZBMbIcbgLWeEW0vdDxXLREhpavwdNiq2NnRwdLlxFrG3oQPgZfJrPmOZ24HU1iC7tWrRIaixoqJKsP0bQFPnwZBId3b+0qgpX+ofzrfk+8eMkEEZ3QXELoYhiRTNg1xdO8YnbmvZA6lljhQ5g2AnnhXxT7/uETnfY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=IKmFPnTd; arc=none smtp.client-ip=209.85.167.43 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="IKmFPnTd" Received: by mail-lf1-f43.google.com with SMTP id 2adb3069b0e04-539e6c754bdso3829614e87.2 for ; Mon, 04 Nov 2024 01:34:45 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730712884; x=1731317684; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=IglvRZ5385zU0mmHh62rTw1JW5g8sQ0kUxTBvIPUQZQ=; b=IKmFPnTdKDPgtddijxNfW60cZQYy8aDh/tdcO7ThKpK/7K20leVx0UyvWrQYvap2YI PdD1h6Hm7z09vvn9erThc9ByV9HvVNfBgQ+0jbR2nUJ1Znk0YVer9ujIrythPuw3Xe/g pOSxJXSXqFfHXs0LxBkEc8afiP9Agb6UjSYj4CwDytWSJ4Z23HIyvLQHBLIes/HDvI/5 7SdEJeWKcMVVm8V3Z1NmlhlO0hIklztptXyyKRoppRJv7iMYcvAhjmMUPUNCtf2UdTJi MSuSbpa0uXj/xyRUhvtMKsKpzqjsSa1IqMyoDyQvSxOcierxNlf0Q1oPhBsRfe65PVzQ RdPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730712884; x=1731317684; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=IglvRZ5385zU0mmHh62rTw1JW5g8sQ0kUxTBvIPUQZQ=; b=rc1Q5N/Jo2hvJgSKOAndwBRmsIAWGHQnOFti1b890HACNQ85LbTIaWnW1qSjByzw9E LEcmjx4wj08JZlY16BMsTRAdON4lx3cxt5m+cH2XKoCTrSdAKdhVHtJ9H4v8ijjeDvb8 aRkcaAxvjiuVT/uqqGGU/ISrkI+CD85aVFBxMi1xzYg/XXETt1n3SJbYB3d0WoQA3tfm vj4kmAYkRUcW1OgwpuMu37ziSduHmRMM87s3X7NOBrZcw6PU9PDpcgviuJblT2QCJwB9 f5bkcRQl1+i3TLrbUavszK1a4WsTKEZcXzby97q2t9ZQtbH6wqTAObue39ye6vH46ki5 Qosw== X-Gm-Message-State: AOJu0YxeOHq4wFjkmCPsI1wgDpKNlmWdcn7Rf4tA1EyKpIjRQZokeWl7 H44p+nrQnno3qOYxg/6lUQa7tqm1uP3pPOF3zWywijGDb5upCWHYcV6DTg== X-Google-Smtp-Source: AGHT+IGwAGtiWux+yHuACaJpEIpd+psYaQwWzsCCXOB4ath7tF+0e+6+46ZPtRZ38I6mTV0UFceqlA== X-Received: by 2002:a05:6512:68d:b0:535:699b:b076 with SMTP id 2adb3069b0e04-53c79e2fdb7mr6578023e87.16.1730712883905; Mon, 04 Nov 2024 01:34:43 -0800 (PST) Received: from foxbook (bhf124.neoplus.adsl.tpnet.pl. [83.28.95.124]) by smtp.gmail.com with ESMTPSA id 2adb3069b0e04-53c7bde08c4sm1582637e87.253.2024.11.04.01.34.40 (version=TLS1_2 cipher=AES128-SHA bits=128/128); Mon, 04 Nov 2024 01:34:42 -0800 (PST) Date: Mon, 4 Nov 2024 10:34:38 +0100 From: Michal Pecio To: Mathias Nyman Cc: linux-usb@vger.kernel.org Subject: [PATCH v4 3/3] usb: xhci: Avoid queuing redundant Stop Endpoint commands Message-ID: <20241104103438.14b07877@foxbook> In-Reply-To: <20241104103200.533fe1fb@foxbook> References: <20241104103200.533fe1fb@foxbook> Precedence: bulk X-Mailing-List: linux-usb@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Stop Endpoint command on an already stopped endpoint fails and may be misinterpreted as a known hardware bug by the completion handler. This results in an unnecessary delay with repeated retries of the command. Avoid queuing this command when endpoint state flags indicate that it's stopped or halted and the command will fail. If commands are pending on the endpoint, their completion handlers will process cancelled TDs so it's done. In case of waiting for external operations like clearing TT buffer, the endpoint is stopped and cancelled TDs can be processed now. This eliminates practically all unnecessary retries because an endpoint with pending URBs is maintained in Running state by the driver, unless aforementioned commands or other operations are pending on it. This is guaranteed by xhci_ring_ep_doorbell() and by the fact that it is called every time any of those operations completes. The only known exceptions are hardware bugs (the endpoint never starts at all) and Stream Protocol errors not associated with any TRB, which cause an endpoint reset not followed by restart. Sounds like a bug. Generally, these retries are only expected to happen when the endpoint fails to start for unknown/no reason, which is a worse problem itself, and fixing the bug eliminates the retries too. All cases were tested and found to work as expected. SET_DEQ_PENDING was produced by patching uvcvideo to unlink URBs in 100us intervals, which then runs into this case very often. EP_HALTED was produced by restarting 'cat /dev/ttyUSB0' on a serial dongle with broken cable. EP_CLEARING_TT by the same, with the dongle on an external hub. Fixes: fd9d55d190c0 ("xhci: retry Stop Endpoint on buggy NEC controllers") CC: stable@vger.kernel.org Signed-off-by: Michal Pecio --- drivers/usb/host/xhci-ring.c | 13 +++++++++++++ drivers/usb/host/xhci.c | 19 +++++++++++++++---- drivers/usb/host/xhci.h | 1 + 3 files changed, 29 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 30b392715f1e..92f9b6c468c6 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1069,6 +1069,19 @@ static int xhci_invalidate_cancelled_tds(struct xhci_virt_ep *ep) return 0; } +/* + * Erase queued TDs from transfer ring(s) and give back those the xHC didn't + * stop on. If necessary, queue commands to move the xHC off cancelled TDs it + * stopped on. Those will be given back later when the commands complete. + * + * Call under xhci->lock on a stopped endpoint. + */ +void xhci_process_cancelled_tds(struct xhci_virt_ep *ep) +{ + xhci_invalidate_cancelled_tds(ep); + xhci_giveback_invalidated_tds(ep); +} + /* * Returns the TD the endpoint ring halted on. * Only call for non-running rings without streams. diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 1ed4dce7eeb7..695e29b0946c 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -1769,10 +1769,21 @@ static int xhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) } } - /* Queue a stop endpoint command, but only if this is - * the first cancellation to be handled. - */ - if (!(ep->ep_state & EP_STOP_CMD_PENDING)) { + /* These completion handlers will sort out cancelled TDs for us */ + if (ep->ep_state & (EP_STOP_CMD_PENDING | EP_HALTED | SET_DEQ_PENDING)) { + xhci_dbg(xhci, "Not queuing Stop Endpoint on slot %d ep %d in state 0x%x\n", + urb->dev->slot_id, ep_index, ep->ep_state); + goto done; + } + + /* In this case no commands are pending but the endpoint is stopped */ + if (ep->ep_state & EP_CLEARING_TT) { + /* and cancelled TDs can be given back right away */ + xhci_dbg(xhci, "Invalidating TDs instantly on slot %d ep %d in state 0x%x\n", + urb->dev->slot_id, ep_index, ep->ep_state); + xhci_process_cancelled_tds(ep); + } else { + /* Otherwise, queue a new Stop Endpoint command */ command = xhci_alloc_command(xhci, false, GFP_ATOMIC); if (!command) { ret = -ENOMEM; diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 0bef6c8b51cf..33e57f408e86 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1914,6 +1914,7 @@ void xhci_ring_doorbell_for_active_rings(struct xhci_hcd *xhci, void xhci_cleanup_command_queue(struct xhci_hcd *xhci); void inc_deq(struct xhci_hcd *xhci, struct xhci_ring *ring); unsigned int count_trbs(u64 addr, u64 len); +void xhci_process_cancelled_tds(struct xhci_virt_ep *ep); /* xHCI roothub code */ void xhci_set_link_state(struct xhci_hcd *xhci, struct xhci_port *port,