diff mbox series

USB: Prevent xhci from resuming root hub during suspend entrance

Message ID 20250110084413.80981-1-leo.lin@canonical.com
State New
Headers show
Series USB: Prevent xhci from resuming root hub during suspend entrance | expand

Commit Message

Yo-Jung (Leo) Lin Jan. 10, 2025, 8:44 a.m. UTC
The commit d9b4067aef50 ("USB: Fix the issue of task recovery failure
caused by USB status when S4 wakes up") fixed an issue where if an USB
port change happens during the entering steps of hibernation, xhci driver
would attempt to resume the root hub, making the hibernation fail.

System-wide suspend may fail due to the same reason, but this hasn't been
addressed yet. This has been found on HP ProOne 440[1], as well as on
some newer Dell all-in-one models. When suspend fails due to this reason,
the kernel would show the following messages:

[   74.245058] [165] usbcore:hub_suspend:3961: hub 2-0:1.0: hub_suspend
[   74.245850] [165] usbcore:hcd_bus_suspend:2251: usb usb2: bus suspend, wakeup 0
[   74.250971] [3508] usbcore:usb_port_suspend:3554: usb 1-2: usb suspend, wakeup 1
[   74.263025] [11] usbcore:hub_suspend:3961: hub 1-0:1.0: hub_suspend
[   74.264029] [11] usbcore:hcd_bus_suspend:2251: usb usb1: bus suspend, wakeup 0
[   74.265061] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-14 not suspended
[   74.266020] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-8 not suspended
[   74.266933] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-4 not suspended
[   74.267758] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
[   74.268677] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 6
[   74.269632] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 5
[   74.270448] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 3
[   74.271228] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 0
[   74.271946] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
[   74.272802] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 8
[   74.273533] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 5
[   74.274233] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 0
[   74.274982] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
[   74.275814] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 2 ep 0
[   74.281739] <intr> xhci_hcd:handle_port_status:1992: xhci_hcd 0000:80:14.0: Port change event, 1-8, id 8, portsc: 0x202a0
[   74.282453] <intr> xhci_hcd:handle_port_status:1998: xhci_hcd 0000:80:14.0: resume root hub
[   74.283145] <intr> xhci_hcd:handle_port_status:2109: xhci_hcd 0000:80:14.0: handle_port_status: starting usb1 port polling.
[   74.385425] e1000e: EEE TX LPI TIMER: 00000011
[   74.385543] [3508] pci_acpi:acpi_pci_set_power_state:1119: pcieport 0000:80:1c.0: power state changed by ACPI to D0
[   74.385722] xhci_hcd 0000:80:14.0: PM: pci_pm_suspend(): hcd_pci_suspend returns -16
[   74.385735] xhci_hcd 0000:80:14.0: PM: dpm_run_callback(): pci_pm_suspend returns -16
[   74.385743] xhci_hcd 0000:80:14.0: PM: failed to suspend async: error -16

To address this, extend the fix in that commit also to suspend.

This patch was tested on top of next-20250109, by suspending every 2
minutes consecutively for 300 times on the machine where the above error
messages were found. There's no suspend failure found during the test.

[1] [PATCH v3] platform/x86/hp: Avoid spurious wakeup on HP ProOne 440
    https://lore.kernel.org/all/20240906053047.459036-1-kai.heng.feng@canonical.com/#t

Signed-off-by: Yo-Jung (Leo) Lin <leo.lin@canonical.com>
---
 include/linux/pm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Greg KH Jan. 10, 2025, 8:54 a.m. UTC | #1
On Fri, Jan 10, 2025 at 04:44:10PM +0800, Yo-Jung (Leo) Lin wrote:
> The commit d9b4067aef50 ("USB: Fix the issue of task recovery failure
> caused by USB status when S4 wakes up") fixed an issue where if an USB
> port change happens during the entering steps of hibernation, xhci driver
> would attempt to resume the root hub, making the hibernation fail.
> 
> System-wide suspend may fail due to the same reason, but this hasn't been
> addressed yet. This has been found on HP ProOne 440[1], as well as on
> some newer Dell all-in-one models. When suspend fails due to this reason,
> the kernel would show the following messages:
> 
> [   74.245058] [165] usbcore:hub_suspend:3961: hub 2-0:1.0: hub_suspend
> [   74.245850] [165] usbcore:hcd_bus_suspend:2251: usb usb2: bus suspend, wakeup 0
> [   74.250971] [3508] usbcore:usb_port_suspend:3554: usb 1-2: usb suspend, wakeup 1
> [   74.263025] [11] usbcore:hub_suspend:3961: hub 1-0:1.0: hub_suspend
> [   74.264029] [11] usbcore:hcd_bus_suspend:2251: usb usb1: bus suspend, wakeup 0
> [   74.265061] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-14 not suspended
> [   74.266020] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-8 not suspended
> [   74.266933] [11] xhci_hcd:xhci_bus_suspend:1779: xhci_hcd 0000:80:14.0: port 1-4 not suspended
> [   74.267758] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
> [   74.268677] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 6
> [   74.269632] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 5
> [   74.270448] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 3
> [   74.271228] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 5 ep 0
> [   74.271946] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
> [   74.272802] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 8
> [   74.273533] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 5
> [   74.274233] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 6 ep 0
> [   74.274982] [11] xhci_hcd:xhci_ring_cmd_db:369: xhci_hcd 0000:80:14.0: // Ding dong!
> [   74.275814] <intr> xhci_hcd:handle_tx_event:2711: xhci_hcd 0000:80:14.0: Stopped on No-op or Link TRB for slot 2 ep 0
> [   74.281739] <intr> xhci_hcd:handle_port_status:1992: xhci_hcd 0000:80:14.0: Port change event, 1-8, id 8, portsc: 0x202a0
> [   74.282453] <intr> xhci_hcd:handle_port_status:1998: xhci_hcd 0000:80:14.0: resume root hub
> [   74.283145] <intr> xhci_hcd:handle_port_status:2109: xhci_hcd 0000:80:14.0: handle_port_status: starting usb1 port polling.
> [   74.385425] e1000e: EEE TX LPI TIMER: 00000011
> [   74.385543] [3508] pci_acpi:acpi_pci_set_power_state:1119: pcieport 0000:80:1c.0: power state changed by ACPI to D0
> [   74.385722] xhci_hcd 0000:80:14.0: PM: pci_pm_suspend(): hcd_pci_suspend returns -16
> [   74.385735] xhci_hcd 0000:80:14.0: PM: dpm_run_callback(): pci_pm_suspend returns -16
> [   74.385743] xhci_hcd 0000:80:14.0: PM: failed to suspend async: error -16
> 
> To address this, extend the fix in that commit also to suspend.
> 
> This patch was tested on top of next-20250109, by suspending every 2
> minutes consecutively for 300 times on the machine where the above error
> messages were found. There's no suspend failure found during the test.
> 
> [1] [PATCH v3] platform/x86/hp: Avoid spurious wakeup on HP ProOne 440
>     https://lore.kernel.org/all/20240906053047.459036-1-kai.heng.feng@canonical.com/#t
> 
> Signed-off-by: Yo-Jung (Leo) Lin <leo.lin@canonical.com>
> ---
>  include/linux/pm.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 08c37b83fea8..d71347357fb1 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -571,7 +571,7 @@ const struct dev_pm_ops name = { \
>  
>  #define PMSG_IS_AUTO(msg)	(((msg).event & PM_EVENT_AUTO) != 0)
>  #define PMSG_NO_WAKEUP(msg)	(((msg).event & \
> -				(PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)
> +				(PM_EVENT_FREEZE | PM_EVENT_QUIESCE | PM_EVENT_SUSPEND)) != 0)

What tree is this against?  It's not my usb trees as this #define is not
present there.

And why are you changing a define that affects multiple devices when
it's only a USB issue you are seeing?

confused,

greg k-h
Alan Stern Jan. 10, 2025, 3:44 p.m. UTC | #2
On Fri, Jan 10, 2025 at 04:44:10PM +0800, Yo-Jung (Leo) Lin wrote:
> The commit d9b4067aef50 ("USB: Fix the issue of task recovery failure
> caused by USB status when S4 wakes up") fixed an issue where if an USB
> port change happens during the entering steps of hibernation, xhci driver
> would attempt to resume the root hub, making the hibernation fail.
> 
> System-wide suspend may fail due to the same reason, but this hasn't been
> addressed yet. This has been found on HP ProOne 440[1], as well as on
> some newer Dell all-in-one models. When suspend fails due to this reason,
> the kernel would show the following messages:

I believe this problem was discussed on the mailing list before, and it 
turned out that the issue was caused by a bug in the xhci-hcd driver, 
not a bug in the USB core.

Basically, suspend is _supposed_ to fail if a wakeup event occurs while 
the suspend is in progress.  As I recall, the bug in xhci-hcd was that 
it treats some non-wakeup events as if they were wakeup events.

In particular, a port change on the root hub should be treated as a 
wakeup event if and only if the root hub is enabled for wakeup.  Does 
xhci-hcd check for this before failing the suspend?

This reasoning shows that your proposed fix is incorrect.

Alan Stern
Yo-Jung (Leo) Lin Jan. 13, 2025, 8:14 a.m. UTC | #3
Hi Alan

On Fri, Jan 10, 2025 at 11:44 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Fri, Jan 10, 2025 at 04:44:10PM +0800, Yo-Jung (Leo) Lin wrote:
> > The commit d9b4067aef50 ("USB: Fix the issue of task recovery failure
> > caused by USB status when S4 wakes up") fixed an issue where if an USB
> > port change happens during the entering steps of hibernation, xhci driver
> > would attempt to resume the root hub, making the hibernation fail.
> >
> > System-wide suspend may fail due to the same reason, but this hasn't been
> > addressed yet. This has been found on HP ProOne 440[1], as well as on
> > some newer Dell all-in-one models. When suspend fails due to this reason,
> > the kernel would show the following messages:
>
> I believe this problem was discussed on the mailing list before, and it
> turned out that the issue was caused by a bug in the xhci-hcd driver,
> not a bug in the USB core.

Could you be more specific on which bug/thread it is?
If you were mentioning thread about d9b4067aef50 ("USB: Fix the
issue of task recovery failure caused by USB status when S4 wakes up"),
the log in that commit message suggests that it happened on ehci, while
here it happened on xhci. So this may be more general than just the xhci.

>
> Basically, suspend is _supposed_ to fail if a wakeup event occurs while
> the suspend is in progress.  As I recall, the bug in xhci-hcd was that
> it treats some non-wakeup events as if they were wakeup events.
>
> In particular, a port change on the root hub should be treated as a
> wakeup event if and only if the root hub is enabled for wakeup.  Does
> xhci-hcd check for this before failing the suspend?
>
> This reasoning shows that your proposed fix is incorrect.
>
Thanks for the feedback, This indeed isn't a correct way to address this.
Will try to figure out some other ways.

> Alan Stern

Best,
Leo
Alan Stern Jan. 13, 2025, 4:03 p.m. UTC | #4
On Mon, Jan 13, 2025 at 04:14:07PM +0800, Yo-Jung (Leo) Lin wrote:
> Hi Alan
> 
> On Fri, Jan 10, 2025 at 11:44 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Fri, Jan 10, 2025 at 04:44:10PM +0800, Yo-Jung (Leo) Lin wrote:
> > > The commit d9b4067aef50 ("USB: Fix the issue of task recovery failure
> > > caused by USB status when S4 wakes up") fixed an issue where if an USB
> > > port change happens during the entering steps of hibernation, xhci driver
> > > would attempt to resume the root hub, making the hibernation fail.
> > >
> > > System-wide suspend may fail due to the same reason, but this hasn't been
> > > addressed yet. This has been found on HP ProOne 440[1], as well as on
> > > some newer Dell all-in-one models. When suspend fails due to this reason,
> > > the kernel would show the following messages:
> >
> > I believe this problem was discussed on the mailing list before, and it
> > turned out that the issue was caused by a bug in the xhci-hcd driver,
> > not a bug in the USB core.
> 
> Could you be more specific on which bug/thread it is?
> If you were mentioning thread about d9b4067aef50 ("USB: Fix the
> issue of task recovery failure caused by USB status when S4 wakes up"),
> the log in that commit message suggests that it happened on ehci, while
> here it happened on xhci. So this may be more general than just the xhci.

I was referring to the discussion in the email thread here:

https://lore.kernel.org/linux-usb/7be0c87a-c00f-4346-8482-f41ef0249b57@rowland.harvard.edu/

> > Basically, suspend is _supposed_ to fail if a wakeup event occurs while
> > the suspend is in progress.  As I recall, the bug in xhci-hcd was that
> > it treats some non-wakeup events as if they were wakeup events.
> >
> > In particular, a port change on the root hub should be treated as a
> > wakeup event if and only if the root hub is enabled for wakeup.  Does
> > xhci-hcd check for this before failing the suspend?
> >
> > This reasoning shows that your proposed fix is incorrect.
> >
> Thanks for the feedback, This indeed isn't a correct way to address this.
> Will try to figure out some other ways.

Okay, good.

Alan Stern
diff mbox series

Patch

diff --git a/include/linux/pm.h b/include/linux/pm.h
index 08c37b83fea8..d71347357fb1 100644
--- a/include/linux/pm.h
+++ b/include/linux/pm.h
@@ -571,7 +571,7 @@  const struct dev_pm_ops name = { \
 
 #define PMSG_IS_AUTO(msg)	(((msg).event & PM_EVENT_AUTO) != 0)
 #define PMSG_NO_WAKEUP(msg)	(((msg).event & \
-				(PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)
+				(PM_EVENT_FREEZE | PM_EVENT_QUIESCE | PM_EVENT_SUSPEND)) != 0)
 /*
  * Device run-time power management status.
  *