diff mbox series

USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up

Message ID 20240906030548.845115-1-duanchenghao@kylinos.cn
State Superseded
Headers show
Series USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up | expand

Commit Message

Duan Chenghao Sept. 6, 2024, 3:05 a.m. UTC
When a device is inserted into the USB port and an S4 wakeup is initiated,
after the USB-hub initialization is completed, it will automatically enter suspend mode.
Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery.
However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state,
it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed.

This patch makes two modifications in total:
1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd,
which were previously split between the top half and bottom half of the interrupt,
are now unified and executed solely in the bottom half of the interrupt.
This prevents the bottom half tasks from being frozen during the S4 process,
ensuring that the clean_bit process can proceed without interruption.

2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING.
When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit.
This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process.

Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
---
 drivers/usb/core/hcd.c | 1 -
 drivers/usb/core/hub.c | 3 +++
 2 files changed, 3 insertions(+), 1 deletion(-)

Comments

Alan Stern Sept. 6, 2024, 2:05 p.m. UTC | #1
[Please make sure that the lines in your email message don't extend 
beyond 76 columns or so.]

Lots of things here seem to be wrong.

On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> When a device is inserted into the USB port and an S4 wakeup is initiated,

There is no such thing as an S4 wakeup.  Do you mean wakeup from an S4 
suspend state?

> after the USB-hub initialization is completed, it will automatically enter suspend mode.

What will enter suspend mode?  The hub that the device was plugged into?
That should not happen.  The hub initialization code should detect that 
a new device was plugged in and prevent the hub from suspending.

> Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.

HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.

> During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery.

What do you mean by "task recovery"?  We don't need to recover any 
tasks.

What do you mean by "peripherals are put into suspend mode"?  That's not 
what happens.  Peripherals are set back to full power.

> However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state,
> it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed.

What will return an EBUSY status?

Why do you say that S4 suspend will fail?  Aren't you talking about S4 
wakeup?

Can you provide a kernel log that explains these points and shows what 
problem you are trying to solve?

> This patch makes two modifications in total:
> 1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> which were previously split between the top half and bottom half of the interrupt,
> are now unified and executed solely in the bottom half of the interrupt.
> This prevents the bottom half tasks from being frozen during the S4 process,
> ensuring that the clean_bit process can proceed without interruption.

The name is "clear_bit" (with an 'r'), not "clean_bit".

> 2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING.
> When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit.
> This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process.

hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after calling 
hcd->driver->bus_resume().  After that point, usb_hcd_resume_root_hub() 
won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?

Alan Stern

> Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> ---
>  drivers/usb/core/hcd.c | 1 -
>  drivers/usb/core/hub.c | 3 +++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index 1ff7d901fede..a6bd0fbd82f4 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd)
>  	spin_lock_irqsave (&hcd_root_hub_lock, flags);
>  	if (hcd->rh_registered) {
>  		pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> -		set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
>  		queue_work(pm_wq, &hcd->wakeup_work);
>  	}
>  	spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> index 4b93c0bd1d4b..7f847c4afc0d 100644
> --- a/drivers/usb/core/hub.c
> +++ b/drivers/usb/core/hub.c
> @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
>  
>  int usb_remote_wakeup(struct usb_device *udev)
>  {
> +	struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
>  	int	status = 0;
>  
>  	usb_lock_device(udev);
>  	if (udev->state == USB_STATE_SUSPENDED) {
>  		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> +		if (hcd->state == HC_STATE_SUSPENDED)
> +			set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
>  		status = usb_autoresume_device(udev);
>  		if (status == 0) {
>  			/* Let the drivers do their thing, then... */
> -- 
> 2.34.1
> 
>
Duan Chenghao Sept. 10, 2024, 9:34 a.m. UTC | #2
> [Please make sure that the lines in your email message don't extend 
> beyond 76 columns or so.]
> 

OK. Later, I will modify the patch format. V2 patch will be released
later

> Lots of things here seem to be wrong.
> 
> On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> 
> There is no such thing as an S4 wakeup.  Do you mean wakeup from an
> S4 
> suspend state?

Yes, waking up from the S4 suspend state.

> 
> > after the USB-hub initialization is completed, it will
> > automatically enter suspend mode.
> 
> What will enter suspend mode?  The hub that the device was plugged
> into?
> That should not happen.  The hub initialization code should detect
> that 
> a new device was plugged in and prevent the hub from suspending.
> 

Yes, the current issue is that the hub detects a new device during the
resuming process. However, the S4 wakeup is attempting to put the hub
into suspend mode, and during the suspend process, it detects that the
HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the
return of an EBUSY status.

> > Upon detecting a device on the USB port, it will proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> 
> HCD_FLAG_WAKEUP_PENDING is not a state.  It is a flag.
> 
> > During the S4 wakeup process, peripherals are put into suspend
> > mode, followed by task recovery.
> 
> What do you mean by "task recovery"?  We don't need to recover any 
> tasks.
> 

S4 wakeup restores the image that was saved before the system entered
the S4 sleep state.

    S4 waking up from hibernation
    =============================
    kernel initialization
    |   
    v   
    freeze user task and kernel thread
    |   
    v   
    load saved image
    |    
    v   
    freeze the peripheral device and controller
    (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set,
     return to EBUSY and do not perform the following restore image.)
    |
    v
    restore image(task recovery)


> What do you mean by "peripherals are put into suspend mode"?  That's
> not 
> what happens.  Peripherals are set back to full power.
> 
> > However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state,
> > it will return an EBUSY status, causing the S4 suspend to fail and
> > subsequent task recovery to not proceed.
> 
> What will return an EBUSY status?

if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.

> 
> Why do you say that S4 suspend will fail?  Aren't you talking about
> S4 
> wakeup?

After returning EBUSY, the subsequent restore image operation will not
be executed.

> 
> Can you provide a kernel log that explains these points and shows
> what 
> problem you are trying to solve?

[    9.009166][ 2] [  T403] PM: Image signature found, resuming
[    9.009167][ 2] [  T403] PM: resume from hibernation
[    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
[inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
[    9.009244][ 2] [  T403] Freezing user space processes ... (elapsed
0.001 seconds) done.
[    9.010355][ 2] [  T403] OOM killer disabled.
[    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
(elapsed 0.000 seconds) done.
[    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
[    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
[    9.073334][ 2] [  T403] PM: Loading and decompressing image data
(486874 pages)...
[    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0 [mpidr:0x0]
[    9.095928][ 2] [  T403] PM: Image loading progress:   0%
[    9.664803][ 2] [  T403] PM: Image loading progress:  10%
[    9.794156][ 2] [  T403] PM: Image loading progress:  20%
[    9.913001][ 2] [  T403] PM: Image loading progress:  30%
[   10.034331][ 2] [  T403] PM: Image loading progress:  40%
[   10.154070][ 2] [  T403] PM: Image loading progress:  50%
[   10.277096][ 2] [  T403] PM: Image loading progress:  60%
[   10.398860][ 2] [  T403] PM: Image loading progress:  70%
[   10.533760][ 2] [  T403] PM: Image loading progress:  80%
[   10.659874][ 2] [  T403] PM: Image loading progress:  90%
[   10.760681][ 2] [  T403] PM: Image loading progress: 100%
[   10.760693][ 2] [  T403] PM: Image loading done
[   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
(1159.22 MB/s)
[   10.761982][ 2] [  T403] PM: Image successfully loaded
[   10.761988][ 2] [  T403] printk: Suspending console(s) (use
no_console_suspend to debug)
[   10.864973][ 2] [  T403] innovpu_freeze:1782
[   10.864974][ 2] [  T403] innovpu_suspend:1759
[   11.168871][ 2] [  T189] PM: pci_pm_freeze():
hcd_pci_suspend+0x0/0x38 returns -16
[   11.168875][ 2] [  T189] PM: dpm_run_callback():
pci_pm_freeze+0x0/0x108 returns -16
[   11.168876][ 2] [  T189] PM: Device 0000:05:00.0 failed to quiesce
async: error -16
[   12.270452][ 2] [  T403] innovpu_thaw:1792
[   12.405296][ 2] [  T403] PM: Failed to load hibernation image,
recovering.
[   12.486859][ 2] [  T403] PM: Basic memory bitmaps freed
[   12.486860][ 2] [  T403] OOM killer enabled.
[   12.486861][ 2] [  T403] Restarting tasks ... 

> 
> > This patch makes two modifications in total:
> > 1. The set_bit and clean_bit operations for the
> > HCD_FLAG_WAKEUP_PENDING flag of Hcd,
> > which were previously split between the top half and bottom half of
> > the interrupt,
> > are now unified and executed solely in the bottom half of the
> > interrupt.
> > This prevents the bottom half tasks from being frozen during the S4
> > process,
> > ensuring that the clean_bit process can proceed without
> > interruption.
> 
> The name is "clear_bit" (with an 'r'), not "clean_bit".
> 
> > 2. Add a condition to the set_bit operation for the hcd status
> > HCD_FLAG_WAKEUP_PENDING.
> > When the hcd status is HC_STATE_SUSPENDED, perform the setting of
> > the aforementioned status bit.
> > This prevents a subsequent set_bit from occurring after the
> > clean_bit if the hcd is in the resuming process.
> 
> hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after
> calling 
> hcd->driver->bus_resume().  After that point,
> usb_hcd_resume_root_hub() 
> won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again?
> 
> Alan Stern
> 
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > ---
> >  drivers/usb/core/hcd.c | 1 -
> >  drivers/usb/core/hub.c | 3 +++
> >  2 files changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 1ff7d901fede..a6bd0fbd82f4 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd
> > *hcd)
> >         spin_lock_irqsave (&hcd_root_hub_lock, flags);
> >         if (hcd->rh_registered) {
> >                 pm_wakeup_event(&hcd->self.root_hub->dev, 0);
> > -               set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
> >                 queue_work(pm_wq, &hcd->wakeup_work);
> >         }
> >         spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
> > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
> > index 4b93c0bd1d4b..7f847c4afc0d 100644
> > --- a/drivers/usb/core/hub.c
> > +++ b/drivers/usb/core/hub.c
> > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device
> > *udev, pm_message_t msg)
> >  
> >  int usb_remote_wakeup(struct usb_device *udev)
> >  {
> > +       struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
> >         int     status = 0;
> >  
> >         usb_lock_device(udev);
> >         if (udev->state == USB_STATE_SUSPENDED) {
> >                 dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
> > +               if (hcd->state == HC_STATE_SUSPENDED)
> > +                       set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd-
> > > flags);
> >                 status = usb_autoresume_device(udev);
> >                 if (status == 0) {
> >                         /* Let the drivers do their thing, then...
> > */
> > -- 
> > 2.34.1
> > 
> >
Duan Chenghao Sept. 12, 2024, 3:21 a.m. UTC | #3
在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > S4 wakeup restores the image that was saved before the system
> > entered
> > the S4 sleep state.
> > 
> >     S4 waking up from hibernation
> >     =============================
> >     kernel initialization
> >     |   
> >     v   
> >     freeze user task and kernel thread
> >     |   
> >     v   
> >     load saved image
> >     |    
> >     v   
> >     freeze the peripheral device and controller
> >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > set,
> >      return to EBUSY and do not perform the following restore
> > image.)
> 
> Why is the flag set at this point?  It should not be; the device and 
> controller should have been frozen with wakeup disabled.
> 
This is check point, not set point. When the USB goes into a suspend
state, the HCD_FLAG_WAKEUP_PENDING flag is checked, and if it is found
that the USB is in the process of resuming, then an EBUSY error is
returned.

> >     |
> >     v
> >     restore image(task recovery)
> 
> > > > However, upon detecting that the hcd is in the
> > > > HCD_FLAG_WAKEUP_PENDING state,
> > > > it will return an EBUSY status, causing the S4 suspend to fail
> > > > and
> > > > subsequent task recovery to not proceed.
> > > 
> > > What will return an EBUSY status?
> > 
> > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY.
> 
> I meant: Which function will return EBUSY status?  The answer is in
> the 
> log below; hcd_pci_suspend() does this.
> 
> > > Why do you say that S4 suspend will fail?  Aren't you talking
> > > about
> > > S4 
> > > wakeup?
> > 
> > After returning EBUSY, the subsequent restore image operation will
> > not
> > be executed.
> > 
> > > 
> > > Can you provide a kernel log that explains these points and shows
> > > what 
> > > problem you are trying to solve?
> > 
> > [    9.009166][ 2] [  T403] PM: Image signature found, resuming
> > [    9.009167][ 2] [  T403] PM: resume from hibernation
> > [    9.009243][ 2] [  T403] inno-codec inno-codec.16.auto:
> > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5...
> > [    9.009244][ 2] [  T403] Freezing user space processes ...
> > (elapsed
> > 0.001 seconds) done.
> > [    9.010355][ 2] [  T403] OOM killer disabled.
> > [    9.010355][ 2] [  T403] Freezing remaining freezable tasks ...
> > (elapsed 0.000 seconds) done.
> > [    9.012152][ 2] [  T403] PM: Basic memory bitmaps created
> > [    9.073333][ 2] [  T403] PM: Using 3 thread(s) for decompression
> > [    9.073334][ 2] [  T403] PM: Loading and decompressing image
> > data
> > (486874 pages)...
> > [    9.073335][ 2] [  T403] hibernate: Hibernated on CPU 0
> > [mpidr:0x0]
> > [    9.095928][ 2] [  T403] PM: Image loading progress:   0%
> > [    9.664803][ 2] [  T403] PM: Image loading progress:  10%
> > [    9.794156][ 2] [  T403] PM: Image loading progress:  20%
> > [    9.913001][ 2] [  T403] PM: Image loading progress:  30%
> > [   10.034331][ 2] [  T403] PM: Image loading progress:  40%
> > [   10.154070][ 2] [  T403] PM: Image loading progress:  50%
> > [   10.277096][ 2] [  T403] PM: Image loading progress:  60%
> > [   10.398860][ 2] [  T403] PM: Image loading progress:  70%
> > [   10.533760][ 2] [  T403] PM: Image loading progress:  80%
> > [   10.659874][ 2] [  T403] PM: Image loading progress:  90%
> > [   10.760681][ 2] [  T403] PM: Image loading progress: 100%
> > [   10.760693][ 2] [  T403] PM: Image loading done
> > [   10.760718][ 2] [  T403] PM: Read 1947496 kbytes in 1.68 seconds
> > (1159.22 MB/s)
> > [   10.761982][ 2] [  T403] PM: Image successfully loaded
> > [   10.761988][ 2] [  T403] printk: Suspending console(s) (use
> > no_console_suspend to debug)
> > [   10.864973][ 2] [  T403] innovpu_freeze:1782
> > [   10.864974][ 2] [  T403] innovpu_suspend:1759
> > [   11.168871][ 2] [  T189] PM: pci_pm_freeze():
> > hcd_pci_suspend+0x0/0x38 returns -16
> 
> This should not be allowed to happen.  Freezing is mandatory and not 
> subject to wakeup requests.
> 
> Is your problem related to the one discussed in this email thread?
> 
> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> 
> Would the suggestion I made there -- i.e., have the xhci-hcd 
> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> hub 
> was suspended with wakeup = 0 -- fix your problem?

Skipping usb_hcd_resume_root_hub() should generally be possible, but
it's important to ensure that normal remote wakeup functionality is not
compromised. Is it HUB_SUSPEND that the hub you are referring to is in
a suspended state?

v2 patch:
https://lore.kernel.org/all/20240910105714.148976-1-duanchenghao@kylinos.cn/
> 
> Alan Stern
Alan Stern Sept. 12, 2024, 3 p.m. UTC | #4
On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > S4 wakeup restores the image that was saved before the system
> > > entered
> > > the S4 sleep state.
> > > 
> > >     S4 waking up from hibernation
> > >     =============================
> > >     kernel initialization
> > >     |   
> > >     v   
> > >     freeze user task and kernel thread
> > >     |   
> > >     v   
> > >     load saved image
> > >     |    
> > >     v   
> > >     freeze the peripheral device and controller
> > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
> > > set,
> > >      return to EBUSY and do not perform the following restore
> > > image.)
> > 
> > Why is the flag set at this point?  It should not be; the device and 
> > controller should have been frozen with wakeup disabled.
> > 
> This is check point, not set point.

Yes, I know that.  But when the flag was checked, why did the code find 
that it was set?  The flag should have been clear.

> > Is your problem related to the one discussed in this email thread?
> > 
> > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > 
> > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > interrupt handler skip calling usb_hcd_resume_root_hub() if the root
> > hub 
> > was suspended with wakeup = 0 -- fix your problem?
> 
> Skipping usb_hcd_resume_root_hub() should generally be possible, but
> it's important to ensure that normal remote wakeup functionality is not
> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
> a suspended state?

I don't understand this question.  hub_quiesce() gets called with 
HUB_SUSPEND when the hub enters a suspended state.

You are correct about the need for normal remote wakeup to work 
properly.  The interrupt handler should skip calling 
usb_hcd_resume_root_hub() for port connect or disconnect changes and for 
port overcurrent changes (when the root hub is suspended with wakeup = 
0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port 
resume events.

Alan Stern
Duan Chenghao Sept. 13, 2024, 1:51 a.m. UTC | #5
在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > S4 wakeup restores the image that was saved before the system
> > > > entered
> > > > the S4 sleep state.
> > > > 
> > > >     S4 waking up from hibernation
> > > >     =============================
> > > >     kernel initialization
> > > >     |   
> > > >     v   
> > > >     freeze user task and kernel thread
> > > >     |   
> > > >     v   
> > > >     load saved image
> > > >     |    
> > > >     v   
> > > >     freeze the peripheral device and controller
> > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it
> > > > is
> > > > set,
> > > >      return to EBUSY and do not perform the following restore
> > > > image.)
> > > 
> > > Why is the flag set at this point?  It should not be; the device
> > > and 
> > > controller should have been frozen with wakeup disabled.
> > > 
> > This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code
> find 
> that it was set?  The flag should have been clear.

Yes, the current issue is that during S4 testing, there is a
probabilistic scenario where clear_bit is not called after set_bit, or
clear_bit is called but does not execute after set_bit. Please refer to
the two modification points in the v2 patch for details, as both of
them can cause the current issue.

> 
> > > Is your problem related to the one discussed in this email
> > > thread?
> > > 
> > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > 
> > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > root
> > > hub 
> > > was suspended with wakeup = 0 -- fix your problem?
> > 
> > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > but
> > it's important to ensure that normal remote wakeup functionality is
> > not
> > compromised. Is it HUB_SUSPEND that the hub you are referring to is
> > in
> > a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with 
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work 
> properly.  The interrupt handler should skip calling 
> usb_hcd_resume_root_hub() for port connect or disconnect changes and
> for 
> port overcurrent changes (when the root hub is suspended with wakeup
> = 
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> port 
> resume events.

The current issue arises when rh_state is detected as RH_SUSPEND and
usb_hcd_resume_root_hub() is called to resume the root hub. However,
there is no mutual exclusion between the suspend flag, set_bit, and
clear_bit, which can lead to two scenarios:

    1. After set_bit is called, the state of the USB device is modified
by another process to !USB_STATE_SUSPEND, preventing the hub's resume
from being executed, and consequently, clear_bit is not called again.

    2. In another scenario, during the hub resume process, after
HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
been set to !RH_SUSPENDED. At this point, set_bit is executed, but
since the hub has already entered the running state, the clear_bit
associated with the resume operation is not executed.

Please review the v2 patch, where I have described both the logical
flow before the modification and the revised logical flow after the
modification.

> 
> Alan Stern
Duan Chenghao Sept. 13, 2024, 9:38 a.m. UTC | #6
在 2024-09-13星期五的 09:51 +0800,duanchenghao写道:
> 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > > 
> > > > >     S4 waking up from hibernation
> > > > >     =============================
> > > > >     kernel initialization
> > > > >     |   
> > > > >     v   
> > > > >     freeze user task and kernel thread
> > > > >     |   
> > > > >     v   
> > > > >     load saved image
> > > > >     |    
> > > > >     v   
> > > > >     freeze the peripheral device and controller
> > > > >     (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it
> > > > > is
> > > > > set,
> > > > >      return to EBUSY and do not perform the following restore
> > > > > image.)
> > > > 
> > > > Why is the flag set at this point?  It should not be; the
> > > > device
> > > > and 
> > > > controller should have been frozen with wakeup disabled.
> > > > 
> > > This is check point, not set point.
> > 
> > Yes, I know that.  But when the flag was checked, why did the code
> > find 
> > that it was set?  The flag should have been clear.
> 
> Yes, the current issue is that during S4 testing, there is a
> probabilistic scenario where clear_bit is not called after set_bit,
> or
> clear_bit is called but does not execute after set_bit. Please refer
> to
> the two modification points in the v2 patch for details, as both of
> them can cause the current issue.
> 
> > 
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > > 
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > 
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd 
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub 
> > > > was suspended with wakeup = 0 -- fix your problem?
> > > 
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is
> > > not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is
> > > in
> > > a suspended state?
> > 
> > I don't understand this question.  hub_quiesce() gets called with 
> > HUB_SUSPEND when the hub enters a suspended state.
> > 
> > You are correct about the need for normal remote wakeup to work 
> > properly.  The interrupt handler should skip calling 
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and
> > for 
> > port overcurrent changes (when the root hub is suspended with
> > wakeup
> > = 
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port 
> > resume events.
> 
> The current issue arises when rh_state is detected as RH_SUSPEND and
> usb_hcd_resume_root_hub() is called to resume the root hub. However,
> there is no mutual exclusion between the suspend flag, set_bit, and
> clear_bit, which can lead to two scenarios:
> 
>     1. After set_bit is called, the state of the USB device is
> modified
> by another process to !USB_STATE_SUSPEND, preventing the hub's resume
> from being executed, and consequently, clear_bit is not called again.
> 
>     2. In another scenario, during the hub resume process, after
> HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet
> been set to !RH_SUSPENDED. At this point, set_bit is executed, but
> since the hub has already entered the running state, the clear_bit
> associated with the resume operation is not executed.
> 
> Please review the v2 patch, where I have described both the logical
> flow before the modification and the revised logical flow after the
> modification.
> 

In fact, issue point 2 in the patch is introduced by issue point 1, and
issue point 2 represents a further improvement. The main issue lies in
point 1, where after the execution of the top half of the interrupt is
completed, the bottom half is frozen by S4. As a result, the USB resume
is not executed during this S4 process, and clear_bit is not called as
well. This further leads to a situation where during the process of S4
putting the USB controller into suspend, the check for
HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY.

> > 
> > Alan Stern
>
Hongyu Xie Sept. 14, 2024, 2:43 a.m. UTC | #7
From: Hongyu Xie <xiehongyu1@kylinos.cn>


Hi Alan,
On 2024/9/12 23:00, Alan Stern wrote:
> On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
>> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
>>> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
>>>> S4 wakeup restores the image that was saved before the system
>>>> entered
>>>> the S4 sleep state.
>>>>
>>>>      S4 waking up from hibernation
>>>>      =============================
>>>>      kernel initialization
>>>>      |
>>>>      v
>>>>      freeze user task and kernel thread
>>>>      |
>>>>      v
>>>>      load saved image
>>>>      |
>>>>      v
>>>>      freeze the peripheral device and controller
>>>>      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is
>>>> set,
>>>>       return to EBUSY and do not perform the following restore
>>>> image.)
>>>
>>> Why is the flag set at this point?  It should not be; the device and
>>> controller should have been frozen with wakeup disabled.
>>>
>> This is check point, not set point.
> 
> Yes, I know that.  But when the flag was checked, why did the code find
> that it was set?  The flag should have been clear.
Maybe duanchenghao means this,
freeze_kernel_threads
load_image_and_restore
   suspend roothub
                                      interrupt occurred
                                        usb_hcd_resume_root_hub
                                          set HCD_FLAG_WAKEUP_PENDING
                                          queue_work // freezed
   suspend pci
     return -EBUSY  because HCD_FLAG_WAKEUP_PENDING

So s4 resume failed.
> 
>>> Is your problem related to the one discussed in this email thread?
>>>
>>> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
>>>
>>> Would the suggestion I made there -- i.e., have the xhci-hcd
>>> interrupt handler skip calling usb_hcd_resume_root_hub() if the root
>>> hub
>>> was suspended with wakeup = 0 -- fix your problem?
>>
>> Skipping usb_hcd_resume_root_hub() should generally be possible, but
>> it's important to ensure that normal remote wakeup functionality is not
>> compromised. Is it HUB_SUSPEND that the hub you are referring to is in
>> a suspended state?
> 
> I don't understand this question.  hub_quiesce() gets called with
> HUB_SUSPEND when the hub enters a suspended state.
> 
> You are correct about the need for normal remote wakeup to work
> properly.  The interrupt handler should skip calling
> usb_hcd_resume_root_hub() for port connect or disconnect changes and for
> port overcurrent changes (when the root hub is suspended with wakeup =
> 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for port
> resume events.
> 
> Alan Stern
> 

Hongyu Xie
Duan Chenghao Sept. 23, 2024, 8 a.m. UTC | #8
Hi Alan,

Do you think this plan is feasible, or is there any unclear part in my
description that needs to be supplemented?

duanchenghao


在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> From: Hongyu Xie <xiehongyu1@kylinos.cn>
> 
> 
> Hi Alan,
> On 2024/9/12 23:00, Alan Stern wrote:
> > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote:
> > > > > S4 wakeup restores the image that was saved before the system
> > > > > entered
> > > > > the S4 sleep state.
> > > > > 
> > > > >      S4 waking up from hibernation
> > > > >      =============================
> > > > >      kernel initialization
> > > > >      |
> > > > >      v
> > > > >      freeze user task and kernel thread
> > > > >      |
> > > > >      v
> > > > >      load saved image
> > > > >      |
> > > > >      v
> > > > >      freeze the peripheral device and controller
> > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If
> > > > > it is
> > > > > set,
> > > > >       return to EBUSY and do not perform the following
> > > > > restore
> > > > > image.)
> > > > 
> > > > Why is the flag set at this point?  It should not be; the
> > > > device and
> > > > controller should have been frozen with wakeup disabled.
> > > > 
> > > This is check point, not set point.
> > 
> > Yes, I know that.  But when the flag was checked, why did the code
> > find
> > that it was set?  The flag should have been clear.
> Maybe duanchenghao means this,
> freeze_kernel_threads
> load_image_and_restore
>    suspend roothub
>                                       interrupt occurred
>                                         usb_hcd_resume_root_hub
>                                           set HCD_FLAG_WAKEUP_PENDING
>                                           queue_work // freezed
>    suspend pci
>      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> 
> So s4 resume failed.
> > 
> > > > Is your problem related to the one discussed in this email
> > > > thread?
> > > > 
> > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > 
> > > > Would the suggestion I made there -- i.e., have the xhci-hcd
> > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the
> > > > root
> > > > hub
> > > > was suspended with wakeup = 0 -- fix your problem?
> > > 
> > > Skipping usb_hcd_resume_root_hub() should generally be possible,
> > > but
> > > it's important to ensure that normal remote wakeup functionality
> > > is not
> > > compromised. Is it HUB_SUSPEND that the hub you are referring to
> > > is in
> > > a suspended state?
> > 
> > I don't understand this question.  hub_quiesce() gets called with
> > HUB_SUSPEND when the hub enters a suspended state.
> > 
> > You are correct about the need for normal remote wakeup to work
> > properly.  The interrupt handler should skip calling
> > usb_hcd_resume_root_hub() for port connect or disconnect changes
> > and for
> > port overcurrent changes (when the root hub is suspended with
> > wakeup =
> > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub() for
> > port
> > resume events.
> > 
> > Alan Stern
> > 
> 
> Hongyu Xie
Duan Chenghao Sept. 29, 2024, 3:14 a.m. UTC | #9
Hi Alan,

Please reveiew the patch when you have time.

duanchenghao

在 2024-09-24星期二的 09:38 -0400,Alan Stern写道:
> On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote:
> > Hi Alan,
> > 
> > Do you think this plan is feasible, or is there any unclear part in
> > my
> > description that needs to be supplemented?
> 
> I apologize for not getting back to you earlier -- I've been
> incredibly 
> busy during the last few weeks.
> 
> I still haven't had time to go over this throroughly.  If I don't 
> respond by the end of this week, remind me again.
> 
> Alan Stern
> 
> > duanchenghao
> > 
> > 
> > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> > > From: Hongyu Xie <xiehongyu1@kylinos.cn>
> > > 
> > > 
> > > Hi Alan,
> > > On 2024/9/12 23:00, Alan Stern wrote:
> > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao
> > > > > > wrote:
> > > > > > > S4 wakeup restores the image that was saved before the
> > > > > > > system
> > > > > > > entered
> > > > > > > the S4 sleep state.
> > > > > > > 
> > > > > > >      S4 waking up from hibernation
> > > > > > >      =============================
> > > > > > >      kernel initialization
> > > > > > >      |
> > > > > > >      v
> > > > > > >      freeze user task and kernel thread
> > > > > > >      |
> > > > > > >      v
> > > > > > >      load saved image
> > > > > > >      |
> > > > > > >      v
> > > > > > >      freeze the peripheral device and controller
> > > > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB.
> > > > > > > If
> > > > > > > it is
> > > > > > > set,
> > > > > > >       return to EBUSY and do not perform the following
> > > > > > > restore
> > > > > > > image.)
> > > > > > 
> > > > > > Why is the flag set at this point?  It should not be; the
> > > > > > device and
> > > > > > controller should have been frozen with wakeup disabled.
> > > > > > 
> > > > > This is check point, not set point.
> > > > 
> > > > Yes, I know that.  But when the flag was checked, why did the
> > > > code
> > > > find
> > > > that it was set?  The flag should have been clear.
> > > Maybe duanchenghao means this,
> > > freeze_kernel_threads
> > > load_image_and_restore
> > >    suspend roothub
> > >                                       interrupt occurred
> > >                                         usb_hcd_resume_root_hub
> > >                                           set
> > > HCD_FLAG_WAKEUP_PENDING
> > >                                           queue_work // freezed
> > >    suspend pci
> > >      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> > > 
> > > So s4 resume failed.
> > > > 
> > > > > > Is your problem related to the one discussed in this email
> > > > > > thread?
> > > > > > 
> > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > > > 
> > > > > > Would the suggestion I made there -- i.e., have the xhci-
> > > > > > hcd
> > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if
> > > > > > the
> > > > > > root
> > > > > > hub
> > > > > > was suspended with wakeup = 0 -- fix your problem?
> > > > > 
> > > > > Skipping usb_hcd_resume_root_hub() should generally be
> > > > > possible,
> > > > > but
> > > > > it's important to ensure that normal remote wakeup
> > > > > functionality
> > > > > is not
> > > > > compromised. Is it HUB_SUSPEND that the hub you are referring
> > > > > to
> > > > > is in
> > > > > a suspended state?
> > > > 
> > > > I don't understand this question.  hub_quiesce() gets called
> > > > with
> > > > HUB_SUSPEND when the hub enters a suspended state.
> > > > 
> > > > You are correct about the need for normal remote wakeup to work
> > > > properly.  The interrupt handler should skip calling
> > > > usb_hcd_resume_root_hub() for port connect or disconnect
> > > > changes
> > > > and for
> > > > port overcurrent changes (when the root hub is suspended with
> > > > wakeup =
> > > > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub()
> > > > for
> > > > port
> > > > resume events.
> > > > 
> > > > Alan Stern
> > > > 
> > > 
> > > Hongyu Xie
> >
Duan Chenghao Oct. 9, 2024, 1:19 a.m. UTC | #10
Hi Alan,

I haven't received a reply from you since my last email. Could you
please confirm if you have received this one?

I'm worried that there might be an issue with the email system and you
might not be receiving them.

Duanchenghao


在 2024-09-29星期日的 11:14 +0800,duanchenghao写道:
> Hi Alan,
> 
> Please reveiew the patch when you have time.
> 
> duanchenghao
> 
> 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道:
> > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote:
> > > Hi Alan,
> > > 
> > > Do you think this plan is feasible, or is there any unclear part
> > > in
> > > my
> > > description that needs to be supplemented?
> > 
> > I apologize for not getting back to you earlier -- I've been
> > incredibly 
> > busy during the last few weeks.
> > 
> > I still haven't had time to go over this throroughly.  If I don't 
> > respond by the end of this week, remind me again.
> > 
> > Alan Stern
> > 
> > > duanchenghao
> > > 
> > > 
> > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> > > > From: Hongyu Xie <xiehongyu1@kylinos.cn>
> > > > 
> > > > 
> > > > Hi Alan,
> > > > On 2024/9/12 23:00, Alan Stern wrote:
> > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao
> > > > > > > wrote:
> > > > > > > > S4 wakeup restores the image that was saved before the
> > > > > > > > system
> > > > > > > > entered
> > > > > > > > the S4 sleep state.
> > > > > > > > 
> > > > > > > >      S4 waking up from hibernation
> > > > > > > >      =============================
> > > > > > > >      kernel initialization
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      freeze user task and kernel thread
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      load saved image
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      freeze the peripheral device and controller
> > > > > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the
> > > > > > > > USB.
> > > > > > > > If
> > > > > > > > it is
> > > > > > > > set,
> > > > > > > >       return to EBUSY and do not perform the following
> > > > > > > > restore
> > > > > > > > image.)
> > > > > > > 
> > > > > > > Why is the flag set at this point?  It should not be; the
> > > > > > > device and
> > > > > > > controller should have been frozen with wakeup disabled.
> > > > > > > 
> > > > > > This is check point, not set point.
> > > > > 
> > > > > Yes, I know that.  But when the flag was checked, why did the
> > > > > code
> > > > > find
> > > > > that it was set?  The flag should have been clear.
> > > > Maybe duanchenghao means this,
> > > > freeze_kernel_threads
> > > > load_image_and_restore
> > > >    suspend roothub
> > > >                                       interrupt occurred
> > > >                                         usb_hcd_resume_root_hub
> > > >                                           set
> > > > HCD_FLAG_WAKEUP_PENDING
> > > >                                           queue_work // freezed
> > > >    suspend pci
> > > >      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> > > > 
> > > > So s4 resume failed.
> > > > > 
> > > > > > > Is your problem related to the one discussed in this
> > > > > > > email
> > > > > > > thread?
> > > > > > > 
> > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > > > > 
> > > > > > > Would the suggestion I made there -- i.e., have the xhci-
> > > > > > > hcd
> > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub()
> > > > > > > if
> > > > > > > the
> > > > > > > root
> > > > > > > hub
> > > > > > > was suspended with wakeup = 0 -- fix your problem?
> > > > > > 
> > > > > > Skipping usb_hcd_resume_root_hub() should generally be
> > > > > > possible,
> > > > > > but
> > > > > > it's important to ensure that normal remote wakeup
> > > > > > functionality
> > > > > > is not
> > > > > > compromised. Is it HUB_SUSPEND that the hub you are
> > > > > > referring
> > > > > > to
> > > > > > is in
> > > > > > a suspended state?
> > > > > 
> > > > > I don't understand this question.  hub_quiesce() gets called
> > > > > with
> > > > > HUB_SUSPEND when the hub enters a suspended state.
> > > > > 
> > > > > You are correct about the need for normal remote wakeup to
> > > > > work
> > > > > properly.  The interrupt handler should skip calling
> > > > > usb_hcd_resume_root_hub() for port connect or disconnect
> > > > > changes
> > > > > and for
> > > > > port overcurrent changes (when the root hub is suspended with
> > > > > wakeup =
> > > > > 0).  But it should _not_ skip calling
> > > > > usb_hcd_resume_root_hub()
> > > > > for
> > > > > port
> > > > > resume events.
> > > > > 
> > > > > Alan Stern
> > > > > 
> > > > 
> > > > Hongyu Xie
> > > 
>
Alan Stern Oct. 9, 2024, 1:54 a.m. UTC | #11
On Wed, Oct 09, 2024 at 09:19:39AM +0800, duanchenghao wrote:
> Hi Alan,
> 
> I haven't received a reply from you since my last email. Could you
> please confirm if you have received this one?

I have.

> I'm worried that there might be an issue with the email system and you
> might not be receiving them.

I sent a message 9 days ago.  See

https://lore.kernel.org/all/85105e45-3553-4a8c-b132-3875c4657c4b@rowland.harvard.edu/

You were CC'ed on that message; maybe you didn't receive it.

Maybe the topic of that thread isn't exactly the same as the topic of 
your thread; I tend to get the two of them confused.

Alan Stern
Duan Chenghao Oct. 9, 2024, 2:35 a.m. UTC | #12
Hi Alan,

These are two patches, each addressing the same issue. The current
patch is a direct solution to the problem of the interrupt bottom half
being frozen. The patch you replied with is another, alternative
solution to the same problem. Please review which solution is more
suitable, or if there are any other revised proposals.


Please review the patch I mentioned:
https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/

Duanchenghao

在 2024-09-24星期二的 09:38 -0400,Alan Stern写道:
> On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote:
> > Hi Alan,
> > 
> > Do you think this plan is feasible, or is there any unclear part in
> > my
> > description that needs to be supplemented?
> 
> I apologize for not getting back to you earlier -- I've been
> incredibly 
> busy during the last few weeks.
> 
> I still haven't had time to go over this throroughly.  If I don't 
> respond by the end of this week, remind me again.
> 
> Alan Stern
> 
> > duanchenghao
> > 
> > 
> > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> > > From: Hongyu Xie <xiehongyu1@kylinos.cn>
> > > 
> > > 
> > > Hi Alan,
> > > On 2024/9/12 23:00, Alan Stern wrote:
> > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote:
> > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao
> > > > > > wrote:
> > > > > > > S4 wakeup restores the image that was saved before the
> > > > > > > system
> > > > > > > entered
> > > > > > > the S4 sleep state.
> > > > > > > 
> > > > > > >      S4 waking up from hibernation
> > > > > > >      =============================
> > > > > > >      kernel initialization
> > > > > > >      |
> > > > > > >      v
> > > > > > >      freeze user task and kernel thread
> > > > > > >      |
> > > > > > >      v
> > > > > > >      load saved image
> > > > > > >      |
> > > > > > >      v
> > > > > > >      freeze the peripheral device and controller
> > > > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB.
> > > > > > > If
> > > > > > > it is
> > > > > > > set,
> > > > > > >       return to EBUSY and do not perform the following
> > > > > > > restore
> > > > > > > image.)
> > > > > > 
> > > > > > Why is the flag set at this point?  It should not be; the
> > > > > > device and
> > > > > > controller should have been frozen with wakeup disabled.
> > > > > > 
> > > > > This is check point, not set point.
> > > > 
> > > > Yes, I know that.  But when the flag was checked, why did the
> > > > code
> > > > find
> > > > that it was set?  The flag should have been clear.
> > > Maybe duanchenghao means this,
> > > freeze_kernel_threads
> > > load_image_and_restore
> > >    suspend roothub
> > >                                       interrupt occurred
> > >                                         usb_hcd_resume_root_hub
> > >                                           set
> > > HCD_FLAG_WAKEUP_PENDING
> > >                                           queue_work // freezed
> > >    suspend pci
> > >      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> > > 
> > > So s4 resume failed.
> > > > 
> > > > > > Is your problem related to the one discussed in this email
> > > > > > thread?
> > > > > > 
> > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > > > 
> > > > > > Would the suggestion I made there -- i.e., have the xhci-
> > > > > > hcd
> > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if
> > > > > > the
> > > > > > root
> > > > > > hub
> > > > > > was suspended with wakeup = 0 -- fix your problem?
> > > > > 
> > > > > Skipping usb_hcd_resume_root_hub() should generally be
> > > > > possible,
> > > > > but
> > > > > it's important to ensure that normal remote wakeup
> > > > > functionality
> > > > > is not
> > > > > compromised. Is it HUB_SUSPEND that the hub you are referring
> > > > > to
> > > > > is in
> > > > > a suspended state?
> > > > 
> > > > I don't understand this question.  hub_quiesce() gets called
> > > > with
> > > > HUB_SUSPEND when the hub enters a suspended state.
> > > > 
> > > > You are correct about the need for normal remote wakeup to work
> > > > properly.  The interrupt handler should skip calling
> > > > usb_hcd_resume_root_hub() for port connect or disconnect
> > > > changes
> > > > and for
> > > > port overcurrent changes (when the root hub is suspended with
> > > > wakeup =
> > > > 0).  But it should _not_ skip calling usb_hcd_resume_root_hub()
> > > > for
> > > > port
> > > > resume events.
> > > > 
> > > > Alan Stern
> > > > 
> > > 
> > > Hongyu Xie
> >
Gopal, Saranya Oct. 9, 2024, 6:29 a.m. UTC | #13
Hi Duanchenghao,

> -----Original Message-----
> From: duanchenghao <duanchenghao@kylinos.cn>
> Sent: Wednesday, October 09, 2024 8:05 AM
> To: Alan Stern <stern@rowland.harvard.edu>
> Cc: Hongyu Xie <xy521521@gmail.com>;
> gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org; linux-
> pm@vger.kernel.org; linux-usb@vger.kernel.org;
> niko.mauno@vaisala.com; pavel@ucw.cz;
> stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie
> <xiehongyu1@kylinos.cn>
> Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused
> by USB status when S4 wakes up
> 
> Hi Alan,
> 
> These are two patches, each addressing the same issue. The current
> patch is a direct solution to the problem of the interrupt bottom half
> being frozen. The patch you replied with is another, alternative
> solution to the same problem. Please review which solution is more
> suitable, or if there are any other revised proposals.
> 
> 
> Please review the patch I mentioned:
> https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f
> 05539a.camel@kylinos.cn/
> 

I was able to test 1500 S4 cycles in my system with your patch and all cycles passed with your patch.
I was facing an issue similar to what Kai-Heng was observing in the below mail thread:
https://lore.kernel.org/lkml/b8553def-19ea-41d5-b665-4859ddb7b6d5@redhat.com/T/
He was observing issues due to USB touchscreen. And my issue was due to Bluetooth USB controller.
But, xHCI error logs were exactly same.
It looks like your patch is fixing both the issues.
It would be nice to see this patch land upstream.

Thanks,
Saranya

> Duanchenghao
> 
> 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道:
> > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote:
> > > Hi Alan,
> > >
> > > Do you think this plan is feasible, or is there any unclear part in
> > > my
> > > description that needs to be supplemented?
> >
> > I apologize for not getting back to you earlier -- I've been
> > incredibly
> > busy during the last few weeks.
> >
> > I still haven't had time to go over this throroughly.  If I don't
> > respond by the end of this week, remind me again.
> >
> > Alan Stern
> >
> > > duanchenghao
> > >
> > >
> > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道:
> > > > From: Hongyu Xie <xiehongyu1@kylinos.cn>
> > > >
> > > >
> > > > Hi Alan,
> > > > On 2024/9/12 23:00, Alan Stern wrote:
> > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao
> wrote:
> > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道:
> > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800,
> duanchenghao
> > > > > > > wrote:
> > > > > > > > S4 wakeup restores the image that was saved before the
> > > > > > > > system
> > > > > > > > entered
> > > > > > > > the S4 sleep state.
> > > > > > > >
> > > > > > > >      S4 waking up from hibernation
> > > > > > > >      =============================
> > > > > > > >      kernel initialization
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      freeze user task and kernel thread
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      load saved image
> > > > > > > >      |
> > > > > > > >      v
> > > > > > > >      freeze the peripheral device and controller
> > > > > > > >      (Check the HCD_FLAG_WAKEUP_ PENDING flag of
> the USB.
> > > > > > > > If
> > > > > > > > it is
> > > > > > > > set,
> > > > > > > >       return to EBUSY and do not perform the following
> > > > > > > > restore
> > > > > > > > image.)
> > > > > > >
> > > > > > > Why is the flag set at this point?  It should not be; the
> > > > > > > device and
> > > > > > > controller should have been frozen with wakeup disabled.
> > > > > > >
> > > > > > This is check point, not set point.
> > > > >
> > > > > Yes, I know that.  But when the flag was checked, why did the
> > > > > code
> > > > > find
> > > > > that it was set?  The flag should have been clear.
> > > > Maybe duanchenghao means this,
> > > > freeze_kernel_threads
> > > > load_image_and_restore
> > > >    suspend roothub
> > > >                                       interrupt occurred
> > > >                                         usb_hcd_resume_root_hub
> > > >                                           set
> > > > HCD_FLAG_WAKEUP_PENDING
> > > >                                           queue_work // freezed
> > > >    suspend pci
> > > >      return -EBUSY  because HCD_FLAG_WAKEUP_PENDING
> > > >
> > > > So s4 resume failed.
> > > > >
> > > > > > > Is your problem related to the one discussed in this email
> > > > > > > thread?
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-
> b328-852b6ac8ecb5@rowland.harvard.edu/
> > > > > > >
> > > > > > > Would the suggestion I made there -- i.e., have the xhci-
> > > > > > > hcd
> > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub()
> if
> > > > > > > the
> > > > > > > root
> > > > > > > hub
> > > > > > > was suspended with wakeup = 0 -- fix your problem?
> > > > > >
> > > > > > Skipping usb_hcd_resume_root_hub() should generally be
> > > > > > possible,
> > > > > > but
> > > > > > it's important to ensure that normal remote wakeup
> > > > > > functionality
> > > > > > is not
> > > > > > compromised. Is it HUB_SUSPEND that the hub you are
> referring
> > > > > > to
> > > > > > is in
> > > > > > a suspended state?
> > > > >
> > > > > I don't understand this question.  hub_quiesce() gets called
> > > > > with
> > > > > HUB_SUSPEND when the hub enters a suspended state.
> > > > >
> > > > > You are correct about the need for normal remote wakeup to
> work
> > > > > properly.  The interrupt handler should skip calling
> > > > > usb_hcd_resume_root_hub() for port connect or disconnect
> > > > > changes
> > > > > and for
> > > > > port overcurrent changes (when the root hub is suspended
> with
> > > > > wakeup =
> > > > > 0).  But it should _not_ skip calling
> usb_hcd_resume_root_hub()
> > > > > for
> > > > > port
> > > > > resume events.
> > > > >
> > > > > Alan Stern
> > > > >
> > > >
> > > > Hongyu Xie
> > >
>
Alan Stern Oct. 9, 2024, 3:50 p.m. UTC | #14
On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote:
> Hi Alan,
> 
> These are two patches, each addressing the same issue. The current
> patch is a direct solution to the problem of the interrupt bottom half
> being frozen. The patch you replied with is another, alternative
> solution to the same problem. Please review which solution is more
> suitable, or if there are any other revised proposals.
> 
> 
> Please review the patch I mentioned:
> https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/

I still think your whole approach is wrong.  There is no need to change 
the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's not 
the reason for the problem you're seeing.

The problem occurs because when suspend_common() in 
drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling 
device_may_wakeup(), and the value that function returns is not what we 
want.  The value is based on whether the controller's power/wakeup 
attribute is set, but we also have to take into account the type of 
suspend that's happening.

Basically, if msg is one of the suspend types for which wakeups should 
always be disabled, then do_wakeup should be set to false.  There isn't 
a macro to test for these things, but there should be.  Something like 
PMSG_IS_AUTO() in include/linux/pm.h; maybe call it PMSG_NO_WAKEUP().  
This macro should return true if the PM_EVENT_FREEZE or PM_EVENT_QUIESCE 
bits are set in msg.event.

Rafael, please correct me if I got any of this wrong.

So the right way to fix the problem is to add to pm.h:

#define PMSG_NO_WAKEUP(msg)	(((msg.event) & \
		(PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)

and in suspend_common():

	if (PMSG_IS_AUTO(msg))
		do_wakeup = true;
	else if (PMSG_NO_WAKEUP(msg))
		do_wakeup = false;
	else
		do_wakeup = device_may_wakeup(dev);

Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() tests 
in the following code will be executed, so the routine won't fail during 
the freeze or restore phase with -EBUSY.

You'll also have to define an hcd_pci_freeze() routine, just 
like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of 
PMSG_SUSPEND.  And the .freeze callback in usb_hcd_pci_pm_ops should be 
changed to hcd_pci_freeze.

In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c 
could also use the new macro, instead of checking for FREEZE or QUIESCE 
directly the way it does now.

Alan Stern
Duan Chenghao Oct. 10, 2024, 5:53 a.m. UTC | #15
Hi Alan,

在 2024-10-09星期三的 11:50 -0400,Alan Stern写道:
> On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote:
> > Hi Alan,
> > 
> > These are two patches, each addressing the same issue. The current
> > patch is a direct solution to the problem of the interrupt bottom
> > half
> > being frozen. The patch you replied with is another, alternative
> > solution to the same problem. Please review which solution is more
> > suitable, or if there are any other revised proposals.
> > 
> > 
> > Please review the patch I mentioned:
> > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/
> 
> I still think your whole approach is wrong.  There is no need to
> change 
> the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's
> not 
> the reason for the problem you're seeing.
> 

Thank you very much for your evaluation of the scheme. I have a
question regarding why the set_bit operation for the
HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an
interrupt handler, while the clear_bit operation is done in the bottom
half. This seems to contradict conventional practices. The issue is not
limited to S4; if other processes freeze the work queue in the bottom
half, the same problem may arise.

The solution you described below should be able to resolve the current
S4 issue, but for now, we haven't identified any other scenarios that
require the same operation. Based on my understanding, the USB device
is woken up in the bottom half of the interrupt, and both the set_bit
and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be
executed within the same thread in the bottom half. May I ask if there
are any other reasons why the set_bit is executed in the top half?

> The problem occurs because when suspend_common() in 
> drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling 
> device_may_wakeup(), and the value that function returns is not what
> we 
> want.  The value is based on whether the controller's power/wakeup 
> attribute is set, but we also have to take into account the type of 
> suspend that's happening.
> 
> Basically, if msg is one of the suspend types for which wakeups
> should 
> always be disabled, then do_wakeup should be set to false.  There
> isn't 
> a macro to test for these things, but there should be.  Something
> like 
> PMSG_IS_AUTO() in include/linux/pm.h; maybe call it
> PMSG_NO_WAKEUP().  
> This macro should return true if the PM_EVENT_FREEZE or
> PM_EVENT_QUIESCE 
> bits are set in msg.event.
> 
> Rafael, please correct me if I got any of this wrong.
> 
> So the right way to fix the problem is to add to pm.h:
> 
> #define PMSG_NO_WAKEUP(msg)     (((msg.event) & \
>                 (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)
> 
> and in suspend_common():
> 
>         if (PMSG_IS_AUTO(msg))
>                 do_wakeup = true;
>         else if (PMSG_NO_WAKEUP(msg))
>                 do_wakeup = false;
>         else
>                 do_wakeup = device_may_wakeup(dev);
> 
> Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING()
> tests 
> in the following code will be executed, so the routine won't fail
> during 
> the freeze or restore phase with -EBUSY.
> 
> You'll also have to define an hcd_pci_freeze() routine, just 
> like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of 
> PMSG_SUSPEND.  And the .freeze callback in usb_hcd_pci_pm_ops should
> be 
> changed to hcd_pci_freeze.
> 
> In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c 
> could also use the new macro, instead of checking for FREEZE or
> QUIESCE 
> directly the way it does now.
> 
> Alan Stern
Duan Chenghao Oct. 10, 2024, 5:59 a.m. UTC | #16
Hi Alan,

在 2024-10-09星期三的 11:50 -0400,Alan Stern写道:
> > On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote:
> > > > Hi Alan,
> > > > 
> > > > These are two patches, each addressing the same issue. The
> > > > current
> > > > patch is a direct solution to the problem of the interrupt
> > > > bottom
> > > > half
> > > > being frozen. The patch you replied with is another,
> > > > alternative
> > > > solution to the same problem. Please review which solution is
> > > > more
> > > > suitable, or if there are any other revised proposals.
> > > > 
> > > > 
> > > > Please review the patch I mentioned:
> > > > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/
> > 
> > I still think your whole approach is wrong.  There is no need to
> > change 
> > the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared;
> > that's
> > not 
> > the reason for the problem you're seeing.
> > 

Thank you very much for your evaluation of the scheme. I have a
question regarding why the set_bit operation for the
HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an
interrupt handler, while the clear_bit operation is done in the bottom
half. This seems to contradict conventional practices. The issue is not
limited to S4; if other processes freeze the work queue in the bottom
half, the same problem may arise.

The solution you described below should be able to resolve the current
S4 issue, but for now, we haven't identified any other scenarios that
require the same operation. Based on my understanding, the USB device
is woken up in the bottom half of the interrupt, and both the set_bit
and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be
executed within the same thread in the bottom half. May I ask if there
are any other reasons why the set_bit is executed in the top half?

Thanks
Duanchenghao

> > The problem occurs because when suspend_common() in 
> > drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling 
> > device_may_wakeup(), and the value that function returns is not
> > what
> > we 
> > want.  The value is based on whether the controller's power/wakeup 
> > attribute is set, but we also have to take into account the type of
> > suspend that's happening.
> > 
> > Basically, if msg is one of the suspend types for which wakeups
> > should 
> > always be disabled, then do_wakeup should be set to false.  There
> > isn't 
> > a macro to test for these things, but there should be.  Something
> > like 
> > PMSG_IS_AUTO() in include/linux/pm.h; maybe call it
> > PMSG_NO_WAKEUP().  
> > This macro should return true if the PM_EVENT_FREEZE or
> > PM_EVENT_QUIESCE 
> > bits are set in msg.event.
> > 
> > Rafael, please correct me if I got any of this wrong.
> > 
> > So the right way to fix the problem is to add to pm.h:
> > 
> > #define PMSG_NO_WAKEUP(msg)     (((msg.event) & \
> >                 (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)
> > 
> > and in suspend_common():
> > 
> >         if (PMSG_IS_AUTO(msg))
> >                 do_wakeup = true;
> >         else if (PMSG_NO_WAKEUP(msg))
> >                 do_wakeup = false;
> >         else
> >                 do_wakeup = device_may_wakeup(dev);
> > 
> > Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING()
> > tests 
> > in the following code will be executed, so the routine won't fail
> > during 
> > the freeze or restore phase with -EBUSY.
> > 
> > You'll also have to define an hcd_pci_freeze() routine, just 
> > like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of 
> > PMSG_SUSPEND.  And the .freeze callback in usb_hcd_pci_pm_ops
> > should
> > be 
> > changed to hcd_pci_freeze.
> > 
> > In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c
> > could also use the new macro, instead of checking for FREEZE or
> > QUIESCE 
> > directly the way it does now.
> > 
> > Alan Stern
Alan Stern Oct. 10, 2024, 2:21 p.m. UTC | #17
On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote:
> Hi Alan,

> Thank you very much for your evaluation of the scheme. I have a
> question regarding why the set_bit operation for the
> HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an
> interrupt handler, while the clear_bit operation is done in the bottom
> half. This seems to contradict conventional practices. The issue is not
> limited to S4; if other processes freeze the work queue in the bottom
> half, the same problem may arise.

The flag is treated this way because that's what it means: A wakeup is 
pending.  The kernel first learns about the wakeup when it receives the 
wakeup interrupt from the host controller, so that's when it sets the 
flag -- in the top half of the interrupt handler.  The wakeup procedure 
isn't complete until the root hub has been resumed, so the flag remains 
set until that resume is finished, in the bottom half.

You say "the same problem may arise", but I don't think it is a problem.  
If the system is about to suspend the host controller with wakeups 
enabled, and a wakeup request has already been received but the system 
has not yet finished acting on it, then the suspend _should_ fail.  
After all, if the wakeup interrupt had arrived just after the host 
controller was suspended rather than just before, it would have caused 
the host controller to be resumed right away -- with exactly the same 
effect as if the controller had never been suspended in the first place.

> The solution you described below should be able to resolve the current
> S4 issue, but for now, we haven't identified any other scenarios that
> require the same operation.

Perhaps because there aren't any.

>  Based on my understanding, the USB device
> is woken up in the bottom half of the interrupt,

You are failing to distinguish between the host controller and the root 
hub.  The host controller (which is a PCI device on your system, not a 
USB device) is woken up in the top half of the interrupt handler.  The 
root hub (which is a virtual USB device) is woken up in the bottom half.  
Both operations have to occur before the wakeup can be considered fully 
complete.

>  and both the set_bit
> and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be
> executed within the same thread in the bottom half. May I ask if there
> are any other reasons why the set_bit is executed in the top half?

Because the root hub's wakeup becomes pending when the host controller 
is resumed, in the top half.

Alan Stern
Duan Chenghao Oct. 11, 2024, 1:42 a.m. UTC | #18
在 2024-10-10星期四的 10:21 -0400,Alan Stern写道:
> On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote:
> > Hi Alan,
> 
> > Thank you very much for your evaluation of the scheme. I have a
> > question regarding why the set_bit operation for the
> > HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an
> > interrupt handler, while the clear_bit operation is done in the
> > bottom
> > half. This seems to contradict conventional practices. The issue is
> > not
> > limited to S4; if other processes freeze the work queue in the
> > bottom
> > half, the same problem may arise.
> 
> The flag is treated this way because that's what it means: A wakeup
> is 
> pending.  The kernel first learns about the wakeup when it receives
> the 
> wakeup interrupt from the host controller, so that's when it sets the
> flag -- in the top half of the interrupt handler.  The wakeup
> procedure 
> isn't complete until the root hub has been resumed, so the flag
> remains 
> set until that resume is finished, in the bottom half.
> 
> You say "the same problem may arise", but I don't think it is a
> problem.  
> If the system is about to suspend the host controller with wakeups 
> enabled, and a wakeup request has already been received but the
> system 
> has not yet finished acting on it, then the suspend _should_ fail.  
> After all, if the wakeup interrupt had arrived just after the host 
> controller was suspended rather than just before, it would have
> caused 
> the host controller to be resumed right away -- with exactly the same
> effect as if the controller had never been suspended in the first
> place.
> 
> > The solution you described below should be able to resolve the
> > current
> > S4 issue, but for now, we haven't identified any other scenarios
> > that
> > require the same operation.
> 
> Perhaps because there aren't any.
> 
> >  Based on my understanding, the USB device
> > is woken up in the bottom half of the interrupt,
> 
> You are failing to distinguish between the host controller and the
> root 
> hub.  The host controller (which is a PCI device on your system, not
> a 
> USB device) is woken up in the top half of the interrupt handler. 
> The 
> root hub (which is a virtual USB device) is woken up in the bottom
> half.  
> Both operations have to occur before the wakeup can be considered
> fully 
> complete.
> 
> >  and both the set_bit
> > and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag
> > should be
> > executed within the same thread in the bottom half. May I ask if
> > there
> > are any other reasons why the set_bit is executed in the top half?
> 
> Because the root hub's wakeup becomes pending when the host
> controller 
> is resumed, in the top half.
> 
> Alan Stern
> 

Hi Alan,

I roughly understand now.

In your previous email, you mentioned assigning a value to do_wakeup
based on the judgment of PMSG in suspend_common, but there is no
parameter passing of PMSG in suspend_common.

Do you mean using the global parameter pm_transition.event for the
judgment?

Thanks
Duanchenghao
Alan Stern Oct. 11, 2024, 1:53 p.m. UTC | #19
On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote:
> Hi Alan,
> 
> I roughly understand now.
> 
> In your previous email, you mentioned assigning a value to do_wakeup
> based on the judgment of PMSG in suspend_common, but there is no
> parameter passing of PMSG in suspend_common.

In my kernel tree, the first line of code in suspend_common() (following 
all the variable definitions) is this:

	do_wakeup = PMSG_IS_AUTO(msg) ? true : device_may_wakeup(dev);

That's what I was talking about.

> Do you mean using the global parameter pm_transition.event for the
> judgment?

No, I meant what I wrote.

Alan Stern
Duan Chenghao Oct. 12, 2024, 9:51 a.m. UTC | #20
Hi Alan,

The V3 patch has been sent. Please review it to check if it aligns with
the solution you described.

Thanks
Duan Chenghao

在 2024-10-11星期五的 09:53 -0400,Alan Stern写道:
> On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote:
> > Hi Alan,
> > 
> > I roughly understand now.
> > 
> > In your previous email, you mentioned assigning a value to
> > do_wakeup
> > based on the judgment of PMSG in suspend_common, but there is no
> > parameter passing of PMSG in suspend_common.
> 
> In my kernel tree, the first line of code in suspend_common()
> (following 
> all the variable definitions) is this:
> 
>         do_wakeup = PMSG_IS_AUTO(msg) ? true :
> device_may_wakeup(dev);
> 
> That's what I was talking about.
> 
> > Do you mean using the global parameter pm_transition.event for the
> > judgment?
> 
> No, I meant what I wrote.
> 
> Alan Stern
gregkh@linuxfoundation.org Oct. 12, 2024, 12:39 p.m. UTC | #21
On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote:
> When a device is inserted into the USB port and an S4 wakeup is initiated,
> after the USB-hub initialization is completed, it will automatically enter
> suspend mode. Upon detecting a device on the USB port, it will proceed with
> resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4
> wakeup process, peripherals are put into suspend mode, followed by task
> recovery. However, upon detecting that the hcd is in the
> HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the
> S4 suspend to fail and subsequent task recovery to not proceed.
> -
> [   27.594598][ 1]  PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 returns -16
> [   27.594601][ 1]  PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 returns -16
> [   27.603420][ 1]  ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 returned 0 after 3 usecs
> [   27.612233][ 1]  ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 returned -16 after 17223 usecs
> [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce async: error -16
> [   27.816988][ 1]  PM: quiesce of devices aborted after 1833.282 msecs
> [   27.823302][ 1]  PM: start quiesce of devices aborted after 1839.975 msecs
> ......
> [   31.303172][ 1]  PM: recover of devices complete after 3473.039 msecs
> [   31.309818][ 1]  PM: Failed to load hibernation image, recovering.
> [   31.348188][ 1]  PM: Basic memory bitmaps freed
> [   31.352686][ 1]  OOM killer enabled.
> [   31.356232][ 1]  Restarting tasks ... done.
> [   31.360609][ 1]  PM: resume from hibernation failed (0)
> [   31.365800][ 1]  PM: Hibernation image not present or could not be loaded.
> 
> The "do_wakeup" is determined based on whether the controller's
> power/wakeup attribute is set. The current issue necessitates considering
> the type of suspend that is occurring. If the suspend type is either
> PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set to
> false.
> 
> Co-authored-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> ---
>  drivers/usb/core/hcd-pci.c | 14 ++++++++++++--
>  include/linux/pm.h         |  3 ++-
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index a08f3f228e6d..2c7d4446b82e 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -422,7 +422,12 @@ static int suspend_common(struct device *dev, pm_message_t msg)
>  	bool			do_wakeup;
>  	int			retval;
>  
> -	do_wakeup = PMSG_IS_AUTO(msg) ? true : device_may_wakeup(dev);
> +	if (PMSG_IS_AUTO(msg))
> +		do_wakeup = true;
> +	else if (PMSG_NO_WAKEUP(msg))
> +		do_wakeup = false;
> +	else
> +		do_wakeup = device_may_wakeup(dev);
>  
>  	/* Root hub suspend should have stopped all downstream traffic,
>  	 * and all bus master traffic.  And done so for both the interface
> @@ -521,6 +526,11 @@ static int hcd_pci_suspend(struct device *dev)
>  	return suspend_common(dev, PMSG_SUSPEND);
>  }
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> +	return suspend_common(dev, PMSG_FREEZE);
> +}
> +
>  static int hcd_pci_suspend_noirq(struct device *dev)
>  {
>  	struct pci_dev		*pci_dev = to_pci_dev(dev);
> @@ -624,7 +634,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>  	.suspend_noirq	= hcd_pci_suspend_noirq,
>  	.resume_noirq	= hcd_pci_resume_noirq,
>  	.resume		= hcd_pci_resume,
> -	.freeze		= hcd_pci_suspend,
> +	.freeze		= hcd_pci_freeze,
>  	.freeze_noirq	= check_root_hub_suspended,
>  	.thaw_noirq	= NULL,
>  	.thaw		= hcd_pci_resume,
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 97b0e23363c8..0dd7f559188b 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -570,7 +570,8 @@ const struct dev_pm_ops name = { \
>  					{ .event = PM_EVENT_AUTO_RESUME, })
>  
>  #define PMSG_IS_AUTO(msg)	(((msg).event & PM_EVENT_AUTO) != 0)
> -
> +#define PMSG_NO_WAKEUP(msg)	(((msg.event) & \
> +				(PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0)
>  /*
>   * Device run-time power management status.
>   *
> -- 
> 2.34.1
> 
> 

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/process/submitting-patches.rst for what
  needs to be done here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot
gregkh@linuxfoundation.org Oct. 12, 2024, 12:40 p.m. UTC | #22
On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote:
> Co-authored-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>

That's not how to use a co-authored-by tag, please read the
documentation for how to do so properly.

thanks,

greg k-h
Alan Stern Oct. 12, 2024, 3:01 p.m. UTC | #23
On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> 
> Hi Alan,
> 
> The V3 patch has been sent. Please review it to check if it aligns with
> the solution you described.

Yes, that is what I meant.

Have you and all the other people at kylinos.cn tested the patch to make 
sure that it fixes the problem?

Alan Stern
Alan Stern Oct. 12, 2024, 3:06 p.m. UTC | #24
On Sat, Oct 12, 2024 at 02:40:01PM +0200, Greg KH wrote:
> On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote:
> > Co-authored-by: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> 
> That's not how to use a co-authored-by tag, please read the
> documentation for how to do so properly.

Duan, you can add my Signed-off-by: when you resubmit the patch.

Alan Stern
Duan Chenghao Oct. 14, 2024, 1:32 a.m. UTC | #25
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
> On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > 
> > Hi Alan,
> > 
> > The V3 patch has been sent. Please review it to check if it aligns
> > with
> > the solution you described.
> 
> Yes, that is what I meant.
> 
> Have you and all the other people at kylinos.cn tested the patch to
> make 
> sure that it fixes the problem?
> 

Yes, I'm currently arranging for the testing to be conducted. Since the
S4 test takes time, the test results will be released later. Based on
my understanding, this patch should be able to solve the problem, but I
will rely solely on the actual test results for final confirmation.

> Alan Stern

Duan Chenghao
Duan Chenghao Oct. 14, 2024, 1:41 a.m. UTC | #26
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
Hi Saranya,

> On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > 
> > Hi Alan,
> > 
> > The V3 patch has been sent. Please review it to check if it aligns
> > with
> > the solution you described.
> 
> Yes, that is what I meant.
> 
> Have you and all the other people at kylinos.cn tested the patch to
> make 
> sure that it fixes the problem?
> 
> Alan Stern

If you have time, you can arrange to test your issue using the V3
version. This way, we can jointly verify whether the problem has been
resolved.

Thanks
Duan Chenghao
Gopal, Saranya Oct. 14, 2024, 4:10 a.m. UTC | #27
Hi Duan Changhao,

> -----Original Message-----
> From: duanchenghao <duanchenghao@kylinos.cn>
> Sent: Monday, October 14, 2024 7:11 AM
> To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya
> <saranya.gopal@intel.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie
> <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux-
> usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz;
> stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie
> <xiehongyu1@kylinos.cn>
> Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused
> by USB status when S4 wakes up
> 
> 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
> Hi Saranya,
> 
> > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > >
> > > Hi Alan,
> > >
> > > The V3 patch has been sent. Please review it to check if it aligns
> > > with
> > > the solution you described.
> >
> > Yes, that is what I meant.
> >
> > Have you and all the other people at kylinos.cn tested the patch to
> > make
> > sure that it fixes the problem?
> >
> > Alan Stern
> 
> If you have time, you can arrange to test your issue using the V3
> version. This way, we can jointly verify whether the problem has been
> resolved.
> 
Thanks for your patch.
I will arrange a system by tomorrow and start 1500 cycles of S4 with the V3 version of your patch.
Please note that it will take two more days from the start of the test to get the results.

Thanks,
Saranya

> Thanks
> Duan Chenghao
> 
>
Gopal, Saranya Oct. 16, 2024, 8:57 a.m. UTC | #28
Hi Duan Chenghao,

> -----Original Message-----
> From: duanchenghao <duanchenghao@kylinos.cn>
> Sent: Monday, October 14, 2024 7:11 AM
> To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya
> <saranya.gopal@intel.com>
> Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie
> <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux-
> kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux-
> usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz;
> stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie
> <xiehongyu1@kylinos.cn>
> Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused
> by USB status when S4 wakes up
> 
> 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
> Hi Saranya,
> 
> > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > >
> > > Hi Alan,
> > >
> > > The V3 patch has been sent. Please review it to check if it aligns
> > > with
> > > the solution you described.
> >
> > Yes, that is what I meant.
> >
> > Have you and all the other people at kylinos.cn tested the patch to
> > make
> > sure that it fixes the problem?
> >
> > Alan Stern
> 
> If you have time, you can arrange to test your issue using the V3
> version. This way, we can jointly verify whether the problem has been
> resolved.
> 
My testing completed today.
I couldn't reproduce the issue when I ran 1500 S4 cycles with v3 version of your patch.
The issue was anyway rare before with an occurrence rate of 20/1500 cycles in our system.
So, please conclude after verifying from your side also.

Thanks,
Saranya

> Thanks
> Duan Chenghao
> 
>
Duan Chenghao Oct. 16, 2024, 9:06 a.m. UTC | #29
Hi Saranya,

在 2024-10-16星期三的 08:57 +0000,Gopal, Saranya写道:
> Hi Duan Chenghao,
> 
> > -----Original Message-----
> > From: duanchenghao <duanchenghao@kylinos.cn>
> > Sent: Monday, October 14, 2024 7:11 AM
> > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya
> > <saranya.gopal@intel.com>
> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie
> > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux-
> > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux-
> > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz;
> > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie
> > <xiehongyu1@kylinos.cn>
> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure
> > caused
> > by USB status when S4 wakes up
> > 
> > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道:
> > Hi Saranya,
> > 
> > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote:
> > > > 
> > > > Hi Alan,
> > > > 
> > > > The V3 patch has been sent. Please review it to check if it
> > > > aligns
> > > > with
> > > > the solution you described.
> > > 
> > > Yes, that is what I meant.
> > > 
> > > Have you and all the other people at kylinos.cn tested the patch
> > > to
> > > make
> > > sure that it fixes the problem?
> > > 
> > > Alan Stern
> > 
> > If you have time, you can arrange to test your issue using the V3
> > version. This way, we can jointly verify whether the problem has
> > been
> > resolved.
> > 
> My testing completed today.
> I couldn't reproduce the issue when I ran 1500 S4 cycles with v3
> version of your patch.
> The issue was anyway rare before with an occurrence rate of 20/1500
> cycles in our system.
> So, please conclude after verifying from your side also.
> 

Thank you very much for your verification. I am currently conducting
parallel testing on multiple machines, and the results should be
available soon as well.

Thanks,
Duan Chenghao

> Thanks,
> Saranya
> 
> > Thanks
> > Duan Chenghao
> > 
> > 
>
Duan Chenghao Oct. 24, 2024, 2:53 a.m. UTC | #30
Hi Alan,

The current patch has been tested on 6 machines and has passed the
tests, with each machine undergoing 500 cycles of testing.

Saranya also replied in the email that she conducted 1,500 tests and
all were successful.


Thanks
Duan Chenghao 

在 2024-10-24星期四的 10:40 +0800,Duan Chenghao写道:
> When a device is inserted into the USB port and an S4 wakeup is
> initiated,
> after the USB-hub initialization is completed, it will automatically
> enter
> suspend mode. Upon detecting a device on the USB port, it will
> proceed with
> resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During
> the S4
> wakeup process, peripherals are put into suspend mode, followed by
> task
> recovery. However, upon detecting that the hcd is in the
> HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status,
> causing the
> S4 suspend to fail and subsequent task recovery to not proceed.
> -
> [   27.594598][ 1]  PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28
> returns -16
> [   27.594601][ 1]  PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100
> returns -16
> [   27.603420][ 1]  ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100
> returned 0 after 3 usecs
> [   27.612233][ 1]  ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100
> returned -16 after 17223 usecs
> [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce async:
> error -16
> [   27.816988][ 1]  PM: quiesce of devices aborted after 1833.282
> msecs
> [   27.823302][ 1]  PM: start quiesce of devices aborted after
> 1839.975 msecs
> ......
> [   31.303172][ 1]  PM: recover of devices complete after 3473.039
> msecs
> [   31.309818][ 1]  PM: Failed to load hibernation image, recovering.
> [   31.348188][ 1]  PM: Basic memory bitmaps freed
> [   31.352686][ 1]  OOM killer enabled.
> [   31.356232][ 1]  Restarting tasks ... done.
> [   31.360609][ 1]  PM: resume from hibernation failed (0)
> [   31.365800][ 1]  PM: Hibernation image not present or could not be
> loaded.
> 
> The "do_wakeup" is determined based on whether the controller's
> power/wakeup attribute is set. The current issue necessitates
> considering
> the type of suspend that is occurring. If the suspend type is either
> PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set
> to
> false.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Closes:
> https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> ---
>  drivers/usb/core/hcd-pci.c | 15 +++++++++++++--
>  include/linux/pm.h         |  3 ++-
>  2 files changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c
> index a08f3f228e6d..390b1225f3cc 100644
> --- a/drivers/usb/core/hcd-pci.c
> +++ b/drivers/usb/core/hcd-pci.c
> @@ -422,7 +422,12 @@ static int suspend_common(struct device *dev,
> pm_message_t msg)
>         bool                    do_wakeup;
>         int                     retval;
>  
> -       do_wakeup = PMSG_IS_AUTO(msg) ? true :
> device_may_wakeup(dev);
> +       if (PMSG_IS_AUTO(msg))
> +               do_wakeup = true;
> +       else if (PMSG_NO_WAKEUP(msg))
> +               do_wakeup = false;
> +       else
> +               do_wakeup = device_may_wakeup(dev);
>  
>         /* Root hub suspend should have stopped all downstream
> traffic,
>          * and all bus master traffic.  And done so for both the
> interface
> @@ -521,6 +526,11 @@ static int hcd_pci_suspend(struct device *dev)
>         return suspend_common(dev, PMSG_SUSPEND);
>  }
>  
> +static int hcd_pci_freeze(struct device *dev)
> +{
> +       return suspend_common(dev, PMSG_FREEZE);
> +}
> +
>  static int hcd_pci_suspend_noirq(struct device *dev)
>  {
>         struct pci_dev          *pci_dev = to_pci_dev(dev);
> @@ -590,6 +600,7 @@ static int hcd_pci_restore(struct device *dev)
>  #else
>  
>  #define hcd_pci_suspend                NULL
> +#define hcd_pci_freeze                 NULL
>  #define hcd_pci_suspend_noirq  NULL
>  #define hcd_pci_poweroff_late  NULL
>  #define hcd_pci_resume_noirq   NULL
> @@ -624,7 +635,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = {
>         .suspend_noirq  = hcd_pci_suspend_noirq,
>         .resume_noirq   = hcd_pci_resume_noirq,
>         .resume         = hcd_pci_resume,
> -       .freeze         = hcd_pci_suspend,
> +       .freeze         = hcd_pci_freeze,
>         .freeze_noirq   = check_root_hub_suspended,
>         .thaw_noirq     = NULL,
>         .thaw           = hcd_pci_resume,
> diff --git a/include/linux/pm.h b/include/linux/pm.h
> index 97b0e23363c8..2a676b2cb699 100644
> --- a/include/linux/pm.h
> +++ b/include/linux/pm.h
> @@ -570,7 +570,8 @@ const struct dev_pm_ops name = { \
>                                         { .event =
> PM_EVENT_AUTO_RESUME, })
>  
>  #define PMSG_IS_AUTO(msg)      (((msg).event & PM_EVENT_AUTO) != 0)
> -
> +#define PMSG_NO_WAKEUP(msg)    (((msg).event & \
> +                               (PM_EVENT_FREEZE | PM_EVENT_QUIESCE))
> != 0)
>  /*
>   * Device run-time power management status.
>   *
gregkh@linuxfoundation.org Oct. 24, 2024, 7:05 a.m. UTC | #31
On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> When a device is inserted into the USB port and an S4 wakeup is initiated,
> after the USB-hub initialization is completed, it will automatically enter
> suspend mode. Upon detecting a device on the USB port, it will proceed with
> resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4
> wakeup process, peripherals are put into suspend mode, followed by task
> recovery. However, upon detecting that the hcd is in the
> HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the
> S4 suspend to fail and subsequent task recovery to not proceed.
> -
> [   27.594598][ 1]  PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 returns -16
> [   27.594601][ 1]  PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 returns -16
> [   27.603420][ 1]  ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 returned 0 after 3 usecs
> [   27.612233][ 1]  ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 returned -16 after 17223 usecs
> [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce async: error -16
> [   27.816988][ 1]  PM: quiesce of devices aborted after 1833.282 msecs
> [   27.823302][ 1]  PM: start quiesce of devices aborted after 1839.975 msecs
> ......
> [   31.303172][ 1]  PM: recover of devices complete after 3473.039 msecs
> [   31.309818][ 1]  PM: Failed to load hibernation image, recovering.
> [   31.348188][ 1]  PM: Basic memory bitmaps freed
> [   31.352686][ 1]  OOM killer enabled.
> [   31.356232][ 1]  Restarting tasks ... done.
> [   31.360609][ 1]  PM: resume from hibernation failed (0)
> [   31.365800][ 1]  PM: Hibernation image not present or could not be loaded.
> 
> The "do_wakeup" is determined based on whether the controller's
> power/wakeup attribute is set. The current issue necessitates considering
> the type of suspend that is occurring. If the suspend type is either
> PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set to
> false.
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Closes: https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>

What commit id does this fix?

And I missed where Alan provided a signed-off-by, where was that?

thanks,

greg k-h
Duan Chenghao Oct. 24, 2024, 8:46 a.m. UTC | #32
hi greg k-h,

在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > When a device is inserted into the USB port and an S4 wakeup is
> > initiated,
> > after the USB-hub initialization is completed, it will
> > automatically enter
> > suspend mode. Upon detecting a device on the USB port, it will
> > proceed with
> > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During
> > the S4
> > wakeup process, peripherals are put into suspend mode, followed by
> > task
> > recovery. However, upon detecting that the hcd is in the
> > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status,
> > causing the
> > S4 suspend to fail and subsequent task recovery to not proceed.
> > -
> > [   27.594598][ 1]  PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28
> > returns -16
> > [   27.594601][ 1]  PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100
> > returns -16
> > [   27.603420][ 1]  ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100
> > returned 0 after 3 usecs
> > [   27.612233][ 1]  ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100
> > returned -16 after 17223 usecs
> > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > async: error -16
> > [   27.816988][ 1]  PM: quiesce of devices aborted after 1833.282
> > msecs
> > [   27.823302][ 1]  PM: start quiesce of devices aborted after
> > 1839.975 msecs
> > ......
> > [   31.303172][ 1]  PM: recover of devices complete after 3473.039
> > msecs
> > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > recovering.
> > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > [   31.352686][ 1]  OOM killer enabled.
> > [   31.356232][ 1]  Restarting tasks ... done.
> > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > [   31.365800][ 1]  PM: Hibernation image not present or could not
> > be loaded.
> > 
> > The "do_wakeup" is determined based on whether the controller's
> > power/wakeup attribute is set. The current issue necessitates
> > considering
> > the type of suspend that is occurring. If the suspend type is
> > either
> > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set
> > to
> > false.
> > 
> > Reported-by: kernel test robot <lkp@intel.com>
> > Closes:
> > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> 
> What commit id does this fix?

The current patch is not intended to fix an issue with a specific
commit, but rather to address a long-standing problem in the USB core.

> 
> And I missed where Alan provided a signed-off-by, where was that?

In the following email, Alan proposed using "Signed-off-by" for
signing.
https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/

Thanks
Duan Chenghao

> 
> thanks,
> 
> greg k-h
gregkh@linuxfoundation.org Oct. 29, 2024, 3:27 a.m. UTC | #33
On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote:
> hi greg k-h,
> 
> 在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > > When a device is inserted into the USB port and an S4 wakeup is
> > > initiated,
> > > after the USB-hub initialization is completed, it will
> > > automatically enter
> > > suspend mode. Upon detecting a device on the USB port, it will
> > > proceed with
> > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During
> > > the S4
> > > wakeup process, peripherals are put into suspend mode, followed by
> > > task
> > > recovery. However, upon detecting that the hcd is in the
> > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status,
> > > causing the
> > > S4 suspend to fail and subsequent task recovery to not proceed.
> > > -
> > > [   27.594598][ 1]  PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28
> > > returns -16
> > > [   27.594601][ 1]  PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100
> > > returns -16
> > > [   27.603420][ 1]  ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100
> > > returned 0 after 3 usecs
> > > [   27.612233][ 1]  ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100
> > > returned -16 after 17223 usecs
> > > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > > async: error -16
> > > [   27.816988][ 1]  PM: quiesce of devices aborted after 1833.282
> > > msecs
> > > [   27.823302][ 1]  PM: start quiesce of devices aborted after
> > > 1839.975 msecs
> > > ......
> > > [   31.303172][ 1]  PM: recover of devices complete after 3473.039
> > > msecs
> > > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > > recovering.
> > > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > > [   31.352686][ 1]  OOM killer enabled.
> > > [   31.356232][ 1]  Restarting tasks ... done.
> > > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > > [   31.365800][ 1]  PM: Hibernation image not present or could not
> > > be loaded.
> > > 
> > > The "do_wakeup" is determined based on whether the controller's
> > > power/wakeup attribute is set. The current issue necessitates
> > > considering
> > > the type of suspend that is occurring. If the suspend type is
> > > either
> > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set
> > > to
> > > false.
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Closes:
> > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > 
> > What commit id does this fix?
> 
> The current patch is not intended to fix an issue with a specific
> commit, but rather to address a long-standing problem in the USB core.

So should it be backported to older stable kernels?  If so, how far
back?

> > And I missed where Alan provided a signed-off-by, where was that?
> 
> In the following email, Alan proposed using "Signed-off-by" for
> signing.
> https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/

Ah, missed that, sorry.

thanks,

greg k-h
Duan Chenghao Oct. 29, 2024, 8:01 a.m. UTC | #34
hi greg k-h,

在 2024-10-29星期二的 04:27 +0100,Greg KH写道:
> On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote:
> > hi greg k-h,
> > 
> > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > > > When a device is inserted into the USB port and an S4 wakeup is
> > > > initiated,
> > > > after the USB-hub initialization is completed, it will
> > > > automatically enter
> > > > suspend mode. Upon detecting a device on the USB port, it will
> > > > proceed with
> > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> > > > During
> > > > the S4
> > > > wakeup process, peripherals are put into suspend mode, followed
> > > > by
> > > > task
> > > > recovery. However, upon detecting that the hcd is in the
> > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status,
> > > > causing the
> > > > S4 suspend to fail and subsequent task recovery to not proceed.
> > > > -
> > > > [   27.594598][ 1]  PM: pci_pm_freeze():
> > > > hcd_pci_suspend+0x0/0x28
> > > > returns -16
> > > > [   27.594601][ 1]  PM: dpm_run_callback():
> > > > pci_pm_freeze+0x0/0x100
> > > > returns -16
> > > > [   27.603420][ 1]  ehci-pci 0000:00:04.1:
> > > > pci_pm_freeze+0x0/0x100
> > > > returned 0 after 3 usecs
> > > > [   27.612233][ 1]  ehci-pci 0000:00:05.1:
> > > > pci_pm_freeze+0x0/0x100
> > > > returned -16 after 17223 usecs
> > > > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > > > async: error -16
> > > > [   27.816988][ 1]  PM: quiesce of devices aborted after
> > > > 1833.282
> > > > msecs
> > > > [   27.823302][ 1]  PM: start quiesce of devices aborted after
> > > > 1839.975 msecs
> > > > ......
> > > > [   31.303172][ 1]  PM: recover of devices complete after
> > > > 3473.039
> > > > msecs
> > > > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > > > recovering.
> > > > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > > > [   31.352686][ 1]  OOM killer enabled.
> > > > [   31.356232][ 1]  Restarting tasks ... done.
> > > > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > > > [   31.365800][ 1]  PM: Hibernation image not present or could
> > > > not
> > > > be loaded.
> > > > 
> > > > The "do_wakeup" is determined based on whether the controller's
> > > > power/wakeup attribute is set. The current issue necessitates
> > > > considering
> > > > the type of suspend that is occurring. If the suspend type is
> > > > either
> > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be
> > > > set
> > > > to
> > > > false.
> > > > 
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > Closes:
> > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > > 
> > > What commit id does this fix?
> > 
> > The current patch is not intended to fix an issue with a specific
> > commit, but rather to address a long-standing problem in the USB
> > core.
> 
> So should it be backported to older stable kernels?  If so, how far
> back?

yes, It needs to be backported. The stable branches such as 6.6.y,
6.10.y, and 6.11.y can be considered for the backport.

Should we backport to these versions?

Thanks 

Duan Chenghao
> 
> > > And I missed where Alan provided a signed-off-by, where was that?
> > 
> > In the following email, Alan proposed using "Signed-off-by" for
> > signing.
> > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/
> 
> Ah, missed that, sorry.
> 
> thanks,
> 
> greg k-h
Duan Chenghao Nov. 6, 2024, 1:29 a.m. UTC | #35
Hi Greg k-h & Alan,

Excuse me, both of you. I've noticed that you haven't replied to the
emails for quite some time, which prompted me to send this one. I'd
like to inquire if the proposal in the current email is feasible and if
it can be integrated into the community.

Thanks,

Duan Chenghao

在 2024-10-29星期二的 16:01 +0800,duanchenghao写道:
> hi greg k-h,
> 
> 在 2024-10-29星期二的 04:27 +0100,Greg KH写道:
> > On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote:
> > > hi greg k-h,
> > > 
> > > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道:
> > > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote:
> > > > > When a device is inserted into the USB port and an S4 wakeup
> > > > > is
> > > > > initiated,
> > > > > after the USB-hub initialization is completed, it will
> > > > > automatically enter
> > > > > suspend mode. Upon detecting a device on the USB port, it
> > > > > will
> > > > > proceed with
> > > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state.
> > > > > During
> > > > > the S4
> > > > > wakeup process, peripherals are put into suspend mode,
> > > > > followed
> > > > > by
> > > > > task
> > > > > recovery. However, upon detecting that the hcd is in the
> > > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY
> > > > > status,
> > > > > causing the
> > > > > S4 suspend to fail and subsequent task recovery to not
> > > > > proceed.
> > > > > -
> > > > > [   27.594598][ 1]  PM: pci_pm_freeze():
> > > > > hcd_pci_suspend+0x0/0x28
> > > > > returns -16
> > > > > [   27.594601][ 1]  PM: dpm_run_callback():
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returns -16
> > > > > [   27.603420][ 1]  ehci-pci 0000:00:04.1:
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returned 0 after 3 usecs
> > > > > [   27.612233][ 1]  ehci-pci 0000:00:05.1:
> > > > > pci_pm_freeze+0x0/0x100
> > > > > returned -16 after 17223 usecs
> > > > > [   27.810067][ 1]  PM: Device 0000:00:05.1 failed to quiesce
> > > > > async: error -16
> > > > > [   27.816988][ 1]  PM: quiesce of devices aborted after
> > > > > 1833.282
> > > > > msecs
> > > > > [   27.823302][ 1]  PM: start quiesce of devices aborted
> > > > > after
> > > > > 1839.975 msecs
> > > > > ......
> > > > > [   31.303172][ 1]  PM: recover of devices complete after
> > > > > 3473.039
> > > > > msecs
> > > > > [   31.309818][ 1]  PM: Failed to load hibernation image,
> > > > > recovering.
> > > > > [   31.348188][ 1]  PM: Basic memory bitmaps freed
> > > > > [   31.352686][ 1]  OOM killer enabled.
> > > > > [   31.356232][ 1]  Restarting tasks ... done.
> > > > > [   31.360609][ 1]  PM: resume from hibernation failed (0)
> > > > > [   31.365800][ 1]  PM: Hibernation image not present or
> > > > > could
> > > > > not
> > > > > be loaded.
> > > > > 
> > > > > The "do_wakeup" is determined based on whether the
> > > > > controller's
> > > > > power/wakeup attribute is set. The current issue necessitates
> > > > > considering
> > > > > the type of suspend that is occurring. If the suspend type is
> > > > > either
> > > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should
> > > > > be
> > > > > set
> > > > > to
> > > > > false.
> > > > > 
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > > Closes:
> > > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/
> > > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
> > > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn>
> > > > 
> > > > What commit id does this fix?
> > > 
> > > The current patch is not intended to fix an issue with a specific
> > > commit, but rather to address a long-standing problem in the USB
> > > core.
> > 
> > So should it be backported to older stable kernels?  If so, how far
> > back?
> 
> yes, It needs to be backported. The stable branches such as 6.6.y,
> 6.10.y, and 6.11.y can be considered for the backport.
> 
> Should we backport to these versions?
> 
> Thanks 
> 
> Duan Chenghao
> > 
> > > > And I missed where Alan provided a signed-off-by, where was
> > > > that?
> > > 
> > > In the following email, Alan proposed using "Signed-off-by" for
> > > signing.
> > > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/
> > 
> > Ah, missed that, sorry.
> > 
> > thanks,
> > 
> > greg k-h
>
diff mbox series

Patch

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 1ff7d901fede..a6bd0fbd82f4 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2389,7 +2389,6 @@  void usb_hcd_resume_root_hub (struct usb_hcd *hcd)
 	spin_lock_irqsave (&hcd_root_hub_lock, flags);
 	if (hcd->rh_registered) {
 		pm_wakeup_event(&hcd->self.root_hub->dev, 0);
-		set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		queue_work(pm_wq, &hcd->wakeup_work);
 	}
 	spin_unlock_irqrestore (&hcd_root_hub_lock, flags);
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 4b93c0bd1d4b..7f847c4afc0d 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3835,11 +3835,14 @@  int usb_port_resume(struct usb_device *udev, pm_message_t msg)
 
 int usb_remote_wakeup(struct usb_device *udev)
 {
+	struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
 	int	status = 0;
 
 	usb_lock_device(udev);
 	if (udev->state == USB_STATE_SUSPENDED) {
 		dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-");
+		if (hcd->state == HC_STATE_SUSPENDED)
+			set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags);
 		status = usb_autoresume_device(udev);
 		if (status == 0) {
 			/* Let the drivers do their thing, then... */