Message ID | 20240906030548.845115-1-duanchenghao@kylinos.cn |
---|---|
State | Superseded |
Headers | show |
Series | USB: Fix the issue of task recovery failure caused by USB status when S4 wakes up | expand |
[Please make sure that the lines in your email message don't extend beyond 76 columns or so.] Lots of things here seem to be wrong. On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote: > When a device is inserted into the USB port and an S4 wakeup is initiated, There is no such thing as an S4 wakeup. Do you mean wakeup from an S4 suspend state? > after the USB-hub initialization is completed, it will automatically enter suspend mode. What will enter suspend mode? The hub that the device was plugged into? That should not happen. The hub initialization code should detect that a new device was plugged in and prevent the hub from suspending. > Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. HCD_FLAG_WAKEUP_PENDING is not a state. It is a flag. > During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery. What do you mean by "task recovery"? We don't need to recover any tasks. What do you mean by "peripherals are put into suspend mode"? That's not what happens. Peripherals are set back to full power. > However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state, > it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed. What will return an EBUSY status? Why do you say that S4 suspend will fail? Aren't you talking about S4 wakeup? Can you provide a kernel log that explains these points and shows what problem you are trying to solve? > This patch makes two modifications in total: > 1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd, > which were previously split between the top half and bottom half of the interrupt, > are now unified and executed solely in the bottom half of the interrupt. > This prevents the bottom half tasks from being frozen during the S4 process, > ensuring that the clean_bit process can proceed without interruption. The name is "clear_bit" (with an 'r'), not "clean_bit". > 2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING. > When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit. > This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process. hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after calling hcd->driver->bus_resume(). After that point, usb_hcd_resume_root_hub() won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again? Alan Stern > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > --- > drivers/usb/core/hcd.c | 1 - > drivers/usb/core/hub.c | 3 +++ > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > index 1ff7d901fede..a6bd0fbd82f4 100644 > --- a/drivers/usb/core/hcd.c > +++ b/drivers/usb/core/hcd.c > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd) > spin_lock_irqsave (&hcd_root_hub_lock, flags); > if (hcd->rh_registered) { > pm_wakeup_event(&hcd->self.root_hub->dev, 0); > - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > queue_work(pm_wq, &hcd->wakeup_work); > } > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > index 4b93c0bd1d4b..7f847c4afc0d 100644 > --- a/drivers/usb/core/hub.c > +++ b/drivers/usb/core/hub.c > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg) > > int usb_remote_wakeup(struct usb_device *udev) > { > + struct usb_hcd *hcd = bus_to_hcd(udev->bus); > int status = 0; > > usb_lock_device(udev); > if (udev->state == USB_STATE_SUSPENDED) { > dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); > + if (hcd->state == HC_STATE_SUSPENDED) > + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > status = usb_autoresume_device(udev); > if (status == 0) { > /* Let the drivers do their thing, then... */ > -- > 2.34.1 > >
> [Please make sure that the lines in your email message don't extend > beyond 76 columns or so.] > OK. Later, I will modify the patch format. V2 patch will be released later > Lots of things here seem to be wrong. > > On Fri, Sep 06, 2024 at 11:05:48AM +0800, Duan Chenghao wrote: > > When a device is inserted into the USB port and an S4 wakeup is > > initiated, > > There is no such thing as an S4 wakeup. Do you mean wakeup from an > S4 > suspend state? Yes, waking up from the S4 suspend state. > > > after the USB-hub initialization is completed, it will > > automatically enter suspend mode. > > What will enter suspend mode? The hub that the device was plugged > into? > That should not happen. The hub initialization code should detect > that > a new device was plugged in and prevent the hub from suspending. > Yes, the current issue is that the hub detects a new device during the resuming process. However, the S4 wakeup is attempting to put the hub into suspend mode, and during the suspend process, it detects that the HCD_FLAG_WAKEUP_PENDING flag has already been set, resulting in the return of an EBUSY status. > > Upon detecting a device on the USB port, it will proceed with > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. > > HCD_FLAG_WAKEUP_PENDING is not a state. It is a flag. > > > During the S4 wakeup process, peripherals are put into suspend > > mode, followed by task recovery. > > What do you mean by "task recovery"? We don't need to recover any > tasks. > S4 wakeup restores the image that was saved before the system entered the S4 sleep state. S4 waking up from hibernation ============================= kernel initialization | v freeze user task and kernel thread | v load saved image | v freeze the peripheral device and controller (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is set, return to EBUSY and do not perform the following restore image.) | v restore image(task recovery) > What do you mean by "peripherals are put into suspend mode"? That's > not > what happens. Peripherals are set back to full power. > > > However, upon detecting that the hcd is in the > > HCD_FLAG_WAKEUP_PENDING state, > > it will return an EBUSY status, causing the S4 suspend to fail and > > subsequent task recovery to not proceed. > > What will return an EBUSY status? if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. > > Why do you say that S4 suspend will fail? Aren't you talking about > S4 > wakeup? After returning EBUSY, the subsequent restore image operation will not be executed. > > Can you provide a kernel log that explains these points and shows > what > problem you are trying to solve? [ 9.009166][ 2] [ T403] PM: Image signature found, resuming [ 9.009167][ 2] [ T403] PM: resume from hibernation [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... [ 9.009244][ 2] [ T403] Freezing user space processes ... (elapsed 0.001 seconds) done. [ 9.010355][ 2] [ T403] OOM killer disabled. [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... (elapsed 0.000 seconds) done. [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression [ 9.073334][ 2] [ T403] PM: Loading and decompressing image data (486874 pages)... [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 [mpidr:0x0] [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% [ 10.760693][ 2] [ T403] PM: Image loading done [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds (1159.22 MB/s) [ 10.761982][ 2] [ T403] PM: Image successfully loaded [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use no_console_suspend to debug) [ 10.864973][ 2] [ T403] innovpu_freeze:1782 [ 10.864974][ 2] [ T403] innovpu_suspend:1759 [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x38 returns -16 [ 11.168875][ 2] [ T189] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x108 returns -16 [ 11.168876][ 2] [ T189] PM: Device 0000:05:00.0 failed to quiesce async: error -16 [ 12.270452][ 2] [ T403] innovpu_thaw:1792 [ 12.405296][ 2] [ T403] PM: Failed to load hibernation image, recovering. [ 12.486859][ 2] [ T403] PM: Basic memory bitmaps freed [ 12.486860][ 2] [ T403] OOM killer enabled. [ 12.486861][ 2] [ T403] Restarting tasks ... > > > This patch makes two modifications in total: > > 1. The set_bit and clean_bit operations for the > > HCD_FLAG_WAKEUP_PENDING flag of Hcd, > > which were previously split between the top half and bottom half of > > the interrupt, > > are now unified and executed solely in the bottom half of the > > interrupt. > > This prevents the bottom half tasks from being frozen during the S4 > > process, > > ensuring that the clean_bit process can proceed without > > interruption. > > The name is "clear_bit" (with an 'r'), not "clean_bit". > > > 2. Add a condition to the set_bit operation for the hcd status > > HCD_FLAG_WAKEUP_PENDING. > > When the hcd status is HC_STATE_SUSPENDED, perform the setting of > > the aforementioned status bit. > > This prevents a subsequent set_bit from occurring after the > > clean_bit if the hcd is in the resuming process. > > hcd_bus_resume() clears that HCD_FLAG_WAKEUP_PENDING bit after > calling > hcd->driver->bus_resume(). After that point, > usb_hcd_resume_root_hub() > won't be called, so how can HCD_FLAG_WAKEUP_PENDING get set again? > > Alan Stern > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > --- > > drivers/usb/core/hcd.c | 1 - > > drivers/usb/core/hub.c | 3 +++ > > 2 files changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c > > index 1ff7d901fede..a6bd0fbd82f4 100644 > > --- a/drivers/usb/core/hcd.c > > +++ b/drivers/usb/core/hcd.c > > @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd > > *hcd) > > spin_lock_irqsave (&hcd_root_hub_lock, flags); > > if (hcd->rh_registered) { > > pm_wakeup_event(&hcd->self.root_hub->dev, 0); > > - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); > > queue_work(pm_wq, &hcd->wakeup_work); > > } > > spin_unlock_irqrestore (&hcd_root_hub_lock, flags); > > diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c > > index 4b93c0bd1d4b..7f847c4afc0d 100644 > > --- a/drivers/usb/core/hub.c > > +++ b/drivers/usb/core/hub.c > > @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device > > *udev, pm_message_t msg) > > > > int usb_remote_wakeup(struct usb_device *udev) > > { > > + struct usb_hcd *hcd = bus_to_hcd(udev->bus); > > int status = 0; > > > > usb_lock_device(udev); > > if (udev->state == USB_STATE_SUSPENDED) { > > dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); > > + if (hcd->state == HC_STATE_SUSPENDED) > > + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd- > > > flags); > > status = usb_autoresume_device(udev); > > if (status == 0) { > > /* Let the drivers do their thing, then... > > */ > > -- > > 2.34.1 > > > >
在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > S4 wakeup restores the image that was saved before the system > > entered > > the S4 sleep state. > > > > S4 waking up from hibernation > > ============================= > > kernel initialization > > | > > v > > freeze user task and kernel thread > > | > > v > > load saved image > > | > > v > > freeze the peripheral device and controller > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is > > set, > > return to EBUSY and do not perform the following restore > > image.) > > Why is the flag set at this point? It should not be; the device and > controller should have been frozen with wakeup disabled. > This is check point, not set point. When the USB goes into a suspend state, the HCD_FLAG_WAKEUP_PENDING flag is checked, and if it is found that the USB is in the process of resuming, then an EBUSY error is returned. > > | > > v > > restore image(task recovery) > > > > > However, upon detecting that the hcd is in the > > > > HCD_FLAG_WAKEUP_PENDING state, > > > > it will return an EBUSY status, causing the S4 suspend to fail > > > > and > > > > subsequent task recovery to not proceed. > > > > > > What will return an EBUSY status? > > > > if HCD_FLAG_WAKEUP_PENDING flag is set_bit, will return EBUSY. > > I meant: Which function will return EBUSY status? The answer is in > the > log below; hcd_pci_suspend() does this. > > > > Why do you say that S4 suspend will fail? Aren't you talking > > > about > > > S4 > > > wakeup? > > > > After returning EBUSY, the subsequent restore image operation will > > not > > be executed. > > > > > > > > Can you provide a kernel log that explains these points and shows > > > what > > > problem you are trying to solve? > > > > [ 9.009166][ 2] [ T403] PM: Image signature found, resuming > > [ 9.009167][ 2] [ T403] PM: resume from hibernation > > [ 9.009243][ 2] [ T403] inno-codec inno-codec.16.auto: > > [inno_vpu][vpu_notifier:1540]vpu_notifier: untested action 5... > > [ 9.009244][ 2] [ T403] Freezing user space processes ... > > (elapsed > > 0.001 seconds) done. > > [ 9.010355][ 2] [ T403] OOM killer disabled. > > [ 9.010355][ 2] [ T403] Freezing remaining freezable tasks ... > > (elapsed 0.000 seconds) done. > > [ 9.012152][ 2] [ T403] PM: Basic memory bitmaps created > > [ 9.073333][ 2] [ T403] PM: Using 3 thread(s) for decompression > > [ 9.073334][ 2] [ T403] PM: Loading and decompressing image > > data > > (486874 pages)... > > [ 9.073335][ 2] [ T403] hibernate: Hibernated on CPU 0 > > [mpidr:0x0] > > [ 9.095928][ 2] [ T403] PM: Image loading progress: 0% > > [ 9.664803][ 2] [ T403] PM: Image loading progress: 10% > > [ 9.794156][ 2] [ T403] PM: Image loading progress: 20% > > [ 9.913001][ 2] [ T403] PM: Image loading progress: 30% > > [ 10.034331][ 2] [ T403] PM: Image loading progress: 40% > > [ 10.154070][ 2] [ T403] PM: Image loading progress: 50% > > [ 10.277096][ 2] [ T403] PM: Image loading progress: 60% > > [ 10.398860][ 2] [ T403] PM: Image loading progress: 70% > > [ 10.533760][ 2] [ T403] PM: Image loading progress: 80% > > [ 10.659874][ 2] [ T403] PM: Image loading progress: 90% > > [ 10.760681][ 2] [ T403] PM: Image loading progress: 100% > > [ 10.760693][ 2] [ T403] PM: Image loading done > > [ 10.760718][ 2] [ T403] PM: Read 1947496 kbytes in 1.68 seconds > > (1159.22 MB/s) > > [ 10.761982][ 2] [ T403] PM: Image successfully loaded > > [ 10.761988][ 2] [ T403] printk: Suspending console(s) (use > > no_console_suspend to debug) > > [ 10.864973][ 2] [ T403] innovpu_freeze:1782 > > [ 10.864974][ 2] [ T403] innovpu_suspend:1759 > > [ 11.168871][ 2] [ T189] PM: pci_pm_freeze(): > > hcd_pci_suspend+0x0/0x38 returns -16 > > This should not be allowed to happen. Freezing is mandatory and not > subject to wakeup requests. > > Is your problem related to the one discussed in this email thread? > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > Would the suggestion I made there -- i.e., have the xhci-hcd > interrupt handler skip calling usb_hcd_resume_root_hub() if the root > hub > was suspended with wakeup = 0 -- fix your problem? Skipping usb_hcd_resume_root_hub() should generally be possible, but it's important to ensure that normal remote wakeup functionality is not compromised. Is it HUB_SUSPEND that the hub you are referring to is in a suspended state? v2 patch: https://lore.kernel.org/all/20240910105714.148976-1-duanchenghao@kylinos.cn/ > > Alan Stern
On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > S4 wakeup restores the image that was saved before the system > > > entered > > > the S4 sleep state. > > > > > > S4 waking up from hibernation > > > ============================= > > > kernel initialization > > > | > > > v > > > freeze user task and kernel thread > > > | > > > v > > > load saved image > > > | > > > v > > > freeze the peripheral device and controller > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is > > > set, > > > return to EBUSY and do not perform the following restore > > > image.) > > > > Why is the flag set at this point? It should not be; the device and > > controller should have been frozen with wakeup disabled. > > > This is check point, not set point. Yes, I know that. But when the flag was checked, why did the code find that it was set? The flag should have been clear. > > Is your problem related to the one discussed in this email thread? > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > interrupt handler skip calling usb_hcd_resume_root_hub() if the root > > hub > > was suspended with wakeup = 0 -- fix your problem? > > Skipping usb_hcd_resume_root_hub() should generally be possible, but > it's important to ensure that normal remote wakeup functionality is not > compromised. Is it HUB_SUSPEND that the hub you are referring to is in > a suspended state? I don't understand this question. hub_quiesce() gets called with HUB_SUSPEND when the hub enters a suspended state. You are correct about the need for normal remote wakeup to work properly. The interrupt handler should skip calling usb_hcd_resume_root_hub() for port connect or disconnect changes and for port overcurrent changes (when the root hub is suspended with wakeup = 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for port resume events. Alan Stern
在 2024-09-12星期四的 11:00 -0400,Alan Stern写道: > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > S4 wakeup restores the image that was saved before the system > > > > entered > > > > the S4 sleep state. > > > > > > > > S4 waking up from hibernation > > > > ============================= > > > > kernel initialization > > > > | > > > > v > > > > freeze user task and kernel thread > > > > | > > > > v > > > > load saved image > > > > | > > > > v > > > > freeze the peripheral device and controller > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it > > > > is > > > > set, > > > > return to EBUSY and do not perform the following restore > > > > image.) > > > > > > Why is the flag set at this point? It should not be; the device > > > and > > > controller should have been frozen with wakeup disabled. > > > > > This is check point, not set point. > > Yes, I know that. But when the flag was checked, why did the code > find > that it was set? The flag should have been clear. Yes, the current issue is that during S4 testing, there is a probabilistic scenario where clear_bit is not called after set_bit, or clear_bit is called but does not execute after set_bit. Please refer to the two modification points in the v2 patch for details, as both of them can cause the current issue. > > > > Is your problem related to the one discussed in this email > > > thread? > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > root > > > hub > > > was suspended with wakeup = 0 -- fix your problem? > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > but > > it's important to ensure that normal remote wakeup functionality is > > not > > compromised. Is it HUB_SUSPEND that the hub you are referring to is > > in > > a suspended state? > > I don't understand this question. hub_quiesce() gets called with > HUB_SUSPEND when the hub enters a suspended state. > > You are correct about the need for normal remote wakeup to work > properly. The interrupt handler should skip calling > usb_hcd_resume_root_hub() for port connect or disconnect changes and > for > port overcurrent changes (when the root hub is suspended with wakeup > = > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > port > resume events. The current issue arises when rh_state is detected as RH_SUSPEND and usb_hcd_resume_root_hub() is called to resume the root hub. However, there is no mutual exclusion between the suspend flag, set_bit, and clear_bit, which can lead to two scenarios: 1. After set_bit is called, the state of the USB device is modified by another process to !USB_STATE_SUSPEND, preventing the hub's resume from being executed, and consequently, clear_bit is not called again. 2. In another scenario, during the hub resume process, after HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet been set to !RH_SUSPENDED. At this point, set_bit is executed, but since the hub has already entered the running state, the clear_bit associated with the resume operation is not executed. Please review the v2 patch, where I have described both the logical flow before the modification and the revised logical flow after the modification. > > Alan Stern
在 2024-09-13星期五的 09:51 +0800,duanchenghao写道: > 在 2024-09-12星期四的 11:00 -0400,Alan Stern写道: > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > > S4 wakeup restores the image that was saved before the system > > > > > entered > > > > > the S4 sleep state. > > > > > > > > > > S4 waking up from hibernation > > > > > ============================= > > > > > kernel initialization > > > > > | > > > > > v > > > > > freeze user task and kernel thread > > > > > | > > > > > v > > > > > load saved image > > > > > | > > > > > v > > > > > freeze the peripheral device and controller > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If > > > > > it > > > > > is > > > > > set, > > > > > return to EBUSY and do not perform the following restore > > > > > image.) > > > > > > > > Why is the flag set at this point? It should not be; the > > > > device > > > > and > > > > controller should have been frozen with wakeup disabled. > > > > > > > This is check point, not set point. > > > > Yes, I know that. But when the flag was checked, why did the code > > find > > that it was set? The flag should have been clear. > > Yes, the current issue is that during S4 testing, there is a > probabilistic scenario where clear_bit is not called after set_bit, > or > clear_bit is called but does not execute after set_bit. Please refer > to > the two modification points in the v2 patch for details, as both of > them can cause the current issue. > > > > > > > Is your problem related to the one discussed in this email > > > > thread? > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > > root > > > > hub > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > > but > > > it's important to ensure that normal remote wakeup functionality > > > is > > > not > > > compromised. Is it HUB_SUSPEND that the hub you are referring to > > > is > > > in > > > a suspended state? > > > > I don't understand this question. hub_quiesce() gets called with > > HUB_SUSPEND when the hub enters a suspended state. > > > > You are correct about the need for normal remote wakeup to work > > properly. The interrupt handler should skip calling > > usb_hcd_resume_root_hub() for port connect or disconnect changes > > and > > for > > port overcurrent changes (when the root hub is suspended with > > wakeup > > = > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > > port > > resume events. > > The current issue arises when rh_state is detected as RH_SUSPEND and > usb_hcd_resume_root_hub() is called to resume the root hub. However, > there is no mutual exclusion between the suspend flag, set_bit, and > clear_bit, which can lead to two scenarios: > > 1. After set_bit is called, the state of the USB device is > modified > by another process to !USB_STATE_SUSPEND, preventing the hub's resume > from being executed, and consequently, clear_bit is not called again. > > 2. In another scenario, during the hub resume process, after > HCD_FLAG_WAKEUP_PENDING is cleared by clear_bit, rh_state has not yet > been set to !RH_SUSPENDED. At this point, set_bit is executed, but > since the hub has already entered the running state, the clear_bit > associated with the resume operation is not executed. > > Please review the v2 patch, where I have described both the logical > flow before the modification and the revised logical flow after the > modification. > In fact, issue point 2 in the patch is introduced by issue point 1, and issue point 2 represents a further improvement. The main issue lies in point 1, where after the execution of the top half of the interrupt is completed, the bottom half is frozen by S4. As a result, the USB resume is not executed during this S4 process, and clear_bit is not called as well. This further leads to a situation where during the process of S4 putting the USB controller into suspend, the check for HCD_FLAG_WAKEUP_PENDING being set returns -EBUSY. > > > > Alan Stern >
From: Hongyu Xie <xiehongyu1@kylinos.cn> Hi Alan, On 2024/9/12 23:00, Alan Stern wrote: > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: >> 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: >>> On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: >>>> S4 wakeup restores the image that was saved before the system >>>> entered >>>> the S4 sleep state. >>>> >>>> S4 waking up from hibernation >>>> ============================= >>>> kernel initialization >>>> | >>>> v >>>> freeze user task and kernel thread >>>> | >>>> v >>>> load saved image >>>> | >>>> v >>>> freeze the peripheral device and controller >>>> (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If it is >>>> set, >>>> return to EBUSY and do not perform the following restore >>>> image.) >>> >>> Why is the flag set at this point? It should not be; the device and >>> controller should have been frozen with wakeup disabled. >>> >> This is check point, not set point. > > Yes, I know that. But when the flag was checked, why did the code find > that it was set? The flag should have been clear. Maybe duanchenghao means this, freeze_kernel_threads load_image_and_restore suspend roothub interrupt occurred usb_hcd_resume_root_hub set HCD_FLAG_WAKEUP_PENDING queue_work // freezed suspend pci return -EBUSY because HCD_FLAG_WAKEUP_PENDING So s4 resume failed. > >>> Is your problem related to the one discussed in this email thread? >>> >>> https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ >>> >>> Would the suggestion I made there -- i.e., have the xhci-hcd >>> interrupt handler skip calling usb_hcd_resume_root_hub() if the root >>> hub >>> was suspended with wakeup = 0 -- fix your problem? >> >> Skipping usb_hcd_resume_root_hub() should generally be possible, but >> it's important to ensure that normal remote wakeup functionality is not >> compromised. Is it HUB_SUSPEND that the hub you are referring to is in >> a suspended state? > > I don't understand this question. hub_quiesce() gets called with > HUB_SUSPEND when the hub enters a suspended state. > > You are correct about the need for normal remote wakeup to work > properly. The interrupt handler should skip calling > usb_hcd_resume_root_hub() for port connect or disconnect changes and for > port overcurrent changes (when the root hub is suspended with wakeup = > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for port > resume events. > > Alan Stern > Hongyu Xie
Hi Alan, Do you think this plan is feasible, or is there any unclear part in my description that needs to be supplemented? duanchenghao 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > Hi Alan, > On 2024/9/12 23:00, Alan Stern wrote: > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao wrote: > > > > > S4 wakeup restores the image that was saved before the system > > > > > entered > > > > > the S4 sleep state. > > > > > > > > > > S4 waking up from hibernation > > > > > ============================= > > > > > kernel initialization > > > > > | > > > > > v > > > > > freeze user task and kernel thread > > > > > | > > > > > v > > > > > load saved image > > > > > | > > > > > v > > > > > freeze the peripheral device and controller > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. If > > > > > it is > > > > > set, > > > > > return to EBUSY and do not perform the following > > > > > restore > > > > > image.) > > > > > > > > Why is the flag set at this point? It should not be; the > > > > device and > > > > controller should have been frozen with wakeup disabled. > > > > > > > This is check point, not set point. > > > > Yes, I know that. But when the flag was checked, why did the code > > find > > that it was set? The flag should have been clear. > Maybe duanchenghao means this, > freeze_kernel_threads > load_image_and_restore > suspend roothub > interrupt occurred > usb_hcd_resume_root_hub > set HCD_FLAG_WAKEUP_PENDING > queue_work // freezed > suspend pci > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > So s4 resume failed. > > > > > > Is your problem related to the one discussed in this email > > > > thread? > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > Would the suggestion I made there -- i.e., have the xhci-hcd > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if the > > > > root > > > > hub > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > Skipping usb_hcd_resume_root_hub() should generally be possible, > > > but > > > it's important to ensure that normal remote wakeup functionality > > > is not > > > compromised. Is it HUB_SUSPEND that the hub you are referring to > > > is in > > > a suspended state? > > > > I don't understand this question. hub_quiesce() gets called with > > HUB_SUSPEND when the hub enters a suspended state. > > > > You are correct about the need for normal remote wakeup to work > > properly. The interrupt handler should skip calling > > usb_hcd_resume_root_hub() for port connect or disconnect changes > > and for > > port overcurrent changes (when the root hub is suspended with > > wakeup = > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() for > > port > > resume events. > > > > Alan Stern > > > > Hongyu Xie
Hi Alan, Please reveiew the patch when you have time. duanchenghao 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > Hi Alan, > > > > Do you think this plan is feasible, or is there any unclear part in > > my > > description that needs to be supplemented? > > I apologize for not getting back to you earlier -- I've been > incredibly > busy during the last few weeks. > > I still haven't had time to go over this throroughly. If I don't > respond by the end of this week, remind me again. > > Alan Stern > > > duanchenghao > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > Hi Alan, > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > wrote: > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > system > > > > > > > entered > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > ============================= > > > > > > > kernel initialization > > > > > > > | > > > > > > > v > > > > > > > freeze user task and kernel thread > > > > > > > | > > > > > > > v > > > > > > > load saved image > > > > > > > | > > > > > > > v > > > > > > > freeze the peripheral device and controller > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. > > > > > > > If > > > > > > > it is > > > > > > > set, > > > > > > > return to EBUSY and do not perform the following > > > > > > > restore > > > > > > > image.) > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > device and > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > This is check point, not set point. > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > code > > > > find > > > > that it was set? The flag should have been clear. > > > Maybe duanchenghao means this, > > > freeze_kernel_threads > > > load_image_and_restore > > > suspend roothub > > > interrupt occurred > > > usb_hcd_resume_root_hub > > > set > > > HCD_FLAG_WAKEUP_PENDING > > > queue_work // freezed > > > suspend pci > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > So s4 resume failed. > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > thread? > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > hcd > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if > > > > > > the > > > > > > root > > > > > > hub > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > possible, > > > > > but > > > > > it's important to ensure that normal remote wakeup > > > > > functionality > > > > > is not > > > > > compromised. Is it HUB_SUSPEND that the hub you are referring > > > > > to > > > > > is in > > > > > a suspended state? > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > with > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > You are correct about the need for normal remote wakeup to work > > > > properly. The interrupt handler should skip calling > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > changes > > > > and for > > > > port overcurrent changes (when the root hub is suspended with > > > > wakeup = > > > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() > > > > for > > > > port > > > > resume events. > > > > > > > > Alan Stern > > > > > > > > > > Hongyu Xie > >
Hi Alan, I haven't received a reply from you since my last email. Could you please confirm if you have received this one? I'm worried that there might be an issue with the email system and you might not be receiving them. Duanchenghao 在 2024-09-29星期日的 11:14 +0800,duanchenghao写道: > Hi Alan, > > Please reveiew the patch when you have time. > > duanchenghao > > 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > > Hi Alan, > > > > > > Do you think this plan is feasible, or is there any unclear part > > > in > > > my > > > description that needs to be supplemented? > > > > I apologize for not getting back to you earlier -- I've been > > incredibly > > busy during the last few weeks. > > > > I still haven't had time to go over this throroughly. If I don't > > respond by the end of this week, remind me again. > > > > Alan Stern > > > > > duanchenghao > > > > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > > > > Hi Alan, > > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > > wrote: > > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > > system > > > > > > > > entered > > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > > ============================= > > > > > > > > kernel initialization > > > > > > > > | > > > > > > > > v > > > > > > > > freeze user task and kernel thread > > > > > > > > | > > > > > > > > v > > > > > > > > load saved image > > > > > > > > | > > > > > > > > v > > > > > > > > freeze the peripheral device and controller > > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the > > > > > > > > USB. > > > > > > > > If > > > > > > > > it is > > > > > > > > set, > > > > > > > > return to EBUSY and do not perform the following > > > > > > > > restore > > > > > > > > image.) > > > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > > device and > > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > > > This is check point, not set point. > > > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > > code > > > > > find > > > > > that it was set? The flag should have been clear. > > > > Maybe duanchenghao means this, > > > > freeze_kernel_threads > > > > load_image_and_restore > > > > suspend roothub > > > > interrupt occurred > > > > usb_hcd_resume_root_hub > > > > set > > > > HCD_FLAG_WAKEUP_PENDING > > > > queue_work // freezed > > > > suspend pci > > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > > > So s4 resume failed. > > > > > > > > > > > > Is your problem related to the one discussed in this > > > > > > > email > > > > > > > thread? > > > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > > hcd > > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() > > > > > > > if > > > > > > > the > > > > > > > root > > > > > > > hub > > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > > possible, > > > > > > but > > > > > > it's important to ensure that normal remote wakeup > > > > > > functionality > > > > > > is not > > > > > > compromised. Is it HUB_SUSPEND that the hub you are > > > > > > referring > > > > > > to > > > > > > is in > > > > > > a suspended state? > > > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > > with > > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > > > You are correct about the need for normal remote wakeup to > > > > > work > > > > > properly. The interrupt handler should skip calling > > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > > changes > > > > > and for > > > > > port overcurrent changes (when the root hub is suspended with > > > > > wakeup = > > > > > 0). But it should _not_ skip calling > > > > > usb_hcd_resume_root_hub() > > > > > for > > > > > port > > > > > resume events. > > > > > > > > > > Alan Stern > > > > > > > > > > > > > Hongyu Xie > > > >
On Wed, Oct 09, 2024 at 09:19:39AM +0800, duanchenghao wrote: > Hi Alan, > > I haven't received a reply from you since my last email. Could you > please confirm if you have received this one? I have. > I'm worried that there might be an issue with the email system and you > might not be receiving them. I sent a message 9 days ago. See https://lore.kernel.org/all/85105e45-3553-4a8c-b132-3875c4657c4b@rowland.harvard.edu/ You were CC'ed on that message; maybe you didn't receive it. Maybe the topic of that thread isn't exactly the same as the topic of your thread; I tend to get the two of them confused. Alan Stern
Hi Alan, These are two patches, each addressing the same issue. The current patch is a direct solution to the problem of the interrupt bottom half being frozen. The patch you replied with is another, alternative solution to the same problem. Please review which solution is more suitable, or if there are any other revised proposals. Please review the patch I mentioned: https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ Duanchenghao 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > Hi Alan, > > > > Do you think this plan is feasible, or is there any unclear part in > > my > > description that needs to be supplemented? > > I apologize for not getting back to you earlier -- I've been > incredibly > busy during the last few weeks. > > I still haven't had time to go over this throroughly. If I don't > respond by the end of this week, remind me again. > > Alan Stern > > > duanchenghao > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > Hi Alan, > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao wrote: > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, duanchenghao > > > > > > wrote: > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > system > > > > > > > entered > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > ============================= > > > > > > > kernel initialization > > > > > > > | > > > > > > > v > > > > > > > freeze user task and kernel thread > > > > > > > | > > > > > > > v > > > > > > > load saved image > > > > > > > | > > > > > > > v > > > > > > > freeze the peripheral device and controller > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of the USB. > > > > > > > If > > > > > > > it is > > > > > > > set, > > > > > > > return to EBUSY and do not perform the following > > > > > > > restore > > > > > > > image.) > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > device and > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > This is check point, not set point. > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > code > > > > find > > > > that it was set? The flag should have been clear. > > > Maybe duanchenghao means this, > > > freeze_kernel_threads > > > load_image_and_restore > > > suspend roothub > > > interrupt occurred > > > usb_hcd_resume_root_hub > > > set > > > HCD_FLAG_WAKEUP_PENDING > > > queue_work // freezed > > > suspend pci > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > So s4 resume failed. > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > thread? > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab-b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > hcd > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() if > > > > > > the > > > > > > root > > > > > > hub > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > possible, > > > > > but > > > > > it's important to ensure that normal remote wakeup > > > > > functionality > > > > > is not > > > > > compromised. Is it HUB_SUSPEND that the hub you are referring > > > > > to > > > > > is in > > > > > a suspended state? > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > with > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > You are correct about the need for normal remote wakeup to work > > > > properly. The interrupt handler should skip calling > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > changes > > > > and for > > > > port overcurrent changes (when the root hub is suspended with > > > > wakeup = > > > > 0). But it should _not_ skip calling usb_hcd_resume_root_hub() > > > > for > > > > port > > > > resume events. > > > > > > > > Alan Stern > > > > > > > > > > Hongyu Xie > >
Hi Duanchenghao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Wednesday, October 09, 2024 8:05 AM > To: Alan Stern <stern@rowland.harvard.edu> > Cc: Hongyu Xie <xy521521@gmail.com>; > gregkh@linuxfoundation.org; linux-kernel@vger.kernel.org; linux- > pm@vger.kernel.org; linux-usb@vger.kernel.org; > niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > Hi Alan, > > These are two patches, each addressing the same issue. The current > patch is a direct solution to the problem of the interrupt bottom half > being frozen. The patch you replied with is another, alternative > solution to the same problem. Please review which solution is more > suitable, or if there are any other revised proposals. > > > Please review the patch I mentioned: > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f > 05539a.camel@kylinos.cn/ > I was able to test 1500 S4 cycles in my system with your patch and all cycles passed with your patch. I was facing an issue similar to what Kai-Heng was observing in the below mail thread: https://lore.kernel.org/lkml/b8553def-19ea-41d5-b665-4859ddb7b6d5@redhat.com/T/ He was observing issues due to USB touchscreen. And my issue was due to Bluetooth USB controller. But, xHCI error logs were exactly same. It looks like your patch is fixing both the issues. It would be nice to see this patch land upstream. Thanks, Saranya > Duanchenghao > > 在 2024-09-24星期二的 09:38 -0400,Alan Stern写道: > > On Mon, Sep 23, 2024 at 04:00:35PM +0800, duanchenghao wrote: > > > Hi Alan, > > > > > > Do you think this plan is feasible, or is there any unclear part in > > > my > > > description that needs to be supplemented? > > > > I apologize for not getting back to you earlier -- I've been > > incredibly > > busy during the last few weeks. > > > > I still haven't had time to go over this throroughly. If I don't > > respond by the end of this week, remind me again. > > > > Alan Stern > > > > > duanchenghao > > > > > > > > > 在 2024-09-14星期六的 10:43 +0800,Hongyu Xie写道: > > > > From: Hongyu Xie <xiehongyu1@kylinos.cn> > > > > > > > > > > > > Hi Alan, > > > > On 2024/9/12 23:00, Alan Stern wrote: > > > > > On Thu, Sep 12, 2024 at 11:21:26AM +0800, duanchenghao > wrote: > > > > > > 在 2024-09-11星期三的 10:40 -0400,Alan Stern写道: > > > > > > > On Tue, Sep 10, 2024 at 05:36:56PM +0800, > duanchenghao > > > > > > > wrote: > > > > > > > > S4 wakeup restores the image that was saved before the > > > > > > > > system > > > > > > > > entered > > > > > > > > the S4 sleep state. > > > > > > > > > > > > > > > > S4 waking up from hibernation > > > > > > > > ============================= > > > > > > > > kernel initialization > > > > > > > > | > > > > > > > > v > > > > > > > > freeze user task and kernel thread > > > > > > > > | > > > > > > > > v > > > > > > > > load saved image > > > > > > > > | > > > > > > > > v > > > > > > > > freeze the peripheral device and controller > > > > > > > > (Check the HCD_FLAG_WAKEUP_ PENDING flag of > the USB. > > > > > > > > If > > > > > > > > it is > > > > > > > > set, > > > > > > > > return to EBUSY and do not perform the following > > > > > > > > restore > > > > > > > > image.) > > > > > > > > > > > > > > Why is the flag set at this point? It should not be; the > > > > > > > device and > > > > > > > controller should have been frozen with wakeup disabled. > > > > > > > > > > > > > This is check point, not set point. > > > > > > > > > > Yes, I know that. But when the flag was checked, why did the > > > > > code > > > > > find > > > > > that it was set? The flag should have been clear. > > > > Maybe duanchenghao means this, > > > > freeze_kernel_threads > > > > load_image_and_restore > > > > suspend roothub > > > > interrupt occurred > > > > usb_hcd_resume_root_hub > > > > set > > > > HCD_FLAG_WAKEUP_PENDING > > > > queue_work // freezed > > > > suspend pci > > > > return -EBUSY because HCD_FLAG_WAKEUP_PENDING > > > > > > > > So s4 resume failed. > > > > > > > > > > > > Is your problem related to the one discussed in this email > > > > > > > thread? > > > > > > > > > > > > > > https://lore.kernel.org/linux-usb/d8600868-6e4b-45ab- > b328-852b6ac8ecb5@rowland.harvard.edu/ > > > > > > > > > > > > > > Would the suggestion I made there -- i.e., have the xhci- > > > > > > > hcd > > > > > > > interrupt handler skip calling usb_hcd_resume_root_hub() > if > > > > > > > the > > > > > > > root > > > > > > > hub > > > > > > > was suspended with wakeup = 0 -- fix your problem? > > > > > > > > > > > > Skipping usb_hcd_resume_root_hub() should generally be > > > > > > possible, > > > > > > but > > > > > > it's important to ensure that normal remote wakeup > > > > > > functionality > > > > > > is not > > > > > > compromised. Is it HUB_SUSPEND that the hub you are > referring > > > > > > to > > > > > > is in > > > > > > a suspended state? > > > > > > > > > > I don't understand this question. hub_quiesce() gets called > > > > > with > > > > > HUB_SUSPEND when the hub enters a suspended state. > > > > > > > > > > You are correct about the need for normal remote wakeup to > work > > > > > properly. The interrupt handler should skip calling > > > > > usb_hcd_resume_root_hub() for port connect or disconnect > > > > > changes > > > > > and for > > > > > port overcurrent changes (when the root hub is suspended > with > > > > > wakeup = > > > > > 0). But it should _not_ skip calling > usb_hcd_resume_root_hub() > > > > > for > > > > > port > > > > > resume events. > > > > > > > > > > Alan Stern > > > > > > > > > > > > > Hongyu Xie > > > >
On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > Hi Alan, > > These are two patches, each addressing the same issue. The current > patch is a direct solution to the problem of the interrupt bottom half > being frozen. The patch you replied with is another, alternative > solution to the same problem. Please review which solution is more > suitable, or if there are any other revised proposals. > > > Please review the patch I mentioned: > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ I still think your whole approach is wrong. There is no need to change the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's not the reason for the problem you're seeing. The problem occurs because when suspend_common() in drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling device_may_wakeup(), and the value that function returns is not what we want. The value is based on whether the controller's power/wakeup attribute is set, but we also have to take into account the type of suspend that's happening. Basically, if msg is one of the suspend types for which wakeups should always be disabled, then do_wakeup should be set to false. There isn't a macro to test for these things, but there should be. Something like PMSG_IS_AUTO() in include/linux/pm.h; maybe call it PMSG_NO_WAKEUP(). This macro should return true if the PM_EVENT_FREEZE or PM_EVENT_QUIESCE bits are set in msg.event. Rafael, please correct me if I got any of this wrong. So the right way to fix the problem is to add to pm.h: #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) and in suspend_common(): if (PMSG_IS_AUTO(msg)) do_wakeup = true; else if (PMSG_NO_WAKEUP(msg)) do_wakeup = false; else do_wakeup = device_may_wakeup(dev); Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() tests in the following code will be executed, so the routine won't fail during the freeze or restore phase with -EBUSY. You'll also have to define an hcd_pci_freeze() routine, just like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops should be changed to hcd_pci_freeze. In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c could also use the new macro, instead of checking for FREEZE or QUIESCE directly the way it does now. Alan Stern
Hi Alan, 在 2024-10-09星期三的 11:50 -0400,Alan Stern写道: > On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > > Hi Alan, > > > > These are two patches, each addressing the same issue. The current > > patch is a direct solution to the problem of the interrupt bottom > > half > > being frozen. The patch you replied with is another, alternative > > solution to the same problem. Please review which solution is more > > suitable, or if there are any other revised proposals. > > > > > > Please review the patch I mentioned: > > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ > > I still think your whole approach is wrong. There is no need to > change > the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; that's > not > the reason for the problem you're seeing. > Thank you very much for your evaluation of the scheme. I have a question regarding why the set_bit operation for the HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an interrupt handler, while the clear_bit operation is done in the bottom half. This seems to contradict conventional practices. The issue is not limited to S4; if other processes freeze the work queue in the bottom half, the same problem may arise. The solution you described below should be able to resolve the current S4 issue, but for now, we haven't identified any other scenarios that require the same operation. Based on my understanding, the USB device is woken up in the bottom half of the interrupt, and both the set_bit and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be executed within the same thread in the bottom half. May I ask if there are any other reasons why the set_bit is executed in the top half? > The problem occurs because when suspend_common() in > drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling > device_may_wakeup(), and the value that function returns is not what > we > want. The value is based on whether the controller's power/wakeup > attribute is set, but we also have to take into account the type of > suspend that's happening. > > Basically, if msg is one of the suspend types for which wakeups > should > always be disabled, then do_wakeup should be set to false. There > isn't > a macro to test for these things, but there should be. Something > like > PMSG_IS_AUTO() in include/linux/pm.h; maybe call it > PMSG_NO_WAKEUP(). > This macro should return true if the PM_EVENT_FREEZE or > PM_EVENT_QUIESCE > bits are set in msg.event. > > Rafael, please correct me if I got any of this wrong. > > So the right way to fix the problem is to add to pm.h: > > #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ > (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) > > and in suspend_common(): > > if (PMSG_IS_AUTO(msg)) > do_wakeup = true; > else if (PMSG_NO_WAKEUP(msg)) > do_wakeup = false; > else > do_wakeup = device_may_wakeup(dev); > > Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() > tests > in the following code will be executed, so the routine won't fail > during > the freeze or restore phase with -EBUSY. > > You'll also have to define an hcd_pci_freeze() routine, just > like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of > PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops should > be > changed to hcd_pci_freeze. > > In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c > could also use the new macro, instead of checking for FREEZE or > QUIESCE > directly the way it does now. > > Alan Stern
Hi Alan, 在 2024-10-09星期三的 11:50 -0400,Alan Stern写道: > > On Wed, Oct 09, 2024 at 10:35:05AM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > > > > > These are two patches, each addressing the same issue. The > > > > current > > > > patch is a direct solution to the problem of the interrupt > > > > bottom > > > > half > > > > being frozen. The patch you replied with is another, > > > > alternative > > > > solution to the same problem. Please review which solution is > > > > more > > > > suitable, or if there are any other revised proposals. > > > > > > > > > > > > Please review the patch I mentioned: > > > > https://lore.kernel.org/all/0a4dc46ae767c28dd207ae29511ede747f05539a.camel@kylinos.cn/ > > > > I still think your whole approach is wrong. There is no need to > > change > > the way the HCD_FLAG_WAKEUP_PENDING flag gets set or cleared; > > that's > > not > > the reason for the problem you're seeing. > > Thank you very much for your evaluation of the scheme. I have a question regarding why the set_bit operation for the HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an interrupt handler, while the clear_bit operation is done in the bottom half. This seems to contradict conventional practices. The issue is not limited to S4; if other processes freeze the work queue in the bottom half, the same problem may arise. The solution you described below should be able to resolve the current S4 issue, but for now, we haven't identified any other scenarios that require the same operation. Based on my understanding, the USB device is woken up in the bottom half of the interrupt, and both the set_bit and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be executed within the same thread in the bottom half. May I ask if there are any other reasons why the set_bit is executed in the top half? Thanks Duanchenghao > > The problem occurs because when suspend_common() in > > drivers/usb/core/hcd-pci.c sets do_wakeup, it does so by calling > > device_may_wakeup(), and the value that function returns is not > > what > > we > > want. The value is based on whether the controller's power/wakeup > > attribute is set, but we also have to take into account the type of > > suspend that's happening. > > > > Basically, if msg is one of the suspend types for which wakeups > > should > > always be disabled, then do_wakeup should be set to false. There > > isn't > > a macro to test for these things, but there should be. Something > > like > > PMSG_IS_AUTO() in include/linux/pm.h; maybe call it > > PMSG_NO_WAKEUP(). > > This macro should return true if the PM_EVENT_FREEZE or > > PM_EVENT_QUIESCE > > bits are set in msg.event. > > > > Rafael, please correct me if I got any of this wrong. > > > > So the right way to fix the problem is to add to pm.h: > > > > #define PMSG_NO_WAKEUP(msg) (((msg.event) & \ > > (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) > > > > and in suspend_common(): > > > > if (PMSG_IS_AUTO(msg)) > > do_wakeup = true; > > else if (PMSG_NO_WAKEUP(msg)) > > do_wakeup = false; > > else > > do_wakeup = device_may_wakeup(dev); > > > > Then with do_wakeup set to false, none of the HCD_WAKEUP_PENDING() > > tests > > in the following code will be executed, so the routine won't fail > > during > > the freeze or restore phase with -EBUSY. > > > > You'll also have to define an hcd_pci_freeze() routine, just > > like hcd_pci_suspend() except that it uses PMSG_FREEZE instead of > > PMSG_SUSPEND. And the .freeze callback in usb_hcd_pci_pm_ops > > should > > be > > changed to hcd_pci_freeze. > > > > In fact, it looks like choose_wakeup() in drivers/usb/core/driver.c > > could also use the new macro, instead of checking for FREEZE or > > QUIESCE > > directly the way it does now. > > > > Alan Stern
On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote: > Hi Alan, > Thank you very much for your evaluation of the scheme. I have a > question regarding why the set_bit operation for the > HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an > interrupt handler, while the clear_bit operation is done in the bottom > half. This seems to contradict conventional practices. The issue is not > limited to S4; if other processes freeze the work queue in the bottom > half, the same problem may arise. The flag is treated this way because that's what it means: A wakeup is pending. The kernel first learns about the wakeup when it receives the wakeup interrupt from the host controller, so that's when it sets the flag -- in the top half of the interrupt handler. The wakeup procedure isn't complete until the root hub has been resumed, so the flag remains set until that resume is finished, in the bottom half. You say "the same problem may arise", but I don't think it is a problem. If the system is about to suspend the host controller with wakeups enabled, and a wakeup request has already been received but the system has not yet finished acting on it, then the suspend _should_ fail. After all, if the wakeup interrupt had arrived just after the host controller was suspended rather than just before, it would have caused the host controller to be resumed right away -- with exactly the same effect as if the controller had never been suspended in the first place. > The solution you described below should be able to resolve the current > S4 issue, but for now, we haven't identified any other scenarios that > require the same operation. Perhaps because there aren't any. > Based on my understanding, the USB device > is woken up in the bottom half of the interrupt, You are failing to distinguish between the host controller and the root hub. The host controller (which is a PCI device on your system, not a USB device) is woken up in the top half of the interrupt handler. The root hub (which is a virtual USB device) is woken up in the bottom half. Both operations have to occur before the wakeup can be considered fully complete. > and both the set_bit > and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag should be > executed within the same thread in the bottom half. May I ask if there > are any other reasons why the set_bit is executed in the top half? Because the root hub's wakeup becomes pending when the host controller is resumed, in the top half. Alan Stern
在 2024-10-10星期四的 10:21 -0400,Alan Stern写道: > On Thu, Oct 10, 2024 at 01:59:08PM +0800, duanchenghao wrote: > > Hi Alan, > > > Thank you very much for your evaluation of the scheme. I have a > > question regarding why the set_bit operation for the > > HCD_FLAG_WAKEUP_PENDING flag is performed in the top half of an > > interrupt handler, while the clear_bit operation is done in the > > bottom > > half. This seems to contradict conventional practices. The issue is > > not > > limited to S4; if other processes freeze the work queue in the > > bottom > > half, the same problem may arise. > > The flag is treated this way because that's what it means: A wakeup > is > pending. The kernel first learns about the wakeup when it receives > the > wakeup interrupt from the host controller, so that's when it sets the > flag -- in the top half of the interrupt handler. The wakeup > procedure > isn't complete until the root hub has been resumed, so the flag > remains > set until that resume is finished, in the bottom half. > > You say "the same problem may arise", but I don't think it is a > problem. > If the system is about to suspend the host controller with wakeups > enabled, and a wakeup request has already been received but the > system > has not yet finished acting on it, then the suspend _should_ fail. > After all, if the wakeup interrupt had arrived just after the host > controller was suspended rather than just before, it would have > caused > the host controller to be resumed right away -- with exactly the same > effect as if the controller had never been suspended in the first > place. > > > The solution you described below should be able to resolve the > > current > > S4 issue, but for now, we haven't identified any other scenarios > > that > > require the same operation. > > Perhaps because there aren't any. > > > Based on my understanding, the USB device > > is woken up in the bottom half of the interrupt, > > You are failing to distinguish between the host controller and the > root > hub. The host controller (which is a PCI device on your system, not > a > USB device) is woken up in the top half of the interrupt handler. > The > root hub (which is a virtual USB device) is woken up in the bottom > half. > Both operations have to occur before the wakeup can be considered > fully > complete. > > > and both the set_bit > > and clear_bit operations for the HCD_FLAG_WAKEUP_PENDING flag > > should be > > executed within the same thread in the bottom half. May I ask if > > there > > are any other reasons why the set_bit is executed in the top half? > > Because the root hub's wakeup becomes pending when the host > controller > is resumed, in the top half. > > Alan Stern > Hi Alan, I roughly understand now. In your previous email, you mentioned assigning a value to do_wakeup based on the judgment of PMSG in suspend_common, but there is no parameter passing of PMSG in suspend_common. Do you mean using the global parameter pm_transition.event for the judgment? Thanks Duanchenghao
On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote: > Hi Alan, > > I roughly understand now. > > In your previous email, you mentioned assigning a value to do_wakeup > based on the judgment of PMSG in suspend_common, but there is no > parameter passing of PMSG in suspend_common. In my kernel tree, the first line of code in suspend_common() (following all the variable definitions) is this: do_wakeup = PMSG_IS_AUTO(msg) ? true : device_may_wakeup(dev); That's what I was talking about. > Do you mean using the global parameter pm_transition.event for the > judgment? No, I meant what I wrote. Alan Stern
Hi Alan, The V3 patch has been sent. Please review it to check if it aligns with the solution you described. Thanks Duan Chenghao 在 2024-10-11星期五的 09:53 -0400,Alan Stern写道: > On Fri, Oct 11, 2024 at 09:42:11AM +0800, duanchenghao wrote: > > Hi Alan, > > > > I roughly understand now. > > > > In your previous email, you mentioned assigning a value to > > do_wakeup > > based on the judgment of PMSG in suspend_common, but there is no > > parameter passing of PMSG in suspend_common. > > In my kernel tree, the first line of code in suspend_common() > (following > all the variable definitions) is this: > > do_wakeup = PMSG_IS_AUTO(msg) ? true : > device_may_wakeup(dev); > > That's what I was talking about. > > > Do you mean using the global parameter pm_transition.event for the > > judgment? > > No, I meant what I wrote. > > Alan Stern
On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote: > When a device is inserted into the USB port and an S4 wakeup is initiated, > after the USB-hub initialization is completed, it will automatically enter > suspend mode. Upon detecting a device on the USB port, it will proceed with > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4 > wakeup process, peripherals are put into suspend mode, followed by task > recovery. However, upon detecting that the hcd is in the > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the > S4 suspend to fail and subsequent task recovery to not proceed. > - > [ 27.594598][ 1] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 returns -16 > [ 27.594601][ 1] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 returns -16 > [ 27.603420][ 1] ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 returned 0 after 3 usecs > [ 27.612233][ 1] ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 returned -16 after 17223 usecs > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce async: error -16 > [ 27.816988][ 1] PM: quiesce of devices aborted after 1833.282 msecs > [ 27.823302][ 1] PM: start quiesce of devices aborted after 1839.975 msecs > ...... > [ 31.303172][ 1] PM: recover of devices complete after 3473.039 msecs > [ 31.309818][ 1] PM: Failed to load hibernation image, recovering. > [ 31.348188][ 1] PM: Basic memory bitmaps freed > [ 31.352686][ 1] OOM killer enabled. > [ 31.356232][ 1] Restarting tasks ... done. > [ 31.360609][ 1] PM: resume from hibernation failed (0) > [ 31.365800][ 1] PM: Hibernation image not present or could not be loaded. > > The "do_wakeup" is determined based on whether the controller's > power/wakeup attribute is set. The current issue necessitates considering > the type of suspend that is occurring. If the suspend type is either > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set to > false. > > Co-authored-by: Alan Stern <stern@rowland.harvard.edu> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > --- > drivers/usb/core/hcd-pci.c | 14 ++++++++++++-- > include/linux/pm.h | 3 ++- > 2 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index a08f3f228e6d..2c7d4446b82e 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -422,7 +422,12 @@ static int suspend_common(struct device *dev, pm_message_t msg) > bool do_wakeup; > int retval; > > - do_wakeup = PMSG_IS_AUTO(msg) ? true : device_may_wakeup(dev); > + if (PMSG_IS_AUTO(msg)) > + do_wakeup = true; > + else if (PMSG_NO_WAKEUP(msg)) > + do_wakeup = false; > + else > + do_wakeup = device_may_wakeup(dev); > > /* Root hub suspend should have stopped all downstream traffic, > * and all bus master traffic. And done so for both the interface > @@ -521,6 +526,11 @@ static int hcd_pci_suspend(struct device *dev) > return suspend_common(dev, PMSG_SUSPEND); > } > > +static int hcd_pci_freeze(struct device *dev) > +{ > + return suspend_common(dev, PMSG_FREEZE); > +} > + > static int hcd_pci_suspend_noirq(struct device *dev) > { > struct pci_dev *pci_dev = to_pci_dev(dev); > @@ -624,7 +634,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { > .suspend_noirq = hcd_pci_suspend_noirq, > .resume_noirq = hcd_pci_resume_noirq, > .resume = hcd_pci_resume, > - .freeze = hcd_pci_suspend, > + .freeze = hcd_pci_freeze, > .freeze_noirq = check_root_hub_suspended, > .thaw_noirq = NULL, > .thaw = hcd_pci_resume, > diff --git a/include/linux/pm.h b/include/linux/pm.h > index 97b0e23363c8..0dd7f559188b 100644 > --- a/include/linux/pm.h > +++ b/include/linux/pm.h > @@ -570,7 +570,8 @@ const struct dev_pm_ops name = { \ > { .event = PM_EVENT_AUTO_RESUME, }) > > #define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0) > - > +#define PMSG_NO_WAKEUP(msg) (((msg.event) & \ > + (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) != 0) > /* > * Device run-time power management status. > * > -- > 2.34.1 > > Hi, This is the friendly patch-bot of Greg Kroah-Hartman. You have sent him a patch that has triggered this response. He used to manually respond to these common problems, but in order to save his sanity (he kept writing the same thing over and over, yet to different people), I was created. Hopefully you will not take offence and will fix the problem in your patch and resubmit it so that it can be accepted into the Linux kernel tree. You are receiving this message because of the following common error(s) as indicated below: - This looks like a new version of a previously submitted patch, but you did not list below the --- line any changes from the previous version. Please read the section entitled "The canonical patch format" in the kernel file, Documentation/process/submitting-patches.rst for what needs to be done here to properly describe this. If you wish to discuss this problem further, or you have questions about how to resolve this issue, please feel free to respond to this email and Greg will reply once he has dug out from the pending patches received from other developers. thanks, greg k-h's patch email bot
On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote: > Co-authored-by: Alan Stern <stern@rowland.harvard.edu> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> That's not how to use a co-authored-by tag, please read the documentation for how to do so properly. thanks, greg k-h
On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > Hi Alan, > > The V3 patch has been sent. Please review it to check if it aligns with > the solution you described. Yes, that is what I meant. Have you and all the other people at kylinos.cn tested the patch to make sure that it fixes the problem? Alan Stern
On Sat, Oct 12, 2024 at 02:40:01PM +0200, Greg KH wrote: > On Sat, Oct 12, 2024 at 05:46:33PM +0800, Duan Chenghao wrote: > > Co-authored-by: Alan Stern <stern@rowland.harvard.edu> > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > That's not how to use a co-authored-by tag, please read the > documentation for how to do so properly. Duan, you can add my Signed-off-by: when you resubmit the patch. Alan Stern
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > The V3 patch has been sent. Please review it to check if it aligns > > with > > the solution you described. > > Yes, that is what I meant. > > Have you and all the other people at kylinos.cn tested the patch to > make > sure that it fixes the problem? > Yes, I'm currently arranging for the testing to be conducted. Since the S4 test takes time, the test results will be released later. Based on my understanding, this patch should be able to solve the problem, but I will rely solely on the actual test results for final confirmation. > Alan Stern Duan Chenghao
在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: Hi Saranya, > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > Hi Alan, > > > > The V3 patch has been sent. Please review it to check if it aligns > > with > > the solution you described. > > Yes, that is what I meant. > > Have you and all the other people at kylinos.cn tested the patch to > make > sure that it fixes the problem? > > Alan Stern If you have time, you can arrange to test your issue using the V3 version. This way, we can jointly verify whether the problem has been resolved. Thanks Duan Chenghao
Hi Duan Changhao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Monday, October 14, 2024 7:11 AM > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > <saranya.gopal@intel.com> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > Hi Saranya, > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > Hi Alan, > > > > > > The V3 patch has been sent. Please review it to check if it aligns > > > with > > > the solution you described. > > > > Yes, that is what I meant. > > > > Have you and all the other people at kylinos.cn tested the patch to > > make > > sure that it fixes the problem? > > > > Alan Stern > > If you have time, you can arrange to test your issue using the V3 > version. This way, we can jointly verify whether the problem has been > resolved. > Thanks for your patch. I will arrange a system by tomorrow and start 1500 cycles of S4 with the V3 version of your patch. Please note that it will take two more days from the start of the test to get the results. Thanks, Saranya > Thanks > Duan Chenghao > >
Hi Duan Chenghao, > -----Original Message----- > From: duanchenghao <duanchenghao@kylinos.cn> > Sent: Monday, October 14, 2024 7:11 AM > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > <saranya.gopal@intel.com> > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > <xiehongyu1@kylinos.cn> > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure caused > by USB status when S4 wakes up > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > Hi Saranya, > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > Hi Alan, > > > > > > The V3 patch has been sent. Please review it to check if it aligns > > > with > > > the solution you described. > > > > Yes, that is what I meant. > > > > Have you and all the other people at kylinos.cn tested the patch to > > make > > sure that it fixes the problem? > > > > Alan Stern > > If you have time, you can arrange to test your issue using the V3 > version. This way, we can jointly verify whether the problem has been > resolved. > My testing completed today. I couldn't reproduce the issue when I ran 1500 S4 cycles with v3 version of your patch. The issue was anyway rare before with an occurrence rate of 20/1500 cycles in our system. So, please conclude after verifying from your side also. Thanks, Saranya > Thanks > Duan Chenghao > >
Hi Saranya, 在 2024-10-16星期三的 08:57 +0000,Gopal, Saranya写道: > Hi Duan Chenghao, > > > -----Original Message----- > > From: duanchenghao <duanchenghao@kylinos.cn> > > Sent: Monday, October 14, 2024 7:11 AM > > To: Alan Stern <stern@rowland.harvard.edu>; Gopal, Saranya > > <saranya.gopal@intel.com> > > Cc: Rafael J. Wysocki <rafael@kernel.org>; Hongyu Xie > > <xy521521@gmail.com>; gregkh@linuxfoundation.org; linux- > > kernel@vger.kernel.org; linux-pm@vger.kernel.org; linux- > > usb@vger.kernel.org; niko.mauno@vaisala.com; pavel@ucw.cz; > > stanley_chang@realtek.com; tj@kernel.org; Hongyu Xie > > <xiehongyu1@kylinos.cn> > > Subject: Re: [PATCH] USB: Fix the issue of task recovery failure > > caused > > by USB status when S4 wakes up > > > > 在 2024-10-12星期六的 11:01 -0400,Alan Stern写道: > > Hi Saranya, > > > > > On Sat, Oct 12, 2024 at 05:51:41PM +0800, duanchenghao wrote: > > > > > > > > Hi Alan, > > > > > > > > The V3 patch has been sent. Please review it to check if it > > > > aligns > > > > with > > > > the solution you described. > > > > > > Yes, that is what I meant. > > > > > > Have you and all the other people at kylinos.cn tested the patch > > > to > > > make > > > sure that it fixes the problem? > > > > > > Alan Stern > > > > If you have time, you can arrange to test your issue using the V3 > > version. This way, we can jointly verify whether the problem has > > been > > resolved. > > > My testing completed today. > I couldn't reproduce the issue when I ran 1500 S4 cycles with v3 > version of your patch. > The issue was anyway rare before with an occurrence rate of 20/1500 > cycles in our system. > So, please conclude after verifying from your side also. > Thank you very much for your verification. I am currently conducting parallel testing on multiple machines, and the results should be available soon as well. Thanks, Duan Chenghao > Thanks, > Saranya > > > Thanks > > Duan Chenghao > > > > >
Hi Alan, The current patch has been tested on 6 machines and has passed the tests, with each machine undergoing 500 cycles of testing. Saranya also replied in the email that she conducted 1,500 tests and all were successful. Thanks Duan Chenghao 在 2024-10-24星期四的 10:40 +0800,Duan Chenghao写道: > When a device is inserted into the USB port and an S4 wakeup is > initiated, > after the USB-hub initialization is completed, it will automatically > enter > suspend mode. Upon detecting a device on the USB port, it will > proceed with > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During > the S4 > wakeup process, peripherals are put into suspend mode, followed by > task > recovery. However, upon detecting that the hcd is in the > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, > causing the > S4 suspend to fail and subsequent task recovery to not proceed. > - > [ 27.594598][ 1] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 > returns -16 > [ 27.594601][ 1] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 > returns -16 > [ 27.603420][ 1] ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 > returned 0 after 3 usecs > [ 27.612233][ 1] ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 > returned -16 after 17223 usecs > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce async: > error -16 > [ 27.816988][ 1] PM: quiesce of devices aborted after 1833.282 > msecs > [ 27.823302][ 1] PM: start quiesce of devices aborted after > 1839.975 msecs > ...... > [ 31.303172][ 1] PM: recover of devices complete after 3473.039 > msecs > [ 31.309818][ 1] PM: Failed to load hibernation image, recovering. > [ 31.348188][ 1] PM: Basic memory bitmaps freed > [ 31.352686][ 1] OOM killer enabled. > [ 31.356232][ 1] Restarting tasks ... done. > [ 31.360609][ 1] PM: resume from hibernation failed (0) > [ 31.365800][ 1] PM: Hibernation image not present or could not be > loaded. > > The "do_wakeup" is determined based on whether the controller's > power/wakeup attribute is set. The current issue necessitates > considering > the type of suspend that is occurring. If the suspend type is either > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set > to > false. > > Reported-by: kernel test robot <lkp@intel.com> > Closes: > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > --- > drivers/usb/core/hcd-pci.c | 15 +++++++++++++-- > include/linux/pm.h | 3 ++- > 2 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/drivers/usb/core/hcd-pci.c b/drivers/usb/core/hcd-pci.c > index a08f3f228e6d..390b1225f3cc 100644 > --- a/drivers/usb/core/hcd-pci.c > +++ b/drivers/usb/core/hcd-pci.c > @@ -422,7 +422,12 @@ static int suspend_common(struct device *dev, > pm_message_t msg) > bool do_wakeup; > int retval; > > - do_wakeup = PMSG_IS_AUTO(msg) ? true : > device_may_wakeup(dev); > + if (PMSG_IS_AUTO(msg)) > + do_wakeup = true; > + else if (PMSG_NO_WAKEUP(msg)) > + do_wakeup = false; > + else > + do_wakeup = device_may_wakeup(dev); > > /* Root hub suspend should have stopped all downstream > traffic, > * and all bus master traffic. And done so for both the > interface > @@ -521,6 +526,11 @@ static int hcd_pci_suspend(struct device *dev) > return suspend_common(dev, PMSG_SUSPEND); > } > > +static int hcd_pci_freeze(struct device *dev) > +{ > + return suspend_common(dev, PMSG_FREEZE); > +} > + > static int hcd_pci_suspend_noirq(struct device *dev) > { > struct pci_dev *pci_dev = to_pci_dev(dev); > @@ -590,6 +600,7 @@ static int hcd_pci_restore(struct device *dev) > #else > > #define hcd_pci_suspend NULL > +#define hcd_pci_freeze NULL > #define hcd_pci_suspend_noirq NULL > #define hcd_pci_poweroff_late NULL > #define hcd_pci_resume_noirq NULL > @@ -624,7 +635,7 @@ const struct dev_pm_ops usb_hcd_pci_pm_ops = { > .suspend_noirq = hcd_pci_suspend_noirq, > .resume_noirq = hcd_pci_resume_noirq, > .resume = hcd_pci_resume, > - .freeze = hcd_pci_suspend, > + .freeze = hcd_pci_freeze, > .freeze_noirq = check_root_hub_suspended, > .thaw_noirq = NULL, > .thaw = hcd_pci_resume, > diff --git a/include/linux/pm.h b/include/linux/pm.h > index 97b0e23363c8..2a676b2cb699 100644 > --- a/include/linux/pm.h > +++ b/include/linux/pm.h > @@ -570,7 +570,8 @@ const struct dev_pm_ops name = { \ > { .event = > PM_EVENT_AUTO_RESUME, }) > > #define PMSG_IS_AUTO(msg) (((msg).event & PM_EVENT_AUTO) != 0) > - > +#define PMSG_NO_WAKEUP(msg) (((msg).event & \ > + (PM_EVENT_FREEZE | PM_EVENT_QUIESCE)) > != 0) > /* > * Device run-time power management status. > *
On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote: > When a device is inserted into the USB port and an S4 wakeup is initiated, > after the USB-hub initialization is completed, it will automatically enter > suspend mode. Upon detecting a device on the USB port, it will proceed with > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4 > wakeup process, peripherals are put into suspend mode, followed by task > recovery. However, upon detecting that the hcd is in the > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the > S4 suspend to fail and subsequent task recovery to not proceed. > - > [ 27.594598][ 1] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 returns -16 > [ 27.594601][ 1] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 returns -16 > [ 27.603420][ 1] ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 returned 0 after 3 usecs > [ 27.612233][ 1] ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 returned -16 after 17223 usecs > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce async: error -16 > [ 27.816988][ 1] PM: quiesce of devices aborted after 1833.282 msecs > [ 27.823302][ 1] PM: start quiesce of devices aborted after 1839.975 msecs > ...... > [ 31.303172][ 1] PM: recover of devices complete after 3473.039 msecs > [ 31.309818][ 1] PM: Failed to load hibernation image, recovering. > [ 31.348188][ 1] PM: Basic memory bitmaps freed > [ 31.352686][ 1] OOM killer enabled. > [ 31.356232][ 1] Restarting tasks ... done. > [ 31.360609][ 1] PM: resume from hibernation failed (0) > [ 31.365800][ 1] PM: Hibernation image not present or could not be loaded. > > The "do_wakeup" is determined based on whether the controller's > power/wakeup attribute is set. The current issue necessitates considering > the type of suspend that is occurring. If the suspend type is either > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set to > false. > > Reported-by: kernel test robot <lkp@intel.com> > Closes: https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> What commit id does this fix? And I missed where Alan provided a signed-off-by, where was that? thanks, greg k-h
hi greg k-h, 在 2024-10-24星期四的 09:05 +0200,Greg KH写道: > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote: > > When a device is inserted into the USB port and an S4 wakeup is > > initiated, > > after the USB-hub initialization is completed, it will > > automatically enter > > suspend mode. Upon detecting a device on the USB port, it will > > proceed with > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During > > the S4 > > wakeup process, peripherals are put into suspend mode, followed by > > task > > recovery. However, upon detecting that the hcd is in the > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, > > causing the > > S4 suspend to fail and subsequent task recovery to not proceed. > > - > > [ 27.594598][ 1] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 > > returns -16 > > [ 27.594601][ 1] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 > > returns -16 > > [ 27.603420][ 1] ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 > > returned 0 after 3 usecs > > [ 27.612233][ 1] ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 > > returned -16 after 17223 usecs > > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce > > async: error -16 > > [ 27.816988][ 1] PM: quiesce of devices aborted after 1833.282 > > msecs > > [ 27.823302][ 1] PM: start quiesce of devices aborted after > > 1839.975 msecs > > ...... > > [ 31.303172][ 1] PM: recover of devices complete after 3473.039 > > msecs > > [ 31.309818][ 1] PM: Failed to load hibernation image, > > recovering. > > [ 31.348188][ 1] PM: Basic memory bitmaps freed > > [ 31.352686][ 1] OOM killer enabled. > > [ 31.356232][ 1] Restarting tasks ... done. > > [ 31.360609][ 1] PM: resume from hibernation failed (0) > > [ 31.365800][ 1] PM: Hibernation image not present or could not > > be loaded. > > > > The "do_wakeup" is determined based on whether the controller's > > power/wakeup attribute is set. The current issue necessitates > > considering > > the type of suspend that is occurring. If the suspend type is > > either > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set > > to > > false. > > > > Reported-by: kernel test robot <lkp@intel.com> > > Closes: > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > What commit id does this fix? The current patch is not intended to fix an issue with a specific commit, but rather to address a long-standing problem in the USB core. > > And I missed where Alan provided a signed-off-by, where was that? In the following email, Alan proposed using "Signed-off-by" for signing. https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/ Thanks Duan Chenghao > > thanks, > > greg k-h
On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote: > hi greg k-h, > > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道: > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote: > > > When a device is inserted into the USB port and an S4 wakeup is > > > initiated, > > > after the USB-hub initialization is completed, it will > > > automatically enter > > > suspend mode. Upon detecting a device on the USB port, it will > > > proceed with > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During > > > the S4 > > > wakeup process, peripherals are put into suspend mode, followed by > > > task > > > recovery. However, upon detecting that the hcd is in the > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, > > > causing the > > > S4 suspend to fail and subsequent task recovery to not proceed. > > > - > > > [ 27.594598][ 1] PM: pci_pm_freeze(): hcd_pci_suspend+0x0/0x28 > > > returns -16 > > > [ 27.594601][ 1] PM: dpm_run_callback(): pci_pm_freeze+0x0/0x100 > > > returns -16 > > > [ 27.603420][ 1] ehci-pci 0000:00:04.1: pci_pm_freeze+0x0/0x100 > > > returned 0 after 3 usecs > > > [ 27.612233][ 1] ehci-pci 0000:00:05.1: pci_pm_freeze+0x0/0x100 > > > returned -16 after 17223 usecs > > > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce > > > async: error -16 > > > [ 27.816988][ 1] PM: quiesce of devices aborted after 1833.282 > > > msecs > > > [ 27.823302][ 1] PM: start quiesce of devices aborted after > > > 1839.975 msecs > > > ...... > > > [ 31.303172][ 1] PM: recover of devices complete after 3473.039 > > > msecs > > > [ 31.309818][ 1] PM: Failed to load hibernation image, > > > recovering. > > > [ 31.348188][ 1] PM: Basic memory bitmaps freed > > > [ 31.352686][ 1] OOM killer enabled. > > > [ 31.356232][ 1] Restarting tasks ... done. > > > [ 31.360609][ 1] PM: resume from hibernation failed (0) > > > [ 31.365800][ 1] PM: Hibernation image not present or could not > > > be loaded. > > > > > > The "do_wakeup" is determined based on whether the controller's > > > power/wakeup attribute is set. The current issue necessitates > > > considering > > > the type of suspend that is occurring. If the suspend type is > > > either > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be set > > > to > > > false. > > > > > > Reported-by: kernel test robot <lkp@intel.com> > > > Closes: > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > > > What commit id does this fix? > > The current patch is not intended to fix an issue with a specific > commit, but rather to address a long-standing problem in the USB core. So should it be backported to older stable kernels? If so, how far back? > > And I missed where Alan provided a signed-off-by, where was that? > > In the following email, Alan proposed using "Signed-off-by" for > signing. > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/ Ah, missed that, sorry. thanks, greg k-h
hi greg k-h, 在 2024-10-29星期二的 04:27 +0100,Greg KH写道: > On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote: > > hi greg k-h, > > > > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道: > > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote: > > > > When a device is inserted into the USB port and an S4 wakeup is > > > > initiated, > > > > after the USB-hub initialization is completed, it will > > > > automatically enter > > > > suspend mode. Upon detecting a device on the USB port, it will > > > > proceed with > > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. > > > > During > > > > the S4 > > > > wakeup process, peripherals are put into suspend mode, followed > > > > by > > > > task > > > > recovery. However, upon detecting that the hcd is in the > > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, > > > > causing the > > > > S4 suspend to fail and subsequent task recovery to not proceed. > > > > - > > > > [ 27.594598][ 1] PM: pci_pm_freeze(): > > > > hcd_pci_suspend+0x0/0x28 > > > > returns -16 > > > > [ 27.594601][ 1] PM: dpm_run_callback(): > > > > pci_pm_freeze+0x0/0x100 > > > > returns -16 > > > > [ 27.603420][ 1] ehci-pci 0000:00:04.1: > > > > pci_pm_freeze+0x0/0x100 > > > > returned 0 after 3 usecs > > > > [ 27.612233][ 1] ehci-pci 0000:00:05.1: > > > > pci_pm_freeze+0x0/0x100 > > > > returned -16 after 17223 usecs > > > > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce > > > > async: error -16 > > > > [ 27.816988][ 1] PM: quiesce of devices aborted after > > > > 1833.282 > > > > msecs > > > > [ 27.823302][ 1] PM: start quiesce of devices aborted after > > > > 1839.975 msecs > > > > ...... > > > > [ 31.303172][ 1] PM: recover of devices complete after > > > > 3473.039 > > > > msecs > > > > [ 31.309818][ 1] PM: Failed to load hibernation image, > > > > recovering. > > > > [ 31.348188][ 1] PM: Basic memory bitmaps freed > > > > [ 31.352686][ 1] OOM killer enabled. > > > > [ 31.356232][ 1] Restarting tasks ... done. > > > > [ 31.360609][ 1] PM: resume from hibernation failed (0) > > > > [ 31.365800][ 1] PM: Hibernation image not present or could > > > > not > > > > be loaded. > > > > > > > > The "do_wakeup" is determined based on whether the controller's > > > > power/wakeup attribute is set. The current issue necessitates > > > > considering > > > > the type of suspend that is occurring. If the suspend type is > > > > either > > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should be > > > > set > > > > to > > > > false. > > > > > > > > Reported-by: kernel test robot <lkp@intel.com> > > > > Closes: > > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > > > > > What commit id does this fix? > > > > The current patch is not intended to fix an issue with a specific > > commit, but rather to address a long-standing problem in the USB > > core. > > So should it be backported to older stable kernels? If so, how far > back? yes, It needs to be backported. The stable branches such as 6.6.y, 6.10.y, and 6.11.y can be considered for the backport. Should we backport to these versions? Thanks Duan Chenghao > > > > And I missed where Alan provided a signed-off-by, where was that? > > > > In the following email, Alan proposed using "Signed-off-by" for > > signing. > > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/ > > Ah, missed that, sorry. > > thanks, > > greg k-h
Hi Greg k-h & Alan, Excuse me, both of you. I've noticed that you haven't replied to the emails for quite some time, which prompted me to send this one. I'd like to inquire if the proposal in the current email is feasible and if it can be integrated into the community. Thanks, Duan Chenghao 在 2024-10-29星期二的 16:01 +0800,duanchenghao写道: > hi greg k-h, > > 在 2024-10-29星期二的 04:27 +0100,Greg KH写道: > > On Thu, Oct 24, 2024 at 04:46:48PM +0800, duanchenghao wrote: > > > hi greg k-h, > > > > > > 在 2024-10-24星期四的 09:05 +0200,Greg KH写道: > > > > On Thu, Oct 24, 2024 at 10:40:38AM +0800, Duan Chenghao wrote: > > > > > When a device is inserted into the USB port and an S4 wakeup > > > > > is > > > > > initiated, > > > > > after the USB-hub initialization is completed, it will > > > > > automatically enter > > > > > suspend mode. Upon detecting a device on the USB port, it > > > > > will > > > > > proceed with > > > > > resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. > > > > > During > > > > > the S4 > > > > > wakeup process, peripherals are put into suspend mode, > > > > > followed > > > > > by > > > > > task > > > > > recovery. However, upon detecting that the hcd is in the > > > > > HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY > > > > > status, > > > > > causing the > > > > > S4 suspend to fail and subsequent task recovery to not > > > > > proceed. > > > > > - > > > > > [ 27.594598][ 1] PM: pci_pm_freeze(): > > > > > hcd_pci_suspend+0x0/0x28 > > > > > returns -16 > > > > > [ 27.594601][ 1] PM: dpm_run_callback(): > > > > > pci_pm_freeze+0x0/0x100 > > > > > returns -16 > > > > > [ 27.603420][ 1] ehci-pci 0000:00:04.1: > > > > > pci_pm_freeze+0x0/0x100 > > > > > returned 0 after 3 usecs > > > > > [ 27.612233][ 1] ehci-pci 0000:00:05.1: > > > > > pci_pm_freeze+0x0/0x100 > > > > > returned -16 after 17223 usecs > > > > > [ 27.810067][ 1] PM: Device 0000:00:05.1 failed to quiesce > > > > > async: error -16 > > > > > [ 27.816988][ 1] PM: quiesce of devices aborted after > > > > > 1833.282 > > > > > msecs > > > > > [ 27.823302][ 1] PM: start quiesce of devices aborted > > > > > after > > > > > 1839.975 msecs > > > > > ...... > > > > > [ 31.303172][ 1] PM: recover of devices complete after > > > > > 3473.039 > > > > > msecs > > > > > [ 31.309818][ 1] PM: Failed to load hibernation image, > > > > > recovering. > > > > > [ 31.348188][ 1] PM: Basic memory bitmaps freed > > > > > [ 31.352686][ 1] OOM killer enabled. > > > > > [ 31.356232][ 1] Restarting tasks ... done. > > > > > [ 31.360609][ 1] PM: resume from hibernation failed (0) > > > > > [ 31.365800][ 1] PM: Hibernation image not present or > > > > > could > > > > > not > > > > > be loaded. > > > > > > > > > > The "do_wakeup" is determined based on whether the > > > > > controller's > > > > > power/wakeup attribute is set. The current issue necessitates > > > > > considering > > > > > the type of suspend that is occurring. If the suspend type is > > > > > either > > > > > PM_EVENT_FREEZE or PM_EVENT_QUIESCE, then "do_wakeup" should > > > > > be > > > > > set > > > > > to > > > > > false. > > > > > > > > > > Reported-by: kernel test robot <lkp@intel.com> > > > > > Closes: > > > > > https://lore.kernel.org/oe-kbuild-all/202410151722.rfjtknRz-lkp@intel.com/ > > > > > Signed-off-by: Alan Stern <stern@rowland.harvard.edu> > > > > > Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> > > > > > > > > What commit id does this fix? > > > > > > The current patch is not intended to fix an issue with a specific > > > commit, but rather to address a long-standing problem in the USB > > > core. > > > > So should it be backported to older stable kernels? If so, how far > > back? > > yes, It needs to be backported. The stable branches such as 6.6.y, > 6.10.y, and 6.11.y can be considered for the backport. > > Should we backport to these versions? > > Thanks > > Duan Chenghao > > > > > > And I missed where Alan provided a signed-off-by, where was > > > > that? > > > > > > In the following email, Alan proposed using "Signed-off-by" for > > > signing. > > > https://lore.kernel.org/all/489805e7-c19c-4b57-9cd7-713e075261cd@rowland.harvard.edu/ > > > > Ah, missed that, sorry. > > > > thanks, > > > > greg k-h >
On Wed, Nov 06, 2024 at 09:29:15AM +0800, duanchenghao wrote: > Hi Greg k-h & Alan, > > Excuse me, both of you. I've noticed that you haven't replied to the > emails for quite some time, which prompted me to send this one. I'd > like to inquire if the proposal in the current email is feasible and if > it can be integrated into the community. Sorry, I missed this in the last round of patch reviews. I'll queue it up after 6.13-rc1 is out as the merge window is closed for adding new stuff to my trees. thanks, greg k-h
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c index 1ff7d901fede..a6bd0fbd82f4 100644 --- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -2389,7 +2389,6 @@ void usb_hcd_resume_root_hub (struct usb_hcd *hcd) spin_lock_irqsave (&hcd_root_hub_lock, flags); if (hcd->rh_registered) { pm_wakeup_event(&hcd->self.root_hub->dev, 0); - set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); queue_work(pm_wq, &hcd->wakeup_work); } spin_unlock_irqrestore (&hcd_root_hub_lock, flags); diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index 4b93c0bd1d4b..7f847c4afc0d 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -3835,11 +3835,14 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg) int usb_remote_wakeup(struct usb_device *udev) { + struct usb_hcd *hcd = bus_to_hcd(udev->bus); int status = 0; usb_lock_device(udev); if (udev->state == USB_STATE_SUSPENDED) { dev_dbg(&udev->dev, "usb %sresume\n", "wakeup-"); + if (hcd->state == HC_STATE_SUSPENDED) + set_bit(HCD_FLAG_WAKEUP_PENDING, &hcd->flags); status = usb_autoresume_device(udev); if (status == 0) { /* Let the drivers do their thing, then... */
When a device is inserted into the USB port and an S4 wakeup is initiated, after the USB-hub initialization is completed, it will automatically enter suspend mode. Upon detecting a device on the USB port, it will proceed with resume and set the hcd to the HCD_FLAG_WAKEUP_PENDING state. During the S4 wakeup process, peripherals are put into suspend mode, followed by task recovery. However, upon detecting that the hcd is in the HCD_FLAG_WAKEUP_PENDING state, it will return an EBUSY status, causing the S4 suspend to fail and subsequent task recovery to not proceed. This patch makes two modifications in total: 1. The set_bit and clean_bit operations for the HCD_FLAG_WAKEUP_PENDING flag of Hcd, which were previously split between the top half and bottom half of the interrupt, are now unified and executed solely in the bottom half of the interrupt. This prevents the bottom half tasks from being frozen during the S4 process, ensuring that the clean_bit process can proceed without interruption. 2. Add a condition to the set_bit operation for the hcd status HCD_FLAG_WAKEUP_PENDING. When the hcd status is HC_STATE_SUSPENDED, perform the setting of the aforementioned status bit. This prevents a subsequent set_bit from occurring after the clean_bit if the hcd is in the resuming process. Signed-off-by: Duan Chenghao <duanchenghao@kylinos.cn> --- drivers/usb/core/hcd.c | 1 - drivers/usb/core/hub.c | 3 +++ 2 files changed, 3 insertions(+), 1 deletion(-)