diff mbox series

[v2] xhci: fix runtime PM imbalance in USB2 resume

Message ID 20220225071506.22012-1-henryl@nvidia.com
State Superseded
Headers show
Series [v2] xhci: fix runtime PM imbalance in USB2 resume | expand

Commit Message

Henry Lin Feb. 25, 2022, 7:15 a.m. UTC
USB2 resume starts with usb_hcd_start_port_resume() in port status
change handling for RESUME link state. usb_hcd_end_port_resume() call is
needed to keep runtime PM balance.

Fixes: a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
Signed-off-by: Henry Lin <henryl@nvidia.com>
---
 drivers/usb/host/xhci-hub.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Mathias Nyman Feb. 28, 2022, 10:56 a.m. UTC | #1
On 25.2.2022 9.15, Henry Lin wrote:
> USB2 resume starts with usb_hcd_start_port_resume() in port status
> change handling for RESUME link state. usb_hcd_end_port_resume() call is
> needed to keep runtime PM balance.

For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume
has been signaled for long enough in xhci_handle_usb2_port_link_resume().
 
This is also where driver directs the port to go from Resume state to U0.
Port can't do this without driver directing it.

If there's a failure during resume signaling (disconnect, reset, error) then
stale resume variables are detected in xhci_get_port_status() and 
usb_hcd_end_port_resume() is called.

I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status") 
does change order of checking and clearing stale resume variables, but this should
only happen if the first port state we read is a fully enabled functional U0 state after
a failed resume.

Could you expand a bit how this was detected? 

> 
> Fixes: a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
> Signed-off-by: Henry Lin <henryl@nvidia.com>
> ---
>  drivers/usb/host/xhci-hub.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
> index df3522dab31b..4a8b07b8ee01 100644
> --- a/drivers/usb/host/xhci-hub.c
> +++ b/drivers/usb/host/xhci-hub.c
> @@ -1090,6 +1090,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
>  		if (link_state == XDEV_U0) {
>  			bus_state->resume_done[portnum] = 0;
>  			clear_bit(portnum, &bus_state->resuming_ports);
> +			usb_hcd_end_port_resume(&port->rhub->hcd->self,
> +						portnum);

This will call usb_hcd_end_port_resume() every time port is in normal enabled U0 state
even if resume was never signaled, or port suspended.

-Mathias
Henry Lin March 1, 2022, 10:28 a.m. UTC | #2
>> USB2 resume starts with usb_hcd_start_port_resume() in port status
>> change handling for RESUME link state. usb_hcd_end_port_resume() call is
>> needed to keep runtime PM balance.

> For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume
> has been signaled for long enough in xhci_handle_usb2_port_link_resume().
>
> This is also where driver directs the port to go from Resume state to U0.
> Port can't do this without driver directing it.
>
> If there's a failure during resume signaling (disconnect, reset, error) then
> stale resume variables are detected in xhci_get_port_status() and
> usb_hcd_end_port_resume() is called.

> I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
> does change order of checking and clearing stale resume variables, but this should
> only happen if the first port state we read is a fully enabled functional U0 state after
> a failed resume.

> Could you expand a bit how this was detected?
We observed the racing issue when usb2 device-initiated resume occurs in system resume.
If usb2 host-initiated resume for system resume directs U0 before xhci_get_usb2_port_status()
see RESUME state, xhci_get_usb2_port_status() will not finish resume process in
xhci_handle_usb2_port_link_resume(). Its scenario is as follows:

1. System resume starts. All driver system resume callbacks get called in order. XHCI controller
    is resumed by xhci_resume().
2. USB2 root hub is resuming. hcd_bus_resume() is being executed.
3. Before xhci_bus_resume() is finished, XHCI driver receives a port status change event for 
    an USB2 port with RESUME link state in xhci_irq(). XHCI driver starts the process to resume
    HS port for device-initiated resume.
4. In xhci_bus_resume(), host-initiated resume (direct U0) is performed on the same port that is
    resuming in step 3 in below loop:

                if (bus_state->bus_suspended) {
                        spin_unlock_irqrestore(&xhci->lock, flags);
                        msleep(USB_RESUME_TIMEOUT);
                        spin_lock_irqsave(&xhci->lock, flags);
                }
                for_each_set_bit(port_index, &bus_state->bus_suspended,
                                 BITS_PER_LONG) {
                        /* Clear PLC to poll it later for U0 transition */
                        xhci_test_and_clear_bit(xhci, ports[port_index],
                                                PORT_PLC);
                        xhci_set_link_state(xhci, ports[port_index], XDEV_U0);
                }
5. Then, link state of the resuming port is observed as U0 in following
    xhci_get_usb2_port_status(). xhci_handle_usb2_port_link_resume() has
    no chance to get called on the resuming port.

Thanks,
Henry
Mathias Nyman March 1, 2022, 3:49 p.m. UTC | #3
On 1.3.2022 12.28, Henry Lin wrote:
>>> USB2 resume starts with usb_hcd_start_port_resume() in port status
>>> change handling for RESUME link state. usb_hcd_end_port_resume() call is
>>> needed to keep runtime PM balance.
> 
>> For normal usb2 port resume the usb_hcd_end_port_resume() is called when resume
>> has been signaled for long enough in xhci_handle_usb2_port_link_resume().
>>
>> This is also where driver directs the port to go from Resume state to U0.
>> Port can't do this without driver directing it.
>>
>> If there's a failure during resume signaling (disconnect, reset, error) then
>> stale resume variables are detected in xhci_get_port_status() and
>> usb_hcd_end_port_resume() is called.
> 
>> I do now see a231ec41e6f6 ("xhci: refactor U0 link state handling in get_port_status")
>> does change order of checking and clearing stale resume variables, but this should
>> only happen if the first port state we read is a fully enabled functional U0 state after
>> a failed resume.
> 
>> Could you expand a bit how this was detected?
> We observed the racing issue when usb2 device-initiated resume occurs in system resume.
> If usb2 host-initiated resume for system resume directs U0 before xhci_get_usb2_port_status()
> see RESUME state, xhci_get_usb2_port_status() will not finish resume process in
> xhci_handle_usb2_port_link_resume(). Its scenario is as follows:
> 
> 1. System resume starts. All driver system resume callbacks get called in order. XHCI controller
>     is resumed by xhci_resume().
> 2. USB2 root hub is resuming. hcd_bus_resume() is being executed.
> 3. Before xhci_bus_resume() is finished, XHCI driver receives a port status change event for 
>     an USB2 port with RESUME link state in xhci_irq(). XHCI driver starts the process to resume
>     HS port for device-initiated resume.
> 4. In xhci_bus_resume(), host-initiated resume (direct U0) is performed on the same port that is
>     resuming in step 3 in below loop:
> 
>                 if (bus_state->bus_suspended) {
>                         spin_unlock_irqrestore(&xhci->lock, flags);
>                         msleep(USB_RESUME_TIMEOUT);
>                         spin_lock_irqsave(&xhci->lock, flags);
>                 }
>                 for_each_set_bit(port_index, &bus_state->bus_suspended,
>                                  BITS_PER_LONG) {
>                         /* Clear PLC to poll it later for U0 transition */
>                         xhci_test_and_clear_bit(xhci, ports[port_index],
>                                                 PORT_PLC);
>                         xhci_set_link_state(xhci, ports[port_index], XDEV_U0);
>                 }
> 5. Then, link state of the resuming port is observed as U0 in following
>     xhci_get_usb2_port_status(). xhci_handle_usb2_port_link_resume() has
>     no chance to get called on the resuming port.
> 

True, thanks for the explanation.
If there's a race between system resume and device-initiated resume, and port is resumed
in xhci_bus_resume() then yes I see how this could happen.

Maybe only call usb_hcd_end_port_resume() if xhci_irq() detected the device-initiated
resume. i.e. set a value to resume_done[portnum] and called usb_hcd_start_port_resume()
something like: 

@@ -1088,6 +1088,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
                if (link_state == XDEV_U2)
                        *status |= USB_PORT_STAT_L1;
                if (link_state == XDEV_U0) {
+                       if (bus_state->resume_done[portnum])
+                               usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum);
                        bus_state->resume_done[portnum] = 0;
                        clear_bit(portnum, &bus_state->resuming_ports);

Also xhci_bus_resume() only resumes ports that were forcefully suspended to U3 in xhci_bus_suspend().
Could be worth checking why that device wasn't already in U3 when suspend reached xhci_bus_suspend().

Thanks
Mathias
Henry Lin March 1, 2022, 7:18 p.m. UTC | #4
> Maybe only call usb_hcd_end_port_resume() if xhci_irq() detected the device-initiated
> resume. i.e. set a value to resume_done[portnum] and called usb_hcd_start_port_resume()
> something like:
>
> @@ -1088,6 +1088,8 @@ static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
>                 if (link_state == XDEV_U2)
>                         *status |= USB_PORT_STAT_L1;
>                 if (link_state == XDEV_U0) {
> +                       if (bus_state->resume_done[portnum])
> +                               usb_hcd_end_port_resume(&port->rhub->hcd->self, portnum);
>                         bus_state->resume_done[portnum] = 0;
>                         clear_bit(portnum, &bus_state->resuming_ports);
This looks good. I can submit a new version based on this.

> Also xhci_bus_resume() only resumes ports that were forcefully suspended to U3 in xhci_bus_suspend().
> Could be worth checking why that device wasn't already in U3 when suspend reached xhci_bus_suspend().
It happens when we trigger USB device-initiated resume to bring system out of system suspend state.
For example, we can connect an USB2 external hub on root port and put system into suspend state. The USB2
external hub (the port it connects to) is in U3 after xhci_bus_suspend(). Once we connect a USB device
to downstream port of the USB2 external hub, the hub will trigger device initiated resume to wake up the
system. During system resume, XHCI controller will report the USB2 external hub is in RESUME state.

Thanks,
Henry
diff mbox series

Patch

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index df3522dab31b..4a8b07b8ee01 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -1090,6 +1090,8 @@  static void xhci_get_usb2_port_status(struct xhci_port *port, u32 *status,
 		if (link_state == XDEV_U0) {
 			bus_state->resume_done[portnum] = 0;
 			clear_bit(portnum, &bus_state->resuming_ports);
+			usb_hcd_end_port_resume(&port->rhub->hcd->self,
+						portnum);
 			if (bus_state->suspended_ports & (1 << portnum)) {
 				bus_state->suspended_ports &= ~(1 << portnum);
 				bus_state->port_c_suspend |= 1 << portnum;