diff mbox series

usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links

Message ID 20241022123742.3045707-1-mathias.nyman@linux.intel.com
State New
Headers show
Series usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links | expand

Commit Message

Mathias Nyman Oct. 22, 2024, 12:37 p.m. UTC
Fix a boot hang issue triggered when a USB3 device is incorrectly assumed
to be tunneled over USB4, thus attempting to create a device link between
the USB3 "consumer" device and the USB4 "supplier" Host Interface before
the USB4 side is properly bound to a driver.

This could happen if xhci isn't capable of detecting tunneled devices,
but ACPI tables contain all info needed to assume device is tunneled.
i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.

If the USB4 host interface hasn't probed yet, then we know the USB3
device is not in a tunnel created by the USB4 Host Interface driver, so
don't try to create a device link in this case.

cc: Mario Limonciello <mario.limonciello@amd.com>
Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface")
Tested-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
---
 drivers/usb/core/usb-acpi.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Mathias Nyman Oct. 23, 2024, 12:12 p.m. UTC | #1
On 22.10.2024 16.22, Mika Westerberg wrote:
> Hi,
> 
> On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote:
>> Fix a boot hang issue triggered when a USB3 device is incorrectly assumed
>> to be tunneled over USB4, thus attempting to create a device link between
>> the USB3 "consumer" device and the USB4 "supplier" Host Interface before
>> the USB4 side is properly bound to a driver.
>>
>> This could happen if xhci isn't capable of detecting tunneled devices,
>> but ACPI tables contain all info needed to assume device is tunneled.
>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>
>> If the USB4 host interface hasn't probed yet, then we know the USB3
>> device is not in a tunnel created by the USB4 Host Interface driver, so
>> don't try to create a device link in this case.
>>
>> cc: Mario Limonciello <mario.limonciello@amd.com>
>> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface")
>> Tested-by: Harry Wentland <harry.wentland@amd.com>
>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>> ---
>>   drivers/usb/core/usb-acpi.c | 11 +++++++++++
>>   1 file changed, 11 insertions(+)
>>
>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>> index 21585ed89ef8..9e1ec71881db 100644
>> --- a/drivers/usb/core/usb-acpi.c
>> +++ b/drivers/usb/core/usb-acpi.c
>> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>   	if (IS_ERR(nhi_fwnode))
>>   		return 0;
>>   
>> +	/*
>> +	 * If USB4 Host interface driver isn't set up yet we can't be in a USB3
>> +	 * tunnel created by it.
>> +	 */
>> +	if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>> +		dev_info(&port_dev->dev, "%s probed before USB4 host interface\n",
>> +			 dev_name(&port_dev->child->dev));
>> +		udev->tunnel_mode = USB_LINK_NATIVE;
>> +		return 0;
>> +	}
> 
> I think this will not work if you boot with "thunderbolt.host_reset=0"
> and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is
> bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not
> setup the link so after Thunderbolt/USB4 is runtime suspended it might
> not be there to re-create the tunnel before xHCI.
> 
> The test case would be something like this:
> 
> 1. Add "thunderbolt.host_reset=0" in the kernel command line.
> 2. Boot with USB4 device connected (and so that it has USB 3.x device
>     connected such as USB 3 memory stick).
> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend
> freely.
> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel
>     so the xHCI might see disconnect here.
> 
> Also on resume path it will not be recreating the tunnel before xHCI
> because there is no device link. I can try this on my side too if you
> like.
> 

You are right, I was able to reproduce it.

Device link won't be created if BIOS created the tunnel, thunderbolt driver
probes after this usb device is created, and "thunderbolt.host_reset=0" is set.

Turning the device link "stateless" could possible help.
It removes driver presence dependency but keeps correct suspend/resume and
shutdown ordering.
It should also help with boot hang/probe issues seen on the AMD platforms.

Mario, Harry, does the following work for you?

diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
index 21585ed89ef8..03c22114214b 100644
--- a/drivers/usb/core/usb-acpi.c
+++ b/drivers/usb/core/usb-acpi.c
@@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
         struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
                 fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
  
-       if (IS_ERR(nhi_fwnode))
+       if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev)
                 return 0;
  
         link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
-                              DL_FLAG_AUTOREMOVE_CONSUMER |
+                              DL_FLAG_STATELESS |
                                DL_FLAG_RPM_ACTIVE |
                                DL_FLAG_PM_RUNTIME);
         if (!link) {

Thanks
Mathias
Mario Limonciello Oct. 23, 2024, 4:50 p.m. UTC | #2
On 10/23/2024 07:12, Mathias Nyman wrote:
> On 22.10.2024 16.22, Mika Westerberg wrote:
>> Hi,
>>
>> On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote:
>>> Fix a boot hang issue triggered when a USB3 device is incorrectly 
>>> assumed
>>> to be tunneled over USB4, thus attempting to create a device link 
>>> between
>>> the USB3 "consumer" device and the USB4 "supplier" Host Interface before
>>> the USB4 side is properly bound to a driver.
>>>
>>> This could happen if xhci isn't capable of detecting tunneled devices,
>>> but ACPI tables contain all info needed to assume device is tunneled.
>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>>
>>> If the USB4 host interface hasn't probed yet, then we know the USB3
>>> device is not in a tunnel created by the USB4 Host Interface driver, so
>>> don't try to create a device link in this case.
>>>
>>> cc: Mario Limonciello <mario.limonciello@amd.com>
>>> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled 
>>> USB3 device and USB4 Host Interface")
>>> Tested-by: Harry Wentland <harry.wentland@amd.com>
>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>> ---
>>>   drivers/usb/core/usb-acpi.c | 11 +++++++++++
>>>   1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>> index 21585ed89ef8..9e1ec71881db 100644
>>> --- a/drivers/usb/core/usb-acpi.c
>>> +++ b/drivers/usb/core/usb-acpi.c
>>> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct 
>>> usb_device *udev)
>>>       if (IS_ERR(nhi_fwnode))
>>>           return 0;
>>> +    /*
>>> +     * If USB4 Host interface driver isn't set up yet we can't be in 
>>> a USB3
>>> +     * tunnel created by it.
>>> +     */
>>> +    if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>>> +        dev_info(&port_dev->dev, "%s probed before USB4 host 
>>> interface\n",
>>> +             dev_name(&port_dev->child->dev));
>>> +        udev->tunnel_mode = USB_LINK_NATIVE;
>>> +        return 0;
>>> +    }
>>
>> I think this will not work if you boot with "thunderbolt.host_reset=0"
>> and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is
>> bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not
>> setup the link so after Thunderbolt/USB4 is runtime suspended it might
>> not be there to re-create the tunnel before xHCI.
>>
>> The test case would be something like this:
>>
>> 1. Add "thunderbolt.host_reset=0" in the kernel command line.
>> 2. Boot with USB4 device connected (and so that it has USB 3.x device
>>     connected such as USB 3 memory stick).
>> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend
>> freely.
>> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel
>>     so the xHCI might see disconnect here.
>>
>> Also on resume path it will not be recreating the tunnel before xHCI
>> because there is no device link. I can try this on my side too if you
>> like.
>>
> 
> You are right, I was able to reproduce it.
> 
> Device link won't be created if BIOS created the tunnel, thunderbolt driver
> probes after this usb device is created, and "thunderbolt.host_reset=0" 
> is set.
> 
> Turning the device link "stateless" could possible help.
> It removes driver presence dependency but keeps correct suspend/resume and
> shutdown ordering.
> It should also help with boot hang/probe issues seen on the AMD platforms.
> 
> Mario, Harry, does the following work for you?

I didn't repro Harry's problem, but I did try on two OEM systems 
(Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 
base and didn't notice anything worse happening.

> 
> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
> index 21585ed89ef8..03c22114214b 100644
> --- a/drivers/usb/core/usb-acpi.c
> +++ b/drivers/usb/core/usb-acpi.c
> @@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct 
> usb_device *udev)
>          struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>                  fwnode_find_reference(dev_fwnode(&port_dev->dev), 
> "usb4-host-interface", 0);
> 
> -       if (IS_ERR(nhi_fwnode))
> +       if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev)
>                  return 0;
> 
>          link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
> -                              DL_FLAG_AUTOREMOVE_CONSUMER |
> +                              DL_FLAG_STATELESS |
>                                 DL_FLAG_RPM_ACTIVE |
>                                 DL_FLAG_PM_RUNTIME);
>          if (!link) {
> 
> Thanks
> Mathias
> 
>
Harry Wentland Oct. 23, 2024, 6:07 p.m. UTC | #3
On 2024-10-23 12:50, Mario Limonciello wrote:
> On 10/23/2024 07:12, Mathias Nyman wrote:
>> On 22.10.2024 16.22, Mika Westerberg wrote:
>>> Hi,
>>>
>>> On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote:
>>>> Fix a boot hang issue triggered when a USB3 device is incorrectly assumed
>>>> to be tunneled over USB4, thus attempting to create a device link between
>>>> the USB3 "consumer" device and the USB4 "supplier" Host Interface before
>>>> the USB4 side is properly bound to a driver.
>>>>
>>>> This could happen if xhci isn't capable of detecting tunneled devices,
>>>> but ACPI tables contain all info needed to assume device is tunneled.
>>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>>>
>>>> If the USB4 host interface hasn't probed yet, then we know the USB3
>>>> device is not in a tunnel created by the USB4 Host Interface driver, so
>>>> don't try to create a device link in this case.
>>>>
>>>> cc: Mario Limonciello <mario.limonciello@amd.com>
>>>> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface")
>>>> Tested-by: Harry Wentland <harry.wentland@amd.com>
>>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
>>>> ---
>>>>   drivers/usb/core/usb-acpi.c | 11 +++++++++++
>>>>   1 file changed, 11 insertions(+)
>>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index 21585ed89ef8..9e1ec71881db 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>       if (IS_ERR(nhi_fwnode))
>>>>           return 0;
>>>> +    /*
>>>> +     * If USB4 Host interface driver isn't set up yet we can't be in a USB3
>>>> +     * tunnel created by it.
>>>> +     */
>>>> +    if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>>>> +        dev_info(&port_dev->dev, "%s probed before USB4 host interface\n",
>>>> +             dev_name(&port_dev->child->dev));
>>>> +        udev->tunnel_mode = USB_LINK_NATIVE;
>>>> +        return 0;
>>>> +    }
>>>
>>> I think this will not work if you boot with "thunderbolt.host_reset=0"
>>> and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is
>>> bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not
>>> setup the link so after Thunderbolt/USB4 is runtime suspended it might
>>> not be there to re-create the tunnel before xHCI.
>>>
>>> The test case would be something like this:
>>>
>>> 1. Add "thunderbolt.host_reset=0" in the kernel command line.
>>> 2. Boot with USB4 device connected (and so that it has USB 3.x device
>>>     connected such as USB 3 memory stick).
>>> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend
>>> freely.
>>> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel
>>>     so the xHCI might see disconnect here.
>>>
>>> Also on resume path it will not be recreating the tunnel before xHCI
>>> because there is no device link. I can try this on my side too if you
>>> like.
>>>
>>
>> You are right, I was able to reproduce it.
>>
>> Device link won't be created if BIOS created the tunnel, thunderbolt driver
>> probes after this usb device is created, and "thunderbolt.host_reset=0" is set.
>>
>> Turning the device link "stateless" could possible help.
>> It removes driver presence dependency but keeps correct suspend/resume and
>> shutdown ordering.
>> It should also help with boot hang/probe issues seen on the AMD platforms.
>>
>> Mario, Harry, does the following work for you?
> 
> I didn't repro Harry's problem, but I did try on two OEM systems (Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 base and didn't notice anything worse happening.

Yeah, the following diff works for me.

If you create a patch feel free to add my
Tested-by: Harry Wentland <harry.wentland@amd.com>

Harry

> 
>>
>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>> index 21585ed89ef8..03c22114214b 100644
>> --- a/drivers/usb/core/usb-acpi.c
>> +++ b/drivers/usb/core/usb-acpi.c
>> @@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>          struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>>                  fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>
>> -       if (IS_ERR(nhi_fwnode))
>> +       if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev)
>>                  return 0;
>>
>>          link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
>> -                              DL_FLAG_AUTOREMOVE_CONSUMER |
>> +                              DL_FLAG_STATELESS |
>>                                 DL_FLAG_RPM_ACTIVE |
>>                                 DL_FLAG_PM_RUNTIME);
>>          if (!link) {
>>
>> Thanks
>> Mathias
>>
>>
>
Mathias Nyman Oct. 24, 2024, 10:57 a.m. UTC | #4
On 23.10.2024 21.07, Harry Wentland wrote:
> 
> 
> On 2024-10-23 12:50, Mario Limonciello wrote:
>> On 10/23/2024 07:12, Mathias Nyman wrote:
>>> On 22.10.2024 16.22, Mika Westerberg wrote:
>>>> The test case would be something like this:
>>>>
>>>> 1. Add "thunderbolt.host_reset=0" in the kernel command line.
>>>> 2. Boot with USB4 device connected (and so that it has USB 3.x device
>>>>      connected such as USB 3 memory stick).
>>>> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend
>>>> freely.
>>>> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel
>>>>      so the xHCI might see disconnect here.
>>>>
>>>> Also on resume path it will not be recreating the tunnel before xHCI
>>>> because there is no device link. I can try this on my side too if you
>>>> like.
>>>>
>>>
>>> You are right, I was able to reproduce it.
>>>
>>> Device link won't be created if BIOS created the tunnel, thunderbolt driver
>>> probes after this usb device is created, and "thunderbolt.host_reset=0" is set.
>>>
>>> Turning the device link "stateless" could possible help.
>>> It removes driver presence dependency but keeps correct suspend/resume and
>>> shutdown ordering.
>>> It should also help with boot hang/probe issues seen on the AMD platforms.
>>>
>>> Mario, Harry, does the following work for you?
>>
>> I didn't repro Harry's problem, but I did try on two OEM systems (Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 base and didn't notice anything worse happening.
> 
> Yeah, the following diff works for me.
> 
> If you create a patch feel free to add my
> Tested-by: Harry Wentland <harry.wentland@amd.com>
> 
> Harry

Thanks for testing, I'll turn this into a proper v2 patch

-Mathias
diff mbox series

Patch

diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
index 21585ed89ef8..9e1ec71881db 100644
--- a/drivers/usb/core/usb-acpi.c
+++ b/drivers/usb/core/usb-acpi.c
@@ -173,6 +173,17 @@  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
 	if (IS_ERR(nhi_fwnode))
 		return 0;
 
+	/*
+	 * If USB4 Host interface driver isn't set up yet we can't be in a USB3
+	 * tunnel created by it.
+	 */
+	if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
+		dev_info(&port_dev->dev, "%s probed before USB4 host interface\n",
+			 dev_name(&port_dev->child->dev));
+		udev->tunnel_mode = USB_LINK_NATIVE;
+		return 0;
+	}
+
 	link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
 			       DL_FLAG_AUTOREMOVE_CONSUMER |
 			       DL_FLAG_RPM_ACTIVE |