Message ID | 20241022123742.3045707-1-mathias.nyman@linux.intel.com |
---|---|
State | New |
Headers | show |
Series | usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links | expand |
On 22.10.2024 16.22, Mika Westerberg wrote: > Hi, > > On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote: >> Fix a boot hang issue triggered when a USB3 device is incorrectly assumed >> to be tunneled over USB4, thus attempting to create a device link between >> the USB3 "consumer" device and the USB4 "supplier" Host Interface before >> the USB4 side is properly bound to a driver. >> >> This could happen if xhci isn't capable of detecting tunneled devices, >> but ACPI tables contain all info needed to assume device is tunneled. >> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN. >> >> If the USB4 host interface hasn't probed yet, then we know the USB3 >> device is not in a tunnel created by the USB4 Host Interface driver, so >> don't try to create a device link in this case. >> >> cc: Mario Limonciello <mario.limonciello@amd.com> >> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface") >> Tested-by: Harry Wentland <harry.wentland@amd.com> >> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> >> --- >> drivers/usb/core/usb-acpi.c | 11 +++++++++++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c >> index 21585ed89ef8..9e1ec71881db 100644 >> --- a/drivers/usb/core/usb-acpi.c >> +++ b/drivers/usb/core/usb-acpi.c >> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev) >> if (IS_ERR(nhi_fwnode)) >> return 0; >> >> + /* >> + * If USB4 Host interface driver isn't set up yet we can't be in a USB3 >> + * tunnel created by it. >> + */ >> + if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) { >> + dev_info(&port_dev->dev, "%s probed before USB4 host interface\n", >> + dev_name(&port_dev->child->dev)); >> + udev->tunnel_mode = USB_LINK_NATIVE; >> + return 0; >> + } > > I think this will not work if you boot with "thunderbolt.host_reset=0" > and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is > bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not > setup the link so after Thunderbolt/USB4 is runtime suspended it might > not be there to re-create the tunnel before xHCI. > > The test case would be something like this: > > 1. Add "thunderbolt.host_reset=0" in the kernel command line. > 2. Boot with USB4 device connected (and so that it has USB 3.x device > connected such as USB 3 memory stick). > 3. Since there is no device link Thunderbolt/USB4 can runtime suspend > freely. > 4. Once that happens the tunnels are gone, including the USB 3.x tunnel > so the xHCI might see disconnect here. > > Also on resume path it will not be recreating the tunnel before xHCI > because there is no device link. I can try this on my side too if you > like. > You are right, I was able to reproduce it. Device link won't be created if BIOS created the tunnel, thunderbolt driver probes after this usb device is created, and "thunderbolt.host_reset=0" is set. Turning the device link "stateless" could possible help. It removes driver presence dependency but keeps correct suspend/resume and shutdown ordering. It should also help with boot hang/probe issues seen on the AMD platforms. Mario, Harry, does the following work for you? diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c index 21585ed89ef8..03c22114214b 100644 --- a/drivers/usb/core/usb-acpi.c +++ b/drivers/usb/core/usb-acpi.c @@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev) struct fwnode_handle *nhi_fwnode __free(fwnode_handle) = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0); - if (IS_ERR(nhi_fwnode)) + if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev) return 0; link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev, - DL_FLAG_AUTOREMOVE_CONSUMER | + DL_FLAG_STATELESS | DL_FLAG_RPM_ACTIVE | DL_FLAG_PM_RUNTIME); if (!link) { Thanks Mathias
On 10/23/2024 07:12, Mathias Nyman wrote: > On 22.10.2024 16.22, Mika Westerberg wrote: >> Hi, >> >> On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote: >>> Fix a boot hang issue triggered when a USB3 device is incorrectly >>> assumed >>> to be tunneled over USB4, thus attempting to create a device link >>> between >>> the USB3 "consumer" device and the USB4 "supplier" Host Interface before >>> the USB4 side is properly bound to a driver. >>> >>> This could happen if xhci isn't capable of detecting tunneled devices, >>> but ACPI tables contain all info needed to assume device is tunneled. >>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN. >>> >>> If the USB4 host interface hasn't probed yet, then we know the USB3 >>> device is not in a tunnel created by the USB4 Host Interface driver, so >>> don't try to create a device link in this case. >>> >>> cc: Mario Limonciello <mario.limonciello@amd.com> >>> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled >>> USB3 device and USB4 Host Interface") >>> Tested-by: Harry Wentland <harry.wentland@amd.com> >>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> >>> --- >>> drivers/usb/core/usb-acpi.c | 11 +++++++++++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c >>> index 21585ed89ef8..9e1ec71881db 100644 >>> --- a/drivers/usb/core/usb-acpi.c >>> +++ b/drivers/usb/core/usb-acpi.c >>> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct >>> usb_device *udev) >>> if (IS_ERR(nhi_fwnode)) >>> return 0; >>> + /* >>> + * If USB4 Host interface driver isn't set up yet we can't be in >>> a USB3 >>> + * tunnel created by it. >>> + */ >>> + if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) { >>> + dev_info(&port_dev->dev, "%s probed before USB4 host >>> interface\n", >>> + dev_name(&port_dev->child->dev)); >>> + udev->tunnel_mode = USB_LINK_NATIVE; >>> + return 0; >>> + } >> >> I think this will not work if you boot with "thunderbolt.host_reset=0" >> and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is >> bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not >> setup the link so after Thunderbolt/USB4 is runtime suspended it might >> not be there to re-create the tunnel before xHCI. >> >> The test case would be something like this: >> >> 1. Add "thunderbolt.host_reset=0" in the kernel command line. >> 2. Boot with USB4 device connected (and so that it has USB 3.x device >> connected such as USB 3 memory stick). >> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend >> freely. >> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel >> so the xHCI might see disconnect here. >> >> Also on resume path it will not be recreating the tunnel before xHCI >> because there is no device link. I can try this on my side too if you >> like. >> > > You are right, I was able to reproduce it. > > Device link won't be created if BIOS created the tunnel, thunderbolt driver > probes after this usb device is created, and "thunderbolt.host_reset=0" > is set. > > Turning the device link "stateless" could possible help. > It removes driver presence dependency but keeps correct suspend/resume and > shutdown ordering. > It should also help with boot hang/probe issues seen on the AMD platforms. > > Mario, Harry, does the following work for you? I didn't repro Harry's problem, but I did try on two OEM systems (Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 base and didn't notice anything worse happening. > > diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c > index 21585ed89ef8..03c22114214b 100644 > --- a/drivers/usb/core/usb-acpi.c > +++ b/drivers/usb/core/usb-acpi.c > @@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct > usb_device *udev) > struct fwnode_handle *nhi_fwnode __free(fwnode_handle) = > fwnode_find_reference(dev_fwnode(&port_dev->dev), > "usb4-host-interface", 0); > > - if (IS_ERR(nhi_fwnode)) > + if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev) > return 0; > > link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev, > - DL_FLAG_AUTOREMOVE_CONSUMER | > + DL_FLAG_STATELESS | > DL_FLAG_RPM_ACTIVE | > DL_FLAG_PM_RUNTIME); > if (!link) { > > Thanks > Mathias > >
On 2024-10-23 12:50, Mario Limonciello wrote: > On 10/23/2024 07:12, Mathias Nyman wrote: >> On 22.10.2024 16.22, Mika Westerberg wrote: >>> Hi, >>> >>> On Tue, Oct 22, 2024 at 03:37:42PM +0300, Mathias Nyman wrote: >>>> Fix a boot hang issue triggered when a USB3 device is incorrectly assumed >>>> to be tunneled over USB4, thus attempting to create a device link between >>>> the USB3 "consumer" device and the USB4 "supplier" Host Interface before >>>> the USB4 side is properly bound to a driver. >>>> >>>> This could happen if xhci isn't capable of detecting tunneled devices, >>>> but ACPI tables contain all info needed to assume device is tunneled. >>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN. >>>> >>>> If the USB4 host interface hasn't probed yet, then we know the USB3 >>>> device is not in a tunnel created by the USB4 Host Interface driver, so >>>> don't try to create a device link in this case. >>>> >>>> cc: Mario Limonciello <mario.limonciello@amd.com> >>>> Fixes: f1bfb4a6fed6 ("usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface") >>>> Tested-by: Harry Wentland <harry.wentland@amd.com> >>>> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> >>>> --- >>>> drivers/usb/core/usb-acpi.c | 11 +++++++++++ >>>> 1 file changed, 11 insertions(+) >>>> >>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c >>>> index 21585ed89ef8..9e1ec71881db 100644 >>>> --- a/drivers/usb/core/usb-acpi.c >>>> +++ b/drivers/usb/core/usb-acpi.c >>>> @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev) >>>> if (IS_ERR(nhi_fwnode)) >>>> return 0; >>>> + /* >>>> + * If USB4 Host interface driver isn't set up yet we can't be in a USB3 >>>> + * tunnel created by it. >>>> + */ >>>> + if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) { >>>> + dev_info(&port_dev->dev, "%s probed before USB4 host interface\n", >>>> + dev_name(&port_dev->child->dev)); >>>> + udev->tunnel_mode = USB_LINK_NATIVE; >>>> + return 0; >>>> + } >>> >>> I think this will not work if you boot with "thunderbolt.host_reset=0" >>> and the BIOS CM created the tunnels, and that Thunderbolt/USB4 driver is >>> bound after xHCI. Then it will mark this as USB_LINK_NATIVE and does not >>> setup the link so after Thunderbolt/USB4 is runtime suspended it might >>> not be there to re-create the tunnel before xHCI. >>> >>> The test case would be something like this: >>> >>> 1. Add "thunderbolt.host_reset=0" in the kernel command line. >>> 2. Boot with USB4 device connected (and so that it has USB 3.x device >>> connected such as USB 3 memory stick). >>> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend >>> freely. >>> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel >>> so the xHCI might see disconnect here. >>> >>> Also on resume path it will not be recreating the tunnel before xHCI >>> because there is no device link. I can try this on my side too if you >>> like. >>> >> >> You are right, I was able to reproduce it. >> >> Device link won't be created if BIOS created the tunnel, thunderbolt driver >> probes after this usb device is created, and "thunderbolt.host_reset=0" is set. >> >> Turning the device link "stateless" could possible help. >> It removes driver presence dependency but keeps correct suspend/resume and >> shutdown ordering. >> It should also help with boot hang/probe issues seen on the AMD platforms. >> >> Mario, Harry, does the following work for you? > > I didn't repro Harry's problem, but I did try on two OEM systems (Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 base and didn't notice anything worse happening. Yeah, the following diff works for me. If you create a patch feel free to add my Tested-by: Harry Wentland <harry.wentland@amd.com> Harry > >> >> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c >> index 21585ed89ef8..03c22114214b 100644 >> --- a/drivers/usb/core/usb-acpi.c >> +++ b/drivers/usb/core/usb-acpi.c >> @@ -170,11 +170,11 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev) >> struct fwnode_handle *nhi_fwnode __free(fwnode_handle) = >> fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0); >> >> - if (IS_ERR(nhi_fwnode)) >> + if (IS_ERR(nhi_fwnode) || !nhi_fwnode->dev) >> return 0; >> >> link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev, >> - DL_FLAG_AUTOREMOVE_CONSUMER | >> + DL_FLAG_STATELESS | >> DL_FLAG_RPM_ACTIVE | >> DL_FLAG_PM_RUNTIME); >> if (!link) { >> >> Thanks >> Mathias >> >> >
On 23.10.2024 21.07, Harry Wentland wrote: > > > On 2024-10-23 12:50, Mario Limonciello wrote: >> On 10/23/2024 07:12, Mathias Nyman wrote: >>> On 22.10.2024 16.22, Mika Westerberg wrote: >>>> The test case would be something like this: >>>> >>>> 1. Add "thunderbolt.host_reset=0" in the kernel command line. >>>> 2. Boot with USB4 device connected (and so that it has USB 3.x device >>>> connected such as USB 3 memory stick). >>>> 3. Since there is no device link Thunderbolt/USB4 can runtime suspend >>>> freely. >>>> 4. Once that happens the tunnels are gone, including the USB 3.x tunnel >>>> so the xHCI might see disconnect here. >>>> >>>> Also on resume path it will not be recreating the tunnel before xHCI >>>> because there is no device link. I can try this on my side too if you >>>> like. >>>> >>> >>> You are right, I was able to reproduce it. >>> >>> Device link won't be created if BIOS created the tunnel, thunderbolt driver >>> probes after this usb device is created, and "thunderbolt.host_reset=0" is set. >>> >>> Turning the device link "stateless" could possible help. >>> It removes driver presence dependency but keeps correct suspend/resume and >>> shutdown ordering. >>> It should also help with boot hang/probe issues seen on the AMD platforms. >>> >>> Mario, Harry, does the following work for you? >> >> I didn't repro Harry's problem, but I did try on two OEM systems (Rembrandt and Phoenix based) with this change in place on a 6.12-rc4 base and didn't notice anything worse happening. > > Yeah, the following diff works for me. > > If you create a patch feel free to add my > Tested-by: Harry Wentland <harry.wentland@amd.com> > > Harry Thanks for testing, I'll turn this into a proper v2 patch -Mathias
diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c index 21585ed89ef8..9e1ec71881db 100644 --- a/drivers/usb/core/usb-acpi.c +++ b/drivers/usb/core/usb-acpi.c @@ -173,6 +173,17 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev) if (IS_ERR(nhi_fwnode)) return 0; + /* + * If USB4 Host interface driver isn't set up yet we can't be in a USB3 + * tunnel created by it. + */ + if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) { + dev_info(&port_dev->dev, "%s probed before USB4 host interface\n", + dev_name(&port_dev->child->dev)); + udev->tunnel_mode = USB_LINK_NATIVE; + return 0; + } + link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev, DL_FLAG_AUTOREMOVE_CONSUMER | DL_FLAG_RPM_ACTIVE |