diff mbox series

usb: hcd: xhci: Add set command timer delay API

Message ID 20230818092353.124658-1-hgajjar@de.adit-jv.com
State New
Headers show
Series usb: hcd: xhci: Add set command timer delay API | expand

Commit Message

Hardik Gajjar Aug. 18, 2023, 9:23 a.m. UTC
xHCI driver starts the response timer after sending each
command to the device. The default value of this timer is
5 seconds (XHCI_CMD_DEFAULT_TIMEOUT = HZ*5). This seems
too high in time crtical use case.

This patch provides an API to change the default value of
the timer from the vendor USB driver.

The default value will be XHCI_CMD_DEFAULT_TIMEOUT (5 sec)

Use case:
According to the Smartphone integration certification
requirement in the automotive, the phone connected via USB
should complete enumeration and user space handshake
within 3 sec.

Reducing the response waiting time by setting the smaller
command timer delay helps to speed up overall re-enumeration
process of the USB device in case of device is not responding
properly in first enumeration iteration.

Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
---
 drivers/usb/core/hcd.c       | 23 +++++++++++++++++++++++
 drivers/usb/host/xhci-ring.c | 10 +++++-----
 drivers/usb/host/xhci.c      | 15 +++++++++++++++
 drivers/usb/host/xhci.h      |  1 +
 include/linux/usb/hcd.h      |  2 ++
 5 files changed, 46 insertions(+), 5 deletions(-)

Comments

Hardik Gajjar Aug. 21, 2023, 9:55 a.m. UTC | #1
On Fri, Aug 18, 2023 at 04:18:30PM +0300, Mathias Nyman wrote:
> On 18.8.2023 12.23, Hardik Gajjar wrote:
> > xHCI driver starts the response timer after sending each
> > command to the device. The default value of this timer is
> > 5 seconds (XHCI_CMD_DEFAULT_TIMEOUT = HZ*5). This seems
> > too high in time crtical use case.
> > 
> > This patch provides an API to change the default value of
> > the timer from the vendor USB driver.
> > 
> > The default value will be XHCI_CMD_DEFAULT_TIMEOUT (5 sec)
> > 
> > Use case:
> > According to the Smartphone integration certification
> > requirement in the automotive, the phone connected via USB
> > should complete enumeration and user space handshake
> > within 3 sec.
> 
> The above incorrectly makes it sound as if the command timeout
> timer causes the delay.
>
Thank you, Mathias, for your prompt response. I will enhance the message
to provide more specificity in the subsequent patch.
> > 
> > Reducing the response waiting time by setting the smaller
> > command timer delay helps to speed up overall re-enumeration
> > process of the USB device in case of device is not responding
> > properly in first enumeration iteration.
> 
> So is this a case where addressing a usb device behind xHC always
> fail on the first attempt, i.e. address device command in xhci
> never completes. Solution proposed here is to fail faster and
> retry?
> 
> Is the rootcause known why first enumeration fails?
> 
> Does setting old_scheme_first module parameter help?
> 
Yes, you are correct. The problem occurs when setting the address,
and in such cases, this patch helps to fail faster and retry.

Unfortunately, the root cause is unknown. This problem is mainly
observed with Android phones. Upon analyzing the USB analyzer logs,
it seems that the device is not responding to the SET_ADDRESS request.

I tried using "old_scheme_first=Y," but that did not help. Below are
the short logs of the first iteration enumeration failure.
> > 
> > Signed-off-by: Hardik Gajjar <hgajjar@de.adit-jv.com>
> > ---
> >   drivers/usb/core/hcd.c       | 23 +++++++++++++++++++++++
> >   drivers/usb/host/xhci-ring.c | 10 +++++-----
> >   drivers/usb/host/xhci.c      | 15 +++++++++++++++
> >   drivers/usb/host/xhci.h      |  1 +
> >   include/linux/usb/hcd.h      |  2 ++
> >   5 files changed, 46 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> > index 8300baedafd2..e392e90e918c 100644
> > --- a/drivers/usb/core/hcd.c
> > +++ b/drivers/usb/core/hcd.c
> > @@ -3157,6 +3157,29 @@ int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
> >   }
> >   EXPORT_SYMBOL_GPL(usb_hcd_setup_local_mem);
> > +/**
> > + * usb_hcd_set_cmd_timer_delay Set the delay of the command timer.
> > + * @hcd - pointer to the HCD representing the controller
> > + * @delay - Delay value to be used in command timer.
> > + *
> > + * wrapper function to call the set_cmd_timer_delay API of the host
> > + * diver.
> > + *
> > + * return 0 on success; otherwise -ENODEV means the feature not
> > + * supported by host driver.
> > + */
> > +
> > +int usb_hcd_set_cmd_timer_delay(struct usb_hcd *hcd, int delay)
> > +{
> > +	int ret = -ENODEV;
> > +
> > +	if (hcd->driver->set_cmd_timer_delay)
> > +		ret = hcd->driver->set_cmd_timer_delay(hcd, delay);
> > +
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(usb_hcd_set_cmd_timer_delay);
> > +
> 
> The xhci command timeout is more of a xhci internal thing, not sure it's a good
> idea to add this to hcd.
> 
> Would it make sense to add a timeout parameter to hcd->driver->address_device(hcd, udev)
> instead?
> 
> First priority should of course be finding out why the first enumeration fails,
> and solve that.
> 
> Thanks
> Mathias
Thanks for the suggestion to modify hcd->driver->address_device.
I will definitely implement it.However to confirm.

So, if I understand correctly, the idea is to avoid exposing any
API from the xhci driver, but instead, create an interface in hcd.c (such as sysfs or API)
and incorporate the delay in address_device as an additional parameter.

However, in that case, modifying xhci is still necessary as the timer is controlled from there.

Thanks,
Hardik
kernel test robot Aug. 21, 2023, 7:54 p.m. UTC | #2
Hi Hardik,

kernel test robot noticed the following build warnings:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on usb/usb-next usb/usb-linus linus/master v6.5-rc7 next-20230821]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Hardik-Gajjar/usb-hcd-xhci-Add-set-command-timer-delay-API/20230821-094400
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git usb-testing
patch link:    https://lore.kernel.org/r/20230818092353.124658-1-hgajjar%40de.adit-jv.com
patch subject: [PATCH] usb: hcd: xhci: Add set command timer delay API
config: x86_64-randconfig-013-20230821 (https://download.01.org/0day-ci/archive/20230822/202308220340.2SPzc4kd-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce: (https://download.01.org/0day-ci/archive/20230822/202308220340.2SPzc4kd-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308220340.2SPzc4kd-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/usb/core/hcd.c:3133: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
    * usb_hcd_set_cmd_timer_delay Set the delay of the command timer.


vim +3133 drivers/usb/core/hcd.c

  3131	
  3132	/**
> 3133	 * usb_hcd_set_cmd_timer_delay Set the delay of the command timer.
  3134	 * @hcd - pointer to the HCD representing the controller
  3135	 * @delay - Delay value to be used in command timer.
  3136	 *
  3137	 * wrapper function to call the set_cmd_timer_delay API of the host
  3138	 * diver.
  3139	 *
  3140	 * return 0 on success; otherwise -ENODEV means the feature not
  3141	 * supported by host driver.
  3142	 */
  3143
Mathias Nyman Aug. 29, 2023, 1:57 p.m. UTC | #3
On 21.8.2023 12.55, Hardik Gajjar wrote:
> On Fri, Aug 18, 2023 at 04:18:30PM +0300, Mathias Nyman wrote:
>> On 18.8.2023 12.23, Hardik Gajjar wrote:
>>> xHCI driver starts the response timer after sending each
>>> command to the device. The default value of this timer is
>>> 5 seconds (XHCI_CMD_DEFAULT_TIMEOUT = HZ*5). This seems
>>> too high in time crtical use case.
>>>
>>> This patch provides an API to change the default value of
>>> the timer from the vendor USB driver.
>>>
>>> The default value will be XHCI_CMD_DEFAULT_TIMEOUT (5 sec)
>>>
>>> Use case:
>>> According to the Smartphone integration certification
>>> requirement in the automotive, the phone connected via USB
>>> should complete enumeration and user space handshake
>>> within 3 sec.
>>
>> The above incorrectly makes it sound as if the command timeout
>> timer causes the delay.
>>
> Thank you, Mathias, for your prompt response. I will enhance the message
> to provide more specificity in the subsequent patch.
>>>
>>> Reducing the response waiting time by setting the smaller
>>> command timer delay helps to speed up overall re-enumeration
>>> process of the USB device in case of device is not responding
>>> properly in first enumeration iteration.
>>
>> So is this a case where addressing a usb device behind xHC always
>> fail on the first attempt, i.e. address device command in xhci
>> never completes. Solution proposed here is to fail faster and
>> retry?
>>
>> Is the rootcause known why first enumeration fails?
>>
>> Does setting old_scheme_first module parameter help?
>>
> Yes, you are correct. The problem occurs when setting the address,
> and in such cases, this patch helps to fail faster and retry.
> 
> Unfortunately, the root cause is unknown. This problem is mainly
> observed with Android phones. Upon analyzing the USB analyzer logs,
> it seems that the device is not responding to the SET_ADDRESS request.

Is this only seen when connecting the device to a Linux xHCI host at USB 3 speeds?
How about connecting to a windows machine? or USB 2 Linux machine with a EHCI host?

> 
> I tried using "old_scheme_first=Y," but that did not help. Below are
> the short logs of the first iteration enumeration failure.
>>
>> The xhci command timeout is more of a xhci internal thing, not sure it's a good
>> idea to add this to hcd.
>>
>> Would it make sense to add a timeout parameter to hcd->driver->address_device(hcd, udev)
>> instead?
>>
>> First priority should of course be finding out why the first enumeration fails,
>> and solve that.
>>
>> Thanks
>> Mathias
> Thanks for the suggestion to modify hcd->driver->address_device.
> I will definitely implement it.However to confirm.
> 
> So, if I understand correctly, the idea is to avoid exposing any
> API from the xhci driver, but instead, create an interface in hcd.c (such as sysfs or API)
> and incorporate the delay in address_device as an additional parameter.

On second thought it only makes sense to do this if we can identify the device in advance
and make a quirk for it. But at this stage we don't know anything about the device.

So I guess this depends on the width of the problem.

If this works on windows then we need to figure out what we do differently.

If this fails with all hosts (well xHCI and EHCI) then hcd level change is probably needed.
For xhci we would pass timeout parameter when calling address_device, for other hosts
the timeout for the SET_ADDRESS control transfer would be adjusted.

If this only fails when connected behind a xHCI host then a local xHCI change should do.

> However, in that case, modifying xhci is still necessary as the timer is controlled from there.

Yes, xhci changes will be needed.

I suggest adding a timeout parameter to struct xhci_command, and use that when calling
xhci_mod_cmd_timer(). This way we can tailor different timeouts for different commands.

-Mathias
Hardik Gajjar Sept. 4, 2023, 9:57 a.m. UTC | #4
On Tue, Aug 29, 2023 at 04:57:54PM +0300, Mathias Nyman wrote:
> On 21.8.2023 12.55, Hardik Gajjar wrote:
> > On Fri, Aug 18, 2023 at 04:18:30PM +0300, Mathias Nyman wrote:
> > > On 18.8.2023 12.23, Hardik Gajjar wrote:
> > > > xHCI driver starts the response timer after sending each
> > > > command to the device. The default value of this timer is
> > > > 5 seconds (XHCI_CMD_DEFAULT_TIMEOUT = HZ*5). This seems
> > > > too high in time crtical use case.
> > > > 
> > > > This patch provides an API to change the default value of
> > > > the timer from the vendor USB driver.
> > > > 
> > > > The default value will be XHCI_CMD_DEFAULT_TIMEOUT (5 sec)
> > > > 
> > > > Use case:
> > > > According to the Smartphone integration certification
> > > > requirement in the automotive, the phone connected via USB
> > > > should complete enumeration and user space handshake
> > > > within 3 sec.
> > > 
> > > The above incorrectly makes it sound as if the command timeout
> > > timer causes the delay.
> > > 
> > Thank you, Mathias, for your prompt response. I will enhance the message
> > to provide more specificity in the subsequent patch.
> > > > 
> > > > Reducing the response waiting time by setting the smaller
> > > > command timer delay helps to speed up overall re-enumeration
> > > > process of the USB device in case of device is not responding
> > > > properly in first enumeration iteration.
> > > 
> > > So is this a case where addressing a usb device behind xHC always
> > > fail on the first attempt, i.e. address device command in xhci
> > > never completes. Solution proposed here is to fail faster and
> > > retry?
> > > 
> > > Is the rootcause known why first enumeration fails?
> > > 
> > > Does setting old_scheme_first module parameter help?
> > > 
> > Yes, you are correct. The problem occurs when setting the address,
> > and in such cases, this patch helps to fail faster and retry.
> > 
> > Unfortunately, the root cause is unknown. This problem is mainly
> > observed with Android phones. Upon analyzing the USB analyzer logs,
> > it seems that the device is not responding to the SET_ADDRESS request.
> 
> Is this only seen when connecting the device to a Linux xHCI host at USB 3 speeds?
> How about connecting to a windows machine? or USB 2 Linux machine with a EHCI host?
I could not able to reproduce the issue when connecting the
phone to x86 Ubuntu20.04 machine.

In our setup, we have a arm64 target with a DWC3 USB3 IP core,
connected through a USB hub (specifically designed for automotive use)
to the Android phone. Interestingly,when replicating this setup with
the x86 Ubuntu machine, we encountered a discrepancy.

The primary difference between the two setups lies in the utilization
of the DWC3 USB IP core.

Working Case:

0,HS,202,0:17.672.512,1.333 us,8 B,I,00,00,SETUP txn (SPLIT),80 06 00 01 00 00 40 00
1,HS,203,0:17.672.512,83 ns,4 B,,00,00,   SPLIT packet,78 0D 03 70
1,HS,204,0:17.672.513,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
1,HS,205,0:17.672.513,200 ns,11 B,,00,00,   DATA0 packet,C3 80 06 00 01 00 00 40 00 DD 94
1,HS,206,0:17.672.514,33 ns,1 B,,00,00,   ACK packet,D2
0,HS,207,0:17.672.535,5.625.983 ms,,I,00,00,[256 CSPLIT-SETUP-NYET],

Non-Working Case:

0,HS,164,0:10.982.303,3.116 us,8 B,I,00,00,SETUP txn (SPLIT-STALL),00 05 04 00 00 00 00 00
1,HS,165,0:10.982.303,83 ns,4 B,,00,00,   SPLIT packet,78 01 02 A0
1,HS,166,0:10.982.303,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
1,HS,167,0:10.982.304,200 ns,11 B,,00,00,   DATA0 packet,C3 00 05 04 00 00 00 00 00 EB 70
1,HS,168,0:10.982.304,50 ns,1 B,,00,00,   ACK packet,D2
1,HS,169,0:10.982.305,866 ns,,,00,00,   CSPLIT-SETUP-STALL,
2,HS,170,0:10.982.305,83 ns,4 B,,00,00,      SPLIT packet,78 81 02 78
2,HS,171,0:10.982.306,66 ns,3 B,,00,00,      SETUP packet,2D 00 10
2,HS,172,0:10.982.306,50 ns,1 B,,00,00,      STALL packet,1E

Upon close inspection of the USB analyzer data for both cases,
it becomes evident that a STALL packet consistently appears in
the non-working scenario.

This stands in contrast to the working case, where the channel
remains open, allowing for the successful execution of the following
high-speed enumeration commands.

In the event of a stall, we require an interruption from the host to
resume communication.
Given that there is a fixed 5-second command delay, this interruption
can only occur after this timeout period.

Therefore, I would like to propose a solution: providing an additional
API to control this command timeout or as you suggested, adding a timeout
parameter to struct xhci_command, and use that when calling xhci_mod_cmd_timer()
without the need to address any potential bugs in the xHCI driver.
> 
> > 
> > I tried using "old_scheme_first=Y," but that did not help. Below are
> > the short logs of the first iteration enumeration failure.
> > > 
> > > The xhci command timeout is more of a xhci internal thing, not sure it's a good
> > > idea to add this to hcd.
> > > 
> > > Would it make sense to add a timeout parameter to hcd->driver->address_device(hcd, udev)
> > > instead?
> > > 
> > > First priority should of course be finding out why the first enumeration fails,
> > > and solve that.
> > > 
> > > Thanks
> > > Mathias
> > Thanks for the suggestion to modify hcd->driver->address_device.
> > I will definitely implement it.However to confirm.
> > 
> > So, if I understand correctly, the idea is to avoid exposing any
> > API from the xhci driver, but instead, create an interface in hcd.c (such as sysfs or API)
> > and incorporate the delay in address_device as an additional parameter.
> 
> On second thought it only makes sense to do this if we can identify the device in advance
> and make a quirk for it. But at this stage we don't know anything about the device.
> 
> So I guess this depends on the width of the problem.
> 
> If this works on windows then we need to figure out what we do differently.
> 
> If this fails with all hosts (well xHCI and EHCI) then hcd level change is probably needed.
> For xhci we would pass timeout parameter when calling address_device, for other hosts
> the timeout for the SET_ADDRESS control transfer would be adjusted.
> 
> If this only fails when connected behind a xHCI host then a local xHCI change should do.
> 
> > However, in that case, modifying xhci is still necessary as the timer is controlled from there.
> 
> Yes, xhci changes will be needed.
> 
> I suggest adding a timeout parameter to struct xhci_command, and use that when calling
> xhci_mod_cmd_timer(). This way we can tailor different timeouts for different commands.
> 
> -Mathias
Thank you, Mathias. I plan to proceed with the recommended approach and prepare the second
version of the upstream patch for review

Hardik
Mathias Nyman Sept. 5, 2023, 2:31 p.m. UTC | #5
On 4.9.2023 12.57, Hardik Gajjar wrote:
> 
> In our setup, we have a arm64 target with a DWC3 USB3 IP core,
> connected through a USB hub (specifically designed for automotive use)
> to the Android phone. Interestingly,when replicating this setup with
> the x86 Ubuntu machine, we encountered a discrepancy.
> 
> The primary difference between the two setups lies in the utilization
> of the DWC3 USB IP core.

Because of the SPLIT transactions seen below it looks like phone is being
enumerated at Full or Low speed behind the automotive High-speed HUB.

> 
> Working Case:
> 
> 0,HS,202,0:17.672.512,1.333 us,8 B,I,00,00,SETUP txn (SPLIT),80 06 00 01 00 00 40 00
> 1,HS,203,0:17.672.512,83 ns,4 B,,00,00,   SPLIT packet,78 0D 03 70
> 1,HS,204,0:17.672.513,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
> 1,HS,205,0:17.672.513,200 ns,11 B,,00,00,   DATA0 packet,C3 80 06 00 01 00 00 40 00 DD 94
> 1,HS,206,0:17.672.514,33 ns,1 B,,00,00,   ACK packet,D2
> 0,HS,207,0:17.672.535,5.625.983 ms,,I,00,00,[256 CSPLIT-SETUP-NYET],
> 

The working case shows a GET_DESCRIPTOR DEVICE request (80 06 00 01).
Was the SET ADDRESS request (00 05) seen before or after this?

> Non-Working Case:
> 
> 0,HS,164,0:10.982.303,3.116 us,8 B,I,00,00,SETUP txn (SPLIT-STALL),00 05 04 00 00 00 00 00
> 1,HS,165,0:10.982.303,83 ns,4 B,,00,00,   SPLIT packet,78 01 02 A0
> 1,HS,166,0:10.982.303,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
> 1,HS,167,0:10.982.304,200 ns,11 B,,00,00,   DATA0 packet,C3 00 05 04 00 00 00 00 00 EB 70
> 1,HS,168,0:10.982.304,50 ns,1 B,,00,00,   ACK packet,D2
> 1,HS,169,0:10.982.305,866 ns,,,00,00,   CSPLIT-SETUP-STALL,
> 2,HS,170,0:10.982.305,83 ns,4 B,,00,00,      SPLIT packet,78 81 02 78
> 2,HS,171,0:10.982.306,66 ns,3 B,,00,00,      SETUP packet,2D 00 10
> 2,HS,172,0:10.982.306,50 ns,1 B,,00,00,      STALL packet,1E

In the failing case the SET ADDRESS (00 05) request STALLs.
Was there a GET_DESCRIPOR request before this?

This could still be a case where the order of SET ADDRES and GET DESCRIPTOR
requests matter.

See if you can force new scheme enumeration, it should make hub driver issue
a GET_DESCRIPTOR request before SET ADDRESS.

Probably testable with some quick hack that forces use_new_scheme() to return
  "true" in hub.c

> 
> Upon close inspection of the USB analyzer data for both cases,
> it becomes evident that a STALL packet consistently appears in
> the non-working scenario.
> 
> This stands in contrast to the working case, where the channel
> remains open, allowing for the successful execution of the following
> high-speed enumeration commands.
> 
> In the event of a stall, we require an interruption from the host to
> resume communication.
> Given that there is a fixed 5-second command delay, this interruption
> can only occur after this timeout period.
> 
> Therefore, I would like to propose a solution: providing an additional
> API to control this command timeout or as you suggested, adding a timeout
> parameter to struct xhci_command, and use that when calling xhci_mod_cmd_timer()
> without the need to address any potential bugs in the xHCI driver.

Lets first see if changing enumeration scheme helps.

If not, and the enumeration issue is only seen for devices behind this specific
automotive hub, then a quirk could be set for it that adjusts the timeout.

Also let me know if you find xhci driver bugs, those need to be fixed anyway.

Thanks
-Mathias
Hardik Gajjar Sept. 27, 2023, 2:14 p.m. UTC | #6
On Tue, Sep 05, 2023 at 05:31:12PM +0300, Mathias Nyman wrote:
> On 4.9.2023 12.57, Hardik Gajjar wrote:
> > 
> > In our setup, we have a arm64 target with a DWC3 USB3 IP core,
> > connected through a USB hub (specifically designed for automotive use)
> > to the Android phone. Interestingly,when replicating this setup with
> > the x86 Ubuntu machine, we encountered a discrepancy.
> > 
> > The primary difference between the two setups lies in the utilization
> > of the DWC3 USB IP core.
> 
> Because of the SPLIT transactions seen below it looks like phone is being
> enumerated at Full or Low speed behind the automotive High-speed HUB.
> 
> > 
> > Working Case:
> > 
> > 0,HS,202,0:17.672.512,1.333 us,8 B,I,00,00,SETUP txn (SPLIT),80 06 00 01 00 00 40 00
> > 1,HS,203,0:17.672.512,83 ns,4 B,,00,00,   SPLIT packet,78 0D 03 70
> > 1,HS,204,0:17.672.513,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
> > 1,HS,205,0:17.672.513,200 ns,11 B,,00,00,   DATA0 packet,C3 80 06 00 01 00 00 40 00 DD 94
> > 1,HS,206,0:17.672.514,33 ns,1 B,,00,00,   ACK packet,D2
> > 0,HS,207,0:17.672.535,5.625.983 ms,,I,00,00,[256 CSPLIT-SETUP-NYET],
> > 
> 
> The working case shows a GET_DESCRIPTOR DEVICE request (80 06 00 01).
> Was the SET ADDRESS request (00 05) seen before or after this?
> 
> > Non-Working Case:
> > 
> > 0,HS,164,0:10.982.303,3.116 us,8 B,I,00,00,SETUP txn (SPLIT-STALL),00 05 04 00 00 00 00 00
> > 1,HS,165,0:10.982.303,83 ns,4 B,,00,00,   SPLIT packet,78 01 02 A0
> > 1,HS,166,0:10.982.303,66 ns,3 B,,00,00,   SETUP packet,2D 00 10
> > 1,HS,167,0:10.982.304,200 ns,11 B,,00,00,   DATA0 packet,C3 00 05 04 00 00 00 00 00 EB 70
> > 1,HS,168,0:10.982.304,50 ns,1 B,,00,00,   ACK packet,D2
> > 1,HS,169,0:10.982.305,866 ns,,,00,00,   CSPLIT-SETUP-STALL,
> > 2,HS,170,0:10.982.305,83 ns,4 B,,00,00,      SPLIT packet,78 81 02 78
> > 2,HS,171,0:10.982.306,66 ns,3 B,,00,00,      SETUP packet,2D 00 10
> > 2,HS,172,0:10.982.306,50 ns,1 B,,00,00,      STALL packet,1E
> 
> In the failing case the SET ADDRESS (00 05) request STALLs.
> Was there a GET_DESCRIPOR request before this?
> 
> This could still be a case where the order of SET ADDRES and GET DESCRIPTOR
> requests matter.
> 
> See if you can force new scheme enumeration, it should make hub driver issue
> a GET_DESCRIPTOR request before SET ADDRESS.
> 
> Probably testable with some quick hack that forces use_new_scheme() to return
>  "true" in hub.c
In failing case, there is no GET_DESCRIPTOR DEVICE request.

The result is even worse when enabling the 'new_scheme.'
Instead of a timeout, it directly crashes and restarts the USB subsystem.

In any case, we cannot use the 'new_scheme' because it exposes additional transactions
in USB enumeration. This leads to additional delays in the enumeration of all devices,
potentially exceeding the suggested time limit
> 
> > 
> > Upon close inspection of the USB analyzer data for both cases,
> > it becomes evident that a STALL packet consistently appears in
> > the non-working scenario.
> > 
> > This stands in contrast to the working case, where the channel
> > remains open, allowing for the successful execution of the following
> > high-speed enumeration commands.
> > 
> > In the event of a stall, we require an interruption from the host to
> > resume communication.
> > Given that there is a fixed 5-second command delay, this interruption
> > can only occur after this timeout period.
> > 
> > Therefore, I would like to propose a solution: providing an additional
> > API to control this command timeout or as you suggested, adding a timeout
> > parameter to struct xhci_command, and use that when calling xhci_mod_cmd_timer()
> > without the need to address any potential bugs in the xHCI driver.
> 
> Lets first see if changing enumeration scheme helps.
> 
> If not, and the enumeration issue is only seen for devices behind this specific
> automotive hub, then a quirk could be set for it that adjusts the timeout.
> 
> Also let me know if you find xhci driver bugs, those need to be fixed anyway.
>
I couldn't identify any issues with the xHCI driver either. However, I'd like
to propose the implementation of a more generic mechanism to control the 5-second
(HZ*5) timeout. This would offer better flexibility and help reduce enumeration
time in case of bus errors.

As of now, following your suggestion, I will add a parameter in the set_address
API and use the Quirk in hub.c to pass either the default or customized timeout
to the xHCI driver. However, in that case also we may need minor modifications in
xhci driver.

soon will submit Patch-v2

Thanks,
Hardik

> Thanks
> -Mathias
diff mbox series

Patch

diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 8300baedafd2..e392e90e918c 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -3157,6 +3157,29 @@  int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
 }
 EXPORT_SYMBOL_GPL(usb_hcd_setup_local_mem);
 
+/**
+ * usb_hcd_set_cmd_timer_delay Set the delay of the command timer.
+ * @hcd - pointer to the HCD representing the controller
+ * @delay - Delay value to be used in command timer.
+ *
+ * wrapper function to call the set_cmd_timer_delay API of the host
+ * diver.
+ *
+ * return 0 on success; otherwise -ENODEV means the feature not
+ * supported by host driver.
+ */
+
+int usb_hcd_set_cmd_timer_delay(struct usb_hcd *hcd, int delay)
+{
+	int ret = -ENODEV;
+
+	if (hcd->driver->set_cmd_timer_delay)
+		ret = hcd->driver->set_cmd_timer_delay(hcd, delay);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(usb_hcd_set_cmd_timer_delay);
+
 /*-------------------------------------------------------------------------*/
 
 #if IS_ENABLED(CONFIG_USB_MON)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 1dde53f6eb31..f4fcac9ae692 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -366,9 +366,9 @@  void xhci_ring_cmd_db(struct xhci_hcd *xhci)
 	readl(&xhci->dba->doorbell[0]);
 }
 
-static bool xhci_mod_cmd_timer(struct xhci_hcd *xhci, unsigned long delay)
+static bool xhci_mod_cmd_timer(struct xhci_hcd *xhci)
 {
-	return mod_delayed_work(system_wq, &xhci->cmd_timer, delay);
+	return mod_delayed_work(system_wq, &xhci->cmd_timer, xhci->cmd_timer_delay);
 }
 
 static struct xhci_command *xhci_next_queued_cmd(struct xhci_hcd *xhci)
@@ -412,7 +412,7 @@  static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 	if ((xhci->cmd_ring->dequeue != xhci->cmd_ring->enqueue) &&
 	    !(xhci->xhc_state & XHCI_STATE_DYING)) {
 		xhci->current_cmd = cur_cmd;
-		xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT);
+		xhci_mod_cmd_timer(xhci);
 		xhci_ring_cmd_db(xhci);
 	}
 }
@@ -1786,7 +1786,7 @@  static void handle_cmd_completion(struct xhci_hcd *xhci,
 	if (!list_is_singular(&xhci->cmd_list)) {
 		xhci->current_cmd = list_first_entry(&cmd->cmd_list,
 						struct xhci_command, cmd_list);
-		xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT);
+		xhci_mod_cmd_timer(xhci);
 	} else if (xhci->current_cmd == cmd) {
 		xhci->current_cmd = NULL;
 	}
@@ -4301,7 +4301,7 @@  static int queue_command(struct xhci_hcd *xhci, struct xhci_command *cmd,
 	/* if there are no other commands queued we start the timeout timer */
 	if (list_empty(&xhci->cmd_list)) {
 		xhci->current_cmd = cmd;
-		xhci_mod_cmd_timer(xhci, XHCI_CMD_DEFAULT_TIMEOUT);
+		xhci_mod_cmd_timer(xhci);
 	}
 
 	list_add_tail(&cmd->cmd_list, &xhci->cmd_list);
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index fae994f679d4..e1920af985ed 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -449,6 +449,8 @@  static int xhci_init(struct usb_hcd *hcd)
 		compliance_mode_recovery_timer_init(xhci);
 	}
 
+	xhci->cmd_timer_delay = XHCI_CMD_DEFAULT_TIMEOUT;
+
 	return retval;
 }
 
@@ -5267,6 +5269,18 @@  static void xhci_clear_tt_buffer_complete(struct usb_hcd *hcd,
 	spin_unlock_irqrestore(&xhci->lock, flags);
 }
 
+static int xhci_set_cmd_delay(struct usb_hcd *hcd, int cmd_delay_ms)
+{
+	struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+
+	if (cmd_delay_ms < 150 || cmd_delay_ms > 10000)
+		return -EINVAL;
+
+	xhci->cmd_timer_delay = cmd_delay_ms;
+
+	return 0;
+}
+
 static const struct hc_driver xhci_hc_driver = {
 	.description =		"xhci-hcd",
 	.product_desc =		"xHCI Host Controller",
@@ -5332,6 +5346,7 @@  static const struct hc_driver xhci_hc_driver = {
 	.disable_usb3_lpm_timeout =	xhci_disable_usb3_lpm_timeout,
 	.find_raw_port_number =	xhci_find_raw_port_number,
 	.clear_tt_buffer_complete = xhci_clear_tt_buffer_complete,
+	.set_cmd_timer_delay = xhci_set_cmd_delay,
 };
 
 void xhci_init_driver(struct hc_driver *drv,
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 7e282b4522c0..eb9a9b0bfa3d 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1808,6 +1808,7 @@  struct xhci_hcd {
 	struct list_head        cmd_list;
 	unsigned int		cmd_ring_reserved_trbs;
 	struct delayed_work	cmd_timer;
+	unsigned long		cmd_timer_delay;
 	struct completion	cmd_ring_stop_completion;
 	struct xhci_command	*current_cmd;
 
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index 4e9623e8492b..b6b0a0bbf9fc 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -405,6 +405,7 @@  struct hc_driver {
 #define EHSET_TEST_SINGLE_STEP_SET_FEATURE 0x06
 	int	(*submit_single_step_set_feature)(struct usb_hcd *,
 			struct urb *, int);
+	int	(*set_cmd_timer_delay)(struct usb_hcd *hcd, int delay);
 };
 
 static inline int hcd_giveback_urb_in_bh(struct usb_hcd *hcd)
@@ -466,6 +467,7 @@  extern void usb_remove_hcd(struct usb_hcd *hcd);
 extern int usb_hcd_find_raw_port_number(struct usb_hcd *hcd, int port1);
 int usb_hcd_setup_local_mem(struct usb_hcd *hcd, phys_addr_t phys_addr,
 			    dma_addr_t dma, size_t size);
+int usb_hcd_set_cmd_timer_delay(struct usb_hcd *hcd, int delay);
 
 struct platform_device;
 extern void usb_hcd_platform_shutdown(struct platform_device *dev);