mbox series

[v2,0/7] usb: Set quirks for xhci/dwc3 host mode

Message ID cover.1618014279.git.Thinh.Nguyen@synopsys.com
Headers show
Series usb: Set quirks for xhci/dwc3 host mode | expand

Message

Thinh Nguyen April 10, 2021, 12:46 a.m. UTC
This series add 3 new quirks for DWC_usb31 host mode:
 * XHCI_ISOC_BLOCKED_DISCONNECT
 * XHCI_LIMIT_FS_BI_INTR_EP
 * XHCI_LOST_DISCONNECT_QUIRK

Different versions of DWC_usb3x controllers have different quirks. Typically we
set them based on PCI device VID:PID or DT compatible strings. However, we know
that a particular IP version(s) may share a common quirk across different
platform. We can enable these quirks based on the IP type and version number.
This simplifies the designer work and consolidate the logic check. To do this,
we will need to expose the xHCI quirks to the common header along with the
private platform structure.


Changes in v2:
- Instead of combining xhci-plat private structure in xhci-squirks.h, keep it
  as a separate header file


Thinh Nguyen (7):
  usb: xhci: Move quirks definitions to common usb header
  usb: xhci: Move xhci-plat header to common usb header
  usb: xhci: Check for blocked disconnection
  usb: xhci: Workaround undercalculated BW for fullspeed BI
  usb: xhci: Rename Compliance mode timer quirk
  usb: xhci: Workaround lost disconnect port status
  usb: dwc3: host: Set quirks base on version

 drivers/usb/cdns3/host.c                      |   2 +-
 drivers/usb/dwc3/host.c                       |  22 +++
 drivers/usb/host/xhci-hub.c                   |  12 +-
 drivers/usb/host/xhci-mem.c                   |  26 ++++
 drivers/usb/host/xhci-plat.c                  |   2 +-
 drivers/usb/host/xhci-rcar.c                  |   2 +-
 drivers/usb/host/xhci-ring.c                  |  76 ++++++++++
 drivers/usb/host/xhci.c                       | 134 +++++++++++++-----
 drivers/usb/host/xhci.h                       |  71 ++--------
 .../host => include/linux/usb}/xhci-plat.h    |  18 +--
 include/linux/usb/xhci-quirks.h               |  65 +++++++++
 11 files changed, 326 insertions(+), 104 deletions(-)
 rename {drivers/usb/host => include/linux/usb}/xhci-plat.h (54%)
 create mode 100644 include/linux/usb/xhci-quirks.h


base-commit: 496960274153bdeb9d1f904ff1ea875cef8232c1

Comments

Thinh Nguyen April 19, 2021, 9:07 p.m. UTC | #1
Hi Mathias,

Thinh Nguyen wrote:
> This series add 3 new quirks for DWC_usb31 host mode:

>  * XHCI_ISOC_BLOCKED_DISCONNECT

>  * XHCI_LIMIT_FS_BI_INTR_EP

>  * XHCI_LOST_DISCONNECT_QUIRK

> 

> Different versions of DWC_usb3x controllers have different quirks. Typically we

> set them based on PCI device VID:PID or DT compatible strings. However, we know

> that a particular IP version(s) may share a common quirk across different

> platform. We can enable these quirks based on the IP type and version number.

> This simplifies the designer work and consolidate the logic check. To do this,

> we will need to expose the xHCI quirks to the common header along with the

> private platform structure.

> 

> 

> Changes in v2:

> - Instead of combining xhci-plat private structure in xhci-squirks.h, keep it

>   as a separate header file

> 

> 

> Thinh Nguyen (7):

>   usb: xhci: Move quirks definitions to common usb header

>   usb: xhci: Move xhci-plat header to common usb header

>   usb: xhci: Check for blocked disconnection

>   usb: xhci: Workaround undercalculated BW for fullspeed BI

>   usb: xhci: Rename Compliance mode timer quirk

>   usb: xhci: Workaround lost disconnect port status

>   usb: dwc3: host: Set quirks base on version

> 

>  drivers/usb/cdns3/host.c                      |   2 +-

>  drivers/usb/dwc3/host.c                       |  22 +++

>  drivers/usb/host/xhci-hub.c                   |  12 +-

>  drivers/usb/host/xhci-mem.c                   |  26 ++++

>  drivers/usb/host/xhci-plat.c                  |   2 +-

>  drivers/usb/host/xhci-rcar.c                  |   2 +-

>  drivers/usb/host/xhci-ring.c                  |  76 ++++++++++

>  drivers/usb/host/xhci.c                       | 134 +++++++++++++-----

>  drivers/usb/host/xhci.h                       |  71 ++--------

>  .../host => include/linux/usb}/xhci-plat.h    |  18 +--

>  include/linux/usb/xhci-quirks.h               |  65 +++++++++

>  11 files changed, 326 insertions(+), 104 deletions(-)

>  rename {drivers/usb/host => include/linux/usb}/xhci-plat.h (54%)

>  create mode 100644 include/linux/usb/xhci-quirks.h

> 

> 

> base-commit: 496960274153bdeb9d1f904ff1ea875cef8232c1

> 


Did you get a chance to take a look at this series?

Thanks,
Thinh
Mathias Nyman April 27, 2021, 1:08 p.m. UTC | #2
Hi Thinh

Sorry about the delay. 

On 10.4.2021 3.47, Thinh Nguyen wrote:
> If there is a device with active enhanced super-speed (eSS) isoc IN

> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)

> host controller will not detect the device disconnection until no more

> isoc URB is submitted. If there's a device disconnection, internally

> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the

> other endpoints from being scheduled. So, it blocks the interrupt

> endpoint of the eSS hub indicating the port change status.

> 

> This can be an issue for applications that continuously submitting isoc

> URBs to the xHCI. To work around this, stop processing new URBs after 3

> consecutive isoc transaction errors. If new isoc transfers are queued

> after the device is disconnected, the host will respond with USB

> transaction error. After 3 consecutive USB transaction errors, the

> driver can wait a period of time (at least 2 * largest periodic interval

> of the topology) without ringing isoc endpoint doorbell to detect the

> port change status. If there is no disconnection detected, ring the

> endpoint doorbell to resume isoc transfers.


Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB.
And drivers like UVC queue several URBs in advance.

If I remember correctly then a transaction errors won't stop Isoch endpoints,
so waiting for 2 * Interval after 3 consecutive transaction errors might not
be enough.

How about stopping the endpoint after 3 consecutive transaction errors,
and restating it a bit later? 

> 

> This workaround tracks the max eSS periodic interval every time there's

> an endpoint added or dropped, which happens when there's bandwidth

> check. So, scan the topology and update the xhci->max_ess_interval

> whenever there's a bandwidth check. Introduced a new flag

> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting

> for a disconnection status. After 2 * max_ess_interval time and no

> disconnection detected, a delayed work will ring the doorbell to resume

> the active isoc transfers.


Sounds very elaborate for a vendor specific disconnect workaround.
Isn't there a simpler way?

Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error,
wait for 2x hub interrupt interval time, and then restart the endpoints if there is
no disconnect?

There is bigger concern with this series, it scatters a lot of vendor specific code 
around the generic xhci driver. It's not very clear afterwards what code is part of the
workaround and what is generic code.

We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c
instead of using the generic platform driver with tons of workarounds and quirks.

Thanks
-Mathias
Thinh Nguyen April 27, 2021, 10:30 p.m. UTC | #3
Mathias Nyman wrote:
> Hi Thinh

> 

> Sorry about the delay. 


Np :)

> 

> On 10.4.2021 3.47, Thinh Nguyen wrote:

>> If there is a device with active enhanced super-speed (eSS) isoc IN

>> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)

>> host controller will not detect the device disconnection until no more

>> isoc URB is submitted. If there's a device disconnection, internally

>> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the

>> other endpoints from being scheduled. So, it blocks the interrupt

>> endpoint of the eSS hub indicating the port change status.

>>

>> This can be an issue for applications that continuously submitting isoc

>> URBs to the xHCI. To work around this, stop processing new URBs after 3

>> consecutive isoc transaction errors. If new isoc transfers are queued

>> after the device is disconnected, the host will respond with USB

>> transaction error. After 3 consecutive USB transaction errors, the

>> driver can wait a period of time (at least 2 * largest periodic interval

>> of the topology) without ringing isoc endpoint doorbell to detect the

>> port change status. If there is no disconnection detected, ring the

>> endpoint doorbell to resume isoc transfers.

> 

> Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB.

> And drivers like UVC queue several URBs in advance.


That's fine as long as the driver stops ringing more doorbell for a
certain period of time creating a gap that's enough to get the
notification from the interrupt endpoint. We tested with 128 isoc URBs
and was able to detect a disconnect after this delay.

> 

> If I remember correctly then a transaction errors won't stop Isoch endpoints,

> so waiting for 2 * Interval after 3 consecutive transaction errors might not

> be enough.

> 

> How about stopping the endpoint after 3 consecutive transaction errors,

> and restating it a bit later?


There's no need to stop and restart the endpoint.

> 

>>

>> This workaround tracks the max eSS periodic interval every time there's

>> an endpoint added or dropped, which happens when there's bandwidth

>> check. So, scan the topology and update the xhci->max_ess_interval

>> whenever there's a bandwidth check. Introduced a new flag

>> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting

>> for a disconnection status. After 2 * max_ess_interval time and no

>> disconnection detected, a delayed work will ring the doorbell to resume

>> the active isoc transfers.

> 

> Sounds very elaborate for a vendor specific disconnect workaround.

> Isn't there a simpler way?

> 

> Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error,

> wait for 2x hub interrupt interval time, and then restart the endpoints if there is

> no disconnect?


We can also do this (but without stop + restart the endpoints). It just
creates a slightly larger gap that may be more noticeable to the user if
there's no actual disconnection.

> 

> There is bigger concern with this series, it scatters a lot of vendor specific code 

> around the generic xhci driver. It's not very clear afterwards what code is part of the

> workaround and what is generic code.

> 

> We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c

> instead of using the generic platform driver with tons of workarounds and quirks.

> 


Thanks for the reviews. I need to look into how this can be done. May
need your suggestion as not every scenarios can be overridden
easily/cleanly.

What about the other quirks, do you have any comments?

Thanks,
Thinh
Mathias Nyman April 28, 2021, 1:32 p.m. UTC | #4
On 28.4.2021 1.30, Thinh Nguyen wrote:
>> On 10.4.2021 3.47, Thinh Nguyen wrote:

>>> If there is a device with active enhanced super-speed (eSS) isoc IN

>>> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)

>>> host controller will not detect the device disconnection until no more

>>> isoc URB is submitted. If there's a device disconnection, internally

>>> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the

>>> other endpoints from being scheduled. So, it blocks the interrupt

>>> endpoint of the eSS hub indicating the port change status.

>>>

>>> This can be an issue for applications that continuously submitting isoc

>>> URBs to the xHCI. To work around this, stop processing new URBs after 3

>>> consecutive isoc transaction errors. If new isoc transfers are queued

>>> after the device is disconnected, the host will respond with USB

>>> transaction error. After 3 consecutive USB transaction errors, the

>>> driver can wait a period of time (at least 2 * largest periodic interval

>>> of the topology) without ringing isoc endpoint doorbell to detect the

>>> port change status. If there is no disconnection detected, ring the

>>> endpoint doorbell to resume isoc transfers.

>>

>> Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB.

>> And drivers like UVC queue several URBs in advance.

> 

> That's fine as long as the driver stops ringing more doorbell for a

> certain period of time creating a gap that's enough to get the

> notification from the interrupt endpoint. We tested with 128 isoc URBs

> and was able to detect a disconnect after this delay.


Ok, if not ringing the endpoint is enough then that is better than
stopping the whole endpoint. 

>>> This workaround tracks the max eSS periodic interval every time there's

>>> an endpoint added or dropped, which happens when there's bandwidth

>>> check. So, scan the topology and update the xhci->max_ess_interval

>>> whenever there's a bandwidth check. Introduced a new flag

>>> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting

>>> for a disconnection status. After 2 * max_ess_interval time and no

>>> disconnection detected, a delayed work will ring the doorbell to resume

>>> the active isoc transfers.

>>

>> Sounds very elaborate for a vendor specific disconnect workaround.

>> Isn't there a simpler way?

>>

>> Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error,

>> wait for 2x hub interrupt interval time, and then restart the endpoints if there is

>> no disconnect?

> 

> We can also do this (but without stop + restart the endpoints). It just

> creates a slightly larger gap that may be more noticeable to the user if

> there's no actual disconnection.


Ok, if blocking the doorbell is enough then it sounds better.

How about that max interval tracking, is it necessary?
It will block the doorbell from 250us up to several seconds.
Is there some reasonable single value that can be used instead?

>>

>> There is bigger concern with this series, it scatters a lot of vendor specific code 

>> around the generic xhci driver. It's not very clear afterwards what code is part of the

>> workaround and what is generic code.

>>

>> We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c

>> instead of using the generic platform driver with tons of workarounds and quirks.

>>

> 

> Thanks for the reviews. I need to look into how this can be done. May

> need your suggestion as not every scenarios can be overridden

> easily/cleanly.


true, no easy overrides for this transfer error / doorbell blocking workaround.
Needs a bit of work

-Mathias
Thinh Nguyen April 29, 2021, 12:54 a.m. UTC | #5
Mathias Nyman wrote:
> On 28.4.2021 1.30, Thinh Nguyen wrote:

>>> On 10.4.2021 3.47, Thinh Nguyen wrote:

>>>> If there is a device with active enhanced super-speed (eSS) isoc IN

>>>> endpoint(s) behind one or more eSS hubs, DWC_usb31 (v1.90a and prior)

>>>> host controller will not detect the device disconnection until no more

>>>> isoc URB is submitted. If there's a device disconnection, internally

>>>> the wait for tHostTransactionTimeout (USB 3.2 spec 8.13) blocks the

>>>> other endpoints from being scheduled. So, it blocks the interrupt

>>>> endpoint of the eSS hub indicating the port change status.

>>>>

>>>> This can be an issue for applications that continuously submitting isoc

>>>> URBs to the xHCI. To work around this, stop processing new URBs after 3

>>>> consecutive isoc transaction errors. If new isoc transfers are queued

>>>> after the device is disconnected, the host will respond with USB

>>>> transaction error. After 3 consecutive USB transaction errors, the

>>>> driver can wait a period of time (at least 2 * largest periodic interval

>>>> of the topology) without ringing isoc endpoint doorbell to detect the

>>>> port change status. If there is no disconnection detected, ring the

>>>> endpoint doorbell to resume isoc transfers.

>>>

>>> Is that enough? many Isoc URBs queue 16 - 32 Isoc TRBs per URB.

>>> And drivers like UVC queue several URBs in advance.

>>

>> That's fine as long as the driver stops ringing more doorbell for a

>> certain period of time creating a gap that's enough to get the

>> notification from the interrupt endpoint. We tested with 128 isoc URBs

>> and was able to detect a disconnect after this delay.

> 

> Ok, if not ringing the endpoint is enough then that is better than

> stopping the whole endpoint. 

> 

>>>> This workaround tracks the max eSS periodic interval every time there's

>>>> an endpoint added or dropped, which happens when there's bandwidth

>>>> check. So, scan the topology and update the xhci->max_ess_interval

>>>> whenever there's a bandwidth check. Introduced a new flag

>>>> VDEV_DISCONN_CHECK_PENDING to prevent ringing the doorbell while waiting

>>>> for a disconnection status. After 2 * max_ess_interval time and no

>>>> disconnection detected, a delayed work will ring the doorbell to resume

>>>> the active isoc transfers.

>>>

>>> Sounds very elaborate for a vendor specific disconnect workaround.

>>> Isn't there a simpler way?

>>>

>>> Maybe stop all isoc in endpoints if one them has 3 consecutive transaction error,

>>> wait for 2x hub interrupt interval time, and then restart the endpoints if there is

>>> no disconnect?

>>

>> We can also do this (but without stop + restart the endpoints). It just

>> creates a slightly larger gap that may be more noticeable to the user if

>> there's no actual disconnection.

> 

> Ok, if blocking the doorbell is enough then it sounds better.

> 

> How about that max interval tracking, is it necessary?

> It will block the doorbell from 250us up to several seconds.

> Is there some reasonable single value that can be used instead?


Yeah, I agree with you. I think we can find a number that satisfies with
most applications while keeping this change small and less intrusive.

> 

>>>

>>> There is bigger concern with this series, it scatters a lot of vendor specific code 

>>> around the generic xhci driver. It's not very clear afterwards what code is part of the

>>> workaround and what is generic code.

>>>

>>> We just got a lot of the Mediatek code moved to xhci-mtk*, maybe its time to add xhci-snps.c

>>> instead of using the generic platform driver with tons of workarounds and quirks.

>>>

>>

>> Thanks for the reviews. I need to look into how this can be done. May

>> need your suggestion as not every scenarios can be overridden

>> easily/cleanly.

> 

> true, no easy overrides for this transfer error / doorbell blocking workaround.

> Needs a bit of work

> 


Thanks,
Thinh