mbox series

[0/2] usb: dwc3: gadget: fix scatter gather support

Message ID 20210421204837.4185-1-m.grzeschik@pengutronix.de
Headers show
Series usb: dwc3: gadget: fix scatter gather support | expand

Message

Michael Grzeschik April 21, 2021, 8:48 p.m. UTC
This patches fix the usage of scatter gather with the dwc3 gadget
driver. The driver is currently not using the variables pending_sgs
and queued_sgs as intended. This series fixes that.

Michael Grzeschik (2):
  dwc3: gadget: fix setting of pending_sgs
  dwc3: gadget: fix tracking of used sgs in request

 drivers/usb/dwc3/gadget.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

Comments

Felipe Balbi April 22, 2021, 10:56 a.m. UTC | #1
Hi,

(subject format as well)

Michael Grzeschik <m.grzeschik@pengutronix.de> writes:
> The variable pending_sgs was used to keep track of handled

> sgs in one request. But instead queued_sgs is being decremented


no, it wasn't. If the total number of entries in the scatter list is 'x'
and we have transferred (completed) 'y' entries, then pending_sgs should
be (x - y).

> on every handled sg. This patch fixes the usage of the variable

> to use queued_sgs instead as intended.

>

> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

> ---

>  drivers/usb/dwc3/gadget.c | 8 ++++----

>  1 file changed, 4 insertions(+), 4 deletions(-)

>

> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

> index 118b5bcc565d6..2d7d861b13b31 100644

> --- a/drivers/usb/dwc3/gadget.c

> +++ b/drivers/usb/dwc3/gadget.c

> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>  	struct scatterlist *sg = req->sg;

>  	struct scatterlist *s;

> -	unsigned int pending = req->num_pending_sgs;

> +	unsigned int pending = req->num_queued_sgs;

>  	unsigned int i;

>  	int ret = 0;

>  

> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>  		trb = &dep->trb_pool[dep->trb_dequeue];

>  

>  		req->sg = sg_next(s);

> -		req->num_pending_sgs--;

> +		req->num_queued_sgs--;


no, this is wrong. queued shouldn't be modified as it comes straight
from the gadget driver. This is the number of entries in the request
that the gadget driver gave us. We don't want to modify it.

>  

>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>  				trb, event, status, true);

> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>  

>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>  {

> -	return req->num_pending_sgs == 0;

> +	return req->num_queued_sgs == 0;


nope, request is, indeed, completed when there no more pending entries
to be consumed.

What sort of problem are you dealing with? Got any way of reproducing
it? Got some trace output showing the issue?

-- 
balbi
Michael Grzeschik April 22, 2021, 8:18 p.m. UTC | #2
On Thu, Apr 22, 2021 at 01:56:09PM +0300, Felipe Balbi wrote:
>

>Hi,

>

>(subject format as well)

>

>Michael Grzeschik <m.grzeschik@pengutronix.de> writes:

>> The variable pending_sgs was used to keep track of handled

>> sgs in one request. But instead queued_sgs is being decremented

>

>no, it wasn't. If the total number of entries in the scatter list is 'x'

>and we have transferred (completed) 'y' entries, then pending_sgs should

>be (x - y).

>

>> on every handled sg. This patch fixes the usage of the variable

>> to use queued_sgs instead as intended.

>>

>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>> ---

>>  drivers/usb/dwc3/gadget.c | 8 ++++----

>>  1 file changed, 4 insertions(+), 4 deletions(-)

>>

>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>> index 118b5bcc565d6..2d7d861b13b31 100644

>> --- a/drivers/usb/dwc3/gadget.c

>> +++ b/drivers/usb/dwc3/gadget.c

>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>  	struct scatterlist *sg = req->sg;

>>  	struct scatterlist *s;

>> -	unsigned int pending = req->num_pending_sgs;

>> +	unsigned int pending = req->num_queued_sgs;

>>  	unsigned int i;

>>  	int ret = 0;

>>

>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>  		trb = &dep->trb_pool[dep->trb_dequeue];

>>

>>  		req->sg = sg_next(s);

>> -		req->num_pending_sgs--;

>> +		req->num_queued_sgs--;

>

>no, this is wrong. queued shouldn't be modified as it comes straight

>from the gadget driver. This is the number of entries in the request

>that the gadget driver gave us. We don't want to modify it.


Right, but pending_sgs than has two use cases. One to track the mapped
sgs that got not queued. And one here to to track the "queued sgs" that
got dequeued.

>>

>>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>  				trb, event, status, true);

>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>

>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>  {

>> -	return req->num_pending_sgs == 0;

>> +	return req->num_queued_sgs == 0;

>

>nope, request is, indeed, completed when there no more pending entries

>to be consumed.

>

>What sort of problem are you dealing with? Got any way of reproducing

>it? Got some trace output showing the issue?


I digged a little bit deeper to fully understand the issue here. What
I found is that in dwc3_prepare_trbs will make two assumptions on
num_pending_sgs.

When the real purpose of the variable is to track the dequeued trbs.
Than we have to fix the started list handling in the dwc3_prepare_trbs.

The comment in the function says:

         /*
          * We can get in a situation where there's a request in the started list
          * but there weren't enough TRBs to fully kick it in the first time
          * around, so it has been waiting for more TRBs to be freed up.
          *
          * In that case, we should check if we have a request with pending_sgs
          * in the started list and prepare TRBs for that request first,
          * otherwise we will prepare TRBs completely out of order and that will
          * break things.
          */
         list_for_each_entry(req, &dep->started_list, list) {
		if (req->num_pending_sgs > 0) {
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This condition seems to be made on a wrong assumption, thinking the
num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts
of that requests sgs but not all.

But the completion path can not also depend on that variable to be decremented
after parts of that sgs get handled. Therefor I came up with this second patch.

What do you think, would be the right way to solve this?


The second issue I see in dwc3_prepare_trbs is the bail out of iterations over
the pending and starting lists. Whenever one case of (req->num_pending_sgs > 0)
will be true after calling dwc3_prepare_trbs_sg, the function returns immediately.

In my case, where my uvc_video now enqueues up to 64 requests, every single
kick_transfer called from one ep_queue will ensure only one call of
dwc3_prepare_trbs_sg on one entry of the pending list. This bottleneck makes
the hardware refill to slow and the hardware will drain fast even though enough
pending buffers are there.

I suggest to remove those returns.

Regards,
Michael

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Thinh Nguyen April 22, 2021, 9:28 p.m. UTC | #3
Hi,

Michael Grzeschik wrote:
> On Thu, Apr 22, 2021 at 01:56:09PM +0300, Felipe Balbi wrote:

>>

>> Hi,

>>

>> (subject format as well)

>>

>> Michael Grzeschik <m.grzeschik@pengutronix.de> writes:

>>> The variable pending_sgs was used to keep track of handled

>>> sgs in one request. But instead queued_sgs is being decremented

>>

>> no, it wasn't. If the total number of entries in the scatter list is 'x'

>> and we have transferred (completed) 'y' entries, then pending_sgs should

>> be (x - y).

>>

>>> on every handled sg. This patch fixes the usage of the variable

>>> to use queued_sgs instead as intended.

>>>

>>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>> ---

>>>  drivers/usb/dwc3/gadget.c | 8 ++++----

>>>  1 file changed, 4 insertions(+), 4 deletions(-)

>>>

>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>>> index 118b5bcc565d6..2d7d861b13b31 100644

>>> --- a/drivers/usb/dwc3/gadget.c

>>> +++ b/drivers/usb/dwc3/gadget.c

>>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct

>>> dwc3_ep *dep,

>>>      struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>>      struct scatterlist *sg = req->sg;

>>>      struct scatterlist *s;

>>> -    unsigned int pending = req->num_pending_sgs;

>>> +    unsigned int pending = req->num_queued_sgs;

>>>      unsigned int i;

>>>      int ret = 0;

>>>

>>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct

>>> dwc3_ep *dep,

>>>          trb = &dep->trb_pool[dep->trb_dequeue];

>>>

>>>          req->sg = sg_next(s);

>>> -        req->num_pending_sgs--;

>>> +        req->num_queued_sgs--;

>>

>> no, this is wrong. queued shouldn't be modified as it comes straight

>> from the gadget driver. This is the number of entries in the request

>> that the gadget driver gave us. We don't want to modify it.

> 

> Right, but pending_sgs than has two use cases. One to track the mapped

> sgs that got not queued. And one here to to track the "queued sgs" that

> got dequeued.

> 

>>>

>>>          ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>>                  trb, event, status, true);

>>> @@ -2887,7 +2887,7 @@ static int

>>> dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>

>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>  {

>>> -    return req->num_pending_sgs == 0;

>>> +    return req->num_queued_sgs == 0;

>>

>> nope, request is, indeed, completed when there no more pending entries

>> to be consumed.

>>

>> What sort of problem are you dealing with? Got any way of reproducing

>> it? Got some trace output showing the issue?

> 

> I digged a little bit deeper to fully understand the issue here. What

> I found is that in dwc3_prepare_trbs will make two assumptions on

> num_pending_sgs.

> 

> When the real purpose of the variable is to track the dequeued trbs.

> Than we have to fix the started list handling in the dwc3_prepare_trbs.

> 

> The comment in the function says:

> 

>         /*

>          * We can get in a situation where there's a request in the

> started list

>          * but there weren't enough TRBs to fully kick it in the first time

>          * around, so it has been waiting for more TRBs to be freed up.

>          *

>          * In that case, we should check if we have a request with

> pending_sgs

>          * in the started list and prepare TRBs for that request first,

>          * otherwise we will prepare TRBs completely out of order and

> that will

>          * break things.

>          */

>         list_for_each_entry(req, &dep->started_list, list) {

>         if (req->num_pending_sgs > 0) {

>         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> 

> This condition seems to be made on a wrong assumption, thinking the

> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on

> parts

> of that requests sgs but not all.

> 

> But the completion path can not also depend on that variable to be

> decremented

> after parts of that sgs get handled. Therefor I came up with this second

> patch.

> 

> What do you think, would be the right way to solve this?

> 

> 

> The second issue I see in dwc3_prepare_trbs is the bail out of

> iterations over

> the pending and starting lists. Whenever one case of

> (req->num_pending_sgs > 0)

> will be true after calling dwc3_prepare_trbs_sg, the function returns

> immediately.

> 

> In my case, where my uvc_video now enqueues up to 64 requests, every single

> kick_transfer called from one ep_queue will ensure only one call of

> dwc3_prepare_trbs_sg on one entry of the pending list. This bottleneck

> makes

> the hardware refill to slow and the hardware will drain fast even though

> enough

> pending buffers are there.

> 

> I suggest to remove those returns.

> 


Are you referring to the early returns in the dwc3_prepare_trbs()? Then
no, you should not remove them. They only return early if the request is
completely prepared or the TRB ring is full and we can only prepare the
request partially.

Can you help clarify the bottleneck more as I'm still not clear of the
issue here.

The way it's implemented now keeps track of the all the available TRBs
and ensure the TRB ring is filled as soon as a request is completed.

Thanks,
Thinh
Felipe Balbi April 23, 2021, 6:17 a.m. UTC | #4
Hi,

Michael Grzeschik <mgr@pengutronix.de> writes:
> On Thu, Apr 22, 2021 at 01:56:09PM +0300, Felipe Balbi wrote:

>>

>>Hi,

>>

>>(subject format as well)

>>

>>Michael Grzeschik <m.grzeschik@pengutronix.de> writes:

>>> The variable pending_sgs was used to keep track of handled

>>> sgs in one request. But instead queued_sgs is being decremented

>>

>>no, it wasn't. If the total number of entries in the scatter list is 'x'

>>and we have transferred (completed) 'y' entries, then pending_sgs should

>>be (x - y).

>>

>>> on every handled sg. This patch fixes the usage of the variable

>>> to use queued_sgs instead as intended.

>>>

>>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>> ---

>>>  drivers/usb/dwc3/gadget.c | 8 ++++----

>>>  1 file changed, 4 insertions(+), 4 deletions(-)

>>>

>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>>> index 118b5bcc565d6..2d7d861b13b31 100644

>>> --- a/drivers/usb/dwc3/gadget.c

>>> +++ b/drivers/usb/dwc3/gadget.c

>>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>>  	struct scatterlist *sg = req->sg;

>>>  	struct scatterlist *s;

>>> -	unsigned int pending = req->num_pending_sgs;

>>> +	unsigned int pending = req->num_queued_sgs;

>>>  	unsigned int i;

>>>  	int ret = 0;

>>>

>>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>  		trb = &dep->trb_pool[dep->trb_dequeue];

>>>

>>>  		req->sg = sg_next(s);

>>> -		req->num_pending_sgs--;

>>> +		req->num_queued_sgs--;

>>

>>no, this is wrong. queued shouldn't be modified as it comes straight

>>from the gadget driver. This is the number of entries in the request

>>that the gadget driver gave us. We don't want to modify it.

>

> Right, but pending_sgs than has two use cases. One to track the mapped

> sgs that got not queued. And one here to to track the "queued sgs" that

> got dequeued.


no, we have num_mapped_sgs for the number of mapped sgs. It's just that
right before kicking the transfer the number of pending sgs and the
number of mapped sgs is the same.

>>>

>>>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>>  				trb, event, status, true);

>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>

>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>  {

>>> -	return req->num_pending_sgs == 0;

>>> +	return req->num_queued_sgs == 0;

>>

>>nope, request is, indeed, completed when there no more pending entries

>>to be consumed.

>>

>>What sort of problem are you dealing with? Got any way of reproducing

>>it? Got some trace output showing the issue?

>

> I digged a little bit deeper to fully understand the issue here. What

> I found is that in dwc3_prepare_trbs will make two assumptions on

> num_pending_sgs.

>

> When the real purpose of the variable is to track the dequeued trbs.


its purpose is *not* to track the dequeued trbs.

> Than we have to fix the started list handling in the dwc3_prepare_trbs.

>

> The comment in the function says:

>

>          /*

>           * We can get in a situation where there's a request in the started list

>           * but there weren't enough TRBs to fully kick it in the first time

>           * around, so it has been waiting for more TRBs to be freed up.

>           *

>           * In that case, we should check if we have a request with pending_sgs

>           * in the started list and prepare TRBs for that request first,

>           * otherwise we will prepare TRBs completely out of order and that will

>           * break things.

>           */

>          list_for_each_entry(req, &dep->started_list, list) {

> 		if (req->num_pending_sgs > 0) {

> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>

> This condition seems to be made on a wrong assumption, thinking the

> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts

> of that requests sgs but not all.


say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link
trb). Clearly not all SGs will fit in one go. Driver will set
num_pending_sgs to 500, because that's the number of sgs that need to be
transferred.

For each completion we will decrement num_pending_sgs. Assuming all 255
were free and used up, num_pending_sgs will be 245 at this point. Driver
*must* try to kick these 245 TRBs.

> But the completion path can not also depend on that variable to be decremented

> after parts of that sgs get handled. Therefor I came up with this second patch.

>

> What do you think, would be the right way to solve this?


unless you can show that a problem really exists, I don't think we
should do anything. Got a minimal reproduction method somewhere?
Tracepoint of the problem?

> The second issue I see in dwc3_prepare_trbs is the bail out of iterations over

> the pending and starting lists. Whenever one case of (req->num_pending_sgs > 0)

> will be true after calling dwc3_prepare_trbs_sg, the function returns immediately.

>

> In my case, where my uvc_video now enqueues up to 64 requests, every single


64 requests is not (necessarily) the same as 64 TRBs or 64 SG
entries. You need to explain this a little better.

> kick_transfer called from one ep_queue will ensure only one call of

> dwc3_prepare_trbs_sg on one entry of the pending list. This bottleneck makes

> the hardware refill to slow and the hardware will drain fast even though enough

> pending buffers are there.


what's the size of the buffers? are they contiguous or held in scatter
lists? Got tracepoints of the problem in question?

> I suggest to remove those returns.


we may get there when you successfully show the existence of a problem
:-)

-- 
balbi
Felipe Balbi April 23, 2021, 11:15 a.m. UTC | #5
Hi,

Michael Grzeschik <mgr@pengutronix.de> writes:
>>>Michael Grzeschik <m.grzeschik@pengutronix.de> writes:

>>>>> The variable pending_sgs was used to keep track of handled

>>>>> sgs in one request. But instead queued_sgs is being decremented

>>>>

>>>>no, it wasn't. If the total number of entries in the scatter list is 'x'

>>>>and we have transferred (completed) 'y' entries, then pending_sgs should

>>>>be (x - y).

>>>>

>>>>> on every handled sg. This patch fixes the usage of the variable

>>>>> to use queued_sgs instead as intended.

>>>>>

>>>>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>>>> ---

>>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----

>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)

>>>>>

>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>>>>> index 118b5bcc565d6..2d7d861b13b31 100644

>>>>> --- a/drivers/usb/dwc3/gadget.c

>>>>> +++ b/drivers/usb/dwc3/gadget.c

>>>>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>>>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>>>>  	struct scatterlist *sg = req->sg;

>>>>>  	struct scatterlist *s;

>>>>> -	unsigned int pending = req->num_pending_sgs;

>>>>> +	unsigned int pending = req->num_queued_sgs;

>>>>>  	unsigned int i;

>>>>>  	int ret = 0;

>>>>>

>>>>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>>>  		trb = &dep->trb_pool[dep->trb_dequeue];

>>>>>

>>>>>  		req->sg = sg_next(s);

>>>>> -		req->num_pending_sgs--;

>>>>> +		req->num_queued_sgs--;

>>>>

>>>>no, this is wrong. queued shouldn't be modified as it comes straight

>>>>from the gadget driver. This is the number of entries in the request

>>>>that the gadget driver gave us. We don't want to modify it.

>>>

>>> Right, but pending_sgs than has two use cases. One to track the mapped

>>> sgs that got not queued. And one here to to track the "queued sgs" that

>>> got dequeued.

>>

>>no, we have num_mapped_sgs for the number of mapped sgs. It's just that

>>right before kicking the transfer the number of pending sgs and the

>>number of mapped sgs is the same.

>>

>>>>>

>>>>>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>>>>  				trb, event, status, true);

>>>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>>>

>>>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>>>  {

>>>>> -	return req->num_pending_sgs == 0;

>>>>> +	return req->num_queued_sgs == 0;

>>>>

>>>>nope, request is, indeed, completed when there no more pending entries

>>>>to be consumed.

>>>>

>>>>What sort of problem are you dealing with? Got any way of reproducing

>>>>it? Got some trace output showing the issue?

>>>

>>> I digged a little bit deeper to fully understand the issue here. What

>>> I found is that in dwc3_prepare_trbs will make two assumptions on

>>> num_pending_sgs.

>>>

>>> When the real purpose of the variable is to track the dequeued trbs.

>>

>>its purpose is *not* to track the dequeued trbs.

>>

>>> Than we have to fix the started list handling in the dwc3_prepare_trbs.

>>>

>>> The comment in the function says:

>>>

>>>          /*

>>>           * We can get in a situation where there's a request in the started list

>>>           * but there weren't enough TRBs to fully kick it in the first time

>>>           * around, so it has been waiting for more TRBs to be freed up.

>>>           *

>>>           * In that case, we should check if we have a request with pending_sgs

>>>           * in the started list and prepare TRBs for that request first,

>>>           * otherwise we will prepare TRBs completely out of order and that will

>>>           * break things.

>>>           */

>>>          list_for_each_entry(req, &dep->started_list, list) {

>>> 		if (req->num_pending_sgs > 0) {

>>> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>>

>>> This condition seems to be made on a wrong assumption, thinking the

>>> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts

>>> of that requests sgs but not all.

>>

>>say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link

>>trb). Clearly not all SGs will fit in one go. Driver will set

>>num_pending_sgs to 500, because that's the number of sgs that need to be

>>transferred.

>>

>>For each completion we will decrement num_pending_sgs. Assuming all 255

>>were free and used up, num_pending_sgs will be 245 at this point. Driver

>>*must* try to kick these 245 TRBs.

>

> Lets assume we have a request with length 1024 to 3072 like a multiple

> of maxpacket * mul it is used on in uvc_video.

>

> So with a scatterlist creating only a 2 byte header and 1022 to 3070

> bytes payload we get only two scatter entries in this request. On

> special cases like page boundaries we will add one extra sg, but this

> does not change much. So this is nothing big as your 500 example. Now

> lets assume we queue 64 of these up we end up with maximum fo 192 trbs

> being added to the pending list.

>

> Lets say every one of these will be queued with ep_queue comming from

> uvc_video_pump. The kick_transfer from these ep_queues will result in an

> dwc3_prepare_trbs. *But* after the first one being preprared from the

> pending list to the started list, it will have an req->num_preprared_sgs != 0.


there's no such thing as req->num_prepared_sgs, do you mean
num_pending_sgs here? If you do, that's per-request. Gadget driver can
allocate and queue as many requests as it
likes. req_{n}->num_pending_sgs has nothing to do with
req_{n+1}->num_pending_sgs.

> Therefor after the first one got queued into the starting list and it

> did not complete yet, every next ep_queue->kick_transfer will always bail out

> in dwc3_prepare_sgs, checking the one started request with num_preprared

> != 0, without moving other requests from pending to the starting list.


no it won't. They're testing a different req pointer altogether. Or they
should. If you're queueing the same exact req pointer before it has been
completed, shame on you :-p

> This will result in an stalled driver, that will never prepare any

> pending requests into the started list.


it shouldn't. Please collect tracepoint output, this is probably the 5th
time I've asked for tracepoint output of the problem.

>>> But the completion path can not also depend on that variable to be decremented

>>> after parts of that sgs get handled. Therefor I came up with this second patch.

>>>

>>> What do you think, would be the right way to solve this?

>>

>>unless you can show that a problem really exists, I don't think we

>>should do anything. Got a minimal reproduction method somewhere?

>>Tracepoint of the problem?

>

> Look into the attached 20210423_uvc_video_dwc3_gadget_trace. You will

> see my scenario I want to fix.


where exactly in the file is the problem? I took a snippet to look at:

>  irq/51-dwc3-239     [000] d..1    56.766818: dwc3_event: event (d08510ca): ep2in: Transfer Not Ready [0000d085] (Not Active)

> kworker/u5:1-74      [000] d..2    56.807742: dwc3_ep_queue: ep2in: req 00000000823b0b62 length 0/3072 zsI ==> -115

> kworker/u5:1-74      [000] d..2    56.807759: dwc3_prepare_trb: ep2in: trb 000000007ebd2707 (E1:D0) buf 000000000c623058 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

> kworker/u5:1-74      [000] d..2    56.807764: dwc3_prepare_trb: ep2in: trb 0000000093bb2bc4 (E2:D0) buf 000000000e200000 size 1x 3070 ctrl 00000c71:00000000 (Hlcs:SC:isoc)

> kworker/u5:1-74      [000] d..2    56.807777: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d1d00406] params 00000000 0a6ab000 00000000 --> status: Successful


So, Isoc endpoints are a bit "special". We always wait for XferNotReady
before starting them. This is what you see here.

> kworker/u5:1-74      [000] d..2    56.807805: dwc3_ep_queue: ep2in: req 000000006d3f8d0a length 0/3072 zsi ==> -115


When the following request is queued, TRANSFER_STARTED is set, but
PENDING_REQUEST shouldn't be set which would mean
__dwc3_gadget_kick_transfer() be called which should result in a call to
Update Transfer, unless I'm missing something here. It has been quite a
while since I looked at this driver deeply.

> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)


eventually we get Transfer In Progress for the first TRB...

>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0


complete and give it back.

>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)


prepare the next

>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful


kick it by telling the HW about it. I don't see any failures here,
although there may be situations where we could add more data to the
controller's cache early on.

I remember dealing with a bunch of Bus Expiry errors in the past and the
problem was because we couldn't maintain proper alignment with the
microframe number because HW only gives us 14 out of the 16 bits of the
microframe number, the only workaround was to waiting for
XferInProgress/XferNotReady to get that number.

PS: if you collect more traces in the future, please don't enable
ftrace. Keep current_tracer set to nop, ftrace is just adding garbage to
the output in this case :-)

-- 
balbi
Michael Grzeschik April 23, 2021, 1:18 p.m. UTC | #6
Hi!

On Fri, Apr 23, 2021 at 02:15:33PM +0300, Felipe Balbi wrote:
>

>Hi,

>

>Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>Michael Grzeschik <m.grzeschik@pengutronix.de> writes:

>>>>>> The variable pending_sgs was used to keep track of handled

>>>>>> sgs in one request. But instead queued_sgs is being decremented

>>>>>

>>>>>no, it wasn't. If the total number of entries in the scatter list is 'x'

>>>>>and we have transferred (completed) 'y' entries, then pending_sgs should

>>>>>be (x - y).

>>>>>

>>>>>> on every handled sg. This patch fixes the usage of the variable

>>>>>> to use queued_sgs instead as intended.

>>>>>>

>>>>>> Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>>>>> ---

>>>>>>  drivers/usb/dwc3/gadget.c | 8 ++++----

>>>>>>  1 file changed, 4 insertions(+), 4 deletions(-)

>>>>>>

>>>>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>>>>>> index 118b5bcc565d6..2d7d861b13b31 100644

>>>>>> --- a/drivers/usb/dwc3/gadget.c

>>>>>> +++ b/drivers/usb/dwc3/gadget.c

>>>>>> @@ -2856,7 +2856,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>>>>  	struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>>>>>  	struct scatterlist *sg = req->sg;

>>>>>>  	struct scatterlist *s;

>>>>>> -	unsigned int pending = req->num_pending_sgs;

>>>>>> +	unsigned int pending = req->num_queued_sgs;

>>>>>>  	unsigned int i;

>>>>>>  	int ret = 0;

>>>>>>

>>>>>> @@ -2864,7 +2864,7 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>>>>  		trb = &dep->trb_pool[dep->trb_dequeue];

>>>>>>

>>>>>>  		req->sg = sg_next(s);

>>>>>> -		req->num_pending_sgs--;

>>>>>> +		req->num_queued_sgs--;

>>>>>

>>>>>no, this is wrong. queued shouldn't be modified as it comes straight

>>>>>from the gadget driver. This is the number of entries in the request

>>>>>that the gadget driver gave us. We don't want to modify it.

>>>>

>>>> Right, but pending_sgs than has two use cases. One to track the mapped

>>>> sgs that got not queued. And one here to to track the "queued sgs" that

>>>> got dequeued.

>>>

>>>no, we have num_mapped_sgs for the number of mapped sgs. It's just that

>>>right before kicking the transfer the number of pending sgs and the

>>>number of mapped sgs is the same.

>>>

>>>>>>

>>>>>>  		ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>>>>>  				trb, event, status, true);

>>>>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>>>>

>>>>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>>>>  {

>>>>>> -	return req->num_pending_sgs == 0;

>>>>>> +	return req->num_queued_sgs == 0;

>>>>>

>>>>>nope, request is, indeed, completed when there no more pending entries

>>>>>to be consumed.

>>>>>

>>>>>What sort of problem are you dealing with? Got any way of reproducing

>>>>>it? Got some trace output showing the issue?

>>>>

>>>> I digged a little bit deeper to fully understand the issue here. What

>>>> I found is that in dwc3_prepare_trbs will make two assumptions on

>>>> num_pending_sgs.

>>>>

>>>> When the real purpose of the variable is to track the dequeued trbs.

>>>

>>>its purpose is *not* to track the dequeued trbs.

>>>

>>>> Than we have to fix the started list handling in the dwc3_prepare_trbs.

>>>>

>>>> The comment in the function says:

>>>>

>>>>          /*

>>>>           * We can get in a situation where there's a request in the started list

>>>>           * but there weren't enough TRBs to fully kick it in the first time

>>>>           * around, so it has been waiting for more TRBs to be freed up.

>>>>           *

>>>>           * In that case, we should check if we have a request with pending_sgs

>>>>           * in the started list and prepare TRBs for that request first,

>>>>           * otherwise we will prepare TRBs completely out of order and that will

>>>>           * break things.

>>>>           */

>>>>          list_for_each_entry(req, &dep->started_list, list) {

>>>> 		if (req->num_pending_sgs > 0) {

>>>> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>>>

>>>> This condition seems to be made on a wrong assumption, thinking the

>>>> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts

>>>> of that requests sgs but not all.

>>>

>>>say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link

>>>trb). Clearly not all SGs will fit in one go. Driver will set

>>>num_pending_sgs to 500, because that's the number of sgs that need to be

>>>transferred.

>>>

>>>For each completion we will decrement num_pending_sgs. Assuming all 255

>>>were free and used up, num_pending_sgs will be 245 at this point. Driver

>>>*must* try to kick these 245 TRBs.

>>

>> Lets assume we have a request with length 1024 to 3072 like a multiple

>> of maxpacket * mul it is used on in uvc_video.

>>

>> So with a scatterlist creating only a 2 byte header and 1022 to 3070

>> bytes payload we get only two scatter entries in this request. On

>> special cases like page boundaries we will add one extra sg, but this

>> does not change much. So this is nothing big as your 500 example. Now

>> lets assume we queue 64 of these up we end up with maximum fo 192 trbs

>> being added to the pending list.

>>

>> Lets say every one of these will be queued with ep_queue comming from

>> uvc_video_pump. The kick_transfer from these ep_queues will result in an

>> dwc3_prepare_trbs. *But* after the first one being preprared from the

>> pending list to the started list, it will have an req->num_preprared_sgs != 0.

>

>there's no such thing as req->num_prepared_sgs, do you mean

>num_pending_sgs here?


Sure, I got this mixed up. Just keep in mind num_prepared_sgs was
ment to be num_pending_sgs.

>If you do, that's per-request. Gadget driver can

>allocate and queue as many requests as it

>likes. req_{n}->num_pending_sgs has nothing to do with

>req_{n+1}->num_pending_sgs.


That is also how I understand this.

>> Therefor after the first one got queued into the starting list and it

>> did not complete yet, every next ep_queue->kick_transfer will always bail out

>> in dwc3_prepare_sgs, checking the one started request with num_preprared

>> != 0, without moving other requests from pending to the starting list.

>

>no it won't. They're testing a different req pointer altogether. Or they

>should. If you're queueing the same exact req pointer before it has been

>completed, shame on you :-p


Yes, they will. Even if I will queue a different request. The function
dwc3_prepare_sgs will check the list of started request and will bail
out with ret = 0.

As I already mentioned dwc3_prepare_sgs will bail out early after
the first started request will have an num_prepared_sgs != 0.

Look into that code in dwc3_prepare_sgs:

---
1289         list_for_each_entry(req, &dep->started_list, list) {
1290                 if (req->num_pending_sgs > 0) {
1291                         ret = dwc3_prepare_trbs_sg(dep, req);
1292                         if (!ret || req->num_pending_sgs)
1293                                 return ret;
1294                 }
---

Since every new request being queued will trigger dwc3_prepare_sgs
to check the started list for request with num_pending_sgs, it will
absolutely run dwc3_prepare_sgs_sg for the request that did not finish yet.

The call will set ret to 0 since nothing changed. Then as num_pending_sgs for
that started request is still equal to num_queued_sgs it will bail out from
that function and will not prepare this actual new request in the pending list.

After dwc3_prepare_sgs will come back to kick_transfer, it will even not result
into an update transfer command since the following condition prevents that.

---
1354         /*
1355          * Note that it's normal to have no new TRBs prepared (i.e. ret == 0).
1356          * This happens when we need to stop and restart a transfer such as in
1357          * the case of reinitiating a stream or retrying an isoc transfer.
1358          */
1359         ret = dwc3_prepare_trbs(dep);
1360         if (ret < 0)
1361                 return ret;
1362
1363         starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);
1364
1365         /*
1366          * If there's no new TRB prepared and we don't need to restart a
1367          * transfer, there's no need to update the transfer.
1368          */
1369         if (!ret && !starting)
1370                 return ret;
---

with ret = 0 and starting = 0:

if (!0 && !0)
	return 0;



>> This will result in an stalled driver, that will never prepare any

>> pending requests into the started list.

>

>it shouldn't. Please collect tracepoint output, this is probably the 5th

>time I've asked for tracepoint output of the problem.


I send you the traceback. I used trace_event=dwc3:* in kernel commandline.

>

>>>> But the completion path can not also depend on that variable to be decremented

>>>> after parts of that sgs get handled. Therefor I came up with this second patch.

>>>>

>>>> What do you think, would be the right way to solve this?

>>>

>>>unless you can show that a problem really exists, I don't think we

>>>should do anything. Got a minimal reproduction method somewhere?

>>>Tracepoint of the problem?

>>

>> Look into the attached 20210423_uvc_video_dwc3_gadget_trace. You will

>> see my scenario I want to fix.

>

>where exactly in the file is the problem? I took a snippet to look at:


That is the place you should look. Correct.

>>  irq/51-dwc3-239     [000] d..1    56.766818: dwc3_event: event (d08510ca): ep2in: Transfer Not Ready [0000d085] (Not Active)

>> kworker/u5:1-74      [000] d..2    56.807742: dwc3_ep_queue: ep2in: req 00000000823b0b62 length 0/3072 zsI ==> -115

>> kworker/u5:1-74      [000] d..2    56.807759: dwc3_prepare_trb: ep2in: trb 000000007ebd2707 (E1:D0) buf 000000000c623058 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>> kworker/u5:1-74      [000] d..2    56.807764: dwc3_prepare_trb: ep2in: trb 0000000093bb2bc4 (E2:D0) buf 000000000e200000 size 1x 3070 ctrl 00000c71:00000000 (Hlcs:SC:isoc)

>> kworker/u5:1-74      [000] d..2    56.807777: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d1d00406] params 00000000 0a6ab000 00000000 --> status: Successful

>

>So, Isoc endpoints are a bit "special". We always wait for XferNotReady

>before starting them. This is what you see here.


Yes, they are special.

>> kworker/u5:1-74      [000] d..2    56.807805: dwc3_ep_queue: ep2in: req 000000006d3f8d0a length 0/3072 zsi ==> -115

>

>When the following request is queued, TRANSFER_STARTED is set, but

>PENDING_REQUEST shouldn't be set which would mean

>__dwc3_gadget_kick_transfer() be called which should result in a call to

>Update Transfer, unless I'm missing something here. It has been quite a

>while since I looked at this driver deeply.


Yes this is the case. Update transfer will be called but only
if dwc3_prepare_trbs did come back with ret != 0.

>

>> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

>> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

>

>eventually we get Transfer In Progress for the first TRB...

>

>>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

>

>complete and give it back.

>

>>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

>

>prepare the next

>

>>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

>

>kick it by telling the HW about it. I don't see any failures here,

>although there may be situations where we could add more data to the

>controller's cache early on.


I back that Idea of adding more data to the controllers cache early.
The second one you see is already way to late for an update. It comes
back as an Missed Interrupt as the Update was to late for the transfer.

"irq/51-dwc3-239     [000] d..1    56.808460: dwc3_event: event (d1d1808a): ep2in: Transfer In Progress [0000d1d1] (siM)"

After the missed transfer the last transfer will be ended and another transfer is prepared.

"irq/51-dwc3-239     [000] d..1    56.815974: dwc3_event: event (d20e10ca): ep2in: Transfer Not Ready [0000d20e] (Not Active)"

But after that and an successfull next start transfer no further completion interrupts will
show up. That seems broken.

"irq/51-dwc3-239     [000] d..1    56.815996: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d2120406] params 00000000 0a6ab040 00000000 --> status: Successful"

It looks like the underlying issue I could avoid with my patches. However not
updating the transfer and running into missed interrupts is another issue
that needs to be solved.

>I remember dealing with a bunch of Bus Expiry errors in the past and the

>problem was because we couldn't maintain proper alignment with the

>microframe number because HW only gives us 14 out of the 16 bits of the

>microframe number, the only workaround was to waiting for

>XferInProgress/XferNotReady to get that number.


I think the bus expiry or microframe handling is not a problem here.

>PS: if you collect more traces in the future, please don't enable

>ftrace. Keep current_tracer set to nop, ftrace is just adding garbage to

>the output in this case :-)


Yes, I will keep them non verbose.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Felipe Balbi April 24, 2021, 8:16 a.m. UTC | #7
Hi,

Michael Grzeschik <mgr@pengutronix.de> writes:
>>>>>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>>>>>

>>>>>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>>>>>  {

>>>>>>> -	return req->num_pending_sgs == 0;

>>>>>>> +	return req->num_queued_sgs == 0;

>>>>>>

>>>>>>nope, request is, indeed, completed when there no more pending entries

>>>>>>to be consumed.

>>>>>>

>>>>>>What sort of problem are you dealing with? Got any way of reproducing

>>>>>>it? Got some trace output showing the issue?

>>>>>

>>>>> I digged a little bit deeper to fully understand the issue here. What

>>>>> I found is that in dwc3_prepare_trbs will make two assumptions on

>>>>> num_pending_sgs.

>>>>>

>>>>> When the real purpose of the variable is to track the dequeued trbs.

>>>>

>>>>its purpose is *not* to track the dequeued trbs.

>>>>

>>>>> Than we have to fix the started list handling in the dwc3_prepare_trbs.

>>>>>

>>>>> The comment in the function says:

>>>>>

>>>>>          /*

>>>>>           * We can get in a situation where there's a request in the started list

>>>>>           * but there weren't enough TRBs to fully kick it in the first time

>>>>>           * around, so it has been waiting for more TRBs to be freed up.

>>>>>           *

>>>>>           * In that case, we should check if we have a request with pending_sgs

>>>>>           * in the started list and prepare TRBs for that request first,

>>>>>           * otherwise we will prepare TRBs completely out of order and that will

>>>>>           * break things.

>>>>>           */

>>>>>          list_for_each_entry(req, &dep->started_list, list) {

>>>>> 		if (req->num_pending_sgs > 0) {

>>>>> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>>>>

>>>>> This condition seems to be made on a wrong assumption, thinking the

>>>>> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts

>>>>> of that requests sgs but not all.

>>>>

>>>>say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link

>>>>trb). Clearly not all SGs will fit in one go. Driver will set

>>>>num_pending_sgs to 500, because that's the number of sgs that need to be

>>>>transferred.

>>>>

>>>>For each completion we will decrement num_pending_sgs. Assuming all 255

>>>>were free and used up, num_pending_sgs will be 245 at this point. Driver

>>>>*must* try to kick these 245 TRBs.

>>>

>>> Lets assume we have a request with length 1024 to 3072 like a multiple

>>> of maxpacket * mul it is used on in uvc_video.

>>>

>>> So with a scatterlist creating only a 2 byte header and 1022 to 3070

>>> bytes payload we get only two scatter entries in this request. On

>>> special cases like page boundaries we will add one extra sg, but this

>>> does not change much. So this is nothing big as your 500 example. Now

>>> lets assume we queue 64 of these up we end up with maximum fo 192 trbs

>>> being added to the pending list.

>>>

>>> Lets say every one of these will be queued with ep_queue comming from

>>> uvc_video_pump. The kick_transfer from these ep_queues will result in an

>>> dwc3_prepare_trbs. *But* after the first one being preprared from the

>>> pending list to the started list, it will have an req->num_preprared_sgs != 0.

>>

>>there's no such thing as req->num_prepared_sgs, do you mean

>>num_pending_sgs here?

>

> Sure, I got this mixed up. Just keep in mind num_prepared_sgs was

> ment to be num_pending_sgs.


no worries :-)

>>> Therefor after the first one got queued into the starting list and it

>>> did not complete yet, every next ep_queue->kick_transfer will always bail out

>>> in dwc3_prepare_sgs, checking the one started request with num_preprared

>>> != 0, without moving other requests from pending to the starting list.

>>

>>no it won't. They're testing a different req pointer altogether. Or they

>>should. If you're queueing the same exact req pointer before it has been

>>completed, shame on you :-p

>

> Yes, they will. Even if I will queue a different request. The function

> dwc3_prepare_sgs will check the list of started request and will bail

> out with ret = 0.

>

> As I already mentioned dwc3_prepare_sgs will bail out early after

> the first started request will have an num_prepared_sgs != 0.

>

> Look into that code in dwc3_prepare_sgs:

>

> ---

> 1289         list_for_each_entry(req, &dep->started_list, list) {

> 1290                 if (req->num_pending_sgs > 0) {

> 1291                         ret = dwc3_prepare_trbs_sg(dep, req);

> 1292                         if (!ret || req->num_pending_sgs)

> 1293                                 return ret;

> 1294                 }

> ---

>

> Since every new request being queued will trigger dwc3_prepare_sgs

> to check the started list for request with num_pending_sgs, it will

> absolutely run dwc3_prepare_sgs_sg for the request that did not finish yet.

>

> The call will set ret to 0 since nothing changed. Then as num_pending_sgs for

> that started request is still equal to num_queued_sgs it will bail out from

> that function and will not prepare this actual new request in the pending list.

>

> After dwc3_prepare_sgs will come back to kick_transfer, it will even not result

> into an update transfer command since the following condition prevents that.

>

> ---

> 1354         /*

> 1355          * Note that it's normal to have no new TRBs prepared (i.e. ret == 0).

> 1356          * This happens when we need to stop and restart a transfer such as in

> 1357          * the case of reinitiating a stream or retrying an isoc transfer.

> 1358          */

> 1359         ret = dwc3_prepare_trbs(dep);

> 1360         if (ret < 0)

> 1361                 return ret;

> 1362

> 1363         starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);

> 1364

> 1365         /*

> 1366          * If there's no new TRB prepared and we don't need to restart a

> 1367          * transfer, there's no need to update the transfer.

> 1368          */

> 1369         if (!ret && !starting)

> 1370                 return ret;

> ---

>

> with ret = 0 and starting = 0:

>

> if (!0 && !0)

> 	return 0;


yeah, I suggest looking at the history of gadget.c WRT isochronous
transfers. Also, it seems to me that we have a minor bug where once
PENDING_REQUEST is set, it's never cleared. Perhaps that's the only
thing that needs solving here?

>>> This will result in an stalled driver, that will never prepare any

>>> pending requests into the started list.

>>

>>it shouldn't. Please collect tracepoint output, this is probably the 5th

>>time I've asked for tracepoint output of the problem.

>

> I send you the traceback. I used trace_event=dwc3:* in kernel commandline.


sorry about this, I wrote the comment before seeing the trace and forgot
to remove before sending :-)

>>When the following request is queued, TRANSFER_STARTED is set, but

>>PENDING_REQUEST shouldn't be set which would mean

>>__dwc3_gadget_kick_transfer() be called which should result in a call to

>>Update Transfer, unless I'm missing something here. It has been quite a

>>while since I looked at this driver deeply.

>

> Yes this is the case. Update transfer will be called but only

> if dwc3_prepare_trbs did come back with ret != 0.

>

>>

>>> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

>>> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>>>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

>>

>>eventually we get Transfer In Progress for the first TRB...

>>

>>>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>>>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>>>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

>>

>>complete and give it back.

>>

>>>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>>>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

>>

>>prepare the next

>>

>>>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

>>

>>kick it by telling the HW about it. I don't see any failures here,

>>although there may be situations where we could add more data to the

>>controller's cache early on.

>

> I back that Idea of adding more data to the controllers cache early.


right, and that's where things get complicated because we don't have
enough information from ISOC endpoints to schedule transfers to correct
microframes. We're missing two bits which are only shared with us by the
HW during XferInProgress and XferNotReady (IIRC).

> The second one you see is already way to late for an update. It comes

> back as an Missed Interrupt as the Update was to late for the transfer.

>

> "irq/51-dwc3-239     [000] d..1    56.808460: dwc3_event: event (d1d1808a): ep2in: Transfer In Progress [0000d1d1] (siM)"

>

> After the missed transfer the last transfer will be ended and another transfer is prepared.

>

> "irq/51-dwc3-239     [000] d..1    56.815974: dwc3_event: event (d20e10ca): ep2in: Transfer Not Ready [0000d20e] (Not Active)"

>

> But after that and an successfull next start transfer no further completion interrupts will

> show up. That seems broken.


oh, that's messed up. We should get XferInProgress.

> "irq/51-dwc3-239     [000] d..1    56.815996: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d2120406] params 00000000 0a6ab040 00000000 --> status: Successful"

>

> It looks like the underlying issue I could avoid with my patches. However not

> updating the transfer and running into missed interrupts is another issue

> that needs to be solved.


my concern is that you may be regressing other parts of the driver :-)

How about we look into the fact that PENDING_REQUEST is never cleared
first?

>>I remember dealing with a bunch of Bus Expiry errors in the past and the

>>problem was because we couldn't maintain proper alignment with the

>>microframe number because HW only gives us 14 out of the 16 bits of the

>>microframe number, the only workaround was to waiting for

>>XferInProgress/XferNotReady to get that number.

>

> I think the bus expiry or microframe handling is not a problem here.


Not in this case because we're never pre-starting transfers. What you
suggest will pre-start transfers and we will have regressions.

-- 
balbi
Michael Grzeschik April 24, 2021, 9:12 a.m. UTC | #8
Hi!

On Sat, Apr 24, 2021 at 11:16:39AM +0300, Felipe Balbi wrote:
>

>Hi,

>

>Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>>>>> @@ -2887,7 +2887,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>>>>>>

>>>>>>>>  static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>>>>>>>  {

>>>>>>>> -	return req->num_pending_sgs == 0;

>>>>>>>> +	return req->num_queued_sgs == 0;

>>>>>>>

>>>>>>>nope, request is, indeed, completed when there no more pending entries

>>>>>>>to be consumed.

>>>>>>>

>>>>>>>What sort of problem are you dealing with? Got any way of reproducing

>>>>>>>it? Got some trace output showing the issue?

>>>>>>

>>>>>> I digged a little bit deeper to fully understand the issue here. What

>>>>>> I found is that in dwc3_prepare_trbs will make two assumptions on

>>>>>> num_pending_sgs.

>>>>>>

>>>>>> When the real purpose of the variable is to track the dequeued trbs.

>>>>>

>>>>>its purpose is *not* to track the dequeued trbs.

>>>>>

>>>>>> Than we have to fix the started list handling in the dwc3_prepare_trbs.

>>>>>>

>>>>>> The comment in the function says:

>>>>>>

>>>>>>          /*

>>>>>>           * We can get in a situation where there's a request in the started list

>>>>>>           * but there weren't enough TRBs to fully kick it in the first time

>>>>>>           * around, so it has been waiting for more TRBs to be freed up.

>>>>>>           *

>>>>>>           * In that case, we should check if we have a request with pending_sgs

>>>>>>           * in the started list and prepare TRBs for that request first,

>>>>>>           * otherwise we will prepare TRBs completely out of order and that will

>>>>>>           * break things.

>>>>>>           */

>>>>>>          list_for_each_entry(req, &dep->started_list, list) {

>>>>>> 		if (req->num_pending_sgs > 0) {

>>>>>> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>>>>>>

>>>>>> This condition seems to be made on a wrong assumption, thinking the

>>>>>> num_pending_sgs was decremented after dwc3_prepare_one_trb was called on parts

>>>>>> of that requests sgs but not all.

>>>>>

>>>>>say num_mapped_sgs = 500. The driver has 255 TRBs for use (+1 link

>>>>>trb). Clearly not all SGs will fit in one go. Driver will set

>>>>>num_pending_sgs to 500, because that's the number of sgs that need to be

>>>>>transferred.

>>>>>

>>>>>For each completion we will decrement num_pending_sgs. Assuming all 255

>>>>>were free and used up, num_pending_sgs will be 245 at this point. Driver

>>>>>*must* try to kick these 245 TRBs.

>>>>

>>>> Lets assume we have a request with length 1024 to 3072 like a multiple

>>>> of maxpacket * mul it is used on in uvc_video.

>>>>

>>>> So with a scatterlist creating only a 2 byte header and 1022 to 3070

>>>> bytes payload we get only two scatter entries in this request. On

>>>> special cases like page boundaries we will add one extra sg, but this

>>>> does not change much. So this is nothing big as your 500 example. Now

>>>> lets assume we queue 64 of these up we end up with maximum fo 192 trbs

>>>> being added to the pending list.

>>>>

>>>> Lets say every one of these will be queued with ep_queue comming from

>>>> uvc_video_pump. The kick_transfer from these ep_queues will result in an

>>>> dwc3_prepare_trbs. *But* after the first one being preprared from the

>>>> pending list to the started list, it will have an req->num_preprared_sgs != 0.

>>>

>>>there's no such thing as req->num_prepared_sgs, do you mean

>>>num_pending_sgs here?

>>

>> Sure, I got this mixed up. Just keep in mind num_prepared_sgs was

>> ment to be num_pending_sgs.

>

>no worries :-)

>

>>>> Therefor after the first one got queued into the starting list and it

>>>> did not complete yet, every next ep_queue->kick_transfer will always bail out

>>>> in dwc3_prepare_sgs, checking the one started request with num_preprared

>>>> != 0, without moving other requests from pending to the starting list.

>>>

>>>no it won't. They're testing a different req pointer altogether. Or they

>>>should. If you're queueing the same exact req pointer before it has been

>>>completed, shame on you :-p

>>

>> Yes, they will. Even if I will queue a different request. The function

>> dwc3_prepare_sgs will check the list of started request and will bail

>> out with ret = 0.

>>

>> As I already mentioned dwc3_prepare_sgs will bail out early after

>> the first started request will have an num_prepared_sgs != 0.

>>

>> Look into that code in dwc3_prepare_sgs:

>>

>> ---

>> 1289         list_for_each_entry(req, &dep->started_list, list) {

>> 1290                 if (req->num_pending_sgs > 0) {

>> 1291                         ret = dwc3_prepare_trbs_sg(dep, req);

>> 1292                         if (!ret || req->num_pending_sgs)

>> 1293                                 return ret;

>> 1294                 }

>> ---

>>

>> Since every new request being queued will trigger dwc3_prepare_sgs

>> to check the started list for request with num_pending_sgs, it will

>> absolutely run dwc3_prepare_sgs_sg for the request that did not finish yet.

>>

>> The call will set ret to 0 since nothing changed. Then as num_pending_sgs for

>> that started request is still equal to num_queued_sgs it will bail out from

>> that function and will not prepare this actual new request in the pending list.

>>

>> After dwc3_prepare_sgs will come back to kick_transfer, it will even not result

>> into an update transfer command since the following condition prevents that.

>>

>> ---

>> 1354         /*

>> 1355          * Note that it's normal to have no new TRBs prepared (i.e. ret == 0).

>> 1356          * This happens when we need to stop and restart a transfer such as in

>> 1357          * the case of reinitiating a stream or retrying an isoc transfer.

>> 1358          */

>> 1359         ret = dwc3_prepare_trbs(dep);

>> 1360         if (ret < 0)

>> 1361                 return ret;

>> 1362

>> 1363         starting = !(dep->flags & DWC3_EP_TRANSFER_STARTED);

>> 1364

>> 1365         /*

>> 1366          * If there's no new TRB prepared and we don't need to restart a

>> 1367          * transfer, there's no need to update the transfer.

>> 1368          */

>> 1369         if (!ret && !starting)

>> 1370                 return ret;

>> ---

>>

>> with ret = 0 and starting = 0:

>>

>> if (!0 && !0)

>> 	return 0;

>

>yeah, I suggest looking at the history of gadget.c WRT isochronous

>transfers.


I will check that.

>Also, it seems to me that we have a minor bug where once

>PENDING_REQUEST is set, it's never cleared. Perhaps that's the only

>thing that needs solving here?


That should be addressed as well. But look into my next comments for some
further details.

>>>> This will result in an stalled driver, that will never prepare any

>>>> pending requests into the started list.

>>>

>>>it shouldn't. Please collect tracepoint output, this is probably the 5th

>>>time I've asked for tracepoint output of the problem.

>>

>> I send you the traceback. I used trace_event=dwc3:* in kernel commandline.

>

>sorry about this, I wrote the comment before seeing the trace and forgot

>to remove before sending :-)


no worries ;-)

>>>When the following request is queued, TRANSFER_STARTED is set, but

>>>PENDING_REQUEST shouldn't be set which would mean

>>>__dwc3_gadget_kick_transfer() be called which should result in a call to

>>>Update Transfer, unless I'm missing something here. It has been quite a

>>>while since I looked at this driver deeply.

>>

>> Yes this is the case. Update transfer will be called but only

>> if dwc3_prepare_trbs did come back with ret != 0.

>>

>>>

>>>> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

>>>> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>>>>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

>>>

>>>eventually we get Transfer In Progress for the first TRB...

>>>

>>>>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>>>>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>>>>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

>>>

>>>complete and give it back.

>>>

>>>>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>>>>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

>>>

>>>prepare the next

>>>

>>>>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

>>>

>>>kick it by telling the HW about it. I don't see any failures here,

>>>although there may be situations where we could add more data to the

>>>controller's cache early on.

>>

>> I back that Idea of adding more data to the controllers cache early.

>

>right, and that's where things get complicated because we don't have

>enough information from ISOC endpoints to schedule transfers to correct

>microframes. We're missing two bits which are only shared with us by the

>HW during XferInProgress and XferNotReady (IIRC).


No! For Isoc we never miss these bits. I understand that driver part
already good enough.

Because once the transfer is started every ep_queue (respectively every
update transfer) will only add trbs to the running transfer. These
updates have no timestamp information. Just look into the update
transfer command.

When the update transfer is called not later than the last trb from the
current transfer was handled in hardware, then everything will run
smooth. In the other case, when the update transfer was called after the
trbs from the current transfer did run out we will receive a missed
interrupt for that new udpated trb and the driver needs to call end
transfer so a new one is started.

>> The second one you see is already way to late for an update. It comes

>> back as an Missed Interrupt as the Update was to late for the transfer.

>>

>> "irq/51-dwc3-239     [000] d..1    56.808460: dwc3_event: event (d1d1808a): ep2in: Transfer In Progress [0000d1d1] (siM)"

>>

>> After the missed transfer the last transfer will be ended and another transfer is prepared.

>>

>> "irq/51-dwc3-239     [000] d..1    56.815974: dwc3_event: event (d20e10ca): ep2in: Transfer Not Ready [0000d20e] (Not Active)"

>>

>> But after that and an successfull next start transfer no further completion interrupts will

>> show up. That seems broken.

>

>oh, that's messed up. We should get XferInProgress.


Should we? Look into the comment in transfer_not_ready call. So this in
our case is the caller of __dwc3_gadget_start_isoc -> dwc3_prepare_trbs -> kick_transfer
for the one next trb in the pending list.

3047 static void dwc3_gadget_endpoint_transfer_not_ready(struct dwc3_ep *dep,
3048                 const struct dwc3_event_depevt *event)
3049 {
3050         dwc3_gadget_endpoint_frame_from_event(dep, event);
3051
3052         /*
3053          * The XferNotReady event is generated only once before the endpoint
3054          * starts. It will be generated again when END_TRANSFER command is
3055          * issued. For some controller versions, the XferNotReady event may be
3056          * generated while the END_TRANSFER command is still in process. Ignore
3057          * it and wait for the next XferNotReady event after the command is
3058          * completed.
3059          */
3060         if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
3061                 return;
3062
3063         (void) __dwc3_gadget_start_isoc(dep);
3064 }

>> "irq/51-dwc3-239     [000] d..1    56.815996: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d2120406] params 00000000 0a6ab040 00000000 --> status: Successful"

>>

>> It looks like the underlying issue I could avoid with my patches. However not

>> updating the transfer and running into missed interrupts is another issue

>> that needs to be solved.

>

>my concern is that you may be regressing other parts of the driver :-)


We should obviously check for that. But for isoc I did not see any problem.

>How about we look into the fact that PENDING_REQUEST is never cleared

>first?


We can do that as well. But my head is issueing the pre-starting task now! :)

>>>I remember dealing with a bunch of Bus Expiry errors in the past and the

>>>problem was because we couldn't maintain proper alignment with the

>>>microframe number because HW only gives us 14 out of the 16 bits of the

>>>microframe number, the only workaround was to waiting for

>>>XferInProgress/XferNotReady to get that number.

>>

>> I think the bus expiry or microframe handling is not a problem here.

>

>Not in this case because we're never pre-starting transfers. What you

>suggest will pre-start transfers and we will have regressions.


I did that, and did never see any regressions, as I explained above. Otherwise
I would have checked for another solution, than my initial two patches.

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Felipe Balbi April 24, 2021, 1:43 p.m. UTC | #9
HI,

Michael Grzeschik <mgr@pengutronix.de> writes:

<big snip>

>>>>> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

>>>>> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>>>>>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

>>>>

>>>>eventually we get Transfer In Progress for the first TRB...

>>>>

>>>>>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>>>>>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>>>>>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

>>>>

>>>>complete and give it back.

>>>>

>>>>>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>>>>>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

>>>>

>>>>prepare the next

>>>>

>>>>>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

>>>>

>>>>kick it by telling the HW about it. I don't see any failures here,

>>>>although there may be situations where we could add more data to the

>>>>controller's cache early on.

>>>

>>> I back that Idea of adding more data to the controllers cache early.

>>

>>right, and that's where things get complicated because we don't have

>>enough information from ISOC endpoints to schedule transfers to correct

>>microframes. We're missing two bits which are only shared with us by the

>>HW during XferInProgress and XferNotReady (IIRC).

>

> No! For Isoc we never miss these bits. I understand that driver part

> already good enough.

>

> Because once the transfer is started every ep_queue (respectively every

> update transfer) will only add trbs to the running transfer. These

> updates have no timestamp information. Just look into the update

> transfer command.


right, the details of that are fuzzy for me since I haven't directly
touched this part of the driver in quite a while but IIRC, Update
Transfer should be used only to add transfer to the current *service
interval*; meaning that if you continuously Update Transfer, you will
run into bus expiry.

Thinh, any chance you can shed some light here?

> When the update transfer is called not later than the last trb from the

> current transfer was handled in hardware, then everything will run

> smooth. In the other case, when the update transfer was called after the


Sure, but this poses another requirement on the Gadget API that
transfers must be ready in a timely fashion. Currently, there's no way
for gadget driver to guarantee this requirement, specially since for
video frames processing is off-loaded to a userspace component. The
latency involved here may be quite large.

I have memories that what you say is exactly what we used to do but it
turned out that it was far too easy to get out of sync between dwc3,
g_webcam and the userspace component.

> trbs from the current transfer did run out we will receive a missed

> interrupt for that new udpated trb and the driver needs to call end

> transfer so a new one is started.


True, that's not handled as of yet, IIRC. We return -EXDEV to the gadget
driver but don't really issue End Transfer

(/me checks gadget.c)

Nope, scratch that, we _do_ issue End Transfer, so something like this
*could* work. I'm still worried about the PENDING_REQUEST flag never
being cleared, though.

>>> The second one you see is already way to late for an update. It comes

>>> back as an Missed Interrupt as the Update was to late for the transfer.

>>>

>>> "irq/51-dwc3-239     [000] d..1    56.808460: dwc3_event: event (d1d1808a): ep2in: Transfer In Progress [0000d1d1] (siM)"

>>>

>>> After the missed transfer the last transfer will be ended and another transfer is prepared.

>>>

>>> "irq/51-dwc3-239     [000] d..1    56.815974: dwc3_event: event (d20e10ca): ep2in: Transfer Not Ready [0000d20e] (Not Active)"

>>>

>>> But after that and an successfull next start transfer no further completion interrupts will

>>> show up. That seems broken.

>>

>>oh, that's messed up. We should get XferInProgress.

>

> Should we? Look into the comment in transfer_not_ready call. So this in

> our case is the caller of __dwc3_gadget_start_isoc -> dwc3_prepare_trbs -> kick_transfer

> for the one next trb in the pending list.

>

> 3047 static void dwc3_gadget_endpoint_transfer_not_ready(struct dwc3_ep *dep,

> 3048                 const struct dwc3_event_depevt *event)

> 3049 {

> 3050         dwc3_gadget_endpoint_frame_from_event(dep, event);

> 3051

> 3052         /*

> 3053          * The XferNotReady event is generated only once before the endpoint

> 3054          * starts. It will be generated again when END_TRANSFER command is

> 3055          * issued. For some controller versions, the XferNotReady event may be

> 3056          * generated while the END_TRANSFER command is still in process. Ignore

> 3057          * it and wait for the next XferNotReady event after the command is

> 3058          * completed.

> 3059          */

> 3060         if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)

> 3061                 return;

> 3062

> 3063         (void) __dwc3_gadget_start_isoc(dep);

> 3064 }


that came from commit 36f05d36b03523da906cf2ae70ec31af6f57e94c. I don't
think the intention was for that to apply in this particular case. Thinh?

>>> "irq/51-dwc3-239     [000] d..1    56.815996: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d2120406] params 00000000 0a6ab040 00000000 --> status: Successful"

>>>

>>> It looks like the underlying issue I could avoid with my patches. However not

>>> updating the transfer and running into missed interrupts is another issue

>>> that needs to be solved.

>>

>>my concern is that you may be regressing other parts of the driver :-)

>

> We should obviously check for that. But for isoc I did not see any problem.


Did you test both audio and video gadgets? HS and SS?

>>How about we look into the fact that PENDING_REQUEST is never cleared

>>first?

>

> We can do that as well. But my head is issueing the pre-starting task now! :)


heh :-) As I mentioned, I fear we may end up opening a can of worms if
we change this part without guaranteeing that all the driver flags are
in the correct state first.

>>>>I remember dealing with a bunch of Bus Expiry errors in the past and the

>>>>problem was because we couldn't maintain proper alignment with the

>>>>microframe number because HW only gives us 14 out of the 16 bits of the

>>>>microframe number, the only workaround was to waiting for

>>>>XferInProgress/XferNotReady to get that number.

>>>

>>> I think the bus expiry or microframe handling is not a problem here.

>>

>>Not in this case because we're never pre-starting transfers. What you

>>suggest will pre-start transfers and we will have regressions.

>

> I did that, and did never see any regressions, as I explained above. Otherwise

> I would have checked for another solution, than my initial two patches.


So you tested on your usecase, with one platform. We would need more
people testing this if we want any chance of avoiding regressions, and I
still think fixing the bad flags is more important first because that
flag is used exactly to make decisions regarding Isochronous transfers.

-- 
balbi
Thinh Nguyen April 24, 2021, 7:41 p.m. UTC | #10
Felipe Balbi wrote:
> 

> HI,

> 

> Michael Grzeschik <mgr@pengutronix.de> writes:

> 

> <big snip>

> 

>>>>>> kworker/u5:1-74      [000] d..2    56.807831: dwc3_ep_queue: ep2in: req 00000000f5394c5d length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807857: dwc3_ep_queue: ep2in: req 00000000ef95b55c length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807883: dwc3_ep_queue: ep2in: req 00000000d36cf0cc length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807908: dwc3_ep_queue: ep2in: req 00000000c5f1bd4b length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807934: dwc3_ep_queue: ep2in: req 00000000b7180b2d length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807960: dwc3_ep_queue: ep2in: req 000000005d629ded length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.807985: dwc3_ep_queue: ep2in: req 0000000062e02cc0 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808011: dwc3_ep_queue: ep2in: req 000000001d2ec1c0 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808038: dwc3_ep_queue: ep2in: req 0000000018ba3d29 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808063: dwc3_ep_queue: ep2in: req 000000000be47fc7 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808089: dwc3_ep_queue: ep2in: req 00000000caba09e2 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808115: dwc3_ep_queue: ep2in: req 00000000cb0191c4 length 0/3072 zsi ==> -115

>>>>>> kworker/u5:1-74      [000] d..2    56.808140: dwc3_ep_queue: ep2in: req 000000005255f606 length 0/3072 zsi ==> -115

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808240: dwc3_event: event (d1d0408a): ep2in: Transfer In Progress [0000d1d0] (sIm)

>>>>>

>>>>> eventually we get Transfer In Progress for the first TRB...

>>>>>

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808246: dwc3_complete_trb: ep2in: trb 000000007ebd2707 (E2:D1) buf 000000000c623058 size 1x 0 ctrl 34740464:0000d1d0 (hlCs:Sc:isoc-first)

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808250: dwc3_complete_trb: ep2in: trb 0000000093bb2bc4 (E2:D2) buf 000000000e200000 size 1x 0 ctrl 34740c70:0000d1d0 (hlcs:SC:isoc)

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808256: dwc3_gadget_giveback: ep2in: req 00000000823b0b62 length 3072/3072 zsI ==> 0

>>>>>

>>>>> complete and give it back.

>>>>>

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808305: dwc3_prepare_trb: ep2in: trb 000000004af35733 (E3:D2) buf 000000000c623088 size 1x 2 ctrl 00000465:00000000 (HlCs:Sc:isoc-first)

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808309: dwc3_prepare_trb: ep2in: trb 00000000a11671a8 (E4:D2) buf 000000000e200bfe size 1x 3070 ctrl 00000471:00000000 (Hlcs:Sc:isoc)

>>>>>

>>>>> prepare the next

>>>>>

>>>>>>  irq/51-dwc3-239     [000] d..1    56.808320: dwc3_gadget_ep_cmd: ep2in: cmd 'Update Transfer' [50407] params 00000000 00000000 00000000 --> status: Successful

>>>>>

>>>>> kick it by telling the HW about it. I don't see any failures here,

>>>>> although there may be situations where we could add more data to the

>>>>> controller's cache early on.

>>>>

>>>> I back that Idea of adding more data to the controllers cache early.

>>>

>>> right, and that's where things get complicated because we don't have

>>> enough information from ISOC endpoints to schedule transfers to correct

>>> microframes. We're missing two bits which are only shared with us by the

>>> HW during XferInProgress and XferNotReady (IIRC).

>>

>> No! For Isoc we never miss these bits. I understand that driver part

>> already good enough.

>>

>> Because once the transfer is started every ep_queue (respectively every

>> update transfer) will only add trbs to the running transfer. These

>> updates have no timestamp information. Just look into the update

>> transfer command.

> 

> right, the details of that are fuzzy for me since I haven't directly

> touched this part of the driver in quite a while but IIRC, Update

> Transfer should be used only to add transfer to the current *service

> interval*; meaning that if you continuously Update Transfer, you will

> run into bus expiry.

> 

> Thinh, any chance you can shed some light here?


The missing 2 upper bits Felipe is referring to is the 2 bits from DSTS
register. The 16-bit frame counter value is from XferNotReady.
Bus-expiry only happens on START_TRANSFER command when the driver
schedules isoc too late. UPDATE_TRANSFER command simply tells the
controller to update its TRB cache after SW prepared some new TRBs with
HWO bit set.


> 

>> When the update transfer is called not later than the last trb from the

>> current transfer was handled in hardware, then everything will run

>> smooth. In the other case, when the update transfer was called after the

> 

> Sure, but this poses another requirement on the Gadget API that

> transfers must be ready in a timely fashion. Currently, there's no way

> for gadget driver to guarantee this requirement, specially since for

> video frames processing is off-loaded to a userspace component. The

> latency involved here may be quite large.

> 

> I have memories that what you say is exactly what we used to do but it

> turned out that it was far too easy to get out of sync between dwc3,

> g_webcam and the userspace component.

> 

>> trbs from the current transfer did run out we will receive a missed

>> interrupt for that new udpated trb and the driver needs to call end

>> transfer so a new one is started.

> 

> True, that's not handled as of yet, IIRC. We return -EXDEV to the gadget

> driver but don't really issue End Transfer

> 

> (/me checks gadget.c)

> 

> Nope, scratch that, we _do_ issue End Transfer, so something like this

> *could* work. I'm still worried about the PENDING_REQUEST flag never

> being cleared, though.


We issue END_TRANSFER if the cause of missed isoc is due to driver not
preparing enough TRBs. In normal cases, we don't issue END_TRANSFER.

> 

>>>> The second one you see is already way to late for an update. It comes

>>>> back as an Missed Interrupt as the Update was to late for the transfer.

>>>>

>>>> "irq/51-dwc3-239     [000] d..1    56.808460: dwc3_event: event (d1d1808a): ep2in: Transfer In Progress [0000d1d1] (siM)"

>>>>

>>>> After the missed transfer the last transfer will be ended and another transfer is prepared.

>>>>

>>>> "irq/51-dwc3-239     [000] d..1    56.815974: dwc3_event: event (d20e10ca): ep2in: Transfer Not Ready [0000d20e] (Not Active)"

>>>>

>>>> But after that and an successfull next start transfer no further completion interrupts will

>>>> show up. That seems broken.

>>>

>>> oh, that's messed up. We should get XferInProgress.

>>

>> Should we? Look into the comment in transfer_not_ready call. So this in

>> our case is the caller of __dwc3_gadget_start_isoc -> dwc3_prepare_trbs -> kick_transfer

>> for the one next trb in the pending list.

>>

>> 3047 static void dwc3_gadget_endpoint_transfer_not_ready(struct dwc3_ep *dep,

>> 3048                 const struct dwc3_event_depevt *event)

>> 3049 {

>> 3050         dwc3_gadget_endpoint_frame_from_event(dep, event);

>> 3051

>> 3052         /*

>> 3053          * The XferNotReady event is generated only once before the endpoint

>> 3054          * starts. It will be generated again when END_TRANSFER command is

>> 3055          * issued. For some controller versions, the XferNotReady event may be

>> 3056          * generated while the END_TRANSFER command is still in process. Ignore

>> 3057          * it and wait for the next XferNotReady event after the command is

>> 3058          * completed.

>> 3059          */

>> 3060         if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)

>> 3061                 return;

>> 3062

>> 3063         (void) __dwc3_gadget_start_isoc(dep);

>> 3064 }

> 

> that came from commit 36f05d36b03523da906cf2ae70ec31af6f57e94c. I don't

> think the intention was for that to apply in this particular case. Thinh?


It's not related to this. The END_TRANSFER came because there's no TRB
left in the queue. After END_TRANSFER, there will be XferNotReady to
restart the isoc.


> 

>>>> "irq/51-dwc3-239     [000] d..1    56.815996: dwc3_gadget_ep_cmd: ep2in: cmd 'Start Transfer' [d2120406] params 00000000 0a6ab040 00000000 --> status: Successful"

>>>>

>>>> It looks like the underlying issue I could avoid with my patches. However not

>>>> updating the transfer and running into missed interrupts is another issue

>>>> that needs to be solved.

>>>

>>> my concern is that you may be regressing other parts of the driver :-)

>>

>> We should obviously check for that. But for isoc I did not see any problem.

> 

> Did you test both audio and video gadgets? HS and SS?

> 

>>> How about we look into the fact that PENDING_REQUEST is never cleared

>>> first?

>>

>> We can do that as well. But my head is issueing the pre-starting task now! :)

> 

> heh :-) As I mentioned, I fear we may end up opening a can of worms if

> we change this part without guaranteeing that all the driver flags are

> in the correct state first.

> 

>>>>> I remember dealing with a bunch of Bus Expiry errors in the past and the

>>>>> problem was because we couldn't maintain proper alignment with the

>>>>> microframe number because HW only gives us 14 out of the 16 bits of the

>>>>> microframe number, the only workaround was to waiting for

>>>>> XferInProgress/XferNotReady to get that number.

>>>>

>>>> I think the bus expiry or microframe handling is not a problem here.

>>>

>>> Not in this case because we're never pre-starting transfers. What you

>>> suggest will pre-start transfers and we will have regressions.

>>

>> I did that, and did never see any regressions, as I explained above. Otherwise

>> I would have checked for another solution, than my initial two patches.

> 

> So you tested on your usecase, with one platform. We would need more

> people testing this if we want any chance of avoiding regressions, and I

> still think fixing the bad flags is more important first because that

> flag is used exactly to make decisions regarding Isochronous transfers.

> 


I think I see the issue that Michael reported.

The problem is that we're using num_pending_sgs to track both pending SG
entries and queued SG entries. num_pending_sgs doesn't get updated until
TRB completion interrupt (ie XferInProgress). Before the driver queues
more SG requests, it will check if there's any pending SG in the started
request list before it prepares more. Since the num_pending_sgs doesn't
get updated until the request is completed, the driver doesn't process
more until the request is completed.

I need to review more on Michael's patches next week, but I think what
he suggested makes sense (in term of properly usage of queued sgs vs
pending sgs). BTW, please correct me if I'm wrong, but we do modify
num_queued_sgs.

BR,
Thinh
Thinh Nguyen April 27, 2021, 3:18 a.m. UTC | #11
Hi Michael,

Thinh Nguyen wrote:
> Felipe Balbi wrote:

>>

>> HI,

>>

>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>

>> <big snip>

>>



<bigger snip>


> I think I see the issue that Michael reported.

> 

> The problem is that we're using num_pending_sgs to track both pending SG

> entries and queued SG entries. num_pending_sgs doesn't get updated until

> TRB completion interrupt (ie XferInProgress). Before the driver queues

> more SG requests, it will check if there's any pending SG in the started

> request list before it prepares more. Since the num_pending_sgs doesn't

> get updated until the request is completed, the driver doesn't process

> more until the request is completed.

> 

> I need to review more on Michael's patches next week, but I think what

> he suggested makes sense (in term of properly usage of queued sgs vs

> pending sgs). BTW, please correct me if I'm wrong, but we do modify

> num_queued_sgs.

> 


There's still some issue with your patch. I think this should cover it.
Let me know if it works for you.

Note: this however probably needs more "Tested-by" and reviews
to make sure I'm not missing anything. I only ran some basic tests,
and will need to run more.

Let me know if this makes sense.

BR,
Thinh


diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index dd80e5ca8c78..040cc67b3361 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1244,6 +1244,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep *dep,
                        req->start_sg = sg_next(s);
 
                req->num_queued_sgs++;
+               req->num_pending_sgs--;
 
                /*
                 * The number of pending SG entries may not correspond to the
@@ -1251,7 +1252,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep *dep,
                 * don't include unused SG entries.
                 */
                if (length == 0) {
-                       req->num_pending_sgs -= req->request.num_mapped_sgs - req->num_queued_sgs;
+                       req->num_pending_sgs = 0;
                        break;
                }
 
@@ -2867,15 +2868,15 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,
        struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];
        struct scatterlist *sg = req->sg;
        struct scatterlist *s;
-       unsigned int pending = req->num_pending_sgs;
+       unsigned int num_queued = req->num_queued_sgs;
        unsigned int i;
        int ret = 0;
 
-       for_each_sg(sg, s, pending, i) {
+       for_each_sg(sg, s, num_queued, i) {
                trb = &dep->trb_pool[dep->trb_dequeue];
 
                req->sg = sg_next(s);
-               req->num_pending_sgs--;
+               req->num_queued_sgs--;
 
                ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,
                                trb, event, status, true);
@@ -2898,7 +2899,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,
 
 static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)
 {
-       return req->num_pending_sgs == 0;
+       return req->num_pending_sgs == 0 && req->num_queued_sgs == 0;
 }
 
 static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
@@ -2907,7 +2908,7 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,
 {
        int ret;
 
-       if (req->num_pending_sgs)
+       if (req->request.num_mapped_sgs)
                ret = dwc3_gadget_ep_reclaim_trb_sg(dep, req, event,
                                status);
        else
Michael Grzeschik April 27, 2021, 8:12 p.m. UTC | #12
On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:
>Hi Michael,

>

>Thinh Nguyen wrote:

>> Felipe Balbi wrote:

>>>

>>> HI,

>>>

>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>

>>> <big snip>

>>>

>

>

><bigger snip>

>

>

>> I think I see the issue that Michael reported.

>>

>> The problem is that we're using num_pending_sgs to track both pending SG

>> entries and queued SG entries. num_pending_sgs doesn't get updated until

>> TRB completion interrupt (ie XferInProgress). Before the driver queues

>> more SG requests, it will check if there's any pending SG in the started

>> request list before it prepares more. Since the num_pending_sgs doesn't

>> get updated until the request is completed, the driver doesn't process

>> more until the request is completed.

>>

>> I need to review more on Michael's patches next week, but I think what

>> he suggested makes sense (in term of properly usage of queued sgs vs

>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>> num_queued_sgs.

>>

>

>There's still some issue with your patch. I think this should cover it.

>Let me know if it works for you.


This works for me! Will you spin a proper patch from that?

>Note: this however probably needs more "Tested-by" and reviews

>to make sure I'm not missing anything. I only ran some basic tests,

>and will need to run more.


You may already have mine:

Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>


>Let me know if this makes sense.


From what I understand about the issue and the purpose of all
variables this makes total sense to me. So thanks for taking over
and make a proper solution.

Thanks,
Michael

>BR,

>Thinh

>

>

>diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>index dd80e5ca8c78..040cc67b3361 100644

>--- a/drivers/usb/dwc3/gadget.c

>+++ b/drivers/usb/dwc3/gadget.c

>@@ -1244,6 +1244,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep *dep,

>                        req->start_sg = sg_next(s);

>

>                req->num_queued_sgs++;

>+               req->num_pending_sgs--;

>

>                /*

>                 * The number of pending SG entries may not correspond to the

>@@ -1251,7 +1252,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep *dep,

>                 * don't include unused SG entries.

>                 */

>                if (length == 0) {

>-                       req->num_pending_sgs -= req->request.num_mapped_sgs - req->num_queued_sgs;

>+                       req->num_pending_sgs = 0;

>                        break;

>                }

>

>@@ -2867,15 +2868,15 @@ static int dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>        struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>        struct scatterlist *sg = req->sg;

>        struct scatterlist *s;

>-       unsigned int pending = req->num_pending_sgs;

>+       unsigned int num_queued = req->num_queued_sgs;

>        unsigned int i;

>        int ret = 0;

>

>-       for_each_sg(sg, s, pending, i) {

>+       for_each_sg(sg, s, num_queued, i) {

>                trb = &dep->trb_pool[dep->trb_dequeue];

>

>                req->sg = sg_next(s);

>-               req->num_pending_sgs--;

>+               req->num_queued_sgs--;

>

>                ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>                                trb, event, status, true);

>@@ -2898,7 +2899,7 @@ static int dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>

> static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

> {

>-       return req->num_pending_sgs == 0;

>+       return req->num_pending_sgs == 0 && req->num_queued_sgs == 0;

> }

>

> static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

>@@ -2907,7 +2908,7 @@ static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

> {

>        int ret;

>

>-       if (req->num_pending_sgs)

>+       if (req->request.num_mapped_sgs)

>                ret = dwc3_gadget_ep_reclaim_trb_sg(dep, req, event,

>                                status);

>        else


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Thinh Nguyen April 28, 2021, 1:45 a.m. UTC | #13
Michael Grzeschik wrote:
> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:

>> Hi Michael,

>>

>> Thinh Nguyen wrote:

>>> Felipe Balbi wrote:

>>>>

>>>> HI,

>>>>

>>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>

>>>> <big snip>

>>>>

>>

>>

>> <bigger snip>

>>

>>

>>> I think I see the issue that Michael reported.

>>>

>>> The problem is that we're using num_pending_sgs to track both pending SG

>>> entries and queued SG entries. num_pending_sgs doesn't get updated until

>>> TRB completion interrupt (ie XferInProgress). Before the driver queues

>>> more SG requests, it will check if there's any pending SG in the started

>>> request list before it prepares more. Since the num_pending_sgs doesn't

>>> get updated until the request is completed, the driver doesn't process

>>> more until the request is completed.

>>>

>>> I need to review more on Michael's patches next week, but I think what

>>> he suggested makes sense (in term of properly usage of queued sgs vs

>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>>> num_queued_sgs.

>>>

>>

>> There's still some issue with your patch. I think this should cover it.

>> Let me know if it works for you.

> 

> This works for me! Will you spin a proper patch from that?


Sure. I can do that after 5.13-rc1 is released

> 

>> Note: this however probably needs more "Tested-by" and reviews

>> to make sure I'm not missing anything. I only ran some basic tests,

>> and will need to run more.

> 

> You may already have mine:

> 

> Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

> 

>> Let me know if this makes sense.

> 

> From what I understand about the issue and the purpose of all

> variables this makes total sense to me. So thanks for taking over

> and make a proper solution.

> 

> Thanks,

> Michael

> 


Thanks for the Tested-by.

Btw, any reason for using SG with transfer less than PAGE_SIZE? I assume
your platform is 4KB, but you're splitting your 3KB transfer to smaller.
Was it like this before? Note that DWC3 has a limited number of internal
TRB cache. But what you're doing now is fine and doesn't impact performance.

BR,
Thinh


>>

>>

>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>> index dd80e5ca8c78..040cc67b3361 100644

>> --- a/drivers/usb/dwc3/gadget.c

>> +++ b/drivers/usb/dwc3/gadget.c

>> @@ -1244,6 +1244,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep

>> *dep,

>>                        req->start_sg = sg_next(s);

>>

>>                req->num_queued_sgs++;

>> +               req->num_pending_sgs--;

>>

>>                /*

>>                 * The number of pending SG entries may not correspond

>> to the

>> @@ -1251,7 +1252,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep

>> *dep,

>>                 * don't include unused SG entries.

>>                 */

>>                if (length == 0) {

>> -                       req->num_pending_sgs -=

>> req->request.num_mapped_sgs - req->num_queued_sgs;

>> +                       req->num_pending_sgs = 0;

>>                        break;

>>                }

>>

>> @@ -2867,15 +2868,15 @@ static int

>> dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>        struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>        struct scatterlist *sg = req->sg;

>>        struct scatterlist *s;

>> -       unsigned int pending = req->num_pending_sgs;

>> +       unsigned int num_queued = req->num_queued_sgs;

>>        unsigned int i;

>>        int ret = 0;

>>

>> -       for_each_sg(sg, s, pending, i) {

>> +       for_each_sg(sg, s, num_queued, i) {

>>                trb = &dep->trb_pool[dep->trb_dequeue];

>>

>>                req->sg = sg_next(s);

>> -               req->num_pending_sgs--;

>> +               req->num_queued_sgs--;

>>

>>                ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>                                trb, event, status, true);

>> @@ -2898,7 +2899,7 @@ static int

>> dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>

>> static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>> {

>> -       return req->num_pending_sgs == 0;

>> +       return req->num_pending_sgs == 0 && req->num_queued_sgs == 0;

>> }

>>

>> static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

>> @@ -2907,7 +2908,7 @@ static int

>> dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

>> {

>>        int ret;

>>

>> -       if (req->num_pending_sgs)

>> +       if (req->request.num_mapped_sgs)

>>                ret = dwc3_gadget_ep_reclaim_trb_sg(dep, req, event,

>>                                status);

>>        else

>
Michael Grzeschik April 28, 2021, 7:37 a.m. UTC | #14
Hi Thinh,

On Wed, Apr 28, 2021 at 01:45:11AM +0000, Thinh Nguyen wrote:
>Michael Grzeschik wrote:

>> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:

>>> Hi Michael,

>>>

>>> Thinh Nguyen wrote:

>>>> Felipe Balbi wrote:

>>>>>

>>>>> HI,

>>>>>

>>>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>>

>>>>> <big snip>

>>>>>

>>>

>>>

>>> <bigger snip>

>>>

>>>

>>>> I think I see the issue that Michael reported.

>>>>

>>>> The problem is that we're using num_pending_sgs to track both pending SG

>>>> entries and queued SG entries. num_pending_sgs doesn't get updated until

>>>> TRB completion interrupt (ie XferInProgress). Before the driver queues

>>>> more SG requests, it will check if there's any pending SG in the started

>>>> request list before it prepares more. Since the num_pending_sgs doesn't

>>>> get updated until the request is completed, the driver doesn't process

>>>> more until the request is completed.

>>>>

>>>> I need to review more on Michael's patches next week, but I think what

>>>> he suggested makes sense (in term of properly usage of queued sgs vs

>>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>>>> num_queued_sgs.

>>>>

>>>

>>> There's still some issue with your patch. I think this should cover it.

>>> Let me know if it works for you.

>>

>> This works for me! Will you spin a proper patch from that?

>

>Sure. I can do that after 5.13-rc1 is released


Great!

>>

>>> Note: this however probably needs more "Tested-by" and reviews

>>> to make sure I'm not missing anything. I only ran some basic tests,

>>> and will need to run more.

>>

>> You may already have mine:

>>

>> Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>

>>> Let me know if this makes sense.

>>

>> From what I understand about the issue and the purpose of all

>> variables this makes total sense to me. So thanks for taking over

>> and make a proper solution.

>>

>> Thanks,

>> Michael

>>

>

>Thanks for the Tested-by.

>

>Btw, any reason for using SG with transfer less than PAGE_SIZE? I assume

>your platform is 4KB, but you're splitting your 3KB transfer to smaller.

>Was it like this before? Note that DWC3 has a limited number of internal

>TRB cache. But what you're doing now is fine and doesn't impact performance.


It all comes from the limitation of the uvc_video gadget. Look into the
patches I send to support sg in uvc_video driver. It just maps entries
from an sg list comming from videobuf2 to an req->sg list. In
uvc_video_alloc_requests the req->length will be set to req_size which
is calculated with:

  ep->maxpacket(1024) * maxburst(1-15) * mult(1-3)

With maxburst = 1 and mult = 3 it currently reults in an req->length of:

  1024 * 1 * 3 = 3072

This request will use one extra sg created from the uvc driver adding
a header of 2 bytes per request and then has 3070 bytes left for the
payload which will look into the vb2 sg list and increment every sg
entrie with the the used offset until it reaches sg page boundaries
and in that case will have to add one extra sg.

So this results in the funny patterns you see.

We can increase the req->length to 46080 with a maxburst of 15 for usb3
but I did not try that out yet.


Thanks,
Michael

>>>

>>>

>>> diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c

>>> index dd80e5ca8c78..040cc67b3361 100644

>>> --- a/drivers/usb/dwc3/gadget.c

>>> +++ b/drivers/usb/dwc3/gadget.c

>>> @@ -1244,6 +1244,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep

>>> *dep,

>>>                        req->start_sg = sg_next(s);

>>>

>>>                req->num_queued_sgs++;

>>> +               req->num_pending_sgs--;

>>>

>>>                /*

>>>                 * The number of pending SG entries may not correspond

>>> to the

>>> @@ -1251,7 +1252,7 @@ static int dwc3_prepare_trbs_sg(struct dwc3_ep

>>> *dep,

>>>                 * don't include unused SG entries.

>>>                 */

>>>                if (length == 0) {

>>> -                       req->num_pending_sgs -=

>>> req->request.num_mapped_sgs - req->num_queued_sgs;

>>> +                       req->num_pending_sgs = 0;

>>>                        break;

>>>                }

>>>

>>> @@ -2867,15 +2868,15 @@ static int

>>> dwc3_gadget_ep_reclaim_trb_sg(struct dwc3_ep *dep,

>>>        struct dwc3_trb *trb = &dep->trb_pool[dep->trb_dequeue];

>>>        struct scatterlist *sg = req->sg;

>>>        struct scatterlist *s;

>>> -       unsigned int pending = req->num_pending_sgs;

>>> +       unsigned int num_queued = req->num_queued_sgs;

>>>        unsigned int i;

>>>        int ret = 0;

>>>

>>> -       for_each_sg(sg, s, pending, i) {

>>> +       for_each_sg(sg, s, num_queued, i) {

>>>                trb = &dep->trb_pool[dep->trb_dequeue];

>>>

>>>                req->sg = sg_next(s);

>>> -               req->num_pending_sgs--;

>>> +               req->num_queued_sgs--;

>>>

>>>                ret = dwc3_gadget_ep_reclaim_completed_trb(dep, req,

>>>                                trb, event, status, true);

>>> @@ -2898,7 +2899,7 @@ static int

>>> dwc3_gadget_ep_reclaim_trb_linear(struct dwc3_ep *dep,

>>>

>>> static bool dwc3_gadget_ep_request_completed(struct dwc3_request *req)

>>> {

>>> -       return req->num_pending_sgs == 0;

>>> +       return req->num_pending_sgs == 0 && req->num_queued_sgs == 0;

>>> }

>>>

>>> static int dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

>>> @@ -2907,7 +2908,7 @@ static int

>>> dwc3_gadget_ep_cleanup_completed_request(struct dwc3_ep *dep,

>>> {

>>>        int ret;

>>>

>>> -       if (req->num_pending_sgs)

>>> +       if (req->request.num_mapped_sgs)

>>>                ret = dwc3_gadget_ep_reclaim_trb_sg(dep, req, event,

>>>                                status);

>>>        else

>>

>

>


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Thinh Nguyen April 28, 2021, 11:25 p.m. UTC | #15
Michael Grzeschik wrote:
> Hi Thinh,

> 

> On Wed, Apr 28, 2021 at 01:45:11AM +0000, Thinh Nguyen wrote:

>> Michael Grzeschik wrote:

>>> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:

>>>> Hi Michael,

>>>>

>>>> Thinh Nguyen wrote:

>>>>> Felipe Balbi wrote:

>>>>>>

>>>>>> HI,

>>>>>>

>>>>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>>>

>>>>>> <big snip>

>>>>>>

>>>>

>>>>

>>>> <bigger snip>

>>>>

>>>>

>>>>> I think I see the issue that Michael reported.

>>>>>

>>>>> The problem is that we're using num_pending_sgs to track both

>>>>> pending SG

>>>>> entries and queued SG entries. num_pending_sgs doesn't get updated

>>>>> until

>>>>> TRB completion interrupt (ie XferInProgress). Before the driver queues

>>>>> more SG requests, it will check if there's any pending SG in the

>>>>> started

>>>>> request list before it prepares more. Since the num_pending_sgs

>>>>> doesn't

>>>>> get updated until the request is completed, the driver doesn't process

>>>>> more until the request is completed.

>>>>>

>>>>> I need to review more on Michael's patches next week, but I think what

>>>>> he suggested makes sense (in term of properly usage of queued sgs vs

>>>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>>>>> num_queued_sgs.

>>>>>

>>>>

>>>> There's still some issue with your patch. I think this should cover it.

>>>> Let me know if it works for you.

>>>

>>> This works for me! Will you spin a proper patch from that?

>>

>> Sure. I can do that after 5.13-rc1 is released

> 

> Great!

> 

>>>

>>>> Note: this however probably needs more "Tested-by" and reviews

>>>> to make sure I'm not missing anything. I only ran some basic tests,

>>>> and will need to run more.

>>>

>>> You may already have mine:

>>>

>>> Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>>

>>>> Let me know if this makes sense.

>>>

>>> From what I understand about the issue and the purpose of all

>>> variables this makes total sense to me. So thanks for taking over

>>> and make a proper solution.

>>>

>>> Thanks,

>>> Michael

>>>

>>

>> Thanks for the Tested-by.

>>

>> Btw, any reason for using SG with transfer less than PAGE_SIZE? I assume

>> your platform is 4KB, but you're splitting your 3KB transfer to smaller.

>> Was it like this before? Note that DWC3 has a limited number of internal

>> TRB cache. But what you're doing now is fine and doesn't impact

>> performance.

> 

> It all comes from the limitation of the uvc_video gadget. Look into the

> patches I send to support sg in uvc_video driver. It just maps entries

> from an sg list comming from videobuf2 to an req->sg list. In

> uvc_video_alloc_requests the req->length will be set to req_size which

> is calculated with:

> 

>  ep->maxpacket(1024) * maxburst(1-15) * mult(1-3)

> 

> With maxburst = 1 and mult = 3 it currently reults in an req->length of:

> 

>  1024 * 1 * 3 = 3072


I wasn't referring to this, I was just wondering why uvc uses SG.
Correct me if I'm wrong, but isn't uvc allocates and uses contiguous
buffer? Depend on the setup, normal request may show better performance.

> 

> This request will use one extra sg created from the uvc driver adding

> a header of 2 bytes per request and then has 3070 bytes left for the

> payload which will look into the vb2 sg list and increment every sg

> entrie with the the used offset until it reaches sg page boundaries

> and in that case will have to add one extra sg.

> 

> So this results in the funny patterns you see.

> 

> We can increase the req->length to 46080 with a maxburst of 15 for usb3

> but I did not try that out yet.

> 


I wouldn't recommend that. Unless you know you're using both DWC3x
device and host, most other controllers we know cannot handle high isoc
throughput, especially at bInterval=1.

BR,
Thinh
Michael Grzeschik April 29, 2021, 6:51 a.m. UTC | #16
On Wed, Apr 28, 2021 at 11:25:05PM +0000, Thinh Nguyen wrote:
>Michael Grzeschik wrote:

>> Hi Thinh,

>>

>> On Wed, Apr 28, 2021 at 01:45:11AM +0000, Thinh Nguyen wrote:

>>> Michael Grzeschik wrote:

>>>> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:

>>>>> Hi Michael,

>>>>>

>>>>> Thinh Nguyen wrote:

>>>>>> Felipe Balbi wrote:

>>>>>>>

>>>>>>> HI,

>>>>>>>

>>>>>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>>>>

>>>>>>> <big snip>

>>>>>>>

>>>>>

>>>>>

>>>>> <bigger snip>

>>>>>

>>>>>

>>>>>> I think I see the issue that Michael reported.

>>>>>>

>>>>>> The problem is that we're using num_pending_sgs to track both

>>>>>> pending SG

>>>>>> entries and queued SG entries. num_pending_sgs doesn't get updated

>>>>>> until

>>>>>> TRB completion interrupt (ie XferInProgress). Before the driver queues

>>>>>> more SG requests, it will check if there's any pending SG in the

>>>>>> started

>>>>>> request list before it prepares more. Since the num_pending_sgs

>>>>>> doesn't

>>>>>> get updated until the request is completed, the driver doesn't process

>>>>>> more until the request is completed.

>>>>>>

>>>>>> I need to review more on Michael's patches next week, but I think what

>>>>>> he suggested makes sense (in term of properly usage of queued sgs vs

>>>>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>>>>>> num_queued_sgs.

>>>>>>

>>>>>

>>>>> There's still some issue with your patch. I think this should cover it.

>>>>> Let me know if it works for you.

>>>>

>>>> This works for me! Will you spin a proper patch from that?

>>>

>>> Sure. I can do that after 5.13-rc1 is released

>>

>> Great!

>>

>>>>

>>>>> Note: this however probably needs more "Tested-by" and reviews

>>>>> to make sure I'm not missing anything. I only ran some basic tests,

>>>>> and will need to run more.

>>>>

>>>> You may already have mine:

>>>>

>>>> Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>>>

>>>>> Let me know if this makes sense.

>>>>

>>>> From what I understand about the issue and the purpose of all

>>>> variables this makes total sense to me. So thanks for taking over

>>>> and make a proper solution.

>>>>

>>>> Thanks,

>>>> Michael

>>>>

>>>

>>> Thanks for the Tested-by.

>>>

>>> Btw, any reason for using SG with transfer less than PAGE_SIZE? I assume

>>> your platform is 4KB, but you're splitting your 3KB transfer to smaller.

>>> Was it like this before? Note that DWC3 has a limited number of internal

>>> TRB cache. But what you're doing now is fine and doesn't impact

>>> performance.

>>

>> It all comes from the limitation of the uvc_video gadget. Look into the

>> patches I send to support sg in uvc_video driver. It just maps entries

>> from an sg list comming from videobuf2 to an req->sg list. In

>> uvc_video_alloc_requests the req->length will be set to req_size which

>> is calculated with:

>>

>>  ep->maxpacket(1024) * maxburst(1-15) * mult(1-3)

>>

>> With maxburst = 1 and mult = 3 it currently reults in an req->length of:

>>

>>  1024 * 1 * 3 = 3072

>

>I wasn't referring to this, I was just wondering why uvc uses SG.

>Correct me if I'm wrong, but isn't uvc allocates and uses contiguous

>buffer?


Not necessarily. Depending on the source creating vb2 buffers with mmu
enabled or directly you may get different patterns. The patches I send
are able to also handle contiguous buffers. In that case vb2 would just
hand over a scatter list with one big entry, which would be totally fine.
In that case we would not have to scatter it for the gadget controller.

>Depend on the setup, normal request may show better performance.


Right. In my setup the data is coming from an mmu.

>> This request will use one extra sg created from the uvc driver adding

>> a header of 2 bytes per request and then has 3070 bytes left for the

>> payload which will look into the vb2 sg list and increment every sg

>> entrie with the the used offset until it reaches sg page boundaries

>> and in that case will have to add one extra sg.

>>

>> So this results in the funny patterns you see.

>>

>> We can increase the req->length to 46080 with a maxburst of 15 for usb3

>> but I did not try that out yet.

>>

>

>I wouldn't recommend that. Unless you know you're using both DWC3x

>device and host, most other controllers we know cannot handle high isoc

>throughput, especially at bInterval=1.


Right. I think Felipe mentioned that as well.

As the parameters are possible to tweak from userspace this is all fine.
The user just will have to find the compromise between performance and
stability.

Thanks,
Michael

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
Thinh Nguyen April 29, 2021, 6:15 p.m. UTC | #17
Michael Grzeschik wrote:
> On Wed, Apr 28, 2021 at 11:25:05PM +0000, Thinh Nguyen wrote:

>> Michael Grzeschik wrote:

>>> Hi Thinh,

>>>

>>> On Wed, Apr 28, 2021 at 01:45:11AM +0000, Thinh Nguyen wrote:

>>>> Michael Grzeschik wrote:

>>>>> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote:

>>>>>> Hi Michael,

>>>>>>

>>>>>> Thinh Nguyen wrote:

>>>>>>> Felipe Balbi wrote:

>>>>>>>>

>>>>>>>> HI,

>>>>>>>>

>>>>>>>> Michael Grzeschik <mgr@pengutronix.de> writes:

>>>>>>>>

>>>>>>>> <big snip>

>>>>>>>>

>>>>>>

>>>>>>

>>>>>> <bigger snip>

>>>>>>

>>>>>>

>>>>>>> I think I see the issue that Michael reported.

>>>>>>>

>>>>>>> The problem is that we're using num_pending_sgs to track both

>>>>>>> pending SG

>>>>>>> entries and queued SG entries. num_pending_sgs doesn't get updated

>>>>>>> until

>>>>>>> TRB completion interrupt (ie XferInProgress). Before the driver

>>>>>>> queues

>>>>>>> more SG requests, it will check if there's any pending SG in the

>>>>>>> started

>>>>>>> request list before it prepares more. Since the num_pending_sgs

>>>>>>> doesn't

>>>>>>> get updated until the request is completed, the driver doesn't

>>>>>>> process

>>>>>>> more until the request is completed.

>>>>>>>

>>>>>>> I need to review more on Michael's patches next week, but I think

>>>>>>> what

>>>>>>> he suggested makes sense (in term of properly usage of queued sgs vs

>>>>>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify

>>>>>>> num_queued_sgs.

>>>>>>>

>>>>>>

>>>>>> There's still some issue with your patch. I think this should

>>>>>> cover it.

>>>>>> Let me know if it works for you.

>>>>>

>>>>> This works for me! Will you spin a proper patch from that?

>>>>

>>>> Sure. I can do that after 5.13-rc1 is released

>>>

>>> Great!

>>>

>>>>>

>>>>>> Note: this however probably needs more "Tested-by" and reviews

>>>>>> to make sure I'm not missing anything. I only ran some basic tests,

>>>>>> and will need to run more.

>>>>>

>>>>> You may already have mine:

>>>>>

>>>>> Tested-by: Michael Grzeschik <m.grzeschik@pengutronix.de>

>>>>>

>>>>>> Let me know if this makes sense.

>>>>>

>>>>> From what I understand about the issue and the purpose of all

>>>>> variables this makes total sense to me. So thanks for taking over

>>>>> and make a proper solution.

>>>>>

>>>>> Thanks,

>>>>> Michael

>>>>>

>>>>

>>>> Thanks for the Tested-by.

>>>>

>>>> Btw, any reason for using SG with transfer less than PAGE_SIZE? I

>>>> assume

>>>> your platform is 4KB, but you're splitting your 3KB transfer to

>>>> smaller.

>>>> Was it like this before? Note that DWC3 has a limited number of

>>>> internal

>>>> TRB cache. But what you're doing now is fine and doesn't impact

>>>> performance.

>>>

>>> It all comes from the limitation of the uvc_video gadget. Look into the

>>> patches I send to support sg in uvc_video driver. It just maps entries

>>> from an sg list comming from videobuf2 to an req->sg list. In

>>> uvc_video_alloc_requests the req->length will be set to req_size which

>>> is calculated with:

>>>

>>>  ep->maxpacket(1024) * maxburst(1-15) * mult(1-3)

>>>

>>> With maxburst = 1 and mult = 3 it currently reults in an req->length of:

>>>

>>>  1024 * 1 * 3 = 3072

>>

>> I wasn't referring to this, I was just wondering why uvc uses SG.

>> Correct me if I'm wrong, but isn't uvc allocates and uses contiguous

>> buffer?

> 

> Not necessarily. Depending on the source creating vb2 buffers with mmu

> enabled or directly you may get different patterns. The patches I send

> are able to also handle contiguous buffers. In that case vb2 would just

> hand over a scatter list with one big entry, which would be totally fine.

> In that case we would not have to scatter it for the gadget controller.

> 

>> Depend on the setup, normal request may show better performance.

> 

> Right. In my setup the data is coming from an mmu.

> 


I see. Thanks for the clarification.

Thinh