diff mbox

dmaengine: pl330: Fix some race conditions in residue calculation

Message ID 1456319674.2867.15.camel@linaro.org
State New
Headers show

Commit Message

Jon Medhurst (Tixy) Feb. 24, 2016, 1:14 p.m. UTC
The residue calculation in pl330_tx_status doesn't handle transitional
states that occur at the time one descriptor (A) is completed and the
next (B) is started. Specifically, both A and B can simultaneously be in
the BUSY state and at this time the thread's 'req_running' may (or may
not) be -1.

To cope with this situation we change the code to ensure A is treated as
complete and B as having not yet started. Prior to the change, the code
would calculate a transferred byte count as if both A and B had
completed.

Fixes: aee4d1fac887 ("dmaengine: pl330: improve pl330_tx_status() function")

Signed-off-by: Jon Medhurst <tixy@linaro.org>

---

I discovered this issue when trying to work out why audio stopped
working on ARM's Juno platform and bisected it to commit aee4d1fac887.
Whilst this patch seems to fix the problems I was seeing, I can't help
but think there are more race conditions with this code. E.g. if the
running descriptor changes under us, pl330_get_current_xferred_count
can end up reading values from hardware that relate to a different
descriptor. And if we're really unlucky, the reading of the 'val' and
'addr' values in pl330_get_current_xferred_count can come from different
descriptors. I don't know if there is any locks we can use to prevent
such races or if we need to try and detect when things have changed and
redo/abort the residue calculation...

 drivers/dma/pl330.c | 24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

-- 
2.1.4

Comments

Jon Medhurst (Tixy) March 8, 2016, 10:40 a.m. UTC | #1
On Tue, 2016-03-08 at 09:42 +0530, Vinod Koul wrote:
> On Wed, Feb 24, 2016 at 01:14:34PM +0000, Jon Medhurst (Tixy) wrote:

> > The residue calculation in pl330_tx_status doesn't handle transitional

> > states that occur at the time one descriptor (A) is completed and the

> > next (B) is started. Specifically, both A and B can simultaneously be in

> > the BUSY state and at this time the thread's 'req_running' may (or may

> > not) be -1.

> 

> you are under lock so descriptor state wont be update while we are it.

> 

> Also the query for residue is for "a descriptor" not whatever is the current

> running descriptor...

> 

> > 

> > To cope with this situation we change the code to ensure A is treated as

> > complete and B as having not yet started. Prior to the change, the code

> > would calculate a transferred byte count as if both A and B had

> > completed.

> 

> You query for either A or B not both!


I've probably been using wrong/ambiguous terminology...

In my description I'm using 'descriptor' to refer to a 'struct
dma_pl330_desc', I guess other people assume 'struct
dma_async_tx_descriptor'?

The situation I was debugging was audio playback, where ASoC ends up
calling pl330_prep_dma_cyclic() with a period one quarter the length of
the buffer it is using, so that results in four dma_pl330_desc
'descriptors' being created to cover that buffer. These later get
submitted to a DMA channel (struct dma_pl330_chan) which has a list of
these that it is processing (the 'work_list').

The residual calculation that currently exists in pl08x_dma_tx_status()
is iterating this work_list and summing the length of currently
transferring 'descriptor' with those later pending ones. I believe that
is correct behaviour because these 'descriptors' (dma_pl330_desc) are
all internal implementation details of the driver, and the dmaengine
API's are dealing in units of 'dma_async_tx_descriptor' ?

If the current code is OK in this regard, it is definitely buggy because
it doesn't cope with the situation when two dma_pl330_desc's are in the
state 'BUSY' a, which I have seen occur when debugging this issue, had
worked out can happen by analysing the code, and is acknowledged by the
in-source comments for enum desc_status...

	/*
	 * Sitting on the work_list and already submitted
	 * to the PL330 core. Not more than two descriptors
	 * of a channel can be BUSY at any time.
	 */
	BUSY,

In my problematic usecase I have userside code calling ALSA ioctls to
poll the current audio playback position which results in
pl08x_dma_tx_status() being called multiple times a second. After only a
second or two the buggy situation gets hit, resulting in a
miscalculation that ASoC interprets as a buffer underflow and so it
stops the stream.

I spent several days debugging this, with enough ad hoc tests and
printk's littered everywhere to be very confident as to how things are
going wrong - what I'm not not totally confident of is how things should
be properly fixed.

This patch appears to fix the situation that I was hitting, but it
really looks like there isn't any locking that prevent this polling use
of pl08x_dma_tx_status() from happening concurrently with the irq
handler reprogramming the hardware for the next dma_pl330_desc. I didn't
attempt any fix for that for fear of introducing bugs in what looks like
complex code, and because it's not a problem I saw happen in practice.

-- Tixy

> 

> > 

> > Fixes: aee4d1fac887 ("dmaengine: pl330: improve pl330_tx_status() function")

> > 

> > Signed-off-by: Jon Medhurst <tixy@linaro.org>

> > ---

> > 

> > I discovered this issue when trying to work out why audio stopped

> > working on ARM's Juno platform and bisected it to commit aee4d1fac887.

> > Whilst this patch seems to fix the problems I was seeing, I can't help

> > but think there are more race conditions with this code. E.g. if the

> > running descriptor changes under us, pl330_get_current_xferred_count

> > can end up reading values from hardware that relate to a different

> > descriptor. And if we're really unlucky, the reading of the 'val' and

> > 'addr' values in pl330_get_current_xferred_count can come from different

> > descriptors. I don't know if there is any locks we can use to prevent

> > such races or if we need to try and detect when things have changed and

> > redo/abort the residue calculation...

> > 

> >  drivers/dma/pl330.c | 24 ++++++++++++++++++++----

> >  1 file changed, 20 insertions(+), 4 deletions(-)

> > 

> > diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c

> > index 17ee758..55e3c5f 100644

> > --- a/drivers/dma/pl330.c

> > +++ b/drivers/dma/pl330.c

> > @@ -2240,6 +2240,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,

> >  	struct dma_pl330_desc *desc, *running = NULL;

> >  	struct dma_pl330_chan *pch = to_pchan(chan);

> >  	unsigned int transferred, residual = 0;

> > +	bool first_busy;

> >  

> >  	ret = dma_cookie_status(chan, cookie, txstate);

> >  

> > @@ -2253,16 +2254,31 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,

> >  

> >  	if (pch->thread->req_running != -1)

> >  		running = pch->thread->req[pch->thread->req_running].desc;

> > +	first_busy = true;

> >  

> >  	/* Check in pending list */

> >  	list_for_each_entry(desc, &pch->work_list, node) {

> >  		if (desc->status == DONE)

> >  			transferred = desc->bytes_requested;

> > -		else if (running && desc == running)

> > -			transferred =

> > -				pl330_get_current_xferred_count(pch, desc);

> > -		else

> > +		else if (desc->status == BUSY && first_busy) {

> > +			first_busy = false;

> > +			if (running && desc == running) {

> > +				transferred =

> > +					pl330_get_current_xferred_count(pch, desc);

> > +			} else {

> > +				/* BUSY but not running means it's just completed */

> > +				transferred = desc->bytes_requested;

> > +			}

> > +		} else {

> > +			/*

> > +			 * Descriptor is either in PREP state queued for future

> > +			 * transfer or it is the second BUSY descriptor we have

> > +			 * seen. The latter case means it has just, or is about

> > +			 * to be, started, so treat it as having not yet

> > +			 * transferred any bytes, the same as PREP.

> > +			 */

> >  			transferred = 0;

> > +		}

> >  		residual += desc->bytes_requested - transferred;

> >  		if (desc->txd.cookie == cookie) {

> >  			switch (desc->status) {

> > -- 

> > 2.1.4

> > 

> > 

>
Jon Medhurst (Tixy) March 8, 2016, 3:50 p.m. UTC | #2
On Tue, 2016-03-08 at 19:45 +0530, Vinod Koul wrote:
> On Tue, Mar 08, 2016 at 10:40:11AM +0000, Jon Medhurst (Tixy) wrote:

> > On Tue, 2016-03-08 at 09:42 +0530, Vinod Koul wrote:

> > > On Wed, Feb 24, 2016 at 01:14:34PM +0000, Jon Medhurst (Tixy) wrote:

> > > > The residue calculation in pl330_tx_status doesn't handle transitional

> > > > states that occur at the time one descriptor (A) is completed and the

> > > > next (B) is started. Specifically, both A and B can simultaneously be in

> > > > the BUSY state and at this time the thread's 'req_running' may (or may

> > > > not) be -1.

> > > 

> > > you are under lock so descriptor state wont be update while we are it.

> > > 

> > > Also the query for residue is for "a descriptor" not whatever is the current

> > > running descriptor...

> > > 

> > > > 

> > > > To cope with this situation we change the code to ensure A is treated as

> > > > complete and B as having not yet started. Prior to the change, the code

> > > > would calculate a transferred byte count as if both A and B had

> > > > completed.

> > > 

> > > You query for either A or B not both!

> > 

> > I've probably been using wrong/ambiguous terminology...

> > 

> > In my description I'm using 'descriptor' to refer to a 'struct

> > dma_pl330_desc', I guess other people assume 'struct

> > dma_async_tx_descriptor'?

> 

> Nope

> 

> > The situation I was debugging was audio playback, where ASoC ends up

> > calling pl330_prep_dma_cyclic() with a period one quarter the length of

> > the buffer it is using, so that results in four dma_pl330_desc

> > 'descriptors' being created to cover that buffer. These later get

> > submitted to a DMA channel (struct dma_pl330_chan) which has a list of

> > these that it is processing (the 'work_list').

> > 

> > The residual calculation that currently exists in pl08x_dma_tx_status()

> > is iterating this work_list and summing the length of currently

> > transferring 'descriptor' with those later pending ones. I believe that

> > is correct behaviour because these 'descriptors' (dma_pl330_desc) are

> > all internal implementation details of the driver, and the dmaengine

> > API's are dealing in units of 'dma_async_tx_descriptor' ?

> 

> Not really. If you look closely dma_pl330_desc contains

> dma_async_tx_descriptor. A descriptor represents a tranasaction and is

> certainly not internal detail of your driver


I think that for this driver it's kinda both, which is why it's
confusing...

pl330_prep_dma_cyclic(), after creating a number of dma_pl330_desc
objects in a circular list, returns the address of the
dma_async_tx_descriptor contained in the last one. I'm guessing that as
far as users of the dmaengine API is concerned, this is treated as just
a single descriptor. pl330_prep_slave_sg() creates a similar circular
list.

When these descriptors are later submitted to the driver for transfer,
it is done by a call to pl330_tx_submit, which I has this comment...

/*
 * We returned the last one of the circular list of descriptor(s)
 * from prep_xxx, so the argument to submit corresponds to the last
 * descriptor of the list.
 */

This goes on to add all the linked descriptors to the pending queue for
sending to the DMA hardware, and then returns the cookie of the last
descriptor, maintaining the fiction to the dmaengine API's that there is
just a single descriptor.

> 

> The reside is requested for "a descriptor". For example if you have prepared

> two descriptors A and B and submitted them, then you can request status and

> reside for A and you need to calculate that for A only and not take into

> account status of B


But, in the case of the pl330 driver, A and B may each consist of
multiple internal/hidden descriptors. So the residue calculation has to
sum up the residue of all these internal/hidden descriptors as well.
This is what the current pl330_tx_status() function does, but has bugs.

I've only just managed to clearly understand all the above details
whilst writing this email, and this confusion obviously means the code
and any commit messages need to explain things better.

-- Tixy

> 

> > 

> > If the current code is OK in this regard, it is definitely buggy because

> > it doesn't cope with the situation when two dma_pl330_desc's are in the

> > state 'BUSY' a, which I have seen occur when debugging this issue, had

> > worked out can happen by analysing the code, and is acknowledged by the

> > in-source comments for enum desc_status...

> > 

> > 	/*

> > 	 * Sitting on the work_list and already submitted

> > 	 * to the PL330 core. Not more than two descriptors

> > 	 * of a channel can be BUSY at any time.

> > 	 */

> > 	BUSY,

> > 

> > In my problematic usecase I have userside code calling ALSA ioctls to

> > poll the current audio playback position which results in

> > pl08x_dma_tx_status() being called multiple times a second. After only a

> > second or two the buggy situation gets hit, resulting in a

> > miscalculation that ASoC interprets as a buffer underflow and so it

> > stops the stream.

> > 

> > I spent several days debugging this, with enough ad hoc tests and

> > printk's littered everywhere to be very confident as to how things are

> > going wrong - what I'm not not totally confident of is how things should

> > be properly fixed.

> 

> There maybe a problem but this patch and the fix dont explain that

> 

> > 

> > This patch appears to fix the situation that I was hitting, but it

> > really looks like there isn't any locking that prevent this polling use

> > of pl08x_dma_tx_status() from happening concurrently with the irq

> > handler reprogramming the hardware for the next dma_pl330_desc. I didn't

> > attempt any fix for that for fear of introducing bugs in what looks like

> > complex code, and because it's not a problem I saw happen in practice.

> > 

> > -- Tixy

> > 

> > > 

> > > > 

> > > > Fixes: aee4d1fac887 ("dmaengine: pl330: improve pl330_tx_status() function")

> > > > 

> > > > Signed-off-by: Jon Medhurst <tixy@linaro.org>

> > > > ---

> > > > 

> > > > I discovered this issue when trying to work out why audio stopped

> > > > working on ARM's Juno platform and bisected it to commit aee4d1fac887.

> > > > Whilst this patch seems to fix the problems I was seeing, I can't help

> > > > but think there are more race conditions with this code. E.g. if the

> > > > running descriptor changes under us, pl330_get_current_xferred_count

> > > > can end up reading values from hardware that relate to a different

> > > > descriptor. And if we're really unlucky, the reading of the 'val' and

> > > > 'addr' values in pl330_get_current_xferred_count can come from different

> > > > descriptors. I don't know if there is any locks we can use to prevent

> > > > such races or if we need to try and detect when things have changed and

> > > > redo/abort the residue calculation...

> > > > 

> > > >  drivers/dma/pl330.c | 24 ++++++++++++++++++++----

> > > >  1 file changed, 20 insertions(+), 4 deletions(-)

> > > > 

> > > > diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c

> > > > index 17ee758..55e3c5f 100644

> > > > --- a/drivers/dma/pl330.c

> > > > +++ b/drivers/dma/pl330.c

> > > > @@ -2240,6 +2240,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,

> > > >  	struct dma_pl330_desc *desc, *running = NULL;

> > > >  	struct dma_pl330_chan *pch = to_pchan(chan);

> > > >  	unsigned int transferred, residual = 0;

> > > > +	bool first_busy;

> > > >  

> > > >  	ret = dma_cookie_status(chan, cookie, txstate);

> > > >  

> > > > @@ -2253,16 +2254,31 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,

> > > >  

> > > >  	if (pch->thread->req_running != -1)

> > > >  		running = pch->thread->req[pch->thread->req_running].desc;

> > > > +	first_busy = true;

> > > >  

> > > >  	/* Check in pending list */

> > > >  	list_for_each_entry(desc, &pch->work_list, node) {

> > > >  		if (desc->status == DONE)

> > > >  			transferred = desc->bytes_requested;

> > > > -		else if (running && desc == running)

> > > > -			transferred =

> > > > -				pl330_get_current_xferred_count(pch, desc);

> > > > -		else

> > > > +		else if (desc->status == BUSY && first_busy) {

> > > > +			first_busy = false;

> > > > +			if (running && desc == running) {

> > > > +				transferred =

> > > > +					pl330_get_current_xferred_count(pch, desc);

> > > > +			} else {

> > > > +				/* BUSY but not running means it's just completed */

> > > > +				transferred = desc->bytes_requested;

> > > > +			}

> > > > +		} else {

> > > > +			/*

> > > > +			 * Descriptor is either in PREP state queued for future

> > > > +			 * transfer or it is the second BUSY descriptor we have

> > > > +			 * seen. The latter case means it has just, or is about

> > > > +			 * to be, started, so treat it as having not yet

> > > > +			 * transferred any bytes, the same as PREP.

> > > > +			 */

> > > >  			transferred = 0;

> > > > +		}

> > > >  		residual += desc->bytes_requested - transferred;

> > > >  		if (desc->txd.cookie == cookie) {

> > > >  			switch (desc->status) {

> > > > -- 

> > > > 2.1.4

> > > > 

> > > > 

> > > 

> > 

> > 

>
Jon Medhurst (Tixy) March 15, 2016, 9:10 a.m. UTC | #3
On Fri, 2016-03-11 at 13:13 +0530, Vinod Koul wrote:
> On Tue, Mar 08, 2016 at 03:50:41PM +0000, Jon Medhurst (Tixy) wrote:

> 

> > > The reside is requested for "a descriptor". For example if you have prepared

> > > two descriptors A and B and submitted them, then you can request status and

> > > reside for A and you need to calculate that for A only and not take into

> > > account status of B

> > 

> > But, in the case of the pl330 driver, A and B may each consist of

> > multiple internal/hidden descriptors. So the residue calculation has to

> > sum up the residue of all these internal/hidden descriptors as well.

> > This is what the current pl330_tx_status() function does, but has bugs.

> > 

> > I've only just managed to clearly understand all the above details

> > whilst writing this email, and this confusion obviously means the code

> > and any commit messages need to explain things better.

> 

> Okay I relooked at the code, the driver is creating descriptor per period,

> so yes while calculating that should be taken into account but only for

> cyclic case and not for rest, as it will break reside values for for them.


So not for scatter gather lists passed to pl330_prep_slave_sg? It seems
to me, that as for the cyclic case, the descriptors used for each
element in the list are equally internal to the pl330 driver and not
exposed to users.

pl330_tx_submit() only returns the cookie for the last descriptor, how
do users get the cookies for the other elements so they can pass them to
pl330_tx_status()?

To me, the implementation of the pl330 driver looks consistent with the
APIs I can see in header files and the Documentation directory, namely:

- dmaengine_prep_* functions prepare a DMA transactions and return
  a descriptor (struct dma_async_tx_descriptor)

- The transaction is queued for processing by passing that descriptor to
  dmaengine_submit() which returns 'cookie' (dma_cookie_t)

- The current status of a transaction can be queried by passing it's
  cookie to dmaengine_tx_status

Now, I've never looked at DMA API's or their working before I started
investigating the bug I'm trying to fix, so I may very well have missed
something. 

In fact, the more I look, the more uncertain I am. From
Documentation/dmaengine/provider.txt it describes:

   * device_tx_status
     - Should report the bytes left to go over on the given channel
     - Should only care about the transaction descriptor passed as
       argument, not the currently active one on a given channel
     - The tx_state argument might be NULL
     - Should use dma_set_residue to report it
     - In the case of a cyclic transfer, it should only take into
       account the current period.

So that's saying residue for cyclic transfers residue is only for the
current period?

I tried looking at other DMA drivers for inspiration, and it seems that
some at least (e.g. drivers/dma/mmp_pdma.c) do things the same way as
pl330, but that could just be a case of people copying broken code.

-- 
Tixy
diff mbox

Patch

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 17ee758..55e3c5f 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -2240,6 +2240,7 @@  pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 	struct dma_pl330_desc *desc, *running = NULL;
 	struct dma_pl330_chan *pch = to_pchan(chan);
 	unsigned int transferred, residual = 0;
+	bool first_busy;
 
 	ret = dma_cookie_status(chan, cookie, txstate);
 
@@ -2253,16 +2254,31 @@  pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
 
 	if (pch->thread->req_running != -1)
 		running = pch->thread->req[pch->thread->req_running].desc;
+	first_busy = true;
 
 	/* Check in pending list */
 	list_for_each_entry(desc, &pch->work_list, node) {
 		if (desc->status == DONE)
 			transferred = desc->bytes_requested;
-		else if (running && desc == running)
-			transferred =
-				pl330_get_current_xferred_count(pch, desc);
-		else
+		else if (desc->status == BUSY && first_busy) {
+			first_busy = false;
+			if (running && desc == running) {
+				transferred =
+					pl330_get_current_xferred_count(pch, desc);
+			} else {
+				/* BUSY but not running means it's just completed */
+				transferred = desc->bytes_requested;
+			}
+		} else {
+			/*
+			 * Descriptor is either in PREP state queued for future
+			 * transfer or it is the second BUSY descriptor we have
+			 * seen. The latter case means it has just, or is about
+			 * to be, started, so treat it as having not yet
+			 * transferred any bytes, the same as PREP.
+			 */
 			transferred = 0;
+		}
 		residual += desc->bytes_requested - transferred;
 		if (desc->txd.cookie == cookie) {
 			switch (desc->status) {