[v2] mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls

Message ID 201301192005.20093.arnd@arndb.de
State New
Headers show

Commit Message

Arnd Bergmann Jan. 19, 2013, 8:05 p.m.
On Saturday 19 January 2013, Soeren Moch wrote:
> What I can see in the log: a lot of coherent mappings from sata_mv and 
> orion_ehci, a few from mv643xx_eth, no other coherent mappings.
> All coherent mappings are page aligned, some of them (from orion_ehci)
> are not really small (as claimed in __alloc_from_pool).

Right. Unfortunately, the output does not show which of the mappings
are atomic, so we still need to look through those that can be atomic
to understand what's going on. There are a few megabytes of coherent
mappings in total according to the output, but it seems that a significant
portion of them is atomic, which is a bit unexpected.

> I don't believe in a memory leak. When I restart vdr (the application
> utilizing the dvb sticks) then there is enough dma memory available
> again.

I found at least one source line that incorrectly uses an atomic
allocation, in ehci_mem_init():

                dma_alloc_coherent (ehci_to_hcd(ehci)->self.controller,
                        ehci->periodic_size * sizeof(__le32),
                        &ehci->periodic_dma, 0);

The last argument is the GFP_ flag, which should never be zero, as
that is implicit !wait. This function is called only once, so it
is not the actual culprit, but there could be other instances
where we accidentally allocate something as GFP_ATOMIC.

The total number of allocations I found for each type are

sata_mv: 66 pages (270336 bytes)
mv643xx_eth: 4 pages == (16384 bytes)
orion_ehci: 154 pages (630784 bytes)
orion_ehci (atomic): 256 pages (1048576 bytes)

from the distribution of the numbers, it seems that there is exactly 1 MB
of data allocated between bus addresses 0x1f90000 and 0x1f9ffff, allocated
in individual pages. This matches the size of your pool, so it's definitely
something coming from USB, and no single other allocation, but it does not
directly point to a specific line of code.

One thing I found was that the ARM dma-mapping code seems buggy in the way
that it does a bitwise and between the gfp mask and GFP_ATOMIC, which does
not work because GFP_ATOMIC is defined by the absence of __GFP_WAIT.

I believe we need the patch below, but it is not clear to me if that issue
is related to your problem or now.

8<-------

There is one more code path I could find, which is usb_submit_urb() =>
usb_hcd_submit_urb => ehci_urb_enqueue() => submit_async() =>
qh_append_tds() => qh_make(GFP_ATOMIC) => ehci_qh_alloc() =>
dma_pool_alloc() => pool_alloc_page() => dma_alloc_coherent()

So even for a GFP_KERNEL passed into usb_submit_urb, the ehci driver
causes the low-level allocation to be GFP_ATOMIC, because 
qh_append_tds() is called under a spinlock. If we have hundreds
of URBs in flight, that will exhaust the pool rather quickly.

	Arnd

Comments

Soeren Moch Jan. 21, 2013, 3:01 p.m. | #1
On 01/19/13 21:05, Arnd Bergmann wrote:
> I found at least one source line that incorrectly uses an atomic
> allocation, in ehci_mem_init():
>
>                  dma_alloc_coherent (ehci_to_hcd(ehci)->self.controller,
>                          ehci->periodic_size * sizeof(__le32),
>                          &ehci->periodic_dma, 0);
>
> The last argument is the GFP_ flag, which should never be zero, as
> that is implicit !wait. This function is called only once, so it
> is not the actual culprit, but there could be other instances
> where we accidentally allocate something as GFP_ATOMIC.
>
> The total number of allocations I found for each type are
>
> sata_mv: 66 pages (270336 bytes)
> mv643xx_eth: 4 pages == (16384 bytes)
> orion_ehci: 154 pages (630784 bytes)
> orion_ehci (atomic): 256 pages (1048576 bytes)
>
> from the distribution of the numbers, it seems that there is exactly 1 MB
> of data allocated between bus addresses 0x1f90000 and 0x1f9ffff, allocated
> in individual pages. This matches the size of your pool, so it's definitely
> something coming from USB, and no single other allocation, but it does not
> directly point to a specific line of code.
Very interesting, so this is no fragmentation problem nor something 
caused by sata or ethernet.
> One thing I found was that the ARM dma-mapping code seems buggy in the way
> that it does a bitwise and between the gfp mask and GFP_ATOMIC, which does
> not work because GFP_ATOMIC is defined by the absence of __GFP_WAIT.
>
> I believe we need the patch below, but it is not clear to me if that issue
> is related to your problem or now.
Out of curiosity I checked include/linux/gfp.h. GFP_ATOMIC is defined as 
__GFP_HIGH (which means 'use emergency pool', and no wait), so this 
patch should not make any difference for "normal" (GPF_ATOMIC / 
GFP_KERNEL) allocations, only for gfp_flags accidentally set to zero. 
So, can a new test with this patch help to debug the pool exhaustion?
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..c57975f 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -640,7 +641,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
>   
>   	if (is_coherent || nommu())
>   		addr = __alloc_simple_buffer(dev, size, gfp, &page);
> -	else if (gfp & GFP_ATOMIC)
> +	else if (!(gfp & __GFP_WAIT))
>   		addr = __alloc_from_pool(size, &page);
>   	else if (!IS_ENABLED(CONFIG_CMA))
>   		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
> @@ -1272,7 +1273,7 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
>   	*handle = DMA_ERROR_CODE;
>   	size = PAGE_ALIGN(size);
>   
> -	if (gfp & GFP_ATOMIC)
> +	if (!(gfp & __GFP_WAIT))
>   		return __iommu_alloc_atomic(dev, size, handle);
>   
>   	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);
> 8<-------
>
> There is one more code path I could find, which is usb_submit_urb() =>
> usb_hcd_submit_urb => ehci_urb_enqueue() => submit_async() =>
> qh_append_tds() => qh_make(GFP_ATOMIC) => ehci_qh_alloc() =>
> dma_pool_alloc() => pool_alloc_page() => dma_alloc_coherent()
>
> So even for a GFP_KERNEL passed into usb_submit_urb, the ehci driver
> causes the low-level allocation to be GFP_ATOMIC, because
> qh_append_tds() is called under a spinlock. If we have hundreds
> of URBs in flight, that will exhaust the pool rather quickly.
>
Maybe there are hundreds of URBs in flight in my application, I have no 
idea how to check this. It seems to me that bad reception conditions 
(lost lock / regained lock messages for some dvb channels) accelerate 
the buffer exhaustion. But even with a 4MB coherent pool I see the 
error. Is there any chance to fix this in the usb or dvb subsystem (or 
wherever)? Should I try to further increase the pool size, or what else 
can I do besides using an older kernel?

   Soeren
Arnd Bergmann Jan. 21, 2013, 6:55 p.m. | #2
On Monday 21 January 2013, Soeren Moch wrote:
> On 01/19/13 21:05, Arnd Bergmann wrote:
> > from the distribution of the numbers, it seems that there is exactly 1 MB
> > of data allocated between bus addresses 0x1f90000 and 0x1f9ffff, allocated
> > in individual pages. This matches the size of your pool, so it's definitely
> > something coming from USB, and no single other allocation, but it does not
> > directly point to a specific line of code.
> Very interesting, so this is no fragmentation problem nor something 
> caused by sata or ethernet.

Right.

> > One thing I found was that the ARM dma-mapping code seems buggy in the way
> > that it does a bitwise and between the gfp mask and GFP_ATOMIC, which does
> > not work because GFP_ATOMIC is defined by the absence of __GFP_WAIT.
> >
> > I believe we need the patch below, but it is not clear to me if that issue
> > is related to your problem or now.
> Out of curiosity I checked include/linux/gfp.h. GFP_ATOMIC is defined as 
> __GFP_HIGH (which means 'use emergency pool', and no wait), so this 
> patch should not make any difference for "normal" (GPF_ATOMIC / 
> GFP_KERNEL) allocations, only for gfp_flags accidentally set to zero. 

Yes, or one of the rare cases where someone intentionally does something like
(GFP_ATOMIC & !__GFP_HIGH) or (GFP_KERNEL || __GFP_HIGH), which are both
wrong.

> So, can a new test with this patch help to debug the pool exhaustion?

Yes, but I would not expect this to change much. It's a bug, but not likely
the one you are hitting.

> > So even for a GFP_KERNEL passed into usb_submit_urb, the ehci driver
> > causes the low-level allocation to be GFP_ATOMIC, because
> > qh_append_tds() is called under a spinlock. If we have hundreds
> > of URBs in flight, that will exhaust the pool rather quickly.
> >
> Maybe there are hundreds of URBs in flight in my application, I have no 
> idea how to check this.

I don't know a lot about USB, but I always assumed that this was not
a normal condition and that there are only a couple of URBs per endpoint
used at a time. Maybe Greg or someone else with a USB background can
shed some light on this.

> It seems to me that bad reception conditions 
> (lost lock / regained lock messages for some dvb channels) accelerate 
> the buffer exhaustion. But even with a 4MB coherent pool I see the 
> error. Is there any chance to fix this in the usb or dvb subsystem (or 
> wherever)? Should I try to further increase the pool size, or what else 
> can I do besides using an older kernel?

There are two things that I think can be done if hundreds of URBs is
indeed the normal working condition for this driver:

* change the locking in your driver so it can actually call usb_submit_urb
using GFP_KERNEL rather than GFP_ATOMIC
* after that is done, rework the ehci_hcd driver so it can do the
allocation inside of the submit_urb path to use the mem_flags rather
than unconditional GFP_ATOMIC.

Note that the problem you are seeing does not just exist in the case of
the atomic coherent pool getting exhausted, but also on any platform
that runs into an out-of-memory condition.

	Arnd
Greg Kroah-Hartman Jan. 21, 2013, 9:01 p.m. | #3
On Mon, Jan 21, 2013 at 06:55:25PM +0000, Arnd Bergmann wrote:
> On Monday 21 January 2013, Soeren Moch wrote:
> > On 01/19/13 21:05, Arnd Bergmann wrote:
> > > from the distribution of the numbers, it seems that there is exactly 1 MB
> > > of data allocated between bus addresses 0x1f90000 and 0x1f9ffff, allocated
> > > in individual pages. This matches the size of your pool, so it's definitely
> > > something coming from USB, and no single other allocation, but it does not
> > > directly point to a specific line of code.
> > Very interesting, so this is no fragmentation problem nor something 
> > caused by sata or ethernet.
> 
> Right.
> 
> > > One thing I found was that the ARM dma-mapping code seems buggy in the way
> > > that it does a bitwise and between the gfp mask and GFP_ATOMIC, which does
> > > not work because GFP_ATOMIC is defined by the absence of __GFP_WAIT.
> > >
> > > I believe we need the patch below, but it is not clear to me if that issue
> > > is related to your problem or now.
> > Out of curiosity I checked include/linux/gfp.h. GFP_ATOMIC is defined as 
> > __GFP_HIGH (which means 'use emergency pool', and no wait), so this 
> > patch should not make any difference for "normal" (GPF_ATOMIC / 
> > GFP_KERNEL) allocations, only for gfp_flags accidentally set to zero. 
> 
> Yes, or one of the rare cases where someone intentionally does something like
> (GFP_ATOMIC & !__GFP_HIGH) or (GFP_KERNEL || __GFP_HIGH), which are both
> wrong.
> 
> > So, can a new test with this patch help to debug the pool exhaustion?
> 
> Yes, but I would not expect this to change much. It's a bug, but not likely
> the one you are hitting.
> 
> > > So even for a GFP_KERNEL passed into usb_submit_urb, the ehci driver
> > > causes the low-level allocation to be GFP_ATOMIC, because
> > > qh_append_tds() is called under a spinlock. If we have hundreds
> > > of URBs in flight, that will exhaust the pool rather quickly.
> > >
> > Maybe there are hundreds of URBs in flight in my application, I have no 
> > idea how to check this.
> 
> I don't know a lot about USB, but I always assumed that this was not
> a normal condition and that there are only a couple of URBs per endpoint
> used at a time. Maybe Greg or someone else with a USB background can
> shed some light on this.

There's no restriction on how many URBs a driver can have outstanding at
once, and if you have a system with a lot of USB devices running at the
same time, there could be lots of URBs in flight depending on the number
of host controllers and devices and drivers being used.

Sorry,

greg k-h

Patch hide | download patch | download mbox

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..c57975f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -640,7 +641,7 @@  static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 
 	if (is_coherent || nommu())
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
-	else if (gfp & GFP_ATOMIC)
+	else if (!(gfp & __GFP_WAIT))
 		addr = __alloc_from_pool(size, &page);
 	else if (!IS_ENABLED(CONFIG_CMA))
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
@@ -1272,7 +1273,7 @@  static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
 	*handle = DMA_ERROR_CODE;
 	size = PAGE_ALIGN(size);
 
-	if (gfp & GFP_ATOMIC)
+	if (!(gfp & __GFP_WAIT))
 		return __iommu_alloc_atomic(dev, size, handle);
 
 	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);