[v2] mm: dmapool: use provided gfp flags for all dma_alloc_coherent() calls

Message ID 201301172026.45514.arnd@arndb.de
State New
Headers show

Commit Message

Arnd Bergmann Jan. 17, 2013, 8:26 p.m.
On Thursday 17 January 2013, Soeren Moch wrote:
> On 17.01.2013 11:49, Arnd Bergmann wrote:
> > On Wednesday 16 January 2013, Soeren Moch wrote:
> >>>> I will see what I can do here. Is there an easy way to track the buffer
> >>>> usage without having to wait for complete exhaustion?
> >>>
> >>> DMA_API_DEBUG
> >>
> >> OK, maybe I can try this.
> >>>
> >
> > Any success with this? It should at least tell you if there is a
> > memory leak in one of the drivers.
> 
> Not yet, sorry. I have to do all the tests in my limited spare time.
> Can you tell me what to search for in the debug output?

Actually now that I've looked closer, you can't immediately see
all the mappings as I thought.

But please try enabling DMA_API_DEBUG in combination with this
one-line patch:

That will show every single allocation that is currently active. This lets
you see where all the memory went, and if there is a possible leak or
excessive fragmentation.

	Arnd

Comments

Soeren Moch Jan. 19, 2013, 3:29 p.m. | #1
On 17.01.2013 21:26, Arnd Bergmann wrote:
> On Thursday 17 January 2013, Soeren Moch wrote:
>> On 17.01.2013 11:49, Arnd Bergmann wrote:
>>> On Wednesday 16 January 2013, Soeren Moch wrote:
>>>>>> I will see what I can do here. Is there an easy way to track the buffer
>>>>>> usage without having to wait for complete exhaustion?
>>>>>
>>>>> DMA_API_DEBUG
>>>>
>>>> OK, maybe I can try this.
>>>>>
>>>
>>> Any success with this? It should at least tell you if there is a
>>> memory leak in one of the drivers.
>>
>> Not yet, sorry. I have to do all the tests in my limited spare time.
>> Can you tell me what to search for in the debug output?
>
> Actually now that I've looked closer, you can't immediately see
> all the mappings as I thought.
>
> But please try enabling DMA_API_DEBUG in combination with this
> one-line patch:
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6b2fb87..3df74ac 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -497,6 +497,7 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
>   		pr_err_once("ERROR: %u KiB atomic DMA coherent pool is too small!\n"
>   			    "Please increase it with coherent_pool= kernel parameter!\n",
>   			    (unsigned)pool->size / 1024);
> +		debug_dma_dump_mappings(NULL);
>   	}
>   	spin_unlock_irqrestore(&pool->lock, flags);
>
> That will show every single allocation that is currently active. This lets
> you see where all the memory went, and if there is a possible leak or
> excessive fragmentation.
>
> 	Arnd
>

Please find attached a debug log generated with your patch.

I used the sata disk and two em28xx dvb sticks, no other usb devices,
no ethernet cable connected, tuners on saa716x-based card not used.

What I can see in the log: a lot of coherent mappings from sata_mv and 
orion_ehci, a few from mv643xx_eth, no other coherent mappings.
All coherent mappings are page aligned, some of them (from orion_ehci)
are not really small (as claimed in __alloc_from_pool).

I don't believe in a memory leak. When I restart vdr (the application
utilizing the dvb sticks) then there is enough dma memory available
again.

Regards,
Soeren
Andrew Lunn Jan. 19, 2013, 4:24 p.m. | #2
On Thu, Jan 17, 2013 at 08:26:45PM +0000, Arnd Bergmann wrote:
> On Thursday 17 January 2013, Soeren Moch wrote:
> > On 17.01.2013 11:49, Arnd Bergmann wrote:
> > > On Wednesday 16 January 2013, Soeren Moch wrote:
> > >>>> I will see what I can do here. Is there an easy way to track the buffer
> > >>>> usage without having to wait for complete exhaustion?
> > >>>
> > >>> DMA_API_DEBUG
> > >>
> > >> OK, maybe I can try this.

I tried this. Not what i expected. We have at least one problem with
the ethernet driver:

WARNING: at lib/dma-debug.c:933 check_unmap+0x4b8/0x8a8()
mv643xx_eth_port mv643xx_eth_port.0: DMA-API: device driver failed to check map error[device address=0x000000001f22be00] [size=1536 bytes] [mapped as single]
Modules linked in:
[<c000db10>] (unwind_backtrace+0x0/0xf4) from [<c0016c44>] (warn_slowpath_common+0x4c/0x64)
[<c0016c44>] (warn_slowpath_common+0x4c/0x64) from [<c0016cf0>] (warn_slowpath_fmt+0x30/0x40)
[<c0016cf0>] (warn_slowpath_fmt+0x30/0x40) from [<c01ab540>] (check_unmap+0x4b8/0x8a8)
[<c01ab540>] (check_unmap+0x4b8/0x8a8) from [<c01abbb8>] (debug_dma_unmap_page+0x8c/0x98)
[<c01abbb8>] (debug_dma_unmap_page+0x8c/0x98) from [<c025cb1c>] (mv643xx_eth_poll+0x630/0x800)
[<c025cb1c>] (mv643xx_eth_poll+0x630/0x800) from [<c0331d9c>] (net_rx_action+0xcc/0x1d4)
[<c0331d9c>] (net_rx_action+0xcc/0x1d4) from [<c001e1b0>] (__do_softirq+0xa8/0x170)
[<c001e1b0>] (__do_softirq+0xa8/0x170) from [<c001e3e8>] (do_softirq+0x5c/0x6c)
[<c001e3e8>] (do_softirq+0x5c/0x6c) from [<c001e610>] (local_bh_enable+0xcc/0xdc)
[<c001e610>] (local_bh_enable+0xcc/0xdc) from [<c0359c74>] (ip_finish_output+0x1c8/0x39c)
[<c0359c74>] (ip_finish_output+0x1c8/0x39c) from [<c03571a4>] (ip_local_out+0x28/0x2c)
[<c03571a4>] (ip_local_out+0x28/0x2c) from [<c0359564>] (ip_queue_xmit+0x118/0x338)
[<c0359564>] (ip_queue_xmit+0x118/0x338) from [<c036e58c>] (tcp_transmit_skb+0x3fc/0x8e4)
[<c036e58c>] (tcp_transmit_skb+0x3fc/0x8e4) from [<c0371218>] (tcp_write_xmit+0x228/0xb08)
[<c0371218>] (tcp_write_xmit+0x228/0xb08) from [<c0371b6c>] (__tcp_push_pending_frames+0x30/0x9c)
[<c0371b6c>] (__tcp_push_pending_frames+0x30/0x9c) from [<c0362940>] (tcp_sendmsg+0x158/0xdc4)
[<c0362940>] (tcp_sendmsg+0x158/0xdc4) from [<c0386620>] (inet_sendmsg+0x38/0x74)
[<c0386620>] (inet_sendmsg+0x38/0x74) from [<c031f370>] (sock_aio_write+0x12c/0x138)
[<c031f370>] (sock_aio_write+0x12c/0x138) from [<c00a62f4>] (do_sync_write+0xa0/0xd0)
[<c00a62f4>] (do_sync_write+0xa0/0xd0) from [<c00a6f18>] (vfs_write+0x13c/0x144)
[<c00a6f18>] (vfs_write+0x13c/0x144) from [<c00a6ff0>] (sys_write+0x44/0x70)
[<c00a6ff0>] (sys_write+0x44/0x70) from [<c0008ce0>] (ret_fast_syscall+0x0/0x2c)
---[ end trace b75faa8779652e63 ]---

I'm getting about 4 errors reported a second from the ethernet driver.

Before i look at issues with em28xx i will first try to get the noise
from the ethernet driver sorted out.

     Andrew
Andrew Lunn Jan. 19, 2013, 6:59 p.m. | #3
> Please find attached a debug log generated with your patch.
> 
> I used the sata disk and two em28xx dvb sticks, no other usb devices,
> no ethernet cable connected, tuners on saa716x-based card not used.
> 
> What I can see in the log: a lot of coherent mappings from sata_mv
> and orion_ehci, a few from mv643xx_eth, no other coherent mappings.
> All coherent mappings are page aligned, some of them (from orion_ehci)
> are not really small (as claimed in __alloc_from_pool).
> 
> I don't believe in a memory leak. When I restart vdr (the application
> utilizing the dvb sticks) then there is enough dma memory available
> again.

Hi Soeren

We should be able to rule out a leak. Mount debugfg and then:

while [ /bin/true ] ; do cat /debug/dma-api/num_free_entries ; sleep 60 ; done

while you are capturing. See if the number goes down.

      Andrew
Soeren Moch Jan. 23, 2013, 3:30 p.m. | #4
On 19.01.2013 19:59, Andrew Lunn wrote:
>> Please find attached a debug log generated with your patch.
>>
>> I used the sata disk and two em28xx dvb sticks, no other usb devices,
>> no ethernet cable connected, tuners on saa716x-based card not used.
>>
>> What I can see in the log: a lot of coherent mappings from sata_mv
>> and orion_ehci, a few from mv643xx_eth, no other coherent mappings.
>> All coherent mappings are page aligned, some of them (from orion_ehci)
>> are not really small (as claimed in __alloc_from_pool).
>>
>> I don't believe in a memory leak. When I restart vdr (the application
>> utilizing the dvb sticks) then there is enough dma memory available
>> again.
>
> Hi Soeren
>
> We should be able to rule out a leak. Mount debugfg and then:
>
> while [ /bin/true ] ; do cat /debug/dma-api/num_free_entries ; sleep 60 ; done
>
> while you are capturing. See if the number goes down.
>
>        Andrew

Now I built a kernel with debugfs enabled.
It is not clear to me what I can see from the dma-api/num_free_entries 
output. After reboot (vdr running) I see decreasing numbers (3453 3452 
3445 3430...), min_free_entries is lower (3390). Sometimes the output is 
constant for several minutes ( 3396 3396 3396 3396 3396,...)

   Soeren
Andrew Lunn Jan. 23, 2013, 4:25 p.m. | #5
On Wed, Jan 23, 2013 at 04:30:53PM +0100, Soeren Moch wrote:
> On 19.01.2013 19:59, Andrew Lunn wrote:
> >>Please find attached a debug log generated with your patch.
> >>
> >>I used the sata disk and two em28xx dvb sticks, no other usb devices,
> >>no ethernet cable connected, tuners on saa716x-based card not used.
> >>
> >>What I can see in the log: a lot of coherent mappings from sata_mv
> >>and orion_ehci, a few from mv643xx_eth, no other coherent mappings.
> >>All coherent mappings are page aligned, some of them (from orion_ehci)
> >>are not really small (as claimed in __alloc_from_pool).
> >>
> >>I don't believe in a memory leak. When I restart vdr (the application
> >>utilizing the dvb sticks) then there is enough dma memory available
> >>again.
> >
> >Hi Soeren
> >
> >We should be able to rule out a leak. Mount debugfg and then:
> >
> >while [ /bin/true ] ; do cat /debug/dma-api/num_free_entries ; sleep 60 ; done
> >
> >while you are capturing. See if the number goes down.
> >
> >       Andrew
> 
> Now I built a kernel with debugfs enabled.
> It is not clear to me what I can see from the
> dma-api/num_free_entries output. After reboot (vdr running) I see
> decreasing numbers (3453 3452 3445 3430...), min_free_entries is
> lower (3390). Sometimes the output is constant for several minutes (
> 3396 3396 3396 3396 3396,...)

We are interesting in the long term behavior. Does it gradually go
down? Or is it stable? If it goes down over time, its clearly a leak
somewhere.

	Andrew
Soeren Moch Jan. 23, 2013, 5:07 p.m. | #6
On 23.01.2013 17:25, Andrew Lunn wrote:
> On Wed, Jan 23, 2013 at 04:30:53PM +0100, Soeren Moch wrote:
>> On 19.01.2013 19:59, Andrew Lunn wrote:
>>>> Please find attached a debug log generated with your patch.
>>>>
>>>> I used the sata disk and two em28xx dvb sticks, no other usb devices,
>>>> no ethernet cable connected, tuners on saa716x-based card not used.
>>>>
>>>> What I can see in the log: a lot of coherent mappings from sata_mv
>>>> and orion_ehci, a few from mv643xx_eth, no other coherent mappings.
>>>> All coherent mappings are page aligned, some of them (from orion_ehci)
>>>> are not really small (as claimed in __alloc_from_pool).
>>>>
>>>> I don't believe in a memory leak. When I restart vdr (the application
>>>> utilizing the dvb sticks) then there is enough dma memory available
>>>> again.
>>>
>>> Hi Soeren
>>>
>>> We should be able to rule out a leak. Mount debugfg and then:
>>>
>>> while [ /bin/true ] ; do cat /debug/dma-api/num_free_entries ; sleep 60 ; done
>>>
>>> while you are capturing. See if the number goes down.
>>>
>>>        Andrew
>>
>> Now I built a kernel with debugfs enabled.
>> It is not clear to me what I can see from the
>> dma-api/num_free_entries output. After reboot (vdr running) I see
>> decreasing numbers (3453 3452 3445 3430...), min_free_entries is
>> lower (3390). Sometimes the output is constant for several minutes (
>> 3396 3396 3396 3396 3396,...)
>
> We are interesting in the long term behavior. Does it gradually go
> down? Or is it stable? If it goes down over time, its clearly a leak
> somewhere.
>

Now (in the last hour) stable, occasionally lower numbers:
3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 
3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 
3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 
3396 3396 3396 3396 3396 3396 3396 3396 3396 3365 3396 3394 3396 3396 
3396 3396 3373 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 
3396 3353 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 
3394 3396 3396 3396 3396 3396 3396 3396

Before the last pool exhaustion going down:
3395 3395 3389 3379 3379 3374 3367 3360 3352 3343 3343 3343 3342 3336 
3332 3324 3318 3314 3310 3307 3305 3299 3290 3283 3279 3272 3266 3265 
3247 3247 3247 3242 3236 3236

Soeren
Soeren Moch Jan. 23, 2013, 5:20 p.m. | #7
On 23.01.2013 18:07, Soeren Moch wrote:
> On 23.01.2013 17:25, Andrew Lunn wrote:
>> On Wed, Jan 23, 2013 at 04:30:53PM +0100, Soeren Moch wrote:
>>> On 19.01.2013 19:59, Andrew Lunn wrote:
>>>>> Please find attached a debug log generated with your patch.
>>>>>
>>>>> I used the sata disk and two em28xx dvb sticks, no other usb devices,
>>>>> no ethernet cable connected, tuners on saa716x-based card not used.
>>>>>
>>>>> What I can see in the log: a lot of coherent mappings from sata_mv
>>>>> and orion_ehci, a few from mv643xx_eth, no other coherent mappings.
>>>>> All coherent mappings are page aligned, some of them (from orion_ehci)
>>>>> are not really small (as claimed in __alloc_from_pool).
>>>>>
>>>>> I don't believe in a memory leak. When I restart vdr (the application
>>>>> utilizing the dvb sticks) then there is enough dma memory available
>>>>> again.
>>>>
>>>> Hi Soeren
>>>>
>>>> We should be able to rule out a leak. Mount debugfg and then:
>>>>
>>>> while [ /bin/true ] ; do cat /debug/dma-api/num_free_entries ; sleep
>>>> 60 ; done
>>>>
>>>> while you are capturing. See if the number goes down.
>>>>
>>>>        Andrew
>>>
>>> Now I built a kernel with debugfs enabled.
>>> It is not clear to me what I can see from the
>>> dma-api/num_free_entries output. After reboot (vdr running) I see
>>> decreasing numbers (3453 3452 3445 3430...), min_free_entries is
>>> lower (3390). Sometimes the output is constant for several minutes (
>>> 3396 3396 3396 3396 3396,...)
>>
>> We are interesting in the long term behavior. Does it gradually go
>> down? Or is it stable? If it goes down over time, its clearly a leak
>> somewhere.
>>
>
> Now (in the last hour) stable, occasionally lower numbers:
> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3365 3396 3394 3396 3396
> 3396 3396 3373 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> 3396 3353 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> 3394 3396 3396 3396 3396 3396 3396 3396
>
> Before the last pool exhaustion going down:
> 3395 3395 3389 3379 3379 3374 3367 3360 3352 3343 3343 3343 3342 3336
> 3332 3324 3318 3314 3310 3307 3305 3299 3290 3283 3279 3272 3266 3265
> 3247 3247 3247 3242 3236 3236
>
Here I stopped vdr (and so closed all dvb_demux devices), the number was 
remaining the same 3236, even after restart of vdr (and restart of 
streaming).

> Soeren
Andrew Lunn Jan. 23, 2013, 6:10 p.m. | #8
> >>
> >
> >Now (in the last hour) stable, occasionally lower numbers:
> >3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >3396 3396 3396 3396 3396 3396 3396 3396 3396 3365 3396 3394 3396 3396
> >3396 3396 3373 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >3396 3353 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >3394 3396 3396 3396 3396 3396 3396 3396
> >
> >Before the last pool exhaustion going down:
> >3395 3395 3389 3379 3379 3374 3367 3360 3352 3343 3343 3343 3342 3336
> >3332 3324 3318 3314 3310 3307 3305 3299 3290 3283 3279 3272 3266 3265
> >3247 3247 3247 3242 3236 3236
> >
> Here I stopped vdr (and so closed all dvb_demux devices), the number
> was remaining the same 3236, even after restart of vdr (and restart
> of streaming).

So it does suggest a leak. Probably somewhere on an error path,
e.g. its lost video sync.

     Andrew
Soeren Moch Jan. 28, 2013, 8:59 p.m. | #9
On 23.01.2013 19:10, Andrew Lunn wrote:
>>>>
>>>
>>> Now (in the last hour) stable, occasionally lower numbers:
>>> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
>>> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
>>> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
>>> 3396 3396 3396 3396 3396 3396 3396 3396 3396 3365 3396 3394 3396 3396
>>> 3396 3396 3373 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
>>> 3396 3353 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
>>> 3394 3396 3396 3396 3396 3396 3396 3396
>>>
>>> Before the last pool exhaustion going down:
>>> 3395 3395 3389 3379 3379 3374 3367 3360 3352 3343 3343 3343 3342 3336
>>> 3332 3324 3318 3314 3310 3307 3305 3299 3290 3283 3279 3272 3266 3265
>>> 3247 3247 3247 3242 3236 3236
>>>
>> Here I stopped vdr (and so closed all dvb_demux devices), the number
>> was remaining the same 3236, even after restart of vdr (and restart
>> of streaming).
>
> So it does suggest a leak. Probably somewhere on an error path,
> e.g. its lost video sync.
>

Now I activated the debug messages in em28xx. From the messages I see no 
correlation of the pool exhaustion and lost sync. Also I cannot see any 
error messages from the em28xx driver.
I see a lot of init_isoc/stop_urbs (maybe EPG scan?) without draining 
the coherent pool (checked with 'cat /debug/dma-api/num_free_entries', 
which gave stable numbers), but after half an hour there are only 
init_isoc messages without corresponding stop_urbs messages and 
num_free_entries decreased until coherent pool exhaustion.

Any idea where the memory leak is? What is allocating coherent buffers 
for orion-ehci?

   Soeren


Jan 28 20:46:03 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:03 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:46:03 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:03 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:46:23 guruvdr kernel: em28xx #0 em28xx_stop_urbs :em28xx: 
called em28xx_stop_urbs
Jan 28 20:46:23 guruvdr kernel: em28xx #1 em28xx_stop_urbs :em28xx: 
called em28xx_stop_urbs
Jan 28 20:46:24 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:24 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:46:24 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:24 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:46:44 guruvdr kernel: em28xx #1 em28xx_stop_urbs :em28xx: 
called em28xx_stop_urbs
Jan 28 20:46:44 guruvdr kernel: em28xx #0 em28xx_stop_urbs :em28xx: 
called em28xx_stop_urbs
Jan 28 20:46:45 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:45 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:46:45 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers each 
with 64 x 940 bytes
Jan 28 20:46:45 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx: 
called em28xx_init_isoc in mode 2
Jan 28 20:54:33 guruvdr kernel: ERROR: 1024 KiB atomic DMA coherent pool 
is too small!
Jan 28 20:54:33 guruvdr kernel: Please increase it with coherent_pool= 
kernel parameter!
Jason Cooper Jan. 29, 2013, 12:13 a.m. | #10
On Mon, Jan 28, 2013 at 09:59:18PM +0100, Soeren Moch wrote:
> On 23.01.2013 19:10, Andrew Lunn wrote:
> >>>>
> >>>
> >>>Now (in the last hour) stable, occasionally lower numbers:
> >>>3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >>>3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >>>3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >>>3396 3396 3396 3396 3396 3396 3396 3396 3396 3365 3396 3394 3396 3396
> >>>3396 3396 3373 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >>>3396 3353 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396 3396
> >>>3394 3396 3396 3396 3396 3396 3396 3396
> >>>
> >>>Before the last pool exhaustion going down:
> >>>3395 3395 3389 3379 3379 3374 3367 3360 3352 3343 3343 3343 3342 3336
> >>>3332 3324 3318 3314 3310 3307 3305 3299 3290 3283 3279 3272 3266 3265
> >>>3247 3247 3247 3242 3236 3236
> >>>
> >>Here I stopped vdr (and so closed all dvb_demux devices), the number
> >>was remaining the same 3236, even after restart of vdr (and restart
> >>of streaming).
> >
> >So it does suggest a leak. Probably somewhere on an error path,
> >e.g. its lost video sync.
> >
> 
> Now I activated the debug messages in em28xx. From the messages I
> see no correlation of the pool exhaustion and lost sync. Also I
> cannot see any error messages from the em28xx driver.
> I see a lot of init_isoc/stop_urbs (maybe EPG scan?) without
> draining the coherent pool (checked with 'cat
> /debug/dma-api/num_free_entries', which gave stable numbers), but
> after half an hour there are only init_isoc messages without
> corresponding stop_urbs messages and num_free_entries decreased
> until coherent pool exhaustion.
> 
> Any idea where the memory leak is? What is allocating coherent
> buffers for orion-ehci?

Keeping in mind that I am completely unfamiliar with usb dvb, my best
guess is that the problem is in em28xx-core.c:1131

According to your log messages, it is in mode 2, which is
EM28XX_DIGITAL_MODE.

There seem to be good hints in

86d38d1e [media] em28xx: pre-allocate DVB isoc transfer buffers

I added the relevant parties to the To:...

For Gianluca and Mauro, the whole thread may be found at:

http://markmail.org/message/wm4wlgzoudixd4so#query:+page:1+mid:o7phz7cosmwpcsrz+state:results

thx,

Jason.

> 
>   Soeren
> 
> 
> Jan 28 20:46:03 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:03 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:46:03 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:03 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:46:23 guruvdr kernel: em28xx #0 em28xx_stop_urbs :em28xx:
> called em28xx_stop_urbs
> Jan 28 20:46:23 guruvdr kernel: em28xx #1 em28xx_stop_urbs :em28xx:
> called em28xx_stop_urbs
> Jan 28 20:46:24 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:24 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:46:24 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:24 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:46:44 guruvdr kernel: em28xx #1 em28xx_stop_urbs :em28xx:
> called em28xx_stop_urbs
> Jan 28 20:46:44 guruvdr kernel: em28xx #0 em28xx_stop_urbs :em28xx:
> called em28xx_stop_urbs
> Jan 28 20:46:45 guruvdr kernel: em28xx #1/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:45 guruvdr kernel: em28xx #1 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:46:45 guruvdr kernel: em28xx #0/2-dvb: Using 5 buffers
> each with 64 x 940 bytes
> Jan 28 20:46:45 guruvdr kernel: em28xx #0 em28xx_init_isoc :em28xx:
> called em28xx_init_isoc in mode 2
> Jan 28 20:54:33 guruvdr kernel: ERROR: 1024 KiB atomic DMA coherent
> pool is too small!
> Jan 28 20:54:33 guruvdr kernel: Please increase it with
> coherent_pool= kernel parameter!
>
Andrew Lunn Jan. 29, 2013, 11:02 a.m. | #11
> Now I activated the debug messages in em28xx. From the messages I
> see no correlation of the pool exhaustion and lost sync. Also I
> cannot see any error messages from the em28xx driver.
> I see a lot of init_isoc/stop_urbs (maybe EPG scan?) without
> draining the coherent pool (checked with 'cat
> /debug/dma-api/num_free_entries', which gave stable numbers), but
> after half an hour there are only init_isoc messages without
> corresponding stop_urbs messages and num_free_entries decreased
> until coherent pool exhaustion.

Hi Soeren

em28xx_stop_urbs() is only called by em28xx_stop_streaming().

em28xx_stop_streaming() is only called by em28xx_stop_feed()
when 0 == dvb->nfeeds.

em28xx_stop_feed()and em28xx_start_feed() look O.K, dvb->nfeeds is
protected by a mutex etc.

Now, em28xx_init_isoc() is also called by buffer_prepare(). This uses
em28xx_alloc_isoc() to do the actual allocation, and that function
sets up the urb such that on completion the function
em28xx_irq_callback() is called.

It looks like there might be issues here:

Once the data has been copied out, it resubmits the urb:

       urb->status = usb_submit_urb(urb, GFP_ATOMIC);
        if (urb->status) {
                em28xx_isocdbg("urb resubmit failed (error=%i)\n",
                               urb->status);
        }
  
However, if the ubs_submit_urb fails, it looks like the urb is lost.

If you look at other code submitting urbs you have this pattern:

               rc = usb_submit_urb(isoc_bufs->urb[i], GFP_ATOMIC);
                if (rc) {
                        em28xx_err("submit of urb %i failed (error=%i)\n", i,
                                   rc);
                        em28xx_uninit_isoc(dev, mode);
                        return rc;
                }
 
Do you have your build such that you would see "urb resubmit failed"
in your logs? Are there any?

     Andrew
Soeren Moch Jan. 29, 2013, 11:50 a.m. | #12
On 29.01.2013 12:02, Andrew Lunn wrote:
>> Now I activated the debug messages in em28xx. From the messages I
>> see no correlation of the pool exhaustion and lost sync. Also I
>> cannot see any error messages from the em28xx driver.
>> I see a lot of init_isoc/stop_urbs (maybe EPG scan?) without
>> draining the coherent pool (checked with 'cat
>> /debug/dma-api/num_free_entries', which gave stable numbers), but
>> after half an hour there are only init_isoc messages without
>> corresponding stop_urbs messages and num_free_entries decreased
>> until coherent pool exhaustion.
>
> Hi Soeren
>
> em28xx_stop_urbs() is only called by em28xx_stop_streaming().
>
> em28xx_stop_streaming() is only called by em28xx_stop_feed()
> when 0 == dvb->nfeeds.
>
> em28xx_stop_feed()and em28xx_start_feed() look O.K, dvb->nfeeds is
> protected by a mutex etc.
>
> Now, em28xx_init_isoc() is also called by buffer_prepare(). This uses
> em28xx_alloc_isoc() to do the actual allocation, and that function
> sets up the urb such that on completion the function
> em28xx_irq_callback() is called.
>
> It looks like there might be issues here:
>
> Once the data has been copied out, it resubmits the urb:
>
>         urb->status = usb_submit_urb(urb, GFP_ATOMIC);
>          if (urb->status) {
>                  em28xx_isocdbg("urb resubmit failed (error=%i)\n",
>                                 urb->status);
>          }
>
> However, if the ubs_submit_urb fails, it looks like the urb is lost.
>
> If you look at other code submitting urbs you have this pattern:
>
>                 rc = usb_submit_urb(isoc_bufs->urb[i], GFP_ATOMIC);
>                  if (rc) {
>                          em28xx_err("submit of urb %i failed (error=%i)\n", i,
>                                     rc);
>                          em28xx_uninit_isoc(dev, mode);
>                          return rc;
>                  }
>
> Do you have your build such that you would see "urb resubmit failed"
> in your logs? Are there any?

I only had "urb resubmit failed" messages _after_ the coherent pool 
exhaustion. So I guess something below the usb_submit_urb call is 
allocating (too much) memory, sometimes. Or can dvb_demux allocate 
memory and blame orion-ehci for it?

   Soeren

Patch hide | download patch | download mbox

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..3df74ac 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -497,6 +497,7 @@  static void *__alloc_from_pool(size_t size, struct page **ret_page)
 		pr_err_once("ERROR: %u KiB atomic DMA coherent pool is too small!\n"
 			    "Please increase it with coherent_pool= kernel parameter!\n",
 			    (unsigned)pool->size / 1024);
+		debug_dma_dump_mappings(NULL);
 	}
 	spin_unlock_irqrestore(&pool->lock, flags);