Message ID | 1487682998-2564-1-git-send-email-m.szyprowski@samsung.com |
---|---|
State | New |
Headers | show |
On Tue, Feb 21, 2017 at 02:16:38PM +0100, Marek Szyprowski wrote: > Some DMA regions, for example those created by dma_declare_coherent_memory, > might not be possible to be represented by struct page pointers. Getting > scatterlist for the buffer allocated from such region is not possible. > This patch adds a simple check in arm_dma_get_sgtable() function if the > exported struct page pointer is valid. I really don't like this - the "validation" here looks like total crap, especially when you analyse what these macros expand to. I also don't like the way dma_get_sgtable() appeared without any documentation about (a) what it's for (b) what the expectations for its arguments are and (c) what it's supposed to be doing. There are no comments in include/linux/dma-mapping.h describing the interface, nor are there any comments about it in Documentation/DMA*. So, I'm really not applying this patch until the purpose of this function gets documented. Thanks. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Hi Russel and Marek, On Wed, Mar 22, 2017 at 5:10 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Tue, Feb 21, 2017 at 02:16:38PM +0100, Marek Szyprowski wrote: >> Some DMA regions, for example those created by dma_declare_coherent_memory, >> might not be possible to be represented by struct page pointers. Getting >> scatterlist for the buffer allocated from such region is not possible. >> This patch adds a simple check in arm_dma_get_sgtable() function if the >> exported struct page pointer is valid. > > I really don't like this - the "validation" here looks like total crap, > especially when you analyse what these macros expand to. > > I also don't like the way dma_get_sgtable() appeared without any > documentation about (a) what it's for (b) what the expectations for its > arguments are and (c) what it's supposed to be doing. There are no > comments in include/linux/dma-mapping.h describing the interface, nor > are there any comments about it in Documentation/DMA*. Soory for a long reply. This check doesn't really do any kind of validation. I have been debugging a pagefault when the sgtable created by arm_dma_get_sgtable() runs into the following kernel error when vb2_dc_dmabuf_ops_map() tries to map the exported buffer from another driver. Unable to handle kernel paging request at virtual address f0004000 I tracked the buffer and page - dma_to_pfn() returns a page that is invalid which gets into this sgtable and later on when another driver tries to map it, it trips on this invalid sgtable entry. [ 160.717795] pgd = ece98000 [ 160.720479] [f0004000] *pgd=00000000 [ 160.724034] Internal error: Oops: 5 [#1] PREEMPT SMP ARM [ 160.729316] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_conservative exynos_gsc s5p_jpeg s5p_mfc v4l2_mem2mem videobuf2_dma_contig v4l2_common videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media [ 160.749077] CPU: 5 PID: 1979 Comm: v4l2video0dec0: Not tainted 4.10.0-rc5-00007-gfdc23c2-dirty #27 [ 160.758000] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 160.764066] task: eb852300 task.stack: eb958000 [ 160.768578] PC is at page_address+0x4/0xd8 [ 160.772651] LR is at vb2_dc_dmabuf_ops_map+0x140/0x220 [videobuf2_dma_contig] [ 160.779751] pc : [<c01c2bd0>] lr : [<bf1bc7c8>] psr: 60030013 sp : eb959be8 ip : 00000007 fp : ed3456c0 [ 160.791188] r10: 00000001 r9 : 00000001 r8 : ed3456c0 [ 160.796388] r7 : c0c09a44 r6 : 00000001 r5 : 87654321 r4 : ed345ec0 [ 160.802887] r3 : 00000000 r2 : 00000000 r1 : 00000001 r0 : f0004000 [ 160.809387] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 160.816492] Control: 10c5387d Table: 6ce9806a DAC: 00000051 [ 160.822211] Process v4l2video0dec0: (pid: 1979, stack limit = 0xeb958210) [ 160.828970] Stack: (0xeb959be8 to 0xeb95a000) [ 160.833304] 9be0: ed345ec0 87654321 00000001 c0c09a44 ed3456c0 00000001 [ 160.841449] 9c00: 00000001 bf1bc7c8 00000001 eb959c34 ed11f314 eeab0a10 c09bb5d4 ed345000 [ 160.849595] 9c20: 00000001 00000000 ed345290 00000000 bf12aa8c 00000001 00000001 c044c0f0 [ 160.857740] 9c40: ed345280 ed037600 00000000 bf1bc330 00000000 00000000 ed345290 000e6040 ---- [ 161.102113] [<c01c2bd0>] (page_address) from [<bf1bc7c8>] (vb2_dc_dmabuf_ops_map+0x140/0x220 [videobuf2_dma_contig]) [ 161.112602] [<bf1bc7c8>] (vb2_dc_dmabuf_ops_map [videobuf2_dma_contig]) from [<c044c0f0>] (dma_buf_map_attachment+0x44/0x68) [ 161.123774] [<c044c0f0>] (dma_buf_map_attachment) from [<bf1bc330>] (vb2_dc_map_dmabuf+0x64/0x16c [videobuf2_dma_contig]) [ 161.134697] [<bf1bc330>] (vb2_dc_map_dmabuf [videobuf2_dma_contig]) from [<bf1278c8>] (__qbuf_dmabuf+0x3c4/0x4c8 [videobuf2_core]) [ 161.146396] [<bf1278c8>] (__qbuf_dmabuf [videobuf2_core]) from [<bf127afc>] (__buf_prepare+0x130/0x16c [videobuf2_core]) [ 161.157226] [<bf127afc>] (__buf_prepare [videobuf2_core]) from [<bf127d3c>] (vb2_core_qbuf+0x128/0x204 [videobuf2_core]) [ 161.168058] [<bf127d3c>] (vb2_core_qbuf [videobuf2_core]) from [<bf1e4658>] (v4l2_m2m_qbuf+0x1c/0x34 [v4l2_mem2mem]) [ 161.178564] [<bf1e4658>] (v4l2_m2m_qbuf [v4l2_mem2mem]) from [<bf057914>] (__video_do_ioctl+0x290/0x2ec [videodev]) [ 161.188972] [<bf057914>] (__video_do_ioctl [videodev]) from [<bf057338>] (video_usercopy+0x1a0/0x4e0 [videodev]) [ 161.199109] [<bf057338>] (video_usercopy [videodev]) from [<bf0525a4>] (v4l2_ioctl+0xa0/0xd8 [videodev]) [ 161.208539] [<bf0525a4>] (v4l2_ioctl [videodev]) from [<c01f991c>] (do_vfs_ioctl+0x9c/0x8e0) [ 161.216923] [<c01f991c>] (do_vfs_ioctl) from [<c01fa194>] (SyS_ioctl+0x34/0x5c) [ 161.224205] [<c01fa194>] (SyS_ioctl) from [<c0107740>] (ret_fast_syscall+0x0/0x3c) [ 161.231741] Code: e0623623 e0800283 e12fff1e e92d47f0 (e5903000) [ 161.237807] ---[ end trace aa851942c99a7c54 ]--- Just an an experiment, I added this hack: in this case __dc_alloc() returned values is the mapped page. This is by mo means a fix - I am not sure if we can rely on this page, since at DQBUF its get unmapped. /* In this case cpu_addr is the page - just use it */ if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) page = cpu_addr; At this point I dumped some data: [ 154.627505] platform 11000000.codec:left: debug_arm_dma_get_sgtable: bus_to_pfn 772608 dma_pfn_offset 0 PHYS_PFN_OFFSET 262144 [ 154.627512] platform 11000000.codec:left: debug_arm_dma_get_sgtable: dma_to_pfn 772608 virt_to_pfn 470272 virt_to_dma 1926234112 virt_to_page ef6ca000 page f0004000 cpu_addr is [ 154.627520] platform 11000000.codec:left: debug_arm_dma_get_sgtable: cpu_addr f2d00000 handle bca00000 page = f2d00000 size 946176 align size 946176 [ 154.627527] platform 11000000.codec:left: after dma_get_sgtable_attrs() vb2_dc_get_base_sgt: sgt ecdcec00 page f2d00000 buf ecdced80 cookie f2d00000 vaddr f2d00000 dc_cookie ecdced90 dma_addr bca00000 size 946176 cpu_addr is f2d00000 and the page you get from pfn_to_page(dma_to_pfn(dev, handle)) is f0004000. The page you get with struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); is bogus. So this check Merek added doesn't find a bogus page, does a reverse validation of the bogus page. I also generated page tables and I do find this page f2d00000 it in there. For some reason dma_to_pfn(dev, handle) fails baldy at this stage which trips the kernel later. I can help debug this further and find a fix - I have a use-case that can reproduces on every single run. I am suing 4.10-rc5 With the hack to make page = cpu_addr, it goes further as the page is good and then runs into an error when an attempt is made to clean D-cache. [ 154.687637] Unable to handle kernel paging request at virtual address a4800000 [ 154.693445] pgd = ed880000 [ 154.696054] [a4800000] *pgd=00000000 [ 154.699611] Internal error: Oops: 2805 [#1] PREEMPT SMP ARM [ 154.705153] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_conservative s5p_mfc s5p_jpeg exynos_gsc v4l2_mem2mem videobuf2_dma_contig v4l2_common videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media [ 154.724914] CPU: 3 PID: 1901 Comm: v4l2video6dec0: Not tainted 4.10.0-rc5-00007-gfdc23c2-dirty #28 [ 154.733835] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 154.739902] task: eda6cb00 task.stack: eb844000 [ 154.744414] PC is at v7_dma_clean_range+0x1c/0x34 [ 154.749094] LR is at __dma_page_cpu_to_dev+0x28/0x90 [ 154.754027] pc : [<c0114070>] lr : [<c010f624>] psr: 20030013 sp : eb845b90 ip : 00040000 fp : 00000001 [ 154.765464] r10: 00000001 r9 : 00000000 r8 : c010f6d0 [ 154.770664] r7 : 00000001 r6 : 000e7000 r5 : f2d00000 r4 : 00000000 [ 154.777164] r3 : 0000003f r2 : 00000040 r1 : a48e7000 r0 : a4800000 [ 154.783663] Flags: nzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none [ 154.790767] Control: 10c5387d Table: 6d88006a DAC: 00000051 [ 154.796486] Process v4l2video6dec0: (pid: 1901, stack limit = 0xeb844210) [ 154.803246] Stack: (0xeb845b90 to 0xeb846000) --- [ 155.092674] 5fe0: b6589200 b384393c b6556095 b6cf8ea6 20030030 0000001a 00000000 00000000 [ 155.100828] [<c0114070>] (v7_dma_clean_range) from [<c010f624>] (__dma_page_cpu_to_dev+0x28/0x90) [ 155.109661] [<c010f624>] (__dma_page_cpu_to_dev) from [<c010f734>] (arm_dma_map_page+0x64/0x68) [ 155.118326] [<c010f734>] (arm_dma_map_page) from [<c0110760>] (arm_dma_map_sg+0x90/0x1a4) [ 155.126486] [<c0110760>] (arm_dma_map_sg) from [<bf1c7814>] (vb2_dc_dmabuf_ops_map+0x1a0/0x228 [videobuf2_dma_contig]) [ 155.137143] [<bf1c7814>] (vb2_dc_dmabuf_ops_map [videobuf2_dma_contig]) from [<c044c510>] (dma_buf_map_attachment+0x44/0x68) [ 155.148314] [<c044c510>] (dma_buf_map_attachment) from [<bf1c7318>] (vb2_dc_map_dmabuf+0x70/0x17c [videobuf2_dma_contig]) [ 155.159254] [<bf1c7318>] (vb2_dc_map_dmabuf [videobuf2_dma_contig]) from [<bf1328c8>] (__qbuf_dmabuf+0x3c4/0x4c8 [videobuf2_core]) [ 155.170940] [<bf1328c8>] (__qbuf_dmabuf [videobuf2_core]) from [<bf132afc>] (__buf_prepare+0x130/0x16c [videobuf2_core]) [ 155.181771] [<bf132afc>] (__buf_prepare [videobuf2_core]) from [<bf132d3c>] (vb2_core_qbuf+0x128/0x204 [videobuf2_core]) [ 155.192612] [<bf132d3c>] (vb2_core_qbuf [videobuf2_core]) from [<bf1ef658>] (v4l2_m2m_qbuf+0x1c/0x34 [v4l2_mem2mem]) [ 155.203195] [<bf1ef658>] (v4l2_m2m_qbuf [v4l2_mem2mem]) from [<bf062914>] (__video_do_ioctl+0x290/0x2ec [videodev]) [ 155.213542] [<bf062914>] (__video_do_ioctl [videodev]) from [<bf062338>] (video_usercopy+0x1a0/0x4e0 [videodev]) [ 155.223677] [<bf062338>] (video_usercopy [videodev]) from [<bf05d5a4>] (v4l2_ioctl+0xa0/0xd8 [videodev]) [ 155.233094] [<bf05d5a4>] (v4l2_ioctl [videodev]) from [<c01f9d3c>] (do_vfs_ioctl+0x9c/0x8e0) [ 155.241461] [<c01f9d3c>] (do_vfs_ioctl) from [<c01fa5b4>] (SyS_ioctl+0x34/0x5c) [ 155.248743] [<c01fa5b4>] (SyS_ioctl) from [<c0107740>] (ret_fast_syscall+0x0/0x3c) [ 155.256278] Code: e3a02004 e1a02312 e2423001 e1c00003 (ee070f3a) [ 155.262444] ---[ end trace e7118e14619b5c07 ]--- thanks, -- Shuah > > So, I'm really not applying this patch until the purpose of this > function gets documented. > > Thanks. > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
On Wed, Mar 22, 2017 at 09:46:05AM -0600, Shuah Khan wrote: > This check doesn't really do any kind of validation. Exactly right. > I have been debugging > a pagefault when the sgtable created by arm_dma_get_sgtable() runs into > the following kernel error when vb2_dc_dmabuf_ops_map() tries to map > the exported buffer from another driver. It is _completely_ incorrect to even try to pass memory allocated from the coherent DMA allocator back into the _streaming_ DMA API. That is just something that the streaming DMA API does not support. I am dismayed at dma_get_sgtable() ever having been proposed - I think it's completely wrong. Looking back to when it was created in 2012, it appears that it never got any review from architecture people - there certainly are no architecture people in the attributations for it. There are two Samsung people and a DRM person in the sign-off, and IMHO that's not good enough for a proposal which completely changes the way DMA memory is dealt with - especially as _that_ is soo architecture dependent. Annoyingly, when I was looking at the dma_buf API while I was implementing the armada DRM driver, I did point out the problems with the dma_buf API wrt passing contiguous regions from one driver to another, and I suggested separating the DMA mapping operation somewhat, but all that fell on deaf ears. I guess the results of that are now coming home to roost. The fundamental point about coherent DMA memory, or memory that is provided via the declaration mechanisms is that there is _no_ gurantee that there will be a "struct page" backing it, so creating a scatterlist with "struct page" pointers, which is then subsequently dma_map_sg()'d elsewhere is always going to be totally broken. In the case of some coherent DMA allocators, there _may_ be a "struct page" backing the memory, but that's not guaranteed, and since dma_map_sg() relies on there being a valid "struct page", things will go wrong. And by "wrong", I mean it could oops the kernel because of trying to dereference the "struct page" pointer, or if that's successful, we could end up invalidating the cache lines for some random region of memory, which then goes on to cause a completely different failure, or maybe filesystem corruption (eg, what if it hits a page cache page containing an inode table, and invalidates all updates to that page?) The consequences really don't bare thinking about. > Just an an experiment, I added this hack: in this case __dc_alloc() > returned values is the mapped page. This is by mo means a fix - I am > not sure if we can rely on this page, since at DQBUF its get unmapped. > > /* In this case cpu_addr is the page - just use it */ > if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) > page = cpu_addr; > > At this point I dumped some data: > > [ 154.627505] platform 11000000.codec:left: > debug_arm_dma_get_sgtable: bus_to_pfn 772608 dma_pfn_offset 0 > PHYS_PFN_OFFSET 262144 > [ 154.627512] platform 11000000.codec:left: > debug_arm_dma_get_sgtable: dma_to_pfn 772608 virt_to_pfn 470272 > virt_to_dma 1926234112 virt_to_page ef6ca000 page f0004000 > > cpu_addr is > [ 154.627520] platform 11000000.codec:left: > debug_arm_dma_get_sgtable: cpu_addr f2d00000 handle bca00000 page = > f2d00000 size 946176 align size 946176 > [ 154.627527] platform 11000000.codec:left: after > dma_get_sgtable_attrs() vb2_dc_get_base_sgt: sgt ecdcec00 page > f2d00000 buf ecdced80 cookie f2d00000 vaddr f2d00000 dc_cookie > ecdced90 dma_addr bca00000 size 946176 > > cpu_addr is f2d00000 and the page you get from > pfn_to_page(dma_to_pfn(dev, handle)) is f0004000. I'd like to see the diff that produced the above dumps to make better sense of it... > The page you get with struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); > is bogus. So this check Merek added doesn't find a bogus page, does a reverse > validation of the bogus page. Right - dma_to_pfn() will correctly translate the "handle" to a PFN number, so we can just multiply that with PAGE_SIZE to get the physical address, which if I'm interpreting your information above correctly, would be 0x72d00000. However, what you're saying is that there is no valid "struct page" associated with that address. The correct way to validate this is: unsigned long pfn = dma_to_pfn(dev, handle); struct page *page; if (!pfn_valid(pfn)) return -ENXIO; page = pfn_to_page(pfn); However, I have a huge big problem with this - this can pass even for memory obtained through the coherent allocator (because of the way CMA works.) This can end up with the DMA streaming API kmapping memory using cacheable attributes while CMA has it mapped with uncacheable attributes, where (as mentioned on IRC this morning): you enter the realm of "mismatched memory attributes", which don't blow up, but you can lose guarantees of both mappings wrt coherence and ordering We really don't want to go there. The problem, however, is that we can't distingish a CMA page allocated via the DMA coherent allocator (and so potentially mapped uncacheable) with a normal allocation of the same page, which would be valid for streaming DMA API use. I think the best we can do is as per the above for declared DMA memory (iow, memory that isn't struct page backed.) It should _never_ be passed into the streaming DMA API. I also think that the dma_get_sgtable() thing is a total trainwreck, and we need to have a rethink about this. This isn't a case of "something changed in the architecture that broke dma_get_sgtable()" - this is a case that it's always been broken in this way ever since dma_get_sgtable() was introduced, and dma_get_sgtable() was never (properly?) reviewed with this kind of use-case in mind. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
On Wed, Mar 22, 2017 at 10:32 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, Mar 22, 2017 at 09:46:05AM -0600, Shuah Khan wrote: >> This check doesn't really do any kind of validation. > > Exactly right. > >> I have been debugging >> a pagefault when the sgtable created by arm_dma_get_sgtable() runs into >> the following kernel error when vb2_dc_dmabuf_ops_map() tries to map >> the exported buffer from another driver. > > It is _completely_ incorrect to even try to pass memory allocated from > the coherent DMA allocator back into the _streaming_ DMA API. That is > just something that the streaming DMA API does not support. > > I am dismayed at dma_get_sgtable() ever having been proposed - I think > it's completely wrong. Looking back to when it was created in 2012, it > appears that it never got any review from architecture people - there > certainly are no architecture people in the attributations for it. > > There are two Samsung people and a DRM person in the sign-off, and IMHO > that's not good enough for a proposal which completely changes the way > DMA memory is dealt with - especially as _that_ is soo architecture > dependent. > > Annoyingly, when I was looking at the dma_buf API while I was implementing > the armada DRM driver, I did point out the problems with the dma_buf API > wrt passing contiguous regions from one driver to another, and I suggested > separating the DMA mapping operation somewhat, but all that fell on deaf > ears. I guess the results of that are now coming home to roost. > > The fundamental point about coherent DMA memory, or memory that is > provided via the declaration mechanisms is that there is _no_ gurantee > that there will be a "struct page" backing it, so creating a scatterlist > with "struct page" pointers, which is then subsequently dma_map_sg()'d > elsewhere is always going to be totally broken. In the case of some > coherent DMA allocators, there _may_ be a "struct page" backing the > memory, but that's not guaranteed, and since dma_map_sg() relies on > there being a valid "struct page", things will go wrong. > > And by "wrong", I mean it could oops the kernel because of trying to > dereference the "struct page" pointer, or if that's successful, we > could end up invalidating the cache lines for some random region of > memory, which then goes on to cause a completely different failure, or > maybe filesystem corruption (eg, what if it hits a page cache page > containing an inode table, and invalidates all updates to that page?) > The consequences really don't bare thinking about. > >> Just an an experiment, I added this hack: in this case __dc_alloc() >> returned values is the mapped page. This is by mo means a fix - I am >> not sure if we can rely on this page, since at DQBUF its get unmapped. >> >> /* In this case cpu_addr is the page - just use it */ >> if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) == 0) >> page = cpu_addr; >> >> At this point I dumped some data: >> >> [ 154.627505] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: bus_to_pfn 772608 dma_pfn_offset 0 >> PHYS_PFN_OFFSET 262144 >> [ 154.627512] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: dma_to_pfn 772608 virt_to_pfn 470272 >> virt_to_dma 1926234112 virt_to_page ef6ca000 page f0004000 >> >> cpu_addr is >> [ 154.627520] platform 11000000.codec:left: >> debug_arm_dma_get_sgtable: cpu_addr f2d00000 handle bca00000 page = >> f2d00000 size 946176 align size 946176 >> [ 154.627527] platform 11000000.codec:left: after >> dma_get_sgtable_attrs() vb2_dc_get_base_sgt: sgt ecdcec00 page >> f2d00000 buf ecdced80 cookie f2d00000 vaddr f2d00000 dc_cookie >> ecdced90 dma_addr bca00000 size 946176 >> >> cpu_addr is f2d00000 and the page you get from >> pfn_to_page(dma_to_pfn(dev, handle)) is f0004000. > > I'd like to see the diff that produced the above dumps to make better > sense of it... I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. Please see the attached diff. It is somewhat cluttered with lots of debug messages. I also added a debug version (duplicate) arm_dma_get_sgtable() so I can limit the messages to when it gets called from vb2_dc_get_base_sgt() > >> The page you get with struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); >> is bogus. So this check Merek added doesn't find a bogus page, does a reverse >> validation of the bogus page. > > Right - dma_to_pfn() will correctly translate the "handle" to a PFN > number, so we can just multiply that with PAGE_SIZE to get the physical > address, which if I'm interpreting your information above correctly, > would be 0x72d00000. Correct paddr is 0x72d00000. I forgot to add the allocation time debug dump: [ 154.619456] platform 11000000.codec:left: vb2_dc_alloc: buf ecdced80 cookie f2d00000 vaddr f2d00000 paddr 72d00000 dma_addr bca00000 size 946176 dc_cookie ecdced90 dma_to_pfn 772608 I added debug to track the buffer from allocation time to export buf time and all the way to the time map_sg fails. > > However, what you're saying is that there is no valid "struct page" > associated with that address. From what I can tell from the data I have, dma_to_pfn(dev, handle); doesn't return valid page at arm_dma_get_sgtable(), even though dma_to_pfn(dev, handle); itself stays the same from alloc time, as you can see in the above dump from vb2_dc_alloc(): dma_to_pfn 772608 I am guessing page might not be valid, even at the alloc time. That is next on debug list. Which is somewhat puzzling, because dma_attrs don't have the DMA_ATTR_NO_KERNEL_MAP set, __dma_alloc() passes in want_vaddr true. __alloc_from_contiguous() will map the page and return a valid vaddr __dma_alloc() then does *handle = pfn_to_dma(dev, page_to_pfn(page)); to return the handle. > > The correct way to validate this is: > > unsigned long pfn = dma_to_pfn(dev, handle); > struct page *page; > > if (!pfn_valid(pfn)) > return -ENXIO; > > page = pfn_to_page(pfn); > > However, I have a huge big problem with this - this can pass even for > memory obtained through the coherent allocator (because of the way CMA > works.) Right. DMA_ATTR_NO_KERNEL_MAP is another variable, if set, there won't be an kernel mapping. > > This can end up with the DMA streaming API kmapping memory using > cacheable attributes while CMA has it mapped with uncacheable attributes, > where (as mentioned on IRC this morning): Which irc channel?? I can get on there for the discussion if it makes it easier. > > you enter the realm of "mismatched memory attributes", > which don't blow up, but you can lose guarantees of both > mappings wrt coherence and ordering > > We really don't want to go there. The problem, however, is that we > can't distingish a CMA page allocated via the DMA coherent allocator > (and so potentially mapped uncacheable) with a normal allocation of > the same page, which would be valid for streaming DMA API use. > > I think the best we can do is as per the above for declared DMA memory > (iow, memory that isn't struct page backed.) It should _never_ be > passed into the streaming DMA API. We can see some of the evidence here in this case. One driver allocates the buffer and another driber tries to import. After buffer is exported, there is a window between DQBUF and QBUF, DQBUF unmaps the buffers. Then the importer comes along referencing the sg_table and looks to map them. Before mapping, looks like D-cache clean has to occur and that would need vaddr if I understand correctly. So there a lot of places, it can trip all over. > > I also think that the dma_get_sgtable() thing is a total trainwreck, > and we need to have a rethink about this. I don't think it works for the case I am running into. > > This isn't a case of "something changed in the architecture that broke > dma_get_sgtable()" - this is a case that it's always been broken in > this way ever since dma_get_sgtable() was introduced, and dma_get_sgtable() > was never (properly?) reviewed with this kind of use-case in mind. I think the use-case I am debugging has always been broken. This is a fairly involved gstreamer pipeline to my knowledge has never been tested. This page fault is a known issue at least for a couple of releases- I started looking into it a couple of releases ago towards the end of 4.9. Maybe there are some simpler use-cases that will work. I am also curious if this works well when DMA_ATTR_NO_KERNEL_MAPPING is set. It is not in my use-case. Thanks, -- Shuah > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
On Wed, Mar 22, 2017 at 02:29:09PM -0600, Shuah Khan wrote: > On Wed, Mar 22, 2017 at 10:32 AM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > Right - dma_to_pfn() will correctly translate the "handle" to a PFN > > number, so we can just multiply that with PAGE_SIZE to get the physical > > address, which if I'm interpreting your information above correctly, > > would be 0x72d00000. > > Correct paddr is 0x72d00000. I forgot to add the allocation time debug dump: > > [ 154.619456] platform 11000000.codec:left: vb2_dc_alloc: buf > ecdced80 cookie f2d00000 vaddr f2d00000 paddr 72d00000 dma_addr > bca00000 size 946176 dc_cookie ecdced90 dma_to_pfn 772608 > > > However, what you're saying is that there is no valid "struct page" > > associated with that address. > > >From what I can tell from the data I have, dma_to_pfn(dev, handle); > doesn't return valid page at arm_dma_get_sgtable(), even though > dma_to_pfn(dev, handle); itself stays the same from alloc time, as you > can see in the above dump from vb2_dc_alloc(): dma_to_pfn 772608 dma_to_pfn() returns the physical page frame number of the DMA address in handle. It's page number 772608, which multiplying by the page size gives us a physical address of 0xbca00000. So, it's correct in so far as if the dma_addr (of 0xbca00000 above) is correct, then that corresponds with a physical address of 0xbca00000. I now think that 0xf2d00000 virtual is a remapped address, not part of lowmem (it's within the vmalloc region) which will probably be confirmed if you look in /proc/vmallocinfo. Since this address is remapped, "__pa(buf->vaddr)" is invalid, it does not reference the physical page at 0x72d00000 at all. __pa() is just doing a simple mathematical offset translation, which is why it's undefined for remapped addresses. So, I'll go with 0xbca00000 being the real physical address, not 0x72d00000. > I am guessing page might not be valid, even at the alloc time. That is > next on debug list. If this is coming from dma_declare_coherent_memory() then it won't have a struct page associated with it. > Which is somewhat puzzling, because dma_attrs don't have the > DMA_ATTR_NO_KERNEL_MAP set, __dma_alloc() passes in want_vaddr true. > __alloc_from_contiguous() will map the page and return a valid vaddr > > __dma_alloc() then does > *handle = pfn_to_dma(dev, page_to_pfn(page)); > > to return the handle. I think you're missing something, which is the code in include/linux/dma-mapping.h - dma_alloc_attrs(). That calls dma_alloc_from_coherent() first, which is how memory provided via dma_declare_coherent_memory() is allocated. This bypasses the architecture code in arch/arm/mm/dma-mapping.c. This provides memory which is _not_ backed by a valid "struct page". > > The correct way to validate this is: > > > > unsigned long pfn = dma_to_pfn(dev, handle); > > struct page *page; > > > > if (!pfn_valid(pfn)) > > return -ENXIO; > > > > page = pfn_to_page(pfn); > > > > However, I have a huge big problem with this - this can pass even for > > memory obtained through the coherent allocator (because of the way CMA > > works.) > > Right. DMA_ATTR_NO_KERNEL_MAP is another variable, if set, there won't > be an kernel mapping. Correct - which is why the above is probably the most correct way on ARM (provided there is no IOMMU in the path) to get the corresponding valid struct page pointer. > > This can end up with the DMA streaming API kmapping memory using > > cacheable attributes while CMA has it mapped with uncacheable attributes, > > where (as mentioned on IRC this morning): > > Which irc channel?? I can get on there for the discussion if it makes > it easier. It was for an unrelated discussion. > We can see some of the evidence here in this case. One driver allocates > the buffer and another driber tries to import. After buffer is exported, > there is a window between DQBUF and QBUF, DQBUF unmaps the buffers. Then > the importer comes along referencing the sg_table and looks to map them. > Before mapping, looks like D-cache clean has to occur and that would need > vaddr if I understand correctly. So there a lot of places, it can trip > all over. This is wrong, and this is where I say that the dma_buf stuff is totally broken. DMA coherent memory must _never_ be passed to the streaming DMA API. No exceptions. However, the dma_buf stuff does not differentiate between DMA coherent memory and streaming DMA memory - the model is purely around the exporter creating a scatterlist, and the importer calling dma_map_sg() on the scatterlist to obtain its own set of DMA addresses. (sorry for the caps, but it's an important point) THIS DOES NOT WORK. It is a violation of the DMA API. > > I also think that the dma_get_sgtable() thing is a total trainwreck, > > and we need to have a rethink about this. > > I don't think it works for the case I am running into. This is the root cause - the idea that you can create a valid scatterlist from a DMA coherent area is _just_ completely broken. It's not possible. dma_get_sgtable() is _fundamentally_ broken in its design. > > This isn't a case of "something changed in the architecture that broke > > dma_get_sgtable()" - this is a case that it's always been broken in > > this way ever since dma_get_sgtable() was introduced, and dma_get_sgtable() > > was never (properly?) reviewed with this kind of use-case in mind. > > I think the use-case I am debugging has always been broken. It's _always_ been invalid to pass memory allocated from dma_alloc_coherent() into the streaming API functions (like dma_map_sg().) So, you are quite correct that this has _always_ been broken. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Okay, I'm about to merge the following patch for -rc, since refusing
to create a scattertable for non-page backed memory is the only valid
solution for that case. I'm intending to queue this for -rc this
evening.
8<====
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
dma_get_sgtable() tries to create a scatterlist table containing valid
struct page pointers for the coherent memory allocation passed in to it.
However, memory can be declared via dma_declare_coherent_memory(), or
via other reservation schemes which means that coherent memory is not
guaranteed to be backed by struct pages. This means it is not possible
to create a valid scatterlist via this mechanism.
This patch adds detection of such memory, and refuses to create a
scatterlist table for such memory.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
---
arch/arm/mm/dma-mapping.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
--
2.7.4
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 63eabb06f9f1..475811f5383a 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -935,13 +935,31 @@ static void arm_coherent_dma_free(struct device *dev, size_t size, void *cpu_add
__arm_dma_free(dev, size, cpu_addr, handle, attrs, true);
}
+/*
+ * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems
+ * that the intention is to allow exporting memory allocated via the
+ * coherent DMA APIs through the dma_buf API, which only accepts a
+ * scattertable. This presents a couple of problems:
+ * 1. Not all memory allocated via the coherent DMA APIs is backed by
+ * a struct page
+ * 2. Passing coherent DMA memory into the streaming APIs is not allowed
+ * as we will try to flush the memory through a different alias to that
+ * actually being used (and the flushes are redundant.)
+ */
int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t handle, size_t size,
unsigned long attrs)
{
- struct page *page = pfn_to_page(dma_to_pfn(dev, handle));
+ unsigned long pfn = dma_to_pfn(dev, handle);
+ struct page *page;
int ret;
+ /* If the PFN is not valid, we do not have a struct page */
+ if (!pfn_valid(pfn))
+ return -ENXIO;
+
+ page = pfn_to_page(pfn);
+
ret = sg_alloc_table(sgt, 1, GFP_KERNEL);
if (unlikely(ret))
return ret;
Hi Russell, On Wed, Mar 29, 2017 at 10:33 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > Okay, I'm about to merge the following patch for -rc, since refusing > to create a scattertable for non-page backed memory is the only valid > solution for that case. I'm intending to queue this for -rc this > evening. I was bit sidetracked and getting back to this today. I am just about to test this change on my system. I will test your patch and send you results. This might not help my use-case - I suspect it will trip this check. That said I will send you results very soon. > > 8<==== > ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory > > dma_get_sgtable() tries to create a scatterlist table containing valid > struct page pointers for the coherent memory allocation passed in to it. > > However, memory can be declared via dma_declare_coherent_memory(), or > via other reservation schemes which means that coherent memory is not > guaranteed to be backed by struct pages. This means it is not possible > to create a valid scatterlist via this mechanism. I have been thinking about this some. I would like to see if we can provide a way forward to address the cases where coherent memory is not guaranteed to be backed by struct pages. We know the memory isn't backed at alloc time in dma_alloc_coherent(). Can we key off of that maybe add a new attr flag to avoid page lookups. I am willing to work on the fixing it. Let me first send you the results after testing your patch. Maybe we can explore ways to fix the problem. thanks, -- Shuah > > This patch adds detection of such memory, and refuses to create a > scatterlist table for such memory. > > Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> > --- > arch/arm/mm/dma-mapping.c | 20 +++++++++++++++++++- > 1 file changed, 19 insertions(+), 1 deletion(-) > > diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c > index 63eabb06f9f1..475811f5383a 100644 > --- a/arch/arm/mm/dma-mapping.c > +++ b/arch/arm/mm/dma-mapping.c > @@ -935,13 +935,31 @@ static void arm_coherent_dma_free(struct device *dev, size_t size, void *cpu_add > __arm_dma_free(dev, size, cpu_addr, handle, attrs, true); > } > > +/* > + * The whole dma_get_sgtable() idea is fundamentally unsafe - it seems > + * that the intention is to allow exporting memory allocated via the > + * coherent DMA APIs through the dma_buf API, which only accepts a > + * scattertable. This presents a couple of problems: > + * 1. Not all memory allocated via the coherent DMA APIs is backed by > + * a struct page > + * 2. Passing coherent DMA memory into the streaming APIs is not allowed > + * as we will try to flush the memory through a different alias to that > + * actually being used (and the flushes are redundant.) > + */ > int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt, > void *cpu_addr, dma_addr_t handle, size_t size, > unsigned long attrs) > { > - struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); > + unsigned long pfn = dma_to_pfn(dev, handle); > + struct page *page; > int ret; > > + /* If the PFN is not valid, we do not have a struct page */ > + if (!pfn_valid(pfn)) > + return -ENXIO; > + > + page = pfn_to_page(pfn); > + > ret = sg_alloc_table(sgt, 1, GFP_KERNEL); > if (unlikely(ret)) > return ret; > -- > 2.7.4 > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
On Wed, Mar 29, 2017 at 10:51:05AM -0600, Shuah Khan wrote: > Hi Russell, > > On Wed, Mar 29, 2017 at 10:33 AM, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > Okay, I'm about to merge the following patch for -rc, since refusing > > to create a scattertable for non-page backed memory is the only valid > > solution for that case. I'm intending to queue this for -rc this > > evening. > > I was bit sidetracked and getting back to this today. I am just about > to test this change on my system. I will test your patch and send you > results. > > This might not help my use-case - I suspect it will trip this check. > That said I will send you results very soon. I do hope that it _does_ trip this check - the result should be that you get an error without the kernel oopsing. > I have been thinking about this some. I would like to see if we can > provide a way forward to address the cases where coherent memory is > not guaranteed to be backed by struct pages. We know the memory isn't > backed at alloc time in dma_alloc_coherent(). Can we key off of that > maybe add a new attr flag to avoid page lookups. I am willing to work > on the fixing it. What we need is to revise the dma-buf API and DMA APIs to allow passing coherent memory through it - there is no current provision in the DMA APIs to have coherent memory mapped for two peripherals. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up according to speedtest.net.
Hi Russell, On Wed, Mar 29, 2017 at 11:03 AM, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Wed, Mar 29, 2017 at 10:51:05AM -0600, Shuah Khan wrote: >> Hi Russell, >> >> On Wed, Mar 29, 2017 at 10:33 AM, Russell King - ARM Linux >> <linux@armlinux.org.uk> wrote: >> > Okay, I'm about to merge the following patch for -rc, since refusing >> > to create a scattertable for non-page backed memory is the only valid >> > solution for that case. I'm intending to queue this for -rc this >> > evening. >> >> I was bit sidetracked and getting back to this today. I am just about >> to test this change on my system. I will test your patch and send you >> results. >> >> This might not help my use-case - I suspect it will trip this check. >> That said I will send you results very soon. > > I do hope that it _does_ trip this check - the result should be that you > get an error without the kernel oopsing. Agreed. I would rather have it fail gracefully earlier than limp along and die later. I would make a suggestion based on these results. Please consider adding a dev_err() to if (!pfn_valid(pfn)) condition to indicate why get_sgtable failed including __func__ I worry about adding dev_err() to a core function, but in this case, it would be helpful to do to make it very clear that the requested use-case can't be supported. Results: [ 235.706048] platform 11000000.codec:left: failed to get scatterlist from DMA API The following is a WARN_ON from vb2_dc_get_dmabuf() - probably should go away. I will send a patch removing the WARN_ON to linux-media [ 235.711985] ------------[ cut here ]------------ [ 235.712001] WARNING: CPU: 6 PID: 1987 at drivers/media/v4l2-core/videobuf2-dma-contig.c:402 vb2_dc_get_dmabuf+0x130/0x168 [videobuf2_dma_contig] [ 235.712005] Modules linked in: cpufreq_userspace cpufreq_powersave cpufreq_conservative s5p_jpeg s5p_mfc exynos_gsc v4l2_mem2mem videobuf2_dma_contig v4l2_common videobuf2_memops videobuf2_v4l2 videobuf2_core videodev media [ 235.712063] CPU: 6 PID: 1987 Comm: qtdemux0:sink Not tainted 4.11.0-rc2-00001-g49214d0-dirty #11 [ 235.712068] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) [ 235.712085] [<c010e244>] (unwind_backtrace) from [<c010aea0>] (show_stack+0x10/0x14) [ 235.712096] [<c010aea0>] (show_stack) from [<c033ef50>] (dump_stack+0x78/0x8c) [ 235.712106] [<c033ef50>] (dump_stack) from [<c011b3c8>] (__warn+0xe8/0x100) [ 235.712113] [<c011b3c8>] (__warn) from [<c011b490>] (warn_slowpath_null+0x20/0x28) [ 235.712122] [<c011b490>] (warn_slowpath_null) from [<bf20d8b8>] (vb2_dc_get_dmabuf+0x130/0x168 [videobuf2_dma_contig]) [ 235.712141] [<bf20d8b8>] (vb2_dc_get_dmabuf [videobuf2_dma_contig]) from [<bf151ec4>] (vb2_core_expbuf+0x1d4/0x2d8 [videobuf2_core]) [ 235.712159] [<bf151ec4>] (vb2_core_expbuf [videobuf2_core]) from [<bf19c198>] (vb2_expbuf+0x2c/0x34 [videobuf2_v4l2]) [ 235.712195] [<bf19c198>] (vb2_expbuf [videobuf2_v4l2]) from [<bf04b834>] (__video_do_ioctl+0x204/0x2fc [videodev]) [ 235.712241] [<bf04b834>] (__video_do_ioctl [videodev]) from [<bf04b390>] (video_usercopy+0x24c/0x4e0 [videodev]) [ 235.712285] [<bf04b390>] (video_usercopy [videodev]) from [<bf0465a4>] (v4l2_ioctl+0xa0/0xd8 [videodev]) [ 235.712316] [<bf0465a4>] (v4l2_ioctl [videodev]) from [<c01fe920>] (do_vfs_ioctl+0x9c/0x8e0) [ 235.712325] [<c01fe920>] (do_vfs_ioctl) from [<c01ff198>] (SyS_ioctl+0x34/0x5c) [ 235.712334] [<c01ff198>] (SyS_ioctl) from [<c0107740>] (ret_fast_syscall+0x0/0x3c) [ 235.712339] ---[ end trace cdfda037de46497c ]--- > >> I have been thinking about this some. I would like to see if we can >> provide a way forward to address the cases where coherent memory is >> not guaranteed to be backed by struct pages. We know the memory isn't >> backed at alloc time in dma_alloc_coherent(). Can we key off of that >> maybe add a new attr flag to avoid page lookups. I am willing to work >> on the fixing it. > > What we need is to revise the dma-buf API and DMA APIs to allow passing > coherent memory through it - there is no current provision in the DMA APIs > to have coherent memory mapped for two peripherals. I will explore adding the support. I do think there is value in being able to use coherent memory buffers for DMA. thanks, -- Shuah > > -- > RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ > FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up > according to speedtest.net.
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index fc4a4eaa9934..4b2cf82fbca6 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -942,6 +942,10 @@ int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt, struct page *page = pfn_to_page(dma_to_pfn(dev, handle)); int ret; + /* Check if DMA handle has been properly translated to page. */ + if (pfn_to_dma(dev, page_to_pfn(page)) != handle) + return -ENXIO; + ret = sg_alloc_table(sgt, 1, GFP_KERNEL); if (unlikely(ret)) return ret;
Some DMA regions, for example those created by dma_declare_coherent_memory, might not be possible to be represented by struct page pointers. Getting scatterlist for the buffer allocated from such region is not possible. This patch adds a simple check in arm_dma_get_sgtable() function if the exported struct page pointer is valid. Fixes: dc2832e1e7db3 ("ARM: dma-mapping: add support for dma_get_sgtable()") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> --- arch/arm/mm/dma-mapping.c | 4 ++++ 1 file changed, 4 insertions(+) -- 1.9.1