Message ID | 20240730075755.10941-1-link@vivo.com |
---|---|
Series | Introduce DMA_HEAP_ALLOC_AND_READ_FILE heap flag |
On 2024/8/1 4:46, Daniel Vetter wrote:
> On Tue, Jul 30, 2024 at 08:04:04PM +0800, Huan Yang wrote:
>> On 2024/7/30 17:05, Huan Yang wrote:
>>> On 2024/7/30 16:56, Daniel Vetter wrote:
>>>> On Tue, Jul 30, 2024 at 03:57:44PM +0800, Huan Yang wrote:
>>>>> UDMA-BUF step:
>>>>> 1. memfd_create
>>>>> 2. open file(buffer/direct)
>>>>> 3. udmabuf create
>>>>> 4. mmap memfd
>>>>> 5. read file into memfd vaddr
>>>> Yeah this is really slow and the worst way to do it. You absolutely want
>>>> to start _all_ the io before you start creating the dma-buf, ideally with
>>>> everything running in parallel. But just starting the direct I/O with
>>>> async and then creating the udmabuf should be a lot faster and avoid
>>> That's great. Let me rephrase that, and please correct me if I'm wrong.
>>>
>>> UDMA-BUF step:
>>> 1. memfd_create
>>> 2. mmap memfd
>>> 3. open file(buffer/direct)
>>> 4. start thread to async read
>>> 5. udmabuf create
>>>
>>> With this, can improve
>> I just tested it. The steps are:
>>
>> UDMA-BUF step:
>> 1. memfd_create
>> 2. mmap memfd
>> 3. open file(buffer/direct)
>> 4. start thread to async read
>> 5. udmabuf create
>> 6. join wait
>>
>> Reading a 3G file takes 1,527,103,431 ns for all steps, which is great.
> Ok that's almost the throughput of your patch set, which I think is close
> enough. The remaining difference is probably just the mmap overhead, not
> sure whether/how we can do direct i/o to an fd directly ... in principle
> it's possible for any file that uses the standard pagecache.

Yes. As for mmap: IMO, since we now get all the folios and pin them, every
PFN is already known when the udmabuf is created.

So I think populating the mapping through page faults does not help save
memory and only increases the mmap access cost (it may save a little
page-table memory).

I'd like to send a patch set that removes it, makes the code better suited
to folio operations (and drops the unpin list), and includes a few fixes.

I'll send it once I've tested it.

As for doing direct I/O to an fd, maybe sendfile or copy_file_range could be
used?

sendfile goes through a pipe buffer, and its performance was low when I
tested it.

copy_file_range does not work because the two files are not on the same
file system.

So I can't find another way to do it. Can someone give some suggestions?

> -Sima
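The reordered steps tested above (start the read, create the udmabuf while it is in flight, then join) might look roughly like the following in userspace C. This is a minimal sketch, not code from the patch set or from the benchmark: `load_file_parallel`, `struct loaded_buf`, `struct read_job`, `read_worker` and `READ_CHUNK` are hypothetical names, and only the system calls and the `UDMABUF_CREATE` ioctl from `<linux/udmabuf.h>` are real interfaces. It assumes the caller knows the file length, and it tolerates a missing `/dev/udmabuf` by leaving `dmabuf_fd` at -1 so the read path can still be exercised.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <linux/udmabuf.h>
#include <pthread.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define READ_CHUNK ((size_t)64 << 20) /* hypothetical chunk size, page multiple */

struct read_job { int file_fd; char *dst; size_t len; ssize_t done; };

struct loaded_buf {
    int memfd;
    int dmabuf_fd;      /* -1 if /dev/udmabuf is missing or the ioctl fails */
    void *addr;         /* CPU mapping of the memfd */
    size_t map_len;     /* file_len rounded up to a page multiple */
    ssize_t bytes_read;
};

/* Reader thread: fills the mapping; a short read is treated as EOF. */
static void *read_worker(void *arg)
{
    struct read_job *job = arg;
    size_t off = 0;

    job->done = -1;
    while (off < job->len) {
        size_t want = job->len - off < READ_CHUNK ? job->len - off : READ_CHUNK;
        ssize_t n = pread(job->file_fd, job->dst + off, want, (off_t)off);
        if (n < 0)
            return NULL;
        off += (size_t)n;
        if ((size_t)n < want)
            break;
    }
    job->done = (ssize_t)off;
    return NULL;
}

int load_file_parallel(const char *path, int open_flags, size_t file_len,
                       struct loaded_buf *out)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    struct read_job job;
    pthread_t tid;
    int file_fd, dev_fd;

    assert(path && out && file_len > 0);
    memset(out, 0, sizeof(*out));
    out->dmabuf_fd = -1;
    out->map_len = (file_len + page - 1) / page * page;

    /* 1. memfd_create (udmabuf needs a page-aligned size and F_SEAL_SHRINK) */
    out->memfd = memfd_create("udmabuf-src", MFD_ALLOW_SEALING);
    if (out->memfd < 0)
        return -1;
    if (ftruncate(out->memfd, (off_t)out->map_len) < 0 ||
        fcntl(out->memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
        goto err_memfd;
    /* 2. mmap memfd */
    out->addr = mmap(NULL, out->map_len, PROT_READ | PROT_WRITE, MAP_SHARED,
                     out->memfd, 0);
    if (out->addr == MAP_FAILED)
        goto err_memfd;
    /* 3. open file (buffered, or direct if the caller passes O_DIRECT) */
    file_fd = open(path, open_flags);
    if (file_fd < 0)
        goto err_map;
    /* 4. start thread to async read */
    job = (struct read_job){ .file_fd = file_fd, .dst = out->addr, .len = out->map_len };
    if (pthread_create(&tid, NULL, read_worker, &job) != 0)
        goto err_file;
    /* 5. udmabuf create, while the read is still in flight */
    dev_fd = open("/dev/udmabuf", O_RDWR);
    if (dev_fd >= 0) {
        struct udmabuf_create create = {
            .memfd = (__u32)out->memfd,
            .flags = UDMABUF_FLAGS_CLOEXEC,
            .offset = 0,
            .size = out->map_len,
        };
        out->dmabuf_fd = ioctl(dev_fd, UDMABUF_CREATE, &create);
        close(dev_fd);
    }
    /* 6. join wait */
    pthread_join(tid, NULL);
    out->bytes_read = job.done;
    if (job.done >= 0) {
        close(file_fd);
        return 0;
    }
    if (out->dmabuf_fd >= 0)
        close(out->dmabuf_fd);
err_file:
    close(file_fd);
err_map:
    munmap(out->addr, out->map_len);
err_memfd:
    close(out->memfd);
    return -1;
}
```

Passing `O_RDONLY` gives the buffered variant and `O_RDONLY | O_DIRECT` the direct one, provided the file system supports it. The request length is kept page aligned so that the direct case meets the usual alignment requirements, and the read still goes through the mmap'd address, which is the overhead the thread attributes the remaining gap to.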
On Thu, Aug 01, 2024 at 10:53:45AM +0800, Huan Yang wrote:
> On 2024/8/1 4:46, Daniel Vetter wrote:
> > Ok that's almost the throughput of your patch set, which I think is close
> > enough. The remaining difference is probably just the mmap overhead, not
> > sure whether/how we can do direct i/o to an fd directly ... in principle
> > it's possible for any file that uses the standard pagecache.
>
> Yes. As for mmap: IMO, since we now get all the folios and pin them, every
> PFN is already known when the udmabuf is created.
>
> So I think populating the mapping through page faults does not help save
> memory and only increases the mmap access cost (it may save a little
> page-table memory).
>
> I'd like to send a patch set that removes it, makes the code better suited
> to folio operations (and drops the unpin list), and includes a few fixes.
>
> I'll send it once I've tested it.
>
> As for doing direct I/O to an fd, maybe sendfile or copy_file_range could
> be used?
>
> sendfile goes through a pipe buffer, and its performance was low when I
> tested it.
>
> copy_file_range does not work because the two files are not on the same
> file system.
>
> So I can't find another way to do it. Can someone give some suggestions?

Yeah, direct I/O to pagecache without an mmap might be too niche to be
supported. Maybe io_uring has something, but I guess that's as unlikely as
anything else.
-Sima