Message ID | 20201020122046.31167-10-tzimmermann@suse.de |
---|---|
State | Superseded |
Headers | show |
Series | Support GEM object mappings from I/O memory | expand |
Overall I like this, just an inline question: On Tue, Oct 20, 2020 at 2:20 PM Thomas Zimmermann <tzimmermann@suse.de> wrote: > To do framebuffer updates, one needs memcpy from system memory and a > pointer-increment function. Add both interfaces with documentation. (...) > +/** > + * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping > + * @dst: The dma-buf mapping structure > + * @src: The source buffer > + * @len: The number of byte in src > + * > + * Copies data into a dma-buf mapping. The source buffer is in system > + * memory. Depending on the buffer's location, the helper picks the correct > + * method of accessing the memory. > + */ > +static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) > +{ > + if (dst->is_iomem) > + memcpy_toio(dst->vaddr_iomem, src, len); > + else > + memcpy(dst->vaddr, src, len); > +} Are these going to be really big memcpy() operations? Some platforms have DMA offload engines that can perform memcpy(), drivers/dma, include/linux/dmaengine.h especially if the CPU doesn't really need to touch the contents and flush caches etc. An example exist in some MTD drivers that move large quantities of data off flash memory like this: drivers/mtd/nand/raw/cadence-nand-controller.c Notice that DMAengine and DMAbuf does not have much in common, the names can be deceiving. The value of this varies with the system architecture. It is not just a question about performance but also about power and the CPU being able to do other stuff in parallel for large transfers. So *when* to use this facility to accelerate memcpy() is a delicate question. What I'm after here is if these can be really big, do we want (in the long run, not now) open up to the idea to slot in hardware-accelerated memcpy() here? Yours, Linus Walleij
Hi Am 05.11.20 um 11:07 schrieb Linus Walleij: > Overall I like this, just an inline question: > > On Tue, Oct 20, 2020 at 2:20 PM Thomas Zimmermann <tzimmermann@suse.de> wrote: > >> To do framebuffer updates, one needs memcpy from system memory and a >> pointer-increment function. Add both interfaces with documentation. > > (...) >> +/** >> + * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping >> + * @dst: The dma-buf mapping structure >> + * @src: The source buffer >> + * @len: The number of byte in src >> + * >> + * Copies data into a dma-buf mapping. The source buffer is in system >> + * memory. Depending on the buffer's location, the helper picks the correct >> + * method of accessing the memory. >> + */ >> +static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) >> +{ >> + if (dst->is_iomem) >> + memcpy_toio(dst->vaddr_iomem, src, len); >> + else >> + memcpy(dst->vaddr, src, len); >> +} > > Are these going to be really big memcpy() operations? Individually, each could be a scanline, so a few KiB. (4 bytes * horizontal resolution). Updating a full framebuffer can sum up to several MiB. > > Some platforms have DMA offload engines that can perform memcpy(),They could be > drivers/dma, include/linux/dmaengine.h > especially if the CPU doesn't really need to touch the contents > and flush caches etc. > An example exist in some MTD drivers that move large quantities of > data off flash memory like this: > drivers/mtd/nand/raw/cadence-nand-controller.c > > Notice that DMAengine and DMAbuf does not have much in common, > the names can be deceiving. > > The value of this varies with the system architecture. It is not just > a question about performance but also about power and the CPU > being able to do other stuff in parallel for large transfers. So *when* > to use this facility to accelerate memcpy() is a delicate question. > > What I'm after here is if these can be really big, do we want > (in the long run, not now) open up to the idea to slot in > hardware-accelerated memcpy() here? We currently use this functionality for the graphical framebuffer console that most DRM drivers provide. It's non-accelerated and slow, but this has not been much of a problem so far. Within DRM, we're more interested in removing console code from drivers and going for the generic implementation. Most of the graphics HW allocates framebuffers from video RAM, system memory or CMA pools and does not really need these memcpys. Only a few systems with small video RAM require a shadow buffer, which we flush into VRAM as needed. Those might benefit. OTOH, off-loading memcpys to hardware sounds reasonable if we can hide it from the DRM code. I think it all depends on how invasive that change would be. Best regards Thomas > > Yours, > Linus Walleij >
On Thu, Nov 05, 2020 at 11:37:08AM +0100, Thomas Zimmermann wrote: > Hi > > Am 05.11.20 um 11:07 schrieb Linus Walleij: > > Overall I like this, just an inline question: > > > > On Tue, Oct 20, 2020 at 2:20 PM Thomas Zimmermann <tzimmermann@suse.de> wrote: > > > >> To do framebuffer updates, one needs memcpy from system memory and a > >> pointer-increment function. Add both interfaces with documentation. > > > > (...) > >> +/** > >> + * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping > >> + * @dst: The dma-buf mapping structure > >> + * @src: The source buffer > >> + * @len: The number of byte in src > >> + * > >> + * Copies data into a dma-buf mapping. The source buffer is in system > >> + * memory. Depending on the buffer's location, the helper picks the correct > >> + * method of accessing the memory. > >> + */ > >> +static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) > >> +{ > >> + if (dst->is_iomem) > >> + memcpy_toio(dst->vaddr_iomem, src, len); > >> + else > >> + memcpy(dst->vaddr, src, len); > >> +} > > > > Are these going to be really big memcpy() operations? > > Individually, each could be a scanline, so a few KiB. (4 bytes * > horizontal resolution). Updating a full framebuffer can sum up to > several MiB. > > > > > Some platforms have DMA offload engines that can perform memcpy(),They could be > > drivers/dma, include/linux/dmaengine.h > > especially if the CPU doesn't really need to touch the contents > > and flush caches etc. > > An example exist in some MTD drivers that move large quantities of > > data off flash memory like this: > > drivers/mtd/nand/raw/cadence-nand-controller.c > > > > Notice that DMAengine and DMAbuf does not have much in common, > > the names can be deceiving. > > > > The value of this varies with the system architecture. It is not just > > a question about performance but also about power and the CPU > > being able to do other stuff in parallel for large transfers. So *when* > > to use this facility to accelerate memcpy() is a delicate question. > > > > What I'm after here is if these can be really big, do we want > > (in the long run, not now) open up to the idea to slot in > > hardware-accelerated memcpy() here? > > We currently use this functionality for the graphical framebuffer > console that most DRM drivers provide. It's non-accelerated and slow, > but this has not been much of a problem so far. > > Within DRM, we're more interested in removing console code from drivers > and going for the generic implementation. > > Most of the graphics HW allocates framebuffers from video RAM, system > memory or CMA pools and does not really need these memcpys. Only a few > systems with small video RAM require a shadow buffer, which we flush > into VRAM as needed. Those might benefit. > > OTOH, off-loading memcpys to hardware sounds reasonable if we can hide > it from the DRM code. I think it all depends on how invasive that change > would be. I wouldn't, all the additional locks this would pull in sound like nightmare. And when an oops happens, this might be the only thing that manages to get the oops to the user. Unless someone really starts caring about fbcon acceleration I really wouldn't bother. Ok maybe it also matters for fbdev, but the problem is that the page fault intercepting alone is already expensive, so the only real solution if you care about performance in that case is to use kms natively, and use a dirty rectangle flip (or the DIRTY syscall). And in there drivers should (and do) use any dma engines they have to upload the frames already. -Daniel
diff --git a/include/linux/dma-buf-map.h b/include/linux/dma-buf-map.h index 2e8bbecb5091..583a3a1f9447 100644 --- a/include/linux/dma-buf-map.h +++ b/include/linux/dma-buf-map.h @@ -7,6 +7,7 @@ #define __DMA_BUF_MAP_H__ #include <linux/io.h> +#include <linux/string.h> /** * DOC: overview @@ -32,6 +33,14 @@ * accessing the buffer. Use the returned instance and the helper functions * to access the buffer's memory in the correct way. * + * The type :c:type:`struct dma_buf_map <dma_buf_map>` and its helpers are + * actually independent from the dma-buf infrastructure. When sharing buffers + * among devices, drivers have to know the location of the memory to access + * the buffers in a safe way. :c:type:`struct dma_buf_map <dma_buf_map>` + * solves this problem for dma-buf and its users. If other drivers or + * sub-systems require similar functionality, the type could be generalized + * and moved to a more prominent header file. + * * Open-coding access to :c:type:`struct dma_buf_map <dma_buf_map>` is * considered bad style. Rather then accessing its fields directly, use one * of the provided helper functions, or implement your own. For example, @@ -51,6 +60,14 @@ * * dma_buf_map_set_vaddr_iomem(&map. 0xdeadbeaf); * + * Instances of struct dma_buf_map do not have to be cleaned up, but + * can be cleared to NULL with dma_buf_map_clear(). Cleared mappings + * always refer to system memory. + * + * .. code-block:: c + * + * dma_buf_map_clear(&map); + * * Test if a mapping is valid with either dma_buf_map_is_set() or * dma_buf_map_is_null(). * @@ -73,17 +90,19 @@ * if (dma_buf_map_is_equal(&sys_map, &io_map)) * // always false * - * Instances of struct dma_buf_map do not have to be cleaned up, but - * can be cleared to NULL with dma_buf_map_clear(). Cleared mappings - * always refer to system memory. + * A set up instance of struct dma_buf_map can be used to access or manipulate + * the buffer memory. Depending on the location of the memory, the provided + * helpers will pick the correct operations. Data can be copied into the memory + * with dma_buf_map_memcpy_to(). The address can be manipulated with + * dma_buf_map_incr(). * - * The type :c:type:`struct dma_buf_map <dma_buf_map>` and its helpers are - * actually independent from the dma-buf infrastructure. When sharing buffers - * among devices, drivers have to know the location of the memory to access - * the buffers in a safe way. :c:type:`struct dma_buf_map <dma_buf_map>` - * solves this problem for dma-buf and its users. If other drivers or - * sub-systems require similar functionality, the type could be generalized - * and moved to a more prominent header file. + * .. code-block:: c + * + * const void *src = ...; // source buffer + * size_t len = ...; // length of src + * + * dma_buf_map_memcpy_to(&map, src, len); + * dma_buf_map_incr(&map, len); // go to first byte after the memcpy */ /** @@ -210,4 +229,38 @@ static inline void dma_buf_map_clear(struct dma_buf_map *map) } } +/** + * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping + * @dst: The dma-buf mapping structure + * @src: The source buffer + * @len: The number of byte in src + * + * Copies data into a dma-buf mapping. The source buffer is in system + * memory. Depending on the buffer's location, the helper picks the correct + * method of accessing the memory. + */ +static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) +{ + if (dst->is_iomem) + memcpy_toio(dst->vaddr_iomem, src, len); + else + memcpy(dst->vaddr, src, len); +} + +/** + * dma_buf_map_incr - Increments the address stored in a dma-buf mapping + * @map: The dma-buf mapping structure + * @incr: The number of bytes to increment + * + * Increments the address stored in a dma-buf mapping. Depending on the + * buffer's location, the correct value will be updated. + */ +static inline void dma_buf_map_incr(struct dma_buf_map *map, size_t incr) +{ + if (map->is_iomem) + map->vaddr_iomem += incr; + else + map->vaddr += incr; +} + #endif /* __DMA_BUF_MAP_H__ */