Message ID: 1562251569-16506-1-git-send-email-ilias.apalodimas@linaro.org
State: New
Series: [net-next,v2] net: netsec: Sync dma for device on buffer allocation
On Thu, Jul 4, 2019 at 4:46 PM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 5544a722543f..ada7626bf3a2 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
>
> +        dma_start = page_pool_get_dma_addr(page);
>          /* We allocate the same buffer length for XDP and non-XDP cases.
>           * page_pool API will map the whole page, skip what's needed for
>           * network payloads and/or XDP
>           */
> -        *dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> +        *dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
>          /* Make sure the incoming payload fits in the page for XDP and non-XDP
>           * cases and reserve enough space for headroom + skb_shared_info
>           */
>          *desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> +        dma_dir = page_pool_get_dma_dir(dring->page_pool);
> +        dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);

Should this maybe become part of the page_pool_*() interfaces? Basically,
in order to map a page from the pool, any driver would have to go through
these exact steps, so you could make it a combined function call:

        dma_addr_t page_pool_sync_for_device(dring->page_pool, page);

Or even fold page_pool_dev_alloc_pages() into it as well and make that
return both the virtual and DMA addresses.

       Arnd
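For illustration, Arnd's combined call might look roughly like the sketch
below. This is hypothetical, not an existing page_pool interface: the
function name is invented, and it assumes a whole-page sync is acceptable.

#include <linux/dma-mapping.h>
#include <net/page_pool.h>

/* Hypothetical helper: allocate a page from the pool, look up its DMA
 * address, and sync the whole page for the device in one step.
 */
static inline struct page *
page_pool_dev_alloc_pages_synced(struct page_pool *pool,
                                 dma_addr_t *dma_handle)
{
        struct page *page = page_pool_dev_alloc_pages(pool);

        if (!page)
                return NULL;

        *dma_handle = page_pool_get_dma_addr(page);
        /* pool->p.dev is the device the pool was created against */
        dma_sync_single_for_device(pool->p.dev, *dma_handle, PAGE_SIZE,
                                   page_pool_get_dma_dir(pool));
        return page;
}

A driver would then replace its alloc/get_dma_addr/sync sequence with a
single call and apply its own headroom offset afterwards.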
On Thu, 4 Jul 2019 17:46:09 +0300
Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:

> Quoting Arnd,
>
> We have to do a sync_single_for_device /somewhere/ before the
> buffer is given to the device. On a non-cache-coherent machine with
> a write-back cache, there may be dirty cache lines that get written back
> after the device DMA's data into it (e.g. from a previous memset
> from before the buffer got freed), so you absolutely need to flush any
> dirty cache lines on it first.
>
> Since the coherency is configurable in this device, make sure we cover
> all configurations by explicitly syncing the allocated buffer for the
> device before refilling its descriptors
>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>
> Changes since V1:
> - Make the code more readable
>
>  drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 5544a722543f..ada7626bf3a2 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -727,21 +727,26 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
>  {
>
>          struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
> +        enum dma_data_direction dma_dir;
> +        dma_addr_t dma_start;
>          struct page *page;
>
>          page = page_pool_dev_alloc_pages(dring->page_pool);
>          if (!page)
>                  return NULL;
>
> +        dma_start = page_pool_get_dma_addr(page);
>          /* We allocate the same buffer length for XDP and non-XDP cases.
>           * page_pool API will map the whole page, skip what's needed for
>           * network payloads and/or XDP
>           */
> -        *dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> +        *dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
>          /* Make sure the incoming payload fits in the page for XDP and non-XDP
>           * cases and reserve enough space for headroom + skb_shared_info
>           */
>          *desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> +        dma_dir = page_pool_get_dma_dir(dring->page_pool);
> +        dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);

Is it costly to sync_for_device the entire page size?

E.g. we already know that the head-room is not touched by the device. And
we actually want this head-room cache-hot for e.g. xdp_frame, thus it
would be unfortunate if the head-room is explicitly evicted from the
cache here.

Even smarter, the driver could do the sync for_device when it
releases/recycles the page, as it likely knows the exact length that was
used by the packet.

--
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
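As a concrete sketch of Jesper's suggestion (not the code as posted): sync
only the payload region the device may actually write, leaving the headroom
cache-hot for the CPU. The constants come from the netsec hunk above;
dma_sync_single_range_for_device() is the standard DMA API call for partial
syncs.

        /* Sync only [HEADROOM, HEADROOM + payload): the device never
         * touches the headroom or the skb_shared_info tailroom.
         */
        dma_dir = page_pool_get_dma_dir(dring->page_pool);
        dma_sync_single_range_for_device(priv->dev, dma_start,
                                         NETSEC_RXBUF_HEADROOM,
                                         PAGE_SIZE - NETSEC_RX_BUF_NON_DATA,
                                         dma_dir);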
On Thu, Jul 04, 2019 at 07:39:44PM +0200, Jesper Dangaard Brouer wrote:
> On Thu, 4 Jul 2019 17:46:09 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
>
> [...]
> > +        dma_dir = page_pool_get_dma_dir(dring->page_pool);
> > +        dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);
>
> Is it costly to sync_for_device the entire page size?
>
> E.g. we already know that the head-room is not touched by the device. And
> we actually want this head-room cache-hot for e.g. xdp_frame, thus it
> would be unfortunate if the head-room is explicitly evicted from the
> cache here.
>
> Even smarter, the driver could do the sync for_device when it
> releases/recycles the page, as it likely knows the exact length that was
> used by the packet.

It does sync for the device, with the correct size, when recycling takes
place in XDP_TX. I guess I can explicitly sync in the xdp_return_buff
cases and in netsec_setup_rx_dring() instead of in the generic buffer
allocation.

I'll send a V3.

Thanks!
/Ilias
On Thu, Jul 04, 2019 at 08:52:50PM +0300, Ilias Apalodimas wrote:
> On Thu, Jul 04, 2019 at 07:39:44PM +0200, Jesper Dangaard Brouer wrote:
> > [...]
> > Even smarter, the driver could do the sync for_device when it
> > releases/recycles the page, as it likely knows the exact length that
> > was used by the packet.
>
> It does sync for the device, with the correct size, when recycling takes
> place in XDP_TX. I guess I can explicitly sync in the xdp_return_buff
> cases and in netsec_setup_rx_dring() instead of in the generic buffer
> allocation.
>
> I'll send a V3.

On second thought, I think this is going to look a bit complicated for no
apparent reason. If I do this, I'll have to track the buffers that got
recycled vs the buffers that are freshly allocated (and sync only in that
case). I currently have no way of telling if a buffer is new or recycled,
so I'll just sync the payload for now, as you suggested.
Maybe this information could be added to page_pool_dev_alloc_pages()?

Thanks
/Ilias
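Putting that together, the follow-up (v3) would presumably look something
like the sketch below: sync only the payload at allocation time, whether
the page is fresh or recycled. This is a reconstruction, not the v3 as
posted; the signature follows the hunk quoted above and the u16 desc_len
type is an assumption.

static void *netsec_alloc_rx_data(struct netsec_priv *priv,
                                  dma_addr_t *dma_handle, u16 *desc_len)
{
        struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
        enum dma_data_direction dma_dir;
        struct page *page;

        page = page_pool_dev_alloc_pages(dring->page_pool);
        if (!page)
                return NULL;

        /* Point past the headroom; it is reserved for the CPU (XDP), so
         * it never needs to be synced for the device.
         */
        *dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
        *desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;

        /* Sync just the payload area the device may DMA into */
        dma_dir = page_pool_get_dma_dir(dring->page_pool);
        dma_sync_single_for_device(priv->dev, *dma_handle, *desc_len,
                                   dma_dir);

        return page_address(page);
}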
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Thu, 4 Jul 2019 17:46:09 +0300

> Quoting Arnd,
>
> [...]
>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>
> Changes since V1:
> - Make the code more readable

Ooops, I applied v1. Could you please send me any changes still necessary
relative to that?

Thanks.
diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 5544a722543f..ada7626bf3a2 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -727,21 +727,26 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
 {
 
         struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
+        enum dma_data_direction dma_dir;
+        dma_addr_t dma_start;
         struct page *page;
 
         page = page_pool_dev_alloc_pages(dring->page_pool);
         if (!page)
                 return NULL;
 
+        dma_start = page_pool_get_dma_addr(page);
         /* We allocate the same buffer length for XDP and non-XDP cases.
          * page_pool API will map the whole page, skip what's needed for
          * network payloads and/or XDP
          */
-        *dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
+        *dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
         /* Make sure the incoming payload fits in the page for XDP and non-XDP
          * cases and reserve enough space for headroom + skb_shared_info
          */
         *desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
+        dma_dir = page_pool_get_dma_dir(dring->page_pool);
+        dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);
 
         return page_address(page);
 }
Quoting Arnd,

We have to do a sync_single_for_device /somewhere/ before the
buffer is given to the device. On a non-cache-coherent machine with
a write-back cache, there may be dirty cache lines that get written back
after the device DMA's data into it (e.g. from a previous memset
from before the buffer got freed), so you absolutely need to flush any
dirty cache lines on it first.

Since the coherency is configurable in this device, make sure we cover
all configurations by explicitly syncing the allocated buffer for the
device before refilling its descriptors

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
Changes since V1:
- Make the code more readable

 drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

--
2.20.1