[net-next,v2] net: netsec: Sync dma for device on buffer allocation

Message ID 1562251569-16506-1-git-send-email-ilias.apalodimas@linaro.org
State New
Series
  • [net-next,v2] net: netsec: Sync dma for device on buffer allocation

Commit Message

Ilias Apalodimas July 4, 2019, 2:46 p.m.
Quoting Arnd,

We have to do a sync_single_for_device /somewhere/ before the
buffer is given to the device. On a non-cache-coherent machine with
a write-back cache, there may be dirty cache lines that get written back
after the device DMA's data into it (e.g. from a previous memset
from before the buffer got freed), so you absolutely need to flush any
dirty cache lines on it first.

Since the coherency is configurable in this device make sure we cover
all configurations by explicitly syncing the allocated buffer for the
device before refilling its descriptors

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

---

Changes since V1: 
- Make the code more readable
 
 drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

-- 
2.20.1
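
The hazard described in the commit message can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: the
struct toy_system type and every function below are made up for the
illustration and are not kernel APIs. The model "writes back and drops" the
line on sync; a real architecture may clean or invalidate depending on the
DMA direction, but either way no dirty line is left to be written back late.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64 /* illustrative cache-line size */

/* Toy model: one buffer in "RAM" and one CPU cache line covering it. */
struct toy_system {
	uint8_t ram[LINE_SIZE];
	uint8_t cache[LINE_SIZE];
	bool cached;	/* line is present in the CPU cache */
	bool dirty;	/* cached copy is newer than RAM */
};

/* CPU store through a write-back cache: only the cache is updated. */
static void cpu_memset(struct toy_system *s, uint8_t val)
{
	memset(s->cache, val, LINE_SIZE);
	s->cached = true;
	s->dirty = true;
}

/* Stand-in for the effect of a sync-for-device on a non-coherent machine:
 * write dirty data back and drop the line, so nothing is evicted later.
 */
static void toy_sync_for_device(struct toy_system *s)
{
	if (s->dirty)
		memcpy(s->ram, s->cache, LINE_SIZE);
	s->cached = false;
	s->dirty = false;
}

/* Non-coherent device DMA: writes RAM directly, bypassing the cache. */
static void device_dma_write(struct toy_system *s, uint8_t val)
{
	memset(s->ram, val, LINE_SIZE);
}

/* The cache evicts the line at some arbitrary later time. */
static void cache_evict(struct toy_system *s)
{
	if (s->cached && s->dirty)
		memcpy(s->ram, s->cache, LINE_SIZE);
	s->cached = false;
	s->dirty = false;
}

/* Returns the first byte found in RAM after the device wrote 0xAA. */
static uint8_t run_scenario(bool sync_before_dma)
{
	struct toy_system s = { 0 };

	cpu_memset(&s, 0x11);		/* stale memset from a previous owner */
	if (sync_before_dma)
		toy_sync_for_device(&s);
	device_dma_write(&s, 0xAA);	/* incoming packet data */
	cache_evict(&s);		/* late write-back, if any */
	return s.ram[0];
}
```

Without the sync, the late eviction overwrites the packet data with the
stale memset pattern; with the sync, the data written by the device survives.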

Comments

Arnd Bergmann July 4, 2019, 3:22 p.m. | #1
On Thu, Jul 4, 2019 at 4:46 PM Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 5544a722543f..ada7626bf3a2 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
>
> +       dma_start = page_pool_get_dma_addr(page);
>         /* We allocate the same buffer length for XDP and non-XDP cases.
>          * page_pool API will map the whole page, skip what's needed for
>          * network payloads and/or XDP
>          */
> -       *dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> +       *dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
>         /* Make sure the incoming payload fits in the page for XDP and non-XDP
>          * cases and reserve enough space for headroom + skb_shared_info
>          */
>         *desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> +       dma_dir = page_pool_get_dma_dir(dring->page_pool);
> +       dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);

Should this maybe become part of the page_pool_*() interfaces?
Basically, in order to map a page from the pool, any driver would have
to go through these exact steps, so you could make it a combined function
call:

dma_addr_t page_pool_sync_for_device(dring->page_pool, page);

Or even fold the page_pool_dev_alloc_pages() into it as well and
make that return both the virtual and dma addresses.
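
A rough sketch of what such a combined helper could look like, written
against user-space stand-ins so that it compiles on its own. Everything here
is an assumption for illustration: the struct definitions are mocks, not the
kernel's, the three mocked calls only mirror the names used in the patch, and
page_pool_sync_for_device() is the hypothetical helper proposed above, not an
existing page_pool function. The mock pool stores dev and dma_dir directly,
which simplifies the layout of the real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-ins for the kernel types, just enough to compile. */
typedef uint64_t dma_addr_t;
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };
#define TOY_PAGE_SIZE 4096UL	/* assumed page size */

struct device { size_t synced_bytes; };	/* records the last sync length */
struct page { dma_addr_t dma_addr; };
struct page_pool { struct device *dev; enum dma_data_direction dma_dir; };

/* Mocks of the calls used in the patch. */
static dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	return page->dma_addr;
}

static enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
{
	return pool->dma_dir;
}

static void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
				       size_t size, enum dma_data_direction dir)
{
	(void)addr;
	(void)dir;
	dev->synced_bytes = size;
}

/* Hypothetical combined helper: look up the page's DMA address, sync the
 * whole page for the device, and hand the address back to the driver.
 */
static dma_addr_t page_pool_sync_for_device(struct page_pool *pool,
					    struct page *page)
{
	dma_addr_t dma = page_pool_get_dma_addr(page);

	dma_sync_single_for_device(pool->dev, dma, TOY_PAGE_SIZE,
				   page_pool_get_dma_dir(pool));
	return dma;
}
```

With a helper of this shape, the driver would only add its headroom offset to
the returned address instead of repeating the lookup and sync steps itself.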

     Arnd
Jesper Dangaard Brouer July 4, 2019, 5:39 p.m. | #2
On Thu,  4 Jul 2019 17:46:09 +0300
Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:

> Quoting Arnd,
>
> We have to do a sync_single_for_device /somewhere/ before the
> buffer is given to the device. On a non-cache-coherent machine with
> a write-back cache, there may be dirty cache lines that get written back
> after the device DMA's data into it (e.g. from a previous memset
> from before the buffer got freed), so you absolutely need to flush any
> dirty cache lines on it first.
>
> Since the coherency is configurable in this device make sure we cover
> all configurations by explicitly syncing the allocated buffer for the
> device before refilling its descriptors
>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>
> Changes since V1:
> - Make the code more readable
>
>  drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> index 5544a722543f..ada7626bf3a2 100644
> --- a/drivers/net/ethernet/socionext/netsec.c
> +++ b/drivers/net/ethernet/socionext/netsec.c
> @@ -727,21 +727,26 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
>  {
>
>  	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
> +	enum dma_data_direction dma_dir;
> +	dma_addr_t dma_start;
>  	struct page *page;
>
>  	page = page_pool_dev_alloc_pages(dring->page_pool);
>  	if (!page)
>  		return NULL;
>
> +	dma_start = page_pool_get_dma_addr(page);
>  	/* We allocate the same buffer length for XDP and non-XDP cases.
>  	 * page_pool API will map the whole page, skip what's needed for
>  	 * network payloads and/or XDP
>  	 */
> -	*dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> +	*dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
>  	/* Make sure the incoming payload fits in the page for XDP and non-XDP
>  	 * cases and reserve enough space for headroom + skb_shared_info
>  	 */
>  	*desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> +	dma_dir = page_pool_get_dma_dir(dring->page_pool);
> +	dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);

Isn't it costly to sync_for_device the entire page size?

E.g. we already know that the head-room is not touched by the device.  And
we actually want this head-room cache-hot for e.g. xdp_frame, thus it
would be unfortunate if the head-room is explicitly evicted from the
cache here.

Even smarter, the driver could do the sync_for_device when it
releases/recycles the page, as it likely knows the exact length that was
used by the packet.
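
As a worked example of the two options above, the following sketch computes
which part of the page would be synced in each case. The constants are
placeholders chosen for illustration, not the driver's real
NETSEC_RXBUF_HEADROOM or NETSEC_RX_BUF_NON_DATA values, and struct sync_range
and both functions are hypothetical. In a driver, the resulting offset and
length could be passed to dma_sync_single_range_for_device().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE	4096u	/* assumed page size */
#define TOY_HEADROOM	 256u	/* placeholder for NETSEC_RXBUF_HEADROOM */
#define TOY_NON_DATA	 576u	/* placeholder for NETSEC_RX_BUF_NON_DATA */

struct sync_range {
	size_t offset;	/* bytes to skip from the start of the page */
	size_t len;	/* bytes to sync for the device */
};

/* Range to sync when a fresh buffer is handed to the device: everything the
 * device may write, i.e. the full descriptor length after the headroom.
 */
static struct sync_range sync_range_alloc(void)
{
	struct sync_range r = { TOY_HEADROOM, TOY_PAGE_SIZE - TOY_NON_DATA };

	return r;
}

/* Range to sync when recycling a buffer whose packet length is known:
 * only the bytes that were actually used, clamped to the descriptor length.
 */
static struct sync_range sync_range_recycle(size_t pkt_len)
{
	struct sync_range r = sync_range_alloc();

	if (pkt_len < r.len)
		r.len = pkt_len;
	return r;
}
```

With these placeholder values, a fresh buffer syncs 3520 bytes starting at
offset 256, while recycling a buffer that carried a 1500-byte packet syncs
only 1500 bytes, and the headroom is left alone in both cases.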

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Ilias Apalodimas July 4, 2019, 5:52 p.m. | #3
On Thu, Jul 04, 2019 at 07:39:44PM +0200, Jesper Dangaard Brouer wrote:
> On Thu,  4 Jul 2019 17:46:09 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
>
> > Quoting Arnd,
> >
> > We have to do a sync_single_for_device /somewhere/ before the
> > buffer is given to the device. On a non-cache-coherent machine with
> > a write-back cache, there may be dirty cache lines that get written back
> > after the device DMA's data into it (e.g. from a previous memset
> > from before the buffer got freed), so you absolutely need to flush any
> > dirty cache lines on it first.
> >
> > Since the coherency is configurable in this device make sure we cover
> > all configurations by explicitly syncing the allocated buffer for the
> > device before refilling its descriptors
> >
> > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > ---
> >
> > Changes since V1:
> > - Make the code more readable
> >
> >  drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> > index 5544a722543f..ada7626bf3a2 100644
> > --- a/drivers/net/ethernet/socionext/netsec.c
> > +++ b/drivers/net/ethernet/socionext/netsec.c
> > @@ -727,21 +727,26 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
> >  {
> >
> >  	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
> > +	enum dma_data_direction dma_dir;
> > +	dma_addr_t dma_start;
> >  	struct page *page;
> >
> >  	page = page_pool_dev_alloc_pages(dring->page_pool);
> >  	if (!page)
> >  		return NULL;
> >
> > +	dma_start = page_pool_get_dma_addr(page);
> >  	/* We allocate the same buffer length for XDP and non-XDP cases.
> >  	 * page_pool API will map the whole page, skip what's needed for
> >  	 * network payloads and/or XDP
> >  	 */
> > -	*dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> > +	*dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
> >  	/* Make sure the incoming payload fits in the page for XDP and non-XDP
> >  	 * cases and reserve enough space for headroom + skb_shared_info
> >  	 */
> >  	*desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> > +	dma_dir = page_pool_get_dma_dir(dring->page_pool);
> > +	dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);
>
> Isn't it costly to sync_for_device the entire page size?
>
> E.g. we already know that the head-room is not touched by the device.  And
> we actually want this head-room cache-hot for e.g. xdp_frame, thus it
> would be unfortunate if the head-room is explicitly evicted from the
> cache here.
>
> Even smarter, the driver could do the sync_for_device when it
> releases/recycles the page, as it likely knows the exact length that was
> used by the packet.

It does sync for device when recycling takes place in XDP_TX with the correct
size.
I guess I can explicitly sync in the xdp_return_buff cases and in
netsec_setup_rx_dring() instead of in the generic buffer allocation.

I'll send a V3.

Thanks!
/Ilias
Ilias Apalodimas July 4, 2019, 7:12 p.m. | #4
On Thu, Jul 04, 2019 at 08:52:50PM +0300, Ilias Apalodimas wrote:
> On Thu, Jul 04, 2019 at 07:39:44PM +0200, Jesper Dangaard Brouer wrote:
> > On Thu,  4 Jul 2019 17:46:09 +0300
> > Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> >
> > > Quoting Arnd,
> > >
> > > We have to do a sync_single_for_device /somewhere/ before the
> > > buffer is given to the device. On a non-cache-coherent machine with
> > > a write-back cache, there may be dirty cache lines that get written back
> > > after the device DMA's data into it (e.g. from a previous memset
> > > from before the buffer got freed), so you absolutely need to flush any
> > > dirty cache lines on it first.
> > >
> > > Since the coherency is configurable in this device make sure we cover
> > > all configurations by explicitly syncing the allocated buffer for the
> > > device before refilling its descriptors
> > >
> > > Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > > ---
> > >
> > > Changes since V1:
> > > - Make the code more readable
> > >
> > >  drivers/net/ethernet/socionext/netsec.c | 7 ++++++-
> > >  1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
> > > index 5544a722543f..ada7626bf3a2 100644
> > > --- a/drivers/net/ethernet/socionext/netsec.c
> > > +++ b/drivers/net/ethernet/socionext/netsec.c
> > > @@ -727,21 +727,26 @@ static void *netsec_alloc_rx_data(struct netsec_priv *priv,
> > >  {
> > >
> > >  	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
> > > +	enum dma_data_direction dma_dir;
> > > +	dma_addr_t dma_start;
> > >  	struct page *page;
> > >
> > >  	page = page_pool_dev_alloc_pages(dring->page_pool);
> > >  	if (!page)
> > >  		return NULL;
> > >
> > > +	dma_start = page_pool_get_dma_addr(page);
> > >  	/* We allocate the same buffer length for XDP and non-XDP cases.
> > >  	 * page_pool API will map the whole page, skip what's needed for
> > >  	 * network payloads and/or XDP
> > >  	 */
> > > -	*dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
> > > +	*dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
> > >  	/* Make sure the incoming payload fits in the page for XDP and non-XDP
> > >  	 * cases and reserve enough space for headroom + skb_shared_info
> > >  	 */
> > >  	*desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
> > > +	dma_dir = page_pool_get_dma_dir(dring->page_pool);
> > > +	dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);
> >
> > Isn't it costly to sync_for_device the entire page size?
> >
> > E.g. we already know that the head-room is not touched by the device.  And
> > we actually want this head-room cache-hot for e.g. xdp_frame, thus it
> > would be unfortunate if the head-room is explicitly evicted from the
> > cache here.
> >
> > Even smarter, the driver could do the sync_for_device when it
> > releases/recycles the page, as it likely knows the exact length that was
> > used by the packet.
> It does sync for device when recycling takes place in XDP_TX with the correct
> size.
> I guess I can explicitly sync in the xdp_return_buff cases and in
> netsec_setup_rx_dring() instead of in the generic buffer allocation.
>
> I'll send a V3.

On second thought, I think this is going to look a bit complicated for no
apparent reason.
If I do this I'll have to track the buffers that got recycled vs. buffers
that are freshly allocated (and sync in this case). I currently have no
way of telling if the buffer is new or recycled, so I'll just sync the
payload for now, as you suggested.

Maybe this information can be added to page_pool_dev_alloc_pages()?
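
To make the idea concrete, here is a toy allocator that reports whether a
page is fresh or recycled. It is only a sketch: struct toy_pool,
toy_pool_alloc() and toy_pool_recycle() are invented for this illustration
and do not reflect how page_pool is implemented or what interface it might
eventually offer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_POOL_SLOTS 4

struct toy_page { int id; };

struct toy_pool {
	struct toy_page *cache[TOY_POOL_SLOTS];	/* recycled pages */
	size_t cached;
	struct toy_page backing[TOY_POOL_SLOTS];	/* source of fresh pages */
	size_t next_fresh;
};

/* Return a page to the pool; false if the recycle cache is full. */
static bool toy_pool_recycle(struct toy_pool *pool, struct toy_page *page)
{
	if (pool->cached == TOY_POOL_SLOTS)
		return false;
	pool->cache[pool->cached++] = page;
	return true;
}

/* Hypothetical allocation call that also reports where the page came from.
 * *recycled is set to true when the page was taken from the recycle cache
 * (and was presumably synced when it was returned), false for a fresh page.
 */
static struct toy_page *toy_pool_alloc(struct toy_pool *pool, bool *recycled)
{
	if (pool->cached > 0) {
		*recycled = true;
		return pool->cache[--pool->cached];
	}
	if (pool->next_fresh == TOY_POOL_SLOTS)
		return NULL;
	*recycled = false;
	return &pool->backing[pool->next_fresh++];
}
```

A caller could then sync the buffer only when the flag says the page is
fresh, and rely on the sync done at recycle time otherwise.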

Thanks
/Ilias
David Miller July 5, 2019, 10:45 p.m. | #5
From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Thu,  4 Jul 2019 17:46:09 +0300

> Quoting Arnd,
>
> We have to do a sync_single_for_device /somewhere/ before the
> buffer is given to the device. On a non-cache-coherent machine with
> a write-back cache, there may be dirty cache lines that get written back
> after the device DMA's data into it (e.g. from a previous memset
> from before the buffer got freed), so you absolutely need to flush any
> dirty cache lines on it first.
>
> Since the coherency is configurable in this device make sure we cover
> all configurations by explicitly syncing the allocated buffer for the
> device before refilling its descriptors
>
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> ---
>
> Changes since V1:
> - Make the code more readable

Ooops, I applied v1.  Could you please send me any changes still necessary
relative to that?

Thanks.

Patch

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index 5544a722543f..ada7626bf3a2 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -727,21 +727,26 @@  static void *netsec_alloc_rx_data(struct netsec_priv *priv,
 {
 
 	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
+	enum dma_data_direction dma_dir;
+	dma_addr_t dma_start;
 	struct page *page;
 
 	page = page_pool_dev_alloc_pages(dring->page_pool);
 	if (!page)
 		return NULL;
 
+	dma_start = page_pool_get_dma_addr(page);
 	/* We allocate the same buffer length for XDP and non-XDP cases.
 	 * page_pool API will map the whole page, skip what's needed for
 	 * network payloads and/or XDP
 	 */
-	*dma_handle = page_pool_get_dma_addr(page) + NETSEC_RXBUF_HEADROOM;
+	*dma_handle = dma_start + NETSEC_RXBUF_HEADROOM;
 	/* Make sure the incoming payload fits in the page for XDP and non-XDP
 	 * cases and reserve enough space for headroom + skb_shared_info
 	 */
 	*desc_len = PAGE_SIZE - NETSEC_RX_BUF_NON_DATA;
+	dma_dir = page_pool_get_dma_dir(dring->page_pool);
+	dma_sync_single_for_device(priv->dev, dma_start, PAGE_SIZE, dma_dir);
 
 	return page_address(page);
 }