[7/7] net: page_pool: use alloc_pages_bulk in refill code path

Message ID 20210312154331.32229-8-mgorman@techsingularity.net
State New
Series Introduce a bulk order-0 page allocator with two in-tree users

Commit Message

Mel Gorman March 12, 2021, 3:43 p.m. UTC
From: Jesper Dangaard Brouer <brouer@redhat.com>

There are cases where the page_pool needs to refill with pages from the
page allocator. Some workloads cause the page_pool to release pages
instead of recycling them.

For these workloads it can improve performance to bulk allocate pages
from the page allocator to refill the alloc cache.

For an XDP-redirect workload with the 100G mlx5 driver (which uses
page_pool), redirecting xdp_frame packets into a veth that does XDP_PASS
to create an SKB from the xdp_frame, the page cannot be returned to the
page_pool. In this case, we saw[1] an improvement of 18.8% from using
the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).

[1] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 net/core/page_pool.c | 62 ++++++++++++++++++++++++++++----------------
 1 file changed, 39 insertions(+), 23 deletions(-)
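
In outline, the refill idea is: one bulk call fills a list with order-0
pages, replacing a loop of single-page allocations. A minimal sketch,
assuming the list-based __alloc_pages_bulk() signature used in this
series (names mirror the patch reviewed below):

	LIST_HEAD(page_list);
	struct page *page, *next;

	/* One call may return up to PP_ALLOC_CACHE_REFILL order-0 pages */
	if (unlikely(!__alloc_pages_bulk(gfp, nid, NULL,
					 PP_ALLOC_CACHE_REFILL, &page_list)))
		return NULL;

	list_for_each_entry_safe(page, next, &page_list, lru) {
		list_del(&page->lru);
		/* DMA-map if the pool requires it and store the page in
		 * pool->alloc.cache[], as the patch below does.
		 */
	}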

Comments

Ilias Apalodimas March 12, 2021, 8:05 p.m. UTC | #1
[...]
> 6. return last_page
> 
> > +       /* Remaining pages are stored in alloc.cache */
> > +       list_for_each_entry_safe(page, next, &page_list, lru) {
> > +               list_del(&page->lru);
> > +               if ((pp_flags & PP_FLAG_DMA_MAP) &&
> > +                   unlikely(!page_pool_dma_map(pool, page))) {
> > +                       put_page(page);
> > +                       continue;
> > +               }
> 
> So if you added a last_page pointer what you could do is check for it
> here and assign it to the alloc cache. If last_page is not set the
> block would be skipped.
> 
> > +               if (likely(pool->alloc.count < PP_ALLOC_CACHE_SIZE)) {
> > +                       pool->alloc.cache[pool->alloc.count++] = page;
> > +                       pool->pages_state_hold_cnt++;
> > +                       trace_page_pool_state_hold(pool, page,
> > +                                                  pool->pages_state_hold_cnt);
> > +               } else {
> > +                       put_page(page);
> 
> If you are just calling put_page here aren't you leaking DMA mappings?
> Wouldn't you need to potentially unmap the page before you call
> put_page on it?

Oops, I completely missed that. Alexander is right here.

> 
> > +               }
> > +       }
> > +out:
> >         if ((pp_flags & PP_FLAG_DMA_MAP) &&
> > -           unlikely(!page_pool_dma_map(pool, page))) {
> > -               put_page(page);
> > +           unlikely(!page_pool_dma_map(pool, first_page))) {
> > +               put_page(first_page);
> 
> I would probably move this block up and make it a part of the pp_order
> block above. Also since you are doing this in 2 spots it might make
> sense to look at possibly making this an inline function.
> 
> >                 return NULL;
> >         }
> >
> >         /* Track how many pages are held 'in-flight' */
> >         pool->pages_state_hold_cnt++;
> > -       trace_page_pool_state_hold(pool, page, pool->pages_state_hold_cnt);
> > +       trace_page_pool_state_hold(pool, first_page, pool->pages_state_hold_cnt);
> >
> >         /* When page just alloc'ed is should/must have refcnt 1. */
> > -       return page;
> > +       return first_page;
> >  }
> >
> >  /* For using page_pool replace: alloc_pages() API calls, but provide
> > --
> > 2.26.2
> >

Cheers
/Ilias
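
Alexander's suggested consolidation could look roughly like this (a
sketch only; the helper name is hypothetical and not part of the posted
patch):

	/* Map a page for the pool, freeing it on failure, so the
	 * map-or-free logic is not open-coded in two places.
	 */
	static inline bool page_pool_dma_map_or_free(struct page_pool *pool,
						     struct page *page)
	{
		if (!(pool->p.flags & PP_FLAG_DMA_MAP))
			return true;

		if (unlikely(!page_pool_dma_map(pool, page))) {
			put_page(page);
			return false;
		}

		return true;
	}

Both the per-page loop and the first_page exit path could then call the
helper instead of repeating the PP_FLAG_DMA_MAP check.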
Mel Gorman March 13, 2021, 1:30 p.m. UTC | #2
On Fri, Mar 12, 2021 at 11:44:09AM -0800, Alexander Duyck wrote:
> > -       /* FUTURE development:
> > -        *
> > -        * Current slow-path essentially falls back to single page
> > -        * allocations, which doesn't improve performance.  This code
> > -        * need bulk allocation support from the page allocator code.
> > -        */
> > -
> > -       /* Cache was empty, do real allocation */
> > -#ifdef CONFIG_NUMA
> > -       page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
> > -#else
> > -       page = alloc_pages(gfp, pool->p.order);
> > -#endif
> > -       if (!page)
> > +       if (unlikely(!__alloc_pages_bulk(gfp, pp_nid, NULL, bulk, &page_list)))
> >                 return NULL;
> >
> > +       /* First page is extracted and returned to caller */
> > +       first_page = list_first_entry(&page_list, struct page, lru);
> > +       list_del(&first_page->lru);
> > +
>
> This seems kind of broken to me. If you pull the first page and then
> cannot map it you end up returning NULL even if you placed a number of
> pages in the cache.
>

I think you're right but I'm punting this to Jesper to fix. He's more
familiar with this particular code and can verify the performance is
still ok for high speed networks.

-- 
Mel Gorman
SUSE Labs
Jesper Dangaard Brouer March 15, 2021, 8:40 a.m. UTC | #3
On Sat, 13 Mar 2021 13:30:58 +0000
Mel Gorman <mgorman@techsingularity.net> wrote:

> On Fri, Mar 12, 2021 at 11:44:09AM -0800, Alexander Duyck wrote:
> > > -       /* FUTURE development:
> > > -        *
> > > -        * Current slow-path essentially falls back to single page
> > > -        * allocations, which doesn't improve performance.  This code
> > > -        * need bulk allocation support from the page allocator code.
> > > -        */
> > > -
> > > -       /* Cache was empty, do real allocation */
> > > -#ifdef CONFIG_NUMA
> > > -       page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
> > > -#else
> > > -       page = alloc_pages(gfp, pool->p.order);
> > > -#endif
> > > -       if (!page)
> > > +       if (unlikely(!__alloc_pages_bulk(gfp, pp_nid, NULL, bulk, &page_list)))
> > >                 return NULL;
> > >
> > > +       /* First page is extracted and returned to caller */
> > > +       first_page = list_first_entry(&page_list, struct page, lru);
> > > +       list_del(&first_page->lru);
> > > +
> >
> > This seems kind of broken to me. If you pull the first page and then
> > cannot map it you end up returning NULL even if you placed a number of
> > pages in the cache.
> >
>
> I think you're right but I'm punting this to Jesper to fix. He's more
> familiar with this particular code and can verify the performance is
> still ok for high speed networks.


Yes, I'll take a look at this, update the patch accordingly, and re-run
the performance tests.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
Jesper Dangaard Brouer March 15, 2021, 1:39 p.m. UTC | #4
On Fri, 12 Mar 2021 22:05:45 +0200
Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:

> [...]
> > 6. return last_page
> >
> > > +       /* Remaining pages are stored in alloc.cache */
> > > +       list_for_each_entry_safe(page, next, &page_list, lru) {
> > > +               list_del(&page->lru);
> > > +               if ((pp_flags & PP_FLAG_DMA_MAP) &&
> > > +                   unlikely(!page_pool_dma_map(pool, page))) {
> > > +                       put_page(page);
> > > +                       continue;
> > > +               }
> >
> > So if you added a last_page pointer what you could do is check for it
> > here and assign it to the alloc cache. If last_page is not set the
> > block would be skipped.
> >
> > > +               if (likely(pool->alloc.count < PP_ALLOC_CACHE_SIZE)) {
> > > +                       pool->alloc.cache[pool->alloc.count++] = page;
> > > +                       pool->pages_state_hold_cnt++;
> > > +                       trace_page_pool_state_hold(pool, page,
> > > +                                                  pool->pages_state_hold_cnt);
> > > +               } else {
> > > +                       put_page(page);
> >
> > If you are just calling put_page here aren't you leaking DMA mappings?
> > Wouldn't you need to potentially unmap the page before you call
> > put_page on it?
>
> Oops, I completely missed that. Alexander is right here.


Well, the put_page() case can never happen, as the pool->alloc.cache[]
is known to be empty when this function is called.  I do agree that the
code looks cumbersome, and that it should free the DMA mapping if that
case could happen.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
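
If the overflow put_page() ever did become reachable for a page that
page_pool_dma_map() had already mapped, the mapping would need to be
released first, roughly as the existing page release path does. A
hedged sketch (the function name is hypothetical; field names follow
the page_pool code of this era):

	/* Free a page the pool has already DMA-mapped, undoing the
	 * mapping before dropping the reference.
	 */
	static void page_pool_put_mapped_page(struct page_pool *pool,
					      struct page *page)
	{
		if (pool->p.flags & PP_FLAG_DMA_MAP) {
			dma_unmap_page_attrs(pool->p.dev, page->dma_addr,
					     PAGE_SIZE << pool->p.order,
					     pool->p.dma_dir,
					     DMA_ATTR_SKIP_CPU_SYNC);
			page->dma_addr = 0;
		}
		put_page(page);
	}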
Jesper Dangaard Brouer March 15, 2021, 7:33 p.m. UTC | #5
This patch is against Mel's git-tree:
 git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git

Using branch mm-bulk-rebase-v4r2, but replacing the last patch, which is
the one that makes the page_pool use __alloc_pages_bulk().

 https://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git/log/?h=mm-bulk-rebase-v4r2

While implementing suggestions by Alexander Duyck, I realised that I could
simplify the code further and simply take the last page from the
pool->alloc.cache, given this avoids special-casing the last page.

I re-ran the performance tests and the improvement has been reduced to 13%
from the 18% before, but I don't think the rewrite of this specific patch
has anything to do with that.

Notes on tests:
 https://github.com/xdp-project/xdp-project/blob/master/areas/mem/page_pool06_alloc_pages_bulk.org#test-on-mel-git-tree
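
In code, the simplification amounts to roughly the following (an
illustrative sketch; the actual patch on the mm-bulk-rebase-v4r2 branch
may differ in detail):

	/* Bulk-allocate straight into the alloc cache; no page is
	 * special-cased during the fill loop.
	 */
	list_for_each_entry_safe(page, next, &page_list, lru) {
		list_del(&page->lru);
		if ((pp_flags & PP_FLAG_DMA_MAP) &&
		    unlikely(!page_pool_dma_map(pool, page))) {
			put_page(page);
			continue;
		}
		pool->alloc.cache[pool->alloc.count++] = page;
	}

	/* Return the last cached page; NULL only if every page failed
	 * to map, which also addresses Alexander's concern above.
	 */
	page = NULL;
	if (likely(pool->alloc.count > 0)) {
		page = pool->alloc.cache[--pool->alloc.count];
		pool->pages_state_hold_cnt++;
		trace_page_pool_state_hold(pool, page,
					   pool->pages_state_hold_cnt);
	}
	return page;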

---

Jesper Dangaard Brouer (1):
      net: page_pool: use alloc_pages_bulk in refill code path


 net/core/page_pool.c |   73 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 47 insertions(+), 26 deletions(-)

--
Patch

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 40e1b2beaa6c..a5889f1b86aa 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -208,44 +208,60 @@  noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 						 gfp_t _gfp)
 {
+	const int bulk = PP_ALLOC_CACHE_REFILL;
+	struct page *page, *next, *first_page;
 	unsigned int pp_flags = pool->p.flags;
-	struct page *page;
+	unsigned int pp_order = pool->p.order;
+	int pp_nid = pool->p.nid;
+	LIST_HEAD(page_list);
 	gfp_t gfp = _gfp;
 
-	/* We could always set __GFP_COMP, and avoid this branch, as
-	 * prep_new_page() can handle order-0 with __GFP_COMP.
-	 */
-	if (pool->p.order)
+	/* Don't support bulk alloc for high-order pages */
+	if (unlikely(pp_order)) {
 		gfp |= __GFP_COMP;
+		first_page = alloc_pages_node(pp_nid, gfp, pp_order);
+		if (unlikely(!first_page))
+			return NULL;
+		goto out;
+	}
 
-	/* FUTURE development:
-	 *
-	 * Current slow-path essentially falls back to single page
-	 * allocations, which doesn't improve performance.  This code
-	 * need bulk allocation support from the page allocator code.
-	 */
-
-	/* Cache was empty, do real allocation */
-#ifdef CONFIG_NUMA
-	page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
-#else
-	page = alloc_pages(gfp, pool->p.order);
-#endif
-	if (!page)
+	if (unlikely(!__alloc_pages_bulk(gfp, pp_nid, NULL, bulk, &page_list)))
 		return NULL;
 
+	/* First page is extracted and returned to caller */
+	first_page = list_first_entry(&page_list, struct page, lru);
+	list_del(&first_page->lru);
+
+	/* Remaining pages are stored in alloc.cache */
+	list_for_each_entry_safe(page, next, &page_list, lru) {
+		list_del(&page->lru);
+		if ((pp_flags & PP_FLAG_DMA_MAP) &&
+		    unlikely(!page_pool_dma_map(pool, page))) {
+			put_page(page);
+			continue;
+		}
+		if (likely(pool->alloc.count < PP_ALLOC_CACHE_SIZE)) {
+			pool->alloc.cache[pool->alloc.count++] = page;
+			pool->pages_state_hold_cnt++;
+			trace_page_pool_state_hold(pool, page,
+						   pool->pages_state_hold_cnt);
+		} else {
+			put_page(page);
+		}
+	}
+out:
 	if ((pp_flags & PP_FLAG_DMA_MAP) &&
-	    unlikely(!page_pool_dma_map(pool, page))) {
-		put_page(page);
+	    unlikely(!page_pool_dma_map(pool, first_page))) {
+		put_page(first_page);
 		return NULL;
 	}
 
 	/* Track how many pages are held 'in-flight' */
 	pool->pages_state_hold_cnt++;
-	trace_page_pool_state_hold(pool, page, pool->pages_state_hold_cnt);
+	trace_page_pool_state_hold(pool, first_page, pool->pages_state_hold_cnt);
 
 	/* When page just alloc'ed is should/must have refcnt 1. */
-	return page;
+	return first_page;
 }
 
 /* For using page_pool replace: alloc_pages() API calls, but provide