[0/3,v5] Introduce a bulk order-0 page allocator

Message ID 20210322091845.16437-1-mgorman@techsingularity.net

Mel Gorman March 22, 2021, 9:18 a.m. UTC
This series is based on top of Matthew Wilcox's series "Rationalise
__alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
test and are not using Andrew's tree as a baseline, I suggest using the
following git tree

git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9

The users of the API have been dropped in this version as the callers
need to check whether they prefer an array or list interface (whether
preference is based on convenience or performance).
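
To make the array-versus-list trade-off concrete, here is a minimal
userspace sketch of the two caller patterns. It is not kernel code: the
names model_page, fill_slots_from_list and fill_slots_direct are
hypothetical, and a plain "next" pointer stands in for the page->lru
linkage used by the list interface. It only shows the extra unlinking
step a list caller needs when it ultimately wants pages in an array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; "next" models the lru linkage. */
struct model_page {
	struct model_page *next;
};

/*
 * List-style caller: pages arrive on a list and must be unlinked one by
 * one into the empty slots. Returns the number of slots filled.
 */
static int fill_slots_from_list(struct model_page **list,
				struct model_page **slots, int nr_slots)
{
	int filled = 0;

	assert(list != NULL && slots != NULL);
	for (int i = 0; i < nr_slots && *list; i++) {
		struct model_page *page;

		if (slots[i])
			continue;	/* slot already populated */
		page = *list;		/* pop the head of the list */
		*list = page->next;
		page->next = NULL;
		slots[i] = page;
		filled++;
	}
	return filled;
}

/*
 * Array-style caller: the (modelled) allocator writes straight into the
 * empty slots, so no intermediate list walk is needed. Returns the
 * number of pool entries consumed.
 */
static int fill_slots_direct(struct model_page *pool, int pool_len,
			     struct model_page **slots, int nr_slots)
{
	int used = 0;

	assert(pool != NULL && slots != NULL);
	for (int i = 0; i < nr_slots && used < pool_len; i++) {
		if (!slots[i])
			slots[i] = &pool[used++];
	}
	return used;
}
```

Which form is preferable depends on what the caller does with the pages
afterwards; a caller that keeps pages on a list anyway gains nothing
from the array form.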

Changelog since v4
o Drop users of the API
o Remove free_pages_bulk interface, no users
o Add array interface
o Allocate single page if watermark checks on local zones fail

Changelog since v3
o Rebase on top of Matthew's series consolidating the alloc_pages API
o Rename alloced to allocated
o Split out preparation patch for prepare_alloc_pages
o Defensive check for bulk allocation or <= 0 pages
o Call single page allocation path only if no pages were allocated
o Minor cosmetic cleanups
o Reorder patch dependencies by subsystem. As this is a cross-subsystem
  series, the mm patches have to be merged before the sunrpc and net
  users.

Changelog since v2
o Prep new pages with IRQs enabled
o Minor documentation update

Changelog since v1
o Parenthesise binary and boolean comparisons
o Add reviewed-bys
o Rebase to 5.12-rc2

This series introduces a bulk order-0 page allocator with the
intent that sunrpc and the network page pool become the first users.
The implementation is not particularly efficient and the intention is to
iron out what semantics the API should have for users. Despite
that, this is a performance-related enhancement for users that require
multiple pages for an operation without multiple round-trips to the page
allocator. Quoting the last patch for the prototype high-speed networking
use-case.

    For XDP-redirect workload with 100G mlx5 driver (that use page_pool)
    redirecting xdp_frame packets into a veth, that does XDP_PASS to
    create an SKB from the xdp_frame, which then cannot return the page
    to the page_pool. In this case, we saw[1] an improvement of 18.8%
    from using the alloc_pages_bulk API (3,677,958 pps -> 4,368,926 pps).

Both potential users in this series are corner cases (NFS and high-speed
networks) so it is unlikely that most users will see any benefit in the
short term. Other potential users are batch allocations for page
cache readahead, fault around and SLUB allocations when high-order pages
are unavailable. It's unknown how much benefit would be seen by converting
multiple page allocation calls to a single batch or what difference it may
make to headline performance. It's a chicken and egg problem given that
the potential benefit cannot be investigated without an implementation
to test against.

Light testing passed. I'm relying on Chuck and Jesper to test their
implementations, choose whether to use lists or arrays and document
performance gains/losses in the changelogs.

Patch 1 renames a variable whose name is particularly unpopular

Patch 2 adds a bulk page allocator

Patch 3 adds an array-based version of the bulk allocator

 include/linux/gfp.h |  18 +++++
 mm/page_alloc.c     | 171 ++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 185 insertions(+), 4 deletions(-)

Comments

Mel Gorman March 23, 2021, 10:44 a.m. UTC | #1
On Mon, Mar 22, 2021 at 09:18:42AM +0000, Mel Gorman wrote:
> This series is based on top of Matthew Wilcox's series "Rationalise
> __alloc_pages wrapper" and does not apply to 5.12-rc2. If you want to
> test and are not using Andrew's tree as a baseline, I suggest using the
> following git tree
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v5r9
> 

Jesper and Chuck, would you mind rebasing on top of the following branch
please? 

git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux.git mm-bulk-rebase-v6r2

The interface is the same so the rebase should be trivial.

Jesper, I'm hoping you see no differences in performance but it's best
to check.

For Chuck, this version will check for holes and scan the remainder of
the array to see if nr_pages are allocated before returning. If the holes
in the array are always at the start (which they should be for sunrpc)
then it should still be a single IRQ disable/enable. Specifically, each
contiguous hole in the array will disable/enable IRQs once. I prototyped
NFS array support and it had a 100% success rate with no sleeps when
running dbench over the network with no memory pressure, but that's only
a basic test on a 10G switch.
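
As a rough illustration of the hole handling described above, the
following userspace sketch models the array semantics: populated slots
are left alone, empty slots are filled, the return value is the number
of populated slots, and one simulated IRQ-disabled section is counted
per contiguous hole. This is only a model built from the description in
this mail, not the code in the branch; demo_page, demo_pool and
demo_bulk_array are hypothetical names, and the "remaining" budget is an
assumption used to simulate a partial allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page. */
struct demo_page {
	int unused;
};

struct demo_pool {
	struct demo_page *pages;	/* next entry to hand out */
	int remaining;			/* entries still available */
	int irq_sections;		/* simulated IRQ disable/enable pairs */
};

/*
 * Fill the NULL slots of page_array[0..nr_pages) from the pool and
 * return how many slots are populated afterwards, including slots that
 * were already populated on entry.
 */
static int demo_bulk_array(struct demo_pool *pool, int nr_pages,
			   struct demo_page **page_array)
{
	int populated = 0;
	int i = 0;

	assert(pool != NULL && page_array != NULL);
	if (nr_pages <= 0)
		return 0;

	while (i < nr_pages) {
		if (page_array[i]) {		/* already populated */
			populated++;
			i++;
			continue;
		}
		if (pool->remaining == 0) {	/* leave the hole in place */
			i++;
			continue;
		}
		/* Start of a contiguous hole: one simulated IRQ section. */
		pool->irq_sections++;
		while (i < nr_pages && !page_array[i] && pool->remaining > 0) {
			page_array[i] = pool->pages++;
			pool->remaining--;
			populated++;
			i++;
		}
	}
	return populated;
}
```

In this model an empty array costs one section, while an array with
populated slots in the middle costs one section per gap. A short return
value leaves the trailing holes in place, so a caller can simply call
again with the same array until the count reaches nr_pages.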

The basic patch I used to convert sunrpc from using lists to an array
for testing is as follows:

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 922118968986..0ce33c1742d9 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -642,12 +642,10 @@ static void svc_check_conn_limits(struct svc_serv *serv)
 static int svc_alloc_arg(struct svc_rqst *rqstp)
 {
 	struct svc_serv *serv = rqstp->rq_server;
-	unsigned long needed;
 	struct xdr_buf *arg;
-	struct page *page;
 	LIST_HEAD(list);
 	int pages;
-	int i;
+	int i = 0;
 
 	pages = (serv->sv_max_mesg + 2 * PAGE_SIZE) >> PAGE_SHIFT;
 	if (pages > RPCSVC_MAXPAGES) {
@@ -657,29 +655,15 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
 		pages = RPCSVC_MAXPAGES;
 	}
 
-	for (needed = 0, i = 0; i < pages ; i++) {
-		if (!rqstp->rq_pages[i])
-			needed++;
-	}
-	i = 0;
-	while (needed) {
-		needed -= alloc_pages_bulk(GFP_KERNEL, needed, &list);
-		for (; i < pages; i++) {
-			if (rqstp->rq_pages[i])
-				continue;
-			page = list_first_entry_or_null(&list, struct page, lru);
-			if (likely(page)) {
-				list_del(&page->lru);
-				rqstp->rq_pages[i] = page;
-				continue;
-			}
+	while (i < pages) {
+		i = alloc_pages_bulk_array(GFP_KERNEL, pages, &rqstp->rq_pages[0]);
+		if (i < pages) {
 			set_current_state(TASK_INTERRUPTIBLE);
 			if (signalled() || kthread_should_stop()) {
 				set_current_state(TASK_RUNNING);
 				return -EINTR;
 			}
 			schedule_timeout(msecs_to_jiffies(500));
-			break;
 		}
 	}
 	rqstp->rq_page_end = &rqstp->rq_pages[pages];