arm64: fix MAX_ORDER for 64K pagesize

Message ID 1403285834.755.39.camel@deneb.redhat.com

Commit Message

Mark Salter June 20, 2014, 5:37 p.m. UTC
On Thu, 2014-06-19 at 21:24 +0200, Michal Nazarewicz wrote:
> On Thu, Jun 19 2014, Mark Salter <msalter@redhat.com> wrote:
> > On Tue, 2014-06-17 at 20:32 +0200, Michal Nazarewicz wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 5dba293..6e657ce 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -801,7 +801,15 @@ void __init init_cma_reserved_pageblock(struct page *page)
> >>  
> >>  	set_page_refcounted(page);
> >>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> >> -	__free_pages(page, pageblock_order);
> >> +	if (pageblock_order > MAX_ORDER) {
> >> +		struct page *subpage = p;
> >> +		unsigned count = 1 << (pageblock_order - MAX_ORDER);
> >> +		do {
> >> +			__free_pages(subpage, pageblock_order);
> >                                                ^^^^^^^
> >                                                MAX_ORDER
> 
> D'oh!  I'll send a revised patch.
> 
> >> +		} while (subpage += MAX_ORDER_NR_PAGES, --count);
> >> +	} else {
> >> +		__free_pages(page, pageblock_order);
> >> +	}
> >>  	adjust_managed_page_count(page, pageblock_nr_pages);
> >>  }
> >>  #endif
> >> --------- >8 ---------------------------------------------------------
> >> 
> >> Thoughts?  This has not been tested and I think it may cause performance
> >> degradation in some cases since pageblock_order is not always
> >> a constant, so the comparison may end up not being stripped away even on
> >> systems where it's always false.
> 
> > This works with the above tweak. So it fixes the problem here, but I was
> > not sure if we'd get bitten elsewhere by pageblock_order > MAX_ORDER.
> 
> This is always a possibility, but in such cases, it's a bug in CMA.
> I've tried to keep in mind that pageblock_order may be greater than
> MAX_ORDER when writing CMA, but I've never tested on such a system.
> 
> > It will be slower, but it only gets called a few times at most at
> > boot time, right?
> 
> Yes.  The performance degradation should be negligible since
> init_cma_reserved_pageblock() is hardly a critical path and is called
> at most MAX_CMA_AREAS times, which by default is 8.  And I mean it will be slower
> because it will have to perform a branch.
> 

I ended up needing this (on top of your patch) to get the system to
boot; the diff is shown in the Patch section at the bottom of this page.
Each MAX_ORDER-1 group needs the refcount and migratetype set so that
__free_pages does the right thing.
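
For context on why the refcount matters: __free_pages() only hands pages
back to the buddy allocator after put_page_testzero() drops the last
reference, so each head page passed to it must start with a count of one,
which is what set_page_refcounted() arranges. A rough sketch of
__free_pages() from kernels of that era (mm/page_alloc.c):

    void __free_pages(struct page *page, unsigned int order)
    {
            /* Drop a reference; free only when the count reaches zero. */
            if (put_page_testzero(page)) {
                    if (order == 0)
                            free_hot_cold_page(page, 0);
                    else
                            __free_pages_ok(page, order);
            }
    }

With the count still at zero, put_page_testzero() never returns true (and
trips VM_BUG_ON under CONFIG_DEBUG_VM), so the chunks would silently leak
instead of being freed.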




Comments

Mark Salter June 23, 2014, 9:10 p.m. UTC | #1
On Mon, 2014-06-23 at 21:40 +0200, Michal Nazarewicz wrote:
> With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
> the following is triggered at early boot:
> 
>   SMP: Total of 8 processors activated.
>   devtmpfs: initialized
>   Unable to handle kernel NULL pointer dereference at virtual address 00000008
>   pgd = fffffe0000050000
>   [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
>   Internal error: Oops: 96000006 [#1] SMP
>   Modules linked in:
>   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
>   task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
>   PC is at __list_add+0x10/0xd4
>   LR is at free_one_page+0x270/0x638
>   ...
>   Call trace:
>   [<fffffe00003ee970>] __list_add+0x10/0xd4
>   [<fffffe000019c478>] free_one_page+0x26c/0x638
>   [<fffffe000019c8c8>] __free_pages_ok.part.52+0x84/0xbc
>   [<fffffe000019d5e8>] __free_pages+0x74/0xbc
>   [<fffffe0000c01350>] init_cma_reserved_pageblock+0xe8/0x104
>   [<fffffe0000c24de0>] cma_init_reserved_areas+0x190/0x1e4
>   [<fffffe0000090418>] do_one_initcall+0xc4/0x154
>   [<fffffe0000bf0a50>] kernel_init_freeable+0x204/0x2a8
>   [<fffffe00007520a0>] kernel_init+0xc/0xd4
> 
> This happens because init_cma_reserved_pageblock() calls
> __free_one_page() with pageblock_order as the page order, but that is
> bigger than MAX_ORDER.  This in turn causes accesses past
> zone->free_list[].
> 
> Fix the problem by changing init_cma_reserved_pageblock() such that it
> splits the pageblock into individual order MAX_ORDER - 1 chunks when
> the pageblock is bigger than an order MAX_ORDER - 1 page.
> 
> In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
> architectures except for ia64, powerpc and tile at the moment, the
> “pageblock_order >= MAX_ORDER” condition will be optimised out since
> both sides of the operator are constants.  In cases where pageblock
> size is variable, the performance degradation should not be
> significant anyway since init_cma_reserved_pageblock() is called
> only at boot time at most MAX_CMA_AREAS times which by default is
> eight.
> 
> Cc: stable@vger.kernel.org
> Signed-off-by: Michal Nazarewicz <mina86@mina86.com>
> Reported-by: Mark Salter <msalter@redhat.com>
> Tested-by: Christopher Covington <cov@codeaurora.org>
> ---
>  mm/page_alloc.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
>  Mark Salter wrote:
>  > I ended up needing this (on top of your patch) to get the system to
>  > boot.  Each MAX_ORDER-1 group needs the refcount and migratetype set
>  > so that __free_pages does the right thing.
>  >
>  > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>  > index 02fb1ed..a7ca6cc 100644
>  > --- a/mm/page_alloc.c
>  > +++ b/mm/page_alloc.c
>  > @@ -799,17 +799,18 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  >  		set_page_count(p, 0);
>  >  	} while (++p, --i);
>  >  
>  > -	set_page_refcounted(page);
>  > -	set_pageblock_migratetype(page, MIGRATE_CMA);
>  > -
>  > -	if (pageblock_order > MAX_ORDER) {
>  > -		i = pageblock_order - MAX_ORDER;
>  > +	if (pageblock_order >= MAX_ORDER) {
>  > +		i = pageblock_order - MAX_ORDER + 1;
>  >  		i = 1 << i;
>  >  		p = page;
>  >  		do {
>  > -			__free_pages(p, MAX_ORDER);
>  > +			set_page_refcounted(p);
>  > +			set_pageblock_migratetype(p, MIGRATE_CMA);
>  > +			__free_pages(p, MAX_ORDER - 1);
>  >  		} while (p += MAX_ORDER_NR_PAGES, --i);
>  >  	} else {
>  > +		set_page_refcounted(page);
>  > +		set_pageblock_migratetype(page, MIGRATE_CMA);
>  >  		__free_pages(page, pageblock_order);
>  >  	}
> 
>  This is kinda embarrassing, dunno how I missed that.
> 
>  But each page actually does not need to have migratetype set, does it?
>  All of those pages are in a single pageblock so a single call
>  suffices.  If you track set_pageblock_migratetype down to pfn_to_bitidx
>  there is:
> 
> 	return (pfn >> pageblock_order) * NR_PAGEBLOCK_BITS;
> 
>  so all pfns inside a pageblock truncate to the same bit index.  Or
>  did I miss yet another thing?

Nope, my turn to miss something. You only need to set the migratetype
once per pageblock.
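
A quick worked example of that truncation, using hypothetical numbers
(pageblock_order = 13, NR_PAGEBLOCK_BITS = 4, and MAX_ORDER = 11 so
MAX_ORDER_NR_PAGES = 1024):

    pfn of first chunk's head page:   0x42000
    pfn of fourth chunk's head page:  0x42000 + 3 * 1024 = 0x42c00
    bit index for both:               (pfn >> 13) * 4 = 0x21 * 4

The chunks differ only in the low 13 bits of the pfn, which the shift
discards, so every head page inside one pageblock maps to the same
migratetype bits and a single set_pageblock_migratetype() call covers
them all.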

> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee92384..fef9614 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -816,9 +816,21 @@ void __init init_cma_reserved_pageblock(struct page *page)
>  		set_page_count(p, 0);
>  	} while (++p, --i);
>  
> -	set_page_refcounted(page);
>  	set_pageblock_migratetype(page, MIGRATE_CMA);
> -	__free_pages(page, pageblock_order);
> +
> +	if (pageblock_order >= MAX_ORDER) {
> +		i = pageblock_nr_pages;
> +		p = page;
> +		do {
> +			set_page_refcounted(p);
> +			__free_pages(p, MAX_ORDER - 1);
> +			p += MAX_ORDER_NR_PAGES;
> +		} while (i -= MAX_ORDER_NR_PAGES);
> +	} else {
> +		set_page_refcounted(page);
> +		__free_pages(page, pageblock_order);
> +	}
> +
>  	adjust_managed_page_count(page, pageblock_nr_pages);
>  }
>  #endif

This version works for me. Thanks.
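
For the record, the numbers behind the crash and the fix, assuming the
arm64 defaults of the time (64K pages, MAX_ORDER = 11):

    pageblock_order     = HUGETLB_PAGE_ORDER = PMD_SHIFT - PAGE_SHIFT
                        = 29 - 16 = 13
    zone free areas     cover orders 0 .. MAX_ORDER - 1 = 0 .. 10
    pageblock_nr_pages  = 1 << 13 = 8192
    MAX_ORDER_NR_PAGES  = 1 << 10 = 1024
    loop iterations     = 8192 / 1024 = 8

Freeing at order 13 indexes three slots past the end of the zone's
free-area array, which is the NULL pointer dereference in the trace
above. The do/while in the fixed version also always terminates exactly:
whenever pageblock_order >= MAX_ORDER, pageblock_nr_pages is a
power-of-two multiple of MAX_ORDER_NR_PAGES.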



Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 02fb1ed..a7ca6cc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -799,17 +799,18 @@  void __init init_cma_reserved_pageblock(struct page *page)
 		set_page_count(p, 0);
 	} while (++p, --i);
 
-	set_page_refcounted(page);
-	set_pageblock_migratetype(page, MIGRATE_CMA);
-
-	if (pageblock_order > MAX_ORDER) {
-		i = pageblock_order - MAX_ORDER;
+	if (pageblock_order >= MAX_ORDER) {
+		i = pageblock_order - MAX_ORDER + 1;
 		i = 1 << i;
 		p = page;
 		do {
-			__free_pages(p, MAX_ORDER);
+			set_page_refcounted(p);
+			set_pageblock_migratetype(p, MIGRATE_CMA);
+			__free_pages(p, MAX_ORDER - 1);
 		} while (p += MAX_ORDER_NR_PAGES, --i);
 	} else {
+		set_page_refcounted(page);
+		set_pageblock_migratetype(page, MIGRATE_CMA);
 		__free_pages(page, pageblock_order);
 	}
 	adjust_managed_page_count(page, pageblock_nr_pages);
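
A closing note on the constant-folding claim in the commit message:
pageblock_order is a runtime variable only under
CONFIG_HUGETLB_PAGE_SIZE_VARIABLE; everywhere else it folds to a
compile-time constant. A sketch of the definitions, roughly as they
appear in include/linux/pageblock-flags.h:

    #ifdef CONFIG_HUGETLB_PAGE
    # ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
    /* Huge page sizes are variable */
    extern unsigned int pageblock_order;
    # else
    /* Huge pages are a constant size */
    #  define pageblock_order	HUGETLB_PAGE_ORDER
    # endif
    #else
    /* If huge pages are not used, group by MAX_ORDER_NR_PAGES */
    # define pageblock_order	(MAX_ORDER-1)
    #endif

In every configuration except the variable-size one, both sides of
“pageblock_order >= MAX_ORDER” are compile-time constants, so the
compiler discards the dead branch and the extra comparison costs
nothing at runtime.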