arm64: mm: ensure that the zero page is visible to the page table walker

Message ID 1449769199-31361-1-git-send-email-will.deacon@arm.com
State Accepted
Commit 32d6397805d00573ce1fa55f408ce2bca15b0ad3

Commit Message

Will Deacon Dec. 10, 2015, 5:39 p.m. UTC
In paging_init, we allocate the zero page, memset it to zero and then
point TTBR0 to it in order to avoid speculative fetches through the
identity mapping.

In order to guarantee that the freshly zeroed page is indeed visible to
the page table walker, we need to execute a dsb instruction prior to
writing the TTBR.
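
For illustration, the ordering in question is roughly the following (a
sketch of the relevant part of paging_init at this revision, with the
surrounding calls abbreviated):

	void *zero_page = early_alloc(PAGE_SIZE);	/* returned zeroed */

	empty_zero_page = virt_to_page(zero_page);

	/*
	 * Without this barrier, the walker could fetch entries via the
	 * new TTBR0 value before the zeroing of the page is visible.
	 */
	dsb(ishst);

	cpu_set_reserved_ttbr0();	/* point TTBR0 at the zero page */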

Cc: <stable@vger.kernel.org> # v3.14+, for older kernels need to drop the 'ishst'
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/mm/mmu.c | 3 +++
 1 file changed, 3 insertions(+)

-- 
2.1.4


Comments

Mark Rutland Dec. 10, 2015, 6:14 p.m. UTC | #1
Hi Will,

On Thu, Dec 10, 2015 at 05:39:59PM +0000, Will Deacon wrote:
> In paging_init, we allocate the zero page, memset it to zero and then
> point TTBR0 to it in order to avoid speculative fetches through the
> identity mapping.
> 
> In order to guarantee that the freshly zeroed page is indeed visible to
> the page table walker, we need to execute a dsb instruction prior to
> writing the TTBR.
> 
> Cc: <stable@vger.kernel.org> # v3.14+, for older kernels need to drop the 'ishst'
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index c04def90f3e4..c5bd5bca8e3d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -464,6 +464,9 @@ void __init paging_init(void)
>  
>  	empty_zero_page = virt_to_page(zero_page);
>  
> +	/* Ensure the zero page is visible to the page table walker */
> +	dsb(ishst);

I think this should live in early_alloc (likewise in late_alloc).

In the other cases where we call early_alloc or late_alloc, we assume
the zeroing is visible to the page table walker.

For example, in alloc_init_pte we do:

	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
		if (pmd_sect(*pmd))
			split_pmd(pmd, pte);
		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
		flush_tlb_all();
	}

There's a dsb in __pmd_populate, but it's _after_ the write to the pmd
entry, so the walker might start walking the newly-allocated pte table
before the zeroing is visible.

Either we need a barrier after every alloc, or we fold the barrier into
the two allocation functions.
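
As a sketch, folding it into the allocator might look like the below
(assuming early_alloc keeps its current shape; late_alloc would grow
the same barrier):

	static void __init *early_alloc(unsigned long sz)
	{
		void *ptr = __va(memblock_alloc(sz, sz));
		BUG_ON(!ptr);
		memset(ptr, 0, sz);

		/*
		 * Make the zeroing visible to the page table walker
		 * before any caller plumbs this memory into the page
		 * tables.
		 */
		dsb(ishst);

		return ptr;
	}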

Thanks,
Mark.
Will Deacon Dec. 11, 2015, 5:58 p.m. UTC | #2
On Thu, Dec 10, 2015 at 06:14:12PM +0000, Mark Rutland wrote:
> Hi Will,

Hi Mark,

> On Thu, Dec 10, 2015 at 05:39:59PM +0000, Will Deacon wrote:
> > In paging_init, we allocate the zero page, memset it to zero and then
> > point TTBR0 to it in order to avoid speculative fetches through the
> > identity mapping.
> > 
> > In order to guarantee that the freshly zeroed page is indeed visible to
> > the page table walker, we need to execute a dsb instruction prior to
> > writing the TTBR.
> > 
> > Cc: <stable@vger.kernel.org> # v3.14+, for older kernels need to drop the 'ishst'
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/mm/mmu.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index c04def90f3e4..c5bd5bca8e3d 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -464,6 +464,9 @@ void __init paging_init(void)
> >  
> >  	empty_zero_page = virt_to_page(zero_page);
> >  
> > +	/* Ensure the zero page is visible to the page table walker */
> > +	dsb(ishst);
> 
> I think this should live in early_alloc (likewise in late_alloc).
> 
> In the other cases where we call early_alloc or late_alloc, we assume
> the zeroing is visible to the page table walker.
> 
> For example, in alloc_init_pte we do:
> 
> 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
> 		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
> 		if (pmd_sect(*pmd))
> 			split_pmd(pmd, pte);
> 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
> 		flush_tlb_all();
> 	}
> 
> There's a dsb in __pmd_populate, but it's _after_ the write to the pmd
> entry, so the walker might start walking the newly-allocated pte table
> before the zeroing is visible.

Urgh. This is a problem because we're modifying the page tables live
(which I know you're fixing) without using break-before-make.
Consequently, the usual ordering guarantees that we get from the TLB
flush after installing the invalid entry do not apply, and we end up
with the issue you point out.
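
With break-before-make, a live update would instead follow a sequence
roughly like this (illustrative only; not the exact primitives we would
end up using):

	pmd_clear(pmd);		/* break: install an invalid entry */
	flush_tlb_all();	/* dsb; tlbi; dsb - no walks in flight */
	__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);	/* make */

The barriers implied by the TLB maintenance would then also order the
zeroing of the new table before the walker could observe the new entry.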

> Either we need a barrier after every alloc, or we fold the barrier into
> the two allocation functions.

Could you roll this into your patch that drops the size parameter from
the alloc functions please? Then we can name them {early,late}_alloc_pgtable
and have them do the dsb in there. Maybe we can drop it again when we're
doing proper break-before-make.

Cheers,

Will
Mark Rutland Dec. 11, 2015, 6:19 p.m. UTC | #3
> > > +	/* Ensure the zero page is visible to the page table walker */
> > > +	dsb(ishst);
> > 
> > I think this should live in early_alloc (likewise in late_alloc).
> > 
> > In the other cases where we call early_alloc or late_alloc, we assume
> > the zeroing is visible to the page table walker.
> > 
> > For example, in alloc_init_pte we do:
> > 
> > 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
> > 		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
> > 		if (pmd_sect(*pmd))
> > 			split_pmd(pmd, pte);
> > 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
> > 		flush_tlb_all();
> > 	}
> > 
> > There's a dsb in __pmd_populate, but it's _after_ the write to the pmd
> > entry, so the walker might start walking the newly-allocated pte table
> > before the zeroing is visible.
> 
> Urgh. This is a problem because we're modifying the page tables live
> (which I know you're fixing) without using break-before-make.
> Consequently, the usual ordering guarantees that we get from the TLB
> flush after installing the invalid entry do not apply, and we end up
> with the issue you point out.

My feeling was that in these paths we usually assume all prior page
table updates have been made visible to the page table walkers. Given
that, having the allocator guarantee the zeroing was already visible
felt like the natural thing to do.

That said, having looked at mm/memory.c, we seem to follow the exact
same pattern when plumbing tables together dynamically, with only an
smp_wmb() between the zeroed allocation and plumbing a table entry in.

e.g. in __pte_alloc we have the pattern:

	pgtable_t new = pte_alloc_one(mm, address);
	smp_wmb();
	if (pmd_none(*pmd))
		pmd_populate(mm, pmd, new);

> > Either we need a barrier after every alloc, or we fold the barrier into
> > the two allocation functions.
> 
> Could you roll this into your patch that drops the size parameter from
> the alloc functions please? Then we can name them {early,late}_alloc_pgtable
> and have them do the dsb in there. Maybe we can drop it again when we're
> doing proper break-before-make.

Sure, will do.

Thanks,
Mark.
Will Deacon Dec. 11, 2015, 7:10 p.m. UTC | #4
On Fri, Dec 11, 2015 at 06:19:52PM +0000, Mark Rutland wrote:
> > > > +	/* Ensure the zero page is visible to the page table walker */
> > > > +	dsb(ishst);
> > > 
> > > I think this should live in early_alloc (likewise in late_alloc).
> > > 
> > > In the other cases where we call early_alloc or late_alloc, we assume
> > > the zeroing is visible to the page table walker.
> > > 
> > > For example, in alloc_init_pte we do:
> > > 
> > > 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
> > > 		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
> > > 		if (pmd_sect(*pmd))
> > > 			split_pmd(pmd, pte);
> > > 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
> > > 		flush_tlb_all();
> > > 	}
> > > 
> > > There's a dsb in __pmd_populate, but it's _after_ the write to the pmd
> > > entry, so the walker might start walking the newly-allocated pte table
> > > before the zeroing is visible.
> > 
> > Urgh. This is a problem because we're modifying the page tables live
> > (which I know you're fixing) without using break-before-make.
> > Consequently, the usual ordering guarantees that we get from the TLB
> > flush after installing the invalid entry do not apply, and we end up
> > with the issue you point out.
> 
> My feeling was that in these paths we usually assume all prior page
> table updates have been made visible to the page table walkers. Given
> that, having the allocator guarantee the zeroing was already visible
> felt like the natural thing to do.
> 
> That said, having looked at mm/memory.c, we seem to follow the exact
> same pattern when plumbing tables together dynamically, with only an
> smp_wmb() between the zeroed allocation and plumbing a table entry in.
> 
> e.g. in __pte_alloc we have the pattern:
> 
> 	pgtable_t new = pte_alloc_one(mm, address);
> 	smp_wmb();
> 	if (pmd_none(*pmd))
> 		pmd_populate(mm, pmd, new);

I suspect this is potentially broken if somebody builds a CPU with a
"cool feature" in the page table walker that allows it to walk out of
order without respecting address dependencies.

The easiest fix is adding dsb(ishst) to the page table alloc functions.

Will
Mark Rutland Dec. 11, 2015, 7:16 p.m. UTC | #5
On Fri, Dec 11, 2015 at 07:10:31PM +0000, Will Deacon wrote:
> On Fri, Dec 11, 2015 at 06:19:52PM +0000, Mark Rutland wrote:
> > > > > +	/* Ensure the zero page is visible to the page table walker */
> > > > > +	dsb(ishst);
> > > > 
> > > > I think this should live in early_alloc (likewise in late_alloc).
> > > > 
> > > > In the other cases where we call early_alloc or late_alloc, we assume
> > > > the zeroing is visible to the page table walker.
> > > > 
> > > > For example, in alloc_init_pte we do:
> > > > 
> > > > 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
> > > > 		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
> > > > 		if (pmd_sect(*pmd))
> > > > 			split_pmd(pmd, pte);
> > > > 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
> > > > 		flush_tlb_all();
> > > > 	}
> > > > 
> > > > There's a dsb in __pmd_populate, but it's _after_ the write to the pmd
> > > > entry, so the walker might start walking the newly-allocated pte table
> > > > before the zeroing is visible.
> > > 
> > > Urgh. This is a problem because we're modifying the page tables live
> > > (which I know you're fixing) without using break-before-make.
> > > Consequently, the usual ordering guarantees that we get from the TLB
> > > flush after installing the invalid entry do not apply, and we end up
> > > with the issue you point out.
> > 
> > My feeling was that in these paths we usually assume all prior page
> > table updates have been made visible to the page table walkers. Given
> > that, having the allocator guarantee the zeroing was already visible
> > felt like the natural thing to do.
> > 
> > That said, having looked at mm/memory.c, we seem to follow the exact
> > same pattern when plumbing tables together dynamically, with only an
> > smp_wmb() between the zeroed allocation and plumbing a table entry in.
> > 
> > e.g. in __pte_alloc we have the pattern:
> > 
> > 	pgtable_t new = pte_alloc_one(mm, address);
> > 	smp_wmb();
> > 	if (pmd_none(*pmd))
> > 		pmd_populate(mm, pmd, new);
> 
> I suspect this is potentially broken if somebody builds a CPU with a
> "cool feature" in the page table walker that allows it to walk out of
> order without respecting address dependencies.
> 
> The easiest fix is adding dsb(ishst) to the page table alloc functions.

Sounds good to me.

Mark.

Patch

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c04def90f3e4..c5bd5bca8e3d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -464,6 +464,9 @@ void __init paging_init(void)
 
 	empty_zero_page = virt_to_page(zero_page);
 
+	/* Ensure the zero page is visible to the page table walker */
+	dsb(ishst);
+
 	/*
 	 * TTBR0 is only used for the identity mapping at this stage. Make it
 	 * point to zero page to avoid speculatively fetching new entries.