
[RFC,06/20] arm64: mm: place empty_zero_page in bss

Message ID 1449665095-20774-7-git-send-email-mark.rutland@arm.com
State New

Commit Message

Mark Rutland Dec. 9, 2015, 12:44 p.m. UTC
Currently the zero page is set up in paging_init, and thus we cannot use
the zero page earlier. We use the zero page as a reserved TTBR value
from which no TLB entries may be allocated (e.g. when uninstalling the
idmap). To enable such usage earlier (as may be required for invasive
changes to the kernel page tables), and to minimise the time that the
idmap is active, we need to be able to use the zero page before
paging_init.

This patch follows the example set by x86, by allocating the zero page
at compile time, in .bss. This means that the zero page itself is
available immediately upon entry to start_kernel (as we zero .bss before
this), and also means that the zero page takes up no space in the raw
Image binary. The associated struct page is allocated in bootmem_init,
and remains unavailable until this time.

Outside of arch code, the only users of empty_zero_page assume that the
empty_zero_page symbol refers to the zeroed memory itself, and that
ZERO_PAGE(x) must be used to acquire the associated struct page,
following the example of x86. This patch also brings arm64 in line with
these assumptions.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 2 +-
 arch/arm64/include/asm/pgtable.h     | 4 ++--
 arch/arm64/mm/mmu.c                  | 9 +--------
 3 files changed, 4 insertions(+), 11 deletions(-)

-- 
1.9.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Will Deacon Dec. 10, 2015, 2:11 p.m. UTC | #1
On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:

[...]

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 304ff23..7559c22 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>   * Empty_zero_page is a special page that is used for zero-initialized data
>   * and COW.
>   */
> -struct page *empty_zero_page;
> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);

I've been looking at this, and it was making me feel uneasy because it's
full of junk before the bss is zeroed. Working that through, it's no
worse than what we currently have, but I then realised that (a) we don't
have a dsb after zeroing the zero page (which we need to make sure the
zeroes are visible to the page table walker), and (b) the zero page is
never explicitly cleaned to the PoC.

There may be cases where the zero-page is used to back read-only,
non-cacheable mappings (something to do with KVM?), so I'd sleep better
if we made sure that it was clean.

Thoughts?

Will
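
In kernel terms, the two fixes Will is asking about would look roughly like
the following. This is a hypothetical sketch using arm64 helpers of that era
(`dsb` and `__flush_dcache_area`), not code from the series, and whether
`ishst` is the right barrier domain here is exactly the kind of detail under
discussion:

```c
/* Hypothetical sketch, not from the patch series: make the zeroed page
 * visible to the page-table walkers and clean it to the PoC. */
memset(empty_zero_page, 0, PAGE_SIZE);

/* (a) dsb so the walker (a separate observer) sees the zeroes before
 * the page is installed in a TTBR or mapped as ZERO_PAGE. */
dsb(ishst);

/* (b) clean the page's cache lines to the Point of Coherency, so
 * non-cacheable aliases (e.g. a guest with caches off) also read
 * zeroes; on arm64 this helper also implies a trailing dsb. */
__flush_dcache_area(empty_zero_page, PAGE_SIZE);
```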

Mark Rutland Dec. 10, 2015, 3:29 p.m. UTC | #2
On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:

[...]

> I've been looking at this, and it was making me feel uneasy because it's
> full of junk before the bss is zeroed. Working that through, it's no
> worse than what we currently have, but I then realised that (a) we don't
> have a dsb after zeroing the zero page (which we need to make sure the
> zeroes are visible to the page table walker), and (b) the zero page is
> never explicitly cleaned to the PoC.


Ouch; that's scary.

> There may be cases where the zero-page is used to back read-only,
> non-cacheable mappings (something to do with KVM?), so I'd sleep better
> if we made sure that it was clean.


From a grep around for uses of ZERO_PAGE, in most places the zero page
is simply used as an empty buffer for I/O. In these cases it's either
accessed coherently or the usual machinery for non-coherent DMA kicks
in.

I don't believe that we usually give userspace the ability to create
non-cacheable mappings, and I couldn't spot any path by which it could
do so via some driver-specific IOCTL applied to the zero page.

Looking around, kvm_clear_guest_page seemed problematic, but isn't used
on arm64. I can imagine the zero page being mapped into guests in other
situations when mirroring the userspace mapping. 

Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
them into a guest? Is that right? Or do we have potential issues there?

> Thoughts?


I suspect that, other than the missing barrier, we're fine for the
time being.

We should figure out what other architectures do. If drivers cannot
assume that the zero page is accessible via non-cacheable accesses, I'm
not sure whether we should clean it (though I agree this is the simplest
thing to do).

Thanks,
Mark.

Mark Rutland Dec. 10, 2015, 3:51 p.m. UTC | #3
On Thu, Dec 10, 2015 at 03:40:08PM +0000, Marc Zyngier wrote:
> On 10/12/15 15:29, Mark Rutland wrote:

> > On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:

[...]

> I think we're OK. Looking at __coherent_cache_guest_page (which is
> called when transitioning from an invalid to valid mapping), we do flush
> things to PoC if the vcpu has its cache disabled (or if we know that the
> IPA shouldn't be cached - the whole NOR flash emulation horror story).


So we assume the guest never disables the MMU, and always uses consistent
attributes for a given IPA (e.g. it doesn't have both a Device and a
Normal Cacheable mapping)?

> Does it answer your question?


I think so. If those assumptions are true, then I agree we're OK. If
they aren't, we have other problems.

Thanks,
Mark.


Patch

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 2416578..600eacb 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -48,7 +48,7 @@  static inline void contextidr_thread_switch(struct task_struct *next)
  */
 static inline void cpu_set_reserved_ttbr0(void)
 {
-	unsigned long ttbr = page_to_phys(empty_zero_page);
+	unsigned long ttbr = virt_to_phys(empty_zero_page);
 
 	asm(
 	"	msr	ttbr0_el1, %0			// set TTBR0\n"
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f9..00f5a4b8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -121,8 +121,8 @@  extern void __pgd_error(const char *file, int line, unsigned long val);
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern struct page *empty_zero_page;
-#define ZERO_PAGE(vaddr)	(empty_zero_page)
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
+#define ZERO_PAGE(vaddr)	virt_to_page(empty_zero_page)
 
 #define pte_ERROR(pte)		__pte_error(__FILE__, __LINE__, pte_val(pte))
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 304ff23..7559c22 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -48,7 +48,7 @@  u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
  */
-struct page *empty_zero_page;
+unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
 
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
@@ -441,18 +441,11 @@  void fixup_init(void)
  */
 void __init paging_init(void)
 {
-	void *zero_page;
-
 	map_mem();
 	fixup_executable();
 
-	/* allocate the zero page. */
-	zero_page = early_alloc(PAGE_SIZE);
-
 	bootmem_init();
 
-	empty_zero_page = virt_to_page(zero_page);
-
 	/*
 	 * TTBR0 is only used for the identity mapping at this stage. Make it
 	 * point to zero page to avoid speculatively fetching new entries.