
arm64: Add flush_cache_vmap call in __early_set_fixmap

Message ID 20140609110356.GD25590@arm.com
State New

Commit Message

Catalin Marinas June 9, 2014, 11:03 a.m. UTC
On Fri, Jun 06, 2014 at 04:09:33PM +0100, Mark Salter wrote:
> On Fri, 2014-06-06 at 15:53 +0100, Leif Lindholm wrote:
> > On Fri, Jun 06, 2014 at 10:37:29AM -0400, Mark Salter wrote:
> > > On Fri, 2014-06-06 at 11:29 +0100, Leif Lindholm wrote:
> > > > __early_set_fixmap does not do any synchronization when called to set a
> > > > fixmap entry. Add a call to flush_cache_vmap().

Did you hit a problem, or was it just for safety?

> > > > Tested on hardware.
> > > > 
> > > > Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
> > > > Tested-by: Graeme Gregory <graeme.gregory@linaro.org>
> > > > Cc: Steve Capper <steve.capper@linaro.org>
> > > > ---
> > > >  arch/arm64/mm/ioremap.c |    5 +++--
> > > >  1 file changed, 3 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/mm/ioremap.c b/arch/arm64/mm/ioremap.c
> > > > index 7ec3283..5b8766c 100644
> > > > --- a/arch/arm64/mm/ioremap.c
> > > > +++ b/arch/arm64/mm/ioremap.c
> > > > @@ -176,9 +176,10 @@ void __init __early_set_fixmap(enum fixed_addresses idx,
> > > >  
> > > >  	pte = early_ioremap_pte(addr);
> > > >  
> > > > -	if (pgprot_val(flags))
> > > > +	if (pgprot_val(flags)) {
> > > >  		set_pte(pte, pfn_pte(phys >> PAGE_SHIFT, flags));
> > > > -	else {
> > > > +		flush_cache_vmap(addr, addr + PAGE_SIZE);
> > > > +	} else {
> > > >  		pte_clear(&init_mm, addr, pte);
> > > >  		flush_tlb_kernel_range(addr, addr+PAGE_SIZE);
> > > >  	}
> > > 
> > > I'm confused by the commit message mentioning synchronization but
> > > the code doing a cache flush. I see that arm64 implementation of
> > > flush_cache_vmap() is just a dsb(). If it is synchronization that
> > > we need here (and it certainly looks like we do), why not just add
> > > the dsb() directly to make that clear?
> > 
> > In Linux semantics, it needs this for the same reason remap_page_range
> > needs it. From the ARM architectural point of view, the reason is that
> > the translation table walk is considered a separate observer from the
> > core data interface.
> > 
> > But since there is a common Linux semantic for this, I preferred
> > reusing that over just throwing in a dsb(). My interpretation of
> > flush_cache_vmap() was "flush mappings from cache, so they can be
> > picked up by table walk". While we don't technically need to flush the
> > cache here, the underlying requirement is the same.
> 
> But the range you are flushing is not a range seen by the table walk
> observer. I just think it is clearer to explicitly show that it is
> the pte write which we want the table walk to see rather than to
> rely on the implicit behavior of a cache flush routine.

I think that's a valid point. flush_cache_vmap() is used to remove any
cached entries for the ioremap/vmap'ed range. That's not the aim here.

As an optimisation, set_pte() doesn't have a dsb(). We do this on the
clearing/invalidating path via the TLB flushing routines but not on the
re-enabling path. Here we just added dsb() in the relevant functions
that were called from the generic code (flush_cache_vmap,
update_mmu_cache).

A quick grep through the kernel shows that we have other set_pte() calls
without additional dsb() like create_mapping(), I think kvm_set_pte() as
well.

So I'm proposing an alternative patch (which needs some benchmarking as
well to see if anything is affected, maybe application startup time).

------------------8<-------------------------------

From b5cd0fff5cb044fa32cbaa4eebd2f00690922567 Mon Sep 17 00:00:00 2001
From: Catalin Marinas <catalin.marinas@arm.com>
Date: Mon, 9 Jun 2014 11:55:03 +0100
Subject: [PATCH] arm64: Use DSB after page table update

As an optimisation, set_pte() does not contain a DSB instruction to
ensure the observability of the page table update before subsequent
memory accesses or TLB maintenance. Such a barrier is placed in the
flush_tlb_*() functions and flush_cache_vmap()/update_mmu_cache().
However, there are still places where this barrier is missing, such as
create_mapping().

This patch adds a dsb(ishst) call in set_pte() but only when the entry
being written is valid. For invalid entries, there is always a
subsequent TLB maintenance containing the DSB.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/cacheflush.h | 11 +----------
 arch/arm64/include/asm/pgtable.h    |  8 ++++++++
 arch/arm64/include/asm/tlbflush.h   |  5 -----
 3 files changed, 9 insertions(+), 15 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Leif Lindholm June 9, 2014, 1:24 p.m. UTC | #1
On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > > > > __early_set_fixmap does not do any synchronization when called to set a
> > > > > fixmap entry. Add a call to flush_cache_vmap().
> 
> Did you hit a problem, or was it just for safety?

This fixes an abort when accessing early UEFI configuration tables on
a hardware platform. (The issue cannot be triggered on the FVP Base
Model, even with cache modelling enabled.)

> > > > I'm confused by the commit message mentioning synchronization but
> > > > the code doing a cache flush. I see that arm64 implementation of
> > > > flush_cache_vmap() is just a dsb(). If it is synchronization that
> > > > we need here (and it certainly looks like we do), why not just add
> > > > the dsb() directly to make that clear?
> > > 
> > > In Linux semantics, it needs this for the same reason remap_page_range
> > > needs it. From the ARM architectural point of view, the reason is that
> > > the translation table walk is considered a separate observer from the
> > > core data interface.
> > > 
> > > But since there is a common Linux semantic for this, I preferred
> > > reusing that over just throwing in a dsb(). My interpretation of
> > > flush_cache_vmap() was "flush mappings from cache, so they can be
> > > picked up by table walk". While we don't technically need to flush the
> > > cache here, the underlying requirement is the same.
> > 
> > But the range you are flushing is not a range seen by the table walk
> > observer. I just think it is clearer to explicitly show that it is
> > the pte write which we want the table walk to see rather than to
> > rely on the implicit behavior of a cache flush routine.
> 
> I think that's a valid point. flush_cache_vmap() is used to remove any
> cached entries for the ioremap/vmap'ed range. That's not the aim here.
> 
> As an optimisation, set_pte() doesn't have a dsb(). We do this on the
> clearing/invalidating path via the TLB flushing routines but not on the
> re-enabling path. Here we just added dsb() in the relevant functions
> that were called from the generic code (flush_cache_vmap,
> update_mmu_cache).
> 
> A quick grep through the kernel shows that we have other set_pte() calls
> without additional dsb() like create_mapping(), I think kvm_set_pte() as
> well.
> 
> So I'm proposing an alternative patch (which needs some benchmarking as
> well to see if anything is affected, maybe application startup time).

I'm happy for any fix which can be included in 3.16.

But is the dsb(ishst) sufficient? We need to also prevent reads from
overtaking the set_pte(). i.e.:

ptr = early_ioremap(phys_addr, size);
if (ptr && strcmp(ptr, "magic") == 0)
   ...

Does it not require a dsb(ish)?
 
> ------------------8<-------------------------------
> 
> From b5cd0fff5cb044fa32cbaa4eebd2f00690922567 Mon Sep 17 00:00:00 2001
> From: Catalin Marinas <catalin.marinas@arm.com>
> Date: Mon, 9 Jun 2014 11:55:03 +0100
> Subject: [PATCH] arm64: Use DSB after page table update
> 
> As an optimisation, set_pte() does not contain a DSB instruction to
> ensure the observability of the page table update before subsequent
> memory accesses or TLB maintenance. Such a barrier is placed in the
> flush_tlb_*() functions and flush_cache_vmap()/update_mmu_cache().
> However, there are still places where this barrier is missing, such as
> create_mapping().
> 
> This patch adds a dsb(ishst) call in set_pte() but only when the entry
> being written is valid. For invalid entries, there is always a
> subsequent TLB maintenance containing the DSB.
> 
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/include/asm/cacheflush.h | 11 +----------
>  arch/arm64/include/asm/pgtable.h    |  8 ++++++++
>  arch/arm64/include/asm/tlbflush.h   |  5 -----
>  3 files changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
> index a5176cf32dad..8fdf37d83014 100644
> --- a/arch/arm64/include/asm/cacheflush.h
> +++ b/arch/arm64/include/asm/cacheflush.h
> @@ -138,19 +138,10 @@ static inline void __flush_icache_all(void)
>  #define flush_icache_page(vma,page)	do { } while (0)
>  
>  /*
> - * flush_cache_vmap() is used when creating mappings (eg, via vmap,
> - * vmalloc, ioremap etc) in kernel space for pages.  On non-VIPT
> - * caches, since the direct-mappings of these pages may contain cached
> - * data, we need to do a full cache flush to ensure that writebacks
> - * don't corrupt data placed into these pages via the new mappings.
> + * Not required on AArch64 with VIPT caches.
>   */
>  static inline void flush_cache_vmap(unsigned long start, unsigned long end)
>  {
> -	/*
> -	 * set_pte_at() called from vmap_pte_range() does not
> -	 * have a DSB after cleaning the cache line.
> -	 */
> -	dsb(ish);
>  }
>  
>  static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index aa150ed99f22..2d03af6a2225 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -136,6 +136,7 @@ extern struct page *empty_zero_page;
>  #define pte_write(pte)		(!!(pte_val(pte) & PTE_WRITE))
>  #define pte_exec(pte)		(!(pte_val(pte) & PTE_UXN))
>  
> +#define pte_valid(pte)		(!!(pte_val(pte) & PTE_VALID))
>  #define pte_valid_user(pte) \
>  	((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
>  
> @@ -184,6 +185,13 @@ static inline pte_t pte_mkspecial(pte_t pte)
>  static inline void set_pte(pte_t *ptep, pte_t pte)
>  {
>  	*ptep = pte;
> +
> +	/*
> +	 * If !pte_valid(pte), the DSB is handled by the TLB invalidation
> +	 * operation.
> +	 */
> +	if (pte_valid(pte))
> +		dsb(ishst);
>  }
>  
>  extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
> diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
> index b9349c4513ea..b28b970123dc 100644
> --- a/arch/arm64/include/asm/tlbflush.h
> +++ b/arch/arm64/include/asm/tlbflush.h
> @@ -130,11 +130,6 @@ static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
>  static inline void update_mmu_cache(struct vm_area_struct *vma,
>  				    unsigned long addr, pte_t *ptep)
>  {
> -	/*
> -	 * set_pte() does not have a DSB, so make sure that the page table
> -	 * write is visible.
> -	 */
> -	dsb(ishst);
>  }
>  
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
Catalin Marinas June 9, 2014, 1:38 p.m. UTC | #2
On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote:
> On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > A quick grep through the kernel shows that we have other set_pte() calls
> > without additional dsb() like create_mapping(), I think kvm_set_pte() as
> > well.
> > 
> > So I'm proposing an alternative patch (which needs some benchmarking as
> > well to see if anything is affected, maybe application startup time).
> 
> I'm happy for any fix which can be included in 3.16.

Steve Capper made a point about performance. He'll follow up.

> But is the dsb(ishst) sufficient? We need to also prevent reads from
> overtaking the set_pte(). i.e.:
> 
> ptr = early_ioremap(phys_addr, size);
> if (ptr && strcmp(ptr, "magic") == 0)
>    ...
> 
> Does it not require a dsb(ish)?

So doesn't early_ioremap() now include a dsb() after set_pte() with my
patch?

BTW, according to the ARM ARM (and confirmed with architects), we need
DSB+ISB even if we have just a data access (rather than an instruction
fetch). We have to revisit both the 32-bit and 64-bit code for this.
Steve Capper June 9, 2014, 4:40 p.m. UTC | #3
On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote:
> On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote:
> > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > > A quick grep through the kernel shows that we have other set_pte() calls
> > > without additional dsb() like create_mapping(), I think kvm_set_pte() as
> > > well.
> > > 
> > > So I'm proposing an alternative patch (which needs some benchmarking as
> > > well to see if anything is affected, maybe application startup time).
> > 
> > I'm happy for any fix which can be included in 3.16.
> 
> Steve Capper made a point about performance. He'll follow up.

Hi :-).
My concern was initially about splitting THPs, as with a 64K granule,
we will have 2048 calls to set_pte_at in a loop. Having thought about
it, these should be relatively rare events though.

There will be an impact on various aspects of the memory system (for
instance fork), but this will need to be measured.

Cheers,
Catalin Marinas June 10, 2014, 10:39 a.m. UTC | #4
On Mon, Jun 09, 2014 at 05:40:05PM +0100, Steve Capper wrote:
> On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote:
> > On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote:
> > > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > > > A quick grep through the kernel shows that we have other set_pte() calls
> > > > without additional dsb() like create_mapping(), I think kvm_set_pte() as
> > > > well.
> > > > 
> > > > So I'm proposing an alternative patch (which needs some benchmarking as
> > > > well to see if anything is affected, maybe application startup time).
> > > 
> > > I'm happy for any fix which can be included in 3.16.
> > 
> > Steve Capper made a point about performance. He'll follow up.
> 
> My concern was initially about splitting THPs, as with a 64K granule,
> we will have 2048 calls to set_pte_at in a loop. Having thought about
> it, these should be relatively rare events though.

Are there any THP benchmarks?

> There will be an impact on various aspects of the memory system (for
> instance fork), but this will need to be measured.

There is, but I don't think we have an easy way around it. Grep'ing the
kernel for set_pte_at() shows several places where it is called without a
corresponding DSB (currently only provided as part of flush_cache_vmap
and update_mmu_cache).

For example, insert_page(), though insert_pfn() calls update_mmu_cache()
and it even has a comment "why not for insert page?".

So we need to run some benchmarks to assess the impact. If significant,
we can look at alternatives (in addition to the above, possibly adding
barriers to arch_leave_lazy_mmu_mode).
Will Deacon June 16, 2014, 2:12 p.m. UTC | #5
On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote:
> On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > So I'm proposing an alternative patch (which needs some benchmarking as
> > well to see if anything is affected, maybe application startup time).

I don't like the proposed patch at all -- keeping the dsb out of set_pte is
worthwhile if we can manage it. That said, it would be interesting to know
how often we get a subsequent page fault after mapping invalid -> valid
because of the missing dsb. It could be that the cost of the benign fault is
hitting us more than we think.

> I'm happy for any fix which can be included in 3.16.
> 
> But is the dsb(ishst) sufficient? We need to also prevent reads from
> overtaking the set_pte(). i.e.:
> 
> ptr = early_ioremap(phys_addr, size);
> if (ptr && strcmp(ptr, "magic") == 0)
>    ...
> 
> Does it not require a dsb(ish)?

I don't think so. Crudely, the sequence above would look like:

  STR	x0, [PTEP]
  DSB	ISHST
  LDR	x0, [MAGIC]

The DSB can't complete until the STR is globally observed within the
inner-shareable domain, so the LDR cannot execute until the page table
update is visible to the walker.

If it was a DMB, we'd have a problem. Interestingly, the asm-generic
page table allocators (e.g. __pmd_alloc) *do* use a dmb, via smp_wmb,
to order the observability of page-table updates. I'm struggling to
decide whether that's broken or not.

Will
Will Deacon June 16, 2014, 2:17 p.m. UTC | #6
On Mon, Jun 09, 2014 at 02:38:59PM +0100, Catalin Marinas wrote:
> On Mon, Jun 09, 2014 at 02:24:29PM +0100, Leif Lindholm wrote:
> > On Mon, Jun 09, 2014 at 12:03:56PM +0100, Catalin Marinas wrote:
> > > A quick grep through the kernel shows that we have other set_pte() calls
> > > without additional dsb() like create_mapping(), I think kvm_set_pte() as
> > > well.
> > > 
> > > So I'm proposing an alternative patch (which needs some benchmarking as
> > > well to see if anything is affected, maybe application startup time).
> > 
> > I'm happy for any fix which can be included in 3.16.
> 
> Steve Capper made a point about performance. He'll follow up.
> 
> > But is the dsb(ishst) sufficient? We need to also prevent reads from
> > overtaking the set_pte(). i.e.:
> > 
> > ptr = early_ioremap(phys_addr, size);
> > if (ptr && strcmp(ptr, "magic") == 0)
> >    ...
> > 
> > Does it not require a dsb(ish)?
> 
> So doesn't early_ioremap() now include a dsb() after set_pte() with my
> patch?
> 
> BTW, according to the ARM ARM (and confirmed with architects), we need
> DSB+ISB even if we have just a data access (rather than an instruction
> fetch). We have to revisit both the 32-bit and 64-bit code for this.

The impact of this really depends on whether or not we need to sandwich
DSB+ISB between every update to a set of page tables (possibly plumbing
together different levels) or whether the ISB can just come at the end.

If the latter case is true, we just need to add something for kernel
mappings, which should be pretty low overhead.

Will

Patch

diff --git a/arch/arm64/include/asm/cacheflush.h b/arch/arm64/include/asm/cacheflush.h
index a5176cf32dad..8fdf37d83014 100644
--- a/arch/arm64/include/asm/cacheflush.h
+++ b/arch/arm64/include/asm/cacheflush.h
@@ -138,19 +138,10 @@  static inline void __flush_icache_all(void)
 #define flush_icache_page(vma,page)	do { } while (0)
 
 /*
- * flush_cache_vmap() is used when creating mappings (eg, via vmap,
- * vmalloc, ioremap etc) in kernel space for pages.  On non-VIPT
- * caches, since the direct-mappings of these pages may contain cached
- * data, we need to do a full cache flush to ensure that writebacks
- * don't corrupt data placed into these pages via the new mappings.
+ * Not required on AArch64 with VIPT caches.
  */
 static inline void flush_cache_vmap(unsigned long start, unsigned long end)
 {
-	/*
-	 * set_pte_at() called from vmap_pte_range() does not
-	 * have a DSB after cleaning the cache line.
-	 */
-	dsb(ish);
 }
 
 static inline void flush_cache_vunmap(unsigned long start, unsigned long end)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index aa150ed99f22..2d03af6a2225 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -136,6 +136,7 @@  extern struct page *empty_zero_page;
 #define pte_write(pte)		(!!(pte_val(pte) & PTE_WRITE))
 #define pte_exec(pte)		(!(pte_val(pte) & PTE_UXN))
 
+#define pte_valid(pte)		(!!(pte_val(pte) & PTE_VALID))
 #define pte_valid_user(pte) \
 	((pte_val(pte) & (PTE_VALID | PTE_USER)) == (PTE_VALID | PTE_USER))
 
@@ -184,6 +185,13 @@  static inline pte_t pte_mkspecial(pte_t pte)
 static inline void set_pte(pte_t *ptep, pte_t pte)
 {
 	*ptep = pte;
+
+	/*
+	 * If !pte_valid(pte), the DSB is handled by the TLB invalidation
+	 * operation.
+	 */
+	if (pte_valid(pte))
+		dsb(ishst);
 }
 
 extern void __sync_icache_dcache(pte_t pteval, unsigned long addr);
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index b9349c4513ea..b28b970123dc 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -130,11 +130,6 @@  static inline void flush_tlb_kernel_range(unsigned long start, unsigned long end
 static inline void update_mmu_cache(struct vm_area_struct *vma,
 				    unsigned long addr, pte_t *ptep)
 {
-	/*
-	 * set_pte() does not have a DSB, so make sure that the page table
-	 * write is visible.
-	 */
-	dsb(ishst);
 }
 
 #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)