[RFC,03/11] arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()

Message ID 1535125966-7666-4-git-send-email-will.deacon@arm.com
State Superseded
Series Avoid synchronous TLB invalidation for intermediate page-table entries on arm64

Commit Message

Will Deacon Aug. 24, 2018, 3:52 p.m. UTC
Now that our walk-cache invalidation routines imply a DSB before the
invalidation, we no longer need one when we are clearing an entry during
unmap.
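
For reference, the walk-cache invalidation routines in question now look
roughly like the sketch below. The helper name and the exact TLBI encoding
are illustrative only (reproduced from memory, not taken from this patch);
the relevant point is the leading DSB, which publishes prior page-table
stores before the walk-cache entry is invalidated:

	static inline void __flush_tlb_pgtable(struct mm_struct *mm,
					       unsigned long uaddr)
	{
		/* Assumed encoding: VA page number plus the ASID in the top bits */
		unsigned long addr = uaddr >> 12 | (ASID(mm) << 48);

		dsb(ishst);			/* publish prior pXd stores   */
		__tlbi(vae1is, addr);		/* invalidate walk-cache entry */
		__tlbi_user(vae1is, addr);
		dsb(ish);			/* wait for the invalidation   */
	}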

Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/include/asm/pgtable.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

-- 
2.1.4

Comments

Linus Torvalds Aug. 24, 2018, 4:15 p.m. UTC | #1
On Fri, Aug 24, 2018 at 8:52 AM Will Deacon <will.deacon@arm.com> wrote:
>
> Now that our walk-cache invalidation routines imply a DSB before the
> invalidation, we no longer need one when we are clearing an entry during
> unmap.

Do you really still need it when *setting* it?

I'm wondering if you could just remove the thing unconditionally.

Why would you need a barrier for another CPU for a mapping that is
just being created? It's ok if they see the old lack of mapping until
they are told about it, and that eventual "being told about it" must
involve a data transfer already.

And I'm assuming arm doesn't cache negative page table entries, so
there's no issue with any stale tlb.

And any other kernel thread looking at the page tables will have to
honor the page table locking, so you don't need it for some direct
page table lookup either.

Hmm? It seems like you shouldn't need to order the "set page directory
entry" with anything.

But maybe there's some magic arm64 rule I'm not aware of. Maybe even
the local TLB hardware walker isn't coherent with local stores?

                Linus
Will Deacon Aug. 28, 2018, 12:49 p.m. UTC | #2
Hi Linus,

On Fri, Aug 24, 2018 at 09:15:17AM -0700, Linus Torvalds wrote:
> On Fri, Aug 24, 2018 at 8:52 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > Now that our walk-cache invalidation routines imply a DSB before the
> > invalidation, we no longer need one when we are clearing an entry during
> > unmap.
>
> Do you really still need it when *setting* it?
>
> I'm wondering if you could just remove the thing unconditionally.
>
> Why would you need a barrier for another CPU for a mapping that is
> just being created? It's ok if they see the old lack of mapping until
> they are told about it, and that eventual "being told about it" must
> involve a data transfer already.
>
> And I'm assuming arm doesn't cache negative page table entries, so
> there's no issue with any stale tlb.
>
> And any other kernel thread looking at the page tables will have to
> honor the page table locking, so you don't need it for some direct
> page table lookup either.
>
> Hmm? It seems like you shouldn't need to order the "set page directory
> entry" with anything.
>
> But maybe there's some magic arm64 rule I'm not aware of. Maybe even
> the local TLB hardware walker isn't coherent with local stores?

Yup, you got it: it's not about ordering accesses against other CPUs. The
page-table walker is treated as a separate observer by the architecture, so
we need the DSB to push the store out to the page table where the walker
can see it (practically speaking, the walker isn't guaranteed to snoop the
store buffer).
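
In other words, publishing a kernel mapping to the walker needs the pattern
below, which is exactly what the hunk in this patch keeps for valid entries:

	WRITE_ONCE(*pmdp, pmd);	/* may linger in this CPU's store buffer */
	dsb(ishst);		/* make the store visible to the walker  */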

For PTEs mapping user addresses, we actually don't bother with the DSB
when writing a valid entry because it's extremely unlikely that we'd get
back to userspace with the entry sitting in the store buffer. If that
*did* happen, we'd just take the fault a second time. However, if we played
that same trick for pXds, I think that:

	(a) We'd need to distinguish between user and kernel mappings
	    in set_pXd(), since we can't tolerate spurious faults on
	    kernel addresses.
	(b) We'd need to be careful about allocating page-table pages,
	    so that e.g. the walker sees zeroes for a new pgtable.

We could probably achieve (a) with a software bit, and (b) is a non-issue
because mm/memory.c uses smp_wmb(), which is always a DMB for us: that
enforces the eventual ordering but doesn't necessarily publish the stores
immediately.
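
For comparison, the user-PTE trick described above looks roughly like this
today (reproduced from memory; exact comment wording may differ), while a
pXd-level equivalent distinguishing kernel mappings via a software bit, as
in (a), remains hypothetical:

	#define pte_valid_not_user(pte) \
		((pte_val(pte) & (PTE_VALID | PTE_USER)) == PTE_VALID)

	static inline void set_pte(pte_t *ptep, pte_t pte)
	{
		WRITE_ONCE(*ptep, pte);

		/*
		 * Only publish valid kernel entries eagerly. A valid user
		 * entry left in the store buffer at worst causes a second,
		 * harmless fault.
		 */
		if (pte_valid_not_user(pte))
			dsb(ishst);
	}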

Will

Patch

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 1bdeca8918a6..2ab2031b778c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -360,6 +360,7 @@ static inline int pmd_protnone(pmd_t pmd)
 #define pmd_present(pmd)	pte_present(pmd_pte(pmd))
 #define pmd_dirty(pmd)		pte_dirty(pmd_pte(pmd))
 #define pmd_young(pmd)		pte_young(pmd_pte(pmd))
+#define pmd_valid(pmd)		pte_valid(pmd_pte(pmd))
 #define pmd_wrprotect(pmd)	pte_pmd(pte_wrprotect(pmd_pte(pmd)))
 #define pmd_mkold(pmd)		pte_pmd(pte_mkold(pmd_pte(pmd)))
 #define pmd_mkwrite(pmd)	pte_pmd(pte_mkwrite(pmd_pte(pmd)))
@@ -431,7 +432,9 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
 	WRITE_ONCE(*pmdp, pmd);
-	dsb(ishst);
+
+	if (pmd_valid(pmd))
+		dsb(ishst);
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
@@ -477,11 +480,14 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & PUD_TABLE_BIT))
 #define pud_present(pud)	pte_present(pud_pte(pud))
+#define pud_valid(pud)		pte_valid(pud_pte(pud))
 
 static inline void set_pud(pud_t *pudp, pud_t pud)
 {
 	WRITE_ONCE(*pudp, pud);
-	dsb(ishst);
+
+	if (pud_valid(pud))
+		dsb(ishst);
 }
 
 static inline void pud_clear(pud_t *pudp)