Message ID: 1535645747-9823-1-git-send-email-will.deacon@arm.com
Series: Avoid synchronous TLB invalidation for intermediate page-table entries on arm64
On Thu, Aug 30, 2018 at 9:15 AM Will Deacon <will.deacon@arm.com> wrote:
>
> It's also had a lot more testing, but has held up nicely so far on arm64.
> I haven't figured out how to merge this yet, but I'll probably end up pulling
> the core changes out onto a separate branch.

This looks fine, and I'm actually ok getting the core changes through
the arm64 branch, since this has been discussed across architectures,
and I think "whoever does the work gets to drive the car".

After all, a lot of the core changes originally came from x86 people
(pretty much all of it, historically). No reason why arm64 can't get
some of that too.

But with the glory comes the blame when something breaks ;)

               Linus
On Thu, Aug 30, 2018 at 05:15:34PM +0100, Will Deacon wrote:
> Peter Zijlstra (1):
>       asm-generic/tlb: Track freeing of page-table directories in struct
>         mmu_gather
>
> Will Deacon (11):
>       arm64: tlb: Use last-level invalidation in flush_tlb_kernel_range()
>       arm64: tlb: Add DSB ISHST prior to TLBI in
>         __flush_tlb_[kernel_]pgtable()
>       arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()
>       arm64: tlb: Justify non-leaf invalidation in flush_tlb_range()
>       arm64: tlbflush: Allow stride to be specified for __flush_tlb_range()
>       arm64: tlb: Remove redundant !CONFIG_HAVE_RCU_TABLE_FREE code
>       asm-generic/tlb: Guard with #ifdef CONFIG_MMU
>       asm-generic/tlb: Track which levels of the page tables have been
>         cleared
>       arm64: tlb: Adjust stride and type of TLBI according to mmu_gather
>       arm64: tlb: Avoid synchronous TLBIs when freeing page tables
>       arm64: tlb: Rewrite stale comment in asm/tlbflush.h
>
>  arch/arm64/Kconfig                |   1 +
>  arch/arm64/include/asm/pgtable.h  |  10 +++-
>  arch/arm64/include/asm/tlb.h      |  34 +++++-------
>  arch/arm64/include/asm/tlbflush.h | 112 ++++++++++++++++++++++++--------
>  include/asm-generic/tlb.h         |  85 +++++++++++++++++++++++++----
>  mm/memory.c                       |   4 +-
>  6 files changed, 168 insertions(+), 78 deletions(-)

These patches look good to me, thanks!

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
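[The central trick of the series shows up in the diffstat above: __flush_tlb_range() learns the granule ("stride") of the mappings being torn down, so a huge-page range takes one TLBI per huge page rather than one per 4KiB page, and leaf-only invalidation is used when no page-table pages were freed. A rough sketch of the idea only, not code from the series; tlbi_va() and tlbi_leaf_va() are hypothetical stand-ins for arm64's __tlbi(vae1is, ...) and __tlbi(vale1is, ...) wrappers:

	/* Hypothetical: invalidate entry plus page-table walk caches. */
	static void tlbi_va(unsigned long addr) { }
	/* Hypothetical: invalidate the last-level (leaf) entry only. */
	static void tlbi_leaf_va(unsigned long addr) { }

	static void flush_range_sketch(unsigned long start, unsigned long end,
				       unsigned long stride, bool last_level)
	{
		unsigned long addr;

		/* Step by the mapping granule, not unconditionally by PAGE_SIZE. */
		for (addr = start; addr < end; addr += stride) {
			if (last_level)
				tlbi_leaf_va(addr);
			else
				tlbi_va(addr);
		}
	}
]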
On Thu, 30 Aug 2018 09:39:38 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, Aug 30, 2018 at 9:15 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > It's also had a lot more testing, but has held up nicely so far on arm64.
> > I haven't figured out how to merge this yet, but I'll probably end up pulling
> > the core changes out onto a separate branch.
>
> This looks fine, and I'm actually ok getting the core changes through
> the arm64 branch, since this has been discussed across architectures,
> and I think "whoever does the work gets to drive the car".

Well it would help if powerpc, say, wanted to start using them without a
merge cycle lag. Not a huge issue because powerpc already does
reasonably well here and there's other work that can be done.

I will try to review the core changes carefully next week.

Thanks,
Nick
On Thu, Aug 30, 2018 at 6:01 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Well it would help if powerpc, say, wanted to start using them without a
> merge cycle lag. Not a huge issue because powerpc already does
> reasonably well here and there's other work that can be done.

Sure. If somebody wants to send the generic changes I can just take
them directly to make it easier for people to work on this.

               Linus
On Thu, Aug 30, 2018 at 06:04:12PM -0700, Linus Torvalds wrote:
> On Thu, Aug 30, 2018 at 6:01 PM Nicholas Piggin <npiggin@gmail.com> wrote:
> >
> > Well it would help if powerpc, say, wanted to start using them without a
> > merge cycle lag. Not a huge issue because powerpc already does
> > reasonably well here and there's other work that can be done.
>
> Sure. If somebody wants to send the generic changes I can just take
> them directly to make it easier for people to work on this.

Tell you what: how about I stick the following patches (with Nick's and
Peter's acks) on a separate, stable branch:

  asm-generic/tlb: Track which levels of the page tables have been cleared
  asm-generic/tlb: Track freeing of page-table directories in struct mmu_gather
  asm-generic/tlb: Guard with #ifdef CONFIG_MMU

and then anybody who needs them can just pull that in for the merge
window?

Also, how would people feel about adding a MAINTAINERS entry for all the
tlb.h files? A big part of the recent "fun" was us figuring out what the
code is actually doing ("It used to do foo() but that may have changed"),
and it certainly took me the best part of a day to figure things out
again. If we're trying to do more in the generic code and less in the
arch code, it would help if we're on top of the changes in this area.

Proposal below (omitted Linus because that seems to be the pattern
elsewhere in the file and he's not going to shout at himself when things
break :). Anybody I've missed?

Will

--->8

diff --git a/MAINTAINERS b/MAINTAINERS
index a5b256b25905..7224b5618883 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9681,6 +9681,15 @@ S:	Maintained
 F:	arch/arm/boot/dts/mmp*
 F:	arch/arm/mach-mmp/
 
+MMU GATHER AND TLB INVALIDATION
+M:	Will Deacon <will.deacon@arm.com>
+M:	Nick Piggin <npiggin@gmail.com>
+M:	Peter Zijlstra <peterz@infradead.org>
+L:	linux-arch@vger.kernel.org
+S:	Maintained
+F:	include/asm-generic/tlb.h
+F:	arch/*/include/asm/tlb.h
+
 MN88472 MEDIA DRIVER
 M:	Antti Palosaari <crope@iki.fi>
 L:	linux-media@vger.kernel.org
On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:
> Proposal below (omitted Linus because that seems to be the pattern elsewhere
> in the file and he's not going to shout at himself when things break :)
> Anybody I've missed?
> 
> Will
> 
> --->8
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a5b256b25905..7224b5618883 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -9681,6 +9681,15 @@ S:	Maintained
>  F:	arch/arm/boot/dts/mmp*
>  F:	arch/arm/mach-mmp/
>  
> +MMU GATHER AND TLB INVALIDATION
> +M:	Will Deacon <will.deacon@arm.com>
> +M:	Nick Piggin <npiggin@gmail.com>
> +M:	Peter Zijlstra <peterz@infradead.org>
> +L:	linux-arch@vger.kernel.org
> +S:	Maintained
> +F:	include/asm-generic/tlb.h
> +F:	arch/*/include/asm/tlb.h
> +
>  MN88472 MEDIA DRIVER
>  M:	Antti Palosaari <crope@iki.fi>
>  L:	linux-media@vger.kernel.org

If we're going to do that (and I'm not opposed); it might make sense to
do something like the below and add:

  F: mm/mmu_gather.c

---
 b/mm/mmu_gather.c         | 250 ++++++++++++++++++++++++++++++++++++++++++++++
 include/asm-generic/tlb.h |   2 
 mm/Makefile               |   2 
 mm/memory.c               | 247 ---------------------------------------------
 4 files changed, 253 insertions(+), 248 deletions(-)

--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -149,6 +149,8 @@ static inline void tlb_flush_mmu_tlbonly
 	__tlb_reset_range(tlb);
 }
 
+extern void tlb_flush_mmu_free(struct mmu_gather *tlb);
+
 static inline void tlb_remove_page_size(struct mmu_gather *tlb,
 					struct page *page, int page_size)
 {
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -22,7 +22,7 @@ KCOV_INSTRUMENT_mmzone.o := n
 KCOV_INSTRUMENT_vmstat.o := n
 
 mmu-y			:= nommu.o
-mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mincore.o \
+mmu-$(CONFIG_MMU)	:= gup.o highmem.o memory.o mmu_gather.o mincore.o \
 			   mlock.o mmap.o mprotect.o mremap.o msync.o \
 			   page_vma_mapped.o pagewalk.o pgtable-generic.o \
 			   rmap.o vmalloc.o
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -186,253 +186,6 @@ static void check_sync_rss_stat(struct t
 
 #endif /* SPLIT_RSS_COUNTING */
 
-#ifdef HAVE_GENERIC_MMU_GATHER
-
-static bool tlb_next_batch(struct mmu_gather *tlb)
-{
-	struct mmu_gather_batch *batch;
-
-	batch = tlb->active;
-	if (batch->next) {
-		tlb->active = batch->next;
-		return true;
-	}
-
-	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
-		return false;
-
-	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-	if (!batch)
-		return false;
-
-	tlb->batch_count++;
-	batch->next = NULL;
-	batch->nr = 0;
-	batch->max = MAX_GATHER_BATCH;
-
-	tlb->active->next = batch;
-	tlb->active = batch;
-
-	return true;
-}
-
-void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-
-	/* Is it from 0 to ~0? */
-	tlb->fullmm = !(start | (end+1));
-	tlb->need_flush_all = 0;
-	tlb->local.next = NULL;
-	tlb->local.nr = 0;
-	tlb->local.max = ARRAY_SIZE(tlb->__pages);
-	tlb->active = &tlb->local;
-	tlb->batch_count = 0;
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
-#endif
-	tlb->page_size = 0;
-
-	__tlb_reset_range(tlb);
-}
-
-static void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	struct mmu_gather_batch *batch;
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
-		free_pages_and_swap_cache(batch->pages, batch->nr);
-		batch->nr = 0;
-	}
-	tlb->active = &tlb->local;
-}
-
-void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-/* tlb_finish_mmu
- *	Called at the end of the shootdown operation to free up any resources
- *	that were required.
- */
-void arch_tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end, bool force)
-{
-	struct mmu_gather_batch *batch, *next;
-
-	if (force)
-		__tlb_adjust_range(tlb, start, end - start);
-
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	for (batch = tlb->local.next; batch; batch = next) {
-		next = batch->next;
-		free_pages((unsigned long)batch, 0);
-	}
-	tlb->local.next = NULL;
-}
-
-/* __tlb_remove_page
- *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
- *	handling the additional races in SMP caused by other CPUs caching valid
- *	mappings in their TLBs. Returns the number of free page slots left.
- *	When out of page slots we must call tlb_flush_mmu().
- *returns true if the caller should flush.
- */
-bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
-{
-	struct mmu_gather_batch *batch;
-
-	VM_BUG_ON(!tlb->end);
-	VM_WARN_ON(tlb->page_size != page_size);
-
-	batch = tlb->active;
-	/*
-	 * Add the page and check if we are full. If so
-	 * force a flush.
-	 */
-	batch->pages[batch->nr++] = page;
-	if (batch->nr == batch->max) {
-		if (!tlb_next_batch(tlb))
-			return true;
-		batch = tlb->active;
-	}
-	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
-
-	return false;
-}
-
-#endif /* HAVE_GENERIC_MMU_GATHER */
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-
-/*
- * See the comment near struct mmu_table_batch.
- */
-
-/*
- * If we want tlb_remove_table() to imply TLB invalidates.
- */
-static inline void tlb_table_invalidate(struct mmu_gather *tlb)
-{
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
-	/*
-	 * Invalidate page-table caches used by hardware walkers. Then we still
-	 * need to RCU-sched wait while freeing the pages because software
-	 * walkers can still be in-flight.
-	 */
-	tlb_flush_mmu_tlbonly(tlb);
-#endif
-}
-
-static void tlb_remove_table_smp_sync(void *arg)
-{
-	/* Simply deliver the interrupt */
-}
-
-static void tlb_remove_table_one(void *table)
-{
-	/*
-	 * This isn't an RCU grace period and hence the page-tables cannot be
-	 * assumed to be actually RCU-freed.
-	 *
-	 * It is however sufficient for software page-table walkers that rely on
-	 * IRQ disabling. See the comment near struct mmu_table_batch.
-	 */
-	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
-	__tlb_remove_table(table);
-}
-
-static void tlb_remove_table_rcu(struct rcu_head *head)
-{
-	struct mmu_table_batch *batch;
-	int i;
-
-	batch = container_of(head, struct mmu_table_batch, rcu);
-
-	for (i = 0; i < batch->nr; i++)
-		__tlb_remove_table(batch->tables[i]);
-
-	free_page((unsigned long)batch);
-}
-
-void tlb_table_flush(struct mmu_gather *tlb)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch) {
-		tlb_table_invalidate(tlb);
-		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
-		*batch = NULL;
-	}
-}
-
-void tlb_remove_table(struct mmu_gather *tlb, void *table)
-{
-	struct mmu_table_batch **batch = &tlb->batch;
-
-	if (*batch == NULL) {
-		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
-		if (*batch == NULL) {
-			tlb_table_invalidate(tlb);
-			tlb_remove_table_one(table);
-			return;
-		}
-		(*batch)->nr = 0;
-	}
-
-	(*batch)->tables[(*batch)->nr++] = table;
-	if ((*batch)->nr == MAX_TABLE_BATCH)
-		tlb_table_flush(tlb);
-}
-
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
-
-/**
- * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
- * @tlb: the mmu_gather structure to initialize
- * @mm: the mm_struct of the target address space
- * @start: start of the region that will be removed from the page-table
- * @end: end of the region that will be removed from the page-table
- *
- * Called to initialize an (on-stack) mmu_gather structure for page-table
- * tear-down from @mm. The @start and @end are set to 0 and -1
- * respectively when @mm is without users and we're going to destroy
- * the full address space (exit/execve).
- */
-void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	arch_tlb_gather_mmu(tlb, mm, start, end);
-	inc_tlb_flush_pending(tlb->mm);
-}
-
-void tlb_finish_mmu(struct mmu_gather *tlb,
-		unsigned long start, unsigned long end)
-{
-	/*
-	 * If there are parallel threads are doing PTE changes on same range
-	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
-	 * flush by batching, a thread has stable TLB entry can fail to flush
-	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
-	 * forcefully if we detect parallel PTE batching threads.
-	 */
-	bool force = mm_tlb_flush_nested(tlb->mm);
-
-	arch_tlb_finish_mmu(tlb, start, end, force);
-	dec_tlb_flush_pending(tlb->mm);
-}
-
 /*
  * Note: this doesn't free the actual pages themselves. That
  * has been handled earlier when unmapping all the memory regions.
--- /dev/null
+++ b/mm/mmu_gather.c
@@ -0,0 +1,250 @@
+#include <linux/smp.h>
+#include "asm/tlb.h"
+
+#ifdef HAVE_GENERIC_MMU_GATHER
+
+static bool tlb_next_batch(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
+
+	batch = tlb->active;
+	if (batch->next) {
+		tlb->active = batch->next;
+		return true;
+	}
+
+	if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
+		return false;
+
+	batch = (void *)__get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
+	if (!batch)
+		return false;
+
+	tlb->batch_count++;
+	batch->next = NULL;
+	batch->nr = 0;
+	batch->max = MAX_GATHER_BATCH;
+
+	tlb->active->next = batch;
+	tlb->active = batch;
+
+	return true;
+}
+
+void arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	tlb->mm = mm;
+
+	/* Is it from 0 to ~0? */
+	tlb->fullmm = !(start | (end+1));
+	tlb->need_flush_all = 0;
+	tlb->local.next = NULL;
+	tlb->local.nr = 0;
+	tlb->local.max = ARRAY_SIZE(tlb->__pages);
+	tlb->active = &tlb->local;
+	tlb->batch_count = 0;
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb->batch = NULL;
+#endif
+	tlb->page_size = 0;
+
+	__tlb_reset_range(tlb);
+}
+
+void tlb_flush_mmu_free(struct mmu_gather *tlb)
+{
+	struct mmu_gather_batch *batch;
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+	tlb_table_flush(tlb);
+#endif
+	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
+		free_pages_and_swap_cache(batch->pages, batch->nr);
+		batch->nr = 0;
+	}
+	tlb->active = &tlb->local;
+}
+
+void tlb_flush_mmu(struct mmu_gather *tlb)
+{
+	tlb_flush_mmu_tlbonly(tlb);
+	tlb_flush_mmu_free(tlb);
+}
+
+/* tlb_finish_mmu
+ *	Called at the end of the shootdown operation to free up any resources
+ *	that were required.
+ */
+void arch_tlb_finish_mmu(struct mmu_gather *tlb,
+		unsigned long start, unsigned long end, bool force)
+{
+	struct mmu_gather_batch *batch, *next;
+
+	if (force)
+		__tlb_adjust_range(tlb, start, end - start);
+
+	tlb_flush_mmu(tlb);
+
+	/* keep the page table cache within bounds */
+	check_pgt_cache();
+
+	for (batch = tlb->local.next; batch; batch = next) {
+		next = batch->next;
+		free_pages((unsigned long)batch, 0);
+	}
+	tlb->local.next = NULL;
+}
+
+/* __tlb_remove_page
+ *	Must perform the equivalent to __free_pte(pte_get_and_clear(ptep)), while
+ *	handling the additional races in SMP caused by other CPUs caching valid
+ *	mappings in their TLBs. Returns the number of free page slots left.
+ *	When out of page slots we must call tlb_flush_mmu().
+ *returns true if the caller should flush.
+ */
+bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page, int page_size)
+{
+	struct mmu_gather_batch *batch;
+
+	VM_BUG_ON(!tlb->end);
+	VM_WARN_ON(tlb->page_size != page_size);
+
+	batch = tlb->active;
+	/*
+	 * Add the page and check if we are full. If so
+	 * force a flush.
+	 */
+	batch->pages[batch->nr++] = page;
+	if (batch->nr == batch->max) {
+		if (!tlb_next_batch(tlb))
+			return true;
+		batch = tlb->active;
+	}
+	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
+
+	return false;
+}
+
+#endif /* HAVE_GENERIC_MMU_GATHER */
+
+#ifdef CONFIG_HAVE_RCU_TABLE_FREE
+
+/*
+ * See the comment near struct mmu_table_batch.
+ */
+
+/*
+ * If we want tlb_remove_table() to imply TLB invalidates.
+ */
+static inline void tlb_table_invalidate(struct mmu_gather *tlb)
+{
+#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+	/*
+	 * Invalidate page-table caches used by hardware walkers. Then we still
+	 * need to RCU-sched wait while freeing the pages because software
+	 * walkers can still be in-flight.
+	 */
+	tlb_flush_mmu_tlbonly(tlb);
+#endif
+}
+
+static void tlb_remove_table_smp_sync(void *arg)
+{
+	/* Simply deliver the interrupt */
+}
+
+static void tlb_remove_table_one(void *table)
+{
+	/*
+	 * This isn't an RCU grace period and hence the page-tables cannot be
+	 * assumed to be actually RCU-freed.
+	 *
+	 * It is however sufficient for software page-table walkers that rely on
+	 * IRQ disabling. See the comment near struct mmu_table_batch.
+	 */
+	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
+	__tlb_remove_table(table);
+}
+
+static void tlb_remove_table_rcu(struct rcu_head *head)
+{
+	struct mmu_table_batch *batch;
+	int i;
+
+	batch = container_of(head, struct mmu_table_batch, rcu);
+
+	for (i = 0; i < batch->nr; i++)
+		__tlb_remove_table(batch->tables[i]);
+
+	free_page((unsigned long)batch);
+}
+
+void tlb_table_flush(struct mmu_gather *tlb)
+{
+	struct mmu_table_batch **batch = &tlb->batch;
+
+	if (*batch) {
+		tlb_table_invalidate(tlb);
+		call_rcu_sched(&(*batch)->rcu, tlb_remove_table_rcu);
+		*batch = NULL;
+	}
+}
+
+void tlb_remove_table(struct mmu_gather *tlb, void *table)
+{
+	struct mmu_table_batch **batch = &tlb->batch;
+
+	if (*batch == NULL) {
+		*batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+		if (*batch == NULL) {
+			tlb_table_invalidate(tlb);
+			tlb_remove_table_one(table);
+			return;
+		}
+		(*batch)->nr = 0;
+	}
+
+	(*batch)->tables[(*batch)->nr++] = table;
+	if ((*batch)->nr == MAX_TABLE_BATCH)
+		tlb_table_flush(tlb);
+}
+
+#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
+
+/**
+ * tlb_gather_mmu - initialize an mmu_gather structure for page-table tear-down
+ * @tlb: the mmu_gather structure to initialize
+ * @mm: the mm_struct of the target address space
+ * @start: start of the region that will be removed from the page-table
+ * @end: end of the region that will be removed from the page-table
+ *
+ * Called to initialize an (on-stack) mmu_gather structure for page-table
+ * tear-down from @mm. The @start and @end are set to 0 and -1
+ * respectively when @mm is without users and we're going to destroy
+ * the full address space (exit/execve).
+ */
+void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
+			unsigned long start, unsigned long end)
+{
+	arch_tlb_gather_mmu(tlb, mm, start, end);
+	inc_tlb_flush_pending(tlb->mm);
+}
+
+void tlb_finish_mmu(struct mmu_gather *tlb,
+		unsigned long start, unsigned long end)
+{
+	/*
+	 * If there are parallel threads are doing PTE changes on same range
+	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
+	 * flush by batching, a thread has stable TLB entry can fail to flush
+	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
+	 * forcefully if we detect parallel PTE batching threads.
+	 */
+	bool force = mm_tlb_flush_nested(tlb->mm);
+
+	arch_tlb_finish_mmu(tlb, start, end, force);
+	dec_tlb_flush_pending(tlb->mm);
+}
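[For anyone reading along who is less familiar with this code: the contract of the API being moved above is that a caller brackets a page-table teardown with tlb_gather_mmu()/tlb_finish_mmu(), batching the unmapped pages in between. A minimal sketch of a caller, modelled on zap_page_range()-style users; unmap_vma_pages() is a hypothetical stand-in for the real unmap machinery:

	static void unmap_region_sketch(struct vm_area_struct *vma,
					unsigned long start, unsigned long end)
	{
		struct mmu_gather tlb;

		/* Mark a TLB flush as pending on the mm and set up batching. */
		tlb_gather_mmu(&tlb, vma->vm_mm, start, end);

		/*
		 * Clear the PTEs and queue the pages; internally this feeds
		 * __tlb_remove_page_size() and flushes when a batch fills up.
		 */
		unmap_vma_pages(&tlb, vma, start, end);

		/* Flush the TLB, then free the batched pages. */
		tlb_finish_mmu(&tlb, start, end);
	}
]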
On Fri, 31 Aug 2018 12:10:14 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:
> 
> > Proposal below (omitted Linus because that seems to be the pattern elsewhere
> > in the file and he's not going to shout at himself when things break :)
> > Anybody I've missed?
> > 
> > Will
> > 
> > --->8
> > 
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index a5b256b25905..7224b5618883 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -9681,6 +9681,15 @@ S:	Maintained
> >  F:	arch/arm/boot/dts/mmp*
> >  F:	arch/arm/mach-mmp/
> >  
> > +MMU GATHER AND TLB INVALIDATION
> > +M:	Will Deacon <will.deacon@arm.com>
> > +M:	Nick Piggin <npiggin@gmail.com>

Oh gee, I suppose. powerpc hash is kind of interesting because it's
crazy, Aneesh knows that code a lot better than I do. radix modulo
some minor details of exact instructions is fairly like x86 (he
wrote a lot of that code too AFAIK).

> > +M:	Peter Zijlstra <peterz@infradead.org>
> > +L:	linux-arch@vger.kernel.org

Maybe put linux-mm as well? Or should there just be one list?

> > +S:	Maintained
> > +F:	include/asm-generic/tlb.h
> > +F:	arch/*/include/asm/tlb.h
> > +
> >  MN88472 MEDIA DRIVER
> >  M:	Antti Palosaari <crope@iki.fi>
> >  L:	linux-media@vger.kernel.org
> 
> If we're going to do that (and I'm not opposed); it might make sense to
> do something like the below and add:
> 
> F: mm/mmu_gather.c

I think that is a good idea regardless. How do you feel about calling it
tlb.c? Easier to type and autocompletes sooner.

> ---
>  b/mm/mmu_gather.c         | 250 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/asm-generic/tlb.h |   2 
>  mm/Makefile               |   2 
>  mm/memory.c               | 247 ---------------------------------------------
On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> Oh gee, I suppose. powerpc hash is kind of interesting because it's
> crazy, Aneesh knows that code a lot better than I do. radix modulo
> some minor details of exact instructions is fairly like x86

The whole TLB broadcast vs explicit IPIs is a fairly big difference in
my book.

Anyway, have you guys tried the explicit IPI approach? Depending on how
IPIs are routed vs broadcasts it might save a little bus traffic. No
point in getting all CPUs to process the TLBI when there's only a
handful that really need it.

OTOH, I suppose the broadcast thing has been optimized to death on the
hardware side, so who knows..
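[For context, the "explicit IPI approach" is the x86-style scheme: the initiating CPU interrupts only the CPUs that may have the mm cached, and each target invalidates its own TLB locally, instead of the hardware broadcasting the invalidation to every CPU. A rough sketch under that assumption; smp_call_function_many() and mm_cpumask() are the real kernel interfaces, while the local flush helper is hypothetical:

	#include <linux/smp.h>
	#include <linux/mm_types.h>

	struct flush_info {
		unsigned long start, end;
	};

	/* Hypothetical: arch-specific invalidation of this CPU's TLB entries. */
	static void local_flush_tlb_range_sketch(unsigned long start, unsigned long end)
	{
		/* e.g. an invlpg loop on x86, a tlbiel loop on powerpc */
	}

	/* Runs on each targeted CPU, in IPI context. */
	static void ipi_flush_tlb_range(void *arg)
	{
		struct flush_info *f = arg;

		local_flush_tlb_range_sketch(f->start, f->end);
	}

	static void flush_tlb_others_sketch(struct mm_struct *mm,
					    unsigned long start, unsigned long end)
	{
		struct flush_info f = { .start = start, .end = end };

		/* Interrupt only the CPUs that have run this mm, and wait. */
		smp_call_function_many(mm_cpumask(mm), ipi_flush_tlb_range, &f, 1);
	}
]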
On Fri, Aug 31, 2018 at 12:49:45PM +0200, Peter Zijlstra wrote:
> On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > some minor details of exact instructions is fairly like x86
> 
> The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> my book.
> 
> Anyway, have you guys tried the explicit IPI approach? Depending on how
> IPIs are routed vs broadcasts it might save a little bus traffic. No
> point in getting all CPUs to process the TLBI when there's only a
> handful that really need it.
> 
> OTOH, I suppose the broadcast thing has been optimized to death on the
> hardware side, so who knows..

You also can't IPI an IOMMU or a GPU ;)

Will
On Fri, Aug 31, 2018 at 12:12:48PM +0100, Will Deacon wrote:
> On Fri, Aug 31, 2018 at 12:49:45PM +0200, Peter Zijlstra wrote:
> > On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > > some minor details of exact instructions is fairly like x86
> > 
> > The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> > my book.
> > 
> > Anyway, have you guys tried the explicit IPI approach? Depending on how
> > IPIs are routed vs broadcasts it might save a little bus traffic. No
> > point in getting all CPUs to process the TLBI when there's only a
> > handful that really need it.
> > 
> > OTOH, I suppose the broadcast thing has been optimized to death on the
> > hardware side, so who knows..
> 
> You also can't IPI an IOMMU or a GPU ;)

Oh, right you are. I suppose that is why x86-iommu is using those
mmu_notifiers.
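[The mmu_notifier hook Peter mentions is how a device that cannot take an IPI still observes CPU page-table invalidations: the driver registers a notifier on the mm and gets a callback on each invalidation, which it forwards to the device's TLB. A sketch of the shape of such a hook; struct mmu_notifier_ops and mmu_notifier_register() are the real API, while the iommu context and device-flush helper are hypothetical:

	#include <linux/mmu_notifier.h>

	struct my_iommu_ctx {
		struct mmu_notifier mn;
		/* device state would live here */
	};

	/* Hypothetical device operation. */
	static void my_iommu_flush_device_tlb(struct my_iommu_ctx *ctx,
					      unsigned long start, unsigned long end)
	{
		/* tell the device to drop cached translations for [start, end) */
	}

	static void my_invalidate_range(struct mmu_notifier *mn, struct mm_struct *mm,
					unsigned long start, unsigned long end)
	{
		struct my_iommu_ctx *ctx = container_of(mn, struct my_iommu_ctx, mn);

		my_iommu_flush_device_tlb(ctx, start, end);
	}

	static const struct mmu_notifier_ops my_iommu_notifier_ops = {
		.invalidate_range = my_invalidate_range,
	};

	static int my_iommu_bind_mm(struct my_iommu_ctx *ctx, struct mm_struct *mm)
	{
		ctx->mn.ops = &my_iommu_notifier_ops;
		return mmu_notifier_register(&ctx->mn, mm);
	}
]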
On Fri, 31 Aug 2018 12:49:45 +0200
Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> > Oh gee, I suppose. powerpc hash is kind of interesting because it's
> > crazy, Aneesh knows that code a lot better than I do. radix modulo
> > some minor details of exact instructions is fairly like x86
> 
> The whole TLB broadcast vs explicit IPIs is a fairly big difference in
> my book.

That's true I guess. Maybe arm64 is closer.

> Anyway, have you guys tried the explicit IPI approach? Depending on how
> IPIs are routed vs broadcasts it might save a little bus traffic. No
> point in getting all CPUs to process the TLBI when there's only a
> handful that really need it.

It has been looked at now and again; there are a lot of variables to
weigh. And things are also sized and speced to cover various hypervisors,
OSes, hash and radix, etc. This is something we need to evaluate on radix
a bit better.

> OTOH, I suppose the broadcast thing has been optimized to death on the
> hardware side, so who knows..

There are some advantages to doing it in hardware, though also some to
doing IPIs. The "problem" is actually that Linux is well optimised and it
can be hard to notice much impact until you get to big systems. At least
I don't know of any problem workloads outside micro benchmarks or stress
tests.

Thanks,
Nick
On Fri, Aug 31, 2018 at 08:32:34PM +1000, Nicholas Piggin wrote:
> On Fri, 31 Aug 2018 12:10:14 +0200
> Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Fri, Aug 31, 2018 at 10:54:18AM +0100, Will Deacon wrote:
> > 
> > > Proposal below (omitted Linus because that seems to be the pattern elsewhere
> > > in the file and he's not going to shout at himself when things break :)
> > > Anybody I've missed?
> > > 
> > > Will
> > > 
> > > --->8
> > > 
> > > diff --git a/MAINTAINERS b/MAINTAINERS
> > > index a5b256b25905..7224b5618883 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -9681,6 +9681,15 @@ S:	Maintained
> > >  F:	arch/arm/boot/dts/mmp*
> > >  F:	arch/arm/mach-mmp/
> > >  
> > > +MMU GATHER AND TLB INVALIDATION
> > > +M:	Will Deacon <will.deacon@arm.com>
> > > +M:	Nick Piggin <npiggin@gmail.com>
> 
> Oh gee, I suppose. powerpc hash is kind of interesting because it's
> crazy, Aneesh knows that code a lot better than I do. radix modulo
> some minor details of exact instructions is fairly like x86 (he
> wrote a lot of that code too AFAIK).

Sure, as long as we have Power represented here. Would you rather add
Aneesh instead of yourself, or shall we just add you both?

> > > +M:	Peter Zijlstra <peterz@infradead.org>
> > > +L:	linux-arch@vger.kernel.org
> 
> Maybe put linux-mm as well? Or should there just be one list?

If we do the landgrab on mmu_gather (which I think makes sense), then
adding both lists makes sense to me. I'll spin this as a proper patch,
along with Peter's code move.

> > > +S:	Maintained
> > > +F:	include/asm-generic/tlb.h
> > > +F:	arch/*/include/asm/tlb.h
> > > +
> > >  MN88472 MEDIA DRIVER
> > >  M:	Antti Palosaari <crope@iki.fi>
> > >  L:	linux-media@vger.kernel.org
> > 
> > If we're going to do that (and I'm not opposed); it might make sense to
> > do something like the below and add:
> > 
> > F: mm/mmu_gather.c
> 
> I think that is a good idea regardless. How do you feel about calling it
> tlb.c? Easier to type and autocompletes sooner.

No strong opinion on the name, but I slightly prefer mmu_gather.c so that
it avoids any remote possibility of confusion with tlb.c vs hugetlb.c.

Will