Message ID: 20241009150855.804605-1-richard.henderson@linaro.org
Series: accel/tcg: Convert victim tlb to IntervalTree
On Wed, 9 Oct 2024, Richard Henderson wrote:
> Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
> ("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")
>
> The initial idea was: how much can we do with an intelligent data
> structure for the same cost as a linear search through an array?
>
> This is an initial installment along these lines. This is about
> as far as I can go without first converting all targets to the
> new tlb_fill_align hook. Indeed, the final two patches will not
> compile with all targets enabled, but hint at the direction of
> the next steps.
>
> I do not expect large perf changes with this patch set. I will
> be happy if performance comes out even.

Then what's the point? The diffstat below does not show that it's less code, and if it's also not more efficient, then what's the reason to change it? I'm not opposed to any change; I just don't see an explanation of what motivates this series.

Regards,
BALATON Zoltan

> r~
>
> Richard Henderson (23):
>   util/interval-tree: Introduce interval_tree_free_nodes
>   accel/tcg: Split out tlbfast_flush_locked
>   accel/tcg: Split out tlbfast_{index,entry}
>   accel/tcg: Split out tlbfast_flush_range_locked
>   accel/tcg: Fix flags usage in mmu_lookup1, atomic_mmu_lookup
>   accel/tcg: Early exit for zero length in tlb_flush_range_by_mmuidx*
>   accel/tcg: Flush entire tlb when a masked range wraps
>   accel/tcg: Add IntervalTreeRoot to CPUTLBDesc
>   accel/tcg: Populate IntervalTree in tlb_set_page_full
>   accel/tcg: Remove IntervalTree entry in tlb_flush_page_locked
>   accel/tcg: Remove IntervalTree entries in tlb_flush_range_locked
>   accel/tcg: Process IntervalTree entries in tlb_reset_dirty
>   accel/tcg: Process IntervalTree entries in tlb_set_dirty
>   accel/tcg: Replace victim_tlb_hit with tlbtree_hit
>   accel/tcg: Remove the victim tlb
>   include/exec/tlb-common: Move CPUTLBEntryFull from hw/core/cpu.h
>   accel/tcg: Delay plugin adjustment in probe_access_internal
>   accel/tcg: Call cpu_ld*_code_mmu from cpu_ld*_code
>   accel/tcg: Always use IntervalTree for code lookups
>   accel/tcg: Link CPUTLBEntry to CPUTLBEntryTree
>   accel/tcg: Remove CPUTLBDesc.fulltlb
>   accel/tcg: Drop TCGCPUOps.tlb_fill -- NOTYET
>   accel/tcg: Unexport tlb_set_page*
>
>  include/exec/cpu-all.h        |   3 +
>  include/exec/exec-all.h       |  57 ----
>  include/exec/tlb-common.h     |  68 +++-
>  include/hw/core/cpu.h         |  75 +----
>  include/hw/core/tcg-cpu-ops.h |  10 -
>  include/qemu/interval-tree.h  |  11 +
>  accel/tcg/cputlb.c            | 599 +++++++++++++++++++---------------
>  util/interval-tree.c          |  20 ++
>  util/selfmap.c                |  13 +-
>  9 files changed, 431 insertions(+), 425 deletions(-)
On 10/9/24 09:27, BALATON Zoltan wrote:
> On Wed, 9 Oct 2024, Richard Henderson wrote:
>> Based-on: 20241009000453.315652-1-richard.henderson@linaro.org
>> ("[PATCH v3 00/20] accel/tcg: Introduce tlb_fill_align hook")
>>
>> The initial idea was: how much can we do with an intelligent data
>> structure for the same cost as a linear search through an array?
>>
>> This is an initial installment along these lines. This is about
>> as far as I can go without first converting all targets to the
>> new tlb_fill_align hook. Indeed, the final two patches will not
>> compile with all targets enabled, but hint at the direction of
>> the next steps.
>>
>> I do not expect large perf changes with this patch set. I will
>> be happy if performance comes out even.
>
> Then what's the point?

Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.

E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k, so all guest pages are "large", and so run into our current behaviour of flushing the entire tlb too often.

Even without that, I expect further cleanups to improve performance, we're just not there yet.

r~
On 10/9/24 10:10, Richard Henderson wrote:
> On 10/9/24 09:27, BALATON Zoltan wrote:
>> On Wed, 9 Oct 2024, Richard Henderson wrote:
>>> [...]
>>>
>>> I do not expect large perf changes with this patch set. I will
>>> be happy if performance comes out even.
>>
>> Then what's the point?
>
> Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.
>
> E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k,
> so all guest pages are "large", and so run into our current behaviour of flushing
> the entire tlb too often.
>
> Even without that, I expect further cleanups to improve performance, we're just
> not there yet.
>
> r~

Would merging pages over a given range be something we could benefit from too? In that case, entries in our tlbtree would have varying sizes, allowing us to cover more space with a single entry. It would also let us have a shallower tlbtree (given that it's rebalanced when modified) and speed up walking operations. I'm not sure whether it would help performance-wise, though.
On 10/9/24 17:50, Pierrick Bouvier wrote:
>> Eventually fixing the page size > TARGET_PAGE_SIZE performance issues.
>>
>> E.g. with a 16k or 64k aarch64 guest kernel, we still have TARGET_PAGE_SIZE at 4k,
>> so all guest pages are "large", and so run into our current behaviour of flushing
>> the entire tlb too often.
>>
>> Even without that, I expect further cleanups to improve performance, we're just
>> not there yet.
>>
>> r~
>
> Does merging pages over a given range be something we could benefit from too? In this
> case, entries in our tlbtree would have varying size, allowing us to cover more space
> with a single entry.

I don't know. I kinda doubt it, because of tlb flushing mechanics. But we shall see once everything up until that point is done.

r~