
[00/24] accel/tcg: Rewrite user-only vma tracking

Message ID 20221006031113.1139454-1-richard.henderson@linaro.org

Message

Richard Henderson Oct. 6, 2022, 3:10 a.m. UTC
The primary motivation here is the numerous bug reports (e.g. #290)
about the inability to handle very large memory allocations.
I presume all or most of these are due to guest use of the Clang
AddressSanitizer, which allocates a massive shadow vma.

This patch set copies the Linux kernel code for interval trees,
which is what the kernel itself uses for managing vmas.  I then
purge all (real) uses of PageDesc from user-only.  This is easy
for user-only because everything tricky happens under mmap_lock().

I have thought only briefly about using interval trees for system
mode too, but the locking situation there is more difficult.  So
for now that code gets moved around but not substantially changed.

The test case from #290 is added to tests/tcg/multiarch/.
Before this patch set, on my moderately beefy laptop, it takes 39s
and reaches an RSS of 28GB before the QEMU process is killed.  After
the patch set, the test case successfully allocates 16TB and
completes in 0.013s.


r~


Richard Henderson (24):
  util: Add interval-tree.c
  accel/tcg: Make page_alloc_target_data allocation constant
  accel/tcg: Remove disabled debug in translate-all.c
  accel/tcg: Split out PageDesc to internal.h
  accel/tcg: Split out tb-maint.c
  accel/tcg: Move assert_no_pages_locked to internal.h
  accel/tcg: Drop cpu_get_tb_cpu_state from TARGET_HAS_PRECISE_SMC
  accel/tcg: Remove duplicate store to tb->page_addr[]
  accel/tcg: Introduce tb_{set_}page_addr{0,1}
  accel/tcg: Rename tb_invalidate_phys_page
  accel/tcg: Rename tb_invalidate_phys_page_range and drop end parameter
  accel/tcg: Unify declarations of tb_invalidate_phys_range
  accel/tcg: Use tb_invalidate_phys_page in page_set_flags
  accel/tcg: Call tb_invalidate_phys_page for PAGE_RESET
  accel/tcg: Use interval tree for TBs in user-only mode
  accel/tcg: Use page_reset_target_data in page_set_flags
  accel/tcg: Use tb_invalidate_phys_range in page_set_flags
  accel/tcg: Move TARGET_PAGE_DATA_SIZE impl to user-exec.c
  accel/tcg: Simplify page_get/alloc_target_data
  accel/tcg: Use interval tree for TARGET_PAGE_DATA_SIZE
  accel/tcg: Move page_{get,set}_flags to user-exec.c
  accel/tcg: Use interval tree for user-only page tracking
  accel/tcg: Move PageDesc tree into tb-maint.c for system
  accel/tcg: Move remainder of page locking to tb-maint.c

 accel/tcg/internal.h            |   40 +
 include/exec/cpu-all.h          |   22 +-
 include/exec/exec-all.h         |   75 +-
 include/exec/ram_addr.h         |    2 -
 include/exec/translate-all.h    |    8 +-
 include/qemu/interval-tree.h    |   99 ++
 target/arm/cpu.h                |    8 +
 target/arm/internals.h          |    4 -
 accel/tcg/cpu-exec.c            |    9 +-
 accel/tcg/tb-maint.c            | 1222 ++++++++++++++++++++++
 accel/tcg/translate-all.c       | 1683 +------------------------------
 accel/tcg/translator.c          |    9 +-
 accel/tcg/user-exec.c           |  662 ++++++++++++
 bsd-user/mmap.c                 |    2 -
 cpu.c                           |    4 +-
 linux-user/mmap.c               |    4 -
 target/arm/mte_helper.c         |    5 -
 tests/tcg/multiarch/test-vma.c  |   22 +
 tests/unit/test-interval-tree.c |  209 ++++
 util/interval-tree.c            |  881 ++++++++++++++++
 accel/tcg/meson.build           |    1 +
 tests/unit/meson.build          |    1 +
 util/meson.build                |    1 +
 23 files changed, 3235 insertions(+), 1738 deletions(-)
 create mode 100644 include/qemu/interval-tree.h
 create mode 100644 accel/tcg/tb-maint.c
 create mode 100644 tests/tcg/multiarch/test-vma.c
 create mode 100644 tests/unit/test-interval-tree.c
 create mode 100644 util/interval-tree.c

Comments

Richard Henderson Oct. 24, 2022, 11:05 p.m. UTC
Ping.

On 10/6/22 13:10, Richard Henderson wrote: