
[2/3] softmmu: Use async_run_on_cpu in tcg_commit

Message ID 20230826232415.80233-3-richard.henderson@linaro.org
State New
Series softmmu: Use async_run_on_cpu in tcg_commit

Commit Message

Richard Henderson Aug. 26, 2023, 11:24 p.m. UTC
After system startup, run the update to memory_dispatch
and the tlb_flush on the cpu.  This eliminates a race,
wherein a running cpu sees the memory_dispatch change
but has not yet seen the tlb_flush.

Since the update now happens on the cpu, we need not use
qatomic_rcu_read to protect the read of memory_dispatch.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1826
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1834
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1846
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 softmmu/physmem.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)
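
For context, the fix relies on QEMU's run-on-CPU API: async_run_on_cpu()
queues a work item that the target vCPU executes at a safe point between
translation blocks, so the dispatch update and the TLB flush become visible
to that vCPU together.  A minimal sketch of the pattern follows; the worker
and caller are hypothetical, while async_run_on_cpu(), RUN_ON_CPU_HOST_PTR()
and run_on_cpu_data are the existing API (header paths approximate):

    #include "qemu/osdep.h"
    #include "hw/core/cpu.h"      /* CPUState, async_run_on_cpu(), run_on_cpu_data */
    #include "exec/exec-all.h"    /* tlb_flush() */

    /* Hypothetical worker: runs on the target vCPU's own thread, with the
     * vCPU outside translated code, so per-CPU state can be updated safely. */
    static void example_worker(CPUState *cpu, run_on_cpu_data data)
    {
        int *flag = data.host_ptr;

        *flag = 1;          /* update some per-CPU bookkeeping */
        tlb_flush(cpu);     /* and flush only this vCPU's TLB */
    }

    /* Hypothetical caller: may run on any thread; queues the work and returns. */
    static void example_queue(CPUState *cpu, int *flag)
    {
        async_run_on_cpu(cpu, example_worker, RUN_ON_CPU_HOST_PTR(flag));
    }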

Comments

Alex Bennée Aug. 27, 2023, 9:58 a.m. UTC | #1
Richard Henderson <richard.henderson@linaro.org> writes:

> After system startup, run the update to memory_dispatch
> and the tlb_flush on the cpu.  This eliminates a race,
> wherein a running cpu sees the memory_dispatch change
> but has not yet seen the tlb_flush.
>
> Since the update now happens on the cpu, we need not use
> qatomic_rcu_read to protect the read of memory_dispatch.
>
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1826
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1834
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1846
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>


Richard Henderson Aug. 27, 2023, 2:54 p.m. UTC | #2
On 8/26/23 16:24, Richard Henderson wrote:
> +static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
> +{
> +    CPUAddressSpace *cpuas = data.host_ptr;
> +
> +    cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
> +    tlb_flush(cpu);
> +}

Question: do I need to take the iothread lock here, while re-generating the address space 
dispatch?


r~
Alex Bennée Aug. 27, 2023, 8:17 p.m. UTC | #3
Richard Henderson <richard.henderson@linaro.org> writes:

> On 8/26/23 16:24, Richard Henderson wrote:
>> +static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
>> +{
>> +    CPUAddressSpace *cpuas = data.host_ptr;
>> +
>> +    cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
>> +    tlb_flush(cpu);
>> +}
>
> Question: do I need to take the iothread lock here, while
> re-generating the address space dispatch?

Does it regenerate or just collect a current live version under RCU?
Richard Henderson Aug. 27, 2023, 9:16 p.m. UTC | #4
On 8/27/23 13:17, Alex Bennée wrote:
> 
> Richard Henderson <richard.henderson@linaro.org> writes:
> 
>> On 8/26/23 16:24, Richard Henderson wrote:
>>> +static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
>>> +{
>>> +    CPUAddressSpace *cpuas = data.host_ptr;
>>> +
>>> +    cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
>>> +    tlb_flush(cpu);
>>> +}
>>
>> Question: do I need to take the iothread lock here, while
>> re-generating the address space dispatch?
> 
> Does it regenerate or just collect a current live version under RCU?

Quite right, just reads.


r~
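
For reference, address_space_to_dispatch() does not rebuild anything: it is
just a pair of pointer reads, the first an RCU read of the address space's
currently published FlatView, which is why no iothread lock is needed in the
worker.  A paraphrase (not the exact source) of the relevant helpers from
include/exec/memory.h and softmmu/physmem.c of this period:

    /* Paraphrase only, for illustration. */
    static inline FlatView *address_space_to_flatview(AddressSpace *as)
    {
        /* RCU read of the currently published flat view. */
        return qatomic_rcu_read(&as->current_map);
    }

    AddressSpaceDispatch *flatview_to_dispatch(FlatView *fv)
    {
        /* The dispatch table was already built when the flat view was. */
        return fv->dispatch;
    }

    AddressSpaceDispatch *address_space_to_dispatch(AddressSpace *as)
    {
        return flatview_to_dispatch(address_space_to_flatview(as));
    }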
Jonathan Cameron Aug. 29, 2023, 10:50 a.m. UTC | #5
On Sat, 26 Aug 2023 16:24:14 -0700
Richard Henderson <richard.henderson@linaro.org> wrote:

> After system startup, run the update to memory_dispatch
> and the tlb_flush on the cpu.  This eliminates a race,
> wherein a running cpu sees the memory_dispatch change
> but has not yet seen the tlb_flush.
> 
> Since the update now happens on the cpu, we need not use
> qatomic_rcu_read to protect the read of memory_dispatch.
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1826
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1834
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1846
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

I'm not pretending I've understood the change though, just that
it makes the crashes I saw go away.

Jonathan


Patch

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 7597dc1c39..18277ddd67 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -680,8 +680,7 @@  address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr orig_addr,
     IOMMUTLBEntry iotlb;
     int iommu_idx;
     hwaddr addr = orig_addr;
-    AddressSpaceDispatch *d =
-        qatomic_rcu_read(&cpu->cpu_ases[asidx].memory_dispatch);
+    AddressSpaceDispatch *d = cpu->cpu_ases[asidx].memory_dispatch;
 
     for (;;) {
         section = address_space_translate_internal(d, addr, &addr, plen, false);
@@ -2412,7 +2411,7 @@  MemoryRegionSection *iotlb_to_section(CPUState *cpu,
 {
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
     CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
-    AddressSpaceDispatch *d = qatomic_rcu_read(&cpuas->memory_dispatch);
+    AddressSpaceDispatch *d = cpuas->memory_dispatch;
     int section_index = index & ~TARGET_PAGE_MASK;
     MemoryRegionSection *ret;
 
@@ -2487,23 +2486,42 @@  static void tcg_log_global_after_sync(MemoryListener *listener)
     }
 }
 
+static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
+{
+    CPUAddressSpace *cpuas = data.host_ptr;
+
+    cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
+    tlb_flush(cpu);
+}
+
 static void tcg_commit(MemoryListener *listener)
 {
     CPUAddressSpace *cpuas;
-    AddressSpaceDispatch *d;
+    CPUState *cpu;
 
     assert(tcg_enabled());
     /* since each CPU stores ram addresses in its TLB cache, we must
        reset the modified entries */
     cpuas = container_of(listener, CPUAddressSpace, tcg_as_listener);
-    cpu_reloading_memory_map();
-    /* The CPU and TLB are protected by the iothread lock.
-     * We reload the dispatch pointer now because cpu_reloading_memory_map()
-     * may have split the RCU critical section.
+    cpu = cpuas->cpu;
+
+    /*
+     * Defer changes to as->memory_dispatch until the cpu is quiescent.
+     * Otherwise we race between (1) other cpu threads and (2) ongoing
+     * i/o for the current cpu thread, with data cached by mmu_lookup().
+     *
+     * In addition, queueing the work function will kick the cpu back to
+     * the main loop, which will end the RCU critical section and reclaim
+     * the memory data structures.
+     *
+     * That said, the listener is also called during realize, before
+     * all of the tcg machinery for run-on is initialized: thus halt_cond.
      */
-    d = address_space_to_dispatch(cpuas->as);
-    qatomic_rcu_set(&cpuas->memory_dispatch, d);
-    tlb_flush(cpuas->cpu);
+    if (cpu->halt_cond) {
+        async_run_on_cpu(cpu, tcg_commit_cpu, RUN_ON_CPU_HOST_PTR(cpuas));
+    } else {
+        tcg_commit_cpu(cpu, RUN_ON_CPU_HOST_PTR(cpuas));
+    }
 }
 
 static void memory_map_init(void)