diff mbox series

[4/4] tcg:tlb: use tcg_debug_assert() in assert_cpu_is_self()

Message ID 20250225184628.3590671-5-alex.bennee@linaro.org
State New
Headers show
Series cputlb: add tlb_flush_other_cpu | expand

Commit Message

Alex Bennée Feb. 25, 2025, 6:46 p.m. UTC
From: Igor Mammedov <imammedo@redhat.com>

that will enable assert_cpu_is_self when QEMU is configured with
   --enable-debug
without need for manual patching DEBUG_TLB_GATE define.

Need to manually path DEBUG_TLB_GATE define to enable assert,
let regression caused by [1] creep in unnoticed.

1) 30933c4fb4f3d ("tcg/cputlb: remove other-cpu capability from TLB flushing")

Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Suggested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250207162048.1890669-5-imammedo@redhat.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 accel/tcg/cputlb.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

Comments

Richard Henderson Feb. 25, 2025, 8:02 p.m. UTC | #1
On 2/25/25 10:46, Alex Bennée wrote:
> From: Igor Mammedov <imammedo@redhat.com>
> 
> that will enable assert_cpu_is_self when QEMU is configured with
>     --enable-debug
> without need for manual patching DEBUG_TLB_GATE define.
> 
> Need to manually path DEBUG_TLB_GATE define to enable assert,
> let regression caused by [1] creep in unnoticed.
> 
> 1) 30933c4fb4f3d ("tcg/cputlb: remove other-cpu capability from TLB flushing")
> 
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> Suggested-by: Alex Bennée <alex.bennee@linaro.org>
> Message-Id: <20250207162048.1890669-5-imammedo@redhat.com>
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   accel/tcg/cputlb.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> index fc16a576f0..65b04b1055 100644
> --- a/accel/tcg/cputlb.c
> +++ b/accel/tcg/cputlb.c
> @@ -73,11 +73,8 @@
>       } \
>   } while (0)
>   
> -#define assert_cpu_is_self(cpu) do {                              \
> -        if (DEBUG_TLB_GATE) {                                     \
> -            g_assert(!(cpu)->created || qemu_cpu_is_self(cpu));   \
> -        }                                                         \
> -    } while (0)
> +#define assert_cpu_is_self(cpu)                             \
> +    tcg_debug_assert(!(cpu)->created || qemu_cpu_is_self(cpu))

I think this check is just wrong or incomplete.

The intent here is to check that we're not attempting to modify the softmmu tlb 
asynchronously while a cpu is running.

(0) A synchronous flush to the current cpu is (obviously?) ok.
(1) A flush to a cpu that is not yet created is (or should be) a no-op.

Not checked here are any of the other reasons a flush might be ok:

(2) The system as a whole is stopped, on the way in from migration/vmload.
(3) The cpu is offline, on the way in from poweroff/reset.

If we decide that {1, 2, 3} are too complicated to check, then perhaps the solution to 
queue flushes to the cpu's workqueue is the appropriate solution.  But so far all I see is 
that we have an incomplete check, and no ready explanation for why that check can't be 
improved.


r~
Richard Henderson Feb. 25, 2025, 8:04 p.m. UTC | #2
On 2/25/25 12:02, Richard Henderson wrote:
> Not checked here are any of the other reasons a flush might be ok:
> 
> (2) The system as a whole is stopped, on the way in from migration/vmload.
> (3) The cpu is offline, on the way in from poweroff/reset.
(4) Running in round-robin mode, so there is *never* a race between cpus.

Anything else I've forgotten?


r~
Igor Mammedov Feb. 26, 2025, 1:31 p.m. UTC | #3
On Tue, 25 Feb 2025 12:02:02 -0800
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/25/25 10:46, Alex Bennée wrote:
> > From: Igor Mammedov <imammedo@redhat.com>
> > 
> > that will enable assert_cpu_is_self when QEMU is configured with
> >     --enable-debug
> > without need for manual patching DEBUG_TLB_GATE define.
> > 
> > Need to manually path DEBUG_TLB_GATE define to enable assert,
> > let regression caused by [1] creep in unnoticed.
> > 
> > 1) 30933c4fb4f3d ("tcg/cputlb: remove other-cpu capability from TLB flushing")
> > 
> > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > Suggested-by: Alex Bennée <alex.bennee@linaro.org>
> > Message-Id: <20250207162048.1890669-5-imammedo@redhat.com>
> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> > ---
> >   accel/tcg/cputlb.c | 7 ++-----
> >   1 file changed, 2 insertions(+), 5 deletions(-)
> > 
> > diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
> > index fc16a576f0..65b04b1055 100644
> > --- a/accel/tcg/cputlb.c
> > +++ b/accel/tcg/cputlb.c
> > @@ -73,11 +73,8 @@
> >       } \
> >   } while (0)
> >   
> > -#define assert_cpu_is_self(cpu) do {                              \
> > -        if (DEBUG_TLB_GATE) {                                     \
> > -            g_assert(!(cpu)->created || qemu_cpu_is_self(cpu));   \
> > -        }                                                         \
> > -    } while (0)
> > +#define assert_cpu_is_self(cpu)                             \
> > +    tcg_debug_assert(!(cpu)->created || qemu_cpu_is_self(cpu))  
> 
> I think this check is just wrong or incomplete.

the point of the path is to bring out check out of ifdef limbo.
Whether it's correct or not it's up to another patch to fix.


> The intent here is to check that we're not attempting to modify the softmmu tlb 
> asynchronously while a cpu is running.
> 
> (0) A synchronous flush to the current cpu is (obviously?) ok.
> (1) A flush to a cpu that is not yet created is (or should be) a no-op.

my another patch that was touching the check

"[PATCH v2 06/10] tcg: drop cpu->created check"

is trying to remove (abusing)usage of cpu->created
which should be used only for syncing main loop and
a to be created vcpu thread.
The creation of vcpu is not really complete yet by
this point so it depends on luck (being nop).

End goal from my side is to get rid of users that
use cpu->created as workaround to move from one
incomplete vcpu state to another still incomplete state.

We can drop the check after reset/postload paths are
fixed to schedule async flush.
 
> Not checked here are any of the other reasons a flush might be ok:
> 
> (2) The system as a whole is stopped, on the way in from migration/vmload.
> (3) The cpu is offline, on the way in from poweroff/reset.
> 
> If we decide that {1, 2, 3} are too complicated to check, then perhaps the solution to 
> queue flushes to the cpu's workqueue is the appropriate solution.  But so far all I see is 
> that we have an incomplete check, and no ready explanation for why that check can't be 
> improved.
> 
> 
> r~
>
Igor Mammedov Feb. 26, 2025, 1:42 p.m. UTC | #4
On Tue, 25 Feb 2025 12:04:40 -0800
Richard Henderson <richard.henderson@linaro.org> wrote:

> On 2/25/25 12:02, Richard Henderson wrote:
> > Not checked here are any of the other reasons a flush might be ok:
> > 
> > (2) The system as a whole is stopped, on the way in from migration/vmload.
> > (3) The cpu is offline, on the way in from poweroff/reset.  
> (4) Running in round-robin mode, so there is *never* a race between cpus.
> 
> Anything else I've forgotten?

looking at x86 reset path
 * we have 2 resets per vcpu from main loop,
   when vcpu is created (at realize time and at system_reset time).
   this probably are nop for tcg as you said.
   (it likely applies to all targets)

 * And also a reset triggered by IPI (run in vcpu thread),
   which likely should do flush to clear whatever context
   vcpu had before reset.
diff mbox series

Patch

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index fc16a576f0..65b04b1055 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -73,11 +73,8 @@ 
     } \
 } while (0)
 
-#define assert_cpu_is_self(cpu) do {                              \
-        if (DEBUG_TLB_GATE) {                                     \
-            g_assert(!(cpu)->created || qemu_cpu_is_self(cpu));   \
-        }                                                         \
-    } while (0)
+#define assert_cpu_is_self(cpu)                             \
+    tcg_debug_assert(!(cpu)->created || qemu_cpu_is_self(cpu))
 
 /* run_on_cpu_data.target_ptr should always be big enough for a
  * vaddr even on 32 bit builds