diff mbox series

[PULL,08/41] target/arm: Set CTR_EL0.{IDC,DIC} for the 'max' CPU

Message ID 20240111110505.1563291-9-peter.maydell@linaro.org
State Accepted
Commit 3d65b958c5463a1c06cac51b6474097ecdbb576e
Headers show
Series [PULL,01/41] hw/arm: add cache controller for Freescale i.MX6 | expand

Commit Message

Peter Maydell Jan. 11, 2024, 11:04 a.m. UTC
The CTR_EL0 register has some bits which allow the implementation to
tell the guest that it does not need to do cache maintenance for
data-to-instruction coherence and instruction-to-data coherence.
QEMU doesn't emulate caches and so our cache maintenance insns are
all NOPs.

We already have some models of specific CPUs where we set these bits
(e.g.  the Neoverse V1), but the 'max' CPU still uses the settings it
inherits from Cortex-A57.  Set the bits for 'max' as well, so the
guest doesn't need to do unnecessary work.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Miguel Luis <miguel.luis@oracle.com>
---
 target/arm/tcg/cpu64.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

Comments

Fabiano Rosas Jan. 17, 2024, 9:24 p.m. UTC | #1
Peter Maydell <peter.maydell@linaro.org> writes:

> The CTR_EL0 register has some bits which allow the implementation to
> tell the guest that it does not need to do cache maintenance for
> data-to-instruction coherence and instruction-to-data coherence.
> QEMU doesn't emulate caches and so our cache maintenance insns are
> all NOPs.
>
> We already have some models of specific CPUs where we set these bits
> (e.g.  the Neoverse V1), but the 'max' CPU still uses the settings it
> inherits from Cortex-A57.  Set the bits for 'max' as well, so the
> guest doesn't need to do unnecessary work.
>
> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> Tested-by: Miguel Luis <miguel.luis@oracle.com>
> ---
>  target/arm/tcg/cpu64.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index fcda99e1583..40e7a45166f 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
>      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
>      cpu->clidr = u;
>  
> +    /*
> +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
> +     * do any cache maintenance for data-to-instruction or
> +     * instruction-to-guest coherence. (Our cache ops are nops.)
> +     */
> +    t = cpu->ctr;
> +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
> +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
> +    cpu->ctr = t;
> +
>      t = cpu->isar.id_aa64isar0;
>      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
>      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */

Hi, we're introducing new regression tests to migration and this patch
shows up in the bisect of an issue. I need some help figuring out
whether this is an actual regression or something else.

The migration is TCG QEMU 8.2.0 -> TCG QEMU master.

On the destination side (contains this patch) we're hitting this
condition:

bool write_list_to_cpustate(ARMCPU *cpu)
{
...
        /*
         * Write value and confirm it reads back as written
         * (to catch read-only registers and partially read-only
         * registers where the incoming migration value doesn't match)
         */
        write_raw_cp_reg(&cpu->env, ri, v);
        if (read_raw_cp_reg(&cpu->env, ri) != v) {
--->        ok = false;
        }

Thread 1 "qemu-system-aar" hit Breakpoint 3, write_list_to_cpustate (cpu=0x555557a2f8f0) at ../target/arm/helper.c:190
190                 ok = false;
(gdb) p ri->name
$7 = 0x555557ab9ae0 "CTR"
(gdb) p/x v
$3 = 0x8444c004
(gdb) p/x read_raw_cp_reg(&cpu->env, ri)
$4 = 0xb444c004

Is there any particularity in reading/writing to that register? This is
during post_load and 'v' is what came in the migration stream.
Peter Maydell Jan. 18, 2024, 9:44 a.m. UTC | #2
On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
> Peter Maydell <peter.maydell@linaro.org> writes:
> > diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> > index fcda99e1583..40e7a45166f 100644
> > --- a/target/arm/tcg/cpu64.c
> > +++ b/target/arm/tcg/cpu64.c
> > @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
> >      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
> >      cpu->clidr = u;
> >
> > +    /*
> > +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
> > +     * do any cache maintenance for data-to-instruction or
> > +     * instruction-to-guest coherence. (Our cache ops are nops.)
> > +     */
> > +    t = cpu->ctr;
> > +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
> > +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
> > +    cpu->ctr = t;
> > +
> >      t = cpu->isar.id_aa64isar0;
> >      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
> >      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
>
> Hi, we're introducing new regression tests to migration and this patch
> shows up in the bisect of an issue. I need some help figuring out
> whether this is an actual regression or something else.
>
> The migration is TCG QEMU 8.2.0 -> TCG QEMU master.
>
> On the destination side (contains this patch) we're hitting this
> condition:
>
> bool write_list_to_cpustate(ARMCPU *cpu)
> {
> ...
>         /*
>          * Write value and confirm it reads back as written
>          * (to catch read-only registers and partially read-only
>          * registers where the incoming migration value doesn't match)
>          */
>         write_raw_cp_reg(&cpu->env, ri, v);
>         if (read_raw_cp_reg(&cpu->env, ri) != v) {
> --->        ok = false;
>         }

This is (among other things) effectively checking that the
source and destination CPU agree about the values of constant
registers like the ID registers, of which this is one.

The "max" CPU is a moving target, so you shouldn't expect
to be able to migrate across QEMU versions using it:
it can have different features and thus different ID
register values between versions (as well as potentially
different actual-register-state if the added feature adds
new state).

thanks
-- PMM
Fabiano Rosas Jan. 18, 2024, 12:57 p.m. UTC | #3
Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
>> Peter Maydell <peter.maydell@linaro.org> writes:
>> > diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
>> > index fcda99e1583..40e7a45166f 100644
>> > --- a/target/arm/tcg/cpu64.c
>> > +++ b/target/arm/tcg/cpu64.c
>> > @@ -1105,6 +1105,16 @@ void aarch64_max_tcg_initfn(Object *obj)
>> >      u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
>> >      cpu->clidr = u;
>> >
>> > +    /*
>> > +     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
>> > +     * do any cache maintenance for data-to-instruction or
>> > +     * instruction-to-guest coherence. (Our cache ops are nops.)
>> > +     */
>> > +    t = cpu->ctr;
>> > +    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
>> > +    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
>> > +    cpu->ctr = t;
>> > +
>> >      t = cpu->isar.id_aa64isar0;
>> >      t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
>> >      t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
>>
>> Hi, we're introducing new regression tests to migration and this patch
>> shows up in the bisect of an issue. I need some help figuring out
>> whether this is an actual regression or something else.
>>
>> The migration is TCG QEMU 8.2.0 -> TCG QEMU master.
>>
>> On the destination side (contains this patch) we're hitting this
>> condition:
>>
>> bool write_list_to_cpustate(ARMCPU *cpu)
>> {
>> ...
>>         /*
>>          * Write value and confirm it reads back as written
>>          * (to catch read-only registers and partially read-only
>>          * registers where the incoming migration value doesn't match)
>>          */
>>         write_raw_cp_reg(&cpu->env, ri, v);
>>         if (read_raw_cp_reg(&cpu->env, ri) != v) {
>> --->        ok = false;
>>         }
>
> This is (among other things) effectively checking that the
> source and destination CPU agree about the values of constant
> registers like the ID registers, of which this is one.
>
> The "max" CPU is a moving target, so you shouldn't expect
> to be able to migrate across QEMU versions using it:
> it can have different features and thus different ID
> register values between versions (as well as potentially
> different actual-register-state if the added feature adds
> new state).

We're adding a test for migrating across QEMU versions to stop these
sort of breakages happening. If 'max' is not suitable for that scenario,
would there be another cpu you recommend to use? Otherwise aarch64 will
have to remain uncovered for migration across QEMU versions.

There's no clear statement on what kinds of combinations we support for
migration, but anything that's actually tested is more likely to be
considered supported.

Note that we're only targeting 'n-1 -> n' and 'n -> n-1' migrations,
where n-1 is the latest release and n is the development branch.
Peter Maydell Jan. 18, 2024, 1 p.m. UTC | #4
On Thu, 18 Jan 2024 at 12:58, Fabiano Rosas <farosas@suse.de> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Wed, 17 Jan 2024 at 21:24, Fabiano Rosas <farosas@suse.de> wrote:
> >> Peter Maydell <peter.maydell@linaro.org> writes:
> > The "max" CPU is a moving target, so you shouldn't expect
> > to be able to migrate across QEMU versions using it:
> > it can have different features and thus different ID
> > register values between versions (as well as potentially
> > different actual-register-state if the added feature adds
> > new state).
>
> We're adding a test for migrating across QEMU versions to stop these
> sort of breakages happening. If 'max' is not suitable for that scenario,
> would there be another cpu you recommend to use? Otherwise aarch64 will
> have to remain uncovered for migration across QEMU versions.
>
> There's no clear statement on what kinds of combinations we support for
> migration, but anything that's actually tested is more likely to be
> considered supported.

Any fixed defined CPU is a reasonable choice. I suggest neoverse-n1.

I don't think 'max' is expected to migrate cross-version on
any CPU architecture; this isn't Arm-specific.

thanks
-- PMM
diff mbox series

Patch

diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index fcda99e1583..40e7a45166f 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1105,6 +1105,16 @@  void aarch64_max_tcg_initfn(Object *obj)
     u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
     cpu->clidr = u;
 
+    /*
+     * Set CTR_EL0.DIC and IDC to tell the guest it doesnt' need to
+     * do any cache maintenance for data-to-instruction or
+     * instruction-to-guest coherence. (Our cache ops are nops.)
+     */
+    t = cpu->ctr;
+    t = FIELD_DP64(t, CTR_EL0, IDC, 1);
+    t = FIELD_DP64(t, CTR_EL0, DIC, 1);
+    cpu->ctr = t;
+
     t = cpu->isar.id_aa64isar0;
     t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);      /* FEAT_PMULL */
     t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */