Message ID | 20241101013541.883785-1-gustavo.romero@linaro.org |
---|---|
State | Superseded |
Series | target/arm: Enable FEAT_CMOW for -cpu max |
On 11/1/24 01:35, Gustavo Romero wrote:
> FEAT_CMOW introduces support for controlling cache maintenance
> instructions executed in EL0/1 and is mandatory from Armv8.8.
>
> On real hardware, the main use for this feature is to prevent processes
> from invalidating or flushing cache lines for addresses they only have
> read permission, which can impact the performance of other processes.
>
> QEMU implements all cache instructions as NOPs, and, according to rule
> [1], which states that generating any Permission fault when a cache
> instruction is implemented as a NOP is implementation-defined, no
> Permission fault is generated for any cache instruction when it lacks
> read and write permissions.
>
> QEMU does not model any cache topology, so the PoU and PoC are before
> any cache, and rules [2] apply. These rules state that generating any
> MMU fault for cache instructions in this topology is also
> implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
> MMU faults either; instead, we only advertise it in the feature
> register.
>
> [1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
> [2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
>
> Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
> ---
> docs/system/arm/emulation.rst | 1 +
> target/arm/cpu-features.h | 5 +++++
> target/arm/cpu.h | 1 +
> target/arm/tcg/cpu64.c | 1 +
> 4 files changed, 8 insertions(+)
>
> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
> index 35f52a54b1..a2a388f091 100644
> --- a/docs/system/arm/emulation.rst
> +++ b/docs/system/arm/emulation.rst
> @@ -26,6 +26,7 @@ the following architecture extensions:
>  - FEAT_BF16 (AArch64 BFloat16 instructions)
>  - FEAT_BTI (Branch Target Identification)
>  - FEAT_CCIDX (Extended cache index)
> +- FEAT_CMOW (Control for cache maintenance permission)
>  - FEAT_CRC32 (CRC32 instructions)
>  - FEAT_Crypto (Cryptographic Extension)
>  - FEAT_CSV2 (Cache speculation variant 2)
> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
> index 04ce281826..e806f138b8 100644
> --- a/target/arm/cpu-features.h
> +++ b/target/arm/cpu-features.h
> @@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
>  }
>
> +static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
> +{
> +    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
> +}

This isn't used, so it may be omitted.

Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

> +
>  static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
>  {
>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8fc8b6398f..1ea4c545e0 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1367,6 +1367,7 @@ void pmu_init(ARMCPU *cpu);
>  #define SCTLR_EnIB (1U << 30) /* v8.3, AArch64 only */
>  #define SCTLR_EnIA (1U << 31) /* v8.3, AArch64 only */
>  #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
> +#define SCTLR_CMOW (1ULL << 32) /* FEAT_CMOW */
>  #define SCTLR_MSCEN (1ULL << 33) /* FEAT_MOPS */
>  #define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
>  #define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index 0168920828..2963d7510f 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -1218,6 +1218,7 @@ void aarch64_max_tcg_initfn(Object *obj)
>      t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2); /* FEAT_ETS2 */
>      t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1); /* FEAT_HCX */
>      t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1); /* FEAT_TIDCP1 */
> +    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1); /* FEAT_CMOW */
>      cpu->isar.id_aa64mmfr1 = t;
>
>      t = cpu->isar.id_aa64mmfr2;
On Fri, 1 Nov 2024 at 01:36, Gustavo Romero <gustavo.romero@linaro.org> wrote:
>
> FEAT_CMOW introduces support for controlling cache maintenance
> instructions executed in EL0/1 and is mandatory from Armv8.8.
>
> On real hardware, the main use for this feature is to prevent processes
> from invalidating or flushing cache lines for addresses they only have
> read permission, which can impact the performance of other processes.
>
> QEMU implements all cache instructions as NOPs, and, according to rule
> [1], which states that generating any Permission fault when a cache
> instruction is implemented as a NOP is implementation-defined, no
> Permission fault is generated for any cache instruction when it lacks
> read and write permissions.
>
> QEMU does not model any cache topology, so the PoU and PoC are before
> any cache, and rules [2] apply. These rules state that generating any
> MMU fault for cache instructions in this topology is also
> implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
> MMU faults either; instead, we only advertise it in the feature
> register.
>
> [1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
> [2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
>
> Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
> ---
> docs/system/arm/emulation.rst | 1 +
> target/arm/cpu-features.h | 5 +++++
> target/arm/cpu.h | 1 +
> target/arm/tcg/cpu64.c | 1 +
> 4 files changed, 8 insertions(+)
>
> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
> index 35f52a54b1..a2a388f091 100644
> --- a/docs/system/arm/emulation.rst
> +++ b/docs/system/arm/emulation.rst
> @@ -26,6 +26,7 @@ the following architecture extensions:
>  - FEAT_BF16 (AArch64 BFloat16 instructions)
>  - FEAT_BTI (Branch Target Identification)
>  - FEAT_CCIDX (Extended cache index)
> +- FEAT_CMOW (Control for cache maintenance permission)
>  - FEAT_CRC32 (CRC32 instructions)
>  - FEAT_Crypto (Cryptographic Extension)
>  - FEAT_CSV2 (Cache speculation variant 2)
> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
> index 04ce281826..e806f138b8 100644
> --- a/target/arm/cpu-features.h
> +++ b/target/arm/cpu-features.h
> @@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
>  }
>
> +static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
> +{
> +    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
> +}
> +
>  static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
>  {
>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index 8fc8b6398f..1ea4c545e0 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -1367,6 +1367,7 @@ void pmu_init(ARMCPU *cpu);
>  #define SCTLR_EnIB (1U << 30) /* v8.3, AArch64 only */
>  #define SCTLR_EnIA (1U << 31) /* v8.3, AArch64 only */
>  #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
> +#define SCTLR_CMOW (1ULL << 32) /* FEAT_CMOW */
>  #define SCTLR_MSCEN (1ULL << 33) /* FEAT_MOPS */
>  #define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
>  #define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
> index 0168920828..2963d7510f 100644
> --- a/target/arm/tcg/cpu64.c
> +++ b/target/arm/tcg/cpu64.c
> @@ -1218,6 +1218,7 @@ void aarch64_max_tcg_initfn(Object *obj)
>      t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2); /* FEAT_ETS2 */
>      t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1); /* FEAT_HCX */
>      t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1); /* FEAT_TIDCP1 */
> +    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1); /* FEAT_CMOW */
>      cpu->isar.id_aa64mmfr1 = t;
>
>      t = cpu->isar.id_aa64mmfr2;

We don't need to do anything for the actual cache operations,
but we do need to make sure that the SCTLR_ELx and HCRX_EL2
control bits for it can be set and read back. Our sctlr_write()
doesn't impose a mask, so no change is needed there, but
our hcrx_write() does set up a valid_mask and doesn't allow
the guest to write bits that aren't in that mask. So we
need to add an
    if (cpu_isar_feature(aa64_cmow, cpu)) {
        valid_mask |= HCRX_CMOW;
    }
in there.

thanks
-- PMM
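To illustrate the point about hcrx_write() above: the following is only a standalone sketch of a feature-gated write mask, not QEMU's actual code. DemoCpu, demo_hcrx_write() and the DEMO_* constants are hypothetical names invented for this example, and the bit position used for CMOW is an illustrative assumption. It shows why a control bit that is missing from the valid mask silently reads back as zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bit position only; an assumption, not copied from QEMU. */
#define DEMO_HCRX_CMOW (1ULL << 9)

/* Hypothetical stand-in for the CPU state touched by a register write. */
typedef struct {
    bool has_cmow;   /* plays the role of cpu_isar_feature(aa64_cmow, cpu) */
    uint64_t hcrx;   /* plays the role of the stored HCRX_EL2 value */
} DemoCpu;

/*
 * Store only the bits the CPU model claims to implement. Bits outside
 * valid_mask are dropped, so the guest reads them back as zero.
 */
static void demo_hcrx_write(DemoCpu *cpu, uint64_t value)
{
    uint64_t valid_mask = 0;

    assert(cpu != NULL);

    if (cpu->has_cmow) {
        valid_mask |= DEMO_HCRX_CMOW;
    }

    cpu->hcrx = value & valid_mask;
}
```

With has_cmow set, writing DEMO_HCRX_CMOW is stored and can be read back; with it clear, the same write leaves the register at zero, which is the behaviour a guest would see if the feature were advertised but the mask were not extended.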
Hi Peter!

On 11/4/24 10:38, Peter Maydell wrote:
> On Fri, 1 Nov 2024 at 01:36, Gustavo Romero <gustavo.romero@linaro.org> wrote:
>>
>> FEAT_CMOW introduces support for controlling cache maintenance
>> instructions executed in EL0/1 and is mandatory from Armv8.8.
>>
>> On real hardware, the main use for this feature is to prevent processes
>> from invalidating or flushing cache lines for addresses they only have
>> read permission, which can impact the performance of other processes.
>>
>> QEMU implements all cache instructions as NOPs, and, according to rule
>> [1], which states that generating any Permission fault when a cache
>> instruction is implemented as a NOP is implementation-defined, no
>> Permission fault is generated for any cache instruction when it lacks
>> read and write permissions.
>>
>> QEMU does not model any cache topology, so the PoU and PoC are before
>> any cache, and rules [2] apply. These rules state that generating any
>> MMU fault for cache instructions in this topology is also
>> implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
>> MMU faults either; instead, we only advertise it in the feature
>> register.
>>
>> [1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
>> [2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
>>
>> Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
>> ---
>> docs/system/arm/emulation.rst | 1 +
>> target/arm/cpu-features.h | 5 +++++
>> target/arm/cpu.h | 1 +
>> target/arm/tcg/cpu64.c | 1 +
>> 4 files changed, 8 insertions(+)
>>
>> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
>> index 35f52a54b1..a2a388f091 100644
>> --- a/docs/system/arm/emulation.rst
>> +++ b/docs/system/arm/emulation.rst
>> @@ -26,6 +26,7 @@ the following architecture extensions:
>>  - FEAT_BF16 (AArch64 BFloat16 instructions)
>>  - FEAT_BTI (Branch Target Identification)
>>  - FEAT_CCIDX (Extended cache index)
>> +- FEAT_CMOW (Control for cache maintenance permission)
>>  - FEAT_CRC32 (CRC32 instructions)
>>  - FEAT_Crypto (Cryptographic Extension)
>>  - FEAT_CSV2 (Cache speculation variant 2)
>> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
>> index 04ce281826..e806f138b8 100644
>> --- a/target/arm/cpu-features.h
>> +++ b/target/arm/cpu-features.h
>> @@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
>>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
>>  }
>>
>> +static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
>> +{
>> +    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
>> +}
>> +
>>  static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
>>  {
>>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
>> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
>> index 8fc8b6398f..1ea4c545e0 100644
>> --- a/target/arm/cpu.h
>> +++ b/target/arm/cpu.h
>> @@ -1367,6 +1367,7 @@ void pmu_init(ARMCPU *cpu);
>>  #define SCTLR_EnIB (1U << 30) /* v8.3, AArch64 only */
>>  #define SCTLR_EnIA (1U << 31) /* v8.3, AArch64 only */
>>  #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
>> +#define SCTLR_CMOW (1ULL << 32) /* FEAT_CMOW */
>>  #define SCTLR_MSCEN (1ULL << 33) /* FEAT_MOPS */
>>  #define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
>>  #define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
>> diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
>> index 0168920828..2963d7510f 100644
>> --- a/target/arm/tcg/cpu64.c
>> +++ b/target/arm/tcg/cpu64.c
>> @@ -1218,6 +1218,7 @@ void aarch64_max_tcg_initfn(Object *obj)
>>      t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2); /* FEAT_ETS2 */
>>      t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1); /* FEAT_HCX */
>>      t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1); /* FEAT_TIDCP1 */
>> +    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1); /* FEAT_CMOW */
>>      cpu->isar.id_aa64mmfr1 = t;
>>
>>      t = cpu->isar.id_aa64mmfr2;
>
> We don't need to do anything for the actual cache operations,
> but we do need to make sure that the SCTLR_ELx and HCRX_EL2
> control bits for it can be set and read back. Our sctlr_write()
> doesn't impose a mask, so no change is needed there, but
> our hcrx_write() does set up a valid_mask and doesn't allow
> the guest to write bits that aren't in that mask. So we
> need to add an
>     if (cpu_isar_feature(aa64_cmow, cpu)) {
>         valid_mask |= HCRX_CMOW;
>     }
> in there.

Ough. Sure! Fixed in v2. Thanks a lot.

Cheers,
Gustavo
Hi Richard,

On 11/4/24 07:59, Richard Henderson wrote:
> On 11/1/24 01:35, Gustavo Romero wrote:
>> FEAT_CMOW introduces support for controlling cache maintenance
>> instructions executed in EL0/1 and is mandatory from Armv8.8.
>>
>> On real hardware, the main use for this feature is to prevent processes
>> from invalidating or flushing cache lines for addresses they only have
>> read permission, which can impact the performance of other processes.
>>
>> QEMU implements all cache instructions as NOPs, and, according to rule
>> [1], which states that generating any Permission fault when a cache
>> instruction is implemented as a NOP is implementation-defined, no
>> Permission fault is generated for any cache instruction when it lacks
>> read and write permissions.
>>
>> QEMU does not model any cache topology, so the PoU and PoC are before
>> any cache, and rules [2] apply. These rules state that generating any
>> MMU fault for cache instructions in this topology is also
>> implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
>> MMU faults either; instead, we only advertise it in the feature
>> register.
>>
>> [1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
>> [2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.
>>
>> Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
>> ---
>> docs/system/arm/emulation.rst | 1 +
>> target/arm/cpu-features.h | 5 +++++
>> target/arm/cpu.h | 1 +
>> target/arm/tcg/cpu64.c | 1 +
>> 4 files changed, 8 insertions(+)
>>
>> diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
>> index 35f52a54b1..a2a388f091 100644
>> --- a/docs/system/arm/emulation.rst
>> +++ b/docs/system/arm/emulation.rst
>> @@ -26,6 +26,7 @@ the following architecture extensions:
>>  - FEAT_BF16 (AArch64 BFloat16 instructions)
>>  - FEAT_BTI (Branch Target Identification)
>>  - FEAT_CCIDX (Extended cache index)
>> +- FEAT_CMOW (Control for cache maintenance permission)
>>  - FEAT_CRC32 (CRC32 instructions)
>>  - FEAT_Crypto (Cryptographic Extension)
>>  - FEAT_CSV2 (Cache speculation variant 2)
>> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
>> index 04ce281826..e806f138b8 100644
>> --- a/target/arm/cpu-features.h
>> +++ b/target/arm/cpu-features.h
>> @@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
>>      return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
>>  }
>> +static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
>> +{
>> +    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
>> +}
>
> This isn't used, so it may be omitted.
>
> Otherwise,
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Got it. I wasn't entirely sure if I should add it for future
convenience. OK, I'll omit it if I come across a similar case. Thanks.

Since I'm using it after Peter's comment for v2, I kept it, and
added your R-b.

Cheers,
Gustavo
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 35f52a54b1..a2a388f091 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -26,6 +26,7 @@ the following architecture extensions:
 - FEAT_BF16 (AArch64 BFloat16 instructions)
 - FEAT_BTI (Branch Target Identification)
 - FEAT_CCIDX (Extended cache index)
+- FEAT_CMOW (Control for cache maintenance permission)
 - FEAT_CRC32 (CRC32 instructions)
 - FEAT_Crypto (Cryptographic Extension)
 - FEAT_CSV2 (Cache speculation variant 2)
diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 04ce281826..e806f138b8 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -802,6 +802,11 @@ static inline bool isar_feature_aa64_tidcp1(const ARMISARegisters *id)
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, TIDCP1) != 0;
 }
 
+static inline bool isar_feature_aa64_cmow(const ARMISARegisters *id)
+{
+    return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, CMOW) != 0;
+}
+
 static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
 {
     return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8fc8b6398f..1ea4c545e0 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1367,6 +1367,7 @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_EnIB (1U << 30) /* v8.3, AArch64 only */
 #define SCTLR_EnIA (1U << 31) /* v8.3, AArch64 only */
 #define SCTLR_DSSBS_32 (1U << 31) /* v8.5, AArch32 only */
+#define SCTLR_CMOW (1ULL << 32) /* FEAT_CMOW */
 #define SCTLR_MSCEN (1ULL << 33) /* FEAT_MOPS */
 #define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
 #define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 0168920828..2963d7510f 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1218,6 +1218,7 @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64MMFR1, ETS, 2); /* FEAT_ETS2 */
     t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1); /* FEAT_HCX */
     t = FIELD_DP64(t, ID_AA64MMFR1, TIDCP1, 1); /* FEAT_TIDCP1 */
+    t = FIELD_DP64(t, ID_AA64MMFR1, CMOW, 1); /* FEAT_CMOW */
     cpu->isar.id_aa64mmfr1 = t;
 
     t = cpu->isar.id_aa64mmfr2;
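As a rough illustration of what the FIELD_DP64 / FIELD_EX64 lines in the diff do, here is a standalone sketch of depositing a value into a bit field of a 64-bit ID register and testing it afterwards. It does not use QEMU's macros: demo_deposit64(), demo_extract64(), demo_feature_cmow() and the DEMO_* constants are hypothetical names for this example, and the field position (shift 56, width 4) is an assumption about the ID_AA64MMFR1 layout that should be checked against the Arm ARM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the CMOW field; verify before relying on it. */
#define DEMO_CMOW_SHIFT  56u
#define DEMO_CMOW_LENGTH 4u

/* Return the 'length'-bit field of 'value' that starts at bit 'shift'. */
static inline uint64_t demo_extract64(uint64_t value, unsigned shift,
                                      unsigned length)
{
    assert(shift < 64 && length > 0 && length <= 64 - shift);
    return (value >> shift) & (~0ULL >> (64 - length));
}

/* Return 'value' with that field replaced by the low bits of 'field'. */
static inline uint64_t demo_deposit64(uint64_t value, unsigned shift,
                                      unsigned length, uint64_t field)
{
    assert(shift < 64 && length > 0 && length <= 64 - shift);
    uint64_t mask = (~0ULL >> (64 - length)) << shift;
    return (value & ~mask) | ((field << shift) & mask);
}

/* Mirrors the shape of the feature test in the patch: nonzero field. */
static inline bool demo_feature_cmow(uint64_t id_aa64mmfr1)
{
    return demo_extract64(id_aa64mmfr1, DEMO_CMOW_SHIFT,
                          DEMO_CMOW_LENGTH) != 0;
}
```

Advertising the feature then amounts to depositing a nonzero value into the field once at CPU init, while code that needs to know whether the feature is present only extracts and compares against zero.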
FEAT_CMOW introduces support for controlling cache maintenance
instructions executed in EL0/1 and is mandatory from Armv8.8.

On real hardware, the main use for this feature is to prevent processes
from invalidating or flushing cache lines for addresses they only have
read permission, which can impact the performance of other processes.

QEMU implements all cache instructions as NOPs, and, according to rule
[1], which states that generating any Permission fault when a cache
instruction is implemented as a NOP is implementation-defined, no
Permission fault is generated for any cache instruction when it lacks
read and write permissions.

QEMU does not model any cache topology, so the PoU and PoC are before
any cache, and rules [2] apply. These rules state that generating any
MMU fault for cache instructions in this topology is also
implementation-defined. Therefore, for FEAT_CMOW, we do not generate any
MMU faults either; instead, we only advertise it in the feature
register.

[1] Rule R_HGLYG of section D8.14.3, Arm ARM K.a.
[2] Rules R_MZTNR and R_DNZYL of section D8.14.3, Arm ARM K.a.

Signed-off-by: Gustavo Romero <gustavo.romero@linaro.org>
---
 docs/system/arm/emulation.rst | 1 +
 target/arm/cpu-features.h | 5 +++++
 target/arm/cpu.h | 1 +
 target/arm/tcg/cpu64.c | 1 +
 4 files changed, 8 insertions(+)