Message ID | 20250214-kvm-arm64-sme-v4-0-d64a681adcc2@kernel.org |
---|---|
Headers | show |
Series | KVM: arm64: Implement support for SME in non-protected guests | expand |
On Fri, 14 Feb 2025 15:13:52 +0000, Mark Brown <broonie@kernel.org> wrote: > > On Fri, Feb 14, 2025 at 09:24:03AM +0000, Marc Zyngier wrote: > > Mark Brown <broonie@kernel.org> wrote: > > > Just to be clear: I do not intend to review a series that doesn't > > cover the full gamut of KVM from day 1. Protected mode is an absolute > > requirement. It is the largest KVM deployment, and Android phones the > > only commonly available platform with SME. If CCA gets merged prior to > > SME support, supporting it will also be a requirement. > > OK, no problem. This is a new requirement and I'd been trying to > balance the concerns people have with the size of serieses like this > with the need to get everything in, my plan had been to follow up as > soon as possible with pKVM. If size is an issue, split the UAPI from the core code. But don't fragment the overall world switch and exception handling, because that's a sure way to end-up with the same sort of problems we ended up fixing in SVE. pKVM has a direct influence on what you track, where you track it, and implementing it as an afterthought is a very bad idea. > > > > Access to the floating point registers follows the architecture: > > > > - When both SVE and SME are present: > > > - If PSTATE.SM == 0 the vector length used for the Z and P registers > > > is the SVE vector length. > > > - If PSTATE.SM == 1 the vector length used for the Z and P registers > > > is the SME vector length. > > > - If only SME is present: > > > - If PSTATE.SM == 0 the Z and P registers are inaccessible and the > > > floating point state accessed via the encodings for the V registers. > > > - If PSTATE.SM == 1 the vector length used for the Z and P registers > > > - The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1. > > > > The VMM must understand this, in particular when loading state SVCR > > > should be configured before other state. > > > Why SVCR? This isn't a register, just an architected accessor to > > PSTATE.{ZA,SM}. Userspace already has direct access to PSTATE, so I > > don't understand this requirement. > > Could you be more explicit as to what you mean by direct access to > PSTATE here? The direct access to these PSTATE fields is in the form of > SVCR register accesses, or writes via SMSTART or SMSTOP instructions > when executing code - is there another access mechanism I'm not aware of > here? They don't appear in SPSR. Or is this a terminology issue with > referring to SVCR as the mechanism for configuring PSTATE.{SM,ZA} > without explicitly calling out that that's what's happening? I'm painfully aware of the architecture limitations. However, I don't get your mention of SPSR here. The architecture is quite clear that PSTATE is where these bits are held, that they are not propagated anywhere else, and that's where userspace should expect to find them. The fact that SW must use SVCR to alter PSTATE.{ZA,SM} doesn't matter. We save/restore registers, not accessors. If this means we need to play a dance when the VMM accesses PSTATE to reconciliate KVM's internal view with the userspace view, so be it. It probably means you need to obtain a clarification of the architecture to define *where* these bits are stored in PSTATE, because that's not currently defined. > > > Isn't it that there is simply a dependency between restoring PSTATE > > and any of the vector stuff? Also, how do you preserve the current ABI > > that do not have this requirement? > > Yes, that's the dependency - I'm spelling out explicitly what changes in > the register view when PSTATE.{SM,ZA} are restored. This ABI is what > you appeared to be asking for the last time we discussed this. > Previously I had also proposed either: > > - Exposing the streaming mode view of the register state as separate > registers, selecting between the standard and streaming views for > load/save based on the mode when the guest runs and clearing the > inactive registers on userspace access. > > - Always presenting userspace with the largest available vector length, > zero padding when userspace reads and discarding unused high bits > when loading into the registers for the guest. This ends up > requiring rewriting between VLs, or to/from FPSIMD format, around > periods of userspace access since when normally executing and context > switching the guest we want to store the data in the native format > for the current PSTATE.SM for performance. > > both of which largely avoid the ordering requirements but add complexity > to the implementation, and memory overhead in the first case. I'd > originally implemented the second case, that seems the best of the > available options to me. You weren't happy with these options and said > that we should not translate register formats and always use the current > mode for the vCPU, but given that changing PSTATE.SM changes the > register sizes that ends up creating an ordering requirement. You > seemed to agree that it was reasonable to have an ordering requirement > with PSTATE.SM so long as it only came when SME had been explicitly > enabled. > > Would you prefer: > > - Changing the register view based on the current value of PSTATE.SM. > - Exposing streaming mode Z and P as separate registers. > - Exposing the existing Z and P registers with the maximum S?E VL. > > or some other option? My take on this hasn't changed. I want to see something that behaves *exactly* like the architecture defines the expected behaviour of a CPU. But you still haven't answered my question: How is the *current* ABI preserved? Does it require *not* selecting SME? Does it require anything else? I'm expecting simple answers to simple questions, not a wall of text describing something that is not emulating the architecture. M.
On Mon, Feb 17, 2025 at 09:37:26AM +0000, Marc Zyngier wrote: > Mark Brown <broonie@kernel.org> wrote: > > On Fri, Feb 14, 2025 at 09:24:03AM +0000, Marc Zyngier wrote: > > > Why SVCR? This isn't a register, just an architected accessor to > > > PSTATE.{ZA,SM}. Userspace already has direct access to PSTATE, so I > > > don't understand this requirement. > > Could you be more explicit as to what you mean by direct access to > > PSTATE here? The direct access to these PSTATE fields is in the form of > I'm painfully aware of the architecture limitations. > However, I don't get your mention of SPSR here. The architecture is > quite clear that PSTATE is where these bits are held, that they are > not propagated anywhere else, and that's where userspace should expect > to find them. > The fact that SW must use SVCR to alter PSTATE.{ZA,SM} doesn't matter. > We save/restore registers, not accessors. If this means we need to > play a dance when the VMM accesses PSTATE to reconciliate KVM's > internal view with the userspace view, so be it. Could you please clarify what you're referring to as the VMM accessing PSTATE here? The KVM API documentation defines it's concept of PSTATE as: 0x6030 0000 0010 0042 PSTATE 64 regs.pstate in api.rst but does not futher elaborate. Looking at the code the values that appear there seem to be mapped fairly directly to SPSR values, this is why I was talking about them above. It's not clear to me that PSTATE in the architecture is a register exactly. DDI0487 L.a D1.4 defines the PSTATE bits as being stored in a ProcState pseudocode structure (ProcState is defined in J1.3.3.457, the pseudocode maps PSTATE in J1.3.3.454) but has an explicit comment that "There is no significace in the field order". I can't seem to locate an architectural definition of the layout of PSTATE as a whole. I also can't find any direct observability of PSTATE as a whole which would require a layout definition for PSTATE itself from the architecture. There *is* a statement in R_DQXFW that "The contents of PSTATE immediately before the exception was taken are written to SPSR_ELx" which is somewhat in conflict with the absence of SM and ZA fields in any of SPSR_ELx. There's also R_BWCFK similarly for exception return, though that also already has some additional text for PSTATE.{IT,T} for returns to AArch32. If everything in PSTATE ends up getting written to SPSR then we can use SPSR as our representation of PSTATE. > It probably means you need to obtain a clarification of the > architecture to define *where* these bits are stored in PSTATE, > because that's not currently defined. I will raise the issue with R_DQXFW and friends with the architects but I'm not convinced that at this point the clarification wouldn't be an adjustment to those rules rather than the addition of fields for SM and ZA in the SPSRs. Without any additions the only access we have is via SVCR. > > > Isn't it that there is simply a dependency between restoring PSTATE > > > and any of the vector stuff? Also, how do you preserve the current ABI > > > that do not have this requirement? > > Yes, that's the dependency - I'm spelling out explicitly what changes in > > the register view when PSTATE.{SM,ZA} are restored. This ABI is what > > you appeared to be asking for the last time we discussed this. ... > > Would you prefer: > > - Changing the register view based on the current value of PSTATE.SM. > > - Exposing streaming mode Z and P as separate registers. > > - Exposing the existing Z and P registers with the maximum S?E VL. > > or some other option? > My take on this hasn't changed. I want to see something that behaves > *exactly* like the architecture defines the expected behaviour of a > CPU. > But you still haven't answered my question: How is the *current* ABI > preserved? Does it require *not* selecting SME? Does it require > anything else? I'm expecting simple answers to simple questions, not a Yes, it requires not selecting SME and only that. If the VMM does not enable SME then it should see no change. > wall of text describing something that is not emulating the > architecture. I'm not clear what you're referring to as not emulating the architecture here, I *think* it's the issues with PSTATE.{SM,ZA} not appearing in SPSR_ELx and hence KVM's pstate? Your general style of interaction means that it is not always altogether clear when your intent is to just ask a simple question or when it is to point out some problem you have seen.
I've removed the RFC tag from this version of the series, but the items that I'm looking for feedback on remains the same: - The userspace ABI, in particular: - The vector length used for the SVE registers, access to the SVE registers and access to ZA and (if available) ZT0 depending on the current state of PSTATE.{SM,ZA}. - The use of a single finalisation for both SVE and SME. - The addition of control for enabling fine grained traps in a similar manner to FGU but without the UNDEF, I'm not clear if this is desired at all and at present this requires symmetric read and write traps like FGU. That seemed like it might be desired from an implementation point of view but we already have one case where we enable an asymmetric trap (for ARM64_WORKAROUND_AMPERE_AC03_CPU_38) and it seems generally useful to enable asymmetrically. This series implements support for SME use in non-protected KVM guests. Much of this is very similar to SVE, the main additional challenge that SME presents is that it introduces a new vector length similar to the SVE vector length and two new controls which change the registers seen by guests: - PSTATE.ZA enables the ZA matrix register and, if SME2 is supported, the ZT0 LUT register. - PSTATE.SM enables streaming mode, a new floating point mode which uses the SVE register set with the separately configured SME vector length. In streaming mode implementation of the FFR register is optional. It is also permitted to build systems which support SME without SVE, in this case when not in streaming mode no SVE registers or instructions are available. Further, there is no requirement that there be any overlap in the set of vector lengths supported by SVE and SME in a system, this is expected to be a common situation in practical systems. Since there is a new vector length to configure we introduce a new feature parallel to the existing SVE one with a new pseudo register for the streaming mode vector length. Due to the overlap with SVE caused by streaming mode rather than finalising SME as a separate feature we use the existing SVE finalisation to also finalise SME, a new define KVM_ARM_VCPU_VEC is provided to help make user code clearer. Finalising SVE and SME separately would introduce complication with register access since finalising SVE makes the SVE regsiters writeable by userspace and doing multiple finalisations results in an error being reported. Dealing with a state where the SVE registers are writeable due to one of SVE or SME being finalised but may have their VL changed by the other being finalised seems like needless complexity with minimal practical utility, it seems clearer to just express directly that only one finalisation can be done in the ABI. Access to the floating point registers follows the architecture: - When both SVE and SME are present: - If PSTATE.SM == 0 the vector length used for the Z and P registers is the SVE vector length. - If PSTATE.SM == 1 the vector length used for the Z and P registers is the SME vector length. - If only SME is present: - If PSTATE.SM == 0 the Z and P registers are inaccessible and the floating point state accessed via the encodings for the V registers. - If PSTATE.SM == 1 the vector length used for the Z and P registers - The SME specific ZA and ZT0 registers are only accessible if SVCR.ZA is 1. The VMM must understand this, in particular when loading state SVCR should be configured before other state. There are a large number of subfeatures for SME, most of which only offer additional instructions but some of which (SME2 and FA64) add architectural state. These are configured via the ID registers as per usual. The new KVM_ARM_VCPU_VEC feature and ZA and ZT0 registers have not been added to the get-reg-list selftest, the idea of supporting additional features there without restructuring the program to generate all possible feature combinations has been rejected. I will post a separate series which does that restructuring. No support is present for protected guests, this is expected to be added separately, the series is already rather large and pKVM in general offers a subset of features. This series is based on Mark Rutland's fix series: https://lore.kernel.org/r/20250210195226.1215254-1-mark.rutland@arm.com Signed-off-by: Mark Brown <broonie@kernel.org> --- Changes in v4: - Rebase onto v6.14-rc2 and Mark Rutland's fixes. - Expose SME to nested guests. - Additional cleanups and test fixes following on from the rebase. - Link to v3: https://lore.kernel.org/r/20241220-kvm-arm64-sme-v3-0-05b018c1ffeb@kernel.org Changes in v3: - Rebase onto v6.12-rc2. - Link to v2: https://lore.kernel.org/r/20231222-kvm-arm64-sme-v2-0-da226cb180bb@kernel.org Changes in v2: - Rebase onto v6.7-rc3. - Configure subfeatures based on host system only. - Complete nVHE support. - There was some snafu with sending v1 out, it didn't make it to the lists but in case it hit people's inboxes I'm sending as v2. --- Mark Brown (27): arm64/fpsimd: Update FA64 and ZT0 enables when loading SME state arm64/fpsimd: Decide to save ZT0 and streaming mode FFR at bind time arm64/fpsimd: Check enable bit for FA64 when saving EFI state arm64/fpsimd: Determine maximum virtualisable SME vector length KVM: arm64: Introduce non-UNDEF FGT control KVM: arm64: Pay attention to FFR parameter in SVE save and load KVM: arm64: Pull ctxt_has_ helpers to start of sysreg-sr.h KVM: arm64: Move SVE state access macros after feature test macros KVM: arm64: Rename SVE finalization constants to be more general KVM: arm64: Document the KVM ABI for SME KVM: arm64: Define internal features for SME KVM: arm64: Rename sve_state_reg_region KVM: arm64: Store vector lengths in an array KVM: arm64: Implement SME vector length configuration KVM: arm64: Support SME control registers KVM: arm64: Support TPIDR2_EL0 KVM: arm64: Support SME identification registers for guests KVM: arm64: Support SME priority registers KVM: arm64: Provide assembly for SME state restore KVM: arm64: Support userspace access to streaming mode Z and P registers KVM: arm64: Expose SME specific state to userspace KVM: arm64: Context switch SME state for normal guests KVM: arm64: Handle SME exceptions KVM: arm64: Expose SME to nested guests KVM: arm64: Provide interface for configuring and enabling SME for guests KVM: arm64: selftests: Add SME system registers to get-reg-list KVM: arm64: selftests: Add SME to set_id_regs test Documentation/virt/kvm/api.rst | 117 +++++++--- arch/arm64/include/asm/fpsimd.h | 22 ++ arch/arm64/include/asm/kvm_emulate.h | 12 +- arch/arm64/include/asm/kvm_host.h | 143 ++++++++++--- arch/arm64/include/asm/kvm_hyp.h | 4 +- arch/arm64/include/asm/kvm_pkvm.h | 2 +- arch/arm64/include/asm/vncr_mapping.h | 2 + arch/arm64/include/uapi/asm/kvm.h | 33 +++ arch/arm64/kernel/cpufeature.c | 2 - arch/arm64/kernel/fpsimd.c | 86 ++++---- arch/arm64/kvm/arm.c | 10 + arch/arm64/kvm/fpsimd.c | 19 +- arch/arm64/kvm/guest.c | 262 ++++++++++++++++++++--- arch/arm64/kvm/handle_exit.c | 14 ++ arch/arm64/kvm/hyp/fpsimd.S | 18 +- arch/arm64/kvm/hyp/include/hyp/switch.h | 141 ++++++++++-- arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h | 77 ++++--- arch/arm64/kvm/hyp/nvhe/hyp-main.c | 9 +- arch/arm64/kvm/hyp/nvhe/pkvm.c | 4 +- arch/arm64/kvm/hyp/nvhe/switch.c | 11 +- arch/arm64/kvm/hyp/vhe/switch.c | 21 +- arch/arm64/kvm/nested.c | 3 +- arch/arm64/kvm/reset.c | 154 +++++++++---- arch/arm64/kvm/sys_regs.c | 118 +++++++++- include/uapi/linux/kvm.h | 1 + tools/testing/selftests/kvm/arm64/get-reg-list.c | 32 ++- tools/testing/selftests/kvm/arm64/set_id_regs.c | 29 ++- 27 files changed, 1078 insertions(+), 268 deletions(-) --- base-commit: 6a25088d268ce4c2163142ead7fe1975bb687cb7 change-id: 20230301-kvm-arm64-sme-06a1246d3636 prerequisite-message-id: 20250210195226.1215254-1-mark.rutland@arm.com prerequisite-patch-id: 615ab9c526e9f6242bd5b8d7188e96fb66fb0e64 prerequisite-patch-id: e5c4f2ff9c9ba01a0f659dd1e8bf6396de46e197 prerequisite-patch-id: 0794d28526755180847841c045a6b7cb3d800c16 prerequisite-patch-id: 079d3a8a680f793b593268eeba000acc55a0b4ec prerequisite-patch-id: a3428f67a5ee49f2b01208f30b57984d5409d8f5 prerequisite-patch-id: 26393e401e9eae7cff5bb1d3bdb18b4e29ffc8fe prerequisite-patch-id: 64f9819f751da4a1c73b9d1b292ccee6afda89f6 prerequisite-patch-id: 0355baaa8ceb31dc85d015b56084c33416f78041 Best regards,