[v14,0/5] KVM: arm64: Provide guest support for GCS

Message ID 20241005-arm64-gcs-v14-0-59060cd6092b@kernel.org

Message

Mark Brown Oct. 5, 2024, 10:37 a.m. UTC
The arm64 Guarded Control Stack (GCS) feature provides support for
hardware protected stacks of return addresses, intended to provide
hardening against return oriented programming (ROP) attacks and to make
it easier to gather call stacks for applications such as profiling.

When GCS is active a secondary stack called the Guarded Control Stack is
maintained, protected with a memory attribute which means that it can
only be written with specific GCS operations.  The current GCS pointer
cannot be written directly by userspace.  When a BL is executed the
value stored in LR is also pushed onto the GCS, and when a RET is
executed the top of the GCS is popped and compared to LR with a fault
being raised if the values do not match.  GCS operations may only be
performed on GCS pages; otherwise a data abort is generated.
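
The BL/RET behaviour described above can be illustrated with a small
software model.  This is only a sketch for explanation, not kernel code
and not how the hardware is implemented: the structure, the function
names and the fixed GCS_DEPTH are all hypothetical, and the model
ignores the memory attribute, the stack growth direction and the
GCS-specific instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GCS_DEPTH 64	/* hypothetical capacity, only for this model */

/* Toy model of one thread: the link register plus its control stack. */
struct gcs_model {
	uint64_t lr;			/* models LR */
	uint64_t stack[GCS_DEPTH];	/* models the GCS contents */
	size_t top;			/* number of entries on the GCS */
};

/* Models BL: the return address goes into LR and onto the GCS. */
static bool model_bl(struct gcs_model *m, uint64_t return_addr)
{
	if (m->top == GCS_DEPTH)
		return false;	/* limit of the model, not architectural */

	m->lr = return_addr;
	m->stack[m->top++] = return_addr;
	return true;
}

/* Models RET: pop the GCS and compare with LR, false means a fault. */
static bool model_ret(struct gcs_model *m)
{
	if (m->top == 0)
		return false;

	return m->stack[--m->top] == m->lr;
}
```

In this model a call followed by a return succeeds, while overwriting
lr between model_bl() and model_ret() makes model_ret() report a
fault, which is the property the ROP hardening relies on.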

The combination of hardware enforcement and lack of extra instructions
in the function entry and exit paths should result in something which
has less overhead and is more difficult to attack than a purely software
implementation like clang's shadow stacks.

This series implements support for managing GCS for KVM guests.  It also
includes a fix for S1PIE, which has also been sent separately, as this
feature is a dependency for GCS.  It is based on:

   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/gcs

Signed-off-by: Mark Brown <broonie@kernel.org>
---
Changes in v14:
- Rebase onto arm64/for-next/gcs which includes all the non-KVM support.
- Manage the fine grained traps for GCS instructions.
- Manage PSTATE.EXLOCK when delivering exceptions to KVM guests.
- Link to v13: https://lore.kernel.org/r/20241001-arm64-gcs-v13-0-222b78d87eee@kernel.org

Changes in v13:
- Rebase onto v6.12-rc1.
- Allocate VM_HIGH_ARCH_6 since protection keys used all the existing
  bits.
- Implement mm_release() and free transparently allocated GCSs there.
- Use bit 32 of AT_HWCAP for GCS due to AT_HWCAP2 being filled.
- Since we now only set GCSCRE0_EL1 on change ensure that it is
  initialised with GCSPR_EL0 accessible to EL0.
- Fix OOM handling on thread copy.
- Link to v12: https://lore.kernel.org/r/20240829-arm64-gcs-v12-0-42fec947436a@kernel.org

Changes in v12:
- Clarify and simplify the signal handling code so we work with the
  register state.
- When checking for write aborts to shadow stack pages ensure the fault
  is a data abort.
- Depend on !UPROBES.
- Comment cleanups.
- Link to v11: https://lore.kernel.org/r/20240822-arm64-gcs-v11-0-41b81947ecb5@kernel.org

Changes in v11:
- Remove the dependency on the addition of clone3() support for shadow
  stacks, rebasing onto v6.11-rc3.
- Make ID_AA64PFR1_EL1.GCS writeable in KVM.
- Hide GCS registers when GCS is not enabled for KVM guests.
- Require HCRX_EL2.GCSEn if booting at EL1.
- Require that GCSCR_EL1 and GCSCRE0_EL1 be initialised regardless of
  whether we boot at EL2 or EL1.
- Remove some stray use of bit 63 in signal cap tokens.
- Warn if we see a GCS with VM_SHARED.
- Remove redundant check for VM_WRITE in fault handling.
- Cleanups and clarifications in the ABI document.
- Clean up and improve documentation of some sync placement.
- Only set the EL0 GCS mode if it's actually changed.
- Various minor fixes and tweaks.
- Link to v10: https://lore.kernel.org/r/20240801-arm64-gcs-v10-0-699e2bd2190b@kernel.org

Changes in v10:
- Fix issues with THP.
- Tighten up requirements for initialising GCSCR*.
- Only generate GCS signal frames for threads using GCS.
- Only context switch EL1 GCS registers if S1PIE is enabled.
- Move context switch of GCSCRE0_EL1 to EL0 context switch.
- Make GCS registers unconditionally visible to userspace.
- Use FHU infrastructure.
- Don't change writability of ID_AA64PFR1_EL1 for KVM.
- Remove unused arguments from alloc_gcs().
- Typo fixes.
- Link to v9: https://lore.kernel.org/r/20240625-arm64-gcs-v9-0-0f634469b8f0@kernel.org

Changes in v9:
- Rebase onto v6.10-rc3.
- Restructure and clarify memory management fault handling.
- Fix up basic-gcs for the latest clone3() changes.
- Convert to newly merged KVM ID register based feature configuration.
- Fixes for NV traps.
- Link to v8: https://lore.kernel.org/r/20240203-arm64-gcs-v8-0-c9fec77673ef@kernel.org

Changes in v8:
- Invalidate signal cap token on stack when consuming.
- Typo and other trivial fixes.
- Don't try to use process_vm_writev() on GCS; it intentionally does not
  work.
- Fix leak of thread GCSs.
- Rebase onto latest clone3() series.
- Link to v7: https://lore.kernel.org/r/20231122-arm64-gcs-v7-0-201c483bd775@kernel.org

Changes in v7:
- Rebase onto v6.7-rc2 via the clone3() patch series.
- Change the token used to cap the stack during signal handling to be
  compatible with GCSPOPM.
- Fix flags for new page types.
- Fold in support for clone3().
- Replace copy_to_user_gcs() with put_user_gcs().
- Link to v6: https://lore.kernel.org/r/20231009-arm64-gcs-v6-0-78e55deaa4dd@kernel.org

Changes in v6:
- Rebase onto v6.6-rc3.
- Add some more gcsb_dsync() barriers following spec clarifications.
- Due to ongoing discussion around clone()/clone3() I've not updated
  anything there; the behaviour is the same as in previous versions.
- Link to v5: https://lore.kernel.org/r/20230822-arm64-gcs-v5-0-9ef181dd6324@kernel.org

Changes in v5:
- Don't map any permissions for user GCSs, we always use EL0 accessors
  or use a separate mapping of the page.
- Reduce the standard size of the GCS to RLIMIT_STACK/2.
- Enforce a PAGE_SIZE alignment requirement on map_shadow_stack().
- Clarifications and fixes to documentation.
- More tests.
- Link to v4: https://lore.kernel.org/r/20230807-arm64-gcs-v4-0-68cfa37f9069@kernel.org

Changes in v4:
- Implement flags for map_shadow_stack() allowing the cap and end of
  stack marker to be enabled independently or not at all.
- Relax size and alignment requirements for map_shadow_stack().
- Add more blurb explaining the advantages of hardware enforcement.
- Link to v3: https://lore.kernel.org/r/20230731-arm64-gcs-v3-0-cddf9f980d98@kernel.org

Changes in v3:
- Rebase onto v6.5-rc4.
- Add a GCS barrier on context switch.
- Add a GCS stress test.
- Link to v2: https://lore.kernel.org/r/20230724-arm64-gcs-v2-0-dc2c1d44c2eb@kernel.org

Changes in v2:
- Rebase onto v6.5-rc3.
- Rework prctl() interface to allow each bit to be locked independently.
- map_shadow_stack() now places the cap token based on the size
  requested by the caller not the actual space allocated.
- Mode changes other than enable via ptrace are now supported.
- Expand test coverage.
- Various smaller fixes and adjustments.
- Link to v1: https://lore.kernel.org/r/20230716-arm64-gcs-v1-0-bf567f93bba6@kernel.org

---
Mark Brown (5):
      KVM: arm64: Expose S1PIE to guests
      arm64/gcs: Ensure FGTs for EL1 GCS instructions are disabled
      KVM: arm64: Manage GCS access and registers for guests
      KVM: arm64: Set PSTATE.EXLOCK when entering an exception
      KVM: selftests: arm64: Add GCS registers to get-reg-list

 arch/arm64/include/asm/el2_setup.h                 |  7 ++++-
 arch/arm64/include/asm/kvm_host.h                  | 12 ++++++++
 arch/arm64/include/asm/vncr_mapping.h              |  2 ++
 arch/arm64/include/uapi/asm/ptrace.h               |  2 ++
 arch/arm64/kvm/hyp/exception.c                     | 10 +++++++
 arch/arm64/kvm/hyp/include/hyp/sysreg-sr.h         | 31 +++++++++++++++++++
 arch/arm64/kvm/sys_regs.c                          | 35 ++++++++++++++++++++--
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 28 +++++++++++++++++
 8 files changed, 124 insertions(+), 3 deletions(-)
---
base-commit: ed4983d2da8c3b66ac6d048beb242916bec83522
change-id: 20230303-arm64-gcs-e311ab0d8729

Best regards,

Comments

Mark Brown Oct. 5, 2024, 1:08 p.m. UTC | #1
On Sat, Oct 05, 2024 at 12:34:20PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:

> > +	if (!kvm_has_gcs(kvm)) {
> > +		kvm->arch.fgu[HFGxTR_GROUP] |= (HFGxTR_EL2_nGCS_EL0 |
> > +						HFGxTR_EL2_nGCS_EL1);
> > +		kvm->arch.fgu[HFGITR_GROUP] |= (HFGITR_EL2_nGCSEPP |
> > +						HFGITR_EL2_nGCSSTR_EL1 |
> > +						HFGITR_EL2_nGCSPUSHM_EL1);

> Where is the handling of traps resulting from HFGITR_EL2.nGCSSTR_EL1?

These will trap with an EC of 0x2d which isn't known so I was expecting
this to get handled in the same way as for example a return of false
from kvm_hyp_handle_fpsimd() for SVE when unsupported, or for the
similarly unknown SME EC, currently.  I gather from your comment that
you're instead expecting to see an explicit exit handler for this EC
that just injects the UNDEF directly?
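
The two options being discussed here (relying on a catch-all for
unknown exception classes versus adding an explicit handler for EC
0x2d) can be sketched with a table-driven dispatcher.  This is a
simplified model for illustration, not the KVM implementation: the
enum, the handler names and dispatch_exit() are hypothetical.  Only the
EC values 0x18 and 0x2d and the position of the EC field in the
syndrome (bits [31:26]) are taken as given.

```c
#include <stddef.h>
#include <stdint.h>

#define EC_SHIFT	26	/* EC lives in syndrome bits [31:26] */
#define EC_MASK		0x3fu
#define EC_COUNT	64
#define EC_SYS64	0x18	/* trapped system register/instruction */
#define EC_GCS		0x2d	/* GCS exception class discussed above */

/* Hypothetical outcomes of handling an exit in this model. */
enum exit_action {
	ACTION_INJECT_UNDEF,
	ACTION_EMULATE_SYSREG,
};

typedef enum exit_action (*exit_handler_fn)(uint64_t esr);

/* Catch-all: anything without a dedicated handler gets an UNDEF. */
static enum exit_action handle_unknown_ec(uint64_t esr)
{
	(void)esr;
	return ACTION_INJECT_UNDEF;
}

static enum exit_action handle_sys64(uint64_t esr)
{
	(void)esr;
	return ACTION_EMULATE_SYSREG;
}

/* Entries left NULL fall back to handle_unknown_ec(). */
static const exit_handler_fn exit_handlers[EC_COUNT] = {
	[EC_SYS64] = handle_sys64,
	/* An explicit [EC_GCS] entry would be added here. */
};

static enum exit_action dispatch_exit(uint64_t esr)
{
	unsigned int ec = (unsigned int)((esr >> EC_SHIFT) & EC_MASK);
	exit_handler_fn fn = exit_handlers[ec];

	return fn ? fn(esr) : handle_unknown_ec(esr);
}
```

With no entry for EC_GCS the model ends up injecting an UNDEF through
the catch-all, which is the behaviour the reply above was expecting.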
Mark Brown Oct. 5, 2024, 1:48 p.m. UTC | #2
On Sat, Oct 05, 2024 at 02:18:57PM +0100, Marc Zyngier wrote:
> Mark Brown <broonie@kernel.org> wrote:
> > On Sat, Oct 05, 2024 at 12:34:20PM +0100, Marc Zyngier wrote:

> > > Where is the handling of traps resulting from HFGITR_EL2.nGCSSTR_EL1?

> > These will trap with an EC of 0x2d which isn't known so I was expecting
> > this to get handled in the same way as for example a return of false
> > from kvm_hyp_handle_fpsimd() for SVE when unsupported, or for the
> > similarly unknown SME EC, currently.  I gather from your comment that
> > you're instead expecting to see an explicit exit handler for this EC
> > that just injects the UNDEF directly?

> Not just inject an UNDEF directly, but also track whether this needs
> to be forwarded when the guest's HFGITR_EL2.nGCSSTR_EL1 is 0 while not
> being RES0. Basically following what the pseudocode describes.

Ah, I see.  I'd been under the impression that the generic machinery was
supposed to handle this already using the descriptions in
emulate-nested.c and we only needed handlers for more specific actions.
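
The forwarding condition from the quoted review comment can be written
out as a small predicate.  This is a sketch of that one sentence only,
under stated assumptions: the function name is hypothetical,
EXAMPLE_NGCSSTR_EL1 is a placeholder mask rather than the architected
bit position, and the model leaves out everything else the pseudocode
covers, such as whether a nested guest is actually running.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder mask for nGCSSTR_EL1, not the architected bit position. */
#define EXAMPLE_NGCSSTR_EL1	(UINT64_C(1) << 0)

/*
 * nGCSSTR_EL1 has negative polarity: a value of 0 means the guest
 * hypervisor asked for the trap.  If the bit is RES0 for this guest
 * (GCS not exposed) the value carries no meaning, so nothing is
 * forwarded and an UNDEF would be injected instead.
 */
static bool should_forward_gcsstr_trap(bool bit_is_res0,
				       uint64_t guest_hfgitr)
{
	if (bit_is_res0)
		return false;

	return (guest_hfgitr & EXAMPLE_NGCSSTR_EL1) == 0;
}
```

Here a false return stands for "do not forward", which in the case
under discussion means falling back to injecting the UNDEF.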
Marc Zyngier Oct. 5, 2024, 2:33 p.m. UTC | #3
On Sat, 05 Oct 2024 15:26:38 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> On Sat, Oct 05, 2024 at 03:02:09PM +0100, Marc Zyngier wrote:
> > Mark Brown <broonie@kernel.org> wrote:
> 
> > > Ah, I see.  I'd been under the impression that the generic machinery was
> > > supposed to handle this already using the descriptions in
> > > emulate-nested.c and we only needed handlers for more specific actions.
> 
> > From that very file:
> 
> > /*
> >  * Map encoding to trap bits for exception reported with EC=0x18.
> >  [...]
> >  */
> 
> > Everything else needs special handling.
> 
> I see.  I had noticed that comment on that table but I didn't register
> that the comment wound up applying to the whole file rather than being
> about a specific part of the handling.  I'm a bit confused about how
> things like the SVE traps I mentioned work here...

Like all ECs, the handling starts in handle_exit.c.

	M.