mbox series

[v14,00/22] Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support

Message ID 20240421180122.1650812-1-michael.roth@amd.com
Headers show
Series Add AMD Secure Nested Paging (SEV-SNP) Hypervisor Support | expand

Message

Michael Roth April 21, 2024, 6:01 p.m. UTC
This patchset is also available at:

  https://github.com/amdese/linux/commits/snp-host-v14

and is based on commit 20cc50a0410f (just before the v13 SNP patches) from:

  https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue


Patch Layout
------------

01-04: These patches add some basic infrastructure and introduces a new
       KVM_X86_SNP_VM vm_type to handle differences verses the existing
       KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.

05-07: These implement the KVM API to handle the creation of a
       cryptographic launch context, encrypt/measure the initial image
       into guest memory, and finalize it before launching it.

08-13: These implement handling for various guest-generated events such
       as page state changes, onlining of additional vCPUs, etc.

14-17: These implement the gmem hooks needed to prepare gmem-allocated
       pages before mapping them into guest private memory ranges as
       well as cleaning them up prior to returning them to the host for
       use as normal memory. Because this supplants certain activities
       like issued WBINVDs during KVM MMU invalidations, there's also
       a patch to avoid duplicating that work to avoid unecessary
       overhead.

18:    With all the core support in place, the patch adds a kvm_amd module
       parameter to enable SNP support.

19-22: These patches all deal with the servicing of guest requests to handle
       things like attestation, as well as some related host-management
       interfaces.


Testing
-------

For testing this via QEMU, use the following tree:

  https://github.com/amdese/qemu/commits/snp-v4-wip3b

A patched OVMF is also needed due to upstream KVM no longer supporting MMIO
ranges that are mapped as private. It is recommended you build the AmdSevX64
variant as it provides the kernel-hashing support present in this series:

  https://github.com/amdese/ovmf/commits/apic-mmio-fix1d

A basic command-line invocation for SNP would be:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd

With kernel-hashing and certificate data supplied:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=,certs-path=/home/mroth/cert.blob,kernel-hashes=on
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d-AmdSevX64.fd
  -kernel /boot/vmlinuz-$ver
  -initrd /boot/initrd.img-$ver
  -append "root=UUID=d72a6d1c-06cf-4b79-af43-f1bac4f620f9 ro console=ttyS0,115200n8"

With standard X64 OVMF package with separate image for persistent NVRAM:

 qemu-system-x86_64 -smp 32,maxcpus=255 -cpu EPYC-Milan-v2
  -machine q35,confidential-guest-support=sev0,memory-backend=ram1
  -object memory-backend-memfd,id=ram1,size=4G,share=true,reserve=false
  -object sev-snp-guest,id=sev0,cbitpos=51,reduced-phys-bits=1,id-auth=
  -bios OVMF_CODE-upstream-20240410-apic-mmio-fix1d.fd 
  -drive if=pflash,format=raw,unit=0,file=OVMF_VARS-upstream-20240410-apic-mmio-fix1d.fd,readonly=off


Known issues / TODOs
--------------------

 * Base tree in some cases reports "Unpatched return thunk in use. This should 
   not happen!" the first time it runs an SVM/SEV/SNP guests. This a recent
   regression upstream and unrelated to this series:

     https://lore.kernel.org/linux-kernel/CANpmjNOcKzEvLHoGGeL-boWDHJobwfwyVxUqMq2kWeka3N4tXA@mail.gmail.com/T/

 * 2MB hugepage support has been dropped pending discussion on how we plan to
   re-enable it in gmem.

 * Host kexec should work, but there is a known issue with host kdump support
   while SNP guests are running that will be addressed as a follow-up.

 * SNP kselftests are currently a WIP and will be included as part of SNP
   upstreaming efforts in the near-term.


SEV-SNP Overview
----------------

This part of the Secure Encrypted Paging (SEV-SNP) series focuses on the
changes required to add KVM support for SEV-SNP. This series builds upon
SEV-SNP guest support, which is now in mainline, and and SEV-SNP host
initialization support, which is now in linux-next.

While series provides the basic building blocks to support booting the
SEV-SNP VMs, it does not cover all the security enhancement introduced by
the SEV-SNP such as interrupt protection, which will added in the future.

With SNP, when pages are marked as guest-owned in the RMP table, they are
assigned to a specific guest/ASID, as well as a specific GFN with in the
guest. Any attempts to map it in the RMP table to a different guest/ASID,
or a different GFN within a guest/ASID, will result in an RMP nested page
fault.

Prior to accessing a guest-owned page, the guest must validate it with a
special PVALIDATE instruction which will set a special bit in the RMP table
for the guest. This is the only way to set the validated bit outside of the
initial pre-encrypted guest payload/image; any attempts outside the guest to
modify the RMP entry from that point forward will result in the validated
bit being cleared, at which point the guest will trigger an exception if it
attempts to access that page so it can be made aware of possible tampering.

One exception to this is the initial guest payload, which is pre-validated
by the firmware prior to launching. The guest can use Guest Message requests 
to fetch an attestation report which will include the measurement of the
initial image so that the guest can verify it was booted with the expected
image/environment.

After boot, guests can use Page State Change requests to switch pages
between shared/hypervisor-owned and private/guest-owned to share data for
things like DMA, virtio buffers, and other GHCB requests.

In this implementation of SEV-SNP, private guest memory is managed by a new
kernel framework called guest_memfd (gmem). With gmem, a new
KVM_SET_MEMORY_ATTRIBUTES KVM ioctl has been added to tell the KVM
MMU whether a particular GFN should be backed by shared (normal) memory or
private (gmem-allocated) memory. To tie into this, Page State Change
requests are forward to userspace via KVM_EXIT_VMGEXIT exits, which will
then issue the corresponding KVM_SET_MEMORY_ATTRIBUTES call to set the
private/shared state in the KVM MMU.

The gmem / KVM MMU hooks implemented in this series will then update the RMP
table entries for the backing PFNs to set them to guest-owned/private when
mapping private pages into the guest via KVM MMU, or use the normal KVM MMU
handling in the case of shared pages where the corresponding RMP table
entries are left in the default shared/hypervisor-owned state.

Feedback/review is very much appreciated!

-Mike


Changes since v13:

 * rebase to new kvm-coco-queue and wire up to PFERR_PRIVATE_ACCESS (Paolo)
 * handle setting kvm->arch.has_private_mem in same location as
   kvm->arch.has_protected_state (Paolo)
 * add flags and additional padding fields to
   snp_launch{start,update,finish} APIs to address alignment and
   expandability (Paolo)
 * update snp_launch_update() to update input struct values to reflect
   current progress of command in situations where mulitple calls are
   needed (Paolo)
 * update snp_launch_update() to avoid copying/accessing 'src' parameter
   when dealing with zero pages. (Paolo)
 * update snp_launch_update() to use u64 as length input parameter instead
   of u32 and adjust padding accordingly
 * modify ordering of SNP_POLICY_MASK_* definitions to be consistent with
   bit order of corresponding flags
 * let firmware handle enforcement of policy bits corresponding to
   user-specified minimum API version
 * add missing "0x" prefixs in pr_debug()'s for snp_launch_start()
 * fix handling of VMSAs during in-place migration (Paolo)

Changes since v12:

 * rebased to latest kvm-coco-queue branch (commit 4d2deb62185f)
 * add more input validation for SNP_LAUNCH_START, especially for handling
   things like MBO/MBZ policy bits, and API major/minor minimums. (Paolo)
 * block SNP KVM instances from being able to run legacy SEV commands (Paolo)
 * don't attempt to measure VMSA for vcpu 0/BSP before the others, let
   userspace deal with the ordering just like with SEV-ES (Paolo)
 * fix up docs for SNP_LAUNCH_FINISH (Paolo)
 * introduce svm->sev_es.snp_has_guest_vmsa flag to better distinguish
   handling for guest-mapped vs non-guest-mapped VMSAs, rename
   'snp_ap_create' flag to 'snp_ap_waiting_for_reset' (Paolo)
 * drop "KVM: SEV: Use a VMSA physical address variable for populating VMCB"
   as it is no longer needed due to above VMSA rework
 * replace pr_debug_ratelimited() messages for RMP #NPFs with a single trace
   event
 * handle transient PSMASH_FAIL_INUSE return codes in kvm_gmem_invalidate(),
   switch to WARN_ON*()'s to indicate remaining error cases are not expected
   and should not be seen in practice. (Paolo)
 * add a cond_resched() in kvm_gmem_invalidate() to avoid soft lock-ups when
   cleaning up large guest memory ranges.
 * rename VLEK_REQUIRED to VCEK_DISABLE. it's be more applicable if another
   key type ever gets added.
 * don't allow attestation to be paused while an attestation request is
   being processed by firmware (Tom)
 * add missing Documentation entry for SNP_VLEK_LOAD
 * collect Reviewed-by's from Paolo and Tom

Changes since v11:

 * Rebase series on kvm-coco-queue and re-work to leverage more
   infrastructure between SNP/TDX series.
 * Drop KVM_SNP_INIT in favor of the new KVM_SEV_INIT2 interface introduced
   here (Paolo):
     https://lore.kernel.org/lkml/20240318233352.2728327-1-pbonzini@redhat.com/
 * Drop exposure API fields related to things like VMPL levels, migration
   agents, etc., until they are actually supported/used (Sean)
 * Rework KVM_SEV_SNP_LAUNCH_UPDATE handling to use a new
   kvm_gmem_populate() interface instead of copying data directly into
   gmem-allocated pages (Sean)
 * Add support for SNP_LOAD_VLEK, rework the SNP_SET_CONFIG_{START,END} to
   have simpler semantics that are applicable to management of SNP_LOAD_VLEK
   updates as well, rename interfaces to the now more appropriate
   SNP_{PAUSE,RESUME}_ATTESTATION
 * Fix up documentation wording and do print warnings for
   userspace-triggerable failures (Peter, Sean)
 * Fix a race with AP_CREATION wake-up events (Jacob, Sean)
 * Fix a memory leak with VMSA pages (Sean)
 * Tighten up handling of RMP page faults to better distinguish between real
   and spurious cases (Tom)
 * Various patch/documentation rewording, cleanups, etc.


----------------------------------------------------------------
Ashish Kalra (1):
      KVM: SEV: Avoid WBINVD for HVA-based MMU notifications for SNP

Brijesh Singh (11):
      KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
      KVM: SEV: Add initial SEV-SNP support
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_START command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_UPDATE command
      KVM: SEV: Add KVM_SEV_SNP_LAUNCH_FINISH command
      KVM: SEV: Add support to handle GHCB GPA register VMGEXIT
      KVM: SEV: Add support to handle MSR based Page State Change VMGEXIT
      KVM: SEV: Add support to handle Page State Change VMGEXIT
      KVM: SEV: Add support to handle RMP nested page faults
      KVM: SVM: Add module parameter to enable SEV-SNP
      KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event

Michael Roth (8):
      KVM: SEV: Select KVM_GENERIC_PRIVATE_MEM when CONFIG_KVM_AMD_SEV=y
      KVM: SEV: Add support for GHCB-based termination requests
      KVM: SEV: Implement gmem hook for initializing private pages
      KVM: SEV: Implement gmem hook for invalidating private pages
      KVM: x86: Implement gmem hook for determining max NPT mapping level
      crypto: ccp: Add the SNP_VLEK_LOAD command
      crypto: ccp: Add the SNP_{PAUSE,RESUME}_ATTESTATION commands
      KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event

Tom Lendacky (2):
      KVM: SEV: Add support to handle AP reset MSR protocol
      KVM: SEV: Support SEV-SNP AP Creation NAE event

 Documentation/virt/coco/sev-guest.rst              |   69 +-
 Documentation/virt/kvm/api.rst                     |   73 +
 .../virt/kvm/x86/amd-memory-encryption.rst         |  110 +-
 arch/x86/include/asm/kvm_host.h                    |    2 +
 arch/x86/include/asm/sev-common.h                  |   22 +-
 arch/x86/include/asm/sev.h                         |   15 +
 arch/x86/include/asm/svm.h                         |    9 +-
 arch/x86/include/uapi/asm/kvm.h                    |   48 +
 arch/x86/kvm/Kconfig                               |    3 +
 arch/x86/kvm/mmu.h                                 |    2 -
 arch/x86/kvm/mmu/mmu.c                             |    1 +
 arch/x86/kvm/svm/sev.c                             | 1447 +++++++++++++++++++-
 arch/x86/kvm/svm/svm.c                             |   44 +-
 arch/x86/kvm/svm/svm.h                             |   50 +
 arch/x86/kvm/trace.h                               |   31 +
 arch/x86/kvm/x86.c                                 |   17 +
 arch/x86/virt/svm/sev.c                            |   80 ++
 drivers/crypto/ccp/sev-dev.c                       |   83 ++
 include/linux/psp-sev.h                            |    4 +-
 include/uapi/linux/kvm.h                           |   28 +
 include/uapi/linux/psp-sev.h                       |   39 +
 include/uapi/linux/sev-guest.h                     |    9 +
 virt/kvm/guest_memfd.c                             |    4 +-
 23 files changed, 2155 insertions(+), 35 deletions(-)

Comments

Michael Roth April 23, 2024, 4:31 p.m. UTC | #1
Quoting Michael Roth (2024-04-21 13:01:00)
> This patchset is also available at:
> 
>   https://github.com/amdese/linux/commits/snp-host-v14
> 
> and is based on commit 20cc50a0410f (just before the v13 SNP patches) from:
> 
>   https://git.kernel.org/pub/scm/virt/kvm/kvm.git/log/?h=kvm-coco-queue
> 
> 
> Patch Layout
> ------------
> 
> 01-04: These patches add some basic infrastructure and introduces a new
>        KVM_X86_SNP_VM vm_type to handle differences verses the existing
>        KVM_X86_SEV_VM and KVM_X86_SEV_ES_VM types.
> 
> 05-07: These implement the KVM API to handle the creation of a
>        cryptographic launch context, encrypt/measure the initial image
>        into guest memory, and finalize it before launching it.
> 
> 08-13: These implement handling for various guest-generated events such
>        as page state changes, onlining of additional vCPUs, etc.
> 
> 14-17: These implement the gmem hooks needed to prepare gmem-allocated
>        pages before mapping them into guest private memory ranges as
>        well as cleaning them up prior to returning them to the host for
>        use as normal memory. Because this supplants certain activities
>        like issued WBINVDs during KVM MMU invalidations, there's also
>        a patch to avoid duplicating that work to avoid unecessary
>        overhead.
> 
> 18:    With all the core support in place, the patch adds a kvm_amd module
>        parameter to enable SNP support.
> 
> 19-22: These patches all deal with the servicing of guest requests to handle
>        things like attestation, as well as some related host-management
>        interfaces.

I just sent an additional set of fixups, patches 23-29. These add some
additional input validation on GHCB requests, mainly ensuring that
SNP-specific requests from non-SNP guests result in an error as soon as
they are received rather than reaching an error state indirectly further
into the call stack.

It's a small diff (included below), but a bit of a pain to squash in
patch by patch due to close proximity with each other, so I've pushed an
updated branch here that already has them squashed in:

  https://github.com/amdese/linux/commits/snp-host-v14b

If preferred I can also resubmit as v15, just let me know.

Full diff is below.

Thanks!

-Mike


diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1cec466e593b..1137a7f4136b 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -3280,6 +3280,8 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 			goto vmgexit_err;
 		break;
 	case SVM_VMGEXIT_AP_CREATION:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto vmgexit_err;
 		if (lower_32_bits(control->exit_info_1) != SVM_VMGEXIT_AP_DESTROY)
 			if (!kvm_ghcb_rax_is_valid(svm))
 				goto vmgexit_err;
@@ -3289,10 +3291,19 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
 	case SVM_VMGEXIT_AP_JUMP_TABLE:
 	case SVM_VMGEXIT_UNSUPPORTED_EVENT:
 	case SVM_VMGEXIT_HV_FEATURES:
-	case SVM_VMGEXIT_PSC:
 	case SVM_VMGEXIT_TERM_REQUEST:
+		break;
+	case SVM_VMGEXIT_PSC:
 	case SVM_VMGEXIT_GUEST_REQUEST:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto vmgexit_err;
+		break;
 	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto vmgexit_err;
+		if (!kvm_ghcb_rax_is_valid(svm) ||
+		    !kvm_ghcb_rbx_is_valid(svm))
+			goto vmgexit_err;
 		break;
 	default:
 		reason = GHCB_ERR_INVALID_EVENT;
@@ -3970,6 +3981,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 				  GHCB_MSR_INFO_MASK, GHCB_MSR_INFO_POS);
 		break;
 	case GHCB_MSR_PREF_GPA_REQ:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
 		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_NONE, GHCB_MSR_GPA_VALUE_MASK,
 				  GHCB_MSR_GPA_VALUE_POS);
 		set_ghcb_msr_bits(svm, GHCB_MSR_PREF_GPA_RESP, GHCB_MSR_INFO_MASK,
@@ -3978,6 +3992,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 	case GHCB_MSR_REG_GPA_REQ: {
 		u64 gfn;
 
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
 		gfn = get_ghcb_msr_bits(svm, GHCB_MSR_GPA_VALUE_MASK,
 					GHCB_MSR_GPA_VALUE_POS);
 
@@ -3990,6 +4007,9 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		break;
 	}
 	case GHCB_MSR_PSC_REQ:
+		if (!sev_snp_guest(vcpu->kvm))
+			goto out_terminate;
+
 		ret = snp_begin_psc_msr(vcpu, control->ghcb_gpa);
 		break;
 	case GHCB_MSR_TERM_REQ: {
@@ -4004,12 +4024,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 		pr_info("SEV-ES guest requested termination: %#llx:%#llx\n",
 			reason_set, reason_code);
 
-		vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
-		vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
-		vcpu->run->system_event.ndata = 1;
-		vcpu->run->system_event.data[0] = control->ghcb_gpa;
-
-		return 0;
+		goto out_terminate;
 	}
 	default:
 		/* Error, keep GHCB MSR value as-is */
@@ -4020,6 +4035,14 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 					    control->ghcb_gpa, ret);
 
 	return ret;
+
+out_terminate:
+	vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
+	vcpu->run->system_event.type = KVM_SYSTEM_EVENT_SEV_TERM;
+	vcpu->run->system_event.ndata = 1;
+	vcpu->run->system_event.data[0] = control->ghcb_gpa;
+
+	return 0;
 }
 
 int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
Jarkko Sakkinen April 23, 2024, 9:36 p.m. UTC | #2
On Tue Apr 23, 2024 at 7:21 PM EEST, Michael Roth wrote:
> Ensure an error is returned if a non-SNP guest attempts to issue an
> Extended Guest Request. Also add input validation for RAX/RBX.
>
> Signed-off-by: Michael Roth <michael.roth@amd.com>
> ---
>  arch/x86/kvm/svm/sev.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 2b30b3b0eec8..ff64ed8df301 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -3297,6 +3297,11 @@ static int sev_es_validate_vmgexit(struct vcpu_svm *svm)
>  			goto vmgexit_err;
>  		break;
>  	case SVM_VMGEXIT_EXT_GUEST_REQUEST:
> +		if (!sev_snp_guest(vcpu->kvm))
> +			goto vmgexit_err;
> +		if (!kvm_ghcb_rax_is_valid(svm) ||
> +		    !kvm_ghcb_rbx_is_valid(svm))
> +			goto vmgexit_err;

Hmm... maybe I'm ignoring something but why this is not just:

	if (!sev_snp_guest(vcpu->kvm) ||
	    !kvm_ghcb_rax_is_valid(svm) ||
	    !kvm_ghcb_rbx_is_valid(svm)))

since they branch to the same location.

BR, Jarkko
Sean Christopherson April 24, 2024, 8:21 p.m. UTC | #3
On Sun, Apr 21, 2024, Michael Roth wrote:
> diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> index 6e31cb408dd8..1d2264e93afe 100644
> --- a/arch/x86/kvm/svm/sev.c
> +++ b/arch/x86/kvm/svm/sev.c
> @@ -33,9 +33,11 @@
>  #include "cpuid.h"
>  #include "trace.h"
>  
> -#define GHCB_VERSION_MAX	1ULL
> +#define GHCB_VERSION_MAX	2ULL
>  #define GHCB_VERSION_MIN	1ULL

This needs a userspace control.  Being unable to limit the GHCB version advertised
to the guest is going to break live migration of SEV-ES VMs, e.g. if a pool of
hosts has some kernels running this flavor of KVM, and some hosts running an
older KVM that doesn't support v2.
Michael Roth April 25, 2024, 8:52 p.m. UTC | #4
On Wed, Apr 24, 2024 at 01:21:49PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > index 6e31cb408dd8..1d2264e93afe 100644
> > --- a/arch/x86/kvm/svm/sev.c
> > +++ b/arch/x86/kvm/svm/sev.c
> > @@ -33,9 +33,11 @@
> >  #include "cpuid.h"
> >  #include "trace.h"
> >  
> > -#define GHCB_VERSION_MAX	1ULL
> > +#define GHCB_VERSION_MAX	2ULL
> >  #define GHCB_VERSION_MIN	1ULL
> 
> This needs a userspace control.  Being unable to limit the GHCB version advertised
> to the guest is going to break live migration of SEV-ES VMs, e.g. if a pool of
> hosts has some kernels running this flavor of KVM, and some hosts running an
> older KVM that doesn't support v2.
> 

The requirements for implementing the non-SNP aspects of the GHCB
version 2 protocol are fairly minimal, and KVM_SEV_INIT2 is already
migration incompatible with older kernels running KVM_SEV_ES_INIT (e.g.
migrate to newer host, shutdown, start -> measurement failure). There
are QEMU patches here that allow for controlling this via QEMU versioned
machine types to handle this [1]

So I think it makes sense to go ahead move to GHCB version 2 as the base
version for all SEV-ES/SNP guests created via KVM_SEV_INIT2, and leave
KVM_SEV_ES_INIT restricted to GHCB version 1.

This could be done in a pretty self-contained way for SEV-ES by applying
the following patches from this series which are the version 2 protocol
interfaces also applicable to SEV-ES:

  KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
  KVM: SEV: Add support to handle AP reset MSR protocol
  KVM: SEV: Add support for GHCB-based termination requests

And then applying the below patch on top to set GHCB version 1 or 2
accordingly for SEV-ES. (and relocating the GHCB_VERSION_MAX bump to the
below patch as well, although it's not really used at that point so
could also just be dropped completely).

Then in the future we can extend KVM_SEV_INIT2 to allow specifying
specific/newer versions of the GHCB protocol when that becomes needed.

If that sounds appropriate I can submit as a separate standalone patchset
for SEV-ES and then make that a prereq for the remaining SNP-specific bits.

-Mike

[1] https://lore.kernel.org/kvm/20240409230743.962513-1-michael.roth@amd.com/


commit 114da695c065595f74fe8aa0e9203f3c65175a95
Author: Michael Roth <michael.roth@amd.com>
Date:   Thu Apr 25 14:42:17 2024 -0500

    KVM: SEV: Allow per-instance tracking of GHCB protocol version
    
    The GHCB protocol version may be different from one guest to the next.
    Add a field to track it and initialize it accordingly when KVM_SEV_INIT,
    KVM_SEV_ES_INIT, and KVM_SEV_INIT2 are called.
    
    Signed-off-by: Michael Roth <michael.roth@amd.com>

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index 1137a7f4136b..5c16abc47541 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -310,7 +310,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
 
 static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 			    struct kvm_sev_init *data,
-			    unsigned long vm_type)
+			    unsigned long vm_type, u16 ghcb_version)
 {
 	struct kvm_sev_info *sev = &to_kvm_svm(kvm)->sev_info;
 	struct sev_platform_init_args init_args = {0};
@@ -333,6 +333,7 @@ static int __sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp,
 	sev->active = true;
 	sev->es_active = es_active;
 	sev->vmsa_features = data->vmsa_features;
+	sev->ghcb_version = ghcb_version;
 
 	if (vm_type == KVM_X86_SNP_VM)
 		sev->vmsa_features |= SVM_SEV_FEAT_SNP_ACTIVE;
@@ -376,7 +377,13 @@ static int sev_guest_init(struct kvm *kvm, struct kvm_sev_cmd *argp)
 		return -EINVAL;
 
 	vm_type = (argp->id == KVM_SEV_INIT ? KVM_X86_SEV_VM : KVM_X86_SEV_ES_VM);
-	return __sev_guest_init(kvm, argp, &data, vm_type);
+
+	/*
+	 * KVM_SEV_ES_INIT has been deprecated by KVM_SEV_INIT2, so it will
+	 * continue to only ever support the minimal GHCB protocol version.
+	 */
+	return __sev_guest_init(kvm, argp, &data, vm_type,
+				vm_type == KVM_X86_SEV_ES_VM ? GHCB_VERSION_MIN : 0);
 }
 
 static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
@@ -395,7 +402,15 @@ static int sev_guest_init2(struct kvm *kvm, struct kvm_sev_cmd *argp)
 	if (copy_from_user(&data, u64_to_user_ptr(argp->data), sizeof(data)))
 		return -EFAULT;
 
-	return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type);
+	/*
+	 * Currently KVM supports the full range of mandatory features defined by
+	 * version 2 of the GHCB protocol, so default to that for SEV/SNP guests
+	 * created via KVM_SEV_INIT2. Care should be taken that support for future
+	 * versions of the GHCB protocol are configurable via KVM_SEV_INIT2 to
+	 * allow limiting the guests to a particular version to support things
+	 * like live migration.
+	 */
+	return __sev_guest_init(kvm, argp, &data, kvm->arch.vm_type, 2);
 }
 
 static int sev_bind_asid(struct kvm *kvm, unsigned int handle, int *error)
@@ -3906,6 +3921,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 {
 	struct vmcb_control_area *control = &svm->vmcb->control;
 	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
 	u64 ghcb_info;
 	int ret = 1;
 
@@ -3916,7 +3932,7 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm)
 
 	switch (ghcb_info) {
 	case GHCB_MSR_SEV_INFO_REQ:
-		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+		set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(sev->ghcb_version,
 						    GHCB_VERSION_MIN,
 						    sev_enc_bit));
 		break;
@@ -4341,11 +4357,14 @@ void sev_init_vmcb(struct vcpu_svm *svm)
 
 void sev_es_vcpu_reset(struct vcpu_svm *svm)
 {
+	struct kvm_vcpu *vcpu = &svm->vcpu;
+	struct kvm_sev_info *sev = &to_kvm_svm(vcpu->kvm)->sev_info;
+
 	/*
 	 * Set the GHCB MSR value as per the GHCB specification when emulating
 	 * vCPU RESET for an SEV-ES guest.
 	 */
-	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX,
+	set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(sev->ghcb_version,
 					    GHCB_VERSION_MIN,
 					    sev_enc_bit));
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 28140bc8af27..229cb630b540 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -87,6 +87,7 @@ struct kvm_sev_info {
 	struct list_head regions_list;  /* List of registered regions */
 	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
 	u64 vmsa_features;
+	u64 ghcb_version;	/* Highest guest GHCB protocol version allowed */
 	struct kvm *enc_context_owner; /* Owner of copied encryption context */
 	struct list_head mirror_vms; /* List of VMs mirroring */
 	struct list_head mirror_entry; /* Use as a list entry of mirrors */
Sean Christopherson April 25, 2024, 9:55 p.m. UTC | #5
On Thu, Apr 25, 2024, Michael Roth wrote:
> On Wed, Apr 24, 2024 at 01:21:49PM -0700, Sean Christopherson wrote:
> > On Sun, Apr 21, 2024, Michael Roth wrote:
> > > diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
> > > index 6e31cb408dd8..1d2264e93afe 100644
> > > --- a/arch/x86/kvm/svm/sev.c
> > > +++ b/arch/x86/kvm/svm/sev.c
> > > @@ -33,9 +33,11 @@
> > >  #include "cpuid.h"
> > >  #include "trace.h"
> > >  
> > > -#define GHCB_VERSION_MAX	1ULL
> > > +#define GHCB_VERSION_MAX	2ULL
> > >  #define GHCB_VERSION_MIN	1ULL
> > 
> > This needs a userspace control.  Being unable to limit the GHCB version advertised
> > to the guest is going to break live migration of SEV-ES VMs, e.g. if a pool of
> > hosts has some kernels running this flavor of KVM, and some hosts running an
> > older KVM that doesn't support v2.
> > 
> 
> The requirements for implementing the non-SNP aspects of the GHCB
> version 2 protocol are fairly minimal, and KVM_SEV_INIT2 is already
> migration incompatible with older kernels running KVM_SEV_ES_INIT (e.g.
> migrate to newer host, shutdown, start -> measurement failure). There
> are QEMU patches here that allow for controlling this via QEMU versioned
> machine types to handle this [1]
> 
> So I think it makes sense to go ahead move to GHCB version 2 as the base
> version for all SEV-ES/SNP guests created via KVM_SEV_INIT2, and leave
> KVM_SEV_ES_INIT restricted to GHCB version 1.

Hmm, I like that.  Dangle a carrot to get folks to switch to KVM_SEV_INIT2.

> This could be done in a pretty self-contained way for SEV-ES by applying
> the following patches from this series which are the version 2 protocol
> interfaces also applicable to SEV-ES:
> 
>   KVM: SEV: Add GHCB handling for Hypervisor Feature Support requests
>   KVM: SEV: Add support to handle AP reset MSR protocol
>   KVM: SEV: Add support for GHCB-based termination requests
> 
> And then applying the below patch on top to set GHCB version 1 or 2
> accordingly for SEV-ES. (and relocating the GHCB_VERSION_MAX bump to the
> below patch as well, although it's not really used at that point so
> could also just be dropped completely).
> 
> Then in the future we can extend KVM_SEV_INIT2 to allow specifying
> specific/newer versions of the GHCB protocol when that becomes needed.

Any reason not to let userspace restrict the GHCB protocol from the get-go?  It
seems inevitable that KVM will need to support that at some point, we'd have a
wee bit more confidence that we didn't botch the definition of KVM_SEV_INIT2 and
end up with KVM_SEV_INIT3, and in the unlikely event some poor provider gets into
a situation where guests are crashing because they don't handle v2 correctly,
userspace can workaround the issue without need to extend KVM's uAPI.

> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 28140bc8af27..229cb630b540 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -87,6 +87,7 @@ struct kvm_sev_info {
>  	struct list_head regions_list;  /* List of registered regions */
>  	u64 ap_jump_table;	/* SEV-ES AP Jump Table address */
>  	u64 vmsa_features;
> +	u64 ghcb_version;	/* Highest guest GHCB protocol version allowed */

This can/should be a u16, no?

>  	struct kvm *enc_context_owner; /* Owner of copied encryption context */
>  	struct list_head mirror_vms; /* List of VMs mirroring */
>  	struct list_head mirror_entry; /* Use as a list entry of mirrors */
Michael Roth April 26, 2024, 5:57 p.m. UTC | #6
On Wed, Apr 24, 2024 at 05:10:31PM -0700, Sean Christopherson wrote:
> On Sun, Apr 21, 2024, Michael Roth wrote:
> > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > index 85099198a10f..6cf186ed8f66 100644
> > --- a/Documentation/virt/kvm/api.rst
> > +++ b/Documentation/virt/kvm/api.rst
> > @@ -7066,6 +7066,7 @@ values in kvm_run even if the corresponding bit in kvm_dirty_regs is not set.
> >  		struct kvm_user_vmgexit {
> >  		#define KVM_USER_VMGEXIT_PSC_MSR	1
> >  		#define KVM_USER_VMGEXIT_PSC		2
> > +		#define KVM_USER_VMGEXIT_EXT_GUEST_REQ	3
> 
> Assuming we can't get massage this into a vendor agnostic exit, there's gotta be
> a better name than EXT_GUEST_REQ, which is completely meaningless to me and probably
> most other people that aren't intimately familar with the specs.  Request what?

EXT_CERT_REQ maybe? That's effectively all it boils down to, fetching
the GHCB-defined certificate blob from userspace.

> 
> >  			__u32 type; /* KVM_USER_VMGEXIT_* type */
> >  			union {
> >  				struct {
> > @@ -3812,6 +3813,84 @@ static void snp_handle_guest_req(struct vcpu_svm *svm, gpa_t req_gpa, gpa_t resp
> >  	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> >  }
> >  
> > +static int snp_complete_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	struct vmcb_control_area *control;
> > +	struct kvm *kvm = vcpu->kvm;
> > +	sev_ret_code fw_err = 0;
> > +	int vmm_ret;
> > +
> > +	vmm_ret = vcpu->run->vmgexit.ext_guest_req.ret;
> > +	if (vmm_ret) {
> > +		if (vmm_ret == SNP_GUEST_VMM_ERR_INVALID_LEN)
> > +			vcpu->arch.regs[VCPU_REGS_RBX] =
> > +				vcpu->run->vmgexit.ext_guest_req.data_npages;
> > +		goto abort_request;
> > +	}
> > +
> > +	control = &svm->vmcb->control;
> > +
> > +	/*
> > +	 * To avoid the message sequence number getting out of sync between the
> > +	 * actual value seen by firmware verses the value expected by the guest,
> > +	 * make sure attestations can't get paused on the write-side at this
> > +	 * point by holding the lock for the entire duration of the firmware
> > +	 * request so that there is no situation where SNP_GUEST_VMM_ERR_BUSY
> > +	 * would need to be returned after firmware sees the request.
> > +	 */
> > +	mutex_lock(&snp_pause_attestation_lock);
> 
> Why is this in KVM?  IIUC, KVM only needs to get involved to translate GFNs to
> PFNs, the rest can go in the sev-dev driver, no?  The whole split is weird,
> seemingly due to KVM "needing" to take this lock.  E.g. why is core kernel code
> in arch/x86/virt/svm/sev.c at all dealing with attestation goo, when pretty much
> all of the actual usage is (or can be) in sev-dev.c

It would be very tempting to have all this locking tucked away in sev-dev
driver, but KVM is a part of that sequence because it needs to handle fetching
the certificate that will be returned to the guest as part of the
attestation request. The transaction ID updates from PAUSE/RESUME
commands is technically enough satisfy this requirement, in which case
KVM wouldn't need to take the lock directly, only check that the
transaction ID isn't stale and report -EBUSY/-EAGAIN to the guest if it
is.

But if the request actually gets sent to firmware, and then we decide
after that the transaction is stale and thus the attestation response
and certificate might not be in sync, then reporting -EBUSY/-EAGAIN will
permanently hose the guest because the message seq fields will be out of
sync between firmware and guest kernel. That's why we need to hold the
lock to actually block a transaction ID update from occuring once we've
committed to sending the attestation request to firmware.

> 
> > +
> > +	if (__snp_transaction_is_stale(svm->snp_transaction_id))
> > +		vmm_ret = SNP_GUEST_VMM_ERR_BUSY;
> > +	else if (!__snp_handle_guest_req(kvm, control->exit_info_1,
> > +					 control->exit_info_2, &fw_err))
> > +		vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > +
> > +	mutex_unlock(&snp_pause_attestation_lock);
> > +
> > +abort_request:
> > +	ghcb_set_sw_exit_info_2(svm->sev_es.ghcb, SNP_GUEST_ERR(vmm_ret, fw_err));
> > +
> > +	return 1; /* resume guest */
> > +}
> > +
> > +static int snp_begin_ext_guest_req(struct kvm_vcpu *vcpu)
> > +{
> > +	int vmm_ret = SNP_GUEST_VMM_ERR_GENERIC;
> > +	struct vcpu_svm *svm = to_svm(vcpu);
> > +	unsigned long data_npages;
> > +	sev_ret_code fw_err;
> > +	gpa_t data_gpa;
> > +
> > +	if (!sev_snp_guest(vcpu->kvm))
> > +		goto abort_request;
> > +
> > +	data_gpa = vcpu->arch.regs[VCPU_REGS_RAX];
> > +	data_npages = vcpu->arch.regs[VCPU_REGS_RBX];
> > +
> > +	if (!IS_ALIGNED(data_gpa, PAGE_SIZE))
> > +		goto abort_request;
> > +
> > +	svm->snp_transaction_id = snp_transaction_get_id();
> > +	if (snp_transaction_is_stale(svm->snp_transaction_id)) {
> 
> Why bother?  I assume this is rare, so why not handle it on the backend, i.e.
> after userspace does its thing?  Then KVM doesn't even have to care about
> checking for stale IDs, I think?

That's true, it's little more than a very minor performance optimization
to do it here rather than on the return path, so could definitely be
dropped.

> 
> None of this makes much sense to my eyes, e.g. AFAICT, the only thing that can
> pause attestation is userspace, yet the kernel is responsible for tracking whether
> or not a transaction is fresh?

Pausing is essentially the start of an update transaction initiated by
userspace. So the kernel is tracking whether there's a potential
transaction that has been started or completed since it first started
servicing the attestation request, otherwise there could be a mismatch
between the endorsement key used for the attestation report versus the
certificate for the endorsement key that was fetched from userspace.

-Mike