Message ID | 20250410132850.3708703-2-ardb+git@google.com |
---|---|
State | New |
Headers | show |
Series | [v3] x86/boot/sev: Avoid shared GHCB page for early memory acceptance | expand |
On 4/11/25 14:00, Ard Biesheuvel wrote: > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >> >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>> From: Ard Biesheuvel <ardb@kernel.org> >>> >>> Communicating with the hypervisor using the shared GHCB page requires >>> clearing the C bit in the mapping of that page. When executing in the >>> context of the EFI boot services, the page tables are owned by the >>> firmware, and this manipulation is not possible. >>> >>> So switch to a different API for accepting memory in SEV-SNP guests, one >> >> That being the GHCB MSR protocol, it seems. >> > > Yes. > >> And since Tom co-developed, I guess we wanna do that. >> >> But then how much slower do we become? >> > > Non-EFI stub boot will become slower if the memory that is used to > decompress the kernel has not been accepted yet. But given how heavily > SEV-SNP depends on EFI boot, this typically only happens on kexec, as > that is the only boot path that goes through the traditional > decompressor. Some quick testing showed no significant differences in kexec booting and testing shows everything seems to be good. But, in testing with non-2M sized memory (e.g. a guest with 4097M of memory) and without the change to how SNP is detected before sev_enable() is called, we hit the error path in arch_accept_memory() in arch/x86/boot/compressed/mem.c and the boot crashes. Thanks, Tom > >> And nothing in here talks about why that GHCB method worked or didn't >> work before and that it is ok or not ok why we're axing that off. >> > > ---%<--- > The GHCB shared page method never worked for accepting memory from the > EFI stub, but this is rarely needed in practice: when using the higher > level page allocation APIs, the firmware will make sure that memory is > accepted before it is returned. The only use case for explicit memory > acceptance by the EFI stub is when populating the 'unaccepted memory' > bitmap, which tracks unaccepted memory at a 2MB granularity, and so > chunks of unaccepted memory that are misaligned wrt that are accepted > without being allocated or used. > ---%<--- > >> I'm somehow missing that aspect of why that change is warranted... >> > > This never worked correctly for SEV-SNP, we're just lucky the firmware > appears to accept memory in 2+ MB batches and so these misaligned > chunks are rare in practice. Tom did manage to trigger it IIUC by > giving a VM an amount of memory that is not a multiple of 2M.
On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/11/25 14:00, Ard Biesheuvel wrote: > > On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > >> > >> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > >>> From: Ard Biesheuvel <ardb@kernel.org> > >>> > >>> Communicating with the hypervisor using the shared GHCB page requires > >>> clearing the C bit in the mapping of that page. When executing in the > >>> context of the EFI boot services, the page tables are owned by the > >>> firmware, and this manipulation is not possible. > >>> > >>> So switch to a different API for accepting memory in SEV-SNP guests, one > >> > >> That being the GHCB MSR protocol, it seems. > >> > > > > Yes. > > > >> And since Tom co-developed, I guess we wanna do that. > >> > >> But then how much slower do we become? > >> > > > > Non-EFI stub boot will become slower if the memory that is used to > > decompress the kernel has not been accepted yet. But given how heavily > > SEV-SNP depends on EFI boot, this typically only happens on kexec, as > > that is the only boot path that goes through the traditional > > decompressor. > > Some quick testing showed no significant differences in kexec booting > and testing shows everything seems to be good. > Thanks. > But, in testing with non-2M sized memory (e.g. a guest with 4097M of > memory) and without the change to how SNP is detected before > sev_enable() is called, we hit the error path in arch_accept_memory() in > arch/x86/boot/compressed/mem.c and the boot crashes. > Right. So this is because sev_snp_enabled() is based on sev_status, which has not been set yet at this point, right? And for the record, could you please indicate whether you are ok with the co-developed-by/signed-off-by credits on this patch (and subsequent revisions)?
On 4/17/25 11:14, Ard Biesheuvel wrote: > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >> On 4/11/25 14:00, Ard Biesheuvel wrote: >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >>>> >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>>>> From: Ard Biesheuvel <ardb@kernel.org> >>>>> >>>>> Communicating with the hypervisor using the shared GHCB page requires >>>>> clearing the C bit in the mapping of that page. When executing in the >>>>> context of the EFI boot services, the page tables are owned by the >>>>> firmware, and this manipulation is not possible. >>>>> >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one >>>> >>>> That being the GHCB MSR protocol, it seems. >>>> >>> >>> Yes. >>> >>>> And since Tom co-developed, I guess we wanna do that. >>>> >>>> But then how much slower do we become? >>>> >>> >>> Non-EFI stub boot will become slower if the memory that is used to >>> decompress the kernel has not been accepted yet. But given how heavily >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as >>> that is the only boot path that goes through the traditional >>> decompressor. >> >> Some quick testing showed no significant differences in kexec booting >> and testing shows everything seems to be good. >> > > Thanks. > >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of >> memory) and without the change to how SNP is detected before >> sev_enable() is called, we hit the error path in arch_accept_memory() in >> arch/x86/boot/compressed/mem.c and the boot crashes. >> > > Right. So this is because sev_snp_enabled() is based on sev_status, > which has not been set yet at this point, right? Correct. > > And for the record, could you please indicate whether you are ok with > the co-developed-by/signed-off-by credits on this patch (and > subsequent revisions)? Yep, I'm fine with that. Thanks, Tom
On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/17/25 11:14, Ard Biesheuvel wrote: > > On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: > >> > >> On 4/11/25 14:00, Ard Biesheuvel wrote: > >>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > >>>> > >>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > >>>>> From: Ard Biesheuvel <ardb@kernel.org> > >>>>> > >>>>> Communicating with the hypervisor using the shared GHCB page requires > >>>>> clearing the C bit in the mapping of that page. When executing in the > >>>>> context of the EFI boot services, the page tables are owned by the > >>>>> firmware, and this manipulation is not possible. > >>>>> > >>>>> So switch to a different API for accepting memory in SEV-SNP guests, one > >>>> > >>>> That being the GHCB MSR protocol, it seems. > >>>> > >>> > >>> Yes. > >>> > >>>> And since Tom co-developed, I guess we wanna do that. > >>>> > >>>> But then how much slower do we become? > >>>> > >>> > >>> Non-EFI stub boot will become slower if the memory that is used to > >>> decompress the kernel has not been accepted yet. But given how heavily > >>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as > >>> that is the only boot path that goes through the traditional > >>> decompressor. > >> > >> Some quick testing showed no significant differences in kexec booting > >> and testing shows everything seems to be good. > >> > > > > Thanks. > > > >> But, in testing with non-2M sized memory (e.g. a guest with 4097M of > >> memory) and without the change to how SNP is detected before > >> sev_enable() is called, we hit the error path in arch_accept_memory() in > >> arch/x86/boot/compressed/mem.c and the boot crashes. > >> > > > > Right. So this is because sev_snp_enabled() is based on sev_status, > > which has not been set yet at this point, right? > > Correct. > OK. Would this do the trick? (with asm/sev.h added to the #includes) --- a/arch/x86/boot/compressed/mem.c +++ b/arch/x86/boot/compressed/mem.c @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void) void arch_accept_memory(phys_addr_t start, phys_addr_t end) { + static bool sevsnp; + /* Platform-specific memory-acceptance call goes here */ if (early_is_tdx_guest()) { if (!tdx_accept_memory(start, end)) panic("TDX: Failed to accept memory\n"); - } else if (sev_snp_enabled()) { + } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) { + sevsnp = true; snp_accept_memory(start, end); } else { error("Cannot accept memory: unknown platform\n"); > > > > And for the record, could you please indicate whether you are ok with > > the co-developed-by/signed-off-by credits on this patch (and > > subsequent revisions)? > > Yep, I'm fine with that. > Thanks.
On 4/17/25 11:38, Ard Biesheuvel wrote: > On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >> On 4/17/25 11:14, Ard Biesheuvel wrote: >>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: >>>> >>>> On 4/11/25 14:00, Ard Biesheuvel wrote: >>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >>>>>> >>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>>>>>> From: Ard Biesheuvel <ardb@kernel.org> >>>>>>> >>>>>>> Communicating with the hypervisor using the shared GHCB page requires >>>>>>> clearing the C bit in the mapping of that page. When executing in the >>>>>>> context of the EFI boot services, the page tables are owned by the >>>>>>> firmware, and this manipulation is not possible. >>>>>>> >>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one >>>>>> >>>>>> That being the GHCB MSR protocol, it seems. >>>>>> >>>>> >>>>> Yes. >>>>> >>>>>> And since Tom co-developed, I guess we wanna do that. >>>>>> >>>>>> But then how much slower do we become? >>>>>> >>>>> >>>>> Non-EFI stub boot will become slower if the memory that is used to >>>>> decompress the kernel has not been accepted yet. But given how heavily >>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as >>>>> that is the only boot path that goes through the traditional >>>>> decompressor. >>>> >>>> Some quick testing showed no significant differences in kexec booting >>>> and testing shows everything seems to be good. >>>> >>> >>> Thanks. >>> >>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of >>>> memory) and without the change to how SNP is detected before >>>> sev_enable() is called, we hit the error path in arch_accept_memory() in >>>> arch/x86/boot/compressed/mem.c and the boot crashes. >>>> >>> >>> Right. So this is because sev_snp_enabled() is based on sev_status, >>> which has not been set yet at this point, right? >> >> Correct. >> > > OK. Would this do the trick? (with asm/sev.h added to the #includes) Yes, that works for booting. Let me do some kexec testing and get back to you. Sorry, that might not be until tomorrow, though. Thanks, Tom > > --- a/arch/x86/boot/compressed/mem.c > +++ b/arch/x86/boot/compressed/mem.c > @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void) > > void arch_accept_memory(phys_addr_t start, phys_addr_t end) > { > + static bool sevsnp; > + > /* Platform-specific memory-acceptance call goes here */ > if (early_is_tdx_guest()) { > if (!tdx_accept_memory(start, end)) > panic("TDX: Failed to accept memory\n"); > - } else if (sev_snp_enabled()) { > + } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) { > + sevsnp = true; > snp_accept_memory(start, end); > } else { > error("Cannot accept memory: unknown platform\n"); > >>> >>> And for the record, could you please indicate whether you are ok with >>> the co-developed-by/signed-off-by credits on this patch (and >>> subsequent revisions)? >> >> Yep, I'm fine with that. >> > > Thanks.
On 4/17/25 12:26, Tom Lendacky wrote: > On 4/17/25 11:38, Ard Biesheuvel wrote: >> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote: >>> >>> On 4/17/25 11:14, Ard Biesheuvel wrote: >>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: >>>>> >>>>> On 4/11/25 14:00, Ard Biesheuvel wrote: >>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: >>>>>>> >>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: >>>>>>>> From: Ard Biesheuvel <ardb@kernel.org> >>>>>>>> >>>>>>>> Communicating with the hypervisor using the shared GHCB page requires >>>>>>>> clearing the C bit in the mapping of that page. When executing in the >>>>>>>> context of the EFI boot services, the page tables are owned by the >>>>>>>> firmware, and this manipulation is not possible. >>>>>>>> >>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one >>>>>>> >>>>>>> That being the GHCB MSR protocol, it seems. >>>>>>> >>>>>> >>>>>> Yes. >>>>>> >>>>>>> And since Tom co-developed, I guess we wanna do that. >>>>>>> >>>>>>> But then how much slower do we become? >>>>>>> >>>>>> >>>>>> Non-EFI stub boot will become slower if the memory that is used to >>>>>> decompress the kernel has not been accepted yet. But given how heavily >>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as >>>>>> that is the only boot path that goes through the traditional >>>>>> decompressor. >>>>> >>>>> Some quick testing showed no significant differences in kexec booting >>>>> and testing shows everything seems to be good. >>>>> >>>> >>>> Thanks. >>>> >>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of >>>>> memory) and without the change to how SNP is detected before >>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in >>>>> arch/x86/boot/compressed/mem.c and the boot crashes. >>>>> >>>> >>>> Right. So this is because sev_snp_enabled() is based on sev_status, >>>> which has not been set yet at this point, right? >>> >>> Correct. >>> >> >> OK. Would this do the trick? (with asm/sev.h added to the #includes) > > Yes, that works for booting. Let me do some kexec testing and get back > to you. Sorry, that might not be until tomorrow, though. Ok, found some time... looks good with kexec, too. Thanks, Tom > > Thanks, > Tom > >> >> --- a/arch/x86/boot/compressed/mem.c >> +++ b/arch/x86/boot/compressed/mem.c >> @@ -34,11 +34,14 @@ static bool early_is_tdx_guest(void) >> >> void arch_accept_memory(phys_addr_t start, phys_addr_t end) >> { >> + static bool sevsnp; >> + >> /* Platform-specific memory-acceptance call goes here */ >> if (early_is_tdx_guest()) { >> if (!tdx_accept_memory(start, end)) >> panic("TDX: Failed to accept memory\n"); >> - } else if (sev_snp_enabled()) { >> + } else if (sevsnp || (sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED)) { >> + sevsnp = true; >> snp_accept_memory(start, end); >> } else { >> error("Cannot accept memory: unknown platform\n"); >> >>>> >>>> And for the record, could you please indicate whether you are ok with >>>> the co-developed-by/signed-off-by credits on this patch (and >>>> subsequent revisions)? >>> >>> Yep, I'm fine with that. >>> >> >> Thanks.
On Thu, 17 Apr 2025 at 22:01, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/17/25 12:26, Tom Lendacky wrote: > > On 4/17/25 11:38, Ard Biesheuvel wrote: > >> On Thu, 17 Apr 2025 at 18:21, Tom Lendacky <thomas.lendacky@amd.com> wrote: > >>> > >>> On 4/17/25 11:14, Ard Biesheuvel wrote: > >>>> On Thu, 17 Apr 2025 at 18:08, Tom Lendacky <thomas.lendacky@amd.com> wrote: > >>>>> > >>>>> On 4/11/25 14:00, Ard Biesheuvel wrote: > >>>>>> On Fri, 11 Apr 2025 at 20:40, Borislav Petkov <bp@alien8.de> wrote: > >>>>>>> > >>>>>>> On Thu, Apr 10, 2025 at 03:28:51PM +0200, Ard Biesheuvel wrote: > >>>>>>>> From: Ard Biesheuvel <ardb@kernel.org> > >>>>>>>> > >>>>>>>> Communicating with the hypervisor using the shared GHCB page requires > >>>>>>>> clearing the C bit in the mapping of that page. When executing in the > >>>>>>>> context of the EFI boot services, the page tables are owned by the > >>>>>>>> firmware, and this manipulation is not possible. > >>>>>>>> > >>>>>>>> So switch to a different API for accepting memory in SEV-SNP guests, one > >>>>>>> > >>>>>>> That being the GHCB MSR protocol, it seems. > >>>>>>> > >>>>>> > >>>>>> Yes. > >>>>>> > >>>>>>> And since Tom co-developed, I guess we wanna do that. > >>>>>>> > >>>>>>> But then how much slower do we become? > >>>>>>> > >>>>>> > >>>>>> Non-EFI stub boot will become slower if the memory that is used to > >>>>>> decompress the kernel has not been accepted yet. But given how heavily > >>>>>> SEV-SNP depends on EFI boot, this typically only happens on kexec, as > >>>>>> that is the only boot path that goes through the traditional > >>>>>> decompressor. > >>>>> > >>>>> Some quick testing showed no significant differences in kexec booting > >>>>> and testing shows everything seems to be good. > >>>>> > >>>> > >>>> Thanks. > >>>> > >>>>> But, in testing with non-2M sized memory (e.g. a guest with 4097M of > >>>>> memory) and without the change to how SNP is detected before > >>>>> sev_enable() is called, we hit the error path in arch_accept_memory() in > >>>>> arch/x86/boot/compressed/mem.c and the boot crashes. > >>>>> > >>>> > >>>> Right. So this is because sev_snp_enabled() is based on sev_status, > >>>> which has not been set yet at this point, right? > >>> > >>> Correct. > >>> > >> > >> OK. Would this do the trick? (with asm/sev.h added to the #includes) > > > > Yes, that works for booting. Let me do some kexec testing and get back > > to you. Sorry, that might not be until tomorrow, though. > > Ok, found some time... looks good with kexec, too. > Thanks!
* Ard Biesheuvel <ardb@kernel.org> wrote: > > >> OK. Would this do the trick? (with asm/sev.h added to the #includes) > > > > > > Yes, that works for booting. Let me do some kexec testing and get back > > > to you. Sorry, that might not be until tomorrow, though. > > > > Ok, found some time... looks good with kexec, too. > > > > Thanks! I've turned this into this tag for the -v4 patch: Tested-by: Tom Lendacky <thomas.lendacky@amd.com> Thanks, Ingo
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index bb55934c1cee..89ba168f4f0f 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -164,10 +164,7 @@ bool sev_snp_enabled(void) static void __page_state_change(unsigned long paddr, enum psc_op op) { - u64 val; - - if (!sev_snp_enabled()) - return; + u64 val, msr; /* * If private -> shared then invalidate the page before requesting the @@ -176,6 +173,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) if (op == SNP_PAGE_STATE_SHARED) pvalidate_4k_page(paddr, paddr, false); + /* Save the current GHCB MSR value */ + msr = sev_es_rd_ghcb_msr(); + /* Issue VMGEXIT to change the page state in RMP table. */ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op)); VMGEXIT(); @@ -185,6 +185,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val)) sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); + /* Restore the GHCB MSR value */ + sev_es_wr_ghcb_msr(msr); + /* * Now that page state is changed in the RMP table, validate it so that it is * consistent with the RMP entry. @@ -195,11 +198,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) void snp_set_page_private(unsigned long paddr) { + if (!sev_snp_enabled()) + return; + __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE); } void snp_set_page_shared(unsigned long paddr) { + if (!sev_snp_enabled()) + return; + __page_state_change(paddr, SNP_PAGE_STATE_SHARED); } @@ -223,56 +232,10 @@ static bool early_setup_ghcb(void) return true; } -static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc, - phys_addr_t pa, phys_addr_t pa_end) -{ - struct psc_hdr *hdr; - struct psc_entry *e; - unsigned int i; - - hdr = &desc->hdr; - memset(hdr, 0, sizeof(*hdr)); - - e = desc->entries; - - i = 0; - while (pa < pa_end && i < VMGEXIT_PSC_MAX_ENTRY) { - hdr->end_entry = i; - - e->gfn = pa >> PAGE_SHIFT; - e->operation = SNP_PAGE_STATE_PRIVATE; - if (IS_ALIGNED(pa, PMD_SIZE) && (pa_end - pa) >= PMD_SIZE) { - e->pagesize = RMP_PG_SIZE_2M; - pa += PMD_SIZE; - } else { - e->pagesize = RMP_PG_SIZE_4K; - pa += PAGE_SIZE; - } - - e++; - i++; - } - - if (vmgexit_psc(boot_ghcb, desc)) - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); - - pvalidate_pages(desc); - - return pa; -} - void snp_accept_memory(phys_addr_t start, phys_addr_t end) { - struct snp_psc_desc desc = {}; - unsigned int i; - phys_addr_t pa; - - if (!boot_ghcb && !early_setup_ghcb()) - sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); - - pa = start; - while (pa < end) - pa = __snp_accept_memory(&desc, pa, end); + for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE) + __page_state_change(pa, SNP_PAGE_STATE_PRIVATE); } void sev_es_shutdown_ghcb(void)