Message ID | 20250404082921.2767593-8-ardb+git@google.com |
---|---|
State | New |
Headers | show |
Series | efistub/x86: Fix early SEV-SNP memory acceptance | expand |
On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > Switch to a different API for accepting memory in SEV-SNP guests, one > which is actually supported at the point during boot where the EFI stub > may need to accept memory, but the SEV-SNP init code has not executed > yet. I probably miss the point, but why cannot decompressor use the same _early version of accept too and avoid code duplication? Maybe spell it out in the commit message for someone like me :P
On Fri, 4 Apr 2025 at 11:43, Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > Switch to a different API for accepting memory in SEV-SNP guests, one > > which is actually supported at the point during boot where the EFI stub > > may need to accept memory, but the SEV-SNP init code has not executed > > yet. > > I probably miss the point, but why cannot decompressor use the same _early > version of accept too and avoid code duplication? > > Maybe spell it out in the commit message for someone like me :P > I assumed there was a reason that the shared GHCB page is used early on. Maybe it is faster than accepting memory page by page? It also depends on how important the memory acceptance is for the legacy decompressor - AIUI the use case is primarily kexec, but wouldn't the first kernel have accepted all memory already? I.e., if it is slower, we might not care if it is a rare case.
On Fri, Apr 4, 2025 at 1:46 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 4 Apr 2025 at 11:43, Kirill A. Shutemov > <kirill.shutemov@linux.intel.com> wrote: > > > > On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote: > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > > > Switch to a different API for accepting memory in SEV-SNP guests, one > > > which is actually supported at the point during boot where the EFI stub > > > may need to accept memory, but the SEV-SNP init code has not executed > > > yet. > > > > I probably miss the point, but why cannot decompressor use the same _early > > version of accept too and avoid code duplication? > > > > Maybe spell it out in the commit message for someone like me :P > > > > I assumed there was a reason that the shared GHCB page is used early > on. Maybe it is faster than accepting memory page by page? This is correct. The MSR protocol does a round trip per page, whereas the GHCB page can communicate hundreds of state changes per round trip. > > It also depends on how important the memory acceptance is for the > legacy decompressor - AIUI the use case is primarily kexec, but > wouldn't the first kernel have accepted all memory already? I.e., if The first kernel may not accept all memory due to the laziness of unaccepted memory transitions. I'm not sure if we have the planned background acceptance process in place (probably not), but we can't expect that to have finished before the first kexec. > it is slower, we might not care if it is a rare case. If the GHCB is available, we should always prefer it.
On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> If the GHCB is available, we should always prefer it.
I believe we should consider the cost of code duplication in this
situation.
If the non-early version is only used in the kexec path, it will not be
tested as frequently and could be more easily broken. I think it would be
acceptable for kexec to be slightly slower if it results in more
maintainable code.
* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > > If the GHCB is available, we should always prefer it. > > I believe we should consider the cost of code duplication in this > situation. > > If the non-early version is only used in the kexec path, it will not be > tested as frequently and could be more easily broken. I think it would be > acceptable for kexec to be slightly slower if it results in more > maintainable code. Absolutely so. Thanks, Ingo
On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote: > > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > > > If the GHCB is available, we should always prefer it. > > > > I believe we should consider the cost of code duplication in this > > situation. > > > > If the non-early version is only used in the kexec path, it will not be > > tested as frequently and could be more easily broken. I think it would be > > acceptable for kexec to be slightly slower if it results in more > > maintainable code. > > Absolutely so. > It would be nice if someone could quantify 'slightly slower' - I am leaning to the same conclusion but I have no clue what the actual performance impact is.
On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote: > On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > > > > If the GHCB is available, we should always prefer it. > > > > > > I believe we should consider the cost of code duplication in this > > > situation. > > > > > > If the non-early version is only used in the kexec path, it will not be > > > tested as frequently and could be more easily broken. I think it would be > > > acceptable for kexec to be slightly slower if it results in more > > > maintainable code. > > > > Absolutely so. > > > > It would be nice if someone could quantify 'slightly slower' - I am > leaning to the same conclusion but I have no clue what the actual > performance impact is. If we can survive the performance of the initial boot, we can live with it for kexec.
On Mon, 7 Apr 2025 at 19:33, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote: > > On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > > > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > > > > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > > > > > If the GHCB is available, we should always prefer it. > > > > > > > > I believe we should consider the cost of code duplication in this > > > > situation. > > > > > > > > If the non-early version is only used in the kexec path, it will not be > > > > tested as frequently and could be more easily broken. I think it would be > > > > acceptable for kexec to be slightly slower if it results in more > > > > maintainable code. > > > > > > Absolutely so. > > > > > > > It would be nice if someone could quantify 'slightly slower' - I am > > leaning to the same conclusion but I have no clue what the actual > > performance impact is. > > If we can survive the performance of the initial boot, we can live with it > for kexec. > The initial boot does not occur via the decompressor, but via the EFI stub, where memory acceptance is handled by the firmware (as it should). Given that the traditional decompressor carves out an allocation from the raw E820 map without using any of the higher level APIs, it has to accept the memory itself if it is marked as unaccepted in the table. Perhaps the decompressor should try to avoid unaccepted memory?
On 4/7/25 04:25, Kirill A. Shutemov wrote: > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: >> If the GHCB is available, we should always prefer it. > > I believe we should consider the cost of code duplication in this > situation. > > If the non-early version is only used in the kexec path, it will not be > tested as frequently and could be more easily broken. I think it would be > acceptable for kexec to be slightly slower if it results in more > maintainable code. > Is accept_memory() in the decompressor or efistub only used in the kexec path? Thanks, Tom
On Mon, 7 Apr 2025 at 20:05, Tom Lendacky <thomas.lendacky@amd.com> wrote: > > On 4/7/25 04:25, Kirill A. Shutemov wrote: > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > >> If the GHCB is available, we should always prefer it. > > > > I believe we should consider the cost of code duplication in this > > situation. > > > > If the non-early version is only used in the kexec path, it will not be > > tested as frequently and could be more easily broken. I think it would be > > acceptable for kexec to be slightly slower if it results in more > > maintainable code. > > > > Is accept_memory() in the decompressor or efistub only used in the kexec > path? > The EFI stub does not call accept_memory(), only the decompressor does. The only use case for explicit memory acceptance in the EFI stub is process_unaccepted_memory(), which accepts the misaligned chunk of memory that cannot be described at 2M granularity by the accepted memory table. Remember that the EFI stub no longer calls into the legacy decompressor at all - it decompresses the kernel while executing the EFI boot services and branches straight to it.
On Mon, Apr 07, 2025 at 07:45:59PM +0200, Ard Biesheuvel wrote: > On Mon, 7 Apr 2025 at 19:33, Kirill A. Shutemov <kirill@shutemov.name> wrote: > > > > On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote: > > > On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote: > > > > > > > > > > > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote: > > > > > > > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: > > > > > > If the GHCB is available, we should always prefer it. > > > > > > > > > > I believe we should consider the cost of code duplication in this > > > > > situation. > > > > > > > > > > If the non-early version is only used in the kexec path, it will not be > > > > > tested as frequently and could be more easily broken. I think it would be > > > > > acceptable for kexec to be slightly slower if it results in more > > > > > maintainable code. > > > > > > > > Absolutely so. > > > > > > > > > > It would be nice if someone could quantify 'slightly slower' - I am > > > leaning to the same conclusion but I have no clue what the actual > > > performance impact is. > > > > If we can survive the performance of the initial boot, we can live with it > > for kexec. > > > > The initial boot does not occur via the decompressor, but via the EFI > stub, where memory acceptance is handled by the firmware (as it > should). I wounder what protocol BIOS uses. > Given that the traditional decompressor carves out an allocation from > the raw E820 map without using any of the higher level APIs, it has to > accept the memory itself if it is marked as unaccepted in the table. > > Perhaps the decompressor should try to avoid unaccepted memory? It limits KASLR. I would rather wait more on kexec.
On 4/7/25 14:59, Ard Biesheuvel wrote: > On Mon, 7 Apr 2025 at 20:05, Tom Lendacky <thomas.lendacky@amd.com> wrote: >> >> On 4/7/25 04:25, Kirill A. Shutemov wrote: >>> On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote: >>>> If the GHCB is available, we should always prefer it. >>> >>> I believe we should consider the cost of code duplication in this >>> situation. >>> >>> If the non-early version is only used in the kexec path, it will not be >>> tested as frequently and could be more easily broken. I think it would be >>> acceptable for kexec to be slightly slower if it results in more >>> maintainable code. >>> >> >> Is accept_memory() in the decompressor or efistub only used in the kexec >> path? >> > > The EFI stub does not call accept_memory(), only the decompressor > does. The only use case for explicit memory acceptance in the EFI stub Since EFI stub never uses accept_memory() I looked at moving enablement of SEV to be before the setup of the accepted memory bitmap, as SEV enablement doesn't need any e820 info. But that didn't work because the real issue is early_setup_ghcb() calls set_page_decrypted() which calls set_clr_page_flags(). The latter function is not meant to work with EFI page tables, so there is an incompatibility. If we had a way to check for whether we are coming through the EFI stub vs the decompressor, then snp_accept_memory() could decide to skip early_setup_ghcb() when called from the EFI stub and call either __snp_accept_memory() from the decompressor or __page_state_change() from the EFI stub (the latter having to be updated to return a value). I think there are other areas that might need investigating because I noticed that efi_warn() is successful before efi_exit_boot_services() but blows up immediately after (possibly in the EFI #VC handler having to do with addressing the string?). Thanks, Tom > is process_unaccepted_memory(), which accepts the misaligned chunk of > memory that cannot be described at 2M granularity by the accepted > memory table. > > Remember that the EFI stub no longer calls into the legacy > decompressor at all - it decompresses the kernel while executing the > EFI boot services and branches straight to it.
diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index bb55934c1cee..88100bf83ded 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -164,10 +164,7 @@ bool sev_snp_enabled(void) static void __page_state_change(unsigned long paddr, enum psc_op op) { - u64 val; - - if (!sev_snp_enabled()) - return; + u64 val, msr; /* * If private -> shared then invalidate the page before requesting the @@ -176,6 +173,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) if (op == SNP_PAGE_STATE_SHARED) pvalidate_4k_page(paddr, paddr, false); + /* Save the current GHCB MSR value */ + msr = sev_es_rd_ghcb_msr(); + /* Issue VMGEXIT to change the page state in RMP table. */ sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op)); VMGEXIT(); @@ -185,6 +185,9 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val)) sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC); + /* Restore the GHCB MSR value */ + sev_es_wr_ghcb_msr(msr); + /* * Now that page state is changed in the RMP table, validate it so that it is * consistent with the RMP entry. @@ -195,11 +198,17 @@ static void __page_state_change(unsigned long paddr, enum psc_op op) void snp_set_page_private(unsigned long paddr) { + if (!sev_snp_enabled()) + return; + __page_state_change(paddr, SNP_PAGE_STATE_PRIVATE); } void snp_set_page_shared(unsigned long paddr) { + if (!sev_snp_enabled()) + return; + __page_state_change(paddr, SNP_PAGE_STATE_SHARED); } @@ -261,6 +270,11 @@ static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc, return pa; } +/* + * The memory acceptance support uses the boot GHCB page to perform + * the required page state change operation before validating the + * pages. + */ void snp_accept_memory(phys_addr_t start, phys_addr_t end) { struct snp_psc_desc desc = {}; @@ -275,6 +289,18 @@ void snp_accept_memory(phys_addr_t start, phys_addr_t end) pa = __snp_accept_memory(&desc, pa, end); } +/* + * The early version of memory acceptance is needed when being called + * from the EFI stub driver. The pagetable manipulation to mark the + * boot GHCB page as shared can't be performed at this stage, so use + * the GHCB page state change MSR protocol instead. + */ +void snp_accept_memory_early(phys_addr_t start, phys_addr_t end) +{ + for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE) + __page_state_change(pa, SNP_PAGE_STATE_PRIVATE); +} + void sev_es_shutdown_ghcb(void) { if (!boot_ghcb) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 7d9cf473f4d0..dcf436dea99e 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -383,6 +383,8 @@ static bool efistub_is_sevsnp_guest(void) return sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED; } +void snp_accept_memory_early(phys_addr_t start, phys_addr_t end); + void efistub_accept_memory(phys_addr_t start, phys_addr_t end) { static bool once, is_tdx, is_sevsnp; @@ -398,7 +400,7 @@ void efistub_accept_memory(phys_addr_t start, phys_addr_t end) if (is_tdx) tdx_accept_memory(start, end); else if (is_sevsnp) - snp_accept_memory(start, end); + snp_accept_memory_early(start, end); } #endif