diff mbox series

[v2,3/3] x86/boot: Implement early memory acceptance for SEV-SNP

Message ID 20250404082921.2767593-8-ardb+git@google.com
State New
Headers show
Series efistub/x86: Fix early SEV-SNP memory acceptance | expand

Commit Message

Ard Biesheuvel April 4, 2025, 8:29 a.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

Switch to a different API for accepting memory in SEV-SNP guests, one
which is actually supported at the point during boot where the EFI stub
may need to accept memory, but the SEV-SNP init code has not executed
yet.

Co-developed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/boot/compressed/sev.c          | 34 +++++++++++++++++---
 drivers/firmware/efi/libstub/x86-stub.c |  4 ++-
 2 files changed, 33 insertions(+), 5 deletions(-)

Comments

Kirill A. Shutemov April 4, 2025, 8:43 a.m. UTC | #1
On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> Switch to a different API for accepting memory in SEV-SNP guests, one
> which is actually supported at the point during boot where the EFI stub
> may need to accept memory, but the SEV-SNP init code has not executed
> yet.

I probably miss the point, but why cannot decompressor use the same _early
version of accept too and avoid code duplication?

Maybe spell it out in the commit message for someone like me :P
Ard Biesheuvel April 4, 2025, 8:46 a.m. UTC | #2
On Fri, 4 Apr 2025 at 11:43, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Switch to a different API for accepting memory in SEV-SNP guests, one
> > which is actually supported at the point during boot where the EFI stub
> > may need to accept memory, but the SEV-SNP init code has not executed
> > yet.
>
> I probably miss the point, but why cannot decompressor use the same _early
> version of accept too and avoid code duplication?
>
> Maybe spell it out in the commit message for someone like me :P
>

I assumed there was a reason that the shared GHCB page is used early
on. Maybe it is faster than accepting memory page by page?

It also depends on how important the memory acceptance is for the
legacy decompressor - AIUI the use case is primarily kexec, but
wouldn't the first kernel have accepted all memory already? I.e., if
it is slower, we might not care if it is a rare case.
Dionna Amalie Glaze April 4, 2025, 3:07 p.m. UTC | #3
On Fri, Apr 4, 2025 at 1:46 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Fri, 4 Apr 2025 at 11:43, Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Fri, Apr 04, 2025 at 10:29:25AM +0200, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > Switch to a different API for accepting memory in SEV-SNP guests, one
> > > which is actually supported at the point during boot where the EFI stub
> > > may need to accept memory, but the SEV-SNP init code has not executed
> > > yet.
> >
> > I probably miss the point, but why cannot decompressor use the same _early
> > version of accept too and avoid code duplication?
> >
> > Maybe spell it out in the commit message for someone like me :P
> >
>
> I assumed there was a reason that the shared GHCB page is used early
> on. Maybe it is faster than accepting memory page by page?

This is correct. The MSR protocol does a round trip per page, whereas
the GHCB page can communicate hundreds of state changes per round
trip.
>
> It also depends on how important the memory acceptance is for the
> legacy decompressor - AIUI the use case is primarily kexec, but
> wouldn't the first kernel have accepted all memory already? I.e., if

The first kernel may not accept all memory due to the laziness of
unaccepted memory transitions.
I'm not sure if we have the planned background acceptance process in
place (probably not), but we
can't expect that to have finished before the first kexec.

> it is slower, we might not care if it is a rare case.

If the GHCB is available, we should always prefer it.
Kirill A. Shutemov April 7, 2025, 9:25 a.m. UTC | #4
On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> If the GHCB is available, we should always prefer it.

I believe we should consider the cost of code duplication in this
situation.

If the non-early version is only used in the kexec path, it will not be
tested as frequently and could be more easily broken. I think it would be
acceptable for kexec to be slightly slower if it results in more
maintainable code.
Ingo Molnar April 7, 2025, 4:44 p.m. UTC | #5
* Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:

> On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> > If the GHCB is available, we should always prefer it.
> 
> I believe we should consider the cost of code duplication in this
> situation.
> 
> If the non-early version is only used in the kexec path, it will not be
> tested as frequently and could be more easily broken. I think it would be
> acceptable for kexec to be slightly slower if it results in more
> maintainable code.

Absolutely so.

Thanks,

	Ingo
Ard Biesheuvel April 7, 2025, 5:21 p.m. UTC | #6
On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote:
>
>
> * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
>
> > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> > > If the GHCB is available, we should always prefer it.
> >
> > I believe we should consider the cost of code duplication in this
> > situation.
> >
> > If the non-early version is only used in the kexec path, it will not be
> > tested as frequently and could be more easily broken. I think it would be
> > acceptable for kexec to be slightly slower if it results in more
> > maintainable code.
>
> Absolutely so.
>

It would be nice if someone could quantify 'slightly slower' - I am
leaning to the same conclusion but I have no clue what the actual
performance impact is.
Kirill A. Shutemov April 7, 2025, 5:33 p.m. UTC | #7
On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote:
> On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote:
> >
> >
> > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> >
> > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> > > > If the GHCB is available, we should always prefer it.
> > >
> > > I believe we should consider the cost of code duplication in this
> > > situation.
> > >
> > > If the non-early version is only used in the kexec path, it will not be
> > > tested as frequently and could be more easily broken. I think it would be
> > > acceptable for kexec to be slightly slower if it results in more
> > > maintainable code.
> >
> > Absolutely so.
> >
> 
> It would be nice if someone could quantify 'slightly slower' - I am
> leaning to the same conclusion but I have no clue what the actual
> performance impact is.

If we can survive the performance of the initial boot, we can live with it
for kexec.
Ard Biesheuvel April 7, 2025, 5:45 p.m. UTC | #8
On Mon, 7 Apr 2025 at 19:33, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote:
> > On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote:
> > >
> > >
> > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> > > > > If the GHCB is available, we should always prefer it.
> > > >
> > > > I believe we should consider the cost of code duplication in this
> > > > situation.
> > > >
> > > > If the non-early version is only used in the kexec path, it will not be
> > > > tested as frequently and could be more easily broken. I think it would be
> > > > acceptable for kexec to be slightly slower if it results in more
> > > > maintainable code.
> > >
> > > Absolutely so.
> > >
> >
> > It would be nice if someone could quantify 'slightly slower' - I am
> > leaning to the same conclusion but I have no clue what the actual
> > performance impact is.
>
> If we can survive the performance of the initial boot, we can live with it
> for kexec.
>

The initial boot does not occur via the decompressor, but via the EFI
stub, where memory acceptance is handled by the firmware (as it
should).

Given that the traditional decompressor carves out an allocation from
the raw E820 map without using any of the higher level APIs, it has to
accept the memory itself if it is marked as unaccepted in the table.

Perhaps the decompressor should try to avoid unaccepted memory?
Tom Lendacky April 7, 2025, 6:05 p.m. UTC | #9
On 4/7/25 04:25, Kirill A. Shutemov wrote:
> On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
>> If the GHCB is available, we should always prefer it.
> 
> I believe we should consider the cost of code duplication in this
> situation.
> 
> If the non-early version is only used in the kexec path, it will not be
> tested as frequently and could be more easily broken. I think it would be
> acceptable for kexec to be slightly slower if it results in more
> maintainable code.
> 

Is accept_memory() in the decompressor or efistub only used in the kexec
path?

Thanks,
Tom
Ard Biesheuvel April 7, 2025, 7:59 p.m. UTC | #10
On Mon, 7 Apr 2025 at 20:05, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 4/7/25 04:25, Kirill A. Shutemov wrote:
> > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> >> If the GHCB is available, we should always prefer it.
> >
> > I believe we should consider the cost of code duplication in this
> > situation.
> >
> > If the non-early version is only used in the kexec path, it will not be
> > tested as frequently and could be more easily broken. I think it would be
> > acceptable for kexec to be slightly slower if it results in more
> > maintainable code.
> >
>
> Is accept_memory() in the decompressor or efistub only used in the kexec
> path?
>

The EFI stub does not call accept_memory(), only the decompressor
does. The only use case for explicit memory acceptance in the EFI stub
is process_unaccepted_memory(), which accepts the misaligned chunk of
memory that cannot be described at 2M granularity by the accepted
memory table.

Remember that the EFI stub no longer calls into the legacy
decompressor at all - it decompresses the kernel while executing the
EFI boot services and branches straight to it.
Kirill A. Shutemov April 7, 2025, 9:08 p.m. UTC | #11
On Mon, Apr 07, 2025 at 07:45:59PM +0200, Ard Biesheuvel wrote:
> On Mon, 7 Apr 2025 at 19:33, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > On Mon, Apr 07, 2025 at 07:21:17PM +0200, Ard Biesheuvel wrote:
> > > On Mon, 7 Apr 2025 at 18:44, Ingo Molnar <mingo@kernel.org> wrote:
> > > >
> > > >
> > > > * Kirill A. Shutemov <kirill.shutemov@linux.intel.com> wrote:
> > > >
> > > > > On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
> > > > > > If the GHCB is available, we should always prefer it.
> > > > >
> > > > > I believe we should consider the cost of code duplication in this
> > > > > situation.
> > > > >
> > > > > If the non-early version is only used in the kexec path, it will not be
> > > > > tested as frequently and could be more easily broken. I think it would be
> > > > > acceptable for kexec to be slightly slower if it results in more
> > > > > maintainable code.
> > > >
> > > > Absolutely so.
> > > >
> > >
> > > It would be nice if someone could quantify 'slightly slower' - I am
> > > leaning to the same conclusion but I have no clue what the actual
> > > performance impact is.
> >
> > If we can survive the performance of the initial boot, we can live with it
> > for kexec.
> >
> 
> The initial boot does not occur via the decompressor, but via the EFI
> stub, where memory acceptance is handled by the firmware (as it
> should).

I wounder what protocol BIOS uses.

> Given that the traditional decompressor carves out an allocation from
> the raw E820 map without using any of the higher level APIs, it has to
> accept the memory itself if it is marked as unaccepted in the table.
> 
> Perhaps the decompressor should try to avoid unaccepted memory?

It limits KASLR. I would rather wait more on kexec.
Tom Lendacky April 8, 2025, 3:53 p.m. UTC | #12
On 4/7/25 14:59, Ard Biesheuvel wrote:
> On Mon, 7 Apr 2025 at 20:05, Tom Lendacky <thomas.lendacky@amd.com> wrote:
>>
>> On 4/7/25 04:25, Kirill A. Shutemov wrote:
>>> On Fri, Apr 04, 2025 at 08:07:03AM -0700, Dionna Amalie Glaze wrote:
>>>> If the GHCB is available, we should always prefer it.
>>>
>>> I believe we should consider the cost of code duplication in this
>>> situation.
>>>
>>> If the non-early version is only used in the kexec path, it will not be
>>> tested as frequently and could be more easily broken. I think it would be
>>> acceptable for kexec to be slightly slower if it results in more
>>> maintainable code.
>>>
>>
>> Is accept_memory() in the decompressor or efistub only used in the kexec
>> path?
>>
> 
> The EFI stub does not call accept_memory(), only the decompressor
> does. The only use case for explicit memory acceptance in the EFI stub

Since EFI stub never uses accept_memory() I looked at moving enablement
of SEV to be before the setup of the accepted memory bitmap, as SEV
enablement doesn't need any e820 info. But that didn't work because the
real issue is early_setup_ghcb() calls set_page_decrypted() which calls
set_clr_page_flags(). The latter function is not meant to work with EFI
page tables, so there is an incompatibility.

If we had a way to check for whether we are coming through the EFI stub
vs the decompressor, then snp_accept_memory() could decide to skip
early_setup_ghcb() when called from the EFI stub and call either
__snp_accept_memory() from the decompressor or __page_state_change()
from the EFI stub (the latter having to be updated to return a value).

I think there are other areas that might need investigating because I
noticed that efi_warn() is successful before efi_exit_boot_services()
but blows up immediately after (possibly in the EFI #VC handler having
to do with addressing the string?).

Thanks,
Tom


> is process_unaccepted_memory(), which accepts the misaligned chunk of
> memory that cannot be described at 2M granularity by the accepted
> memory table.
> 
> Remember that the EFI stub no longer calls into the legacy
> decompressor at all - it decompresses the kernel while executing the
> EFI boot services and branches straight to it.
diff mbox series

Patch

diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c
index bb55934c1cee..88100bf83ded 100644
--- a/arch/x86/boot/compressed/sev.c
+++ b/arch/x86/boot/compressed/sev.c
@@ -164,10 +164,7 @@  bool sev_snp_enabled(void)
 
 static void __page_state_change(unsigned long paddr, enum psc_op op)
 {
-	u64 val;
-
-	if (!sev_snp_enabled())
-		return;
+	u64 val, msr;
 
 	/*
 	 * If private -> shared then invalidate the page before requesting the
@@ -176,6 +173,9 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if (op == SNP_PAGE_STATE_SHARED)
 		pvalidate_4k_page(paddr, paddr, false);
 
+	/* Save the current GHCB MSR value */
+	msr = sev_es_rd_ghcb_msr();
+
 	/* Issue VMGEXIT to change the page state in RMP table. */
 	sev_es_wr_ghcb_msr(GHCB_MSR_PSC_REQ_GFN(paddr >> PAGE_SHIFT, op));
 	VMGEXIT();
@@ -185,6 +185,9 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 	if ((GHCB_RESP_CODE(val) != GHCB_MSR_PSC_RESP) || GHCB_MSR_PSC_RESP_VAL(val))
 		sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_PSC);
 
+	/* Restore the GHCB MSR value */
+	sev_es_wr_ghcb_msr(msr);
+
 	/*
 	 * Now that page state is changed in the RMP table, validate it so that it is
 	 * consistent with the RMP entry.
@@ -195,11 +198,17 @@  static void __page_state_change(unsigned long paddr, enum psc_op op)
 
 void snp_set_page_private(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_PRIVATE);
 }
 
 void snp_set_page_shared(unsigned long paddr)
 {
+	if (!sev_snp_enabled())
+		return;
+
 	__page_state_change(paddr, SNP_PAGE_STATE_SHARED);
 }
 
@@ -261,6 +270,11 @@  static phys_addr_t __snp_accept_memory(struct snp_psc_desc *desc,
 	return pa;
 }
 
+/*
+ * The memory acceptance support uses the boot GHCB page to perform
+ * the required page state change operation before validating the
+ * pages.
+ */
 void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	struct snp_psc_desc desc = {};
@@ -275,6 +289,18 @@  void snp_accept_memory(phys_addr_t start, phys_addr_t end)
 		pa = __snp_accept_memory(&desc, pa, end);
 }
 
+/*
+ * The early version of memory acceptance is needed when being called
+ * from the EFI stub driver. The pagetable manipulation to mark the
+ * boot GHCB page as shared can't be performed at this stage, so use
+ * the GHCB page state change MSR protocol instead.
+ */
+void snp_accept_memory_early(phys_addr_t start, phys_addr_t end)
+{
+	for (phys_addr_t pa = start; pa < end; pa += PAGE_SIZE)
+		__page_state_change(pa, SNP_PAGE_STATE_PRIVATE);
+}
+
 void sev_es_shutdown_ghcb(void)
 {
 	if (!boot_ghcb)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 7d9cf473f4d0..dcf436dea99e 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -383,6 +383,8 @@  static bool efistub_is_sevsnp_guest(void)
 	return sev_get_status() & MSR_AMD64_SEV_SNP_ENABLED;
 }
 
+void snp_accept_memory_early(phys_addr_t start, phys_addr_t end);
+
 void efistub_accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	static bool once, is_tdx, is_sevsnp;
@@ -398,7 +400,7 @@  void efistub_accept_memory(phys_addr_t start, phys_addr_t end)
 	if (is_tdx)
 		tdx_accept_memory(start, end);
 	else if (is_sevsnp)
-		snp_accept_memory(start, end);
+		snp_accept_memory_early(start, end);
 }
 
 #endif