Message ID | 20250512190834.332684-23-ardb+git@google.com |
---|---|
Headers | show |
Series | x86: strict separation of startup code | expand |
On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote: > From: Ard Biesheuvel <ardb@kernel.org> > > !!! Boot tested on non-SEV guest ONLY !!!! ... > !!! Boot tested on non-SEV guest ONLY !!!! > > Again, I will need to lean on Tom to determine whether this breaks > SEV-SNP guest boot. As I mentioned before, I am still waiting for > SEV-SNP capable hardware to be delivered. Ingo, please do not rush this stuff in before Tom and I have tested it successfully with SEV* guests. Thanks!
* Borislav Petkov <bp@alien8.de> wrote: > On Mon, May 12, 2025 at 09:08:35PM +0200, Ard Biesheuvel wrote: > > From: Ard Biesheuvel <ardb@kernel.org> > > > > !!! Boot tested on non-SEV guest ONLY !!!! > > ... > > > !!! Boot tested on non-SEV guest ONLY !!!! > > > > Again, I will need to lean on Tom to determine whether this breaks > > SEV-SNP guest boot. As I mentioned before, I am still waiting for > > SEV-SNP capable hardware to be delivered. > > Ingo, please do not rush this stuff in before Tom and I have tested it > successfully with SEV* guests. > > Thanks! I don't intend to rush it, but note that AMD's SEV-SNP testing is lagging a *lot* at the moment: Ard asked for testing the -v2 series on May 4: https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com That request for testing was ignored AFAICS. It's May 13 and still crickets. We also had SEV-SNP boot bugs pending since August 2024, that nobody but (eventually) AMD triggered. Ie. very few people outside of the vendor are testing SEV-SNP AFAICS, and even vendor testing is sporadic ... Please ask AMD internally to get SEV-SNP tested more reliably. Testing this -v3 series would be a good start. Hint, hint. ;-) Thanks! Ingo
* Borislav Petkov <bp@alien8.de> wrote: > On Tue, May 13, 2025 at 12:02:22PM +0200, Ingo Molnar wrote: > > I don't intend to rush it, > > Thanks. > > > That request for testing was ignored AFAICS. It's May 13 and still > > crickets. > > Not ignored - Tom and I are testing but we're busy as hell too. Yeah, so the problem is that SEV* is hardware that basically no active tester outside of the vendor (AMD) owns and is testing against development trees AFAICS. > > We also had SEV-SNP boot bugs pending since August 2024, that > > nobody but (eventually) AMD triggered. > > Where? I did a quick Git search, and here are a few examples: For example, this commit from last summer: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support") ... was only fixed recently: d54d610243a4 ("x86/boot/sev: Avoid shared GHCB page for early memory acceptance") Or this commit from June 2024: 34ff65901735 ("x86/sev: Use kernel provided SVSM Calling Areas") ... was only fixed a few days ago: f7387eff4bad ("x86/sev: Fix operator precedence in GHCB_MSR_VMPL_REQ_LEVEL macro") Or this commit from June 2024: fcd042e86422 ("x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0") ... was fixed a few weeks ago: 8ed12ab1319b ("x86/boot/sev: Support memory acceptance in the EFI stub under SVSM") Ie. bugfix latencies here were 10+ months. Note that two of those fixes were from Ard who is working on further robustifying the startup code - a much needed change. Ie. when Ard is asking for SEV-SNP testing for WIP series, which he did 10+ days ago, you should not ignore it ... or if you do ignore his request for testing, you should not complain about the changes being merged eventually, once they pass review & testing on non-SEV platforms. > > Ie. very few people outside of the vendor are testing SEV-SNP > > AFAICS, and even vendor testing is sporadic ... > > Not true - SEV* testing happens on a daily basis. If you didn't have time to personally test Ard's -v2 series since May 2, that's OK: I can merge these proposed changes in an RFT branch so that it gets tested in the daily testing flow. [see further below for the Git link] In other words: please no "gatekeeping". Please don't force Ard into a catch-22 situation where he cannot test the patches on SEV-SNP, but you are blocking these x86 startup code changes on the grounds that they weren't tested on SEV-SNP ... > > Please ask AMD internally to get SEV-SNP tested more reliably. > > Testing this -v3 series would be a good start. Hint, hint. ;-) > > We test everything that goes into linux-next. We haven't started > testing unreviewed patchsets yet because we don't do that - that > stuff is moving. > > So if you want to merge something, just ping me or Tom and we'll test > it. Here's Ard's request from May 2: https://lore.kernel.org/r/20250504095230.2932860-25-ardb+git@google.com "Again, I will need to lean on Tom to determine whether this breaks SEV-SNP guest boot. As I mentioned before, I am still waiting for SEV-SNP capable hardware to be delivered." This request for testing was ignored AFAICS. > But you have to give us ample time to do so - you can't merge > something which Ard sent *on the same day*. Sure: -v2 was sent more than 10 days ago, and the testing request was ignored AFAICS. Do 10 days count as 'ample time'? Anyway, to make it even lower-overhead to test these changes, I've put the -v3 series into the WIP.x86/boot tree: git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git WIP.x86/boot Only lightly tested. Just a "boots/doesn't boot" kind of quick feedback would be much appreciated. Note that naturally this tree is still subject to rebasing, as review feedback is incorporated. Thanks! Ingo
On Tue, 13 May 2025 at 14:55, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Mon, 12 May 2025 at 20:11, Ard Biesheuvel <ardb+git@google.com> wrote: > > > > From: Ard Biesheuvel <ardb@kernel.org> > > > > The early page state change API is mostly only used very early, when > > only the boot time SVSM calling area is in use. However, this API is > > also called by the kexec finishing code, which runs very late, and > > potentially from a different CPU (which uses a different calling area). > > > > To avoid pulling the per-CPU SVSM calling area pointers and related SEV > > state into the startup code, refactor the page state change API so the > > SVSM calling area virtual and physical addresses can be provided by the > > caller. > > > > No functional change intended. > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > This patch is broken - I'll send a followup fix asap. ... or actually, the one before it.
On Tue, 13 May 2025 at 15:17, Borislav Petkov <bp@alien8.de> wrote: > > On Tue, May 13, 2025 at 01:22:16PM +0200, Ingo Molnar wrote: ... > > Note that two of those fixes were from Ard who is working on further > > robustifying the startup code - a much needed change. > > Really? Much needed huh? > > Please do explain why is it much needed? > > Because the reason Ard is doing it is a different one but maybe > I misunderstood him... > I will refrain from inserting myself into the intra-tip review and testing policy debate, but let me at least provide a quick recap of what I am doing here and why. Since commit c88d71508e36 x86/boot/64: Rewrite startup_64() in C dated Jun 6 2017, we have been using C code on the boot path in a way that is not supported by the toolchain, i.e., to execute non-PIC C code from a mapping of memory that is different from the one provided to the linker. It should have been obvious at the time that this was a bad idea, given the need to sprinkle fixup_pointer() calls left and right to manipulate global variables (including non-pointer variables) without crashing. This C startup code has been expanding, and in particular, the SEV-SNP startup code has been expanding over the past couple of years, and grown many of these warts, where the C code needs to use special annotations or helpers to access global objects. Google uses Clang internally, and as expected, it does not behave quite like GCC in this regard either. The result is that the SEV-SNP boot tended to break in cryptic ways with Clang built kernels, due to absolute references in the startup code that runs before those absolute references are mapped. I've done a preliminary pass upstream with RIP_REL_REF() and rip_rel_ptr() and the use of the .head.text section for startup code to ensure that we detect such issues at build time, and it has already resulted in a number of true positives where the code in question would have failed at boot time. At this point, I'm not aware of any issues caused by absolute references going undetected. However, Linus kindly indicated that the resulting diagnostics produced by the relocs tool do not meet his high standards, and so I proposed another approach, which I am implementing now (see cover letter for details). Note that this approach is also much more robust, as annotating things as __head by hand to get it emitted into the section to which the diagnostics are applied is obviously not foolproof. Fixing the existing 5-level paging and kernel mapping code was rather straight-forward. However, splitting up the SEV-SNP code has been rather challenging due to the way it was put together, i.e., as a single source file used everywhere, and to which additional functionality has been added piecemeal (i.e., the SVSM support). It is obvious that these changes should be tested before being merged, hence the RFT in the subject. And I have been struggling a bit to get access to usable hardware. (I do have access to internal development systems, but those do not fit the 'usable' description by any measure, given that I have to go through the cloud VM orchestration APIs to test boot a simple kernel image). What Boris might allude to is the fact that some of these changes also form a prerequisite for being able to construct a generic EFI zboot image for x86, which is a long-term objective that I am working on in the background. But this is not the main reason. In any case, there is no urgency wrt these changes as far as I am concerned, and given that I already found an issue myself with v3, perhaps it is better if we disregard it for the time being, and we can come back to it for the next cycle. In the mean time, I can compare notes with Boris and Tom directly to ensure that this is in the right shape, and perhaps we could at least fix the pgtable_l5_enabled() mess as well (for which I sent out a RFC/v3 today).
On Tue, 13 May 2025 at 17:44, Borislav Petkov <bp@alien8.de> wrote: > ... > > I've done a preliminary pass upstream with RIP_REL_REF() and > > rip_rel_ptr() and the use of the .head.text section for startup code > > to ensure that we detect such issues at build time, and it has already > > resulted in a number of true positives where the code in question > > would have failed at boot time. At this point, I'm not aware of any > > issues caused by absolute references going undetected. > > > > However, Linus kindly indicated that the resulting diagnostics > > produced by the relocs tool do not meet his high standards, and so I > > proposed another approach, which I am implementing now (see cover > > letter for details). Note that this approach is also much more robust, > > as annotating things as __head by hand to get it emitted into the > > section to which the diagnostics are applied is obviously not > > foolproof. > > AFAIR, this is what you've done for arm64 too, right? > The new approach proposed in this series is what I implemented before for arm64, yes.
From: Ard Biesheuvel <ardb@kernel.org> !!! Boot tested on non-SEV guest ONLY !!!! This RFT series implements a strict separation between startup code and ordinary code, where startup code is built in a way that tolerates being invoked from the initial 1:1 mapping of memory. The existsing approach of emitting this code into .head.text and checking for absolute relocations in that section is not 100% safe, and produces diagnostics that are sometimes difficult to interpret. [0] Instead, rely on symbol prefixes, similar to how this is implemented for the EFI stub and for the startup code in the arm64 port. This ensures that startup code can only call other startup code, unless a special symbol alias is emitted that exposes a non-startup routine to the startup code. This is somewhat intrusive, as there are many data objects that are referenced both by startup code and by ordinary code, and an alias needs to be emitted for each of those. If startup code references anything that has not been made available to it explicitly, a build time link error will occur. This ultimately allows the .head.text section to be dropped entirely, as it no longer has a special significance. Instead, code that only executes at boot is emitted into .init.text as it should. The majority of changes is around early SEV code. The main issue is that the use of GHCB pages and SVSM calling areas in code that may run from both the 1:1 mapping and the kernel virtual mapping is problematic as it relies on __pa() to perform VA to PA translations, which are ambiguous in this context. Also, __pa() pulls in non-trivial instrumented code when CONFIG_DEBUG_VIRTUAL=y and so it is better to avoid VA to PA translations altogether in the startup code. Change since RFT/v2: - Rebase onto tip/x86/boot and drop the patches from the previous revision that have been applied in the meantime. - Omit the pgtable_l5_enabled() changes for now, and just expose PIC aliases for the variables in question - this can be sorted later. - Don't use the boot SVSM calling area in snp_kexec_finish(), but pass down the correct per-CPU one to the early page state API. - Rename arch/x86/coco/sev/sev-noinstr.o to arch/x86/coco/sev/noinstr.o - Further reduce the amount of SEV code that needs to be constructed in a special way. Change since RFC/v1: - Include a major disentanglement/refactor of the SEV-SNP startup code, so that only code that really needs to run from the 1:1 mapping is included in the startup/ code - Incorporate some early notes from Ingo Build tested defconfig and allmodconfig !!! Boot tested on non-SEV guest ONLY !!!! Again, I will need to lean on Tom to determine whether this breaks SEV-SNP guest boot. As I mentioned before, I am still waiting for SEV-SNP capable hardware to be delivered. Cc: Borislav Petkov <bp@alien8.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Dionna Amalie Glaze <dionnaglaze@google.com> Cc: Kevin Loughlin <kevinloughlin@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> [0] https://lore.kernel.org/all/CAHk-=wj7k9nvJn6cpa3-5Ciwn2RGyE605BMkjWE4MqnvC9E92A@mail.gmail.com/ Ard Biesheuvel (21): x86/sev: Separate MSR and GHCB based snp_cpuid() via a callback x86/sev: Use MSR protocol for remapping SVSM calling area x86/sev: Use MSR protocol only for early SVSM PVALIDATE call x86/sev: Run RMPADJUST on SVSM calling area page to test VMPL x86/sev: Move GHCB page based HV communication out of startup code x86/sev: Avoid global variable to store virtual address of SVSM area x86/sev: Move MSR save/restore out of early page state change helper x86/sev: Share implementation of MSR-based page state change x86/sev: Pass SVSM calling area down to early page state change API x86/sev: Use boot SVSM CA for all startup and init code x86/boot: Drop redundant RMPADJUST in SEV SVSM presence check x86/sev: Unify SEV-SNP hypervisor feature check x86/sev: Provide PIC aliases for SEV related data objects x86/boot: Provide PIC aliases for 5-level paging related constants x86/sev: Move __sev_[get|put]_ghcb() into separate noinstr object x86/sev: Export startup routines for later use x86/boot: Create a confined code area for startup code x86/boot: Move startup code out of __head section x86/boot: Disallow absolute symbol references in startup code x86/boot: Revert "Reject absolute references in .head.text" x86/boot: Get rid of the .head.text section arch/x86/boot/compressed/sev-handle-vc.c | 3 + arch/x86/boot/compressed/sev.c | 132 ++----- arch/x86/boot/startup/Makefile | 21 ++ arch/x86/boot/startup/exports.h | 12 + arch/x86/boot/startup/gdt_idt.c | 4 +- arch/x86/boot/startup/map_kernel.c | 4 +- arch/x86/boot/startup/sev-shared.c | 361 +++++--------------- arch/x86/boot/startup/sev-startup.c | 190 ++--------- arch/x86/boot/startup/sme.c | 29 +- arch/x86/coco/sev/Makefile | 6 +- arch/x86/coco/sev/core.c | 179 +++++++--- arch/x86/coco/sev/{sev-nmi.c => noinstr.c} | 74 ++++ arch/x86/coco/sev/vc-handle.c | 3 +- arch/x86/coco/sev/vc-shared.c | 143 +++++++- arch/x86/include/asm/init.h | 6 - arch/x86/include/asm/setup.h | 1 + arch/x86/include/asm/sev-internal.h | 29 +- arch/x86/include/asm/sev.h | 67 +++- arch/x86/kernel/head64.c | 5 +- arch/x86/kernel/head_32.S | 2 +- arch/x86/kernel/head_64.S | 10 +- arch/x86/kernel/vmlinux.lds.S | 7 +- arch/x86/mm/mem_encrypt_amd.c | 6 - arch/x86/mm/mem_encrypt_boot.S | 6 +- arch/x86/platform/pvh/head.S | 2 +- arch/x86/tools/relocs.c | 8 +- 26 files changed, 621 insertions(+), 689 deletions(-) create mode 100644 arch/x86/boot/startup/exports.h rename arch/x86/coco/sev/{sev-nmi.c => noinstr.c} (61%) base-commit: ed4d95d033e359f9445e85bf5a768a5859a5830b