Message ID | 20250504095230.2932860-40-ardb+git@google.com |
---|---|
State | New |
Headers | show |
Series | x86: strict separation of startup code | expand |
On Sun, May 04, 2025 at 02:20:04PM -0000, tip-bot2 for Ard Biesheuvel wrote: > The following commit has been merged into the x86/boot branch of tip: > > Commit-ID: 5297886f0cc45db5f4a804caf359e6e7874ee864 > Gitweb: https://git.kernel.org/tip/5297886f0cc45db5f4a804caf359e6e7874ee864 > Author: Ard Biesheuvel <ardb@kernel.org> > AuthorDate: Sun, 04 May 2025 11:52:45 +02:00 > Committer: Ingo Molnar <mingo@kernel.org> > CommitterDate: Sun, 04 May 2025 15:59:43 +02:00 > > x86/boot: Provide __pti_set_user_pgtbl() to startup code > > The SME encryption startup code populates page tables using the ordinary > set_pXX() helpers, and in a PTI build, these will call out to > __pti_set_user_pgtbl() to manipulate the shadow copy of the page tables > for user space. > > This is unneeded for the startup code, which only manipulates the > swapper page tables, and so this call could be avoided in this > particular case. So instead of exposing the ordinary > __pti_set_user_pgtblt() to the startup code after its gets confined into > its own symbol space, provide an alternative which just returns pgd, > which is always correct in the startup context. > > Annotate it as __weak for now, this will be dropped in a subsequent > patch. > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > Signed-off-by: Ingo Molnar <mingo@kernel.org> > Cc: Arnd Bergmann <arnd@arndb.de> > Cc: David Woodhouse <dwmw@amazon.co.uk> > Cc: Dionna Amalie Glaze <dionnaglaze@google.com> > Cc: H. Peter Anvin <hpa@zytor.com> > Cc: Kees Cook <keescook@chromium.org> > Cc: Kevin Loughlin <kevinloughlin@google.com> > Cc: Len Brown <len.brown@intel.com> > Cc: Linus Torvalds <torvalds@linux-foundation.org> > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > Cc: Tom Lendacky <thomas.lendacky@amd.com> > Cc: linux-efi@vger.kernel.org > Link: https://lore.kernel.org/r/20250504095230.2932860-40-ardb+git@google.com > --- > arch/x86/boot/startup/sme.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c > index 5738b31..753cd20 100644 > --- a/arch/x86/boot/startup/sme.c > +++ b/arch/x86/boot/startup/sme.c > @@ -564,3 +564,12 @@ void __head sme_enable(struct boot_params *bp) > cc_vendor = CC_VENDOR_AMD; > cc_set_mask(me_mask); > } > + > +#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION > +/* Local version for startup code, which never operates on user page tables */ > +__weak > +pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) > +{ > + return pgd; > +} > +#endif [ 1.227968] smp: Brought up 1 node, 32 CPUs [ 1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS) [ 1.247644] ------------[ cut here ]------------ [ 1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80 [ 1.251592] Modules linked in: [ 1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary) [ 1.253173] node 0 deferred pages initialised in 12ms [ 1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 [ 1.257490] Workqueue: events unaccepted_cleanup_work [ 1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80 [ 1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85 [ 1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved) [ 1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286 [ 1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000 [ 1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00 [ 1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100 [ 1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300 [ 1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400 [ 1.281460] FS: 0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000 [ 1.281460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0 [ 1.287576] Call Trace: [ 1.288381] <TASK> [ 1.289015] static_key_slow_dec+0x1f/0x40 [ 1.289980] process_one_work+0x171/0x330 [ 1.290988] worker_thread+0x247/0x390 [ 1.291576] ? __pfx_worker_thread+0x10/0x10 [ 1.292773] kthread+0x107/0x240 [ 1.293536] ? __pfx_kthread+0x10/0x10 [ 1.295573] ret_from_fork+0x30/0x50 [ 1.295575] ? __pfx_kthread+0x10/0x10 [ 1.296580] ret_from_fork_asm+0x1a/0x30 [ 1.297507] </TASK> [ 1.298085] ---[ end trace 0000000000000000 ]--- mingo simply doesn't want to listen and stop queueing untested patches. So lemme whack this one. My SNP guest had CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y leading to the above. Thx.
On Mon, 5 May 2025 at 18:04, Borislav Petkov <bp@alien8.de> wrote: > > On Sun, May 04, 2025 at 02:20:04PM -0000, tip-bot2 for Ard Biesheuvel wrote: > > The following commit has been merged into the x86/boot branch of tip: > > > > Commit-ID: 5297886f0cc45db5f4a804caf359e6e7874ee864 > > Gitweb: https://git.kernel.org/tip/5297886f0cc45db5f4a804caf359e6e7874ee864 > > Author: Ard Biesheuvel <ardb@kernel.org> > > AuthorDate: Sun, 04 May 2025 11:52:45 +02:00 > > Committer: Ingo Molnar <mingo@kernel.org> > > CommitterDate: Sun, 04 May 2025 15:59:43 +02:00 > > > > x86/boot: Provide __pti_set_user_pgtbl() to startup code > > > > The SME encryption startup code populates page tables using the ordinary > > set_pXX() helpers, and in a PTI build, these will call out to > > __pti_set_user_pgtbl() to manipulate the shadow copy of the page tables > > for user space. > > > > This is unneeded for the startup code, which only manipulates the > > swapper page tables, and so this call could be avoided in this > > particular case. So instead of exposing the ordinary > > __pti_set_user_pgtblt() to the startup code after its gets confined into > > its own symbol space, provide an alternative which just returns pgd, > > which is always correct in the startup context. > > > > Annotate it as __weak for now, this will be dropped in a subsequent > > patch. > > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org> > > Signed-off-by: Ingo Molnar <mingo@kernel.org> > > Cc: Arnd Bergmann <arnd@arndb.de> > > Cc: David Woodhouse <dwmw@amazon.co.uk> > > Cc: Dionna Amalie Glaze <dionnaglaze@google.com> > > Cc: H. Peter Anvin <hpa@zytor.com> > > Cc: Kees Cook <keescook@chromium.org> > > Cc: Kevin Loughlin <kevinloughlin@google.com> > > Cc: Len Brown <len.brown@intel.com> > > Cc: Linus Torvalds <torvalds@linux-foundation.org> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> > > Cc: Tom Lendacky <thomas.lendacky@amd.com> > > Cc: linux-efi@vger.kernel.org > > Link: https://lore.kernel.org/r/20250504095230.2932860-40-ardb+git@google.com > > --- > > arch/x86/boot/startup/sme.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c > > index 5738b31..753cd20 100644 > > --- a/arch/x86/boot/startup/sme.c > > +++ b/arch/x86/boot/startup/sme.c > > @@ -564,3 +564,12 @@ void __head sme_enable(struct boot_params *bp) > > cc_vendor = CC_VENDOR_AMD; > > cc_set_mask(me_mask); > > } > > + > > +#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION > > +/* Local version for startup code, which never operates on user page tables */ > > +__weak > > +pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) > > +{ > > + return pgd; > > +} > > +#endif > > [ 1.227968] smp: Brought up 1 node, 32 CPUs > [ 1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS) > [ 1.247644] ------------[ cut here ]------------ > [ 1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80 > [ 1.251592] Modules linked in: > [ 1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary) > [ 1.253173] node 0 deferred pages initialised in 12ms > [ 1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 > [ 1.257490] Workqueue: events unaccepted_cleanup_work > [ 1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80 > [ 1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85 > [ 1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved) > [ 1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286 > [ 1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000 > [ 1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00 > [ 1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100 > [ 1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300 > [ 1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400 > [ 1.281460] FS: 0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000 > [ 1.281460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0 > [ 1.287576] Call Trace: > [ 1.288381] <TASK> > [ 1.289015] static_key_slow_dec+0x1f/0x40 > [ 1.289980] process_one_work+0x171/0x330 > [ 1.290988] worker_thread+0x247/0x390 > [ 1.291576] ? __pfx_worker_thread+0x10/0x10 > [ 1.292773] kthread+0x107/0x240 > [ 1.293536] ? __pfx_kthread+0x10/0x10 > [ 1.295573] ret_from_fork+0x30/0x50 > [ 1.295575] ? __pfx_kthread+0x10/0x10 > [ 1.296580] ret_from_fork_asm+0x1a/0x30 > [ 1.297507] </TASK> > [ 1.298085] ---[ end trace 0000000000000000 ]--- > > mingo simply doesn't want to listen and stop queueing untested patches. > > So lemme whack this one. > > My SNP guest had CONFIG_MITIGATION_PAGE_TABLE_ISOLATION=y leading to the > above. > This patch by itself does nothing. The symbol is __weak for the time being, and given that this code does not have its __pi_ prefixes yet, this function will be superseded by the existing one. (If you remove the __weak you will get a linker error) Are you sure this patch is causing the issue?
On Mon, May 05, 2025 at 06:19:49PM +0200, Ard Biesheuvel wrote: > This patch by itself does nothing. The symbol is __weak for the time > being, and given that this code does not have its __pi_ prefixes yet, > this function will be superseded by the existing one. (If you remove > the __weak you will get a linker error) > > Are you sure this patch is causing the issue? My by-foot bisection of x86/boot is below. I don't know if this patch uncovers something or maybe changes placement... Tom also pointed to 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()") as potentially related. And now I went and tried to reproduce the warning and it fired ontop of: 419cbaf6a56a ("x86/boot: Add a bunch of PIC aliases") which is the previous patch. Ufff, this is one of those which don't reproduce reliably because it was fine on that commit in the previous run. Nasty. Lemme go dig more. --- ed4d95d033e3 (HEAD, refs/remotes/tip/x86/boot) x86/sev: Disentangle #VC handling code from startup code <--- NOT [ 1.227968] smp: Brought up 1 node, 32 CPUs [ 1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS) [ 1.247644] ------------[ cut here ]------------ [ 1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80 [ 1.251592] Modules linked in: [ 1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary) [ 1.253173] node 0 deferred pages initialised in 12ms [ 1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 [ 1.257490] Workqueue: events unaccepted_cleanup_work [ 1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80 [ 1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85 [ 1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved) [ 1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286 [ 1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000 [ 1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00 [ 1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100 [ 1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300 [ 1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400 [ 1.281460] FS: 0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000 [ 1.281460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0 [ 1.287576] Call Trace: [ 1.288381] <TASK> [ 1.289015] static_key_slow_dec+0x1f/0x40 [ 1.289980] process_one_work+0x171/0x330 [ 1.290988] worker_thread+0x247/0x390 [ 1.291576] ? __pfx_worker_thread+0x10/0x10 [ 1.292773] kthread+0x107/0x240 [ 1.293536] ? __pfx_kthread+0x10/0x10 [ 1.295573] ret_from_fork+0x30/0x50 [ 1.295575] ? __pfx_kthread+0x10/0x10 [ 1.296580] ret_from_fork_asm+0x1a/0x30 [ 1.297507] </TASK> [ 1.298085] ---[ end trace 0000000000000000 ]--- 5297886f0cc4 x86/boot: Provide __pti_set_user_pgtbl() to startup code <--- OK 419cbaf6a56a x86/boot: Add a bunch of PIC aliases f932adcc8650 x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases <--- OK ae862964cbc5 x86/sev: Move instruction decoder into separate source file fae89bbfdd9d x86/sev: Make sev_snp_enabled() a static function b3464a36f7f2 x86/boot: Disregard __supported_pte_mask in __startup_64() bd4a58beaaf1 x86/boot: Move early_setup_gdt() back into head64.c <--- OK
+ Kirill. This looks like this unaccepted_cleanup_work() thing being unsynchronized... On Mon, May 05, 2025 at 06:47:59PM +0200, Borislav Petkov wrote: > On Mon, May 05, 2025 at 06:19:49PM +0200, Ard Biesheuvel wrote: > > This patch by itself does nothing. The symbol is __weak for the time > > being, and given that this code does not have its __pi_ prefixes yet, > > this function will be superseded by the existing one. (If you remove > > the __weak you will get a linker error) > > > > Are you sure this patch is causing the issue? > > My by-foot bisection of x86/boot is below. > > I don't know if this patch uncovers something or maybe changes placement... > > Tom also pointed to > > 4067196a5227 ("mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()") > > as potentially related. > > And now I went and tried to reproduce the warning and it fired ontop of: > > 419cbaf6a56a ("x86/boot: Add a bunch of PIC aliases") > > which is the previous patch. > > Ufff, this is one of those which don't reproduce reliably because it was fine > on that commit in the previous run. Nasty. > > Lemme go dig more. > > --- > > ed4d95d033e3 (HEAD, refs/remotes/tip/x86/boot) x86/sev: Disentangle #VC handling code from startup code > > <--- NOT > > [ 1.227968] smp: Brought up 1 node, 32 CPUs > [ 1.231576] smpboot: Total of 32 processors activated (191999.61 BogoMIPS) > [ 1.247644] ------------[ cut here ]------------ > [ 1.248697] WARNING: CPU: 17 PID: 104 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x2a/0x80 > [ 1.251592] Modules linked in: > [ 1.252370] CPU: 17 UID: 0 PID: 104 Comm: kworker/17:0 Not tainted 6.15.0-rc4+ #2 PREEMPT(voluntary) > [ 1.253173] node 0 deferred pages initialised in 12ms > [ 1.254539] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS unknown 02/02/2022 > [ 1.257490] Workqueue: events unaccepted_cleanup_work > [ 1.258698] RIP: 0010:__static_key_slow_dec_cpuslocked+0x2a/0x80 > [ 1.259574] Code: 0f 1f 44 00 00 53 48 89 fb e8 82 66 e5 ff 8b 03 85 c0 78 16 74 54 83 f8 01 74 11 8d 50 ff f0 0f b1 13 75 ec 5b e9 66 bf 8c 00 <0f> 0b 48 c7 c7 00 44 54 82 e8 a8 5f 8c 00 8b 03 83 f8 ff 74 33 85 > [ 1.266446] Memory: 7574396K/8381588K available (13737K kernel code, 2487K rwdata, 6056K rodata, 3916K init, 3592K bss, 791248K reserved, 0K cma-reserved) > [ 1.263574] RSP: 0018:ffffc9000048fe40 EFLAGS: 00010286 > [ 1.267573] RAX: 00000000ffffffff RBX: ffffffff82f10d00 RCX: 0000000000000000 > [ 1.269388] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffffff82f10d00 > [ 1.275578] RBP: ffff88827b7d4d98 R08: 8080808080808080 R09: ffff888100b59100 > [ 1.277441] R10: ffff888100050cc0 R11: fefefefefefefeff R12: ffff88827326e300 > [ 1.279574] R13: ffff888100bc1940 R14: ffff8881000e2405 R15: ffff8881000e2400 > [ 1.281460] FS: 0000000000000000(0000) GS:ffff8882f0400000(0000) knlGS:0000000000000000 > [ 1.281460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1.281460] CR2: 0000000000000000 CR3: 0008000002c22000 CR4: 00000000003506f0 > [ 1.287576] Call Trace: > [ 1.288381] <TASK> > [ 1.289015] static_key_slow_dec+0x1f/0x40 > [ 1.289980] process_one_work+0x171/0x330 > [ 1.290988] worker_thread+0x247/0x390 > [ 1.291576] ? __pfx_worker_thread+0x10/0x10 > [ 1.292773] kthread+0x107/0x240 > [ 1.293536] ? __pfx_kthread+0x10/0x10 > [ 1.295573] ret_from_fork+0x30/0x50 > [ 1.295575] ? __pfx_kthread+0x10/0x10 > [ 1.296580] ret_from_fork_asm+0x1a/0x30 > [ 1.297507] </TASK> > [ 1.298085] ---[ end trace 0000000000000000 ]--- > > 5297886f0cc4 x86/boot: Provide __pti_set_user_pgtbl() to startup code > > <--- OK > > 419cbaf6a56a x86/boot: Add a bunch of PIC aliases > f932adcc8650 x86/linkage: Add SYM_PIC_ALIAS() macro helper to emit symbol aliases > > <--- OK > > ae862964cbc5 x86/sev: Move instruction decoder into separate source file > fae89bbfdd9d x86/sev: Make sev_snp_enabled() a static function > b3464a36f7f2 x86/boot: Disregard __supported_pte_mask in __startup_64() > bd4a58beaaf1 x86/boot: Move early_setup_gdt() back into head64.c > > <--- OK > > > -- > Regards/Gruss, > Boris. > > https://people.kernel.org/tglx/notes-about-netiquette
diff --git a/arch/x86/boot/startup/sme.c b/arch/x86/boot/startup/sme.c index 7fc6a689cefe..1e9f1f5a753c 100644 --- a/arch/x86/boot/startup/sme.c +++ b/arch/x86/boot/startup/sme.c @@ -557,3 +557,12 @@ void __head sme_enable(struct boot_params *bp) cc_vendor = CC_VENDOR_AMD; cc_set_mask(me_mask); } + +#ifdef CONFIG_MITIGATION_PAGE_TABLE_ISOLATION +/* Local version for startup code, which never operates on user page tables */ +__weak +pgd_t __pti_set_user_pgtbl(pgd_t *pgdp, pgd_t pgd) +{ + return pgd; +} +#endif