Message ID: 20230620140625.1001886-1-longman@redhat.com
Series: x86/speculation: Disable IBRS when idle
On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU becomes
> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline and restoring the value of the MSR when it becomes
> online again.
>
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  arch/x86/kernel/smpboot.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..5ff82fef413c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
>  #include <asm/hw_irq.h>
>  #include <asm/stackprotector.h>
>  #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>
>  /* representing HT siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>
>  void native_play_dead(void)
>  {
> +        u64 spec_ctrl = spec_ctrl_current();
> +
> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +                this_cpu_write(x86_spec_ctrl_current, 0);
> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> +        }
> +
>          play_dead_common();
>          tboot_shutdown(TB_SHUTDOWN_WFS);
>
>          mwait_play_dead();
>          if (cpuidle_play_dead())
>                  hlt_play_dead();
> +
> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> +                this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> +        }
>  }

play_dead() is marked __noreturn
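For readers following the thread: the spec_ctrl_current() / x86_spec_ctrl_current pair the patch builds on lives in arch/x86/kernel/cpu/bugs.c. Roughly (a from-memory sketch, not a quote of the tree):

    /* arch/x86/kernel/cpu/bugs.c, roughly -- a from-memory sketch */
    DEFINE_PER_CPU(u64, x86_spec_ctrl_current);

    u64 spec_ctrl_current(void)
    {
            /* Cached copy of this CPU's MSR_IA32_SPEC_CTRL value. */
            return this_cpu_read(x86_spec_ctrl_current);
    }

The patch above keeps that cached copy in sync whenever it writes the MSR on the offline/online path.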
On Tue, Jun 20, 2023 at 10:06:23AM -0400, Waiman Long wrote:
> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
> in order to disable IBRS. However, the new MSR value isn't reflected in
> x86_spec_ctrl_current. That will cause the new spec_ctrl_msrs debugfs
> file to show an incorrect result. Fix that by updating the
> x86_spec_ctrl_current percpu value to always match the content of the
> SPEC_CTRL MSR.

What debugfs file?
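For context, intel_idle_ibrs() in drivers/idle/intel_idle.c writes the MSR directly around idle entry without touching the cached value. A rough sketch of the kind of change this patch describes follows; the __intel_idle()/sched_smt_active() framing is from memory and the exact placement is an assumption, not the posted patch:

    static __cpuidle int intel_idle_ibrs(struct cpuidle_device *dev,
                                         struct cpuidle_driver *drv, int index)
    {
            bool smt_active = sched_smt_active();
            u64 spec_ctrl = spec_ctrl_current();
            int ret;

            if (smt_active) {
                    /* Keep the per-CPU cached copy in sync with the MSR write. */
                    this_cpu_write(x86_spec_ctrl_current, 0);
                    native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
            }

            ret = __intel_idle(dev, drv, index);

            if (smt_active) {
                    native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
                    this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
            }

            return ret;
    }

Clearing the cache before the MSR on the way into idle, and restoring the MSR before the cache on the way out, means the cached value never claims more mitigation than the hardware actually has enabled at that instant.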
On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> Sometimes it is useful to know the states of the SPEC_CTRL MSRs to see
> what mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>

Pff, clearly I can't even read email anymore..

We don't do this for any of the other MSRs, so why start now?
On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> > Sometimes it is useful to know the states of the SPEC_CTRL MSRs to see
> > what mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> > debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
> >
>
> Pff, clearly I can't even read email anymore..
>
> We don't do this for any of the other MSRs, so why start now?

Hell no.

There's /sys/devices/system/cpu/vulnerabilities/ for that.

We are abstracting MSRs away from APIs - not doing the backwards thing.
On 6/21/23 03:41, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>> Sometimes it is useful to know the states of the SPEC_CTRL MSRs to see
>> what mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>
> Pff, clearly I can't even read email anymore..
>
> We don't do this for any of the other MSRs, so why start now?

That is true since most of the MSRs are static. IOW, they don't change
once they are set. The current way to read the content of the MSRs is via
the /dev/cpu/<n>/msr files, and there are user space tools to do that.

SPEC_CTRL, however, can be subject to frequent changes, especially when
X86_FEATURE_KERNEL_IBRS is set. As a result, reading it from
/dev/cpu/<n>/msr doesn't quite work: the IBRS bit always appears set
because the read is done internally in kernel space via an IPI. That is
the main reason I added this debugfs file - to get a good snapshot of the
current set of cached SPEC_CTRL MSR values without disturbing what the
CPUs are doing at that point in time.

This patch is not central to the main purpose of this patch series, but
it does let me quickly verify that the other patches are working
correctly. I can take it out if people don't think it is a good idea.

Cheers,
Longman
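The proposed debugfs file itself is not shown in this part of the thread. A rough sketch of what such a dump of the cached per-CPU values could look like is below; the file name, location and output layout are assumptions for illustration only, not the posted patch:

    #include <linux/cpumask.h>
    #include <linux/debugfs.h>
    #include <linux/init.h>
    #include <linux/seq_file.h>
    #include <asm/nospec-branch.h>

    /* Dump the cached per-CPU SPEC_CTRL values; no IPIs, no MSR reads. */
    static int spec_ctrl_msrs_show(struct seq_file *m, void *unused)
    {
            int cpu;

            for_each_possible_cpu(cpu)
                    seq_printf(m, "cpu%d: 0x%llx\n", cpu,
                               per_cpu(x86_spec_ctrl_current, cpu));
            return 0;
    }
    DEFINE_SHOW_ATTRIBUTE(spec_ctrl_msrs);

    static int __init spec_ctrl_msrs_init(void)
    {
            /* NULL parent = debugfs root; the proposal puts it under the x86 dir. */
            debugfs_create_file("spec_ctrl_msrs", 0400, NULL, NULL,
                                &spec_ctrl_msrs_fops);
            return 0;
    }
    fs_initcall(spec_ctrl_msrs_init);

Reading the cached copies sidesteps the problem described above: fetching the real MSR from user space requires entering the kernel on the target CPU, which itself turns IBRS on under X86_FEATURE_KERNEL_IBRS.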
On 6/21/23 03:23, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>> enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline and restoring the value of the MSR when it becomes
>> online again.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>  arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..5ff82fef413c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>>  #include <asm/hw_irq.h>
>>  #include <asm/stackprotector.h>
>>  #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>
>>  /* representing HT siblings of each logical CPU */
>>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>
>>  void native_play_dead(void)
>>  {
>> +        u64 spec_ctrl = spec_ctrl_current();
>> +
>> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +                this_cpu_write(x86_spec_ctrl_current, 0);
>> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> +        }
>> +
>>          play_dead_common();
>>          tboot_shutdown(TB_SHUTDOWN_WFS);
>>
>>          mwait_play_dead();
>>          if (cpuidle_play_dead())
>>                  hlt_play_dead();
>> +
>> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>> +                this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>> +        }
>>  }
> play_dead() is marked __noreturn

There are different versions of play_dead() in the kernel. Some of them
are indeed marked __noreturn, like the non-SMP one in
arch/x86/kernel/process.c. The native_play_dead() that I am patching
isn't one of those.

Cheers,
Longman
On 6/21/23 04:24, Borislav Petkov wrote:
> On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>>> Sometimes it is useful to know the states of the SPEC_CTRL MSRs to see
>>> what mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>>
>> Pff, clearly I can't even read email anymore..
>>
>> We don't do this for any of the other MSRs, so why start now?
> Hell no.
>
> There's /sys/devices/system/cpu/vulnerabilities/ for that.
>
> We are abstracting MSRs away from APIs - not doing the backwards thing.
>

OK, as I have said, this is not central to the main purpose of this patch
series. It is mostly there for verification purposes. I can certainly
take it out.

Cheers,
Longman
On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>
> On 6/21/23 03:23, Peter Zijlstra wrote:
> > On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> > > Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> > > disables IBRS when the CPU enters long idle. However, when a CPU becomes
> > > offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> > > enabled. That will impact the performance of a sibling CPU. Mitigate
> > > this performance impact by clearing all the mitigation bits in SPEC_CTRL
> > > MSR when offline and restoring the value of the MSR when it becomes
> > > online again.
> > >
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >  arch/x86/kernel/smpboot.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index 352f0ce1ece4..5ff82fef413c 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -84,6 +84,7 @@
> > >  #include <asm/hw_irq.h>
> > >  #include <asm/stackprotector.h>
> > >  #include <asm/sev.h>
> > > +#include <asm/nospec-branch.h>
> > >
> > >  /* representing HT siblings of each logical CPU */
> > >  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> > > @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
> > >
> > >  void native_play_dead(void)
> > >  {
> > > +        u64 spec_ctrl = spec_ctrl_current();
> > > +
> > > +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +                this_cpu_write(x86_spec_ctrl_current, 0);
> > > +                native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > > +        }
> > > +
> > >          play_dead_common();
> > >          tboot_shutdown(TB_SHUTDOWN_WFS);
> > >
> > >          mwait_play_dead();
> > >          if (cpuidle_play_dead())
> > >                  hlt_play_dead();
> > > +
> > > +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +                native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> > > +                this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> > > +        }
> > >  }
> > play_dead() is marked __noreturn
>
> There are different versions of play_dead() in the kernel. Some of them
> are indeed marked __noreturn, like the non-SMP one in
> arch/x86/kernel/process.c. The native_play_dead() that I am patching
> isn't one of those.

Mostly by accident I think; hlt_play_dead() is, so I'm thinking everybody
(all compilers and objtool) managed to figure out native_play_dead() is
__noreturn too.
On 6/21/23 10:36, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>> On 6/21/23 03:23, Peter Zijlstra wrote:
>>> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>>>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>>>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>>>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>>>> enabled. That will impact the performance of a sibling CPU. Mitigate
>>>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>>>> MSR when offline and restoring the value of the MSR when it becomes
>>>> online again.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>  arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>>>  1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>>> index 352f0ce1ece4..5ff82fef413c 100644
>>>> --- a/arch/x86/kernel/smpboot.c
>>>> +++ b/arch/x86/kernel/smpboot.c
>>>> @@ -84,6 +84,7 @@
>>>>  #include <asm/hw_irq.h>
>>>>  #include <asm/stackprotector.h>
>>>>  #include <asm/sev.h>
>>>> +#include <asm/nospec-branch.h>
>>>>
>>>>  /* representing HT siblings of each logical CPU */
>>>>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>>>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>>>
>>>>  void native_play_dead(void)
>>>>  {
>>>> +        u64 spec_ctrl = spec_ctrl_current();
>>>> +
>>>> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +                this_cpu_write(x86_spec_ctrl_current, 0);
>>>> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>>> +        }
>>>> +
>>>>          play_dead_common();
>>>>          tboot_shutdown(TB_SHUTDOWN_WFS);
>>>>
>>>>          mwait_play_dead();
>>>>          if (cpuidle_play_dead())
>>>>                  hlt_play_dead();
>>>> +
>>>> +        if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +                native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>>>> +                this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>>>> +        }
>>>>  }
>>> play_dead() is marked __noreturn
>> There are different versions of play_dead() in the kernel. Some of them
>> are indeed marked __noreturn, like the non-SMP one in
>> arch/x86/kernel/process.c. The native_play_dead() that I am patching
>> isn't one of those.
> Mostly by accident I think; hlt_play_dead() is, so I'm thinking everybody
> (all compilers and objtool) managed to figure out native_play_dead() is
> __noreturn too.

Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
error, which is not the typical case. My testing does confirm that this
patch is able to keep the IBRS bit off when a CPU is offlined via its
sysfs online file.

Cheers,
Longman
On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
> error, which is not the typical case. My testing does confirm that this
> patch is able to keep the IBRS bit off when a CPU is offlined via its
> sysfs online file.

The point is: your re-enable IBRS hunk at the end is dead code. It should
never, ever run, and having it is confusing.
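To illustrate the point with a generic, stand-alone example (not kernel code): once the call path ends in a function the compiler knows cannot return, anything placed after that call is unreachable and may be dropped outright, which is why a restore step there can never help:

    /* Stand-alone illustration, not kernel code. */
    __attribute__((noreturn)) static void park_forever(void)
    {
            for (;;)
                    ;       /* a dead CPU halts/spins here and never comes back */
    }

    static void offline_path(void)
    {
            /* ... clear mitigation state here ... */

            park_forever();

            /*
             * Everything below this call is dead code: the compiler may
             * discard it, and it cannot execute even if it is kept.
             */
            /* restore_mitigation_state(); */
    }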
On 6/21/23 10:48, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>
>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
>> error, which is not the typical case. My testing does confirm that this
>> patch is able to keep the IBRS bit off when a CPU is offlined via its
>> sysfs online file.
> The point is: your re-enable IBRS hunk at the end is dead code. It should
> never, ever run, and having it is confusing.

What I meant is that hlt_play_dead() should never be called unless there
is some serious problem with the system, and so native_play_dead() does
return in normal usage.

Cheers,
Longman
On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>
> On 6/21/23 10:48, Peter Zijlstra wrote:
> > On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> >
> > > Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
> > > error, which is not the typical case. My testing does confirm that this
> > > patch is able to keep the IBRS bit off when a CPU is offlined via its
> > > sysfs online file.
> > The point is: your re-enable IBRS hunk at the end is dead code. It should
> > never, ever run, and having it is confusing.
>
> What I meant is that hlt_play_dead() should never be called unless there
> is some serious problem with the system, and so native_play_dead() does
> return in normal usage.

This is all reached through arch_cpu_idle_dead(), which is __noreturn. And
no, none of this must ever return.

If you want an offline CPU to come back, you re-init.
On 6/21/23 10:54, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>> On 6/21/23 10:48, Peter Zijlstra wrote:
>>> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>>>
>>>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an
>>>> error, which is not the typical case. My testing does confirm that this
>>>> patch is able to keep the IBRS bit off when a CPU is offlined via its
>>>> sysfs online file.
>>> The point is: your re-enable IBRS hunk at the end is dead code. It should
>>> never, ever run, and having it is confusing.
>> What I meant is that hlt_play_dead() should never be called unless there
>> is some serious problem with the system, and so native_play_dead() does
>> return in normal usage.
> This is all reached through arch_cpu_idle_dead(), which is __noreturn. And
> no, none of this must ever return.
>
> If you want an offline CPU to come back, you re-init.

Yes, you are right. I thought it would return. I will update the patch
accordingly.

Thanks,
Longman
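Based on this exchange, the restore hunk would simply be dropped in a respin. A sketch of how native_play_dead() might then look follows; this is an inference from the thread, not the actual follow-up posting:

    void native_play_dead(void)
    {
            /*
             * The CPU never returns from here; if it is onlined again it
             * is re-initialized from scratch, so there is nothing to
             * restore. Clear SPEC_CTRL so the dead sibling stops
             * penalizing its hyperthread twin.
             */
            if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
                    this_cpu_write(x86_spec_ctrl_current, 0);
                    native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
            }

            play_dead_common();
            tboot_shutdown(TB_SHUTDOWN_WFS);

            mwait_play_dead();
            if (cpuidle_play_dead())
                    hlt_play_dead();
    }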