mbox series

[v2,0/5] x86/speculation: Disable IBRS when idle

Message ID 20230620140625.1001886-1-longman@redhat.com
Headers show
Series x86/speculation: Disable IBRS when idle | expand

Message

Waiman Long June 20, 2023, 2:06 p.m. UTC
For Intel processors that need to turn on IBRS to protect against
Spectre v2 and Retbleed, the IBRS bit in the SPEC_CTRL MSR affects
the performance of the whole core even if only one thread is turning
it on when running in the kernel. For user space heavy applications,
the performance impact of occasionally turning IBRS on during syscalls
shouldn't be significant. Unfortunately, that is not the case when the
sibling thread is idling in the kernel. In that case, the performance
impact can be significant.

When DPDK is running on an isolated CPU thread processing network packets
in user space while its sibling thread is idle. The performance of the
busy DPDK thread with IBRS on and off in the sibling idle thread are:

                                IBRS on         IBRS off
                                -------         --------
  packets/second:                  7.8M           10.4M
  avg tsc cycles/packet:         282.26          209.86

This is a 25% performance degradation. The test system is a Intel Xeon
4114 CPU @ 2.20GHz.

Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
disables IBRS when the CPU enters long idle (C6 or below). However, there
are existing users out there who have set "intel_idle.max_cstate=1" or
even "intel_idle.max_cstate=0" to decrease latency. Those users won't be
able to benefit from this commit. This patch series extends this commit
by providing a new "intel_idle.no_ibrs" module option to force disable
IBRS even when "intel_idle.max_cstate=1" at the expense of increased IRQ
response latency. It also includes commit to allow the disabling of IBRS
with "intel_idle.max_cstate=0" as well as when a CPU becomes offline.

The first patch adds a new x86/spec_ctrl_msrs debugfs file which display
the current cached values of the SPEC_CTRL MSRs of all the CPUs. This
allows us to verify that IBRS bit is correctly turned off in idle CPUs
for various cstate values.

Waiman Long (5):
  x86/speculation: Provide a debugfs file to dump SPEC_CTRL MSRs
  x86/idle: Disable IBRS when cpu is offline
  intel_idle: Sync up the SPEC_CTRL MSR value to x86_spec_ctrl_current
  intel_idle: Add no_ibrs module parameter to force disable IBRS
  x86/idle: Disable IBRS entering mwait idle and enable it on wakeup

 arch/x86/include/asm/mwait.h | 17 ++++++++
 arch/x86/kernel/cpu/bugs.c   | 79 ++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/smpboot.c    | 13 ++++++
 drivers/idle/intel_idle.c    | 22 ++++++++--
 4 files changed, 127 insertions(+), 4 deletions(-)

Comments

Peter Zijlstra June 21, 2023, 7:23 a.m. UTC | #1
On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> disables IBRS when the CPU enters long idle. However, when a CPU becomes
> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> enabled. That will impact the performance of a sibling CPU. Mitigate
> this performance impact by clearing all the mitigation bits in SPEC_CTRL
> MSR when offline and restoring the value of the MSR when it becomes
> online again.
> 
> Signed-off-by: Waiman Long <longman@redhat.com>
> ---
>  arch/x86/kernel/smpboot.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 352f0ce1ece4..5ff82fef413c 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -84,6 +84,7 @@
>  #include <asm/hw_irq.h>
>  #include <asm/stackprotector.h>
>  #include <asm/sev.h>
> +#include <asm/nospec-branch.h>
>  
>  /* representing HT siblings of each logical CPU */
>  DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>  
>  void native_play_dead(void)
>  {
> +	u64 spec_ctrl = spec_ctrl_current();
> +
> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +		this_cpu_write(x86_spec_ctrl_current, 0);
> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> +	}
> +
>  	play_dead_common();
>  	tboot_shutdown(TB_SHUTDOWN_WFS);
>  
>  	mwait_play_dead();
>  	if (cpuidle_play_dead())
>  		hlt_play_dead();
> +
> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> +	}
>  }

play_dead() is marked __noreturn
Peter Zijlstra June 21, 2023, 7:38 a.m. UTC | #2
On Tue, Jun 20, 2023 at 10:06:23AM -0400, Waiman Long wrote:
> When intel_idle_ibrs() is called, it modifies the SPEC_CTRL MSR to 0
> in order disable IBRS. However, the new MSR value isn't reflected in
> x86_spec_ctrl_current. That will cause the new spec_ctrl_msrs debugfs
> file to show incorrect result. Fix that by updating x86_spec_ctrl_current
> percpu value to always match the content of the SPEC_CTRL MSR.

What debugfs file?
Peter Zijlstra June 21, 2023, 7:41 a.m. UTC | #3
On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
> 

Pff, clearly I can't even read email anymore..

We don't do this for any of the other MSRs, so why start now?
Borislav Petkov June 21, 2023, 8:24 a.m. UTC | #4
On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
> > Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
> > mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
> > debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
> > 
> 
> Pff, clearly I can't even read email anymore..
> 
> We don't do this for any of the other MSRs, so why start now?

Hell no.

There's /sys/devices/system/cpu/vulnerabilities/ for that.

We are abstracting MSRs away from APIs - not do the backwards thing.
Waiman Long June 21, 2023, 1:47 p.m. UTC | #5
On 6/21/23 03:41, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
>> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>
> Pff, clearly I can't even read email anymore..
>
> We don't do this for any of the other MSRs, so why start now?

That is true since most of the MSRs are static. IOW, they don't change 
once they are set. The current way to read the content of the MSRs is 
via the /dev/cpu/<n>/msr files. There are user space tools to do that.

SPEC_CTRL, however, can be subjected to frequent changes especially when 
X86_FEATURE_KERNEL_IBRS is set. As a result, the current way of reading 
MSRs from /dev/cpu/<n>/msr doesn't quite work for SPEC_CTRL as the IBRS 
bit is always set due to the fact that the reading is done internally 
via an IPI in kernel space. That is the main reason that I add this 
debugfs file to get a good snapshot of the current set of cached 
SPEC_CTRL MSR values without the need to disturb what the CPUs are 
currently doing at that point in time.

This patch is not central to the main purpose of this patch series, but 
it does enable me to quickly verify the other patches are working 
correctly. I can take it out if people don't think it is a good idea.

Cheers,
Longman
Waiman Long June 21, 2023, 1:59 p.m. UTC | #6
On 6/21/23 03:23, Peter Zijlstra wrote:
> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>> enabled. That will impact the performance of a sibling CPU. Mitigate
>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>> MSR when offline and restoring the value of the MSR when it becomes
>> online again.
>>
>> Signed-off-by: Waiman Long <longman@redhat.com>
>> ---
>>   arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>   1 file changed, 13 insertions(+)
>>
>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>> index 352f0ce1ece4..5ff82fef413c 100644
>> --- a/arch/x86/kernel/smpboot.c
>> +++ b/arch/x86/kernel/smpboot.c
>> @@ -84,6 +84,7 @@
>>   #include <asm/hw_irq.h>
>>   #include <asm/stackprotector.h>
>>   #include <asm/sev.h>
>> +#include <asm/nospec-branch.h>
>>   
>>   /* representing HT siblings of each logical CPU */
>>   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>   
>>   void native_play_dead(void)
>>   {
>> +	u64 spec_ctrl = spec_ctrl_current();
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +		this_cpu_write(x86_spec_ctrl_current, 0);
>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>> +	}
>> +
>>   	play_dead_common();
>>   	tboot_shutdown(TB_SHUTDOWN_WFS);
>>   
>>   	mwait_play_dead();
>>   	if (cpuidle_play_dead())
>>   		hlt_play_dead();
>> +
>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>> +	}
>>   }
> play_dead() is marked __noreturn

There are different versions of play_dead() in the kernel. Some of them 
are indeed marked __noreturn like the non-SMP one in 
arch/x86/kernel/process.c. The native_play_dead() that I am patching 
isn't one of those.

Cheers,
Longman
Waiman Long June 21, 2023, 2:02 p.m. UTC | #7
On 6/21/23 04:24, Borislav Petkov wrote:
> On Wed, Jun 21, 2023 at 09:41:05AM +0200, Peter Zijlstra wrote:
>> On Tue, Jun 20, 2023 at 10:06:21AM -0400, Waiman Long wrote:
>>> Sometimes it is useful to know the states the SPEC_CTRL MSRs to see what
>>> mitigations are enabled at run time. Provide a new x86/spec_ctrl_msrs
>>> debugfs file to dump the cached versions of the current SPEC_CTRL MSRs.
>>>
>> Pff, clearly I can't even read email anymore..
>>
>> We don't do this for any of the other MSRs, so why start now?
> Hell no.
>
> There's /sys/devices/system/cpu/vulnerabilities/ for that.
>
> We are abstracting MSRs away from APIs - not do the backwards thing.
>
OK, as I have said. This is not central to the main purpose of this 
patch series. It is mostly there for verification purpose. I can 
certainly take this out.

Cheers,
Longman
Peter Zijlstra June 21, 2023, 2:36 p.m. UTC | #8
On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
> 
> On 6/21/23 03:23, Peter Zijlstra wrote:
> > On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
> > > Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
> > > disables IBRS when the CPU enters long idle. However, when a CPU becomes
> > > offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
> > > enabled. That will impact the performance of a sibling CPU. Mitigate
> > > this performance impact by clearing all the mitigation bits in SPEC_CTRL
> > > MSR when offline and restoring the value of the MSR when it becomes
> > > online again.
> > > 
> > > Signed-off-by: Waiman Long <longman@redhat.com>
> > > ---
> > >   arch/x86/kernel/smpboot.c | 13 +++++++++++++
> > >   1 file changed, 13 insertions(+)
> > > 
> > > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> > > index 352f0ce1ece4..5ff82fef413c 100644
> > > --- a/arch/x86/kernel/smpboot.c
> > > +++ b/arch/x86/kernel/smpboot.c
> > > @@ -84,6 +84,7 @@
> > >   #include <asm/hw_irq.h>
> > >   #include <asm/stackprotector.h>
> > >   #include <asm/sev.h>
> > > +#include <asm/nospec-branch.h>
> > >   /* representing HT siblings of each logical CPU */
> > >   DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
> > > @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
> > >   void native_play_dead(void)
> > >   {
> > > +	u64 spec_ctrl = spec_ctrl_current();
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +		this_cpu_write(x86_spec_ctrl_current, 0);
> > > +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
> > > +	}
> > > +
> > >   	play_dead_common();
> > >   	tboot_shutdown(TB_SHUTDOWN_WFS);
> > >   	mwait_play_dead();
> > >   	if (cpuidle_play_dead())
> > >   		hlt_play_dead();
> > > +
> > > +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
> > > +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
> > > +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
> > > +	}
> > >   }
> > play_dead() is marked __noreturn
> 
> There are different versions of play_dead() in the kernel. Some of them are
> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
> The native_play_dead() that I am patching isn't one of those.

mostly by accident I think, hlt_play_dead() is, so I'm thinking
everybody (all compiler and objtool) managed to figure out
native_play_dead() is __noreturn too.
Waiman Long June 21, 2023, 2:44 p.m. UTC | #9
On 6/21/23 10:36, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 09:59:52AM -0400, Waiman Long wrote:
>> On 6/21/23 03:23, Peter Zijlstra wrote:
>>> On Tue, Jun 20, 2023 at 10:06:22AM -0400, Waiman Long wrote:
>>>> Commit bf5835bcdb96 ("intel_idle: Disable IBRS during long idle")
>>>> disables IBRS when the CPU enters long idle. However, when a CPU becomes
>>>> offline, the IBRS bit is still set when X86_FEATURE_KERNEL_IBRS is
>>>> enabled. That will impact the performance of a sibling CPU. Mitigate
>>>> this performance impact by clearing all the mitigation bits in SPEC_CTRL
>>>> MSR when offline and restoring the value of the MSR when it becomes
>>>> online again.
>>>>
>>>> Signed-off-by: Waiman Long <longman@redhat.com>
>>>> ---
>>>>    arch/x86/kernel/smpboot.c | 13 +++++++++++++
>>>>    1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
>>>> index 352f0ce1ece4..5ff82fef413c 100644
>>>> --- a/arch/x86/kernel/smpboot.c
>>>> +++ b/arch/x86/kernel/smpboot.c
>>>> @@ -84,6 +84,7 @@
>>>>    #include <asm/hw_irq.h>
>>>>    #include <asm/stackprotector.h>
>>>>    #include <asm/sev.h>
>>>> +#include <asm/nospec-branch.h>
>>>>    /* representing HT siblings of each logical CPU */
>>>>    DEFINE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_sibling_map);
>>>> @@ -1838,12 +1839,24 @@ void __noreturn hlt_play_dead(void)
>>>>    void native_play_dead(void)
>>>>    {
>>>> +	u64 spec_ctrl = spec_ctrl_current();
>>>> +
>>>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +		this_cpu_write(x86_spec_ctrl_current, 0);
>>>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, 0);
>>>> +	}
>>>> +
>>>>    	play_dead_common();
>>>>    	tboot_shutdown(TB_SHUTDOWN_WFS);
>>>>    	mwait_play_dead();
>>>>    	if (cpuidle_play_dead())
>>>>    		hlt_play_dead();
>>>> +
>>>> +	if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS)) {
>>>> +		native_wrmsrl(MSR_IA32_SPEC_CTRL, spec_ctrl);
>>>> +		this_cpu_write(x86_spec_ctrl_current, spec_ctrl);
>>>> +	}
>>>>    }
>>> play_dead() is marked __noreturn
>> There are different versions of play_dead() in the kernel. Some of them are
>> indeed marked __noreturn like the non-SMP one in arch/x86/kernel/process.c.
>> The native_play_dead() that I am patching isn't one of those.
> mostly by accident I think, hlt_play_dead() is, so I'm thinking
> everybody (all compiler and objtool) managed to figure out
> native_play_dead() is __noreturn too.

Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an 
error which is not the typical case. My testing does confirm that this 
patch is able to keep the IBRS bit off when a CPU is offline via its 
online sysfs file.

Cheers,
Longman
Peter Zijlstra June 21, 2023, 2:48 p.m. UTC | #10
On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:

> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> which is not the typical case. My testing does confirm that this patch is
> able to keep the IBRS bit off when a CPU is offline via its online sysfs
> file.

The point is; your re-enable IBRS hunk at the end is dead-code. It
should never ever run and having it is confusing.
Waiman Long June 21, 2023, 2:51 p.m. UTC | #11
On 6/21/23 10:48, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>
>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>> which is not the typical case. My testing does confirm that this patch is
>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>> file.
> The point is; your re-enable IBRS hunk at the end is dead-code. It
> should never ever run and having it is confusing.

What I meant is that hlt_play_dead() should never be called unless there 
is some serious problem with the system and native_play_dead() does 
return in normal usage.

Cheers,
Longman
Peter Zijlstra June 21, 2023, 2:54 p.m. UTC | #12
On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
> 
> On 6/21/23 10:48, Peter Zijlstra wrote:
> > On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
> > 
> > > Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
> > > which is not the typical case. My testing does confirm that this patch is
> > > able to keep the IBRS bit off when a CPU is offline via its online sysfs
> > > file.
> > The point is; your re-enable IBRS hunk at the end is dead-code. It
> > should never ever run and having it is confusing.
> 
> What I meant is that hlt_play_dead() should never be called unless there is
> some serious problem with the system and native_play_dead() does return in
> normal usage.

This is all through arch_cpu_idle_dead() which is __noreturn. And no,
none of this must ever return.

If you want an offline CPU to come back, you re-init.
Waiman Long June 21, 2023, 3:42 p.m. UTC | #13
On 6/21/23 10:54, Peter Zijlstra wrote:
> On Wed, Jun 21, 2023 at 10:51:33AM -0400, Waiman Long wrote:
>> On 6/21/23 10:48, Peter Zijlstra wrote:
>>> On Wed, Jun 21, 2023 at 10:44:23AM -0400, Waiman Long wrote:
>>>
>>>> Well, hlt_play_dead() is only called if cpuidle_play_dead() returns an error
>>>> which is not the typical case. My testing does confirm that this patch is
>>>> able to keep the IBRS bit off when a CPU is offline via its online sysfs
>>>> file.
>>> The point is; your re-enable IBRS hunk at the end is dead-code. It
>>> should never ever run and having it is confusing.
>> What I meant is that hlt_play_dead() should never be called unless there is
>> some serious problem with the system and native_play_dead() does return in
>> normal usage.
> This is all through arch_cpu_idle_dead() which is __noreturn. And no,
> none of this must ever return.
>
> If you want an offline CPU to come back, you re-init.

Yes, you are right. I thought it will return. I will update the patch 
accordingly.

Thanks,
Longman