diff mbox series

[v4,03/16] x86/alternatives: Disable LASS when patching kernel alternatives

Message ID 20240710160655.3402786-4-alexander.shishkin@linux.intel.com
State New
Headers show
Series Enable Linear Address Space Separation support | expand

Commit Message

Alexander Shishkin July 10, 2024, 4:06 p.m. UTC
From: Sohil Mehta <sohil.mehta@intel.com>

For patching, the kernel initializes a temporary mm area in the lower
half of the address range. See commit 4fc19708b165 ("x86/alternatives:
Initialize temporary mm for patching").

Disable LASS enforcement during patching using the stac()/clac()
instructions to avoid triggering a #GP fault.

The objtool warns due to a call to a non-allowed function that exists
outside of the stac/clac guard, or references to any function with a
dynamic function pointer inside the guard. See the Objtool warnings
section #9 in the document tools/objtool/Documentation/objtool.txt.

Considering that patching is usually small, replace the memcpy and
memset functions in the text poking functions with their inline versions
respectively.

Signed-off-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/kernel/alternative.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

Comments

Borislav Petkov July 10, 2024, 5:18 p.m. UTC | #1
On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote:
>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>  {
> -	memcpy(dst, src, len);
> +	stac();
> +	__inline_memcpy(dst, src, len);
> +	clac();

I think you need LASS-specific stac()/clac() or an alternative_2 or so. You
can't cause that perf penalty on !LASS machines.
Kirill A. Shutemov July 10, 2024, 10:33 p.m. UTC | #2
On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote:
> On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote:
> >  static void text_poke_memcpy(void *dst, const void *src, size_t len)
> >  {
> > -	memcpy(dst, src, len);
> > +	stac();
> > +	__inline_memcpy(dst, src, len);
> > +	clac();
> 
> I think you need LASS-specific stac()/clac() or an alternative_2 or so. You
> can't cause that perf penalty on !LASS machines.

Hm. Do we have text_poke() in hot path?

Even if we do, I doubt flipping AC flag would make any performance
difference in context of all locking and TLB flushing we do in this
codepath.
Dave Hansen July 10, 2024, 11:05 p.m. UTC | #3
On 7/10/24 15:33, Kirill A. Shutemov wrote:
> On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote:
>> On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote:
>>>  static void text_poke_memcpy(void *dst, const void *src, size_t len)
>>>  {
>>> -	memcpy(dst, src, len);
>>> +	stac();
>>> +	__inline_memcpy(dst, src, len);
>>> +	clac();
>> I think you need LASS-specific stac()/clac() or an alternative_2 or so. You
>> can't cause that perf penalty on !LASS machines.
> Hm. Do we have text_poke() in hot path?
> 
> Even if we do, I doubt flipping AC flag would make any performance
> difference in context of all locking and TLB flushing we do in this
> codepath.

Yeah, I'm also wondering how much this would matter for performance.

But, I'm 100% sure that we want to distinguish a LASS-necessitated
stac()/clac() from a SMAP-necessitated one somehow.
Borislav Petkov July 11, 2024, 1:23 a.m. UTC | #4
On Thu, Jul 11, 2024 at 01:33:23AM +0300, Kirill A. Shutemov wrote:
> Hm. Do we have text_poke() in hot path?
> 
> Even if we do, I doubt flipping AC flag would make any performance
> difference in context of all locking and TLB flushing we do in this
> codepath.

$ dmesg | grep text_poke | wc -l
237

In a silly guest.

And regardless, we don't do unnecessary toggling of rFLAGS.AC.
Peter Zijlstra July 11, 2024, 8:14 a.m. UTC | #5
On Thu, Jul 11, 2024 at 01:33:23AM +0300, Kirill A. Shutemov wrote:
> On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote:
> > On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote:
> > >  static void text_poke_memcpy(void *dst, const void *src, size_t len)
> > >  {
> > > -	memcpy(dst, src, len);
> > > +	stac();
> > > +	__inline_memcpy(dst, src, len);
> > > +	clac();
> > 
> > I think you need LASS-specific stac()/clac() or an alternative_2 or so. You
> > can't cause that perf penalty on !LASS machines.
> 
> Hm. Do we have text_poke() in hot path?
> 
> Even if we do, I doubt flipping AC flag would make any performance
> difference in context of all locking and TLB flushing we do in this
> codepath.

The one where performance might matter is text_poke_early(), the full
fat one is comically slow as you point out.
Peter Zijlstra July 11, 2024, 8:16 a.m. UTC | #6
On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote:

> But, I'm 100% sure that we want to distinguish a LASS-necessitated
> stac()/clac() from a SMAP-necessitated one somehow.

Yeah, if only to make it easier to understand the code.
Kirill A. Shutemov July 11, 2024, 10:32 a.m. UTC | #7
On Thu, Jul 11, 2024 at 10:16:28AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote:
> 
> > But, I'm 100% sure that we want to distinguish a LASS-necessitated
> > stac()/clac() from a SMAP-necessitated one somehow.
> 
> Yeah, if only to make it easier to understand the code.

lass_disable()/lass_enable()?
Dave Hansen July 11, 2024, 2:03 p.m. UTC | #8
On 7/11/24 03:32, Kirill A. Shutemov wrote:
> On Thu, Jul 11, 2024 at 10:16:28AM +0200, Peter Zijlstra wrote:
>> On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote:
>>
>>> But, I'm 100% sure that we want to distinguish a LASS-necessitated
>>> stac()/clac() from a SMAP-necessitated one somehow.
>> Yeah, if only to make it easier to understand the code.
> lass_disable()/lass_enable()?

I was thinking just lass_clac()/lass_stac() because it's not _really_
disabling LASS.

But I don't feel that strongly about it.
diff mbox series

Patch

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 89de61243272..c6e1b17d1da1 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -1825,16 +1825,24 @@  static inline void unuse_temporary_mm(temp_mm_state_t prev_state)
 __ro_after_init struct mm_struct *poking_mm;
 __ro_after_init unsigned long poking_addr;
 
+/*
+ * poking_init() initializes the text poking address from the lower half of the
+ * address space. Relax LASS enforcement when accessing the poking address.
+ */
 static void text_poke_memcpy(void *dst, const void *src, size_t len)
 {
-	memcpy(dst, src, len);
+	stac();
+	__inline_memcpy(dst, src, len);
+	clac();
 }
 
 static void text_poke_memset(void *dst, const void *src, size_t len)
 {
 	int c = *(const int *)src;
 
-	memset(dst, c, len);
+	stac();
+	__inline_memset(dst, c, len);
+	clac();
 }
 
 typedef void text_poke_f(void *dst, const void *src, size_t len);