Message ID | 20240710160655.3402786-4-alexander.shishkin@linux.intel.com |
---|---|
State | Superseded |
Headers | show |
Series | Enable Linear Address Space Separation support | expand |
On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote: > static void text_poke_memcpy(void *dst, const void *src, size_t len) > { > - memcpy(dst, src, len); > + stac(); > + __inline_memcpy(dst, src, len); > + clac(); I think you need LASS-specific stac()/clac() or an alternative_2 or so. You can't cause that perf penalty on !LASS machines.
On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote: > On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote: > > static void text_poke_memcpy(void *dst, const void *src, size_t len) > > { > > - memcpy(dst, src, len); > > + stac(); > > + __inline_memcpy(dst, src, len); > > + clac(); > > I think you need LASS-specific stac()/clac() or an alternative_2 or so. You > can't cause that perf penalty on !LASS machines. Hm. Do we have text_poke() in hot path? Even if we do, I doubt flipping AC flag would make any performance difference in context of all locking and TLB flushing we do in this codepath.
On 7/10/24 15:33, Kirill A. Shutemov wrote: > On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote: >> On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote: >>> static void text_poke_memcpy(void *dst, const void *src, size_t len) >>> { >>> - memcpy(dst, src, len); >>> + stac(); >>> + __inline_memcpy(dst, src, len); >>> + clac(); >> I think you need LASS-specific stac()/clac() or an alternative_2 or so. You >> can't cause that perf penalty on !LASS machines. > Hm. Do we have text_poke() in hot path? > > Even if we do, I doubt flipping AC flag would make any performance > difference in context of all locking and TLB flushing we do in this > codepath. Yeah, I'm also wondering how much this would matter for performance. But, I'm 100% sure that we want to distinguish a LASS-necessitated stac()/clac() from a SMAP-necessitated one somehow.
On Thu, Jul 11, 2024 at 01:33:23AM +0300, Kirill A. Shutemov wrote: > Hm. Do we have text_poke() in hot path? > > Even if we do, I doubt flipping AC flag would make any performance > difference in context of all locking and TLB flushing we do in this > codepath. $ dmesg | grep text_poke | wc -l 237 In a silly guest. And regardless, we don't do unnecessary toggling of rFLAGS.AC.
On Thu, Jul 11, 2024 at 01:33:23AM +0300, Kirill A. Shutemov wrote: > On Wed, Jul 10, 2024 at 07:18:36PM +0200, Borislav Petkov wrote: > > On Wed, Jul 10, 2024 at 07:06:39PM +0300, Alexander Shishkin wrote: > > > static void text_poke_memcpy(void *dst, const void *src, size_t len) > > > { > > > - memcpy(dst, src, len); > > > + stac(); > > > + __inline_memcpy(dst, src, len); > > > + clac(); > > > > I think you need LASS-specific stac()/clac() or an alternative_2 or so. You > > can't cause that perf penalty on !LASS machines. > > Hm. Do we have text_poke() in hot path? > > Even if we do, I doubt flipping AC flag would make any performance > difference in context of all locking and TLB flushing we do in this > codepath. The one where performance might matter is text_poke_early(), the full fat one is comically slow as you point out.
On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote: > But, I'm 100% sure that we want to distinguish a LASS-necessitated > stac()/clac() from a SMAP-necessitated one somehow. Yeah, if only to make it easier to understand the code.
On Thu, Jul 11, 2024 at 10:16:28AM +0200, Peter Zijlstra wrote: > On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote: > > > But, I'm 100% sure that we want to distinguish a LASS-necessitated > > stac()/clac() from a SMAP-necessitated one somehow. > > Yeah, if only to make it easier to understand the code. lass_disable()/lass_enable()?
On 7/11/24 03:32, Kirill A. Shutemov wrote: > On Thu, Jul 11, 2024 at 10:16:28AM +0200, Peter Zijlstra wrote: >> On Wed, Jul 10, 2024 at 04:05:31PM -0700, Dave Hansen wrote: >> >>> But, I'm 100% sure that we want to distinguish a LASS-necessitated >>> stac()/clac() from a SMAP-necessitated one somehow. >> Yeah, if only to make it easier to understand the code. > lass_disable()/lass_enable()? I was thinking just lass_clac()/lass_stac() because it's not _really_ disabling LASS. But I don't feel that strongly about it.
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 89de61243272..c6e1b17d1da1 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -1825,16 +1825,24 @@ static inline void unuse_temporary_mm(temp_mm_state_t prev_state) __ro_after_init struct mm_struct *poking_mm; __ro_after_init unsigned long poking_addr; +/* + * poking_init() initializes the text poking address from the lower half of the + * address space. Relax LASS enforcement when accessing the poking address. + */ static void text_poke_memcpy(void *dst, const void *src, size_t len) { - memcpy(dst, src, len); + stac(); + __inline_memcpy(dst, src, len); + clac(); } static void text_poke_memset(void *dst, const void *src, size_t len) { int c = *(const int *)src; - memset(dst, c, len); + stac(); + __inline_memset(dst, c, len); + clac(); } typedef void text_poke_f(void *dst, const void *src, size_t len);