Message ID: 20180824213849.23647-2-Jason@zx2c4.com
State:      Superseded
Series:     WireGuard: Secure Network Tunnel
On Fri, 24 Aug 2018, Jason A. Donenfeld wrote:
> Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
> FPU/SIMD functions over a number of calls, because FPU restoration is
> quite expensive. This adds a simple header for carrying out this pattern:
>
>     simd_context_t simd_context = simd_get();
>     while ((item = get_item_from_queue()) != NULL) {
>             encrypt_item(item, simd_context);
>             simd_context = simd_relax(simd_context);
>     }
>     simd_put(simd_context);

I'm not too fond of this simply because it requires that relax() step in
all code paths. I'd rather make that completely transparent by just
marking the task as FPU-using and letting the context switch code deal
with it in case it gets preempted. I'll let one of my engineers look into
that next week.

Thanks,

	tglx
Hey Thomas,

On Sun, Aug 26, 2018 at 6:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> I'm not too fond of this simply because it requires that relax() step in
> all code paths. I'd rather make that completely transparent by just
> marking the task as FPU-using and letting the context switch code deal
> with it in case it gets preempted. I'll let one of my engineers look into
> that next week.

Do you mean to say you intend to make kernel_fpu_end() and
kernel_neon_end() only actually do something upon context switch, but not
when they're actually called? So that multiple calls to
kernel_fpu_begin() and kernel_neon_begin() can be made without penalty?

If so, that'd be great, and I'd certainly prefer this to the
simd_context_t passing. I consider the simd_get/put/relax API a stopgap
measure until something like that is implemented.

Jason
Jason,

On Sun, 26 Aug 2018, Jason A. Donenfeld wrote:
> On Sun, Aug 26, 2018 at 6:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > I'm not too fond of this simply because it requires that relax() step in
> > all code paths. I'd rather make that completely transparent by just
> > marking the task as FPU-using and letting the context switch code deal
> > with it in case it gets preempted. I'll let one of my engineers look into
> > that next week.
>
> Do you mean to say you intend to make kernel_fpu_end() and
> kernel_neon_end() only actually do something upon context switch, but not
> when they're actually called? So that multiple calls to
> kernel_fpu_begin() and kernel_neon_begin() can be made without penalty?

On context switch and exit to user. That allows us to keep those code
paths fully preemptible. Still twisting my brain around the details.

> If so, that'd be great, and I'd certainly prefer this to the
> simd_context_t passing. I consider the simd_get/put/relax API a stopgap
> measure until something like that is implemented.

I really want to avoid this stopgap^Wducttape thing.

Thanks,

	tglx
On Sun, Aug 26, 2018 at 8:06 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > Do you mean to say you intend to make kernel_fpu_end() and
> > kernel_neon_end() only actually do something upon context switch, but not
> > when they're actually called? So that multiple calls to
> > kernel_fpu_begin() and kernel_neon_begin() can be made without penalty?
>
> On context switch and exit to user. That allows us to keep those code
> paths fully preemptible. Still twisting my brain around the details.

Just to make sure we're on the same page, the goal is for this code:

    kernel_fpu_begin();
    kernel_fpu_end();
    kernel_fpu_begin();
    kernel_fpu_end();
    kernel_fpu_begin();
    kernel_fpu_end();
    kernel_fpu_begin();
    kernel_fpu_end();
    kernel_fpu_begin();
    kernel_fpu_end();
    kernel_fpu_begin();
    kernel_fpu_end();

... to have the same performance as this code:

    kernel_fpu_begin();
    kernel_fpu_end();

(Unless of course the process is preempted or the like.) At present their
performance is wildly different, since kernel_fpu_end() does something
immediately.

What about something like this:

- Add a tristate flag connected to task_struct (or in the global fpu
  struct in the case that this happens in irq and there isn't a valid
  current).
- On kernel_fpu_begin(), if the flag is 0, do the usual expensive XSAVE
  stuff, and set the flag to 1.
- On kernel_fpu_begin(), if the flag is non-0, just set the flag to 1 and
  return.
- On kernel_fpu_end(), if the flag is non-0, set the flag to 2.
  (Otherwise WARN() or BUG() or something.)
- On context switch / preemption / etc. away from the task, if the flag
  is non-0, XRSTOR and such.
- On context switch / preemption / etc. back to the task, if the flag is
  1, XSAVE and such. If the flag is 2, set it to 0.

Jason
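For concreteness, that bullet list translates into roughly the following state machine. This is an illustrative sketch only, not kernel code: kfpu_flag, xsave_user_state(), and xrstor_user_state() are invented placeholder names, and the question of per-task versus in-irq storage is glossed over.

    #include <linux/bug.h>

    /* 0: user FPU state live; 1: inside a begin/end section; 2: section
     * ended, user-state restore still deferred. */
    enum { KFPU_IDLE = 0, KFPU_IN_USE = 1, KFPU_ENDED = 2 };
    static int kfpu_flag;                    /* per-task in a real implementation */

    extern void xsave_user_state(void);      /* placeholder for the XSAVE path */
    extern void xrstor_user_state(void);     /* placeholder for the XRSTOR path */

    static void sketch_kernel_fpu_begin(void)
    {
            if (kfpu_flag == KFPU_IDLE)
                    xsave_user_state();      /* the expensive step, done only once */
            kfpu_flag = KFPU_IN_USE;
    }

    static void sketch_kernel_fpu_end(void)
    {
            WARN_ON(kfpu_flag == KFPU_IDLE); /* end without begin is a bug */
            kfpu_flag = KFPU_ENDED;          /* defer the actual restore */
    }

    /* Hooked into context switch away from the task. */
    static void sketch_switch_out(void)
    {
            if (kfpu_flag != KFPU_IDLE)
                    xrstor_user_state();
    }

    /* Hooked into context switch back to the task. */
    static void sketch_switch_in(void)
    {
            if (kfpu_flag == KFPU_IN_USE)
                    xsave_user_state();      /* a kernel FPU section resumes */
            else if (kfpu_flag == KFPU_ENDED)
                    kfpu_flag = KFPU_IDLE;   /* user state is already live */
    }

Under this scheme, back-to-back begin/end pairs touch only the flag, and the XSAVE/XRSTOR cost is paid once per preemption rather than once per call.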
> On Aug 26, 2018, at 7:06 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> Jason,
>
>> On Sun, 26 Aug 2018, Jason A. Donenfeld wrote:
>>> On Sun, Aug 26, 2018 at 6:10 AM Thomas Gleixner <tglx@linutronix.de> wrote:
>>> I'm not too fond of this simply because it requires that relax() step in
>>> all code paths. I'd rather make that completely transparent by just
>>> marking the task as FPU-using and letting the context switch code deal
>>> with it in case it gets preempted. I'll let one of my engineers look into
>>> that next week.
>>
>> Do you mean to say you intend to make kernel_fpu_end() and
>> kernel_neon_end() only actually do something upon context switch, but not
>> when they're actually called? So that multiple calls to
>> kernel_fpu_begin() and kernel_neon_begin() can be made without penalty?
>
> On context switch and exit to user. That allows us to keep those code
> paths fully preemptible. Still twisting my brain around the details.

I think you’ll have to treat exit to user and context switch as different
things. For exit to user, we want to restore the *user* state, but, for
context switch, we’ll need to restore *kernel* state.

Do user first as its own patch set. It’ll be less painful that way.

And someone needs to rework PKRU for this to make sense. See previous
threads.
On Sun, 2018-08-26 at 07:18 -0700, Andy Lutomirski wrote:
> > On Aug 26, 2018, at 7:06 AM, Thomas Gleixner <tglx@linutronix.de>
> > wrote:
> >
> > Jason,
> >
> > > On Sun, 26 Aug 2018, Jason A. Donenfeld wrote:
> > > > On Sun, Aug 26, 2018 at 6:10 AM Thomas Gleixner
> > > > <tglx@linutronix.de> wrote:
> > > > I'm not too fond of this simply because it requires that relax()
> > > > step in all code paths. I'd rather make that completely
> > > > transparent by just marking the task as FPU-using and letting the
> > > > context switch code deal with it in case it gets preempted. I'll
> > > > let one of my engineers look into that next week.
> > >
> > > Do you mean to say you intend to make kernel_fpu_end() and
> > > kernel_neon_end() only actually do something upon context switch,
> > > but not when they're actually called? So that multiple calls to
> > > kernel_fpu_begin() and kernel_neon_begin() can be made without
> > > penalty?
> >
> > On context switch and exit to user. That allows us to keep those code
> > paths fully preemptible. Still twisting my brain around the details.
>
> I think you’ll have to treat exit to user and context switch as
> different things. For exit to user, we want to restore the *user*
> state, but, for context switch, we’ll need to restore *kernel* state.

For non-preemptible kernel_fpu_begin/end (which seems like a good
starting point, since it gets the code halfway to where Thomas would like
it to go), the rules would be a little simpler:

- For exit to userspace, restore the user FPU state.
- At kernel_fpu_begin(), save the user FPU state (if still loaded).
- At context switch time, save the user FPU state (if still loaded).

> Do user first as its own patch set. It’ll be less painful that way.
>
> And someone needs to rework PKRU for this to make sense. See previous
> threads.

I sent Thomas the patches I worked on in the past. That series is likely
incomplete, but should be a reasonable starting point.

--
All Rights Reversed.
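Rik's three rules above are simpler still; a rough sketch of that non-preemptible variant might look like the following. Again this is illustrative only: user_fpu_loaded(), save_user_fpu(), and load_user_fpu() are invented names, while preempt_disable()/preempt_enable() are the real kernel primitives.

    #include <linux/preempt.h>

    extern bool user_fpu_loaded(void);  /* invented: user state still in registers? */
    extern void save_user_fpu(void);    /* invented: XSAVE of the user state */
    extern void load_user_fpu(void);    /* invented: XRSTOR of the user state */

    static void sketch_kernel_fpu_begin(void)
    {
            preempt_disable();
            if (user_fpu_loaded())
                    save_user_fpu();    /* at most one XSAVE per section */
    }

    static void sketch_kernel_fpu_end(void)
    {
            preempt_enable();           /* no XRSTOR here; restore is deferred */
    }

    /* At context switch time, the user state is saved if still loaded. */
    static void sketch_context_switch_out(void)
    {
            if (user_fpu_loaded())
                    save_user_fpu();
    }

    /* On the way back out to userspace, the single deferred restore happens. */
    static void sketch_exit_to_user(void)
    {
            if (!user_fpu_loaded())
                    load_user_fpu();
    }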
On Fri, 24 Aug 2018 14:38:33 PDT (-0700), Jason@zx2c4.com wrote:
> Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
> FPU/SIMD functions over a number of calls, because FPU restoration is
> quite expensive. This adds a simple header for carrying out this pattern:
>
>     simd_context_t simd_context = simd_get();
>     while ((item = get_item_from_queue()) != NULL) {
>             encrypt_item(item, simd_context);
>             simd_context = simd_relax(simd_context);
>     }
>     simd_put(simd_context);
>
> The relaxation step ensures that we don't trample over preemption, and
> the get/put API should be a familiar paradigm in the kernel.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Greg KH <gregkh@linuxfoundation.org>
> Cc: Samuel Neves <sneves@dei.uc.pt>
> Cc: linux-arch@vger.kernel.org
> ---
>  arch/alpha/include/asm/Kbuild      |  5 ++--
>  arch/arc/include/asm/Kbuild        |  1 +
>  arch/arm/include/asm/simd.h        | 42 ++++++++++++++++++++++++++++++
>  arch/arm64/include/asm/simd.h      | 37 +++++++++++++++++++++-----
>  arch/c6x/include/asm/Kbuild        |  3 ++-
>  arch/h8300/include/asm/Kbuild      |  3 ++-
>  arch/hexagon/include/asm/Kbuild    |  1 +
>  arch/ia64/include/asm/Kbuild       |  1 +
>  arch/m68k/include/asm/Kbuild       |  1 +
>  arch/microblaze/include/asm/Kbuild |  1 +
>  arch/mips/include/asm/Kbuild       |  1 +
>  arch/nds32/include/asm/Kbuild      |  7 ++---
>  arch/nios2/include/asm/Kbuild      |  1 +
>  arch/openrisc/include/asm/Kbuild   |  7 ++---
>  arch/parisc/include/asm/Kbuild     |  1 +
>  arch/powerpc/include/asm/Kbuild    |  3 ++-
>  arch/riscv/include/asm/Kbuild      |  3 ++-
>  arch/s390/include/asm/Kbuild       |  3 ++-
>  arch/sh/include/asm/Kbuild         |  1 +
>  arch/sparc/include/asm/Kbuild      |  1 +
>  arch/um/include/asm/Kbuild         |  3 ++-
>  arch/unicore32/include/asm/Kbuild  |  1 +
>  arch/x86/include/asm/simd.h        | 30 ++++++++++++++++++++-
>  arch/xtensa/include/asm/Kbuild     |  1 +
>  include/asm-generic/simd.h         | 15 +++++++++++
>  include/linux/simd.h               | 28 ++++++++++++++++++++
>  26 files changed, 180 insertions(+), 21 deletions(-)
>  create mode 100644 arch/arm/include/asm/simd.h
>  create mode 100644 include/linux/simd.h

...

> diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
> index 576ffdca06ba..8d3e7aef3234 100644
> --- a/arch/riscv/include/asm/Kbuild
> +++ b/arch/riscv/include/asm/Kbuild
> @@ -4,9 +4,9 @@ generic-y += checksum.h
>  generic-y += cputime.h
>  generic-y += device.h
>  generic-y += div64.h
> -generic-y += dma.h
>  generic-y += dma-contiguous.h
>  generic-y += dma-mapping.h
> +generic-y += dma.h
>  generic-y += emergency-restart.h
>  generic-y += errno.h
>  generic-y += exec.h

If this is the canonical ordering and doing so makes your life easier,
then I'm OK taking this as a separate patch into the RISC-V tree, but if
not then feel free to roll something like this up into your next patch
set.

> @@ -45,6 +45,7 @@ generic-y += setup.h
>  generic-y += shmbuf.h
>  generic-y += shmparam.h
>  generic-y += signal.h
> +generic-y += simd.h
>  generic-y += socket.h
>  generic-y += sockios.h
>  generic-y += stat.h

Either way, this looks fine as far as the RISC-V stuff goes, as it's
pretty much a NOP. As long as it stays a NOP then feel free to add a

Reviewed-by: Palmer Dabbelt <palmer@sifive.com>

as far as the RISC-V parts are concerned. It looks like there are a lot
of other issues, though, so it's not much of a review :)
Hey Thomas,

I'd like to move ahead with my patchset and make some forward progress in
LKML submission. If you've got something brewing regarding the FPU
context on x86 and ARM, I'm happy to wait a bit longer so as to build on
that. But if that is instead a far-off theoretical eventual thing,
perhaps it's better for me to move ahead as planned, and we can switch to
the superior FPU semantics whenever you get around to it? Either way,
please let me know what you have in mind so our plans can stay somewhat
sync'd.

Talk soon,
Jason
On Sat, Sep 1, 2018 at 1:19 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hey Thomas,
>
> I'd like to move ahead with my patchset and make some forward progress
> in LKML submission. If you've got something brewing regarding the FPU
> context on x86 and ARM, I'm happy to wait a bit longer so as to build
> on that. But if that is instead a far-off theoretical eventual thing,
> perhaps it's better for me to move ahead as planned, and we can switch
> to the superior FPU semantics whenever you get around to it? Either
> way, please let me know what you have in mind so our plans can stay
> somewhat sync'd.

I tend to think the right approach is to merge Jason's code and then make
it better later. Even with a totally perfect lazy FPU restore
implementation on x86, we'll probably still need some way of dealing with
SIMD contexts. I think we're highly unlikely to ever allow SIMD usage in
all NMI contexts, for example, and there will always be cases where we
specifically don't want to use all available SIMD capabilities even if we
can. For example, generating random numbers does crypto, but we probably
don't want to do *SIMD* crypto, since that will force a save and restore
and will probably fire up the AVX512 unit, and that's not worth it unless
we're already using it for some other reason.

Also, as Rik has discovered, lazy FPU restore is conceptually
straightforward but isn't entirely trivial :)

--Andy
On Sat, Sep 1, 2018 at 2:32 PM Andy Lutomirski <luto@kernel.org> wrote:
> I tend to think the right approach is to merge Jason's code and then
> make it better later. Even with a totally perfect lazy FPU restore
> implementation on x86, we'll probably still need some way of dealing
> with SIMD contexts. I think we're highly unlikely to ever allow SIMD
> usage in all NMI contexts, for example, and there will always be cases
> where we specifically don't want to use all available SIMD capabilities
> even if we can. For example, generating random numbers does crypto, but
> we probably don't want to do *SIMD* crypto, since that will force a
> save and restore and will probably fire up the AVX512 unit, and that's
> not worth it unless we're already using it for some other reason.
>
> Also, as Rik has discovered, lazy FPU restore is conceptually
> straightforward but isn't entirely trivial :)

Sounds good. I'll move ahead on this basis.
On Sat, 1 Sep 2018, Jason A. Donenfeld wrote:
> On Sat, Sep 1, 2018 at 2:32 PM Andy Lutomirski <luto@kernel.org> wrote:
> > I tend to think the right approach is to merge Jason's code and then
> > make it better later. Even with a totally perfect lazy FPU restore
> > implementation on x86, we'll probably still need some way of dealing
> > with SIMD contexts. I think we're highly unlikely to ever allow SIMD
> > usage in all NMI contexts, for example, and there will always be
> > cases where we specifically don't want to use all available SIMD
> > capabilities even if we can. For example, generating random numbers
> > does crypto, but we probably don't want to do *SIMD* crypto, since
> > that will force a save and restore and will probably fire up the
> > AVX512 unit, and that's not worth it unless we're already using it
> > for some other reason.
> >
> > Also, as Rik has discovered, lazy FPU restore is conceptually
> > straightforward but isn't entirely trivial :)
>
> Sounds good. I'll move ahead on this basis.

Fine with me.
Hi Thomas,

On Thu, Sep 6, 2018 at 9:29 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> On Sat, 1 Sep 2018, Jason A. Donenfeld wrote:
> > On Sat, Sep 1, 2018 at 2:32 PM Andy Lutomirski <luto@kernel.org> wrote:
> > > I tend to think the right approach is to merge Jason's code and
> > > then make it better later. Even with a totally perfect lazy FPU
> > > restore implementation on x86, we'll probably still need some way
> > > of dealing with SIMD contexts. I think we're highly unlikely to
> > > ever allow SIMD usage in all NMI contexts, for example, and there
> > > will always be cases where we specifically don't want to use all
> > > available SIMD capabilities even if we can. For example, generating
> > > random numbers does crypto, but we probably don't want to do *SIMD*
> > > crypto, since that will force a save and restore and will probably
> > > fire up the AVX512 unit, and that's not worth it unless we're
> > > already using it for some other reason.
> > >
> > > Also, as Rik has discovered, lazy FPU restore is conceptually
> > > straightforward but isn't entirely trivial :)
> >
> > Sounds good. I'll move ahead on this basis.
>
> Fine with me.

Do you want to pull this single patch [01/17] into your tree now, and
then when I submit v3 of WireGuard and such, I can just drop this patch
from it, and then the rest will enter like usual networking stuff through
Dave's tree?

Jason
diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 0580cb8c84b2..07b2c1025d34 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -2,14 +2,15 @@
 
 
 generic-y += compat.h
+generic-y += current.h
 generic-y += exec.h
 generic-y += export.h
 generic-y += fb.h
 generic-y += irq_work.h
+generic-y += kprobes.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += trace_clock.h
-generic-y += current.h
-generic-y += kprobes.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index feed50ce89fa..a7f4255f1649 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -22,6 +22,7 @@ generic-y += parport.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += user.h
diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
new file mode 100644
index 000000000000..bf468993bbef
--- /dev/null
+++ b/arch/arm/include/asm/simd.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
+static __must_check inline bool may_use_simd(void)
+{
+	return !in_interrupt();
+}
+
+#ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = may_use_simd();
+	if (have_simd)
+		kernel_neon_begin();
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_neon_end();
+}
+#else
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+#endif
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h
index 6495cc51246f..058c336de38d 100644
--- a/arch/arm64/include/asm/simd.h
+++ b/arch/arm64/include/asm/simd.h
@@ -1,11 +1,10 @@
-/*
- * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+/* SPDX-License-Identifier: GPL-2.0
  *
- * This program is free software; you can redistribute it and/or modify it
- * under the terms of the GNU General Public License version 2 as published
- * by the Free Software Foundation.
+ * Copyright (C) 2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
  */
 
+#include <linux/simd.h>
 #ifndef __ASM_SIMD_H
 #define __ASM_SIMD_H
 
@@ -16,6 +15,8 @@
 #include <linux/types.h>
 
 #ifdef CONFIG_KERNEL_MODE_NEON
+#include <asm/neon.h>
+#include <asm/simd.h>
 
 DECLARE_PER_CPU(bool, kernel_neon_busy);
 
@@ -40,12 +41,36 @@ static __must_check inline bool may_use_simd(void)
 	       !this_cpu_read(kernel_neon_busy);
 }
 
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = may_use_simd();
+	if (have_simd)
+		kernel_neon_begin();
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_neon_end();
+}
+
 #else /* ! CONFIG_KERNEL_MODE_NEON */
 
-static __must_check inline bool may_use_simd(void) {
+static __must_check inline bool may_use_simd(void)
+{
 	return false;
 }
 
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+
 #endif /* ! CONFIG_KERNEL_MODE_NEON */
 
 #endif
diff --git a/arch/c6x/include/asm/Kbuild b/arch/c6x/include/asm/Kbuild
index 33a2c94fed0d..22f3d8333c74 100644
--- a/arch/c6x/include/asm/Kbuild
+++ b/arch/c6x/include/asm/Kbuild
@@ -5,8 +5,8 @@ generic-y += compat.h
 generic-y += current.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
@@ -30,6 +30,7 @@ generic-y += pgalloc.h
 generic-y += preempt.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += tlbflush.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild
index a5d0b2991f47..f5c2f12d593e 100644
--- a/arch/h8300/include/asm/Kbuild
+++ b/arch/h8300/include/asm/Kbuild
@@ -8,8 +8,8 @@ generic-y += current.h
 generic-y += delay.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += exec.h
 generic-y += extable.h
@@ -39,6 +39,7 @@ generic-y += preempt.h
 generic-y += scatterlist.h
 generic-y += sections.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += spinlock.h
 generic-y += timex.h
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index dd2fd9c0d292..217d4695fd8a 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -29,6 +29,7 @@ generic-y += rwsem.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 557bbc8ba9f5..41c5ebdf79e5 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -4,6 +4,7 @@ generic-y += irq_work.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += vtime.h
 generic-y += word-at-a-time.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index a4b8d3331a9e..73898dd1a4d0 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -19,6 +19,7 @@ generic-y += mm-arch-hooks.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += spinlock.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index fe6a6c6e5003..9002fb24888c 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -24,6 +24,7 @@ generic-y += parport.h
 generic-y += percpu.h
 generic-y += preempt.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += syscalls.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 58351e48421e..e8868e0fb2c3 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += sections.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
 generic-y += user.h
diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild
index dbc4e5422550..603c1d020620 100644
--- a/arch/nds32/include/asm/Kbuild
+++ b/arch/nds32/include/asm/Kbuild
@@ -7,14 +7,14 @@ generic-y += bug.h
 generic-y += bugs.h
 generic-y += checksum.h
 generic-y += clkdev.h
-generic-y += cmpxchg.h
 generic-y += cmpxchg-local.h
+generic-y += cmpxchg.h
 generic-y += compat.h
 generic-y += cputime.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += errno.h
 generic-y += exec.h
@@ -46,14 +46,15 @@ generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
 generic-y += shmbuf.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += stat.h
 generic-y += switch_to.h
 generic-y += timex.h
 generic-y += topology.h
 generic-y += trace_clock.h
-generic-y += xor.h
 generic-y += unaligned.h
 generic-y += user.h
 generic-y += vga.h
 generic-y += word-at-a-time.h
+generic-y += xor.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 8fde4fa2c34f..571a9d9ad107 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -33,6 +33,7 @@ generic-y += preempt.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += spinlock.h
 generic-y += topology.h
 generic-y += trace_clock.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index 65964d390b10..81a39e274f6f 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -27,12 +27,13 @@ generic-y += module.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
-generic-y += qspinlock_types.h
-generic-y += qspinlock.h
-generic-y += qrwlock_types.h
 generic-y += qrwlock.h
+generic-y += qrwlock_types.h
+generic-y += qspinlock.h
+generic-y += qspinlock_types.h
 generic-y += sections.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += string.h
 generic-y += switch_to.h
 generic-y += topology.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 2013d639e735..97970b4d05ab 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -17,6 +17,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += seccomp.h
 generic-y += segment.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += user.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 3196d227e351..64290f48e733 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -4,7 +4,8 @@ generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += local64.h
 generic-y += mcs_spinlock.h
+generic-y += msi.h
 generic-y += preempt.h
 generic-y += rwsem.h
+generic-y += simd.h
 generic-y += vtime.h
-generic-y += msi.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 576ffdca06ba..8d3e7aef3234 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -4,9 +4,9 @@ generic-y += checksum.h
 generic-y += cputime.h
 generic-y += device.h
 generic-y += div64.h
-generic-y += dma.h
 generic-y += dma-contiguous.h
 generic-y += dma-mapping.h
+generic-y += dma.h
 generic-y += emergency-restart.h
 generic-y += errno.h
 generic-y += exec.h
@@ -45,6 +45,7 @@ generic-y += setup.h
 generic-y += shmbuf.h
 generic-y += shmparam.h
 generic-y += signal.h
+generic-y += simd.h
 generic-y += socket.h
 generic-y += sockios.h
 generic-y += stat.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index e3239772887a..7a26dc6ce815 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,9 +7,9 @@ generated-y += unistd_nr.h
 generic-y += asm-offsets.h
 generic-y += cacheflush.h
 generic-y += device.h
+generic-y += div64.h
 generic-y += dma-contiguous.h
 generic-y += dma-mapping.h
-generic-y += div64.h
 generic-y += emergency-restart.h
 generic-y += export.h
 generic-y += fb.h
@@ -22,6 +22,7 @@ generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += preempt.h
 generic-y += rwsem.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += unaligned.h
 generic-y += word-at-a-time.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index 6a5609a55965..8e64ff35a933 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -16,6 +16,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += trace_clock.h
 generic-y += xor.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 410b263ef5c8..72b9e08fb350 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -17,5 +17,6 @@ generic-y += msi.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index b10dde6cb793..d37288b08dd2 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -16,15 +16,16 @@ generic-y += io.h
 generic-y += irq_regs.h
 generic-y += irq_work.h
 generic-y += kdebug.h
+generic-y += kprobes.h
 generic-y += mcs_spinlock.h
 generic-y += mm-arch-hooks.h
 generic-y += param.h
 generic-y += pci.h
 generic-y += percpu.h
 generic-y += preempt.h
+generic-y += simd.h
 generic-y += switch_to.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
 generic-y += xor.h
-generic-y += kprobes.h
diff --git a/arch/unicore32/include/asm/Kbuild b/arch/unicore32/include/asm/Kbuild
index bfc7abe77905..98a908720bbd 100644
--- a/arch/unicore32/include/asm/Kbuild
+++ b/arch/unicore32/include/asm/Kbuild
@@ -27,6 +27,7 @@ generic-y += preempt.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += serial.h
+generic-y += simd.h
 generic-y += sizes.h
 generic-y += syscalls.h
 generic-y += topology.h
diff --git a/arch/x86/include/asm/simd.h b/arch/x86/include/asm/simd.h
index a341c878e977..79411178988a 100644
--- a/arch/x86/include/asm/simd.h
+++ b/arch/x86/include/asm/simd.h
@@ -1,4 +1,11 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
 
 #include <asm/fpu/api.h>
 
@@ -10,3 +17,24 @@ static __must_check inline bool may_use_simd(void)
 {
 	return irq_fpu_usable();
 }
+
+static inline simd_context_t simd_get(void)
+{
+	bool have_simd = false;
+#if !defined(CONFIG_UML)
+	have_simd = may_use_simd();
+	if (have_simd)
+		kernel_fpu_begin();
+#endif
+	return have_simd ? HAVE_FULL_SIMD : HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+#if !defined(CONFIG_UML)
+	if (prior_context != HAVE_NO_SIMD)
+		kernel_fpu_end();
+#endif
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index e5e1e61c538c..e3b194a187f9 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -23,6 +23,7 @@ generic-y += percpu.h
 generic-y += preempt.h
 generic-y += rwsem.h
 generic-y += sections.h
+generic-y += simd.h
 generic-y += topology.h
 generic-y += trace_clock.h
 generic-y += word-at-a-time.h
diff --git a/include/asm-generic/simd.h b/include/asm-generic/simd.h
index d0343d58a74a..fad899a5a92d 100644
--- a/include/asm-generic/simd.h
+++ b/include/asm-generic/simd.h
@@ -1,5 +1,9 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include <linux/simd.h>
+#ifndef _ASM_SIMD_H
+#define _ASM_SIMD_H
+
 #include <linux/hardirq.h>
 
 /*
@@ -13,3 +17,14 @@ static __must_check inline bool may_use_simd(void)
 {
 	return !in_interrupt();
 }
+
+static inline simd_context_t simd_get(void)
+{
+	return HAVE_NO_SIMD;
+}
+
+static inline void simd_put(simd_context_t prior_context)
+{
+}
+
+#endif /* _ASM_SIMD_H */
diff --git a/include/linux/simd.h b/include/linux/simd.h
new file mode 100644
index 000000000000..f62d047188bf
--- /dev/null
+++ b/include/linux/simd.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (C) 2015-2018 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
+ */
+
+#ifndef _SIMD_H
+#define _SIMD_H
+
+typedef enum {
+	HAVE_NO_SIMD,
+	HAVE_FULL_SIMD
+} simd_context_t;
+
+#include <linux/sched.h>
+#include <asm/simd.h>
+
+static inline simd_context_t simd_relax(simd_context_t prior_context)
+{
+#ifdef CONFIG_PREEMPT
+	if (prior_context != HAVE_NO_SIMD && need_resched()) {
+		simd_put(prior_context);
+		return simd_get();
+	}
+#endif
+	return prior_context;
+}
+
+#endif /* _SIMD_H */
Sometimes it's useful to amortize calls to XSAVE/XRSTOR and the related
FPU/SIMD functions over a number of calls, because FPU restoration is
quite expensive. This adds a simple header for carrying out this pattern:

    simd_context_t simd_context = simd_get();
    while ((item = get_item_from_queue()) != NULL) {
            encrypt_item(item, simd_context);
            simd_context = simd_relax(simd_context);
    }
    simd_put(simd_context);

The relaxation step ensures that we don't trample over preemption, and
the get/put API should be a familiar paradigm in the kernel.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Samuel Neves <sneves@dei.uc.pt>
Cc: linux-arch@vger.kernel.org
---
 arch/alpha/include/asm/Kbuild      |  5 ++--
 arch/arc/include/asm/Kbuild        |  1 +
 arch/arm/include/asm/simd.h        | 42 ++++++++++++++++++++++++++++++
 arch/arm64/include/asm/simd.h      | 37 +++++++++++++++++++++-----
 arch/c6x/include/asm/Kbuild        |  3 ++-
 arch/h8300/include/asm/Kbuild      |  3 ++-
 arch/hexagon/include/asm/Kbuild    |  1 +
 arch/ia64/include/asm/Kbuild       |  1 +
 arch/m68k/include/asm/Kbuild       |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild       |  1 +
 arch/nds32/include/asm/Kbuild      |  7 ++---
 arch/nios2/include/asm/Kbuild      |  1 +
 arch/openrisc/include/asm/Kbuild   |  7 ++---
 arch/parisc/include/asm/Kbuild     |  1 +
 arch/powerpc/include/asm/Kbuild    |  3 ++-
 arch/riscv/include/asm/Kbuild      |  3 ++-
 arch/s390/include/asm/Kbuild       |  3 ++-
 arch/sh/include/asm/Kbuild         |  1 +
 arch/sparc/include/asm/Kbuild      |  1 +
 arch/um/include/asm/Kbuild         |  3 ++-
 arch/unicore32/include/asm/Kbuild  |  1 +
 arch/x86/include/asm/simd.h        | 30 ++++++++++++++++++++-
 arch/xtensa/include/asm/Kbuild     |  1 +
 include/asm-generic/simd.h         | 15 +++++++++++
 include/linux/simd.h               | 28 ++++++++++++++++++++
 26 files changed, 180 insertions(+), 21 deletions(-)
 create mode 100644 arch/arm/include/asm/simd.h
 create mode 100644 include/linux/simd.h

--
2.18.0
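To round out the pattern shown in the commit message, here is what a callee handed a simd_context_t would look like. This is a hypothetical consumer, not part of the patch: encrypt_item(), struct item, and the chacha20_* helpers are invented for the example; only simd_context_t and HAVE_FULL_SIMD come from the header above.

    #include <linux/simd.h>
    #include <linux/types.h>

    struct item {
            u8 *data;
            size_t len;
    };

    /* Invented scalar and SIMD implementations of the same primitive. */
    extern void chacha20_generic(u8 *data, size_t len);
    extern void chacha20_simd(u8 *data, size_t len);

    /* The callee picks an implementation based on the context it is handed. */
    static void encrypt_item(struct item *item, simd_context_t simd_context)
    {
            if (simd_context == HAVE_FULL_SIMD)
                    chacha20_simd(item->data, item->len);
            else
                    chacha20_generic(item->data, item->len);
    }

Passing the context down, rather than having each callee call simd_get() itself, is what lets a queue-processing loop pay for kernel_fpu_begin()/kernel_fpu_end() once per batch rather than once per item.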