diff mbox

[RFC,1/2] ARM: vfp - allow kernel mode NEON in softirq context

Message ID 1483991849-32448-2-git-send-email-ard.biesheuvel@linaro.org
State New
Headers show

Commit Message

Ard Biesheuvel Jan. 9, 2017, 7:57 p.m. UTC
This updates the kernel mode NEON handling to allow the NEON to be used
in softirq context as well as process context. This involves disabling
softirq processing when the NEON is used in kernel mode in process context,
and dealing with the situation where 'current' is not the owner of the
userland context that is present in the NEON register file when the NEON
is enabled in kernel mode.

The rationale for this change is that the NEON is shared with the ARMv8
Crypto Extensions (which are also defined for the AArch32 execution state),
which can give a huge performance boost (15x) to use cases like mac80211
CCMP processing, which executes in softirq context.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 arch/arm/vfp/vfpmodule.c | 22 ++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Ard Biesheuvel Jan. 11, 2017, 5:40 p.m. UTC | #1
On 9 January 2017 at 19:57, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This updates the kernel mode NEON handling to allow the NEON to be used

> in softirq context as well as process context. This involves disabling

> softirq processing when the NEON is used in kernel mode in process context,

> and dealing with the situation where 'current' is not the owner of the

> userland context that is present in the NEON register file when the NEON

> is enabled in kernel mode.

>

> The rationale for this change is that the NEON is shared with the ARMv8

> Crypto Extensions (which are also defined for the AArch32 execution state),

> which can give a huge performance boost (15x) to use cases like mac80211

> CCMP processing, which executes in softirq context.

>

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  arch/arm/vfp/vfpmodule.c | 22 ++++++++++++++------

>  1 file changed, 16 insertions(+), 6 deletions(-)

>

> diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c

> index 569d5a650a4a..814752811537 100644

> --- a/arch/arm/vfp/vfpmodule.c

> +++ b/arch/arm/vfp/vfpmodule.c

> @@ -690,26 +690,33 @@ void kernel_neon_begin(void)

>         u32 fpexc;

>

>         /*

> -        * Kernel mode NEON is only allowed outside of interrupt context

> +        * Kernel mode NEON is only allowed outside of hardirq context

>          * with preemption disabled. This will make sure that the kernel

>          * mode NEON register contents never need to be preserved.

>          */

> -       BUG_ON(in_interrupt());

> +       BUG_ON(in_irq());

>         cpu = get_cpu();

>

> +       /*

> +        * Disable softirq processing while the NEON is used by the kernel in

> +        * process context. This ensures that only a single kernel mode NEON

> +        * state is live at any given time.

> +        */

> +       if (!in_serving_softirq())

> +               local_bh_disable();

> +

>         fpexc = fmrx(FPEXC) | FPEXC_EN;

>         fmxr(FPEXC, fpexc);

>

>         /*

> -        * Save the userland NEON/VFP state. Under UP,

> -        * the owner could be a task other than 'current'

> +        * Save the userland NEON/VFP state. Under UP, or when executing in

> +        * softirq context, the owner could be a task other than 'current'

>          */

>         if (vfp_state_in_hw(cpu, thread))

>                 vfp_save_state(&thread->vfpstate, fpexc);

> -#ifndef CONFIG_SMP

>         else if (vfp_current_hw_state[cpu] != NULL)

>                 vfp_save_state(vfp_current_hw_state[cpu], fpexc);

> -#endif

> +


Actually, I think this should not be necessary (and the change to the
comment is incorrect). Whether we're in process or softirq context
makes no difference here, and the comment is slightly confusing: under
SMP, the owner could also be a task other than 'current', but due to
the eager preserve, the latest userland NEON state will already have
been recorded, and there is no need doing it again.

>         vfp_current_hw_state[cpu] = NULL;

>  }

>  EXPORT_SYMBOL(kernel_neon_begin);

> @@ -718,7 +725,10 @@ void kernel_neon_end(void)

>  {

>         /* Disable the NEON/VFP unit. */

>         fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);

> +       if (!in_serving_softirq())

> +               local_bh_enable();

>         put_cpu();

> +

>  }

>  EXPORT_SYMBOL(kernel_neon_end);

>

> --

> 2.7.4

>


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell King (Oracle) Jan. 11, 2017, 5:56 p.m. UTC | #2
On Mon, Jan 09, 2017 at 07:57:28PM +0000, Ard Biesheuvel wrote:
> This updates the kernel mode NEON handling to allow the NEON to be used

> in softirq context as well as process context. This involves disabling

> softirq processing when the NEON is used in kernel mode in process context,

> and dealing with the situation where 'current' is not the owner of the

> userland context that is present in the NEON register file when the NEON

> is enabled in kernel mode.


I really don't like this idea as-is.

We have cases where kernel code accesses VFP to (eg) save or restore
register state, such as during signal handling.  We assume that this
will not be interrupted by another user, and that if we enable access
to the VFP, it will stay enabled.  If it gets disabled beneath us, then
things won't go well.

For example, consider vfp_sync_hwstate():

vfp_sync_hwstate()
  vfp_state_in_hw() => true
    fpexc read
	softirq happens
		kernel_neon_begin()
		kernel_neon_end()
    fpexc re-enabled
    current register state saved out (corrupting what was there)
    fpexc restored, possible in an enabled state

Or we could have:

vfp_sync_hwstate()
  vfp_state_in_hw() => true
	softirq happens
		kernel_neon_begin()
		kernel_neon_end()
    fpexc read
    fpexc re-enabled
    current register state saved out (corrupting what was there)
    fpexc disabled

Or worse:

vfp_sync_hwstate()
  vfp_state_in_hw() => true
    fpexc read
    fpexc re-enabled
	softirq happens
		kernel_neon_begin()
		kernel_neon_end()
    current register state saved out, blowing up because VFP is
     unexpectedly disabled

So we would need to disable softirqs around every sensitive point in the
VFP support code, and over all VFP instruction emulations for those VFPs
which bounce "difficult" operations to the kernel support code.

> The rationale for this change is that the NEON is shared with the ARMv8

> Crypto Extensions (which are also defined for the AArch32 execution state),

> which can give a huge performance boost (15x) to use cases like mac80211

> CCMP processing, which executes in softirq context.


I think, once the implementation is more correct, this would need to
be re-evaluated, and I'd also like other more general performance
measurements as well (eg, latency.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Jan. 11, 2017, 6:23 p.m. UTC | #3
On 11 January 2017 at 17:56, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Mon, Jan 09, 2017 at 07:57:28PM +0000, Ard Biesheuvel wrote:

>> This updates the kernel mode NEON handling to allow the NEON to be used

>> in softirq context as well as process context. This involves disabling

>> softirq processing when the NEON is used in kernel mode in process context,

>> and dealing with the situation where 'current' is not the owner of the

>> userland context that is present in the NEON register file when the NEON

>> is enabled in kernel mode.

>

> I really don't like this idea as-is.

>

> We have cases where kernel code accesses VFP to (eg) save or restore

> register state, such as during signal handling.  We assume that this

> will not be interrupted by another user, and that if we enable access

> to the VFP, it will stay enabled.  If it gets disabled beneath us, then

> things won't go well.

>

> For example, consider vfp_sync_hwstate():

>

> vfp_sync_hwstate()

>   vfp_state_in_hw() => true

>     fpexc read

>         softirq happens

>                 kernel_neon_begin()

>                 kernel_neon_end()

>     fpexc re-enabled

>     current register state saved out (corrupting what was there)

>     fpexc restored, possible in an enabled state

>

> Or we could have:

>

> vfp_sync_hwstate()

>   vfp_state_in_hw() => true

>         softirq happens

>                 kernel_neon_begin()

>                 kernel_neon_end()

>     fpexc read

>     fpexc re-enabled

>     current register state saved out (corrupting what was there)

>     fpexc disabled

>

> Or worse:

>

> vfp_sync_hwstate()

>   vfp_state_in_hw() => true

>     fpexc read

>     fpexc re-enabled

>         softirq happens

>                 kernel_neon_begin()

>                 kernel_neon_end()

>     current register state saved out, blowing up because VFP is

>      unexpectedly disabled

>

> So we would need to disable softirqs around every sensitive point in the

> VFP support code, and over all VFP instruction emulations for those VFPs

> which bounce "difficult" operations to the kernel support code.

>


Ah yes, I should have known it couldn't be that simple.

Thanks for the critique: i will look into the impact of making these changes.

>> The rationale for this change is that the NEON is shared with the ARMv8

>> Crypto Extensions (which are also defined for the AArch32 execution state),

>> which can give a huge performance boost (15x) to use cases like mac80211

>> CCMP processing, which executes in softirq context.

>

> I think, once the implementation is more correct, this would need to

> be re-evaluated, and I'd also like other more general performance

> measurements as well (eg, latency.)

>


Re latency, I thought about adding a kernel_neon_yield(), which does a
kernel_neon_end()/do_softirq()/kernel_neon_begin() sequence if any
softirqs are pending, to be invoked by kernel mode NEON users at times
when there are no live NEON registers. But in-kernel users of the
crypto API are naturally quantised into disk sectors, pages or network
packets, so I would not expect any noticeable starvation to occur. But
that does mean such algorithms should not be exposed to userland
(which sounds like a bad idea in any case, given that userland can
simply execute the same instructions)

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Russell King (Oracle) Jan. 11, 2017, 6:30 p.m. UTC | #4
On Wed, Jan 11, 2017 at 06:23:10PM +0000, Ard Biesheuvel wrote:
> Re latency, I thought about adding a kernel_neon_yield(), which does a

> kernel_neon_end()/do_softirq()/kernel_neon_begin() sequence if any

> softirqs are pending, to be invoked by kernel mode NEON users at times

> when there are no live NEON registers. But in-kernel users of the

> crypto API are naturally quantised into disk sectors, pages or network

> packets, so I would not expect any noticeable starvation to occur. But

> that does mean such algorithms should not be exposed to userland

> (which sounds like a bad idea in any case, given that userland can

> simply execute the same instructions)


I was actually thinking about the impact of adding softirq disabling
over much of the VFP code necessary to make this safe, rather than the
softirq disable coming from the kernel mode neon itself.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff mbox

Patch

diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 569d5a650a4a..814752811537 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -690,26 +690,33 @@  void kernel_neon_begin(void)
 	u32 fpexc;
 
 	/*
-	 * Kernel mode NEON is only allowed outside of interrupt context
+	 * Kernel mode NEON is only allowed outside of hardirq context
 	 * with preemption disabled. This will make sure that the kernel
 	 * mode NEON register contents never need to be preserved.
 	 */
-	BUG_ON(in_interrupt());
+	BUG_ON(in_irq());
 	cpu = get_cpu();
 
+	/*
+	 * Disable softirq processing while the NEON is used by the kernel in
+	 * process context. This ensures that only a single kernel mode NEON
+	 * state is live at any given time.
+	 */
+	if (!in_serving_softirq())
+		local_bh_disable();
+
 	fpexc = fmrx(FPEXC) | FPEXC_EN;
 	fmxr(FPEXC, fpexc);
 
 	/*
-	 * Save the userland NEON/VFP state. Under UP,
-	 * the owner could be a task other than 'current'
+	 * Save the userland NEON/VFP state. Under UP, or when executing in
+	 * softirq context, the owner could be a task other than 'current'
 	 */
 	if (vfp_state_in_hw(cpu, thread))
 		vfp_save_state(&thread->vfpstate, fpexc);
-#ifndef CONFIG_SMP
 	else if (vfp_current_hw_state[cpu] != NULL)
 		vfp_save_state(vfp_current_hw_state[cpu], fpexc);
-#endif
+
 	vfp_current_hw_state[cpu] = NULL;
 }
 EXPORT_SYMBOL(kernel_neon_begin);
@@ -718,7 +725,10 @@  void kernel_neon_end(void)
 {
 	/* Disable the NEON/VFP unit. */
 	fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
+	if (!in_serving_softirq())
+		local_bh_enable();
 	put_cpu();
+
 }
 EXPORT_SYMBOL(kernel_neon_end);