
[Xen-devel,10/17] xen: arm64: disable alignment traps

Message ID 1395330365-9901-10-git-send-email-ian.campbell@citrix.com
State New

Commit Message

Ian Campbell March 20, 2014, 3:45 p.m. UTC
The mem* primitives which I am about to import from Linux in a subsequent
patch rely on the hardware handling misalignment.

The benefits of an optimised memcpy etc oughtweigh the downsides.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
---
 xen/arch/arm/arm64/head.S |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
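
For illustration only (a rough sketch, not the routine actually being
imported): an optimised copy moves word-sized chunks from whatever
offset the caller passes, so it performs unaligned loads and stores.
With SCTLR_A set those accesses fault; with it clear the hardware
simply performs them.

    #include <stddef.h>
    #include <stdint.h>

    /* Word-at-a-time copy that ignores alignment. The unaligned
     * uint64_t accesses are technically undefined in ISO C; what
     * matters is the AArch64 loads/stores the compiler emits, which
     * trap at EL2 while SCTLR_EL2.A is set and are handled by the
     * hardware once it is clear. */
    static void copy_words(void *dst, const void *src, size_t n)
    {
        uint64_t *d = dst;
        const uint64_t *s = src;
        unsigned char *db;
        const unsigned char *sb;

        while ( n >= sizeof(uint64_t) )
        {
            *d++ = *s++;                  /* possibly unaligned */
            n -= sizeof(uint64_t);
        }

        db = (unsigned char *)d;
        sb = (const unsigned char *)s;
        while ( n-- )
            *db++ = *sb++;                /* byte-sized tail */
    }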

Comments

Andrew Cooper March 20, 2014, 3:57 p.m. UTC | #1
On 20/03/14 15:45, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
>
> The benefits of an optimised memcpy etc oughtweigh the downsides.
>
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> ---
>  xen/arch/arm/arm64/head.S |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> index 9547ef5..22d0030 100644
> --- a/xen/arch/arm/arm64/head.S
> +++ b/xen/arch/arm/arm64/head.S
> @@ -241,7 +241,7 @@ skip_bss:
>           * I-cache enabled,
>           * Alignment checking enabled,

Is this comment still true?

~Andrew

>           * MMU translation disabled (for now). */
> -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> +        ldr   x0, =(HSCTLR_BASE)
>          msr   SCTLR_EL2, x0
>  
>          /* Rebuild the boot pagetable's first-level entries. The structure
Ian Campbell March 20, 2014, 3:59 p.m. UTC | #2
On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> On 20/03/14 15:45, Ian Campbell wrote:
> > The mem* primitives which I am about to import from Linux in a subsequent
> > patch rely on the hardware handling misalignment.
> >
> > The benefits of an optimised memcpy etc oughtweigh the downsides.

Ahem, "outweigh".

> > Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
> > ---
> >  xen/arch/arm/arm64/head.S |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
> > index 9547ef5..22d0030 100644
> > --- a/xen/arch/arm/arm64/head.S
> > +++ b/xen/arch/arm/arm64/head.S
> > @@ -241,7 +241,7 @@ skip_bss:
> >           * I-cache enabled,
> >           * Alignment checking enabled,
> 
> Is this comment still true?

Oh balls, no it is not. I had a meeting between deciding to make this
change and actually making it...

Ian.

> 
> ~Andrew
> 
> >           * MMU translation disabled (for now). */
> > -        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
> > +        ldr   x0, =(HSCTLR_BASE)
> >          msr   SCTLR_EL2, x0
> >  
> >          /* Rebuild the boot pagetable's first-level entries. The structure
>
Gordan Bobic March 20, 2014, 4:21 p.m. UTC | #3
On 2014-03-20 15:59, Ian Campbell wrote:
> On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> On 20/03/14 15:45, Ian Campbell wrote:
>> > The mem* primitives which I am about to import from Linux in a subsequent
>> > patch rely on the hardware handling misalignment.
>> >
>> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> 
> Ahem, "outweigh".

Just FYI, the slow-down from heavy unaligned accesses (with
hardware alignment fixup, you can't disable it using
/proc/cpu/alignment) on Cortex A15 is about 40x.

Most of the commonly used code has been fixed recently, but
there are still some packages that exhibit misaligned access
traps during their test suites and/or normal operation.

Whether the hardware alignment fixup is less overheady on
ARM64, I don't know - I haven't been able to get my hands
on the hardware yet.

Gordan
Ian Campbell March 20, 2014, 4:27 p.m. UTC | #4
On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> On 2014-03-20 15:59, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> > patch rely on the hardware handling misalignment.
> >> >
> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> > 
> > Ahem, "outweigh".
> 
> Just FYI, the slow-down from heavy unaligned accesses (with
> hardware alignment fixup, you can't disable it using
> /proc/cpu/alignment) on Cortex A15 is about 40x.

That's pretty staggering -- are you positive this wasn't the kernel
doing the fixups?

> Most of the commonly used code has been fixed recently, but
> there are still some packages that exhibit misaligned access
> traps during their test suites and/or normal operation.
>
> Whether the hardware alignment fixup is less overheady on
> ARM64, I don't know - I haven't been able to get my hands
> on the hardware yet.

arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
taking it on trust that whoever implemented memcpy.S etc found that
memcpy.S with hardware alignment was better than the dumb loop, even if
it wasn't as good as a clever memcpy.S which avoided the alignments.
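
(By "dumb loop" I mean roughly the byte-at-a-time copy below -- the
name is just for the sketch. Even if the odd unaligned word access has
to be handled by the hardware, a word-at-a-time routine ought to beat
it.)

    #include <stddef.h>

    /* One load and one store per byte, never unaligned, never fast. */
    void *naive_memcpy(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        while (n--)
            *d++ = *s++;
        return dst;
    }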

Ian.
Gordan Bobic March 20, 2014, 4:43 p.m. UTC | #5
On 2014-03-20 16:27, Ian Campbell wrote:
> On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
>> On 2014-03-20 15:59, Ian Campbell wrote:
>> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
>> >> On 20/03/14 15:45, Ian Campbell wrote:
>> >> > The mem* primitives which I am about to import from Linux in a subsequent
>> >> > patch rely on the hardware handling misalignment.
>> >> >
>> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
>> >
>> > Ahem, "outweigh".
>> 
>> Just FYI, the slow-down from heavy unaligned accesses (with
>> hardware alignment fixup, you can't disable it using
>> /proc/cpu/alignment) on Cortex A15 is about 40x.
> 
> That's pretty staggering -- are you positive this wasn't the kernel
> doing the fixups?

I'm not sure if this is easily checkable:

# echo 0 > /proc/cpu/alignment
# cat /proc/cpu/alignment
User:		0
System:		631040
Skipped:	0
Half:		0
Word:		631040
DWord:		0
Multi:		0
User faults:	2 (fixup)

i.e. I can't disable it.

This is on a Samsung Exynos Chromebook with the
standard ChromeOS kernel.

Here is a recent thread from the Fedora ARM mailing list
which contains links to a simple test program that can
be used to test the alignment related slowdown:

http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
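
(The gist of it is just timing word loads at an aligned versus a
misaligned offset -- a rough sketch along these lines, not the actual
program from that thread:)

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Sum 32-bit words starting "offset" bytes into the buffer; any
     * offset that is not a multiple of 4 makes every load misaligned.
     * (Strictly undefined in ISO C, but that is the behaviour under
     * test.) */
    static uint32_t sum_at(const unsigned char *buf, size_t words,
                           size_t offset)
    {
        const uint32_t *p = (const uint32_t *)(buf + offset);
        uint32_t sum = 0;

        for (size_t i = 0; i < words; i++)
            sum += p[i];
        return sum;
    }

    int main(void)
    {
        const size_t words = 1u << 22;
        unsigned char *buf = calloc(words + 1, sizeof(uint32_t));
        const size_t offsets[] = { 0, 1 };
        struct timespec t0, t1;

        if (!buf)
            return 1;

        for (int i = 0; i < 2; i++)
        {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            uint32_t s = sum_at(buf, words, offsets[i]);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            printf("offset %zu: sum=%u, %.1f ms\n", offsets[i], s,
                   (t1.tv_sec - t0.tv_sec) * 1e3 +
                   (t1.tv_nsec - t0.tv_nsec) / 1e6);
        }
        free(buf);
        return 0;
    }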

>> Most of the commonly used code has been fixed recently, but
>> there are still some packages that exhibit misaligned access
>> traps during their test suites and/or normal operation.
>> 
>> Whether the hardware alignment fixup is less overheady on
>> ARM64, I don't know - I haven't been able to get my hands
>> on the hardware yet.
> 
> arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> taking it on trust that whoever implemented memcpy.S etc found that
> memcpy.S with hardware alignment was better than the dumb loop, even if
> it wasn't as good as a clever memcpy.S which avoided the alignments.

I am inclined to agree - it shouldn't be the job of the kernel or the
hypervisor to do this. It is up to the application developers to know
what they are doing and not do things that introduce misaligned
accesses. Unfortunately, there is far too little push-back on buggy
code because most developers have only ever used x86 and have no idea
that until recently everything else wasn't forgiving of such things.

Gordan
Ian Campbell March 20, 2014, 4:54 p.m. UTC | #6
On Thu, 2014-03-20 at 16:43 +0000, Gordan Bobic wrote:
> On 2014-03-20 16:27, Ian Campbell wrote:
> > On Thu, 2014-03-20 at 16:21 +0000, Gordan Bobic wrote:
> >> On 2014-03-20 15:59, Ian Campbell wrote:
> >> > On Thu, 2014-03-20 at 15:57 +0000, Andrew Cooper wrote:
> >> >> On 20/03/14 15:45, Ian Campbell wrote:
> >> >> > The mem* primitives which I am about to import from Linux in a subsequent
> >> >> > patch rely on the hardware handling misalignment.
> >> >> >
> >> >> > The benefits of an optimised memcpy etc oughtweigh the downsides.
> >> >
> >> > Ahem, "outweigh".
> >> 
> >> Just FYI, the slow-down from heavy unaligned accesses (with
> >> hardware alignment fixup, you can't disable it using
> >> /proc/cpu/alignment) on Cortex A15 is about 40x.
> > 
> > That's pretty staggering -- are you positive this wasn't the kernel
> > doing the fixups?
> 
> I'm not sure if this is easily checkable:
> 
> # echo 0 > /proc/cpu/alignment
> # cat /proc/cpu/alignment
> User:		0
> System:		631040
> Skipped:	0
> Half:		0
> Word:		631040
> DWord:		0
> Multi:		0
> User faults:	2 (fixup)
> 
> i.e. I can't disable it.

That "fixup" implies to me that the kernel will be fixing things up. 

linux/Documentation/arm/mem_alignment describes what happens here.
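
One way to double-check would be to dump the counters around the hot
loop -- if they climb, the kernel's alignment trap handler is involved
rather than the hardware handling the access itself. Something like
this (rough and untested):

    #include <stdint.h>
    #include <stdio.h>

    /* Print /proc/cpu/alignment so it can be compared before and
     * after an unaligned-heavy workload. */
    static void dump_alignment_counters(const char *tag)
    {
        char line[128];
        FILE *f = fopen("/proc/cpu/alignment", "r");

        if (!f)
            return;
        printf("--- %s ---\n", tag);
        while (fgets(line, sizeof(line), f))
            fputs(line, stdout);
        fclose(f);
    }

    int main(void)
    {
        /* Deliberately misaligned 32-bit loads (undefined in ISO C,
         * but that is the behaviour under test). */
        unsigned char buf[64] = { 0 };
        volatile uint32_t *p = (volatile uint32_t *)(buf + 1);
        uint32_t sum = 0;

        dump_alignment_counters("before");
        for (long i = 0; i < 100000000L; i++)
            sum += *p;
        dump_alignment_counters("after");
        return (int)(sum & 1);
    }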

> 
> This is on a Samsung Exynos Chromebook with the
> standard ChromeOS kernel.

I've no idea if this sets SCTLR.A but it sounds like it does.

> 
> Here is a recent thread from the Fedora ARM mailing list
> which contains links to a simple test program that can
> be used to test the alignment related slowdown:
> 
> http://www.mail-archive.com/arm@lists.fedoraproject.org/msg06121.html
> 
> >> Most of the commonly used code has been fixed recently, but
> >> there are still some packages that exhibit misaligned access
> >> traps during their test suites and/or normal operation.
> >> 
> >> Whether the hardware alignment fixup is less overheady on
> >> ARM64, I don't know - I haven't been able to get my hands
> >> on the hardware yet.
> > 
> > arm64 is a lot "friendlier" than arm32 in this regard. I was mostly
> > taking it on trust that whoever implemented memcpy.S etc found that
> > memcpy.S with hardware alignment was better than the dumb loop, even if
> > it wasn't as good as a clever memcpy.S which avoided the alignments.
> 
> I am inclined to agree - it shouldn't be the job of the kernel or the
> hypervisor to do this.

This patch is only changing the alignment trap behaviour for the
hypervisor itself, it has no impact on either guest kernel or userspace,
which have their own control bits for operation in those modes.

Ian.
Julien Grall March 20, 2014, 5:54 p.m. UTC | #7
Hi Ian,

On 03/20/2014 03:45 PM, Ian Campbell wrote:
> The mem* primitives which I am about to import from Linux in a subsequent
> patch rely on the hardware handling misalignment.
> 
> The benefits of an optimised memcpy etc oughtweigh the downsides.
> 
> Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

With both minor changes that Andrew and you spotted:

Acked-by: Julien Grall <julien.grall@linaro.org>

Regards,

Patch

diff --git a/xen/arch/arm/arm64/head.S b/xen/arch/arm/arm64/head.S
index 9547ef5..22d0030 100644
--- a/xen/arch/arm/arm64/head.S
+++ b/xen/arch/arm/arm64/head.S
@@ -241,7 +241,7 @@  skip_bss:
          * I-cache enabled,
          * Alignment checking enabled,
          * MMU translation disabled (for now). */
-        ldr   x0, =(HSCTLR_BASE|SCTLR_A)
+        ldr   x0, =(HSCTLR_BASE)
         msr   SCTLR_EL2, x0
 
         /* Rebuild the boot pagetable's first-level entries. The structure