Message ID | 20180930024904.14240-1-Jason@zx2c4.com |
---|---|
State | New |
Headers | show |
Series | ARM: makefile: pass -march=armv4 to assembler even on CPU32v3 | expand |
On Sun, Sep 30, 2018 at 4:49 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Per the discussion about half-way down in [1], the kernel doesn't > actually support the ARMv3 ISA, but selects it for some ARMv4 ISA > hardware that benefits from ARMv3 code generation. Such a consideration, > then, only applies to the compiler but not to the assembler. This commit > passes -march=armv4 to the assembler in those cases, so that code > written for ARMv4 will continue to compile and run fine, without needing > module-specific asflags-y overrides. > > [1] https://lore.kernel.org/lkml/CAKv+Gu9FoFQymp2-=rUeh14CkUKON389OCE7stYCOFwKZpaCrg@mail.gmail.com/ > > Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> > --- > I don't have too much familiarity with hardware this old, nor access to > testing systems, so please do carefully evaluate the assertions above > before merging this, since I'm not sure I have a full understanding of > the Linux ARMv3 situation. I took a look at the original build problem. It seems that the issue here that your assembler implementation contains multiplication instructions that are part of ARMv3M and higher but not ARMv3. Since RPC/sa110 supports those instructions, should we maybe just build the kernel with -march-armv3 instead? Russell, does this make sense, or is there a reason we're not already doing the below? Arnd diff --git a/arch/arm/Makefile b/arch/arm/Makefile index 8c05ab53425a..445ae57ee3a1 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -81,7 +81,7 @@ endif arch-$(CONFIG_CPU_32v5) =-D__LINUX_ARM_ARCH__=5 -march=armv5te arch-$(CONFIG_CPU_32v4T) =-D__LINUX_ARM_ARCH__=4 -march=armv4t arch-$(CONFIG_CPU_32v4) =-D__LINUX_ARM_ARCH__=4 $(call cc-option,-march=armv4,-march=armv4t) -arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 +arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m # Evaluate arch cc-option calls now arch-y := $(arch-y)
On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote: > Per the discussion about half-way down in [1], the kernel doesn't > actually support the ARMv3 ISA, but selects it for some ARMv4 ISA > hardware that benefits from ARMv3 code generation. The issue is to do with the half-word stores in the ARMv4 ISA, which need to be avoided on StrongARM RiscPC - the bus from the processor card (which was designed for ARM610 and ARM710) does not support anything except 8-bit and 32-bit accesses, so the 16-bit load/store instructions don't work correctly. Obviously, the reason for having the compiler use ARMv3 is to avoid those instructions which we have no other way to prevent - however, the use of ARMv3 with the assembler ensures that ldrh/strh are not accidentally used. We could argue that the ARMv3 assembly files are now stable, so the chances of ldrh/strh being introduced is low, which would make this change tolerable, but the commit message needs to spell out that we lose this protection. > Such a consideration, > then, only applies to the compiler but not to the assembler. This commit > passes -march=armv4 to the assembler in those cases, so that code > written for ARMv4 will continue to compile and run fine, without needing > module-specific asflags-y overrides. Note that "code written for ARMv4" will not be usable on this platform if it makes use of ldrh/strh, so depending on which instructions the assembler is complaining about, it could very well be a real "you're doing something wrong" case. The side effect of this patch is that such cases will now be hidden rather than evaluated on a case-by-case basis. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up According to speedtest.net: 13Mbps down 490kbps up
On 1 October 2018 at 19:56, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote: >> Per the discussion about half-way down in [1], the kernel doesn't >> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA >> hardware that benefits from ARMv3 code generation. > > The issue is to do with the half-word stores in the ARMv4 ISA, which > need to be avoided on StrongARM RiscPC - the bus from the processor > card (which was designed for ARM610 and ARM710) does not support > anything except 8-bit and 32-bit accesses, so the 16-bit load/store > instructions don't work correctly. > > Obviously, the reason for having the compiler use ARMv3 is to avoid > those instructions which we have no other way to prevent - however, > the use of ARMv3 with the assembler ensures that ldrh/strh are not > accidentally used. > > We could argue that the ARMv3 assembly files are now stable, so the > chances of ldrh/strh being introduced is low, which would make this > change tolerable, but the commit message needs to spell out that > we lose this protection. > >> Such a consideration, >> then, only applies to the compiler but not to the assembler. This commit >> passes -march=armv4 to the assembler in those cases, so that code >> written for ARMv4 will continue to compile and run fine, without needing >> module-specific asflags-y overrides. > > Note that "code written for ARMv4" will not be usable on this platform > if it makes use of ldrh/strh, so depending on which instructions the > assembler is complaining about, it could very well be a real "you're > doing something wrong" case. > > The side effect of this patch is that such cases will now be hidden > rather than evaluated on a case-by-case basis. > Thanks for the insight. So Arnd's suggestion to switch to armv3-m would actually be feasible then? The code in question does not use ldrh only umull, and so it should build with armv3m as well. And we will even generate some better code for RiscPC if we apply it to both the cflags and the asflags.
On Mon, Oct 01, 2018 at 08:10:26PM +0200, Ard Biesheuvel wrote: > On 1 October 2018 at 19:56, Russell King - ARM Linux > <linux@armlinux.org.uk> wrote: > > On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote: > >> Per the discussion about half-way down in [1], the kernel doesn't > >> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA > >> hardware that benefits from ARMv3 code generation. > > > > The issue is to do with the half-word stores in the ARMv4 ISA, which > > need to be avoided on StrongARM RiscPC - the bus from the processor > > card (which was designed for ARM610 and ARM710) does not support > > anything except 8-bit and 32-bit accesses, so the 16-bit load/store > > instructions don't work correctly. > > > > Obviously, the reason for having the compiler use ARMv3 is to avoid > > those instructions which we have no other way to prevent - however, > > the use of ARMv3 with the assembler ensures that ldrh/strh are not > > accidentally used. > > > > We could argue that the ARMv3 assembly files are now stable, so the > > chances of ldrh/strh being introduced is low, which would make this > > change tolerable, but the commit message needs to spell out that > > we lose this protection. > > > >> Such a consideration, > >> then, only applies to the compiler but not to the assembler. This commit > >> passes -march=armv4 to the assembler in those cases, so that code > >> written for ARMv4 will continue to compile and run fine, without needing > >> module-specific asflags-y overrides. > > > > Note that "code written for ARMv4" will not be usable on this platform > > if it makes use of ldrh/strh, so depending on which instructions the > > assembler is complaining about, it could very well be a real "you're > > doing something wrong" case. > > > > The side effect of this patch is that such cases will now be hidden > > rather than evaluated on a case-by-case basis. > > > > Thanks for the insight. > > So Arnd's suggestion to switch to armv3-m would actually be feasible > then? The code in question does not use ldrh only umull, and so it > should build with armv3m as well. And we will even generate some > better code for RiscPC if we apply it to both the cflags and the > asflags. Yep. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up According to speedtest.net: 13Mbps down 490kbps up
Hi Arnd, Apologies for the delay in getting back to you. I had some MTA issues and stupidly assumed ARM developers were taking the day off instead... On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote: > -arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 > +arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m Unfortunately this doesn't really cut it in my case, as it's not only those multiplications: chacha20-arm.S:402: Error: selected processor does not support `bxeq lr' in ARM mode I think we're going to wind up playing whack-a-mole in silly ways. The fact of the matter is that the ARM assembly I'm adding to the tree is for ARMv4 and up, and not for ARMv3. I think there are three options to work around this issue: 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends". 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select -march=armv4. 3) This patch. My initial plan was (1). ArdB recommended I do (2) instead. I thought that was a bit too nuanced and submitted (3). It sounds like in light of the bus issues, (1) might be the best solution after all? Let me know, and I'll follow your direction. Regards, Jason
Hi Ard, On Tue, Oct 2, 2018 at 5:54 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > So Arnd's suggestion to switch to armv3-m would actually be feasible > then? The code in question does not use ldrh only umull, and so it > should build with armv3m as well. And we will even generate some > better code for RiscPC if we apply it to both the cflags and the > asflags. It turns out armv3m isn't a solution for the crypto code, since it does other armv4-isms. So we'll need to find another solution for that; I'm partial to my initial Kconfig one just doing "depends !CPU_32V3". With regards to RiscPC though, given that armv3 codegen might be better, it sounds like it's a good idea to set -march=armv3m anyway for the platform, irrespective of fixing the problem at hand here. Regards, Jason
Hi Russell, > On 1 October 2018 at 19:56, Russell King - ARM Linux <linux@armlinux.org.uk> wrote: > We could argue that the ARMv3 assembly files are now stable, so the > chances of ldrh/strh being introduced is low, which would make this > change tolerable, but the commit message needs to spell out that > we lose this protection. Actually I don't think that argument really holds too well. We very well could introduce some new ARMv4 assembly -- an optimized crypto routine, for example -- and it could use ldrh/strh. That isn't the case now, but it might be the case later. For that reason, I suspect the proper solution is just not building new cryptography assembly that's written for ARMv4 in mind on CPU_32v3 systems. This way there's never a mismatch of expectations. Jason
On 2 October 2018 at 05:53, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Hi Arnd, > > Apologies for the delay in getting back to you. I had some MTA issues > and stupidly assumed ARM developers were taking the day off instead... > > On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote: >> -arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 >> +arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m > > Unfortunately this doesn't really cut it in my case, as it's not only > those multiplications: > chacha20-arm.S:402: Error: selected processor does not support `bxeq > lr' in ARM mode > We have a macro for doing function returns: just replace 'bxeq lr' with 'reteq lr' (and be sure to include <asm/assembler.h>) > I think we're going to wind up playing whack-a-mole in silly ways. The > fact of the matter is that the ARM assembly I'm adding to the tree is > for ARMv4 and up, and not for ARMv3. > > I think there are three options to work around this issue: > > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends". > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select > -march=armv4. > 3) This patch. > > My initial plan was (1). ArdB recommended I do (2) instead. I thought > that was a bit too nuanced and submitted (3). > > It sounds like in light of the bus issues, (1) might be the best > solution after all? > > Let me know, and I'll follow your direction. > > Regards, > Jason
On Tue, Oct 2, 2018 at 5:53 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: > > Hi Arnd, > > Apologies for the delay in getting back to you. I had some MTA issues > and stupidly assumed ARM developers were taking the day off instead... > > On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote: > > -arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 > > +arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m > > Unfortunately this doesn't really cut it in my case, as it's not only > those multiplications: > chacha20-arm.S:402: Error: selected processor does not support `bxeq > lr' in ARM mode > > I think we're going to wind up playing whack-a-mole in silly ways. The > fact of the matter is that the ARM assembly I'm adding to the tree is > for ARMv4 and up, and not for ARMv3. I don't see what issues remain. The 'reteq lr' that Ard mentioned is definitely the correct way to return from assembly (you also need that for plain armv4, as 'bx' was added in armv4t), and Russell confirmed that using -march=armv3m is something we want anyway for mach-rpc. > I think there are three options to work around this issue: > > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends". > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select > -march=armv4. > 3) This patch. > > My initial plan was (1). ArdB recommended I do (2) instead. I thought > that was a bit too nuanced and submitted (3). I suspect all three of the above fail to work for armv4. Arnd
(adding Eric since he wrote the ChaCha20 scalar code) On 2 October 2018 at 09:51, Arnd Bergmann <arnd@arndb.de> wrote: > On Tue, Oct 2, 2018 at 5:53 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote: >> >> Hi Arnd, >> >> Apologies for the delay in getting back to you. I had some MTA issues >> and stupidly assumed ARM developers were taking the day off instead... >> >> On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote: >> > -arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 >> > +arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3m >> >> Unfortunately this doesn't really cut it in my case, as it's not only >> those multiplications: >> chacha20-arm.S:402: Error: selected processor does not support `bxeq >> lr' in ARM mode >> >> I think we're going to wind up playing whack-a-mole in silly ways. The >> fact of the matter is that the ARM assembly I'm adding to the tree is >> for ARMv4 and up, and not for ARMv3. > > I don't see what issues remain. The 'reteq lr' that Ard mentioned > is definitely the correct way to return from assembly (you also need > that for plain armv4, as 'bx' was added in armv4t), and Russell > confirmed that using -march=armv3m is something we want > anyway for mach-rpc. > In fact, this bxeq instruction is the only remaining impediment to building all scalar code with -march-arm3m, and looking at the code ENTRY(chacha20_arm) cmp r2, #0 // len == 0? bxeq lr it seems to me that we can move this len == 0 check into the caller instead. index 163815f51aac..b2108e00d451 100644 --- a/lib/zinc/chacha20/chacha20-arm-glue.h +++ b/lib/zinc/chacha20/chacha20-arm-glue.h @@ -59,6 +59,8 @@ static inline bool chacha20_arch(struct chacha20_ctx *ctx, u8 *dst, src += bytes; simd_relax(simd_context); } else { + if (unlikely(!len)) + break; chacha20_arm(dst, src, len, ctx->key, ctx->counter); ctx->counter[0] += (len + 63) / 64; break; diff --git a/lib/zinc/chacha20/chacha20-arm.S b/lib/zinc/chacha20/chacha20-arm.S index 5abedafcf129..845843a14ab1 100644 --- a/lib/zinc/chacha20/chacha20-arm.S +++ b/lib/zinc/chacha20/chacha20-arm.S @@ -398,9 +398,6 @@ * const u32 iv[4]); */ ENTRY(chacha20_arm) - cmp r2, #0 // len == 0? - bxeq lr - push {r0-r2,r4-r11,lr} // Push state x0-x15 onto stack.
Hi Arnd, On Tue, Oct 2, 2018 at 9:58 AM Arnd Bergmann <arnd@arndb.de> wrote: > > I think we're going to wind up playing whack-a-mole in silly ways. The > > fact of the matter is that the ARM assembly I'm adding to the tree is > > for ARMv4 and up, and not for ARMv3. > > I don't see what issues remain. The 'reteq lr' that Ard mentioned > is definitely the correct way to return from assembly (you also need > that for plain armv4, as 'bx' was added in armv4t), and Russell > confirmed that using -march=armv3m is something we want > anyway for mach-rpc. I'll do that. I can confirm that after changing it to `reteq lr`, everything works well with armv3m. > > I think there are three options to work around this issue: > > > > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends". > > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select > > -march=armv4. > > 3) This patch. > > > > My initial plan was (1). ArdB recommended I do (2) instead. I thought > > that was a bit too nuanced and submitted (3). > > I suspect all three of the above fail to work for armv4. Armv4 does actually work in this configuration, in fact. But anyway, I'll go with your primary suggestion and we therefore can move ahead with changing the global cflag to march=armv3m. Would you like me to submit the patch for this, or would yo like to handle it? Jason
On 2 October 2018 at 14:30, Jason A. Donenfeld <Jason@zx2c4.com> wrote: > Hi Arnd, > > On Tue, Oct 2, 2018 at 9:58 AM Arnd Bergmann <arnd@arndb.de> wrote: >> > I think we're going to wind up playing whack-a-mole in silly ways. The >> > fact of the matter is that the ARM assembly I'm adding to the tree is >> > for ARMv4 and up, and not for ARMv3. >> >> I don't see what issues remain. The 'reteq lr' that Ard mentioned >> is definitely the correct way to return from assembly (you also need >> that for plain armv4, as 'bx' was added in armv4t), and Russell >> confirmed that using -march=armv3m is something we want >> anyway for mach-rpc. > > I'll do that. I can confirm that after changing it to `reteq lr`, > everything works well with armv3m. > >> > I think there are three options to work around this issue: >> > >> > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends". >> > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select >> > -march=armv4. >> > 3) This patch. >> > >> > My initial plan was (1). ArdB recommended I do (2) instead. I thought >> > that was a bit too nuanced and submitted (3). >> >> I suspect all three of the above fail to work for armv4. > > Armv4 does actually work in this configuration, in fact. It builds but it doesn't run, at least not when built into the kernel proper (which will be the case after random.c moves to this library) Your toolchain is implicitly passing --fix-v4bx to the assembler, which causes it to permit bx instructions in ARMv4 object code, but tag them with special R_ARM_V4BX ELF relocations. The ARM module loader does take these into account, so built as a module, it works. However, when built into the core kernel, we have to rely on the linker to patch this instruction into the ARMv4 equivalent, and a quick check reveals that that is currently not the case. Bottom line: let's not go there. > But anyway, > I'll go with your primary suggestion and we therefore can move ahead > with changing the global cflag to march=armv3m. Would you like me to > submit the patch for this, or would yo like to handle it? > Yes please go ahead.
Hey Ard, On Tue, Oct 2, 2018 at 3:10 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote: > > It builds but it doesn't run, at least not when built into the kernel > proper (which will be the case after random.c moves to this library) > > Your toolchain is implicitly passing --fix-v4bx to the assembler, > which causes it to permit bx instructions in ARMv4 object code, but > tag them with special R_ARM_V4BX ELF relocations. The ARM module > loader does take these into account, so built as a module, it works. > However, when built into the core kernel, we have to rely on the > linker to patch this instruction into the ARMv4 equivalent, and a > quick check reveals that that is currently not the case. > > Bottom line: let's not go there. Oh, thanks a bunch for explaining that. Since I wrote the last email I've been playing around with as, readelf, and qemu and was slightly mystified. > > > But anyway, > > I'll go with your primary suggestion and we therefore can move ahead > > with changing the global cflag to march=armv3m. Would you like me to > > submit the patch for this, or would yo like to handle it? > > > > Yes please go ahead. Alright, incoming in a few minutes. Jason
diff --git a/arch/arm/Makefile b/arch/arm/Makefile index d1516f85f25d..7a264cacb482 100644 --- a/arch/arm/Makefile +++ b/arch/arm/Makefile @@ -76,8 +76,14 @@ arch-$(CONFIG_CPU_32v4T) =-D__LINUX_ARM_ARCH__=4 -march=armv4t arch-$(CONFIG_CPU_32v4) =-D__LINUX_ARM_ARCH__=4 -march=armv4 arch-$(CONFIG_CPU_32v3) =-D__LINUX_ARM_ARCH__=3 -march=armv3 +# We do not actually support the ARMv3 ISA and prefer it above only for +# code generation purposes, which does not apply to assembly. So we pass +# v4 to the assembler, so that we can still assemble all instructions. +arch-asflags-$(CONFIG_CPU_32v3) =-march=armv4 + # Evaluate arch cc-option calls now arch-y := $(arch-y) +arch-asflags-y := $(arch-asflags-y) # This selects how we optimise for the processor. tune-$(CONFIG_CPU_ARM7TDMI) =-mtune=arm7tdmi @@ -130,7 +136,7 @@ endif # Need -Uarm for gcc < 3.x KBUILD_CFLAGS +=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm -KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include asm/unified.h -msoft-float +KBUILD_AFLAGS +=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) $(arch-asflags-y) -include asm/unified.h -msoft-float CHECKFLAGS += -D__arm__
Per the discussion about half-way down in [1], the kernel doesn't actually support the ARMv3 ISA, but selects it for some ARMv4 ISA hardware that benefits from ARMv3 code generation. Such a consideration, then, only applies to the compiler but not to the assembler. This commit passes -march=armv4 to the assembler in those cases, so that code written for ARMv4 will continue to compile and run fine, without needing module-specific asflags-y overrides. [1] https://lore.kernel.org/lkml/CAKv+Gu9FoFQymp2-=rUeh14CkUKON389OCE7stYCOFwKZpaCrg@mail.gmail.com/ Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> --- I don't have too much familiarity with hardware this old, nor access to testing systems, so please do carefully evaluate the assertions above before merging this, since I'm not sure I have a full understanding of the Linux ARMv3 situation. arch/arm/Makefile | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) -- 2.19.0