diff mbox series

ARM: makefile: pass -march=armv4 to assembler even on CPU32v3

Message ID 20180930024904.14240-1-Jason@zx2c4.com
State New
Headers show
Series ARM: makefile: pass -march=armv4 to assembler even on CPU32v3 | expand

Commit Message

Jason A. Donenfeld Sept. 30, 2018, 2:49 a.m. UTC
Per the discussion about half-way down in [1], the kernel doesn't
actually support the ARMv3 ISA, but selects it for some ARMv4 ISA
hardware that benefits from ARMv3 code generation. Such a consideration,
then, only applies to the compiler but not to the assembler. This commit
passes -march=armv4 to the assembler in those cases, so that code
written for ARMv4 will continue to compile and run fine, without needing
module-specific asflags-y overrides.

[1] https://lore.kernel.org/lkml/CAKv+Gu9FoFQymp2-=rUeh14CkUKON389OCE7stYCOFwKZpaCrg@mail.gmail.com/

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
I don't have too much familiarity with hardware this old, nor access to
testing systems, so please do carefully evaluate the assertions above
before merging this, since I'm not sure I have a full understanding of
the Linux ARMv3 situation.

 arch/arm/Makefile | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

-- 
2.19.0

Comments

Arnd Bergmann Oct. 1, 2018, 3:13 p.m. UTC | #1
On Sun, Sep 30, 2018 at 4:49 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>

> Per the discussion about half-way down in [1], the kernel doesn't

> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA

> hardware that benefits from ARMv3 code generation. Such a consideration,

> then, only applies to the compiler but not to the assembler. This commit

> passes -march=armv4 to the assembler in those cases, so that code

> written for ARMv4 will continue to compile and run fine, without needing

> module-specific asflags-y overrides.

>

> [1] https://lore.kernel.org/lkml/CAKv+Gu9FoFQymp2-=rUeh14CkUKON389OCE7stYCOFwKZpaCrg@mail.gmail.com/

>

> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

> I don't have too much familiarity with hardware this old, nor access to

> testing systems, so please do carefully evaluate the assertions above

> before merging this, since I'm not sure I have a full understanding of

> the Linux ARMv3 situation.


I took a look at the original build problem. It seems that the issue
here that your assembler implementation contains multiplication
instructions that are part of ARMv3M and higher but not ARMv3.

Since RPC/sa110 supports those instructions, should we maybe
just build the kernel with -march-armv3 instead?

Russell, does this make sense, or is there a reason we're not
already doing the below?

       Arnd

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 8c05ab53425a..445ae57ee3a1 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -81,7 +81,7 @@ endif
 arch-$(CONFIG_CPU_32v5)                =-D__LINUX_ARM_ARCH__=5 -march=armv5te
 arch-$(CONFIG_CPU_32v4T)       =-D__LINUX_ARM_ARCH__=4 -march=armv4t
 arch-$(CONFIG_CPU_32v4)                =-D__LINUX_ARM_ARCH__=4 $(call
cc-option,-march=armv4,-march=armv4t)
-arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3
+arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3m

 # Evaluate arch cc-option calls now
 arch-y := $(arch-y)
Russell King (Oracle) Oct. 1, 2018, 5:56 p.m. UTC | #2
On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote:
> Per the discussion about half-way down in [1], the kernel doesn't

> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA

> hardware that benefits from ARMv3 code generation.


The issue is to do with the half-word stores in the ARMv4 ISA, which
need to be avoided on StrongARM RiscPC - the bus from the processor
card (which was designed for ARM610 and ARM710) does not support
anything except 8-bit and 32-bit accesses, so the 16-bit load/store
instructions don't work correctly.

Obviously, the reason for having the compiler use ARMv3 is to avoid
those instructions which we have no other way to prevent - however,
the use of ARMv3 with the assembler ensures that ldrh/strh are not
accidentally used.

We could argue that the ARMv3 assembly files are now stable, so the
chances of ldrh/strh being introduced is low, which would make this
change tolerable, but the commit message needs to spell out that
we lose this protection.

> Such a consideration,

> then, only applies to the compiler but not to the assembler. This commit

> passes -march=armv4 to the assembler in those cases, so that code

> written for ARMv4 will continue to compile and run fine, without needing

> module-specific asflags-y overrides.


Note that "code written for ARMv4" will not be usable on this platform
if it makes use of ldrh/strh, so depending on which instructions the
assembler is complaining about, it could very well be a real "you're
doing something wrong" case.

The side effect of this patch is that such cases will now be hidden
rather than evaluated on a case-by-case basis.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up
Ard Biesheuvel Oct. 1, 2018, 6:10 p.m. UTC | #3
On 1 October 2018 at 19:56, Russell King - ARM Linux
<linux@armlinux.org.uk> wrote:
> On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote:

>> Per the discussion about half-way down in [1], the kernel doesn't

>> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA

>> hardware that benefits from ARMv3 code generation.

>

> The issue is to do with the half-word stores in the ARMv4 ISA, which

> need to be avoided on StrongARM RiscPC - the bus from the processor

> card (which was designed for ARM610 and ARM710) does not support

> anything except 8-bit and 32-bit accesses, so the 16-bit load/store

> instructions don't work correctly.

>

> Obviously, the reason for having the compiler use ARMv3 is to avoid

> those instructions which we have no other way to prevent - however,

> the use of ARMv3 with the assembler ensures that ldrh/strh are not

> accidentally used.

>

> We could argue that the ARMv3 assembly files are now stable, so the

> chances of ldrh/strh being introduced is low, which would make this

> change tolerable, but the commit message needs to spell out that

> we lose this protection.

>

>> Such a consideration,

>> then, only applies to the compiler but not to the assembler. This commit

>> passes -march=armv4 to the assembler in those cases, so that code

>> written for ARMv4 will continue to compile and run fine, without needing

>> module-specific asflags-y overrides.

>

> Note that "code written for ARMv4" will not be usable on this platform

> if it makes use of ldrh/strh, so depending on which instructions the

> assembler is complaining about, it could very well be a real "you're

> doing something wrong" case.

>

> The side effect of this patch is that such cases will now be hidden

> rather than evaluated on a case-by-case basis.

>


Thanks for the insight.

So Arnd's suggestion to switch to armv3-m would actually be feasible
then? The code in question does not use ldrh only umull, and so it
should build with armv3m as well. And we will even generate some
better code for RiscPC if we apply it to both the cflags and the
asflags.
Russell King (Oracle) Oct. 1, 2018, 6:13 p.m. UTC | #4
On Mon, Oct 01, 2018 at 08:10:26PM +0200, Ard Biesheuvel wrote:
> On 1 October 2018 at 19:56, Russell King - ARM Linux

> <linux@armlinux.org.uk> wrote:

> > On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote:

> >> Per the discussion about half-way down in [1], the kernel doesn't

> >> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA

> >> hardware that benefits from ARMv3 code generation.

> >

> > The issue is to do with the half-word stores in the ARMv4 ISA, which

> > need to be avoided on StrongARM RiscPC - the bus from the processor

> > card (which was designed for ARM610 and ARM710) does not support

> > anything except 8-bit and 32-bit accesses, so the 16-bit load/store

> > instructions don't work correctly.

> >

> > Obviously, the reason for having the compiler use ARMv3 is to avoid

> > those instructions which we have no other way to prevent - however,

> > the use of ARMv3 with the assembler ensures that ldrh/strh are not

> > accidentally used.

> >

> > We could argue that the ARMv3 assembly files are now stable, so the

> > chances of ldrh/strh being introduced is low, which would make this

> > change tolerable, but the commit message needs to spell out that

> > we lose this protection.

> >

> >> Such a consideration,

> >> then, only applies to the compiler but not to the assembler. This commit

> >> passes -march=armv4 to the assembler in those cases, so that code

> >> written for ARMv4 will continue to compile and run fine, without needing

> >> module-specific asflags-y overrides.

> >

> > Note that "code written for ARMv4" will not be usable on this platform

> > if it makes use of ldrh/strh, so depending on which instructions the

> > assembler is complaining about, it could very well be a real "you're

> > doing something wrong" case.

> >

> > The side effect of this patch is that such cases will now be hidden

> > rather than evaluated on a case-by-case basis.

> >

> 

> Thanks for the insight.

> 

> So Arnd's suggestion to switch to armv3-m would actually be feasible

> then? The code in question does not use ldrh only umull, and so it

> should build with armv3m as well. And we will even generate some

> better code for RiscPC if we apply it to both the cflags and the

> asflags.


Yep.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up
Jason A. Donenfeld Oct. 2, 2018, 3:53 a.m. UTC | #5
Hi Arnd,

Apologies for the delay in getting back to you. I had some MTA issues
and stupidly assumed ARM developers were taking the day off instead...

On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote:
> -arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3

> +arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3m


Unfortunately this doesn't really cut it in my case, as it's not only
those multiplications:
chacha20-arm.S:402: Error: selected processor does not support `bxeq
lr' in ARM mode

I think we're going to wind up playing whack-a-mole in silly ways. The
fact of the matter is that the ARM assembly I'm adding to the tree is
for ARMv4 and up, and not for ARMv3.

I think there are three options to work around this issue:

1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends".
2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select
-march=armv4.
3) This patch.

My initial plan was (1). ArdB recommended I do (2) instead. I thought
that was a bit too nuanced and submitted (3).

It sounds like in light of the bus issues, (1) might be the best
solution after all?

Let me know, and I'll follow your direction.

Regards,
Jason
Jason A. Donenfeld Oct. 2, 2018, 3:56 a.m. UTC | #6
Hi Ard,

On Tue, Oct 2, 2018 at 5:54 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> So Arnd's suggestion to switch to armv3-m would actually be feasible

> then? The code in question does not use ldrh only umull, and so it

> should build with armv3m as well. And we will even generate some

> better code for RiscPC if we apply it to both the cflags and the

> asflags.


It turns out armv3m isn't a solution for the crypto code, since it
does other armv4-isms. So we'll need to find another solution for
that; I'm partial to my initial Kconfig one just doing "depends
!CPU_32V3".

With regards to RiscPC though, given that armv3 codegen might be
better, it sounds like it's a good idea to set -march=armv3m anyway
for the platform, irrespective of fixing the problem at hand here.

Regards,
Jason
Jason A. Donenfeld Oct. 2, 2018, 4:08 a.m. UTC | #7
Hi Russell,

> On 1 October 2018 at 19:56, Russell King - ARM Linux <linux@armlinux.org.uk> wrote:

> We could argue that the ARMv3 assembly files are now stable, so the

> chances of ldrh/strh being introduced is low, which would make this

> change tolerable, but the commit message needs to spell out that

> we lose this protection.


Actually I don't think that argument really holds too well. We very
well could introduce some new ARMv4 assembly -- an optimized crypto
routine, for example -- and it could use ldrh/strh. That isn't the
case now, but it might be the case later. For that reason, I suspect
the proper solution is just not building new cryptography assembly
that's written for ARMv4 in mind on CPU_32v3 systems. This way there's
never a mismatch of expectations.

Jason
Ard Biesheuvel Oct. 2, 2018, 5:39 a.m. UTC | #8
On 2 October 2018 at 05:53, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Arnd,

>

> Apologies for the delay in getting back to you. I had some MTA issues

> and stupidly assumed ARM developers were taking the day off instead...

>

> On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote:

>> -arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3

>> +arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3m

>

> Unfortunately this doesn't really cut it in my case, as it's not only

> those multiplications:

> chacha20-arm.S:402: Error: selected processor does not support `bxeq

> lr' in ARM mode

>


We have a macro for doing function returns: just replace 'bxeq lr'
with 'reteq lr' (and be sure to include <asm/assembler.h>)


> I think we're going to wind up playing whack-a-mole in silly ways. The

> fact of the matter is that the ARM assembly I'm adding to the tree is

> for ARMv4 and up, and not for ARMv3.

>

> I think there are three options to work around this issue:

>

> 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends".

> 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select

> -march=armv4.

> 3) This patch.

>

> My initial plan was (1). ArdB recommended I do (2) instead. I thought

> that was a bit too nuanced and submitted (3).

>

> It sounds like in light of the bus issues, (1) might be the best

> solution after all?

>

> Let me know, and I'll follow your direction.

>

> Regards,

> Jason
Arnd Bergmann Oct. 2, 2018, 7:51 a.m. UTC | #9
On Tue, Oct 2, 2018 at 5:53 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>

> Hi Arnd,

>

> Apologies for the delay in getting back to you. I had some MTA issues

> and stupidly assumed ARM developers were taking the day off instead...

>

> On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > -arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3

> > +arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3m

>

> Unfortunately this doesn't really cut it in my case, as it's not only

> those multiplications:

> chacha20-arm.S:402: Error: selected processor does not support `bxeq

> lr' in ARM mode

>

> I think we're going to wind up playing whack-a-mole in silly ways. The

> fact of the matter is that the ARM assembly I'm adding to the tree is

> for ARMv4 and up, and not for ARMv3.


I don't see what issues remain. The 'reteq lr' that Ard mentioned
is definitely the correct way to return from assembly (you also need
that for plain armv4, as 'bx' was added in armv4t), and Russell
confirmed that using -march=armv3m is something we want
anyway for mach-rpc.

> I think there are three options to work around this issue:

>

> 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends".

> 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select

> -march=armv4.

> 3) This patch.

>

> My initial plan was (1). ArdB recommended I do (2) instead. I thought

> that was a bit too nuanced and submitted (3).


I suspect all three of the above fail to work for armv4.

         Arnd
Ard Biesheuvel Oct. 2, 2018, 9:16 a.m. UTC | #10
(adding Eric since he wrote the ChaCha20 scalar code)

On 2 October 2018 at 09:51, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tue, Oct 2, 2018 at 5:53 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

>>

>> Hi Arnd,

>>

>> Apologies for the delay in getting back to you. I had some MTA issues

>> and stupidly assumed ARM developers were taking the day off instead...

>>

>> On Tue, Oct 2, 2018 at 5:33 AM Arnd Bergmann <arnd@arndb.de> wrote:

>> > -arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3

>> > +arch-$(CONFIG_CPU_32v3)                =-D__LINUX_ARM_ARCH__=3 -march=armv3m

>>

>> Unfortunately this doesn't really cut it in my case, as it's not only

>> those multiplications:

>> chacha20-arm.S:402: Error: selected processor does not support `bxeq

>> lr' in ARM mode

>>

>> I think we're going to wind up playing whack-a-mole in silly ways. The

>> fact of the matter is that the ARM assembly I'm adding to the tree is

>> for ARMv4 and up, and not for ARMv3.

>

> I don't see what issues remain. The 'reteq lr' that Ard mentioned

> is definitely the correct way to return from assembly (you also need

> that for plain armv4, as 'bx' was added in armv4t), and Russell

> confirmed that using -march=armv3m is something we want

> anyway for mach-rpc.

>


In fact, this bxeq instruction is the only remaining impediment to
building all scalar code with -march-arm3m, and looking at the code

ENTRY(chacha20_arm)
    cmp r2, #0 // len == 0?
    bxeq lr

it seems to me that we can move this len == 0 check into the caller instead.

index 163815f51aac..b2108e00d451 100644
--- a/lib/zinc/chacha20/chacha20-arm-glue.h
+++ b/lib/zinc/chacha20/chacha20-arm-glue.h
@@ -59,6 +59,8 @@ static inline bool chacha20_arch(struct chacha20_ctx
*ctx, u8 *dst,
                        src += bytes;
                        simd_relax(simd_context);
                } else {
+                       if (unlikely(!len))
+                               break;
                        chacha20_arm(dst, src, len, ctx->key, ctx->counter);
                        ctx->counter[0] += (len + 63) / 64;
                        break;
diff --git a/lib/zinc/chacha20/chacha20-arm.S b/lib/zinc/chacha20/chacha20-arm.S
index 5abedafcf129..845843a14ab1 100644
--- a/lib/zinc/chacha20/chacha20-arm.S
+++ b/lib/zinc/chacha20/chacha20-arm.S
@@ -398,9 +398,6 @@
  *                  const u32 iv[4]);
  */
 ENTRY(chacha20_arm)
-       cmp             r2, #0                  // len == 0?
-       bxeq            lr
-
        push            {r0-r2,r4-r11,lr}

        // Push state x0-x15 onto stack.
Jason A. Donenfeld Oct. 2, 2018, 12:30 p.m. UTC | #11
Hi Arnd,

On Tue, Oct 2, 2018 at 9:58 AM Arnd Bergmann <arnd@arndb.de> wrote:
> > I think we're going to wind up playing whack-a-mole in silly ways. The

> > fact of the matter is that the ARM assembly I'm adding to the tree is

> > for ARMv4 and up, and not for ARMv3.

>

> I don't see what issues remain. The 'reteq lr' that Ard mentioned

> is definitely the correct way to return from assembly (you also need

> that for plain armv4, as 'bx' was added in armv4t), and Russell

> confirmed that using -march=armv3m is something we want

> anyway for mach-rpc.


I'll do that. I can confirm that after changing it to `reteq lr`,
everything works well with armv3m.

> > I think there are three options to work around this issue:

> >

> > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends".

> > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select

> > -march=armv4.

> > 3) This patch.

> >

> > My initial plan was (1). ArdB recommended I do (2) instead. I thought

> > that was a bit too nuanced and submitted (3).

>

> I suspect all three of the above fail to work for armv4.


Armv4 does actually work in this configuration, in fact. But anyway,
I'll go with your primary suggestion and we therefore can move ahead
with changing the global cflag to march=armv3m. Would you like me to
submit the patch for this, or would yo like to handle it?

Jason
Ard Biesheuvel Oct. 2, 2018, 1:10 p.m. UTC | #12
On 2 October 2018 at 14:30, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Arnd,

>

> On Tue, Oct 2, 2018 at 9:58 AM Arnd Bergmann <arnd@arndb.de> wrote:

>> > I think we're going to wind up playing whack-a-mole in silly ways. The

>> > fact of the matter is that the ARM assembly I'm adding to the tree is

>> > for ARMv4 and up, and not for ARMv3.

>>

>> I don't see what issues remain. The 'reteq lr' that Ard mentioned

>> is definitely the correct way to return from assembly (you also need

>> that for plain armv4, as 'bx' was added in armv4t), and Russell

>> confirmed that using -march=armv3m is something we want

>> anyway for mach-rpc.

>

> I'll do that. I can confirm that after changing it to `reteq lr`,

> everything works well with armv3m.

>

>> > I think there are three options to work around this issue:

>> >

>> > 1) Not build my assembly when CONFIG_CPU_32v3 via a Kconfig "depends".

>> > 2) Set asflags-$(CONFIG_CPU_32v3) inside my module locally to select

>> > -march=armv4.

>> > 3) This patch.

>> >

>> > My initial plan was (1). ArdB recommended I do (2) instead. I thought

>> > that was a bit too nuanced and submitted (3).

>>

>> I suspect all three of the above fail to work for armv4.

>

> Armv4 does actually work in this configuration, in fact.


It builds but it doesn't run, at least not when built into the kernel
proper (which will be the case after random.c moves to this library)

Your toolchain is implicitly passing --fix-v4bx to the assembler,
which causes it to permit bx instructions in ARMv4 object code, but
tag them with special R_ARM_V4BX ELF relocations. The ARM module
loader does take these into account, so built as a module, it works.
However, when built into the core kernel, we have to rely on the
linker to patch this instruction into the ARMv4 equivalent, and a
quick check reveals that that is currently not the case.

Bottom line: let's not go there.

> But anyway,

> I'll go with your primary suggestion and we therefore can move ahead

> with changing the global cflag to march=armv3m. Would you like me to

> submit the patch for this, or would yo like to handle it?

>


Yes please go ahead.
Jason A. Donenfeld Oct. 2, 2018, 1:12 p.m. UTC | #13
Hey Ard,

On Tue, Oct 2, 2018 at 3:10 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>

> It builds but it doesn't run, at least not when built into the kernel

> proper (which will be the case after random.c moves to this library)

>

> Your toolchain is implicitly passing --fix-v4bx to the assembler,

> which causes it to permit bx instructions in ARMv4 object code, but

> tag them with special R_ARM_V4BX ELF relocations. The ARM module

> loader does take these into account, so built as a module, it works.

> However, when built into the core kernel, we have to rely on the

> linker to patch this instruction into the ARMv4 equivalent, and a

> quick check reveals that that is currently not the case.

>

> Bottom line: let's not go there.


Oh, thanks a bunch for explaining that. Since I wrote the last email
I've been playing around with as, readelf, and qemu and was slightly
mystified.

>

> > But anyway,

> > I'll go with your primary suggestion and we therefore can move ahead

> > with changing the global cflag to march=armv3m. Would you like me to

> > submit the patch for this, or would yo like to handle it?

> >

>

> Yes please go ahead.


Alright, incoming in a few minutes.

Jason
diff mbox series

Patch

diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index d1516f85f25d..7a264cacb482 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -76,8 +76,14 @@  arch-$(CONFIG_CPU_32v4T)	=-D__LINUX_ARM_ARCH__=4 -march=armv4t
 arch-$(CONFIG_CPU_32v4)		=-D__LINUX_ARM_ARCH__=4 -march=armv4
 arch-$(CONFIG_CPU_32v3)		=-D__LINUX_ARM_ARCH__=3 -march=armv3
 
+# We do not actually support the ARMv3 ISA and prefer it above only for
+# code generation purposes, which does not apply to assembly. So we pass
+# v4 to the assembler, so that we can still assemble all instructions.
+arch-asflags-$(CONFIG_CPU_32v3)	=-march=armv4
+
 # Evaluate arch cc-option calls now
 arch-y := $(arch-y)
+arch-asflags-y := $(arch-asflags-y)
 
 # This selects how we optimise for the processor.
 tune-$(CONFIG_CPU_ARM7TDMI)	=-mtune=arm7tdmi
@@ -130,7 +136,7 @@  endif
 
 # Need -Uarm for gcc < 3.x
 KBUILD_CFLAGS	+=$(CFLAGS_ABI) $(CFLAGS_ISA) $(arch-y) $(tune-y) $(call cc-option,-mshort-load-bytes,$(call cc-option,-malignment-traps,)) -msoft-float -Uarm
-KBUILD_AFLAGS	+=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) -include asm/unified.h -msoft-float
+KBUILD_AFLAGS	+=$(CFLAGS_ABI) $(AFLAGS_ISA) $(arch-y) $(tune-y) $(arch-asflags-y) -include asm/unified.h -msoft-float
 
 CHECKFLAGS	+= -D__arm__