diff mbox series

[2/2] ARM: futex: make futex_detect_cmpxchg more reliable

Message ID 20190307091514.2489338-2-arnd@arndb.de
State New
Headers show
Series [1/2] futex: mark futex_detect_cmpxchg() as 'noinline' | expand

Commit Message

Arnd Bergmann March 7, 2019, 9:14 a.m. UTC
Passing registers containing zero as both the address (NULL pointer)
and data into cmpxchg_futex_value_locked() leads clang to assign
the same register for both inputs on ARM, which triggers a warning
explaining that this instruction has unpredictable behavior on ARMv5.

/tmp/futex-7e740e.s: Assembler messages:
/tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,
as Mikael wrote:
 "One way of fixing this is to make uaddr an input/output register, since
 "that prevents it from overlapping any other input or output."

but then withdrawn as the warning was determined to be harmless, and it
apparently never showed up again with later gcc versions.

Now the same problem is back when compiling with clang, and we are trying
to get clang to build the kernel without warnings, as gcc normally does.

Cc: Mikael Pettersson <mikpe@it.uu.se>
Cc: Mikael Pettersson <mikpelinux@gmail.com>
Cc: Dave Martin <Dave.Martin@arm.com>
Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
 arch/arm/include/asm/futex.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
2.20.0

Comments

Nick Desaulniers March 7, 2019, 7:39 p.m. UTC | #1
On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:
>

> Passing registers containing zero as both the address (NULL pointer)

> and data into cmpxchg_futex_value_locked() leads clang to assign

> the same register for both inputs on ARM, which triggers a warning

> explaining that this instruction has unpredictable behavior on ARMv5.

>

> /tmp/futex-7e740e.s: Assembler messages:

> /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

>

> This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> as Mikael wrote:

>  "One way of fixing this is to make uaddr an input/output register, since

>  "that prevents it from overlapping any other input or output."

>

> but then withdrawn as the warning was determined to be harmless, and it

> apparently never showed up again with later gcc versions.

>

> Now the same problem is back when compiling with clang, and we are trying

> to get clang to build the kernel without warnings, as gcc normally does.

>

> Cc: Mikael Pettersson <mikpe@it.uu.se>

> Cc: Mikael Pettersson <mikpelinux@gmail.com>

> Cc: Dave Martin <Dave.Martin@arm.com>

> Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> ---

>  arch/arm/include/asm/futex.h | 10 +++++-----

>  1 file changed, 5 insertions(+), 5 deletions(-)

>

> diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> index 0a46676b4245..79790912974e 100644

> --- a/arch/arm/include/asm/futex.h

> +++ b/arch/arm/include/asm/futex.h

> @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

>         preempt_disable();

>         __ua_flags = uaccess_save_and_enable();

>         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> -       "       teq     %1, %2\n"

> +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> +       "       teq     %1, %3\n"

>         "       it      eq      @ explicit IT needed for the 2b label\n"

> -       "2:     " TUSER(streq) "        %3, [%4]\n"

> +       "2:     " TUSER(streq) "        %4, [%2]\n"

>         __futex_atomic_ex_table("%5")

> -       : "+r" (ret), "=&r" (val)

> -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

>         : "cc", "memory");

>         uaccess_restore(__ua_flags);


Underspecification of constraints to extended inline assembly is a
common issue exposed by other compilers (and possibly but in-effect
infrequently compiler upgrades).
So the reordering of the constraints means the in the assembly (notes
for other reviewers):
%2 -> %3
%3 -> %4
%4 -> %2
Yep, looks good to me, thanks for finding this old patch and resending, Arnd!
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>


I think it would be good to further credit Mikael with reported by and
suggested by tags, but not sure which email address is preferred?

-- 
Thanks,
~Nick Desaulniers
Russell King (Oracle) March 7, 2019, 11:48 p.m. UTC | #2
On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:
> On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> >

> > Passing registers containing zero as both the address (NULL pointer)

> > and data into cmpxchg_futex_value_locked() leads clang to assign

> > the same register for both inputs on ARM, which triggers a warning

> > explaining that this instruction has unpredictable behavior on ARMv5.

> >

> > /tmp/futex-7e740e.s: Assembler messages:

> > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> >

> > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > as Mikael wrote:

> >  "One way of fixing this is to make uaddr an input/output register, since

> >  "that prevents it from overlapping any other input or output."

> >

> > but then withdrawn as the warning was determined to be harmless, and it

> > apparently never showed up again with later gcc versions.

> >

> > Now the same problem is back when compiling with clang, and we are trying

> > to get clang to build the kernel without warnings, as gcc normally does.

> >

> > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > Cc: Dave Martin <Dave.Martin@arm.com>

> > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > ---

> >  arch/arm/include/asm/futex.h | 10 +++++-----

> >  1 file changed, 5 insertions(+), 5 deletions(-)

> >

> > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > index 0a46676b4245..79790912974e 100644

> > --- a/arch/arm/include/asm/futex.h

> > +++ b/arch/arm/include/asm/futex.h

> > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> >         preempt_disable();

> >         __ua_flags = uaccess_save_and_enable();

> >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > -       "       teq     %1, %2\n"

> > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > +       "       teq     %1, %3\n"

> >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> >         __futex_atomic_ex_table("%5")

> > -       : "+r" (ret), "=&r" (val)

> > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> >         : "cc", "memory");

> >         uaccess_restore(__ua_flags);

> 

> Underspecification of constraints to extended inline assembly is a

> common issue exposed by other compilers (and possibly but in-effect

> infrequently compiler upgrades).

> So the reordering of the constraints means the in the assembly (notes

> for other reviewers):

> %2 -> %3

> %3 -> %4

> %4 -> %2

> Yep, looks good to me, thanks for finding this old patch and resending, Arnd!


I don't see what is "underspecified" in the original constraints.
Please explain.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Nick Desaulniers March 8, 2019, 12:04 a.m. UTC | #3
On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > Underspecification of constraints to extended inline assembly is a

> > common issue exposed by other compilers (and possibly but in-effect

> > infrequently compiler upgrades).

>

> I don't see what is "underspecified" in the original constraints.

> Please explain.


From the link:

The problem is that in the T(streq) insn, %3 and %4 MUST be different registers,
but nothing in the asm() constrains them to be different.

> > >  "One way of fixing this is to make uaddr an input/output register, since

> > >  "that prevents it from overlapping any other input or output."


-- 
Thanks,
~Nick Desaulniers
Ard Biesheuvel March 8, 2019, 8:57 a.m. UTC | #4
On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > >

> > > Passing registers containing zero as both the address (NULL pointer)

> > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > the same register for both inputs on ARM, which triggers a warning

> > > explaining that this instruction has unpredictable behavior on ARMv5.

> > >

> > > /tmp/futex-7e740e.s: Assembler messages:

> > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > >

> > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > as Mikael wrote:

> > >  "One way of fixing this is to make uaddr an input/output register, since

> > >  "that prevents it from overlapping any other input or output."

> > >

> > > but then withdrawn as the warning was determined to be harmless, and it

> > > apparently never showed up again with later gcc versions.

> > >

> > > Now the same problem is back when compiling with clang, and we are trying

> > > to get clang to build the kernel without warnings, as gcc normally does.

> > >

> > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > ---

> > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > >

> > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > index 0a46676b4245..79790912974e 100644

> > > --- a/arch/arm/include/asm/futex.h

> > > +++ b/arch/arm/include/asm/futex.h

> > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > >         preempt_disable();

> > >         __ua_flags = uaccess_save_and_enable();

> > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > -       "       teq     %1, %2\n"

> > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > +       "       teq     %1, %3\n"

> > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > >         __futex_atomic_ex_table("%5")

> > > -       : "+r" (ret), "=&r" (val)

> > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > >         : "cc", "memory");

> > >         uaccess_restore(__ua_flags);

> >

> > Underspecification of constraints to extended inline assembly is a

> > common issue exposed by other compilers (and possibly but in-effect

> > infrequently compiler upgrades).

> > So the reordering of the constraints means the in the assembly (notes

> > for other reviewers):

> > %2 -> %3

> > %3 -> %4

> > %4 -> %2

> > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

>

> I don't see what is "underspecified" in the original constraints.

> Please explain.

>


I agree that that statement makes little sense.

As Russell points out in the referenced thread, there is nothing wrong
with the generated assembly, given that the UNPREDICTABLE opcode is
unreachable in practice. Unfortunately, we have no way to flag this
diagnostic as a known false positive, and AFAICT, there is no reason
we couldn't end up with the same diagnostic popping up for GCC builds
in the future, considering that the register assignment matches the
constraints. (We have seen somewhat similar issues where constant
folded function clones are emitted with a constant argument that could
never occur in reality [0])

Given the above, the only meaningful way to invoke this function is
with different registers assigned to %3 and %4, and so tightening the
constraints to guarantee that does not actually result in worse code
(except maybe for the instantiations that we won't ever call in the
first place). So I think we should fix this.

I wonder if just adding

BUG_ON(__builtin_constant_p(uaddr));

at the beginning makes any difference - this shouldn't result in any
object code differences since the conditional will always evaluate to
false at build time for instantiations we care about.


[0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/
Russell King (Oracle) March 8, 2019, 9:53 a.m. UTC | #5
On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > >

> > > > Passing registers containing zero as both the address (NULL pointer)

> > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > the same register for both inputs on ARM, which triggers a warning

> > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > >

> > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > >

> > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > as Mikael wrote:

> > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > >  "that prevents it from overlapping any other input or output."

> > > >

> > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > apparently never showed up again with later gcc versions.

> > > >

> > > > Now the same problem is back when compiling with clang, and we are trying

> > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > >

> > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > ---

> > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > >

> > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > index 0a46676b4245..79790912974e 100644

> > > > --- a/arch/arm/include/asm/futex.h

> > > > +++ b/arch/arm/include/asm/futex.h

> > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > >         preempt_disable();

> > > >         __ua_flags = uaccess_save_and_enable();

> > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > -       "       teq     %1, %2\n"

> > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > +       "       teq     %1, %3\n"

> > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > >         __futex_atomic_ex_table("%5")

> > > > -       : "+r" (ret), "=&r" (val)

> > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > >         : "cc", "memory");

> > > >         uaccess_restore(__ua_flags);

> > >

> > > Underspecification of constraints to extended inline assembly is a

> > > common issue exposed by other compilers (and possibly but in-effect

> > > infrequently compiler upgrades).

> > > So the reordering of the constraints means the in the assembly (notes

> > > for other reviewers):

> > > %2 -> %3

> > > %3 -> %4

> > > %4 -> %2

> > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> >

> > I don't see what is "underspecified" in the original constraints.

> > Please explain.

> >

> 

> I agree that that statement makes little sense.

> 

> As Russell points out in the referenced thread, there is nothing wrong

> with the generated assembly, given that the UNPREDICTABLE opcode is

> unreachable in practice. Unfortunately, we have no way to flag this

> diagnostic as a known false positive, and AFAICT, there is no reason

> we couldn't end up with the same diagnostic popping up for GCC builds

> in the future, considering that the register assignment matches the

> constraints. (We have seen somewhat similar issues where constant

> folded function clones are emitted with a constant argument that could

> never occur in reality [0])

> 

> Given the above, the only meaningful way to invoke this function is

> with different registers assigned to %3 and %4, and so tightening the

> constraints to guarantee that does not actually result in worse code

> (except maybe for the instantiations that we won't ever call in the

> first place). So I think we should fix this.

> 

> I wonder if just adding

> 

> BUG_ON(__builtin_constant_p(uaddr));

> 

> at the beginning makes any difference - this shouldn't result in any

> object code differences since the conditional will always evaluate to

> false at build time for instantiations we care about.

> 

> 

> [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/


What I'm actually asking is:

The GCC manual says that input operands _may_ overlap output operands
since GCC assumes that input operands are consumed before output
operands are written.  This is an explicit statement.

The GCC manual does not say that input operands may overlap with each
other, and the behaviour of GCC thus far (apart from one version,
presumably caused by a bug) has been that input operands are unique.

Clang appears to be different: it allows input operands that are
registers, and contain the same constant value to be the same physical
register.

The assertion is that the constraints are under-specified.  I am
questioning that assertion.

If the constraints are under-specified, I would have expected gcc-4.4's
behaviour to have persisted, and we would've been told by gcc's
developers to fix our code.  That didn't happen, and instead gcc seems
to have been fixed.  So, my conclusion is that it is intentional that
input operands to asm() do not overlap with themselves.

It seems to me that the work-around for clang is to change every input
operand to be an output operand with a "+&r" contraint - an operand
that is both read and written by the "instruction", and that the operand
is "earlyclobber".  For something that is really only read, that seems
strange.

Also, reading GCC's manual, it would appear that "+&" is wrong.

`+'
     Means that this operand is both read and written by the
     instruction.

     When the compiler fixes up the operands to satisfy the constraints,
     it needs to know which operands are inputs to the instruction and
     which are outputs from it.  `=' identifies an output; `+'
     identifies an operand that is both input and output; all other
                                   ^^^^^^^^^^^^^^^^^^^^^
     operands are assumed to be input only.

`&'
     Means (in a particular alternative) that this operand is an
     "earlyclobber" operand, which is modified before the instruction is
     finished using the input operands.  Therefore, this operand may
     not lie in a register that is used as an input operand or as part
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     of any memory address.

So "+" says that this operand is an input but "&" says that it must not
be in a register that is used as an input.  That's contradictory, and I
think we can expect GCC to barf or at least end up doing strange stuff,
if not with existing versions, then with future versions.

Hence, I'm asking for clarification why it is thought that the existing
code underspecifies the asm constraints, and I'm trying to get some more
thought about what the constraints should be, in case there is a need to
use "better" constraints.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Russell King (Oracle) March 8, 2019, 9:54 a.m. UTC | #6
On Thu, Mar 07, 2019 at 04:04:23PM -0800, Nick Desaulniers wrote:
> On Thu, Mar 7, 2019 at 3:49 PM Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > Underspecification of constraints to extended inline assembly is a

> > > common issue exposed by other compilers (and possibly but in-effect

> > > infrequently compiler upgrades).

> >

> > I don't see what is "underspecified" in the original constraints.

> > Please explain.

> 

> From the link:

> 

> The problem is that in the T(streq) insn, %3 and %4 MUST be different registers,

> but nothing in the asm() constrains them to be different.


Thanks, but that is not what I'm asking.  See my reply to Ard.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Ard Biesheuvel March 8, 2019, 10:08 a.m. UTC | #7
On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > >

> > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > >

> > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > >

> > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > >

> > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > as Mikael wrote:

> > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > >  "that prevents it from overlapping any other input or output."

> > > > >

> > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > apparently never showed up again with later gcc versions.

> > > > >

> > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > >

> > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > ---

> > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > >

> > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > index 0a46676b4245..79790912974e 100644

> > > > > --- a/arch/arm/include/asm/futex.h

> > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > >         preempt_disable();

> > > > >         __ua_flags = uaccess_save_and_enable();

> > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > -       "       teq     %1, %2\n"

> > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > +       "       teq     %1, %3\n"

> > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > >         __futex_atomic_ex_table("%5")

> > > > > -       : "+r" (ret), "=&r" (val)

> > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > >         : "cc", "memory");

> > > > >         uaccess_restore(__ua_flags);

> > > >

> > > > Underspecification of constraints to extended inline assembly is a

> > > > common issue exposed by other compilers (and possibly but in-effect

> > > > infrequently compiler upgrades).

> > > > So the reordering of the constraints means the in the assembly (notes

> > > > for other reviewers):

> > > > %2 -> %3

> > > > %3 -> %4

> > > > %4 -> %2

> > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > >

> > > I don't see what is "underspecified" in the original constraints.

> > > Please explain.

> > >

> >

> > I agree that that statement makes little sense.

> >

> > As Russell points out in the referenced thread, there is nothing wrong

> > with the generated assembly, given that the UNPREDICTABLE opcode is

> > unreachable in practice. Unfortunately, we have no way to flag this

> > diagnostic as a known false positive, and AFAICT, there is no reason

> > we couldn't end up with the same diagnostic popping up for GCC builds

> > in the future, considering that the register assignment matches the

> > constraints. (We have seen somewhat similar issues where constant

> > folded function clones are emitted with a constant argument that could

> > never occur in reality [0])

> >

> > Given the above, the only meaningful way to invoke this function is

> > with different registers assigned to %3 and %4, and so tightening the

> > constraints to guarantee that does not actually result in worse code

> > (except maybe for the instantiations that we won't ever call in the

> > first place). So I think we should fix this.

> >

> > I wonder if just adding

> >

> > BUG_ON(__builtin_constant_p(uaddr));

> >

> > at the beginning makes any difference - this shouldn't result in any

> > object code differences since the conditional will always evaluate to

> > false at build time for instantiations we care about.

> >

> >

> > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

>

> What I'm actually asking is:

>

> The GCC manual says that input operands _may_ overlap output operands

> since GCC assumes that input operands are consumed before output

> operands are written.  This is an explicit statement.

>

> The GCC manual does not say that input operands may overlap with each

> other, and the behaviour of GCC thus far (apart from one version,

> presumably caused by a bug) has been that input operands are unique.

>


Not entirely. I have run into issues where GCC assumes that registers
that are only used for input operands are left untouched by the asm
code. I.e., if you put an asm() block in a loop and modify an input
register, your code may break on the next pass, even if the input
register does not overlap with an output register.

To me, that seems to suggest that whether or not inputs may overlap is
irrelevant, since they are not expected to be modified.

> Clang appears to be different: it allows input operands that are

> registers, and contain the same constant value to be the same physical

> register.

>

> The assertion is that the constraints are under-specified.  I am

> questioning that assertion.

>

> If the constraints are under-specified, I would have expected gcc-4.4's

> behaviour to have persisted, and we would've been told by gcc's

> developers to fix our code.  That didn't happen, and instead gcc seems

> to have been fixed.  So, my conclusion is that it is intentional that

> input operands to asm() do not overlap with themselves.

>


Whether we hit the error or not is not deterministic. Like in the
ilog2() case I quoted, GCC may decide to instantiate a constant folded
['curried', if you will] clone of a function, and so even if any calls
to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval
and uaddr are compiled, it does not mean they occur like that in the C
code.

> It seems to me that the work-around for clang is to change every input

> operand to be an output operand with a "+&r" contraint - an operand

> that is both read and written by the "instruction", and that the operand

> is "earlyclobber".  For something that is really only read, that seems

> strange.

>

> Also, reading GCC's manual, it would appear that "+&" is wrong.

>

> `+'

>      Means that this operand is both read and written by the

>      instruction.

>

>      When the compiler fixes up the operands to satisfy the constraints,

>      it needs to know which operands are inputs to the instruction and

>      which are outputs from it.  `=' identifies an output; `+'

>      identifies an operand that is both input and output; all other

>                                    ^^^^^^^^^^^^^^^^^^^^^

>      operands are assumed to be input only.

>

> `&'

>      Means (in a particular alternative) that this operand is an

>      "earlyclobber" operand, which is modified before the instruction is

>      finished using the input operands.  Therefore, this operand may

>      not lie in a register that is used as an input operand or as part

>      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

>      of any memory address.

>

> So "+" says that this operand is an input but "&" says that it must not

> be in a register that is used as an input.  That's contradictory, and I

> think we can expect GCC to barf or at least end up doing strange stuff,

> if not with existing versions, then with future versions.

>


I wondered about the same thing: given that the asm itself is a black
box to the compiler, it can never reuse an in/output register for
output, so when it is clobbered is irrelevant.

> Hence, I'm asking for clarification why it is thought that the existing

> code underspecifies the asm constraints, and I'm trying to get some more

> thought about what the constraints should be, in case there is a need to

> use "better" constraints.

>


I think the constraints are correct, but as I argued before,
tightening the constraints to ensure that uaddr and newval are not
mapped onto the same register should not result in any object code
changes, except for the case where the compiler instantiated a
constprop clone that is bogus to begin with.
Ard Biesheuvel March 8, 2019, 10:16 a.m. UTC | #8
On Fri, 8 Mar 2019 at 11:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>

> On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > >

> > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > >

> > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > >

> > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > as Mikael wrote:

> > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > >  "that prevents it from overlapping any other input or output."

> > > > > >

> > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > apparently never showed up again with later gcc versions.

> > > > > >

> > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > >

> > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > ---

> > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > >

> > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > >         preempt_disable();

> > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > -       "       teq     %1, %2\n"

> > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > +       "       teq     %1, %3\n"

> > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > >         __futex_atomic_ex_table("%5")

> > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > >         : "cc", "memory");

> > > > > >         uaccess_restore(__ua_flags);

> > > > >

> > > > > Underspecification of constraints to extended inline assembly is a

> > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > infrequently compiler upgrades).

> > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > for other reviewers):

> > > > > %2 -> %3

> > > > > %3 -> %4

> > > > > %4 -> %2

> > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > >

> > > > I don't see what is "underspecified" in the original constraints.

> > > > Please explain.

> > > >

> > >

> > > I agree that that statement makes little sense.

> > >

> > > As Russell points out in the referenced thread, there is nothing wrong

> > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > unreachable in practice. Unfortunately, we have no way to flag this

> > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > in the future, considering that the register assignment matches the

> > > constraints. (We have seen somewhat similar issues where constant

> > > folded function clones are emitted with a constant argument that could

> > > never occur in reality [0])

> > >

> > > Given the above, the only meaningful way to invoke this function is

> > > with different registers assigned to %3 and %4, and so tightening the

> > > constraints to guarantee that does not actually result in worse code

> > > (except maybe for the instantiations that we won't ever call in the

> > > first place). So I think we should fix this.

> > >

> > > I wonder if just adding

> > >

> > > BUG_ON(__builtin_constant_p(uaddr));

> > >

> > > at the beginning makes any difference - this shouldn't result in any

> > > object code differences since the conditional will always evaluate to

> > > false at build time for instantiations we care about.

> > >

> > >

> > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> >

> > What I'm actually asking is:

> >

> > The GCC manual says that input operands _may_ overlap output operands

> > since GCC assumes that input operands are consumed before output

> > operands are written.  This is an explicit statement.

> >

> > The GCC manual does not say that input operands may overlap with each

> > other, and the behaviour of GCC thus far (apart from one version,

> > presumably caused by a bug) has been that input operands are unique.

> >

>

> Not entirely. I have run into issues where GCC assumes that registers

> that are only used for input operands are left untouched by the asm

> code. I.e., if you put an asm() block in a loop and modify an input

> register, your code may break on the next pass, even if the input

> register does not overlap with an output register.

>

> To me, that seems to suggest that whether or not inputs may overlap is

> irrelevant, since they are not expected to be modified.

>

> > Clang appears to be different: it allows input operands that are

> > registers, and contain the same constant value to be the same physical

> > register.

> >

> > The assertion is that the constraints are under-specified.  I am

> > questioning that assertion.

> >

> > If the constraints are under-specified, I would have expected gcc-4.4's

> > behaviour to have persisted, and we would've been told by gcc's

> > developers to fix our code.  That didn't happen, and instead gcc seems

> > to have been fixed.  So, my conclusion is that it is intentional that

> > input operands to asm() do not overlap with themselves.

> >

>

> Whether we hit the error or not is not deterministic. Like in the

> ilog2() case I quoted, GCC may decide to instantiate a constant folded

> ['curried', if you will] clone of a function, and so even if any calls

> to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval

> and uaddr are compiled, it does not mean they occur like that in the C

> code.

>

> > It seems to me that the work-around for clang is to change every input

> > operand to be an output operand with a "+&r" contraint - an operand

> > that is both read and written by the "instruction", and that the operand

> > is "earlyclobber".  For something that is really only read, that seems

> > strange.

> >

> > Also, reading GCC's manual, it would appear that "+&" is wrong.

> >

> > `+'

> >      Means that this operand is both read and written by the

> >      instruction.

> >

> >      When the compiler fixes up the operands to satisfy the constraints,

> >      it needs to know which operands are inputs to the instruction and

> >      which are outputs from it.  `=' identifies an output; `+'

> >      identifies an operand that is both input and output; all other

> >                                    ^^^^^^^^^^^^^^^^^^^^^

> >      operands are assumed to be input only.

> >

> > `&'

> >      Means (in a particular alternative) that this operand is an

> >      "earlyclobber" operand, which is modified before the instruction is

> >      finished using the input operands.  Therefore, this operand may

> >      not lie in a register that is used as an input operand or as part

> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> >      of any memory address.

> >

> > So "+" says that this operand is an input but "&" says that it must not

> > be in a register that is used as an input.  That's contradictory, and I

> > think we can expect GCC to barf or at least end up doing strange stuff,

> > if not with existing versions, then with future versions.

> >

>

> I wondered about the same thing: given that the asm itself is a black

> box to the compiler, it can never reuse an in/output register for

> output, so when it is clobbered is irrelevant.

>

> > Hence, I'm asking for clarification why it is thought that the existing

> > code underspecifies the asm constraints, and I'm trying to get some more

> > thought about what the constraints should be, in case there is a need to

> > use "better" constraints.

> >

>

> I think the constraints are correct, but as I argued before,

> tightening the constraints to ensure that uaddr and newval are not

> mapped onto the same register should not result in any object code

> changes, except for the case where the compiler instantiated a

> constprop clone that is bogus to begin with.


Compiling the following code

"""
#include <stdio.h>

static void foo(void *a, int b)
{
    asm("str %0, [%1]" :: "r"(a), "r"(b));
}

int main(void)
{
    foo(NULL, 0);
}
"""

with GCC 6.3 (at -O2) gives me

.arch armv7-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 18, 4
.file "futex.c"
.section .text.startup,"ax",%progbits
.align 1
.p2align 2,,3
.global main
.syntax unified
.thumb
.thumb_func
.fpu vfpv3-d16
.type main, %function
main:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movs r0, #0
.syntax unified
@ 6 "/tmp/futex.c" 1
str r0, [r0]
@ 0 "" 2
.thumb
.syntax unified
bx lr
.size main, .-main
.ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"
.section .note.GNU-stack,"",%progbits

and so GCC definitely behaves similar in this regard.
Russell King (Oracle) March 8, 2019, 10:34 a.m. UTC | #9
On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > >

> > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > >

> > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > >

> > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > as Mikael wrote:

> > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > >  "that prevents it from overlapping any other input or output."

> > > > > >

> > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > apparently never showed up again with later gcc versions.

> > > > > >

> > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > >

> > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > ---

> > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > >

> > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > >         preempt_disable();

> > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > -       "       teq     %1, %2\n"

> > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > +       "       teq     %1, %3\n"

> > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > >         __futex_atomic_ex_table("%5")

> > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > >         : "cc", "memory");

> > > > > >         uaccess_restore(__ua_flags);

> > > > >

> > > > > Underspecification of constraints to extended inline assembly is a

> > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > infrequently compiler upgrades).

> > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > for other reviewers):

> > > > > %2 -> %3

> > > > > %3 -> %4

> > > > > %4 -> %2

> > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > >

> > > > I don't see what is "underspecified" in the original constraints.

> > > > Please explain.

> > > >

> > >

> > > I agree that that statement makes little sense.

> > >

> > > As Russell points out in the referenced thread, there is nothing wrong

> > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > unreachable in practice. Unfortunately, we have no way to flag this

> > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > in the future, considering that the register assignment matches the

> > > constraints. (We have seen somewhat similar issues where constant

> > > folded function clones are emitted with a constant argument that could

> > > never occur in reality [0])

> > >

> > > Given the above, the only meaningful way to invoke this function is

> > > with different registers assigned to %3 and %4, and so tightening the

> > > constraints to guarantee that does not actually result in worse code

> > > (except maybe for the instantiations that we won't ever call in the

> > > first place). So I think we should fix this.

> > >

> > > I wonder if just adding

> > >

> > > BUG_ON(__builtin_constant_p(uaddr));

> > >

> > > at the beginning makes any difference - this shouldn't result in any

> > > object code differences since the conditional will always evaluate to

> > > false at build time for instantiations we care about.

> > >

> > >

> > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> >

> > What I'm actually asking is:

> >

> > The GCC manual says that input operands _may_ overlap output operands

> > since GCC assumes that input operands are consumed before output

> > operands are written.  This is an explicit statement.

> >

> > The GCC manual does not say that input operands may overlap with each

> > other, and the behaviour of GCC thus far (apart from one version,

> > presumably caused by a bug) has been that input operands are unique.

> >

> 

> Not entirely. I have run into issues where GCC assumes that registers

> that are only used for input operands are left untouched by the asm

> code. I.e., if you put an asm() block in a loop and modify an input

> register, your code may break on the next pass, even if the input

> register does not overlap with an output register.


GCC has had the expectation for decades that _input_ operands are not
changed in value by the code in the assembly.  This isn't quite the
same thing as the uniqueness of the register allocation for input
operands.

> To me, that seems to suggest that whether or not inputs may overlap is

> irrelevant, since they are not expected to be modified.


How is:

	stmfd	sp!, {r0-r3, ip, lr}
	bl	foo
	ldmfd	sp!, {r0-r3, ip, lr}

where r1 may be an input operand (to pass an argument to foo) any
different from:

	ldrt	r0, [r1]

as far as whether r1 is modified in both cases?  In both cases, the
value of r1 is read and written by both instructions, but in both
cases the value of r1 remains the same no matter what the value of r1
was.

The "input operands should not be modified" is entirely orthogonal to
the input operand register allocation.

> > Clang appears to be different: it allows input operands that are

> > registers, and contain the same constant value to be the same physical

> > register.

> >

> > The assertion is that the constraints are under-specified.  I am

> > questioning that assertion.

> >

> > If the constraints are under-specified, I would have expected gcc-4.4's

> > behaviour to have persisted, and we would've been told by gcc's

> > developers to fix our code.  That didn't happen, and instead gcc seems

> > to have been fixed.  So, my conclusion is that it is intentional that

> > input operands to asm() do not overlap with themselves.

> >

> 

> Whether we hit the error or not is not deterministic. Like in the

> ilog2() case I quoted, GCC may decide to instantiate a constant folded

> ['curried', if you will] clone of a function, and so even if any calls

> to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval

> and uaddr are compiled, it does not mean they occur like that in the C

> code.


Again, I think this is different: gcc knows what the C code is doing and
can optimise it.  GCC doesn't have any idea what the code in an asm() is
doing beyond what the constraints are telling it, and the rules for
those constraints set out in the GCC manual.

Given that we are explicitly talking about the register allocation for
input operands, I'm not sure how the ilog2() case you mention applies.

> > It seems to me that the work-around for clang is to change every input

> > operand to be an output operand with a "+&r" contraint - an operand

> > that is both read and written by the "instruction", and that the operand

> > is "earlyclobber".  For something that is really only read, that seems

> > strange.

> >

> > Also, reading GCC's manual, it would appear that "+&" is wrong.

> >

> > `+'

> >      Means that this operand is both read and written by the

> >      instruction.

> >

> >      When the compiler fixes up the operands to satisfy the constraints,

> >      it needs to know which operands are inputs to the instruction and

> >      which are outputs from it.  `=' identifies an output; `+'

> >      identifies an operand that is both input and output; all other

> >                                    ^^^^^^^^^^^^^^^^^^^^^

> >      operands are assumed to be input only.

> >

> > `&'

> >      Means (in a particular alternative) that this operand is an

> >      "earlyclobber" operand, which is modified before the instruction is

> >      finished using the input operands.  Therefore, this operand may

> >      not lie in a register that is used as an input operand or as part

> >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> >      of any memory address.

> >

> > So "+" says that this operand is an input but "&" says that it must not

> > be in a register that is used as an input.  That's contradictory, and I

> > think we can expect GCC to barf or at least end up doing strange stuff,

> > if not with existing versions, then with future versions.

> >

> 

> I wondered about the same thing: given that the asm itself is a black

> box to the compiler, it can never reuse an in/output register for

> output, so when it is clobbered is irrelevant.


Let me try again - you seem to have completely missed my point.

+ specifies that the operand is an input.
& specifies that the operand is not an input.

+ and & are contradictory.

GCC is at liberty to not assign a value to an operand with a +&
modifier, or error out such a construction.

> 

> > Hence, I'm asking for clarification why it is thought that the existing

> > code underspecifies the asm constraints, and I'm trying to get some more

> > thought about what the constraints should be, in case there is a need to

> > use "better" constraints.

> 

> I think the constraints are correct, but as I argued before,

> tightening the constraints to ensure that uaddr and newval are not

> mapped onto the same register should not result in any object code

> changes, except for the case where the compiler instantiated a

> constprop clone that is bogus to begin with.


... by tightening it to an undefined combination of constraint modifiers
that just happens to seem to do the right thing.  No, this is not proper
"engineering".  This is bodging.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Ard Biesheuvel March 8, 2019, 10:45 a.m. UTC | #10
On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:

> > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > >

> > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > > <linux@armlinux.org.uk> wrote:

> > > > >

> > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > > >

> > > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > > >

> > > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > > >

> > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > > as Mikael wrote:

> > > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > > >  "that prevents it from overlapping any other input or output."

> > > > > > >

> > > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > > apparently never showed up again with later gcc versions.

> > > > > > >

> > > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > > >

> > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > > ---

> > > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > > >

> > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > > >         preempt_disable();

> > > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > > -       "       teq     %1, %2\n"

> > > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > > +       "       teq     %1, %3\n"

> > > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > > >         __futex_atomic_ex_table("%5")

> > > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > > >         : "cc", "memory");

> > > > > > >         uaccess_restore(__ua_flags);

> > > > > >

> > > > > > Underspecification of constraints to extended inline assembly is a

> > > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > > infrequently compiler upgrades).

> > > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > > for other reviewers):

> > > > > > %2 -> %3

> > > > > > %3 -> %4

> > > > > > %4 -> %2

> > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > > >

> > > > > I don't see what is "underspecified" in the original constraints.

> > > > > Please explain.

> > > > >

> > > >

> > > > I agree that that statement makes little sense.

> > > >

> > > > As Russell points out in the referenced thread, there is nothing wrong

> > > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > > unreachable in practice. Unfortunately, we have no way to flag this

> > > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > > in the future, considering that the register assignment matches the

> > > > constraints. (We have seen somewhat similar issues where constant

> > > > folded function clones are emitted with a constant argument that could

> > > > never occur in reality [0])

> > > >

> > > > Given the above, the only meaningful way to invoke this function is

> > > > with different registers assigned to %3 and %4, and so tightening the

> > > > constraints to guarantee that does not actually result in worse code

> > > > (except maybe for the instantiations that we won't ever call in the

> > > > first place). So I think we should fix this.

> > > >

> > > > I wonder if just adding

> > > >

> > > > BUG_ON(__builtin_constant_p(uaddr));

> > > >

> > > > at the beginning makes any difference - this shouldn't result in any

> > > > object code differences since the conditional will always evaluate to

> > > > false at build time for instantiations we care about.

> > > >

> > > >

> > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> > >

> > > What I'm actually asking is:

> > >

> > > The GCC manual says that input operands _may_ overlap output operands

> > > since GCC assumes that input operands are consumed before output

> > > operands are written.  This is an explicit statement.

> > >

> > > The GCC manual does not say that input operands may overlap with each

> > > other, and the behaviour of GCC thus far (apart from one version,

> > > presumably caused by a bug) has been that input operands are unique.

> > >

> >

> > Not entirely. I have run into issues where GCC assumes that registers

> > that are only used for input operands are left untouched by the asm

> > code. I.e., if you put an asm() block in a loop and modify an input

> > register, your code may break on the next pass, even if the input

> > register does not overlap with an output register.

>

> GCC has had the expectation for decades that _input_ operands are not

> changed in value by the code in the assembly.  This isn't quite the

> same thing as the uniqueness of the register allocation for input

> operands.

>

> > To me, that seems to suggest that whether or not inputs may overlap is

> > irrelevant, since they are not expected to be modified.

>

> How is:

>

>         stmfd   sp!, {r0-r3, ip, lr}

>         bl      foo

>         ldmfd   sp!, {r0-r3, ip, lr}

>

> where r1 may be an input operand (to pass an argument to foo) any

> different from:

>

>         ldrt    r0, [r1]

>

> as far as whether r1 is modified in both cases?  In both cases, the

> value of r1 is read and written by both instructions, but in both

> cases the value of r1 remains the same no matter what the value of r1

> was.

>

> The "input operands should not be modified" is entirely orthogonal to

> the input operand register allocation.

>


The question is whether it is reasonable for GCC to use the same
register for input operands that have the same value. From the
assumption that GCC makes that the asm will not modified follows
directly that we can use the same register for different operands.

And in fact, since that asm code (when built in ARM mode) does modify
the register, uaddr should not be an input operand to begin with. In
other words, there is an actual bug here, and this patch fixes it.



> > > Clang appears to be different: it allows input operands that are

> > > registers, and contain the same constant value to be the same physical

> > > register.

> > >

> > > The assertion is that the constraints are under-specified.  I am

> > > questioning that assertion.

> > >

> > > If the constraints are under-specified, I would have expected gcc-4.4's

> > > behaviour to have persisted, and we would've been told by gcc's

> > > developers to fix our code.  That didn't happen, and instead gcc seems

> > > to have been fixed.  So, my conclusion is that it is intentional that

> > > input operands to asm() do not overlap with themselves.

> > >

> >

> > Whether we hit the error or not is not deterministic. Like in the

> > ilog2() case I quoted, GCC may decide to instantiate a constant folded

> > ['curried', if you will] clone of a function, and so even if any calls

> > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval

> > and uaddr are compiled, it does not mean they occur like that in the C

> > code.

>

> Again, I think this is different: gcc knows what the C code is doing and

> can optimise it.  GCC doesn't have any idea what the code in an asm() is

> doing beyond what the constraints are telling it, and the rules for

> those constraints set out in the GCC manual.

>

> Given that we are explicitly talking about the register allocation for

> input operands, I'm not sure how the ilog2() case you mention applies.

>


The relevance of the ilog2() case is that we are dealing with an
invocation of the function that never actually occurs in the code. The
compiler emits it as part of an optimization step, and this is how we
end up with constant operands for newval and uaddr.

> > > It seems to me that the work-around for clang is to change every input

> > > operand to be an output operand with a "+&r" contraint - an operand

> > > that is both read and written by the "instruction", and that the operand

> > > is "earlyclobber".  For something that is really only read, that seems

> > > strange.

> > >

> > > Also, reading GCC's manual, it would appear that "+&" is wrong.

> > >

> > > `+'

> > >      Means that this operand is both read and written by the

> > >      instruction.

> > >

> > >      When the compiler fixes up the operands to satisfy the constraints,

> > >      it needs to know which operands are inputs to the instruction and

> > >      which are outputs from it.  `=' identifies an output; `+'

> > >      identifies an operand that is both input and output; all other

> > >                                    ^^^^^^^^^^^^^^^^^^^^^

> > >      operands are assumed to be input only.

> > >

> > > `&'

> > >      Means (in a particular alternative) that this operand is an

> > >      "earlyclobber" operand, which is modified before the instruction is

> > >      finished using the input operands.  Therefore, this operand may

> > >      not lie in a register that is used as an input operand or as part

> > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > >      of any memory address.

> > >

> > > So "+" says that this operand is an input but "&" says that it must not

> > > be in a register that is used as an input.  That's contradictory, and I

> > > think we can expect GCC to barf or at least end up doing strange stuff,

> > > if not with existing versions, then with future versions.

> > >

> >

> > I wondered about the same thing: given that the asm itself is a black

> > box to the compiler, it can never reuse an in/output register for

> > output, so when it is clobbered is irrelevant.

>

> Let me try again - you seem to have completely missed my point.

>

> + specifies that the operand is an input.

> & specifies that the operand is not an input.

>

> + and & are contradictory.

>

> GCC is at liberty to not assign a value to an operand with a +&

> modifier, or error out such a construction.

>


I agree that the +& does not make sense.

> >

> > > Hence, I'm asking for clarification why it is thought that the existing

> > > code underspecifies the asm constraints, and I'm trying to get some more

> > > thought about what the constraints should be, in case there is a need to

> > > use "better" constraints.

> >

> > I think the constraints are correct, but as I argued before,

> > tightening the constraints to ensure that uaddr and newval are not

> > mapped onto the same register should not result in any object code

> > changes, except for the case where the compiler instantiated a

> > constprop clone that is bogus to begin with.

>

> ... by tightening it to an undefined combination of constraint modifiers

> that just happens to seem to do the right thing.  No, this is not proper

> "engineering".  This is bodging.

>


As I argued above, using an input operand for uaddr is incorrect (in
ARM mode) since the instruction does modify the register. So modulo
the +&, I think the patch is an improvement.
Russell King (Oracle) March 8, 2019, 10:56 a.m. UTC | #11
On Fri, Mar 08, 2019 at 11:16:47AM +0100, Ard Biesheuvel wrote:
> Compiling the following code

> 

> """

> #include <stdio.h>

> 

> static void foo(void *a, int b)

> {

>     asm("str %0, [%1]" :: "r"(a), "r"(b));

> }

> 

> int main(void)

> {

>     foo(NULL, 0);

> }

> """

> 

> with GCC 6.3 (at -O2) gives me

> 

> .arch armv7-a

> .eabi_attribute 28, 1

> .eabi_attribute 20, 1

> .eabi_attribute 21, 1

> .eabi_attribute 23, 3

> .eabi_attribute 24, 1

> .eabi_attribute 25, 1

> .eabi_attribute 26, 2

> .eabi_attribute 30, 2

> .eabi_attribute 34, 1

> .eabi_attribute 18, 4

> .file "futex.c"

> .section .text.startup,"ax",%progbits

> .align 1

> .p2align 2,,3

> .global main

> .syntax unified

> .thumb

> .thumb_func

> .fpu vfpv3-d16

> .type main, %function

> main:

> @ args = 0, pretend = 0, frame = 0

> @ frame_needed = 0, uses_anonymous_args = 0

> @ link register save eliminated.

> movs r0, #0

> .syntax unified

> @ 6 "/tmp/futex.c" 1

> str r0, [r0]

> @ 0 "" 2

> .thumb

> .syntax unified

> bx lr

> .size main, .-main

> .ident "GCC: (Debian 6.3.0-18) 6.3.0 20170516"

> .section .note.GNU-stack,"",%progbits

> 

> and so GCC definitely behaves similar in this regard.


Let's take this further - a volatile is required for these cases to
avoid gcc eliminating the asm() due to the output not being used:

#define NULL ((void *)0)
static void foo(void *a, int b)
{
    asm volatile("str %1, [%0]" : "=&r" (a) : "0" (a), "r" (b));
}
int main(void)
{
    foo(NULL, 0);
}

produces:

        mov     r3, #0
        mov     r2, r3
        str r2, [r2]

which looks to me to be incorrect to the GCC manual - the '&' on the
output operand should mean that it does not conflict with other input
operands, but clearly 'r2' has ended up being 'b' as well.  I suspect
this is a bug, or if not, is completely counter-intuitive from the
description in the GCC manual.

Using "+r" (a) : "r" (b) also results in:

        mov     r3, #0
        str r3, [r3]

It seems that only using "+&r" (a) : "r" (b) avoids a and b being in
the same register, but I question whether we are stepping into
undefined compiler behaviour with that.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Russell King (Oracle) March 8, 2019, 10:58 a.m. UTC | #12
On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:

> > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > > > <linux@armlinux.org.uk> wrote:

> > > > > >

> > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > > > >

> > > > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > > > >

> > > > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > > > >

> > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > > > as Mikael wrote:

> > > > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > > > >  "that prevents it from overlapping any other input or output."

> > > > > > > >

> > > > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > > > apparently never showed up again with later gcc versions.

> > > > > > > >

> > > > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > > > >

> > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > > > ---

> > > > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > > > >

> > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > > > >         preempt_disable();

> > > > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > > > -       "       teq     %1, %2\n"

> > > > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > > > +       "       teq     %1, %3\n"

> > > > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > > > >         __futex_atomic_ex_table("%5")

> > > > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > > > >         : "cc", "memory");

> > > > > > > >         uaccess_restore(__ua_flags);

> > > > > > >

> > > > > > > Underspecification of constraints to extended inline assembly is a

> > > > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > > > infrequently compiler upgrades).

> > > > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > > > for other reviewers):

> > > > > > > %2 -> %3

> > > > > > > %3 -> %4

> > > > > > > %4 -> %2

> > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > > > >

> > > > > > I don't see what is "underspecified" in the original constraints.

> > > > > > Please explain.

> > > > > >

> > > > >

> > > > > I agree that that statement makes little sense.

> > > > >

> > > > > As Russell points out in the referenced thread, there is nothing wrong

> > > > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > > > unreachable in practice. Unfortunately, we have no way to flag this

> > > > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > > > in the future, considering that the register assignment matches the

> > > > > constraints. (We have seen somewhat similar issues where constant

> > > > > folded function clones are emitted with a constant argument that could

> > > > > never occur in reality [0])

> > > > >

> > > > > Given the above, the only meaningful way to invoke this function is

> > > > > with different registers assigned to %3 and %4, and so tightening the

> > > > > constraints to guarantee that does not actually result in worse code

> > > > > (except maybe for the instantiations that we won't ever call in the

> > > > > first place). So I think we should fix this.

> > > > >

> > > > > I wonder if just adding

> > > > >

> > > > > BUG_ON(__builtin_constant_p(uaddr));

> > > > >

> > > > > at the beginning makes any difference - this shouldn't result in any

> > > > > object code differences since the conditional will always evaluate to

> > > > > false at build time for instantiations we care about.

> > > > >

> > > > >

> > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> > > >

> > > > What I'm actually asking is:

> > > >

> > > > The GCC manual says that input operands _may_ overlap output operands

> > > > since GCC assumes that input operands are consumed before output

> > > > operands are written.  This is an explicit statement.

> > > >

> > > > The GCC manual does not say that input operands may overlap with each

> > > > other, and the behaviour of GCC thus far (apart from one version,

> > > > presumably caused by a bug) has been that input operands are unique.

> > > >

> > >

> > > Not entirely. I have run into issues where GCC assumes that registers

> > > that are only used for input operands are left untouched by the asm

> > > code. I.e., if you put an asm() block in a loop and modify an input

> > > register, your code may break on the next pass, even if the input

> > > register does not overlap with an output register.

> >

> > GCC has had the expectation for decades that _input_ operands are not

> > changed in value by the code in the assembly.  This isn't quite the

> > same thing as the uniqueness of the register allocation for input

> > operands.

> >

> > > To me, that seems to suggest that whether or not inputs may overlap is

> > > irrelevant, since they are not expected to be modified.

> >

> > How is:

> >

> >         stmfd   sp!, {r0-r3, ip, lr}

> >         bl      foo

> >         ldmfd   sp!, {r0-r3, ip, lr}

> >

> > where r1 may be an input operand (to pass an argument to foo) any

> > different from:

> >

> >         ldrt    r0, [r1]

> >

> > as far as whether r1 is modified in both cases?  In both cases, the

> > value of r1 is read and written by both instructions, but in both

> > cases the value of r1 remains the same no matter what the value of r1

> > was.

> >

> > The "input operands should not be modified" is entirely orthogonal to

> > the input operand register allocation.

> >

> 

> The question is whether it is reasonable for GCC to use the same

> register for input operands that have the same value. From the

> assumption that GCC makes that the asm will not modified follows

> directly that we can use the same register for different operands.

> 

> And in fact, since that asm code (when built in ARM mode) does modify

> the register, uaddr should not be an input operand to begin with. In

> other words, there is an actual bug here, and this patch fixes it.


Again, you miss my point.

> > > > Clang appears to be different: it allows input operands that are

> > > > registers, and contain the same constant value to be the same physical

> > > > register.

> > > >

> > > > The assertion is that the constraints are under-specified.  I am

> > > > questioning that assertion.

> > > >

> > > > If the constraints are under-specified, I would have expected gcc-4.4's

> > > > behaviour to have persisted, and we would've been told by gcc's

> > > > developers to fix our code.  That didn't happen, and instead gcc seems

> > > > to have been fixed.  So, my conclusion is that it is intentional that

> > > > input operands to asm() do not overlap with themselves.

> > > >

> > >

> > > Whether we hit the error or not is not deterministic. Like in the

> > > ilog2() case I quoted, GCC may decide to instantiate a constant folded

> > > ['curried', if you will] clone of a function, and so even if any calls

> > > to futex_atomic_cmpxchg_inatomic() with constant NULL args for newval

> > > and uaddr are compiled, it does not mean they occur like that in the C

> > > code.

> >

> > Again, I think this is different: gcc knows what the C code is doing and

> > can optimise it.  GCC doesn't have any idea what the code in an asm() is

> > doing beyond what the constraints are telling it, and the rules for

> > those constraints set out in the GCC manual.

> >

> > Given that we are explicitly talking about the register allocation for

> > input operands, I'm not sure how the ilog2() case you mention applies.

> >

> 

> The relevance of the ilog2() case is that we are dealing with an

> invocation of the function that never actually occurs in the code. The

> compiler emits it as part of an optimization step, and this is how we

> end up with constant operands for newval and uaddr.

> 

> > > > It seems to me that the work-around for clang is to change every input

> > > > operand to be an output operand with a "+&r" contraint - an operand

> > > > that is both read and written by the "instruction", and that the operand

> > > > is "earlyclobber".  For something that is really only read, that seems

> > > > strange.

> > > >

> > > > Also, reading GCC's manual, it would appear that "+&" is wrong.

> > > >

> > > > `+'

> > > >      Means that this operand is both read and written by the

> > > >      instruction.

> > > >

> > > >      When the compiler fixes up the operands to satisfy the constraints,

> > > >      it needs to know which operands are inputs to the instruction and

> > > >      which are outputs from it.  `=' identifies an output; `+'

> > > >      identifies an operand that is both input and output; all other

> > > >                                    ^^^^^^^^^^^^^^^^^^^^^

> > > >      operands are assumed to be input only.

> > > >

> > > > `&'

> > > >      Means (in a particular alternative) that this operand is an

> > > >      "earlyclobber" operand, which is modified before the instruction is

> > > >      finished using the input operands.  Therefore, this operand may

> > > >      not lie in a register that is used as an input operand or as part

> > > >      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > > >      of any memory address.

> > > >

> > > > So "+" says that this operand is an input but "&" says that it must not

> > > > be in a register that is used as an input.  That's contradictory, and I

> > > > think we can expect GCC to barf or at least end up doing strange stuff,

> > > > if not with existing versions, then with future versions.

> > > >

> > >

> > > I wondered about the same thing: given that the asm itself is a black

> > > box to the compiler, it can never reuse an in/output register for

> > > output, so when it is clobbered is irrelevant.

> >

> > Let me try again - you seem to have completely missed my point.

> >

> > + specifies that the operand is an input.

> > & specifies that the operand is not an input.

> >

> > + and & are contradictory.

> >

> > GCC is at liberty to not assign a value to an operand with a +&

> > modifier, or error out such a construction.

> >

> 

> I agree that the +& does not make sense.

> 

> > >

> > > > Hence, I'm asking for clarification why it is thought that the existing

> > > > code underspecifies the asm constraints, and I'm trying to get some more

> > > > thought about what the constraints should be, in case there is a need to

> > > > use "better" constraints.

> > >

> > > I think the constraints are correct, but as I argued before,

> > > tightening the constraints to ensure that uaddr and newval are not

> > > mapped onto the same register should not result in any object code

> > > changes, except for the case where the compiler instantiated a

> > > constprop clone that is bogus to begin with.

> >

> > ... by tightening it to an undefined combination of constraint modifiers

> > that just happens to seem to do the right thing.  No, this is not proper

> > "engineering".  This is bodging.

> >

> 

> As I argued above, using an input operand for uaddr is incorrect (in

> ARM mode) since the instruction does modify the register. So modulo

> the +&, I think the patch is an improvement.

> 


-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up
Dave Martin March 8, 2019, 11:55 a.m. UTC | #13
oN fRI, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:
> On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin

> <linux@armlinux.org.uk> wrote:

> >

> > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:

> > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> > > <linux@armlinux.org.uk> wrote:

> > > >

> > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > > > <linux@armlinux.org.uk> wrote:

> > > > > >

> > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > > > >

> > > > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > > > >

> > > > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > > > >

> > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > > > as Mikael wrote:

> > > > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > > > >  "that prevents it from overlapping any other input or output."

> > > > > > > >

> > > > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > > > apparently never showed up again with later gcc versions.

> > > > > > > >

> > > > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > > > >

> > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > > > ---

> > > > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > > > >

> > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > > > >         preempt_disable();

> > > > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > > > -       "       teq     %1, %2\n"

> > > > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > > > +       "       teq     %1, %3\n"

> > > > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > > > >         __futex_atomic_ex_table("%5")

> > > > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > > > >         : "cc", "memory");

> > > > > > > >         uaccess_restore(__ua_flags);

> > > > > > >

> > > > > > > Underspecification of constraints to extended inline assembly is a

> > > > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > > > infrequently compiler upgrades).

> > > > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > > > for other reviewers):

> > > > > > > %2 -> %3

> > > > > > > %3 -> %4

> > > > > > > %4 -> %2

> > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > > > >

> > > > > > I don't see what is "underspecified" in the original constraints.

> > > > > > Please explain.

> > > > > >

> > > > >

> > > > > I agree that that statement makes little sense.

> > > > >

> > > > > As Russell points out in the referenced thread, there is nothing wrong

> > > > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > > > unreachable in practice. Unfortunately, we have no way to flag this

> > > > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > > > in the future, considering that the register assignment matches the

> > > > > constraints. (We have seen somewhat similar issues where constant

> > > > > folded function clones are emitted with a constant argument that could

> > > > > never occur in reality [0])

> > > > >

> > > > > Given the above, the only meaningful way to invoke this function is

> > > > > with different registers assigned to %3 and %4, and so tightening the

> > > > > constraints to guarantee that does not actually result in worse code

> > > > > (except maybe for the instantiations that we won't ever call in the

> > > > > first place). So I think we should fix this.

> > > > >

> > > > > I wonder if just adding

> > > > >

> > > > > BUG_ON(__builtin_constant_p(uaddr));

> > > > >

> > > > > at the beginning makes any difference - this shouldn't result in any

> > > > > object code differences since the conditional will always evaluate to

> > > > > false at build time for instantiations we care about.

> > > > >

> > > > >

> > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> > > >

> > > > What I'm actually asking is:

> > > >

> > > > The GCC manual says that input operands _may_ overlap output operands

> > > > since GCC assumes that input operands are consumed before output

> > > > operands are written.  This is an explicit statement.

> > > >

> > > > The GCC manual does not say that input operands may overlap with each

> > > > other, and the behaviour of GCC thus far (apart from one version,

> > > > presumably caused by a bug) has been that input operands are unique.

> > > >

> > >

> > > Not entirely. I have run into issues where GCC assumes that registers

> > > that are only used for input operands are left untouched by the asm

> > > code. I.e., if you put an asm() block in a loop and modify an input

> > > register, your code may break on the next pass, even if the input

> > > register does not overlap with an output register.

> >

> > GCC has had the expectation for decades that _input_ operands are not

> > changed in value by the code in the assembly.  This isn't quite the

> > same thing as the uniqueness of the register allocation for input

> > operands.

> >

> > > To me, that seems to suggest that whether or not inputs may overlap is

> > > irrelevant, since they are not expected to be modified.

> >

> > How is:

> >

> >         stmfd   sp!, {r0-r3, ip, lr}

> >         bl      foo

> >         ldmfd   sp!, {r0-r3, ip, lr}

> >

> > where r1 may be an input operand (to pass an argument to foo) any

> > different from:

> >

> >         ldrt    r0, [r1]

> >

> > as far as whether r1 is modified in both cases?  In both cases, the

> > value of r1 is read and written by both instructions, but in both

> > cases the value of r1 remains the same no matter what the value of r1

> > was.

> >

> > The "input operands should not be modified" is entirely orthogonal to

> > the input operand register allocation.

> >

> 

> The question is whether it is reasonable for GCC to use the same

> register for input operands that have the same value. From the

> assumption that GCC makes that the asm will not modified follows

> directly that we can use the same register for different operands.


Whether "reasonable" or not, GCC does it.  And I don't think this is new
behaviour...

int f(void)
{
	int res;

	asm ("ADD %0, %0, %0" : "=r" (res) : "r" (77), "r" (77));
	
	return res;
}

->

00000000 <f>:
   0:   e3a0004d       mov      r0, #77 ; 0x4d
   4:   e0800000       add      r0, r0, r0
   8:   e12fff1e       bx       lr

> And in fact, since that asm code (when built in ARM mode) does modify

> the register, uaddr should not be an input operand to begin with. In

> other words, there is an actual bug here, and this patch fixes it.


Does the old code modify the register?

As I read it, the register is written (in the ARM case) by the
underlying STRT instruction, but since the post-index offset it 0, the
value written back is the same as the value originally read.

In ARMv7-A,

	strt	r0, [r0], #imm

will store the original (unmodified) value of r0 if I read the
pseudocode correctly.  I can't remember the history, but I think that
older architecture versions provided a choice about whether the
unmodified or modified value was stored.  So gas probably just checks
whether the registers are the same and emits a warning to be on the safe
side.  If #imm is 0 (as in the existing futex code here) then it may
make no difference in practice though.

So, I'm not absolutely convinced there's a bug here, unless this is
truly specified as UNPREDICATABLE in older arch versions.

But the warning it at least annoying and use of "&" to prevent gas
allocating things to the same register is already widespread for "+"
asm arguments, even for arm.

If the _value_ of the affected operand is not changed by the asm, I
think we don't strictly need "+", but we are using "&" here for its
register allocation side-effects, and "&" (for its original purpose at
least) is only applicable to output ("=" or "+") operands.

So I think the patch probably makes sense.

IMHO the gas documentation is misleading (or at least unhelpful).

Cheers
---Dave
Ard Biesheuvel March 8, 2019, 11:55 a.m. UTC | #14
On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin
<linux@armlinux.org.uk> wrote:
>

> On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

> > On Fri, 8 Mar 2019 at 11:34, Russell King - ARM Linux admin

> > <linux@armlinux.org.uk> wrote:

> > >

> > > On Fri, Mar 08, 2019 at 11:08:40AM +0100, Ard Biesheuvel wrote:

> > > > On Fri, 8 Mar 2019 at 10:53, Russell King - ARM Linux admin

> > > > <linux@armlinux.org.uk> wrote:

> > > > >

> > > > > On Fri, Mar 08, 2019 at 09:57:45AM +0100, Ard Biesheuvel wrote:

> > > > > > On Fri, 8 Mar 2019 at 00:49, Russell King - ARM Linux admin

> > > > > > <linux@armlinux.org.uk> wrote:

> > > > > > >

> > > > > > > On Thu, Mar 07, 2019 at 11:39:08AM -0800, Nick Desaulniers wrote:

> > > > > > > > On Thu, Mar 7, 2019 at 1:15 AM Arnd Bergmann <arnd@arndb.de> wrote:

> > > > > > > > >

> > > > > > > > > Passing registers containing zero as both the address (NULL pointer)

> > > > > > > > > and data into cmpxchg_futex_value_locked() leads clang to assign

> > > > > > > > > the same register for both inputs on ARM, which triggers a warning

> > > > > > > > > explaining that this instruction has unpredictable behavior on ARMv5.

> > > > > > > > >

> > > > > > > > > /tmp/futex-7e740e.s: Assembler messages:

> > > > > > > > > /tmp/futex-7e740e.s:12713: Warning: source register same as write-back base

> > > > > > > > >

> > > > > > > > > This patch was suggested by Mikael Pettersson back in 2011 (!) with gcc-4.4,

> > > > > > > > > as Mikael wrote:

> > > > > > > > >  "One way of fixing this is to make uaddr an input/output register, since

> > > > > > > > >  "that prevents it from overlapping any other input or output."

> > > > > > > > >

> > > > > > > > > but then withdrawn as the warning was determined to be harmless, and it

> > > > > > > > > apparently never showed up again with later gcc versions.

> > > > > > > > >

> > > > > > > > > Now the same problem is back when compiling with clang, and we are trying

> > > > > > > > > to get clang to build the kernel without warnings, as gcc normally does.

> > > > > > > > >

> > > > > > > > > Cc: Mikael Pettersson <mikpe@it.uu.se>

> > > > > > > > > Cc: Mikael Pettersson <mikpelinux@gmail.com>

> > > > > > > > > Cc: Dave Martin <Dave.Martin@arm.com>

> > > > > > > > > Link: https://lore.kernel.org/linux-arm-kernel/20009.45690.158286.161591@pilspetsen.it.uu.se/

> > > > > > > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>

> > > > > > > > > ---

> > > > > > > > >  arch/arm/include/asm/futex.h | 10 +++++-----

> > > > > > > > >  1 file changed, 5 insertions(+), 5 deletions(-)

> > > > > > > > >

> > > > > > > > > diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> > > > > > > > > index 0a46676b4245..79790912974e 100644

> > > > > > > > > --- a/arch/arm/include/asm/futex.h

> > > > > > > > > +++ b/arch/arm/include/asm/futex.h

> > > > > > > > > @@ -110,13 +110,13 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

> > > > > > > > >         preempt_disable();

> > > > > > > > >         __ua_flags = uaccess_save_and_enable();

> > > > > > > > >         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

> > > > > > > > > -       "1:     " TUSER(ldr) "  %1, [%4]\n"

> > > > > > > > > -       "       teq     %1, %2\n"

> > > > > > > > > +       "1:     " TUSER(ldr) "  %1, [%2]\n"

> > > > > > > > > +       "       teq     %1, %3\n"

> > > > > > > > >         "       it      eq      @ explicit IT needed for the 2b label\n"

> > > > > > > > > -       "2:     " TUSER(streq) "        %3, [%4]\n"

> > > > > > > > > +       "2:     " TUSER(streq) "        %4, [%2]\n"

> > > > > > > > >         __futex_atomic_ex_table("%5")

> > > > > > > > > -       : "+r" (ret), "=&r" (val)

> > > > > > > > > -       : "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)

> > > > > > > > > +       : "+&r" (ret), "=&r" (val), "+&r" (uaddr)

> > > > > > > > > +       : "r" (oldval), "r" (newval), "Ir" (-EFAULT)

> > > > > > > > >         : "cc", "memory");

> > > > > > > > >         uaccess_restore(__ua_flags);

> > > > > > > >

> > > > > > > > Underspecification of constraints to extended inline assembly is a

> > > > > > > > common issue exposed by other compilers (and possibly but in-effect

> > > > > > > > infrequently compiler upgrades).

> > > > > > > > So the reordering of the constraints means the in the assembly (notes

> > > > > > > > for other reviewers):

> > > > > > > > %2 -> %3

> > > > > > > > %3 -> %4

> > > > > > > > %4 -> %2

> > > > > > > > Yep, looks good to me, thanks for finding this old patch and resending, Arnd!

> > > > > > >

> > > > > > > I don't see what is "underspecified" in the original constraints.

> > > > > > > Please explain.

> > > > > > >

> > > > > >

> > > > > > I agree that that statement makes little sense.

> > > > > >

> > > > > > As Russell points out in the referenced thread, there is nothing wrong

> > > > > > with the generated assembly, given that the UNPREDICTABLE opcode is

> > > > > > unreachable in practice. Unfortunately, we have no way to flag this

> > > > > > diagnostic as a known false positive, and AFAICT, there is no reason

> > > > > > we couldn't end up with the same diagnostic popping up for GCC builds

> > > > > > in the future, considering that the register assignment matches the

> > > > > > constraints. (We have seen somewhat similar issues where constant

> > > > > > folded function clones are emitted with a constant argument that could

> > > > > > never occur in reality [0])

> > > > > >

> > > > > > Given the above, the only meaningful way to invoke this function is

> > > > > > with different registers assigned to %3 and %4, and so tightening the

> > > > > > constraints to guarantee that does not actually result in worse code

> > > > > > (except maybe for the instantiations that we won't ever call in the

> > > > > > first place). So I think we should fix this.

> > > > > >

> > > > > > I wonder if just adding

> > > > > >

> > > > > > BUG_ON(__builtin_constant_p(uaddr));

> > > > > >

> > > > > > at the beginning makes any difference - this shouldn't result in any

> > > > > > object code differences since the conditional will always evaluate to

> > > > > > false at build time for instantiations we care about.

> > > > > >

> > > > > >

> > > > > > [0] https://lore.kernel.org/lkml/9c74d635-d0d1-0893-8093-ce20b0933fc7@redhat.com/

> > > > >

> > > > > What I'm actually asking is:

> > > > >

> > > > > The GCC manual says that input operands _may_ overlap output operands

> > > > > since GCC assumes that input operands are consumed before output

> > > > > operands are written.  This is an explicit statement.

> > > > >

> > > > > The GCC manual does not say that input operands may overlap with each

> > > > > other, and the behaviour of GCC thus far (apart from one version,

> > > > > presumably caused by a bug) has been that input operands are unique.

> > > > >

> > > >

> > > > Not entirely. I have run into issues where GCC assumes that registers

> > > > that are only used for input operands are left untouched by the asm

> > > > code. I.e., if you put an asm() block in a loop and modify an input

> > > > register, your code may break on the next pass, even if the input

> > > > register does not overlap with an output register.

> > >

> > > GCC has had the expectation for decades that _input_ operands are not

> > > changed in value by the code in the assembly.  This isn't quite the

> > > same thing as the uniqueness of the register allocation for input

> > > operands.

> > >

> > > > To me, that seems to suggest that whether or not inputs may overlap is

> > > > irrelevant, since they are not expected to be modified.

> > >

> > > How is:

> > >

> > >         stmfd   sp!, {r0-r3, ip, lr}

> > >         bl      foo

> > >         ldmfd   sp!, {r0-r3, ip, lr}

> > >

> > > where r1 may be an input operand (to pass an argument to foo) any

> > > different from:

> > >

> > >         ldrt    r0, [r1]

> > >

> > > as far as whether r1 is modified in both cases?  In both cases, the

> > > value of r1 is read and written by both instructions, but in both

> > > cases the value of r1 remains the same no matter what the value of r1

> > > was.

> > >

> > > The "input operands should not be modified" is entirely orthogonal to

> > > the input operand register allocation.

> > >

> >

> > The question is whether it is reasonable for GCC to use the same

> > register for input operands that have the same value. From the

> > assumption that GCC makes that the asm will not modified follows

> > directly that we can use the same register for different operands.

> >

> > And in fact, since that asm code (when built in ARM mode) does modify

> > the register, uaddr should not be an input operand to begin with. In

> > other words, there is an actual bug here, and this patch fixes it.

>

> Again, you miss my point.

>


Perhaps. So let me summarize what I do understand.

1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*
with the same compile time constant value of 0x0 for newval and uaddr,
we end up with an opcode for the STRT instruction that is CONSTRAINED
UNPREDICTABLE, but we will never execute it since the preceding load
will fault and enter the fixup handler.
2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to
occur in the code, but may be instantiated by the compiler when doing
constant propagation (like in the ilog2() case I quoted), but these
instantiations will never be called
3) both clang and gcc may map different inline asm input operands onto
the same register if the value is guaranteed to be the same (i.e.,
they are both compile time constants)

My statement about uaddr was slightly misguided, in the sense that our
invocation of STRT does use the post-index variant, but with an
increment of zero, so the value written back to the register equals
the original value. But it does explain why this opcode is CONSTRAINED
UNPREDICTABLE in the first place.

Given 2) above, I wonder if anyone could confirm whether adding
'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.
Arnd Bergmann March 11, 2019, 2:34 p.m. UTC | #15
On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:

> > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:


>

> Perhaps. So let me summarize what I do understand.

>

> 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*

> with the same compile time constant value of 0x0 for newval and uaddr,

> we end up with an opcode for the STRT instruction that is CONSTRAINED

> UNPREDICTABLE, but we will never execute it since the preceding load

> will fault and enter the fixup handler.

> 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to

> occur in the code, but may be instantiated by the compiler when doing

> constant propagation (like in the ilog2() case I quoted), but these

> instantiations will never be called

> 3) both clang and gcc may map different inline asm input operands onto

> the same register if the value is guaranteed to be the same (i.e.,

> they are both compile time constants)

>

> My statement about uaddr was slightly misguided, in the sense that our

> invocation of STRT does use the post-index variant, but with an

> increment of zero, so the value written back to the register equals

> the original value. But it does explain why this opcode is CONSTRAINED

> UNPREDICTABLE in the first place.

>

> Given 2) above, I wonder if anyone could confirm whether adding

> 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.


Like this?

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 0a46676b4245..e6e9b403cd61 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
        /* Prefetching cannot fault */
        prefetchw(uaddr);
        __ua_flags = uaccess_save_and_enable();
+       BUG_ON(__builtin_constant_p(uaddr) || !uaddr);
        __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
        "1:     ldrex   %1, [%4]\n"
        "       teq     %1, %2\n"

This had no effect here.

My first attempt (before finding the original patch from Mikael Pettersson)
was to change the probe to pass '1' as the value instead of '0', that
worked fine.

     Arnd
Ard Biesheuvel March 11, 2019, 2:36 p.m. UTC | #16
On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote:
>

> On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel

> <ard.biesheuvel@linaro.org> wrote:

> > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:

> > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

>

> >

> > Perhaps. So let me summarize what I do understand.

> >

> > 1) if futex_atomic_cmpxchg_inatomic() is instantiated *and executed*

> > with the same compile time constant value of 0x0 for newval and uaddr,

> > we end up with an opcode for the STRT instruction that is CONSTRAINED

> > UNPREDICTABLE, but we will never execute it since the preceding load

> > will fault and enter the fixup handler.

> > 2) such occurrences of futex_atomic_cmpxchg_inatomic() are unlikely to

> > occur in the code, but may be instantiated by the compiler when doing

> > constant propagation (like in the ilog2() case I quoted), but these

> > instantiations will never be called

> > 3) both clang and gcc may map different inline asm input operands onto

> > the same register if the value is guaranteed to be the same (i.e.,

> > they are both compile time constants)

> >

> > My statement about uaddr was slightly misguided, in the sense that our

> > invocation of STRT does use the post-index variant, but with an

> > increment of zero, so the value written back to the register equals

> > the original value. But it does explain why this opcode is CONSTRAINED

> > UNPREDICTABLE in the first place.

> >

> > Given 2) above, I wonder if anyone could confirm whether adding

> > 'BUG_ON(__builtin_constant_p(uaddr))' silences the warning as well.

>

> Like this?

>

> diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h

> index 0a46676b4245..e6e9b403cd61 100644

> --- a/arch/arm/include/asm/futex.h

> +++ b/arch/arm/include/asm/futex.h

> @@ -57,6 +57,7 @@ futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,

>         /* Prefetching cannot fault */

>         prefetchw(uaddr);

>         __ua_flags = uaccess_save_and_enable();

> +       BUG_ON(__builtin_constant_p(uaddr) || !uaddr);

>         __asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"

>         "1:     ldrex   %1, [%4]\n"

>         "       teq     %1, %2\n"

>

> This had no effect here.

>

> My first attempt (before finding the original patch from Mikael Pettersson)

> was to change the probe to pass '1' as the value instead of '0', that

> worked fine.

>


Which probe is that?
Arnd Bergmann March 11, 2019, 4:29 p.m. UTC | #17
On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote:

> > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel

> > <ard.biesheuvel@linaro.org> wrote:

> > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:

> > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

> >

> > My first attempt (before finding the original patch from Mikael Pettersson)

> > was to change the probe to pass '1' as the value instead of '0', that

> > worked fine.

> >

>

> Which probe is that?


diff --git a/kernel/futex.c b/kernel/futex.c
index c3b73b0311bc..19615ad3c4f7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)
         * implementation, the non-functional ones will return
         * -ENOSYS.
         */
-       if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)
+       if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)
                futex_cmpxchg_enabled = 1;
 #endif
 }

      Arnd
Ard Biesheuvel March 11, 2019, 4:36 p.m. UTC | #18
On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <arnd@arndb.de> wrote:
>

> On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel

> <ard.biesheuvel@linaro.org> wrote:

> > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote:

> > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel

> > > <ard.biesheuvel@linaro.org> wrote:

> > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:

> > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

> > >

> > > My first attempt (before finding the original patch from Mikael Pettersson)

> > > was to change the probe to pass '1' as the value instead of '0', that

> > > worked fine.

> > >

> >

> > Which probe is that?

>

> diff --git a/kernel/futex.c b/kernel/futex.c

> index c3b73b0311bc..19615ad3c4f7 100644

> --- a/kernel/futex.c

> +++ b/kernel/futex.c

> @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)

>          * implementation, the non-functional ones will return

>          * -ENOSYS.

>          */

> -       if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)

> +       if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)

>                 futex_cmpxchg_enabled = 1;

>  #endif

>  }

>


Ah ok.

That explains a lot.

Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check?

Note that PA-RISC has

        /* futex.c wants to do a cmpxchg_inatomic on kernel NULL, which is
         * our gateway page, and causes no end of trouble...
         */
        if (uaccess_kernel() && !uaddr)
                return -EFAULT;
Arnd Bergmann March 11, 2019, 8:58 p.m. UTC | #19
On Mon, Mar 11, 2019 at 5:36 PM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
>

> On Mon, 11 Mar 2019 at 17:30, Arnd Bergmann <arnd@arndb.de> wrote:

> >

> > On Mon, Mar 11, 2019 at 3:36 PM Ard Biesheuvel

> > <ard.biesheuvel@linaro.org> wrote:

> > > On Mon, 11 Mar 2019 at 15:34, Arnd Bergmann <arnd@arndb.de> wrote:

> > > > On Fri, Mar 8, 2019 at 12:56 PM Ard Biesheuvel

> > > > <ard.biesheuvel@linaro.org> wrote:

> > > > > On Fri, 8 Mar 2019 at 11:58, Russell King - ARM Linux admin <linux@armlinux.org.uk> wrote:

> > > > > > On Fri, Mar 08, 2019 at 11:45:21AM +0100, Ard Biesheuvel wrote:

> > > >

> > > > My first attempt (before finding the original patch from Mikael Pettersson)

> > > > was to change the probe to pass '1' as the value instead of '0', that

> > > > worked fine.

> > > >

> > >

> > > Which probe is that?

> >

> > diff --git a/kernel/futex.c b/kernel/futex.c

> > index c3b73b0311bc..19615ad3c4f7 100644

> > --- a/kernel/futex.c

> > +++ b/kernel/futex.c

> > @@ -3864,7 +3864,7 @@ static void __init futex_detect_cmpxchg(void)

> >          * implementation, the non-functional ones will return

> >          * -ENOSYS.

> >          */

> > -       if (cmpxchg_futex_value_locked(&curval, NULL, 0, 0) == -EFAULT)

> > +       if (cmpxchg_futex_value_locked(&curval, NULL, 1, 1) == -EFAULT)

> >                 futex_cmpxchg_enabled = 1;

> >  #endif

> >  }

> >

>

> Ah ok.

>

> That explains a lot.

>

> Can't we just return -EFAULT if uaddr is NULL? Or does that defeat this check?


I think that would work here, it would just create a tiny overhead
for each call to futex_atomic_cmpxchg_inatomic().

Semi-related side note:
After I looked at access_ok() for a bit too long, I tried replacing it with

#define access_ok(addr, size) \
       (((u64)(uintptr_t)addr + (u64)(size_t)size) >=
current_thread_info()->addr_limit)

which interestingly seemed to improve the output with clang (it lets it
combine multiple access_ok() checks and schedule the instructions better,
compared to our inline asm implementation), but it unfortunately creates
horrible code with gcc.

      Arnd
diff mbox series

Patch

diff --git a/arch/arm/include/asm/futex.h b/arch/arm/include/asm/futex.h
index 0a46676b4245..79790912974e 100644
--- a/arch/arm/include/asm/futex.h
+++ b/arch/arm/include/asm/futex.h
@@ -110,13 +110,13 @@  futex_atomic_cmpxchg_inatomic(u32 *uval, u32 __user *uaddr,
 	preempt_disable();
 	__ua_flags = uaccess_save_and_enable();
 	__asm__ __volatile__("@futex_atomic_cmpxchg_inatomic\n"
-	"1:	" TUSER(ldr) "	%1, [%4]\n"
-	"	teq	%1, %2\n"
+	"1:	" TUSER(ldr) "	%1, [%2]\n"
+	"	teq	%1, %3\n"
 	"	it	eq	@ explicit IT needed for the 2b label\n"
-	"2:	" TUSER(streq) "	%3, [%4]\n"
+	"2:	" TUSER(streq) "	%4, [%2]\n"
 	__futex_atomic_ex_table("%5")
-	: "+r" (ret), "=&r" (val)
-	: "r" (oldval), "r" (newval), "r" (uaddr), "Ir" (-EFAULT)
+	: "+&r" (ret), "=&r" (val), "+&r" (uaddr)
+	: "r" (oldval), "r" (newval), "Ir" (-EFAULT)
 	: "cc", "memory");
 	uaccess_restore(__ua_flags);