diff mbox series

x86/smap: Fix the smap_save() asm

Message ID 9f513eef618b6e72a088cc8b2787496f190d1c2d.1600203307.git.luto@kernel.org
State New
Headers show
Series x86/smap: Fix the smap_save() asm | expand

Commit Message

Andy Lutomirski Sept. 15, 2020, 8:55 p.m. UTC
The old smap_save() code was:

  pushf
  pop %0

with %0 defined by an "=rm" constraint.  This is fine if the
compiler picked the register option, but it was incorrect with an
%rsp-relative memory operand.  With some intentional abuse, I can
get both gcc and clang to generate code along these lines:

  pushfq
  popq 0x8(%rsp)
  mov 0x8(%rsp),%rax

which is incorrect and will not work as intended.

Fix it by removing the memory option.  This issue is exacerbated by
a clang optimization bug:

  https://bugs.llvm.org/show_bug.cgi?id=47530

Fixes: e74deb11931f ("x86/uaccess: Introduce user_access_{save,restore}()")
Cc: stable@vger.kernel.org
Reported-by: Bill Wendling <morbo@google.com> # I think
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/include/asm/smap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Nick Desaulniers Sept. 15, 2020, 9:24 p.m. UTC | #1
On Tue, Sep 15, 2020 at 1:56 PM Andy Lutomirski <luto@kernel.org> wrote:
>

> The old smap_save() code was:

>

>   pushf

>   pop %0

>

> with %0 defined by an "=rm" constraint.  This is fine if the

> compiler picked the register option, but it was incorrect with an

> %rsp-relative memory operand.


It is incorrect because ... (I think mentioning the point about the
red zone would be good, unless there were additional concerns?)

The patch should be fine, so

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>


regardless of whether or not you choose to amend the commit message.
I suspect that the use of "r" exclusively without "m" could lead to
register exhaustion (though it's more likely the compiler will spill
some other variable to stack), which is why it's common to use it in
conjunction with "m."  As Bill's patch notes, getting the "m" version
is poor for performance of this pattern, or at least for
native_{save|restore}_fl.  Being able to use the compiler builtins is
another possibility here.

> With some intentional abuse, I can

> get both gcc and clang to generate code along these lines:

>

>   pushfq

>   popq 0x8(%rsp)

>   mov 0x8(%rsp),%rax

>

> which is incorrect and will not work as intended.

>

> Fix it by removing the memory option.  This issue is exacerbated by

> a clang optimization bug:

>

>   https://bugs.llvm.org/show_bug.cgi?id=47530


This is something we should fix.  Bill, James, and I are discussing
this internally.  Thank you for filing a bug; I owe you a beer just
for that.

>

> Fixes: e74deb11931f ("x86/uaccess: Introduce user_access_{save,restore}()")

> Cc: stable@vger.kernel.org

> Reported-by: Bill Wendling <morbo@google.com> # I think


LOL, yes, the comment can be dropped...though I guess someone else may
have reported the problem to Bill?

> Signed-off-by: Andy Lutomirski <luto@kernel.org>

> ---

>  arch/x86/include/asm/smap.h | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

>

> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h

> index 8b58d6975d5d..be6d675ae3ac 100644

> --- a/arch/x86/include/asm/smap.h

> +++ b/arch/x86/include/asm/smap.h

> @@ -61,7 +61,7 @@ static __always_inline unsigned long smap_save(void)

>                       ALTERNATIVE("jmp 1f", "", X86_FEATURE_SMAP)

>                       "pushf; pop %0; " __ASM_CLAC "\n\t"

>                       "1:"

> -                     : "=rm" (flags) : : "memory", "cc");

> +                     : "=r" (flags) : : "memory", "cc");

>

>         return flags;

>  }

> --

> 2.26.2

>



-- 
Thanks,
~Nick Desaulniers
Bill Wendling Sept. 15, 2020, 9:31 p.m. UTC | #2
On Tue, Sep 15, 2020 at 2:24 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
> On Tue, Sep 15, 2020 at 1:56 PM Andy Lutomirski <luto@kernel.org> wrote:

> > Fixes: e74deb11931f ("x86/uaccess: Introduce user_access_{save,restore}()")

> > Cc: stable@vger.kernel.org

> > Reported-by: Bill Wendling <morbo@google.com> # I think

>

> LOL, yes, the comment can be dropped...though I guess someone else may

> have reported the problem to Bill?

>

I found this instance, but not the general issue. :-)

-bw
Andy Lutomirski Sept. 15, 2020, 11:11 p.m. UTC | #3
> On Sep 15, 2020, at 2:24 PM, Nick Desaulniers <ndesaulniers@google.com> wrote:

> 

> On Tue, Sep 15, 2020 at 1:56 PM Andy Lutomirski <luto@kernel.org> wrote:

>> 

>> The old smap_save() code was:

>> 

>>  pushf

>>  pop %0

>> 

>> with %0 defined by an "=rm" constraint.  This is fine if the

>> compiler picked the register option, but it was incorrect with an

>> %rsp-relative memory operand.

> 

> It is incorrect because ... (I think mentioning the point about the

> red zone would be good, unless there were additional concerns?)


This isn’t a red zone issue — it’s a just-plain-wrong issue.  The popf is storing the result in the wrong place in memory — it’s RSP-relative, but RSP is whatever the compiler thinks it should be minus 8, because the compiler doesn’t know that pushfq changed RSP.

> 

> This is something we should fix.  Bill, James, and I are discussing

> this internally.  Thank you for filing a bug; I owe you a beer just

> for that.


I’m looking forward to the day that beers can be exchanged in person again :)

> 

>> 

>> Fixes: e74deb11931f ("x86/uaccess: Introduce user_access_{save,restore}()")

>> Cc: stable@vger.kernel.org

>> Reported-by: Bill Wendling <morbo@google.com> # I think

> 

> LOL, yes, the comment can be dropped...though I guess someone else may

> have reported the problem to Bill?


The “I think” is because I’m not sure whether Bill reported this particular issue. But I’m fine with dropping it.
Bill Wendling Sept. 15, 2020, 11:43 p.m. UTC | #4
On Tue, Sep 15, 2020 at 4:40 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>

> On 16/09/2020 00:11, Andy Lutomirski wrote:

> >> On Sep 15, 2020, at 2:24 PM, Nick Desaulniers <ndesaulniers@google.com> wrote:

> >>

> >> On Tue, Sep 15, 2020 at 1:56 PM Andy Lutomirski <luto@kernel.org> wrote:

> >>> The old smap_save() code was:

> >>>

> >>>  pushf

> >>>  pop %0

> >>>

> >>> with %0 defined by an "=rm" constraint.  This is fine if the

> >>> compiler picked the register option, but it was incorrect with an

> >>> %rsp-relative memory operand.

> >> It is incorrect because ... (I think mentioning the point about the

> >> red zone would be good, unless there were additional concerns?)

> > This isn’t a red zone issue — it’s a just-plain-wrong issue.  The popf is storing the result in the wrong place in memory — it’s RSP-relative, but RSP is whatever the compiler thinks it should be minus 8, because the compiler doesn’t know that pushfq changed RSP.

>

> It's worse than that.  Even when stating that %rsp is modified in the

> asm, the generated code sequence is still buggy, for recent Clang and GCC.

>

> https://godbolt.org/z/ccz9v7

>

> It's clearly not safe to ever use memory operands with pushf/popf asm

> fragments.

>

Would this apply to native_save_fl() and native_restore_fl in
arch/x86/include/asm/irqflags.h? It was like that two revisions ago,
but it was changed (back) to "=rm" with a comment about it being safe.

> >> This is something we should fix.  Bill, James, and I are discussing

> >> this internally.  Thank you for filing a bug; I owe you a beer just

> >> for that.

> > I’m looking forward to the day that beers can be exchanged in person again :)

>

> +1 to that.

>

+100

-bw
Peter Zijlstra Sept. 16, 2020, 7:28 a.m. UTC | #5
On Tue, Sep 15, 2020 at 01:55:58PM -0700, Andy Lutomirski wrote:
> The old smap_save() code was:

> 

>   pushf

>   pop %0

> 

> with %0 defined by an "=rm" constraint.  This is fine if the

> compiler picked the register option, but it was incorrect with an

> %rsp-relative memory operand.  With some intentional abuse, I can

> get both gcc and clang to generate code along these lines:

> 

>   pushfq

>   popq 0x8(%rsp)

>   mov 0x8(%rsp),%rax

> 

> which is incorrect and will not work as intended.


We need another constraint :-)

> Fix it by removing the memory option.  This issue is exacerbated by

> a clang optimization bug:

> 

>   https://bugs.llvm.org/show_bug.cgi?id=47530

> 

> Fixes: e74deb11931f ("x86/uaccess: Introduce user_access_{save,restore}()")

> Cc: stable@vger.kernel.org

> Reported-by: Bill Wendling <morbo@google.com> # I think

> Signed-off-by: Andy Lutomirski <luto@kernel.org>

> ---

>  arch/x86/include/asm/smap.h | 2 +-

>  1 file changed, 1 insertion(+), 1 deletion(-)

> 

> diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h

> index 8b58d6975d5d..be6d675ae3ac 100644

> --- a/arch/x86/include/asm/smap.h

> +++ b/arch/x86/include/asm/smap.h

> @@ -61,7 +61,7 @@ static __always_inline unsigned long smap_save(void)

>  		      ALTERNATIVE("jmp 1f", "", X86_FEATURE_SMAP)

>  		      "pushf; pop %0; " __ASM_CLAC "\n\t"

>  		      "1:"

> -		      : "=rm" (flags) : : "memory", "cc");

> +		      : "=r" (flags) : : "memory", "cc");


native_save_fl() has the exact same code; you'll need to fix both.
Borislav Petkov Sept. 16, 2020, 8:26 a.m. UTC | #6
On Wed, Sep 16, 2020 at 12:40:30AM +0100, Andrew Cooper wrote:
> It's worse than that.  Even when stating that %rsp is modified in the

> asm, the generated code sequence is still buggy, for recent Clang and GCC.

> 

> https://godbolt.org/z/ccz9v7

> 

> It's clearly not safe to ever use memory operands with pushf/popf asm

> fragments.


So I went and singlestepped your snippet in gdb. And it all seems to
work - it is simply a bit confusing: :-)

eflags         0x246               [ PF ZF IF ]

=> 0x000055555555505d <main+13>:        9c      pushfq
0x7fffffffe440: 0x00007fffffffe540      0x0000000000000000
0x7fffffffe450: 0x0000000000000000      0x00007ffff7e0ecca
0x7fffffffe460: 0x00007fffffffe548      0x00000001ffffe7c9
0x7fffffffe470: 0x0000555555555050      0x00007ffff7e0e8f8
0x7fffffffe480: 0x0000000000000000      0x0c710afd7e78681b

those lines under the "=>" line are the stack contents printed with

$ x/10gx $sp

Then, we will pop into 0x8(%rsp):

=> 0x55555555505e <main+14>:    popq   0x8(%rsp)
0x7fffffffe438: 0x0000000000000346      0x00007fffffffe540
0x7fffffffe448: 0x0000000000000000      0x0000000000000000
0x7fffffffe458: 0x00007ffff7e0ecca      0x00007fffffffe548
0x7fffffffe468: 0x00000001ffffe7c9      0x0000555555555050
0x7fffffffe478: 0x00007ffff7e0e8f8      0x0000000000000000

Now, POP copies the value pointed to by %rsp, *increments* the stack
pointer and *then* computes the effective address of the operand. It
says so in the SDM too (thanks peterz!):

"If the ESP register is used as a base register for addressing a
destination operand in memory, the POP instruction computes the
effective address of the operand after it increments the ESP register."

*That*s why, FLAGS is in 0x7fffffffe448! which is %rsp + 8.

Basically flags is there *twice* on the stack:

(gdb) x/10x 0x7fffffffe438
0x7fffffffe438: 0x0000000000000346      0x00007fffffffe540
		^^^^^^^^^^^^^^^^^^
0x7fffffffe448: 0x0000000000000346      0x0000000000000000
		^^^^^^^^^^^^^^^^^^
0x7fffffffe458: 0x00007ffff7e0ecca      0x00007fffffffe548
0x7fffffffe468: 0x00000001ffffe7c9      0x0000555555555050
0x7fffffffe478: 0x00007ffff7e0e8f8      0x0000000000000000

and now we read the second copy into %rsi.

=> 0x555555555062 <main+18>:    mov    0x8(%rsp),%rsi
0x7fffffffe440: 0x00007fffffffe540      0x0000000000000346
0x7fffffffe450: 0x0000000000000000      0x00007ffff7e0ecca
0x7fffffffe460: 0x00007fffffffe548      0x00000001ffffe7c9
0x7fffffffe470: 0x0000555555555050      0x00007ffff7e0e8f8
0x7fffffffe480: 0x0000000000000000      0x0c710afd7e78681b

Looks like it works as designed.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Sept. 17, 2020, 6:04 a.m. UTC | #7
On Wed, Sep 16, 2020 at 11:48:42PM +0100, Andrew Cooper wrote:
> Every day is a school day.


Tell me about it...

> This is very definitely one to be filed in obscure x86 corner cases.

> 

> The code snippet above is actually wrong for the kernel, as it uses one

> slot of the red-zone.  Recompiling with -mno-red-zone makes something

> which looks safe stack-wise, give or take this behaviour.


Right, we recently disabled red zone in the early decompression stage,
for SEV-ES:

https://git.kernel.org/tip/6ba0efa46047936afa81460489cfd24bc95dd863

I probably should go audit that for similar funnies:

$ objdump -d arch/x86/boot/compressed/vmlinux | grep -E "pop.*\(%[er]?sp"
$

Nope, nothing. Because building your snippet with:

$ gcc -Wall -O2 -mno-red-zone -o flags{,.c}

still does use that one slot:

0000000000001050 <main>:
    1050:       48 83 ec 18             sub    $0x18,%rsp
    1054:       48 8d 3d a9 0f 00 00    lea    0xfa9(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    105b:       31 c0                   xor    %eax,%eax
    105d:       9c                      pushfq
    105e:       8f 44 24 08             popq   0x8(%rsp)
    1062:       48 8b 74 24 08          mov    0x8(%rsp),%rsi

Wonder if that flag -mno-red-zone even does anything...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Sept. 17, 2020, 11:57 a.m. UTC | #8
On Thu, Sep 17, 2020 at 10:05:37AM +0000, David Laight wrote:
> The 'red-zone' allows leaf function to use stack memory for locals

> that is below (ie the wrong side of) the stack pointer.


After looking at

"Figure 3.3: Stack Frame with Base Pointer"

in the x86-64 ABI doc, you're probably right:

0(%rbp)
-8(%rbp)
...
0(%rsp)
.. red zone
-128(%rsp)

i.e., rsp-relative addresses with negative offsets are in the red zone.
So it looks like the compiler actually knows very well what's going on
here and allocates room on the stack for that 0x8(%rsp) slot.

Good.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Borislav Petkov Sept. 17, 2020, 2:39 p.m. UTC | #9
On Thu, Sep 17, 2020 at 02:25:50PM +0000, David Laight wrote:
> I actually wonder if there is any code that really benefits from

> the red-zone.


The kernel has been without a red zone since 2002 at least:

  commit 47f16da277d10ef9494f3e9da2a9113bb22bcd75
  Author: Andi Kleen <ak@muc.de>
  Date:   Tue Feb 12 20:17:35 2002 -0800

      [PATCH] x86_64 merge: arch + asm

      This adds the x86_64 arch and asm directories and a Documentation/x86_64.

  ...
  +CFLAGS += $(shell if $(CC) -mno-red-zone -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mno-red-zone"; fi )


Also, from the ABI doc:

"A.2.2 Stack Layout

The Linux kernel may align the end of the input argument area to a
8, instead of 16, byte boundary. It does not honor the red zone (see
section 3.2.2) and therefore this area is not allowed to be used by
kernel code. Kernel code should be compiled by GCC with the option
-mno-red-zone."

so forget the red zone.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Andy Lutomirski Sept. 17, 2020, 4:32 p.m. UTC | #10
On Thu, Sep 17, 2020 at 7:39 AM Borislav Petkov <bp@alien8.de> wrote:
>

> On Thu, Sep 17, 2020 at 02:25:50PM +0000, David Laight wrote:

> > I actually wonder if there is any code that really benefits from

> > the red-zone.

>

> The kernel has been without a red zone since 2002 at least:

>

>   commit 47f16da277d10ef9494f3e9da2a9113bb22bcd75

>   Author: Andi Kleen <ak@muc.de>

>   Date:   Tue Feb 12 20:17:35 2002 -0800

>

>       [PATCH] x86_64 merge: arch + asm

>

>       This adds the x86_64 arch and asm directories and a Documentation/x86_64.

>

>   ...

>   +CFLAGS += $(shell if $(CC) -mno-red-zone -S -o /dev/null -xc /dev/null >/dev/null 2>&1; then echo "-mno-red-zone"; fi )

>

>

> Also, from the ABI doc:

>

> "A.2.2 Stack Layout

>

> The Linux kernel may align the end of the input argument area to a

> 8, instead of 16, byte boundary. It does not honor the red zone (see

> section 3.2.2) and therefore this area is not allowed to be used by

> kernel code. Kernel code should be compiled by GCC with the option

> -mno-red-zone."

>

> so forget the red zone.

>

> --

> Regards/Gruss,

>     Boris.

>

> https://people.kernel.org/tglx/notes-about-netiquette


Regardless of anything that any docs may or may not say, the kernel
*can't* use a redzone -- an interrupt would overwrite it.
diff mbox series

Patch

diff --git a/arch/x86/include/asm/smap.h b/arch/x86/include/asm/smap.h
index 8b58d6975d5d..be6d675ae3ac 100644
--- a/arch/x86/include/asm/smap.h
+++ b/arch/x86/include/asm/smap.h
@@ -61,7 +61,7 @@  static __always_inline unsigned long smap_save(void)
 		      ALTERNATIVE("jmp 1f", "", X86_FEATURE_SMAP)
 		      "pushf; pop %0; " __ASM_CLAC "\n\t"
 		      "1:"
-		      : "=rm" (flags) : : "memory", "cc");
+		      : "=r" (flags) : : "memory", "cc");
 
 	return flags;
 }