[ARM] Fix PR85434: spill of stack protector's guard address

Message ID CAKnkMGureVC-Wf09yG1x70h+LTfh+oQt_e-6nU8P8xkAi2PYAQ@mail.gmail.com
State New
Headers show
Series
  • [ARM] Fix PR85434: spill of stack protector's guard address
Related show

Commit Message

Thomas Preudhomme April 27, 2018, 11:32 p.m.
On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
loading its address first before loading its value from it as part of
the stack_protect_set or stack_protect_check insn pattern. This creates
the risk of spilling between the two.

It is particularly likely on Aarch32 when compiling PIC code because
- computing the address takes several instructions (first compute the
  GOT base and then the GOT entry by adding an offset) which increases
  the likelyhood of CSE
- the address computation can be CSEd due to the GOT entry computation
  being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC
  of a MEM like on AArche64.

This patch address both issues by (i) adding some scheduler barriers
around the stack protector code and (ii) making all memory loads
involved in computing the guard's address volatile. The use of volatile
rather than unspec was chosen so that the patterns for computing the
guard address can be the same as for normal global variable access thus
reusing more code. Finally the patch also improves the documentation to
mention the need to be careful when computing the address of the guard.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-27  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

        PR target/85434
        * cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around
        stack protector code.
        * function.c (stack_protect_epilogue): Likewise.
        * config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare.
        * config/arm/arm.md (calculate_pic_address): Mark memory volatile if
        is computing address of stack protector's guard.
        (calculate_pic_address splitter): Likewise.
        * config/arm/arm.c (require_pic_register): Add parameter to control
        whether to insert instruction at the end of the instruction stream.
        (legitimize_pic_address): Force computing PIC address at the end of
        instruction stream and adapt logic to change in calculate_pic_address
        insn pattern.
        (arm_stack_chk_guard_decl_p): New function.
        (arm_emit_call_insn): Adapt to change in require_pic_register().
        * target.def (TARGET_STACK_PROTECT_GUARD): Document requirement on
        guard's address computation to be careful about not spilling.
        * doc/tm.texi: Regenerate.

*** gcc/testsuite/ChangeLog ***

2018-04-27  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

        PR target/85434
        * gcc.target/arm/pr85434.c: New testcase.

Testing: The code has been boostraped on an Armv8-A machine targeting:
- Aarch32 ARM mode with -mfpu=neon-fpv4 and hardfloat ABI
- Aarch64
Testsuite has been run for the following sets of flag:

- arm-eabi-aem/-mthumb/-march=armv4t
- arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
- arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard

(thereby testing the code for ARM, Thumb-2 and Thumb-1 mode) without any
regression.

Is it ok for trunk?

Best regards,

Thomas

Comments

Richard Sandiford April 28, 2018, 11:57 a.m. | #1
Thomas Preudhomme <thomas.preudhomme@linaro.org> writes:
> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by

> loading its address first before loading its value from it as part of

> the stack_protect_set or stack_protect_check insn pattern. This creates

> the risk of spilling between the two.

>

> It is particularly likely on Aarch32 when compiling PIC code because

> - computing the address takes several instructions (first compute the

>   GOT base and then the GOT entry by adding an offset) which increases

>   the likelyhood of CSE

> - the address computation can be CSEd due to the GOT entry computation

>   being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC

>   of a MEM like on AArche64.

>

> This patch address both issues by (i) adding some scheduler barriers

> around the stack protector code and (ii) making all memory loads

> involved in computing the guard's address volatile. The use of volatile

> rather than unspec was chosen so that the patterns for computing the

> guard address can be the same as for normal global variable access thus

> reusing more code. Finally the patch also improves the documentation to

> mention the need to be careful when computing the address of the guard.


[...]

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

> index deab929..c7ced8f 100644

> --- a/gcc/cfgexpand.c

> +++ b/gcc/cfgexpand.c

> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)

>    tree guard_decl = targetm.stack_protect_guard ();

>    rtx x, y;

>  

> +  /* Prevent scheduling of instruction(s) between computation of the guard's

> +     address and setting of the canari to avoid any spill of the guard's

> +     address if computed outside the setting of the canari.  */

> +  emit_insn (gen_blockage ());

>    x = expand_normal (crtl->stack_protect_guard);

>    if (guard_decl)

>      y = expand_normal (guard_decl);


Scheduling barriers should definitely make spilling less likely,
but I don't think they avoid it completely.  For that I think we'd need to:

(a) make sure that gen_stack_protect_set gets passed (mem (symbol_ref)),
    which we can probably do by using DECL_RTL directly.  (Or failing that,
    expand_expr with EXPAND_CONST_ADDRESS should work.)

(b) make the target pattern accept (mem (symbol_ref)) and only legitimise
    it during a post-RA split.

The scheduling barriers would then be unnecessary, since the pattern
would be indivisible and self-contained until after RA.

On its own that would probably mean changing all backends to accept
the new style of mem.  One way of avoiding that would be to use the
maybe_expand_insn interface with targetm.code_for_stack_protect_set
instead of calling targetm.gen_stack_protect_set directly.
Arguably that's better anyway.

Thanks,
Richard
Segher Boessenkool April 28, 2018, 11:33 p.m. | #2
Hi!

On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote:
> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by

> loading its address first before loading its value from it as part of

> the stack_protect_set or stack_protect_check insn pattern. This creates

> the risk of spilling between the two.


> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

> index deab929..c7ced8f 100644

> --- a/gcc/cfgexpand.c

> +++ b/gcc/cfgexpand.c

> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)

>    tree guard_decl = targetm.stack_protect_guard ();

>    rtx x, y;

>  

> +  /* Prevent scheduling of instruction(s) between computation of the guard's

> +     address and setting of the canari to avoid any spill of the guard's

> +     address if computed outside the setting of the canari.  */

> +  emit_insn (gen_blockage ());

>    x = expand_normal (crtl->stack_protect_guard);

>    if (guard_decl)

>      y = expand_normal (guard_decl);


[ etc. ]

Why pessimise code for all targets (quite a lot), when it does not even
fix the problem on Arm completely (or not obviously, anyway)?

Instead, implement stack_protect_* and hide the memory accesses to the
stored canary value (and make sure its address is not left in a register
either!)

I doubt this can be done completely target-independent, it will always
be best effort that way, aka it won't really work.


Segher
Thomas Preudhomme May 2, 2018, 6:57 a.m. | #3
Hi Segher,

As mentionned in the ticket this was my first thought but this means
making the pattern aware of all the possible way the address could be
access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many
scratch registers are needed. I'd rather reuse the existing pattern as
much as possible to make sure they are well tested. Ideally I wanted a
way to mark a REG RTX so that it is never spilled and such that the
mark is propagated when the register is moved to another register or
propagated. But that is a bigger change so decided it should be an
improvement for later but needed another solution right now.

By the way about making sure the address is not left in a register, I
have a question regarding the current stack_protect_set and
stack_protect_check pattern and their requirements to have register
cleared afterwards: why is that necessary? Currently not all registers
are cleared and the guard is available in the canari before it is
overwritten anyway so I don't see how clearing the register adds any
extra security. What sort of attack is it protecting against?

Best regards,

Thomas

On 29 April 2018 at 00:33, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> Hi!

>

> On Sat, Apr 28, 2018 at 12:32:26AM +0100, Thomas Preudhomme wrote:

>> On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by

>> loading its address first before loading its value from it as part of

>> the stack_protect_set or stack_protect_check insn pattern. This creates

>> the risk of spilling between the two.

>

>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

>> index deab929..c7ced8f 100644

>> --- a/gcc/cfgexpand.c

>> +++ b/gcc/cfgexpand.c

>> @@ -6156,6 +6156,10 @@ stack_protect_prologue (void)

>>    tree guard_decl = targetm.stack_protect_guard ();

>>    rtx x, y;

>>

>> +  /* Prevent scheduling of instruction(s) between computation of the guard's

>> +     address and setting of the canari to avoid any spill of the guard's

>> +     address if computed outside the setting of the canari.  */

>> +  emit_insn (gen_blockage ());

>>    x = expand_normal (crtl->stack_protect_guard);

>>    if (guard_decl)

>>      y = expand_normal (guard_decl);

>

> [ etc. ]

>

> Why pessimise code for all targets (quite a lot), when it does not even

> fix the problem on Arm completely (or not obviously, anyway)?

>

> Instead, implement stack_protect_* and hide the memory accesses to the

> stored canary value (and make sure its address is not left in a register

> either!)

>

> I doubt this can be done completely target-independent, it will always

> be best effort that way, aka it won't really work.

>

>

> Segher
Segher Boessenkool May 3, 2018, 4:55 p.m. | #4
Hi!

On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:
> As mentionned in the ticket this was my first thought but this means

> making the pattern aware of all the possible way the address could be

> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many

> scratch registers are needed. I'd rather reuse the existing pattern as

> much as possible to make sure they are well tested. Ideally I wanted a

> way to mark a REG RTX so that it is never spilled and such that the

> mark is propagated when the register is moved to another register or

> propagated. But that is a bigger change so decided it should be an

> improvement for later but needed another solution right now.


How would that work, esp. for pseudos?  If too many regs have such a
mark then the compiler will have to sorry() or similar, not a good
thing at all.

> By the way about making sure the address is not left in a register, I

> have a question regarding the current stack_protect_set and

> stack_protect_check pattern and their requirements to have register

> cleared afterwards: why is that necessary? Currently not all registers

> are cleared and the guard is available in the canari before it is

> overwritten anyway so I don't see how clearing the register adds any

> extra security. What sort of attack is it protecting against?


From md.texi:

@item @samp{stack_protect_set}
This pattern, if defined, moves a @code{ptr_mode} value from the memory
in operand 1 to the memory in operand 0 without leaving the value in
a register afterward.  This is to avoid leaking the value some place
that an attacker might use to rewrite the stack guard slot after
having clobbered it.

(etc.)

Having the canary in a global variable makes it a lot easier for exploit
code to access it then if it is e.g. in TLS data.  Actually leaking a
pointer to it would make it extra easy...


Segher
Jeff Law May 3, 2018, 5:07 p.m. | #5
On 05/03/2018 10:55 AM, Segher Boessenkool wrote:
> Hi!

> 

> On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:

>> As mentionned in the ticket this was my first thought but this means

>> making the pattern aware of all the possible way the address could be

>> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many

>> scratch registers are needed. I'd rather reuse the existing pattern as

>> much as possible to make sure they are well tested. Ideally I wanted a

>> way to mark a REG RTX so that it is never spilled and such that the

>> mark is propagated when the register is moved to another register or

>> propagated. But that is a bigger change so decided it should be an

>> improvement for later but needed another solution right now.

> 

> How would that work, esp. for pseudos?  If too many regs have such a

> mark then the compiler will have to sorry() or similar, not a good

> thing at all.

> 

>> By the way about making sure the address is not left in a register, I

>> have a question regarding the current stack_protect_set and

>> stack_protect_check pattern and their requirements to have register

>> cleared afterwards: why is that necessary? Currently not all registers

>> are cleared and the guard is available in the canari before it is

>> overwritten anyway so I don't see how clearing the register adds any

>> extra security. What sort of attack is it protecting against?

> 

> From md.texi:

> 

> @item @samp{stack_protect_set}

> This pattern, if defined, moves a @code{ptr_mode} value from the memory

> in operand 1 to the memory in operand 0 without leaving the value in

> a register afterward.  This is to avoid leaking the value some place

> that an attacker might use to rewrite the stack guard slot after

> having clobbered it.

> 

> (etc.)

> 

> Having the canary in a global variable makes it a lot easier for exploit

> code to access it then if it is e.g. in TLS data.  Actually leaking a

> pointer to it would make it extra easy...

Yup.  And at least one guard (not the stack guard) has that properly.
It's stuffed into static storage.  Not good.

Though I must admit it was awful convenient when tracking down a bug in
how the guard was used.

jeff
Thomas Preudhomme May 4, 2018, 4:52 a.m. | #6
I'll make a fool of myself but I still have further questions if you don't
mind (see inline).

On Friday, 4 May 2018, Segher Boessenkool <segher@kernel.crashing.org>
wrote:
> Hi!

>

> On Wed, May 02, 2018 at 07:57:55AM +0100, Thomas Preudhomme wrote:

>> As mentionned in the ticket this was my first thought but this means

>> making the pattern aware of all the possible way the address could be

>> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many

>> scratch registers are needed. I'd rather reuse the existing pattern as

>> much as possible to make sure they are well tested. Ideally I wanted a

>> way to mark a REG RTX so that it is never spilled and such that the

>> mark is propagated when the register is moved to another register or

>> propagated. But that is a bigger change so decided it should be an

>> improvement for later but needed another solution right now.

>

> How would that work, esp. for pseudos?  If too many regs have such a

> mark then the compiler will have to sorry() or similar, not a good

> thing at all.


I'm missing something, there should be the same amount of pseudo with that
mark as there is scratch in the new pattern doing memory address load(s) +
set / check. I'm guessing this is not as easy to achieve as it sounds.

>

>> By the way about making sure the address is not left in a register, I

>> have a question regarding the current stack_protect_set and

>> stack_protect_check pattern and their requirements to have register

>> cleared afterwards: why is that necessary? Currently not all registers

>> are cleared and the guard is available in the canari before it is

>> overwritten anyway so I don't see how clearing the register adds any

>> extra security. What sort of attack is it protecting against?

>

> From md.texi:

>

> @item @samp{stack_protect_set}

> This pattern, if defined, moves a @code{ptr_mode} value from the memory

> in operand 1 to the memory in operand 0 without leaving the value in

> a register afterward.  This is to avoid leaking the value some place

> that an attacker might use to rewrite the stack guard slot after

> having clobbered it.

>

> (etc.)


I've read that doc but what I don't understand is why the guard value being
leaked in a register would be a problem if modified. The pattern as they
are guarantee the guard is always reloaded from its canonical location
(e.g. TLS var). Because the patterns do not represent in RTL what they do
the compiler could not reuse the value left in a register. Are we worrying
about optimization the assembler could do?

>

> Having the canary in a global variable makes it a lot easier for exploit

> code to access it then if it is e.g. in TLS data.  Actually leaking a

> pointer to it would make it extra easy...


If an attacker can execute code to access and modify the guard, why would
s/he bother doing a stack overflow instead of just executing the code he
wants to directly?

Best regards,

Thomas
Segher Boessenkool May 4, 2018, 8:06 a.m. | #7
Hi Thomas,

On Fri, May 04, 2018 at 05:52:57AM +0100, Thomas Preudhomme wrote:
> >> As mentionned in the ticket this was my first thought but this means

> >> making the pattern aware of all the possible way the address could be

> >> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many

> >> scratch registers are needed. I'd rather reuse the existing pattern as

> >> much as possible to make sure they are well tested. Ideally I wanted a

> >> way to mark a REG RTX so that it is never spilled and such that the

> >> mark is propagated when the register is moved to another register or

> >> propagated. But that is a bigger change so decided it should be an

> >> improvement for later but needed another solution right now.

> >

> > How would that work, esp. for pseudos?  If too many regs have such a

> > mark then the compiler will have to sorry() or similar, not a good

> > thing at all.

> 

> I'm missing something, there should be the same amount of pseudo with that

> mark as there is scratch in the new pattern doing memory address load(s) +

> set / check. I'm guessing this is not as easy to achieve as it sounds.


But this pattern is expanded all the way at the beginning of the RTL
pipeline, so you'll need to prevent anything copying this.  And if any
other pattern wants to use this do-not-spill feature as well, you'll
have a problem no matter what.

> >> By the way about making sure the address is not left in a register, I

> >> have a question regarding the current stack_protect_set and

> >> stack_protect_check pattern and their requirements to have register

> >> cleared afterwards: why is that necessary? Currently not all registers

> >> are cleared and the guard is available in the canari before it is

> >> overwritten anyway so I don't see how clearing the register adds any

> >> extra security. What sort of attack is it protecting against?

> >

> > From md.texi:

> >

> > @item @samp{stack_protect_set}

> > This pattern, if defined, moves a @code{ptr_mode} value from the memory

> > in operand 1 to the memory in operand 0 without leaving the value in

> > a register afterward.  This is to avoid leaking the value some place

> > that an attacker might use to rewrite the stack guard slot after

> > having clobbered it.

> >

> > (etc.)

> 

> I've read that doc but what I don't understand is why the guard value being

> leaked in a register would be a problem if modified. The pattern as they

> are guarantee the guard is always reloaded from its canonical location

> (e.g. TLS var). Because the patterns do not represent in RTL what they do

> the compiler could not reuse the value left in a register. Are we worrying

> about optimization the assembler could do?

> 

> > Having the canary in a global variable makes it a lot easier for exploit

> > code to access it then if it is e.g. in TLS data.  Actually leaking a

> > pointer to it would make it extra easy...

> 

> If an attacker can execute code to access and modify the guard, why would

> s/he bother doing a stack overflow instead of just executing the code he

> wants to directly?


The issue is leaking the value so the user can observe it, and then when
overwriting the stack write the expected value to the cookie again, so
that the protection isn't triggered.

You don't necessarily need to execute code of your choice to overwrite
a memory location of your choice, fwiw.  SSP does not prevent all attacks,
just very many.


Segher
Jeff Law May 21, 2018, 10:49 p.m. | #8
On 05/02/2018 12:57 AM, Thomas Preudhomme wrote:
> Hi Segher,

> 

> As mentionned in the ticket this was my first thought but this means

> making the pattern aware of all the possible way the address could be

> access (PIC Vs non-PIC, Arm Vs Thumb-2 Vs Thumb-1) to decide how many

> scratch registers are needed. I'd rather reuse the existing pattern as

> much as possible to make sure they are well tested. Ideally I wanted a

> way to mark a REG RTX so that it is never spilled and such that the

> mark is propagated when the register is moved to another register or

> propagated. But that is a bigger change so decided it should be an

> improvement for later but needed another solution right now.

> 

> By the way about making sure the address is not left in a register, I

> have a question regarding the current stack_protect_set and

> stack_protect_check pattern and their requirements to have register

> cleared afterwards: why is that necessary? Currently not all registers

> are cleared and the guard is available in the canari before it is

> overwritten anyway so I don't see how clearing the register adds any

> extra security. What sort of attack is it protecting against?

I'm not aware of any way to make a REG so that it's never spilled.  It's
a concept we simply don't have.

About the closest you get is a fixed register.  But you certainly don't
want to do that.

I really think you're going to have to address this primarily in the ARM
backend, probably making a fair amount of things opaque to the rest of
the compiler.

Jeff

Patch

From 76c48e31130f212721addeeca830477e3b6f5e10 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@arm.com>
Date: Mon, 23 Apr 2018 14:37:11 +0100
Subject: [PATCH] [ARM] Fix PR85434: spill of stack protector's guard address

On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by
loading its address first before loading its value from it as part of
the stack_protect_set or stack_protect_check insn pattern. This creates
the risk of spilling between the two.

It is particularly likely on Aarch32 when compiling PIC code because
- computing the address takes several instructions (first compute the
  GOT base and then the GOT entry by adding an offset) which increases
  the likelyhood of CSE
- the address computation can be CSEd due to the GOT entry computation
  being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC
  of a MEM like on AArche64.

This patch address both issues by (i) adding some scheduler barriers
around the stack protector code and (ii) making all memory loads
involved in computing the guard's address volatile. The use of volatile
rather than unspec was chosen so that the patterns for computing the
guard address can be the same as for normal global variable access thus
reusing more code. Finally the patch also improves the documentation to
mention the need to be careful when computing the address of the guard.

ChangeLog entry is as follows:

*** gcc/ChangeLog ***

2018-04-27  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around
	stack protector code.
	* function.c (stack_protect_epilogue): Likewise.
	* config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare.
	* config/arm/arm.md (calculate_pic_address): Mark memory volatile if
	is computing address of stack protector's guard.
	(calculate_pic_address splitter): Likewise.
	* config/arm/arm.c (require_pic_register): Add parameter to control
	whether to insert instruction at the end of the instruction stream.
	(legitimize_pic_address): Force computing PIC address at the end of
	instruction stream and adapt logic to change in calculate_pic_address
	insn pattern.
	(arm_stack_chk_guard_decl_p): New function.
	(arm_emit_call_insn): Adapt to change in require_pic_register().
	* target.def (TARGET_STACK_PROTECT_GUARD): Document requirement on
	guard's address computation to be careful about not spilling.
	* doc/tm.texi: Regenerate.

*** gcc/testsuite/ChangeLog ***

2018-04-27  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New testcase.

Testing: The code has been boostraped on an Armv8-A machine targeting:
- Aarch32 ARM mode with -mfpu=neon-fpv4 and hardfloat ABI
- Aarch64
Testsuite has been run for the following sets of flag:

- arm-eabi-aem/-mthumb/-march=armv4t
- arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp
- arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard

(thereby testing the code for ARM, Thumb-2 and Thumb-1 mode) without any
regression.

Is it ok for trunk?

Best regards,

Thomas
---
 gcc/cfgexpand.c                        |   6 +
 gcc/config/arm/arm-protos.h            |   1 +
 gcc/config/arm/arm.c                   |  55 ++++++---
 gcc/config/arm/arm.md                  |  39 ++++++-
 gcc/doc/tm.texi                        |   4 +
 gcc/function.c                         |   5 +
 gcc/target.def                         |   4 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++
 8 files changed, 295 insertions(+), 19 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index deab929..c7ced8f 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6156,6 +6156,10 @@  stack_protect_prologue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
 
+  /* Prevent scheduling of instruction(s) between computation of the guard's
+     address and setting of the canari to avoid any spill of the guard's
+     address if computed outside the setting of the canari.  */
+  emit_insn (gen_blockage ());
   x = expand_normal (crtl->stack_protect_guard);
   if (guard_decl)
     y = expand_normal (guard_decl);
@@ -6168,11 +6172,13 @@  stack_protect_prologue (void)
     if (rtx_insn *insn = targetm.gen_stack_protect_set (x, y))
       {
 	emit_insn (insn);
+	emit_insn (gen_blockage ());
 	return;
       }
 
   /* Otherwise do a straight move.  */
   emit_move_insn (x, y);
+  emit_insn (gen_blockage ());
 }
 
 /* Translate the intermediate representation contained in the CFG
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 9d0acde..7f378be 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,6 +67,7 @@  extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
+extern bool arm_stack_chk_guard_decl_p (rtx);
 extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 08120c6..fb75e24 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7386,10 +7386,11 @@  legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.  INSERT_ON_EDGE
+   indicates whether to insert on entry edge or in the final insn stream.  */
 
 static void
-require_pic_register (void)
+require_pic_register (bool insert_on_edge)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
@@ -7440,12 +7441,14 @@  require_pic_register (void)
 		if (INSN_P (insn))
 		  INSN_LOCATION (insn) = prologue_location;
 
-	      /* We can be called during expansion of PHI nodes, where
-	         we can't yet emit instructions directly in the final
-		 insn stream.  Queue the insns on the entry edge, they will
-		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      /* We can be called during expansion of PHI nodes, where we can't
+		 yet emit instructions directly in the final insn stream.  When
+		 this is the case, insert_on_edge is true and we queue the
+		 insns on the entry edge.  They are then committed after
+		 everything else is expanded.  */
+	      if (insert_on_edge)
+		insert_insn_on_edge (seq,
+				     single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
 	    }
 	}
     }
@@ -7485,14 +7488,29 @@  legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx pat;
 	  rtx mem;
 
-	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  /* If this function doesn't have a pic register, create one now.
+	     Recreate one if user is stack protector code to reduce live
+	     range and reduce the risk of spilling.  */
+	  if (arm_stack_chk_guard_decl_p (orig))
+	    {
+	      crtl->uses_pic_offset_table = 0;
+	      cfun->machine->pic_reg = NULL_RTX;
+	      require_pic_register (false);
+	    }
+	  else
+	    require_pic_register (true);
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  insn = (rtx_insn *)
+	    gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = PATTERN (insn);
 
-	  /* Make the MEM as close to a constant as possible.  */
+	  /* Make the MEM as close to a constant as possible except when
+	     computing address of stack smashing protector's guard.  */
 	  mem = SET_SRC (pat);
-	  gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem));
+	  gcc_assert (MEM_P (mem));
+	  gcc_assert (arm_stack_chk_guard_decl_p (orig)
+		      ? MEM_VOLATILE_P (mem)
+		      : !MEM_VOLATILE_P (mem));
 	  MEM_READONLY_P (mem) = 1;
 	  MEM_NOTRAP_P (mem) = 1;
 
@@ -8494,6 +8512,15 @@  arm_tls_descseq_addr (rtx x, rtx reg)
   return reg;
 }
 
+/* Returns whether X is the declaration tree for stack smashing protector's
+   guard.  */
+
+bool
+arm_stack_chk_guard_decl_p (rtx x)
+{
+  return SYMBOL_REF_DECL (x) == targetm.stack_protect_guard ();
+}
+
 rtx
 legitimize_tls_address (rtx x, rtx reg)
 {
@@ -18078,7 +18105,7 @@  arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (true);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2d5359e..0ba5133 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6150,11 +6150,23 @@ 
 ;;
 ;; Note: Update arm.c: legitimize_pic_address() when changing this pattern.
 (define_expand "calculate_pic_address"
-  [(set (match_operand:SI 0 "register_operand" "")
-	(mem:SI (plus:SI (match_operand:SI 1 "register_operand" "")
-			 (unspec:SI [(match_operand:SI 2 "" "")]
-				    UNSPEC_PIC_SYM))))]
+  [(match_operand:SI 0 "register_operand" "")
+   (match_operand:SI 1 "register_operand" "")
+   (match_operand:SI 2 "" "")]
   "flag_pic"
+  "
+  {
+    rtvec unspec_vec = gen_rtvec (1, operands[2]);
+    rtx unspec = gen_rtx_UNSPEC (SImode, unspec_vec, UNSPEC_PIC_SYM);
+    rtx plus = gen_rtx_PLUS (SImode, operands[1], unspec);
+    rtx mem = gen_rtx_MEM (SImode, plus);
+    /* PR85434: make access volatile to prevent CSE of expression computing
+       stack smashing protector's guard's address.  */
+    if (arm_stack_chk_guard_decl_p (operands[2]))
+      MEM_VOLATILE_P (mem) = true;
+    emit_move_insn (operands[0], mem);
+    DONE;
+  }"
 )
 
 ;; Split calculate_pic_address into pic_load_addr_* and a move.
@@ -6166,7 +6178,24 @@ 
   "flag_pic"
   [(set (match_dup 3) (unspec:SI [(match_dup 2)] UNSPEC_PIC_SYM))
    (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 3))))]
-  "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];"
+  "
+  {
+    rtvec unspec_vec;
+    rtx unspec, plus, mem;
+
+    operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];
+    unspec_vec = gen_rtvec (1, operands[2]);
+    unspec = gen_rtx_UNSPEC (SImode, unspec_vec, UNSPEC_PIC_SYM);
+    emit_move_insn (operands[3], unspec);
+    plus = gen_rtx_PLUS (SImode, operands[1], operands[3]);
+    mem = gen_rtx_MEM (SImode, plus);
+    /* PR85434: make access volatile to prevent CSE of expression computing
+       stack smashing protector's guard's address.  */
+    if (arm_stack_chk_guard_decl_p (operands[2]))
+      MEM_VOLATILE_P (mem) = true;
+    emit_move_insn (operands[0], mem);
+    DONE;
+  }"
 )
 
 ;; operand1 is the memory address to go into 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917..804349d 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5027,6 +5027,10 @@  runtime to some random value and is used to initialize the guard value
 that is placed at the top of the local stack frame.  The type of this
 variable must be @code{ptr_type_node}.
 
+It is the responsability of the target backend to ensure that loading its
+address before loading its value does not spill the address or intermediate
+values towards the address.
+
 The default version of this hook creates a variable called
 @samp{__stack_chk_guard}, which is normally defined in @file{libgcc2.c}.
 @end deftypefn
diff --git a/gcc/function.c b/gcc/function.c
index 61515e3..611fcef 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -5098,6 +5098,10 @@  stack_protect_epilogue (void)
   rtx x, y;
   rtx_insn *seq;
 
+  /* Prevent scheduling of instruction(s) between computation of the guard's
+     address and comparison of its value with the canari to avoid any spill
+     of the guard's address if computed outside the check.  */
+  emit_insn (gen_blockage ());
   x = expand_normal (crtl->stack_protect_guard);
   if (guard_decl)
     y = expand_normal (guard_decl);
@@ -5122,6 +5126,7 @@  stack_protect_epilogue (void)
     predict_insn_def (tmp, PRED_NORETURN, TAKEN);
 
   expand_call (targetm.stack_protect_fail (), NULL_RTX, /*ignore=*/true);
+  emit_insn (gen_blockage ());
   free_temp_slots ();
   emit_label (label);
 }
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e..ec831d4 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4227,6 +4227,10 @@  runtime to some random value and is used to initialize the guard value\n\
 that is placed at the top of the local stack frame.  The type of this\n\
 variable must be @code{ptr_type_node}.\n\
 \n\
+It is the responsability of the target backend to ensure that loading its\n\
+address before loading its value does not spill the address or intermediate\n\
+values towards the address.\n\
+\n\
 The default version of this hook creates a variable called\n\
 @samp{__stack_chk_guard}, which is normally defined in @file{libgcc2.c}.",
  tree, (void),
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 0000000..4143a86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.7.4