[ARM] PR85434: Prevent spilling of stack protector guard's address on ARM

Message ID CAKnkMGuvuevF5tT-_9D2cnLq2GFdG8ncaKVO8AdD_M0KtzOXUg@mail.gmail.com
State New
Headers show
Series
  • [ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
Related show

Commit Message

Thomas Preudhomme July 5, 2018, 2:48 p.m.
In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    PR target/85434
    * target-insns.def (stack_protect_combined_set): Define new standard
    pattern name.
    (stack_protect_combined_test): Likewise.
    * cfgexpand.c (stack_protect_prologue): Try new
    stack_protect_combined_set pattern first.
    * function.c (stack_protect_epilogue): Try new
    stack_protect_combined_test pattern first.
    * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
    parameters to control which register to use as PIC register and force
    reloading PIC register respectively.
    (legitimize_pic_address): Expose above new parameters in prototype and
    adapt recursive calls accordingly.
    (arm_legitimize_address): Adapt to new legitimize_pic_address
    prototype.
    (thumb_legitimize_address): Likewise.
    (arm_emit_call_insn): Adapt to new require_pic_register prototype.
    * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
    change.
    * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
    prototype change.
    (stack_protect_combined_set): New insn_and_split pattern.
    (stack_protect_set): New insn pattern.
    (stack_protect_combined_test): New insn_and_split pattern.
    (stack_protect_test): New insn pattern.
    * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
    (UNSPEC_SP_TEST): Likewise.
    * doc/md.texi (stack_protect_combined_set): Document new standard
    pattern name.
    (stack_protect_set): Clarify that the operand for guard's address is
    legal.
    (stack_protect_combined_test): Document new standard pattern name.
    (stack_protect_test): Clarify that the operand for guard's address is
    legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    PR target/85434
    * gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants either both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Comments

Thomas Preudhomme July 10, 2018, 8:56 a.m. | #1
Adding Jeff and Eric since the patch adds an RTL target hook.

Best regards,

Thomas

On Thu, 5 Jul 2018 at 15:48, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>

> In case of high register pressure in PIC mode, address of the stack

> protector's guard can be spilled on ARM targets as shown in PR85434,

> thus allowing an attacker to control what the canary would be compared

> against. ARM does lack stack_protect_set and stack_protect_test insn

> patterns, defining them does not help as the address is expanded

> regularly and the patterns only deal with the copy and test of the

> guard with the canary.

>

> This problem does not occur for x86 targets because the PIC access and

> the test can be done in the same instruction. Aarch64 is exempt too

> because PIC access insn pattern are mov of UNSPEC which prevents it from

> the second access in the epilogue being CSEd in cse_local pass with the

> first access in the prologue.

>

> The approach followed here is to create new "combined" set and test

> standard pattern names that take the unexpanded guard and do the set or

> test. This allows the target to use an opaque pattern (eg. using UNSPEC)

> to hide the individual instructions being generated to the compiler and

> split the pattern into generic load, compare and branch instruction

> after register allocator, therefore avoiding any spilling. This is here

> implemented for the ARM targets. For targets not implementing these new

> standard pattern names, the existing stack_protect_set and

> stack_protect_test pattern names are used.

>

> To be able to split PIC access after register allocation, the functions

> had to be augmented to force a new PIC register load and to control

> which register it loads into. This is because sharing the PIC register

> between prologue and epilogue could lead to spilling due to CSE again

> which an attacker could use to control what the canary gets compared

> against.

>

> ChangeLog entries are as follows:

>

> *** gcc/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>

>     PR target/85434

>     * target-insns.def (stack_protect_combined_set): Define new standard

>     pattern name.

>     (stack_protect_combined_test): Likewise.

>     * cfgexpand.c (stack_protect_prologue): Try new

>     stack_protect_combined_set pattern first.

>     * function.c (stack_protect_epilogue): Try new

>     stack_protect_combined_test pattern first.

>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

>     parameters to control which register to use as PIC register and force

>     reloading PIC register respectively.

>     (legitimize_pic_address): Expose above new parameters in prototype and

>     adapt recursive calls accordingly.

>     (arm_legitimize_address): Adapt to new legitimize_pic_address

>     prototype.

>     (thumb_legitimize_address): Likewise.

>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

>     change.

>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>     prototype change.

>     (stack_protect_combined_set): New insn_and_split pattern.

>     (stack_protect_set): New insn pattern.

>     (stack_protect_combined_test): New insn_and_split pattern.

>     (stack_protect_test): New insn pattern.

>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>     (UNSPEC_SP_TEST): Likewise.

>     * doc/md.texi (stack_protect_combined_set): Document new standard

>     pattern name.

>     (stack_protect_set): Clarify that the operand for guard's address is

>     legal.

>     (stack_protect_combined_test): Document new standard pattern name.

>     (stack_protect_test): Clarify that the operand for guard's address is

>     legal.

>

> *** gcc/testsuite/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>

>     PR target/85434

>     * gcc.target/arm/pr85434.c: New test.

>

> Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on

> Aarch64. Testsuite shows no regression on these 3 variants either both

> with default flags and with -fstack-protector-all.

>

> Is this ok for trunk? If yes, would this be acceptable as a backport to

> GCC 6, 7 and 8 provided that no regression is found?

>

> Best regards,

>

> Thomas
From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	PR target/85434
	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(stack_protect_combined_set): New insn_and_split pattern.
	(stack_protect_set): New insn pattern.
	(stack_protect_combined_test): New insn_and_split pattern.
	(stack_protect_test): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	PR target/85434
	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants either both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas
---
 gcc/cfgexpand.c                        |  10 ++
 gcc/config/arm/arm-protos.h            |   2 +-
 gcc/config/arm/arm.c                   |  56 ++++++---
 gcc/config/arm/arm.md                  |  92 ++++++++++++++-
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  53 +++++++--
 gcc/function.c                         |  37 ++++--
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++
 9 files changed, 419 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 9b91279..ea837e0 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6101,8 +6101,18 @@ stack_protect_prologue (void)
 {
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
+  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+     leaking Y into a register.  This combined address + copy pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+    return;
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262..100844e 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbc..f4a9705 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   If not NULL, PIC_REG indicates which register to use as PIC register,
+   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC
+   register to be loaded, irregardless of whether it was loaded previously.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@ require_pic_register (void)
 	  rtx_insn *seq, *insn;
 
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
@@ -7427,15 +7435,29 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (can_create_pseudo_p ())
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, irregardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
 
 	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
@@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a026..9582c3f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -8634,6 +8635,95 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[2], operands[1]);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[3], operands[1]);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e..429c937 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 09d6e30..da6423e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7356,22 +7356,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finallystore it), it is the backend's responsability to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
+This pattern, if defined, moves a @code{ptr_mode} value from the legal
+memory in operand 1 to the memory in operand 0 without leaving the
+value in a register afterward.  This is to avoid leaking the value some
+place that an attacker might use to rewrite the stack guard slot after
 having clobbered it.
 
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
+
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finallystore it), it is the backend's responsability to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+legal memory in operand 1 with the memory in operand 0 without leaving
+the value in a register afterward and branches to operand 2 if the
+values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 142cdae..bf96dd6 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
   rtx_insn *seq;
+  struct expand_operand ops[3];
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
-  else
-    y = const0_rtx;
-
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
-    emit_insn (seq);
-  else
-    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  create_fixed_operand (&ops[2], label);
+  /* Allow the target to compute address of Y and compare it with X without
+     leaking Y into a register.  This combined address + compare pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
+			       3, ops))
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ()
+	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
+	      != NULL_RTX))
+	emit_insn (seq);
+      else
+	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+    }
 
   /* The noreturn predictor has been moved to the tree level.  The rtl-level
      predictors estimate this branch about 20%, which isn't enough to get
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3..d39889b 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 0000000..4143a86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
Jeff Law July 16, 2018, 9:46 p.m. | #2
On 07/05/2018 08:48 AM, Thomas Preudhomme wrote:
> In case of high register pressure in PIC mode, address of the stack

> protector's guard can be spilled on ARM targets as shown in PR85434,

> thus allowing an attacker to control what the canary would be compared

> against. ARM does lack stack_protect_set and stack_protect_test insn

> patterns, defining them does not help as the address is expanded

> regularly and the patterns only deal with the copy and test of the

> guard with the canary.

> 

> This problem does not occur for x86 targets because the PIC access and

> the test can be done in the same instruction. Aarch64 is exempt too

> because PIC access insn pattern are mov of UNSPEC which prevents it from

> the second access in the epilogue being CSEd in cse_local pass with the

> first access in the prologue.

> 

> The approach followed here is to create new "combined" set and test

> standard pattern names that take the unexpanded guard and do the set or

> test. This allows the target to use an opaque pattern (eg. using UNSPEC)

> to hide the individual instructions being generated to the compiler and

> split the pattern into generic load, compare and branch instruction

> after register allocator, therefore avoiding any spilling. This is here

> implemented for the ARM targets. For targets not implementing these new

> standard pattern names, the existing stack_protect_set and

> stack_protect_test pattern names are used.

> 

> To be able to split PIC access after register allocation, the functions

> had to be augmented to force a new PIC register load and to control

> which register it loads into. This is because sharing the PIC register

> between prologue and epilogue could lead to spilling due to CSE again

> which an attacker could use to control what the canary gets compared

> against.

> 

> ChangeLog entries are as follows:

> 

> *** gcc/ChangeLog ***

> 

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> 

>     PR target/85434

>     * target-insns.def (stack_protect_combined_set): Define new standard

>     pattern name.

>     (stack_protect_combined_test): Likewise.

>     * cfgexpand.c (stack_protect_prologue): Try new

>     stack_protect_combined_set pattern first.

>     * function.c (stack_protect_epilogue): Try new

>     stack_protect_combined_test pattern first.

>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

>     parameters to control which register to use as PIC register and force

>     reloading PIC register respectively.

>     (legitimize_pic_address): Expose above new parameters in prototype and

>     adapt recursive calls accordingly.

>     (arm_legitimize_address): Adapt to new legitimize_pic_address

>     prototype.

>     (thumb_legitimize_address): Likewise.

>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

>     change.

>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>     prototype change.

>     (stack_protect_combined_set): New insn_and_split pattern.

>     (stack_protect_set): New insn pattern.

>     (stack_protect_combined_test): New insn_and_split pattern.

>     (stack_protect_test): New insn pattern.

>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>     (UNSPEC_SP_TEST): Likewise.

>     * doc/md.texi (stack_protect_combined_set): Document new standard

>     pattern name.

>     (stack_protect_set): Clarify that the operand for guard's address is

>     legal.

>     (stack_protect_combined_test): Document new standard pattern name.

>     (stack_protect_test): Clarify that the operand for guard's address is

>     legal.

> 

> *** gcc/testsuite/ChangeLog ***

> 

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> 

>     PR target/85434

>     * gcc.target/arm/pr85434.c: New test.

> 

> Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on

> Aarch64. Testsuite shows no regression on these 3 variants either both

> with default flags and with -fstack-protector-all.

> 

> Is this ok for trunk? If yes, would this be acceptable as a backport to

> GCC 6, 7 and 8 provided that no regression is found?

> 

> Best regards,

> 

> Thomas

> 

> 

> 0001-PR85434-Prevent-spilling-of-stack-protector-guard-s-.patch

> 

> 

> From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001

> From: Thomas Preud'homme <thomas.preudhomme@linaro.org>

> Date: Tue, 8 May 2018 15:47:05 +0100

> Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address

>  on ARM

> 

> In case of high register pressure in PIC mode, address of the stack

> protector's guard can be spilled on ARM targets as shown in PR85434,

> thus allowing an attacker to control what the canary would be compared

> against. ARM does lack stack_protect_set and stack_protect_test insn

> patterns, defining them does not help as the address is expanded

> regularly and the patterns only deal with the copy and test of the

> guard with the canary.

> 

> This problem does not occur for x86 targets because the PIC access and

> the test can be done in the same instruction. Aarch64 is exempt too

> because PIC access insn pattern are mov of UNSPEC which prevents it from

> the second access in the epilogue being CSEd in cse_local pass with the

> first access in the prologue.

> 

> The approach followed here is to create new "combined" set and test

> standard pattern names that take the unexpanded guard and do the set or

> test. This allows the target to use an opaque pattern (eg. using UNSPEC)

> to hide the individual instructions being generated to the compiler and

> split the pattern into generic load, compare and branch instruction

> after register allocator, therefore avoiding any spilling. This is here

> implemented for the ARM targets. For targets not implementing these new

> standard pattern names, the existing stack_protect_set and

> stack_protect_test pattern names are used.

> 

> To be able to split PIC access after register allocation, the functions

> had to be augmented to force a new PIC register load and to control

> which register it loads into. This is because sharing the PIC register

> between prologue and epilogue could lead to spilling due to CSE again

> which an attacker could use to control what the canary gets compared

> against.

> 

> ChangeLog entries are as follows:

> 

> *** gcc/ChangeLog ***

> 

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> 

> 	PR target/85434

> 	* target-insns.def (stack_protect_combined_set): Define new standard

> 	pattern name.

> 	(stack_protect_combined_test): Likewise.

> 	* cfgexpand.c (stack_protect_prologue): Try new

> 	stack_protect_combined_set pattern first.

> 	* function.c (stack_protect_epilogue): Try new

> 	stack_protect_combined_test pattern first.

> 	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

> 	parameters to control which register to use as PIC register and force

> 	reloading PIC register respectively.

> 	(legitimize_pic_address): Expose above new parameters in prototype and

> 	adapt recursive calls accordingly.

> 	(arm_legitimize_address): Adapt to new legitimize_pic_address

> 	prototype.

> 	(thumb_legitimize_address): Likewise.

> 	(arm_emit_call_insn): Adapt to new require_pic_register prototype.

> 	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

> 	change.

> 	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

> 	prototype change.

> 	(stack_protect_combined_set): New insn_and_split pattern.

> 	(stack_protect_set): New insn pattern.

> 	(stack_protect_combined_test): New insn_and_split pattern.

> 	(stack_protect_test): New insn pattern.

> 	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

> 	(UNSPEC_SP_TEST): Likewise.

> 	* doc/md.texi (stack_protect_combined_set): Document new standard

> 	pattern name.

> 	(stack_protect_set): Clarify that the operand for guard's address is

> 	legal.

> 	(stack_protect_combined_test): Document new standard pattern name.

> 	(stack_protect_test): Clarify that the operand for guard's address is

> 	legal.

> 

> *** gcc/testsuite/ChangeLog ***

> 

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> 

> 	PR target/85434

> 	* gcc.target/arm/pr85434.c: New test.

Commenting strictly on the target independent bits...  WRT backports,
the release managers would have the final say.




> 

> Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on

> Aarch64. Testsuite shows no regression on these 3 variants either both

> with default flags and with -fstack-protector-all.

> 

> Is this ok for trunk? If yes, would this be acceptable as a backport to

> GCC 6, 7 and 8 provided that no regression is found?

> 

> Best regards,

> 

> Thomas

> ---

>  gcc/cfgexpand.c                        |  10 ++

>  gcc/config/arm/arm-protos.h            |   2 +-

>  gcc/config/arm/arm.c                   |  56 ++++++---

>  gcc/config/arm/arm.md                  |  92 ++++++++++++++-

>  gcc/config/arm/unspecs.md              |   3 +

>  gcc/doc/md.texi                        |  53 +++++++--

>  gcc/function.c                         |  37 ++++--

>  gcc/target-insns.def                   |   2 +

>  gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++

>  9 files changed, 419 insertions(+), 36 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

> 

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

> index 09d6e30..da6423e 100644

> --- a/gcc/doc/md.texi

> +++ b/gcc/doc/md.texi

> @@ -7356,22 +7356,61 @@ builtins.

>  The get/set patterns have a single output/input operand respectively,

>  with @var{mode} intended to be @code{Pmode}.

>  

> +@cindex @code{stack_protect_combined_set} instruction pattern

> +@item @samp{stack_protect_combined_set}

> +This pattern, if defined, moves a @code{ptr_mode} value from an address

> +whose declaration RTX is given in operand 1 to the memory in operand 0

> +without leaving the value in a register afterward.  If several

> +instructions are needed by the target to perform the operation (eg. to

> +load the address from a GOT entry then load the @code{ptr_mode} value

> +and finallystore it), it is the backend's responsability to ensure no

"finally store" and "responsibility"

> +intermediate result gets spilled.  This is to avoid leaking the value

> +some place that an attacker might use to rewrite the stack guard slot

> +after having clobbered it.

> +

> +If this pattern is not defined, then the address declaration is

> +expanded first in the standard way and a @code{stack_protect_set}

> +pattern is then generated to move the value from that address to the

> +address in operand 0.

> +

>  @cindex @code{stack_protect_set} instruction pattern

>  @item @samp{stack_protect_set}

> -This pattern, if defined, moves a @code{ptr_mode} value from the memory

> -in operand 1 to the memory in operand 0 without leaving the value in

> -a register afterward.  This is to avoid leaking the value some place

> -that an attacker might use to rewrite the stack guard slot after

> +This pattern, if defined, moves a @code{ptr_mode} value from the legal

s/legal memory/valid memory location/


> +memory in operand 1 to the memory in operand 0 without leaving the

> +value in a register afterward.  This is to avoid leaking the value some

> +place that an attacker might use to rewrite the stack guard slot after

>  having clobbered it.

>  

> +Note: on targets where the addressing modes do not allow to load

> +directly from stack guard address, the address is expanded in a standard

> +way first which could cause some spills.

> +

>  If this pattern is not defined, then a plain move pattern is generated.

>  

> +@cindex @code{stack_protect_combined_test} instruction pattern

> +@item @samp{stack_protect_combined_test}

> +This pattern, if defined, compares a @code{ptr_mode} value from an

> +address whose declaration RTX is given in operand 1 with the memory in

> +operand 0 without leaving the value in a register afterward and

> +branches to operand 2 if the values were equal.  If several

> +instructions are needed by the target to perform the operation (eg. to

> +load the address from a GOT entry then load the @code{ptr_mode} value

> +and finallystore it), it is the backend's responsability to ensure no

"finally store" and "responsibility" again.


> +intermediate result gets spilled.  This is to avoid leaking the value

> +some place that an attacker might use to rewrite the stack guard slot

> +after having clobbered it.

> +

> +If this pattern is not defined, then the address declaration is

> +expanded first in the standard way and a @code{stack_protect_test}

> +pattern is then generated to compare the value from that address to the

> +value at the memory in operand 0.

> +

>  @cindex @code{stack_protect_test} instruction pattern

>  @item @samp{stack_protect_test}

>  This pattern, if defined, compares a @code{ptr_mode} value from the

> -memory in operand 1 with the memory in operand 0 without leaving the

> -value in a register afterward and branches to operand 2 if the values

> -were equal.

> +legal memory in operand 1 with the memory in operand 0 without leaving

s/legal memory/valid memory location/

The rest of the target independent stuff is fine.
jeff
Thomas Preudhomme July 17, 2018, 11:02 a.m. | #3
Fixed in attached patch. ChangeLog entries are unchanged:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    PR target/85434
    * target-insns.def (stack_protect_combined_set): Define new standard
    pattern name.
    (stack_protect_combined_test): Likewise.
    * cfgexpand.c (stack_protect_prologue): Try new
    stack_protect_combined_set pattern first.
    * function.c (stack_protect_epilogue): Try new
    stack_protect_combined_test pattern first.
    * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
    parameters to control which register to use as PIC register and force
    reloading PIC register respectively.
    (legitimize_pic_address): Expose above new parameters in prototype and
    adapt recursive calls accordingly.
    (arm_legitimize_address): Adapt to new legitimize_pic_address
    prototype.
    (thumb_legitimize_address): Likewise.
    (arm_emit_call_insn): Adapt to new require_pic_register prototype.
    * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
    change.
    * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
    prototype change.
    (stack_protect_combined_set): New insn_and_split pattern.
    (stack_protect_set): New insn pattern.
    (stack_protect_combined_test): New insn_and_split pattern.
    (stack_protect_test): New insn pattern.
    * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
    (UNSPEC_SP_TEST): Likewise.
    * doc/md.texi (stack_protect_combined_set): Document new standard
    pattern name.
    (stack_protect_set): Clarify that the operand for guard's address is
    legal.
    (stack_protect_combined_test): Document new standard pattern name.
    (stack_protect_test): Clarify that the operand for guard's address is
    legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    PR target/85434
    * gcc.target/arm/pr85434.c: New test.

Best regards,

Thomas
On Mon, 16 Jul 2018 at 22:46, Jeff Law <law@redhat.com> wrote:
>

> On 07/05/2018 08:48 AM, Thomas Preudhomme wrote:

> > In case of high register pressure in PIC mode, address of the stack

> > protector's guard can be spilled on ARM targets as shown in PR85434,

> > thus allowing an attacker to control what the canary would be compared

> > against. ARM does lack stack_protect_set and stack_protect_test insn

> > patterns, defining them does not help as the address is expanded

> > regularly and the patterns only deal with the copy and test of the

> > guard with the canary.

> >

> > This problem does not occur for x86 targets because the PIC access and

> > the test can be done in the same instruction. Aarch64 is exempt too

> > because PIC access insn pattern are mov of UNSPEC which prevents it from

> > the second access in the epilogue being CSEd in cse_local pass with the

> > first access in the prologue.

> >

> > The approach followed here is to create new "combined" set and test

> > standard pattern names that take the unexpanded guard and do the set or

> > test. This allows the target to use an opaque pattern (eg. using UNSPEC)

> > to hide the individual instructions being generated to the compiler and

> > split the pattern into generic load, compare and branch instruction

> > after register allocator, therefore avoiding any spilling. This is here

> > implemented for the ARM targets. For targets not implementing these new

> > standard pattern names, the existing stack_protect_set and

> > stack_protect_test pattern names are used.

> >

> > To be able to split PIC access after register allocation, the functions

> > had to be augmented to force a new PIC register load and to control

> > which register it loads into. This is because sharing the PIC register

> > between prologue and epilogue could lead to spilling due to CSE again

> > which an attacker could use to control what the canary gets compared

> > against.

> >

> > ChangeLog entries are as follows:

> >

> > *** gcc/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> >

> >     PR target/85434

> >     * target-insns.def (stack_protect_combined_set): Define new standard

> >     pattern name.

> >     (stack_protect_combined_test): Likewise.

> >     * cfgexpand.c (stack_protect_prologue): Try new

> >     stack_protect_combined_set pattern first.

> >     * function.c (stack_protect_epilogue): Try new

> >     stack_protect_combined_test pattern first.

> >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

> >     parameters to control which register to use as PIC register and force

> >     reloading PIC register respectively.

> >     (legitimize_pic_address): Expose above new parameters in prototype and

> >     adapt recursive calls accordingly.

> >     (arm_legitimize_address): Adapt to new legitimize_pic_address

> >     prototype.

> >     (thumb_legitimize_address): Likewise.

> >     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

> >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

> >     change.

> >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

> >     prototype change.

> >     (stack_protect_combined_set): New insn_and_split pattern.

> >     (stack_protect_set): New insn pattern.

> >     (stack_protect_combined_test): New insn_and_split pattern.

> >     (stack_protect_test): New insn pattern.

> >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

> >     (UNSPEC_SP_TEST): Likewise.

> >     * doc/md.texi (stack_protect_combined_set): Document new standard

> >     pattern name.

> >     (stack_protect_set): Clarify that the operand for guard's address is

> >     legal.

> >     (stack_protect_combined_test): Document new standard pattern name.

> >     (stack_protect_test): Clarify that the operand for guard's address is

> >     legal.

> >

> > *** gcc/testsuite/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> >

> >     PR target/85434

> >     * gcc.target/arm/pr85434.c: New test.

> >

> > Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on

> > Aarch64. Testsuite shows no regression on these 3 variants either both

> > with default flags and with -fstack-protector-all.

> >

> > Is this ok for trunk? If yes, would this be acceptable as a backport to

> > GCC 6, 7 and 8 provided that no regression is found?

> >

> > Best regards,

> >

> > Thomas

> >

> >

> > 0001-PR85434-Prevent-spilling-of-stack-protector-guard-s-.patch

> >

> >

> > From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001

> > From: Thomas Preud'homme <thomas.preudhomme@linaro.org>

> > Date: Tue, 8 May 2018 15:47:05 +0100

> > Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address

> >  on ARM

> >

> > In case of high register pressure in PIC mode, address of the stack

> > protector's guard can be spilled on ARM targets as shown in PR85434,

> > thus allowing an attacker to control what the canary would be compared

> > against. ARM does lack stack_protect_set and stack_protect_test insn

> > patterns, defining them does not help as the address is expanded

> > regularly and the patterns only deal with the copy and test of the

> > guard with the canary.

> >

> > This problem does not occur for x86 targets because the PIC access and

> > the test can be done in the same instruction. Aarch64 is exempt too

> > because PIC access insn pattern are mov of UNSPEC which prevents it from

> > the second access in the epilogue being CSEd in cse_local pass with the

> > first access in the prologue.

> >

> > The approach followed here is to create new "combined" set and test

> > standard pattern names that take the unexpanded guard and do the set or

> > test. This allows the target to use an opaque pattern (eg. using UNSPEC)

> > to hide the individual instructions being generated to the compiler and

> > split the pattern into generic load, compare and branch instruction

> > after register allocator, therefore avoiding any spilling. This is here

> > implemented for the ARM targets. For targets not implementing these new

> > standard pattern names, the existing stack_protect_set and

> > stack_protect_test pattern names are used.

> >

> > To be able to split PIC access after register allocation, the functions

> > had to be augmented to force a new PIC register load and to control

> > which register it loads into. This is because sharing the PIC register

> > between prologue and epilogue could lead to spilling due to CSE again

> > which an attacker could use to control what the canary gets compared

> > against.

> >

> > ChangeLog entries are as follows:

> >

> > *** gcc/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> >

> >       PR target/85434

> >       * target-insns.def (stack_protect_combined_set): Define new standard

> >       pattern name.

> >       (stack_protect_combined_test): Likewise.

> >       * cfgexpand.c (stack_protect_prologue): Try new

> >       stack_protect_combined_set pattern first.

> >       * function.c (stack_protect_epilogue): Try new

> >       stack_protect_combined_test pattern first.

> >       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

> >       parameters to control which register to use as PIC register and force

> >       reloading PIC register respectively.

> >       (legitimize_pic_address): Expose above new parameters in prototype and

> >       adapt recursive calls accordingly.

> >       (arm_legitimize_address): Adapt to new legitimize_pic_address

> >       prototype.

> >       (thumb_legitimize_address): Likewise.

> >       (arm_emit_call_insn): Adapt to new require_pic_register prototype.

> >       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

> >       change.

> >       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

> >       prototype change.

> >       (stack_protect_combined_set): New insn_and_split pattern.

> >       (stack_protect_set): New insn pattern.

> >       (stack_protect_combined_test): New insn_and_split pattern.

> >       (stack_protect_test): New insn pattern.

> >       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

> >       (UNSPEC_SP_TEST): Likewise.

> >       * doc/md.texi (stack_protect_combined_set): Document new standard

> >       pattern name.

> >       (stack_protect_set): Clarify that the operand for guard's address is

> >       legal.

> >       (stack_protect_combined_test): Document new standard pattern name.

> >       (stack_protect_test): Clarify that the operand for guard's address is

> >       legal.

> >

> > *** gcc/testsuite/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

> >

> >       PR target/85434

> >       * gcc.target/arm/pr85434.c: New test.

> Commenting strictly on the target independent bits...  WRT backports,

> the release managers would have the final say.

>

>

>

>

> >

> > Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on

> > Aarch64. Testsuite shows no regression on these 3 variants either both

> > with default flags and with -fstack-protector-all.

> >

> > Is this ok for trunk? If yes, would this be acceptable as a backport to

> > GCC 6, 7 and 8 provided that no regression is found?

> >

> > Best regards,

> >

> > Thomas

> > ---

> >  gcc/cfgexpand.c                        |  10 ++

> >  gcc/config/arm/arm-protos.h            |   2 +-

> >  gcc/config/arm/arm.c                   |  56 ++++++---

> >  gcc/config/arm/arm.md                  |  92 ++++++++++++++-

> >  gcc/config/arm/unspecs.md              |   3 +

> >  gcc/doc/md.texi                        |  53 +++++++--

> >  gcc/function.c                         |  37 ++++--

> >  gcc/target-insns.def                   |   2 +

> >  gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++

> >  9 files changed, 419 insertions(+), 36 deletions(-)

> >  create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

> >

> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

> > index 09d6e30..da6423e 100644

> > --- a/gcc/doc/md.texi

> > +++ b/gcc/doc/md.texi

> > @@ -7356,22 +7356,61 @@ builtins.

> >  The get/set patterns have a single output/input operand respectively,

> >  with @var{mode} intended to be @code{Pmode}.

> >

> > +@cindex @code{stack_protect_combined_set} instruction pattern

> > +@item @samp{stack_protect_combined_set}

> > +This pattern, if defined, moves a @code{ptr_mode} value from an address

> > +whose declaration RTX is given in operand 1 to the memory in operand 0

> > +without leaving the value in a register afterward.  If several

> > +instructions are needed by the target to perform the operation (eg. to

> > +load the address from a GOT entry then load the @code{ptr_mode} value

> > +and finallystore it), it is the backend's responsability to ensure no

> "finally store" and "responsibility"

>

> > +intermediate result gets spilled.  This is to avoid leaking the value

> > +some place that an attacker might use to rewrite the stack guard slot

> > +after having clobbered it.

> > +

> > +If this pattern is not defined, then the address declaration is

> > +expanded first in the standard way and a @code{stack_protect_set}

> > +pattern is then generated to move the value from that address to the

> > +address in operand 0.

> > +

> >  @cindex @code{stack_protect_set} instruction pattern

> >  @item @samp{stack_protect_set}

> > -This pattern, if defined, moves a @code{ptr_mode} value from the memory

> > -in operand 1 to the memory in operand 0 without leaving the value in

> > -a register afterward.  This is to avoid leaking the value some place

> > -that an attacker might use to rewrite the stack guard slot after

> > +This pattern, if defined, moves a @code{ptr_mode} value from the legal

> s/legal memory/valid memory location/

>

>

> > +memory in operand 1 to the memory in operand 0 without leaving the

> > +value in a register afterward.  This is to avoid leaking the value some

> > +place that an attacker might use to rewrite the stack guard slot after

> >  having clobbered it.

> >

> > +Note: on targets where the addressing modes do not allow to load

> > +directly from stack guard address, the address is expanded in a standard

> > +way first which could cause some spills.

> > +

> >  If this pattern is not defined, then a plain move pattern is generated.

> >

> > +@cindex @code{stack_protect_combined_test} instruction pattern

> > +@item @samp{stack_protect_combined_test}

> > +This pattern, if defined, compares a @code{ptr_mode} value from an

> > +address whose declaration RTX is given in operand 1 with the memory in

> > +operand 0 without leaving the value in a register afterward and

> > +branches to operand 2 if the values were equal.  If several

> > +instructions are needed by the target to perform the operation (eg. to

> > +load the address from a GOT entry then load the @code{ptr_mode} value

> > +and finallystore it), it is the backend's responsability to ensure no

> "finally store" and "responsibility" again.

>

>

> > +intermediate result gets spilled.  This is to avoid leaking the value

> > +some place that an attacker might use to rewrite the stack guard slot

> > +after having clobbered it.

> > +

> > +If this pattern is not defined, then the address declaration is

> > +expanded first in the standard way and a @code{stack_protect_test}

> > +pattern is then generated to compare the value from that address to the

> > +value at the memory in operand 0.

> > +

> >  @cindex @code{stack_protect_test} instruction pattern

> >  @item @samp{stack_protect_test}

> >  This pattern, if defined, compares a @code{ptr_mode} value from the

> > -memory in operand 1 with the memory in operand 0 without leaving the

> > -value in a register afterward and branches to operand 2 if the values

> > -were equal.

> > +legal memory in operand 1 with the memory in operand 0 without leaving

> s/legal memory/valid memory location/

>

> The rest of the target independent stuff is fine.

> jeff
From 97b8ad25109c042cc6b8bb40a8e885e311bdcdbf Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(stack_protect_combined_set): New insn_and_split pattern.
	(stack_protect_set): New insn pattern.
	(stack_protect_combined_test): New insn_and_split pattern.
	(stack_protect_test): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants either both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  10 ++
 gcc/config/arm/arm-protos.h            |   2 +-
 gcc/config/arm/arm.c                   |  56 +++++--
 gcc/config/arm/arm.md                  |  92 +++++++++++-
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  37 +++--
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 9 files changed, 420 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
 {
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
+  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+     leaking Y into a register.  This combined address + copy pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+    return;
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262ce64..100844e659c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbcba9f..f4a970580c2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   If not NULL, PIC_REG indicates which register to use as PIC register,
+   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC
+   register to be loaded, irregardless of whether it was loaded previously.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@ require_pic_register (void)
 	  rtx_insn *seq, *insn;
 
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
@@ -7427,15 +7435,29 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (can_create_pseudo_p ())
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, irregardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
 
 	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
@@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a02668b0..9582c3fbb11 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -8634,6 +8635,95 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[2], operands[1]);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[3], operands[1]);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e10e4..429c937df60 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 734bc765387..f760da495bf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7362,22 +7362,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 142cdaec2ce..bf96dd6feae 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
   rtx_insn *seq;
+  struct expand_operand ops[3];
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
-  else
-    y = const0_rtx;
-
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
-    emit_insn (seq);
-  else
-    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  create_fixed_operand (&ops[2], label);
+  /* Allow the target to compute address of Y and compare it with X without
+     leaking Y into a register.  This combined address + compare pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
+			       3, ops))
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ()
+	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
+	      != NULL_RTX))
+	emit_insn (seq);
+      else
+	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+    }
 
   /* The noreturn predictor has been moved to the tree level.  The rtl-level
      predictors estimate this branch about 20%, which isn't enough to get
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
Kyrill Tkachov July 19, 2018, 11:02 a.m. | #4
Hi Thomas,

On 17/07/18 12:02, Thomas Preudhomme wrote:
> Fixed in attached patch. ChangeLog entries are unchanged:

>

> *** gcc/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

>

>     PR target/85434

>     * target-insns.def (stack_protect_combined_set): Define new standard

>     pattern name.

>     (stack_protect_combined_test): Likewise.

>     * cfgexpand.c (stack_protect_prologue): Try new

>     stack_protect_combined_set pattern first.

>     * function.c (stack_protect_epilogue): Try new

>     stack_protect_combined_test pattern first.

>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

>     parameters to control which register to use as PIC register and force

>     reloading PIC register respectively.

>     (legitimize_pic_address): Expose above new parameters in prototype and

>     adapt recursive calls accordingly.

>     (arm_legitimize_address): Adapt to new legitimize_pic_address

>     prototype.

>     (thumb_legitimize_address): Likewise.

>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

>     change.

>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>     prototype change.

>     (stack_protect_combined_set): New insn_and_split pattern.

>     (stack_protect_set): New insn pattern.

>     (stack_protect_combined_test): New insn_and_split pattern.

>     (stack_protect_test): New insn pattern.

>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>     (UNSPEC_SP_TEST): Likewise.

>     * doc/md.texi (stack_protect_combined_set): Document new standard

>     pattern name.

>     (stack_protect_set): Clarify that the operand for guard's address is

>     legal.

>     (stack_protect_combined_test): Document new standard pattern name.

>     (stack_protect_test): Clarify that the operand for guard's address is

>     legal.

>

> *** gcc/testsuite/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

>

>     PR target/85434

>     * gcc.target/arm/pr85434.c: New test.

>


Sorry for the delay. Some comments inline.

Kyrill

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
  {
    tree guard_decl = targetm.stack_protect_guard ();
    rtx x, y;
+  struct expand_operand ops[2];
  
    x = expand_normal (crtl->stack_protect_guard);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+     leaking Y into a register.  This combined address + copy pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+    return;
+
    if (guard_decl)
      y = expand_normal (guard_decl);
    else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262ce64..100844e659c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
  extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
  			       HOST_WIDE_INT, rtx, rtx, int);
  extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
  extern rtx legitimize_tls_address (rtx, rtx);
  extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
  extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbcba9f..f4a970580c2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
  }
  
  /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   If not NULL, PIC_REG indicates which register to use as PIC register,
+   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC
+   register to be loaded, irregardless of whether it was loaded previously.  */
  
  static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
  {
    /* A lot of the logic here is made obscure by the fact that this
       routine gets called as part of the rtx cost estimation process.
       We don't want those calls to affect any assumptions about the real
       function; and further, we can't call entry_of_function() until we
       start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
      {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
        if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
  	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
  	{
  	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@ require_pic_register (void)
  	  rtx_insn *seq, *insn;
  
  	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
  

I'll admit I'm not too familiar with this code, but don't you want to be setting cfun->machine->pic_reg to the pic_reg that
was passed into this function (if pic_reg != NULL as per the function comment)?


  	  /* Play games to avoid marking the function as needing pic
  	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@ require_pic_register (void)
  	      start_sequence ();
  
  	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
  		emit_move_insn (cfun->machine->pic_reg,
  				gen_rtx_REG (Pmode, arm_pic_register));
  	      else
@@ -7427,15 +7435,29 @@ require_pic_register (void)
  	         we can't yet emit instructions directly in the final
  		 insn stream.  Queue the insns on the entry edge, they will
  		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (can_create_pseudo_p ())
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
  	    }

It's not the capability for creating pseudos that is relevant here, but whether we have an instruction
stream to insert insns on. Do you think that using currently_expanding_to_rtl would work as a condition?


  +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is

"Legitimize"

  +   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, irregardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
  rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
  {
    if (GET_CODE (orig) == SYMBOL_REF
        || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
  	  rtx mem;
  
  	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
  
  	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
  
@@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
  
        gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
  
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
        offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
  
        if (CONST_INT_P (offset))
  	{
@@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
      {
        /* We need to find and carefully transform any SYMBOL and LABEL
  	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
  
        if (new_x != orig_x)
  	x = new_x;
@@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
      {
        /* We need to find and carefully transform any SYMBOL and LABEL
  	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
  
        if (new_x != orig_x)
  	x = new_x;
@@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
  	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
  	  : !SYMBOL_REF_LOCAL_P (addr)))
      {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
        use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
      }
  
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a02668b0..9582c3fbb11 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
        operands[1] = legitimize_pic_address (operands[1], SImode,
  					    (!can_create_pseudo_p ()
  					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
    }
    "
  )
@@ -8634,6 +8635,95 @@
     (set_attr "conds" "clob")]
  )
  
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);


I think you want to tighten the predicate of operands[1].
general_operand is very, well, general, but you're taking its 0th subtree, which it may not necessarily have.

  +  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[2], operands[1]);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]

I'm not familiar with this idiom, don't you instead want to mark the constraint on operand 1 as earlyclobber (or read-write)?

  +  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);

Same concern as with stack_protect_combined_set about operand 1 predicate being too general.

  +  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[3], operands[1]);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]

Same concern on clobbering and constraints.

  +  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
  (define_expand "casesi"
    [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
     (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e10e4..429c937df60 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
    UNSPEC_PROBE_STACK    ; Probe stack memory reference
    UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
  			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
  ])
  
  (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 734bc765387..f760da495bf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7362,22 +7362,61 @@ builtins.
  The get/set patterns have a single output/input operand respectively,
  with @var{mode} intended to be @code{Pmode}.
  
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
  @cindex @code{stack_protect_set} instruction pattern
  @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
  
  If this pattern is not defined, then a plain move pattern is generated.
  
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
  @cindex @code{stack_protect_test} instruction pattern
  @item @samp{stack_protect_test}
  This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
  
  If this pattern is not defined, then a plain compare pattern and
  conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 142cdaec2ce..bf96dd6feae 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
    rtx_code_label *label = gen_label_rtx ();
    rtx x, y;
    rtx_insn *seq;
+  struct expand_operand ops[3];
  
    x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
-  else
-    y = const0_rtx;
-
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
-    emit_insn (seq);
-  else
-    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  create_fixed_operand (&ops[2], label);
+  /* Allow the target to compute address of Y and compare it with X without
+     leaking Y into a register.  This combined address + compare pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
+			       3, ops))
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ()
+	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
+	      != NULL_RTX))
+	emit_insn (seq);
+      else
+	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+    }
  
    /* The noreturn predictor has been moved to the tree level.  The rtl-level
       predictors estimate this branch about 20%, which isn't enough to get
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
  DEF_TARGET_INSN (simple_return, (void))
  DEF_TARGET_INSN (split_stack_prologue, (void))
  DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
  DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
  DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
  DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
  DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
Thomas Preudhomme July 19, 2018, 4:34 p.m. | #5
[Dropping Jeff Law from the list since he already commented on the
middle end parts]

Hi Kyrill,

On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>

> Hi Thomas,

>

> On 17/07/18 12:02, Thomas Preudhomme wrote:

> > Fixed in attached patch. ChangeLog entries are unchanged:

> >

> > *** gcc/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

> >

> >     PR target/85434

> >     * target-insns.def (stack_protect_combined_set): Define new standard

> >     pattern name.

> >     (stack_protect_combined_test): Likewise.

> >     * cfgexpand.c (stack_protect_prologue): Try new

> >     stack_protect_combined_set pattern first.

> >     * function.c (stack_protect_epilogue): Try new

> >     stack_protect_combined_test pattern first.

> >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

> >     parameters to control which register to use as PIC register and force

> >     reloading PIC register respectively.

> >     (legitimize_pic_address): Expose above new parameters in prototype and

> >     adapt recursive calls accordingly.

> >     (arm_legitimize_address): Adapt to new legitimize_pic_address

> >     prototype.

> >     (thumb_legitimize_address): Likewise.

> >     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

> >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

> >     change.

> >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

> >     prototype change.

> >     (stack_protect_combined_set): New insn_and_split pattern.

> >     (stack_protect_set): New insn pattern.

> >     (stack_protect_combined_test): New insn_and_split pattern.

> >     (stack_protect_test): New insn pattern.

> >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

> >     (UNSPEC_SP_TEST): Likewise.

> >     * doc/md.texi (stack_protect_combined_set): Document new standard

> >     pattern name.

> >     (stack_protect_set): Clarify that the operand for guard's address is

> >     legal.

> >     (stack_protect_combined_test): Document new standard pattern name.

> >     (stack_protect_test): Clarify that the operand for guard's address is

> >     legal.

> >

> > *** gcc/testsuite/ChangeLog ***

> >

> > 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

> >

> >     PR target/85434

> >     * gcc.target/arm/pr85434.c: New test.

> >

>

> Sorry for the delay. Some comments inline.

>

> Kyrill

>

> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

> index d6e3c382085..d1a893ac56e 100644

> --- a/gcc/cfgexpand.c

> +++ b/gcc/cfgexpand.c

> @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)

>   {

>     tree guard_decl = targetm.stack_protect_guard ();

>     rtx x, y;

> +  struct expand_operand ops[2];

>

>     x = expand_normal (crtl->stack_protect_guard);

> +  create_fixed_operand (&ops[0], x);

> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

> +  /* Allow the target to compute address of Y and copy it to X without

> +     leaking Y into a register.  This combined address + copy pattern allows

> +     the target to prevent spilling of any intermediate results by splitting

> +     it after register allocator.  */

> +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))

> +    return;

> +

>     if (guard_decl)

>       y = expand_normal (guard_decl);

>     else

> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

> index 8537262ce64..100844e659c 100644

> --- a/gcc/config/arm/arm-protos.h

> +++ b/gcc/config/arm/arm-protos.h

> @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);

>   extern int arm_split_constant (RTX_CODE, machine_mode, rtx,

>                                HOST_WIDE_INT, rtx, rtx, int);

>   extern int legitimate_pic_operand_p (rtx);

> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);

> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);

>   extern rtx legitimize_tls_address (rtx, rtx);

>   extern bool arm_legitimate_address_p (machine_mode, rtx, bool);

>   extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index ec3abbcba9f..f4a970580c2 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)

>   }

>

>   /* Record that the current function needs a PIC register.  Initialize

> -   cfun->machine->pic_reg if we have not already done so.  */

> +   cfun->machine->pic_reg if we have not already done so.

> +

> +   If not NULL, PIC_REG indicates which register to use as PIC register,

> +   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC

> +   register to be loaded, irregardless of whether it was loaded previously.  */

>

>   static void

> -require_pic_register (void)

> +require_pic_register (rtx pic_reg, bool compute_now)

>   {

>     /* A lot of the logic here is made obscure by the fact that this

>        routine gets called as part of the rtx cost estimation process.

>        We don't want those calls to affect any assumptions about the real

>        function; and further, we can't call entry_of_function() until we

>        start the real expansion process.  */

> -  if (!crtl->uses_pic_offset_table)

> +  if (!crtl->uses_pic_offset_table || compute_now)

>       {

> -      gcc_assert (can_create_pseudo_p ());

> +      gcc_assert (can_create_pseudo_p ()

> +                 || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));

>         if (arm_pic_register != INVALID_REGNUM

> +         && can_create_pseudo_p ()

>           && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))

>         {

>           if (!cfun->machine->pic_reg)

> @@ -7399,7 +7405,8 @@ require_pic_register (void)

>           rtx_insn *seq, *insn;

>

>           if (!cfun->machine->pic_reg)

> -           cfun->machine->pic_reg = gen_reg_rtx (Pmode);

> +           cfun->machine->pic_reg =

> +             can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;

>

>

> I'll admit I'm not too familiar with this code, but don't you want to be setting cfun->machine->pic_reg to the pic_reg that

> was passed into this function (if pic_reg != NULL as per the function comment)?


Indeed the comment needs to be fixed. I think it only makes sense to
specify a register after register allocation when the amount of
register is finite. Before that I don't see why one would want to do
that. I've changed the sentence about PIC_REG in the heading comment
to:

"A new pseudo register is used for the PIC register if possible,
otherwise PIC_REG must be non NULL and is used instead."

>

>

>           /* Play games to avoid marking the function as needing pic

>              if we are being called as part of the cost-estimation

> @@ -7410,7 +7417,8 @@ require_pic_register (void)

>               start_sequence ();

>

>               if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM

> -                 && arm_pic_register > LAST_LO_REGNUM)

> +                 && arm_pic_register > LAST_LO_REGNUM

> +                 && can_create_pseudo_p ())

>                 emit_move_insn (cfun->machine->pic_reg,

>                                 gen_rtx_REG (Pmode, arm_pic_register));

>               else

> @@ -7427,15 +7435,29 @@ require_pic_register (void)

>                  we can't yet emit instructions directly in the final

>                  insn stream.  Queue the insns on the entry edge, they will

>                  be committed after everything else is expanded.  */

> -             insert_insn_on_edge (seq,

> -                                  single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

> +             if (can_create_pseudo_p ())

> +               insert_insn_on_edge (seq,

> +                                    single_succ_edge

> +                                    (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

> +             else

> +               emit_insn (seq);

>             }

>

> It's not the capability for creating pseudos that is relevant here, but whether we have an instruction

> stream to insert insns on. Do you think that using currently_expanding_to_rtl would work as a condition?


Indeed it's not the right test. Do you mean testing if
currently_expanding_to_rtl is false? I used can_create_pseudo_p
because I knew the function could never be called when it was false
before and thus my changes would only impact the code I was changing.

>

>

>   +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is

>

> "Legitimize"

>

>   +   created to hold the result of the load.  If not NULL, PIC_REG indicates

> +   which register to use as PIC register, otherwise it is decided by register

> +   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current

> +   location in the instruction stream, irregardless of whether it was loaded

> +   previously.

> +

> +   Returns the register REG into which the PIC load is performed.  */

> +

>   rtx

> -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

> +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,

> +                       bool compute_now)

>   {

>     if (GET_CODE (orig) == SYMBOL_REF

>         || GET_CODE (orig) == LABEL_REF)

> @@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

>           rtx mem;

>

>           /* If this function doesn't have a pic register, create one now.  */

> -         require_pic_register ();

> +         require_pic_register (pic_reg, compute_now);

>

>           pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);

>

> @@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

>

>         gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);

>

> -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);

> +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,

> +                                    pic_reg, compute_now);

>         offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,

> -                                      base == reg ? 0 : reg);

> +                                      base == reg ? 0 : reg, pic_reg,

> +                                      compute_now);

>

>         if (CONST_INT_P (offset))

>         {

> @@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

>       {

>         /* We need to find and carefully transform any SYMBOL and LABEL

>          references; so go back to the original address expression.  */

> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

> +                                         false /*compute_now*/);

>

>         if (new_x != orig_x)

>         x = new_x;

> @@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

>       {

>         /* We need to find and carefully transform any SYMBOL and LABEL

>          references; so go back to the original address expression.  */

> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

> +                                         false /*compute_now*/);

>

>         if (new_x != orig_x)

>         x = new_x;

> @@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)

>           ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))

>           : !SYMBOL_REF_LOCAL_P (addr)))

>       {

> -      require_pic_register ();

> +      require_pic_register (NULL_RTX, false /*compute_now*/);

>         use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);

>       }

>

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

> index 361a02668b0..9582c3fbb11 100644

> --- a/gcc/config/arm/arm.md

> +++ b/gcc/config/arm/arm.md

> @@ -6021,7 +6021,8 @@

>         operands[1] = legitimize_pic_address (operands[1], SImode,

>                                             (!can_create_pseudo_p ()

>                                              ? operands[0]

> -                                            : 0));

> +                                            : NULL_RTX), NULL_RTX,

> +                                           false /*compute_now*/);

>     }

>     "

>   )

> @@ -8634,6 +8635,95 @@

>      (set_attr "conds" "clob")]

>   )

>

> +;; Named patterns for stack smashing protection.

> +(define_insn_and_split "stack_protect_combined_set"

> +  [(set (match_operand:SI 0 "memory_operand" "=m")

> +       (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

> +                  UNSPEC_SP_SET))

> +   (match_scratch:SI 2 "=r")

> +   (match_scratch:SI 3 "=r")]

> +  ""

> +  "#"

> +  "reload_completed"

> +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]

> +                                           UNSPEC_SP_SET))

> +             (clobber (match_dup 2))])]

> +  "

> +{

> +  rtx addr = XEXP (operands[1], 0);

>

>

> I think you want to tighten the predicate of operands[1].

> general_operand is very, well, general, but you're taking its 0th subtree, which it may not necessarily have.


I was afraid you'd ask that. Yes indeed a better predicate should be
used but I'm not sure all the possible cases here. However it has to
be some sort of memory since this is the address of the guard. I think
the problem with using memory_operand was that it could be a MEM of a
SYMBOL_REF which is not legitimate at this point.

>

>   +  if (flag_pic)

> +    {

> +      /* Forces recomputing of GOT base now.  */

> +      operands[1] = legitimize_pic_address (addr, SImode, operands[2],

> +                                           operands[3], true /*compute_now*/);

> +    }

> +  else

> +    {

> +      if (!address_operand (addr, SImode))

> +       operands[1] = force_const_mem (SImode, addr);

> +      emit_move_insn (operands[2], operands[1]);

> +    }

> +}"

> +)

> +

> +(define_insn "stack_protect_set"

> +  [(set (match_operand:SI 0 "memory_operand" "=m")

> +       (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]

> +        UNSPEC_SP_SET))

> +   (clobber (match_dup 1))]

>

> I'm not familiar with this idiom, don't you instead want to mark the constraint on operand 1 as earlyclobber (or read-write)?


According to the documentation earlyclobber applies to operands being
written to and this is an input operand only. As the doc says the
earlyclobber indicates to the compiler that it cannot reuse a register
used for an input operand to put the output operand into. Here I'm
trying to express instead that the operand needs to be saved/restored
if needed after this insn because it gets written to. This is only
needed in Thumb-1 because stack_protect_set_combined and
stack_protect_test_combined already use all registers available so I
cannot ask for an extra clobber register.

>

>   +  ""

> +  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"

> +  [(set_attr "length" "12")

> +   (set_attr "type" "multiple")])

> +

> +(define_insn_and_split "stack_protect_combined_test"

> +  [(set (pc)

> +       (if_then_else

> +               (eq (match_operand:SI 0 "memory_operand" "m")

> +                   (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

> +                              UNSPEC_SP_TEST))

> +               (label_ref (match_operand 2))

> +               (pc)))

> +   (match_scratch:SI 3 "=r")

> +   (match_scratch:SI 4 "=r")]

> +  ""

> +  "#"

> +  "reload_completed"

> +  [(const_int 0)]

> +{

> +  rtx eq, addr;

> +

> +  addr = XEXP (operands[1], 0);

>

> Same concern as with stack_protect_combined_set about operand 1 predicate being too general.


Will try to create a new predicate, thanks for raising this up.

>

>   +  if (flag_pic)

> +    {

> +      /* Forces recomputing of GOT base now.  */

> +      operands[1] = legitimize_pic_address (addr, SImode, operands[3],

> +                                           operands[4],

> +                                           true /*compute_now*/);

> +    }

> +  else

> +    {

> +      if (!address_operand (addr, SImode))

> +       operands[1] = force_const_mem (SImode, addr);

> +      emit_move_insn (operands[3], operands[1]);

> +    }

> +  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));

> +  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);

> +  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));

> +  DONE;

> +})

> +

> +(define_insn "stack_protect_test"

> +  [(set (match_operand:SI 0 "register_operand" "=r")

> +       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")

> +                   (mem:SI (match_operand:SI 2 "register_operand" "r"))]

> +        UNSPEC_SP_TEST))

> +   (clobber (match_dup 2))]

>

> Same concern on clobbering and constraints.


Same answer :-)

Best regards,

Thomas

>

>   +  ""

> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"

> +  [(set_attr "length" "12")

> +   (set_attr "type" "multiple")])

> +

>   (define_expand "casesi"

>     [(match_operand:SI 0 "s_register_operand" "")       ; index to jump on

>      (match_operand:SI 1 "const_int_operand" "")        ; lower bound

> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md

> index b05f85e10e4..429c937df60 100644

> --- a/gcc/config/arm/unspecs.md

> +++ b/gcc/config/arm/unspecs.md

> @@ -86,6 +86,9 @@

>     UNSPEC_PROBE_STACK    ; Probe stack memory reference

>     UNSPEC_NONSECURE_MEM        ; Represent non-secure memory in ARMv8-M with

>                         ; security extension

> +  UNSPEC_SP_SET                ; Represent the setting of stack protector's canary

> +  UNSPEC_SP_TEST       ; Represent the testing of stack protector's canary

> +                       ; against the guard.

>   ])

>

>   (define_c_enum "unspec" [

> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

> index 734bc765387..f760da495bf 100644

> --- a/gcc/doc/md.texi

> +++ b/gcc/doc/md.texi

> @@ -7362,22 +7362,61 @@ builtins.

>   The get/set patterns have a single output/input operand respectively,

>   with @var{mode} intended to be @code{Pmode}.

>

> +@cindex @code{stack_protect_combined_set} instruction pattern

> +@item @samp{stack_protect_combined_set}

> +This pattern, if defined, moves a @code{ptr_mode} value from an address

> +whose declaration RTX is given in operand 1 to the memory in operand 0

> +without leaving the value in a register afterward.  If several

> +instructions are needed by the target to perform the operation (eg. to

> +load the address from a GOT entry then load the @code{ptr_mode} value

> +and finally store it), it is the backend's responsibility to ensure no

> +intermediate result gets spilled.  This is to avoid leaking the value

> +some place that an attacker might use to rewrite the stack guard slot

> +after having clobbered it.

> +

> +If this pattern is not defined, then the address declaration is

> +expanded first in the standard way and a @code{stack_protect_set}

> +pattern is then generated to move the value from that address to the

> +address in operand 0.

> +

>   @cindex @code{stack_protect_set} instruction pattern

>   @item @samp{stack_protect_set}

> -This pattern, if defined, moves a @code{ptr_mode} value from the memory

> -in operand 1 to the memory in operand 0 without leaving the value in

> -a register afterward.  This is to avoid leaking the value some place

> -that an attacker might use to rewrite the stack guard slot after

> -having clobbered it.

> +This pattern, if defined, moves a @code{ptr_mode} value from the valid

> +memory location in operand 1 to the memory in operand 0 without leaving

> +the value in a register afterward.  This is to avoid leaking the value

> +some place that an attacker might use to rewrite the stack guard slot

> +after having clobbered it.

> +

> +Note: on targets where the addressing modes do not allow to load

> +directly from stack guard address, the address is expanded in a standard

> +way first which could cause some spills.

>

>   If this pattern is not defined, then a plain move pattern is generated.

>

> +@cindex @code{stack_protect_combined_test} instruction pattern

> +@item @samp{stack_protect_combined_test}

> +This pattern, if defined, compares a @code{ptr_mode} value from an

> +address whose declaration RTX is given in operand 1 with the memory in

> +operand 0 without leaving the value in a register afterward and

> +branches to operand 2 if the values were equal.  If several

> +instructions are needed by the target to perform the operation (eg. to

> +load the address from a GOT entry then load the @code{ptr_mode} value

> +and finally store it), it is the backend's responsibility to ensure no

> +intermediate result gets spilled.  This is to avoid leaking the value

> +some place that an attacker might use to rewrite the stack guard slot

> +after having clobbered it.

> +

> +If this pattern is not defined, then the address declaration is

> +expanded first in the standard way and a @code{stack_protect_test}

> +pattern is then generated to compare the value from that address to the

> +value at the memory in operand 0.

> +

>   @cindex @code{stack_protect_test} instruction pattern

>   @item @samp{stack_protect_test}

>   This pattern, if defined, compares a @code{ptr_mode} value from the

> -memory in operand 1 with the memory in operand 0 without leaving the

> -value in a register afterward and branches to operand 2 if the values

> -were equal.

> +valid memory location in operand 1 with the memory in operand 0 without

> +leaving the value in a register afterward and branches to operand 2 if

> +the values were equal.

>

>   If this pattern is not defined, then a plain compare pattern and

>   conditional branch pattern is used.

> diff --git a/gcc/function.c b/gcc/function.c

> index 142cdaec2ce..bf96dd6feae 100644

> --- a/gcc/function.c

> +++ b/gcc/function.c

> @@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)

>     rtx_code_label *label = gen_label_rtx ();

>     rtx x, y;

>     rtx_insn *seq;

> +  struct expand_operand ops[3];

>

>     x = expand_normal (crtl->stack_protect_guard);

> -  if (guard_decl)

> -    y = expand_normal (guard_decl);

> -  else

> -    y = const0_rtx;

> -

> -  /* Allow the target to compare Y with X without leaking either into

> -     a register.  */

> -  if (targetm.have_stack_protect_test ()

> -      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))

> -    emit_insn (seq);

> -  else

> -    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

> +  create_fixed_operand (&ops[0], x);

> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

> +  create_fixed_operand (&ops[2], label);

> +  /* Allow the target to compute address of Y and compare it with X without

> +     leaking Y into a register.  This combined address + compare pattern allows

> +     the target to prevent spilling of any intermediate results by splitting

> +     it after register allocator.  */

> +  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,

> +                              3, ops))

> +    {

> +      if (guard_decl)

> +       y = expand_normal (guard_decl);

> +      else

> +       y = const0_rtx;

> +

> +      /* Allow the target to compare Y with X without leaking either into

> +        a register.  */

> +      if (targetm.have_stack_protect_test ()

> +         && ((seq = targetm.gen_stack_protect_test (x, y, label))

> +             != NULL_RTX))

> +       emit_insn (seq);

> +      else

> +       emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

> +    }

>

>     /* The noreturn predictor has been moved to the tree level.  The rtl-level

>        predictors estimate this branch about 20%, which isn't enough to get

> diff --git a/gcc/target-insns.def b/gcc/target-insns.def

> index 9a552c3d11c..d39889b3522 100644

> --- a/gcc/target-insns.def

> +++ b/gcc/target-insns.def

> @@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,

>   DEF_TARGET_INSN (simple_return, (void))

>   DEF_TARGET_INSN (split_stack_prologue, (void))

>   DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))

> +DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))

>   DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))

> +DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))

>   DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))

>   DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))

>   DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))

> diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c

> new file mode 100644

> index 00000000000..4143a861f7c

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/pr85434.c

> @@ -0,0 +1,200 @@

> +/* { dg-do compile } */

> +/* { dg-require-effective-target fstack_protector }*/

> +/* { dg-require-effective-target fpic }*/

> +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */

> +

> +#include <stddef.h>

> +#include <stdint.h>

> +

> +

> +static const unsigned char base64_enc_map[64] =

> +{

> +    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',

> +    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',

> +    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',

> +    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',

> +    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',

> +    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',

> +    '8', '9', '+', '/'

> +};

> +

> +#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */

> +

> +

> +void doSmth(void *x);

> +

> +#include <string.h>

> +

> +

> +void check(int n) {

> +

> +    if (!(n % 2 && n % 3 && n % 5)) {

> + __asm__  (   "add    r8, r8, #1;" );

> +    }

> +}

> +

> +uint32_t test(

> +  uint32_t a1,

> +  uint32_t a2,

> +  size_t a3,

> +  size_t a4,

> +  size_t a5,

> +  size_t a6)

> +{

> +  uint32_t nResult = 0;

> +  uint8_t* h = 0L;

> +  uint8_t X[128];

> +  uint8_t mac[64];

> +  size_t len;

> +

> +  doSmth(&a1);

> +  doSmth(&a2);

> +  doSmth(&a3);

> +  doSmth(&a4);

> +  doSmth(&a5);

> +  doSmth(&a6);

> +

> +  if (a1 && a2 && a3 && a4 && a5 && a6) {

> +    nResult = 1;

> +    h = (void*)X;

> +    len = sizeof(X);

> +    memset(X, a2, len);

> +    len -= 64;

> +    memcpy(mac ,X, len);

> +    *(h + len) = a6;

> +

> +    {

> +

> +

> +        unsigned char *dst = X;

> +        size_t dlen = a3;

> +        size_t *olen = &a6;

> +        const unsigned char *src = mac;

> +        size_t slen = a4;

> +    size_t i, n;

> +    int C1, C2, C3;

> +    unsigned char *p;

> +

> +    if( slen == 0 )

> +    {

> +        *olen = 0;

> +        return( 0 );

> +    }

> +

> +    n = slen / 3 + ( slen % 3 != 0 );

> +

> +    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )

> +    {

> +        *olen = BASE64_SIZE_T_MAX;

> +        return( 0 );

> +    }

> +

> +    n *= 4;

> +

> +    if( ( dlen < n + 1 ) || ( NULL == dst ) )

> +    {

> +        *olen = n + 1;

> +        return( 0 );

> +    }

> +

> +    n = ( slen / 3 ) * 3;

> +

> +    for( i = 0, p = dst; i < n; i += 3 )

> +    {

> +        C1 = *src++;

> +        C2 = *src++;

> +        C3 = *src++;

> +

> +        check(i);

> +

> +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

> +        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];

> +        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];

> +        *p++ = base64_enc_map[C3 & 0x3F];

> +    }

> +

> +    if( i < slen )

> +    {

> +        C1 = *src++;

> +        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;

> +

> +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

> +        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];

> +

> +        if( ( i + 1 ) < slen )

> +             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];

> +        else *p++ = '=';

> +

> +        *p++ = '=';

> +    }

> +

> +    *olen = p - dst;

> +    *p = 0;

> +

> +}

> +

> +  __asm__ ("mov r8, %0;" : "=r" ( nResult ));

> +  }

> +  else

> +  {

> +    nResult = 2;

> +  }

> +

> +  doSmth(X);

> +  doSmth(mac);

> +

> +

> +  return nResult;

> +}

> +

> +/* The pattern below catches sequences of instructions that were generated

> +   for ARM and Thumb-2 before the fix for this PR. They are of the form:

> +

> +   ldr     rX, <offset from sp or fp>

> +   <optional non ldr instructions>

> +   ldr     rY, <offset from sp or fp>

> +   ldr     rZ, [rX]

> +   <optional non ldr instructions>

> +   cmp     rY, rZ

> +   <optional non cmp instructions>

> +   bl      __stack_chk_fail

> +

> +   Ideally the optional block would check for the various rX, rY and rZ

> +   registers not being set but this is not possible due to back references

> +   being illegal in lookahead expression in Tcl, thus preventing to use the

> +   only construct that allow to negate a regexp from using the backreferences

> +   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp

> +   instructions with the assumptions that (i) those are not part of the stack

> +   protector sequences and (ii) they would only be scheduled here if they don't

> +   conflict with registers used by stack protector.

> +

> +   Note on the regexp logic:

> +   Allowing non X instructions (where X is ldr or cmp) is done by looking for

> +   some non newline spaces, followed by something which is not X, followed by

> +   an alphanumeric character followed by anything but a newline and ended by a

> +   newline the whole thing an undetermined number of times. The alphanumeric

> +   character is there to force the match of the negative lookahead for X to

> +   only happen after all the initial spaces and thus to check the mnemonic.

> +   This prevents it to match one of the initial space.  */

> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

> +

> +/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR

> +   which had the form:

> +

> +   ldr     rS, <offset from sp or fp>

> +   <optional non ldr instructions>

> +   ldr     rT, <PC relative offset>

> +   <optional non ldr instructions>

> +   ldr     rX, [rS, rT]

> +   <optional non ldr instructions>

> +   ldr     rY, <offset from sp or fp>

> +   ldr     rZ, [rX]

> +   <optional non ldr instructions>

> +   cmp     rY, rZ

> +   <optional non cmp instructions>

> +   bl      __stack_chk_fail

> +

> +  Note on the regexp logic:

> +  PC relative offset is checked by looking for a source operand that does not

> +  contain [ or ].  */

> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

>

>
Thomas Preudhomme July 25, 2018, 1:28 p.m. | #6
Hi Kyrill,

Using memory_operand worked, the issues I encountered when using it in
earlier versions of the patch must have been due to the missing test
on address_operand in the preparation statements which I added later.
Please find an updated patch in attachment. ChangeLog entry is as
follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    * target-insns.def (stack_protect_combined_set): Define new standard
    pattern name.
    (stack_protect_combined_test): Likewise.
    * cfgexpand.c (stack_protect_prologue): Try new
    stack_protect_combined_set pattern first.
    * function.c (stack_protect_epilogue): Try new
    stack_protect_combined_test pattern first.
    * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
    parameters to control which register to use as PIC register and force
    reloading PIC register respectively.  Insert in the stream of insns if
    possible.
    (legitimize_pic_address): Expose above new parameters in prototype and
    adapt recursive calls accordingly.
    (arm_legitimize_address): Adapt to new legitimize_pic_address
    prototype.
    (thumb_legitimize_address): Likewise.
    (arm_emit_call_insn): Adapt to new require_pic_register prototype.
    * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
    change.
    * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
    prototype change.
    (stack_protect_combined_set): New insn_and_split pattern.
    (stack_protect_set): New insn pattern.
    (stack_protect_combined_test): New insn_and_split pattern.
    (stack_protect_test): New insn pattern.
    * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
    (UNSPEC_SP_TEST): Likewise.
    * doc/md.texi (stack_protect_combined_set): Document new standard
    pattern name.
    (stack_protect_set): Clarify that the operand for guard's address is
    legal.
    (stack_protect_combined_test): Document new standard pattern name.
    (stack_protect_test): Clarify that the operand for guard's address is
    legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    * gcc.target/arm/pr85434.c: New test.

Bootstrapped again for Arm and Thumb-2 and regtested with and without
-fstack-protector-all without any regression.

Best regards,

Thomas
On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>

> [Dropping Jeff Law from the list since he already commented on the

> middle end parts]

>

> Hi Kyrill,

>

> On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov

> <kyrylo.tkachov@foss.arm.com> wrote:

> >

> > Hi Thomas,

> >

> > On 17/07/18 12:02, Thomas Preudhomme wrote:

> > > Fixed in attached patch. ChangeLog entries are unchanged:

> > >

> > > *** gcc/ChangeLog ***

> > >

> > > 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

> > >

> > >     PR target/85434

> > >     * target-insns.def (stack_protect_combined_set): Define new standard

> > >     pattern name.

> > >     (stack_protect_combined_test): Likewise.

> > >     * cfgexpand.c (stack_protect_prologue): Try new

> > >     stack_protect_combined_set pattern first.

> > >     * function.c (stack_protect_epilogue): Try new

> > >     stack_protect_combined_test pattern first.

> > >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

> > >     parameters to control which register to use as PIC register and force

> > >     reloading PIC register respectively.

> > >     (legitimize_pic_address): Expose above new parameters in prototype and

> > >     adapt recursive calls accordingly.

> > >     (arm_legitimize_address): Adapt to new legitimize_pic_address

> > >     prototype.

> > >     (thumb_legitimize_address): Likewise.

> > >     (arm_emit_call_insn): Adapt to new require_pic_register prototype.

> > >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

> > >     change.

> > >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

> > >     prototype change.

> > >     (stack_protect_combined_set): New insn_and_split pattern.

> > >     (stack_protect_set): New insn pattern.

> > >     (stack_protect_combined_test): New insn_and_split pattern.

> > >     (stack_protect_test): New insn pattern.

> > >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

> > >     (UNSPEC_SP_TEST): Likewise.

> > >     * doc/md.texi (stack_protect_combined_set): Document new standard

> > >     pattern name.

> > >     (stack_protect_set): Clarify that the operand for guard's address is

> > >     legal.

> > >     (stack_protect_combined_test): Document new standard pattern name.

> > >     (stack_protect_test): Clarify that the operand for guard's address is

> > >     legal.

> > >

> > > *** gcc/testsuite/ChangeLog ***

> > >

> > > 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

> > >

> > >     PR target/85434

> > >     * gcc.target/arm/pr85434.c: New test.

> > >

> >

> > Sorry for the delay. Some comments inline.

> >

> > Kyrill

> >

> > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

> > index d6e3c382085..d1a893ac56e 100644

> > --- a/gcc/cfgexpand.c

> > +++ b/gcc/cfgexpand.c

> > @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)

> >   {

> >     tree guard_decl = targetm.stack_protect_guard ();

> >     rtx x, y;

> > +  struct expand_operand ops[2];

> >

> >     x = expand_normal (crtl->stack_protect_guard);

> > +  create_fixed_operand (&ops[0], x);

> > +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

> > +  /* Allow the target to compute address of Y and copy it to X without

> > +     leaking Y into a register.  This combined address + copy pattern allows

> > +     the target to prevent spilling of any intermediate results by splitting

> > +     it after register allocator.  */

> > +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))

> > +    return;

> > +

> >     if (guard_decl)

> >       y = expand_normal (guard_decl);

> >     else

> > diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

> > index 8537262ce64..100844e659c 100644

> > --- a/gcc/config/arm/arm-protos.h

> > +++ b/gcc/config/arm/arm-protos.h

> > @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);

> >   extern int arm_split_constant (RTX_CODE, machine_mode, rtx,

> >                                HOST_WIDE_INT, rtx, rtx, int);

> >   extern int legitimate_pic_operand_p (rtx);

> > -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);

> > +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);

> >   extern rtx legitimize_tls_address (rtx, rtx);

> >   extern bool arm_legitimate_address_p (machine_mode, rtx, bool);

> >   extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);

> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> > index ec3abbcba9f..f4a970580c2 100644

> > --- a/gcc/config/arm/arm.c

> > +++ b/gcc/config/arm/arm.c

> > @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)

> >   }

> >

> >   /* Record that the current function needs a PIC register.  Initialize

> > -   cfun->machine->pic_reg if we have not already done so.  */

> > +   cfun->machine->pic_reg if we have not already done so.

> > +

> > +   If not NULL, PIC_REG indicates which register to use as PIC register,

> > +   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC

> > +   register to be loaded, irregardless of whether it was loaded previously.  */

> >

> >   static void

> > -require_pic_register (void)

> > +require_pic_register (rtx pic_reg, bool compute_now)

> >   {

> >     /* A lot of the logic here is made obscure by the fact that this

> >        routine gets called as part of the rtx cost estimation process.

> >        We don't want those calls to affect any assumptions about the real

> >        function; and further, we can't call entry_of_function() until we

> >        start the real expansion process.  */

> > -  if (!crtl->uses_pic_offset_table)

> > +  if (!crtl->uses_pic_offset_table || compute_now)

> >       {

> > -      gcc_assert (can_create_pseudo_p ());

> > +      gcc_assert (can_create_pseudo_p ()

> > +                 || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));

> >         if (arm_pic_register != INVALID_REGNUM

> > +         && can_create_pseudo_p ()

> >           && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))

> >         {

> >           if (!cfun->machine->pic_reg)

> > @@ -7399,7 +7405,8 @@ require_pic_register (void)

> >           rtx_insn *seq, *insn;

> >

> >           if (!cfun->machine->pic_reg)

> > -           cfun->machine->pic_reg = gen_reg_rtx (Pmode);

> > +           cfun->machine->pic_reg =

> > +             can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;

> >

> >

> > I'll admit I'm not too familiar with this code, but don't you want to be setting cfun->machine->pic_reg to the pic_reg that

> > was passed into this function (if pic_reg != NULL as per the function comment)?

>

> Indeed the comment needs to be fixed. I think it only makes sense to

> specify a register after register allocation when the amount of

> register is finite. Before that I don't see why one would want to do

> that. I've changed the sentence about PIC_REG in the heading comment

> to:

>

> "A new pseudo register is used for the PIC register if possible,

> otherwise PIC_REG must be non NULL and is used instead."

>

> >

> >

> >           /* Play games to avoid marking the function as needing pic

> >              if we are being called as part of the cost-estimation

> > @@ -7410,7 +7417,8 @@ require_pic_register (void)

> >               start_sequence ();

> >

> >               if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM

> > -                 && arm_pic_register > LAST_LO_REGNUM)

> > +                 && arm_pic_register > LAST_LO_REGNUM

> > +                 && can_create_pseudo_p ())

> >                 emit_move_insn (cfun->machine->pic_reg,

> >                                 gen_rtx_REG (Pmode, arm_pic_register));

> >               else

> > @@ -7427,15 +7435,29 @@ require_pic_register (void)

> >                  we can't yet emit instructions directly in the final

> >                  insn stream.  Queue the insns on the entry edge, they will

> >                  be committed after everything else is expanded.  */

> > -             insert_insn_on_edge (seq,

> > -                                  single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

> > +             if (can_create_pseudo_p ())

> > +               insert_insn_on_edge (seq,

> > +                                    single_succ_edge

> > +                                    (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

> > +             else

> > +               emit_insn (seq);

> >             }

> >

> > It's not the capability for creating pseudos that is relevant here, but whether we have an instruction

> > stream to insert insns on. Do you think that using currently_expanding_to_rtl would work as a condition?

>

> Indeed it's not the right test. Do you mean testing if

> currently_expanding_to_rtl is false? I used can_create_pseudo_p

> because I knew the function could never be called when it was false

> before and thus my changes would only impact the code I was changing.

>

> >

> >

> >   +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is

> >

> > "Legitimize"

> >

> >   +   created to hold the result of the load.  If not NULL, PIC_REG indicates

> > +   which register to use as PIC register, otherwise it is decided by register

> > +   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current

> > +   location in the instruction stream, irregardless of whether it was loaded

> > +   previously.

> > +

> > +   Returns the register REG into which the PIC load is performed.  */

> > +

> >   rtx

> > -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

> > +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,

> > +                       bool compute_now)

> >   {

> >     if (GET_CODE (orig) == SYMBOL_REF

> >         || GET_CODE (orig) == LABEL_REF)

> > @@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

> >           rtx mem;

> >

> >           /* If this function doesn't have a pic register, create one now.  */

> > -         require_pic_register ();

> > +         require_pic_register (pic_reg, compute_now);

> >

> >           pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);

> >

> > @@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

> >

> >         gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);

> >

> > -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);

> > +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,

> > +                                    pic_reg, compute_now);

> >         offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,

> > -                                      base == reg ? 0 : reg);

> > +                                      base == reg ? 0 : reg, pic_reg,

> > +                                      compute_now);

> >

> >         if (CONST_INT_P (offset))

> >         {

> > @@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

> >       {

> >         /* We need to find and carefully transform any SYMBOL and LABEL

> >          references; so go back to the original address expression.  */

> > -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

> > +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

> > +                                         false /*compute_now*/);

> >

> >         if (new_x != orig_x)

> >         x = new_x;

> > @@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

> >       {

> >         /* We need to find and carefully transform any SYMBOL and LABEL

> >          references; so go back to the original address expression.  */

> > -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

> > +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

> > +                                         false /*compute_now*/);

> >

> >         if (new_x != orig_x)

> >         x = new_x;

> > @@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)

> >           ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))

> >           : !SYMBOL_REF_LOCAL_P (addr)))

> >       {

> > -      require_pic_register ();

> > +      require_pic_register (NULL_RTX, false /*compute_now*/);

> >         use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);

> >       }

> >

> > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

> > index 361a02668b0..9582c3fbb11 100644

> > --- a/gcc/config/arm/arm.md

> > +++ b/gcc/config/arm/arm.md

> > @@ -6021,7 +6021,8 @@

> >         operands[1] = legitimize_pic_address (operands[1], SImode,

> >                                             (!can_create_pseudo_p ()

> >                                              ? operands[0]

> > -                                            : 0));

> > +                                            : NULL_RTX), NULL_RTX,

> > +                                           false /*compute_now*/);

> >     }

> >     "

> >   )

> > @@ -8634,6 +8635,95 @@

> >      (set_attr "conds" "clob")]

> >   )

> >

> > +;; Named patterns for stack smashing protection.

> > +(define_insn_and_split "stack_protect_combined_set"

> > +  [(set (match_operand:SI 0 "memory_operand" "=m")

> > +       (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

> > +                  UNSPEC_SP_SET))

> > +   (match_scratch:SI 2 "=r")

> > +   (match_scratch:SI 3 "=r")]

> > +  ""

> > +  "#"

> > +  "reload_completed"

> > +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]

> > +                                           UNSPEC_SP_SET))

> > +             (clobber (match_dup 2))])]

> > +  "

> > +{

> > +  rtx addr = XEXP (operands[1], 0);

> >

> >

> > I think you want to tighten the predicate of operands[1].

> > general_operand is very, well, general, but you're taking its 0th subtree, which it may not necessarily have.

>

> I was afraid you'd ask that. Yes indeed a better predicate should be

> used but I'm not sure all the possible cases here. However it has to

> be some sort of memory since this is the address of the guard. I think

> the problem with using memory_operand was that it could be a MEM of a

> SYMBOL_REF which is not legitimate at this point.

>

> >

> >   +  if (flag_pic)

> > +    {

> > +      /* Forces recomputing of GOT base now.  */

> > +      operands[1] = legitimize_pic_address (addr, SImode, operands[2],

> > +                                           operands[3], true /*compute_now*/);

> > +    }

> > +  else

> > +    {

> > +      if (!address_operand (addr, SImode))

> > +       operands[1] = force_const_mem (SImode, addr);

> > +      emit_move_insn (operands[2], operands[1]);

> > +    }

> > +}"

> > +)

> > +

> > +(define_insn "stack_protect_set"

> > +  [(set (match_operand:SI 0 "memory_operand" "=m")

> > +       (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]

> > +        UNSPEC_SP_SET))

> > +   (clobber (match_dup 1))]

> >

> > I'm not familiar with this idiom, don't you instead want to mark the constraint on operand 1 as earlyclobber (or read-write)?

>

> According to the documentation earlyclobber applies to operands being

> written to and this is an input operand only. As the doc says the

> earlyclobber indicates to the compiler that it cannot reuse a register

> used for an input operand to put the output operand into. Here I'm

> trying to express instead that the operand needs to be saved/restored

> if needed after this insn because it gets written to. This is only

> needed in Thumb-1 because stack_protect_set_combined and

> stack_protect_test_combined already use all registers available so I

> cannot ask for an extra clobber register.

>

> >

> >   +  ""

> > +  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"

> > +  [(set_attr "length" "12")

> > +   (set_attr "type" "multiple")])

> > +

> > +(define_insn_and_split "stack_protect_combined_test"

> > +  [(set (pc)

> > +       (if_then_else

> > +               (eq (match_operand:SI 0 "memory_operand" "m")

> > +                   (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

> > +                              UNSPEC_SP_TEST))

> > +               (label_ref (match_operand 2))

> > +               (pc)))

> > +   (match_scratch:SI 3 "=r")

> > +   (match_scratch:SI 4 "=r")]

> > +  ""

> > +  "#"

> > +  "reload_completed"

> > +  [(const_int 0)]

> > +{

> > +  rtx eq, addr;

> > +

> > +  addr = XEXP (operands[1], 0);

> >

> > Same concern as with stack_protect_combined_set about operand 1 predicate being too general.

>

> Will try to create a new predicate, thanks for raising this up.

>

> >

> >   +  if (flag_pic)

> > +    {

> > +      /* Forces recomputing of GOT base now.  */

> > +      operands[1] = legitimize_pic_address (addr, SImode, operands[3],

> > +                                           operands[4],

> > +                                           true /*compute_now*/);

> > +    }

> > +  else

> > +    {

> > +      if (!address_operand (addr, SImode))

> > +       operands[1] = force_const_mem (SImode, addr);

> > +      emit_move_insn (operands[3], operands[1]);

> > +    }

> > +  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));

> > +  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);

> > +  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));

> > +  DONE;

> > +})

> > +

> > +(define_insn "stack_protect_test"

> > +  [(set (match_operand:SI 0 "register_operand" "=r")

> > +       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")

> > +                   (mem:SI (match_operand:SI 2 "register_operand" "r"))]

> > +        UNSPEC_SP_TEST))

> > +   (clobber (match_dup 2))]

> >

> > Same concern on clobbering and constraints.

>

> Same answer :-)

>

> Best regards,

>

> Thomas

>

> >

> >   +  ""

> > +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"

> > +  [(set_attr "length" "12")

> > +   (set_attr "type" "multiple")])

> > +

> >   (define_expand "casesi"

> >     [(match_operand:SI 0 "s_register_operand" "")       ; index to jump on

> >      (match_operand:SI 1 "const_int_operand" "")        ; lower bound

> > diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md

> > index b05f85e10e4..429c937df60 100644

> > --- a/gcc/config/arm/unspecs.md

> > +++ b/gcc/config/arm/unspecs.md

> > @@ -86,6 +86,9 @@

> >     UNSPEC_PROBE_STACK    ; Probe stack memory reference

> >     UNSPEC_NONSECURE_MEM        ; Represent non-secure memory in ARMv8-M with

> >                         ; security extension

> > +  UNSPEC_SP_SET                ; Represent the setting of stack protector's canary

> > +  UNSPEC_SP_TEST       ; Represent the testing of stack protector's canary

> > +                       ; against the guard.

> >   ])

> >

> >   (define_c_enum "unspec" [

> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

> > index 734bc765387..f760da495bf 100644

> > --- a/gcc/doc/md.texi

> > +++ b/gcc/doc/md.texi

> > @@ -7362,22 +7362,61 @@ builtins.

> >   The get/set patterns have a single output/input operand respectively,

> >   with @var{mode} intended to be @code{Pmode}.

> >

> > +@cindex @code{stack_protect_combined_set} instruction pattern

> > +@item @samp{stack_protect_combined_set}

> > +This pattern, if defined, moves a @code{ptr_mode} value from an address

> > +whose declaration RTX is given in operand 1 to the memory in operand 0

> > +without leaving the value in a register afterward.  If several

> > +instructions are needed by the target to perform the operation (eg. to

> > +load the address from a GOT entry then load the @code{ptr_mode} value

> > +and finally store it), it is the backend's responsibility to ensure no

> > +intermediate result gets spilled.  This is to avoid leaking the value

> > +some place that an attacker might use to rewrite the stack guard slot

> > +after having clobbered it.

> > +

> > +If this pattern is not defined, then the address declaration is

> > +expanded first in the standard way and a @code{stack_protect_set}

> > +pattern is then generated to move the value from that address to the

> > +address in operand 0.

> > +

> >   @cindex @code{stack_protect_set} instruction pattern

> >   @item @samp{stack_protect_set}

> > -This pattern, if defined, moves a @code{ptr_mode} value from the memory

> > -in operand 1 to the memory in operand 0 without leaving the value in

> > -a register afterward.  This is to avoid leaking the value some place

> > -that an attacker might use to rewrite the stack guard slot after

> > -having clobbered it.

> > +This pattern, if defined, moves a @code{ptr_mode} value from the valid

> > +memory location in operand 1 to the memory in operand 0 without leaving

> > +the value in a register afterward.  This is to avoid leaking the value

> > +some place that an attacker might use to rewrite the stack guard slot

> > +after having clobbered it.

> > +

> > +Note: on targets where the addressing modes do not allow to load

> > +directly from stack guard address, the address is expanded in a standard

> > +way first which could cause some spills.

> >

> >   If this pattern is not defined, then a plain move pattern is generated.

> >

> > +@cindex @code{stack_protect_combined_test} instruction pattern

> > +@item @samp{stack_protect_combined_test}

> > +This pattern, if defined, compares a @code{ptr_mode} value from an

> > +address whose declaration RTX is given in operand 1 with the memory in

> > +operand 0 without leaving the value in a register afterward and

> > +branches to operand 2 if the values were equal.  If several

> > +instructions are needed by the target to perform the operation (eg. to

> > +load the address from a GOT entry then load the @code{ptr_mode} value

> > +and finally store it), it is the backend's responsibility to ensure no

> > +intermediate result gets spilled.  This is to avoid leaking the value

> > +some place that an attacker might use to rewrite the stack guard slot

> > +after having clobbered it.

> > +

> > +If this pattern is not defined, then the address declaration is

> > +expanded first in the standard way and a @code{stack_protect_test}

> > +pattern is then generated to compare the value from that address to the

> > +value at the memory in operand 0.

> > +

> >   @cindex @code{stack_protect_test} instruction pattern

> >   @item @samp{stack_protect_test}

> >   This pattern, if defined, compares a @code{ptr_mode} value from the

> > -memory in operand 1 with the memory in operand 0 without leaving the

> > -value in a register afterward and branches to operand 2 if the values

> > -were equal.

> > +valid memory location in operand 1 with the memory in operand 0 without

> > +leaving the value in a register afterward and branches to operand 2 if

> > +the values were equal.

> >

> >   If this pattern is not defined, then a plain compare pattern and

> >   conditional branch pattern is used.

> > diff --git a/gcc/function.c b/gcc/function.c

> > index 142cdaec2ce..bf96dd6feae 100644

> > --- a/gcc/function.c

> > +++ b/gcc/function.c

> > @@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)

> >     rtx_code_label *label = gen_label_rtx ();

> >     rtx x, y;

> >     rtx_insn *seq;

> > +  struct expand_operand ops[3];

> >

> >     x = expand_normal (crtl->stack_protect_guard);

> > -  if (guard_decl)

> > -    y = expand_normal (guard_decl);

> > -  else

> > -    y = const0_rtx;

> > -

> > -  /* Allow the target to compare Y with X without leaking either into

> > -     a register.  */

> > -  if (targetm.have_stack_protect_test ()

> > -      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))

> > -    emit_insn (seq);

> > -  else

> > -    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

> > +  create_fixed_operand (&ops[0], x);

> > +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

> > +  create_fixed_operand (&ops[2], label);

> > +  /* Allow the target to compute address of Y and compare it with X without

> > +     leaking Y into a register.  This combined address + compare pattern allows

> > +     the target to prevent spilling of any intermediate results by splitting

> > +     it after register allocator.  */

> > +  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,

> > +                              3, ops))

> > +    {

> > +      if (guard_decl)

> > +       y = expand_normal (guard_decl);

> > +      else

> > +       y = const0_rtx;

> > +

> > +      /* Allow the target to compare Y with X without leaking either into

> > +        a register.  */

> > +      if (targetm.have_stack_protect_test ()

> > +         && ((seq = targetm.gen_stack_protect_test (x, y, label))

> > +             != NULL_RTX))

> > +       emit_insn (seq);

> > +      else

> > +       emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

> > +    }

> >

> >     /* The noreturn predictor has been moved to the tree level.  The rtl-level

> >        predictors estimate this branch about 20%, which isn't enough to get

> > diff --git a/gcc/target-insns.def b/gcc/target-insns.def

> > index 9a552c3d11c..d39889b3522 100644

> > --- a/gcc/target-insns.def

> > +++ b/gcc/target-insns.def

> > @@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,

> >   DEF_TARGET_INSN (simple_return, (void))

> >   DEF_TARGET_INSN (split_stack_prologue, (void))

> >   DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))

> > +DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))

> >   DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))

> > +DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))

> >   DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))

> >   DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))

> >   DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))

> > diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c

> > new file mode 100644

> > index 00000000000..4143a861f7c

> > --- /dev/null

> > +++ b/gcc/testsuite/gcc.target/arm/pr85434.c

> > @@ -0,0 +1,200 @@

> > +/* { dg-do compile } */

> > +/* { dg-require-effective-target fstack_protector }*/

> > +/* { dg-require-effective-target fpic }*/

> > +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */

> > +

> > +#include <stddef.h>

> > +#include <stdint.h>

> > +

> > +

> > +static const unsigned char base64_enc_map[64] =

> > +{

> > +    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',

> > +    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',

> > +    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',

> > +    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',

> > +    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',

> > +    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',

> > +    '8', '9', '+', '/'

> > +};

> > +

> > +#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */

> > +

> > +

> > +void doSmth(void *x);

> > +

> > +#include <string.h>

> > +

> > +

> > +void check(int n) {

> > +

> > +    if (!(n % 2 && n % 3 && n % 5)) {

> > + __asm__  (   "add    r8, r8, #1;" );

> > +    }

> > +}

> > +

> > +uint32_t test(

> > +  uint32_t a1,

> > +  uint32_t a2,

> > +  size_t a3,

> > +  size_t a4,

> > +  size_t a5,

> > +  size_t a6)

> > +{

> > +  uint32_t nResult = 0;

> > +  uint8_t* h = 0L;

> > +  uint8_t X[128];

> > +  uint8_t mac[64];

> > +  size_t len;

> > +

> > +  doSmth(&a1);

> > +  doSmth(&a2);

> > +  doSmth(&a3);

> > +  doSmth(&a4);

> > +  doSmth(&a5);

> > +  doSmth(&a6);

> > +

> > +  if (a1 && a2 && a3 && a4 && a5 && a6) {

> > +    nResult = 1;

> > +    h = (void*)X;

> > +    len = sizeof(X);

> > +    memset(X, a2, len);

> > +    len -= 64;

> > +    memcpy(mac ,X, len);

> > +    *(h + len) = a6;

> > +

> > +    {

> > +

> > +

> > +        unsigned char *dst = X;

> > +        size_t dlen = a3;

> > +        size_t *olen = &a6;

> > +        const unsigned char *src = mac;

> > +        size_t slen = a4;

> > +    size_t i, n;

> > +    int C1, C2, C3;

> > +    unsigned char *p;

> > +

> > +    if( slen == 0 )

> > +    {

> > +        *olen = 0;

> > +        return( 0 );

> > +    }

> > +

> > +    n = slen / 3 + ( slen % 3 != 0 );

> > +

> > +    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )

> > +    {

> > +        *olen = BASE64_SIZE_T_MAX;

> > +        return( 0 );

> > +    }

> > +

> > +    n *= 4;

> > +

> > +    if( ( dlen < n + 1 ) || ( NULL == dst ) )

> > +    {

> > +        *olen = n + 1;

> > +        return( 0 );

> > +    }

> > +

> > +    n = ( slen / 3 ) * 3;

> > +

> > +    for( i = 0, p = dst; i < n; i += 3 )

> > +    {

> > +        C1 = *src++;

> > +        C2 = *src++;

> > +        C3 = *src++;

> > +

> > +        check(i);

> > +

> > +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

> > +        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];

> > +        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];

> > +        *p++ = base64_enc_map[C3 & 0x3F];

> > +    }

> > +

> > +    if( i < slen )

> > +    {

> > +        C1 = *src++;

> > +        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;

> > +

> > +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

> > +        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];

> > +

> > +        if( ( i + 1 ) < slen )

> > +             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];

> > +        else *p++ = '=';

> > +

> > +        *p++ = '=';

> > +    }

> > +

> > +    *olen = p - dst;

> > +    *p = 0;

> > +

> > +}

> > +

> > +  __asm__ ("mov r8, %0;" : "=r" ( nResult ));

> > +  }

> > +  else

> > +  {

> > +    nResult = 2;

> > +  }

> > +

> > +  doSmth(X);

> > +  doSmth(mac);

> > +

> > +

> > +  return nResult;

> > +}

> > +

> > +/* The pattern below catches sequences of instructions that were generated

> > +   for ARM and Thumb-2 before the fix for this PR. They are of the form:

> > +

> > +   ldr     rX, <offset from sp or fp>

> > +   <optional non ldr instructions>

> > +   ldr     rY, <offset from sp or fp>

> > +   ldr     rZ, [rX]

> > +   <optional non ldr instructions>

> > +   cmp     rY, rZ

> > +   <optional non cmp instructions>

> > +   bl      __stack_chk_fail

> > +

> > +   Ideally the optional block would check for the various rX, rY and rZ

> > +   registers not being set but this is not possible due to back references

> > +   being illegal in lookahead expression in Tcl, thus preventing to use the

> > +   only construct that allow to negate a regexp from using the backreferences

> > +   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp

> > +   instructions with the assumptions that (i) those are not part of the stack

> > +   protector sequences and (ii) they would only be scheduled here if they don't

> > +   conflict with registers used by stack protector.

> > +

> > +   Note on the regexp logic:

> > +   Allowing non X instructions (where X is ldr or cmp) is done by looking for

> > +   some non newline spaces, followed by something which is not X, followed by

> > +   an alphanumeric character followed by anything but a newline and ended by a

> > +   newline the whole thing an undetermined number of times. The alphanumeric

> > +   character is there to force the match of the negative lookahead for X to

> > +   only happen after all the initial spaces and thus to check the mnemonic.

> > +   This prevents it to match one of the initial space.  */

> > +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

> > +

> > +/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR

> > +   which had the form:

> > +

> > +   ldr     rS, <offset from sp or fp>

> > +   <optional non ldr instructions>

> > +   ldr     rT, <PC relative offset>

> > +   <optional non ldr instructions>

> > +   ldr     rX, [rS, rT]

> > +   <optional non ldr instructions>

> > +   ldr     rY, <offset from sp or fp>

> > +   ldr     rZ, [rX]

> > +   <optional non ldr instructions>

> > +   cmp     rY, rZ

> > +   <optional non cmp instructions>

> > +   bl      __stack_chk_fail

> > +

> > +  Note on the regexp logic:

> > +  PC relative offset is checked by looking for a source operand that does not

> > +  contain [ or ].  */

> > +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

> >

> >
From 73337cebf34094bfe93aac06625ef72c6a7cf030 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(stack_protect_combined_set): New insn_and_split pattern.
	(stack_protect_set): New insn pattern.
	(stack_protect_combined_test): New insn_and_split pattern.
	(stack_protect_test): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants either both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  10 ++
 gcc/config/arm/arm-protos.h            |   2 +-
 gcc/config/arm/arm.c                   |  56 +++++--
 gcc/config/arm/arm.md                  |  92 +++++++++++-
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  37 +++--
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 9 files changed, 420 insertions(+), 37 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index d6e3c382085..d1a893ac56e 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6105,8 +6105,18 @@ stack_protect_prologue (void)
 {
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
+  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+     leaking Y into a register.  This combined address + copy pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+    return;
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262ce64..100844e659c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cf12aceb5fd..418a66a3c01 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   A new pseudo register is used for the PIC register if possible, otherwise
+   PIC_REG must be non NULL and is used instead.  COMPUTE_NOW forces the PIC
+   register to be loaded, irregardless of whether it was loaded previously.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@ require_pic_register (void)
 	  rtx_insn *seq, *insn;
 
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
@@ -7427,15 +7435,29 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, irregardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
 
 	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
@@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a02668b0..1c6740f0506 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -8634,6 +8635,95 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[2], operands[1]);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "memory_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[3], operands[1]);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e10e4..429c937df60 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 734bc765387..f760da495bf 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7362,22 +7362,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 142cdaec2ce..bf96dd6feae 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
   rtx_insn *seq;
+  struct expand_operand ops[3];
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
-  else
-    y = const0_rtx;
-
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
-    emit_insn (seq);
-  else
-    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  create_fixed_operand (&ops[2], label);
+  /* Allow the target to compute address of Y and compare it with X without
+     leaking Y into a register.  This combined address + compare pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
+			       3, ops))
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ()
+	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
+	      != NULL_RTX))
+	emit_insn (seq);
+      else
+	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+    }
 
   /* The noreturn predictor has been moved to the tree level.  The rtl-level
      predictors estimate this branch about 20%, which isn't enough to get
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
Kyrill Tkachov July 31, 2018, 1:36 p.m. | #7
Hi Thomas,

On 25/07/18 14:28, Thomas Preudhomme wrote:
> Hi Kyrill,

>

> Using memory_operand worked, the issues I encountered when using it in

> earlier versions of the patch must have been due to the missing test

> on address_operand in the preparation statements which I added later.

> Please find an updated patch in attachment. ChangeLog entry is as

> follows:

>

> *** gcc/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>

>      * target-insns.def (stack_protect_combined_set): Define new standard

>      pattern name.

>      (stack_protect_combined_test): Likewise.

>      * cfgexpand.c (stack_protect_prologue): Try new

>      stack_protect_combined_set pattern first.

>      * function.c (stack_protect_epilogue): Try new

>      stack_protect_combined_test pattern first.

>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

>      parameters to control which register to use as PIC register and force

>      reloading PIC register respectively.  Insert in the stream of insns if

>      possible.

>      (legitimize_pic_address): Expose above new parameters in prototype and

>      adapt recursive calls accordingly.

>      (arm_legitimize_address): Adapt to new legitimize_pic_address

>      prototype.

>      (thumb_legitimize_address): Likewise.

>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

>      change.

>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>      prototype change.

>      (stack_protect_combined_set): New insn_and_split pattern.

>      (stack_protect_set): New insn pattern.

>      (stack_protect_combined_test): New insn_and_split pattern.

>      (stack_protect_test): New insn pattern.

>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>      (UNSPEC_SP_TEST): Likewise.

>      * doc/md.texi (stack_protect_combined_set): Document new standard

>      pattern name.

>      (stack_protect_set): Clarify that the operand for guard's address is

>      legal.

>      (stack_protect_combined_test): Document new standard pattern name.

>      (stack_protect_test): Clarify that the operand for guard's address is

>      legal.

>

> *** gcc/testsuite/ChangeLog ***

>

> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>

>      * gcc.target/arm/pr85434.c: New test.

>

> Bootstrapped again for Arm and Thumb-2 and regtested with and without

> -fstack-protector-all without any regression.


This looks ok to me now.
Thank you for your patience and addressing my comments from before.

Kyrill

> Best regards,

>

> Thomas

> On Thu, 19 Jul 2018 at 17:34, Thomas Preudhomme

> <thomas.preudhomme@linaro.org> wrote:

>> [Dropping Jeff Law from the list since he already commented on the

>> middle end parts]

>>

>> Hi Kyrill,

>>

>> On Thu, 19 Jul 2018 at 12:02, Kyrill Tkachov

>> <kyrylo.tkachov@foss.arm.com> wrote:

>>> Hi Thomas,

>>>

>>> On 17/07/18 12:02, Thomas Preudhomme wrote:

>>>> Fixed in attached patch. ChangeLog entries are unchanged:

>>>>

>>>> *** gcc/ChangeLog ***

>>>>

>>>> 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

>>>>

>>>>      PR target/85434

>>>>      * target-insns.def (stack_protect_combined_set): Define new standard

>>>>      pattern name.

>>>>      (stack_protect_combined_test): Likewise.

>>>>      * cfgexpand.c (stack_protect_prologue): Try new

>>>>      stack_protect_combined_set pattern first.

>>>>      * function.c (stack_protect_epilogue): Try new

>>>>      stack_protect_combined_test pattern first.

>>>>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now

>>>>      parameters to control which register to use as PIC register and force

>>>>      reloading PIC register respectively.

>>>>      (legitimize_pic_address): Expose above new parameters in prototype and

>>>>      adapt recursive calls accordingly.

>>>>      (arm_legitimize_address): Adapt to new legitimize_pic_address

>>>>      prototype.

>>>>      (thumb_legitimize_address): Likewise.

>>>>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>>>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype

>>>>      change.

>>>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>>>>      prototype change.

>>>>      (stack_protect_combined_set): New insn_and_split pattern.

>>>>      (stack_protect_set): New insn pattern.

>>>>      (stack_protect_combined_test): New insn_and_split pattern.

>>>>      (stack_protect_test): New insn pattern.

>>>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>>>>      (UNSPEC_SP_TEST): Likewise.

>>>>      * doc/md.texi (stack_protect_combined_set): Document new standard

>>>>      pattern name.

>>>>      (stack_protect_set): Clarify that the operand for guard's address is

>>>>      legal.

>>>>      (stack_protect_combined_test): Document new standard pattern name.

>>>>      (stack_protect_test): Clarify that the operand for guard's address is

>>>>      legal.

>>>>

>>>> *** gcc/testsuite/ChangeLog ***

>>>>

>>>> 2018-07-05  Thomas Preud'homme <thomas.preudhomme@linaro.org>

>>>>

>>>>      PR target/85434

>>>>      * gcc.target/arm/pr85434.c: New test.

>>>>

>>> Sorry for the delay. Some comments inline.

>>>

>>> Kyrill

>>>

>>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c

>>> index d6e3c382085..d1a893ac56e 100644

>>> --- a/gcc/cfgexpand.c

>>> +++ b/gcc/cfgexpand.c

>>> @@ -6105,8 +6105,18 @@ stack_protect_prologue (void)

>>>    {

>>>      tree guard_decl = targetm.stack_protect_guard ();

>>>      rtx x, y;

>>> +  struct expand_operand ops[2];

>>>

>>>      x = expand_normal (crtl->stack_protect_guard);

>>> +  create_fixed_operand (&ops[0], x);

>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

>>> +  /* Allow the target to compute address of Y and copy it to X without

>>> +     leaking Y into a register.  This combined address + copy pattern allows

>>> +     the target to prevent spilling of any intermediate results by splitting

>>> +     it after register allocator.  */

>>> +  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))

>>> +    return;

>>> +

>>>      if (guard_decl)

>>>        y = expand_normal (guard_decl);

>>>      else

>>> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

>>> index 8537262ce64..100844e659c 100644

>>> --- a/gcc/config/arm/arm-protos.h

>>> +++ b/gcc/config/arm/arm-protos.h

>>> @@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);

>>>    extern int arm_split_constant (RTX_CODE, machine_mode, rtx,

>>>                                 HOST_WIDE_INT, rtx, rtx, int);

>>>    extern int legitimate_pic_operand_p (rtx);

>>> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);

>>> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);

>>>    extern rtx legitimize_tls_address (rtx, rtx);

>>>    extern bool arm_legitimate_address_p (machine_mode, rtx, bool);

>>>    extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);

>>> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

>>> index ec3abbcba9f..f4a970580c2 100644

>>> --- a/gcc/config/arm/arm.c

>>> +++ b/gcc/config/arm/arm.c

>>> @@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)

>>>    }

>>>

>>>    /* Record that the current function needs a PIC register.  Initialize

>>> -   cfun->machine->pic_reg if we have not already done so.  */

>>> +   cfun->machine->pic_reg if we have not already done so.

>>> +

>>> +   If not NULL, PIC_REG indicates which register to use as PIC register,

>>> +   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC

>>> +   register to be loaded, irregardless of whether it was loaded previously.  */

>>>

>>>    static void

>>> -require_pic_register (void)

>>> +require_pic_register (rtx pic_reg, bool compute_now)

>>>    {

>>>      /* A lot of the logic here is made obscure by the fact that this

>>>         routine gets called as part of the rtx cost estimation process.

>>>         We don't want those calls to affect any assumptions about the real

>>>         function; and further, we can't call entry_of_function() until we

>>>         start the real expansion process.  */

>>> -  if (!crtl->uses_pic_offset_table)

>>> +  if (!crtl->uses_pic_offset_table || compute_now)

>>>        {

>>> -      gcc_assert (can_create_pseudo_p ());

>>> +      gcc_assert (can_create_pseudo_p ()

>>> +                 || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));

>>>          if (arm_pic_register != INVALID_REGNUM

>>> +         && can_create_pseudo_p ()

>>>            && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))

>>>          {

>>>            if (!cfun->machine->pic_reg)

>>> @@ -7399,7 +7405,8 @@ require_pic_register (void)

>>>            rtx_insn *seq, *insn;

>>>

>>>            if (!cfun->machine->pic_reg)

>>> -           cfun->machine->pic_reg = gen_reg_rtx (Pmode);

>>> +           cfun->machine->pic_reg =

>>> +             can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;

>>>

>>>

>>> I'll admit I'm not too familiar with this code, but don't you want to be setting cfun->machine->pic_reg to the pic_reg that

>>> was passed into this function (if pic_reg != NULL as per the function comment)?

>> Indeed the comment needs to be fixed. I think it only makes sense to

>> specify a register after register allocation when the amount of

>> register is finite. Before that I don't see why one would want to do

>> that. I've changed the sentence about PIC_REG in the heading comment

>> to:

>>

>> "A new pseudo register is used for the PIC register if possible,

>> otherwise PIC_REG must be non NULL and is used instead."

>>

>>>

>>>            /* Play games to avoid marking the function as needing pic

>>>               if we are being called as part of the cost-estimation

>>> @@ -7410,7 +7417,8 @@ require_pic_register (void)

>>>                start_sequence ();

>>>

>>>                if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM

>>> -                 && arm_pic_register > LAST_LO_REGNUM)

>>> +                 && arm_pic_register > LAST_LO_REGNUM

>>> +                 && can_create_pseudo_p ())

>>>                  emit_move_insn (cfun->machine->pic_reg,

>>>                                  gen_rtx_REG (Pmode, arm_pic_register));

>>>                else

>>> @@ -7427,15 +7435,29 @@ require_pic_register (void)

>>>                   we can't yet emit instructions directly in the final

>>>                   insn stream.  Queue the insns on the entry edge, they will

>>>                   be committed after everything else is expanded.  */

>>> -             insert_insn_on_edge (seq,

>>> -                                  single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

>>> +             if (can_create_pseudo_p ())

>>> +               insert_insn_on_edge (seq,

>>> +                                    single_succ_edge

>>> +                                    (ENTRY_BLOCK_PTR_FOR_FN (cfun)));

>>> +             else

>>> +               emit_insn (seq);

>>>              }

>>>

>>> It's not the capability for creating pseudos that is relevant here, but whether we have an instruction

>>> stream to insert insns on. Do you think that using currently_expanding_to_rtl would work as a condition?

>> Indeed it's not the right test. Do you mean testing if

>> currently_expanding_to_rtl is false? I used can_create_pseudo_p

>> because I knew the function could never be called when it was false

>> before and thus my changes would only impact the code I was changing.

>>

>>>

>>>    +/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is

>>>

>>> "Legitimize"

>>>

>>>    +   created to hold the result of the load.  If not NULL, PIC_REG indicates

>>> +   which register to use as PIC register, otherwise it is decided by register

>>> +   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current

>>> +   location in the instruction stream, irregardless of whether it was loaded

>>> +   previously.

>>> +

>>> +   Returns the register REG into which the PIC load is performed.  */

>>> +

>>>    rtx

>>> -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

>>> +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,

>>> +                       bool compute_now)

>>>    {

>>>      if (GET_CODE (orig) == SYMBOL_REF

>>>          || GET_CODE (orig) == LABEL_REF)

>>> @@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

>>>            rtx mem;

>>>

>>>            /* If this function doesn't have a pic register, create one now.  */

>>> -         require_pic_register ();

>>> +         require_pic_register (pic_reg, compute_now);

>>>

>>>            pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);

>>>

>>> @@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)

>>>

>>>          gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);

>>>

>>> -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);

>>> +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,

>>> +                                    pic_reg, compute_now);

>>>          offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,

>>> -                                      base == reg ? 0 : reg);

>>> +                                      base == reg ? 0 : reg, pic_reg,

>>> +                                      compute_now);

>>>

>>>          if (CONST_INT_P (offset))

>>>          {

>>> @@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

>>>        {

>>>          /* We need to find and carefully transform any SYMBOL and LABEL

>>>           references; so go back to the original address expression.  */

>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

>>> +                                         false /*compute_now*/);

>>>

>>>          if (new_x != orig_x)

>>>          x = new_x;

>>> @@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)

>>>        {

>>>          /* We need to find and carefully transform any SYMBOL and LABEL

>>>           references; so go back to the original address expression.  */

>>> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);

>>> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,

>>> +                                         false /*compute_now*/);

>>>

>>>          if (new_x != orig_x)

>>>          x = new_x;

>>> @@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)

>>>            ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))

>>>            : !SYMBOL_REF_LOCAL_P (addr)))

>>>        {

>>> -      require_pic_register ();

>>> +      require_pic_register (NULL_RTX, false /*compute_now*/);

>>>          use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);

>>>        }

>>>

>>> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

>>> index 361a02668b0..9582c3fbb11 100644

>>> --- a/gcc/config/arm/arm.md

>>> +++ b/gcc/config/arm/arm.md

>>> @@ -6021,7 +6021,8 @@

>>>          operands[1] = legitimize_pic_address (operands[1], SImode,

>>>                                              (!can_create_pseudo_p ()

>>>                                               ? operands[0]

>>> -                                            : 0));

>>> +                                            : NULL_RTX), NULL_RTX,

>>> +                                           false /*compute_now*/);

>>>      }

>>>      "

>>>    )

>>> @@ -8634,6 +8635,95 @@

>>>       (set_attr "conds" "clob")]

>>>    )

>>>

>>> +;; Named patterns for stack smashing protection.

>>> +(define_insn_and_split "stack_protect_combined_set"

>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")

>>> +       (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

>>> +                  UNSPEC_SP_SET))

>>> +   (match_scratch:SI 2 "=r")

>>> +   (match_scratch:SI 3 "=r")]

>>> +  ""

>>> +  "#"

>>> +  "reload_completed"

>>> +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]

>>> +                                           UNSPEC_SP_SET))

>>> +             (clobber (match_dup 2))])]

>>> +  "

>>> +{

>>> +  rtx addr = XEXP (operands[1], 0);

>>>

>>>

>>> I think you want to tighten the predicate of operands[1].

>>> general_operand is very, well, general, but you're taking its 0th subtree, which it may not necessarily have.

>> I was afraid you'd ask that. Yes indeed a better predicate should be

>> used but I'm not sure all the possible cases here. However it has to

>> be some sort of memory since this is the address of the guard. I think

>> the problem with using memory_operand was that it could be a MEM of a

>> SYMBOL_REF which is not legitimate at this point.

>>

>>>    +  if (flag_pic)

>>> +    {

>>> +      /* Forces recomputing of GOT base now.  */

>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[2],

>>> +                                           operands[3], true /*compute_now*/);

>>> +    }

>>> +  else

>>> +    {

>>> +      if (!address_operand (addr, SImode))

>>> +       operands[1] = force_const_mem (SImode, addr);

>>> +      emit_move_insn (operands[2], operands[1]);

>>> +    }

>>> +}"

>>> +)

>>> +

>>> +(define_insn "stack_protect_set"

>>> +  [(set (match_operand:SI 0 "memory_operand" "=m")

>>> +       (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]

>>> +        UNSPEC_SP_SET))

>>> +   (clobber (match_dup 1))]

>>>

>>> I'm not familiar with this idiom, don't you instead want to mark the constraint on operand 1 as earlyclobber (or read-write)?

>> According to the documentation earlyclobber applies to operands being

>> written to and this is an input operand only. As the doc says the

>> earlyclobber indicates to the compiler that it cannot reuse a register

>> used for an input operand to put the output operand into. Here I'm

>> trying to express instead that the operand needs to be saved/restored

>> if needed after this insn because it gets written to. This is only

>> needed in Thumb-1 because stack_protect_set_combined and

>> stack_protect_test_combined already use all registers available so I

>> cannot ask for an extra clobber register.

>>

>>>    +  ""

>>> +  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"

>>> +  [(set_attr "length" "12")

>>> +   (set_attr "type" "multiple")])

>>> +

>>> +(define_insn_and_split "stack_protect_combined_test"

>>> +  [(set (pc)

>>> +       (if_then_else

>>> +               (eq (match_operand:SI 0 "memory_operand" "m")

>>> +                   (unspec:SI [(match_operand:SI 1 "general_operand" "X")]

>>> +                              UNSPEC_SP_TEST))

>>> +               (label_ref (match_operand 2))

>>> +               (pc)))

>>> +   (match_scratch:SI 3 "=r")

>>> +   (match_scratch:SI 4 "=r")]

>>> +  ""

>>> +  "#"

>>> +  "reload_completed"

>>> +  [(const_int 0)]

>>> +{

>>> +  rtx eq, addr;

>>> +

>>> +  addr = XEXP (operands[1], 0);

>>>

>>> Same concern as with stack_protect_combined_set about operand 1 predicate being too general.

>> Will try to create a new predicate, thanks for raising this up.

>>

>>>    +  if (flag_pic)

>>> +    {

>>> +      /* Forces recomputing of GOT base now.  */

>>> +      operands[1] = legitimize_pic_address (addr, SImode, operands[3],

>>> +                                           operands[4],

>>> +                                           true /*compute_now*/);

>>> +    }

>>> +  else

>>> +    {

>>> +      if (!address_operand (addr, SImode))

>>> +       operands[1] = force_const_mem (SImode, addr);

>>> +      emit_move_insn (operands[3], operands[1]);

>>> +    }

>>> +  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));

>>> +  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);

>>> +  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));

>>> +  DONE;

>>> +})

>>> +

>>> +(define_insn "stack_protect_test"

>>> +  [(set (match_operand:SI 0 "register_operand" "=r")

>>> +       (unspec:SI [(match_operand:SI 1 "memory_operand" "m")

>>> +                   (mem:SI (match_operand:SI 2 "register_operand" "r"))]

>>> +        UNSPEC_SP_TEST))

>>> +   (clobber (match_dup 2))]

>>>

>>> Same concern on clobbering and constraints.

>> Same answer :-)

>>

>> Best regards,

>>

>> Thomas

>>

>>>    +  ""

>>> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"

>>> +  [(set_attr "length" "12")

>>> +   (set_attr "type" "multiple")])

>>> +

>>>    (define_expand "casesi"

>>>      [(match_operand:SI 0 "s_register_operand" "")       ; index to jump on

>>>       (match_operand:SI 1 "const_int_operand" "")        ; lower bound

>>> diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md

>>> index b05f85e10e4..429c937df60 100644

>>> --- a/gcc/config/arm/unspecs.md

>>> +++ b/gcc/config/arm/unspecs.md

>>> @@ -86,6 +86,9 @@

>>>      UNSPEC_PROBE_STACK    ; Probe stack memory reference

>>>      UNSPEC_NONSECURE_MEM        ; Represent non-secure memory in ARMv8-M with

>>>                          ; security extension

>>> +  UNSPEC_SP_SET                ; Represent the setting of stack protector's canary

>>> +  UNSPEC_SP_TEST       ; Represent the testing of stack protector's canary

>>> +                       ; against the guard.

>>>    ])

>>>

>>>    (define_c_enum "unspec" [

>>> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi

>>> index 734bc765387..f760da495bf 100644

>>> --- a/gcc/doc/md.texi

>>> +++ b/gcc/doc/md.texi

>>> @@ -7362,22 +7362,61 @@ builtins.

>>>    The get/set patterns have a single output/input operand respectively,

>>>    with @var{mode} intended to be @code{Pmode}.

>>>

>>> +@cindex @code{stack_protect_combined_set} instruction pattern

>>> +@item @samp{stack_protect_combined_set}

>>> +This pattern, if defined, moves a @code{ptr_mode} value from an address

>>> +whose declaration RTX is given in operand 1 to the memory in operand 0

>>> +without leaving the value in a register afterward.  If several

>>> +instructions are needed by the target to perform the operation (eg. to

>>> +load the address from a GOT entry then load the @code{ptr_mode} value

>>> +and finally store it), it is the backend's responsibility to ensure no

>>> +intermediate result gets spilled.  This is to avoid leaking the value

>>> +some place that an attacker might use to rewrite the stack guard slot

>>> +after having clobbered it.

>>> +

>>> +If this pattern is not defined, then the address declaration is

>>> +expanded first in the standard way and a @code{stack_protect_set}

>>> +pattern is then generated to move the value from that address to the

>>> +address in operand 0.

>>> +

>>>    @cindex @code{stack_protect_set} instruction pattern

>>>    @item @samp{stack_protect_set}

>>> -This pattern, if defined, moves a @code{ptr_mode} value from the memory

>>> -in operand 1 to the memory in operand 0 without leaving the value in

>>> -a register afterward.  This is to avoid leaking the value some place

>>> -that an attacker might use to rewrite the stack guard slot after

>>> -having clobbered it.

>>> +This pattern, if defined, moves a @code{ptr_mode} value from the valid

>>> +memory location in operand 1 to the memory in operand 0 without leaving

>>> +the value in a register afterward.  This is to avoid leaking the value

>>> +some place that an attacker might use to rewrite the stack guard slot

>>> +after having clobbered it.

>>> +

>>> +Note: on targets where the addressing modes do not allow to load

>>> +directly from stack guard address, the address is expanded in a standard

>>> +way first which could cause some spills.

>>>

>>>    If this pattern is not defined, then a plain move pattern is generated.

>>>

>>> +@cindex @code{stack_protect_combined_test} instruction pattern

>>> +@item @samp{stack_protect_combined_test}

>>> +This pattern, if defined, compares a @code{ptr_mode} value from an

>>> +address whose declaration RTX is given in operand 1 with the memory in

>>> +operand 0 without leaving the value in a register afterward and

>>> +branches to operand 2 if the values were equal.  If several

>>> +instructions are needed by the target to perform the operation (eg. to

>>> +load the address from a GOT entry then load the @code{ptr_mode} value

>>> +and finally store it), it is the backend's responsibility to ensure no

>>> +intermediate result gets spilled.  This is to avoid leaking the value

>>> +some place that an attacker might use to rewrite the stack guard slot

>>> +after having clobbered it.

>>> +

>>> +If this pattern is not defined, then the address declaration is

>>> +expanded first in the standard way and a @code{stack_protect_test}

>>> +pattern is then generated to compare the value from that address to the

>>> +value at the memory in operand 0.

>>> +

>>>    @cindex @code{stack_protect_test} instruction pattern

>>>    @item @samp{stack_protect_test}

>>>    This pattern, if defined, compares a @code{ptr_mode} value from the

>>> -memory in operand 1 with the memory in operand 0 without leaving the

>>> -value in a register afterward and branches to operand 2 if the values

>>> -were equal.

>>> +valid memory location in operand 1 with the memory in operand 0 without

>>> +leaving the value in a register afterward and branches to operand 2 if

>>> +the values were equal.

>>>

>>>    If this pattern is not defined, then a plain compare pattern and

>>>    conditional branch pattern is used.

>>> diff --git a/gcc/function.c b/gcc/function.c

>>> index 142cdaec2ce..bf96dd6feae 100644

>>> --- a/gcc/function.c

>>> +++ b/gcc/function.c

>>> @@ -4886,20 +4886,33 @@ stack_protect_epilogue (void)

>>>      rtx_code_label *label = gen_label_rtx ();

>>>      rtx x, y;

>>>      rtx_insn *seq;

>>> +  struct expand_operand ops[3];

>>>

>>>      x = expand_normal (crtl->stack_protect_guard);

>>> -  if (guard_decl)

>>> -    y = expand_normal (guard_decl);

>>> -  else

>>> -    y = const0_rtx;

>>> -

>>> -  /* Allow the target to compare Y with X without leaking either into

>>> -     a register.  */

>>> -  if (targetm.have_stack_protect_test ()

>>> -      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))

>>> -    emit_insn (seq);

>>> -  else

>>> -    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

>>> +  create_fixed_operand (&ops[0], x);

>>> +  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));

>>> +  create_fixed_operand (&ops[2], label);

>>> +  /* Allow the target to compute address of Y and compare it with X without

>>> +     leaking Y into a register.  This combined address + compare pattern allows

>>> +     the target to prevent spilling of any intermediate results by splitting

>>> +     it after register allocator.  */

>>> +  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,

>>> +                              3, ops))

>>> +    {

>>> +      if (guard_decl)

>>> +       y = expand_normal (guard_decl);

>>> +      else

>>> +       y = const0_rtx;

>>> +

>>> +      /* Allow the target to compare Y with X without leaking either into

>>> +        a register.  */

>>> +      if (targetm.have_stack_protect_test ()

>>> +         && ((seq = targetm.gen_stack_protect_test (x, y, label))

>>> +             != NULL_RTX))

>>> +       emit_insn (seq);

>>> +      else

>>> +       emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);

>>> +    }

>>>

>>>      /* The noreturn predictor has been moved to the tree level.  The rtl-level

>>>         predictors estimate this branch about 20%, which isn't enough to get

>>> diff --git a/gcc/target-insns.def b/gcc/target-insns.def

>>> index 9a552c3d11c..d39889b3522 100644

>>> --- a/gcc/target-insns.def

>>> +++ b/gcc/target-insns.def

>>> @@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,

>>>    DEF_TARGET_INSN (simple_return, (void))

>>>    DEF_TARGET_INSN (split_stack_prologue, (void))

>>>    DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))

>>> +DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))

>>>    DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))

>>> +DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))

>>>    DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))

>>>    DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))

>>>    DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))

>>> diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c

>>> new file mode 100644

>>> index 00000000000..4143a861f7c

>>> --- /dev/null

>>> +++ b/gcc/testsuite/gcc.target/arm/pr85434.c

>>> @@ -0,0 +1,200 @@

>>> +/* { dg-do compile } */

>>> +/* { dg-require-effective-target fstack_protector }*/

>>> +/* { dg-require-effective-target fpic }*/

>>> +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */

>>> +

>>> +#include <stddef.h>

>>> +#include <stdint.h>

>>> +

>>> +

>>> +static const unsigned char base64_enc_map[64] =

>>> +{

>>> +    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',

>>> +    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',

>>> +    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',

>>> +    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',

>>> +    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',

>>> +    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',

>>> +    '8', '9', '+', '/'

>>> +};

>>> +

>>> +#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */

>>> +

>>> +

>>> +void doSmth(void *x);

>>> +

>>> +#include <string.h>

>>> +

>>> +

>>> +void check(int n) {

>>> +

>>> +    if (!(n % 2 && n % 3 && n % 5)) {

>>> + __asm__  (   "add    r8, r8, #1;" );

>>> +    }

>>> +}

>>> +

>>> +uint32_t test(

>>> +  uint32_t a1,

>>> +  uint32_t a2,

>>> +  size_t a3,

>>> +  size_t a4,

>>> +  size_t a5,

>>> +  size_t a6)

>>> +{

>>> +  uint32_t nResult = 0;

>>> +  uint8_t* h = 0L;

>>> +  uint8_t X[128];

>>> +  uint8_t mac[64];

>>> +  size_t len;

>>> +

>>> +  doSmth(&a1);

>>> +  doSmth(&a2);

>>> +  doSmth(&a3);

>>> +  doSmth(&a4);

>>> +  doSmth(&a5);

>>> +  doSmth(&a6);

>>> +

>>> +  if (a1 && a2 && a3 && a4 && a5 && a6) {

>>> +    nResult = 1;

>>> +    h = (void*)X;

>>> +    len = sizeof(X);

>>> +    memset(X, a2, len);

>>> +    len -= 64;

>>> +    memcpy(mac ,X, len);

>>> +    *(h + len) = a6;

>>> +

>>> +    {

>>> +

>>> +

>>> +        unsigned char *dst = X;

>>> +        size_t dlen = a3;

>>> +        size_t *olen = &a6;

>>> +        const unsigned char *src = mac;

>>> +        size_t slen = a4;

>>> +    size_t i, n;

>>> +    int C1, C2, C3;

>>> +    unsigned char *p;

>>> +

>>> +    if( slen == 0 )

>>> +    {

>>> +        *olen = 0;

>>> +        return( 0 );

>>> +    }

>>> +

>>> +    n = slen / 3 + ( slen % 3 != 0 );

>>> +

>>> +    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )

>>> +    {

>>> +        *olen = BASE64_SIZE_T_MAX;

>>> +        return( 0 );

>>> +    }

>>> +

>>> +    n *= 4;

>>> +

>>> +    if( ( dlen < n + 1 ) || ( NULL == dst ) )

>>> +    {

>>> +        *olen = n + 1;

>>> +        return( 0 );

>>> +    }

>>> +

>>> +    n = ( slen / 3 ) * 3;

>>> +

>>> +    for( i = 0, p = dst; i < n; i += 3 )

>>> +    {

>>> +        C1 = *src++;

>>> +        C2 = *src++;

>>> +        C3 = *src++;

>>> +

>>> +        check(i);

>>> +

>>> +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

>>> +        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];

>>> +        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];

>>> +        *p++ = base64_enc_map[C3 & 0x3F];

>>> +    }

>>> +

>>> +    if( i < slen )

>>> +    {

>>> +        C1 = *src++;

>>> +        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;

>>> +

>>> +        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];

>>> +        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];

>>> +

>>> +        if( ( i + 1 ) < slen )

>>> +             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];

>>> +        else *p++ = '=';

>>> +

>>> +        *p++ = '=';

>>> +    }

>>> +

>>> +    *olen = p - dst;

>>> +    *p = 0;

>>> +

>>> +}

>>> +

>>> +  __asm__ ("mov r8, %0;" : "=r" ( nResult ));

>>> +  }

>>> +  else

>>> +  {

>>> +    nResult = 2;

>>> +  }

>>> +

>>> +  doSmth(X);

>>> +  doSmth(mac);

>>> +

>>> +

>>> +  return nResult;

>>> +}

>>> +

>>> +/* The pattern below catches sequences of instructions that were generated

>>> +   for ARM and Thumb-2 before the fix for this PR. They are of the form:

>>> +

>>> +   ldr     rX, <offset from sp or fp>

>>> +   <optional non ldr instructions>

>>> +   ldr     rY, <offset from sp or fp>

>>> +   ldr     rZ, [rX]

>>> +   <optional non ldr instructions>

>>> +   cmp     rY, rZ

>>> +   <optional non cmp instructions>

>>> +   bl      __stack_chk_fail

>>> +

>>> +   Ideally the optional block would check for the various rX, rY and rZ

>>> +   registers not being set but this is not possible due to back references

>>> +   being illegal in lookahead expression in Tcl, thus preventing to use the

>>> +   only construct that allow to negate a regexp from using the backreferences

>>> +   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp

>>> +   instructions with the assumptions that (i) those are not part of the stack

>>> +   protector sequences and (ii) they would only be scheduled here if they don't

>>> +   conflict with registers used by stack protector.

>>> +

>>> +   Note on the regexp logic:

>>> +   Allowing non X instructions (where X is ldr or cmp) is done by looking for

>>> +   some non newline spaces, followed by something which is not X, followed by

>>> +   an alphanumeric character followed by anything but a newline and ended by a

>>> +   newline the whole thing an undetermined number of times. The alphanumeric

>>> +   character is there to force the match of the negative lookahead for X to

>>> +   only happen after all the initial spaces and thus to check the mnemonic.

>>> +   This prevents it to match one of the initial space.  */

>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

>>> +

>>> +/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR

>>> +   which had the form:

>>> +

>>> +   ldr     rS, <offset from sp or fp>

>>> +   <optional non ldr instructions>

>>> +   ldr     rT, <PC relative offset>

>>> +   <optional non ldr instructions>

>>> +   ldr     rX, [rS, rT]

>>> +   <optional non ldr instructions>

>>> +   ldr     rY, <offset from sp or fp>

>>> +   ldr     rZ, [rX]

>>> +   <optional non ldr instructions>

>>> +   cmp     rY, rZ

>>> +   <optional non cmp instructions>

>>> +   bl      __stack_chk_fail

>>> +

>>> +  Note on the regexp logic:

>>> +  PC relative offset is checked by looking for a source operand that does not

>>> +  contain [ or ].  */

>>> +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */

>>>

>>>
H.J. Lu Aug. 2, 2018, 12:59 p.m. | #8
On Tue, Jul 31, 2018 at 6:36 AM, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Thomas,

>

>

> On 25/07/18 14:28, Thomas Preudhomme wrote:

>>

>> Hi Kyrill,

>>

>> Using memory_operand worked, the issues I encountered when using it in

>> earlier versions of the patch must have been due to the missing test

>> on address_operand in the preparation statements which I added later.

>> Please find an updated patch in attachment. ChangeLog entry is as

>> follows:

>>

>> *** gcc/ChangeLog ***

>>

>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>>

>>      * target-insns.def (stack_protect_combined_set): Define new standard

>>      pattern name.

>>      (stack_protect_combined_test): Likewise.

>>      * cfgexpand.c (stack_protect_prologue): Try new

>>      stack_protect_combined_set pattern first.

>>      * function.c (stack_protect_epilogue): Try new

>>      stack_protect_combined_test pattern first.

>>      * config/arm/arm.c (require_pic_register): Add pic_reg and

>> compute_now

>>      parameters to control which register to use as PIC register and force

>>      reloading PIC register respectively.  Insert in the stream of insns

>> if

>>      possible.

>>      (legitimize_pic_address): Expose above new parameters in prototype

>> and

>>      adapt recursive calls accordingly.

>>      (arm_legitimize_address): Adapt to new legitimize_pic_address

>>      prototype.

>>      (thumb_legitimize_address): Likewise.

>>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.

>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to

>> prototype

>>      change.

>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address

>>      prototype change.

>>      (stack_protect_combined_set): New insn_and_split pattern.

>>      (stack_protect_set): New insn pattern.

>>      (stack_protect_combined_test): New insn_and_split pattern.

>>      (stack_protect_test): New insn pattern.

>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.

>>      (UNSPEC_SP_TEST): Likewise.

>>      * doc/md.texi (stack_protect_combined_set): Document new standard

>>      pattern name.

>>      (stack_protect_set): Clarify that the operand for guard's address is

>>      legal.

>>      (stack_protect_combined_test): Document new standard pattern name.

>>      (stack_protect_test): Clarify that the operand for guard's address is

>>      legal.

>>

>> *** gcc/testsuite/ChangeLog ***

>>

>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

>>

>>      * gcc.target/arm/pr85434.c: New test.

>>

>> Bootstrapped again for Arm and Thumb-2 and regtested with and without

>> -fstack-protector-all without any regression.

>

>

> This looks ok to me now.

> Thank you for your patience and addressing my comments from before.

>


This breaks x86:

FAIL: gcc.dg/fstack-protector-strong.c (internal compiler error)
FAIL: gcc.dg/fstack-protector-strong.c (test for excess errors)
FAIL: gcc.dg/fstack-protector-strong.c  (test for warnings, line 109)
FAIL: gcc.dg/pr71585-3.c (internal compiler error)
FAIL: gcc.dg/pr71585-3.c (test for excess errors)
FAIL: gcc.dg/pr71585.c (internal compiler error)
FAIL: gcc.dg/pr71585.c (test for excess errors)
FAIL: gcc.target/i386/pr37275.c (internal compiler error)
FAIL: gcc.target/i386/pr37275.c (test for excess errors)
FAIL: gcc.target/i386/pr47780.c (internal compiler error)
FAIL: gcc.target/i386/pr47780.c (test for excess errors)
FAIL: gcc.target/i386/pr50788.c (internal compiler error)
FAIL: gcc.target/i386/pr50788.c (test for excess errors)
FAIL: gcc.target/i386/pr68680.c (internal compiler error)
FAIL: gcc.target/i386/pr68680.c (test for excess errors)
FAIL: gcc.target/i386/stack-prot-guard.c (internal compiler error)
FAIL: gcc.target/i386/stack-prot-guard.c (test for excess errors)
FAIL: gcc.target/i386/stack-prot-sym.c (internal compiler error)
FAIL: gcc.target/i386/stack-prot-sym.c (test for excess errors)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++11 (internal compiler error)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++14 (internal compiler error)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++98 (internal compiler error)
FAIL: g++.dg/fstack-protector-strong.C  -std=gnu++98 (test for excess errors)
FAIL: g++.dg/pr65032.C   (internal compiler error)
FAIL: g++.dg/pr65032.C   (test for excess errors)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++11 (internal compiler error)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++11 (test for excess errors)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++14 (internal compiler error)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++14 (test for excess errors)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++98 (internal compiler error)
FAIL: g++.dg/stackprotectexplicit2.C  -std=gnu++98 (test for excess errors)


-- 
H.J.

Patch

From d917d48c2005e46154383589f203d06f3c6167e0 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	PR target/85434
	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(stack_protect_combined_set): New insn_and_split pattern.
	(stack_protect_set): New insn pattern.
	(stack_protect_combined_test): New insn_and_split pattern.
	(stack_protect_test): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	PR target/85434
	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants either both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas
---
 gcc/cfgexpand.c                        |  10 ++
 gcc/config/arm/arm-protos.h            |   2 +-
 gcc/config/arm/arm.c                   |  56 ++++++---
 gcc/config/arm/arm.md                  |  92 ++++++++++++++-
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  53 +++++++--
 gcc/function.c                         |  37 ++++--
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++
 9 files changed, 419 insertions(+), 36 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 9b91279..ea837e0 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6101,8 +6101,18 @@  stack_protect_prologue (void)
 {
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
+  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  /* Allow the target to compute address of Y and copy it to X without
+     leaking Y into a register.  This combined address + copy pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
+    return;
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262..100844e 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@  extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbc..f4a9705 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@  legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   If not NULL, PIC_REG indicates which register to use as PIC register,
+   otherwise it is decided by register allocator.  COMPUTE_NOW forces the PIC
+   register to be loaded, irregardless of whether it was loaded previously.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@  require_pic_register (void)
 	  rtx_insn *seq, *insn;
 
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@  require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
@@ -7427,15 +7435,29 @@  require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (can_create_pseudo_p ())
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legimitize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, irregardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@  legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
 
 	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
@@ -7520,9 +7542,11 @@  legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -8707,7 +8731,8 @@  arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8775,7 +8800,8 @@  thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18059,7 +18085,7 @@  arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 361a026..9582c3f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@ 
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -8634,6 +8635,95 @@ 
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[2], operands[1]);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "general_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (!address_operand (addr, SImode))
+	operands[1] = force_const_mem (SImode, addr);
+      emit_move_insn (operands[3], operands[1]);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index b05f85e..429c937 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@ 
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 09d6e30..da6423e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7356,22 +7356,61 @@  builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finallystore it), it is the backend's responsability to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
+This pattern, if defined, moves a @code{ptr_mode} value from the legal
+memory in operand 1 to the memory in operand 0 without leaving the
+value in a register afterward.  This is to avoid leaking the value some
+place that an attacker might use to rewrite the stack guard slot after
 having clobbered it.
 
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
+
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finallystore it), it is the backend's responsability to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+legal memory in operand 1 with the memory in operand 0 without leaving
+the value in a register afterward and branches to operand 2 if the
+values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 142cdae..bf96dd6 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4886,20 +4886,33 @@  stack_protect_epilogue (void)
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
   rtx_insn *seq;
+  struct expand_operand ops[3];
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
-  else
-    y = const0_rtx;
-
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
-    emit_insn (seq);
-  else
-    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+  create_fixed_operand (&ops[0], x);
+  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
+  create_fixed_operand (&ops[2], label);
+  /* Allow the target to compute address of Y and compare it with X without
+     leaking Y into a register.  This combined address + compare pattern allows
+     the target to prevent spilling of any intermediate results by splitting
+     it after register allocator.  */
+  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
+			       3, ops))
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ()
+	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
+	      != NULL_RTX))
+	emit_insn (seq);
+      else
+	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+    }
 
   /* The noreturn predictor has been moved to the tree level.  The rtl-level
      predictors estimate this branch about 20%, which isn't enough to get
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3..d39889b 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@  DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 0000000..4143a86
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@ 
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check for the various rX, rY and rZ
+   registers not being set but this is not possible due to back references
+   being illegal in lookahead expression in Tcl, thus preventing to use the
+   only construct that allow to negate a regexp from using the backreferences
+   to those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline the whole thing an undetermined number of times. The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it to match one of the initial space.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.7.4