[ARM] Fix PR85434: spill of stack protector's guard address

Message ID	CAKnkMGureVC-Wf09yG1x70h+LTfh+oQt_e-6nU8P8xkAi2PYAQ@mail.gmail.com
State	New
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of gcc-patches-return-476937-patch=linaro.org@gcc.gnu.org designates 209.132.180.131 as permitted sender) client-ip=209.132.180.131; DomainKey-Signature: a=rsa-sha1; c=nofws; d=gcc.gnu.org; h=list-id :list-unsubscribe:list-archive:list-post:list-help:sender :mime-version:from:date:message-id:subject:to:content-type; q= dns; s=default; b=MH5BqrlW58fILishrOQMxLih/qILHGivFU0shYI74AdCWH /7FJhUR6zsjAo3NAz6cXrbxzu6mz4OElJOqbTI3+HTs6mS7FQ1o6DPLWAf9fXKX+ j0dVU09ygabw4d3Mb8aBCByIeZ4httHO9tvTELh7jgcBUYE2haPWR7O+DyY1c= Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk Sender: gcc-patches-owner@gcc.gnu.org MIME-Version: 1.0 From: Thomas Preudhomme <thomas.preudhomme@linaro.org> Date: Sat, 28 Apr 2018 00:32:26 +0100 Message-ID: <CAKnkMGureVC-Wf09yG1x70h+LTfh+oQt_e-6nU8P8xkAi2PYAQ@mail.gmail.com> Subject: [ARM] Fix PR85434: spill of stack protector's guard address To: gcc-patches@gcc.gnu.org, Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>, Ramana Radhakrishnan <ramana.radhakrishnan@arm.com>, Richard Earnshaw <richard.earnshaw@arm.com> Content-Type: multipart/mixed; boundary="000000000000c13eb3056adcea38"
Series	[ARM] Fix PR85434: spill of stack protector's guard address \| expand [ARM] Fix PR85434: spill of stack protector's guard address

From 76c48e31130f212721addeeca830477e3b6f5e10 Mon Sep 17 00:00:00 2001 From: Thomas Preud'homme <thomas.preudhomme@arm.com> Date: Mon, 23 Apr 2018 14:37:11 +0100 Subject: [PATCH] [ARM] Fix PR85434: spill of stack protector's guard address On Arm (Aarch32 and Aarch64) the stack protector's guard is accessed by loading its address first before loading its value from it as part of the stack_protect_set or stack_protect_check insn pattern. This creates the risk of spilling between the two. It is particularly likely on Aarch32 when compiling PIC code because - computing the address takes several instructions (first compute the GOT base and then the GOT entry by adding an offset) which increases the likelyhood of CSE - the address computation can be CSEd due to the GOT entry computation being a MEM of the GOT base + an UNSPEC offset rather than an UNSPEC of a MEM like on AArche64. This patch address both issues by (i) adding some scheduler barriers around the stack protector code and (ii) making all memory loads involved in computing the guard's address volatile. The use of volatile rather than unspec was chosen so that the patterns for computing the guard address can be the same as for normal global variable access thus reusing more code. Finally the patch also improves the documentation to mention the need to be careful when computing the address of the guard. ChangeLog entry is as follows: *** gcc/ChangeLog *** 2018-04-27 Thomas Preud'homme <thomas.preudhomme@linaro.org> * cfgexpand.c (stack_protect_prologue): Emit scheduler barriers around stack protector code. * function.c (stack_protect_epilogue): Likewise. * config/arm/arm-protos.h (arm_stack_chk_guard_decl_p): Declare. * config/arm/arm.md (calculate_pic_address): Mark memory volatile if is computing address of stack protector's guard. (calculate_pic_address splitter): Likewise. * config/arm/arm.c (require_pic_register): Add parameter to control whether to insert instruction at the end of the instruction stream. (legitimize_pic_address): Force computing PIC address at the end of instruction stream and adapt logic to change in calculate_pic_address insn pattern. (arm_stack_chk_guard_decl_p): New function. (arm_emit_call_insn): Adapt to change in require_pic_register(). * target.def (TARGET_STACK_PROTECT_GUARD): Document requirement on guard's address computation to be careful about not spilling. * doc/tm.texi: Regenerate. *** gcc/testsuite/ChangeLog *** 2018-04-27 Thomas Preud'homme <thomas.preudhomme@linaro.org> * gcc.target/arm/pr85434.c: New testcase. Testing: The code has been boostraped on an Armv8-A machine targeting: - Aarch32 ARM mode with -mfpu=neon-fpv4 and hardfloat ABI - Aarch64 Testsuite has been run for the following sets of flag: - arm-eabi-aem/-mthumb/-march=armv4t - arm-eabi-aem/-marm/-march=armv7-a/-mfpu=vfpv3-d16/-mfloat-abi=softfp - arm-eabi-aem/-mthumb/-march=armv8-a/-mfpu=crypto-neon-fp-armv8/-mfloat-abi=hard (thereby testing the code for ARM, Thumb-2 and Thumb-1 mode) without any regression. Is it ok for trunk? Best regards, Thomas --- gcc/cfgexpand.c | 6 + gcc/config/arm/arm-protos.h | 1 + gcc/config/arm/arm.c | 55 ++++++--- gcc/config/arm/arm.md | 39 ++++++- gcc/doc/tm.texi | 4 + gcc/function.c | 5 + gcc/target.def | 4 + gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++++++++++ 8 files changed, 295 insertions(+), 19 deletions(-) create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c index deab929..c7ced8f 100644 --- a/gcc/cfgexpand.c +++ b/gcc/cfgexpand.c @@ -6156,6 +6156,10 @@ stack_protect_prologue (void) tree guard_decl = targetm.stack_protect_guard (); rtx x, y; + /* Prevent scheduling of instruction(s) between computation of the guard's + address and setting of the canari to avoid any spill of the guard's + address if computed outside the setting of the canari. */ + emit_insn (gen_blockage ()); x = expand_normal (crtl->stack_protect_guard); if (guard_decl) y = expand_normal (guard_decl); @@ -6168,11 +6172,13 @@ stack_protect_prologue (void) if (rtx_insn *insn = targetm.gen_stack_protect_set (x, y)) { emit_insn (insn); + emit_insn (gen_blockage ()); return; } /* Otherwise do a straight move. */ emit_move_insn (x, y); + emit_insn (gen_blockage ()); } /* Translate the intermediate representation contained in the CFG diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 9d0acde..7f378be 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -67,6 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code); extern int arm_split_constant (RTX_CODE, machine_mode, rtx, HOST_WIDE_INT, rtx, rtx, int); extern int legitimate_pic_operand_p (rtx); +extern bool arm_stack_chk_guard_decl_p (rtx); extern rtx legitimize_pic_address (rtx, machine_mode, rtx); extern rtx legitimize_tls_address (rtx, rtx); extern bool arm_legitimate_address_p (machine_mode, rtx, bool); diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 08120c6..fb75e24 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -7386,10 +7386,11 @@ legitimate_pic_operand_p (rtx x) } /* Record that the current function needs a PIC register. Initialize - cfun->machine->pic_reg if we have not already done so. */ + cfun->machine->pic_reg if we have not already done so. INSERT_ON_EDGE + indicates whether to insert on entry edge or in the final insn stream. */ static void -require_pic_register (void) +require_pic_register (bool insert_on_edge) { /* A lot of the logic here is made obscure by the fact that this routine gets called as part of the rtx cost estimation process. @@ -7440,12 +7441,14 @@ require_pic_register (void) if (INSN_P (insn)) INSN_LOCATION (insn) = prologue_location; - /* We can be called during expansion of PHI nodes, where - we can't yet emit instructions directly in the final - insn stream. Queue the insns on the entry edge, they will - be committed after everything else is expanded. */ - insert_insn_on_edge (seq, - single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun))); + /* We can be called during expansion of PHI nodes, where we can't + yet emit instructions directly in the final insn stream. When + this is the case, insert_on_edge is true and we queue the + insns on the entry edge. They are then committed after + everything else is expanded. */ + if (insert_on_edge) + insert_insn_on_edge (seq, + single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun))); } } } @@ -7485,14 +7488,29 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg) rtx pat; rtx mem; - /* If this function doesn't have a pic register, create one now. */ - require_pic_register (); + /* If this function doesn't have a pic register, create one now. + Recreate one if user is stack protector code to reduce live + range and reduce the risk of spilling. */ + if (arm_stack_chk_guard_decl_p (orig)) + { + crtl->uses_pic_offset_table = 0; + cfun->machine->pic_reg = NULL_RTX; + require_pic_register (false); + } + else + require_pic_register (true); - pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig); + insn = (rtx_insn *) + gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig); + pat = PATTERN (insn); - /* Make the MEM as close to a constant as possible. */ + /* Make the MEM as close to a constant as possible except when + computing address of stack smashing protector's guard. */ mem = SET_SRC (pat); - gcc_assert (MEM_P (mem) && !MEM_VOLATILE_P (mem)); + gcc_assert (MEM_P (mem)); + gcc_assert (arm_stack_chk_guard_decl_p (orig) + ? MEM_VOLATILE_P (mem) + : !MEM_VOLATILE_P (mem)); MEM_READONLY_P (mem) = 1; MEM_NOTRAP_P (mem) = 1; @@ -8494,6 +8512,15 @@ arm_tls_descseq_addr (rtx x, rtx reg) return reg; } +/* Returns whether X is the declaration tree for stack smashing protector's + guard. */ + +bool +arm_stack_chk_guard_decl_p (rtx x) +{ + return SYMBOL_REF_DECL (x) == targetm.stack_protect_guard (); +} + rtx legitimize_tls_address (rtx x, rtx reg) { @@ -18078,7 +18105,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall) ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr)) : !SYMBOL_REF_LOCAL_P (addr))) { - require_pic_register (); + require_pic_register (true); use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg); } diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 2d5359e..0ba5133 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -6150,11 +6150,23 @@ ;; ;; Note: Update arm.c: legitimize_pic_address() when changing this pattern. (define_expand "calculate_pic_address" - [(set (match_operand:SI 0 "register_operand" "") - (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "") - (unspec:SI [(match_operand:SI 2 "" "")] - UNSPEC_PIC_SYM))))] + [(match_operand:SI 0 "register_operand" "") + (match_operand:SI 1 "register_operand" "") + (match_operand:SI 2 "" "")] "flag_pic" + " + { + rtvec unspec_vec = gen_rtvec (1, operands[2]); + rtx unspec = gen_rtx_UNSPEC (SImode, unspec_vec, UNSPEC_PIC_SYM); + rtx plus = gen_rtx_PLUS (SImode, operands[1], unspec); + rtx mem = gen_rtx_MEM (SImode, plus); + /* PR85434: make access volatile to prevent CSE of expression computing + stack smashing protector's guard's address. */ + if (arm_stack_chk_guard_decl_p (operands[2])) + MEM_VOLATILE_P (mem) = true; + emit_move_insn (operands[0], mem); + DONE; + }" ) ;; Split calculate_pic_address into pic_load_addr_* and a move. @@ -6166,7 +6178,24 @@ "flag_pic" [(set (match_dup 3) (unspec:SI [(match_dup 2)] UNSPEC_PIC_SYM)) (set (match_dup 0) (mem:SI (plus:SI (match_dup 1) (match_dup 3))))] - "operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0];" + " + { + rtvec unspec_vec; + rtx unspec, plus, mem; + + operands[3] = can_create_pseudo_p () ? gen_reg_rtx (SImode) : operands[0]; + unspec_vec = gen_rtvec (1, operands[2]); + unspec = gen_rtx_UNSPEC (SImode, unspec_vec, UNSPEC_PIC_SYM); + emit_move_insn (operands[3], unspec); + plus = gen_rtx_PLUS (SImode, operands[1], operands[3]); + mem = gen_rtx_MEM (SImode, plus); + /* PR85434: make access volatile to prevent CSE of expression computing + stack smashing protector's guard's address. */ + if (arm_stack_chk_guard_decl_p (operands[2])) + MEM_VOLATILE_P (mem) = true; + emit_move_insn (operands[0], mem); + DONE; + }" ) ;; operand1 is the memory address to go into diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index bd8b917..804349d 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -5027,6 +5027,10 @@ runtime to some random value and is used to initialize the guard value that is placed at the top of the local stack frame. The type of this variable must be @code{ptr_type_node}. +It is the responsability of the target backend to ensure that loading its +address before loading its value does not spill the address or intermediate +values towards the address. + The default version of this hook creates a variable called @samp{__stack_chk_guard}, which is normally defined in @file{libgcc2.c}. @end deftypefn diff --git a/gcc/function.c b/gcc/function.c index 61515e3..611fcef 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5098,6 +5098,10 @@ stack_protect_epilogue (void) rtx x, y; rtx_insn *seq; + /* Prevent scheduling of instruction(s) between computation of the guard's + address and comparison of its value with the canari to avoid any spill + of the guard's address if computed outside the check. */ + emit_insn (gen_blockage ()); x = expand_normal (crtl->stack_protect_guard); if (guard_decl) y = expand_normal (guard_decl); @@ -5122,6 +5126,7 @@ stack_protect_epilogue (void) predict_insn_def (tmp, PRED_NORETURN, TAKEN); expand_call (targetm.stack_protect_fail (), NULL_RTX, /*ignore=*/true); + emit_insn (gen_blockage ()); free_temp_slots (); emit_label (label); } diff --git a/gcc/target.def b/gcc/target.def index c5b2a1e..ec831d4 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -4227,6 +4227,10 @@ runtime to some random value and is used to initialize the guard value\n\ that is placed at the top of the local stack frame. The type of this\n\ variable must be @code{ptr_type_node}.\n\ \n\ +It is the responsability of the target backend to ensure that loading its\n\ +address before loading its value does not spill the address or intermediate\n\ +values towards the address.\n\ +\n\ The default version of this hook creates a variable called\n\ @samp{__stack_chk_guard}, which is normally defined in @file{libgcc2.c}.", tree, (void), diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c new file mode 100644 index 0000000..4143a86 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/pr85434.c @@ -0,0 +1,200 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target fstack_protector }*/ +/* { dg-require-effective-target fpic }*/ +/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */ + +#include <stddef.h> +#include <stdint.h> + + +static const unsigned char base64_enc_map[64] = +{ + 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', + 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', + 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', + 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', + 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', + 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', + '8', '9', '+', '/' +}; + +#define BASE64_SIZE_T_MAX ( (size_t) -1 ) /* SIZE_T_MAX is not standard */ + + +void doSmth(void *x); + +#include <string.h> + + +void check(int n) { + + if (!(n % 2 && n % 3 && n % 5)) { + __asm__ ( "add r8, r8, #1;" ); + } +} + +uint32_t test( + uint32_t a1, + uint32_t a2, + size_t a3, + size_t a4, + size_t a5, + size_t a6) +{ + uint32_t nResult = 0; + uint8_t* h = 0L; + uint8_t X[128]; + uint8_t mac[64]; + size_t len; + + doSmth(&a1); + doSmth(&a2); + doSmth(&a3); + doSmth(&a4); + doSmth(&a5); + doSmth(&a6); + + if (a1 && a2 && a3 && a4 && a5 && a6) { + nResult = 1; + h = (void*)X; + len = sizeof(X); + memset(X, a2, len); + len -= 64; + memcpy(mac ,X, len); + *(h + len) = a6; + + { + + + unsigned char *dst = X; + size_t dlen = a3; + size_t *olen = &a6; + const unsigned char *src = mac; + size_t slen = a4; + size_t i, n; + int C1, C2, C3; + unsigned char *p; + + if( slen == 0 ) + { + *olen = 0; + return( 0 ); + } + + n = slen / 3 + ( slen % 3 != 0 ); + + if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 ) + { + *olen = BASE64_SIZE_T_MAX; + return( 0 ); + } + + n *= 4; + + if( ( dlen < n + 1 ) || ( NULL == dst ) ) + { + *olen = n + 1; + return( 0 ); + } + + n = ( slen / 3 ) * 3; + + for( i = 0, p = dst; i < n; i += 3 ) + { + C1 = *src++; + C2 = *src++; + C3 = *src++; + + check(i); + + *p++ = base64_enc_map[(C1 >> 2) & 0x3F]; + *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F]; + *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F]; + *p++ = base64_enc_map[C3 & 0x3F]; + } + + if( i < slen ) + { + C1 = *src++; + C2 = ( ( i + 1 ) < slen ) ? *src++ : 0; + + *p++ = base64_enc_map[(C1 >> 2) & 0x3F]; + *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F]; + + if( ( i + 1 ) < slen ) + *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F]; + else *p++ = '='; + + *p++ = '='; + } + + *olen = p - dst; + *p = 0; + +} + + __asm__ ("mov r8, %0;" : "=r" ( nResult )); + } + else + { + nResult = 2; + } + + doSmth(X); + doSmth(mac); + + + return nResult; +} + +/* The pattern below catches sequences of instructions that were generated + for ARM and Thumb-2 before the fix for this PR. They are of the form: + + ldr rX, <offset from sp or fp> + <optional non ldr instructions> + ldr rY, <offset from sp or fp> + ldr rZ, [rX] + <optional non ldr instructions> + cmp rY, rZ + <optional non cmp instructions> + bl __stack_chk_fail + + Ideally the optional block would check for the various rX, rY and rZ + registers not being set but this is not possible due to back references + being illegal in lookahead expression in Tcl, thus preventing to use the + only construct that allow to negate a regexp from using the backreferences + to those registers. Instead we go for the heuristic of allowing non ldr/cmp + instructions with the assumptions that (i) those are not part of the stack + protector sequences and (ii) they would only be scheduled here if they don't + conflict with registers used by stack protector. + + Note on the regexp logic: + Allowing non X instructions (where X is ldr or cmp) is done by looking for + some non newline spaces, followed by something which is not X, followed by + an alphanumeric character followed by anything but a newline and ended by a + newline the whole thing an undetermined number of times. The alphanumeric + character is there to force the match of the negative lookahead for X to + only happen after all the initial spaces and thus to check the mnemonic. + This prevents it to match one of the initial space. */ +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */ + +/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR + which had the form: + + ldr rS, <offset from sp or fp> + <optional non ldr instructions> + ldr rT, <PC relative offset> + <optional non ldr instructions> + ldr rX, [rS, rT] + <optional non ldr instructions> + ldr rY, <offset from sp or fp> + ldr rZ, [rX] + <optional non ldr instructions> + cmp rY, rZ + <optional non cmp instructions> + bl __stack_chk_fail + + Note on the regexp logic: + PC relative offset is checked by looking for a source operand that does not + contain [ or ]. */ +/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */ -- 2.7.4

[ARM] Fix PR85434: spill of stack protector's guard address

Commit Message

Comments

Patch