Message ID | 1197003135ea0b0aeecd038a07db6b787a7db6f6.1515072356.git.Richard.Earnshaw@arm.com
---|---
State | Superseded |
Series | Add __builtin_load_no_speculate
On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw <Richard.Earnshaw@arm.com> wrote:
>
> This patch implements support for __builtin_load_no_speculate on
> AArch64.  On this architecture we inhibit speculation by emitting a
> combination of CSEL and a hint instruction that ensures the CSEL is
> fully resolved when the operands to the CSEL may involve a speculative
> load.

A high-level explanation is missing here, but it looks like you do
something like

  if (access is not in bounds)
    no-speculate-access;
  else
    regular access;

with the upper/lower bounds.  Is this because 'no-speculate-access' is
prohibitively expensive even when factoring in the condition computation
and the condjump?
(Trying to reverse-engineer how the actual assembly will look from the
patch; there isn't even a testcase :/)

>	* config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
>	to 'H' operand qualifier.
>	(aarch64_inhibit_load_speculation): New function.
>	(TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
>	* config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
>	code.
>	(nospeculate<ALLI:mode>, nospeculateti): New patterns.
> ---
>  gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
>  gcc/config/aarch64/aarch64.md | 28 +++++++++++++
>  2 files changed, 120 insertions(+)
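To make the question concrete, a bounds-checked lookup written against the
proposed builtin might look like the sketch below.  This is illustrative
only: the five-argument order (pointer, lower bound, upper bound, failure
value, comparison pointer) is inferred from the target hook added by this
patch; the actual builtin signature lives in the generic part of the
series, which is not shown here.

#include <stddef.h>

/* Hypothetical use of the builtin; argument order assumed from
   aarch64_inhibit_load_speculation (ptr, lower, upper, failval, cmpptr).  */
int
lookup (const int *table, size_t idx, size_t size)
{
  /* Load table[idx] only if its address lies within [table, table + size);
     otherwise yield 0, with no speculatively loaded value escaping.  */
  return __builtin_load_no_speculate (&table[idx], table, table + size,
				      0, &table[idx]);
}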
On 05/01/18 09:51, Richard Biener wrote:
> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
> <Richard.Earnshaw@arm.com> wrote:
>>
>> This patch implements support for __builtin_load_no_speculate on
>> AArch64.  On this architecture we inhibit speculation by emitting a
>> combination of CSEL and a hint instruction that ensures the CSEL is
>> fully resolved when the operands to the CSEL may involve a speculative
>> load.
>
> A high-level explanation is missing here, but it looks like you do
> something like
>
>   if (access is not in bounds)
>     no-speculate-access;
>   else
>     regular access;

For aarch64 we end up with a sequence like

	CMP	'bounds'
	B<cond>	out-of-range
	LDR	x1, [ptr]
out-of-range:
	CSEL	x1, FAILVAL, x1, <cond>
	CSDB

The CSEL+CSDB combination ensures that even if we do fetch x1 from an
out-of-range location, the value won't persist beyond the end of the code
sequence and thus can't be used for further speculation.

> with the upper/lower bounds.  Is this because 'no-speculate-access' is
> prohibitively expensive even when factoring in the condition computation
> and the condjump?
> (Trying to reverse-engineer how the actual assembly will look from the
> patch;

It's all in the white paper.

> there isn't even a testcase :/)
>
>>	* config/aarch64/aarch64.c (aarch64_print_operand): Handle zero passed
>>	to 'H' operand qualifier.
>>	(aarch64_inhibit_load_speculation): New function.
>>	(TARGET_INHIBIT_LOAD_SPECULATION): Redefine.
>>	* config/aarch64/aarch64.md (UNSPECV_NOSPECULATE): New unspec_volatile
>>	code.
>>	(nospeculate<ALLI:mode>, nospeculateti): New patterns.
>> ---
>>  gcc/config/aarch64/aarch64.c  | 92 +++++++++++++++++++++++++++++++++++++++++++
>>  gcc/config/aarch64/aarch64.md | 28 +++++++++++++
>>  2 files changed, 120 insertions(+)
>>
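For readers reconstructing that sequence without the white paper, a
hand-written equivalent of the CSEL+CSDB idiom might look like the sketch
below.  The helper name, register constraints, and the extended-asm
wrapper are my own illustrative assumptions, not part of the patch; the
builtin exists precisely so that users never have to write this.

#include <stdint.h>

static inline uint64_t
load_no_speculate_u64 (const uint64_t *ptr, const uint64_t *lower,
		       const uint64_t *upper, uint64_t failval)
{
  uint64_t val;
  __asm__ volatile (
    "cmp	%[p], %[lo]\n\t"	 /* C set iff p >= lower.  */
    "ccmp	%[p], %[hi], #2, hs\n\t" /* hs afterwards iff out of range:
					    p < lower forces C=1, else the
					    flags come from cmp p, upper.  */
    "b.hs	1f\n\t"			 /* Skip the load when out of range.  */
    "ldr	%[v], [%[p]]\n"
    "1:\n\t"
    "csel	%[v], %[fail], %[v], hs\n\t" /* Squash any fetched value; if
						the branch was taken, csel
						reads an uninitialized reg
						but selects failval.  */
    "hint	#0x14"			 /* CSDB  */
    : [v] "=&r" (val)			 /* Early-clobber: must not alias
					    the inputs.  */
    : [p] "r" (ptr), [lo] "r" (lower), [hi] "r" (upper), [fail] "r" (failval)
    : "cc", "memory");
  return val;
}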
On 01/05/2018 03:48 AM, Richard Earnshaw (lists) wrote:
> On 05/01/18 09:51, Richard Biener wrote:
>> On Thu, Jan 4, 2018 at 2:58 PM, Richard Earnshaw
>> <Richard.Earnshaw@arm.com> wrote:
>>>
>>> This patch implements support for __builtin_load_no_speculate on
>>> AArch64.  On this architecture we inhibit speculation by emitting a
>>> combination of CSEL and a hint instruction that ensures the CSEL is
>>> fully resolved when the operands to the CSEL may involve a speculative
>>> load.
>>
>> A high-level explanation is missing here, but it looks like you do
>> something like
>>
>>   if (access is not in bounds)
>>     no-speculate-access;
>>   else
>>     regular access;
>
> For aarch64 we end up with a sequence like
>
> 	CMP	'bounds'
> 	B<cond>	out-of-range
> 	LDR	x1, [ptr]
> out-of-range:
> 	CSEL	x1, FAILVAL, x1, <cond>
> 	CSDB
>
> The CSEL+CSDB combination ensures that even if we do fetch x1 from an
> out-of-range location, the value won't persist beyond the end of the code
> sequence and thus can't be used for further speculation.

Looked at architecture-independently, the code starting at out-of-range is
just a conditional move and a fence.  Given those primitives, any target
ought to be able to define __builtin_load_no_speculate -- which is
obviously important :-)

Jeff
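As an illustration of that observation, a hypothetical x86-64 rendering of
the same idiom could pair CMOV with LFENCE.  The helper below is a sketch
under my own naming and instruction selection; it is not part of this
series and no x86 port of the builtin is posted here.

#include <stdint.h>

static inline uint64_t
load_no_speculate_u64 (const uint64_t *ptr, const uint64_t *lower,
		       const uint64_t *upper, uint64_t failval)
{
  uint64_t val = 0;
  int ok = ptr >= lower && ptr < upper;
  if (ok)
    val = *ptr;
  /* CMOV forces the failure value whenever the bounds check failed, and
     LFENCE keeps younger instructions from running ahead with a
     speculatively loaded value.  */
  __asm__ volatile ("test	%[ok], %[ok]\n\t"
		    "cmovz	%[fail], %[v]\n\t"
		    "lfence"
		    : [v] "+r" (val)
		    : [ok] "r" (ok), [fail] "r" (failval)
		    : "cc", "memory");
  return val;
}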
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93e9d9f9..7410921 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5315,6 +5315,14 @@ aarch64_print_operand (FILE *f, rtx x, int code)
       break;
 
     case 'H':
+      /* Print the higher numbered register of a pair (TImode) of regs.  */
+      if (x == const0_rtx
+	  || (CONST_DOUBLE_P (x) && aarch64_float_const_zero_rtx_p (x)))
+	{
+	  asm_fprintf (f, "xzr");
+	  break;
+	}
+
       if (!REG_P (x) || !GP_REGNUM_P (REGNO (x) + 1))
 	{
 	  output_operand_lossage ("invalid operand for '%%%c'", code);
@@ -15115,6 +15123,87 @@ aarch64_sched_can_speculate_insn (rtx_insn *insn)
     }
 }
 
+static rtx
+aarch64_inhibit_load_speculation (machine_mode mode, rtx result, rtx mem,
+				  rtx lower_bound, rtx upper_bound,
+				  rtx fail_result, rtx cmpptr)
+{
+  rtx cond, comparison;
+  rtx target = gen_reg_rtx (mode);
+  rtx tgt2 = result;
+
+  if (!register_operand (cmpptr, ptr_mode))
+    cmpptr = force_reg (ptr_mode, cmpptr);
+
+  if (!register_operand (tgt2, mode))
+    tgt2 = gen_reg_rtx (mode);
+
+  if (upper_bound == NULL)
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      cond = aarch64_gen_compare_reg (LTU, cmpptr, lower_bound);
+      comparison = gen_rtx_LTU (VOIDmode, cond, const0_rtx);
+    }
+  else if (lower_bound == NULL)
+    {
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      cond = aarch64_gen_compare_reg (GEU, cmpptr, upper_bound);
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+  else
+    {
+      if (!register_operand (lower_bound, ptr_mode))
+	lower_bound = force_reg (ptr_mode, lower_bound);
+
+      if (!register_operand (upper_bound, ptr_mode))
+	upper_bound = force_reg (ptr_mode, upper_bound);
+
+      rtx cond1 = aarch64_gen_compare_reg (GEU, cmpptr, lower_bound);
+      rtx comparison1 = gen_rtx_GEU (ptr_mode, cond1, const0_rtx);
+      rtx failcond = GEN_INT (aarch64_get_condition_code (comparison1)^1);
+      cond = gen_rtx_REG (CCmode, CC_REGNUM);
+      if (ptr_mode == SImode)
+	emit_insn (gen_ccmpsi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      else
+	emit_insn (gen_ccmpdi (cond1, cond, cmpptr, upper_bound, comparison1,
+			       failcond));
+      comparison = gen_rtx_GEU (VOIDmode, cond, const0_rtx);
+    }
+
+  rtx_code_label *label = gen_label_rtx ();
+  emit_jump_insn (gen_condjump (comparison, cond, label));
+  emit_move_insn (target, mem);
+  emit_label (label);
+
+  insn_code icode;
+
+  switch (mode)
+    {
+    case E_QImode: icode = CODE_FOR_nospeculateqi; break;
+    case E_HImode: icode = CODE_FOR_nospeculatehi; break;
+    case E_SImode: icode = CODE_FOR_nospeculatesi; break;
+    case E_DImode: icode = CODE_FOR_nospeculatedi; break;
+    case E_TImode: icode = CODE_FOR_nospeculateti; break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (! insn_operand_matches (icode, 4, fail_result))
+    fail_result = force_reg (mode, fail_result);
+
+  emit_insn (GEN_FCN (icode) (tgt2, comparison, cond, target, fail_result));
+
+  if (tgt2 != result)
+    emit_move_insn (result, tgt2);
+
+  return result;
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -15554,6 +15643,9 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT aarch64_constant_alignment
 
+#undef TARGET_INHIBIT_LOAD_SPECULATION
+#define TARGET_INHIBIT_LOAD_SPECULATION aarch64_inhibit_load_speculation
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::aarch64_run_selftests
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index f1e2a07..1a1f398 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,7 @@
     UNSPECV_SET_FPSR		; Represent assign of FPSR content.
     UNSPECV_BLOCKAGE		; Represent a blockage
     UNSPECV_PROBE_STACK_RANGE	; Represent stack range probing.
+    UNSPECV_NOSPECULATE		; Inhibit speculation
   ]
 )
@@ -5797,6 +5798,33 @@
   DONE;
 })
 
+(define_insn "nospeculate<ALLI:mode>"
+  [(set (match_operand:ALLI 0 "register_operand" "=r")
+	(unspec_volatile:ALLI
+	 [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:ALLI 3 "register_operand" "r")
+	  (match_operand:ALLI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%<w>0, %<w>3, %<w>4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "8")]
+)
+
+(define_insn "nospeculateti"
+  [(set (match_operand:TI 0 "register_operand" "=r")
+	(unspec_volatile:TI
+	 [(match_operator 1 "aarch64_comparison_operator"
+	   [(match_operand 2 "cc_register" "") (const_int 0)])
+	  (match_operand:TI 3 "register_operand" "r")
+	  (match_operand:TI 4 "aarch64_reg_or_zero" "rZ")]
+	 UNSPECV_NOSPECULATE))]
+  ""
+  "csel\\t%x0, %x3, %x4, %M1\;csel\\t%H0, %H3, %H4, %M1\;hint\t#0x14\t// CSDB"
+  [(set_attr "type" "csel")
+   (set_attr "length" "12")]
+)
 
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
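On the "there isn't even a testcase" remark above: a minimal dejagnu-style
compile test for this patch might look like the sketch below.  This is a
hypothetical test I am supplying for illustration, not one posted with the
series; the builtin's argument order is assumed from the target hook, and
the scanned strings simply mirror the output templates of the nospeculate
patterns.

/* { dg-do compile { target aarch64*-*-* } } */
/* { dg-options "-O2" } */

int
f (int *arr, unsigned long idx, unsigned long n)
{
  /* Hypothetical call; signature assumed, see above.  */
  return __builtin_load_no_speculate (&arr[idx], arr, arr + n, 0,
				      &arr[idx]);
}

/* { dg-final { scan-assembler "csel" } } */
/* { dg-final { scan-assembler "hint\t#0x14" } } */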