Message ID: 797c1fda-ba0a-f6be-f9e4-43bba28a8fc7@foss.arm.com
State: New
Hi Thomas,

On 24/10/16 09:05, Thomas Preudhomme wrote:
> Ping?
>
> Best regards,
>
> Thomas
>
> On 14/10/16 14:50, Thomas Preudhomme wrote:
>> Ping?
>>
>> Best regards,
>>
>> Thomas
>>
>> On 03/10/16 17:44, Thomas Preudhomme wrote:
>>> Ping?
>>>
>>> Best regards,
>>>
>>> Thomas
>>>
>>> On 22/09/16 14:44, Thomas Preudhomme wrote:
>>>> Hi,
>>>>
>>>> This patch is part of a patch series to add support for atomic operations
>>>> on ARMv8-M Baseline targets in GCC. This specific patch refactors the
>>>> expander and splitter for atomics to make the logic work with ARMv8-M
>>>> Baseline, which shares the limitations of Thumb-1 in terms of CC flag
>>>> setting and has different conditional compare insn patterns.
>>>>
>>>> ChangeLog entry is as follows:
>>>>
>>>> *** gcc/ChangeLog ***
>>>>
>>>> 2016-09-02  Thomas Preud'homme  <thomas.preudhomme@arm.com>
>>>>
>>>>         * config/arm/arm.c (arm_expand_compare_and_swap): Add new bdst
>>>>         local variable.  Add the new parameter to the insn generator.
>>>>         Set that parameter to be the CC flag for 32-bit targets, bval
>>>>         otherwise.  Set the return value from the negation of that
>>>>         parameter for Thumb-1, keeping the logic unchanged otherwise
>>>>         except for using bdst as the destination register of the
>>>>         compare_and_swap insn.
>>>>         (arm_split_compare_and_swap): Add an explanation of how the
>>>>         value is returned to the function comment.  Rename scratch
>>>>         variable to neg_bval.  Adapt initialization of variables holding
>>>>         operands to the new operand numbers.  Use the return register to
>>>>         hold the result of store exclusive for Thumb-1, a scratch
>>>>         register otherwise.  Construct the appropriate cbranch for
>>>>         Thumb-1 targets, keeping the logic unchanged for 32-bit targets.
>>>>         Guard Z flag setting to restrict it to 32-bit targets.  Use
>>>>         gen_cbranchsi4 rather than a hand-written conditional branch to
>>>>         loop for strongly ordered compare_and_swap.
>>>>         * config/arm/predicates.md (cc_register_operand): New predicate.
>>>>         * config/arm/sync.md (atomic_compare_and_swap<mode>_1): Use a
>>>>         match_operand with the new predicate to accept either the CC
>>>>         flag or a destination register for the boolean return value,
>>>>         restricting it to the CC flag only via the constraint.  Adapt
>>>>         operand numbers accordingly.
>>>>
>>>> Testing: No code generation difference for ARMv7-A, ARMv7VE and ARMv8-A
>>>> on all atomic and synchronization testcases in the testsuite [2].  The
>>>> patchset was also bootstrapped with --enable-itm --enable-gomp on
>>>> ARMv8-A in ARM and Thumb mode at optimization level -O1 and above [1]
>>>> without any regression in the testsuite and no code generation
>>>> difference in libitm and libgomp.
>>>>
>>>> Code generation for ARMv8-M Baseline has been manually examined and
>>>> compared against ARMv8-A Thumb-2 for the following configurations
>>>> without finding any issue:
>>>>
>>>> gcc.dg/atomic-op-2.c at -Os
>>>> gcc.dg/atomic-compare-exchange-2.c at -Os
>>>> gcc.dg/atomic-compare-exchange-3.c at -O3
>>>>
>>>> Is this ok for trunk?

This is ok.

Thanks,
Kyrill

>>>> Best regards,
>>>>
>>>> Thomas
>>>>
>>>> [1] CFLAGS_FOR_TARGET and CXXFLAGS_FOR_TARGET were set to "-O1 -g",
>>>> "-O3 -g" and undefined ("-O2 -g")
>>>> [2] The exact list is:
>>>>
>>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-1.c
>>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-2.c
>>>> gcc/testsuite/gcc.dg/atomic-compare-exchange-3.c
>>>> gcc/testsuite/gcc.dg/atomic-exchange-1.c
>>>> gcc/testsuite/gcc.dg/atomic-exchange-2.c
>>>> gcc/testsuite/gcc.dg/atomic-exchange-3.c
>>>> gcc/testsuite/gcc.dg/atomic-fence.c
>>>> gcc/testsuite/gcc.dg/atomic-flag.c
>>>> gcc/testsuite/gcc.dg/atomic-generic.c
>>>> gcc/testsuite/gcc.dg/atomic-generic-aux.c
>>>> gcc/testsuite/gcc.dg/atomic-invalid-2.c
>>>> gcc/testsuite/gcc.dg/atomic-load-1.c
>>>> gcc/testsuite/gcc.dg/atomic-load-2.c
>>>> gcc/testsuite/gcc.dg/atomic-load-3.c
>>>> gcc/testsuite/gcc.dg/atomic-lockfree.c
>>>> gcc/testsuite/gcc.dg/atomic-lockfree-aux.c
>>>> gcc/testsuite/gcc.dg/atomic-noinline.c
>>>> gcc/testsuite/gcc.dg/atomic-noinline-aux.c
>>>> gcc/testsuite/gcc.dg/atomic-op-1.c
>>>> gcc/testsuite/gcc.dg/atomic-op-2.c
>>>> gcc/testsuite/gcc.dg/atomic-op-3.c
>>>> gcc/testsuite/gcc.dg/atomic-op-6.c
>>>> gcc/testsuite/gcc.dg/atomic-store-1.c
>>>> gcc/testsuite/gcc.dg/atomic-store-2.c
>>>> gcc/testsuite/gcc.dg/atomic-store-3.c
>>>> gcc/testsuite/g++.dg/ext/atomic-1.C
>>>> gcc/testsuite/g++.dg/ext/atomic-2.C
>>>> gcc/testsuite/gcc.target/arm/atomic-comp-swap-release-acquire.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-acq_rel.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-acquire.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-char.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-consume.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-int.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-relaxed.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-release.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-seq_cst.c
>>>> gcc/testsuite/gcc.target/arm/atomic-op-short.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_1.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_2.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_3.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_4.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_5.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_6.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_7.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_8.c
>>>> gcc/testsuite/gcc.target/arm/atomic_loaddi_9.c
>>>> gcc/testsuite/gcc.target/arm/sync-1.c
>>>> gcc/testsuite/gcc.target/arm/synchronize.c
>>>> gcc/testsuite/gcc.target/arm/armv8-sync-comp-swap.c
>>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-acquire.c
>>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-full.c
>>>> gcc/testsuite/gcc.target/arm/armv8-sync-op-release.c
>>>> libstdc++-v3/testsuite/29_atomics/atomic/60658.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/62259.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/64658.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/65147.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/65913.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/70766.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/49445.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/constexpr.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/copy_list.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/default.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/direct_list.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/single_value.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/cons/user_pod.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/51811.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/56011.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_assignment.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/integral_conversion.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/operators/pointer_partial_void.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/base_classes.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/compare_exchange_lowering.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic/requirements/explicit_instantiation/1.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/clear/1.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/1.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/56012.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/aggregate.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/cons/default.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/standard_layout.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/requirements/trivial.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/explicit.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_flag/test_and_set/implicit.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/60940.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/65147.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/constexpr.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/copy_list.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/default.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/direct_list.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/cons/single_value.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/bitwise.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/decrement.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/increment.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_assignment.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/operators/integral_conversion.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/standard_layout.cc
>>>> libstdc++-v3/testsuite/29_atomics/atomic_integral/requirements/trivial.cc
>>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/functions_std_c++0x.cc
>>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/macros.cc
>>>> libstdc++-v3/testsuite/29_atomics/headers/atomic/types_std_c++0x.cc
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 39e3aa85c0cc1d42b0c58dda143513feb248827e..c3249d42ae6720369eaaebb460b25687fde0af6c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28152,9 +28152,9 @@ emit_unlikely_jump (rtx insn)
 void
 arm_expand_compare_and_swap (rtx operands[])
 {
-  rtx bval, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x;
+  rtx bval, bdst, rval, mem, oldval, newval, is_weak, mod_s, mod_f, x;
   machine_mode mode;
-  rtx (*gen) (rtx, rtx, rtx, rtx, rtx, rtx, rtx);
+  rtx (*gen) (rtx, rtx, rtx, rtx, rtx, rtx, rtx, rtx);
 
   bval = operands[0];
   rval = operands[1];
@@ -28211,43 +28211,54 @@ arm_expand_compare_and_swap (rtx operands[])
       gcc_unreachable ();
     }
 
-  emit_insn (gen (rval, mem, oldval, newval, is_weak, mod_s, mod_f));
+  bdst = TARGET_THUMB1 ? bval : gen_rtx_REG (CCmode, CC_REGNUM);
+  emit_insn (gen (bdst, rval, mem, oldval, newval, is_weak, mod_s, mod_f));
 
   if (mode == QImode || mode == HImode)
     emit_move_insn (operands[1], gen_lowpart (mode, rval));
 
   /* In all cases, we arrange for success to be signaled by Z set.
      This arrangement allows for the boolean result to be used directly
-     in a subsequent branch, post optimization.  */
-  x = gen_rtx_REG (CCmode, CC_REGNUM);
-  x = gen_rtx_EQ (SImode, x, const0_rtx);
-  emit_insn (gen_rtx_SET (bval, x));
+     in a subsequent branch, post optimization.  For Thumb-1 targets, the
+     boolean negation of the result is also stored in bval because Thumb-1
+     backend lacks dependency tracking for CC flag due to flag-setting not
+     being represented at RTL level.  */
+  if (TARGET_THUMB1)
+    emit_insn (gen_cstoresi_eq0_thumb1 (bval, bdst));
+  else
+    {
+      x = gen_rtx_EQ (SImode, bdst, const0_rtx);
+      emit_insn (gen_rtx_SET (bval, x));
+    }
 }
 
 /* Split a compare and swap pattern.  It is IMPLEMENTATION DEFINED whether
    another memory store between the load-exclusive and store-exclusive can
    reset the monitor from Exclusive to Open state.  This means we must wait
    until after reload to split the pattern, lest we get a register spill in
-   the middle of the atomic sequence.  */
+   the middle of the atomic sequence.  Success of the compare and swap is
+   indicated by the Z flag set for 32bit targets and by neg_bval being zero
+   for Thumb-1 targets (ie. negation of the boolean value returned by
+   atomic_compare_and_swapmode standard pattern in operand 0).  */
 
 void
 arm_split_compare_and_swap (rtx operands[])
 {
-  rtx rval, mem, oldval, newval, scratch;
+  rtx rval, mem, oldval, newval, neg_bval;
   machine_mode mode;
   enum memmodel mod_s, mod_f;
   bool is_weak;
   rtx_code_label *label1, *label2;
   rtx x, cond;
 
-  rval = operands[0];
-  mem = operands[1];
-  oldval = operands[2];
-  newval = operands[3];
-  is_weak = (operands[4] != const0_rtx);
-  mod_s = memmodel_from_int (INTVAL (operands[5]));
-  mod_f = memmodel_from_int (INTVAL (operands[6]));
-  scratch = operands[7];
+  rval = operands[1];
+  mem = operands[2];
+  oldval = operands[3];
+  newval = operands[4];
+  is_weak = (operands[5] != const0_rtx);
+  mod_s = memmodel_from_int (INTVAL (operands[6]));
+  mod_f = memmodel_from_int (INTVAL (operands[7]));
+  neg_bval = TARGET_THUMB1 ? operands[0] : operands[8];
 
   mode = GET_MODE (mem);
 
   bool is_armv8_sync = arm_arch8 && is_mm_sync (mod_s);
@@ -28279,26 +28290,44 @@ arm_split_compare_and_swap (rtx operands[])
 
   arm_emit_load_exclusive (mode, rval, mem, use_acquire);
 
-  cond = arm_gen_compare_reg (NE, rval, oldval, scratch);
-  x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-  x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-                            gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
-  emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+  /* Z is set to 0 for 32bit targets (resp. rval set to 1) if oldval != rval,
+     as required to communicate with arm_expand_compare_and_swap.  */
+  if (TARGET_32BIT)
+    {
+      cond = arm_gen_compare_reg (NE, rval, oldval, neg_bval);
+      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
+      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
+                                gen_rtx_LABEL_REF (Pmode, label2), pc_rtx);
+      emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+    }
+  else
+    {
+      emit_move_insn (neg_bval, const1_rtx);
+      cond = gen_rtx_NE (VOIDmode, rval, oldval);
+      if (thumb1_cmpneg_operand (oldval, SImode))
+        emit_unlikely_jump (gen_cbranchsi4_scratch (neg_bval, rval, oldval,
+                                                    label2, cond));
+      else
+        emit_unlikely_jump (gen_cbranchsi4_insn (cond, rval, oldval, label2));
+    }
 
-  arm_emit_store_exclusive (mode, scratch, mem, newval, use_release);
+  arm_emit_store_exclusive (mode, neg_bval, mem, newval, use_release);
 
   /* Weak or strong, we want EQ to be true for success, so that we
      match the flags that we got from the compare above.  */
-  cond = gen_rtx_REG (CCmode, CC_REGNUM);
-  x = gen_rtx_COMPARE (CCmode, scratch, const0_rtx);
-  emit_insn (gen_rtx_SET (cond, x));
+  if (TARGET_32BIT)
+    {
+      cond = gen_rtx_REG (CCmode, CC_REGNUM);
+      x = gen_rtx_COMPARE (CCmode, neg_bval, const0_rtx);
+      emit_insn (gen_rtx_SET (cond, x));
+    }
 
   if (!is_weak)
     {
-      x = gen_rtx_NE (VOIDmode, cond, const0_rtx);
-      x = gen_rtx_IF_THEN_ELSE (VOIDmode, x,
-                                gen_rtx_LABEL_REF (Pmode, label1), pc_rtx);
-      emit_unlikely_jump (gen_rtx_SET (pc_rtx, x));
+      /* Z is set to boolean value of !neg_bval, as required to communicate
+         with arm_expand_compare_and_swap.  */
+      x = gen_rtx_NE (VOIDmode, neg_bval, const0_rtx);
+      emit_unlikely_jump (gen_cbranchsi4 (x, neg_bval, const0_rtx, label1));
     }
 
   if (!is_mm_relaxed (mod_f))
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 3e747d682300fe4c232a618e9a549a833ee153fe..af727edaa570fe67948c4432d9fa7bb90815feb8 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -391,6 +391,12 @@
           || mode == CC_DGTUmode));
 })
 
+;; Any register, including CC
+(define_predicate "cc_register_operand"
+  (and (match_code "reg")
+       (ior (match_operand 0 "s_register_operand")
+            (match_operand 0 "cc_register"))))
+
 (define_special_predicate "arm_extendqisi_mem_op"
   (and (match_operand 0 "memory_operand")
        (match_test "TARGET_ARM ? arm_legitimate_address_outer_p (mode,
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index d36c24f76f670d7602f766d7172286504faa7af5..b4e0713108d9867d7226fad3241e46d1faf3172a 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -190,20 +190,20 @@
 })
 
 (define_insn_and_split "atomic_compare_and_swap<mode>_1"
-  [(set (reg:CC_Z CC_REGNUM)                                    ;; bool out
+  [(set (match_operand 0 "cc_register_operand" "=&c")           ;; bool out
         (unspec_volatile:CC_Z [(const_int 0)] VUNSPEC_ATOMIC_CAS))
-   (set (match_operand:SI 0 "s_register_operand" "=&r")         ;; val out
+   (set (match_operand:SI 1 "s_register_operand" "=&r")         ;; val out
         (zero_extend:SI
-          (match_operand:NARROW 1 "mem_noofs_operand" "+Ua")))  ;; memory
-   (set (match_dup 1)
+          (match_operand:NARROW 2 "mem_noofs_operand" "+Ua")))  ;; memory
+   (set (match_dup 2)
         (unspec_volatile:NARROW
-          [(match_operand:SI 2 "arm_add_operand" "rIL")         ;; expected
-           (match_operand:NARROW 3 "s_register_operand" "r")    ;; desired
-           (match_operand:SI 4 "const_int_operand")             ;; is_weak
-           (match_operand:SI 5 "const_int_operand")             ;; mod_s
-           (match_operand:SI 6 "const_int_operand")]            ;; mod_f
+          [(match_operand:SI 3 "arm_add_operand" "rIL")         ;; expected
+           (match_operand:NARROW 4 "s_register_operand" "r")    ;; desired
+           (match_operand:SI 5 "const_int_operand")             ;; is_weak
+           (match_operand:SI 6 "const_int_operand")             ;; mod_s
+           (match_operand:SI 7 "const_int_operand")]            ;; mod_f
          VUNSPEC_ATOMIC_CAS))
-   (clobber (match_scratch:SI 7 "=&r"))]
+   (clobber (match_scratch:SI 8 "=&r"))]
  "<sync_predtab>"
  "#"
  "&& reload_completed"
@@ -219,19 +219,19 @@
   [(SI "rIL") (DI "rDi")])
 
 (define_insn_and_split "atomic_compare_and_swap<mode>_1"
-  [(set (reg:CC_Z CC_REGNUM)                                    ;; bool out
+  [(set (match_operand 0 "cc_register_operand" "=&c")           ;; bool out
         (unspec_volatile:CC_Z [(const_int 0)] VUNSPEC_ATOMIC_CAS))
-   (set (match_operand:SIDI 0 "s_register_operand" "=&r")       ;; val out
-        (match_operand:SIDI 1 "mem_noofs_operand" "+Ua"))       ;; memory
-   (set (match_dup 1)
+   (set (match_operand:SIDI 1 "s_register_operand" "=&r")       ;; val out
+        (match_operand:SIDI 2 "mem_noofs_operand" "+Ua"))       ;; memory
+   (set (match_dup 2)
         (unspec_volatile:SIDI
-          [(match_operand:SIDI 2 "<cas_cmp_operand>" "<cas_cmp_str>") ;; expect
-           (match_operand:SIDI 3 "s_register_operand" "r")      ;; desired
-           (match_operand:SI 4 "const_int_operand")             ;; is_weak
-           (match_operand:SI 5 "const_int_operand")             ;; mod_s
-           (match_operand:SI 6 "const_int_operand")]            ;; mod_f
+          [(match_operand:SIDI 3 "<cas_cmp_operand>" "<cas_cmp_str>") ;; expect
+           (match_operand:SIDI 4 "s_register_operand" "r")      ;; desired
+           (match_operand:SI 5 "const_int_operand")             ;; is_weak
+           (match_operand:SI 6 "const_int_operand")             ;; mod_s
+           (match_operand:SI 7 "const_int_operand")]            ;; mod_f
          VUNSPEC_ATOMIC_CAS))
-   (clobber (match_scratch:SI 7 "=&r"))]
+   (clobber (match_scratch:SI 8 "=&r"))]
  "<sync_predtab>"
  "#"
  "&& reload_completed"