Message ID: 5666A119.8010206@arm.com
State:      Accepted
Commit:     3297949ea9b4b350f0ccc97f289d9d918c9ba8c7
On Tue, Dec 08, 2015 at 09:21:29AM +0000, Kyrill Tkachov wrote:
> Hi all,
>
> The test gcc.target/aarch64/vbslq_u64_1.c started failing recently due to
> some tree-level changes. This just exposed a deficiency in our xor-and-xor
> pattern for the vector bit-select pattern: aarch64_simd_bsl<mode>_internal.
>
> We now fail to match the rtx:
> (set (reg:V4SI 79)
>      (xor:V4SI (and:V4SI (xor:V4SI (reg:V4SI 32 v0 [ a ])
>                                    (reg/v:V4SI 77 [ b ]))
>                          (reg:V4SI 34 v2 [ mask ]))
>                (reg/v:V4SI 77 [ b ])))
>
> whereas before combine attempted:
> (set (reg:V4SI 79)
>      (xor:V4SI (and:V4SI (xor:V4SI (reg/v:V4SI 77 [ b ])
>                                    (reg:V4SI 32 v0 [ a ]))
>                          (reg:V4SI 34 v2 [ mask ]))
>                (reg/v:V4SI 77 [ b ])))
>
> Note that just the order of the operands of the inner XOR has changed.
> This could be solved by making the second operand of the outer XOR a 4th
> operand of the pattern, enforcing that it should be equal to operand 2 or 3
> in the pattern condition and performing the appropriate swapping in the
> output template. However, the aarch64_simd_bsl<mode>_internal pattern is
> expanded to by other places in aarch64-simd.md and updating all the
> callsites to add a 4th operand is wasteful and makes them harder to
> understand.
>
> Therefore this patch adds a new define_insn with the match_dup of operand 2
> in the outer XOR. I also had to update the alternatives/constraints in the
> pattern and the output template. Basically it involves swapping operands 2
> and 3 around in the constraints and output templates.

Yuck, but OK.

Thanks,
James
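[As a scalar C sketch of the identity under discussion (not part of the
patch; the function names are illustrative), this shows why the two rtxes
above compute the same bit-select and differ only in the order of the
commutative inner XOR's operands:]

#include <stdint.h>
#include <assert.h>

/* Bit-select via the xor-and-xor identity the pattern matches:
   where MASK has a 1 bit the result takes the bit from A,
   where MASK has a 0 bit it takes the bit from B.  */
static uint32_t
bit_select (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((a ^ b) & mask) ^ b;
}

/* XOR is commutative, so swapping the inner operands -- the only
   difference between the two rtxes above -- changes nothing.  */
static uint32_t
bit_select_swapped (uint32_t a, uint32_t b, uint32_t mask)
{
  return ((b ^ a) & mask) ^ b;
}

int
main (void)
{
  uint32_t a = 0xdeadbeef, b = 0x01234567, mask = 0xff00ff00;
  assert (bit_select (a, b, mask) == bit_select_swapped (a, b, mask));
  assert (bit_select (a, b, 0xffffffff) == a);
  assert (bit_select (a, b, 0) == b);
  return 0;
}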
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 030a1013caa8a965bcd1615c9686d0be715be921..2856f017c16380623960e21d66149d18efe176f0 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2153,6 +2153,10 @@ (define_insn "aarch64_reduc_<maxmin_uns>_internal<mode>"
 ;; bit op0, op2, mask
 ;; if (op0 = op2) (so 0-bits in mask choose bits from op1, else op0)
 ;; bif op0, op1, mask
+;;
+;; This pattern is expanded to by the aarch64_simd_bsl<mode> expander.
+;; Some forms of straight-line code may generate the equivalent form
+;; in *aarch64_simd_bsl<mode>_alt.
 
 (define_insn "aarch64_simd_bsl<mode>_internal"
   [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
@@ -2172,6 +2176,29 @@ (define_insn "aarch64_simd_bsl<mode>_internal"
   [(set_attr "type" "neon_bsl<q>")]
 )
 
+;; We need this form in addition to the above pattern to match the case
+;; when combine tries merging three insns such that the second operand of
+;; the outer XOR matches the second operand of the inner XOR rather than
+;; the first.  The two are equivalent but since recog doesn't try all
+;; permutations of commutative operations, we have to have a separate pattern.
+
+(define_insn "*aarch64_simd_bsl<mode>_alt"
+  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
+	(xor:VSDQ_I_DI
+	   (and:VSDQ_I_DI
+	     (xor:VSDQ_I_DI
+	       (match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
+	       (match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
+	     (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
+	  (match_dup:VSDQ_I_DI 2)))]
+  "TARGET_SIMD"
+  "@
+  bsl\\t%0.<Vbtype>, %3.<Vbtype>, %2.<Vbtype>
+  bit\\t%0.<Vbtype>, %3.<Vbtype>, %1.<Vbtype>
+  bif\\t%0.<Vbtype>, %2.<Vbtype>, %1.<Vbtype>"
+  [(set_attr "type" "neon_bsl<q>")]
+)
+
 (define_expand "aarch64_simd_bsl<mode>"
   [(match_operand:VALLDIF 0 "register_operand")
    (match_operand:<V_cmp_result> 1 "register_operand")
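[As a rough usage sketch (assuming the standard arm_neon.h intrinsics; the
function below is illustrative and is not the actual
gcc.target/aarch64/vbslq_u64_1.c test), an open-coded bit-select like this
is the kind of input that should now match one of the two patterns and
collapse to a single instruction, whichever operand order combine happens
to produce:]

#include <arm_neon.h>

/* Open-coded vector bit-select: bits of A where MASK is set, bits of
   B elsewhere.  With the new *aarch64_simd_bsl<mode>_alt pattern this
   should combine to one bsl/bit/bif regardless of how the inner eor's
   operands end up ordered at the RTL level.  */
uint64x2_t
select_bits (uint64x2_t mask, uint64x2_t a, uint64x2_t b)
{
  return veorq_u64 (vandq_u64 (veorq_u64 (a, b), mask), b);
}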