Message ID | 1411069109-31425-2-git-send-email-charles.baylis@linaro.org |
---|---|
State | New |
Headers | show |
Hi Charles, Good to see these intrinsics being brought into the modern world :) Some style comments inline. On 18/09/14 20:38, Charles Baylis wrote: > This patch adds new patterns and builtins to represent single lane structure > loads instructions, which will be used to implement the vld[234](q?)_lane_* > intrinsics. > > Tested (with the rest of the patch series) with make check on aarch64-oe-linux > with qemu, and also causes no regressions in clyon's NEON intrinsics tests. > > <DATE> Charles Baylis <charles.baylis@linaro.org> > * config/aarch64/aarch64-builtins.c > (aarch64_types_loadstruct_lane_qualifiers): Define. > * config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane, > ld4_lane): New builtins. > * config/aarch64/aarch64-simd.md (vec_load_lanesoi_lane<mode>): New > pattern. > (vec_load_lanesci_lane<mode>): Likewise. > (vec_load_lanesxi_lane<mode>): Likewise. > (aarch64_ld2_lane<VQ:mode>): New expand. > (aarch64_ld3_lane<VQ:mode>): Likewise. > (aarch64_ld4_lane<VQ:mode>): Likewise. This is missing an entry for the config/aarch64/aarch64.md hunk. > diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md > index 493e886..f6c4018 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -4003,6 +4003,18 @@ > [(set_attr "type" "neon_load2_2reg<q>")] > ) > > +(define_insn "vec_load_lanesoi_lane<mode>" > + [(set (match_operand:OI 0 "register_operand" "=w") > + (unspec:OI [(match_operand:<V_TWO_ELEM> 1 "aarch64_simd_struct_operand" "Utv") > + (match_operand:OI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] > + UNSPEC_LD2_LANE))] > + "TARGET_SIMD" > + "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1" > + [(set_attr "type" "neon_load2_one_lane<q>")] > +) The VQ mode iterator goes over the 128-wide modes so the "type" attribute here will always be neon_load2_one_lane_q. Using the <q> mode attribute is still correct but personally I think it makes it just that little bit harder to figure out for a newbie who will have to open iterators.md to figure out the meaning of it, or for someone who's not sure whether the 'q' is added with an underscore or without. I would just use neon_load2_one_lane_q. > > +(define_insn "vec_load_lanesci_lane<mode>" > + [(set (match_operand:CI 0 "register_operand" "=w") > + (unspec:CI [(match_operand:<V_THREE_ELEM> 1 "aarch64_simd_struct_operand" "Utv") > + (match_operand:CI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + UNSPEC_LD3_LANE))] > + "TARGET_SIMD" > + "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1" > + [(set_attr "type" "neon_load3_one_lane<q>")] > +) Likewise. > > +(define_insn "vec_load_lanesxi_lane<mode>" > + [(set (match_operand:XI 0 "register_operand" "=w") > + (unspec:XI [(match_operand:<V_FOUR_ELEM> 1 "aarch64_simd_struct_operand" "Utv") > + (match_operand:XI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + UNSPEC_LD4_LANE))] > + "TARGET_SIMD" > + "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1" > + [(set_attr "type" "neon_load4_one_lane<q>")] > +) Same here. > > +(define_expand "aarch64_ld2_lane<VQ:mode>" > + [(match_operand:OI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:OI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_TWO_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + > + emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) I think saying <VQ:mode> is redundant since VQ is the only mode iterator in the pattern. Just <mode> should work, right? > + > +(define_expand "aarch64_ld3_lane<VQ:mode>" > + [(match_operand:CI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:CI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_THREE_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + > + emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) Likewise. > + > +(define_expand "aarch64_ld4_lane<VQ:mode>" > + [(match_operand:XI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:XI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_FOUR_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + > + emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) > + Likewise. > + > ;; Expanders for builtins to extract vector registers from large > ;; opaque integer modes. > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index c60038a..ea924ab 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -92,6 +92,9 @@ > UNSPEC_LD2 > UNSPEC_LD3 > UNSPEC_LD4 > + UNSPEC_LD2_LANE > + UNSPEC_LD3_LANE > + UNSPEC_LD4_LANE > UNSPEC_MB > UNSPEC_NOP > UNSPEC_PRLG_STK
> > +(define_expand "aarch64_ld2_lane<VQ:mode>" > + [(match_operand:OI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:OI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_TWO_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + The endianess lane correction breaks this for BE. You don't need the endianess lane correction here - we always call neon intrinsics with the architectural lane number - irrespective of endianness. Unless ofcourse you flip it somewhere to make it a part of RTL vec_select lane patterns, which you don't here. You could also do some lane-bounds checking here in the expander. > + emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_expand "aarch64_ld3_lane<VQ:mode>" > + [(match_operand:CI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:CI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_THREE_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + No endianness correction for lanes necessary. > + emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) > + > +(define_expand "aarch64_ld4_lane<VQ:mode>" > + [(match_operand:XI 0 "register_operand" "=w") > + (match_operand:DI 1 "register_operand" "w") > + (match_operand:XI 2 "register_operand" "0") > + (match_operand:SI 3 "immediate_operand" "i") > + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] > + "TARGET_SIMD" > +{ > + enum machine_mode mode = <V_FOUR_ELEM>mode; > + rtx mem = gen_rtx_MEM (mode, operands[1]); > + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); > + Same. > + emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0], > + mem, > + operands[2], > + operands[3])); > + DONE; > +}) > + > + > + > ;; Expanders for builtins to extract vector registers from large > ;; opaque integer modes. > > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index c60038a..ea924ab 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -92,6 +92,9 @@ > UNSPEC_LD2 > UNSPEC_LD3 > UNSPEC_LD4 > + UNSPEC_LD2_LANE > + UNSPEC_LD3_LANE > + UNSPEC_LD4_LANE > UNSPEC_MB > UNSPEC_NOP > UNSPEC_PRLG_STK > Thanks, Tejas.
Kyril, Tejas, Thanks for the review. I agree with all points and will respin v2 accordingly Charles
diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c index 395b4ec..818729c 100644 --- a/gcc/config/aarch64/aarch64-builtins.c +++ b/gcc/config/aarch64/aarch64-builtins.c @@ -201,6 +201,11 @@ aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS] = { qualifier_none, qualifier_const_pointer_map_mode }; #define TYPES_LOAD1 (aarch64_types_load1_qualifiers) #define TYPES_LOADSTRUCT (aarch64_types_load1_qualifiers) +static enum aarch64_type_qualifiers +aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS] + = { qualifier_none, qualifier_const_pointer_map_mode, + qualifier_none, qualifier_none }; +#define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers) static enum aarch64_type_qualifiers aarch64_types_bsl_p_qualifiers[SIMD_MAX_BUILTIN_ARGS] diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def index de264c4..5d3e122 100644 --- a/gcc/config/aarch64/aarch64-simd-builtins.def +++ b/gcc/config/aarch64/aarch64-simd-builtins.def @@ -83,6 +83,10 @@ BUILTIN_VQ (LOADSTRUCT, ld2, 0) BUILTIN_VQ (LOADSTRUCT, ld3, 0) BUILTIN_VQ (LOADSTRUCT, ld4, 0) + /* Implemented by aarch64_ld<VSTRUCT:nregs>_lane<VQ:mode>. */ + BUILTIN_VQ (LOADSTRUCT_LANE, ld2_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld3_lane, 0) + BUILTIN_VQ (LOADSTRUCT_LANE, ld4_lane, 0) /* Implemented by aarch64_st<VSTRUCT:nregs><VDC:mode>. */ BUILTIN_VDC (STORESTRUCT, st2, 0) BUILTIN_VDC (STORESTRUCT, st3, 0) diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 493e886..f6c4018 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4003,6 +4003,18 @@ [(set_attr "type" "neon_load2_2reg<q>")] ) +(define_insn "vec_load_lanesoi_lane<mode>" + [(set (match_operand:OI 0 "register_operand" "=w") + (unspec:OI [(match_operand:<V_TWO_ELEM> 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ] + UNSPEC_LD2_LANE))] + "TARGET_SIMD" + "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1" + [(set_attr "type" "neon_load2_one_lane<q>")] +) + (define_insn "vec_store_lanesoi<mode>" [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:OI [(match_operand:OI 1 "register_operand" "w") @@ -4034,6 +4046,18 @@ [(set_attr "type" "neon_load3_3reg<q>")] ) +(define_insn "vec_load_lanesci_lane<mode>" + [(set (match_operand:CI 0 "register_operand" "=w") + (unspec:CI [(match_operand:<V_THREE_ELEM> 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:CI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_LD3_LANE))] + "TARGET_SIMD" + "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1" + [(set_attr "type" "neon_load3_one_lane<q>")] +) + (define_insn "vec_store_lanesci<mode>" [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:CI [(match_operand:CI 1 "register_operand" "w") @@ -4065,6 +4089,18 @@ [(set_attr "type" "neon_load4_4reg<q>")] ) +(define_insn "vec_load_lanesxi_lane<mode>" + [(set (match_operand:XI 0 "register_operand" "=w") + (unspec:XI [(match_operand:<V_FOUR_ELEM> 1 "aarch64_simd_struct_operand" "Utv") + (match_operand:XI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + UNSPEC_LD4_LANE))] + "TARGET_SIMD" + "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1" + [(set_attr "type" "neon_load4_one_lane<q>")] +) + (define_insn "vec_store_lanesxi<mode>" [(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv") (unspec:XI [(match_operand:XI 1 "register_operand" "w") @@ -4378,6 +4414,65 @@ DONE; }) +(define_expand "aarch64_ld2_lane<VQ:mode>" + [(match_operand:OI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:OI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = <V_TWO_ELEM>mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); + + emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + +(define_expand "aarch64_ld3_lane<VQ:mode>" + [(match_operand:CI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:CI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = <V_THREE_ELEM>mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); + + emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + +(define_expand "aarch64_ld4_lane<VQ:mode>" + [(match_operand:XI 0 "register_operand" "=w") + (match_operand:DI 1 "register_operand" "w") + (match_operand:XI 2 "register_operand" "0") + (match_operand:SI 3 "immediate_operand" "i") + (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)] + "TARGET_SIMD" +{ + enum machine_mode mode = <V_FOUR_ELEM>mode; + rtx mem = gen_rtx_MEM (mode, operands[1]); + operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3]))); + + emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0], + mem, + operands[2], + operands[3])); + DONE; +}) + + + ;; Expanders for builtins to extract vector registers from large ;; opaque integer modes. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index c60038a..ea924ab 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -92,6 +92,9 @@ UNSPEC_LD2 UNSPEC_LD3 UNSPEC_LD4 + UNSPEC_LD2_LANE + UNSPEC_LD3_LANE + UNSPEC_LD4_LANE UNSPEC_MB UNSPEC_NOP UNSPEC_PRLG_STK