diff mbox

[1/4,AARCH64,NEON] Add patterns + builtins for vld[234](q?)_lane_* intrinsics

Message ID 1411069109-31425-2-git-send-email-charles.baylis@linaro.org
State New
Headers show

Commit Message

Charles Baylis Sept. 18, 2014, 7:38 p.m. UTC
This patch adds new patterns and builtins to represent single lane structure
loads instructions, which will be used to implement the vld[234](q?)_lane_*
intrinsics.

Tested (with the rest of the patch series) with make check on aarch64-oe-linux
with qemu, and also causes no regressions in clyon's NEON intrinsics tests.

<DATE>  Charles Baylis  <charles.baylis@linaro.org>
	* config/aarch64/aarch64-builtins.c
	(aarch64_types_loadstruct_lane_qualifiers): Define.
	* config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane,
	ld4_lane): New builtins.
	* config/aarch64/aarch64-simd.md (vec_load_lanesoi_lane<mode>): New
	pattern.
	(vec_load_lanesci_lane<mode>): Likewise.
	(vec_load_lanesxi_lane<mode>): Likewise.
	(aarch64_ld2_lane<VQ:mode>): New expand.
	(aarch64_ld3_lane<VQ:mode>): Likewise.
	(aarch64_ld4_lane<VQ:mode>): Likewise.

Change-Id: I205ab46aa3f3f2486cc163b93e1da080a87c3419
---
 gcc/config/aarch64/aarch64-builtins.c        |  5 ++
 gcc/config/aarch64/aarch64-simd-builtins.def |  4 ++
 gcc/config/aarch64/aarch64-simd.md           | 95 ++++++++++++++++++++++++++++
 gcc/config/aarch64/aarch64.md                |  3 +
 4 files changed, 107 insertions(+)

Comments

Kyrylo Tkachov Sept. 19, 2014, 8:40 a.m. UTC | #1
Hi Charles,

Good to see these intrinsics being brought into the modern world :)
Some style comments inline.

On 18/09/14 20:38, Charles Baylis wrote:
> This patch adds new patterns and builtins to represent single lane structure
> loads instructions, which will be used to implement the vld[234](q?)_lane_*
> intrinsics.
>
> Tested (with the rest of the patch series) with make check on aarch64-oe-linux
> with qemu, and also causes no regressions in clyon's NEON intrinsics tests.
>
> <DATE>  Charles Baylis  <charles.baylis@linaro.org>
> 	* config/aarch64/aarch64-builtins.c
> 	(aarch64_types_loadstruct_lane_qualifiers): Define.
> 	* config/aarch64/aarch64-simd-builtins.def (ld2_lane, ld3_lane,
> 	ld4_lane): New builtins.
> 	* config/aarch64/aarch64-simd.md (vec_load_lanesoi_lane<mode>): New
> 	pattern.
> 	(vec_load_lanesci_lane<mode>): Likewise.
> 	(vec_load_lanesxi_lane<mode>): Likewise.
> 	(aarch64_ld2_lane<VQ:mode>): New expand.
> 	(aarch64_ld3_lane<VQ:mode>): Likewise.
> 	(aarch64_ld4_lane<VQ:mode>): Likewise.

This is missing an entry for the config/aarch64/aarch64.md hunk.

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 493e886..f6c4018 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -4003,6 +4003,18 @@
>     [(set_attr "type" "neon_load2_2reg<q>")]
>   )
>   
> +(define_insn "vec_load_lanesoi_lane<mode>"
> +  [(set (match_operand:OI 0 "register_operand" "=w")
> +	(unspec:OI [(match_operand:<V_TWO_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
> +		    (match_operand:OI 2 "register_operand" "0")
> +		    (match_operand:SI 3 "immediate_operand" "i")
> +		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
> +		   UNSPEC_LD2_LANE))]
> +  "TARGET_SIMD"
> +  "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1"
> +  [(set_attr "type" "neon_load2_one_lane<q>")]
> +)

The VQ mode iterator goes over the 128-wide modes so the "type" 
attribute here will always be neon_load2_one_lane_q. Using the <q> mode 
attribute is still correct but personally I think it makes it just that 
little bit harder to figure out for a newbie who will have to open 
iterators.md to figure out the meaning of it, or for someone who's not 
sure whether the 'q' is added with an underscore or without. I would 
just use neon_load2_one_lane_q.

>   
> +(define_insn "vec_load_lanesci_lane<mode>"
> +  [(set (match_operand:CI 0 "register_operand" "=w")
> +	(unspec:CI [(match_operand:<V_THREE_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
> +		    (match_operand:CI 2 "register_operand" "0")
> +		    (match_operand:SI 3 "immediate_operand" "i")
> +		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +		   UNSPEC_LD3_LANE))]
> +  "TARGET_SIMD"
> +  "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1"
> +  [(set_attr "type" "neon_load3_one_lane<q>")]
> +)

Likewise.

>   
> +(define_insn "vec_load_lanesxi_lane<mode>"
> +  [(set (match_operand:XI 0 "register_operand" "=w")
> +	(unspec:XI [(match_operand:<V_FOUR_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
> +		    (match_operand:XI 2 "register_operand" "0")
> +		    (match_operand:SI 3 "immediate_operand" "i")
> +		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +		   UNSPEC_LD4_LANE))]
> +  "TARGET_SIMD"
> +  "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1"
> +  [(set_attr "type" "neon_load4_one_lane<q>")]
> +)

Same here.

>   
> +(define_expand "aarch64_ld2_lane<VQ:mode>"
> +  [(match_operand:OI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:OI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_TWO_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +
> +  emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})

I think saying <VQ:mode> is redundant since VQ is the only mode iterator 
in the pattern.
Just <mode> should work, right?

> +
> +(define_expand "aarch64_ld3_lane<VQ:mode>"
> +  [(match_operand:CI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:CI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_THREE_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +
> +  emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})
Likewise.

> +
> +(define_expand "aarch64_ld4_lane<VQ:mode>"
> +  [(match_operand:XI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:XI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_FOUR_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +
> +  emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})
> +

Likewise.

> +
>   ;; Expanders for builtins to extract vector registers from large
>   ;; opaque integer modes.
>   
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c60038a..ea924ab 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -92,6 +92,9 @@
>       UNSPEC_LD2
>       UNSPEC_LD3
>       UNSPEC_LD4
> +    UNSPEC_LD2_LANE
> +    UNSPEC_LD3_LANE
> +    UNSPEC_LD4_LANE
>       UNSPEC_MB
>       UNSPEC_NOP
>       UNSPEC_PRLG_STK
Tejas Belagod Sept. 19, 2014, 10:45 a.m. UTC | #2
>
> +(define_expand "aarch64_ld2_lane<VQ:mode>"
> +  [(match_operand:OI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:OI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_TWO_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +

The endianess lane correction breaks this for BE.

You don't need the endianess lane correction here - we always call neon 
intrinsics with the architectural lane number - irrespective of 
endianness. Unless ofcourse you flip it somewhere to make it a part of 
RTL vec_select lane patterns, which you don't here.

You could also do some lane-bounds checking here in the expander.

> +  emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})
> +
> +(define_expand "aarch64_ld3_lane<VQ:mode>"
> +  [(match_operand:CI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:CI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_THREE_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +

No endianness correction for lanes necessary.

> +  emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})
> +
> +(define_expand "aarch64_ld4_lane<VQ:mode>"
> +  [(match_operand:XI 0 "register_operand" "=w")
> +	(match_operand:DI 1 "register_operand" "w")
> +	(match_operand:XI 2 "register_operand" "0")
> +	(match_operand:SI 3 "immediate_operand" "i")
> +	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> +  "TARGET_SIMD"
> +{
> +  enum machine_mode mode = <V_FOUR_ELEM>mode;
> +  rtx mem = gen_rtx_MEM (mode, operands[1]);
> +  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> +

Same.

> +  emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0],
> +						  mem,
> +						  operands[2],
> +						  operands[3]));
> +  DONE;
> +})
> +
> +
> +
>   ;; Expanders for builtins to extract vector registers from large
>   ;; opaque integer modes.
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index c60038a..ea924ab 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -92,6 +92,9 @@
>       UNSPEC_LD2
>       UNSPEC_LD3
>       UNSPEC_LD4
> +    UNSPEC_LD2_LANE
> +    UNSPEC_LD3_LANE
> +    UNSPEC_LD4_LANE
>       UNSPEC_MB
>       UNSPEC_NOP
>       UNSPEC_PRLG_STK
>


Thanks,
Tejas.
Charles Baylis Sept. 24, 2014, 4:36 p.m. UTC | #3
Kyril, Tejas,

Thanks for the review. I agree with all points and will respin v2 accordingly

Charles
diff mbox

Patch

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 395b4ec..818729c 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -201,6 +201,11 @@  aarch64_types_load1_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_const_pointer_map_mode };
 #define TYPES_LOAD1 (aarch64_types_load1_qualifiers)
 #define TYPES_LOADSTRUCT (aarch64_types_load1_qualifiers)
+static enum aarch64_type_qualifiers
+aarch64_types_loadstruct_lane_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_none, qualifier_const_pointer_map_mode,
+      qualifier_none, qualifier_none };
+#define TYPES_LOADSTRUCT_LANE (aarch64_types_loadstruct_lane_qualifiers)
 
 static enum aarch64_type_qualifiers
 aarch64_types_bsl_p_qualifiers[SIMD_MAX_BUILTIN_ARGS]
diff --git a/gcc/config/aarch64/aarch64-simd-builtins.def b/gcc/config/aarch64/aarch64-simd-builtins.def
index de264c4..5d3e122 100644
--- a/gcc/config/aarch64/aarch64-simd-builtins.def
+++ b/gcc/config/aarch64/aarch64-simd-builtins.def
@@ -83,6 +83,10 @@ 
   BUILTIN_VQ (LOADSTRUCT, ld2, 0)
   BUILTIN_VQ (LOADSTRUCT, ld3, 0)
   BUILTIN_VQ (LOADSTRUCT, ld4, 0)
+  /* Implemented by aarch64_ld<VSTRUCT:nregs>_lane<VQ:mode>.  */
+  BUILTIN_VQ (LOADSTRUCT_LANE, ld2_lane, 0)
+  BUILTIN_VQ (LOADSTRUCT_LANE, ld3_lane, 0)
+  BUILTIN_VQ (LOADSTRUCT_LANE, ld4_lane, 0)
   /* Implemented by aarch64_st<VSTRUCT:nregs><VDC:mode>.  */
   BUILTIN_VDC (STORESTRUCT, st2, 0)
   BUILTIN_VDC (STORESTRUCT, st3, 0)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 493e886..f6c4018 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4003,6 +4003,18 @@ 
   [(set_attr "type" "neon_load2_2reg<q>")]
 )
 
+(define_insn "vec_load_lanesoi_lane<mode>"
+  [(set (match_operand:OI 0 "register_operand" "=w")
+	(unspec:OI [(match_operand:<V_TWO_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
+		    (match_operand:OI 2 "register_operand" "0")
+		    (match_operand:SI 3 "immediate_operand" "i")
+		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY) ]
+		   UNSPEC_LD2_LANE))]
+  "TARGET_SIMD"
+  "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1"
+  [(set_attr "type" "neon_load2_one_lane<q>")]
+)
+
 (define_insn "vec_store_lanesoi<mode>"
   [(set (match_operand:OI 0 "aarch64_simd_struct_operand" "=Utv")
 	(unspec:OI [(match_operand:OI 1 "register_operand" "w")
@@ -4034,6 +4046,18 @@ 
   [(set_attr "type" "neon_load3_3reg<q>")]
 )
 
+(define_insn "vec_load_lanesci_lane<mode>"
+  [(set (match_operand:CI 0 "register_operand" "=w")
+	(unspec:CI [(match_operand:<V_THREE_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
+		    (match_operand:CI 2 "register_operand" "0")
+		    (match_operand:SI 3 "immediate_operand" "i")
+		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_LD3_LANE))]
+  "TARGET_SIMD"
+  "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1"
+  [(set_attr "type" "neon_load3_one_lane<q>")]
+)
+
 (define_insn "vec_store_lanesci<mode>"
   [(set (match_operand:CI 0 "aarch64_simd_struct_operand" "=Utv")
 	(unspec:CI [(match_operand:CI 1 "register_operand" "w")
@@ -4065,6 +4089,18 @@ 
   [(set_attr "type" "neon_load4_4reg<q>")]
 )
 
+(define_insn "vec_load_lanesxi_lane<mode>"
+  [(set (match_operand:XI 0 "register_operand" "=w")
+	(unspec:XI [(match_operand:<V_FOUR_ELEM> 1 "aarch64_simd_struct_operand" "Utv")
+		    (match_operand:XI 2 "register_operand" "0")
+		    (match_operand:SI 3 "immediate_operand" "i")
+		    (unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+		   UNSPEC_LD4_LANE))]
+  "TARGET_SIMD"
+  "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1"
+  [(set_attr "type" "neon_load4_one_lane<q>")]
+)
+
 (define_insn "vec_store_lanesxi<mode>"
   [(set (match_operand:XI 0 "aarch64_simd_struct_operand" "=Utv")
 	(unspec:XI [(match_operand:XI 1 "register_operand" "w")
@@ -4378,6 +4414,65 @@ 
   DONE;
 })
 
+(define_expand "aarch64_ld2_lane<VQ:mode>"
+  [(match_operand:OI 0 "register_operand" "=w")
+	(match_operand:DI 1 "register_operand" "w")
+	(match_operand:OI 2 "register_operand" "0")
+	(match_operand:SI 3 "immediate_operand" "i")
+	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = <V_TWO_ELEM>mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+
+  emit_insn (gen_vec_load_lanesoi_lane<VQ:mode> (operands[0],
+						  mem,
+						  operands[2],
+						  operands[3]));
+  DONE;
+})
+
+(define_expand "aarch64_ld3_lane<VQ:mode>"
+  [(match_operand:CI 0 "register_operand" "=w")
+	(match_operand:DI 1 "register_operand" "w")
+	(match_operand:CI 2 "register_operand" "0")
+	(match_operand:SI 3 "immediate_operand" "i")
+	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = <V_THREE_ELEM>mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+
+  emit_insn (gen_vec_load_lanesci_lane<VQ:mode> (operands[0],
+						  mem,
+						  operands[2],
+						  operands[3]));
+  DONE;
+})
+
+(define_expand "aarch64_ld4_lane<VQ:mode>"
+  [(match_operand:XI 0 "register_operand" "=w")
+	(match_operand:DI 1 "register_operand" "w")
+	(match_operand:XI 2 "register_operand" "0")
+	(match_operand:SI 3 "immediate_operand" "i")
+	(unspec:VQ [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
+  "TARGET_SIMD"
+{
+  enum machine_mode mode = <V_FOUR_ELEM>mode;
+  rtx mem = gen_rtx_MEM (mode, operands[1]);
+  operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+
+  emit_insn (gen_vec_load_lanesxi_lane<VQ:mode> (operands[0],
+						  mem,
+						  operands[2],
+						  operands[3]));
+  DONE;
+})
+
+
+
 ;; Expanders for builtins to extract vector registers from large
 ;; opaque integer modes.
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c60038a..ea924ab 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -92,6 +92,9 @@ 
     UNSPEC_LD2
     UNSPEC_LD3
     UNSPEC_LD4
+    UNSPEC_LD2_LANE
+    UNSPEC_LD3_LANE
+    UNSPEC_LD4_LANE
     UNSPEC_MB
     UNSPEC_NOP
     UNSPEC_PRLG_STK