diff mbox series

[2/3,ARM] Refactor costs calculation for MEM.

Message ID 1505205277-26276-3-git-send-email-charles.baylis@linaro.org
State New
Headers show
Series Addressing mode costs v3 | expand

Commit Message

Charles Baylis Sept. 12, 2017, 8:34 a.m. UTC
From: Charles Baylis <charles.baylis@linaro.org>


This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.

gcc/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/arm.c (arm_mem_costs): New function.
	(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* gcc.target/arm/addr-modes-float.c: New test.
	* gcc.target/arm/addr-modes-int.c: New test.
	* gcc.target/arm/addr-modes.h: New header.

Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
 gcc/config/arm/arm.c                            | 67 ++++++++++++++++---------
 gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 ++++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 +++++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes.h       | 53 +++++++++++++++++++
 4 files changed, 183 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h

-- 
2.7.4

Comments

Kyrill Tkachov Sept. 13, 2017, 9:02 a.m. UTC | #1
Hi Charles,

On 12/09/17 09:34, charles.baylis@linaro.org wrote:
> From: Charles Baylis <charles.baylis@linaro.org>

>

> This patch moves the calculation of costs for MEM into a

> separate function, and reforms the calculation into two

> parts. Firstly any additional cost of the addressing mode

> is calculated, and then the cost of the memory access itself

> is added.

>

> In this patch, the calculation of the cost of the addressing

> mode is left as a placeholder, to be added in a subsequent

> patch.

>


Can you please mention how has this series been tested?
A bootstrap and test run on arm-none-linux-gnueabihf is required at least.
Also, do you have any benchmarking results for this?
I agree that generating the addressing modes in the new tests is desirable.
So I'm not objecting to the goal of this patch, but a check to make sure 
that this doesn't regress SPEC
would be great.  Further comments on the patch inline.

> gcc/ChangeLog:

>

> <date>  Charles Baylis <charles.baylis@linaro.org>

>

>         * config/arm/arm.c (arm_mem_costs): New function.

>         (arm_rtx_costs_internal): Use arm_mem_costs.

>

> gcc/testsuite/ChangeLog:

>

> <date>  Charles Baylis <charles.baylis@linaro.org>

>

>         * gcc.target/arm/addr-modes-float.c: New test.

>         * gcc.target/arm/addr-modes-int.c: New test.

>         * gcc.target/arm/addr-modes.h: New header.

>

> Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e

> ---

>  gcc/config/arm/arm.c                            | 67 

> ++++++++++++++++---------

>  gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 ++++++++++++++++

>  gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 +++++++++++++++++

>  gcc/testsuite/gcc.target/arm/addr-modes.h       | 53 +++++++++++++++++++

>  4 files changed, 183 insertions(+), 25 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c

>  create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c

>  create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h

>

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 32001e5..b8dbed6 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -9228,8 +9228,48 @@ arm_unspec_cost (rtx x, enum rtx_code /* 

> outer_code */, bool speed_p, int *cost)

> } \

>          while (0);

>

> +/* Helper function for arm_rtx_costs_internal.  Calculates the cost 

> of a MEM,

> +   considering the costs of the addressing mode and memory access

> +   separately.  */

> +static bool

> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,

> +              int *cost, bool speed_p)

> +{

> +  machine_mode mode = GET_MODE (x);

> +  if (flag_pic

> +      && GET_CODE (XEXP (x, 0)) == PLUS

> +      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

> +    /* This will be split into two instructions.  Add the cost of the

> +       additional instruction here.  The cost of the memory access is 

> computed

> +       below.  See arm.md:calculate_pic_address.  */

> +    *cost = COSTS_N_INSNS (1);

> +  else

> +    *cost = 0;


For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a 
each insn)
plus the appropriate field in extra_cost. So you should unconditionally 
initialise the cost
to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) 
with the condition above.

> +

> +  /* Calculate cost of the addressing mode.  */

> +  if (speed_p)

> +    {

> +      /* TODO: Add table-driven costs for addressing modes.  (See 

> patch 2) */

> +    }


You mean "patch 3". I recommend you just remove this conditional from 
this patch and add the logic
in patch 3 entirely.

> +

> +  /* Calculate cost of memory access.  */

> +  if (speed_p)

> +    {

> +      /* data transfer is transfer size divided by bus width.  */

> +      int bus_width_bytes = current_tune->bus_width / 4;


This should be bus_width / BITS_PER_UNIT to get the size in bytes.
BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the
cost calculation and generated code is still appropriate.

> +      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);

> +      *cost += extra_cost->ldst.load;

> +    }

> +  else

> +    {

> +      *cost += COSTS_N_INSNS (1);

> +    }


Given my first comment above this else would be deleted.

Thanks,
Kyrill

> +

> +  return true;

> +}

> +

>  /* RTX costs.  Make an estimate of the cost of executing the operation

> -   X, which is contained with an operation with code OUTER_CODE.

> +   X, which is contained within an operation with code OUTER_CODE.

>     SPEED_P indicates whether the cost desired is the performance cost,

>     or the size cost.  The estimate is stored in COST and the return

>     value is TRUE if the cost calculation is final, or FALSE if the

> @@ -9308,30 +9348,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 

> code, enum rtx_code outer_code,

>        return false;

>

>      case MEM:

> -      /* A memory access costs 1 insn if the mode is small, or the 

> address is

> -        a single register, otherwise it costs one insn per word.  */

> -      if (REG_P (XEXP (x, 0)))

> -       *cost = COSTS_N_INSNS (1);

> -      else if (flag_pic

> -              && GET_CODE (XEXP (x, 0)) == PLUS

> -              && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

> -       /* This will be split into two instructions.

> -          See arm.md:calculate_pic_address.  */

> -       *cost = COSTS_N_INSNS (2);

> -      else

> -       *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));

> -

> -      /* For speed optimizations, add the costs of the address and

> -        accessing memory.  */

> -      if (speed_p)

> -#ifdef NOT_YET

> -       *cost += (extra_cost->ldst.load

> -                 + arm_address_cost (XEXP (x, 0), mode,

> -                                     ADDR_SPACE_GENERIC, speed_p));

> -#else

> -        *cost += extra_cost->ldst.load;

> -#endif

> -      return true;

> +      return arm_mem_costs (x, extra_cost, cost, speed_p);

>

>      case PARALLEL:

>      {

> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c 

> b/gcc/testsuite/gcc.target/arm/addr-modes-float.c

> new file mode 100644

> index 0000000..3b4235c

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c

> @@ -0,0 +1,42 @@

> +/* { dg-options "-O2" } */

> +/* { dg-add-options arm_neon } */

> +/* { dg-require-effective-target arm_neon_ok } */

> +/* { dg-do compile } */

> +

> +#include <arm_neon.h>

> +

> +#include "addr-modes.h"

> +

> +POST_STORE(float)

> +/* { dg-final { scan-assembler "vstmia.32" } } */

> +POST_STORE(double)

> +/* { dg-final { scan-assembler "vstmia.64" } } */

> +

> +POST_LOAD(float)

> +/* { dg-final { scan-assembler "vldmia.32" } } */

> +POST_LOAD(double)

> +/* { dg-final { scan-assembler "vldmia.64" } } */

> +

> +POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)

> +/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } 

> } */

> +POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)

> +/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, 

> \\\[r\[0-9\]+\\\]!" } } */

> +

> +POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)

> +/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } 

> } */

> +POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)

> +/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, 

> \\\[r\[0-9\]+\\\]!" } } */

> +

> +POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)

> +/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } 

> } */

> +POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)

> +/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], 

> d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */

> +/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], 

> d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */

> +

> +POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)

> +/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } 

> } */

> +POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)

> +/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], 

> d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */

> +/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], 

> d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */

> +

> +/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */

> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c 

> b/gcc/testsuite/gcc.target/arm/addr-modes-int.c

> new file mode 100644

> index 0000000..e3e1e6a

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c

> @@ -0,0 +1,46 @@

> +/* { dg-options "-O2 -march=armv7-a" } */

> +/* { dg-add-options arm_neon } */

> +/* { dg-require-effective-target arm_neon_ok } */

> +/* { dg-do compile } */

> +

> +#include "addr-modes.h"

> +

> +typedef long long ll;

> +

> +PRE_STORE(char)

> +/* { dg-final { scan-assembler "strb.*#1]!" } } */

> +PRE_STORE(short)

> +/* { dg-final { scan-assembler "strh.*#2]!" } } */

> +PRE_STORE(int)

> +/* { dg-final { scan-assembler "str.*#4]!" } } */

> +PRE_STORE(ll)

> +/* { dg-final { scan-assembler "strd.*#8]!" } } */

> +

> +POST_STORE(char)

> +/* { dg-final { scan-assembler "strb.*], #1" } } */

> +POST_STORE(short)

> +/* { dg-final { scan-assembler "strh.*], #2" } } */

> +POST_STORE(int)

> +/* { dg-final { scan-assembler "str.*], #4" } } */

> +POST_STORE(ll)

> +/* { dg-final { scan-assembler "strd.*], #8" } } */

> +

> +PRE_LOAD(char)

> +/* { dg-final { scan-assembler "ldrb.*#1]!" } } */

> +PRE_LOAD(short)

> +/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */

> +PRE_LOAD(int)

> +/* { dg-final { scan-assembler "ldr.*#4]!" } } */

> +PRE_LOAD(ll)

> +/* { dg-final { scan-assembler "ldrd.*#8]!" } } */

> +

> +POST_LOAD(char)

> +/* { dg-final { scan-assembler "ldrb.*], #1" } } */

> +POST_LOAD(short)

> +/* { dg-final { scan-assembler "ldrsh.*], #2" } } */

> +POST_LOAD(int)

> +/* { dg-final { scan-assembler "ldr.*], #4" } } */

> +POST_LOAD(ll)

> +/* { dg-final { scan-assembler "ldrd.*], #8" } } */

> +

> +/* { dg-final { scan-assembler-not "\tadd" } } */

> diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h 

> b/gcc/testsuite/gcc.target/arm/addr-modes.h

> new file mode 100644

> index 0000000..eac4678

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/addr-modes.h

> @@ -0,0 +1,53 @@

> +

> +#define PRE_STORE(T)                   \

> +  T *                                  \

> +  T ## _pre_store (T *p, T v)          \

> +  {                                    \

> +    *++p = v;                          \

> +    return p;                          \

> +  }                                    \

> +

> +#define POST_STORE(T)                  \

> +  T *                                  \

> +  T ## _post_store (T *p, T v)         \

> +  {                                    \

> +    *p++ = v;                          \

> +    return p;                          \

> +  }

> +

> +#define POST_STORE_VEC(T, VT, OP)      \

> +  T *                                  \

> +  VT ## _post_store (T * p, VT v)      \

> +  {                                    \

> +    OP (p, v);                         \

> +    p += sizeof (VT) / sizeof (T);     \

> +    return p;                          \

> +  }

> +

> +#define PRE_LOAD(T)                    \

> +  void                                 \

> +  T ## _pre_load (T *p)                        \

> +  {                                    \

> +    extern void f ## T (T*,T);         \

> +    T x = *++p;                                \

> +    f ## T (p, x);                     \

> +  }

> +

> +#define POST_LOAD(T)                   \

> +  void                                 \

> +  T ## _post_load (T *p)               \

> +  {                                    \

> +    extern void f ## T (T*,T);         \

> +    T x = *p++;                                \

> +    f ## T (p, x);                     \

> +  }

> +

> +#define POST_LOAD_VEC(T, VT, OP)       \

> +  void                                 \

> +  VT ## _post_load (T * p)             \

> +  {                                    \

> +    extern void f ## T (T*,T);         \

> +    VT x = OP (p, v);                  \

> +    p += sizeof (VT) / sizeof (T);     \

> +    f ## T (p, x);                     \

> +  }

> -- 

> 2.7.4

>
Charles Baylis Sept. 15, 2017, 3:38 p.m. UTC | #2
On 13 September 2017 at 10:02, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
> Hi Charles,

>

> On 12/09/17 09:34, charles.baylis@linaro.org wrote:

>>

>> From: Charles Baylis <charles.baylis@linaro.org>

>>

>> This patch moves the calculation of costs for MEM into a

>> separate function, and reforms the calculation into two

>> parts. Firstly any additional cost of the addressing mode

>> is calculated, and then the cost of the memory access itself

>> is added.

>>

>> In this patch, the calculation of the cost of the addressing

>> mode is left as a placeholder, to be added in a subsequent

>> patch.

>>

>

> Can you please mention how has this series been tested?

> A bootstrap and test run on arm-none-linux-gnueabihf is required at least.


It has been tested with make check on arm-unknown-linux-gnueabihf with
no regressions. I've successfully bootstrapped the next spin.

> Also, do you have any benchmarking results for this?

> I agree that generating the addressing modes in the new tests is desirable.

> So I'm not objecting to the goal of this patch, but a check to make sure

> that this doesn't regress SPEC

> would be great.  Further comments on the patch inline.


SPEC2006 scores are unaffected by this patch on Cortex-A57.

>> +/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a

>> MEM,

>> +   considering the costs of the addressing mode and memory access

>> +   separately.  */

>> +static bool

>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,

>> +              int *cost, bool speed_p)

>> +{

>> +  machine_mode mode = GET_MODE (x);

>> +  if (flag_pic

>> +      && GET_CODE (XEXP (x, 0)) == PLUS

>> +      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

>> +    /* This will be split into two instructions.  Add the cost of the

>> +       additional instruction here.  The cost of the memory access is

>> computed

>> +       below.  See arm.md:calculate_pic_address.  */

>> +    *cost = COSTS_N_INSNS (1);

>> +  else

>> +    *cost = 0;

>

>

> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a each

> insn)

> plus the appropriate field in extra_cost. So you should unconditionally

> initialise the cost

> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) with

> the condition above.


OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)
part because the cost of a single bus transfer is included in that
initial cost.

>> +

>> +  /* Calculate cost of the addressing mode.  */

>> +  if (speed_p)

>> +    {

>> +      /* TODO: Add table-driven costs for addressing modes.  (See patch

>> 2) */

>> +    }

>

>

> You mean "patch 3". I recommend you just remove this conditional from this

> patch and add the logic

> in patch 3 entirely.


OK.

>> +

>> +  /* Calculate cost of memory access.  */

>> +  if (speed_p)

>> +    {

>> +      /* data transfer is transfer size divided by bus width.  */

>> +      int bus_width_bytes = current_tune->bus_width / 4;

>

>

> This should be bus_width / BITS_PER_UNIT to get the size in bytes.

> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the

> cost calculation and generated code is still appropriate.


Oops, I changed the units around and messed this up. I'll fix this.

>> +      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);

>> +      *cost += extra_cost->ldst.load;

>> +    }

>> +  else

>> +    {

>> +      *cost += COSTS_N_INSNS (1);

>> +    }

>

> Given my first comment above this else would be deleted.


OK
From f81e1d3212475a3dc7aaeb8cb3171c6defd40687 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Wed, 8 Feb 2017 16:52:10 +0000
Subject: [PATCH 2/3] [ARM] Refactor costs calculation for MEM.

This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is left as a placeholder, to be added in a subsequent
patch.

gcc/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/arm.c (arm_mem_costs): New function.
	(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* gcc.target/arm/addr-modes-float.c: New test.
	* gcc.target/arm/addr-modes-int.c: New test.
	* gcc.target/arm/addr-modes.h: New header.

Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
 gcc/config/arm/arm.c                            | 60 ++++++++++++++-----------
 gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 +++++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 +++++++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes.h       | 53 ++++++++++++++++++++++
 4 files changed, 176 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 32001e5..64230b8 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9228,8 +9228,41 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
 	  }								\
 	while (0);
 
+/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a MEM,
+   considering the costs of the addressing mode and memory access
+   separately.  */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+	       int *cost, bool speed_p)
+{
+  machine_mode mode = GET_MODE (x);
+
+  *cost = COSTS_N_INSNS (1);
+
+  if (flag_pic
+      && GET_CODE (XEXP (x, 0)) == PLUS
+      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+    /* This will be split into two instructions.  Add the cost of the
+       additional instruction here.  The cost of the memory access is computed
+       below.  See arm.md:calculate_pic_address.  */
+    *cost += COSTS_N_INSNS (1);
+
+  /* Calculate cost of memory access.  */
+  if (speed_p)
+    {
+      /* data transfer is transfer size divided by bus width.  */
+      int bus_width_bytes = current_tune->bus_width / BITS_PER_UNIT;
+      int num_transfers = CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+      /* first transfer included in initial cost */
+      *cost += COSTS_N_INSNS (num_transfers - 1);
+      *cost += extra_cost->ldst.load;
+    }
+
+  return true;
+}
+
 /* RTX costs.  Make an estimate of the cost of executing the operation
-   X, which is contained with an operation with code OUTER_CODE.
+   X, which is contained within an operation with code OUTER_CODE.
    SPEED_P indicates whether the cost desired is the performance cost,
    or the size cost.  The estimate is stored in COST and the return
    value is TRUE if the cost calculation is final, or FALSE if the
@@ -9308,30 +9341,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       return false;
 
     case MEM:
-      /* A memory access costs 1 insn if the mode is small, or the address is
-	 a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
-	*cost = COSTS_N_INSNS (1);
-      else if (flag_pic
-	       && GET_CODE (XEXP (x, 0)) == PLUS
-	       && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
-	/* This will be split into two instructions.
-	   See arm.md:calculate_pic_address.  */
-	*cost = COSTS_N_INSNS (2);
-      else
-	*cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
-      /* For speed optimizations, add the costs of the address and
-	 accessing memory.  */
-      if (speed_p)
-#ifdef NOT_YET
-	*cost += (extra_cost->ldst.load
-		  + arm_address_cost (XEXP (x, 0), mode,
-				      ADDR_SPACE_GENERIC, speed_p));
-#else
-        *cost += extra_cost->ldst.load;
-#endif
-      return true;
+      return arm_mem_costs (x, extra_cost, cost, speed_p);
 
     case PARALLEL:
     {
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@
+
+#define PRE_STORE(T)			\
+  T *					\
+  T ## _pre_store (T *p, T v)		\
+  {					\
+    *++p = v;				\
+    return p;				\
+  }					\
+
+#define POST_STORE(T)			\
+  T *					\
+  T ## _post_store (T *p, T v)		\
+  {					\
+    *p++ = v;				\
+    return p;				\
+  }
+
+#define POST_STORE_VEC(T, VT, OP)	\
+  T *					\
+  VT ## _post_store (T * p, VT v)	\
+  {					\
+    OP (p, v);				\
+    p += sizeof (VT) / sizeof (T);	\
+    return p;				\
+  }
+
+#define PRE_LOAD(T)			\
+  void					\
+  T ## _pre_load (T *p)			\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *++p;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD(T)			\
+  void					\
+  T ## _post_load (T *p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *p++;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD_VEC(T, VT, OP)	\
+  void					\
+  VT ## _post_load (T * p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    VT x = OP (p, v);			\
+    p += sizeof (VT) / sizeof (T);	\
+    f ## T (p, x);			\
+  }
Kyrill Tkachov Sept. 15, 2017, 5:01 p.m. UTC | #3
On 15/09/17 16:38, Charles Baylis wrote:
> On 13 September 2017 at 10:02, Kyrill  Tkachov

> <kyrylo.tkachov@foss.arm.com> wrote:

>> Hi Charles,

>>

>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:

>>> From: Charles Baylis <charles.baylis@linaro.org>

>>>

>>> This patch moves the calculation of costs for MEM into a

>>> separate function, and reforms the calculation into two

>>> parts. Firstly any additional cost of the addressing mode

>>> is calculated, and then the cost of the memory access itself

>>> is added.

>>>

>>> In this patch, the calculation of the cost of the addressing

>>> mode is left as a placeholder, to be added in a subsequent

>>> patch.

>>>

>> Can you please mention how has this series been tested?

>> A bootstrap and test run on arm-none-linux-gnueabihf is required at least.

> It has been tested with make check on arm-unknown-linux-gnueabihf with

> no regressions. I've successfully bootstrapped the next spin.


Thanks.

>> Also, do you have any benchmarking results for this?

>> I agree that generating the addressing modes in the new tests is desirable.

>> So I'm not objecting to the goal of this patch, but a check to make sure

>> that this doesn't regress SPEC

>> would be great.  Further comments on the patch inline.

> SPEC2006 scores are unaffected by this patch on Cortex-A57.


Good, thanks for checking :)

>>> +/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a

>>> MEM,

>>> +   considering the costs of the addressing mode and memory access

>>> +   separately.  */

>>> +static bool

>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,

>>> +              int *cost, bool speed_p)

>>> +{

>>> +  machine_mode mode = GET_MODE (x);

>>> +  if (flag_pic

>>> +      && GET_CODE (XEXP (x, 0)) == PLUS

>>> +      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

>>> +    /* This will be split into two instructions.  Add the cost of the

>>> +       additional instruction here.  The cost of the memory access is

>>> computed

>>> +       below.  See arm.md:calculate_pic_address.  */

>>> +    *cost = COSTS_N_INSNS (1);

>>> +  else

>>> +    *cost = 0;

>>

>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a each

>> insn)

>> plus the appropriate field in extra_cost. So you should unconditionally

>> initialise the cost

>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1) with

>> the condition above.

> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)

> part because the cost of a single bus transfer is included in that

> initial cost.

>

>>> +

>>> +  /* Calculate cost of the addressing mode.  */

>>> +  if (speed_p)

>>> +    {

>>> +      /* TODO: Add table-driven costs for addressing modes.  (See patch

>>> 2) */

>>> +    }

>>

>> You mean "patch 3". I recommend you just remove this conditional from this

>> patch and add the logic

>> in patch 3 entirely.

> OK.

>

>>> +

>>> +  /* Calculate cost of memory access.  */

>>> +  if (speed_p)

>>> +    {

>>> +      /* data transfer is transfer size divided by bus width.  */

>>> +      int bus_width_bytes = current_tune->bus_width / 4;

>>

>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.

>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure the

>> cost calculation and generated code is still appropriate.

> Oops, I changed the units around and messed this up. I'll fix this.

>

>>> +      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);

>>> +      *cost += extra_cost->ldst.load;

>>> +    }

>>> +  else

>>> +    {

>>> +      *cost += COSTS_N_INSNS (1);

>>> +    }

>> Given my first comment above this else would be deleted.

> OK


I have a concern about using the bus_width parameter which
I explain in the thread for patch 1 (I don't think we need it, we should 
use the fields in extra_cost->ldst
more carefully).

Kyrill
Charles Baylis Nov. 20, 2017, 9:09 p.m. UTC | #4
On 15 September 2017 at 18:01, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>

> On 15/09/17 16:38, Charles Baylis wrote:

>>

>> On 13 September 2017 at 10:02, Kyrill  Tkachov

>> <kyrylo.tkachov@foss.arm.com> wrote:

>>>

>>> Hi Charles,

>>>

>>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:

>>>>

>>>> From: Charles Baylis <charles.baylis@linaro.org>

>>>>

>>>> This patch moves the calculation of costs for MEM into a

>>>> separate function, and reforms the calculation into two

>>>> parts. Firstly any additional cost of the addressing mode

>>>> is calculated, and then the cost of the memory access itself

>>>> is added.

>>>>

>>>> In this patch, the calculation of the cost of the addressing

>>>> mode is left as a placeholder, to be added in a subsequent

>>>> patch.

>>>>

>>> Can you please mention how has this series been tested?

>>> A bootstrap and test run on arm-none-linux-gnueabihf is required at

>>> least.

>>

>> It has been tested with make check on arm-unknown-linux-gnueabihf with

>> no regressions. I've successfully bootstrapped the next spin.

>

>

> Thanks.

>

>>> Also, do you have any benchmarking results for this?

>>> I agree that generating the addressing modes in the new tests is

>>> desirable.

>>> So I'm not objecting to the goal of this patch, but a check to make sure

>>> that this doesn't regress SPEC

>>> would be great.  Further comments on the patch inline.

>>

>> SPEC2006 scores are unaffected by this patch on Cortex-A57.

>

>

> Good, thanks for checking :)

>

>

>>>> +/* Helper function for arm_rtx_costs_internal.  Calculates the cost of

>>>> a

>>>> MEM,

>>>> +   considering the costs of the addressing mode and memory access

>>>> +   separately.  */

>>>> +static bool

>>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,

>>>> +              int *cost, bool speed_p)

>>>> +{

>>>> +  machine_mode mode = GET_MODE (x);

>>>> +  if (flag_pic

>>>> +      && GET_CODE (XEXP (x, 0)) == PLUS

>>>> +      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

>>>> +    /* This will be split into two instructions.  Add the cost of the

>>>> +       additional instruction here.  The cost of the memory access is

>>>> computed

>>>> +       below.  See arm.md:calculate_pic_address.  */

>>>> +    *cost = COSTS_N_INSNS (1);

>>>> +  else

>>>> +    *cost = 0;

>>>

>>>

>>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a

>>> each

>>> insn)

>>> plus the appropriate field in extra_cost. So you should unconditionally

>>> initialise the cost

>>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1)

>>> with

>>> the condition above.

>>

>> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)

>> part because the cost of a single bus transfer is included in that

>> initial cost.

>>

>>>> +

>>>> +  /* Calculate cost of the addressing mode.  */

>>>> +  if (speed_p)

>>>> +    {

>>>> +      /* TODO: Add table-driven costs for addressing modes.  (See patch

>>>> 2) */

>>>> +    }

>>>

>>>

>>> You mean "patch 3". I recommend you just remove this conditional from

>>> this

>>> patch and add the logic

>>> in patch 3 entirely.

>>

>> OK.

>>

>>>> +

>>>> +  /* Calculate cost of memory access.  */

>>>> +  if (speed_p)

>>>> +    {

>>>> +      /* data transfer is transfer size divided by bus width.  */

>>>> +      int bus_width_bytes = current_tune->bus_width / 4;

>>>

>>>

>>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.

>>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure

>>> the

>>> cost calculation and generated code is still appropriate.

>>

>> Oops, I changed the units around and messed this up. I'll fix this.

>>

>>>> +      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);

>>>> +      *cost += extra_cost->ldst.load;

>>>> +    }

>>>> +  else

>>>> +    {

>>>> +      *cost += COSTS_N_INSNS (1);

>>>> +    }

>>>

>>> Given my first comment above this else would be deleted.

>>

>> OK

>

>

> I have a concern about using the bus_width parameter which

> I explain in the thread for patch 1 (I don't think we need it, we should use

> the fields in extra_cost->ldst

> more carefully).


I have modified this patch accordingly. Patch 1 is no longer needed.

Passes "make check" (with patch 3) on arm-linux-gnueabihf with no
regressions. Bootstrap is in progress.

Can I still get this in during stage 3?

gcc/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/arm.c (arm_mem_costs): New function.
        (arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

        * gcc.target/arm/addr-modes-float.c: New test.
        * gcc.target/arm/addr-modes-int.c: New test.
        * gcc.target/arm/addr-modes.h: New header.
From 26d9c0839ef7318074d3fd38dca3989bd3e51d54 Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Wed, 8 Feb 2017 16:52:10 +0000
Subject: [PATCH 1/3] [ARM] Refactor costs calculation for MEM.

This patch moves the calculation of costs for MEM into a
separate function, and reforms the calculation into two
parts. Firstly any additional cost of the addressing mode
is calculated, and then the cost of the memory access itself
is added.

In this patch, the calculation of the cost of the addressing
mode is omitted, to be added in a subsequent patch.

gcc/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* config/arm/arm.c (arm_mem_costs): New function.
	(arm_rtx_costs_internal): Use arm_mem_costs.

gcc/testsuite/ChangeLog:

<date>  Charles Baylis  <charles.baylis@linaro.org>

	* gcc.target/arm/addr-modes-float.c: New test.
	* gcc.target/arm/addr-modes-int.c: New test.
	* gcc.target/arm/addr-modes.h: New header.

Change-Id: I99e93406ea39ee31f71c7bf428ad3e127b7a618e
---
 gcc/config/arm/arm.c                            | 71 ++++++++++++++++---------
 gcc/testsuite/gcc.target/arm/addr-modes-float.c | 42 +++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes-int.c   | 46 ++++++++++++++++
 gcc/testsuite/gcc.target/arm/addr-modes.h       | 53 ++++++++++++++++++
 4 files changed, 187 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-float.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes-int.c
 create mode 100644 gcc/testsuite/gcc.target/arm/addr-modes.h

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 1c2f8fa..ce59d80 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9242,8 +9242,52 @@ arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
 	  }								\
 	while (0)
 
+/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a MEM,
+   considering the costs of the addressing mode and memory access
+   separately.  */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+	       int *cost, bool speed_p)
+{
+  machine_mode mode = GET_MODE (x);
+
+  *cost = COSTS_N_INSNS (1);
+
+  if (flag_pic
+      && GET_CODE (XEXP (x, 0)) == PLUS
+      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+    /* This will be split into two instructions.  Add the cost of the
+       additional instruction here.  The cost of the memory access is computed
+       below.  See arm.md:calculate_pic_address.  */
+    *cost += COSTS_N_INSNS (1);
+
+  /* Calculate cost of memory access.  */
+  if (speed_p)
+    {
+      if (FLOAT_MODE_P (mode))
+	{
+	  if (GET_MODE_SIZE (mode) == 8)
+	    *cost += extra_cost->ldst.loadd;
+	  else
+	    *cost += extra_cost->ldst.loadf;
+	}
+      else if (VECTOR_MODE_P (mode))
+	*cost += extra_cost->ldst.loadv;
+      else
+	{
+	  /* Integer modes */
+	  if (GET_MODE_SIZE (mode) == 8)
+	    *cost += extra_cost->ldst.ldrd;
+	  else
+	    *cost += extra_cost->ldst.load;
+	}
+    }
+
+  return true;
+}
+
 /* RTX costs.  Make an estimate of the cost of executing the operation
-   X, which is contained with an operation with code OUTER_CODE.
+   X, which is contained within an operation with code OUTER_CODE.
    SPEED_P indicates whether the cost desired is the performance cost,
    or the size cost.  The estimate is stored in COST and the return
    value is TRUE if the cost calculation is final, or FALSE if the
@@ -9322,30 +9366,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       return false;
 
     case MEM:
-      /* A memory access costs 1 insn if the mode is small, or the address is
-	 a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
-	*cost = COSTS_N_INSNS (1);
-      else if (flag_pic
-	       && GET_CODE (XEXP (x, 0)) == PLUS
-	       && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
-	/* This will be split into two instructions.
-	   See arm.md:calculate_pic_address.  */
-	*cost = COSTS_N_INSNS (2);
-      else
-	*cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
-      /* For speed optimizations, add the costs of the address and
-	 accessing memory.  */
-      if (speed_p)
-#ifdef NOT_YET
-	*cost += (extra_cost->ldst.load
-		  + arm_address_cost (XEXP (x, 0), mode,
-				      ADDR_SPACE_GENERIC, speed_p));
-#else
-        *cost += extra_cost->ldst.load;
-#endif
-      return true;
+      return arm_mem_costs (x, extra_cost, cost, speed_p);
 
     case PARALLEL:
     {
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@
+
+#define PRE_STORE(T)			\
+  T *					\
+  T ## _pre_store (T *p, T v)		\
+  {					\
+    *++p = v;				\
+    return p;				\
+  }					\
+
+#define POST_STORE(T)			\
+  T *					\
+  T ## _post_store (T *p, T v)		\
+  {					\
+    *p++ = v;				\
+    return p;				\
+  }
+
+#define POST_STORE_VEC(T, VT, OP)	\
+  T *					\
+  VT ## _post_store (T * p, VT v)	\
+  {					\
+    OP (p, v);				\
+    p += sizeof (VT) / sizeof (T);	\
+    return p;				\
+  }
+
+#define PRE_LOAD(T)			\
+  void					\
+  T ## _pre_load (T *p)			\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *++p;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD(T)			\
+  void					\
+  T ## _post_load (T *p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *p++;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD_VEC(T, VT, OP)	\
+  void					\
+  VT ## _post_load (T * p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    VT x = OP (p, v);			\
+    p += sizeof (VT) / sizeof (T);	\
+    f ## T (p, x);			\
+  }
Charles Baylis Nov. 21, 2017, 9:51 a.m. UTC | #5
On 20 November 2017 at 21:09, Charles Baylis <charles.baylis@linaro.org> wrote:

> I have modified this patch accordingly. Patch 1 is no longer needed.

>

> Passes "make check" (with patch 3) on arm-linux-gnueabihf with no

> regressions. Bootstrap is in progress.


Bootstrap built successfully using qemu host.

> Can I still get this in during stage 3?

>

> gcc/ChangeLog:

>

> <date>  Charles Baylis  <charles.baylis@linaro.org>

>

>         * config/arm/arm.c (arm_mem_costs): New function.

>         (arm_rtx_costs_internal): Use arm_mem_costs.

>

> gcc/testsuite/ChangeLog:

>

> <date>  Charles Baylis  <charles.baylis@linaro.org>

>

>         * gcc.target/arm/addr-modes-float.c: New test.

>         * gcc.target/arm/addr-modes-int.c: New test.

>         * gcc.target/arm/addr-modes.h: New header.
Kyrill Tkachov Nov. 23, 2017, 10:01 a.m. UTC | #6
Hi Charles,

On 20/11/17 21:09, Charles Baylis wrote:
> On 15 September 2017 at 18:01, Kyrill Tkachov

> <kyrylo.tkachov@foss.arm.com> wrote:

> >

> > On 15/09/17 16:38, Charles Baylis wrote:

> >>

> >> On 13 September 2017 at 10:02, Kyrill  Tkachov

> >> <kyrylo.tkachov@foss.arm.com> wrote:

> >>>

> >>> Hi Charles,

> >>>

> >>> On 12/09/17 09:34, charles.baylis@linaro.org wrote:

> >>>>

> >>>> From: Charles Baylis <charles.baylis@linaro.org>

> >>>>

> >>>> This patch moves the calculation of costs for MEM into a

> >>>> separate function, and reforms the calculation into two

> >>>> parts. Firstly any additional cost of the addressing mode

> >>>> is calculated, and then the cost of the memory access itself

> >>>> is added.

> >>>>

> >>>> In this patch, the calculation of the cost of the addressing

> >>>> mode is left as a placeholder, to be added in a subsequent

> >>>> patch.

> >>>>

> >>> Can you please mention how has this series been tested?

> >>> A bootstrap and test run on arm-none-linux-gnueabihf is required at

> >>> least.

> >>

> >> It has been tested with make check on arm-unknown-linux-gnueabihf with

> >> no regressions. I've successfully bootstrapped the next spin.

> >

> >

> > Thanks.

> >

> >>> Also, do you have any benchmarking results for this?

> >>> I agree that generating the addressing modes in the new tests is

> >>> desirable.

> >>> So I'm not objecting to the goal of this patch, but a check to 

> make sure

> >>> that this doesn't regress SPEC

> >>> would be great.  Further comments on the patch inline.

> >>

> >> SPEC2006 scores are unaffected by this patch on Cortex-A57.

> >

> >

> > Good, thanks for checking :)

> >

> >

> >>>> +/* Helper function for arm_rtx_costs_internal.  Calculates the 

> cost of

> >>>> a

> >>>> MEM,

> >>>> +   considering the costs of the addressing mode and memory access

> >>>> +   separately.  */

> >>>> +static bool

> >>>> +arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,

> >>>> +              int *cost, bool speed_p)

> >>>> +{

> >>>> +  machine_mode mode = GET_MODE (x);

> >>>> +  if (flag_pic

> >>>> +      && GET_CODE (XEXP (x, 0)) == PLUS

> >>>> +      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))

> >>>> +    /* This will be split into two instructions.  Add the cost 

> of the

> >>>> +       additional instruction here.  The cost of the memory 

> access is

> >>>> computed

> >>>> +       below.  See arm.md:calculate_pic_address.  */

> >>>> +    *cost = COSTS_N_INSNS (1);

> >>>> +  else

> >>>> +    *cost = 0;

> >>>

> >>>

> >>> For speed_p we want the size cost of the insn (COSTS_N_INSNS (1) for a

> >>> each

> >>> insn)

> >>> plus the appropriate field in extra_cost. So you should 

> unconditionally

> >>> initialise the cost

> >>> to COSTS_N_INSNS (1), conditionally increment it by COSTS_N_INSNS (1)

> >>> with

> >>> the condition above.

> >>

> >> OK. I also have to subtract that COSTS_N_INSNS (1) in the if (speed_p)

> >> part because the cost of a single bus transfer is included in that

> >> initial cost.

> >>

> >>>> +

> >>>> +  /* Calculate cost of the addressing mode.  */

> >>>> +  if (speed_p)

> >>>> +    {

> >>>> +      /* TODO: Add table-driven costs for addressing modes.  

> (See patch

> >>>> 2) */

> >>>> +    }

> >>>

> >>>

> >>> You mean "patch 3". I recommend you just remove this conditional from

> >>> this

> >>> patch and add the logic

> >>> in patch 3 entirely.

> >>

> >> OK.

> >>

> >>>> +

> >>>> +  /* Calculate cost of memory access.  */

> >>>> +  if (speed_p)

> >>>> +    {

> >>>> +      /* data transfer is transfer size divided by bus width.  */

> >>>> +      int bus_width_bytes = current_tune->bus_width / 4;

> >>>

> >>>

> >>> This should be bus_width / BITS_PER_UNIT to get the size in bytes.

> >>> BITS_PER_UNIT is 8 though, so you'll have to double check to make sure

> >>> the

> >>> cost calculation and generated code is still appropriate.

> >>

> >> Oops, I changed the units around and messed this up. I'll fix this.

> >>

> >>>> +      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);

> >>>> +      *cost += extra_cost->ldst.load;

> >>>> +    }

> >>>> +  else

> >>>> +    {

> >>>> +      *cost += COSTS_N_INSNS (1);

> >>>> +    }

> >>>

> >>> Given my first comment above this else would be deleted.

> >>

> >> OK

> >

> >

> > I have a concern about using the bus_width parameter which

> > I explain in the thread for patch 1 (I don't think we need it, we 

> should use

> > the fields in extra_cost->ldst

> > more carefully).

>

> I have modified this patch accordingly. Patch 1 is no longer needed.

>

> Passes "make check" (with patch 3) on arm-linux-gnueabihf with no

> regressions. Bootstrap is in progress.

>

> Can I still get this in during stage 3?

>


Thanks, these are ok for trunk.
They were originally posted way before stage 3 and this is just a rework,
so it's acceptable at this stage as far as I'm concerned.

Thank you for working on these,
Kyrill

> gcc/ChangeLog:

>

> <date>  Charles Baylis <charles.baylis@linaro.org>

>

>         * config/arm/arm.c (arm_mem_costs): New function.

>         (arm_rtx_costs_internal): Use arm_mem_costs.

>

> gcc/testsuite/ChangeLog:

>

> <date>  Charles Baylis <charles.baylis@linaro.org>

>

>         * gcc.target/arm/addr-modes-float.c: New test.

>         * gcc.target/arm/addr-modes-int.c: New test.

>         * gcc.target/arm/addr-modes.h: New header.
Charles Baylis Nov. 23, 2017, 6:53 p.m. UTC | #7
On 23 November 2017 at 10:01, Kyrill  Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:

>

> Thanks, these are ok for trunk.

> They were originally posted way before stage 3 and this is just a rework,

> so it's acceptable at this stage as far as I'm concerned.


Thanks. Committed to trunk as r255111.
diff mbox series

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 32001e5..b8dbed6 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9228,8 +9228,48 @@  arm_unspec_cost (rtx x, enum rtx_code /* outer_code */, bool speed_p, int *cost)
 	  }								\
 	while (0);
 
+/* Helper function for arm_rtx_costs_internal.  Calculates the cost of a MEM,
+   considering the costs of the addressing mode and memory access
+   separately.  */
+static bool
+arm_mem_costs (rtx x, const struct cpu_cost_table *extra_cost,
+	       int *cost, bool speed_p)
+{
+  machine_mode mode = GET_MODE (x);
+  if (flag_pic
+      && GET_CODE (XEXP (x, 0)) == PLUS
+      && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
+    /* This will be split into two instructions.  Add the cost of the
+       additional instruction here.  The cost of the memory access is computed
+       below.  See arm.md:calculate_pic_address.  */
+    *cost = COSTS_N_INSNS (1);
+  else
+    *cost = 0;
+
+  /* Calculate cost of the addressing mode.  */
+  if (speed_p)
+    {
+      /* TODO: Add table-driven costs for addressing modes.  (See patch 2) */
+    }
+
+  /* Calculate cost of memory access.  */
+  if (speed_p)
+    {
+      /* data transfer is transfer size divided by bus width.  */
+      int bus_width_bytes = current_tune->bus_width / 4;
+      *cost += CEIL (GET_MODE_SIZE (mode), bus_width_bytes);
+      *cost += extra_cost->ldst.load;
+    }
+  else
+    {
+      *cost += COSTS_N_INSNS (1);
+    }
+
+  return true;
+}
+
 /* RTX costs.  Make an estimate of the cost of executing the operation
-   X, which is contained with an operation with code OUTER_CODE.
+   X, which is contained within an operation with code OUTER_CODE.
    SPEED_P indicates whether the cost desired is the performance cost,
    or the size cost.  The estimate is stored in COST and the return
    value is TRUE if the cost calculation is final, or FALSE if the
@@ -9308,30 +9348,7 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
       return false;
 
     case MEM:
-      /* A memory access costs 1 insn if the mode is small, or the address is
-	 a single register, otherwise it costs one insn per word.  */
-      if (REG_P (XEXP (x, 0)))
-	*cost = COSTS_N_INSNS (1);
-      else if (flag_pic
-	       && GET_CODE (XEXP (x, 0)) == PLUS
-	       && will_be_in_index_register (XEXP (XEXP (x, 0), 1)))
-	/* This will be split into two instructions.
-	   See arm.md:calculate_pic_address.  */
-	*cost = COSTS_N_INSNS (2);
-      else
-	*cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
-
-      /* For speed optimizations, add the costs of the address and
-	 accessing memory.  */
-      if (speed_p)
-#ifdef NOT_YET
-	*cost += (extra_cost->ldst.load
-		  + arm_address_cost (XEXP (x, 0), mode,
-				      ADDR_SPACE_GENERIC, speed_p));
-#else
-        *cost += extra_cost->ldst.load;
-#endif
-      return true;
+      return arm_mem_costs (x, extra_cost, cost, speed_p);
 
     case PARALLEL:
     {
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-float.c b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
new file mode 100644
index 0000000..3b4235c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-float.c
@@ -0,0 +1,42 @@ 
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include <arm_neon.h>
+
+#include "addr-modes.h"
+
+POST_STORE(float)
+/* { dg-final { scan-assembler "vstmia.32" } } */
+POST_STORE(double)
+/* { dg-final { scan-assembler "vstmia.64" } } */
+
+POST_LOAD(float)
+/* { dg-final { scan-assembler "vldmia.32" } } */
+POST_LOAD(double)
+/* { dg-final { scan-assembler "vldmia.64" } } */
+
+POST_STORE_VEC (int8_t, int8x8_t, vst1_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16_t, vst1q_s8)
+/* { dg-final { scan-assembler "vst1.8\t\{.*\[-,\]d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x2_t, vst2_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x2_t, vst2q_s8)
+/* { dg-final { scan-assembler "vst2.8\t\{.*-d.*\}, \\\[r\[0-9\]+\\\]!" } } */
+
+POST_STORE_VEC (int8_t, int8x8x3_t, vst3_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x3_t, vst3q_s8)
+/* { dg-final { scan-assembler "vst3.8\t\{d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst3.8\t\{d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+POST_STORE_VEC (int8_t, int8x8x4_t, vst4_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{.*\}, \\\[r\[0-9\]+\\\]!" } } */
+POST_STORE_VEC (int8_t, int8x16x4_t, vst4q_s8)
+/* { dg-final { scan-assembler "vst4.8\t\{d\[02468\], d\[02468\], d\[02468\], d\[02468\]\}, \\\[r\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vst4.8\t\{d\[13579\], d\[13579\], d\[13579\], d\[13579\]\}, \\\[r\[0-9\]+\\\]!" { xfail *-*-* } } } */
+
+/* { dg-final { scan-assembler-not "add" { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes-int.c b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
new file mode 100644
index 0000000..e3e1e6a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes-int.c
@@ -0,0 +1,46 @@ 
+/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-add-options arm_neon } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-do compile } */
+
+#include "addr-modes.h"
+
+typedef long long ll;
+
+PRE_STORE(char)
+/* { dg-final { scan-assembler "strb.*#1]!" } } */
+PRE_STORE(short)
+/* { dg-final { scan-assembler "strh.*#2]!" } } */
+PRE_STORE(int)
+/* { dg-final { scan-assembler "str.*#4]!" } } */
+PRE_STORE(ll)
+/* { dg-final { scan-assembler "strd.*#8]!" } } */
+
+POST_STORE(char)
+/* { dg-final { scan-assembler "strb.*], #1" } } */
+POST_STORE(short)
+/* { dg-final { scan-assembler "strh.*], #2" } } */
+POST_STORE(int)
+/* { dg-final { scan-assembler "str.*], #4" } } */
+POST_STORE(ll)
+/* { dg-final { scan-assembler "strd.*], #8" } } */
+
+PRE_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*#1]!" } } */
+PRE_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*#2]!" } } */
+PRE_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*#4]!" } } */
+PRE_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*#8]!" } } */
+
+POST_LOAD(char)
+/* { dg-final { scan-assembler "ldrb.*], #1" } } */
+POST_LOAD(short)
+/* { dg-final { scan-assembler "ldrsh.*], #2" } } */
+POST_LOAD(int)
+/* { dg-final { scan-assembler "ldr.*], #4" } } */
+POST_LOAD(ll)
+/* { dg-final { scan-assembler "ldrd.*], #8" } } */
+
+/* { dg-final { scan-assembler-not "\tadd" } } */
diff --git a/gcc/testsuite/gcc.target/arm/addr-modes.h b/gcc/testsuite/gcc.target/arm/addr-modes.h
new file mode 100644
index 0000000..eac4678
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/addr-modes.h
@@ -0,0 +1,53 @@ 
+
+#define PRE_STORE(T)			\
+  T *					\
+  T ## _pre_store (T *p, T v)		\
+  {					\
+    *++p = v;				\
+    return p;				\
+  }					\
+
+#define POST_STORE(T)			\
+  T *					\
+  T ## _post_store (T *p, T v)		\
+  {					\
+    *p++ = v;				\
+    return p;				\
+  }
+
+#define POST_STORE_VEC(T, VT, OP)	\
+  T *					\
+  VT ## _post_store (T * p, VT v)	\
+  {					\
+    OP (p, v);				\
+    p += sizeof (VT) / sizeof (T);	\
+    return p;				\
+  }
+
+#define PRE_LOAD(T)			\
+  void					\
+  T ## _pre_load (T *p)			\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *++p;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD(T)			\
+  void					\
+  T ## _post_load (T *p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    T x = *p++;				\
+    f ## T (p, x);			\
+  }
+
+#define POST_LOAD_VEC(T, VT, OP)	\
+  void					\
+  VT ## _post_load (T * p)		\
+  {					\
+    extern void f ## T (T*,T);		\
+    VT x = OP (p, v);			\
+    p += sizeof (VT) / sizeof (T);	\
+    f ## T (p, x);			\
+  }