diff mbox

[v2,ARM] PR61551 RFC: Improve costs for NEON addressing modes

Message ID CADnVucBwhJ3n59iMy9amFiQ=kgre64kn99MW7AwBr7FLxTSczQ@mail.gmail.com
State New
Headers show

Commit Message

Charles Baylis Nov. 13, 2015, 4:01 p.m. UTC
Hi

Following on from previous discussion:
https://gcc.gnu.org/ml/gcc-patches/2015-10/msg03464.html
and IRC.

I'm going to try once more to make the case for fixing the worst
problem for GCC 6, pending a rewrite of the address_cost
infrastructure for GCC 7. I think the rewrite you're describing is
overkill for this problem. There is one specific problem which I would
like to fix for GCC6, and that is the failure of the ARM backend to
allow use of post-indexed addressing for some vector modes.

Test program:
 #include <arm_neon.h>
char *f(char *p, int8x8x4_t v, int r)
{ vst4_s8(p, v); p+=32; return p; }

Desired code:
f:
        vst4.8  {d0-d3}, [r0]!
        bx      lr

Currently generated code:
f:
        mov     r3, r0
        adds    r0, r0, #32
        vst4.8  {d0-d3}, [r3]
        bx      lr

The auto-inc-dec phase does not apply in this case, because the costs
for RTXs which use POST_INC are wrong. Using gdb to poke at this, we
can see:

$ arm-unknown-linux-gnueabihf-gcc -mfpu=neon -O3 -S /tmp/foo.c
-wrapper gdb,--args
GNU gdb (Ubuntu 7.9-1ubuntu1) 7.9
<snip>
Reading symbols from
/home/charles.baylis/tools/tools-arm-unknown-linux-gnueabihf-git/bin/../libexec/gcc/arm-unknown-linux-gnueabihf/6.0.0/cc1...done.
(gdb) b auto-inc-dec.c:473
Breakpoint 1 at 0x102c253: file
/home/charles.baylis/srcarea/gcc/gcc-git/gcc/auto-inc-dec.c, line 473.
(gdb) r
<snip, gdb has an assertion failure here, but it works to continue>
(gdb) print debug_rtx(mem)
(mem:OI (reg/v/f:SI 112 [ p ]) [0 MEM[(signed char[32] *)p_2(D)]+0 S32 A8])
$1 = void
(gdb) print rtx_cost(mem, V16QImode, SET, 1, false)
$2 = 4
(gdb) print debug_rtx(mem_tmp)
(mem:OI (post_inc:SI (reg/f:SI 115 [ p ])) [0  S32 A64])
$3 = void
(gdb) print rtx_cost(mem_tmp, V16QImode, SET, 1, false)
$4 = 32

So, the cost of
     (mem:OI (reg/v/f:SI 112 [ p ]))
is 4, while the cost of
    (mem:OI (post_inc:SI (reg/f:SI 115 [ p ])))
is 32.

That is a difference equivalent to 7 insns, which has no basis in
reality. It is just a bug.

Addressing some specific review points from the previous version.

> > +    {

> > +      0,

> > +      COSTS_N_INSNS (15),

> > +      COSTS_N_INSNS (15),

> > +      COSTS_N_INSNS (15),

> > +      COSTS_N_INSNS (15)

> > +    } /* vec512 */

> >    }

> >  };

>

> I'm curious as to the numbers here - The costs should reflect the relative costs of the

> addressing modes not the costs of the loads and stores - thus having high numbers

> here for vector modes may just prevent this from even triggering in auto-inc-dec

> code ? In my experience with GCC I've never satisfactorily answered the question

> whether these should be comparable to rtx_costs or not. In an ideal world they should

> be but I'm never sure. IOW I'm not sure if using COSTS_N_INSNS or plain numbers

> here is appropriate.


That's the point of the patch. These numbers give the same behaviour
as the current arm_rtx_costs code, and they are obviously wrong.

> 17:45 < ramana> My problem is that the mid-end in a number of other places

>         compares the cost coming out of rtx_cost and address_cost and if the 2

>         are not in sync we get funny values.


There is already no correspondence at all between the two at present.
My patch doesn't address this, but I think it must at least make it
better.

However, I don't really understand this comment - as you point out
above, address_cost and rtx_cost return values measured in different
units. I don't see how they can be made to correspond, given that.

> Right, but this does not change arm_address_costs - so how is this going to work?

> I would like this moved into a new function aarch_address_costs and that replacing

> arm_address_costs only to be called from here.


I could do that, but if I did, I would have to resubmit the patch at
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00387.html along with a
reimplemention of arm_address_costs which used a table without
changing its numerical results (pending subsequent tuning). Since the
former would already solve my problem, and the latter would then be a
pure code clean up of a separate function, why not accept the '387
patch as is, and leave the clean up until GCC 7?

Alternatively, this is an updated patch series which changes the costs
for MEMs in arm_rtx_costs using the table. Passes make check with no
regressions for arm-unknown-linux-gnueabihf on qemu.
diff mbox

Patch

From 68f4318327e75a709f6a3bea327915c0558127df Mon Sep 17 00:00:00 2001
From: Charles Baylis <charles.baylis@linaro.org>
Date: Fri, 13 Nov 2015 12:24:08 +0000
Subject: [PATCH 4/4] Use integer costs for soft float

If using soft float, then costs of accessing FP values is actually the same
as the cost of accessing integers of the same size.

Change-Id: Icb672b2b599ea4e433bc0b29c228e9f910aeb3ee
---
 gcc/config/arm/arm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 101ff28..726a385 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9579,15 +9579,15 @@  arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
 		  : ARM_NUM_REGS (mode) <= 12 ? extra_cost->extra_mem->vec384[op]
 		  : extra_cost->extra_mem->vec512[op]);
 	      }
-	    else if (FLOAT_MODE_P (mode))
+	    else if (mode == BLKmode)
+	      *cost += extra_cost->extra_mem->blk[op];
+	    else if (FLOAT_MODE_P (mode) && TARGET_HARD_FLOAT)
 	      {
 		*cost +=
 		  (ARM_NUM_REGS (mode) <= 1 ? extra_cost->extra_mem->sf[op]
 		  : ARM_NUM_REGS (mode) <= 2 ? extra_cost->extra_mem->df[op]
 					     : extra_cost->extra_mem->cdf[op]);
 	      }
-	    else if (mode == BLKmode)
-	      *cost += extra_cost->extra_mem->blk[op];
             else
 	      { /* integer modes */
 		*cost +=
-- 
1.9.1