Message ID | CADnVucAaG=uAZyxQGvyf5bqrmW8JfhfjCp84uCpVnf+=Tois5w@mail.gmail.com |
---|---|
State | New |
Headers | show |
On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis <charles.baylis@linaro.org> wrote: > This patch adds support for post-indexed addressing for NEON structure > memory accesses. > > For example VLD1.8 {d0}, [r0], r1 > > > Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. > > Ok for trunk? This looks like a reasonable start but this work doesn't look complete to me yet. Can you also look at the impact on performance of a range of benchmarks especially a popular embedded one to see how this behaves unless you have already done so ? POST_INC, POST_MODIFY usually have a funny way of biting you with either ivopts or the way in which address costs work. I think there maybe further tweaks needed but for a first step I'd like to know what the performance impact is. I would also suggest running this through clyon's neon intrinsics testsuite to see if that catches any issues especially with the large vector modes. regards Ramana > > > gcc/Changelog: > > 2014-06-02 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/arm.c (neon_vector_mem_operand): Allow register > POST_MODIFY for neon loads and stores. > (arm_print_operand): Output post-index register for neon loads and > stores.
On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis > <charles.baylis@linaro.org> wrote: >> This patch adds support for post-indexed addressing for NEON structure >> memory accesses. >> >> For example VLD1.8 {d0}, [r0], r1 >> >> >> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. >> >> Ok for trunk? > > This looks like a reasonable start but this work doesn't look complete > to me yet. > > Can you also look at the impact on performance of a range of > benchmarks especially a popular embedded one to see how this behaves > unless you have already done so ? I ran a popular suite of embedded benchmarks, and there is no impact at all on Chromebook (including with the additional attached patch) The patch was developed to address a performance issue with a new version of libvpx which uses intrinsics instead of NEON assembler. The patch results in a 3% improvement for VP8 decode. > POST_INC, POST_MODIFY usually have a funny way of biting you with > either ivopts or the way in which address costs work. I think there > maybe further tweaks needed but for a first step I'd like to know what > the performance impact is. > I would also suggest running this through clyon's neon intrinsics > testsuite to see if that catches any issues especially with the large > vector modes. No issues found in clyon's tests. Your mention of larger vector modes prompted me to check that the patch has the desired result with them. In fact, the costs are estimated incorrectly which means the post_modify pattern is not used. The attached patch fixes that. (used in combination with my original patch) 2014-06-15 Charles Baylis <charles.bayls@linaro.org> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with embedded side effects.
On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis <charles.baylis@linaro.org> wrote: > This patch adds support for post-indexed addressing for NEON structure > memory accesses. > > For example VLD1.8 {d0}, [r0], r1 > > > Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. > > Ok for trunk? This is OK. Ramana > > > gcc/Changelog: > > 2014-06-02 Charles Baylis <charles.baylis@linaro.org> > > * config/arm/arm.c (neon_vector_mem_operand): Allow register > POST_MODIFY for neon loads and stores. > (arm_print_operand): Output post-index register for neon loads and > stores.
On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis <charles.baylis@linaro.org> wrote: > On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: >> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis >> <charles.baylis@linaro.org> wrote: >>> This patch adds support for post-indexed addressing for NEON structure >>> memory accesses. >>> >>> For example VLD1.8 {d0}, [r0], r1 >>> >>> >>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. >>> >>> Ok for trunk? >> >> This looks like a reasonable start but this work doesn't look complete >> to me yet. >> >> Can you also look at the impact on performance of a range of >> benchmarks especially a popular embedded one to see how this behaves >> unless you have already done so ? > > I ran a popular suite of embedded benchmarks, and there is no impact > at all on Chromebook (including with the additional attached patch) Thanks for the due diligence > > The patch was developed to address a performance issue with a new > version of libvpx which uses intrinsics instead of NEON assembler. The > patch results in a 3% improvement for VP8 decode. Good - 3% not to be sneezed at. > >> POST_INC, POST_MODIFY usually have a funny way of biting you with >> either ivopts or the way in which address costs work. I think there >> maybe further tweaks needed but for a first step I'd like to know what >> the performance impact is. > >> I would also suggest running this through clyon's neon intrinsics >> testsuite to see if that catches any issues especially with the large >> vector modes. Thanks. > > No issues found in clyon's tests. Please keep an eye out for any regressions. > > Your mention of larger vector modes prompted me to check that the > patch has the desired result with them. In fact, the costs are > estimated incorrectly which means the post_modify pattern is not used. > The attached patch fixes that. (used in combination with my original > patch) > > > 2014-06-15 Charles Baylis <charles.bayls@linaro.org> > > * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with > embedded side effects. I'm not too thrilled with putting in more special cases that are not table driven in there. Can you file a PR with some testcases that show this so that we don't forget and CC me on it please ? Ramana
On 18 June 2014 11:01, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis > <charles.baylis@linaro.org> wrote: >> This patch adds support for post-indexed addressing for NEON structure >> memory accesses. >> >> For example VLD1.8 {d0}, [r0], r1 >> >> >> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu. >> >> Ok for trunk? > > This is OK. Committed as r211783.
On 18 June 2014 11:06, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: >> 2014-06-15 Charles Baylis <charles.bayls@linaro.org> >> >> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with >> embedded side effects. > > I'm not too thrilled with putting in more special cases that are not > table driven in there. Can you file a PR with some testcases that show > this so that we don't forget and CC me on it please ? I created PR61551 and CC'd.
On 18 June 2014 at 11:06, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote: > On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis > <charles.baylis@linaro.org> wrote: >> Your mention of larger vector modes prompted me to check that the >> patch has the desired result with them. In fact, the costs are >> estimated incorrectly which means the post_modify pattern is not used. >> The attached patch fixes that. (used in combination with my original >> patch) >> >> >> 2014-06-15 Charles Baylis <charles.bayls@linaro.org> >> >> * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with >> embedded side effects. > > I'm not too thrilled with putting in more special cases that are not > table driven in there. Can you file a PR with some testcases that show > this so that we don't forget and CC me on it please ? I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551 at the time. I've come back to look at this again and would like to fix it in this release cycle. I still don't really understand what you mean by table-driven in this context. Do you still hold this view, and if so, could you describe what you'd like to see instead of this patch?
From a8e0bdbceab00d5e5b655611965d3975ba74365c Mon Sep 17 00:00:00 2001 From: Charles Baylis <charles.baylis@linaro.org> Date: Tue, 6 May 2014 15:23:46 +0100 Subject: [PATCH] post-indexed addressing for vld/vst 2014-05-09 Charles Baylis <charles.baylis@linaro.org> * config/arm/arm.c (neon_vector_mem_operand): Allow register POST_MODIFY for neon loads and stores. (arm_print_operand): Output post-index register for neon loads and stores. --- gcc/config/arm/arm.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 1117bd4..6ab02ef 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -12786,7 +12786,11 @@ neon_vector_mem_operand (rtx op, int type, bool strict) || (type == 0 && GET_CODE (ind) == PRE_DEC)) return arm_address_register_rtx_p (XEXP (ind, 0), 0); - /* FIXME: vld1 allows register post-modify. */ + /* Allow post-increment by register for VLDn */ + if (type == 2 && GET_CODE (ind) == POST_MODIFY + && GET_CODE (XEXP (ind, 1)) == PLUS + && REG_P (XEXP (XEXP (ind, 1), 1))) + return true; /* Match: (plus (reg) @@ -21816,6 +21820,7 @@ arm_print_operand (FILE *stream, rtx x, int code) { rtx addr; bool postinc = FALSE; + rtx postinc_reg = NULL; unsigned align, memsize, align_bits; gcc_assert (MEM_P (x)); @@ -21825,6 +21830,11 @@ arm_print_operand (FILE *stream, rtx x, int code) postinc = 1; addr = XEXP (addr, 0); } + if (GET_CODE (addr) == POST_MODIFY) + { + postinc_reg = XEXP( XEXP (addr, 1), 1); + addr = XEXP (addr, 0); + } asm_fprintf (stream, "[%r", REGNO (addr)); /* We know the alignment of this access, so we can emit a hint in the @@ -21850,6 +21860,8 @@ arm_print_operand (FILE *stream, rtx x, int code) if (postinc) fputs("!", stream); + if (postinc_reg) + asm_fprintf (stream, ", %r", REGNO (postinc_reg)); } return; -- 1.9.1