[ARM] Post-indexed addressing for NEON memory access

Message ID	CADnVucAaG=uAZyxQGvyf5bqrmW8JfhfjCp84uCpVnf+=Tois5w@mail.gmail.com
State	New

Charles Baylis June 2, 2014, 4:47 p.m. UTC

This patch adds support for post-indexed addressing for NEON structure
memory accesses.

For example VLD1.8 {d0}, [r0], r1
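
As a rough illustration of the kind of source this targets (a sketch, not
taken from the patch): a loop that loads 8 bytes per row and then advances
the pointer by a run-time stride. The function name sum_rows and its
parameters are hypothetical. On a NEON target the load and the pointer
update are candidates for a single VLD1.8 {d0}, [r0], r1; whether that form
is actually selected depends on the compiler version and options. A scalar
fallback keeps the sketch buildable on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical example: sum the first 8 bytes of each of `rows` rows,
   where consecutive rows are `stride` bytes apart.  */
uint32_t sum_rows(const uint8_t *p, ptrdiff_t stride, int rows)
{
    assert(p != NULL && rows >= 0);
    uint32_t total = 0;
    for (int i = 0; i < rows; i++) {
#if defined(__ARM_NEON)
        uint8x8_t v = vld1_u8(p);            /* 8-byte structure load */
        uint16x4_t s16 = vpaddl_u8(v);       /* pairwise widen and add */
        uint32x2_t s32 = vpaddl_u16(s16);
        uint64x1_t s64 = vpaddl_u32(s32);
        total += (uint32_t)vget_lane_u64(s64, 0);
#else
        for (int j = 0; j < 8; j++)          /* scalar fallback */
            total += p[j];
#endif
        p += stride;  /* register increment: the post-index candidate */
    }
    return total;
}
```

Inspecting the assembly (for example with -O2 -S) before and after the patch
is one way to see whether the separate add has been folded into the load.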


Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.

Ok for trunk?


gcc/ChangeLog:

2014-06-02  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/arm.c (neon_vector_mem_operand): Allow register
        POST_MODIFY for neon loads and stores.
        (arm_print_operand): Output post-index register for neon loads and
        stores.
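
For readers unfamiliar with the assembly syntax involved, the three address
forms a NEON load or store can take are modelled below. This is a toy
formatter written for illustration only; it is not the code in
arm_print_operand, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model, not GCC code: address forms for NEON loads/stores.  */
enum neon_addr_kind {
    ADDR_PLAIN,        /* [rN]      no writeback                       */
    ADDR_POST_INC,     /* [rN]!     base advances by the transfer size */
    ADDR_POST_MODIFY   /* [rN], rM  base advances by register rM       */
};

/* Write the assembly text for the address into `out`.
   Returns the number of characters written, as snprintf does.  */
int format_neon_addr(char *out, size_t n, enum neon_addr_kind kind,
                     int base_reg, int index_reg)
{
    assert(out != NULL && n > 0);
    switch (kind) {
    case ADDR_PLAIN:
        return snprintf(out, n, "[r%d]", base_reg);
    case ADDR_POST_INC:
        return snprintf(out, n, "[r%d]!", base_reg);
    case ADDR_POST_MODIFY:
        return snprintf(out, n, "[r%d], r%d", base_reg, index_reg);
    }
    return -1;  /* unknown kind */
}
```

The third form is the one this patch is concerned with: the index register
has to be emitted after the closing bracket.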

Ramana Radhakrishnan June 5, 2014, 6:27 a.m. UTC | #1

On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> This patch adds support for post-indexed addressing for NEON structure
> memory accesses.
>
> For example VLD1.8 {d0}, [r0], r1
>
>
> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>
> Ok for trunk?

This looks like a reasonable start but this work doesn't look complete
to me yet.

Can you also look at the performance impact on a range of benchmarks,
especially a popular embedded one, to see how this behaves, unless you
have already done so?

POST_INC, POST_MODIFY usually have a funny way of biting you with
either ivopts or the way in which address costs work. I think there
may be further tweaks needed but for a first step I'd like to know what
the performance impact is.

I would also suggest running this through clyon's neon intrinsics
testsuite to see if that catches any issues especially with the large
vector modes.

regards
Ramana

>
>
> gcc/Changelog:
>
> 2014-06-02  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/arm.c (neon_vector_mem_operand): Allow register
>         POST_MODIFY for neon loads and stores.
>         (arm_print_operand): Output post-index register for neon loads and
>         stores.

Charles Baylis June 17, 2014, 3:03 p.m. UTC | #2

On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This looks like a reasonable start but this work doesn't look complete
> to me yet.
>
> Can you also look at the impact on performance of a range of
> benchmarks especially a popular embedded one to see how this behaves
> unless you have already done so ?

I ran a popular suite of embedded benchmarks, and there is no impact
at all on a Chromebook (including with the additional attached patch).

The patch was developed to address a performance issue with a new
version of libvpx which uses intrinsics instead of NEON assembler. The
patch results in a 3% improvement for VP8 decode.

> POST_INC, POST_MODIFY usually have a funny way of biting you with
> either ivopts or the way in which address costs work. I think there
> maybe further tweaks needed but for a first step I'd like to know what
> the performance impact is.

> I would also suggest running this through clyon's neon intrinsics
> testsuite to see if that catches any issues especially with the large
> vector modes.

No issues found in clyon's tests.

Your mention of larger vector modes prompted me to check that the
patch has the desired result with them. In fact, the costs are
estimated incorrectly, which means the post_modify pattern is not used.
The attached patch fixes that (when used in combination with my
original patch).


2014-06-15  Charles Baylis  <charles.baylis@linaro.org>

        * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
        embedded side effects.
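
To make the cost argument concrete, here is a toy model of the decision.
It is not GCC's cost function; the struct, the function name and the cost
units are all made up for illustration. The point is only that a load with
the increment folded into its address is chosen when it is estimated to be
cheaper than a plain load plus a separate add, so over-costing the folded
form keeps it from being used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model with made-up cost units; not GCC's cost function.  */
struct toy_costs {
    int load;         /* plain vector load                               */
    int add;          /* separate pointer increment                      */
    int post_modify;  /* load with the increment folded into the address */
};

/* The folded form wins only if it is estimated to be strictly cheaper
   than doing the load and the increment separately.  */
bool prefers_post_modify(const struct toy_costs *c)
{
    assert(c != NULL);
    return c->post_modify < c->load + c->add;
}
```

With hypothetical costs {load 4, add 4, post_modify 12} the separate form is
preferred; lowering post_modify to 4 flips the choice.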

Ramana Radhakrishnan June 18, 2014, 10:01 a.m. UTC | #3

On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> This patch adds support for post-indexed addressing for NEON structure
> memory accesses.
>
> For example VLD1.8 {d0}, [r0], r1
>
>
> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>
> Ok for trunk?

This is OK.

Ramana
>
>
> gcc/Changelog:
>
> 2014-06-02  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/arm.c (neon_vector_mem_operand): Allow register
>         POST_MODIFY for neon loads and stores.
>         (arm_print_operand): Output post-index register for neon loads and
>         stores.

Ramana Radhakrishnan June 18, 2014, 10:06 a.m. UTC | #4

On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
<charles.baylis@linaro.org> wrote:
> On 5 June 2014 07:27, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
>> <charles.baylis@linaro.org> wrote:
>>> This patch adds support for post-indexed addressing for NEON structure
>>> memory accesses.
>>>
>>> For example VLD1.8 {d0}, [r0], r1
>>>
>>>
>>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>>
>>> Ok for trunk?
>>
>> This looks like a reasonable start but this work doesn't look complete
>> to me yet.
>>
>> Can you also look at the impact on performance of a range of
>> benchmarks especially a popular embedded one to see how this behaves
>> unless you have already done so ?
>
> I ran a popular suite of embedded benchmarks, and there is no impact
> at all on Chromebook (including with the additional attached patch)

Thanks for the due diligence

>
> The patch was developed to address a performance issue with a new
> version of libvpx which uses intrinsics instead of NEON assembler. The
> patch results in a 3% improvement for VP8 decode.

Good - 3% is not to be sneezed at.

>
>> POST_INC, POST_MODIFY usually have a funny way of biting you with
>> either ivopts or the way in which address costs work. I think there
>> maybe further tweaks needed but for a first step I'd like to know what
>> the performance impact is.
>
>> I would also suggest running this through clyon's neon intrinsics
>> testsuite to see if that catches any issues especially with the large
>> vector modes.

Thanks.

>
> No issues found in clyon's tests.

Please keep an eye out for any regressions.

>
> Your mention of larger vector modes prompted me to check that the
> patch has the desired result with them. In fact, the costs are
> estimated incorrectly which means the post_modify pattern is not used.
> The attached patch fixes that. (used in combination with my original
> patch)
>
>
> 2014-06-15  Charles Baylis  <charles.baylis@linaro.org>
>
>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>         embedded side effects.

I'm not too thrilled with putting more special cases in there that are
not table driven. Can you please file a PR with some testcases that show
this, so that we don't forget, and CC me on it?


Ramana

Charles Baylis June 18, 2014, 1:49 p.m. UTC | #5

On 18 June 2014 11:01, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
> On Mon, Jun 2, 2014 at 5:47 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> This patch adds support for post-indexed addressing for NEON structure
>> memory accesses.
>>
>> For example VLD1.8 {d0}, [r0], r1
>>
>>
>> Bootstrapped and checked on arm-unknown-gnueabihf using Qemu.
>>
>> Ok for trunk?
>
> This is OK.

Committed as r211783.

Charles Baylis June 18, 2014, 2:30 p.m. UTC | #6

On 18 June 2014 11:06, Ramana Radhakrishnan <ramana.gcc@googlemail.com> wrote:
>> 2014-06-15  Charles Baylis  <charles.baylis@linaro.org>
>>
>>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>>         embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created PR61551 and CC'd you.

Charles Baylis June 19, 2015, 6:04 p.m. UTC | #7

On 18 June 2014 at 11:06, Ramana Radhakrishnan
<ramana.gcc@googlemail.com> wrote:
> On Tue, Jun 17, 2014 at 4:03 PM, Charles Baylis
> <charles.baylis@linaro.org> wrote:
>> Your mention of larger vector modes prompted me to check that the
>> patch has the desired result with them. In fact, the costs are
>> estimated incorrectly which means the post_modify pattern is not used.
>> The attached patch fixes that. (used in combination with my original
>> patch)
>>
>>
>> 2014-06-15  Charles Baylis  <charles.baylis@linaro.org>
>>
>>         * config/arm/arm.c (arm_new_rtx_costs): Reduce cost for mem with
>>         embedded side effects.
>
> I'm not too thrilled with putting in more special cases that are not
> table driven in there. Can you file a PR with some testcases that show
> this so that we don't forget and CC me on it please ?

I created https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61551 at the time.

I've come back to look at this again and would like to fix it in this
release cycle. I still don't really understand what you mean by
table-driven in this context. Do you still hold this view, and if so,
could you describe what you'd like to see instead of this patch?
