mbox series

[0/2,RFC,PR88836,AARCH64] Fix redundant ptest instruction

Message ID 1557972484-24599-1-git-send-email-kugan.vivekanandarajah@linaro.org
Headers show
Series Fix redundant ptest instruction | expand

Message

Kugan Vivekanandarajah May 16, 2019, 2:08 a.m. UTC
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>


Inorder to fix this PR.
 * We need to change the whilelo pattern in backend
 * Change RTL CSE such that:
   - Add support for VEC_DUPLICATE
   - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the
     parallel rtx at the end.

For example, with patch1, we now have rtl insn as follows:

(insn 19 18 20 3 (parallel [
            (set (reg:VNx4BI 93 [ next_mask_18 ])
                (unspec:VNx4BI [
                        (const_int 0 [0])
                        (reg:DI 95 [ _33 ])
                    ] UNSPEC_WHILE_LO))
            (set (reg:CC 66 cc)
                (compare:CC (unspec:SI [
                            (vec_duplicate:VNx4BI (const_int 1 [0x1]))
                            (reg:VNx4BI 93 [ next_mask_18 ])
                        ] UNSPEC_PTEST_PTRUE)
                    (const_int 0 [0])))
        ]) 4244 {while_ultdivnx4bi}

When cse_insn process the first, it records the CSE set in reg 93.  Then after
processing both the instruction in the parallel rtx, we invalidate all
expression with reg 93 which means expression in the second instruction is
invalidated for CSE. Attached patch relaxes this by invalidating before processing the
second.

Bootstrap and regression testing for the current version is ongoing.

Thanks,
Kugan

Kugan Vivekanandarajah (2):
  [PR88836][aarch64] Set CC_REGNUM instead of clobber
  [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one

 gcc/config/aarch64/aarch64-sve.md          |  9 +++-
 gcc/cse.c                                  | 67 ++++++++++++++++++++++++++----
 gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++
 3 files changed, 80 insertions(+), 10 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c

-- 
2.7.4

Comments

Richard Sandiford May 16, 2019, 8:13 a.m. UTC | #1
kugan.vivekanandarajah@linaro.org writes:
> From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>

>

> Inorder to fix this PR.

>  * We need to change the whilelo pattern in backend

>  * Change RTL CSE such that:

>    - Add support for VEC_DUPLICATE

>    - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the

>      parallel rtx at the end.

>

> For example, with patch1, we now have rtl insn as follows:

>

> (insn 19 18 20 3 (parallel [

>             (set (reg:VNx4BI 93 [ next_mask_18 ])

>                 (unspec:VNx4BI [

>                         (const_int 0 [0])

>                         (reg:DI 95 [ _33 ])

>                     ] UNSPEC_WHILE_LO))

>             (set (reg:CC 66 cc)

>                 (compare:CC (unspec:SI [

>                             (vec_duplicate:VNx4BI (const_int 1 [0x1]))

>                             (reg:VNx4BI 93 [ next_mask_18 ])

>                         ] UNSPEC_PTEST_PTRUE)

>                     (const_int 0 [0])))

>         ]) 4244 {while_ultdivnx4bi}

>

> When cse_insn process the first, it records the CSE set in reg 93.  Then after

> processing both the instruction in the parallel rtx, we invalidate all

> expression with reg 93 which means expression in the second instruction is

> invalidated for CSE. Attached patch relaxes this by invalidating before processing the

> second.


As far as patch 1 goes: the traditional reason for using clobbers
to start with is that:

- setting CC places a requirement on what CC must be after that instruction.
  We then have to rely on REG_UNUSED notes to tell whether that value
  actually matters or not.

  This was a bigger deal before df though.  It might not matter as much now.

- many passes find it harder to deal with multiple sets rather than
  single sets, so it's better to keep a single_set unless we know
  that both results are needed.

It's currently combine's job to create a multiple set in cases
where both results are useful.  The pattern for that already exists
(*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something
like patch 1, we should simply expand to that insn rather than adding a
new one.  Note that:

  (vec_duplicate:PRED_ALL (const_int 1))

shouldn't appear in the insn stream.  It should always be a const_vector
instead.

From a quick check, I haven't yet found a case where setting CC up-front
hurts though, so maybe the above is out-of-date best practice and we
should set the register up-front after all, if only to reduce the number
of patterns.

However, if possible, I think we should fix the PR in a way that works
for instructions that only optionally set the flags (which for AArch64
is the common case).  So it would be good if we could fix the PR without
needing patch 1.

Thanks,
Richard

>

> Bootstrap and regression testing for the current version is ongoing.

>

> Thanks,

> Kugan

>

> Kugan Vivekanandarajah (2):

>   [PR88836][aarch64] Set CC_REGNUM instead of clobber

>   [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one

>

>  gcc/config/aarch64/aarch64-sve.md          |  9 +++-

>  gcc/cse.c                                  | 67 ++++++++++++++++++++++++++----

>  gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++

>  3 files changed, 80 insertions(+), 10 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c
Kugan Vivekanandarajah June 20, 2019, 4:55 a.m. UTC | #2
Hi Richard,

Thanks for your comments.

On Thu, 16 May 2019 at 18:13, Richard Sandiford
<richard.sandiford@arm.com> wrote:
>

> kugan.vivekanandarajah@linaro.org writes:

> > From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>

> >

> > Inorder to fix this PR.

> >  * We need to change the whilelo pattern in backend

> >  * Change RTL CSE such that:

> >    - Add support for VEC_DUPLICATE

> >    - When handling PARALLEL rtx in cse_insn, we kill CSE defined by all the

> >      parallel rtx at the end.

> >

> > For example, with patch1, we now have rtl insn as follows:

> >

> > (insn 19 18 20 3 (parallel [

> >             (set (reg:VNx4BI 93 [ next_mask_18 ])

> >                 (unspec:VNx4BI [

> >                         (const_int 0 [0])

> >                         (reg:DI 95 [ _33 ])

> >                     ] UNSPEC_WHILE_LO))

> >             (set (reg:CC 66 cc)

> >                 (compare:CC (unspec:SI [

> >                             (vec_duplicate:VNx4BI (const_int 1 [0x1]))

> >                             (reg:VNx4BI 93 [ next_mask_18 ])

> >                         ] UNSPEC_PTEST_PTRUE)

> >                     (const_int 0 [0])))

> >         ]) 4244 {while_ultdivnx4bi}

> >

> > When cse_insn process the first, it records the CSE set in reg 93.  Then after

> > processing both the instruction in the parallel rtx, we invalidate all

> > expression with reg 93 which means expression in the second instruction is

> > invalidated for CSE. Attached patch relaxes this by invalidating before processing the

> > second.

>

> As far as patch 1 goes: the traditional reason for using clobbers

> to start with is that:

>

> - setting CC places a requirement on what CC must be after that instruction.

>   We then have to rely on REG_UNUSED notes to tell whether that value

>   actually matters or not.

>

>   This was a bigger deal before df though.  It might not matter as much now.

>

> - many passes find it harder to deal with multiple sets rather than

>   single sets, so it's better to keep a single_set unless we know

>   that both results are needed.

>

> It's currently combine's job to create a multiple set in cases

> where both results are useful.  The pattern for that already exists

> (*while_ult<GPI:mode><PRED_ALL:mode>_cc), so if we do go for something

> like patch 1, we should simply expand to that insn rather than adding a

> new one.  Note that:

>

>   (vec_duplicate:PRED_ALL (const_int 1))

>

> shouldn't appear in the insn stream.  It should always be a const_vector

> instead.

>

> From a quick check, I haven't yet found a case where setting CC up-front

> hurts though, so maybe the above is out-of-date best practice and we

> should set the register up-front after all, if only to reduce the number

> of patterns.

>

> However, if possible, I think we should fix the PR in a way that works

> for instructions that only optionally set the flags (which for AArch64

> is the common case).  So it would be good if we could fix the PR without

> needing patch 1.


Do you think that combine should be able to set this. Sorry, I don't
understand how we can let other passes know that this instruction will
set the flags needed.

Thanks,
Kugan

>

> Thanks,

> Richard

>

> >

> > Bootstrap and regression testing for the current version is ongoing.

> >

> > Thanks,

> > Kugan

> >

> > Kugan Vivekanandarajah (2):

> >   [PR88836][aarch64] Set CC_REGNUM instead of clobber

> >   [PR88836][aarch64] Fix CSE to process parallel rtx dest one by one

> >

> >  gcc/config/aarch64/aarch64-sve.md          |  9 +++-

> >  gcc/cse.c                                  | 67 ++++++++++++++++++++++++++----

> >  gcc/testsuite/gcc.target/aarch64/pr88836.c | 14 +++++++

> >  3 files changed, 80 insertions(+), 10 deletions(-)

> >  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr88836.c