[RFC,00/23] target/arm: decode generator and initial sve patches

Message ID	20171218174552.18871-1-richard.henderson@linaro.org
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, qemu-arm@nongnu.org
Date: Mon, 18 Dec 2017 09:45:29 -0800
Subject: [Qemu-devel] [RFC 00/23] target/arm: decode generator and initial sve patches

Richard Henderson Dec. 18, 2017, 5:45 p.m. UTC

The most important part here, for review, is the first patch.

I add a code generator, written in Python, which takes an input file
that describes the opcode bits and field bits of the instructions,
and outputs a function that does all of the decoding.
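For readers new to the idea, here is a minimal hand-written C sketch of what a single decode pattern reduces to. It uses the predicated ADD (vectors) encoding that appears in the generated snippet later in this thread. The struct and function names are hypothetical; this illustrates the technique, not decodetree's input syntax or its exact output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical argument struct, loosely modeled on the f_rprr_esz format
 * in the generated code; the field names are assumptions. */
typedef struct {
    int rd, pg, rn, rm, esz;
} arg_rprr_esz;

/* Extract LENGTH bits starting at bit START (same semantics as QEMU's
 * extract32()). */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* One pattern, "00000100 ..000000 000..... ........": the 0/1 positions
 * become a mask/value test, the '.' positions become operand fields. */
static bool decode_ADD_zpzz(uint32_t insn, arg_rprr_esz *a)
{
    const uint32_t mask  = 0xff3fe000u;   /* bits 31-24 and 21-13 are fixed */
    const uint32_t value = 0x04000000u;

    if ((insn & mask) != value) {
        return false;
    }
    a->esz = (int)extract32(insn, 22, 2);  /* element size */
    a->pg  = (int)extract32(insn, 10, 3);  /* governing predicate */
    a->rm  = (int)extract32(insn, 5, 5);   /* second source vector */
    a->rd  = (int)extract32(insn, 0, 5);   /* destination ... */
    a->rn  = a->rd;                        /* ... which is also the first source */
    return true;
}
```

For example, 0x04800861 (ADD z1.s, p2/m, z1.s, z3.s) yields rd = rn = 1, pg = 2, rm = 3 and esz = 2, while 0x04810861 fails the fixed-bit test because bit 16 is set (in the snippet, that bit pattern selects SUB instead). A generator earns its keep by doing this for hundreds of patterns and merging the shared tests into one switch tree.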

The subsequent patches begin to add SVE support and also demonstrate
how I envision both the decoder and the TCG host vector support
being used.  Thus, review of the direction would be appreciated
before there are another 100 patches in the same style.


r~


Richard Henderson (23):
  scripts: Add decodetree.py
  target/arm: Add SVE decode skeleton
  target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  target/arm: Implement PTRUE, PFALSE, SETFFR
  target/arm: Implement SVE predicate logical operations
  target/arm: Implement SVE load vector/predicate
  target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  target/arm: Handle SVE registers in write_fp_dreg
  target/arm: Handle SVE registers when using clear_vec_high
  target/arm: Implement SVE Integer Reduction Group
  target/arm: Implement SVE bitwise shift by immediate (predicated)
  target/arm: Implement SVE bitwise shift by vector (predicated)
  target/arm: Implement SVE bitwise shift by wide elements (predicated)
  target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  target/arm: Implement SVE Integer Multiply-Add Group
  target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  target/arm: Implement SVE Index Generation Group
  target/arm: Implement SVE Stack Allocation Group
  target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  target/arm: Implement SVE Compute Vector Address Group
  target/arm: Implement SVE floating-point exponential accelerator
  target/arm: Implement SVE floating-point trig select coefficient
  target/arm: Implement SVE Element Count Group, register destinations

 target/arm/helper-sve.h    |  409 ++++++++++++++
 target/arm/helper.h        |    1 +
 target/arm/translate-a64.h |  112 ++++
 target/arm/sve_helper.c    | 1177 +++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |  272 +++------
 target/arm/translate-sve.c | 1313 ++++++++++++++++++++++++++++++++++++++++++++
 .gitignore                 |    1 +
 scripts/decodetree.py      |  984 +++++++++++++++++++++++++++++++++
 target/arm/Makefile.objs   |   11 +
 target/arm/sve.def         |  328 +++++++++++
 10 files changed, 4418 insertions(+), 190 deletions(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 target/arm/sve_helper.c
 create mode 100644 target/arm/translate-sve.c
 create mode 100755 scripts/decodetree.py
 create mode 100644 target/arm/sve.def

-- 
2.14.3

Peter Maydell Jan. 11, 2018, 5:56 p.m. UTC | #1

On 18 December 2017 at 17:45, Richard Henderson
<richard.henderson@linaro.org> wrote:
> The most important part here, for review, is the first patch.
>
> I add a code generator, written in Python, which takes an input file
> that describes the opcode bits and field bits of the instructions,
> and outputs a function that does all of the decoding.
>
> The subsequent patches begin to add SVE support and also demonstrate
> how I envision both the decoder and the TCG host vector support
> being used.  Thus, review of the direction would be appreciated
> before there are another 100 patches in the same style.

This doesn't apply to master -- do you have an example of
what the generated code comes out like?

thanks
-- PMM

Richard Henderson Jan. 11, 2018, 7:23 p.m. UTC | #2

On 01/11/2018 09:56 AM, Peter Maydell wrote:
> On 18 December 2017 at 17:45, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> The most important part here, for review, is the first patch.
>>
>> I add a code generator, written in Python, which takes an input file
>> that describes the opcode bits and field bits of the instructions,
>> and outputs a function that does all of the decoding.
>>
>> The subsequent patches begin to add SVE support and also demonstrate
>> how I envision both the decoder and the TCG host vector support
>> being used.  Thus, review of the direction would be appreciated
>> before there are another 100 patches in the same style.
>
> This doesn't apply to master -- do you have an example of
> what the generated code comes out like?

That's why I gave you a link to a buildable branch on Tuesday.

But here are some snippets from what's currently in my tree.

Note that I play games with the decode and translation such that e.g. SETFFR ->
PTRUE p16, all; RDFFR pd -> ORR pd, p16, p16, p16.  That's what you'll be
seeing in the last dozen lines.  But I also chose that snippet because it shows
the nesting when instruction subsets need to decode more bits.
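A hedged C sketch of the aliasing trick just described, assuming (as the message implies) that FFR is held as predicate register 16. The argument structs and the stub translators are hypothetical stand-ins that only record their arguments; the encodings are SETFFR (0x252c9000) and unpredicated RDFFR (00100101 00011001 11110000 0000dddd, the last pattern in the snippet below).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical argument structs, named after the f_ptrue and f_rprr_s
 * formats in the generated code; the field sets are assumptions. */
typedef struct { int rd, pat, esz, s; } arg_ptrue;
typedef struct { int rd, pg, rn, rm, s; } arg_rprr_s;

/* Assumption: FFR lives in predicate slot 16, after p0..p15. */
enum { FFR_PRED = 16, PAT_ALL = 31 };

/* Stub translators: real ones would emit TCG ops.  These only record
 * their arguments so the aliasing can be observed. */
static arg_ptrue last_ptrue;
static arg_rprr_s last_orr;

static bool trans_PTRUE(const arg_ptrue *a)
{
    assert(a->rd >= 0 && a->rd <= FFR_PRED);
    last_ptrue = *a;
    return true;
}

static bool trans_ORR_pppp(const arg_rprr_s *a)
{
    assert(a->rd >= 0 && a->rd <= FFR_PRED);
    last_orr = *a;
    return true;
}

/* SETFFR has no operands: treat it as PTRUE p16.B, ALL. */
static bool decode_SETFFR(uint32_t insn)
{
    if (insn != 0x252c9000u) {
        return false;
    }
    arg_ptrue a = { .rd = FFR_PRED, .pat = PAT_ALL, .esz = 0, .s = 0 };
    return trans_PTRUE(&a);
}

/* Unpredicated RDFFR Pd.B: treat it as ORR Pd.B, p16/Z, p16.B, p16.B,
 * which computes FFR & (FFR | FFR) == FFR. */
static bool decode_RDFFR(uint32_t insn)
{
    if ((insn & 0xfffffff0u) != 0x2519f000u) {
        return false;
    }
    arg_rprr_s a = { .rd = (int)(insn & 0xf), .pg = FFR_PRED,
                     .rn = FFR_PRED, .rm = FFR_PRED, .s = 0 };
    return trans_ORR_pppp(&a);
}
```

The design point is that only PTRUE and ORR need translators that actually generate code; SETFFR and RDFFR each cost one decode entry that fills in fixed register numbers.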


r~


    switch ((insn >> 24) & 0xff) {
    case 0x4:
        /* 00000100 ........ ........ ........ */
        switch (insn & 0x0020e000) {
        case 0x00000000:
            /* 00000100 ..0..... 000..... ........ */
            switch ((insn >> 16) & 0x1f) {
            case 0x0:
                /* 00000100 ..000000 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_ADD_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x1:
                /* 00000100 ..000001 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_SUB_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x3:
                /* 00000100 ..000011 000..... ........ */
                extract_rdm_pg_rn_esz(&u.f_rprr_esz, insn);
                trans_SUB_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;
            case 0x8:
                /* 00000100 ..001000 000..... ........ */
                extract_rdn_pg_rm_esz(&u.f_rprr_esz, insn);
                trans_SMAX_zpzz(ctx, &u.f_rprr_esz, insn);
                return true;

...

            case 0x00100000:
                /* 00100101 ..01.... 11...... ...0.... */
                switch ((insn >> 17) & 0x7) {
                case 0x0:
                    /* 00100101 ..01000. 11...... ...0.... */
                    extract_Fmt_42(&u.f_22, insn);
                    switch (insn & 0x00c1020f) {
                    case 0x00400000:
                        /* 00100101 01010000 11....0. ...00000 */
                        trans_PTEST(ctx, &u.f_22, insn);
                        return true;
                    }
                    return false;
                case 0x4:
                    /* 00100101 ..01100. 11...... ...0.... */
                    switch ((insn >> 10) & 0xf) {
                    case 0x0:
                        /* 00100101 ..01100. 110000.. ...0.... */
                        extract_pd_pn(&u.f_rr_esz, insn);
                        switch (insn & 0x00c10200) {
                        case 0x00400000:
                            /* 00100101 01011000 1100000. ...0.... */
                            trans_PFIRST(ctx, &u.f_rr_esz, insn);
                            return true;
                        }
                        return false;
                    case 0x1:
                        /* 00100101 ..01100. 110001.. ...0.... */
                        extract_pd_pn_esz(&u.f_rr_esz, insn);
                        switch (insn & 0x00010200) {
                        case 0x00010000:
                            /* 00100101 ..011001 1100010. ...0.... */
                            trans_PNEXT(ctx, &u.f_rr_esz, insn);
                            return true;
                        }
                        return false;
                    case 0x8:
                        /* 00100101 ..01100. 111000.. ...0.... */
                        extract_Fmt_43(&u.f_ptrue, insn);
                        trans_PTRUE(ctx, &u.f_ptrue, insn);
                        return true;
                    case 0x9:
                        /* 00100101 ..01100. 111001.. ...0.... */
                        extract_Fmt_45(&u.f_ptrue, insn);
                        switch (insn & 0x00c103e0) {
                        case 0x00000000:
                            /* 00100101 00011000 11100100 0000.... */
                            trans_PTRUE(ctx, &u.f_ptrue, insn);
                            return true;
                        }
                        return false;
                    case 0xc:
                        /* 00100101 ..01100. 111100.. ...0.... */
                        switch (insn & 0x00810200) {
                        case 0x00000000:
                            /* 00100101 0.011000 1111000. ...0.... */
                            extract_Fmt_46(&u.f_rprr_s, insn);
                            trans_ORR_pppp(ctx, &u.f_rprr_s, insn);
                            return true;
                        case 0x00010000:
                            /* 00100101 0.011001 1111000. ...0.... */
                            extract_Fmt_47(&u.f_rprr_s, insn);
                            switch (insn & 0x004001e0) {
                            case 0x00000000:
                                /* 00100101 00011001 11110000 0000.... */
                                trans_ORR_pppp(ctx, &u.f_rprr_s, insn);
                                return true;
                            }
                            return false;
                        }
                        return false;
                    }
                    return false;
                }
                return false;
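The nesting can be reproduced by hand. The following is a simplified C sketch, not necessarily the algorithm decodetree.py uses: take the PTRUE (case 0x8) and PFALSE (case 0x9) patterns above, test first the bits fixed in both, then check whatever fixed bits remain. The type and function names are hypothetical. The residual mask for PFALSE comes out as 0x00c103e0, the same constant as the inner switch in case 0x9, while PTRUE has none, matching its unconditional case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A decode pattern: 1 bits in MASK are fixed opcode bits, and VALUE gives
 * their required settings (the '0'/'1' positions in the snippet comments). */
typedef struct {
    const char *name;
    uint32_t mask;
    uint32_t value;
} pattern;

/* 00100101 ..01100. 111000.. ...0....  (case 0x8 above) */
static const pattern PTRUE_pat  = { "PTRUE",  0xff3efc10u, 0x2518e000u };
/* 00100101 00011000 11100100 0000....  (case 0x9 above) */
static const pattern PFALSE_pat = { "PFALSE", 0xfffffff0u, 0x2518e400u };

/* Bits fixed in every pattern of the group: safe to switch on first. */
static uint32_t common_mask(const pattern *const *pats, size_t n)
{
    uint32_t m = ~0u;
    for (size_t i = 0; i < n; i++) {
        m &= pats[i]->mask;
    }
    return m;
}

/* Fixed bits still unchecked after the outer switch; a nonzero result
 * becomes a nested "switch (insn & residual)" in the generated code. */
static uint32_t residual_mask(const pattern *p, uint32_t outer)
{
    return p->mask & ~outer;
}

/* Two-level match over a group: outer test on shared bits, then the rest. */
static const pattern *decode_group(uint32_t insn,
                                   const pattern *const *pats, size_t n)
{
    uint32_t outer = common_mask(pats, n);

    for (size_t i = 0; i < n; i++) {
        const pattern *p = pats[i];
        uint32_t inner = residual_mask(p, outer);

        assert((p->value & ~p->mask) == 0);   /* value only sets fixed bits */
        if ((insn & outer) == (p->value & outer) &&
            (insn & inner) == (p->value & inner)) {
            return p;
        }
    }
    return NULL;
}
```

With the group { &PTRUE_pat, &PFALSE_pat }, 0x2518e3e5 (PTRUE p5.b, ALL) matches PTRUE, 0x2518e405 (PFALSE p5.b) matches PFALSE, and 0x2518e425 falls through to NULL because a must-be-zero bit is set, just as the generated case 0x9 returns false.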

Peter Maydell Jan. 11, 2018, 7:27 p.m. UTC | #3

On 11 January 2018 at 19:23, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 09:56 AM, Peter Maydell wrote:
>> On 18 December 2017 at 17:45, Richard Henderson
>> <richard.henderson@linaro.org> wrote:
>>> The most important part here, for review, is the first patch.
>>>
>>> I add a code generator, written in Python, which takes an input file
>>> that describes the opcode bits and field bits of the instructions,
>>> and outputs a function that does all of the decoding.
>>>
>>> The subsequent patches begin to add SVE support and also demonstrate
>>> how I envision both the decoder and the TCG host vector support
>>> being used.  Thus, review of the direction would be appreciated
>>> before there are another 100 patches in the same style.
>>
>> This doesn't apply to master -- do you have an example of
>> what the generated code comes out like?
>
> That's why I gave you a link to a buildable branch on Tuesday.

tgt-arm-cplx? I had a look at that, but it seemed to be a different
set of patches from this lot.

thanks
-- PMM

Richard Henderson Jan. 11, 2018, 7:34 p.m. UTC | #4

On 01/11/2018 11:27 AM, Peter Maydell wrote:
> On 11 January 2018 at 19:23, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>> On 01/11/2018 09:56 AM, Peter Maydell wrote:
>>> On 18 December 2017 at 17:45, Richard Henderson
>>> <richard.henderson@linaro.org> wrote:
>>>> The most important part here, for review, is the first patch.
>>>>
>>>> I add a code generator, written in Python, which takes an input file
>>>> that describes the opcode bits and field bits of the instructions,
>>>> and outputs a function that does all of the decoding.
>>>>
>>>> The subsequent patches begin to add SVE support and also demonstrate
>>>> how I envision both the decoder and the TCG host vector support
>>>> being used.  Thus, review of the direction would be appreciated
>>>> before there are another 100 patches in the same style.
>>>
>>> This doesn't apply to master -- do you have an example of
>>> what the generated code comes out like?
>>
>> That's why I gave you a link to a buildable branch on Tuesday.
>
> tgt-arm-cplx? I had a look at that, but it seemed to be a different
> set of patches from this lot.

No, tgt-arm-sve-{1,2,3}.  All of them should build.

tgt-arm-cplx is the armv8.{1,3} stuff that's waiting on fp16 to go in.


r~

Peter Maydell Jan. 12, 2018, 12:42 p.m. UTC | #5

On 11 January 2018 at 19:34, Richard Henderson
<richard.henderson@linaro.org> wrote:
> On 01/11/2018 11:27 AM, Peter Maydell wrote:
>> tgt-arm-cplx? I had a look at that, but it seemed to be a different
>> set of patches from this lot.
>
> No, tgt-arm-sve-{1,2,3}.  All of them should build.

Thank you. I've now had a look at the generated code and
reviewed patch 1. I had a quick look through some of the
later patches, mostly to check how the use of the script
looks, but I don't intend to review them properly at this
point (unless you think I should).

PS: a quote which kept coming to mind while I was reading
this patchset:

  'Er, what does the Z mean?' said Zaphod.
  'Which one?'
  'Any one.'

thanks
-- PMM
