[v2,00/67] target/arm: Scalable Vector Extension


Message ID

20180217182323.25885-1-richard.henderson@linaro.org

Headers

Received-SPF: pass (google.com: domain of
	qemu-devel-bounces+patch=linaro.org@nongnu.org designates
	2001:4830:134:3::11 as permitted sender)
	client-ip=2001:4830:134:3::11; 
From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Date: Sat, 17 Feb 2018 10:22:16 -0800
Message-Id: <20180217182323.25885-1-richard.henderson@linaro.org>
Subject: [Qemu-devel] [PATCH v2 00/67] target/arm: Scalable Vector Extension
Precedence: list
Cc: qemu-arm@nongnu.org
Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org
Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>

Series

target/arm: Scalable Vector Extension

Message

Richard Henderson Feb. 17, 2018, 6:22 p.m. UTC

This is 99% of the instruction set.  There are a few things missing,
notably first-fault and non-fault loads (even these are decoded, but
simply treated as normal loads for now).
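First-fault loads differ from normal loads only in how a fault on a later element is handled. The following is a conceptual sketch in C, not QEMU's implementation. It assumes one 32-bit element per lane, with predicates and the first-fault register (FFR) modelled as `bool` arrays, and a hypothetical `faults` array standing in for the MMU. Only a fault on the first active element is taken. A fault on a later active element is suppressed and recorded by clearing FFR from that element onward. Non-fault loads, which suppress faults on every element, are not modelled. `load_normal` shows the behaviour this series currently gives these instructions: a fault on any active element is taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Conceptual model of a first-fault contiguous load (LDFF1-style).
 * `faults` is a hypothetical stand-in for the MMU: faults[i] is true if
 * reading element i would fault.  Returns false if the fault must be
 * taken (the first active element faults).  Elements from a suppressed
 * fault onward are not modelled and keep their previous contents in dst.
 */
static bool load_ff(const int32_t *src, int32_t *dst, const bool *pg,
                    const bool *faults, bool *ffr, size_t n)
{
    bool seen_active = false;

    assert(src && dst && pg && faults && ffr);
    for (size_t i = 0; i < n; i++) {
        if (!pg[i]) {
            dst[i] = 0;              /* inactive elements are zeroed */
            continue;
        }
        if (faults[i]) {
            if (!seen_active) {
                return false;        /* first active element: take it */
            }
            for (size_t j = i; j < n; j++) {
                ffr[j] = false;      /* suppress, record in FFR */
            }
            return true;
        }
        dst[i] = src[i];
        seen_active = true;
    }
    return true;
}

/* A normal predicated load: a fault on any active element is taken. */
static bool load_normal(const int32_t *src, int32_t *dst, const bool *pg,
                        const bool *faults, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (pg[i] && faults[i]) {
            return false;
        }
    }
    for (size_t i = 0; i < n; i++) {
        dst[i] = pg[i] ? src[i] : 0;
    }
    return true;
}
```

With a fault on the third of four elements, `load_ff` succeeds and clears the last two FFR bits, while `load_normal` reports a fault. A program that deliberately reads past the end of mapped memory with a first-fault load could therefore take a fault under the current simplification that real hardware would suppress.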

The patch set is dependent on at least 3 other branches.
A fully composed tree is available as

  git://github.com/rth7680/qemu.git tgt-arm-sve-7

There are a few checkpatch errors due to macros and typedefs, but
nothing that isn't obviously a false positive.
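Helpers of this kind are commonly generated by expanding one macro body per element size, which is the style checkpatch tends to flag. The sketch below is a simplified, hypothetical illustration of that pattern, not code from this series. `DO_PRED_OP`, `pred_bit` and the `sve_add_*` names are made up here. It does follow the SVE predicate layout: one predicate bit per vector byte, so an element of `sizeof(TYPE)` bytes at index `i` is governed by bit `i * sizeof(TYPE)`. It also uses merging predication, in which inactive destination elements keep their old value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read predicate bit `bit` (one bit per byte of the vector). */
static int pred_bit(const uint8_t *pg, size_t bit)
{
    return (pg[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Hypothetical macro generating one merging-predicated binary helper per
 * element type: active elements get OP(n, m), inactive ones keep d.
 * oprsz is the vector length in bytes.
 */
#define DO_PRED_OP(NAME, TYPE, OP)                                   \
static void NAME(void *vd, const void *vn, const void *vm,           \
                 const uint8_t *pg, size_t oprsz)                    \
{                                                                    \
    TYPE *d = vd;                                                    \
    const TYPE *n = vn, *m = vm;                                     \
    assert(oprsz % sizeof(TYPE) == 0);                               \
    for (size_t i = 0; i < oprsz / sizeof(TYPE); i++) {              \
        if (pred_bit(pg, i * sizeof(TYPE))) {                        \
            d[i] = OP(n[i], m[i]);                                   \
        }                                                            \
    }                                                                \
}

#define ADD(A, B) ((A) + (B))

/* One expansion per element size: byte, half, single, double word. */
DO_PRED_OP(sve_add_b, uint8_t,  ADD)
DO_PRED_OP(sve_add_h, uint16_t, ADD)
DO_PRED_OP(sve_add_s, uint32_t, ADD)
DO_PRED_OP(sve_add_d, uint64_t, ADD)
```

With a 16-byte vector and predicate bytes `{0xFF, 0x00}`, all eight byte elements in the low half are active for `sve_add_b`, but only elements 0 and 1 are active for `sve_add_s`, because their governing bits are 0, 4, 8 and 12.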

This is able to run SVE-enabled Himeno and LULESH benchmarks as
compiled by last week's gcc-8:

$ ./aarch64-linux-user/qemu-aarch64 ~/himeno-advsimd
mimax = 129 mjmax = 65 mkmax = 65
imax = 128 jmax = 64 kmax =64
cpu : 67.028643 sec.
Loop executed for 200 times
Gosa : 1.688752e-03 
MFLOPS measured : 49.136295
Score based on MMX Pentium 200MHz : 1.522662

$ ./aarch64-linux-user/qemu-aarch64 ~/himeno-sve 
mimax = 129 mjmax = 65 mkmax = 65
imax = 128 jmax = 64 kmax =64
cpu : 43.481213 sec.
Loop executed for 200 times
Gosa : 3.830036e-06 
MFLOPS measured : 75.746259
Score based on MMX Pentium 200MHz : 2.347266
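To put the two Himeno runs side by side: CPU time drops from about 67.0 s to 43.5 s, and MFLOPS rises from about 49.1 to 75.7. Both measures give a speedup of roughly 1.54 for the SVE build over the AdvSIMD build under qemu-aarch64. The small C helper below only performs that arithmetic on the figures reported above. The variable names are chosen for this sketch.

```c
#include <assert.h>

/* Figures reported above for the two Himeno builds under qemu-aarch64. */
static const double advsimd_sec = 67.028643, sve_sec = 43.481213;
static const double advsimd_mflops = 49.136295, sve_mflops = 75.746259;

/* Ratio of two positive quantities, e.g. old time / new time. */
static double ratio(double num, double den)
{
    assert(num > 0.0 && den > 0.0);
    return num / den;
}
```

`ratio(advsimd_sec, sve_sec)` and `ratio(sve_mflops, advsimd_mflops)` both come out at approximately 1.5415.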

Hopefully the size of the patch set isn't too daunting...


r~


Richard Henderson (67):
  target/arm: Enable SVE for aarch64-linux-user
  target/arm: Introduce translate-a64.h
  target/arm: Add SVE decode skeleton
  target/arm: Implement SVE Bitwise Logical - Unpredicated Group
  target/arm: Implement SVE load vector/predicate
  target/arm: Implement SVE predicate test
  target/arm: Implement SVE Predicate Logical Operations Group
  target/arm: Implement SVE Predicate Misc Group
  target/arm: Implement SVE Integer Binary Arithmetic - Predicated Group
  target/arm: Implement SVE Integer Reduction Group
  target/arm: Implement SVE bitwise shift by immediate (predicated)
  target/arm: Implement SVE bitwise shift by vector (predicated)
  target/arm: Implement SVE bitwise shift by wide elements (predicated)
  target/arm: Implement SVE Integer Arithmetic - Unary Predicated Group
  target/arm: Implement SVE Integer Multiply-Add Group
  target/arm: Implement SVE Integer Arithmetic - Unpredicated Group
  target/arm: Implement SVE Index Generation Group
  target/arm: Implement SVE Stack Allocation Group
  target/arm: Implement SVE Bitwise Shift - Unpredicated Group
  target/arm: Implement SVE Compute Vector Address Group
  target/arm: Implement SVE floating-point exponential accelerator
  target/arm: Implement SVE floating-point trig select coefficient
  target/arm: Implement SVE Element Count Group
  target/arm: Implement SVE Bitwise Immediate Group
  target/arm: Implement SVE Integer Wide Immediate - Predicated Group
  target/arm: Implement SVE Permute - Extract Group
  target/arm: Implement SVE Permute - Unpredicated Group
  target/arm: Implement SVE Permute - Predicates Group
  target/arm: Implement SVE Permute - Interleaving Group
  target/arm: Implement SVE compress active elements
  target/arm: Implement SVE conditionally broadcast/extract element
  target/arm: Implement SVE copy to vector (predicated)
  target/arm: Implement SVE reverse within elements
  target/arm: Implement SVE vector splice (predicated)
  target/arm: Implement SVE Select Vectors Group
  target/arm: Implement SVE Integer Compare - Vectors Group
  target/arm: Implement SVE Integer Compare - Immediate Group
  target/arm: Implement SVE Partition Break Group
  target/arm: Implement SVE Predicate Count Group
  target/arm: Implement SVE Integer Compare - Scalars Group
  target/arm: Implement FDUP/DUP
  target/arm: Implement SVE Integer Wide Immediate - Unpredicated Group
  target/arm: Implement SVE Floating Point Arithmetic - Unpredicated
    Group
  target/arm: Implement SVE Memory Contiguous Load Group
  target/arm: Implement SVE Memory Contiguous Store Group
  target/arm: Implement SVE load and broadcast quadword
  target/arm: Implement SVE integer convert to floating-point
  target/arm: Implement SVE floating-point arithmetic (predicated)
  target/arm: Implement SVE FP Multiply-Add Group
  target/arm: Implement SVE Floating Point Accumulating Reduction Group
  target/arm: Implement SVE load and broadcast element
  target/arm: Implement SVE store vector/predicate register
  target/arm: Implement SVE scatter stores
  target/arm: Implement SVE prefetches
  target/arm: Implement SVE gather loads
  target/arm: Implement SVE scatter store vector immediate
  target/arm: Implement SVE floating-point compare vectors
  target/arm: Implement SVE floating-point arithmetic with immediate
  target/arm: Implement SVE Floating Point Multiply Indexed Group
  target/arm: Implement SVE FP Fast Reduction Group
  target/arm: Implement SVE Floating Point Unary Operations -
    Unpredicated Group
  target/arm: Implement SVE FP Compare with Zero Group
  target/arm: Implement SVE floating-point trig multiply-add coefficient
  target/arm: Implement SVE floating-point convert precision
  target/arm: Implement SVE floating-point convert to integer
  target/arm: Implement SVE floating-point round to integral value
  target/arm: Implement SVE floating-point unary operations
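Several of the predicate patches above, such as the predicate test, depend on how SVE derives NZCV from a predicate. The sketch below models the architectural rule with one `bool` per element rather than the packed bit representation a real implementation would use. N is set if the first active element is true, Z is set if no active element is true, C is set if the last active element is not true, and V is cleared. The `nzcv` struct and `pred_test` function are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag container for this sketch. */
struct nzcv {
    bool n, z, c, v;
};

/*
 * Model of the SVE predicate-test rule: `mask` is the governing
 * predicate and `result` the predicate being tested, one bool each.
 */
static struct nzcv pred_test(const bool *mask, const bool *result, size_t n)
{
    bool first_seen = false, first = false, any = false, last = false;

    assert(mask && result);
    for (size_t i = 0; i < n; i++) {
        if (!mask[i]) {
            continue;               /* only active elements count */
        }
        if (!first_seen) {
            first = result[i];      /* value of the first active element */
            first_seen = true;
        }
        any |= result[i];
        last = result[i];           /* ends as the last active element */
    }
    return (struct nzcv){ .n = first, .z = !any, .c = !last, .v = false };
}
```

With no active elements at all, the rule yields N=0, Z=1, C=1, V=0, since there is no true first element, no true element, and no true last element.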

 target/arm/cpu.h           |    7 +-
 target/arm/helper-sve.h    | 1285 ++++++++++++
 target/arm/helper.h        |   42 +
 target/arm/translate-a64.h |  110 ++
 target/arm/cpu.c           |    7 +
 target/arm/cpu64.c         |    1 +
 target/arm/sve_helper.c    | 4051 ++++++++++++++++++++++++++++++++++++++
 target/arm/translate-a64.c |  112 +-
 target/arm/translate-sve.c | 4626 ++++++++++++++++++++++++++++++++++++++++++++
 target/arm/vec_helper.c    |  178 ++
 .gitignore                 |    1 +
 target/arm/Makefile.objs   |   12 +-
 target/arm/sve.decode      | 1067 ++++++++++
 13 files changed, 11408 insertions(+), 91 deletions(-)
 create mode 100644 target/arm/helper-sve.h
 create mode 100644 target/arm/translate-a64.h
 create mode 100644 target/arm/sve_helper.c
 create mode 100644 target/arm/translate-sve.c
 create mode 100644 target/arm/vec_helper.c
 create mode 100644 target/arm/sve.decode
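In a git diffstat, each per-file count is the number of changed lines (insertions plus deletions), so the counts should sum to the totals on the summary line. The snippet below checks this for the diffstat above. The thirteen per-file counts sum to 11499, which equals 11408 insertions plus 91 deletions. The array and function names are chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Per-file change counts, in the order of the diffstat above. */
static const int per_file[] = {
    7, 1285, 42, 110, 7, 1, 4051, 112, 4626, 178, 1, 12, 1067,
};

/* Sum n integers. */
static int sum(const int *v, size_t n)
{
    int total = 0;

    for (size_t i = 0; i < n; i++) {
        total += v[i];
    }
    return total;
}
```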

-- 
2.14.3

Comments

Alex Bennée Feb. 23, 2018, 5:05 p.m. UTC | #1

Richard Henderson <richard.henderson@linaro.org> writes:

> This is 99% of the instruction set.  There are a few things missing,
> notably first-fault and non-fault loads (even these are decoded, but
> simply treated as normal loads for now).
>
> The patch set is dependent on at least 3 other branches.
> A fully composed tree is available as
>
>   git://github.com/rth7680/qemu.git tgt-arm-sve-7

Well, now it's down to just my half-precision patches, because I was able to
apply this to my arm-fp16-v3 branch, recently rebased against master:

  https://github.com/stsquad/qemu/tree/review/sve-vectors-v2-rebase



--
Alex Bennée

Alex Bennée April 3, 2018, 3:41 p.m. UTC | #2

Richard Henderson <richard.henderson@linaro.org> writes:

> This is 99% of the instruction set.  There are a few things missing,
> notably first-fault and non-fault loads (even these are decoded, but
> simply treated as normal loads for now).

I've finished my quick pass; apart from the individual comments, I think
it looks pretty good.

--
Alex Bennée