[00/34] tcg, target/ppc vector improvements

Message ID	20181218063911.2112-1-richard.henderson@linaro.org
Headers	show Delivered-To: patch@linaro.org Received-SPF: pass (google.com: domain of qemu-devel-bounces+patch=linaro.org@nongnu.org designates 2001:4830:134:3::11 as permitted sender) client-ip=2001:4830:134:3::11; From: Richard Henderson <richard.henderson@linaro.org> To: qemu-devel@nongnu.org Date: Mon, 17 Dec 2018 22:38:37 -0800 Message-Id: <20181218063911.2112-1-richard.henderson@linaro.org> Subject: [Qemu-devel] [PATCH 00/34] tcg, target/ppc vector improvements Precedence: list Cc: qemu-ppc@nongnu.org, mark.cave-ayland@ilande.co.uk, david@gibson.dropbear.id.au Errors-To: qemu-devel-bounces+patch=linaro.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+patch=linaro.org@nongnu.org>
Series	tcg, target/ppc vector improvements \| expand [00/34] tcg, target/ppc vector improvements [01/34] tcg: Add logical simplifications during gvec expand [02/34] target/arm: Rely on optimization within tcg_gen_gvec_or [03/34] tcg: Add gvec expanders for nand, nor, eqv [04/34] tcg: Add write_aofs to GVecGen4 [05/34] tcg: Add opcodes for vector saturated arithmetic [06/34] tcg/i386: Implement vector saturating arithmetic [07/34] tcg: Add opcodes for vector minmax arithmetic [08/34] tcg/i386: Implement vector minmax arithmetic [09/34] target/arm: Use vector minmax expanders for aarch64 [10/34] target/arm: Use vector minmax expanders for aarch32 [11/34] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access [12/34] target/ppc: introduce get_avr64() and set_avr64() helpers for VMX register access [13/34] target/ppc: introduce get_cpu_vsr{l, h}() and set_cpu_vsr{l, h}() helpers for VSR registe... [14/34] target/ppc: switch FPR, VMX and VSX helpers to access data directly from cpu_env [15/34] target/ppc: merge ppc_vsr_t and ppc_avr_t union types [16/34] target/ppc: move FP and VMX registers into aligned vsr register array [17/34] target/ppc: convert VMX logical instructions to use vector operations [18/34] target/ppc: convert vaddu[b, h, w, d] and vsubu[b, h, w, d] over to use vector operations [19/34] target/ppc: convert vspltis[bhw] to use vector operations [20/34] target/ppc: convert vsplt[bhw] to use vector operations [21/34] target/ppc: nand, nor, eqv are now generic vector operations [22/34] target/ppc: convert VSX logical operations to vector operations [23/34] target/ppc: convert xxspltib to vector operations [24/34] target/ppc: convert xxspltw to vector operations [25/34] target/ppc: convert xxsel to vector operations [26/34] target/ppc: Pass integer to helper_mtvscr [27/34] target/ppc: Use helper_mtvscr for reset and gdb [28/34] target/ppc: Remove vscr_nj and vscr_sat [29/34] target/ppc: Add helper_mfvscr [30/34] target/ppc: Use mtvscr/mfvscr for vmstate [31/34] target/ppc: Add set_vscr_sat [32/34] target/ppc: Split out VSCR_SAT to a vector field [33/34] target/ppc: convert vadds and vsubs to vector operations [34/34] target/ppc: convert vmin* and vmax* to vector operations

Richard Henderson Dec. 18, 2018, 6:38 a.m. UTC

This implements some of the things that I talked about with Mark
this morning / yesterday.  In particular:

(0) Implement expanders for nand, nor, eqv logical operations.

(1) Implement saturating arithmetic for the tcg backend.

    While I had expanders for these, they always went to helpers.
    It's easy enough to expand byte and half-word operations for x86.
    Beyond that, 32 and 64-bit operations can be expanded with integers.

(2) Implement minmax arithmetic for the tcg backend.

    While I had integral minmax operations, I had not yet added
    any vector expanders for this.  (The integral stuff came in
    for atomic minmax.)

(3) Trivial conversions to minmax for target/arm.

(4) Patches 11-18 are identical to Mark's.

(5) Patches 19-25 implement splat and logicals for VMX and VSX.

    VSX is no more difficult than VMX for these.  It does seem to be
    just about everything that we can do for VSX at the momement.

(6) Patches 26-33 implement saturating arithmetic for VMX.

(7) Patch 34 implements minmax arithmetic for VMX.

I've tested the new operations via aarch64 guest, as that's the set
of risu test cases I've got handy.  The rest is untested so far.


r~


Mark Cave-Ayland (8):
  target/ppc: introduce get_fpr() and set_fpr() helpers for FP register
    access
  target/ppc: introduce get_avr64() and set_avr64() helpers for VMX
    register access
  target/ppc: introduce get_cpu_vsr{l,h}() and set_cpu_vsr{l,h}()
    helpers for VSR register access
  target/ppc: switch FPR, VMX and VSX helpers to access data directly
    from cpu_env
  target/ppc: merge ppc_vsr_t and ppc_avr_t union types
  target/ppc: move FP and VMX registers into aligned vsr register array
  target/ppc: convert VMX logical instructions to use vector operations
  target/ppc: convert vaddu[b,h,w,d] and vsubu[b,h,w,d] over to use
    vector operations

Richard Henderson (26):
  tcg: Add logical simplifications during gvec expand
  target/arm: Rely on optimization within tcg_gen_gvec_or
  tcg: Add gvec expanders for nand, nor, eqv
  tcg: Add write_aofs to GVecGen4
  tcg: Add opcodes for vector saturated arithmetic
  tcg/i386: Implement vector saturating arithmetic
  tcg: Add opcodes for vector minmax arithmetic
  tcg/i386: Implement vector minmax arithmetic
  target/arm: Use vector minmax expanders for aarch64
  target/arm: Use vector minmax expanders for aarch32
  target/ppc: convert vspltis[bhw] to use vector operations
  target/ppc: convert vsplt[bhw] to use vector operations
  target/ppc: nand, nor, eqv are now generic vector operations
  target/ppc: convert VSX logical operations to vector operations
  target/ppc: convert xxspltib to vector operations
  target/ppc: convert xxspltw to vector operations
  target/ppc: convert xxsel to vector operations
  target/ppc: Pass integer to helper_mtvscr
  target/ppc: Use helper_mtvscr for reset and gdb
  target/ppc: Remove vscr_nj and vscr_sat
  target/ppc: Add helper_mfvscr
  target/ppc: Use mtvscr/mfvscr for vmstate
  target/ppc: Add set_vscr_sat
  target/ppc: Split out VSCR_SAT to a vector field
  target/ppc: convert vadd*s and vsub*s to vector operations
  target/ppc: convert vmin* and vmax* to vector operations

 accel/tcg/tcg-runtime.h             |  23 +
 target/ppc/cpu.h                    |  30 +-
 target/ppc/helper.h                 |  57 +-
 target/ppc/internal.h               |  29 +-
 tcg/aarch64/tcg-target.h            |   2 +
 tcg/i386/tcg-target.h               |   2 +
 tcg/tcg-op-gvec.h                   |  18 +
 tcg/tcg-op.h                        |  11 +
 tcg/tcg-opc.h                       |   8 +
 tcg/tcg.h                           |   2 +
 accel/tcg/tcg-runtime-gvec.c        | 257 +++++++++
 linux-user/ppc/signal.c             |  24 +-
 target/arm/translate-a64.c          |  41 +-
 target/arm/translate-sve.c          |   6 +-
 target/arm/translate.c              |  37 +-
 target/ppc/arch_dump.c              |  15 +-
 target/ppc/gdbstub.c                |   8 +-
 target/ppc/int_helper.c             | 194 +++----
 target/ppc/machine.c                | 116 +++-
 target/ppc/monitor.c                |   4 +-
 target/ppc/translate.c              |  74 ++-
 target/ppc/translate/dfp-impl.inc.c |   2 +-
 target/ppc/translate/fp-impl.inc.c  | 490 ++++++++++++----
 target/ppc/translate/vmx-impl.inc.c | 349 +++++++-----
 target/ppc/translate/vsx-impl.inc.c | 834 +++++++++++++++++++---------
 target/ppc/translate_init.inc.c     |  31 +-
 tcg/i386/tcg-target.inc.c           | 106 ++++
 tcg/tcg-op-gvec.c                   | 305 ++++++++--
 tcg/tcg-op-vec.c                    |  75 ++-
 tcg/tcg.c                           |  10 +
 30 files changed, 2275 insertions(+), 885 deletions(-)

-- 
2.17.2

Mark Cave-Ayland Dec. 18, 2018, 9:49 a.m. UTC | #1

On 18/12/2018 06:38, Richard Henderson wrote:

> This implements some of the things that I talked about with Mark

> this morning / yesterday.  In particular:

> 

> (0) Implement expanders for nand, nor, eqv logical operations.

> 

> (1) Implement saturating arithmetic for the tcg backend.

> 

>     While I had expanders for these, they always went to helpers.

>     It's easy enough to expand byte and half-word operations for x86.

>     Beyond that, 32 and 64-bit operations can be expanded with integers.

> 

> (2) Implement minmax arithmetic for the tcg backend.

> 

>     While I had integral minmax operations, I had not yet added

>     any vector expanders for this.  (The integral stuff came in

>     for atomic minmax.)

> 

> (3) Trivial conversions to minmax for target/arm.

> 

> (4) Patches 11-18 are identical to Mark's.

> 

> (5) Patches 19-25 implement splat and logicals for VMX and VSX.

> 

>     VSX is no more difficult than VMX for these.  It does seem to be

>     just about everything that we can do for VSX at the momement.

> 

> (6) Patches 26-33 implement saturating arithmetic for VMX.

> 

> (7) Patch 34 implements minmax arithmetic for VMX.

> 

> I've tested the new operations via aarch64 guest, as that's the set

> of risu test cases I've got handy.  The rest is untested so far.

Thank you for working on this! I've just given this patchset a spin on my test images
and here's what I found:

- The version of my target/ppc patchset you've used is the one that I posted to the
mailing list which doesn't have the GEN_FLOAT macro fixes, removal of the uint64_t *
cast that you requested, and additional SoBs

I've taken this patchset, replaced my patches with the latest versions, and repushed
to github at https://github.com/mcayland/qemu/tree/ppc-altivec-rth.

- This patchset introduces visual artefacts on-screen for both OS X and OS 9

A quick bisection suggests that there could be 2 separate issues related to the
implementation of splat:

Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black
border to appear around the OS X splash screen
(https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an
overflow/alignment issue.

Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector
operations" causes corruption of the OS X splash screen
(https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be
an endian issue.

Having said that, the results look really promising, and I don't think it will take
too long to resolve any outstanding issues. I will be around on IRC later today if
that helps too.

ATB,

Mark.

Mark Cave-Ayland Dec. 18, 2018, 2:51 p.m. UTC | #2

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> A quick bisection suggests that there could be 2 separate issues related to the

> implementation of splat:

> 

> Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black

> border to appear around the OS X splash screen

> (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an

> overflow/alignment issue.


This one appears to be a sign extension issue - if I make use of the same technique
used by the previous helper then this problem goes away. Below is my experimental
diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index be638cdb1a..6cd25c8dc6 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \
 #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \
 static void glue(gen_, name)(DisasContext *ctx)                         \
     {                                                                   \
-        int simm;                                                       \
+        int8_t simm;                                                    \
         if (unlikely(!ctx->altivec_enabled)) {                          \
             gen_exception(ctx, POWERPC_EXCP_VPU);                       \
             return;                                                     \
         }                                                               \
-        simm = SIMM5(ctx->opcode);                                      \
+        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \
         tcg_op(avr64_offset(rD(ctx->opcode), true), 16, 16, simm);      \
     }


ATB,

Mark.

Mark Cave-Ayland Dec. 18, 2018, 3:05 p.m. UTC | #3

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector

> operations" causes corruption of the OS X splash screen

> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be

> an endian issue.


Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this
patch helps a lot, but something still isn't quite right:
https://www.ilande.co.uk/tmp/qemu/badapple3.png.

Adding some more debugging seems to suggest that boffs is being handled correctly
based upon vece/uimm...

ins: vsplth  bofs before: 304e0
  bofs after: 304e0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e6  uimm: 3  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304e0
  bofs after: 304e6  uimm: 3  vece: 1
ins: vsplth  bofs before: 30560
  bofs after: 3056e  uimm: 7  vece: 1
ins: vsplth  bofs before: 30540
  bofs after: 3054e  uimm: 7  vece: 1
ins: vsplth  bofs before: 30490
  bofs after: 30492  uimm: 1  vece: 1
ins: vspltw  bofs before: 30580
  bofs after: 3058c  uimm: 3  vece: 2
ins: vsplth  bofs before: 30580
  bofs after: 30586  uimm: 3  vece: 1
ins: vspltw  bofs before: 30580
  bofs after: 30580  uimm: 0  vece: 2
ins: vspltb  bofs before: 30560
  bofs after: 30560  uimm: 0  vece: 0
ins: vsplth  bofs before: 304d0
  bofs after: 304d0  uimm: 0  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d2  uimm: 1  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d4  uimm: 2  vece: 1
ins: vsplth  bofs before: 304d0
  bofs after: 304d4  uimm: 2  vece: 1


ATB,

Mark.

Richard Henderson Dec. 18, 2018, 3:07 p.m. UTC | #4

On 12/18/18 6:51 AM, Mark Cave-Ayland wrote:
> On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> 

>> A quick bisection suggests that there could be 2 separate issues related to the

>> implementation of splat:

>>

>> Patch "target/ppc: convert vspltis[bhw] to use vector operations" causes a black

>> border to appear around the OS X splash screen

>> (https://www.ilande.co.uk/tmp/qemu/badapple1.png) which may suggest an

>> overflow/alignment issue.

> 

> This one appears to be a sign extension issue - if I make use of the same technique

> used by the previous helper then this problem goes away. Below is my experimental

> diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":

> 

> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c

> index be638cdb1a..6cd25c8dc6 100644

> --- a/target/ppc/translate/vmx-impl.inc.c

> +++ b/target/ppc/translate/vmx-impl.inc.c

> @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \

>  #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \

>  static void glue(gen_, name)(DisasContext *ctx)                         \

>      {                                                                   \

> -        int simm;                                                       \

> +        int8_t simm; 


This shouldn't matter.
                                                   \
>          if (unlikely(!ctx->altivec_enabled)) {                          \

>              gen_exception(ctx, POWERPC_EXCP_VPU);                       \

>              return;                                                     \

>          }                                                               \

> -        simm = SIMM5(ctx->opcode);                                      \

> +        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \


This suggests that SIMM5 should be using sextract32.


r~

Richard Henderson Dec. 18, 2018, 3:17 p.m. UTC | #5

On 12/18/18 7:05 AM, Mark Cave-Ayland wrote:
> On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> 

>> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector

>> operations" causes corruption of the OS X splash screen

>> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be

>> an endian issue.

> 

> Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this

> patch helps a lot, but something still isn't quite right:

> https://www.ilande.co.uk/tmp/qemu/badapple3.png.

I can't figure out what the host+guest endian rules for ppc_avr_t are at all.

Certainly there appear to be bugs wrt vscr and which end of the register we
pull the value.  On the tcg side we take host endianness into account, and on
the gdb side we always use u32[3].

r~

Mark Cave-Ayland Dec. 18, 2018, 3:22 p.m. UTC | #6

On 18/12/2018 15:07, Richard Henderson wrote:

>> This one appears to be a sign extension issue - if I make use of the same technique

>> used by the previous helper then this problem goes away. Below is my experimental

>> diff to be squashed into "target/ppc: convert vspltis[bhw] to use vector operations":

>>

>> diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c

>> index be638cdb1a..6cd25c8dc6 100644

>> --- a/target/ppc/translate/vmx-impl.inc.c

>> +++ b/target/ppc/translate/vmx-impl.inc.c

>> @@ -723,12 +723,12 @@ GEN_VXRFORM_DUAL(vcmpgtfp, PPC_ALTIVEC, PPC_NONE, \

>>  #define GEN_VXFORM_DUPI(name, tcg_op, opc2, opc3)                       \

>>  static void glue(gen_, name)(DisasContext *ctx)                         \

>>      {                                                                   \

>> -        int simm;                                                       \

>> +        int8_t simm; 

> 

> This shouldn't matter.

>                                                    \

>>          if (unlikely(!ctx->altivec_enabled)) {                          \

>>              gen_exception(ctx, POWERPC_EXCP_VPU);                       \

>>              return;                                                     \

>>          }                                                               \

>> -        simm = SIMM5(ctx->opcode);                                      \

>> +        simm = (int8_t)(SIMM5(ctx->opcode) << 3) >> 3;                  \

> 

> This suggests that SIMM5 should be using sextract32.


There's certainly an obvious typo here, but on its own it doesn't fix the issue:

diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index b77d564a65..08eee1cd84 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -124,7 +124,7 @@ EXTRACT_SHELPER(SIMM, 0, 16);
 /* 16 bits unsigned immediate value */
 EXTRACT_HELPER(UIMM, 0, 16);
 /* 5 bits signed immediate value */
-EXTRACT_HELPER(SIMM5, 16, 5);
+EXTRACT_SHELPER(SIMM5, 16, 5);
 /* 5 bits signed immediate value */
 EXTRACT_HELPER(UIMM5, 16, 5);
 /* 4 bits unsigned immediate value */


ATB,

Mark.

Mark Cave-Ayland Dec. 18, 2018, 3:26 p.m. UTC | #7

On 18/12/2018 15:17, Richard Henderson wrote:

> On 12/18/18 7:05 AM, Mark Cave-Ayland wrote:

>> On 18/12/2018 09:49, Mark Cave-Ayland wrote:

>>

>>> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector

>>> operations" causes corruption of the OS X splash screen

>>> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be

>>> an endian issue.

>>

>> Changing "#ifndef HOST_WORDS_BIGENDIAN" to "#ifdef HOST_WORDS_BIGENDIAN" in this

>> patch helps a lot, but something still isn't quite right:

>> https://www.ilande.co.uk/tmp/qemu/badapple3.png.

> 

> I can't figure out what the host+guest endian rules for ppc_avr_t are at all.

> 

> Certainly there appear to be bugs wrt vscr and which end of the register we

> pull the value.  On the tcg side we take host endianness into account, and on

> the gdb side we always use u32[3].


That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect it to be
in host order? Certainly in the VMX helper macros I've looked at, the members are set
directly with no byte swapping.


ATB,

Mark.

Richard Henderson Dec. 18, 2018, 4:16 p.m. UTC | #8

On 12/18/18 7:26 AM, Mark Cave-Ayland wrote:
> That seems wrong to me. Given that the ppc_avr_t is a union then I'd expect it to be

> in host order? Certainly in the VMX helper macros I've looked at, the members are set

> directly with no byte swapping.


"Host order"?  For both words of the vector?

That's certainly going to cause problems wrt VSX and FPU registers.  We're
hard-coding that as fpu == vsx.u64[0] (both before and after your patch set).

For vscr, on master we have

void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
{
#if defined(HOST_WORDS_BIGENDIAN)
    env->vscr = r->u32[3];
#else
    env->vscr = r->u32[0];
#endif

and

        if (needs_byteswap) {
            vmxregset->avr[i].u64[0] = bswap64(cpu->env.avr[i].u64[1]);
            vmxregset->avr[i].u64[1] = bswap64(cpu->env.avr[i].u64[0]);
        } else {
            vmxregset->avr[i].u64[0] = cpu->env.avr[i].u64[0];
            vmxregset->avr[i].u64[1] = cpu->env.avr[i].u64[1];
        }
    }
    vmxregset->vscr.u32[3] = cpu_to_dump32(s, cpu->env.vscr);

For helper macros that apply the same operation to all lanes, it doesn't matter
which order in which the lanes are processed, so of course I would expect them
to be processed in host order.

It's cases that do not apply the same operation, such as merges, where the
problems would arise.

There are at least 3 schemes being employed to address this:

#if defined(HOST_WORDS_BIGENDIAN)
#define HI_IDX 0
#define LO_IDX 1
#define AVRB(i) u8[i]
#define AVRW(i) u32[i]
#else
#define HI_IDX 1
#define LO_IDX 0
#define AVRB(i) u8[15-(i)]
#define AVRW(i) u32[3-(i)]
#endif

...
#if defined(HOST_WORDS_BIGENDIAN)
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (3 - (i))
#endif

...
#define EL_IDX(i) (i)
#else
#define EL_IDX(i) (1 - (i))
#endif

...
#if defined(HOST_WORDS_BIGENDIAN)
        result.u8[i] = a->u8[indexA] ^ b->u8[indexB];
#else
        result.u8[i] = a->u8[15-indexA] ^ b->u8[15-indexB];
#endif


r~

Mark Cave-Ayland Jan. 3, 2019, 2:58 p.m. UTC | #9

On 18/12/2018 09:49, Mark Cave-Ayland wrote:

> Following on from this, the next patch "target/ppc: convert vsplt[bhw] to use vector

> operations" causes corruption of the OS X splash screen

> (https://www.ilande.co.uk/tmp/qemu/badapple2.png) in a way that suggests there may be

> an endian issue.


After some more digging I've found out what's going on here by dumping out the AVR
registers before and after:

Before the patch:

BEFORE:
uimm: 0  size: 2
sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 99 @ 0x7f54fd715870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad
AFTER:
dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a

BEFORE:
uimm: 1  size: 2
sreg: 99 @ 0x7f54fd7157a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 99 @ 0x7f54fd715870 - 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a
AFTER:
dreg: 99 @ 0x7f54fd715870 - 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9 1 d9


After the patch:

BEFORE:
uimm: 0  size: 2
sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 18 @ 0x7fe5a0c4a870 - 7f ff de ad 7f ff de ad 7f ff de ad 7f ff de ad
AFTER:
dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1

BEFORE:
uimm: 1  size: 2
sreg: 5 @ 0x7fe5a0c4a7a0 - 1 6a 1 d9 1 15 fd 63 0 0 0 0 0 0 0 0
dreg: 18 @ 0x7fe5a0c4a870 - 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1 5d 1
AFTER:
dreg: 18 @ 0x7fe5a0c4a870 - 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1 6a 1


As you can see vsplth splat is one byte off with this patch applied and the cause is
the xor in the #ifndef HOST_WORDS_BIGENDIAN block: before the xor is applied, bofs is
aligned to 2 bytes and with bofs ^ 15 the LSB is set to 1 again, introducing the 1
byte error.

Applying the following patch to mask bofs based upon the size of vece seems to fix
the issue here for me on little-endian Intel:

diff --git a/target/ppc/translate/vmx-impl.inc.c b/target/ppc/translate/vmx-impl.inc.c
index 59d3bc6e02..41ddbd879f 100644
--- a/target/ppc/translate/vmx-impl.inc.c
+++ b/target/ppc/translate/vmx-impl.inc.c
@@ -815,6 +815,7 @@ static void gen_vsplt(DisasContext *ctx, int vece)
     bofs += (uimm << vece) & 15;
 #ifndef HOST_WORDS_BIGENDIAN
     bofs ^= 15;
+    bofs &= ~((1 << vece) - 1);
 #endif

     tcg_gen_gvec_dup_mem(vece, dofs, bofs, 16, 16);


ATB,

Mark.

Mark Cave-Ayland Jan. 3, 2019, 6:31 p.m. UTC | #10

On 18/12/2018 06:38, Richard Henderson wrote:

> This implements some of the things that I talked about with Mark

> this morning / yesterday.  In particular:

> 

> (0) Implement expanders for nand, nor, eqv logical operations.

> 

> (1) Implement saturating arithmetic for the tcg backend.

> 

>     While I had expanders for these, they always went to helpers.

>     It's easy enough to expand byte and half-word operations for x86.

>     Beyond that, 32 and 64-bit operations can be expanded with integers.

> 

> (2) Implement minmax arithmetic for the tcg backend.

> 

>     While I had integral minmax operations, I had not yet added

>     any vector expanders for this.  (The integral stuff came in

>     for atomic minmax.)

> 

> (3) Trivial conversions to minmax for target/arm.

> 

> (4) Patches 11-18 are identical to Mark's.

> 

> (5) Patches 19-25 implement splat and logicals for VMX and VSX.

> 

>     VSX is no more difficult than VMX for these.  It does seem to be

>     just about everything that we can do for VSX at the momement.

> 

> (6) Patches 26-33 implement saturating arithmetic for VMX.

> 

> (7) Patch 34 implements minmax arithmetic for VMX.

> 

> I've tested the new operations via aarch64 guest, as that's the set

> of risu test cases I've got handy.  The rest is untested so far.


I've taken my previous PPC patchsets below:

[PATCH v5 0/9] target/ppc: prepare for conversion to TCG vector operations
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00063.html

[PATCH v2 0/8] target/ppc: remove various endian hacks from int_helper.c
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg06149.html

and then rebased this patchset on top of them (including a squash of the vsplt fix
posted earlier today at
https://lists.gnu.org/archive/html/qemu-devel/2019-01/msg00287.html) and pushed the
result to https://github.com/mcayland/qemu/tree/ppc-altivec-v5.5-rth.

Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm
still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which
I've bisected down to:


commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06
Author: Richard Henderson <richard.henderson@linaro.org>
Date:   Mon Dec 17 22:39:10 2018 -0800

    target/ppc: convert vadd*s and vsub*s to vector operations

    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>



So looks like there's something still not quite right with the saturation flag/vector
saturation implementation.


ATB,

Mark.

Richard Henderson Jan. 4, 2019, 10:33 p.m. UTC | #11

On 1/4/19 4:31 AM, Mark Cave-Ayland wrote:
> Fixing the vsplt instruction now gives a readable display in my MacOS tests, but I'm

> still seeing "shadows" such as https://www.ilande.co.uk/tmp/qemu/badapple4.png which

> I've bisected down to:

> 

> 

> commit 71f229eb331e979971a0a79e5a2fcdfb9380bd06

> Author: Richard Henderson <richard.henderson@linaro.org>

> Date:   Mon Dec 17 22:39:10 2018 -0800

> 

>     target/ppc: convert vadd*s and vsub*s to vector operations

> 

>     Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

> 

> 

> So looks like there's something still not quite right with the saturation flag/vector

> saturation implementation.

> 


Ok, I'll try and set up some RISU tests to track this down next week.


r~

[00/34] tcg, target/ppc vector improvements

Message

Comments