[RFC,for-8.2,00/18] crypto: Provide clmul.h and host accel

Message ID 20230713211435.13505-1-richard.henderson@linaro.org

Message

Richard Henderson July 13, 2023, 9:14 p.m. UTC
Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
carry-less multiply under emulation.

This is less polished than the AES patch set:

(1) Should I split HAVE_CLMUL_ACCEL into per-width HAVE_CLMUL{N}_ACCEL?
    The "_generic" and "_accel" split is different from aes-round.h
    because of the difference in support for different widths, and it
    means that each host accel has more boilerplate.

(2) Should I bother trying to accelerate anything other than 64x64->128?
    That seems to be the one that GCM really wants anyway.  I'd keep all
    of the sizes implemented generically, since that centralizes the 3
    target implementations.

(3) The use of Int128 isn't fantastic -- better would be a vector type,
    though that has its own special problems for ppc64le (see the
    endianness hoops within aes-round.h).  Perhaps leave things in
    env memory, like I was mostly able to do with AES?

(4) No guest test case(s).
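
For readers unfamiliar with the operation in (2), here is a minimal
portable sketch of a 64x64->128 carry-less multiply.  It is only an
illustration, not the code from this series: the names clmul128_sketch
and clmul_64_sketch are made up for this example, and a plain two-word
struct stands in for Int128.

```c
#include <stdint.h>

/* Hypothetical result type for this sketch; the series itself uses Int128. */
typedef struct {
    uint64_t lo;   /* bits 0..63 of the product */
    uint64_t hi;   /* bits 64..127 of the product */
} clmul128_sketch;

/*
 * Carry-less (polynomial, GF(2)) multiply of two 64-bit values.
 * For every set bit i of m, XOR in n shifted left by i; XOR replaces
 * addition, so no carries propagate between bit positions.
 */
static clmul128_sketch clmul_64_sketch(uint64_t n, uint64_t m)
{
    clmul128_sketch r = { 0, 0 };

    for (int i = 0; i < 64; i++) {
        if ((m >> i) & 1) {
            r.lo ^= n << i;
            /* Shifting by 64 is undefined in C, so skip the i == 0 case. */
            if (i != 0) {
                r.hi ^= n >> (64 - i);
            }
        }
    }
    return r;
}
```

As a sanity check, clmul_64_sketch(3, 3) yields lo = 5, hi = 0, since
(x + 1)^2 = x^2 + 1 over GF(2).  A host-accelerated version would
presumably replace the loop with a single instruction such as PCLMULQDQ
or PMULL, which is what the last two patches are about.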


r~


[1] https://patchew.org/QEMU/20230601123332.3297404-1-ardb@kernel.org/

Richard Henderson (18):
  crypto: Add generic 8-bit carry-less multiply routines
  target/arm: Use clmul_8* routines
  target/s390x: Use clmul_8* routines
  target/ppc: Use clmul_8* routines
  crypto: Add generic 16-bit carry-less multiply routines
  target/arm: Use clmul_16* routines
  target/s390x: Use clmul_16* routines
  target/ppc: Use clmul_16* routines
  crypto: Add generic 32-bit carry-less multiply routines
  target/arm: Use clmul_32* routines
  target/s390x: Use clmul_32* routines
  target/ppc: Use clmul_32* routines
  crypto: Add generic 64-bit carry-less multiply routine
  target/arm: Use clmul_64
  target/s390x: Use clmul_64
  target/ppc: Use clmul_64
  host/include/i386: Implement clmul.h
  host/include/aarch64: Implement clmul.h

 host/include/aarch64/host/cpuinfo.h      |   1 +
 host/include/aarch64/host/crypto/clmul.h | 230 +++++++++++++++++++++++
 host/include/generic/host/crypto/clmul.h |  28 +++
 host/include/i386/host/cpuinfo.h         |   1 +
 host/include/i386/host/crypto/clmul.h    | 187 ++++++++++++++++++
 host/include/x86_64/host/crypto/clmul.h  |   1 +
 include/crypto/clmul.h                   | 123 ++++++++++++
 target/arm/tcg/vec_internal.h            |  11 --
 crypto/clmul.c                           | 163 ++++++++++++++++
 target/arm/tcg/mve_helper.c              |  16 +-
 target/arm/tcg/vec_helper.c              | 112 ++---------
 target/ppc/int_helper.c                  |  63 +++----
 target/s390x/tcg/vec_int_helper.c        | 175 +++++++----------
 util/cpuinfo-aarch64.c                   |   4 +-
 util/cpuinfo-i386.c                      |   1 +
 crypto/meson.build                       |   9 +-
 16 files changed, 865 insertions(+), 260 deletions(-)
 create mode 100644 host/include/aarch64/host/crypto/clmul.h
 create mode 100644 host/include/generic/host/crypto/clmul.h
 create mode 100644 host/include/i386/host/crypto/clmul.h
 create mode 100644 host/include/x86_64/host/crypto/clmul.h
 create mode 100644 include/crypto/clmul.h
 create mode 100644 crypto/clmul.c

Comments

Ard Biesheuvel Aug. 3, 2023, 2:02 p.m. UTC | #1
On Thu, 13 Jul 2023 at 23:14, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Inspired by Ard Biesheuvel's RFC patches [1] for accelerating
> carry-less multiply under emulation.
>
> This is less polished than the AES patch set:
>
> (1) Should I split HAVE_CLMUL_ACCEL into per-width HAVE_CLMUL{N}_ACCEL?
>     The "_generic" and "_accel" split is different from aes-round.h
>     because of the difference in support for different widths, and it
>     means that each host accel has more boilerplate.
>
> (2) Should I bother trying to accelerate anything other than 64x64->128?

That is the only compelling use case afaict.

>     That seems to be the one that GCM really wants anyway.  I'd keep all
>     of the sizes implemented generically, since that centralizes the 3
>     target implementations.
>
> (3) The use of Int128 isn't fantastic -- better would be a vector type,
>     though that has its own special problems for ppc64le (see the
>     endianness hoops within aes-round.h).  Perhaps leave things in
>     env memory, like I was mostly able to do with AES?
>
> (4) No guest test case(s).
>