[00/10] AEGIS x86 assembly tuning

Message ID 20241007012430.163606-1-ebiggers@kernel.org

Message

Eric Biggers Oct. 7, 2024, 1:24 a.m. UTC
This series cleans up the AES-NI optimized implementation of AEGIS-128.

Performance is improved by 1-5% depending on the input lengths.  Binary
code size is reduced by about 20% (measuring glue + assembly combined),
and source code length is reduced by about 150 lines.

The first patch also fixes a bug which could theoretically cause
incorrect behavior but was seemingly not being encountered in practice.

Note: future optimizations for AEGIS-128 could involve adding AVX512 /
AVX10 optimized assembly code.  Unfortunately, because of the way that
AEGIS-128 is specified, its level of parallelism is limited, and it
can't really take advantage of vector lengths greater than 128 bits.
So this would probably provide only another modest improvement, mostly
coming from being able to use the ternary logic instructions.
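
For readers unfamiliar with ternary logic instructions (VPTERNLOGD /
VPTERNLOGQ in AVX512), the following is a minimal software model in
portable C, not code from this series.  It assumes the AEGIS-128
per-block keystream is S1 ^ S4 ^ (S2 & S3), and it assumes that this
is where such an instruction would help; the cover letter does not say
where it would be used.  The names ternlog64() and
aegis128_crypt_word() are hypothetical, and one 64-bit word stands in
for a 128-bit register.

```c
#include <stdint.h>

/*
 * Software model of a three-input ternary logic operation on one
 * 64-bit lane.  For each bit position, the bits of a, b and c form a
 * 3-bit index (a is the most significant bit), and the result bit is
 * the bit of the truth table imm8 selected by that index.
 */
static inline uint64_t ternlog64(uint64_t a, uint64_t b, uint64_t c,
				 uint8_t imm8)
{
	uint64_t result = 0;

	for (int bit = 0; bit < 64; bit++) {
		unsigned int idx = (unsigned int)((((a >> bit) & 1) << 2) |
						  (((b >> bit) & 1) << 1) |
						  ((c >> bit) & 1));

		result |= (uint64_t)((imm8 >> idx) & 1) << bit;
	}
	return result;
}

#define TERNLOG_XOR3	0x96	/* truth table for a ^ b ^ c */
#define TERNLOG_XOR_AND	0x78	/* truth table for a ^ (b & c) */

/*
 * Hypothetical helper: combine one message word with the keystream
 * S1 ^ S4 ^ (S2 & S3) using two ternary operations instead of one AND
 * and three XORs.
 */
static inline uint64_t aegis128_crypt_word(uint64_t msg, uint64_t s1,
					   uint64_t s2, uint64_t s3,
					   uint64_t s4)
{
	uint64_t t = ternlog64(s1, s2, s3, TERNLOG_XOR_AND);

	return ternlog64(msg, s4, t, TERNLOG_XOR3);
}
```

In this model the saving is two logic operations per block, while the
AES rounds that dominate the state update are unchanged, which is
consistent with expecting only a modest gain.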

Eric Biggers (10):
  crypto: x86/aegis128 - access 32-bit arguments as 32-bit
  crypto: x86/aegis128 - remove no-op init and exit functions
  crypto: x86/aegis128 - eliminate some indirect calls
  crypto: x86/aegis128 - don't bother with special code for aligned data
  crypto: x86/aegis128 - optimize length block preparation using SSE4.1
  crypto: x86/aegis128 - improve assembly function prototypes
  crypto: x86/aegis128 - optimize partial block handling using SSE4.1
  crypto: x86/aegis128 - take advantage of block-aligned len
  crypto: x86/aegis128 - remove unneeded FRAME_BEGIN and FRAME_END
  crypto: x86/aegis128 - remove unneeded RETs

 arch/x86/crypto/Kconfig               |   4 +-
 arch/x86/crypto/aegis128-aesni-asm.S  | 532 ++++++++++----------------
 arch/x86/crypto/aegis128-aesni-glue.c | 145 ++++---
 3 files changed, 261 insertions(+), 420 deletions(-)


base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc

Comments

Ondrej Mosnacek Oct. 15, 2024, 12:48 p.m. UTC | #1
Nice work!

Notwithstanding my non-blocking comment on patch #3:

Reviewed-by: Ondrej Mosnacek <omosnace@redhat.com>

--
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.