[v3,00/10] crypto - AES for ARM/arm64 updates for v4.11 (round #2)

Message ID 1485645939-17126-1-git-send-email-ard.biesheuvel@linaro.org

Ard Biesheuvel Jan. 28, 2017, 11:25 p.m. UTC
Patch #1 is a fix for the CBC chaining issue that was discussed on the
mailing list. The driver itself is queued for v4.11, so this fix can go
right on top.

Patches #2 - #6 clear the cra_alignmasks of various drivers: all NEON-capable
CPUs can perform unaligned accesses, and the advantage of using
the slightly faster aligned accessors (which only exist on ARM, not arm64)
is certainly outweighed by the cost of copying data to suitably aligned
buffers.
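
As a rough userspace illustration of the trade-off described above (not
kernel code): an alignmask of 2^n - 1 means a buffer is acceptable only if
its address has the low n bits clear, and a misaligned buffer has to be
copied to an aligned one first. The helpers is_aligned() and bounce_bytes()
below are hypothetical names made up for this sketch; they only model the
cost, assuming the whole buffer is copied when it is misaligned.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns nonzero if p satisfies an alignment mask
 * of the form 2^n - 1. A mask of 0 accepts every address.
 */
static int is_aligned(const void *p, unsigned int alignmask)
{
    return ((uintptr_t)p & alignmask) == 0;
}

/*
 * Hypothetical cost model: number of bytes that would have to be copied
 * to an aligned bounce buffer before the cipher can touch them.
 */
static size_t bounce_bytes(const void *src, size_t len, unsigned int alignmask)
{
    return is_aligned(src, alignmask) ? 0 : len;
}
```

With a mask of 0, bounce_bytes() is always 0, which is the effect of
clearing the alignmask in this model: no input is ever copied.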

NOTE: patch #5 won't apply unless 'crypto: arm64/aes-blk - honour iv_out
requirement in CBC and CTR modes' is applied first, which was sent out
separately as a bugfix for v3.16 - v4.9. If this is a problem, this patch
can wait.

Patches #7 and #8 are minor tweaks to the new scalar AES code.

Patch #9 improves the performance of the plain NEON AES code, to make it
more suitable as a fallback for the new bitsliced NEON code, which can
only operate on 8 blocks in parallel, and needs another driver to perform
CBC encryption or XTS tweak generation.
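
To illustrate why CBC encryption cannot use an 8-blocks-in-parallel
implementation: each ciphertext block is an input to the next block's
encryption, so the blocks have to be processed one at a time. The sketch
below uses a toy 8-bit stand-in for the block cipher (toy_encrypt() is NOT
AES, and both function names are invented for this example). It also writes
the last ciphertext block back as the IV, so that a request split across
two calls chains correctly.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy 8-bit "block cipher" for illustration only; this is not AES. */
static uint8_t toy_encrypt(uint8_t block, uint8_t key)
{
    return (uint8_t)(((block ^ key) * 5u) + 1u);
}

/*
 * CBC encryption in place. Iteration i needs the ciphertext produced by
 * iteration i - 1, so the loop is inherently serial. On return, *iv holds
 * the last ciphertext block, ready to chain into a follow-up call.
 */
static void toy_cbc_encrypt(uint8_t *buf, size_t nblocks, uint8_t *iv,
                            uint8_t key)
{
    uint8_t prev = *iv;

    for (size_t i = 0; i < nblocks; i++) {
        prev = toy_encrypt(buf[i] ^ prev, key);
        buf[i] = prev;
    }
    *iv = prev;
}
```

Encrypting four blocks in one call, or in two calls of two blocks each
that reuse the updated IV, yields the same ciphertext in this model.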

Patch #10 updates the new bitsliced AES NEON code to switch to the plain
NEON driver as a fallback.

Patches #9 and #10 improve the performance of CBC encryption by ~35% on
low end cores such as the Cortex-A53 that can be found in the Raspberry Pi 3.

Changes since v2:
- use polynomial multiply NEON instruction for multiplication by x^2, which
  eliminates 4 instructions from the decrypt path (#9)
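
For readers unfamiliar with the trick, here is a scalar C model of
multiplication by x^2 in the AES field GF(2^8), reduction polynomial
x^8 + x^4 + x^3 + x + 1. It compares the conventional double-xtime form
with a form built on a carry-less (polynomial) multiply. This is only a
sketch of the idea; the function names are invented here, clmul8() stands
in for a per-byte polynomial multiply, and the actual NEON sequence in
patch #9 may differ.

```c
#include <stdint.h>

/* Multiply by x in GF(2^8), reducing by 0x1b when bit 7 overflows. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* Reference: multiply by x^2 as two dependent multiplications by x. */
static uint8_t mul_by_x2_ref(uint8_t a)
{
    return xtime(xtime(a));
}

/* Low 8 bits of the carry-less product of a and b. */
static uint8_t clmul8(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    for (int i = 0; i < 8; i++)
        if (b & (1u << i))
            r ^= (uint8_t)(a << i);
    return r;
}

/*
 * Multiply by x^2 in one step: shift left by 2, then fold the two bits
 * that were shifted out back in by polynomial-multiplying them by 0x1b.
 * The product of a 2-bit and a 5-bit polynomial fits in 6 bits, so no
 * second reduction is needed.
 */
static uint8_t mul_by_x2_pmul(uint8_t a)
{
    return (uint8_t)((a << 2) ^ clmul8(a >> 6, 0x1b));
}
```

Both forms agree for all 256 byte values; the second avoids the two
dependent conditional reductions of the first.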

Changes since v1:
- shave off another few cycles from the sequential AES NEON code (patch #9)

Ard Biesheuvel (10):
  crypto: arm64/aes-neon-bs - honour iv_out requirement in CTR mode
  crypto: arm/aes-ce - remove cra_alignmask
  crypto: arm/chacha20 - remove cra_alignmask
  crypto: arm64/aes-ce-ccm - remove cra_alignmask
  crypto: arm64/aes-blk - remove cra_alignmask
  crypto: arm64/chacha20 - remove cra_alignmask
  crypto: arm64/aes - avoid literals for cross-module symbol references
  crypto: arm64/aes - performance tweak
  crypto: arm64/aes-neon-blk - tweak performance for low end cores
  crypto: arm64/aes - replace scalar fallback with plain NEON fallback

 arch/arm/crypto/aes-ce-core.S          |  84 ++++---
 arch/arm/crypto/aes-ce-glue.c          |  15 +-
 arch/arm/crypto/chacha20-neon-glue.c   |   1 -
 arch/arm64/crypto/Kconfig              |   2 +-
 arch/arm64/crypto/aes-ce-ccm-glue.c    |   1 -
 arch/arm64/crypto/aes-cipher-core.S    |  59 ++---
 arch/arm64/crypto/aes-glue.c           |  18 +-
 arch/arm64/crypto/aes-modes.S          |   8 +-
 arch/arm64/crypto/aes-neon.S           | 235 +++++++++-----------
 arch/arm64/crypto/aes-neonbs-core.S    |  25 ++-
 arch/arm64/crypto/aes-neonbs-glue.c    |  38 +++-
 arch/arm64/crypto/chacha20-neon-glue.c |   1 -
 12 files changed, 224 insertions(+), 263 deletions(-)

-- 
2.7.4

Comments

Herbert Xu Feb. 3, 2017, 10:22 a.m. UTC | #1
On Sat, Jan 28, 2017 at 11:25:29PM +0000, Ard Biesheuvel wrote:
> Patch #1 is a fix for the CBC chaining issue that was discussed on the
> mailing list. The driver itself is queued for v4.11, so this fix can go
> right on top.
>
> Patches #2 - #6 clear the cra_alignmasks of various drivers: all NEON
> capable CPUs can perform unaligned accesses, and the advantage of using
> the slightly faster aligned accessors (which only exist on ARM not arm64)
> is certainly outweighed by the cost of copying data to suitably aligned
> buffers.
>
> NOTE: patch #5 won't apply unless 'crypto: arm64/aes-blk - honour iv_out
> requirement in CBC and CTR modes' is applied first, which was sent out
> separately as a bugfix for v3.16 - v4.9. If this is a problem, this patch
> can wait.
>
> Patch #7 and #8 are minor tweaks to the new scalar AES code.
>
> Patch #9 improves the performance of the plain NEON AES code, to make it
> more suitable as a fallback for the new bitsliced NEON code, which can
> only operate on 8 blocks in parallel, and needs another driver to perform
> CBC encryption or XTS tweak generation.
>
> Patch #10 updates the new bitsliced AES NEON code to switch to the plain
> NEON driver as a fallback.
>
> Patches #9 and #10 improve the performance of CBC encryption by ~35% on
> low end cores such as the Cortex-A53 that can be found in the Raspberry Pi3
>
> Changes since v2:
> - use polynomial multiply NEON instruction for multiplication by x^2, this
>   eliminates 4 instructions from the decrypt path (#9)
>
> Changes since v1:
> - shave off another few cycles from the sequential AES NEON code (patch #9)

Patches 2-10 applied.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt