From patchwork Mon Dec 4 12:26:29 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120513
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 03/19] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:29 +0000
Message-Id: <20171204122645.31535-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call
to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 95 ++++++++++---------- arch/arm64/crypto/aes-modes.S | 90 +++++++++---------- arch/arm64/crypto/aes-neonbs-glue.c | 14 ++- 3 files changed, 97 insertions(+), 102 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 998ba519a026..00a3e2fd6a48 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 ctr[], int first); + int rounds, int blocks, u8 ctr[]); asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds, int blocks, u8 const rk2[], u8 iv[], @@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, first); + (u8 *)ctx->key_enc, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); 
+ err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, first); + (u8 *)ctx->key_dec, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - first = 1; - kernel_neon_begin(); while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; + kernel_neon_end(); } if (walk.nbytes) { u8 __aligned(8) tail[AES_BLOCK_SIZE]; @@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req) */ blocks = -1; + kernel_neon_begin(); aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, - blocks, walk.iv, first); + blocks, walk.iv); + kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); err = skcipher_walk_done(&walk, 0); } - kernel_neon_end(); return err; } @@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req) struct skcipher_walk 
walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_enc, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req) struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_dec, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0); + aes_ecb_encrypt(key, ks[0], rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 2674d43d1384..65b273667b34 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -40,24 +40,24 @@ #if INTERLEAVE == 2 aes_encrypt_block2x: - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block2x) aes_decrypt_block2x: - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block2x) #elif INTERLEAVE == 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -86,33 +86,32 @@ ENDPROC(aes_decrypt_block4x) #define FRAME_POP .macro do_encrypt_block2x - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_decrypt_block2x - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_encrypt_block4x - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm .macro do_decrypt_block4x - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm #endif /* * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) * aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) */ AES_ENTRY(aes_ecb_encrypt) FRAME_PUSH - cbz w5, .LecbencloopNx enc_prepare w3, x2, x5 @@ -148,7 +147,6 @@ 
AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) FRAME_PUSH - cbz w5, .LecbdecloopNx dec_prepare w3, x2, x5 @@ -184,14 +182,12 @@ AES_ENDPROC(aes_ecb_decrypt) /* * aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) * aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) */ AES_ENTRY(aes_cbc_encrypt) - cbz w6, .Lcbcencloop - ld1 {v0.16b}, [x5] /* get iv */ enc_prepare w3, x2, x6 @@ -209,7 +205,6 @@ AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) FRAME_PUSH - cbz w6, .LcbcdecloopNx ld1 {v7.16b}, [x5] /* get iv */ dec_prepare w3, x2, x6 @@ -264,20 +259,19 @@ AES_ENDPROC(aes_cbc_decrypt) /* * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 ctr[], int first) + * int blocks, u8 ctr[]) */ AES_ENTRY(aes_ctr_encrypt) FRAME_PUSH - cbz w6, .Lctrnotfirst /* 1st time around? */ + enc_prepare w3, x2, x6 ld1 {v4.16b}, [x5] -.Lctrnotfirst: - umov x8, v4.d[1] /* keep swabbed ctr in reg */ - rev x8, x8 + umov x6, v4.d[1] /* keep swabbed ctr in reg */ + rev x6, x6 #if INTERLEAVE >= 2 - cmn w8, w4 /* 32 bit overflow? */ + cmn w6, w4 /* 32 bit overflow? */ bcs .Lctrloop .LctrloopNx: subs w4, w4, #INTERLEAVE @@ -285,11 +279,11 @@ AES_ENTRY(aes_ctr_encrypt) #if INTERLEAVE == 2 mov v0.8b, v4.8b mov v1.8b, v4.8b - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v0.d[1], x7 - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v1.d[1], x7 ld1 {v2.16b-v3.16b}, [x1], #32 /* get 2 input blocks */ do_encrypt_block2x @@ -298,7 +292,7 @@ AES_ENTRY(aes_ctr_encrypt) st1 {v0.16b-v1.16b}, [x0], #32 #else ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ - dup v7.4s, w8 + dup v7.4s, w6 mov v0.16b, v4.16b add v7.4s, v7.4s, v8.4s mov v1.16b, v4.16b @@ -316,9 +310,9 @@ AES_ENTRY(aes_ctr_encrypt) eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b st1 {v0.16b-v3.16b}, [x0], #64 - add x8, x8, #INTERLEAVE + add x6, x6, #INTERLEAVE #endif - rev x7, x8 + rev x7, x6 ins v4.d[1], x7 cbz w4, .Lctrout b .LctrloopNx @@ -328,10 +322,10 @@ AES_ENTRY(aes_ctr_encrypt) #endif .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 - adds x8, x8, #1 /* increment BE ctr */ - rev x7, x8 + adds x6, x6, #1 /* increment BE ctr */ + rev x7, x6 ins v4.d[1], x7 bcs .Lctrcarry /* overflow? 
*/ @@ -385,15 +379,17 @@ CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) FRAME_PUSH - cbz w7, .LxtsencloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - enc_switch_key w3, x2, x6 + cbz w7, .Lxtsencnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + enc_switch_key w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencnotfirst: + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -442,7 +438,7 @@ AES_ENTRY(aes_xts_encrypt) .Lxtsencloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -450,6 +446,7 @@ AES_ENTRY(aes_xts_encrypt) next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_encrypt) @@ -457,15 +454,17 @@ AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) FRAME_PUSH - cbz w7, .LxtsdecloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - dec_prepare w3, x2, x6 + cbz w7, .Lxtsdecnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + dec_prepare w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecnotfirst: + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -514,7 +513,7 @@ AES_ENTRY(aes_xts_decrypt) .Lxtsdecloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -522,6 +521,7 @@ AES_ENTRY(aes_xts_decrypt) next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_decrypt) diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index c55d68ccb89f..9d823c77ec84 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], /* borrowed from aes-neon-blk.ko */ asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, u8 iv[], - int first); + int rounds, int blocks, u8 iv[]); struct aesbs_ctx { u8 rk[13 * (8 * AES_BLOCK_SIZE) + 32]; @@ -157,7 +156,7 @@ static int cbc_encrypt(struct skcipher_request *req) struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; - int err, first = 1; + int err; err = skcipher_walk_virt(&walk, req, true); @@ -167,10 +166,9 @@ static int cbc_encrypt(struct skcipher_request *req) /* fall back to the non-bitsliced NEON implementation */ neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - ctx->enc, ctx->key.rounds, blocks, walk.iv, - first); + ctx->enc, ctx->key.rounds, blocks, + walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; } kernel_neon_end(); return err; @@ -311,7 +309,7 @@ static int __xts_crypt(struct skcipher_request *req, kernel_neon_begin(); neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1, 1); + ctx->key.rounds, 1); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; From patchwork Mon 
Dec 4 12:26:30 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120514
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 04/19] crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:30 +0000
Message-Id: <20171204122645.31535-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to
kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-glue.c | 36 +++++++++----------- 1 file changed, 17 insertions(+), 19 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index 9d823c77ec84..e7a95a566462 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; /* fall back to the non-bitsliced NEON implementation */ + kernel_neon_begin(); neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->enc, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req) blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req) u8 buf[AES_BLOCK_SIZE]; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; u8 *final = (walk.total % AES_BLOCK_SIZE) ? 
buf : NULL; @@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req) final = NULL; } + kernel_neon_begin(); aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks, walk.iv, final); + kernel_neon_end(); if (final) { u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE; @@ -258,8 +259,6 @@ static int ctr_encrypt(struct skcipher_request *req) err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; } @@ -304,12 +303,11 @@ static int __xts_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); kernel_neon_begin(); - - neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1); + neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1); + kernel_neon_end(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -318,13 +316,13 @@ static int __xts_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; }

From patchwork Mon Dec 4 12:26:31 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120515
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 05/19] crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:31 +0000
Message-Id: <20171204122645.31535-6-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to
kernel_neon_end(). For this reason, the NEON crypto code that was
introduced at the time keeps the NEON enabled throughout the execution
of the crypto API methods, which may include calls back into the crypto
API that could result in memory allocation or other actions that we
should avoid when running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder
of the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/chacha20-neon-glue.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c index cbdb75d15cd0..727579c93ded 100644 --- a/arch/arm64/crypto/chacha20-neon-glue.c +++ b/arch/arm64/crypto/chacha20-neon-glue.c @@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, u8 buf[CHACHA20_BLOCK_SIZE]; while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + kernel_neon_begin(); chacha20_4block_xor_neon(state, dst, src); + kernel_neon_end(); bytes -= CHACHA20_BLOCK_SIZE * 4; src += CHACHA20_BLOCK_SIZE * 4; dst += CHACHA20_BLOCK_SIZE * 4; state[12] += 4; } + + if (!bytes) + return; + + kernel_neon_begin(); while (bytes >= CHACHA20_BLOCK_SIZE) { chacha20_block_xor_neon(state, dst, src); bytes -= CHACHA20_BLOCK_SIZE; @@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, chacha20_block_xor_neon(state, buf, buf); memcpy(dst, buf, bytes); } + kernel_neon_end(); } static int chacha20_neon(struct skcipher_request *req) @@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req) if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE) return crypto_chacha20_crypt(req); - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); crypto_chacha20_init(state, ctx, walk.iv); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes; @@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req) nbytes); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } - kernel_neon_end(); return err; }

From patchwork Mon Dec 4 12:26:32 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120516
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au,
 linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin,
 Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland,
 linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 06/19] crypto: arm64/ghash - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:32 +0000
Message-Id: <20171204122645.31535-7-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().

For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder
of the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-glue.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..cb39503673d4 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -368,26 +368,28 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv, num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +469,18 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv,
num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +488,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE;

From patchwork Mon Dec 4 12:26:35 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120519
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 09/19] crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
Date: Mon, 4 Dec 2017 12:26:35 +0000
Message-Id: <20171204122645.31535-10-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

CBC MAC is strictly sequential, and so the current AES code simply
processes the input one block at a time.
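The chaining dependency that forces this one-block-at-a-time behaviour
can be sketched as follows; this is a purely illustrative model, not
the kernel implementation, and aes_encrypt_one() is a hypothetical
single-block AES helper:

    /*
     * Illustrative sketch only: CBC-MAC chains the digest through the
     * block cipher, dg = E_K(dg ^ P_i), so block i+1 cannot be
     * processed until the AES call for block i has completed.
     */
    static void cbc_mac_sketch(const unsigned char *pt, int blocks,
                               unsigned char dg[16], const void *key)
    {
            while (blocks--) {
                    for (int i = 0; i < 16; i++)
                            dg[i] ^= pt[i];          /* dg ^= next pt block */
                    aes_encrypt_one(key, dg, dg);    /* hypothetical helper */
                    pt += 16;
            }
    }

Because each AES invocation consumes the previous one's output, an
unrolled loop can only merge the loads, stores and xors; the cipher
itself still runs one block at a time.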
However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC MAC which yields every 16 bytes) So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 23 ++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index e86535a1329d..a68412e1e3a4 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -395,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt) AES_ENTRY(aes_mac_update) ld1 {v0.16b}, [x4] /* get dg */ enc_prepare w2, x1, x7 - cbnz w5, .Lmacenc + cbz w5, .Lmacloop4x + encrypt_block v0, w2, x1, x7, w8 + +.Lmacloop4x: + subs w3, w3, #4 + bmi .Lmac1x + ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v2.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v3.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v4.16b + cmp w3, wzr + csinv x5, x6, xzr, eq + cbz w5, .Lmacout + encrypt_block v0, w2, x1, x7, w8 + b .Lmacloop4x +.Lmac1x: + add w3, w3, #4 .Lmacloop: cbz w3, .Lmacout ld1 {v1.16b}, [x0], #16 /* get next pt block */ @@ -406,7 +426,6 @@ AES_ENTRY(aes_mac_update) csinv x5, x6, xzr, eq cbz w5, .Lmacout -.Lmacenc: encrypt_block v0, w2, x1, x7, w8 b .Lmacloop From patchwork Mon Dec 4 12:26:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120520 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365776qgn; Mon, 4 Dec 2017 04:27:30 -0800 (PST) X-Google-Smtp-Source: AGs4zMbylGkI5S/u5p6DUNHBrZWUUG4RigaqvRRG9EMkGEaBgN2y8j8nAA4FMeY0PYAE0f3OmwU1 X-Received: by 10.101.93.66 with SMTP id e2mr13603030pgt.50.1512390450807; Mon, 04 Dec 2017 04:27:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390450; cv=none; d=google.com; s=arc-20160816; b=TIf4XBtp2cfPk6AVum195OyVXm0TN7RIxYn5ELifH4SS9v5S/oXdFujx6/WoywG+4N fjX1eVlAKfxWQ4UY6YkmyZTnCxOoMNpN2aVwCO202odzgzt+DfFc/ApM1VwtoFsDazVQ EZHSzdFW9gVONAlWHIZY/yX5blkMteWUxECtOv2LaxPZbM9PhLLdFzKuB7d5r80m1V4Z cvUO+dkRy9Jn+OoS0J3FzXYJWfz4segUJvq5vfivHKeWmxZGCW6zf4qs86jInaaugOba 5IDEX1BIYGlutAlM9i2N0YUdVFpmfTxqqv8HKI9x0wUJxI6t0+DTTlY8TIn9CCQ/znRW Rc3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=5zk1lw0Of4wZ9+5EpqGNCIfS63EqvIFydBGYbdg5eLE=; b=tZyk314AeZT+RFR9RrwRPuNXZK+vFsXdfOLf+mflRpPVM7UYxhmIRU3TM+RZBrbTkR yewopQRlL3/n601CPGP0IqtjuEpweWYil92jr/MqKpMKe3tA46jqCQfXoH6sMV7z2/RR OhQtGGuNrB7GlHDKpyA83IDzef5dhQM6Lt+olEUgg9Mx0HV2cotrZFZE64VlXAhnTo/s uBJwVfoLrtuBx2Nscl+r2IMCjs0l8V0HY92qUWn1fZee0RzAYeQQec92XKb7J4BjRmbP jbVx6f/SNEkS1GoszxGsuxc6rIXEEu8ytuqm7fIKiZqiMWG+J3hJ9ayNzKBNkWUp/JP2 lEzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jlIZxCzn; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE 
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 10/19] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
Date: Mon, 4 Dec 2017 12:26:36 +0000
Message-Id: <20171204122645.31535-11-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON
algorithm running with preemption disabled. Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha256-glue.c | 36 +++++++++++++------- 1 file changed, 23 insertions(+), 13 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c index b064d925fe2a..e8880ccdc71f 100644 --- a/arch/arm64/crypto/sha256-glue.c +++ b/arch/arm64/crypto/sha256-glue.c @@ -89,21 +89,32 @@ static struct shash_alg algs[] = { { static int sha256_update_neon(struct shash_desc *desc, const u8 *data, unsigned int len) { - /* - * Stacking and unstacking a substantial slice of the NEON register - * file may significantly affect performance for small updates when - * executing in interrupt context, so fall back to the scalar code - * in that case. - */ + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return sha256_base_do_update(desc, data, len, (sha256_block_fn *)sha256_block_data_order); - kernel_neon_begin(); - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); - kernel_neon_end(); + while (len > 0) { + unsigned int chunk = len; + + /* + * Don't hog the CPU for the entire time it takes to process all + * input when running on a preemptible kernel, but process the + * data block by block instead. + */ + if (IS_ENABLED(CONFIG_PREEMPT) && + chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) + chunk = SHA256_BLOCK_SIZE - + sctx->count % SHA256_BLOCK_SIZE; + kernel_neon_begin(); + sha256_base_do_update(desc, data, chunk, + (sha256_block_fn *)sha256_block_neon); + kernel_neon_end(); + data += chunk; + len -= chunk; + } return 0; } @@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data, sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_data_order); } else { - kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); + sha256_update_neon(desc, data, len); + kernel_neon_begin(); sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_neon); kernel_neon_end(); From patchwork Mon Dec 4 12:26:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120521 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365830qgn; Mon, 4 Dec 2017 04:27:34 -0800 (PST) X-Google-Smtp-Source: AGs4zMZ78e1QwV8xEXhx7c9F9zAteI6PAGCoIvOjQM2SN8hF1wyhF5I+iTisbjSLdmiC6mnAI3vT X-Received: by 10.101.102.66 with SMTP id z2mr14013510pgv.352.1512390454403; Mon, 04 Dec 2017 04:27:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390454; cv=none; d=google.com; s=arc-20160816; b=L+aNCmZzcblnhphvBLF2pZuuE/PV/mXBzVzWBKYnDXgDj0fEBO47Mv/IaxLJE/lb/g ktF/AX0LbAsyvybxDzulEZ1HMn383YrbLRYR2BIYtSh3zlswAk0YGpI3gOT715MePtjE OXNAVAg3wB0dns9PE2IXrCUehlq5+e/tIzdxOwvFuVmtdmOlPwaXiqfYAV27ZGus1foW s4MCRFjggYyK2cj71ELNLtSkMRAV4/5My1xABAPYQ47J58MBcfec/ajZnMUC3GnOC8y1 vtrpq2wQBWB2kcSRUIerbEorjbsI675yREOCgr6BodNXuIUiA4MFPHgAOa99QeJNDp24 N2Eg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=iLnvUryj6wgRmRZyo3vDg6ZHKz55Pt1BEioZv1Vwiy8=; b=nxB2Z8ej4hhZFLZNdJpONw/jmMOy5ZKxsiYhfuHcysj3qwFrl53vOeiTD/NVGKEVg2 
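Because the interleaved hunk above is hard to read, here is the resulting sha256_update_neon() as it reads after this patch, reflowed into ordinary source lines. The diff remains the authoritative version; nothing here is new beyond the reflow.

static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
			      unsigned int len)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	if (!may_use_simd())
		return sha256_base_do_update(desc, data, len,
				(sha256_block_fn *)sha256_block_data_order);

	while (len > 0) {
		unsigned int chunk = len;

		/*
		 * Don't hog the CPU for the entire time it takes to process
		 * all input when running on a preemptible kernel, but process
		 * the data block by block instead.
		 */
		if (IS_ENABLED(CONFIG_PREEMPT) &&
		    chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
			chunk = SHA256_BLOCK_SIZE -
				sctx->count % SHA256_BLOCK_SIZE;

		kernel_neon_begin();
		sha256_base_do_update(desc, data, chunk,
				(sha256_block_fn *)sha256_block_neon);
		kernel_neon_end();

		data += chunk;
		len -= chunk;
	}
	return 0;
}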
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT Date: Mon, 4 Dec 2017 12:26:37 +0000 Message-Id: <20171204122645.31535-12-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add a support macro to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code. Given that especially the instruction based accelerated crypto code may use very tight loops, add some parametrization so that the TIF_NEED_RESCHED flag test is only executed every so many loop iterations. In some cases, yielding the NEON involves saving and restoring a non trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into two, and the code in between is only executed when the yield path is taken, allowing the contex to be preserved. The second macro takes a label argument that marks the resume-from-yield path, which should restore the preserved context again. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 50 ++++++++++++++++++++ 1 file changed, 50 insertions(+) -- 2.11.0 diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index aef72d886677..917b026d3e00 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -512,4 +512,54 @@ alternative_else_nop_endif #endif .endm +/* + * yield_neon - check whether to yield to another runnable task from + * kernel mode NEON code (running with preemption disabled) + * + * - Check whether the preempt count is exactly 1, in which case disabling + * preemption once will make the task preemptible. If this is not the case, + * yielding is pointless. + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable + * kernel mode NEON (which will trigger a reschedule), and branch to the + * yield fixup code at @lbl. + */ + .macro yield_neon, lbl:req, ctr, order, stride, loop + yield_neon_pre \ctr, \order, \stride, \loop + yield_neon_post \lbl + .endm + + .macro yield_neon_pre, ctr, order=0, stride, loop=4444f +#ifdef CONFIG_PREEMPT + /* + * With some algorithms, it makes little sense to poll the + * TIF_NEED_RESCHED flag after every iteration, so only perform + * the check every 2^order strides. + */ + .if \order > 1 + .if (\stride & (\stride - 1)) != 0 + .error "stride should be a power of 2" + .endif + tst \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1) + b.ne \loop + .endif + + get_thread_info x0 + ldr w1, [x0, #TSK_TI_PREEMPT] + ldr x0, [x0, #TSK_TI_FLAGS] + cmp w1, #1 // == PREEMPT_OFFSET + csel x0, x0, xzr, eq + tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling? 
+4444: +#endif .subsection 1 +5555: .endm + .macro yield_neon_post, lbl:req + bl kernel_neon_end + bl kernel_neon_begin + b \lbl + .previous + .endm + #endif /* __ASM_ASSEMBLER_H */
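In C terms, the decision that the yield_neon_pre/yield_neon_post pair encodes is roughly the sketch below. It is illustrative only: the real implementation is the assembler above, which reads the preempt count and thread flags via TSK_TI_PREEMPT and TSK_TI_FLAGS rather than through these helpers.

#include <linux/kconfig.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <asm/neon.h>

/* C-level sketch of the check the macro pair implements; illustrative only. */
static bool maybe_yield_neon(void)
{
	if (!IS_ENABLED(CONFIG_PREEMPT))
		return false;

	/*
	 * Yielding only helps if ending the kernel mode NEON section would
	 * actually make the task preemptible (preempt count exactly 1), and
	 * only if a reschedule has been requested.
	 */
	if (preempt_count() != 1 || !need_resched())
		return false;

	kernel_neon_end();	/* lets the pending reschedule happen */
	kernel_neon_begin();
	return true;		/* caller must reload any state kept in NEON regs */
}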
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 12/19] crypto: arm64/sha1-ce - yield every 8 blocks of input
Date: Mon, 4 Dec 2017 12:26:38 +0000
Message-Id: <20171204122645.31535-13-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input.
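In the diff that follows, the 8-block interval comes from invoking the yield macro from the previous patch with order 3 and a stride of one block ("yield every 8 blocks"). As a rough C sketch of the intent only, using hypothetical helper names and the maybe_yield_neon() sketch from the previous patch's note (the real loop is the assembler below):

#include <linux/types.h>
#include <crypto/sha.h>

/* Hypothetical helpers, for illustration only. */
void sha1_ce_one_block(struct sha1_state *st, const u8 *src);
void sha1_reload_state(struct sha1_state *st);
bool maybe_yield_neon(void);

static void sha1_blocks_yielding(struct sha1_state *st,
				 const u8 *src, int blocks)
{
	int done = 0;

	while (blocks-- > 0) {
		sha1_ce_one_block(st, src);
		src += SHA1_BLOCK_SIZE;

		/* Poll for a pending reschedule only once every 8 blocks. */
		if ((++done & 7) == 0 && maybe_yield_neon())
			sha1_reload_state(st);	/* dg lived in NEON registers */
	}
}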
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha1-ce-core.S | 45 ++++++++++++++------ 1 file changed, 32 insertions(+), 13 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 8550408735a0..7ae0dd369e0a 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S @@ -70,31 +70,40 @@ * int blocks) */ ENTRY(sha1_ce_transform) + stp x29, x30, [sp, #-48]! + mov x29, sp + stp x19, x20, [sp, #16] + str x21, [sp, #32] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + /* load round constants */ - adr x6, .Lsha1_rcon +0: adr x6, .Lsha1_rcon ld1r {k0.4s}, [x6], #4 ld1r {k1.4s}, [x6], #4 ld1r {k2.4s}, [x6], #4 ld1r {k3.4s}, [x6] /* load state */ - ld1 {dgav.4s}, [x0] - ldr dgb, [x0, #16] + ld1 {dgav.4s}, [x19] + ldr dgb, [x19, #16] /* load sha1_ce_state::finalize */ ldr_l w4, sha1_ce_offsetof_finalize, x4 - ldr w4, [x0, x4] + ldr w4, [x19, x4] /* load input */ -0: ld1 {v8.4s-v11.4s}, [x1], #64 - sub w2, w2, #1 +1: ld1 {v8.4s-v11.4s}, [x20], #64 + sub w21, w21, #1 CPU_LE( rev32 v8.16b, v8.16b ) CPU_LE( rev32 v9.16b, v9.16b ) CPU_LE( rev32 v10.16b, v10.16b ) CPU_LE( rev32 v11.16b, v11.16b ) -1: add t0.4s, v8.4s, k0.4s +2: add t0.4s, v8.4s, k0.4s mov dg0v.16b, dgav.16b add_update c, ev, k0, 8, 9, 10, 11, dgb @@ -125,16 +134,23 @@ CPU_LE( rev32 v11.16b, v11.16b ) add dgbv.2s, dgbv.2s, dg1v.2s add dgav.4s, dgav.4s, dg0v.4s - cbnz w2, 0b + cbz w21, 3f + + yield_neon_pre w21, 3, 1, 1b // yield every 8 blocks + st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + yield_neon_post 0b + + b 1b /* * Final block: add padding and total bit count. * Skip if the input size was not a round multiple of the block size, * the padding is handled by the C code in that case. */ - cbz x4, 3f +3: cbz x4, 4f ldr_l w4, sha1_ce_offsetof_count, x4 - ldr x4, [x0, x4] + ldr x4, [x19, x4] movi v9.2d, #0 mov x8, #0x80000000 movi v10.2d, #0 @@ -143,10 +159,13 @@ CPU_LE( rev32 v11.16b, v11.16b ) mov x4, #0 mov v11.d[0], xzr mov v11.d[1], x7 - b 1b + b 2b /* store new state */ -3: st1 {dgav.4s}, [x0] - str dgb, [x0, #16] +4: st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + ldp x19, x20, [sp, #16] + ldr x21, [sp, #32] + ldp x29, x30, [sp], #48 ret ENDPROC(sha1_ce_transform) From patchwork Mon Dec 4 12:26:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120524 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365973qgn; Mon, 4 Dec 2017 04:27:43 -0800 (PST) X-Google-Smtp-Source: AGs4zMa+uxiSEgB9qerOB1L4Lj6QmG1mFISb3TIszZQZB2F0fwH62RdzZRtC4MhzyhE4ZRJ+kKox X-Received: by 10.101.93.66 with SMTP id e2mr13603535pgt.50.1512390463233; Mon, 04 Dec 2017 04:27:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390463; cv=none; d=google.com; s=arc-20160816; b=YEDQP4QgqhZcgrf4MOL+RZmOypcr56wt4aRFDylocAtcdJLUbMgWKu5LCnYSWmtB4I /A/FL/bBY620BrlXo+vlOP65528hJvoMkjhu2TkwjuALhZdl2iz7BGf3SubhR2Ys2Wl3 RvFBbFURXef1RUXIIh/WKPFSr07ad9bBf4BJuhiobIrYVR23CVJR74EfgzOgm75zAnhS q6vqOdL0vDR9eY7d8ve2q9fMSeg6RmkLQWqHg06ftg/bV/f8GsIR89+LvzQea5xeZJn+ 0yxlTwKlo99N3lrrU60qS9hT56V3Ixo7/yWCWw86LDEwO5wn0svZ7tv5PAKIvbCpP9no 77Qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=r18z/JBw9RLQuCRhkSydrmEY1w+3CL324jASA6EYkUY=; b=EFS2OtS70IhHPN8xuCVYMHDKPg3djvdDykR3P12XVAHDLD/diFz+DA9XfXcLtLjwae 
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas
Gleixner Subject: [PATCH v2 14/19] crypto: arm64/aes-blk - yield after processing a fixed chunk of input Date: Mon, 4 Dec 2017 12:26:40 +0000 Message-Id: <20171204122645.31535-15-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, the AES block code may keep preemption disabled for as long as it takes to process each contigous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 16 blocks (for the CE case) or after every block (for the pure NEON case), which will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce.S | 17 +- arch/arm64/crypto/aes-modes.S | 379 +++++++++++++------- arch/arm64/crypto/aes-neon.S | 2 + 3 files changed, 272 insertions(+), 126 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 50330f5c3adc..ccb17b65005a 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -15,6 +15,8 @@ #define AES_ENTRY(func) ENTRY(ce_ ## func) #define AES_ENDPROC(func) ENDPROC(ce_ ## func) +#define AES_YIELD_ORDER 4 + .arch armv8-a+crypto /* preload all round keys */ @@ -30,18 +32,21 @@ .endm /* prepare for encryption with key in rk[] */ - .macro enc_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for encryption (again) but with new key in rk[] */ - .macro enc_switch_key, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_switch_key, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for decryption with key in rk[] */ - .macro dec_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro dec_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index a68412e1e3a4..6fcdf82fa295 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,57 +31,85 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] - enc_prepare w3, x2, x5 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +.Lecbencrestart: + enc_prepare w22, x21, x5 .LecbencloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + yield_neon .Lecbencrestart, w23, AES_YIELD_ORDER, 4, .LecbencloopNx b .LecbencloopNx .Lecbenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ - encrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next pt block */ + encrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbencloop .Lecbencout: - ldp x29, x30, [sp], #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 - dec_prepare w3, x2, x5 +.Lecbdecrestart: + dec_prepare w22, x21, x5 .LecbdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + yield_neon .Lecbdecrestart, w23, AES_YIELD_ORDER, 4, .LecbdecloopNx b .LecbdecloopNx .Lecbdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x1], #16 /* get next ct block */ - decrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next ct block */ + decrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbdecloop .Lecbdecout: - ldp x29, x30, [sp], #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_ecb_decrypt) @@ -94,78 +122,114 @@ AES_ENDPROC(aes_ecb_decrypt) */ AES_ENTRY(aes_cbc_encrypt) - ld1 {v4.16b}, [x5] /* get iv */ - enc_prepare w3, x2, x6 + stp x29, x30, [sp, #-64]! 
+ mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcencrestart: + ld1 {v4.16b}, [x24] /* get iv */ + enc_prepare w22, x21, x6 .Lcbcencloop4x: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w22, x21, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w3, x2, x6, w7 + encrypt_block v1, w22, x21, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w3, x2, x6, w7 + encrypt_block v2, w22, x21, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w3, x2, x6, w7 - st1 {v0.16b-v3.16b}, [x0], #64 + encrypt_block v3, w22, x21, x6, w7 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v3.16b + st1 {v4.16b}, [x24] /* return iv */ + yield_neon .Lcbcencrestart, w23, AES_YIELD_ORDER, 4, .Lcbcencloop4x b .Lcbcencloop4x .Lcbcenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ + ld1 {v0.16b}, [x20], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w3, x2, x6, w7 - st1 {v4.16b}, [x0], #16 - subs w4, w4, #1 + encrypt_block v4, w22, x21, x6, w7 + st1 {v4.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x5] /* return iv */ + st1 {v4.16b}, [x24] /* return iv */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] - ld1 {v7.16b}, [x5] /* get iv */ - dec_prepare w3, x2, x6 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcdecrestart: + ld1 {v7.16b}, [x24] /* get iv */ + dec_prepare w22, x21, x6 .LcbcdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x1, x1, #16 + sub x20, x20, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v7.16b}, [x24] /* return iv */ + yield_neon .Lcbcdecrestart, w23, AES_YIELD_ORDER, 4, .LcbcdecloopNx b .LcbcdecloopNx .Lcbcdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x1], #16 /* get next ct block */ + ld1 {v1.16b}, [x20], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w22, x21, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x5] /* return iv */ - ldp x29, x30, [sp], #16 + st1 {v7.16b}, [x24] /* return iv */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_cbc_decrypt) @@ -176,19 +240,30 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - enc_prepare w3, x2, x6 - ld1 {v4.16b}, [x5] +.Lctrrestart: + enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x24] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 - cmn w6, w4 /* 32 bit overflow? */ - bcs .Lctrloop .LctrloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lctr1x + cmn w6, #4 /* 32 bit overflow? */ + bcs .Lctr1x ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ dup v7.4s, w6 mov v0.16b, v4.16b @@ -200,25 +275,27 @@ AES_ENTRY(aes_ctr_encrypt) mov v1.s[3], v8.s[0] mov v2.s[3], v8.s[1] mov v3.s[3], v8.s[2] - ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x1], #16 /* get 1 input block */ + ld1 {v5.16b}, [x20], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w4, .Lctrout + cbz w23, .Lctrout + st1 {v4.16b}, [x24] /* return next CTR value */ + yield_neon .Lctrrestart, w23, AES_YIELD_ORDER, 4, .LctrloopNx b .LctrloopNx .Lctr1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -226,22 +303,25 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? 
*/ .Lctrcarrydone: - subs w4, w4, #1 + subs w23, w23, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x5] /* return next CTR value */ - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] /* return next CTR value */ +.Lctrret: + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret .Lctrtailblock: - st1 {v0.16b}, [x0] - ldp x29, x30, [sp], #16 - ret + st1 {v0.16b}, [x19] + b .Lctrret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -274,10 +354,20 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp - - ld1 {v4.16b}, [x6] + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -286,15 +376,17 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencrestart: + ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w3, x2, x8 + enc_prepare w22, x21, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -307,35 +399,50 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsencout + cbz w23, .Lxtsencout + st1 {v4.16b}, [x24] + yield_neon .Lxtsencrestart, w23, AES_YIELD_ORDER, 4, .LxtsencloopNx b .LxtsencloopNx .Lxtsenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp - - ld1 {v4.16b}, [x6] + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -344,15 +451,17 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecrestart: + ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w3, x2, x8 + dec_prepare w22, x21, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -365,26 +474,31 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsdecout + cbz w23, .Lxtsdecout + st1 {v4.16b}, [x24] + yield_neon .Lxtsdecrestart, w23, AES_YIELD_ORDER, 4, .LxtsdecloopNx b .LxtsdecloopNx .Lxtsdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x8, w7 + decrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_xts_decrypt) @@ -393,43 +507,68 @@ AES_ENDPROC(aes_xts_decrypt) * int blocks, u8 dg[], int enc_before, int enc_after) */ AES_ENTRY(aes_mac_update) - ld1 {v0.16b}, [x4] /* get dg */ + stp x29, x30, [sp, #-64]! 
+ mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v0.16b}, [x23] /* get dg */ enc_prepare w2, x1, x7 cbz w5, .Lmacloop4x encrypt_block v0, w2, x1, x7, w8 .Lmacloop4x: - subs w3, w3, #4 + subs w22, w22, #4 bmi .Lmac1x - ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v2.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v3.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v4.16b - cmp w3, wzr - csinv x5, x6, xzr, eq + cmp w22, wzr + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 + st1 {v0.16b}, [x23] /* return dg */ + yield_neon .Lmacrestart, w22, AES_YIELD_ORDER, 4, .Lmacloop4x b .Lmacloop4x .Lmac1x: - add w3, w3, #4 + add w22, w22, #4 .Lmacloop: - cbz w3, .Lmacout - ld1 {v1.16b}, [x0], #16 /* get next pt block */ + cbz w22, .Lmacout + ld1 {v1.16b}, [x19], #16 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - subs w3, w3, #1 - csinv x5, x6, xzr, eq + subs w22, w22, #1 + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 +.Lmacenc: + encrypt_block v0, w21, x20, x7, w8 b .Lmacloop .Lmacout: - st1 {v0.16b}, [x4] /* return dg */ + st1 {v0.16b}, [x23] /* return dg */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret + +.Lmacrestart: + ld1 {v0.16b}, [x23] /* get dg */ + enc_prepare w21, x20, x0 + b .Lmacloop4x AES_ENDPROC(aes_mac_update) diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S index f1e3aa2732f9..dab7be7d3628 100644 --- a/arch/arm64/crypto/aes-neon.S +++ b/arch/arm64/crypto/aes-neon.S @@ -14,6 +14,8 @@ #define AES_ENTRY(func) ENTRY(neon_ ## func) #define AES_ENDPROC(func) ENDPROC(neon_ ## func) +#define AES_YIELD_ORDER 0 + /* multiply by polynomial 'x' in GF(2^8) */ .macro mul_by_x, out, in, temp, const sshr \temp, \in, #7 From patchwork Mon Dec 4 12:26:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120525 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366028qgn; Mon, 4 Dec 2017 04:27:47 -0800 (PST) X-Google-Smtp-Source: AGs4zMaS9bdofeKMgqnGjVbaqyDLAuPGNceFVv2qAy4iflgjqy4SPe3XWgo47X2PSH9RlrOgBJx/ X-Received: by 10.101.102.66 with SMTP id z2mr14014025pgv.352.1512390467107; Mon, 04 Dec 2017 04:27:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390467; cv=none; d=google.com; s=arc-20160816; b=PLT4+NGIYpLha7ut1hvdA5S2VQTjAdrPjjaukn9Q5GlUJSGDqwAAgw43gts/2xoJ8E pj+6DpKSJb1+/JDYUKhw2Dj/bxfXvppTvrzAqR9QlKzx9hPPtYZza9AHCFhpNpz39AOb /QSLuBjaw1HDVhKQUFjh0YyQBiFVHtghZLRDLecUDkBIsS0dmu3rMc8RhnRdBC9RcUnx qFgtfXAp/L7QYuantEaU+68o5uNVoSh5h1+bCOW/f4iijJbKpqJMAmKVSFg3ymYFeHMD F51DYY23HmOAj94SxEWtQKElfZ8x/54Lt6QaRftjtagVBHT08+uroP4DLymBKzkwP+si mfIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=BKeaFq/3e/pVWNSo0hlDncWIIBb4lu/zqClWxciSjME=; 
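One point that is easy to lose in the large aes-blk diff above: a yield ends the kernel mode NEON section, so round keys and IV/counter material held in NEON registers do not survive it. That is why every mode gains a restart label (.Lecbencrestart, .Lcbcencrestart, .Lctrrestart, .Lmacrestart, ...) that re-runs enc_prepare/dec_prepare and reloads the IV. A C analogy of that control flow, with purely illustrative names (not the kernel API):

/* Control-flow analogy only -- the real code is the assembler diff above. */
static void mode_crypt_yielding(struct aes_ctx *ctx, u8 *dst,
				const u8 *src, u8 iv[16], int blocks)
{
restart:
	load_round_keys_to_neon(ctx);		/* corresponds to enc_prepare */
	load_iv_to_neon(iv);			/* IV lives in a NEON register */

	while (blocks > 0) {
		crypt_up_to_four_blocks(ctx, &dst, &src, iv, &blocks);

		if (blocks > 0 && maybe_yield_neon())
			goto restart;		/* NEON state was lost across the yield */
	}
}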
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra ,
Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 15/19] crypto: arm64/aes-bs - yield after processing each 128 bytes of input Date: Mon, 4 Dec 2017 12:26:41 +0000 Message-Id: <20171204122645.31535-16-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, the bit-sliced AES code may keep preemption disabled for as long as it takes to process each contigous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 128 bytes of input, (i.e., 8x the AES block size, which is the natural granularity for a bit sliced algorithm.) This will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 317 ++++++++++++-------- 1 file changed, 190 insertions(+), 127 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..4532a2262742 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,68 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99: mov x5, #1 - lsl x5, x5, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x5, x5, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnz x5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnz x5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnz x5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnz x5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnz x5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnz x5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnz x5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnz x5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnz x5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnz x5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnz x5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + yield_neon 99b + b 99b -1: ldp x29, x30, [sp], #16 +1: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -632,43 +646,53 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 mov v25.16b, v0.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 mov v26.16b, v1.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 mov v27.16b, v2.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 mov v28.16b, v3.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 mov v29.16b, v4.16b tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 mov v30.16b, v5.16b tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 mov v31.16b, v6.16b tbnz x6, #7, 0f - ld1 {v7.16b}, [x1] + ld1 {v7.16b}, [x20] -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_decrypt8 - ld1 {v24.16b}, [x5] // load IV + ld1 {v24.16b}, [x24] // load IV eor v1.16b, v1.16b, v25.16b eor v6.16b, v6.16b, v26.16b @@ -679,34 +703,39 @@ ENTRY(aesbs_cbc_decrypt) eor v3.16b, v3.16b, v30.16b eor v5.16b, v5.16b, v31.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 mov v24.16b, v25.16b tbnz x6, #1, 1f - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 mov v24.16b, v26.16b tbnz x6, #2, 1f - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 mov v24.16b, v27.16b tbnz x6, #3, 1f - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 mov v24.16b, v28.16b tbnz x6, #4, 1f - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 mov v24.16b, v29.16b tbnz x6, #5, 1f - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 mov v24.16b, v30.16b tbnz x6, #6, 1f - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 mov v24.16b, v31.16b tbnz x6, #7, 1f - ld1 {v24.16b}, [x1], #16 - st1 {v5.16b}, [x0], #16 -1: st1 {v24.16b}, [x5] // store IV - - cbnz x4, 99b - - ldp x29, x30, [sp], #16 + ld1 {v24.16b}, [x20], #16 + st1 {v5.16b}, [x19], #16 +1: st1 {v24.16b}, [x24] // store IV + + cbz x23, 2f + yield_neon 99b + b 99b + +2: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret ENDPROC(aesbs_cbc_decrypt) @@ -731,65 +760,75 @@ CPU_BE( .quad 0x87, 1 ) */ __xts_crypt8: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 next_tweak v26, v25, v30, v31 eor v0.16b, v0.16b, v25.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 next_tweak v27, v26, v30, v31 eor v1.16b, v1.16b, v26.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 next_tweak v28, v27, v30, v31 eor v2.16b, v2.16b, v27.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 next_tweak v29, v28, v30, v31 eor v3.16b, v3.16b, v28.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 str q29, [sp, #16] eor v4.16b, v4.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 str q29, [sp, #32] eor v5.16b, v5.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 str q29, [sp, #48] eor v6.16b, v6.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, 
[x20], #16 str q29, [sp, #64] eor v7.16b, v7.16b, v29.16b next_tweak v29, v29, v30, v31 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 br x7 ENDPROC(__xts_crypt8) .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-80]! + stp x29, x30, [sp, #-128]! mov x29, sp + stp x19, x20, [sp, #80] + stp x21, x22, [sp, #96] + stp x23, x24, [sp, #112] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - ldr q30, .Lxts_mul_x - ld1 {v25.16b}, [x5] +0: ldr q30, .Lxts_mul_x + ld1 {v25.16b}, [x24] 99: adr x7, \do8 bl __xts_crypt8 @@ -802,16 +841,16 @@ ENDPROC(__xts_crypt8) eor \o2\().16b, \o2\().16b, v27.16b eor \o3\().16b, \o3\().16b, v28.16b - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 mov v25.16b, v26.16b tbnz x6, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 mov v25.16b, v27.16b tbnz x6, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 mov v25.16b, v28.16b tbnz x6, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 mov v25.16b, v29.16b tbnz x6, #4, 1f @@ -820,18 +859,24 @@ ENDPROC(__xts_crypt8) eor \o6\().16b, \o6\().16b, v18.16b eor \o7\().16b, \o7\().16b, v19.16b - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x6, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x6, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x6, #7, 1f - st1 {\o7\().16b}, [x0], #16 - - cbnz x4, 99b - -1: st1 {v25.16b}, [x5] - ldp x29, x30, [sp], #80 + st1 {\o7\().16b}, [x19], #16 + + cbz x23, 1f + st1 {v25.16b}, [x24] + yield_neon 0b + b 99b + +1: st1 {v25.16b}, [x24] + ldp x19, x20, [sp, #80] + ldp x21, x22, [sp, #96] + ldp x23, x24, [sp, #112] + ldp x29, x30, [sp], #128 ret .endm @@ -856,24 +901,36 @@ ENDPROC(aesbs_xts_decrypt) * int rounds, int blocks, u8 iv[], u8 final[]) */ ENTRY(aesbs_ctr_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-80]! 
mov x29, sp - - cmp x6, #0 + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + str x25, [sp, #64] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + cmp x25, #0 cset x10, ne - add x4, x4, x10 // do one extra block if final + add x23, x23, x10 // do one extra block if final - ldp x7, x8, [x5] - ld1 {v0.16b}, [x5] +98: ldp x7, x8, [x24] + ld1 {v0.16b}, [x24] CPU_LE( rev x7, x7 ) CPU_LE( rev x8, x8 ) adds x8, x8, #1 adc x7, x7, xzr 99: mov x9, #1 - lsl x9, x9, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x9, x9, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x9, x9, xzr, le tbnz x9, #1, 0f @@ -891,82 +948,88 @@ CPU_LE( rev x8, x8 ) tbnz x9, #7, 0f next_ctr v7 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_encrypt8 lsr x9, x9, x10 // disregard the extra block tbnz x9, #0, 0f - ld1 {v8.16b}, [x1], #16 + ld1 {v8.16b}, [x20], #16 eor v0.16b, v0.16b, v8.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 tbnz x9, #1, 1f - ld1 {v9.16b}, [x1], #16 + ld1 {v9.16b}, [x20], #16 eor v1.16b, v1.16b, v9.16b - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 tbnz x9, #2, 2f - ld1 {v10.16b}, [x1], #16 + ld1 {v10.16b}, [x20], #16 eor v4.16b, v4.16b, v10.16b - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 tbnz x9, #3, 3f - ld1 {v11.16b}, [x1], #16 + ld1 {v11.16b}, [x20], #16 eor v6.16b, v6.16b, v11.16b - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 tbnz x9, #4, 4f - ld1 {v12.16b}, [x1], #16 + ld1 {v12.16b}, [x20], #16 eor v3.16b, v3.16b, v12.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 tbnz x9, #5, 5f - ld1 {v13.16b}, [x1], #16 + ld1 {v13.16b}, [x20], #16 eor v7.16b, v7.16b, v13.16b - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 tbnz x9, #6, 6f - ld1 {v14.16b}, [x1], #16 + ld1 {v14.16b}, [x20], #16 eor v2.16b, v2.16b, v14.16b - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 tbnz x9, #7, 7f - ld1 {v15.16b}, [x1], #16 + ld1 {v15.16b}, [x20], #16 eor v5.16b, v5.16b, v15.16b - st1 {v5.16b}, [x0], #16 + st1 {v5.16b}, [x19], #16 8: next_ctr v0 - cbnz x4, 99b - -0: st1 {v0.16b}, [x5] - ldp x29, x30, [sp], #16 + st1 {v0.16b}, [x24] + cbz x23, 0f + yield_neon 98b + b 99b + +0: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldr x25, [sp, #64] + ldp x29, x30, [sp], #80 ret /* * If we are handling the tail of the input (x6 != NULL), return the * final keystream block back to the caller. 
*/ -1: cbz x6, 8b - st1 {v1.16b}, [x6] +1: cbz x25, 8b + st1 {v1.16b}, [x25] b 8b -2: cbz x6, 8b - st1 {v4.16b}, [x6] +2: cbz x25, 8b + st1 {v4.16b}, [x25] b 8b -3: cbz x6, 8b - st1 {v6.16b}, [x6] +3: cbz x25, 8b + st1 {v6.16b}, [x25] b 8b -4: cbz x6, 8b - st1 {v3.16b}, [x6] +4: cbz x25, 8b + st1 {v3.16b}, [x25] b 8b -5: cbz x6, 8b - st1 {v7.16b}, [x6] +5: cbz x25, 8b + st1 {v7.16b}, [x25] b 8b -6: cbz x6, 8b - st1 {v2.16b}, [x6] +6: cbz x25, 8b + st1 {v2.16b}, [x25] b 8b -7: cbz x6, 8b - st1 {v5.16b}, [x6] +7: cbz x25, 8b + st1 {v5.16b}, [x25] b 8b ENDPROC(aesbs_ctr_encrypt) From patchwork Mon Dec 4 12:26:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120526 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366071qgn; Mon, 4 Dec 2017 04:27:49 -0800 (PST) X-Google-Smtp-Source: AGs4zMbyBt/AblYpKMrdbKi5cScr+BpEqQnWRQOCdo/E83g9EsUditF3ddgND0zA2aSnrYW8J58p X-Received: by 10.98.159.16 with SMTP id g16mr19275147pfe.53.1512390469805; Mon, 04 Dec 2017 04:27:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390469; cv=none; d=google.com; s=arc-20160816; b=MyubI5VEMpKg/qj193DLJtZs6Px+1vzW2wpIHBXrIs3+Gvr1JgnA/FKx7B2qbIGfxO 1QmZT7x0uDG0bANgxn88++h0Mm91DY2shkumq/SLIO1+kYHrFgup/K5I+G0qLzynQtOL YTwQrmzvBTPvN4M0g3y2Qd/Oq5ZlkTRaLlsWWUX/hlDcpkvNwakuIYYUGOVhsnCDnOlA YR1eG8Sd0hThGCETC19EX49q+tG3U5ZQqhruLr1p3Y3QhYEk4AqZWzrrZZ7WthYdErf+ uXwNV5lW8roujFcuns/mE9TgbBxHv/roy4B/ZSapz4z3/95IKPUj1qF57DHkvg0361st llbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=arlSHtrbxeVG6fEorsTBSaz3dIlYySVyMnD4+bW8/wHrm7LsTfzRscjt01G2j6sdbz bkxI7amSSGdCEq57/p+BCCrOgWETbAuPzg3uBxl2YNdLHDbrrKrT3MDllTeSa7wibrEb PTuWGpAWix0ay7aLCw6YlGeeWz/crACGZPLKi7pH0M3yYqCNdS17KTPaayCQpbx/m9Yi ih0hKvRAJtVTaGWh01tH4MDirL8BswbqxQJAxmxv0jnFu7Eb9fUuX0NgQxwOsoWgmy5O /gZ6U62siKEV5kkvIfhfdDhp+vD0F6y5cFgmmyx+tp9UIft47T2NZcIC4+Qk+7jXy3tu 58BA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=RRPICWwD; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id d23si10177768pfe.339.2017.12.04.04.27.49; Mon, 04 Dec 2017 04:27:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=RRPICWwD; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753334AbdLDM1s (ORCPT + 1 other); Mon, 4 Dec 2017 07:27:48 -0500 Received: from mail-wm0-f66.google.com ([74.125.82.66]:37000 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753323AbdLDM1p (ORCPT ); Mon, 4 Dec 2017 07:27:45 -0500 Received: by mail-wm0-f66.google.com with SMTP id f140so13921499wmd.2 for ; Mon, 04 Dec 2017 04:27:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=RRPICWwD8QaMGtwp4l1SQrQAwjJhkk27XHqdg0sz5ZgViyfB9WmuJBZErqR7vUnVES b6LIMY/i3y1zMWfwOBZkwM65x8RxBooUFclFaRM+49JGlQFxMnl/+Bikfimd+O1Ufr94 gVt8FwdgbCypZkJlWYhun4SCo41fhF/b84eO4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=GaqjAv1SzkV0DaYck5M5+lEDdVpEC+ALcC1T9PplW39VZlj+MopWN85RAul0saWvwE oreeCYSJGjwDfV4XjUg1GFU+2mwFq85eb/VK48C/QTsCDeaqgIrOVsvKuez7g2mCSvRw 2SGsrWJ+igzHmWkqrGTX50EyCrAT22ALiAYNFyVeSM4oH8jYTgKB4sM4M/YS/ELCavw0 QH27ejjfEebetVZsGHRYdCbQ9vPdZjHwIOShQDkxyzmU0BemeCydYaz9Pfsa1KfuWjQ1 2WiktFkkmVz170VtYw+LNu0LZubHRrCjZu04eXMMgQz935TgQvhXZzKpjPuoFVt1Jej8 wN+w== X-Gm-Message-State: AKGB3mKEfJfFlmAnu0iIbZ2jouYlD8a5hBirpOS1im3/eGA74rJtPN5R 147ESoqWm7baqogGZZYRys6+LZJgU4c= X-Received: by 10.28.30.151 with SMTP id e145mr2785194wme.8.1512390463952; Mon, 04 Dec 2017 04:27:43 -0800 (PST) Received: from localhost.localdomain ([105.150.171.234]) by smtp.gmail.com with ESMTPSA id a8sm7665839wmh.41.2017.12.04.04.27.41 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Dec 2017 04:27:43 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 16/19] crypto: arm64/aes-ghash - yield after processing fixed number of blocks Date: Mon, 4 Dec 2017 12:26:42 +0000 Message-Id: <20171204122645.31535-17-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org This updates both the core GHASH as well as the AES-GCM algorithm to yield each time after processing a fixed chunk of input. For the GCM driver, we align with the other AES/CE block mode drivers, and use a block size of 64 bytes. 
The core GHASH is much shorter, so let's use a block size of 128 bytes for that one. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 128 ++++++++++++++------ 1 file changed, 92 insertions(+), 36 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..fbfd4681675d 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -212,23 +212,36 @@ ushr XL.2d, XL.2d, #1 .endm - .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + .macro __pmull_ghash, pn, yield + stp x29, x30, [sp, #-64]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE( rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +263,19 @@ CPU_LE( rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnz w0, 0b + cbz w19, 3f - st1 {XL.2d}, [x1] + yield_neon_pre w19, \yield, 1, 1b + st1 {XL.2d}, [x20] + yield_neon_post 0b + + b 1b + +3: st1 {XL.2d}, [x20] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -261,11 +284,11 @@ CPU_LE( rev64 T1.16b, T1.16b ) * struct ghash_key const *k, const char *head) */ ENTRY(pmull_ghash_update_p64) - __pmull_ghash p64 + __pmull_ghash p64, 5 ENDPROC(pmull_ghash_update_p64) ENTRY(pmull_ghash_update_p8) - __pmull_ghash p8 + __pmull_ghash p8, 2 ENDPROC(pmull_ghash_update_p8) KS .req v8 @@ -304,38 +327,56 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8] // load lower counter + stp x29, x30, [sp, #-96]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + stp x25, x26, [sp, #64] + str x27, [sp, #80] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + + ldr x27, [x24, #8] // load lower counter +CPU_LE( rev x27, x27 ) + +0: ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] movi MASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE( rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x26] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b}, [x22], #16 + rev x9, x27 + add x27, x27, #1 + sub w19, w19, #1 ins CTR.d[1], x9 // set lower counter .if \enc == 1 eor INP.16b, INP.16b, KS.16b // encrypt input - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif rev64 T1.16b, INP.16b - cmp w6, #12 - b.ge 2f // AES-192/256? + cmp w25, #12 + b.ge 4f // AES-192/256? 
-1: enc_round CTR, v21 +2: enc_round CTR, v21 ext T2.16b, XL.16b, XL.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8 @@ -390,27 +431,42 @@ CPU_LE( rev x8, x8 ) .if \enc == 0 eor INP.16b, INP.16b, KS.16b - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif - cbnz w0, 0b + cbz w19, 3f -CPU_LE( rev x8, x8 ) - st1 {XL.2d}, [x1] - str x8, [x5, #8] // store lower counter + yield_neon_pre w19, 8, 1, 1b // yield every 8 blocks + st1 {XL.2d}, [x20] + .if \enc == 1 + st1 {KS.16b}, [x26] + .endif + yield_neon_post 0b + b 1b + +3: st1 {XL.2d}, [x20] .if \enc == 1 - st1 {KS.16b}, [x7] + st1 {KS.16b}, [x26] .endif +CPU_LE( rev x27, x27 ) + str x27, [x24, #8] // store lower counter + + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x25, x26, [sp, #64] + ldr x27, [sp, #80] + ldp x29, x30, [sp], #96 ret -2: b.eq 3f // AES-192? +4: b.eq 5f // AES-192? enc_round CTR, v17 enc_round CTR, v18 -3: enc_round CTR, v19 +5: enc_round CTR, v19 enc_round CTR, v20 - b 1b + b 2b .endm /* From patchwork Mon Dec 4 12:26:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120527 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366125qgn; Mon, 4 Dec 2017 04:27:53 -0800 (PST) X-Google-Smtp-Source: AGs4zMZ0C5tdvdp2I+sM9GaDQLd9FR5d/inQSrOu9huBuhY/ADjHgs7N3+AlufjR93GxiefCNJ1M X-Received: by 10.98.32.21 with SMTP id g21mr19263862pfg.52.1512390473009; Mon, 04 Dec 2017 04:27:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390473; cv=none; d=google.com; s=arc-20160816; b=uaW3MvIWnvpPzq7k6VoSU/Gpnd2mHeQmjXeFPKOsm5akCx5YADOMtTcrQIkVFsaWQF S4ug9/ViqcsadOoKFIOmtZ0zbDkswf2564AmZmsdAxBrklFurfRUAVkrhviywJhkajjp jxmSdxihgCFsUPSLlpx1qgj1XthdnhQlQ6t2K6+mAIcV1LycM/UdvfQaqsK23N/yN02Y wsTOccwcNaua3yzhD2HQ/Li3Z9dHUeJV3Bsx2LKj01iaK33HlaFduyep297pJ9y0x+fF dFi725BaRYGVk11XrVFCcTh5/4Q+/qWn7QmnYO3dUZqUOGZqI7BO5eTB+0QhWS/4IHtX GDuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=r3pWMbsQK1Jd8mT5KgNAPizdVYdUe5jOnoV7PBK9uJU=; b=WIfpyczhphvtOaJT/d0H22/i4ivGUL46f39QrS1vWiJiuAT/tDSY92puRy4Sq3cSjH gJNkOIVpdi1fU2wtvT+lGcYlcm4pHRhO6S75ILXgGV7JfHToiGcxKC4fklwoqY9KxPLB 4hRoraOyD89fM3Zpa4NHP5XWx+J/JBraIYWiOOeEsXpi7Sbq1iqDFcOe12TvNtEzNy3o WpqZkZXOD9LKpzBsO2RLIKR8p+DmVSbrTtnBa0smob+6Z+ZXmORfk9YdWf6YTPMaM6ni RSw/EtV9UMJ/bjXVoMqiSGv17M0w+15aUyvxqYjFBV7F1l5T2oFafOzgmJ/NwjF/m1Ho 3neQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kGajrMYV; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 17/19] crypto: arm64/crc32-ce - yield NEON every 16 blocks of input Date: Mon, 4 Dec 2017 12:26:43 +0000 Message-Id: <20171204122645.31535-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 16 blocks of input.
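[Editor's note: for readers following the series at the C level rather than in the NEON assembly, the fixed-chunk idea described above can be sketched as glue code. This is purely illustrative and not part of the patch: the series performs the yield inside the assembly itself via the yield_neon macros, and the helper name, prototype, and chunking details below are assumptions made only for the example.]

/*
 * Illustrative sketch only: a C-level approximation of "yield the NEON
 * every 16 blocks".  Each kernel_neon_begin()/kernel_neon_end() section
 * is bounded to YIELD_BLOCKS blocks, so a preemptible kernel gets a
 * scheduling opportunity between chunks.  The crc32_pmull_le() prototype
 * and the crc32_pmull_chunked() helper are assumptions for this sketch;
 * len is assumed to be a multiple of 64 bytes, since the real glue code
 * handles unaligned heads and short tails separately.
 */
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/sched.h>
#include <asm/neon.h>

#define YIELD_BLOCKS	16U	/* blocks per NEON section, as in this patch */
#define BLOCK_SIZE	64U	/* one block is a 64-byte cache line here */

asmlinkage u32 crc32_pmull_le(const u8 *buf, size_t len, u32 crc);

static u32 crc32_pmull_chunked(const u8 *buf, size_t len, u32 crc)
{
	while (len) {
		size_t chunk = min_t(size_t, len, YIELD_BLOCKS * BLOCK_SIZE);

		kernel_neon_begin();
		/* resumable: takes and returns the running CRC */
		crc = crc32_pmull_le(buf, chunk, crc);
		kernel_neon_end();

		buf += chunk;
		len -= chunk;
		cond_resched();		/* voluntary yield between chunks */
	}
	return crc;
}

[Doing the yield inside the assembly, as this patch does, keeps the partially folded vector state live across chunks; the stp/ldp of q1-q4 around yield_neon_pre/yield_neon_post in the diff below exist to preserve that state across a yield.]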
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crc32-ce-core.S | 55 +++++++++++++++----- 1 file changed, 43 insertions(+), 12 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S index 18f5a8442276..bca3d22fae7b 100644 --- a/arch/arm64/crypto/crc32-ce-core.S +++ b/arch/arm64/crypto/crc32-ce-core.S @@ -100,9 +100,9 @@ dCONSTANT .req d0 qCONSTANT .req q0 - BUF .req x0 - LEN .req x1 - CRC .req x2 + BUF .req x19 + LEN .req x20 + CRC .req x21 vzr .req v9 @@ -116,13 +116,27 @@ * size_t len, uint crc32) */ ENTRY(crc32_pmull_le) - adr x3, .Lcrc32_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32_constants b 0f ENTRY(crc32c_pmull_le) - adr x3, .Lcrc32c_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32c_constants -0: bic LEN, LEN, #15 +0: mov BUF, x0 + mov LEN, x1 + mov CRC, x2 + + bic LEN, LEN, #15 ld1 {v1.16b-v4.16b}, [BUF], #0x40 movi vzr.16b, #0 fmov dCONSTANT, CRC @@ -131,7 +145,7 @@ ENTRY(crc32c_pmull_le) cmp LEN, #0x40 b.lt less_64 - ldr qCONSTANT, [x3] + ldr qCONSTANT, [x22] loop_64: /* 64 bytes Full cache line folding */ sub LEN, LEN, #0x40 @@ -161,10 +175,24 @@ loop_64: /* 64 bytes Full cache line folding */ eor v4.16b, v4.16b, v8.16b cmp LEN, #0x40 - b.ge loop_64 + b.lt less_64 + + yield_neon_pre LEN, 4, 64, loop_64 // yield every 16 blocks + stp q1, q2, [sp, #48] + stp q3, q4, [sp, #80] + yield_neon_post 2f + b loop_64 + + .subsection 1 +2: ldp q1, q2, [sp, #48] + ldp q3, q4, [sp, #80] + ldr qCONSTANT, [x22] + movi vzr.16b, #0 + b loop_64 + .previous less_64: /* Folding cache line into 128bit */ - ldr qCONSTANT, [x3, #16] + ldr qCONSTANT, [x22, #16] pmull2 v5.1q, v1.2d, vCONSTANT.2d pmull v1.1q, v1.1d, vCONSTANT.1d @@ -203,8 +231,8 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* final 32-bit fold */ - ldr dCONSTANT, [x3, #32] - ldr d3, [x3, #40] + ldr dCONSTANT, [x22, #32] + ldr d3, [x22, #40] ext v2.16b, v1.16b, vzr.16b, #4 and v1.16b, v1.16b, v3.16b @@ -212,7 +240,7 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ - ldr qCONSTANT, [x3, #48] + ldr qCONSTANT, [x22, #48] and v2.16b, v1.16b, v3.16b ext v2.16b, vzr.16b, v2.16b, #8 @@ -222,6 +250,9 @@ fold_64: eor v1.16b, v1.16b, v2.16b mov w0, v1.s[1] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #112 ret ENDPROC(crc32_pmull_le) ENDPROC(crc32c_pmull_le) From patchwork Mon Dec 4 12:26:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120528 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366185qgn; Mon, 4 Dec 2017 04:27:57 -0800 (PST) X-Google-Smtp-Source: AGs4zMZhMN3x5VVlZz+pEMi94yKZqIQ4MQbHOVjmLSxosVo0o5K+mPnhY8S5KgTiu28N7/kedSaH X-Received: by 10.98.189.17 with SMTP id a17mr18902948pff.97.1512390477799; Mon, 04 Dec 2017 04:27:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390477; cv=none; d=google.com; s=arc-20160816; b=AZs2lDMZYHBJYyYGOy8/2VgJP1Jl9R627m3QP7lFjJl3bWVduyMLq7MpTQkkFV/oe1 w72KSnnU7eZmkpYUcmCTKbkr4uYdcFIOl31wfHOLHGJWgZy4IHQFNYTEuA8F7hnfx1Gu 6IRZ9pUlJjPT/8fvz5K7VAzVDLRfi51YG+O/2WwJ6u2xXs+iIVU5cVhtq+G0KwGnHxUR SIWkxLv4bW+ooiuJtuDxVXqTP+3cE6p8m85+KlBf5yo2fpHS/hoEWFG4VknHNmEyHv+S NmkkGJwJtdHTwDVAAaBar097hZXnhpyT19BhAygInm25wkI8TXe9kTOBEVxnShWQSX3c 1Lng== ARC-Message-Signature: i=1; a=rsa-sha256; 
From: Ard Biesheuvel To:
linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 18/19] crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input Date: Mon, 4 Dec 2017 12:26:44 +0000 Message-Id: <20171204122645.31535-19-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crct10dif-ce-core.S | 39 ++++++++++++++++++-- 1 file changed, 35 insertions(+), 4 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S index d5b5a8c038c8..d57067e80bae 100644 --- a/arch/arm64/crypto/crct10dif-ce-core.S +++ b/arch/arm64/crypto/crct10dif-ce-core.S @@ -74,13 +74,22 @@ .text .cpu generic+crypto - arg1_low32 .req w0 - arg2 .req x1 - arg3 .req x2 + arg1_low32 .req w19 + arg2 .req x20 + arg3 .req x21 vzr .req v13 ENTRY(crc_t10dif_pmull) + stp x29, x30, [sp, #-176]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + mov arg1_low32, w0 + mov arg2, x1 + mov arg3, x2 + movi vzr.16b, #0 // init zero register // adjust the 16-bit initial_crc value, scale it to 32 bits @@ -175,8 +184,27 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 ) subs arg3, arg3, #128 // check if there is another 64B in the buffer to be able to fold - b.ge _fold_64_B_loop + b.lt _fold_64_B_end + + yield_neon_pre arg3, 3, 128, _fold_64_B_loop // yield every 8 blocks + stp q0, q1, [sp, #48] + stp q2, q3, [sp, #80] + stp q4, q5, [sp, #112] + stp q6, q7, [sp, #144] + yield_neon_post 2f + b _fold_64_B_loop + + .subsection 1 +2: ldp q0, q1, [sp, #48] + ldp q2, q3, [sp, #80] + ldp q4, q5, [sp, #112] + ldp q6, q7, [sp, #144] + ldr q10, rk3 + movi vzr.16b, #0 // init zero register + b _fold_64_B_loop + .previous +_fold_64_B_end: // at this point, the buffer pointer is pointing at the last y Bytes // of the buffer the 64B of folded data is in 4 of the vector // registers: v0, v1, v2, v3 @@ -304,6 +332,9 @@ _barrett: _cleanup: // scale the result back to 16 bits lsr x0, x0, #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #176 ret _less_than_128: