From patchwork Sat Mar 10 15:21:48 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131293
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 03/23] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
Date: Sat, 10 Mar 2018 15:21:48 +0000
Message-Id: <20180310152208.10369-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled).

Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-glue.c        | 95 ++++++++++----------
 arch/arm64/crypto/aes-modes.S       | 90 +++++++++----------
 arch/arm64/crypto/aes-neonbs-glue.c | 14 ++-
 3 files changed, 97 insertions(+), 102 deletions(-)

--
2.15.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index 2fa850e86aa8..253188fb8cb0 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2");
 /* defined in aes-modes.S */
 asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
-                                int rounds, int blocks, int first);
+                                int rounds, int blocks);
 asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
-                                int rounds, int blocks, int first);
+                                int rounds, int blocks);
 asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[],
-                                int rounds, int blocks, u8 iv[],
int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 ctr[], int first); + int rounds, int blocks, u8 ctr[]); asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds, int blocks, u8 const rk2[], u8 iv[], @@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, first); + (u8 *)ctx->key_enc, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, 
walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, first); + (u8 *)ctx->key_dec, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ 
-215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - first = 1; - kernel_neon_begin(); while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; + kernel_neon_end(); } if (walk.nbytes) { u8 __aligned(8) tail[AES_BLOCK_SIZE]; @@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req) */ blocks = -1; + kernel_neon_begin(); aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, - blocks, walk.iv, first); + blocks, walk.iv); + kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); err = skcipher_walk_done(&walk, 0); } - kernel_neon_end(); return err; } @@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req) struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_enc, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req) struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, 
false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_dec, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0); + aes_ecb_encrypt(key, ks[0], rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 2674d43d1384..65b273667b34 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -40,24 +40,24 @@ #if INTERLEAVE == 2 aes_encrypt_block2x: - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block2x) aes_decrypt_block2x: - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block2x) #elif INTERLEAVE == 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -86,33 +86,32 @@ ENDPROC(aes_decrypt_block4x) #define 
FRAME_POP .macro do_encrypt_block2x - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_decrypt_block2x - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_encrypt_block4x - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm .macro do_decrypt_block4x - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm #endif /* * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) * aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) */ AES_ENTRY(aes_ecb_encrypt) FRAME_PUSH - cbz w5, .LecbencloopNx enc_prepare w3, x2, x5 @@ -148,7 +147,6 @@ AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) FRAME_PUSH - cbz w5, .LecbdecloopNx dec_prepare w3, x2, x5 @@ -184,14 +182,12 @@ AES_ENDPROC(aes_ecb_decrypt) /* * aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) * aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) */ AES_ENTRY(aes_cbc_encrypt) - cbz w6, .Lcbcencloop - ld1 {v0.16b}, [x5] /* get iv */ enc_prepare w3, x2, x6 @@ -209,7 +205,6 @@ AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) FRAME_PUSH - cbz w6, .LcbcdecloopNx ld1 {v7.16b}, [x5] /* get iv */ dec_prepare w3, x2, x6 @@ -264,20 +259,19 @@ AES_ENDPROC(aes_cbc_decrypt) /* * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 ctr[], int first) + * int blocks, u8 ctr[]) */ AES_ENTRY(aes_ctr_encrypt) FRAME_PUSH - cbz w6, .Lctrnotfirst /* 1st time around? 
*/ + enc_prepare w3, x2, x6 ld1 {v4.16b}, [x5] -.Lctrnotfirst: - umov x8, v4.d[1] /* keep swabbed ctr in reg */ - rev x8, x8 + umov x6, v4.d[1] /* keep swabbed ctr in reg */ + rev x6, x6 #if INTERLEAVE >= 2 - cmn w8, w4 /* 32 bit overflow? */ + cmn w6, w4 /* 32 bit overflow? */ bcs .Lctrloop .LctrloopNx: subs w4, w4, #INTERLEAVE @@ -285,11 +279,11 @@ AES_ENTRY(aes_ctr_encrypt) #if INTERLEAVE == 2 mov v0.8b, v4.8b mov v1.8b, v4.8b - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v0.d[1], x7 - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v1.d[1], x7 ld1 {v2.16b-v3.16b}, [x1], #32 /* get 2 input blocks */ do_encrypt_block2x @@ -298,7 +292,7 @@ AES_ENTRY(aes_ctr_encrypt) st1 {v0.16b-v1.16b}, [x0], #32 #else ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ - dup v7.4s, w8 + dup v7.4s, w6 mov v0.16b, v4.16b add v7.4s, v7.4s, v8.4s mov v1.16b, v4.16b @@ -316,9 +310,9 @@ AES_ENTRY(aes_ctr_encrypt) eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b st1 {v0.16b-v3.16b}, [x0], #64 - add x8, x8, #INTERLEAVE + add x6, x6, #INTERLEAVE #endif - rev x7, x8 + rev x7, x6 ins v4.d[1], x7 cbz w4, .Lctrout b .LctrloopNx @@ -328,10 +322,10 @@ AES_ENTRY(aes_ctr_encrypt) #endif .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 - adds x8, x8, #1 /* increment BE ctr */ - rev x7, x8 + adds x6, x6, #1 /* increment BE ctr */ + rev x7, x6 ins v4.d[1], x7 bcs .Lctrcarry /* overflow? 
*/ @@ -385,15 +379,17 @@ CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) FRAME_PUSH - cbz w7, .LxtsencloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - enc_switch_key w3, x2, x6 + cbz w7, .Lxtsencnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + enc_switch_key w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencnotfirst: + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -442,7 +438,7 @@ AES_ENTRY(aes_xts_encrypt) .Lxtsencloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -450,6 +446,7 @@ AES_ENTRY(aes_xts_encrypt) next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_encrypt) @@ -457,15 +454,17 @@ AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) FRAME_PUSH - cbz w7, .LxtsdecloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - dec_prepare w3, x2, x6 + cbz w7, .Lxtsdecnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + dec_prepare w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecnotfirst: + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -514,7 +513,7 @@ AES_ENTRY(aes_xts_decrypt) .Lxtsdecloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -522,6 +521,7 @@ AES_ENTRY(aes_xts_decrypt) next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_decrypt) diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index c55d68ccb89f..9d823c77ec84 100644 --- 
a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], /* borrowed from aes-neon-blk.ko */ asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, u8 iv[], - int first); + int rounds, int blocks, u8 iv[]); struct aesbs_ctx { u8 rk[13 * (8 * AES_BLOCK_SIZE) + 32]; @@ -157,7 +156,7 @@ static int cbc_encrypt(struct skcipher_request *req) struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; - int err, first = 1; + int err; err = skcipher_walk_virt(&walk, req, true); @@ -167,10 +166,9 @@ static int cbc_encrypt(struct skcipher_request *req) /* fall back to the non-bitsliced NEON implementation */ neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - ctx->enc, ctx->key.rounds, blocks, walk.iv, - first); + ctx->enc, ctx->key.rounds, blocks, + walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; } kernel_neon_end(); return err; @@ -311,7 +309,7 @@ static int __xts_crypt(struct skcipher_request *req, kernel_neon_begin(); neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1, 1); + ctx->key.rounds, 1); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;

From patchwork Sat Mar 10 15:21:54 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131299
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 09/23] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
Date: Sat, 10 Mar 2018 15:21:54 +0000
Message-Id: <20180310152208.10369-10-ard.biesheuvel@linaro.org>
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>

Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON algorithm running with preemption disabled.

Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha256-glue.c | 36 +++++++++++++-------
 1 file changed, 23 insertions(+), 13 deletions(-)

--
2.15.1

diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index b064d925fe2a..e8880ccdc71f 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -89,21 +89,32 @@ static struct shash_alg algs[] = { { static int sha256_update_neon(struct shash_desc *desc, const u8 *data, unsigned int len) { - /* - * Stacking and unstacking a substantial slice of the NEON register - * file may significantly affect performance for small updates when - * executing in interrupt context, so fall back to the scalar code - * in that case. - */ + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return sha256_base_do_update(desc, data, len, (sha256_block_fn *)sha256_block_data_order); - kernel_neon_begin(); - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); - kernel_neon_end(); + while (len > 0) { + unsigned int chunk = len; + + /* + * Don't hog the CPU for the entire time it takes to process all + * input when running on a preemptible kernel, but process the + * data block by block instead.
+ */ + if (IS_ENABLED(CONFIG_PREEMPT) && + chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) + chunk = SHA256_BLOCK_SIZE - + sctx->count % SHA256_BLOCK_SIZE; + kernel_neon_begin(); + sha256_base_do_update(desc, data, chunk, + (sha256_block_fn *)sha256_block_neon); + kernel_neon_end(); + data += chunk; + len -= chunk; + } return 0; } @@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data, sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_data_order); } else { - kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); + sha256_update_neon(desc, data, len); + kernel_neon_begin(); sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_neon); kernel_neon_end();

From patchwork Sat Mar 10 15:21:57 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131302
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 12/23] crypto: arm64/sha1-ce - yield NEON after every block of input
Date: Sat, 10 Mar 2018 15:21:57 +0000
Message-Id: <20180310152208.10369-13-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To:
<20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha1-ce-core.S | 42 ++++++++++++++------
 1 file changed, 29 insertions(+), 13 deletions(-)

--
2.15.1

diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 46049850727d..78eb35fb5056 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -69,30 +69,36 @@
  *			  int blocks)
  */
 ENTRY(sha1_ce_transform)
+	frame_push	3
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+
 	/* load round constants */
-	loadrc	k0.4s, 0x5a827999, w6
+0:	loadrc	k0.4s, 0x5a827999, w6
 	loadrc	k1.4s, 0x6ed9eba1, w6
 	loadrc	k2.4s, 0x8f1bbcdc, w6
 	loadrc	k3.4s, 0xca62c1d6, w6

 	/* load state */
-	ld1	{dgav.4s}, [x0]
-	ldr	dgb, [x0, #16]
+	ld1	{dgav.4s}, [x19]
+	ldr	dgb, [x19, #16]

 	/* load sha1_ce_state::finalize */
 	ldr_l	w4, sha1_ce_offsetof_finalize, x4
-	ldr	w4, [x0, x4]
+	ldr	w4, [x19, x4]

 	/* load input */
-0:	ld1	{v8.4s-v11.4s}, [x1], #64
-	sub	w2, w2, #1
+1:	ld1	{v8.4s-v11.4s}, [x20], #64
+	sub	w21, w21, #1

 CPU_LE(	rev32	v8.16b, v8.16b	)
 CPU_LE(	rev32	v9.16b, v9.16b	)
 CPU_LE(	rev32	v10.16b, v10.16b	)
 CPU_LE(	rev32	v11.16b, v11.16b	)

-1:	add	t0.4s, v8.4s, k0.4s
+2:	add	t0.4s, v8.4s, k0.4s
 	mov	dg0v.16b, dgav.16b

 	add_update	c, ev, k0, 8, 9, 10, 11, dgb
@@ -123,16 +129,25 @@ CPU_LE(	rev32	v11.16b, v11.16b	)
 	add	dgbv.2s, dgbv.2s, dg1v.2s
 	add	dgav.4s, dgav.4s, dg0v.4s

-	cbnz	w2, 0b
+	cbz	w21, 3f
+
+	if_will_cond_yield_neon
+	st1	{dgav.4s}, [x19]
+	str	dgb, [x19, #16]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b

 	/*
 	 * Final block: add padding and total bit count.
 	 * Skip if the input size was not a round multiple of the block size,
 	 * the padding is handled by the C code in that case.
 	 */
-	cbz	x4, 3f
+3:	cbz	x4, 4f
 	ldr_l	w4, sha1_ce_offsetof_count, x4
-	ldr	x4, [x0, x4]
+	ldr	x4, [x19, x4]
 	movi	v9.2d, #0
 	mov	x8, #0x80000000
 	movi	v10.2d, #0
@@ -141,10 +156,11 @@ CPU_LE(	rev32	v11.16b, v11.16b	)
 	mov	x4, #0
 	mov	v11.d[0], xzr
 	mov	v11.d[1], x7
-	b	1b
+	b	2b

 	/* store new state */
-3:	st1	{dgav.4s}, [x0]
-	str	dgb, [x0, #16]
+4:	st1	{dgav.4s}, [x19]
+	str	dgb, [x19, #16]
+	frame_pop
 	ret
 ENDPROC(sha1_ce_transform)

From patchwork Sat Mar 10 15:21:58 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131303
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 13/23] crypto: arm64/sha2-ce - yield NEON after every block of input
Date: Sat, 10 Mar 2018 15:21:58 +0000
Message-Id: <20180310152208.10369-14-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha2-ce-core.S | 37 ++++++++++++++------
 1 file changed, 26 insertions(+), 11 deletions(-)

--
2.15.1

diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 4c3c89b812ce..cd8b36412469 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -79,30 +79,36 @@
  */
 	.text
 ENTRY(sha2_ce_transform)
+	frame_push	3
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+
 	/* load round constants */
-	adr_l	x8, .Lsha2_rcon
+0:	adr_l	x8, .Lsha2_rcon
 	ld1	{ v0.4s- v3.4s}, [x8], #64
 	ld1	{ v4.4s- v7.4s}, [x8], #64
 	ld1	{ v8.4s-v11.4s}, [x8], #64
 	ld1	{v12.4s-v15.4s}, [x8]

 	/* load state */
-	ld1	{dgav.4s, dgbv.4s}, [x0]
+	ld1	{dgav.4s, dgbv.4s}, [x19]

 	/* load sha256_ce_state::finalize */
 	ldr_l	w4, sha256_ce_offsetof_finalize, x4
-	ldr	w4, [x0, x4]
+	ldr	w4, [x19, x4]

 	/* load input */
-0:	ld1	{v16.4s-v19.4s}, [x1], #64
-	sub	w2, w2, #1
+1:	ld1	{v16.4s-v19.4s}, [x20], #64
+	sub	w21, w21, #1

 CPU_LE(	rev32	v16.16b, v16.16b	)
 CPU_LE(	rev32	v17.16b, v17.16b	)
 CPU_LE(	rev32	v18.16b, v18.16b	)
 CPU_LE(	rev32	v19.16b, v19.16b	)

-1:	add	t0.4s, v16.4s, v0.4s
+2:	add	t0.4s, v16.4s, v0.4s
 	mov	dg0v.16b, dgav.16b
 	mov	dg1v.16b, dgbv.16b

@@ -131,16 +137,24 @@ CPU_LE(	rev32	v19.16b, v19.16b	)
 	add	dgbv.4s, dgbv.4s, dg1v.4s

 	/* handled all input blocks? */
-	cbnz	w2, 0b
+	cbz	w21, 3f
+
+	if_will_cond_yield_neon
+	st1	{dgav.4s, dgbv.4s}, [x19]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b

 	/*
 	 * Final block: add padding and total bit count.
 	 * Skip if the input size was not a round multiple of the block size,
 	 * the padding is handled by the C code in that case.
 	 */
-	cbz	x4, 3f
+3:	cbz	x4, 4f
 	ldr_l	w4, sha256_ce_offsetof_count, x4
-	ldr	x4, [x0, x4]
+	ldr	x4, [x19, x4]
 	movi	v17.2d, #0
 	mov	x8, #0x80000000
 	movi	v18.2d, #0
@@ -149,9 +163,10 @@ CPU_LE(	rev32	v19.16b, v19.16b	)
 	mov	x4, #0
 	mov	v19.d[0], xzr
 	mov	v19.d[1], x7
-	b	1b
+	b	2b

 	/* store new state */
-3:	st1	{dgav.4s, dgbv.4s}, [x0]
+4:	st1	{dgav.4s, dgbv.4s}, [x19]
+	frame_pop
 	ret
 ENDPROC(sha2_ce_transform)

From patchwork Sat Mar 10 15:21:59 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131304
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 14/23] crypto: arm64/aes-ccm - yield NEON after every block of input
Date: Sat, 10 Mar 2018 15:21:59 +0000
Message-Id: <20180310152208.10369-15-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
List-ID:
linux-rt-users@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 150 +++++++++++++-------
 1 file changed, 95 insertions(+), 55 deletions(-)

--
2.15.1

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index e3a375c4cb83..88f5aef7934c 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -19,24 +19,33 @@
  *			 u32 *macp, u8 const rk[], u32 rounds);
  */
 ENTRY(ce_aes_ccm_auth_data)
-	ldr	w8, [x3]			/* leftover from prev round? */
+	frame_push	7
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+
+	ldr	w25, [x22]			/* leftover from prev round? */
 	ld1	{v0.16b}, [x0]			/* load mac */
-	cbz	w8, 1f
-	sub	w8, w8, #16
+	cbz	w25, 1f
+	sub	w25, w25, #16
 	eor	v1.16b, v1.16b, v1.16b
-0:	ldrb	w7, [x1], #1			/* get 1 byte of input */
-	subs	w2, w2, #1
-	add	w8, w8, #1
+0:	ldrb	w7, [x20], #1			/* get 1 byte of input */
+	subs	w21, w21, #1
+	add	w25, w25, #1
 	ins	v1.b[0], w7
 	ext	v1.16b, v1.16b, v1.16b, #1	/* rotate in the input bytes */
 	beq	8f				/* out of input? */
-	cbnz	w8, 0b
+	cbnz	w25, 0b
 	eor	v0.16b, v0.16b, v1.16b
-1:	ld1	{v3.4s}, [x4]			/* load first round key */
-	prfm	pldl1strm, [x1]
-	cmp	w5, #12				/* which key size? */
-	add	x6, x4, #16
-	sub	w7, w5, #2			/* modified # of rounds */
+1:	ld1	{v3.4s}, [x23]			/* load first round key */
+	prfm	pldl1strm, [x20]
+	cmp	w24, #12			/* which key size? */
+	add	x6, x23, #16
+	sub	w7, w24, #2			/* modified # of rounds */
 	bmi	2f
 	bne	5f
 	mov	v5.16b, v3.16b
@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
 	ld1	{v5.4s}, [x6], #16		/* load next round key */
 	bpl	3b
 	aese	v0.16b, v4.16b
-	subs	w2, w2, #16			/* last data? */
+	subs	w21, w21, #16			/* last data? */
 	eor	v0.16b, v0.16b, v5.16b		/* final round */
 	bmi	6f
-	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	ld1	{v1.16b}, [x20], #16		/* load next input block */
 	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
-	bne	1b
-6:	st1	{v0.16b}, [x0]			/* store mac */
+	beq	6f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x19]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x19]			/* reload mac */
+	endif_yield_neon
+
+	b	1b
+6:	st1	{v0.16b}, [x19]			/* store mac */
 	beq	10f
-	adds	w2, w2, #16
+	adds	w21, w21, #16
 	beq	10f
-	mov	w8, w2
-7:	ldrb	w7, [x1], #1
+	mov	w25, w21
+7:	ldrb	w7, [x20], #1
 	umov	w6, v0.b[0]
 	eor	w6, w6, w7
-	strb	w6, [x0], #1
-	subs	w2, w2, #1
+	strb	w6, [x19], #1
+	subs	w21, w21, #1
 	beq	10f
 	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
 	b	7b
-8:	mov	w7, w8
-	add	w8, w8, #16
+8:	mov	w7, w25
+	add	w25, w25, #16
 9:	ext	v1.16b, v1.16b, v1.16b, #1
 	adds	w7, w7, #1
 	bne	9b
 	eor	v0.16b, v0.16b, v1.16b
-	st1	{v0.16b}, [x0]
-10:	str	w8, [x3]
+	st1	{v0.16b}, [x19]
+10:	str	w25, [x22]
+
+	frame_pop
 	ret
 ENDPROC(ce_aes_ccm_auth_data)

@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
 ENDPROC(ce_aes_ccm_final)

 	.macro	aes_ccm_do_crypt,enc
-	ldr	x8, [x6, #8]			/* load lower ctr */
-	ld1	{v0.16b}, [x5]			/* load mac */
-CPU_LE(	rev	x8, x8	)			/* keep swabbed ctr in reg */
+	frame_push	8
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+	mov	x25, x6
+
+	ldr	x26, [x25, #8]			/* load lower ctr */
+	ld1	{v0.16b}, [x24]			/* load mac */
+CPU_LE(	rev	x26, x26	)		/* keep swabbed ctr in reg */
 0:	/* outer loop */
-	ld1	{v1.8b}, [x6]			/* load upper ctr */
-	prfm	pldl1strm, [x1]
-	add	x8, x8, #1
-	rev	x9, x8
-	cmp	w4, #12				/* which key size? */
-	sub	w7, w4, #2			/* get modified # of rounds */
+	ld1	{v1.8b}, [x25]			/* load upper ctr */
+	prfm	pldl1strm, [x20]
+	add	x26, x26, #1
+	rev	x9, x26
+	cmp	w23, #12			/* which key size? */
+	sub	w7, w23, #2			/* get modified # of rounds */
 	ins	v1.d[1], x9			/* no carry in lower ctr */
-	ld1	{v3.4s}, [x3]			/* load first round key */
-	add	x10, x3, #16
+	ld1	{v3.4s}, [x22]			/* load first round key */
+	add	x10, x22, #16
 	bmi	1f
 	bne	4f
 	mov	v5.16b, v3.16b
@@ -165,9 +194,9 @@ CPU_LE(	rev	x8, x8	)			/* keep swabbed ctr in reg */
 	bpl	2b
 	aese	v0.16b, v4.16b
 	aese	v1.16b, v4.16b
-	subs	w2, w2, #16
-	bmi	6f				/* partial block? */
-	ld1	{v2.16b}, [x1], #16		/* load next input block */
+	subs	w21, w21, #16
+	bmi	7f				/* partial block? */
+	ld1	{v2.16b}, [x20], #16		/* load next input block */
 	.if	\enc == 1
 	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
 	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
@@ -176,18 +205,29 @@ CPU_LE(	rev	x8, x8	)			/* keep swabbed ctr in reg */
 	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
 	.endif
 	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
-	st1	{v1.16b}, [x0], #16		/* write output block */
-	bne	0b
-CPU_LE(	rev	x8, x8	)
-	st1	{v0.16b}, [x5]			/* store mac */
-	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
-5:	ret
-
-6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
+	st1	{v1.16b}, [x19], #16		/* write output block */
+	beq	5f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x24]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x24]			/* reload mac */
+	endif_yield_neon
+
+	b	0b
+5:
+CPU_LE(	rev	x26, x26	)
+	st1	{v0.16b}, [x24]			/* store mac */
+	str	x26, [x25, #8]			/* store lsb end of ctr (BE) */
+
+6:	frame_pop
+	ret
+
+7:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
 	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
-	st1	{v0.16b}, [x5]			/* store mac */
-	add	w2, w2, #16			/* process partial tail block */
-7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	st1	{v0.16b}, [x24]			/* store mac */
+	add	w21, w21, #16			/* process partial tail block */
+8:	ldrb	w9, [x20], #1			/* get 1 byte of input */
 	umov	w6, v1.b[0]			/* get top crypted ctr byte */
 	umov	w7, v0.b[0]			/* get top mac byte */
 	.if	\enc == 1
@@ -197,13 +237,13 @@ CPU_LE(	rev	x8, x8	)
 	eor	w9, w9, w6
 	eor	w7, w7, w9
 	.endif
-	strb	w9, [x0], #1			/* store out byte */
-	strb	w7, [x5], #1			/* store mac byte */
-	subs	w2, w2, #1
-	beq	5b
+	strb	w9, [x19], #1			/* store out byte */
+	strb	w7, [x24], #1			/* store mac byte */
+	subs	w21, w21, #1
+	beq	6b
 	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
 	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
-	b	7b
+	b	8b
 	.endm

 /*

From patchwork Sat Mar 10 15:22:08 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131313
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v5 23/23] DO NOT MERGE
Date: Sat, 10 Mar 2018 15:22:08 +0000
Message-Id: <20180310152208.10369-24-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
List-ID:
linux-rt-users@vger.kernel.org

Test code to force a kernel_neon_end+begin sequence at every yield
point, and wipe the entire NEON state before resuming the algorithm.
---
 arch/arm64/include/asm/assembler.h | 33 ++++++++++++++++++++
 1 file changed, 33 insertions(+)

--
2.15.1

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 61168cbe9781..b471b0bbdfe6 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -678,6 +678,7 @@ alternative_else_nop_endif
 	cmp	w1, #PREEMPT_DISABLE_OFFSET
 	csel	x0, x0, xzr, eq
 	tbnz	x0, #TIF_NEED_RESCHED, .Lyield_\@	// needs rescheduling?
+	b	.Lyield_\@
 #endif
 	/* fall through to endif_yield_neon */
 	.subsection	1
@@ -687,6 +688,38 @@ alternative_else_nop_endif
 	.macro	do_cond_yield_neon
 	bl	kernel_neon_end
 	bl	kernel_neon_begin
+	movi	v0.16b, #0x55
+	movi	v1.16b, #0x55
+	movi	v2.16b, #0x55
+	movi	v3.16b, #0x55
+	movi	v4.16b, #0x55
+	movi	v5.16b, #0x55
+	movi	v6.16b, #0x55
+	movi	v7.16b, #0x55
+	movi	v8.16b, #0x55
+	movi	v9.16b, #0x55
+	movi	v10.16b, #0x55
+	movi	v11.16b, #0x55
+	movi	v12.16b, #0x55
+	movi	v13.16b, #0x55
+	movi	v14.16b, #0x55
+	movi	v15.16b, #0x55
+	movi	v16.16b, #0x55
+	movi	v17.16b, #0x55
+	movi	v18.16b, #0x55
+	movi	v19.16b, #0x55
+	movi	v20.16b, #0x55
+	movi	v21.16b, #0x55
+	movi	v22.16b, #0x55
+	movi	v23.16b, #0x55
+	movi	v24.16b, #0x55
+	movi	v25.16b, #0x55
+	movi	v26.16b, #0x55
+	movi	v27.16b, #0x55
+	movi	v28.16b, #0x55
+	movi	v29.16b, #0x55
+	movi	v30.16b, #0x55
+	movi	v31.16b, #0x55
 	.endm

 	.macro	endif_yield_neon, lbl