From patchwork Mon Dec 4 12:26:29 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120513
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 03/19] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:29 +0000
Message-Id: <20171204122645.31535-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call
to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 95 ++++++++++---------- arch/arm64/crypto/aes-modes.S | 90 +++++++++---------- arch/arm64/crypto/aes-neonbs-glue.c | 14 ++- 3 files changed, 97 insertions(+), 102 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 998ba519a026..00a3e2fd6a48 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 ctr[], int first); + int rounds, int blocks, u8 ctr[]); asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds, int blocks, u8 const rk2[], u8 iv[], @@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, first); + (u8 *)ctx->key_enc, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); 
+ err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, first); + (u8 *)ctx->key_dec, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - first = 1; - kernel_neon_begin(); while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; + kernel_neon_end(); } if (walk.nbytes) { u8 __aligned(8) tail[AES_BLOCK_SIZE]; @@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req) */ blocks = -1; + kernel_neon_begin(); aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, - blocks, walk.iv, first); + blocks, walk.iv); + kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); err = skcipher_walk_done(&walk, 0); } - kernel_neon_end(); return err; } @@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req) struct skcipher_walk 
walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_enc, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req) struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_dec, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0); + aes_ecb_encrypt(key, ks[0], rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 2674d43d1384..65b273667b34 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -40,24 +40,24 @@ #if INTERLEAVE == 2 aes_encrypt_block2x: - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block2x) aes_decrypt_block2x: - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block2x) #elif INTERLEAVE == 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -86,33 +86,32 @@ ENDPROC(aes_decrypt_block4x) #define FRAME_POP .macro do_encrypt_block2x - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_decrypt_block2x - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_encrypt_block4x - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm .macro do_decrypt_block4x - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm #endif /* * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) * aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) */ AES_ENTRY(aes_ecb_encrypt) FRAME_PUSH - cbz w5, .LecbencloopNx enc_prepare w3, x2, x5 @@ -148,7 +147,6 @@ 
AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) FRAME_PUSH - cbz w5, .LecbdecloopNx dec_prepare w3, x2, x5 @@ -184,14 +182,12 @@ AES_ENDPROC(aes_ecb_decrypt) /* * aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) * aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) */ AES_ENTRY(aes_cbc_encrypt) - cbz w6, .Lcbcencloop - ld1 {v0.16b}, [x5] /* get iv */ enc_prepare w3, x2, x6 @@ -209,7 +205,6 @@ AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) FRAME_PUSH - cbz w6, .LcbcdecloopNx ld1 {v7.16b}, [x5] /* get iv */ dec_prepare w3, x2, x6 @@ -264,20 +259,19 @@ AES_ENDPROC(aes_cbc_decrypt) /* * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 ctr[], int first) + * int blocks, u8 ctr[]) */ AES_ENTRY(aes_ctr_encrypt) FRAME_PUSH - cbz w6, .Lctrnotfirst /* 1st time around? */ + enc_prepare w3, x2, x6 ld1 {v4.16b}, [x5] -.Lctrnotfirst: - umov x8, v4.d[1] /* keep swabbed ctr in reg */ - rev x8, x8 + umov x6, v4.d[1] /* keep swabbed ctr in reg */ + rev x6, x6 #if INTERLEAVE >= 2 - cmn w8, w4 /* 32 bit overflow? */ + cmn w6, w4 /* 32 bit overflow? */ bcs .Lctrloop .LctrloopNx: subs w4, w4, #INTERLEAVE @@ -285,11 +279,11 @@ AES_ENTRY(aes_ctr_encrypt) #if INTERLEAVE == 2 mov v0.8b, v4.8b mov v1.8b, v4.8b - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v0.d[1], x7 - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v1.d[1], x7 ld1 {v2.16b-v3.16b}, [x1], #32 /* get 2 input blocks */ do_encrypt_block2x @@ -298,7 +292,7 @@ AES_ENTRY(aes_ctr_encrypt) st1 {v0.16b-v1.16b}, [x0], #32 #else ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ - dup v7.4s, w8 + dup v7.4s, w6 mov v0.16b, v4.16b add v7.4s, v7.4s, v8.4s mov v1.16b, v4.16b @@ -316,9 +310,9 @@ AES_ENTRY(aes_ctr_encrypt) eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b st1 {v0.16b-v3.16b}, [x0], #64 - add x8, x8, #INTERLEAVE + add x6, x6, #INTERLEAVE #endif - rev x7, x8 + rev x7, x6 ins v4.d[1], x7 cbz w4, .Lctrout b .LctrloopNx @@ -328,10 +322,10 @@ AES_ENTRY(aes_ctr_encrypt) #endif .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 - adds x8, x8, #1 /* increment BE ctr */ - rev x7, x8 + adds x6, x6, #1 /* increment BE ctr */ + rev x7, x6 ins v4.d[1], x7 bcs .Lctrcarry /* overflow? 
*/ @@ -385,15 +379,17 @@ CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) FRAME_PUSH - cbz w7, .LxtsencloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - enc_switch_key w3, x2, x6 + cbz w7, .Lxtsencnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + enc_switch_key w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencnotfirst: + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -442,7 +438,7 @@ AES_ENTRY(aes_xts_encrypt) .Lxtsencloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -450,6 +446,7 @@ AES_ENTRY(aes_xts_encrypt) next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_encrypt) @@ -457,15 +454,17 @@ AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) FRAME_PUSH - cbz w7, .LxtsdecloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - dec_prepare w3, x2, x6 + cbz w7, .Lxtsdecnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + dec_prepare w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecnotfirst: + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -514,7 +513,7 @@ AES_ENTRY(aes_xts_decrypt) .Lxtsdecloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -522,6 +521,7 @@ AES_ENTRY(aes_xts_decrypt) next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_decrypt) diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index c55d68ccb89f..9d823c77ec84 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], /* borrowed from aes-neon-blk.ko */ asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, u8 iv[], - int first); + int rounds, int blocks, u8 iv[]); struct aesbs_ctx { u8 rk[13 * (8 * AES_BLOCK_SIZE) + 32]; @@ -157,7 +156,7 @@ static int cbc_encrypt(struct skcipher_request *req) struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; - int err, first = 1; + int err; err = skcipher_walk_virt(&walk, req, true); @@ -167,10 +166,9 @@ static int cbc_encrypt(struct skcipher_request *req) /* fall back to the non-bitsliced NEON implementation */ neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - ctx->enc, ctx->key.rounds, blocks, walk.iv, - first); + ctx->enc, ctx->key.rounds, blocks, + walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; } kernel_neon_end(); return err; @@ -311,7 +309,7 @@ static int __xts_crypt(struct skcipher_request *req, kernel_neon_begin(); neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1, 1); + ctx->key.rounds, 1); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; From patchwork Mon 
Dec 4 12:26:30 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120514
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 04/19] crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:30 +0000
Message-Id: <20171204122645.31535-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to
kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-glue.c | 36 +++++++++----------- 1 file changed, 17 insertions(+), 19 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index 9d823c77ec84..e7a95a566462 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; /* fall back to the non-bitsliced NEON implementation */ + kernel_neon_begin(); neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->enc, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req) blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req) u8 buf[AES_BLOCK_SIZE]; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; u8 *final = (walk.total % AES_BLOCK_SIZE) ? 
buf : NULL; @@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req) final = NULL; } + kernel_neon_begin(); aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks, walk.iv, final); + kernel_neon_end(); if (final) { u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE; @@ -258,8 +259,6 @@ static int ctr_encrypt(struct skcipher_request *req) err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; } @@ -304,12 +303,11 @@ static int __xts_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); kernel_neon_begin(); - - neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1); + neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1); + kernel_neon_end(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -318,13 +316,13 @@ static int __xts_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; }

From patchwork Mon Dec 4 12:26:31 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120515
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 05/19] crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:31 +0000
Message-Id: <20171204122645.31535-6-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to
kernel_neon_end(). For this reason, the NEON crypto code that was
introduced at the time keeps the NEON enabled throughout the execution
of the crypto API methods, which may include calls back into the crypto
API that could result in memory allocation or other actions that we
should avoid when running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder
of the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/chacha20-neon-glue.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c index cbdb75d15cd0..727579c93ded 100644 --- a/arch/arm64/crypto/chacha20-neon-glue.c +++ b/arch/arm64/crypto/chacha20-neon-glue.c @@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, u8 buf[CHACHA20_BLOCK_SIZE]; while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + kernel_neon_begin(); chacha20_4block_xor_neon(state, dst, src); + kernel_neon_end(); bytes -= CHACHA20_BLOCK_SIZE * 4; src += CHACHA20_BLOCK_SIZE * 4; dst += CHACHA20_BLOCK_SIZE * 4; state[12] += 4; } + + if (!bytes) + return; + + kernel_neon_begin(); while (bytes >= CHACHA20_BLOCK_SIZE) { chacha20_block_xor_neon(state, dst, src); bytes -= CHACHA20_BLOCK_SIZE; @@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, chacha20_block_xor_neon(state, buf, buf); memcpy(dst, buf, bytes); } + kernel_neon_end(); } static int chacha20_neon(struct skcipher_request *req) @@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req) if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE) return crypto_chacha20_crypt(req); - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); crypto_chacha20_init(state, ctx, walk.iv); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes; @@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req) nbytes); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } - kernel_neon_end(); return err; }

From patchwork Mon Dec 4 12:26:32 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120516
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au,
 linux-arm-kernel@lists.infradead.org, Ard Biesheuvel, Dave Martin,
 Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland,
 linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 06/19] crypto: arm64/ghash - move kernel mode neon en/disable into loop
Date: Mon, 4 Dec 2017 12:26:32 +0000
Message-Id: <20171204122645.31535-7-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().

For this reason, the NEON crypto code that was introduced at the time
keeps the NEON enabled throughout the execution of the crypto API
methods, which may include calls back into the crypto API that could
result in memory allocation or other actions that we should avoid when
running with preemption disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around
the actual invocations of the NEON crypto code, and run the remainder
of the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-glue.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..cb39503673d4 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -368,26 +368,28 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv, num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +469,18 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, iv,
num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +488,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE;

From patchwork Mon Dec 4 12:26:35 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120519
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v2 09/19] crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
Date: Mon, 4 Dec 2017 12:26:35 +0000
Message-Id: <20171204122645.31535-10-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

CBC MAC is strictly sequential, and so the current AES code simply
processes the input one block at a time.
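The chaining dependency that forces this one-block-at-a-time behaviour
can be sketched as follows; this is a purely illustrative model, not
the kernel implementation, and aes_encrypt_one() is a hypothetical
single-block AES helper:

    /*
     * Illustrative sketch only: CBC-MAC chains the digest through the
     * block cipher, dg = E_K(dg ^ P_i), so block i+1 cannot be
     * processed until the AES call for block i has completed.
     */
    static void cbc_mac_sketch(const unsigned char *pt, int blocks,
                               unsigned char dg[16], const void *key)
    {
            while (blocks--) {
                    for (int i = 0; i < 16; i++)
                            dg[i] ^= pt[i];          /* dg ^= next pt block */
                    aes_encrypt_one(key, dg, dg);    /* hypothetical helper */
                    pt += 16;
            }
    }

Because each AES invocation consumes the previous one's output, an
unrolled loop can only merge the loads, stores and xors; the cipher
itself still runs one block at a time.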
However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC MAC which yields every 16 bytes) So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 23 ++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index e86535a1329d..a68412e1e3a4 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -395,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt) AES_ENTRY(aes_mac_update) ld1 {v0.16b}, [x4] /* get dg */ enc_prepare w2, x1, x7 - cbnz w5, .Lmacenc + cbz w5, .Lmacloop4x + encrypt_block v0, w2, x1, x7, w8 + +.Lmacloop4x: + subs w3, w3, #4 + bmi .Lmac1x + ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v2.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v3.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v4.16b + cmp w3, wzr + csinv x5, x6, xzr, eq + cbz w5, .Lmacout + encrypt_block v0, w2, x1, x7, w8 + b .Lmacloop4x +.Lmac1x: + add w3, w3, #4 .Lmacloop: cbz w3, .Lmacout ld1 {v1.16b}, [x0], #16 /* get next pt block */ @@ -406,7 +426,6 @@ AES_ENTRY(aes_mac_update) csinv x5, x6, xzr, eq cbz w5, .Lmacout -.Lmacenc: encrypt_block v0, w2, x1, x7, w8 b .Lmacloop From patchwork Mon Dec 4 12:26:36 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120520 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365776qgn; Mon, 4 Dec 2017 04:27:30 -0800 (PST) X-Google-Smtp-Source: AGs4zMbylGkI5S/u5p6DUNHBrZWUUG4RigaqvRRG9EMkGEaBgN2y8j8nAA4FMeY0PYAE0f3OmwU1 X-Received: by 10.101.93.66 with SMTP id e2mr13603030pgt.50.1512390450807; Mon, 04 Dec 2017 04:27:30 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390450; cv=none; d=google.com; s=arc-20160816; b=TIf4XBtp2cfPk6AVum195OyVXm0TN7RIxYn5ELifH4SS9v5S/oXdFujx6/WoywG+4N fjX1eVlAKfxWQ4UY6YkmyZTnCxOoMNpN2aVwCO202odzgzt+DfFc/ApM1VwtoFsDazVQ EZHSzdFW9gVONAlWHIZY/yX5blkMteWUxECtOv2LaxPZbM9PhLLdFzKuB7d5r80m1V4Z cvUO+dkRy9Jn+OoS0J3FzXYJWfz4segUJvq5vfivHKeWmxZGCW6zf4qs86jInaaugOba 5IDEX1BIYGlutAlM9i2N0YUdVFpmfTxqqv8HKI9x0wUJxI6t0+DTTlY8TIn9CCQ/znRW Rc3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=5zk1lw0Of4wZ9+5EpqGNCIfS63EqvIFydBGYbdg5eLE=; b=tZyk314AeZT+RFR9RrwRPuNXZK+vFsXdfOLf+mflRpPVM7UYxhmIRU3TM+RZBrbTkR yewopQRlL3/n601CPGP0IqtjuEpweWYil92jr/MqKpMKe3tA46jqCQfXoH6sMV7z2/RR OhQtGGuNrB7GlHDKpyA83IDzef5dhQM6Lt+olEUgg9Mx0HV2cotrZFZE64VlXAhnTo/s uBJwVfoLrtuBx2Nscl+r2IMCjs0l8V0HY92qUWn1fZee0RzAYeQQec92XKb7J4BjRmbP jbVx6f/SNEkS1GoszxGsuxc6rIXEEu8ytuqm7fIKiZqiMWG+J3hJ9ayNzKBNkWUp/JP2 lEzQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jlIZxCzn; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE 
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 10/19] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
Date: Mon, 4 Dec 2017 12:26:36 +0000
Message-Id: <20171204122645.31535-11-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON
algorithm running with preemption disabled. Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha256-glue.c | 36 +++++++++++++------- 1 file changed, 23 insertions(+), 13 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c index b064d925fe2a..e8880ccdc71f 100644 --- a/arch/arm64/crypto/sha256-glue.c +++ b/arch/arm64/crypto/sha256-glue.c @@ -89,21 +89,32 @@ static struct shash_alg algs[] = { { static int sha256_update_neon(struct shash_desc *desc, const u8 *data, unsigned int len) { - /* - * Stacking and unstacking a substantial slice of the NEON register - * file may significantly affect performance for small updates when - * executing in interrupt context, so fall back to the scalar code - * in that case. - */ + struct sha256_state *sctx = shash_desc_ctx(desc); + if (!may_use_simd()) return sha256_base_do_update(desc, data, len, (sha256_block_fn *)sha256_block_data_order); - kernel_neon_begin(); - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); - kernel_neon_end(); + while (len > 0) { + unsigned int chunk = len; + + /* + * Don't hog the CPU for the entire time it takes to process all + * input when running on a preemptible kernel, but process the + * data block by block instead. + */ + if (IS_ENABLED(CONFIG_PREEMPT) && + chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE) + chunk = SHA256_BLOCK_SIZE - + sctx->count % SHA256_BLOCK_SIZE; + kernel_neon_begin(); + sha256_base_do_update(desc, data, chunk, + (sha256_block_fn *)sha256_block_neon); + kernel_neon_end(); + data += chunk; + len -= chunk; + } return 0; } @@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data, sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_data_order); } else { - kernel_neon_begin(); if (len) - sha256_base_do_update(desc, data, len, - (sha256_block_fn *)sha256_block_neon); + sha256_update_neon(desc, data, len); + kernel_neon_begin(); sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_block_neon); kernel_neon_end(); From patchwork Mon Dec 4 12:26:37 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120521 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365830qgn; Mon, 4 Dec 2017 04:27:34 -0800 (PST) X-Google-Smtp-Source: AGs4zMZ78e1QwV8xEXhx7c9F9zAteI6PAGCoIvOjQM2SN8hF1wyhF5I+iTisbjSLdmiC6mnAI3vT X-Received: by 10.101.102.66 with SMTP id z2mr14013510pgv.352.1512390454403; Mon, 04 Dec 2017 04:27:34 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390454; cv=none; d=google.com; s=arc-20160816; b=L+aNCmZzcblnhphvBLF2pZuuE/PV/mXBzVzWBKYnDXgDj0fEBO47Mv/IaxLJE/lb/g ktF/AX0LbAsyvybxDzulEZ1HMn383YrbLRYR2BIYtSh3zlswAk0YGpI3gOT715MePtjE OXNAVAg3wB0dns9PE2IXrCUehlq5+e/tIzdxOwvFuVmtdmOlPwaXiqfYAV27ZGus1foW s4MCRFjggYyK2cj71ELNLtSkMRAV4/5My1xABAPYQ47J58MBcfec/ajZnMUC3GnOC8y1 vtrpq2wQBWB2kcSRUIerbEorjbsI675yREOCgr6BodNXuIUiA4MFPHgAOa99QeJNDp24 N2Eg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=iLnvUryj6wgRmRZyo3vDg6ZHKz55Pt1BEioZv1Vwiy8=; b=nxB2Z8ej4hhZFLZNdJpONw/jmMOy5ZKxsiYhfuHcysj3qwFrl53vOeiTD/NVGKEVg2 
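Because the interleaved hunk above is hard to read, here is the resulting sha256_update_neon() as it reads after this patch, reflowed into ordinary source lines. The diff remains the authoritative version; nothing here is new beyond the reflow.

static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
			      unsigned int len)
{
	struct sha256_state *sctx = shash_desc_ctx(desc);

	if (!may_use_simd())
		return sha256_base_do_update(desc, data, len,
				(sha256_block_fn *)sha256_block_data_order);

	while (len > 0) {
		unsigned int chunk = len;

		/*
		 * Don't hog the CPU for the entire time it takes to process
		 * all input when running on a preemptible kernel, but process
		 * the data block by block instead.
		 */
		if (IS_ENABLED(CONFIG_PREEMPT) &&
		    chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
			chunk = SHA256_BLOCK_SIZE -
				sctx->count % SHA256_BLOCK_SIZE;

		kernel_neon_begin();
		sha256_base_do_update(desc, data, chunk,
				(sha256_block_fn *)sha256_block_neon);
		kernel_neon_end();

		data += chunk;
		len -= chunk;
	}
	return 0;
}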
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 11/19] arm64: assembler: add macro to conditionally yield the NEON under PREEMPT Date: Mon, 4 Dec 2017 12:26:37 +0000 Message-Id: <20171204122645.31535-12-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Add a support macro to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code. Given that especially the instruction based accelerated crypto code may use very tight loops, add some parametrization so that the TIF_NEED_RESCHED flag test is only executed every so many loop iterations. In some cases, yielding the NEON involves saving and restoring a non trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into two, and the code in between is only executed when the yield path is taken, allowing the contex to be preserved. The second macro takes a label argument that marks the resume-from-yield path, which should restore the preserved context again. Signed-off-by: Ard Biesheuvel --- arch/arm64/include/asm/assembler.h | 50 ++++++++++++++++++++ 1 file changed, 50 insertions(+) -- 2.11.0 diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index aef72d886677..917b026d3e00 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -512,4 +512,54 @@ alternative_else_nop_endif #endif .endm +/* + * yield_neon - check whether to yield to another runnable task from + * kernel mode NEON code (running with preemption disabled) + * + * - Check whether the preempt count is exactly 1, in which case disabling + * preemption once will make the task preemptible. If this is not the case, + * yielding is pointless. + * - Check whether TIF_NEED_RESCHED is set, and if so, disable and re-enable + * kernel mode NEON (which will trigger a reschedule), and branch to the + * yield fixup code at @lbl. + */ + .macro yield_neon, lbl:req, ctr, order, stride, loop + yield_neon_pre \ctr, \order, \stride, \loop + yield_neon_post \lbl + .endm + + .macro yield_neon_pre, ctr, order=0, stride, loop=4444f +#ifdef CONFIG_PREEMPT + /* + * With some algorithms, it makes little sense to poll the + * TIF_NEED_RESCHED flag after every iteration, so only perform + * the check every 2^order strides. + */ + .if \order > 1 + .if (\stride & (\stride - 1)) != 0 + .error "stride should be a power of 2" + .endif + tst \ctr, #((1 << \order) * \stride - 1) & ~(\stride - 1) + b.ne \loop + .endif + + get_thread_info x0 + ldr w1, [x0, #TSK_TI_PREEMPT] + ldr x0, [x0, #TSK_TI_FLAGS] + cmp w1, #1 // == PREEMPT_OFFSET + csel x0, x0, xzr, eq + tbnz x0, #TIF_NEED_RESCHED, 5555f // needs rescheduling? 
+4444: +#endif .subsection 1 +5555: .endm + .macro yield_neon_post, lbl:req + bl kernel_neon_end + bl kernel_neon_begin + b \lbl + .previous + .endm + #endif /* __ASM_ASSEMBLER_H */
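In C terms, the decision that the yield_neon_pre/yield_neon_post pair encodes is roughly the sketch below. It is illustrative only: the real implementation is the assembler above, which reads the preempt count and thread flags via TSK_TI_PREEMPT and TSK_TI_FLAGS rather than through these helpers.

#include <linux/kconfig.h>
#include <linux/preempt.h>
#include <linux/sched.h>
#include <asm/neon.h>

/* C-level sketch of the check the macro pair implements; illustrative only. */
static bool maybe_yield_neon(void)
{
	if (!IS_ENABLED(CONFIG_PREEMPT))
		return false;

	/*
	 * Yielding only helps if ending the kernel mode NEON section would
	 * actually make the task preemptible (preempt count exactly 1), and
	 * only if a reschedule has been requested.
	 */
	if (preempt_count() != 1 || !need_resched())
		return false;

	kernel_neon_end();	/* lets the pending reschedule happen */
	kernel_neon_begin();
	return true;		/* caller must reload any state kept in NEON regs */
}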
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner
Subject: [PATCH v2 12/19] crypto: arm64/sha1-ce - yield every 8 blocks of input
Date: Mon, 4 Dec 2017 12:26:38 +0000
Message-Id: <20171204122645.31535-13-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input.
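In the diff that follows, the 8-block interval comes from invoking the yield macro from the previous patch with order 3 and a stride of one block ("yield every 8 blocks"). As a rough C sketch of the intent only, using hypothetical helper names and the maybe_yield_neon() sketch from the previous patch's note (the real loop is the assembler below):

#include <linux/types.h>
#include <crypto/sha.h>

/* Hypothetical helpers, for illustration only. */
void sha1_ce_one_block(struct sha1_state *st, const u8 *src);
void sha1_reload_state(struct sha1_state *st);
bool maybe_yield_neon(void);

static void sha1_blocks_yielding(struct sha1_state *st,
				 const u8 *src, int blocks)
{
	int done = 0;

	while (blocks-- > 0) {
		sha1_ce_one_block(st, src);
		src += SHA1_BLOCK_SIZE;

		/* Poll for a pending reschedule only once every 8 blocks. */
		if ((++done & 7) == 0 && maybe_yield_neon())
			sha1_reload_state(st);	/* dg lived in NEON registers */
	}
}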
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/sha1-ce-core.S | 45 ++++++++++++++------ 1 file changed, 32 insertions(+), 13 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 8550408735a0..7ae0dd369e0a 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S @@ -70,31 +70,40 @@ * int blocks) */ ENTRY(sha1_ce_transform) + stp x29, x30, [sp, #-48]! + mov x29, sp + stp x19, x20, [sp, #16] + str x21, [sp, #32] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + /* load round constants */ - adr x6, .Lsha1_rcon +0: adr x6, .Lsha1_rcon ld1r {k0.4s}, [x6], #4 ld1r {k1.4s}, [x6], #4 ld1r {k2.4s}, [x6], #4 ld1r {k3.4s}, [x6] /* load state */ - ld1 {dgav.4s}, [x0] - ldr dgb, [x0, #16] + ld1 {dgav.4s}, [x19] + ldr dgb, [x19, #16] /* load sha1_ce_state::finalize */ ldr_l w4, sha1_ce_offsetof_finalize, x4 - ldr w4, [x0, x4] + ldr w4, [x19, x4] /* load input */ -0: ld1 {v8.4s-v11.4s}, [x1], #64 - sub w2, w2, #1 +1: ld1 {v8.4s-v11.4s}, [x20], #64 + sub w21, w21, #1 CPU_LE( rev32 v8.16b, v8.16b ) CPU_LE( rev32 v9.16b, v9.16b ) CPU_LE( rev32 v10.16b, v10.16b ) CPU_LE( rev32 v11.16b, v11.16b ) -1: add t0.4s, v8.4s, k0.4s +2: add t0.4s, v8.4s, k0.4s mov dg0v.16b, dgav.16b add_update c, ev, k0, 8, 9, 10, 11, dgb @@ -125,16 +134,23 @@ CPU_LE( rev32 v11.16b, v11.16b ) add dgbv.2s, dgbv.2s, dg1v.2s add dgav.4s, dgav.4s, dg0v.4s - cbnz w2, 0b + cbz w21, 3f + + yield_neon_pre w21, 3, 1, 1b // yield every 8 blocks + st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + yield_neon_post 0b + + b 1b /* * Final block: add padding and total bit count. * Skip if the input size was not a round multiple of the block size, * the padding is handled by the C code in that case. */ - cbz x4, 3f +3: cbz x4, 4f ldr_l w4, sha1_ce_offsetof_count, x4 - ldr x4, [x0, x4] + ldr x4, [x19, x4] movi v9.2d, #0 mov x8, #0x80000000 movi v10.2d, #0 @@ -143,10 +159,13 @@ CPU_LE( rev32 v11.16b, v11.16b ) mov x4, #0 mov v11.d[0], xzr mov v11.d[1], x7 - b 1b + b 2b /* store new state */ -3: st1 {dgav.4s}, [x0] - str dgb, [x0, #16] +4: st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + ldp x19, x20, [sp, #16] + ldr x21, [sp, #32] + ldp x29, x30, [sp], #48 ret ENDPROC(sha1_ce_transform) From patchwork Mon Dec 4 12:26:40 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120524 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4365973qgn; Mon, 4 Dec 2017 04:27:43 -0800 (PST) X-Google-Smtp-Source: AGs4zMa+uxiSEgB9qerOB1L4Lj6QmG1mFISb3TIszZQZB2F0fwH62RdzZRtC4MhzyhE4ZRJ+kKox X-Received: by 10.101.93.66 with SMTP id e2mr13603535pgt.50.1512390463233; Mon, 04 Dec 2017 04:27:43 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390463; cv=none; d=google.com; s=arc-20160816; b=YEDQP4QgqhZcgrf4MOL+RZmOypcr56wt4aRFDylocAtcdJLUbMgWKu5LCnYSWmtB4I /A/FL/bBY620BrlXo+vlOP65528hJvoMkjhu2TkwjuALhZdl2iz7BGf3SubhR2Ys2Wl3 RvFBbFURXef1RUXIIh/WKPFSr07ad9bBf4BJuhiobIrYVR23CVJR74EfgzOgm75zAnhS q6vqOdL0vDR9eY7d8ve2q9fMSeg6RmkLQWqHg06ftg/bV/f8GsIR89+LvzQea5xeZJn+ 0yxlTwKlo99N3lrrU60qS9hT56V3Ixo7/yWCWw86LDEwO5wn0svZ7tv5PAKIvbCpP9no 77Qg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=r18z/JBw9RLQuCRhkSydrmEY1w+3CL324jASA6EYkUY=; b=EFS2OtS70IhHPN8xuCVYMHDKPg3djvdDykR3P12XVAHDLD/diFz+DA9XfXcLtLjwae 
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas
Gleixner Subject: [PATCH v2 14/19] crypto: arm64/aes-blk - yield after processing a fixed chunk of input Date: Mon, 4 Dec 2017 12:26:40 +0000 Message-Id: <20171204122645.31535-15-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, the AES block code may keep preemption disabled for as long as it takes to process each contigous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 16 blocks (for the CE case) or after every block (for the pure NEON case), which will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce.S | 17 +- arch/arm64/crypto/aes-modes.S | 379 +++++++++++++------- arch/arm64/crypto/aes-neon.S | 2 + 3 files changed, 272 insertions(+), 126 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 50330f5c3adc..ccb17b65005a 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -15,6 +15,8 @@ #define AES_ENTRY(func) ENTRY(ce_ ## func) #define AES_ENDPROC(func) ENDPROC(ce_ ## func) +#define AES_YIELD_ORDER 4 + .arch armv8-a+crypto /* preload all round keys */ @@ -30,18 +32,21 @@ .endm /* prepare for encryption with key in rk[] */ - .macro enc_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for encryption (again) but with new key in rk[] */ - .macro enc_switch_key, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_switch_key, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for decryption with key in rk[] */ - .macro dec_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro dec_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index a68412e1e3a4..6fcdf82fa295 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,57 +31,85 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] - enc_prepare w3, x2, x5 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +.Lecbencrestart: + enc_prepare w22, x21, x5 .LecbencloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + yield_neon .Lecbencrestart, w23, AES_YIELD_ORDER, 4, .LecbencloopNx b .LecbencloopNx .Lecbenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ - encrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next pt block */ + encrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbencloop .Lecbencout: - ldp x29, x30, [sp], #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 - dec_prepare w3, x2, x5 +.Lecbdecrestart: + dec_prepare w22, x21, x5 .LecbdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + yield_neon .Lecbdecrestart, w23, AES_YIELD_ORDER, 4, .LecbdecloopNx b .LecbdecloopNx .Lecbdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x1], #16 /* get next ct block */ - decrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next ct block */ + decrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbdecloop .Lecbdecout: - ldp x29, x30, [sp], #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_ecb_decrypt) @@ -94,78 +122,114 @@ AES_ENDPROC(aes_ecb_decrypt) */ AES_ENTRY(aes_cbc_encrypt) - ld1 {v4.16b}, [x5] /* get iv */ - enc_prepare w3, x2, x6 + stp x29, x30, [sp, #-64]! 
+ mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcencrestart: + ld1 {v4.16b}, [x24] /* get iv */ + enc_prepare w22, x21, x6 .Lcbcencloop4x: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w22, x21, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w3, x2, x6, w7 + encrypt_block v1, w22, x21, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w3, x2, x6, w7 + encrypt_block v2, w22, x21, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w3, x2, x6, w7 - st1 {v0.16b-v3.16b}, [x0], #64 + encrypt_block v3, w22, x21, x6, w7 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v3.16b + st1 {v4.16b}, [x24] /* return iv */ + yield_neon .Lcbcencrestart, w23, AES_YIELD_ORDER, 4, .Lcbcencloop4x b .Lcbcencloop4x .Lcbcenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ + ld1 {v0.16b}, [x20], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w3, x2, x6, w7 - st1 {v4.16b}, [x0], #16 - subs w4, w4, #1 + encrypt_block v4, w22, x21, x6, w7 + st1 {v4.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x5] /* return iv */ + st1 {v4.16b}, [x24] /* return iv */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] - ld1 {v7.16b}, [x5] /* get iv */ - dec_prepare w3, x2, x6 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcdecrestart: + ld1 {v7.16b}, [x24] /* get iv */ + dec_prepare w22, x21, x6 .LcbcdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x1, x1, #16 + sub x20, x20, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v7.16b}, [x24] /* return iv */ + yield_neon .Lcbcdecrestart, w23, AES_YIELD_ORDER, 4, .LcbcdecloopNx b .LcbcdecloopNx .Lcbcdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x1], #16 /* get next ct block */ + ld1 {v1.16b}, [x20], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w22, x21, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x5] /* return iv */ - ldp x29, x30, [sp], #16 + st1 {v7.16b}, [x24] /* return iv */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_cbc_decrypt) @@ -176,19 +240,30 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - enc_prepare w3, x2, x6 - ld1 {v4.16b}, [x5] +.Lctrrestart: + enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x24] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 - cmn w6, w4 /* 32 bit overflow? */ - bcs .Lctrloop .LctrloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lctr1x + cmn w6, #4 /* 32 bit overflow? */ + bcs .Lctr1x ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ dup v7.4s, w6 mov v0.16b, v4.16b @@ -200,25 +275,27 @@ AES_ENTRY(aes_ctr_encrypt) mov v1.s[3], v8.s[0] mov v2.s[3], v8.s[1] mov v3.s[3], v8.s[2] - ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x1], #16 /* get 1 input block */ + ld1 {v5.16b}, [x20], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w4, .Lctrout + cbz w23, .Lctrout + st1 {v4.16b}, [x24] /* return next CTR value */ + yield_neon .Lctrrestart, w23, AES_YIELD_ORDER, 4, .LctrloopNx b .LctrloopNx .Lctr1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -226,22 +303,25 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? 
*/ .Lctrcarrydone: - subs w4, w4, #1 + subs w23, w23, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x5] /* return next CTR value */ - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] /* return next CTR value */ +.Lctrret: + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret .Lctrtailblock: - st1 {v0.16b}, [x0] - ldp x29, x30, [sp], #16 - ret + st1 {v0.16b}, [x19] + b .Lctrret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -274,10 +354,20 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp - - ld1 {v4.16b}, [x6] + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -286,15 +376,17 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencrestart: + ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w3, x2, x8 + enc_prepare w22, x21, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -307,35 +399,50 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsencout + cbz w23, .Lxtsencout + st1 {v4.16b}, [x24] + yield_neon .Lxtsencrestart, w23, AES_YIELD_ORDER, 4, .LxtsencloopNx b .LxtsencloopNx .Lxtsenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp - - ld1 {v4.16b}, [x6] + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -344,15 +451,17 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecrestart: + ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w3, x2, x8 + dec_prepare w22, x21, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -365,26 +474,31 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsdecout + cbz w23, .Lxtsdecout + st1 {v4.16b}, [x24] + yield_neon .Lxtsdecrestart, w23, AES_YIELD_ORDER, 4, .LxtsdecloopNx b .LxtsdecloopNx .Lxtsdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x8, w7 + decrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret AES_ENDPROC(aes_xts_decrypt) @@ -393,43 +507,68 @@ AES_ENDPROC(aes_xts_decrypt) * int blocks, u8 dg[], int enc_before, int enc_after) */ AES_ENTRY(aes_mac_update) - ld1 {v0.16b}, [x4] /* get dg */ + stp x29, x30, [sp, #-64]! 
+ mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v0.16b}, [x23] /* get dg */ enc_prepare w2, x1, x7 cbz w5, .Lmacloop4x encrypt_block v0, w2, x1, x7, w8 .Lmacloop4x: - subs w3, w3, #4 + subs w22, w22, #4 bmi .Lmac1x - ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v2.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v3.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v4.16b - cmp w3, wzr - csinv x5, x6, xzr, eq + cmp w22, wzr + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 + st1 {v0.16b}, [x23] /* return dg */ + yield_neon .Lmacrestart, w22, AES_YIELD_ORDER, 4, .Lmacloop4x b .Lmacloop4x .Lmac1x: - add w3, w3, #4 + add w22, w22, #4 .Lmacloop: - cbz w3, .Lmacout - ld1 {v1.16b}, [x0], #16 /* get next pt block */ + cbz w22, .Lmacout + ld1 {v1.16b}, [x19], #16 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - subs w3, w3, #1 - csinv x5, x6, xzr, eq + subs w22, w22, #1 + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 +.Lmacenc: + encrypt_block v0, w21, x20, x7, w8 b .Lmacloop .Lmacout: - st1 {v0.16b}, [x4] /* return dg */ + st1 {v0.16b}, [x23] /* return dg */ + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret + +.Lmacrestart: + ld1 {v0.16b}, [x23] /* get dg */ + enc_prepare w21, x20, x0 + b .Lmacloop4x AES_ENDPROC(aes_mac_update) diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S index f1e3aa2732f9..dab7be7d3628 100644 --- a/arch/arm64/crypto/aes-neon.S +++ b/arch/arm64/crypto/aes-neon.S @@ -14,6 +14,8 @@ #define AES_ENTRY(func) ENTRY(neon_ ## func) #define AES_ENDPROC(func) ENDPROC(neon_ ## func) +#define AES_YIELD_ORDER 0 + /* multiply by polynomial 'x' in GF(2^8) */ .macro mul_by_x, out, in, temp, const sshr \temp, \in, #7 From patchwork Mon Dec 4 12:26:41 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120525 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366028qgn; Mon, 4 Dec 2017 04:27:47 -0800 (PST) X-Google-Smtp-Source: AGs4zMaS9bdofeKMgqnGjVbaqyDLAuPGNceFVv2qAy4iflgjqy4SPe3XWgo47X2PSH9RlrOgBJx/ X-Received: by 10.101.102.66 with SMTP id z2mr14014025pgv.352.1512390467107; Mon, 04 Dec 2017 04:27:47 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390467; cv=none; d=google.com; s=arc-20160816; b=PLT4+NGIYpLha7ut1hvdA5S2VQTjAdrPjjaukn9Q5GlUJSGDqwAAgw43gts/2xoJ8E pj+6DpKSJb1+/JDYUKhw2Dj/bxfXvppTvrzAqR9QlKzx9hPPtYZza9AHCFhpNpz39AOb /QSLuBjaw1HDVhKQUFjh0YyQBiFVHtghZLRDLecUDkBIsS0dmu3rMc8RhnRdBC9RcUnx qFgtfXAp/L7QYuantEaU+68o5uNVoSh5h1+bCOW/f4iijJbKpqJMAmKVSFg3ymYFeHMD F51DYY23HmOAj94SxEWtQKElfZ8x/54Lt6QaRftjtagVBHT08+uroP4DLymBKzkwP+si mfIw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=BKeaFq/3e/pVWNSo0hlDncWIIBb4lu/zqClWxciSjME=; 
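One point that is easy to lose in the large aes-blk diff above: a yield ends the kernel mode NEON section, so round keys and IV/counter material held in NEON registers do not survive it. That is why every mode gains a restart label (.Lecbencrestart, .Lcbcencrestart, .Lctrrestart, .Lmacrestart, ...) that re-runs enc_prepare/dec_prepare and reloads the IV. A C analogy of that control flow, with purely illustrative names (not the kernel API):

/* Control-flow analogy only -- the real code is the assembler diff above. */
static void mode_crypt_yielding(struct aes_ctx *ctx, u8 *dst,
				const u8 *src, u8 iv[16], int blocks)
{
restart:
	load_round_keys_to_neon(ctx);		/* corresponds to enc_prepare */
	load_iv_to_neon(iv);			/* IV lives in a NEON register */

	while (blocks > 0) {
		crypt_up_to_four_blocks(ctx, &dst, &src, iv, &blocks);

		if (blocks > 0 && maybe_yield_neon())
			goto restart;		/* NEON state was lost across the yield */
	}
}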
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra ,
Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 15/19] crypto: arm64/aes-bs - yield after processing each 128 bytes of input Date: Mon, 4 Dec 2017 12:26:41 +0000 Message-Id: <20171204122645.31535-16-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Currently, the bit-sliced AES code may keep preemption disabled for as long as it takes to process each contigous chunk of input, which could be as large as a page or skb, depending on the context. For this code to be useable in RT context, it needs to operate on fixed chunks of limited size. So let's add a yield after each 128 bytes of input, (i.e., 8x the AES block size, which is the natural granularity for a bit sliced algorithm.) This will disable and re-enable kernel mode NEON if a reschedule is pending. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 317 ++++++++++++-------- 1 file changed, 190 insertions(+), 127 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..4532a2262742 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,68 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99: mov x5, #1 - lsl x5, x5, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x5, x5, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnz x5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnz x5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnz x5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnz x5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnz x5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnz x5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnz x5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnz x5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnz x5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnz x5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnz x5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + yield_neon 99b + b 99b -1: ldp x29, x30, [sp], #16 +1: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -632,43 +646,53 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-64]! 
mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 mov v25.16b, v0.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 mov v26.16b, v1.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 mov v27.16b, v2.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 mov v28.16b, v3.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 mov v29.16b, v4.16b tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 mov v30.16b, v5.16b tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 mov v31.16b, v6.16b tbnz x6, #7, 0f - ld1 {v7.16b}, [x1] + ld1 {v7.16b}, [x20] -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_decrypt8 - ld1 {v24.16b}, [x5] // load IV + ld1 {v24.16b}, [x24] // load IV eor v1.16b, v1.16b, v25.16b eor v6.16b, v6.16b, v26.16b @@ -679,34 +703,39 @@ ENTRY(aesbs_cbc_decrypt) eor v3.16b, v3.16b, v30.16b eor v5.16b, v5.16b, v31.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 mov v24.16b, v25.16b tbnz x6, #1, 1f - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 mov v24.16b, v26.16b tbnz x6, #2, 1f - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 mov v24.16b, v27.16b tbnz x6, #3, 1f - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 mov v24.16b, v28.16b tbnz x6, #4, 1f - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 mov v24.16b, v29.16b tbnz x6, #5, 1f - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 mov v24.16b, v30.16b tbnz x6, #6, 1f - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 mov v24.16b, v31.16b tbnz x6, #7, 1f - ld1 {v24.16b}, [x1], #16 - st1 {v5.16b}, [x0], #16 -1: st1 {v24.16b}, [x5] // store IV - - cbnz x4, 99b - - ldp x29, x30, [sp], #16 + ld1 {v24.16b}, [x20], #16 + st1 {v5.16b}, [x19], #16 +1: st1 {v24.16b}, [x24] // store IV + + cbz x23, 2f + yield_neon 99b + b 99b + +2: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x29, x30, [sp], #64 ret ENDPROC(aesbs_cbc_decrypt) @@ -731,65 +760,75 @@ CPU_BE( .quad 0x87, 1 ) */ __xts_crypt8: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 next_tweak v26, v25, v30, v31 eor v0.16b, v0.16b, v25.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 next_tweak v27, v26, v30, v31 eor v1.16b, v1.16b, v26.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 next_tweak v28, v27, v30, v31 eor v2.16b, v2.16b, v27.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 next_tweak v29, v28, v30, v31 eor v3.16b, v3.16b, v28.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 str q29, [sp, #16] eor v4.16b, v4.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 str q29, [sp, #32] eor v5.16b, v5.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 str q29, [sp, #48] eor v6.16b, v6.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, 
[x20], #16 str q29, [sp, #64] eor v7.16b, v7.16b, v29.16b next_tweak v29, v29, v30, v31 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 br x7 ENDPROC(__xts_crypt8) .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-80]! + stp x29, x30, [sp, #-128]! mov x29, sp + stp x19, x20, [sp, #80] + stp x21, x22, [sp, #96] + stp x23, x24, [sp, #112] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - ldr q30, .Lxts_mul_x - ld1 {v25.16b}, [x5] +0: ldr q30, .Lxts_mul_x + ld1 {v25.16b}, [x24] 99: adr x7, \do8 bl __xts_crypt8 @@ -802,16 +841,16 @@ ENDPROC(__xts_crypt8) eor \o2\().16b, \o2\().16b, v27.16b eor \o3\().16b, \o3\().16b, v28.16b - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 mov v25.16b, v26.16b tbnz x6, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 mov v25.16b, v27.16b tbnz x6, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 mov v25.16b, v28.16b tbnz x6, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 mov v25.16b, v29.16b tbnz x6, #4, 1f @@ -820,18 +859,24 @@ ENDPROC(__xts_crypt8) eor \o6\().16b, \o6\().16b, v18.16b eor \o7\().16b, \o7\().16b, v19.16b - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x6, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x6, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x6, #7, 1f - st1 {\o7\().16b}, [x0], #16 - - cbnz x4, 99b - -1: st1 {v25.16b}, [x5] - ldp x29, x30, [sp], #80 + st1 {\o7\().16b}, [x19], #16 + + cbz x23, 1f + st1 {v25.16b}, [x24] + yield_neon 0b + b 99b + +1: st1 {v25.16b}, [x24] + ldp x19, x20, [sp, #80] + ldp x21, x22, [sp, #96] + ldp x23, x24, [sp, #112] + ldp x29, x30, [sp], #128 ret .endm @@ -856,24 +901,36 @@ ENDPROC(aesbs_xts_decrypt) * int rounds, int blocks, u8 iv[], u8 final[]) */ ENTRY(aesbs_ctr_encrypt) - stp x29, x30, [sp, #-16]! + stp x29, x30, [sp, #-80]! 
mov x29, sp - - cmp x6, #0 + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + str x25, [sp, #64] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + cmp x25, #0 cset x10, ne - add x4, x4, x10 // do one extra block if final + add x23, x23, x10 // do one extra block if final - ldp x7, x8, [x5] - ld1 {v0.16b}, [x5] +98: ldp x7, x8, [x24] + ld1 {v0.16b}, [x24] CPU_LE( rev x7, x7 ) CPU_LE( rev x8, x8 ) adds x8, x8, #1 adc x7, x7, xzr 99: mov x9, #1 - lsl x9, x9, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x9, x9, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x9, x9, xzr, le tbnz x9, #1, 0f @@ -891,82 +948,88 @@ CPU_LE( rev x8, x8 ) tbnz x9, #7, 0f next_ctr v7 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_encrypt8 lsr x9, x9, x10 // disregard the extra block tbnz x9, #0, 0f - ld1 {v8.16b}, [x1], #16 + ld1 {v8.16b}, [x20], #16 eor v0.16b, v0.16b, v8.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 tbnz x9, #1, 1f - ld1 {v9.16b}, [x1], #16 + ld1 {v9.16b}, [x20], #16 eor v1.16b, v1.16b, v9.16b - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 tbnz x9, #2, 2f - ld1 {v10.16b}, [x1], #16 + ld1 {v10.16b}, [x20], #16 eor v4.16b, v4.16b, v10.16b - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 tbnz x9, #3, 3f - ld1 {v11.16b}, [x1], #16 + ld1 {v11.16b}, [x20], #16 eor v6.16b, v6.16b, v11.16b - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 tbnz x9, #4, 4f - ld1 {v12.16b}, [x1], #16 + ld1 {v12.16b}, [x20], #16 eor v3.16b, v3.16b, v12.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 tbnz x9, #5, 5f - ld1 {v13.16b}, [x1], #16 + ld1 {v13.16b}, [x20], #16 eor v7.16b, v7.16b, v13.16b - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 tbnz x9, #6, 6f - ld1 {v14.16b}, [x1], #16 + ld1 {v14.16b}, [x20], #16 eor v2.16b, v2.16b, v14.16b - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 tbnz x9, #7, 7f - ld1 {v15.16b}, [x1], #16 + ld1 {v15.16b}, [x20], #16 eor v5.16b, v5.16b, v15.16b - st1 {v5.16b}, [x0], #16 + st1 {v5.16b}, [x19], #16 8: next_ctr v0 - cbnz x4, 99b - -0: st1 {v0.16b}, [x5] - ldp x29, x30, [sp], #16 + st1 {v0.16b}, [x24] + cbz x23, 0f + yield_neon 98b + b 99b + +0: ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldr x25, [sp, #64] + ldp x29, x30, [sp], #80 ret /* * If we are handling the tail of the input (x6 != NULL), return the * final keystream block back to the caller. 
*/ -1: cbz x6, 8b - st1 {v1.16b}, [x6] +1: cbz x25, 8b + st1 {v1.16b}, [x25] b 8b -2: cbz x6, 8b - st1 {v4.16b}, [x6] +2: cbz x25, 8b + st1 {v4.16b}, [x25] b 8b -3: cbz x6, 8b - st1 {v6.16b}, [x6] +3: cbz x25, 8b + st1 {v6.16b}, [x25] b 8b -4: cbz x6, 8b - st1 {v3.16b}, [x6] +4: cbz x25, 8b + st1 {v3.16b}, [x25] b 8b -5: cbz x6, 8b - st1 {v7.16b}, [x6] +5: cbz x25, 8b + st1 {v7.16b}, [x25] b 8b -6: cbz x6, 8b - st1 {v2.16b}, [x6] +6: cbz x25, 8b + st1 {v2.16b}, [x25] b 8b -7: cbz x6, 8b - st1 {v5.16b}, [x6] +7: cbz x25, 8b + st1 {v5.16b}, [x25] b 8b ENDPROC(aesbs_ctr_encrypt) From patchwork Mon Dec 4 12:26:42 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120526 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366071qgn; Mon, 4 Dec 2017 04:27:49 -0800 (PST) X-Google-Smtp-Source: AGs4zMbyBt/AblYpKMrdbKi5cScr+BpEqQnWRQOCdo/E83g9EsUditF3ddgND0zA2aSnrYW8J58p X-Received: by 10.98.159.16 with SMTP id g16mr19275147pfe.53.1512390469805; Mon, 04 Dec 2017 04:27:49 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390469; cv=none; d=google.com; s=arc-20160816; b=MyubI5VEMpKg/qj193DLJtZs6Px+1vzW2wpIHBXrIs3+Gvr1JgnA/FKx7B2qbIGfxO 1QmZT7x0uDG0bANgxn88++h0Mm91DY2shkumq/SLIO1+kYHrFgup/K5I+G0qLzynQtOL YTwQrmzvBTPvN4M0g3y2Qd/Oq5ZlkTRaLlsWWUX/hlDcpkvNwakuIYYUGOVhsnCDnOlA YR1eG8Sd0hThGCETC19EX49q+tG3U5ZQqhruLr1p3Y3QhYEk4AqZWzrrZZ7WthYdErf+ uXwNV5lW8roujFcuns/mE9TgbBxHv/roy4B/ZSapz4z3/95IKPUj1qF57DHkvg0361st llbw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=arlSHtrbxeVG6fEorsTBSaz3dIlYySVyMnD4+bW8/wHrm7LsTfzRscjt01G2j6sdbz bkxI7amSSGdCEq57/p+BCCrOgWETbAuPzg3uBxl2YNdLHDbrrKrT3MDllTeSa7wibrEb PTuWGpAWix0ay7aLCw6YlGeeWz/crACGZPLKi7pH0M3yYqCNdS17KTPaayCQpbx/m9Yi ih0hKvRAJtVTaGWh01tH4MDirL8BswbqxQJAxmxv0jnFu7Eb9fUuX0NgQxwOsoWgmy5O /gZ6U62siKEV5kkvIfhfdDhp+vD0F6y5cFgmmyx+tp9UIft47T2NZcIC4+Qk+7jXy3tu 58BA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=RRPICWwD; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
[209.132.180.67]) by mx.google.com with ESMTP id d23si10177768pfe.339.2017.12.04.04.27.49; Mon, 04 Dec 2017 04:27:49 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=RRPICWwD; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753334AbdLDM1s (ORCPT + 1 other); Mon, 4 Dec 2017 07:27:48 -0500 Received: from mail-wm0-f66.google.com ([74.125.82.66]:37000 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753323AbdLDM1p (ORCPT ); Mon, 4 Dec 2017 07:27:45 -0500 Received: by mail-wm0-f66.google.com with SMTP id f140so13921499wmd.2 for ; Mon, 04 Dec 2017 04:27:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=RRPICWwD8QaMGtwp4l1SQrQAwjJhkk27XHqdg0sz5ZgViyfB9WmuJBZErqR7vUnVES b6LIMY/i3y1zMWfwOBZkwM65x8RxBooUFclFaRM+49JGlQFxMnl/+Bikfimd+O1Ufr94 gVt8FwdgbCypZkJlWYhun4SCo41fhF/b84eO4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=6smGlxd/jiQWMr+EsFkOIWO4c5++VCVYV7KKwPV9qWc=; b=GaqjAv1SzkV0DaYck5M5+lEDdVpEC+ALcC1T9PplW39VZlj+MopWN85RAul0saWvwE oreeCYSJGjwDfV4XjUg1GFU+2mwFq85eb/VK48C/QTsCDeaqgIrOVsvKuez7g2mCSvRw 2SGsrWJ+igzHmWkqrGTX50EyCrAT22ALiAYNFyVeSM4oH8jYTgKB4sM4M/YS/ELCavw0 QH27ejjfEebetVZsGHRYdCbQ9vPdZjHwIOShQDkxyzmU0BemeCydYaz9Pfsa1KfuWjQ1 2WiktFkkmVz170VtYw+LNu0LZubHRrCjZu04eXMMgQz935TgQvhXZzKpjPuoFVt1Jej8 wN+w== X-Gm-Message-State: AKGB3mKEfJfFlmAnu0iIbZ2jouYlD8a5hBirpOS1im3/eGA74rJtPN5R 147ESoqWm7baqogGZZYRys6+LZJgU4c= X-Received: by 10.28.30.151 with SMTP id e145mr2785194wme.8.1512390463952; Mon, 04 Dec 2017 04:27:43 -0800 (PST) Received: from localhost.localdomain ([105.150.171.234]) by smtp.gmail.com with ESMTPSA id a8sm7665839wmh.41.2017.12.04.04.27.41 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 04 Dec 2017 04:27:43 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 16/19] crypto: arm64/aes-ghash - yield after processing fixed number of blocks Date: Mon, 4 Dec 2017 12:26:42 +0000 Message-Id: <20171204122645.31535-17-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org This updates both the core GHASH as well as the AES-GCM algorithm to yield each time after processing a fixed chunk of input. For the GCM driver, we align with the other AES/CE block mode drivers, and use a block size of 64 bytes. 
The core GHASH is much shorter, so let's use a block size of 128 bytes for that one. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 128 ++++++++++++++------ 1 file changed, 92 insertions(+), 36 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..fbfd4681675d 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -212,23 +212,36 @@ ushr XL.2d, XL.2d, #1 .endm - .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + .macro __pmull_ghash, pn, yield + stp x29, x30, [sp, #-64]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + str x23, [sp, #48] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE( rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +263,19 @@ CPU_LE( rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnz w0, 0b + cbz w19, 3f - st1 {XL.2d}, [x1] + yield_neon_pre w19, \yield, 1, 1b + st1 {XL.2d}, [x20] + yield_neon_post 0b + + b 1b + +3: st1 {XL.2d}, [x20] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldr x23, [sp, #48] + ldp x29, x30, [sp], #64 ret .endm @@ -261,11 +284,11 @@ CPU_LE( rev64 T1.16b, T1.16b ) * struct ghash_key const *k, const char *head) */ ENTRY(pmull_ghash_update_p64) - __pmull_ghash p64 + __pmull_ghash p64, 5 ENDPROC(pmull_ghash_update_p64) ENTRY(pmull_ghash_update_p8) - __pmull_ghash p8 + __pmull_ghash p8, 2 ENDPROC(pmull_ghash_update_p8) KS .req v8 @@ -304,38 +327,56 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8] // load lower counter + stp x29, x30, [sp, #-96]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + stp x23, x24, [sp, #48] + stp x25, x26, [sp, #64] + str x27, [sp, #80] + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + + ldr x27, [x24, #8] // load lower counter +CPU_LE( rev x27, x27 ) + +0: ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] movi MASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE( rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x26] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b}, [x22], #16 + rev x9, x27 + add x27, x27, #1 + sub w19, w19, #1 ins CTR.d[1], x9 // set lower counter .if \enc == 1 eor INP.16b, INP.16b, KS.16b // encrypt input - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif rev64 T1.16b, INP.16b - cmp w6, #12 - b.ge 2f // AES-192/256? + cmp w25, #12 + b.ge 4f // AES-192/256? 
-1: enc_round CTR, v21 +2: enc_round CTR, v21 ext T2.16b, XL.16b, XL.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8 @@ -390,27 +431,42 @@ CPU_LE( rev x8, x8 ) .if \enc == 0 eor INP.16b, INP.16b, KS.16b - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif - cbnz w0, 0b + cbz w19, 3f -CPU_LE( rev x8, x8 ) - st1 {XL.2d}, [x1] - str x8, [x5, #8] // store lower counter + yield_neon_pre w19, 8, 1, 1b // yield every 8 blocks + st1 {XL.2d}, [x20] + .if \enc == 1 + st1 {KS.16b}, [x26] + .endif + yield_neon_post 0b + b 1b + +3: st1 {XL.2d}, [x20] .if \enc == 1 - st1 {KS.16b}, [x7] + st1 {KS.16b}, [x26] .endif +CPU_LE( rev x27, x27 ) + str x27, [x24, #8] // store lower counter + + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x23, x24, [sp, #48] + ldp x25, x26, [sp, #64] + ldr x27, [sp, #80] + ldp x29, x30, [sp], #96 ret -2: b.eq 3f // AES-192? +4: b.eq 5f // AES-192? enc_round CTR, v17 enc_round CTR, v18 -3: enc_round CTR, v19 +5: enc_round CTR, v19 enc_round CTR, v20 - b 1b + b 2b .endm /* From patchwork Mon Dec 4 12:26:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120527 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366125qgn; Mon, 4 Dec 2017 04:27:53 -0800 (PST) X-Google-Smtp-Source: AGs4zMZ0C5tdvdp2I+sM9GaDQLd9FR5d/inQSrOu9huBuhY/ADjHgs7N3+AlufjR93GxiefCNJ1M X-Received: by 10.98.32.21 with SMTP id g21mr19263862pfg.52.1512390473009; Mon, 04 Dec 2017 04:27:53 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390473; cv=none; d=google.com; s=arc-20160816; b=uaW3MvIWnvpPzq7k6VoSU/Gpnd2mHeQmjXeFPKOsm5akCx5YADOMtTcrQIkVFsaWQF S4ug9/ViqcsadOoKFIOmtZ0zbDkswf2564AmZmsdAxBrklFurfRUAVkrhviywJhkajjp jxmSdxihgCFsUPSLlpx1qgj1XthdnhQlQ6t2K6+mAIcV1LycM/UdvfQaqsK23N/yN02Y wsTOccwcNaua3yzhD2HQ/Li3Z9dHUeJV3Bsx2LKj01iaK33HlaFduyep297pJ9y0x+fF dFi725BaRYGVk11XrVFCcTh5/4Q+/qWn7QmnYO3dUZqUOGZqI7BO5eTB+0QhWS/4IHtX GDuQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=r3pWMbsQK1Jd8mT5KgNAPizdVYdUe5jOnoV7PBK9uJU=; b=WIfpyczhphvtOaJT/d0H22/i4ivGUL46f39QrS1vWiJiuAT/tDSY92puRy4Sq3cSjH gJNkOIVpdi1fU2wtvT+lGcYlcm4pHRhO6S75ILXgGV7JfHToiGcxKC4fklwoqY9KxPLB 4hRoraOyD89fM3Zpa4NHP5XWx+J/JBraIYWiOOeEsXpi7Sbq1iqDFcOe12TvNtEzNy3o WpqZkZXOD9LKpzBsO2RLIKR8p+DmVSbrTtnBa0smob+6Z+ZXmORfk9YdWf6YTPMaM6ni RSw/EtV9UMJ/bjXVoMqiSGv17M0w+15aUyvxqYjFBV7F1l5T2oFafOzgmJ/NwjF/m1Ho 3neQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=kGajrMYV; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 17/19] crypto: arm64/crc32-ce - yield NEON every 16 blocks of input Date: Mon, 4 Dec 2017 12:26:43 +0000 Message-Id: <20171204122645.31535-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 16 blocks of input.
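[Editor's note: for readers following the series at the C level rather than in the NEON assembly, the fixed-chunk idea described above can be sketched as glue code. This is purely illustrative and not part of the patch: the series performs the yield inside the assembly itself via the yield_neon macros, and the helper name, prototype, and chunking details below are assumptions made only for the example.]

/*
 * Illustrative sketch only: a C-level approximation of "yield the NEON
 * every 16 blocks".  Each kernel_neon_begin()/kernel_neon_end() section
 * is bounded to YIELD_BLOCKS blocks, so a preemptible kernel gets a
 * scheduling opportunity between chunks.  The crc32_pmull_le() prototype
 * and the crc32_pmull_chunked() helper are assumptions for this sketch;
 * len is assumed to be a multiple of 64 bytes, since the real glue code
 * handles unaligned heads and short tails separately.
 */
#include <linux/kernel.h>
#include <linux/linkage.h>
#include <linux/sched.h>
#include <asm/neon.h>

#define YIELD_BLOCKS	16U	/* blocks per NEON section, as in this patch */
#define BLOCK_SIZE	64U	/* one block is a 64-byte cache line here */

asmlinkage u32 crc32_pmull_le(const u8 *buf, size_t len, u32 crc);

static u32 crc32_pmull_chunked(const u8 *buf, size_t len, u32 crc)
{
	while (len) {
		size_t chunk = min_t(size_t, len, YIELD_BLOCKS * BLOCK_SIZE);

		kernel_neon_begin();
		/* resumable: takes and returns the running CRC */
		crc = crc32_pmull_le(buf, chunk, crc);
		kernel_neon_end();

		buf += chunk;
		len -= chunk;
		cond_resched();		/* voluntary yield between chunks */
	}
	return crc;
}

[Doing the yield inside the assembly, as this patch does, keeps the partially folded vector state live across chunks; the stp/ldp of q1-q4 around yield_neon_pre/yield_neon_post in the diff below exist to preserve that state across a yield.]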
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crc32-ce-core.S | 55 +++++++++++++++----- 1 file changed, 43 insertions(+), 12 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S index 18f5a8442276..bca3d22fae7b 100644 --- a/arch/arm64/crypto/crc32-ce-core.S +++ b/arch/arm64/crypto/crc32-ce-core.S @@ -100,9 +100,9 @@ dCONSTANT .req d0 qCONSTANT .req q0 - BUF .req x0 - LEN .req x1 - CRC .req x2 + BUF .req x19 + LEN .req x20 + CRC .req x21 vzr .req v9 @@ -116,13 +116,27 @@ * size_t len, uint crc32) */ ENTRY(crc32_pmull_le) - adr x3, .Lcrc32_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32_constants b 0f ENTRY(crc32c_pmull_le) - adr x3, .Lcrc32c_constants + stp x29, x30, [sp, #-112]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + adr x22, .Lcrc32c_constants -0: bic LEN, LEN, #15 +0: mov BUF, x0 + mov LEN, x1 + mov CRC, x2 + + bic LEN, LEN, #15 ld1 {v1.16b-v4.16b}, [BUF], #0x40 movi vzr.16b, #0 fmov dCONSTANT, CRC @@ -131,7 +145,7 @@ ENTRY(crc32c_pmull_le) cmp LEN, #0x40 b.lt less_64 - ldr qCONSTANT, [x3] + ldr qCONSTANT, [x22] loop_64: /* 64 bytes Full cache line folding */ sub LEN, LEN, #0x40 @@ -161,10 +175,24 @@ loop_64: /* 64 bytes Full cache line folding */ eor v4.16b, v4.16b, v8.16b cmp LEN, #0x40 - b.ge loop_64 + b.lt less_64 + + yield_neon_pre LEN, 4, 64, loop_64 // yield every 16 blocks + stp q1, q2, [sp, #48] + stp q3, q4, [sp, #80] + yield_neon_post 2f + b loop_64 + + .subsection 1 +2: ldp q1, q2, [sp, #48] + ldp q3, q4, [sp, #80] + ldr qCONSTANT, [x22] + movi vzr.16b, #0 + b loop_64 + .previous less_64: /* Folding cache line into 128bit */ - ldr qCONSTANT, [x3, #16] + ldr qCONSTANT, [x22, #16] pmull2 v5.1q, v1.2d, vCONSTANT.2d pmull v1.1q, v1.1d, vCONSTANT.1d @@ -203,8 +231,8 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* final 32-bit fold */ - ldr dCONSTANT, [x3, #32] - ldr d3, [x3, #40] + ldr dCONSTANT, [x22, #32] + ldr d3, [x22, #40] ext v2.16b, v1.16b, vzr.16b, #4 and v1.16b, v1.16b, v3.16b @@ -212,7 +240,7 @@ fold_64: eor v1.16b, v1.16b, v2.16b /* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */ - ldr qCONSTANT, [x3, #48] + ldr qCONSTANT, [x22, #48] and v2.16b, v1.16b, v3.16b ext v2.16b, vzr.16b, v2.16b, #8 @@ -222,6 +250,9 @@ fold_64: eor v1.16b, v1.16b, v2.16b mov w0, v1.s[1] + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #112 ret ENDPROC(crc32_pmull_le) ENDPROC(crc32c_pmull_le) From patchwork Mon Dec 4 12:26:44 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120528 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp4366185qgn; Mon, 4 Dec 2017 04:27:57 -0800 (PST) X-Google-Smtp-Source: AGs4zMZhMN3x5VVlZz+pEMi94yKZqIQ4MQbHOVjmLSxosVo0o5K+mPnhY8S5KgTiu28N7/kedSaH X-Received: by 10.98.189.17 with SMTP id a17mr18902948pff.97.1512390477799; Mon, 04 Dec 2017 04:27:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512390477; cv=none; d=google.com; s=arc-20160816; b=AZs2lDMZYHBJYyYGOy8/2VgJP1Jl9R627m3QP7lFjJl3bWVduyMLq7MpTQkkFV/oe1 w72KSnnU7eZmkpYUcmCTKbkr4uYdcFIOl31wfHOLHGJWgZy4IHQFNYTEuA8F7hnfx1Gu 6IRZ9pUlJjPT/8fvz5K7VAzVDLRfi51YG+O/2WwJ6u2xXs+iIVU5cVhtq+G0KwGnHxUR SIWkxLv4bW+ooiuJtuDxVXqTP+3cE6p8m85+KlBf5yo2fpHS/hoEWFG4VknHNmEyHv+S NmkkGJwJtdHTwDVAAaBar097hZXnhpyT19BhAygInm25wkI8TXe9kTOBEVxnShWQSX3c 1Lng== ARC-Message-Signature: i=1; a=rsa-sha256; 
From: Ard Biesheuvel To:
linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v2 18/19] crypto: arm64/crct10dif-ce - yield NEON every 8 blocks of input Date: Mon, 4 Dec 2017 12:26:44 +0000 Message-Id: <20171204122645.31535-19-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org> References: <20171204122645.31535-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON every 8 blocks of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/crct10dif-ce-core.S | 39 ++++++++++++++++++-- 1 file changed, 35 insertions(+), 4 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S index d5b5a8c038c8..d57067e80bae 100644 --- a/arch/arm64/crypto/crct10dif-ce-core.S +++ b/arch/arm64/crypto/crct10dif-ce-core.S @@ -74,13 +74,22 @@ .text .cpu generic+crypto - arg1_low32 .req w0 - arg2 .req x1 - arg3 .req x2 + arg1_low32 .req w19 + arg2 .req x20 + arg3 .req x21 vzr .req v13 ENTRY(crc_t10dif_pmull) + stp x29, x30, [sp, #-176]! + mov x29, sp + stp x19, x20, [sp, #16] + stp x21, x22, [sp, #32] + + mov arg1_low32, w0 + mov arg2, x1 + mov arg3, x2 + movi vzr.16b, #0 // init zero register // adjust the 16-bit initial_crc value, scale it to 32 bits @@ -175,8 +184,27 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 ) subs arg3, arg3, #128 // check if there is another 64B in the buffer to be able to fold - b.ge _fold_64_B_loop + b.lt _fold_64_B_end + + yield_neon_pre arg3, 3, 128, _fold_64_B_loop // yield every 8 blocks + stp q0, q1, [sp, #48] + stp q2, q3, [sp, #80] + stp q4, q5, [sp, #112] + stp q6, q7, [sp, #144] + yield_neon_post 2f + b _fold_64_B_loop + + .subsection 1 +2: ldp q0, q1, [sp, #48] + ldp q2, q3, [sp, #80] + ldp q4, q5, [sp, #112] + ldp q6, q7, [sp, #144] + ldr q10, rk3 + movi vzr.16b, #0 // init zero register + b _fold_64_B_loop + .previous +_fold_64_B_end: // at this point, the buffer pointer is pointing at the last y Bytes // of the buffer the 64B of folded data is in 4 of the vector // registers: v0, v1, v2, v3 @@ -304,6 +332,9 @@ _barrett: _cleanup: // scale the result back to 16 bits lsr x0, x0, #16 + ldp x19, x20, [sp, #16] + ldp x21, x22, [sp, #32] + ldp x29, x30, [sp], #176 ret _less_than_128: