From patchwork Mon Jan 2 18:21:04 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 89569
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, herbert@gondor.apana.org.au, Ard Biesheuvel
Subject: [PATCH 2/6] crypto: arm/aes-neonbs - process 8 blocks in parallel if we can
Date: Mon, 2 Jan 2017 18:21:04 +0000
Message-Id: <1483381268-12987-3-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1483381268-12987-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1483381268-12987-1-git-send-email-ard.biesheuvel@linaro.org>

The bit-sliced NEON implementation of AES only performs optimally if it
can process 8 blocks of input in parallel. This is due to the nature of
bit slicing, where the n-th bit of each byte of the AES state of each
input block is collected into NEON register 'n', for registers q0 - q7.
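To make this layout concrete, here is a minimal userspace sketch of the transposition. It is not the bsaes code: struct reg128, bitslice(), unbitslice(), useful_lanes() and bitslice_roundtrip_ok() are hypothetical names, and the lane order (lane = block * 16 + byte) is a simplifying assumption; the real NEON implementation uses its own bit permutation. The point it illustrates is that 8 blocks of 16 bytes yield exactly 128 bits per register, so one pass over q0 - q7 costs the same whether 1 or all 8 blocks carry real data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE	16
#define BS_BLOCKS	8				/* blocks per bit-sliced pass */
#define BS_LANES	(BS_BLOCKS * AES_BLOCK_SIZE)	/* 128 lanes = 128 bits */

/* Hypothetical stand-in for one 128-bit NEON register (q0 .. q7). */
struct reg128 {
	uint8_t bits[BS_LANES / 8];
};

/*
 * Transpose 8 consecutive AES states (128 bytes) into 8 registers:
 * lane L of q[n] receives bit n of in[L], where L = block * 16 + byte.
 * This lane order is an illustrative assumption only.
 */
static void bitslice(const uint8_t in[BS_LANES], struct reg128 q[8])
{
	memset(q, 0, 8 * sizeof(*q));
	for (int lane = 0; lane < BS_LANES; lane++)
		for (int n = 0; n < 8; n++)
			if (in[lane] & (1u << n))
				q[n].bits[lane / 8] |= (uint8_t)(1u << (lane % 8));
}

/* Inverse transpose: rebuild the 8 byte-oriented AES states. */
static void unbitslice(const struct reg128 q[8], uint8_t out[BS_LANES])
{
	memset(out, 0, BS_LANES);
	for (int lane = 0; lane < BS_LANES; lane++)
		for (int n = 0; n < 8; n++)
			if (q[n].bits[lane / 8] & (1u << (lane % 8)))
				out[lane] |= (uint8_t)(1u << n);
}

/*
 * Lanes carrying real data when only 'nblocks' blocks are loaded. Every
 * bit-sliced operation still touches all 8 full registers, so with a
 * single block 7/8 of the work is spent on unused lanes.
 */
static int useful_lanes(int nblocks)
{
	assert(nblocks >= 0 && nblocks <= BS_BLOCKS);
	return nblocks * AES_BLOCK_SIZE;
}

/* Returns 1 if unbitslice(bitslice(x)) reproduces x. */
static int bitslice_roundtrip_ok(void)
{
	uint8_t in[BS_LANES], out[BS_LANES];
	struct reg128 q[8];

	for (int i = 0; i < BS_LANES; i++)
		in[i] = (uint8_t)(i * 37 + 11);	/* arbitrary test pattern */

	bitslice(in, q);
	unbitslice(q, out);
	return memcmp(in, out, sizeof(in)) == 0;
}
```

With one block loaded, useful_lanes(1) is 16 out of 128, which is why it pays to feed the bit-sliced routines 8 blocks at a time whenever the input allows it.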
As a result of this layout, the amount of work performed by the transform
is fixed, regardless of whether we are handling just one block or 8 in
parallel. So let's try a bit harder to iterate over the input in suitably
sized chunks, by setting the newly introduced walksize attribute to 8x
the value of AES_BLOCK_SIZE, and tweaking the loops to only process
multiples of the walk size, unless we are handling the last chunk in the
input stream.

Note that the skcipher walk API guarantees that a step in the walk never
returns less than 'walksize' bytes if there are at least that many bytes
of input still available. However, it does *not* guarantee that those
steps produce an exact multiple of the walk size.

Signed-off-by: Ard Biesheuvel
---
 arch/arm/crypto/aesbs-glue.c | 67 +++++++++++---------
 1 file changed, 38 insertions(+), 29 deletions(-)

-- 
2.7.4

diff --git a/arch/arm/crypto/aesbs-glue.c b/arch/arm/crypto/aesbs-glue.c
index d8e06de72ef3..f3019333c2eb 100644
--- a/arch/arm/crypto/aesbs-glue.c
+++ b/arch/arm/crypto/aesbs-glue.c
@@ -121,39 +121,26 @@ static int aesbs_cbc_encrypt(struct skcipher_request *req)
 	return crypto_cbc_encrypt_walk(req, aesbs_encrypt_one);
 }
 
-static inline void aesbs_decrypt_one(struct crypto_skcipher *tfm,
-				     const u8 *src, u8 *dst)
-{
-	struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
-
-	AES_decrypt(src, dst, &ctx->dec.rk);
-}
-
 static int aesbs_cbc_decrypt(struct skcipher_request *req)
 {
 	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
 	struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm);
 	struct skcipher_walk walk;
-	unsigned int nbytes;
 	int err;
 
-	for (err = skcipher_walk_virt(&walk, req, false);
-	     (nbytes = walk.nbytes); err = skcipher_walk_done(&walk, nbytes)) {
-		u32 blocks = nbytes / AES_BLOCK_SIZE;
-		u8 *dst = walk.dst.virt.addr;
-		u8 *src = walk.src.virt.addr;
-		u8 *iv = walk.iv;
-
-		if (blocks >= 8) {
-			kernel_neon_begin();
-			bsaes_cbc_encrypt(src, dst, nbytes, &ctx->dec, iv);
-			kernel_neon_end();
-			nbytes %= AES_BLOCK_SIZE;
-			continue;
-		}
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.stride);
 
-		nbytes = crypto_cbc_decrypt_blocks(&walk, tfm,
-						   aesbs_decrypt_one);
+		kernel_neon_begin();
+		bsaes_cbc_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
+				  nbytes, &ctx->dec, walk.iv);
+		kernel_neon_end();
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 	return err;
 }
@@ -186,6 +173,12 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
 		__be32 *ctr = (__be32 *)walk.iv;
 		u32 headroom = UINT_MAX - be32_to_cpu(ctr[3]);
 
+		if (walk.nbytes < walk.total) {
+			blocks = round_down(blocks,
+					    walk.stride / AES_BLOCK_SIZE);
+			tail = walk.nbytes - blocks * AES_BLOCK_SIZE;
+		}
+
 		/* avoid 32 bit counter overflow in the NEON code */
 		if (unlikely(headroom < blocks)) {
 			blocks = headroom + 1;
@@ -198,6 +191,9 @@ static int aesbs_ctr_encrypt(struct skcipher_request *req)
 		kernel_neon_end();
 		inc_be128_ctr(ctr, blocks);
 
+		if (tail > 0 && tail < AES_BLOCK_SIZE)
+			break;
+
 		err = skcipher_walk_done(&walk, tail);
 	}
 	if (walk.nbytes) {
@@ -227,11 +223,16 @@ static int aesbs_xts_encrypt(struct skcipher_request *req)
 	AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
 
 	while (walk.nbytes) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.stride);
+
 		kernel_neon_begin();
 		bsaes_xts_encrypt(walk.src.virt.addr, walk.dst.virt.addr,
-				  walk.nbytes, &ctx->enc, walk.iv);
+				  nbytes, &ctx->enc, walk.iv);
 		kernel_neon_end();
-		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 	return err;
 }
@@ -249,11 +250,16 @@ static int aesbs_xts_decrypt(struct skcipher_request *req)
 	AES_encrypt(walk.iv, walk.iv, &ctx->twkey);
 
 	while (walk.nbytes) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.stride);
+
 		kernel_neon_begin();
 		bsaes_xts_decrypt(walk.src.virt.addr, walk.dst.virt.addr,
-				  walk.nbytes, &ctx->dec, walk.iv);
+				  nbytes, &ctx->dec, walk.iv);
 		kernel_neon_end();
-		err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE);
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
 	}
 	return err;
 }
@@ -272,6 +278,7 @@ static struct skcipher_alg aesbs_algs[] = { {
 	.min_keysize		= AES_MIN_KEY_SIZE,
 	.max_keysize		= AES_MAX_KEY_SIZE,
 	.ivsize			= AES_BLOCK_SIZE,
+	.walksize		= 8 * AES_BLOCK_SIZE,
 	.setkey			= aesbs_cbc_set_key,
 	.encrypt		= aesbs_cbc_encrypt,
 	.decrypt		= aesbs_cbc_decrypt,
@@ -290,6 +297,7 @@ static struct skcipher_alg aesbs_algs[] = { {
 	.max_keysize		= AES_MAX_KEY_SIZE,
 	.ivsize			= AES_BLOCK_SIZE,
 	.chunksize		= AES_BLOCK_SIZE,
+	.walksize		= 8 * AES_BLOCK_SIZE,
 	.setkey			= aesbs_ctr_set_key,
 	.encrypt		= aesbs_ctr_encrypt,
 	.decrypt		= aesbs_ctr_encrypt,
@@ -307,6 +315,7 @@ static struct skcipher_alg aesbs_algs[] = { {
 	.min_keysize		= 2 * AES_MIN_KEY_SIZE,
 	.max_keysize		= 2 * AES_MAX_KEY_SIZE,
 	.ivsize			= AES_BLOCK_SIZE,
+	.walksize		= 8 * AES_BLOCK_SIZE,
 	.setkey			= aesbs_xts_set_key,
 	.encrypt		= aesbs_xts_encrypt,
 	.decrypt		= aesbs_xts_decrypt,
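The loop shape introduced above for CBC decryption and XTS can be exercised outside the kernel with a small model. This is a sketch under stated assumptions: struct fake_walk, fake_walk_step(), fake_walk_done(), round_down_u() and walk_chunks() are hypothetical stand-ins, not kernel APIs, and the per-step limit max_step is an invented knob that models page or scatterlist boundaries. The model enforces the guarantee quoted in the commit message (a step never exposes fewer than 'stride' bytes while at least that many remain) and records the length that would be handed to the bsaes_*() routine at each step.

```c
#include <assert.h>
#include <stddef.h>

#define AES_BLOCK_SIZE	16
#define WALKSIZE	(8 * AES_BLOCK_SIZE)	/* mirrors .walksize in the patch */

/*
 * Hypothetical stand-in for struct skcipher_walk, reduced to what the glue
 * code reads: 'total' is what is left of the request, 'nbytes' what the
 * current step exposes. 'max_step' is an assumed per-step limit that need
 * not be a multiple of the stride.
 */
struct fake_walk {
	unsigned int total;
	unsigned int nbytes;
	unsigned int stride;
	unsigned int max_step;
};

/* Same result as the kernel's round_down() when 'y' is a power of two. */
static unsigned int round_down_u(unsigned int x, unsigned int y)
{
	return x - x % y;
}

/* Never expose fewer than 'stride' bytes while at least that many remain. */
static void fake_walk_step(struct fake_walk *w)
{
	unsigned int step = w->max_step > w->stride ? w->max_step : w->stride;

	w->nbytes = w->total < step ? w->total : step;
}

/* Like skcipher_walk_done(): 'unprocessed' bytes are handed back. */
static void fake_walk_done(struct fake_walk *w, unsigned int unprocessed)
{
	w->total -= w->nbytes - unprocessed;
	fake_walk_step(w);
}

/*
 * Run the patched loop and record each chunk length that would be passed
 * to the NEON routine. Returns the number of chunks recorded.
 */
static size_t walk_chunks(unsigned int total, unsigned int max_step,
			  unsigned int *chunks, size_t max_chunks)
{
	struct fake_walk walk = {
		.total = total, .stride = WALKSIZE, .max_step = max_step,
	};
	size_t n = 0;

	fake_walk_step(&walk);
	while (walk.nbytes) {
		unsigned int nbytes = walk.nbytes;

		/* Not the final step: only pass whole 8-block strides. */
		if (nbytes < walk.total)
			nbytes = round_down_u(nbytes, walk.stride);

		assert(n < max_chunks);
		chunks[n++] = nbytes;

		/* The remainder is picked up again by the next step. */
		fake_walk_done(&walk, walk.nbytes - nbytes);
	}
	return n;
}
```

With a 200-byte step limit and a 1008-byte (63-block) request, walk_chunks() records seven 128-byte chunks followed by a final 112-byte chunk; the 72 bytes left over at each intermediate step are simply revisited. The model also shows why the walk guarantee matters: if a non-final step exposed fewer than 'stride' bytes, round_down() would yield 0 and the loop would make no progress. The CTR path in the patch applies the same rounding in units of blocks (walk.stride / AES_BLOCK_SIZE) and breaks out of the loop to handle a trailing partial block separately.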