From patchwork Wed Dec 6 19:43:43 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 120901 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp7469255qgn; Wed, 6 Dec 2017 11:45:02 -0800 (PST) X-Google-Smtp-Source: AGs4zMY+9ouzLtwY+wmnSy6wZBvZUobkG7scgzBIjqR5zS+uhYr7BcSyglX8T/HR0geRNye/NGDi X-Received: by 10.99.141.75 with SMTP id z72mr21747224pgd.73.1512589501907; Wed, 06 Dec 2017 11:45:01 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1512589501; cv=none; d=google.com; s=arc-20160816; b=VPd4CxS4DU4n4K3lBLVji73RSZiolMtCL/37CqT50J6h67owG0YrOOvMr5zS0oMvz2 U6LYHE4uelKcw1a9ole23M8IluDutFiweytaSmbfYCzZ13/tcuWXwJVIYSAkMwj5u+ko FzxFuLo98eBZBzYp8XaJuhaEHfglkrxL51EZFXygNFp1X++Xubo1/gomuOx5LTTMWOw0 GIglXSOCT8rSeDGFy/3Yc77xtfEAZzD+5eJKXVYEufa1N34Snf/ISap2PtawtUU0vgAd 2BGRS6hPnWYVigpwPRHk0bPD7tuIo6aO657e4BME5QlQFHVfQ0d19+CBy9Z0acemrRaZ IF/g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=56HLO1hTwAirQLQREK7pSWn82Sof/aZd1e7vq37Xg9E=; b=yJ6xqfma/SxZMP2f9B7Gj8dNrOH2lAh0RNs3bnL4rGgooyRjz9equH5H49ZjG5drv4 dMVINYu9bwKw/A0MXuDHq4aIoAX2NOqRh79aiax2u59HTrLGj306MILhSOyWg846Ksk6 Qvm9Q3yZPe+RFu2pm5RcKm/QUuaVmeVfPK71bvpx4j5J9Hzig14oKbXkpSrL3+poDuUs EdRMFZTqhHQJ8/7Buj2lILaJ9yn5XzGgeakOPNlhsB9Qd0k66aQUiXGzTcr+JTynfOrh zVjaVJa/tXqw05baVFszjZBEUQIiMZjs0E1eQhSAZywGc9JjxlgO1yLBB+VvKwD9QMzv +D3Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=MjJ1+Qph; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id i16si2382524pgv.496.2017.12.06.11.45.01; Wed, 06 Dec 2017 11:45:01 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=MjJ1+Qph; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752503AbdLFTpA (ORCPT + 1 other); Wed, 6 Dec 2017 14:45:00 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:38558 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752449AbdLFTos (ORCPT ); Wed, 6 Dec 2017 14:44:48 -0500 Received: by mail-wm0-f67.google.com with SMTP id 64so9079786wme.3 for ; Wed, 06 Dec 2017 11:44:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=56HLO1hTwAirQLQREK7pSWn82Sof/aZd1e7vq37Xg9E=; b=MjJ1+QphBa6L1FoU1vU1zgGB4tMbJwUojxL/pPqwt6UYLdpxBZVtiYeNYEhEF/oMaZ q1lLIchsKVNI0kfZF9zkQH3rYA1zl+E5r9ufh5UwSedU+tNtqBoWYvePgzrbgnyJwJ39 y1orqL0p28DbeQwi6r8rM0ePMyEAT9FUwywEs= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=56HLO1hTwAirQLQREK7pSWn82Sof/aZd1e7vq37Xg9E=; b=AEH0iXXs6prxs4oBkd1KWbm7E2xdZ4J3+XR7umHvQ7UGYZW+tu0hdxcahy+vwzrEj3 sYACrLrkCxK/czkekBuSlSapYJZg4tnnOoX8R80f5ARW9xKVHIhoSiNWWd5H4oMANyM2 mXZkVD3OYTM4cnQW3kyLtZTEIDdX0fXxRwE+R9F/bVzfwQsIvfrle6jsjb8UBe2Xkzvy v7fMfeIyROO+vlfuMGxKCL5hJBpymTSDZdadLbRTPIHBb78YQ/mXygxe9a6W0X1S6CdY QHb5hDAud9eNFct4y1wddYtmnaMawcrejn+izuuzXhKFnfQbwm/sTB9jExfYXEtwPMMF OmYA== X-Gm-Message-State: AKGB3mIuK3g4PXq+IAeb5da7BXup5Ja3SHr7RuCxdd5zF3/zxURXz+dr GpL6s6vGS50FOjA88E1xRKFy32DGyQA= X-Received: by 10.28.69.8 with SMTP id s8mr9499380wma.65.1512589487285; Wed, 06 Dec 2017 11:44:47 -0800 (PST) Received: from localhost.localdomain ([105.150.171.234]) by smtp.gmail.com with ESMTPSA id b66sm3596594wmh.32.2017.12.06.11.44.44 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 06 Dec 2017 11:44:46 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v3 17/20] crypto: arm64/aes-ghash - yield NEON after every block of input Date: Wed, 6 Dec 2017 19:43:43 +0000 Message-Id: <20171206194346.24393-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20171206194346.24393-1-ard.biesheuvel@linaro.org> References: <20171206194346.24393-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------ arch/arm64/crypto/ghash-ce-glue.c | 28 +++-- 2 files changed, 97 insertions(+), 44 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..8da87cfcce66 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -213,22 +213,31 @@ .endm .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE( rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnz w0, 0b + cbz w19, 3f + + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + do_cond_yield_neon + b 0b + endif_yield_neon + + b 1b - st1 {XL.2d}, [x1] +3: st1 {XL.2d}, [x20] + frame_pop 5 ret .endm @@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8] // load lower counter + frame_push 10 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + .if \enc == 1 + ldr x27, [sp, #96] // first stacked arg + .endif + + ldr x28, [x24, #8] // load lower counter +CPU_LE( rev x28, x28 ) + +0: mov x0, x25 + load_round_keys w26, x0 + ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] movi MASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE( rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x27] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b}, [x22], #16 + rev x9, x28 + add x28, x28, #1 + sub w19, w19, #1 ins CTR.d[1], x9 // set lower counter .if \enc == 1 eor INP.16b, INP.16b, KS.16b // encrypt input - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif rev64 T1.16b, INP.16b - cmp w6, #12 - b.ge 2f // AES-192/256? + cmp w26, #12 + b.ge 4f // AES-192/256? -1: enc_round CTR, v21 +2: enc_round CTR, v21 ext T2.16b, XL.16b, XL.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8 @@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 ) .if \enc == 0 eor INP.16b, INP.16b, KS.16b - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif - cbnz w0, 0b + cbz w19, 3f -CPU_LE( rev x8, x8 ) - st1 {XL.2d}, [x1] - str x8, [x5, #8] // store lower counter + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + .if \enc == 1 + st1 {KS.16b}, [x27] + .endif + do_cond_yield_neon + b 0b + endif_yield_neon + b 1b + +3: st1 {XL.2d}, [x20] .if \enc == 1 - st1 {KS.16b}, [x7] + st1 {KS.16b}, [x27] .endif +CPU_LE( rev x28, x28 ) + str x28, [x24, #8] // store lower counter + + frame_pop 10 ret -2: b.eq 3f // AES-192? +4: b.eq 5f // AES-192? enc_round CTR, v17 enc_round CTR, v18 -3: enc_round CTR, v19 +5: enc_round CTR, v19 enc_round CTR, v20 - b 1b + b 2b .endm /* diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..7cf0b1aa6ea8 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src, asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds, u8 ks[]); + u8 ctr[], u32 const rk[], int rounds, + u8 ks[]); asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds); + u8 ctr[], u32 const rk[], int rounds); asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[], u32 const rk[], int rounds); @@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key), ks); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key)); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE;