From patchwork Sat Mar 10 15:22:02 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 131307 Delivered-To: patch@linaro.org Received: by 10.46.66.2 with SMTP id p2csp2245103lja; Sat, 10 Mar 2018 07:23:28 -0800 (PST) X-Google-Smtp-Source: AG47ELvgkNOZlDLPFr1ZIxLSRBgCYYq9QpsaEZ3Qp3tb4MPdd2jU+GRYPaezNO/8dGKWAhEbH9/t X-Received: by 2002:a17:902:69cf:: with SMTP id m15-v6mr2441934pln.104.1520695408798; Sat, 10 Mar 2018 07:23:28 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520695408; cv=none; d=google.com; s=arc-20160816; b=RlwNAPNixkhmM5ZdEMz8hAbydWw4MeWBvnWKZNE3JKAZjJxYLQRsboMFI1HVTYQVYP +TgB5HsFsR/tUPOXvE+AbTvHsn2n0WcFlZS3C94IbD0ssd93K3ahlNlJI8oS7ZK2igUf AH7Zxy41XL+NHrUvjylTFa4Plhx1Xv/vmJT/zcwx6N/+JQIJiw1EbSJ1W+e4XqaWLf2q 7xF9Fo9I4PvXbQX/D3MgPwoxiJ+br1GVObDe8xRaSedD0nZRw1WC2OGcKubR7dwaN9Lm Vx6E+xdoLwx97UBLCdwzZYa1bQInk1M3sNfR8S/UnuVbgHZMbIE2CaddPzCSacrkxD6Y nKqw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=hzExnUsi1U7yux9CliLi5bwg9/Gzl1FVceOfYaKTmwI=; b=pwQ0NcNYttY2yoe2wDz0mBTKiU5G72flSn7eN+5+P0L5Nw80SmJsHWewvnED5cZwx2 9Id4nRP7vUrSYDoL0MttJ+NPDXO/UMLtP48ib3Rp/5vksDJcEd48EcXzpLkuPBNlutc5 tu0zX2/1HlM+vSzQpNX9IX7TG2p2UpM3N4Def8Uldb7SmUd3ffdbC/GDU/ogwS+HS54/ +bMyIpK76LqWvwBEo7ydWUSWLEQYMCOhnZ3n5zQOnD4sVlnwBfnyG0bL3OMSy8V3uTRV 2CBRIo4Ur4j4tEtKyzPkDMggJuhYCKitYnt0LHT0bkm5lvDq2zSeh78ewhyxI2agNp6r Fmvw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=CBh3k7gT; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id q16si2824191pfg.221.2018.03.10.07.23.28; Sat, 10 Mar 2018 07:23:28 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=CBh3k7gT; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932363AbeCJPX0 (ORCPT + 1 other); Sat, 10 Mar 2018 10:23:26 -0500 Received: from mail-wm0-f68.google.com ([74.125.82.68]:51872 "EHLO mail-wm0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932355AbeCJPXV (ORCPT ); Sat, 10 Mar 2018 10:23:21 -0500 Received: by mail-wm0-f68.google.com with SMTP id h21so8873810wmd.1 for ; Sat, 10 Mar 2018 07:23:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=hzExnUsi1U7yux9CliLi5bwg9/Gzl1FVceOfYaKTmwI=; b=CBh3k7gT+TdA5CSNM7NYhh5dYjah4ZH/diSamieP3QiZ6oa/bJqpMwr8PruDKZiHqD sGgDCnCqcSTTxk7ouxWBg/bRIFhhV3dbX+rhEYtmYdGknBvZCZ72xp+qY1HhWTG0qvcy yKXUYHjeFWIATE1aN7FFMLESVlAO/m6M9R6Zg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=hzExnUsi1U7yux9CliLi5bwg9/Gzl1FVceOfYaKTmwI=; b=AFE8/49MAMIgSgXSECCE1uKVSTDeoyerO2mc0SeYyUpF+pGn6nv7Nf9vSJY3ofsZKq oHxaSo7qCEiBpjk6I0E6eW01yzpqxi0yIzqyly0oDa03DGGwjpFArCxXO6CmsKulAQ5H fioKBoxS17+TpXAemm9iUwIYNiNYxZrKLJ8KuFcMGENJz7RIA8pMbL72MJlRjsn3a37K Zqz3EkPIAan0Qlp07qP4XXjpJaCrcwFVWMBZAzYPOiTNC/KigJG7uH2PdZCW2jfkJUxu KMJgsrTKillYdqQpcxppjwsVkFoF4pIuR6DgtSRg9z55aH6cVKT1xkI2SEATs5UK043M iqqA== X-Gm-Message-State: AElRT7F3/ZNC06YDwCaL1RLXJPxCbK8Cb7rmA6UVqJo1DhlAEaVSsdhj VJfvpJ7m6kFfqVwrDVoSMW2ayq77+zk= X-Received: by 10.28.210.81 with SMTP id j78mr1185285wmg.55.1520695400035; Sat, 10 Mar 2018 07:23:20 -0800 (PST) Received: from localhost.localdomain ([105.148.128.186]) by smtp.gmail.com with ESMTPSA id m9sm7027531wrf.13.2018.03.10.07.23.16 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 10 Mar 2018 07:23:18 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Dave Martin , Russell King - ARM Linux , Sebastian Andrzej Siewior , Mark Rutland , linux-rt-users@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Will Deacon , Steven Rostedt , Thomas Gleixner Subject: [PATCH v5 17/23] crypto: arm64/aes-ghash - yield NEON after every block of input Date: Sat, 10 Mar 2018 15:22:02 +0000 Message-Id: <20180310152208.10369-18-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.15.1 In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org> References: <20180310152208.10369-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org Avoid excessive scheduling delays under a preemptible kernel by conditionally yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------ arch/arm64/crypto/ghash-ce-glue.c | 28 +++-- 2 files changed, 97 insertions(+), 44 deletions(-) -- 2.15.1 diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S index 11ebf1ae248a..dcffb9e77589 100644 --- a/arch/arm64/crypto/ghash-ce-core.S +++ b/arch/arm64/crypto/ghash-ce-core.S @@ -213,22 +213,31 @@ .endm .macro __pmull_ghash, pn - ld1 {SHASH.2d}, [x3] - ld1 {XL.2d}, [x1] + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +0: ld1 {SHASH.2d}, [x22] + ld1 {XL.2d}, [x20] ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 eor SHASH2.16b, SHASH2.16b, SHASH.16b __pmull_pre_\pn /* do the head block first, if supplied */ - cbz x4, 0f - ld1 {T1.2d}, [x4] - b 1f + cbz x23, 1f + ld1 {T1.2d}, [x23] + mov x23, xzr + b 2f -0: ld1 {T1.2d}, [x2], #16 - sub w0, w0, #1 +1: ld1 {T1.2d}, [x21], #16 + sub w19, w19, #1 -1: /* multiply XL by SHASH in GF(2^128) */ +2: /* multiply XL by SHASH in GF(2^128) */ CPU_LE( rev64 T1.16b, T1.16b ) ext T2.16b, XL.16b, XL.16b, #8 @@ -250,9 +259,18 @@ CPU_LE( rev64 T1.16b, T1.16b ) eor T2.16b, T2.16b, XH.16b eor XL.16b, XL.16b, T2.16b - cbnz w0, 0b + cbz w19, 3f + + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + do_cond_yield_neon + b 0b + endif_yield_neon + + b 1b - st1 {XL.2d}, [x1] +3: st1 {XL.2d}, [x20] + frame_pop ret .endm @@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8) .endm .macro pmull_gcm_do_crypt, enc - ld1 {SHASH.2d}, [x4] - ld1 {XL.2d}, [x1] - ldr x8, [x5, #8] // load lower counter + frame_push 10 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + mov x26, x7 + .if \enc == 1 + ldr x27, [sp, #96] // first stacked arg + .endif + + ldr x28, [x24, #8] // load lower counter +CPU_LE( rev x28, x28 ) + +0: mov x0, x25 + load_round_keys w26, x0 + ld1 {SHASH.2d}, [x23] + ld1 {XL.2d}, [x20] movi MASK.16b, #0xe1 ext SHASH2.16b, SHASH.16b, SHASH.16b, #8 -CPU_LE( rev x8, x8 ) shl MASK.2d, MASK.2d, #57 eor SHASH2.16b, SHASH2.16b, SHASH.16b .if \enc == 1 - ld1 {KS.16b}, [x7] + ld1 {KS.16b}, [x27] .endif -0: ld1 {CTR.8b}, [x5] // load upper counter - ld1 {INP.16b}, [x3], #16 - rev x9, x8 - add x8, x8, #1 - sub w0, w0, #1 +1: ld1 {CTR.8b}, [x24] // load upper counter + ld1 {INP.16b}, [x22], #16 + rev x9, x28 + add x28, x28, #1 + sub w19, w19, #1 ins CTR.d[1], x9 // set lower counter .if \enc == 1 eor INP.16b, INP.16b, KS.16b // encrypt input - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif rev64 T1.16b, INP.16b - cmp w6, #12 - b.ge 2f // AES-192/256? + cmp w26, #12 + b.ge 4f // AES-192/256? -1: enc_round CTR, v21 +2: enc_round CTR, v21 ext T2.16b, XL.16b, XL.16b, #8 ext IN1.16b, T1.16b, T1.16b, #8 @@ -390,27 +425,39 @@ CPU_LE( rev x8, x8 ) .if \enc == 0 eor INP.16b, INP.16b, KS.16b - st1 {INP.16b}, [x2], #16 + st1 {INP.16b}, [x21], #16 .endif - cbnz w0, 0b + cbz w19, 3f -CPU_LE( rev x8, x8 ) - st1 {XL.2d}, [x1] - str x8, [x5, #8] // store lower counter + if_will_cond_yield_neon + st1 {XL.2d}, [x20] + .if \enc == 1 + st1 {KS.16b}, [x27] + .endif + do_cond_yield_neon + b 0b + endif_yield_neon + b 1b + +3: st1 {XL.2d}, [x20] .if \enc == 1 - st1 {KS.16b}, [x7] + st1 {KS.16b}, [x27] .endif +CPU_LE( rev x28, x28 ) + str x28, [x24, #8] // store lower counter + + frame_pop ret -2: b.eq 3f // AES-192? +4: b.eq 5f // AES-192? enc_round CTR, v17 enc_round CTR, v18 -3: enc_round CTR, v19 +5: enc_round CTR, v19 enc_round CTR, v20 - b 1b + b 2b .endm /* diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c index cfc9c92814fd..7cf0b1aa6ea8 100644 --- a/arch/arm64/crypto/ghash-ce-glue.c +++ b/arch/arm64/crypto/ghash-ce-glue.c @@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src, asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds, u8 ks[]); + u8 ctr[], u32 const rk[], int rounds, + u8 ks[]); asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[], const u8 src[], struct ghash_key const *k, - u8 ctr[], int rounds); + u8 ctr[], u32 const rk[], int rounds); asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[], u32 const rk[], int rounds); @@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req) pmull_gcm_encrypt_block(ks, iv, NULL, num_rounds(&ctx->aes_key)); put_unaligned_be32(3, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key), ks); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key), ks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_encrypt(&walk, req, true); + err = skcipher_walk_aead_encrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req) pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); + kernel_neon_end(); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE; + kernel_neon_begin(); pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr, walk.src.virt.addr, &ctx->ghash_key, - iv, num_rounds(&ctx->aes_key)); + iv, ctx->aes_key.key_enc, + num_rounds(&ctx->aes_key)); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); @@ -483,14 +491,12 @@ static int gcm_decrypt(struct aead_request *req) if (walk.nbytes) pmull_gcm_encrypt_block(iv, iv, NULL, num_rounds(&ctx->aes_key)); - - kernel_neon_end(); } else { __aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv, num_rounds(&ctx->aes_key)); put_unaligned_be32(2, iv + GCM_IV_SIZE); - err = skcipher_walk_aead_decrypt(&walk, req, true); + err = skcipher_walk_aead_decrypt(&walk, req, false); while (walk.nbytes >= AES_BLOCK_SIZE) { int blocks = walk.nbytes / AES_BLOCK_SIZE;