From patchwork Mon Apr 30 16:18:26 2018
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 134716
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
 will.deacon@arm.com, Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH resend 06/10] crypto: arm64/aes-ghash - yield NEON after
 every block of input
Date: Mon, 30 Apr 2018 18:18:26 +0200
Message-Id: <20180430161830.14892-7-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
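[Editor's note: for reference, the glue-code side of the change follows the
pattern sketched below: the AEAD walk is started with atomic == false so it
may sleep, and the NEON unit is claimed and released around each batch of
blocks rather than around the entire walk. This is a minimal illustrative
sketch, not the patch itself; do_blocks_neon() is a hypothetical stand-in
for the pmull_gcm_* asm routines.]

	/*
	 * Sketch only -- not part of this patch. do_blocks_neon() is a
	 * hypothetical asm helper; error handling is simplified.
	 */
	#include <linux/types.h>
	#include <asm/neon.h>
	#include <crypto/aes.h>
	#include <crypto/internal/skcipher.h>

	extern void do_blocks_neon(int blocks, u8 *dst, const u8 *src);

	static int walk_and_yield(struct aead_request *req)
	{
		struct skcipher_walk walk;
		/* atomic == false: the walk itself may now sleep */
		int err = skcipher_walk_aead_encrypt(&walk, req, false);

		while (walk.nbytes >= AES_BLOCK_SIZE) {
			int blocks = walk.nbytes / AES_BLOCK_SIZE;

			kernel_neon_begin();	/* claims NEON, disables preemption */
			do_blocks_neon(blocks, walk.dst.virt.addr,
				       walk.src.virt.addr);
			kernel_neon_end();	/* scheduling opportunity per batch */

			err = skcipher_walk_done(&walk,
						 walk.nbytes % AES_BLOCK_SIZE);
		}
		return err;
	}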
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------
 arch/arm64/crypto/ghash-ce-glue.c |  28 +++--
 2 files changed, 97 insertions(+), 44 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 11ebf1ae248a..dcffb9e77589 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -213,22 +213,31 @@
 	.endm
 
 	.macro		__pmull_ghash, pn
-	ld1		{SHASH.2d}, [x3]
-	ld1		{XL.2d}, [x1]
+	frame_push	5
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+
+0:	ld1		{SHASH.2d}, [x22]
+	ld1		{XL.2d}, [x20]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 
 	__pmull_pre_\pn
 
 	/* do the head block first, if supplied */
-	cbz		x4, 0f
-	ld1		{T1.2d}, [x4]
-	b		1f
+	cbz		x23, 1f
+	ld1		{T1.2d}, [x23]
+	mov		x23, xzr
+	b		2f
 
-0:	ld1		{T1.2d}, [x2], #16
-	sub		w0, w0, #1
+1:	ld1		{T1.2d}, [x21], #16
+	sub		w19, w19, #1
 
-1:	/* multiply XL by SHASH in GF(2^128) */
+2:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)
 
 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +259,18 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b
 
-	cbnz		w0, 0b
+	cbz		w19, 3f
+
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon
+
+	b		1b
 
-	st1		{XL.2d}, [x1]
+3:	st1		{XL.2d}, [x20]
+	frame_pop
 	ret
 	.endm
@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
 	.endm
 
 	.macro		pmull_gcm_do_crypt, enc
-	ld1		{SHASH.2d}, [x4]
-	ld1		{XL.2d}, [x1]
-	ldr		x8, [x5, #8]			// load lower counter
+	frame_push	10
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
+	mov		x25, x6
+	mov		x26, x7
+	.if		\enc == 1
+	ldr		x27, [sp, #96]			// first stacked arg
+	.endif
+
+	ldr		x28, [x24, #8]			// load lower counter
+CPU_LE(	rev		x28, x28	)
+
+0:	mov		x0, x25
+	load_round_keys	w26, x0
+	ld1		{SHASH.2d}, [x23]
+	ld1		{XL.2d}, [x20]
 
 	movi		MASK.16b, #0xe1
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-CPU_LE(	rev		x8, x8		)
 	shl		MASK.2d, MASK.2d, #57
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 
 	.if		\enc == 1
-	ld1		{KS.16b}, [x7]
+	ld1		{KS.16b}, [x27]
 	.endif
 
-0:	ld1		{CTR.8b}, [x5]			// load upper counter
-	ld1		{INP.16b}, [x3], #16
-	rev		x9, x8
-	add		x8, x8, #1
-	sub		w0, w0, #1
+1:	ld1		{CTR.8b}, [x24]			// load upper counter
+	ld1		{INP.16b}, [x22], #16
+	rev		x9, x28
+	add		x28, x28, #1
+	sub		w19, w19, #1
 	ins		CTR.d[1], x9			// set lower counter
 
 	.if		\enc == 1
 	eor		INP.16b, INP.16b, KS.16b	// encrypt input
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif
 
 	rev64		T1.16b, INP.16b
 
-	cmp		w6, #12
-	b.ge		2f				// AES-192/256?
+	cmp		w26, #12
+	b.ge		4f				// AES-192/256?
 
-1:	enc_round	CTR, v21
+2:	enc_round	CTR, v21
 
 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8
@@ -390,27 +425,39 @@ CPU_LE(	rev		x8, x8		)
 
 	.if		\enc == 0
 	eor		INP.16b, INP.16b, KS.16b
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif
 
-	cbnz		w0, 0b
+	cbz		w19, 3f
 
-CPU_LE(	rev		x8, x8		)
-	st1		{XL.2d}, [x1]
-	str		x8, [x5, #8]			// store lower counter
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	.if		\enc == 1
+	st1		{KS.16b}, [x27]
+	.endif
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon
+	b		1b
+
+3:	st1		{XL.2d}, [x20]
 
 	.if		\enc == 1
-	st1		{KS.16b}, [x7]
+	st1		{KS.16b}, [x27]
 	.endif
 
+CPU_LE(	rev		x28, x28	)
+	str		x28, [x24, #8]			// store lower counter
+
+	frame_pop
 	ret
 
-2:	b.eq		3f				// AES-192?
+4:	b.eq		5f				// AES-192?
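[Editor's note on the asm side above, summarizing only what the diff shows:
frame_push/frame_pop preserve the callee-saved registers x19-x28 into which
the incoming arguments are moved, so that when if_will_cond_yield_neon
detects that a reschedule is due, the enclosed instructions can store the
live GHASH/counter state, do_cond_yield_neon can release the NEON unit and
yield, and control can re-enter at label 0: to reload that state.]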
 	enc_round	CTR, v17
 	enc_round	CTR, v18
-3:	enc_round	CTR, v19
+5:	enc_round	CTR, v19
 	enc_round	CTR, v20
-	b		1b
+	b		2b
 	.endm
 
 	/*
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index cfc9c92814fd..7cf0b1aa6ea8 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,
 
 asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds, u8 ks[]);
+				  u8 ctr[], u32 const rk[], int rounds,
+				  u8 ks[]);
 
 asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds);
+				  u8 ctr[], u32 const rk[], int rounds);
 
 asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
 					u32 const rk[], int rounds);
@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
 		pmull_gcm_encrypt_block(ks, iv, NULL,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(3, iv + GCM_IV_SIZE);
+		kernel_neon_end();
 
-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+			kernel_neon_begin();
 			pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key), ks);
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key), ks);
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
 		}
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -467,15 +471,19 @@ static int gcm_decrypt(struct aead_request *req)
 		pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
+		kernel_neon_end();
 
-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
 
+			kernel_neon_begin();
 			pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key));
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key));
+			kernel_neon_end();
 
 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
@@ -483,14 +491,12 @@
 		if (walk.nbytes)
 			pmull_gcm_encrypt_block(iv, iv, NULL,
 						num_rounds(&ctx->aes_key));
-
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
 
-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);
 
 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;