From patchwork Mon Sep 10 14:41:12 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 146324
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Eric Biggers, Theodore Ts'o, Steve Capper
Subject: [PATCH 1/4] crypto: arm64/aes-blk - remove pointless (u8 *) casts
Date: Mon, 10 Sep 2018 16:41:12 +0200
Message-Id: <20180910144115.25727-2-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

For some reason, the asmlinkage prototypes of the NEON routines take
u8[] arguments for the round key arrays, while the actual round keys
are arrays of u32, and so passing them into those routines requires
(u8 *) casts at each occurrence. Fix that.
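As an illustrative aside (not part of the patch): the round keys live in
the AES context as an array of u32, so a u8[] parameter forces a cast at
every call site. A minimal userspace sketch, with stand-in type, struct
and function names, of the before/after calling convention:

#include <stdint.h>
#include <stdio.h>

typedef uint8_t  u8;
typedef uint32_t u32;

struct toy_aes_ctx {
	u32 key_enc[60];	/* round keys are stored as 32-bit words */
};

/* old-style prototype: round keys declared as bytes */
static void ecb_encrypt_u8(u8 out[], u8 const in[], u8 const rk[], int rounds)
{
	(void)out; (void)in; (void)rk; (void)rounds;	/* body elided */
}

/* new-style prototype: round keys declared as 32-bit words */
static void ecb_encrypt_u32(u8 out[], u8 const in[], u32 const rk[], int rounds)
{
	(void)out; (void)in; (void)rk; (void)rounds;	/* body elided */
}

int main(void)
{
	struct toy_aes_ctx ctx = { { 0 } };
	u8 in[16] = { 0 }, out[16];

	/* with a u8[] parameter, every call site needs a cast ... */
	ecb_encrypt_u8(out, in, (u8 const *)ctx.key_enc, 10);

	/* ... while a u32[] parameter matches the context field directly */
	ecb_encrypt_u32(out, in, ctx.key_enc, 10);

	printf("both calls compile; only the second needs no cast\n");
	return 0;
}

The assembly entry points only ever see a pointer, so widening the declared
element type changes nothing at the ABI level; it simply lets the C call
sites pass ctx->key_enc and ctx->key_dec directly.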
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 47 ++++++++++---------- 1 file changed, 23 insertions(+), 24 deletions(-) -- 2.18.0 diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index adcb83eb683c..1c6934544c1f 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -63,24 +63,24 @@ MODULE_AUTHOR("Ard Biesheuvel "); MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ -asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks); -asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks); -asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); -asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); -asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], +asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 ctr[]); -asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], - int rounds, int blocks, u8 const rk2[], u8 iv[], +asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u32 const rk1[], + int rounds, int blocks, u32 const rk2[], u8 iv[], int first); -asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u8 const rk1[], - int rounds, int blocks, u8 const rk2[], u8 iv[], +asmlinkage void aes_xts_decrypt(u8 out[], u8 const in[], u32 const rk1[], + int rounds, int blocks, u32 const rk2[], u8 iv[], int first); asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds, @@ -142,7 +142,7 @@ static int ecb_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks); + ctx->key_enc, rounds, blocks); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -162,7 +162,7 @@ static int ecb_decrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks); + ctx->key_dec, rounds, blocks); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -182,7 +182,7 @@ static int cbc_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -202,7 +202,7 @@ static int cbc_decrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + ctx->key_dec, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -222,7 +222,7 @@ static int ctr_encrypt(struct skcipher_request *req) while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { kernel_neon_begin(); 
aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -238,7 +238,7 @@ static int ctr_encrypt(struct skcipher_request *req) blocks = -1; kernel_neon_begin(); - aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, + aes_ctr_encrypt(tail, NULL, ctx->key_enc, rounds, blocks, walk.iv); kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); @@ -272,8 +272,8 @@ static int xts_encrypt(struct skcipher_request *req) for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key1.key_enc, rounds, blocks, - (u8 *)ctx->key2.key_enc, walk.iv, first); + ctx->key1.key_enc, rounds, blocks, + ctx->key2.key_enc, walk.iv, first); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -294,8 +294,8 @@ static int xts_decrypt(struct skcipher_request *req) for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key1.key_dec, rounds, blocks, - (u8 *)ctx->key2.key_enc, walk.iv, first); + ctx->key1.key_dec, rounds, blocks, + ctx->key2.key_enc, walk.iv, first); kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } @@ -412,7 +412,6 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, { struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm); be128 *consts = (be128 *)ctx->consts; - u8 *rk = (u8 *)ctx->key.key_enc; int rounds = 6 + key_len / 4; int err; @@ -422,7 +421,8 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, ctx->key.key_enc, + rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -441,7 +441,6 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, }; struct mac_tfm_ctx *ctx = crypto_shash_ctx(tfm); - u8 *rk = (u8 *)ctx->key.key_enc; int rounds = 6 + key_len / 4; u8 key[AES_BLOCK_SIZE]; int err; @@ -451,8 +450,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); + aes_ecb_encrypt(key, ks[0], ctx->key.key_enc, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], ctx->key.key_enc, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); From patchwork Mon Sep 10 14:41:13 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 146325 Delivered-To: patch@linaro.org Received: by 2002:a2e:1648:0:0:0:0:0 with SMTP id 8-v6csp2574221ljw; Mon, 10 Sep 2018 07:43:41 -0700 (PDT) X-Google-Smtp-Source: ANB0VdbrkiSkzsgDLK5HQYpttlg/QxCQVIF/2J/4coEi9C6m/2UAese4cxxvQ00Im3XVOlqeKF+k X-Received: by 2002:a63:f713:: with SMTP id x19-v6mr22559661pgh.233.1536590621670; Mon, 10 Sep 2018 07:43:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536590621; cv=none; d=google.com; s=arc-20160816; b=kGSwzW435wPovGYn1IzVTrISQZJIGX1LcznmNUvJYX1PQXjxunKYJLG/zuMY3SeVSC up5+LfG5YsAa06T4oPqdsG2BQhlbmTDiC4F9skrRzFHyi+XGNUqVnwvB02io9Or2XeKm NgmQ1JcF8bDAVT8pi3C8Xdt3zyIqhV1vuAwJx4QhvzDg1UdiudCLaC2VGZZYPo4ol44w 
[77.251.17.237]) by smtp.gmail.com with ESMTPSA id d35-v6sm8279487eda.25.2018.09.10.07.43.35 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 10 Sep 2018 07:43:35 -0700 (PDT) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org, Ard Biesheuvel , Eric Biggers , Theodore Ts'o , Steve Capper Subject: [PATCH 2/4] crypto: arm64/aes-blk - revert NEON yield for skciphers Date: Mon, 10 Sep 2018 16:41:13 +0200 Message-Id: <20180910144115.25727-3-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.18.0 In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org> References: <20180910144115.25727-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The reasoning of commit f10dc56c64bb ("crypto: arm64 - revert NEON yield for fast AEAD implementations") applies equally to skciphers: the walk API already guarantees that the input size of each call into the NEON code is bounded to the size of a page, and so there is no need for an additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit 0c8f838a52fe9fd82761861a934f16ef9896b4e5. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-modes.S | 281 ++++++++------------ 1 file changed, 108 insertions(+), 173 deletions(-) -- 2.18.0 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 496c243de4ac..35632d11200f 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,71 +31,57 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - frame_push 5 + stp x29, x30, [sp, #-16]! + mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - -.Lecbencrestart: - enc_prepare w22, x21, x5 + enc_prepare w3, x2, x5 .LecbencloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x19], #64 - cond_yield_neon .Lecbencrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LecbencloopNx .Lecbenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x20], #16 /* get next pt block */ - encrypt_block v0, w22, x21, x5, w6 - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + ld1 {v0.16b}, [x1], #16 /* get next pt block */ + encrypt_block v0, w3, x2, x5, w6 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lecbencloop .Lecbencout: - frame_pop + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - frame_push 5 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - -.Lecbdecrestart: - dec_prepare w22, x21, x5 + dec_prepare w3, x2, x5 .LecbdecloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x19], #64 - cond_yield_neon .Lecbdecrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LecbdecloopNx .Lecbdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x20], #16 /* get next ct block */ - decrypt_block v0, w22, x21, x5, w6 - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + ld1 {v0.16b}, [x1], #16 /* get next ct block */ + decrypt_block v0, w3, x2, x5, w6 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lecbdecloop .Lecbdecout: - frame_pop + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_ecb_decrypt) @@ -108,100 +94,78 @@ AES_ENDPROC(aes_ecb_decrypt) */ AES_ENTRY(aes_cbc_encrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 - -.Lcbcencrestart: - ld1 {v4.16b}, [x24] /* get iv */ - enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x5] /* get iv */ + enc_prepare w3, x2, x6 .Lcbcencloop4x: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w22, x21, x6, w7 + encrypt_block v0, w3, x2, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w22, x21, x6, w7 + encrypt_block v1, w3, x2, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w22, x21, x6, w7 + encrypt_block v2, w3, x2, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w22, x21, x6, w7 - st1 {v0.16b-v3.16b}, [x19], #64 + encrypt_block v3, w3, x2, x6, w7 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v3.16b - st1 {v4.16b}, [x24] /* return iv */ - cond_yield_neon .Lcbcencrestart b .Lcbcencloop4x .Lcbcenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x20], #16 /* get next pt block */ + ld1 {v0.16b}, [x1], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w22, x21, x6, w7 - st1 {v4.16b}, [x19], #16 - subs w23, w23, #1 + encrypt_block v4, w3, x2, x6, w7 + st1 {v4.16b}, [x0], #16 + subs w4, w4, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x24] /* return iv */ - frame_pop + st1 {v4.16b}, [x5] /* return iv */ ret AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp -.Lcbcdecrestart: - ld1 {v7.16b}, [x24] /* get iv */ - dec_prepare w22, x21, x6 + ld1 {v7.16b}, [x5] /* get iv */ + dec_prepare w3, x2, x6 .LcbcdecloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x20, x20, #16 + sub x1, x1, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 - st1 {v7.16b}, [x24] /* return iv */ - cond_yield_neon .Lcbcdecrestart + st1 {v0.16b-v3.16b}, [x0], #64 b .LcbcdecloopNx .Lcbcdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x20], #16 /* get next ct block */ + ld1 {v1.16b}, [x1], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w22, x21, x6, w7 + decrypt_block v0, w3, x2, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x24] /* return iv */ - frame_pop + st1 {v7.16b}, [x5] /* return iv */ + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_cbc_decrypt) @@ -212,26 +176,19 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x5 + stp x29, x30, [sp, #-16]! + mov x29, sp -.Lctrrestart: - enc_prepare w22, x21, x6 - ld1 {v4.16b}, [x24] + enc_prepare w3, x2, x6 + ld1 {v4.16b}, [x5] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 + cmn w6, w4 /* 32 bit overflow? */ + bcs .Lctrloop .LctrloopNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lctr1x - cmn w6, #4 /* 32 bit overflow? */ - bcs .Lctr1x add w7, w6, #1 mov v0.16b, v4.16b add w8, w6, #2 @@ -245,27 +202,25 @@ AES_ENTRY(aes_ctr_encrypt) rev w9, w9 mov v2.s[3], w8 mov v3.s[3], w9 - ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x20], #16 /* get 1 input block */ + ld1 {v5.16b}, [x1], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w23, .Lctrout - st1 {v4.16b}, [x24] /* return next CTR value */ - cond_yield_neon .Lctrrestart + cbz w4, .Lctrout b .LctrloopNx .Lctr1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w22, x21, x8, w7 + encrypt_block v0, w3, x2, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -273,22 +228,22 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? 
*/ .Lctrcarrydone: - subs w23, w23, #1 + subs w4, w4, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x20], #16 + ld1 {v3.16b}, [x1], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x19], #16 + st1 {v3.16b}, [x0], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x24] /* return next CTR value */ -.Lctrret: - frame_pop + st1 {v4.16b}, [x5] /* return next CTR value */ + ldp x29, x30, [sp], #16 ret .Lctrtailblock: - st1 {v0.16b}, [x19] - b .Lctrret + st1 {v0.16b}, [x0] + ldp x29, x30, [sp], #16 + ret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -321,16 +276,10 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - frame_push 6 + stp x29, x30, [sp, #-16]! + mov x29, sp - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x6 - - ld1 {v4.16b}, [x24] + ld1 {v4.16b}, [x6] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -339,17 +288,15 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx -.Lxtsencrestart: - ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w22, x21, x8 + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -362,43 +309,35 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v7.16b - cbz w23, .Lxtsencout - st1 {v4.16b}, [x24] - cond_yield_neon .Lxtsencrestart + cbz w4, .Lxtsencout b .LxtsencloopNx .Lxtsenc1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x20], #16 + ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w22, x21, x8, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x24] - frame_pop + st1 {v4.16b}, [x6] + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - frame_push 6 - - mov x19, x0 - mov x20, x1 - mov x21, x2 - mov x22, x3 - mov x23, x4 - mov x24, x6 + stp x29, x30, [sp, #-16]! 
+ mov x29, sp - ld1 {v4.16b}, [x24] + ld1 {v4.16b}, [x6] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -407,17 +346,15 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx -.Lxtsdecrestart: - ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w22, x21, x8 + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w23, w23, #4 + subs w4, w4, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -430,28 +367,26 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v0.16b-v3.16b}, [x0], #64 mov v4.16b, v7.16b - cbz w23, .Lxtsdecout - st1 {v4.16b}, [x24] - cond_yield_neon .Lxtsdecrestart + cbz w4, .Lxtsdecout b .LxtsdecloopNx .Lxtsdec1x: - adds w23, w23, #4 + adds w4, w4, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x20], #16 + ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w22, x21, x8, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x19], #16 - subs w23, w23, #1 + st1 {v0.16b}, [x0], #16 + subs w4, w4, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x24] - frame_pop + st1 {v4.16b}, [x6] + ldp x29, x30, [sp], #16 ret AES_ENDPROC(aes_xts_decrypt) From patchwork Mon Sep 10 14:41:14 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 146327 Delivered-To: patch@linaro.org Received: by 2002:a2e:1648:0:0:0:0:0 with SMTP id 8-v6csp2574280ljw; Mon, 10 Sep 2018 07:43:44 -0700 (PDT) X-Google-Smtp-Source: ANB0VdZMAaKxd21Ap09+lQmG9BfJjOnBDnlShQWeWLcM3Y0lLZHb+2LdJXWJ62BnnpJ+6m3wfH4x X-Received: by 2002:a17:902:b03:: with SMTP id 3-v6mr22704325plq.156.1536590624631; Mon, 10 Sep 2018 07:43:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536590624; cv=none; d=google.com; s=arc-20160816; b=xUAvdSL53TuVM6Vl1Dq7dKFUlELpv/Vij4plAfgPlAG6k2ogCjx0ef/tKCxrQfZTaN Sfsp7SZiM5L1c2VxqZxMzraod7clB11IB+OB7fCrFyL4htfDn+6TkwzQGbShMtGRvTkd 0kaZC/hu/8sm2BUM7Z3lMhFBX2Mgf695VGojnGT99Y5OB/n6WOno0q+5coLnw0kUNS8s XC6CDW2Boip2kPnjVWgHUcxO9Dnn9D61JbP3WfzlKAHm/kciq3dQwK0zKbPldzATWjMh PiC4kKsaDNAeW+cLfMaASozBqLRKx8sU9iHcy2bZxhRGNAo5yUNE1uoT/7/5XUY3UzZv XE+w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=QdQn+Jnqnhc3mKNK3m6fspLGagvSsd7tXLlRb+bffL4=; b=tGm7f4RTmnbVdvkt8PLzITiJH+2PUxpf3agHqOUfOlTgSyPS1GZm/NBg7UKPFqJbrM SDcICn7yI/8diOMY8gXdfNtucTPraJNRplg7gMGDbDW0Al2vW7JwdGM6qGmQ+9jnZZ5Y ffbTjAyTn7FpF8U+wqHbtXaaFJuGsn0WMuFB5jVDNUjU1fA7Qzs5w4dteCuWJ8yhT8/V jXJpFCQH5UmlxHw4KMQ+RnAH5KzIYWmmLUCMX717dQz57STWWyx3t7ZTN5j/cflg75QO 21LWtl7wzrT8D0F9loTOy6vEDaUzIoVl58MtwHUDqiPMqk5LgFs+scVDjkiI5eR2DA5w vAkg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b="Ji5w2j/x"; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Eric Biggers, Theodore Ts'o, Steve Capper
Subject: [PATCH 3/4] crypto: arm64/aes-blk - add support for CTS-CBC mode
Date: Mon, 10 Sep 2018 16:41:14 +0200
Message-Id: <20180910144115.25727-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Currently, we rely on the generic CTS chaining mode wrapper to
instantiate the cts(cbc(aes)) skcipher. Due to the high performance of
the ARMv8 Crypto Extensions AES instructions (~1 cycle per byte), any
overhead in the chaining mode layers is amplified, and so it pays off
considerably to fold the CTS handling into the SIMD routines (a sketch
of the block shuffling this involves follows below).
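Illustrative aside (not part of the patch): the data movement that
ciphertext stealing performs on the final two blocks, and which
aes_cbc_cts_encrypt implements with overlapping loads and stores, is
sketched below in plain C. The one-byte XOR "cipher" and the helper names
are stand-ins invented for the sketch, and it assumes the CS3 flavour of
CTS (last two blocks swapped, truncated tail emitted last).

#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLK 16

/* stand-in for AES: makes the data movement visible, nothing more */
static void toy_encrypt(uint8_t out[BLK], const uint8_t in[BLK], uint8_t key)
{
	for (int i = 0; i < BLK; i++)
		out[i] = in[i] ^ key;
}

/*
 * CBC with ciphertext stealing: all but the last two blocks are plain CBC,
 * then the final full block and the final (possibly partial) block trade
 * places so that the output length equals the input length. Requires more
 * than one block of input; the glue code special-cases a single block.
 */
static void cbc_cts_encrypt(uint8_t *ct, const uint8_t *pt, size_t len,
			    const uint8_t iv[BLK], uint8_t key)
{
	uint8_t prev[BLK], buf[BLK], cn1[BLK];
	size_t tail = len % BLK ? len % BLK : BLK;	/* bytes in last block */
	size_t head = len - tail - BLK;			/* plain CBC portion  */

	memcpy(prev, iv, BLK);
	for (size_t off = 0; off < head; off += BLK) {
		for (int i = 0; i < BLK; i++)
			buf[i] = pt[off + i] ^ prev[i];
		toy_encrypt(ct + off, buf, key);
		memcpy(prev, ct + off, BLK);
	}

	/* C[n-1] = E(P[n-1] ^ prev); it ends up last, truncated to 'tail' */
	for (int i = 0; i < BLK; i++)
		buf[i] = pt[head + i] ^ prev[i];
	toy_encrypt(cn1, buf, key);

	/* C[n] = E(pad(P[n]) ^ C[n-1]); emitted as the last full block */
	memset(buf, 0, BLK);
	memcpy(buf, pt + head + BLK, tail);
	for (int i = 0; i < BLK; i++)
		buf[i] ^= cn1[i];
	toy_encrypt(ct + head, buf, key);

	memcpy(ct + head + BLK, cn1, tail);		/* the "stolen" bytes */
}

int main(void)
{
	uint8_t iv[BLK] = { 0 }, pt[40], ct[40];

	memset(pt, 0xab, sizeof(pt));
	cbc_cts_encrypt(ct, pt, sizeof(pt), iv, 0x5a);
	printf("ciphertext is %zu bytes, same as the plaintext\n", sizeof(ct));
	return 0;
}

Since the tail handling boils down to two block encryptions plus some byte
shuffling, doing it inside the SIMD routine avoids the request juggling of
the generic cts template, which is what dominates for short inputs.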
On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel --- This patch supersedes '[RFC/RFT PATCH] crypto: arm64/aes-ce - add support for CTS-CBC mode' sent out last Saturday. Changes: - keep subreq and scatterlist in request ctx structure - optimize away second scatterwalk_ffwd() invocation when encrypting in-place - keep permute table in .rodata section - polish asm code (drop literal + offset reference, reorder insns) Raw performance numbers after the patch. arch/arm64/crypto/aes-glue.c | 165 ++++++++++++++++++++ arch/arm64/crypto/aes-modes.S | 79 +++++++++- 2 files changed, 243 insertions(+), 1 deletion(-) -- 2.18.0 Cortex-A53 @ 1 GHz BEFORE: testing speed of async cts(cbc(aes)) (cts(cbc-aes-ce)) encryption 0 (128 bit key, 16 byte blocks): 1407866 ops in 1 secs ( 22525856 bytes) 1 (128 bit key, 64 byte blocks): 466814 ops in 1 secs ( 29876096 bytes) 2 (128 bit key, 256 byte blocks): 401023 ops in 1 secs (102661888 bytes) 3 (128 bit key, 1024 byte blocks): 258238 ops in 1 secs (264435712 bytes) 4 (128 bit key, 8192 byte blocks): 57905 ops in 1 secs (474357760 bytes) 5 (192 bit key, 16 byte blocks): 1388333 ops in 1 secs ( 22213328 bytes) 6 (192 bit key, 64 byte blocks): 448595 ops in 1 secs ( 28710080 bytes) 7 (192 bit key, 256 byte blocks): 376951 ops in 1 secs ( 96499456 bytes) 8 (192 bit key, 1024 byte blocks): 231635 ops in 1 secs (237194240 bytes) 9 (192 bit key, 8192 byte blocks): 43345 ops in 1 secs (355082240 bytes) 10 (256 bit key, 16 byte blocks): 1370820 ops in 1 secs ( 21933120 bytes) 11 (256 bit key, 64 byte blocks): 452352 ops in 1 secs ( 28950528 bytes) 12 (256 bit key, 256 byte blocks): 376506 ops in 1 secs ( 96385536 bytes) 13 (256 bit key, 1024 byte blocks): 223219 ops in 1 secs (228576256 bytes) 14 (256 bit key, 8192 byte blocks): 44874 ops in 1 secs (367607808 bytes) testing speed of async cts(cbc(aes)) (cts(cbc-aes-ce)) decryption 0 (128 bit key, 16 byte blocks): 1402795 ops in 1 secs ( 22444720 bytes) 1 (128 bit key, 64 byte blocks): 403300 ops in 1 secs ( 25811200 bytes) 2 (128 bit key, 256 byte blocks): 367710 ops in 1 secs ( 94133760 bytes) 3 (128 bit key, 1024 byte blocks): 269118 ops in 1 secs (275576832 bytes) 4 (128 bit key, 8192 byte blocks): 74706 ops in 1 secs (611991552 bytes) 5 (192 bit key, 16 byte blocks): 1381390 ops in 1 secs ( 22102240 bytes) 6 (192 bit key, 64 byte blocks): 388555 ops in 1 secs ( 24867520 bytes) 7 (192 bit key, 256 byte blocks): 350282 ops in 1 secs ( 89672192 bytes) 8 (192 bit key, 1024 byte blocks): 251268 ops in 1 secs (257298432 bytes) 9 (192 bit key, 8192 byte blocks): 56535 ops in 1 secs (463134720 bytes) 10 (256 bit key, 16 byte blocks): 1364334 ops in 1 secs ( 21829344 bytes) 11 (256 bit key, 64 byte blocks): 392610 ops in 1 secs ( 25127040 bytes) 12 (256 bit key, 256 byte blocks): 351150 ops in 1 secs ( 89894400 bytes) 13 (256 bit key, 1024 byte blocks): 247455 ops in 1 secs (253393920 bytes) 14 (256 bit key, 8192 byte blocks): 62530 ops in 1 secs (512245760 bytes) AFTER: testing speed of async cts(cbc(aes)) (cts-cbc-aes-ce) encryption 0 (128 bit key, 16 byte blocks): 1380568 ops in 1 secs ( 22089088 bytes) 1 (128 bit key, 64 byte blocks): 692731 ops in 1 secs ( 44334784 bytes) 2 (128 bit key, 256 byte blocks): 556393 ops in 1 secs (142436608 bytes) 3 (128 bit key, 1024 byte blocks): 314635 ops in 1 secs (322186240 bytes) 4 (128 bit key, 8192 byte blocks): 57550 ops in 1 secs (471449600 bytes) 5 (192 bit key, 16 byte blocks): 1367027 ops in 1 secs ( 21872432 bytes) 6 (192 bit 
key, 64 byte blocks): 675058 ops in 1 secs ( 43203712 bytes) 7 (192 bit key, 256 byte blocks): 523177 ops in 1 secs (133933312 bytes) 8 (192 bit key, 1024 byte blocks): 279235 ops in 1 secs (285936640 bytes) 9 (192 bit key, 8192 byte blocks): 46316 ops in 1 secs (379420672 bytes) 10 (256 bit key, 16 byte blocks): 1353576 ops in 1 secs ( 21657216 bytes) 11 (256 bit key, 64 byte blocks): 664523 ops in 1 secs ( 42529472 bytes) 12 (256 bit key, 256 byte blocks): 508141 ops in 1 secs (130084096 bytes) 13 (256 bit key, 1024 byte blocks): 264386 ops in 1 secs (270731264 bytes) 14 (256 bit key, 8192 byte blocks): 47224 ops in 1 secs (386859008 bytes) testing speed of async cts(cbc(aes)) (cts-cbc-aes-ce) decryption 0 (128 bit key, 16 byte blocks): 1388553 ops in 1 secs ( 22216848 bytes) 1 (128 bit key, 64 byte blocks): 688402 ops in 1 secs ( 44057728 bytes) 2 (128 bit key, 256 byte blocks): 589268 ops in 1 secs (150852608 bytes) 3 (128 bit key, 1024 byte blocks): 372238 ops in 1 secs (381171712 bytes) 4 (128 bit key, 8192 byte blocks): 75691 ops in 1 secs (620060672 bytes) 5 (192 bit key, 16 byte blocks): 1366220 ops in 1 secs ( 21859520 bytes) 6 (192 bit key, 64 byte blocks): 666889 ops in 1 secs ( 42680896 bytes) 7 (192 bit key, 256 byte blocks): 561809 ops in 1 secs (143823104 bytes) 8 (192 bit key, 1024 byte blocks): 344117 ops in 1 secs (352375808 bytes) 9 (192 bit key, 8192 byte blocks): 63150 ops in 1 secs (517324800 bytes) 10 (256 bit key, 16 byte blocks): 1349266 ops in 1 secs ( 21588256 bytes) 11 (256 bit key, 64 byte blocks): 661056 ops in 1 secs ( 42307584 bytes) 12 (256 bit key, 256 byte blocks): 550261 ops in 1 secs (140866816 bytes) 13 (256 bit key, 1024 byte blocks): 332947 ops in 1 secs (340937728 bytes) 14 (256 bit key, 8192 byte blocks): 68759 ops in 1 secs (563273728 bytes) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 1c6934544c1f..26d2b0263ba6 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -31,6 +32,8 @@ #define aes_ecb_decrypt ce_aes_ecb_decrypt #define aes_cbc_encrypt ce_aes_cbc_encrypt #define aes_cbc_decrypt ce_aes_cbc_decrypt +#define aes_cbc_cts_encrypt ce_aes_cbc_cts_encrypt +#define aes_cbc_cts_decrypt ce_aes_cbc_cts_decrypt #define aes_ctr_encrypt ce_aes_ctr_encrypt #define aes_xts_encrypt ce_aes_xts_encrypt #define aes_xts_decrypt ce_aes_xts_decrypt @@ -45,6 +48,8 @@ MODULE_DESCRIPTION("AES-ECB/CBC/CTR/XTS using ARMv8 Crypto Extensions"); #define aes_ecb_decrypt neon_aes_ecb_decrypt #define aes_cbc_encrypt neon_aes_cbc_encrypt #define aes_cbc_decrypt neon_aes_cbc_decrypt +#define aes_cbc_cts_encrypt neon_aes_cbc_cts_encrypt +#define aes_cbc_cts_decrypt neon_aes_cbc_cts_decrypt #define aes_ctr_encrypt neon_aes_ctr_encrypt #define aes_xts_encrypt neon_aes_xts_encrypt #define aes_xts_decrypt neon_aes_xts_decrypt @@ -73,6 +78,11 @@ asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 iv[]); +asmlinkage void aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[], + int rounds, int bytes, u8 const iv[]); +asmlinkage void aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[], + int rounds, int bytes, u8 const iv[]); + asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u32 const rk[], int rounds, int blocks, u8 ctr[]); @@ -87,6 +97,12 @@ asmlinkage void aes_mac_update(u8 const in[], u32 const 
rk[], int rounds, int blocks, u8 dg[], int enc_before, int enc_after); +struct cts_cbc_req_ctx { + struct scatterlist sg_src[2]; + struct scatterlist sg_dst[2]; + struct skcipher_request subreq; +}; + struct crypto_aes_xts_ctx { struct crypto_aes_ctx key1; struct crypto_aes_ctx __aligned(8) key2; @@ -209,6 +225,136 @@ static int cbc_decrypt(struct skcipher_request *req) return err; } +static int cts_cbc_init_tfm(struct crypto_skcipher *tfm) +{ + crypto_skcipher_set_reqsize(tfm, sizeof(struct cts_cbc_req_ctx)); + return 0; +} + +static int cts_cbc_encrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req); + int err, rounds = 6 + ctx->key_length / 4; + int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2; + struct scatterlist *src = req->src, *dst = req->dst; + struct skcipher_walk walk; + + skcipher_request_set_tfm(&rctx->subreq, tfm); + + if (req->cryptlen == AES_BLOCK_SIZE) + cbc_blocks = 1; + + if (cbc_blocks > 0) { + unsigned int blocks; + + skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst, + cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); + aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); + err = skcipher_walk_done(&walk, + walk.nbytes % AES_BLOCK_SIZE); + } + if (err) + return err; + + if (req->cryptlen == AES_BLOCK_SIZE) + return 0; + + dst = src = scatterwalk_ffwd(rctx->sg_src, req->src, + rctx->subreq.cryptlen); + if (req->dst != req->src) + dst = scatterwalk_ffwd(rctx->sg_dst, req->dst, + rctx->subreq.cryptlen); + } + + /* handle ciphertext stealing */ + skcipher_request_set_crypt(&rctx->subreq, src, dst, + req->cryptlen - cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + if (err) + return err; + + kernel_neon_begin(); + aes_cbc_cts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_enc, rounds, walk.nbytes, walk.iv); + kernel_neon_end(); + + return skcipher_walk_done(&walk, 0); +} + +static int cts_cbc_decrypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); + struct cts_cbc_req_ctx *rctx = skcipher_request_ctx(req); + int err, rounds = 6 + ctx->key_length / 4; + int cbc_blocks = DIV_ROUND_UP(req->cryptlen, AES_BLOCK_SIZE) - 2; + struct scatterlist *src = req->src, *dst = req->dst; + struct skcipher_walk walk; + + skcipher_request_set_tfm(&rctx->subreq, tfm); + + if (req->cryptlen == AES_BLOCK_SIZE) + cbc_blocks = 1; + + if (cbc_blocks > 0) { + unsigned int blocks; + + skcipher_request_set_crypt(&rctx->subreq, req->src, req->dst, + cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); + aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); + err = skcipher_walk_done(&walk, + walk.nbytes % AES_BLOCK_SIZE); + } + if (err) + return err; + + if (req->cryptlen == AES_BLOCK_SIZE) + return 0; + + dst = src = scatterwalk_ffwd(rctx->sg_src, req->src, + rctx->subreq.cryptlen); + if (req->dst != req->src) + dst = scatterwalk_ffwd(rctx->sg_dst, req->dst, + 
rctx->subreq.cryptlen); + } + + /* handle ciphertext stealing */ + skcipher_request_set_crypt(&rctx->subreq, src, dst, + req->cryptlen - cbc_blocks * AES_BLOCK_SIZE, + req->iv); + + err = skcipher_walk_virt(&walk, &rctx->subreq, false); + if (err) + return err; + + kernel_neon_begin(); + aes_cbc_cts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, + ctx->key_dec, rounds, walk.nbytes, walk.iv); + kernel_neon_end(); + + return skcipher_walk_done(&walk, 0); +} + static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); @@ -334,6 +480,25 @@ static struct skcipher_alg aes_algs[] = { { .setkey = skcipher_aes_setkey, .encrypt = cbc_encrypt, .decrypt = cbc_decrypt, +}, { + .base = { + .cra_name = "__cts(cbc(aes))", + .cra_driver_name = "__cts-cbc-aes-" MODE, + .cra_priority = PRIO, + .cra_flags = CRYPTO_ALG_INTERNAL, + .cra_blocksize = 1, + .cra_ctxsize = sizeof(struct crypto_aes_ctx), + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .walksize = 2 * AES_BLOCK_SIZE, + .setkey = skcipher_aes_setkey, + .encrypt = cts_cbc_encrypt, + .decrypt = cts_cbc_decrypt, + .init = cts_cbc_init_tfm, }, { .base = { .cra_name = "__ctr(aes)", diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 35632d11200f..82931fba53d2 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -170,6 +170,84 @@ AES_ENTRY(aes_cbc_decrypt) AES_ENDPROC(aes_cbc_decrypt) + /* + * aes_cbc_cts_encrypt(u8 out[], u8 const in[], u32 const rk[], + * int rounds, int bytes, u8 const iv[]) + * aes_cbc_cts_decrypt(u8 out[], u8 const in[], u32 const rk[], + * int rounds, int bytes, u8 const iv[]) + */ + +AES_ENTRY(aes_cbc_cts_encrypt) + adr_l x8, .Lcts_permute_table + sub x4, x4, #16 + add x9, x8, #32 + add x8, x8, x4 + sub x9, x9, x4 + ld1 {v3.16b}, [x8] + ld1 {v4.16b}, [x9] + + ld1 {v0.16b}, [x1], x4 /* overlapping loads */ + ld1 {v1.16b}, [x1] + + ld1 {v5.16b}, [x5] /* get iv */ + enc_prepare w3, x2, x6 + + eor v0.16b, v0.16b, v5.16b /* xor with iv */ + tbl v1.16b, {v1.16b}, v4.16b + encrypt_block v0, w3, x2, x6, w7 + + eor v1.16b, v1.16b, v0.16b + tbl v0.16b, {v0.16b}, v3.16b + encrypt_block v1, w3, x2, x6, w7 + + add x4, x0, x4 + st1 {v0.16b}, [x4] /* overlapping stores */ + st1 {v1.16b}, [x0] + ret +AES_ENDPROC(aes_cbc_cts_encrypt) + +AES_ENTRY(aes_cbc_cts_decrypt) + adr_l x8, .Lcts_permute_table + sub x4, x4, #16 + add x9, x8, #32 + add x8, x8, x4 + sub x9, x9, x4 + ld1 {v3.16b}, [x8] + ld1 {v4.16b}, [x9] + + ld1 {v0.16b}, [x1], x4 /* overlapping loads */ + ld1 {v1.16b}, [x1] + + ld1 {v5.16b}, [x5] /* get iv */ + dec_prepare w3, x2, x6 + + tbl v2.16b, {v1.16b}, v4.16b + decrypt_block v0, w3, x2, x6, w7 + eor v2.16b, v2.16b, v0.16b + + tbx v0.16b, {v1.16b}, v4.16b + tbl v2.16b, {v2.16b}, v3.16b + decrypt_block v0, w3, x2, x6, w7 + eor v0.16b, v0.16b, v5.16b /* xor with iv */ + + add x4, x0, x4 + st1 {v2.16b}, [x4] /* overlapping stores */ + st1 {v0.16b}, [x0] + ret +AES_ENDPROC(aes_cbc_cts_decrypt) + + .section ".rodata", "a" + .align 6 +.Lcts_permute_table: + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7 + .byte 0x8, 0x9, 0xa, 0xb, 0xc, 0xd, 0xe, 0xf + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .byte 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff + .previous + + /* * aes_ctr_encrypt(u8 out[], u8 const 
in[], u8 const rk[], int rounds, * int blocks, u8 ctr[]) @@ -253,7 +331,6 @@ AES_ENTRY(aes_ctr_encrypt) ins v4.d[0], x7 b .Lctrcarrydone AES_ENDPROC(aes_ctr_encrypt) - .ltorg /* From patchwork Mon Sep 10 14:41:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 146326 Delivered-To: patch@linaro.org Received: by 2002:a2e:1648:0:0:0:0:0 with SMTP id 8-v6csp2574255ljw; Mon, 10 Sep 2018 07:43:43 -0700 (PDT) X-Google-Smtp-Source: ANB0Vda6wdGUNfKbOqAoFZrCjphLrP2UAkiN7HcrubiVbdctK5t94GgDmnk5mQ9uZwQokj0KxrYe X-Received: by 2002:a65:668f:: with SMTP id b15-v6mr20585863pgw.426.1536590623474; Mon, 10 Sep 2018 07:43:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1536590623; cv=none; d=google.com; s=arc-20160816; b=GvrCg/CQ1H/NfPGBosPMAtMnsbkATg/GfrbDMeeoxF1Kq74Xpy9EzYWdKwPOWL3jJH emQT6CtG5MZRbbyQF64pJTieFKw3xmbnil2CqxTBa0nqUZjVL41JmQyf2ivql/iEMCmW UI2HqW4dA8rseRKEU3q4VTGdP2hUavjv1hvaT8cjVK2TKxxlMdxl1TJfQMMSPdFcaP25 R5E/tNTqGvXTa+u1D69fQtVc9/OBe+ZydudLEAahiY+BcO1Jjw1fQojBx4XaCzzGhDxq 7MfFTM61EVSSTkK897BpZvkSGIQuLeuLjO+LnaLtHxg5kbiOWc7RepaAUTjaYZItHyw/ qWBg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature; bh=taH3jmgsbe6hfa9nQbmElDrdFynmUDx/ZvpOv3cX+LY=; b=vZ6rP7PoOcD5NE7t8oTtg8sPqfTAvsXVMZpag78WCW1pQ0IzLJQJNCRsFCk9xFIso3 CYHzZa+UJF3Cxq2YIG9z7cG7pjf86OjKk3PAqxHPFNotXWkmzBiEB/ItPwVF5xd2RZoF Pe7MKHKMR3MMWp5fOChio9v2kePR83q7QaasECnYadG3NUEhW5PAXtQH/PcuU1qJIHSn p+TtWF41/OidXZk5FeEzH0TKL/vqLcEihyrOdDCpz+jSngrbssrYkOZBm2LtAvZRJ9iD M2xbwZGisRHp7IYmoPCTk1pOucKU5ldCYRPiVUMzhD6TZF1i4Q2VF6bqqaibUX5gD5wl Yonw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=MphNqBrB; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Eric Biggers, Theodore Ts'o, Steve Capper
Subject: [PATCH 4/4] crypto: arm64/aes-blk - improve XTS mask handling
Date: Mon, 10 Sep 2018 16:41:15 +0200
Message-Id: <20180910144115.25727-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
References: <20180910144115.25727-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

The Crypto Extensions instantiation of the aes-modes.S collection of
skciphers uses only 15 NEON registers for the round key array, whereas
the pure NEON flavor uses 16 NEON registers for the AES S-box. This
means we have a spare register available that we can use to hold the
XTS mask vector, removing the need to reload it at every iteration of
the inner loop (a sketch of the tweak update that this mask feeds
follows below).
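For reference (an aside, not part of the patch): the value that now stays
resident in the spare register is the mask used by the next_tweak macro to
advance the XTS tweak, i.e. a multiply-by-x in GF(2^128) with reduction
constant 0x87. A byte-wise C sketch of that operation, with a hypothetical
function name:

#include <stdint.h>
#include <stdio.h>

/*
 * Multiply a 128-bit XTS tweak (treated as a little-endian value) by x in
 * GF(2^128): shift left by one bit and, if a bit falls off the top, fold
 * the reduction constant 0x87 back into the low byte.
 */
static void xts_mul_x(uint8_t t[16])
{
	uint8_t carry = 0;

	for (int i = 0; i < 16; i++) {
		uint8_t next_carry = t[i] >> 7;

		t[i] = (uint8_t)((t[i] << 1) | carry);
		carry = next_carry;
	}
	if (carry)
		t[0] ^= 0x87;
}

int main(void)
{
	uint8_t tweak[16] = { 0 };

	tweak[15] = 0x80;	/* set the top bit to force a reduction */
	xts_mul_x(tweak);
	printf("low byte after reduction: 0x%02x\n", tweak[0]);	/* 0x87 */
	return 0;
}

In the diff below, the Crypto Extensions flavor keeps this mask in v16 and
turns xts_reload_mask into a no-op, while the pure NEON flavor keeps it in
v7 and recomposes it on every outer iteration.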
Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector. On Cortex-A53, this results in a ~4% speedup. Signed-off-by: Ard Biesheuvel --- Raw performance numbers after the patch. arch/arm64/crypto/aes-ce.S | 5 +++ arch/arm64/crypto/aes-modes.S | 40 ++++++++++---------- arch/arm64/crypto/aes-neon.S | 6 +++ 3 files changed, 32 insertions(+), 19 deletions(-) -- 2.18.0 Cortex-A53 @ 1 GHz BEFORE: testing speed of async xts(aes) (xts-aes-ce) encryption 0 (256 bit key, 16 byte blocks): 1338059 ops in 1 secs ( 21408944 bytes) 1 (256 bit key, 64 byte blocks): 1249191 ops in 1 secs ( 79948224 bytes) 2 (256 bit key, 256 byte blocks): 918979 ops in 1 secs (235258624 bytes) 3 (256 bit key, 1024 byte blocks): 456993 ops in 1 secs (467960832 bytes) 4 (256 bit key, 8192 byte blocks): 74937 ops in 1 secs (613883904 bytes) 5 (512 bit key, 16 byte blocks): 1269281 ops in 1 secs ( 20308496 bytes) 6 (512 bit key, 64 byte blocks): 1176362 ops in 1 secs ( 75287168 bytes) 7 (512 bit key, 256 byte blocks): 840553 ops in 1 secs (215181568 bytes) 8 (512 bit key, 1024 byte blocks): 400329 ops in 1 secs (409936896 bytes) 9 (512 bit key, 8192 byte blocks): 64268 ops in 1 secs (526483456 bytes) testing speed of async xts(aes) (xts-aes-ce) decryption 0 (256 bit key, 16 byte blocks): 1333819 ops in 1 secs ( 21341104 bytes) 1 (256 bit key, 64 byte blocks): 1239393 ops in 1 secs ( 79321152 bytes) 2 (256 bit key, 256 byte blocks): 913715 ops in 1 secs (233911040 bytes) 3 (256 bit key, 1024 byte blocks): 455176 ops in 1 secs (466100224 bytes) 4 (256 bit key, 8192 byte blocks): 74343 ops in 1 secs (609017856 bytes) 5 (512 bit key, 16 byte blocks): 1274941 ops in 1 secs ( 20399056 bytes) 6 (512 bit key, 64 byte blocks): 1182107 ops in 1 secs ( 75654848 bytes) 7 (512 bit key, 256 byte blocks): 844930 ops in 1 secs (216302080 bytes) 8 (512 bit key, 1024 byte blocks): 401614 ops in 1 secs (411252736 bytes) 9 (512 bit key, 8192 byte blocks): 63913 ops in 1 secs (523575296 bytes) AFTER: testing speed of async xts(aes) (xts-aes-ce) encryption 0 (256 bit key, 16 byte blocks): 1398063 ops in 1 secs ( 22369008 bytes) 1 (256 bit key, 64 byte blocks): 1302694 ops in 1 secs ( 83372416 bytes) 2 (256 bit key, 256 byte blocks): 951692 ops in 1 secs (243633152 bytes) 3 (256 bit key, 1024 byte blocks): 473198 ops in 1 secs (484554752 bytes) 4 (256 bit key, 8192 byte blocks): 77204 ops in 1 secs (632455168 bytes) 5 (512 bit key, 16 byte blocks): 1323582 ops in 1 secs ( 21177312 bytes) 6 (512 bit key, 64 byte blocks): 1222306 ops in 1 secs ( 78227584 bytes) 7 (512 bit key, 256 byte blocks): 871791 ops in 1 secs (223178496 bytes) 8 (512 bit key, 1024 byte blocks): 413557 ops in 1 secs (423482368 bytes) 9 (512 bit key, 8192 byte blocks): 66014 ops in 1 secs (540786688 bytes) testing speed of async xts(aes) (xts-aes-ce) decryption 0 (256 bit key, 16 byte blocks): 1399388 ops in 1 secs ( 22390208 bytes) 1 (256 bit key, 64 byte blocks): 1300861 ops in 1 secs ( 83255104 bytes) 2 (256 bit key, 256 byte blocks): 951950 ops in 1 secs (243699200 bytes) 3 (256 bit key, 1024 byte blocks): 473399 ops in 1 secs (484760576 bytes) 4 (256 bit key, 8192 byte blocks): 77168 ops in 1 secs (632160256 bytes) 5 (512 bit key, 16 byte blocks): 1317833 ops in 1 secs ( 21085328 bytes) 6 (512 bit key, 64 byte blocks): 1217145 ops in 1 secs ( 77897280 bytes) 7 (512 bit key, 256 byte blocks): 868323 ops in 1 secs (222290688 bytes) 8 
(512 bit key, 1024 byte blocks): 412821 ops in 1 secs (422728704 bytes) 9 (512 bit key, 8192 byte blocks): 65919 ops in 1 secs (540008448 bytes) diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 623e74ed1c67..143070510809 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -17,6 +17,11 @@ .arch armv8-a+crypto + xtsmask .req v16 + + .macro xts_reload_mask, tmp + .endm + /* preload all round keys */ .macro load_round_keys, rounds, rk cmp \rounds, #12 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 82931fba53d2..5c0fa7905d24 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -340,17 +340,19 @@ AES_ENDPROC(aes_ctr_encrypt) * int blocks, u8 const rk2[], u8 iv[], int first) */ - .macro next_tweak, out, in, const, tmp + .macro next_tweak, out, in, tmp sshr \tmp\().2d, \in\().2d, #63 - and \tmp\().16b, \tmp\().16b, \const\().16b + and \tmp\().16b, \tmp\().16b, xtsmask.16b add \out\().2d, \in\().2d, \in\().2d ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8 eor \out\().16b, \out\().16b, \tmp\().16b .endm -.Lxts_mul_x: -CPU_LE( .quad 1, 0x87 ) -CPU_BE( .quad 0x87, 1 ) + .macro xts_load_mask, tmp + movi xtsmask.2s, #0x1 + movi \tmp\().2s, #0x87 + uzp1 xtsmask.4s, xtsmask.4s, \tmp\().4s + .endm AES_ENTRY(aes_xts_encrypt) stp x29, x30, [sp, #-16]! @@ -362,24 +364,24 @@ AES_ENTRY(aes_xts_encrypt) enc_prepare w3, x5, x8 encrypt_block v4, w3, x5, x8, w7 /* first tweak */ enc_switch_key w3, x2, x8 - ldr q7, .Lxts_mul_x + xts_load_mask v8 b .LxtsencNx .Lxtsencnotfirst: enc_prepare w3, x2, x8 .LxtsencloopNx: - ldr q7, .Lxts_mul_x - next_tweak v4, v4, v7, v8 + xts_reload_mask v8 + next_tweak v4, v4, v8 .LxtsencNx: subs w4, w4, #4 bmi .Lxtsenc1x ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ - next_tweak v5, v4, v7, v8 + next_tweak v5, v4, v8 eor v0.16b, v0.16b, v4.16b - next_tweak v6, v5, v7, v8 + next_tweak v6, v5, v8 eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - next_tweak v7, v6, v7, v8 + next_tweak v7, v6, v8 eor v3.16b, v3.16b, v7.16b bl aes_encrypt_block4x eor v3.16b, v3.16b, v7.16b @@ -401,7 +403,7 @@ AES_ENTRY(aes_xts_encrypt) st1 {v0.16b}, [x0], #16 subs w4, w4, #1 beq .Lxtsencout - next_tweak v4, v4, v7, v8 + next_tweak v4, v4, v8 b .Lxtsencloop .Lxtsencout: st1 {v4.16b}, [x6] @@ -420,24 +422,24 @@ AES_ENTRY(aes_xts_decrypt) enc_prepare w3, x5, x8 encrypt_block v4, w3, x5, x8, w7 /* first tweak */ dec_prepare w3, x2, x8 - ldr q7, .Lxts_mul_x + xts_load_mask v8 b .LxtsdecNx .Lxtsdecnotfirst: dec_prepare w3, x2, x8 .LxtsdecloopNx: - ldr q7, .Lxts_mul_x - next_tweak v4, v4, v7, v8 + xts_reload_mask v8 + next_tweak v4, v4, v8 .LxtsdecNx: subs w4, w4, #4 bmi .Lxtsdec1x ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ - next_tweak v5, v4, v7, v8 + next_tweak v5, v4, v8 eor v0.16b, v0.16b, v4.16b - next_tweak v6, v5, v7, v8 + next_tweak v6, v5, v8 eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - next_tweak v7, v6, v7, v8 + next_tweak v7, v6, v8 eor v3.16b, v3.16b, v7.16b bl aes_decrypt_block4x eor v3.16b, v3.16b, v7.16b @@ -459,7 +461,7 @@ AES_ENTRY(aes_xts_decrypt) st1 {v0.16b}, [x0], #16 subs w4, w4, #1 beq .Lxtsdecout - next_tweak v4, v4, v7, v8 + next_tweak v4, v4, v8 b .Lxtsdecloop .Lxtsdecout: st1 {v4.16b}, [x6] diff --git a/arch/arm64/crypto/aes-neon.S b/arch/arm64/crypto/aes-neon.S index 1c7b45b7268e..29100f692e8a 100644 --- a/arch/arm64/crypto/aes-neon.S +++ b/arch/arm64/crypto/aes-neon.S @@ -14,6 +14,12 @@ #define AES_ENTRY(func) ENTRY(neon_ ## func) #define 
AES_ENDPROC(func) ENDPROC(neon_ ## func) + xtsmask .req v7 + + .macro xts_reload_mask, tmp + xts_load_mask \tmp + .endm + /* multiply by polynomial 'x' in GF(2^8) */ .macro mul_by_x, out, in, temp, const sshr \temp, \in, #7