From patchwork Thu Aug 23 16:48:45 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 144971
Delivered-To: patch@linaro.org
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org, will.deacon@arm.com,
 herbert@gondor.apana.org.au, Ard Biesheuvel, Nick Desaulniers
Subject: [PATCH v2] crypto: arm64/aes-modes - get rid of literal load of
 addend vector
Date: Thu, 23 Aug 2018 17:48:45 +0100
Message-Id: <20180823164845.20055-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.18.0
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

Replace the literal load of the addend vector with a sequence that
performs each add individually. This sequence is only 2 instructions
longer than the original, and 2% faster on Cortex-A53.

This is an improvement by itself, but it also works around an issue
with Clang, whose integrated assembler does not implement the GNU ARM
asm syntax completely and does not support the =literal notation for
FP registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642).

Cc: Nick Desaulniers
Signed-off-by: Ard Biesheuvel
---
v2: replace convoluted code involving a SIMD add to increment four BE
    counters at once with individual add/rev/mov instructions

 arch/arm64/crypto/aes-modes.S | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

-- 
2.18.0

Reviewed-by: Nick Desaulniers

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 483a7130cf0e..496c243de4ac 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -232,17 +232,19 @@ AES_ENTRY(aes_ctr_encrypt)
 	bmi		.Lctr1x
 	cmn		w6, #4			/* 32 bit overflow? */
 	bcs		.Lctr1x
-	ldr		q8, =0x30000000200000001	/* addends 1,2,3[,0] */
-	dup		v7.4s, w6
+	add		w7, w6, #1
 	mov		v0.16b, v4.16b
-	add		v7.4s, v7.4s, v8.4s
+	add		w8, w6, #2
 	mov		v1.16b, v4.16b
-	rev32		v8.16b, v7.16b
+	add		w9, w6, #3
 	mov		v2.16b, v4.16b
+	rev		w7, w7
 	mov		v3.16b, v4.16b
-	mov		v1.s[3], v8.s[0]
-	mov		v2.s[3], v8.s[1]
-	mov		v3.s[3], v8.s[2]
+	rev		w8, w8
+	mov		v1.s[3], w7
+	rev		w9, w9
+	mov		v2.s[3], w8
+	mov		v3.s[3], w9
 	ld1		{v5.16b-v7.16b}, [x20], #48	/* get 3 input blocks */
 	bl		aes_encrypt_block4x
 	eor		v0.16b, v5.16b, v0.16b
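
For readers less familiar with the NEON code, here is a rough C model of
what the new add/rev/mov sequence computes. It is a sketch, not kernel
code. It assumes, as the diff context suggests, that w6 holds the low 32
bits of the counter in CPU byte order and that v4 holds the current
16-byte counter block. The names put_be32, ctr_fill4 and ctr_would_wrap
are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store v at p in big-endian byte order.
 * This models "rev wN, wN" followed by "mov vM.s[3], wN" on a
 * little-endian CPU. */
static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Model of the patched sequence: blocks[0..3] stand in for v0..v3,
 * iv for v4 and ctr for w6 (all assumptions of this sketch). */
static void ctr_fill4(uint8_t blocks[4][16], const uint8_t iv[16],
		      uint32_t ctr)
{
	for (int i = 0; i < 4; i++) {
		/* mov vI.16b, v4.16b */
		memcpy(blocks[i], iv, 16);
		/* v0 keeps the current counter; v1..v3 get ctr + 1..3 */
		if (i > 0)
			put_be32(&blocks[i][12], ctr + (uint32_t)i);
	}
}

/* Model of "cmn w6, #4; bcs .Lctr1x": the carry flag is set when
 * ctr + 4 wraps around 32 bits, and the 4-way path is then skipped. */
static int ctr_would_wrap(uint32_t ctr)
{
	return (uint32_t)(ctr + 4u) < ctr;
}
```

Each counter is incremented and byte-swapped on its own, so the model,
like the patched assembly, needs no constant addend vector.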