From patchwork Mon Aug 19 14:15:00 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel <ard.biesheuvel@linaro.org>
X-Patchwork-Id: 171629
Delivered-To: patch@linaro.org
Received: by 2002:a92:d204:0:0:0:0:0 with SMTP id y4csp3042234ily;
 Mon, 19 Aug 2019 07:15:04 -0700 (PDT)
X-Google-Smtp-Source: APXvYqy9S/Xe8DlrghIYyUQDXK4JNhumI+yZI/VHFU2eVM+T1zNFVgnyqxKCIZ0W5AuqSg6fVUMv
X-Received: by 2002:a62:1a45:: with SMTP id
 a66mr24891415pfa.142.1566224104595; 
 Mon, 19 Aug 2019 07:15:04 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1566224104; cv=none;
 d=google.com; s=arc-20160816;
 b=1F1qAn6JdfdR+Ol1SkTV9LEN2RqunpybO7cyEY+OV7VSBKHQl1IgnUHPW1r6Na+B1x
 WRrglbbUfc/CWHvFRe5t3Dk55+doXyhVS4v/ndNz+cGl37wL4QefEaXJQ2qOhcmoocC4
 bPAG9UmEt4ggHhf8vWenQAFx88CEhz8z8Rhtcm06RQ9nSoUoaKm38+kWNEZMMV99RgZJ
 4HeIAka6s90JOpKb70+xuK7RgKAe0HTXT53pF3In7FIHVOtV6BsjZt9VuTo59yh/Wc9+
 oAnkYw+Es+dpV4IFcbQVMc+hscWACmkATFI3+O/Kr2/yHadbR/Sn+YHPK/rrk8NwamWp
 i5pA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816; 
 h=list-id:precedence:sender:message-id:date:subject:cc:to:from
 :dkim-signature;
 bh=0rdXhafMVjaKUUvPLb/wKqjrVAJEWD02wI54kwgy7/8=;
 b=gD7XbgmnqrNT9VLQwLJMxuNIqitDrz4HKp/RJQOHVlIrFkWqQNOK8bHvHJoryIPibb
 CSNndyF5SaSJ1XJXSVjKBDv+hfda2hWbfCFjos6i9aAwBZojPn/j0dPmqJ80TJlpd/wH
 E0mts7PZSwE2CykPlKjCrC1y0JvTDU3LQzEEYMlqwiDqeEWWe8DwbnvSm/xSqsHHMr3S
 pRwSeLwsi1Jv+CPgnFlNv8wf+AZSLnMfkions9KVnfSP1q3jdxOgKmEFDR03HgnGfZd8
 j63hqSVVDy0/jEDH3+ZbBYYGqjx20OBukJZ4nuZZe85VNEiuuwL90DhCgbvGNVTFMqjU
 5AOA==
ARC-Authentication-Results: i=1; mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=aTrG2WNj;
 spf=pass (google.com: best guess record for domain of
 linux-crypto-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-crypto-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Return-Path: <linux-crypto-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id 91si9867281plh.315.2019.08.19.07.15.04;
 Mon, 19 Aug 2019 07:15:04 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of
 linux-crypto-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com;
 dkim=pass header.i=@linaro.org header.s=google header.b=aTrG2WNj;
 spf=pass (google.com: best guess record for domain of
 linux-crypto-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-crypto-owner@vger.kernel.org; 
 dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S1726736AbfHSOPD (ORCPT <rfc822;sumit.garg@linaro.org>
 + 3 others); Mon, 19 Aug 2019 10:15:03 -0400
Received: from mail-wm1-f67.google.com ([209.85.128.67]:51186 "EHLO
 mail-wm1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1725536AbfHSOPD (ORCPT
 <rfc822;linux-crypto@vger.kernel.org>);
 Mon, 19 Aug 2019 10:15:03 -0400
Received: by mail-wm1-f67.google.com with SMTP id v15so1781678wml.0
 for <linux-crypto@vger.kernel.org>;
 Mon, 19 Aug 2019 07:15:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; 
 h=from:to:cc:subject:date:message-id;
 bh=0rdXhafMVjaKUUvPLb/wKqjrVAJEWD02wI54kwgy7/8=;
 b=aTrG2WNj4nhYSYaFtYaZJ5Gsrrt6i2JczZV9rzkdjXnbsRrhP2s+JDcrtiCgg5Wbop
 YYtQpjiwCviXq702LJ79A+n13Ai2eTRWqysoNdl0CCpYA68Hjcsa9kviUPizw6wKy4mG
 rSXbgXK9u+/I4IIEiNw+o1J0Jh0cfSbZ9Gvh8fep9fPnZEenLCdObs9S8/6aCfijv/qF
 91I4rP/NpPlP3wEBJ75eSaIk/6sWf95vKG96WlXKuFmplZn53xVdriOxWGOlgRhX2MwZ
 kl1/u+APeI9GnC2tQuvrcO0hb1N9kazdA81wYVRW3oS7d8vnFRsnGEy0ben8YR+Uw8YV
 BDPw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:from:to:cc:subject:date:message-id;
 bh=0rdXhafMVjaKUUvPLb/wKqjrVAJEWD02wI54kwgy7/8=;
 b=QkOyiq3AVeW+QH0TITouUI2SUAEYWQTF/BYHkehCr/WpnGc9Hs6zoPIcTsAS5SoI/N
 wkCudbdx+AhH8mZS8tDCg02jxmpSItAcksGKUBnhbl6dwWx4fUSOh5yQRxhSafW4e2iv
 2eDIKMp5mOshsPW8q82rqwwL0e/P6g2r9PfmK7nB09Vt7r34o6IkppDtM+WMhMDx9eD/
 2/Fh5Hqduy94xyvciPKGshT3pWojhuz8knUIh1PtpbN6/gmkHC1Xud4GS1TVqYh3MzkX
 hlGleFVU6jQDMT3GGCooEdt5F2J317E7j4Ed+1qttuuBOkM3vNtVQo60NrfHOdAkWm2K
 yz2g==
X-Gm-Message-State: APjAAAVTz5uozqwwplPaGQGjFDkDii1kLCWNVtiYZdCcRuWxWgbFgB9E
 7f3yCn5zC45lu9OXScXuzIkyn3Fz/rkfWQ==
X-Received: by 2002:a05:600c:24d0:: with SMTP id
 16mr20217200wmu.83.1566224101068; 
 Mon, 19 Aug 2019 07:15:01 -0700 (PDT)
Received: from localhost.localdomain (11.172.185.81.rev.sfr.net.
 [81.185.172.11]) by smtp.gmail.com with ESMTPSA id
 r4sm11604141wrq.82.2019.08.19.07.14.58
 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Mon, 19 Aug 2019 07:15:00 -0700 (PDT)
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, natechancellor@gmail.com,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>
Subject: [PATCH] crypto: arm64/aegis128 - use explicit vector load for
 permute vectors
Date: Mon, 19 Aug 2019 17:15:00 +0300
Message-Id: <20190819141500.1070-1-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.1
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-crypto.vger.kernel.org>
X-Mailing-List: linux-crypto@vger.kernel.org

When building the new aegis128 NEON code in big endian mode, Clang
complains about the const uint8x16_t permute vectors in the following
way:

  crypto/aegis128-neon-inner.c:58:40: warning: vector initializers are not
      compatible with NEON intrinsics in big endian mode
      [-Wnonportable-vector-initialization]
                static const uint8x16_t shift_rows = {
                                                     ^
  crypto/aegis128-neon-inner.c:58:40: note: consider using vld1q_u8() to
      initialize a vector from memory, or vcombine_u8(vcreate_u8(), vcreate_u8())
      to initialize from integer constants

Since the same issue applies to the uint8x16x4_t loads of the AES Sbox,
update those references as well. However, since GCC does not implement
the vld1q_u8_x4() intrinsic, switch from IS_ENABLED() to a preprocessor
conditional to conditionally include this code.

Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis128-neon-inner.c | 38 ++++++++++----------
 1 file changed, 19 insertions(+), 19 deletions(-)

-- 
2.17.1
Tested-by: Nathan Chancellor <natechancellor@gmail.com>

diff --git a/crypto/aegis128-neon-inner.c b/crypto/aegis128-neon-inner.c
index ed55568afd1b..f05310ca22aa 100644
--- a/crypto/aegis128-neon-inner.c
+++ b/crypto/aegis128-neon-inner.c
@@ -26,7 +26,7 @@ struct aegis128_state {
 	uint8x16_t v[5];
 };
 
-extern const uint8x16x4_t crypto_aes_sbox[];
+extern const uint8_t crypto_aes_sbox[];
 
 static struct aegis128_state aegis128_load_state_neon(const void *state)
 {
@@ -55,39 +55,39 @@ uint8x16_t aegis_aes_round(uint8x16_t w)
 
 #ifdef CONFIG_ARM64
 	if (!__builtin_expect(aegis128_have_aes_insn, 1)) {
-		static const uint8x16_t shift_rows = {
+		static const uint8_t shift_rows[] = {
 			0x0, 0x5, 0xa, 0xf, 0x4, 0x9, 0xe, 0x3,
 			0x8, 0xd, 0x2, 0x7, 0xc, 0x1, 0x6, 0xb,
 		};
-		static const uint8x16_t ror32by8 = {
+		static const uint8_t ror32by8[] = {
 			0x1, 0x2, 0x3, 0x0, 0x5, 0x6, 0x7, 0x4,
 			0x9, 0xa, 0xb, 0x8, 0xd, 0xe, 0xf, 0xc,
 		};
 		uint8x16_t v;
 
 		// shift rows
-		w = vqtbl1q_u8(w, shift_rows);
+		w = vqtbl1q_u8(w, vld1q_u8(shift_rows));
 
 		// sub bytes
-		if (!IS_ENABLED(CONFIG_CC_IS_GCC)) {
-			v = vqtbl4q_u8(crypto_aes_sbox[0], w);
-			v = vqtbx4q_u8(v, crypto_aes_sbox[1], w - 0x40);
-			v = vqtbx4q_u8(v, crypto_aes_sbox[2], w - 0x80);
-			v = vqtbx4q_u8(v, crypto_aes_sbox[3], w - 0xc0);
-		} else {
-			asm("tbl %0.16b, {v16.16b-v19.16b}, %1.16b" : "=w"(v) : "w"(w));
-			w -= 0x40;
-			asm("tbx %0.16b, {v20.16b-v23.16b}, %1.16b" : "+w"(v) : "w"(w));
-			w -= 0x40;
-			asm("tbx %0.16b, {v24.16b-v27.16b}, %1.16b" : "+w"(v) : "w"(w));
-			w -= 0x40;
-			asm("tbx %0.16b, {v28.16b-v31.16b}, %1.16b" : "+w"(v) : "w"(w));
-		}
+#ifndef CONFIG_CC_IS_GCC
+		v = vqtbl4q_u8(vld1q_u8_x4(crypto_aes_sbox), w);
+		v = vqtbx4q_u8(v, vld1q_u8_x4(crypto_aes_sbox + 0x40), w - 0x40);
+		v = vqtbx4q_u8(v, vld1q_u8_x4(crypto_aes_sbox + 0x80), w - 0x80);
+		v = vqtbx4q_u8(v, vld1q_u8_x4(crypto_aes_sbox + 0xc0), w - 0xc0);
+#else
+		asm("tbl %0.16b, {v16.16b-v19.16b}, %1.16b" : "=w"(v) : "w"(w));
+		w -= 0x40;
+		asm("tbx %0.16b, {v20.16b-v23.16b}, %1.16b" : "+w"(v) : "w"(w));
+		w -= 0x40;
+		asm("tbx %0.16b, {v24.16b-v27.16b}, %1.16b" : "+w"(v) : "w"(w));
+		w -= 0x40;
+		asm("tbx %0.16b, {v28.16b-v31.16b}, %1.16b" : "+w"(v) : "w"(w));
+#endif
 
 		// mix columns
 		w = (v << 1) ^ (uint8x16_t)(((int8x16_t)v >> 7) & 0x1b);
 		w ^= (uint8x16_t)vrev32q_u16((uint16x8_t)v);
-		w ^= vqtbl1q_u8(v ^ w, ror32by8);
+		w ^= vqtbl1q_u8(v ^ w, vld1q_u8(ror32by8));
 
 		return w;
 	}