From patchwork Sat Jan 27 09:18:32 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 126050 Delivered-To: patch@linaro.org Received: by 10.46.84.92 with SMTP id y28csp921676ljd; Sat, 27 Jan 2018 01:18:51 -0800 (PST) X-Google-Smtp-Source: AH8x226/fyXmYGg0KMOQSSr/E1jCnFEtmxNl1dzvKQXjoCK1DT8aesHnp99pgC/E7RIFfqZKalKK X-Received: by 10.98.234.19 with SMTP id t19mr6861001pfh.74.1517044730899; Sat, 27 Jan 2018 01:18:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1517044730; cv=none; d=google.com; s=arc-20160816; b=oMJuIYrrpVce9tBIObsRvt5+84LLKEqEioh3nLp9O2FxrEck6dCaaHa8pMqjrP9puO BULUVibZSYPiUVW1ff9dIYlaaXzinbPywK5pjNcSV70ZIPJ1vnxdmbD8xKbJbk919oYC cgS+Dw98Lz2vgW1VvOA0Oll1UQFoGbHxdbjsIdnVo0lStpnNnIx/9wNvvzOCirsri2kW /Ic91OUWpSSMgHF9vRElkCNqeLb/qrKYna72SATv2nZaZVzOqS6nJ0/FlK+2IteJxHaU dTfAu/bJcI8+DSrq/M9ANLDeNUXtVQfqKH5O6PqnpDNUIJxU+0fI/uvqppRXr3Wc8XwD 2jAA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:message-id:date:subject:cc:to:from :dkim-signature:arc-authentication-results; bh=UF9RZTl0FFM7kvLrk0zfZUkF4yxhxlzaSnQZAwmP5VU=; b=HdDXBULiSA51rwSNq1ABmKtMxzg24M+kj/6s+MrMGRhFuc98oH9x2AowZMoWMEvxrL 4gYJ4uNUuMwCFlL/piJsxOgKj8Oj4h3GVppp4CQnLv2YrAaJetTvTfxMxKCC6hW47d8K XrTwjC5FIRiH2cJOIlCN4CEJPc0Ii/iljw/z0jh3FMSG/nXOUk2o+CT99/Qr5MXoVev8 x86SfHGUn9VmyKoIy/moy9AwCdvz7xSyAtpg+H2oMFd6Ex4oeHku8dOsZZIw+oUngDpl TtE4eOzclLQz2QOND5C0hLdZ1/MzkBYRJQKPyxJarCeBHBD9p4+V/RJvZKgo8pMrZ+Kw 57ig== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=LyondO8E; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id n10-v6si4972526plk.255.2018.01.27.01.18.49; Sat, 27 Jan 2018 01:18:50 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=LyondO8E; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751516AbeA0JSq (ORCPT + 1 other); Sat, 27 Jan 2018 04:18:46 -0500 Received: from mail-wm0-f66.google.com ([74.125.82.66]:36936 "EHLO mail-wm0-f66.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751899AbeA0JSn (ORCPT ); Sat, 27 Jan 2018 04:18:43 -0500 Received: by mail-wm0-f66.google.com with SMTP id v71so25255765wmv.2 for ; Sat, 27 Jan 2018 01:18:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id; bh=UF9RZTl0FFM7kvLrk0zfZUkF4yxhxlzaSnQZAwmP5VU=; b=LyondO8Ey4bjfJqk2vI0Qkryq9NnDJbcUqm+8C2qXlk9RFI+8dKyMix5BBtNlXZq4s vanNR51dxNXJPy++dTAEm7cV4PB4ZZEmqjzAgpXVWjlDOtNDls7h6bAqt9no/8jKAX9c fjHrvbhVHfW4gGNXlaXddazCYAMFNvriyjWOY= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=UF9RZTl0FFM7kvLrk0zfZUkF4yxhxlzaSnQZAwmP5VU=; b=LJCuhLkjRRO9e9AAI0vua1QJ2jDC5ME9xC23hJnNDmu1n/nj6TXNHnVWazUhpH3OUw +7WzDLm9EZ57t2CY4I2WgFfbEWAhH23nskdMPiOqXOX7qQ6wlbInn/ukRolTN06PEbcc sp9erfmwB7Dc8f1k/njzrDej+gGszUfUMYtDTwrO2RAa0OToBxP+Eg0jjDxoIYXJN0DN k8SULf1Oq5AmlNoklpg1HNZsZJNAw8V7JAaxe/YFRfvY+gcRz/9CCVPne8yomLIRRroh 7pKBUxJeGw3iEL7+3jUCp6DQUy9egEKN/UUvcUBJSLLHlsDubnWeLu/MPzSOi1iR2WDF oKnw== X-Gm-Message-State: AKwxytcUZnty2K+qrORYh+Jfv1n3w0BgIEupXq2eJMDfpnkU7/alTnY6 Zz8pPGSEIUxo/N5b6w2DWraOxOXfdO0= X-Received: by 10.28.112.21 with SMTP id l21mr14089423wmc.70.1517044721582; Sat, 27 Jan 2018 01:18:41 -0800 (PST) Received: from localhost.localdomain ([160.167.127.168]) by smtp.gmail.com with ESMTPSA id 186sm7313977wmu.16.2018.01.27.01.18.38 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 27 Jan 2018 01:18:40 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au Cc: arnd@arndb.de, Ard Biesheuvel Subject: [PATCH] crypto/generic - sha3: deal with oversize stack frames Date: Sat, 27 Jan 2018 09:18:32 +0000 Message-Id: <20180127091832.24435-1-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org As reported by kbuild test robot, the optimized SHA3 C implementation compiles to mn10300 code that uses a disproportionate amount of stack space, i.e., crypto/sha3_generic.c: In function 'keccakf': crypto/sha3_generic.c:147:1: warning: the frame size of 1232 bytes is larger than 1024 bytes [-Wframe-larger-than=] As kindly diagnosed by Arnd, this does not only occur when building for the mn10300 architecture (which is what the report was about) but also for h8300, and builds for other 32-bit architectures show an increase in stack space utilization as well. Given that SHA3 operates on 64-bit quantities, and keeps a state matrix of 25 64-bit words, it is not surprising that 32-bit architectures with few general purpose registers are impacted the most by this, and it is therefore reasonable to implement a workaround that distinguishes between 32-bit and 64-bit architectures. Arnd figured out that taking the round calculation out of the loop, and inlining it explicitly but only on 64-bit architectures preserves most of the performance gain achieved by the rewrite, and also gets rid of the excessive use of stack space. Reported-by: kbuild test robot Suggested-by: Arnd Bergmann Signed-off-by: Ard Biesheuvel --- crypto/sha3_generic.c | 218 +++++++++++--------- 1 file changed, 118 insertions(+), 100 deletions(-) -- 2.11.0 diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c index a965b9d80559..951c4eb70262 100644 --- a/crypto/sha3_generic.c +++ b/crypto/sha3_generic.c @@ -20,6 +20,20 @@ #include #include +/* + * On some 32-bit architectures (mn10300 and h8300), GCC ends up using + * over 1 KB of stack if we inline the round calculation into the loop + * in keccakf(). On the other hand, on 64-bit architectures with plenty + * of [64-bit wide] general purpose registers, not inlining it severely + * hurts performance. So let's use 64-bitness as a heuristic to decide + * whether to inline or not. + */ +#ifdef CONFIG_64BIT +#define SHA3_INLINE inline +#else +#define SHA3_INLINE noinline +#endif + #define KECCAK_ROUNDS 24 static const u64 keccakf_rndc[24] = { @@ -35,111 +49,115 @@ static const u64 keccakf_rndc[24] = { /* update the state with given number of rounds */ -static void __attribute__((__optimize__("O3"))) keccakf(u64 st[25]) +static SHA3_INLINE void keccakf_round(u64 st[25]) { u64 t[5], tt, bc[5]; - int round; - for (round = 0; round < KECCAK_ROUNDS; round++) { + /* Theta */ + bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20]; + bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21]; + bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22]; + bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23]; + bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24]; + + t[0] = bc[4] ^ rol64(bc[1], 1); + t[1] = bc[0] ^ rol64(bc[2], 1); + t[2] = bc[1] ^ rol64(bc[3], 1); + t[3] = bc[2] ^ rol64(bc[4], 1); + t[4] = bc[3] ^ rol64(bc[0], 1); + + st[0] ^= t[0]; + + /* Rho Pi */ + tt = st[1]; + st[ 1] = rol64(st[ 6] ^ t[1], 44); + st[ 6] = rol64(st[ 9] ^ t[4], 20); + st[ 9] = rol64(st[22] ^ t[2], 61); + st[22] = rol64(st[14] ^ t[4], 39); + st[14] = rol64(st[20] ^ t[0], 18); + st[20] = rol64(st[ 2] ^ t[2], 62); + st[ 2] = rol64(st[12] ^ t[2], 43); + st[12] = rol64(st[13] ^ t[3], 25); + st[13] = rol64(st[19] ^ t[4], 8); + st[19] = rol64(st[23] ^ t[3], 56); + st[23] = rol64(st[15] ^ t[0], 41); + st[15] = rol64(st[ 4] ^ t[4], 27); + st[ 4] = rol64(st[24] ^ t[4], 14); + st[24] = rol64(st[21] ^ t[1], 2); + st[21] = rol64(st[ 8] ^ t[3], 55); + st[ 8] = rol64(st[16] ^ t[1], 45); + st[16] = rol64(st[ 5] ^ t[0], 36); + st[ 5] = rol64(st[ 3] ^ t[3], 28); + st[ 3] = rol64(st[18] ^ t[3], 21); + st[18] = rol64(st[17] ^ t[2], 15); + st[17] = rol64(st[11] ^ t[1], 10); + st[11] = rol64(st[ 7] ^ t[2], 6); + st[ 7] = rol64(st[10] ^ t[0], 3); + st[10] = rol64( tt ^ t[1], 1); + + /* Chi */ + bc[ 0] = ~st[ 1] & st[ 2]; + bc[ 1] = ~st[ 2] & st[ 3]; + bc[ 2] = ~st[ 3] & st[ 4]; + bc[ 3] = ~st[ 4] & st[ 0]; + bc[ 4] = ~st[ 0] & st[ 1]; + st[ 0] ^= bc[ 0]; + st[ 1] ^= bc[ 1]; + st[ 2] ^= bc[ 2]; + st[ 3] ^= bc[ 3]; + st[ 4] ^= bc[ 4]; + + bc[ 0] = ~st[ 6] & st[ 7]; + bc[ 1] = ~st[ 7] & st[ 8]; + bc[ 2] = ~st[ 8] & st[ 9]; + bc[ 3] = ~st[ 9] & st[ 5]; + bc[ 4] = ~st[ 5] & st[ 6]; + st[ 5] ^= bc[ 0]; + st[ 6] ^= bc[ 1]; + st[ 7] ^= bc[ 2]; + st[ 8] ^= bc[ 3]; + st[ 9] ^= bc[ 4]; + + bc[ 0] = ~st[11] & st[12]; + bc[ 1] = ~st[12] & st[13]; + bc[ 2] = ~st[13] & st[14]; + bc[ 3] = ~st[14] & st[10]; + bc[ 4] = ~st[10] & st[11]; + st[10] ^= bc[ 0]; + st[11] ^= bc[ 1]; + st[12] ^= bc[ 2]; + st[13] ^= bc[ 3]; + st[14] ^= bc[ 4]; + + bc[ 0] = ~st[16] & st[17]; + bc[ 1] = ~st[17] & st[18]; + bc[ 2] = ~st[18] & st[19]; + bc[ 3] = ~st[19] & st[15]; + bc[ 4] = ~st[15] & st[16]; + st[15] ^= bc[ 0]; + st[16] ^= bc[ 1]; + st[17] ^= bc[ 2]; + st[18] ^= bc[ 3]; + st[19] ^= bc[ 4]; + + bc[ 0] = ~st[21] & st[22]; + bc[ 1] = ~st[22] & st[23]; + bc[ 2] = ~st[23] & st[24]; + bc[ 3] = ~st[24] & st[20]; + bc[ 4] = ~st[20] & st[21]; + st[20] ^= bc[ 0]; + st[21] ^= bc[ 1]; + st[22] ^= bc[ 2]; + st[23] ^= bc[ 3]; + st[24] ^= bc[ 4]; +} - /* Theta */ - bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20]; - bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21]; - bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22]; - bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23]; - bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24]; - - t[0] = bc[4] ^ rol64(bc[1], 1); - t[1] = bc[0] ^ rol64(bc[2], 1); - t[2] = bc[1] ^ rol64(bc[3], 1); - t[3] = bc[2] ^ rol64(bc[4], 1); - t[4] = bc[3] ^ rol64(bc[0], 1); - - st[0] ^= t[0]; - - /* Rho Pi */ - tt = st[1]; - st[ 1] = rol64(st[ 6] ^ t[1], 44); - st[ 6] = rol64(st[ 9] ^ t[4], 20); - st[ 9] = rol64(st[22] ^ t[2], 61); - st[22] = rol64(st[14] ^ t[4], 39); - st[14] = rol64(st[20] ^ t[0], 18); - st[20] = rol64(st[ 2] ^ t[2], 62); - st[ 2] = rol64(st[12] ^ t[2], 43); - st[12] = rol64(st[13] ^ t[3], 25); - st[13] = rol64(st[19] ^ t[4], 8); - st[19] = rol64(st[23] ^ t[3], 56); - st[23] = rol64(st[15] ^ t[0], 41); - st[15] = rol64(st[ 4] ^ t[4], 27); - st[ 4] = rol64(st[24] ^ t[4], 14); - st[24] = rol64(st[21] ^ t[1], 2); - st[21] = rol64(st[ 8] ^ t[3], 55); - st[ 8] = rol64(st[16] ^ t[1], 45); - st[16] = rol64(st[ 5] ^ t[0], 36); - st[ 5] = rol64(st[ 3] ^ t[3], 28); - st[ 3] = rol64(st[18] ^ t[3], 21); - st[18] = rol64(st[17] ^ t[2], 15); - st[17] = rol64(st[11] ^ t[1], 10); - st[11] = rol64(st[ 7] ^ t[2], 6); - st[ 7] = rol64(st[10] ^ t[0], 3); - st[10] = rol64( tt ^ t[1], 1); - - /* Chi */ - bc[ 0] = ~st[ 1] & st[ 2]; - bc[ 1] = ~st[ 2] & st[ 3]; - bc[ 2] = ~st[ 3] & st[ 4]; - bc[ 3] = ~st[ 4] & st[ 0]; - bc[ 4] = ~st[ 0] & st[ 1]; - st[ 0] ^= bc[ 0]; - st[ 1] ^= bc[ 1]; - st[ 2] ^= bc[ 2]; - st[ 3] ^= bc[ 3]; - st[ 4] ^= bc[ 4]; - - bc[ 0] = ~st[ 6] & st[ 7]; - bc[ 1] = ~st[ 7] & st[ 8]; - bc[ 2] = ~st[ 8] & st[ 9]; - bc[ 3] = ~st[ 9] & st[ 5]; - bc[ 4] = ~st[ 5] & st[ 6]; - st[ 5] ^= bc[ 0]; - st[ 6] ^= bc[ 1]; - st[ 7] ^= bc[ 2]; - st[ 8] ^= bc[ 3]; - st[ 9] ^= bc[ 4]; - - bc[ 0] = ~st[11] & st[12]; - bc[ 1] = ~st[12] & st[13]; - bc[ 2] = ~st[13] & st[14]; - bc[ 3] = ~st[14] & st[10]; - bc[ 4] = ~st[10] & st[11]; - st[10] ^= bc[ 0]; - st[11] ^= bc[ 1]; - st[12] ^= bc[ 2]; - st[13] ^= bc[ 3]; - st[14] ^= bc[ 4]; - - bc[ 0] = ~st[16] & st[17]; - bc[ 1] = ~st[17] & st[18]; - bc[ 2] = ~st[18] & st[19]; - bc[ 3] = ~st[19] & st[15]; - bc[ 4] = ~st[15] & st[16]; - st[15] ^= bc[ 0]; - st[16] ^= bc[ 1]; - st[17] ^= bc[ 2]; - st[18] ^= bc[ 3]; - st[19] ^= bc[ 4]; - - bc[ 0] = ~st[21] & st[22]; - bc[ 1] = ~st[22] & st[23]; - bc[ 2] = ~st[23] & st[24]; - bc[ 3] = ~st[24] & st[20]; - bc[ 4] = ~st[20] & st[21]; - st[20] ^= bc[ 0]; - st[21] ^= bc[ 1]; - st[22] ^= bc[ 2]; - st[23] ^= bc[ 3]; - st[24] ^= bc[ 4]; +static void __attribute__((__optimize__("O3"))) keccakf(u64 st[25]) +{ + int round; + for (round = 0; round < KECCAK_ROUNDS; round++) { + keccakf_round(st); /* Iota */ st[0] ^= keccakf_rndc[round]; }