From patchwork Fri Jan 19 12:04:34 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 125154 Delivered-To: patch@linaro.org Received: by 10.46.66.141 with SMTP id h13csp259601ljf; Fri, 19 Jan 2018 04:05:19 -0800 (PST) X-Google-Smtp-Source: ACJfBos+qfCD3xER4Svvc8pkVuJwgIk4x/i6gyY+VB8IGued3sXWq/A83brYH5wnNcbo0AW9W7x/ X-Received: by 10.101.97.165 with SMTP id i5mr8449168pgv.55.1516363519751; Fri, 19 Jan 2018 04:05:19 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1516363519; cv=none; d=google.com; s=arc-20160816; b=eRBxNGEuhFXxhgNpmTwMg5LnkHpr8yaRNj131I+ZB8cly7lTMFVU9rEJYdEp9nieJ6 1L5DXcEoftzuo8ooZ6uVZwj8xuS8Bly+rSoQV9ybq7ZynDXn8oV1GZtBwxw8UxntNIep HKv+KtjXzq2XaWT7z1KIlpgMD7lbUT2NALlF4KJOJtVJY29WJssSxcFqFEq8Yc0HwT9R gEPiFAmPkJXC66SGKQWROH/HMyDYVRLw5FQo9sXOIvRafgYtfh5DDL/wqIjWd2rBaeYi LYfMF/iSnfEX2EopA8u9y1aG1cFa7b9X/WHcdRvn46n7j95KJeb0LsI9VvLVfY4cSi7E s/uQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=jDiA3YHzvd9DtMuAv8Pl57BeoHBoEBdl+hiFTougkRc=; b=bqQHBwGegWkCDNiFPfGbbPRFplsjhziEn+GEREXclYBZVS2a2j/szpjvLakt/L6lFb E+QV5B1n6QXR+g48DTIbFI4YvMbziFctQxRw6aS8aVarpAVTTFKeIJGgezOxY9uD2kqw ewWkXHzWVugSR8+yYF6tnPk1r6sqsOdn6pOb2dunnToGV2KdYNgdSxgxx+9BRkyC50Tk X/BJcHMY/HsQ3u126T5X57sqtxKMgdzaX/J2SruJtSQwwrpdJUk1mS6ZZ3Vy2nTPKtk9 qto9ttT8aPJXnNBvtQmgh1MzETN/veRMBfyxRmvblVKQfYyxtGy4SkcG1wROr6xGi1Td cnAw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=VJSI5+q1; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y2si5314051pgq.384.2018.01.19.04.05.19; Fri, 19 Jan 2018 04:05:19 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=VJSI5+q1; spf=pass (google.com: best guess record for domain of linux-crypto-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-crypto-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754440AbeASMFS (ORCPT + 1 other); Fri, 19 Jan 2018 07:05:18 -0500 Received: from mail-wm0-f67.google.com ([74.125.82.67]:37268 "EHLO mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754232AbeASMFS (ORCPT ); Fri, 19 Jan 2018 07:05:18 -0500 Received: by mail-wm0-f67.google.com with SMTP id v71so3079542wmv.2 for ; Fri, 19 Jan 2018 04:05:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=from:to:cc:subject:date:message-id:in-reply-to:references; bh=jDiA3YHzvd9DtMuAv8Pl57BeoHBoEBdl+hiFTougkRc=; b=VJSI5+q1oKmUpzH2QwaYY0L1ycnIcbBheSRmSaqfeRLGt4zZf/yPj5hLh54uor1vzj YvE1I21wNnLS17LZZk2w3C8U+t+bHLc9trWleaEj3bUm+cU2Isw54B//W+Fbp5OiDM0P S3y5JygVff1OL6NQWm4uOdcPOUslnltOdtWRE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references; bh=jDiA3YHzvd9DtMuAv8Pl57BeoHBoEBdl+hiFTougkRc=; b=J5Rrzc9WPSuQioOaX/zY6pms1oPFchgWLjSp89k5DxzakYZ3irCUxYLdnR/GTu6fOP s38MpGxoT9vLAA7K1GIhzJ/LAySkfoP84uDq1stcJsM35iUBlg9mG+YzQIbdoPCDcqof 2z6sYUV+CjeAukntw1T9LW9BfHbrT/W3OirSkyhxoX2wOYR0A+Rrshx7fItdsEkoWfkQ 76/TnIY6KGNlo6xeF2kicwVXBMRZ9N5GhcU0i6W4t4F9tAtCSfJxhmNLRp28A3DJR0Fb VqhJlgGXpIg1+5rFlACZq5iT37TaWa2Dwn1xL2+KLTL+u6TOL49Ki8uPHFXFoQW6T1Av W+/Q== X-Gm-Message-State: AKwxytfoiyOIr8NENdmrJfxGWqxDIRWDF2IMBVUzSo8nvYZu/zaXWVv8 99I8OEY1WzTjxoNGIlfxibDn3TXRIIw= X-Received: by 10.28.232.131 with SMTP id f3mr7978778wmi.69.1516363516487; Fri, 19 Jan 2018 04:05:16 -0800 (PST) Received: from localhost.localdomain ([160.170.62.40]) by smtp.gmail.com with ESMTPSA id 127sm1039149wmk.14.2018.01.19.04.05.14 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 19 Jan 2018 04:05:15 -0800 (PST) From: Ard Biesheuvel To: linux-crypto@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org, herbert@gondor.apana.org.au, Ard Biesheuvel Subject: [PATCH 2/8] crypto/generic: sha3: rewrite KECCAK transform to help the compiler optimize Date: Fri, 19 Jan 2018 12:04:34 +0000 Message-Id: <20180119120440.31556-3-ard.biesheuvel@linaro.org> X-Mailer: git-send-email 2.11.0 In-Reply-To: <20180119120440.31556-1-ard.biesheuvel@linaro.org> References: <20180119120440.31556-1-ard.biesheuvel@linaro.org> Sender: linux-crypto-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-crypto@vger.kernel.org The way the KECCAK transform is currently coded involves many references into the state array using indexes that are calculated at runtime using simple but non-trivial arithmetic. This forces the compiler to treat the state matrix as an array in memory rather than keep it in registers, which results in poor performance. So instead, let's rephrase the algorithm using fixed array indexes only. This helps the compiler keep the state matrix in registers, resulting in the following speedup (SHA3-256 performance in cycles per byte): before after speedup Intel Core i7 @ 2.0 GHz (2.9 turbo) 100.6 35.7 2.8x Cortex-A57 @ 2.0 GHz (64-bit mode) 101.6 12.7 8.0x Cortex-A53 @ 1.0 GHz 224.4 15.8 14.2x Cortex-A57 @ 2.0 GHz (32-bit mode) 201.8 63.0 3.2x Signed-off-by: Ard Biesheuvel --- crypto/sha3_generic.c | 134 ++++++++++++++------ 1 file changed, 96 insertions(+), 38 deletions(-) -- 2.11.0 diff --git a/crypto/sha3_generic.c b/crypto/sha3_generic.c index a68be626017c..5fecb609e3be 100644 --- a/crypto/sha3_generic.c +++ b/crypto/sha3_generic.c @@ -5,6 +5,7 @@ * http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.202.pdf * * SHA-3 code by Jeff Garzik + * Ard Biesheuvel * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the Free @@ -22,8 +23,6 @@ #define KECCAK_ROUNDS 24 -#define ROTL64(x, y) (((x) << (y)) | ((x) >> (64 - (y)))) - static const u64 keccakf_rndc[24] = { 0x0000000000000001ULL, 0x0000000000008082ULL, 0x800000000000808aULL, 0x8000000080008000ULL, 0x000000000000808bULL, 0x0000000080000001ULL, @@ -35,53 +34,112 @@ static const u64 keccakf_rndc[24] = { 0x8000000000008080ULL, 0x0000000080000001ULL, 0x8000000080008008ULL }; -static const int keccakf_rotc[24] = { - 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 2, 14, - 27, 41, 56, 8, 25, 43, 62, 18, 39, 61, 20, 44 -}; - -static const int keccakf_piln[24] = { - 10, 7, 11, 17, 18, 3, 5, 16, 8, 21, 24, 4, - 15, 23, 19, 13, 12, 2, 20, 14, 22, 9, 6, 1 -}; - /* update the state with given number of rounds */ -static void keccakf(u64 st[25]) +static void __attribute__((__optimize__("O3"))) keccakf(u64 st[25]) { - int i, j, round; - u64 t, bc[5]; + u64 t[5], tt, bc[5]; + int round; for (round = 0; round < KECCAK_ROUNDS; round++) { /* Theta */ - for (i = 0; i < 5; i++) - bc[i] = st[i] ^ st[i + 5] ^ st[i + 10] ^ st[i + 15] - ^ st[i + 20]; - - for (i = 0; i < 5; i++) { - t = bc[(i + 4) % 5] ^ ROTL64(bc[(i + 1) % 5], 1); - for (j = 0; j < 25; j += 5) - st[j + i] ^= t; - } + bc[0] = st[0] ^ st[5] ^ st[10] ^ st[15] ^ st[20]; + bc[1] = st[1] ^ st[6] ^ st[11] ^ st[16] ^ st[21]; + bc[2] = st[2] ^ st[7] ^ st[12] ^ st[17] ^ st[22]; + bc[3] = st[3] ^ st[8] ^ st[13] ^ st[18] ^ st[23]; + bc[4] = st[4] ^ st[9] ^ st[14] ^ st[19] ^ st[24]; + + t[0] = bc[4] ^ rol64(bc[1], 1); + t[1] = bc[0] ^ rol64(bc[2], 1); + t[2] = bc[1] ^ rol64(bc[3], 1); + t[3] = bc[2] ^ rol64(bc[4], 1); + t[4] = bc[3] ^ rol64(bc[0], 1); + + st[0] ^= t[0]; /* Rho Pi */ - t = st[1]; - for (i = 0; i < 24; i++) { - j = keccakf_piln[i]; - bc[0] = st[j]; - st[j] = ROTL64(t, keccakf_rotc[i]); - t = bc[0]; - } + tt = st[1]; + st[ 1] = rol64(st[ 6] ^ t[1], 44); + st[ 6] = rol64(st[ 9] ^ t[4], 20); + st[ 9] = rol64(st[22] ^ t[2], 61); + st[22] = rol64(st[14] ^ t[4], 39); + st[14] = rol64(st[20] ^ t[0], 18); + st[20] = rol64(st[ 2] ^ t[2], 62); + st[ 2] = rol64(st[12] ^ t[2], 43); + st[12] = rol64(st[13] ^ t[3], 25); + st[13] = rol64(st[19] ^ t[4], 8); + st[19] = rol64(st[23] ^ t[3], 56); + st[23] = rol64(st[15] ^ t[0], 41); + st[15] = rol64(st[ 4] ^ t[4], 27); + st[ 4] = rol64(st[24] ^ t[4], 14); + st[24] = rol64(st[21] ^ t[1], 2); + st[21] = rol64(st[ 8] ^ t[3], 55); + st[ 8] = rol64(st[16] ^ t[1], 45); + st[16] = rol64(st[ 5] ^ t[0], 36); + st[ 5] = rol64(st[ 3] ^ t[3], 28); + st[ 3] = rol64(st[18] ^ t[3], 21); + st[18] = rol64(st[17] ^ t[2], 15); + st[17] = rol64(st[11] ^ t[1], 10); + st[11] = rol64(st[ 7] ^ t[2], 6); + st[ 7] = rol64(st[10] ^ t[0], 3); + st[10] = rol64( tt ^ t[1], 1); /* Chi */ - for (j = 0; j < 25; j += 5) { - for (i = 0; i < 5; i++) - bc[i] = st[j + i]; - for (i = 0; i < 5; i++) - st[j + i] ^= (~bc[(i + 1) % 5]) & - bc[(i + 2) % 5]; - } + bc[ 0] = ~st[ 1] & st[ 2]; + bc[ 1] = ~st[ 2] & st[ 3]; + bc[ 2] = ~st[ 3] & st[ 4]; + bc[ 3] = ~st[ 4] & st[ 0]; + bc[ 4] = ~st[ 0] & st[ 1]; + st[ 0] ^= bc[ 0]; + st[ 1] ^= bc[ 1]; + st[ 2] ^= bc[ 2]; + st[ 3] ^= bc[ 3]; + st[ 4] ^= bc[ 4]; + + bc[ 0] = ~st[ 6] & st[ 7]; + bc[ 1] = ~st[ 7] & st[ 8]; + bc[ 2] = ~st[ 8] & st[ 9]; + bc[ 3] = ~st[ 9] & st[ 5]; + bc[ 4] = ~st[ 5] & st[ 6]; + st[ 5] ^= bc[ 0]; + st[ 6] ^= bc[ 1]; + st[ 7] ^= bc[ 2]; + st[ 8] ^= bc[ 3]; + st[ 9] ^= bc[ 4]; + + bc[ 0] = ~st[11] & st[12]; + bc[ 1] = ~st[12] & st[13]; + bc[ 2] = ~st[13] & st[14]; + bc[ 3] = ~st[14] & st[10]; + bc[ 4] = ~st[10] & st[11]; + st[10] ^= bc[ 0]; + st[11] ^= bc[ 1]; + st[12] ^= bc[ 2]; + st[13] ^= bc[ 3]; + st[14] ^= bc[ 4]; + + bc[ 0] = ~st[16] & st[17]; + bc[ 1] = ~st[17] & st[18]; + bc[ 2] = ~st[18] & st[19]; + bc[ 3] = ~st[19] & st[15]; + bc[ 4] = ~st[15] & st[16]; + st[15] ^= bc[ 0]; + st[16] ^= bc[ 1]; + st[17] ^= bc[ 2]; + st[18] ^= bc[ 3]; + st[19] ^= bc[ 4]; + + bc[ 0] = ~st[21] & st[22]; + bc[ 1] = ~st[22] & st[23]; + bc[ 2] = ~st[23] & st[24]; + bc[ 3] = ~st[24] & st[20]; + bc[ 4] = ~st[20] & st[21]; + st[20] ^= bc[ 0]; + st[21] ^= bc[ 1]; + st[22] ^= bc[ 2]; + st[23] ^= bc[ 3]; + st[24] ^= bc[ 4]; /* Iota */ st[0] ^= keccakf_rndc[round];