From patchwork Sat Jan 28 23:33:33 2017
From: Ard Biesheuvel <ard.biesheuvel@linaro.org>
To: linux-crypto@vger.kernel.org
Subject: [RFC PATCH v2 4/4] crypto: aes - add generic time invariant AES for CTR/CCM/GCM
Date: Sat, 28 Jan 2017 23:33:33 +0000
Message-Id: <1485646413-17491-5-git-send-email-ard.biesheuvel@linaro.org>
In-Reply-To: <1485646413-17491-1-git-send-email-ard.biesheuvel@linaro.org>
References: <1485646413-17491-1-git-send-email-ard.biesheuvel@linaro.org>
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel <ard.biesheuvel@linaro.org>

Lookup table based AES is sensitive to timing attacks, because such
table lookups are data dependent, and because 8 KB worth of tables
covers a significant number of cachelines on any architecture. For
network facing algorithms such as CTR, CCM or GCM, this presents a
security risk, which is why arch specific AES ports are typically time
invariant, either through the use of special instructions, or by using
SIMD algorithms that don't rely on table lookups.

For generic code, this is difficult to achieve without losing too much
performance, but we can improve the situation significantly by
switching to an implementation that only needs 256 bytes of table data
(the actual S-box itself), which can be prefetched at the start of each
block to eliminate data dependent latencies.

Note that this only implements AES encryption, which is all we need
for CTR and CBC-MAC. AES decryption can easily be implemented in a
similar way, but is significantly more costly.

This code runs at ~25 cycles per byte on ARM Cortex-A57 (while the
ordinary generic AES driver manages 18 cycles per byte on this
hardware).

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/Kconfig  |  17 ++
 crypto/Makefile |   1 +
 crypto/aes_ti.c | 320 ++++++++++++++++++++
 3 files changed, 338 insertions(+)

Illustrative sketches of how the new algorithms can be driven through
the crypto API, plus a userspace check of the packed GF(2^8)
arithmetic, follow after the patch.

diff --git a/crypto/Kconfig b/crypto/Kconfig
index e8269d1b0282..f8dcc4003661 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -896,6 +896,23 @@ config CRYPTO_AES
 
 	  See <http://csrc.nist.gov/CryptoToolkit/aes/> for more information.
 
+config CRYPTO_AES_TI
+	tristate "AES in CTR and CBC-MAC modes (fixed time)"
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_HASH
+	select CRYPTO_AES
+	help
+	  This is a generic implementation of AES in CTR and CBC-MAC modes
+	  which is designed to eliminate data dependent latencies as much as
+	  possible without affecting performance too much. It is intended for
+	  use by the generic CCM and GCM drivers, and other CTR based modes.
+
+	  Instead of using 8 lookup tables of 1 KB each, both for encryption
+	  and decryption, this implementation only uses a single S-box of 256
+	  bytes, and attempts to eliminate data dependent latencies by
+	  prefetching the entire table into the cache at the start of each
+	  block.
+
 config CRYPTO_AES_586
 	tristate "AES cipher algorithms (i586)"
 	depends on (X86 || UML_X86) && !64BIT
diff --git a/crypto/Makefile b/crypto/Makefile
index b8f0e3eb0791..bcd834536163 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH) += twofish_generic.o
 obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
+obj-$(CONFIG_CRYPTO_AES_TI) += aes_ti.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5_generic.o
diff --git a/crypto/aes_ti.c b/crypto/aes_ti.c
new file mode 100644
index 000000000000..63c47789ccfc
--- /dev/null
+++ b/crypto/aes_ti.c
@@ -0,0 +1,320 @@
+/*
+ * Scalar fixed time AES core transform for CTR/CCM/GCM
+ *
+ * Copyright (C) 2017 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/unaligned.h>
+#include <crypto/aes.h>
+#include <crypto/algapi.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/module.h>
+
+struct aes_ti_ctx {
+	u32	rk[AES_MAX_KEYLENGTH_U32];
+	int	rounds;
+};
+
+struct cbcmac_desc_ctx {
+	unsigned int len;
+	u8 dg[AES_BLOCK_SIZE];
+};
+
+/*
+ * Emit the sbox as __weak with external linkage to prevent the compiler
+ * from doing constant folding on sbox references involving fixed indexes.
+ */
+__weak const u8 __cacheline_aligned __aesti_sbox[] = {
+	0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
+	0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
+	0xca, 0x82, 0xc9, 0x7d, 0xfa, 0x59, 0x47, 0xf0,
+	0xad, 0xd4, 0xa2, 0xaf, 0x9c, 0xa4, 0x72, 0xc0,
+	0xb7, 0xfd, 0x93, 0x26, 0x36, 0x3f, 0xf7, 0xcc,
+	0x34, 0xa5, 0xe5, 0xf1, 0x71, 0xd8, 0x31, 0x15,
+	0x04, 0xc7, 0x23, 0xc3, 0x18, 0x96, 0x05, 0x9a,
+	0x07, 0x12, 0x80, 0xe2, 0xeb, 0x27, 0xb2, 0x75,
+	0x09, 0x83, 0x2c, 0x1a, 0x1b, 0x6e, 0x5a, 0xa0,
+	0x52, 0x3b, 0xd6, 0xb3, 0x29, 0xe3, 0x2f, 0x84,
+	0x53, 0xd1, 0x00, 0xed, 0x20, 0xfc, 0xb1, 0x5b,
+	0x6a, 0xcb, 0xbe, 0x39, 0x4a, 0x4c, 0x58, 0xcf,
+	0xd0, 0xef, 0xaa, 0xfb, 0x43, 0x4d, 0x33, 0x85,
+	0x45, 0xf9, 0x02, 0x7f, 0x50, 0x3c, 0x9f, 0xa8,
+	0x51, 0xa3, 0x40, 0x8f, 0x92, 0x9d, 0x38, 0xf5,
+	0xbc, 0xb6, 0xda, 0x21, 0x10, 0xff, 0xf3, 0xd2,
+	0xcd, 0x0c, 0x13, 0xec, 0x5f, 0x97, 0x44, 0x17,
+	0xc4, 0xa7, 0x7e, 0x3d, 0x64, 0x5d, 0x19, 0x73,
+	0x60, 0x81, 0x4f, 0xdc, 0x22, 0x2a, 0x90, 0x88,
+	0x46, 0xee, 0xb8, 0x14, 0xde, 0x5e, 0x0b, 0xdb,
+	0xe0, 0x32, 0x3a, 0x0a, 0x49, 0x06, 0x24, 0x5c,
+	0xc2, 0xd3, 0xac, 0x62, 0x91, 0x95, 0xe4, 0x79,
+	0xe7, 0xc8, 0x37, 0x6d, 0x8d, 0xd5, 0x4e, 0xa9,
+	0x6c, 0x56, 0xf4, 0xea, 0x65, 0x7a, 0xae, 0x08,
+	0xba, 0x78, 0x25, 0x2e, 0x1c, 0xa6, 0xb4, 0xc6,
+	0xe8, 0xdd, 0x74, 0x1f, 0x4b, 0xbd, 0x8b, 0x8a,
+	0x70, 0x3e, 0xb5, 0x66, 0x48, 0x03, 0xf6, 0x0e,
+	0x61, 0x35, 0x57, 0xb9, 0x86, 0xc1, 0x1d, 0x9e,
+	0xe1, 0xf8, 0x98, 0x11, 0x69, 0xd9, 0x8e, 0x94,
+	0x9b, 0x1e, 0x87, 0xe9, 0xce, 0x55, 0x28, 0xdf,
+	0x8c, 0xa1, 0x89, 0x0d, 0xbf, 0xe6, 0x42, 0x68,
+	0x41, 0x99, 0x2d, 0x0f, 0xb0, 0x54, 0xbb, 0x16,
+};
+
+static int aesti_set_key(struct aes_ti_ctx *ctx, const u8 *in_key,
+			 unsigned int key_len)
+{
+	struct crypto_aes_ctx rk;
+	int err;
+
+	err = crypto_aes_expand_key(&rk, in_key, key_len);
+	if (err)
+		return err;
+
+	memcpy(ctx->rk, rk.key_enc, sizeof(ctx->rk));
+	ctx->rounds = 6 + key_len / 4;
+
+	/*
+	 * In order to force the compiler to emit data independent Sbox lookups
+	 * at the start of each block, xor the first round key with values at
+	 * fixed indexes in the Sbox. This will need to be repeated each time
+	 * the key is used, which will pull the entire Sbox into the D-cache
+	 * before any data dependent Sbox lookups are performed.
+	 */
+	ctx->rk[0] ^= __aesti_sbox[ 0] ^ __aesti_sbox[128];
+	ctx->rk[1] ^= __aesti_sbox[32] ^ __aesti_sbox[160];
+	ctx->rk[2] ^= __aesti_sbox[64] ^ __aesti_sbox[192];
+	ctx->rk[3] ^= __aesti_sbox[96] ^ __aesti_sbox[224];
+
+	return 0;
+}
+
+static u32 mul_by_x(u32 w)
+{
+	/* multiply by polynomial 'x' (0b10) in GF(2^8) */
+	return ((w & 0x80808080) >> 7) * 0x1b ^ ((w & 0x7f7f7f7f) << 1);
+}
+
+static u32 mix_columns(u32 x)
+{
+	u32 y = mul_by_x(x) ^ ror32(x, 16);
+
+	return y ^ ror32(x ^ y, 8);
+}
+
+static __always_inline u32 subshift(u32 in[], int pos)
+{
+	return (__aesti_sbox[in[pos] & 0xff]) ^
+	       (__aesti_sbox[(in[(pos + 1) % 4] >>  8) & 0xff] <<  8) ^
+	       (__aesti_sbox[(in[(pos + 2) % 4] >> 16) & 0xff] << 16) ^
+	       (__aesti_sbox[(in[(pos + 3) % 4] >> 24) & 0xff] << 24);
+}
+
+static void aesti_encrypt(struct aes_ti_ctx *ctx, u8 *out, const u8 *in)
+{
+	u32 st0[4], st1[4];
+	u32 *rkp = ctx->rk + 4;
+	int round;
+
+	st0[0] = get_unaligned_le32(in);
+	st0[1] = get_unaligned_le32(in + 4);
+	st0[2] = get_unaligned_le32(in + 8);
+	st0[3] = get_unaligned_le32(in + 12);
+
+	st0[0] ^= ctx->rk[0] ^ __aesti_sbox[ 0] ^ __aesti_sbox[128];
+	st0[1] ^= ctx->rk[1] ^ __aesti_sbox[32] ^ __aesti_sbox[160];
+	st0[2] ^= ctx->rk[2] ^ __aesti_sbox[64] ^ __aesti_sbox[192];
+	st0[3] ^= ctx->rk[3] ^ __aesti_sbox[96] ^ __aesti_sbox[224];
+
+	for (round = 0;; round += 2, rkp += 8) {
+		st1[0] = mix_columns(subshift(st0, 0)) ^ rkp[0];
+		st1[1] = mix_columns(subshift(st0, 1)) ^ rkp[1];
+		st1[2] = mix_columns(subshift(st0, 2)) ^ rkp[2];
+		st1[3] = mix_columns(subshift(st0, 3)) ^ rkp[3];
+
+		if (round == ctx->rounds - 2)
+			break;
+
+		st0[0] = mix_columns(subshift(st1, 0)) ^ rkp[4];
+		st0[1] = mix_columns(subshift(st1, 1)) ^ rkp[5];
+		st0[2] = mix_columns(subshift(st1, 2)) ^ rkp[6];
+		st0[3] = mix_columns(subshift(st1, 3)) ^ rkp[7];
+	}
+
+	put_unaligned_le32(subshift(st1, 0) ^ rkp[4], out);
+	put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4);
+	put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8);
+	put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12);
+}
+
+static int aesti_ctr_set_key(struct crypto_skcipher *tfm, const u8 *in_key,
+			     unsigned int key_len)
+{
+	struct aes_ti_ctx *ctx = crypto_skcipher_ctx(tfm);
+	int err;
+
+	err = aesti_set_key(ctx, in_key, key_len);
+	if (err)
+		crypto_skcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+	return err;
+}
+
+static int aesti_ctr_encrypt(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct aes_ti_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	u8 buf[AES_BLOCK_SIZE];
+	int err;
+
+	err = skcipher_walk_virt(&walk, req, false);
+
+	while (walk.nbytes > 0) {
+		u8 *dst = walk.dst.virt.addr;
+		u8 *src = walk.src.virt.addr;
+		int nbytes = walk.nbytes;
+		int tail = 0;
+
+		if (nbytes < walk.total) {
+			nbytes = round_down(nbytes, AES_BLOCK_SIZE);
+			tail = walk.nbytes % AES_BLOCK_SIZE;
+		}
+
+		do {
+			int bsize = min(nbytes, AES_BLOCK_SIZE);
+
+			aesti_encrypt(ctx, buf, walk.iv);
+			if (dst != src)
+				memcpy(dst, src, bsize);
+			crypto_xor(dst, buf, bsize);
+			crypto_inc(walk.iv, AES_BLOCK_SIZE);
+
+			dst += AES_BLOCK_SIZE;
+			src += AES_BLOCK_SIZE;
+			nbytes -= AES_BLOCK_SIZE;
+		} while (nbytes > 0);
+
+		err = skcipher_walk_done(&walk, tail);
+	}
+	return err;
+}
+
+static struct skcipher_alg ctr_alg = {
+	.base.cra_name		= "ctr(aes)",
+	.base.cra_driver_name	= "ctr-aes-ti",
+	.base.cra_priority	= 100 + 1,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct aes_ti_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= AES_MIN_KEY_SIZE,
+	.max_keysize		= AES_MAX_KEY_SIZE,
+	.chunksize		= AES_BLOCK_SIZE,
+	.ivsize			= AES_BLOCK_SIZE,
+	.setkey			= aesti_ctr_set_key,
+	.encrypt		= aesti_ctr_encrypt,
+	.decrypt		= aesti_ctr_encrypt,
+};
+
+static int aesti_cbcmac_setkey(struct crypto_shash *tfm,
+			       const u8 *in_key, unsigned int key_len)
+{
+	struct aes_ti_ctx *ctx = crypto_shash_ctx(tfm);
+	int err;
+
+	err = aesti_set_key(ctx, in_key, key_len);
+	if (err)
+		crypto_shash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+
+	return err;
+}
+
+static int aesti_cbcmac_init(struct shash_desc *desc)
+{
+	struct cbcmac_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	memset(ctx->dg, 0, AES_BLOCK_SIZE);
+	ctx->len = 0;
+
+	return 0;
+}
+
+static int aesti_cbcmac_update(struct shash_desc *desc, const u8 *p,
+			       unsigned int len)
+{
+	struct aes_ti_ctx *tctx = crypto_shash_ctx(desc->tfm);
+	struct cbcmac_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	while (len--) {
+		ctx->dg[ctx->len++] ^= *p++;
+
+		if (ctx->len == AES_BLOCK_SIZE) {
+			aesti_encrypt(tctx, ctx->dg, ctx->dg);
+			ctx->len = 0;
+		}
+	}
+
+	return 0;
+}
+
+static int aesti_cbcmac_final(struct shash_desc *desc, u8 *out)
+{
+	struct aes_ti_ctx *tctx = crypto_shash_ctx(desc->tfm);
+	struct cbcmac_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	if (ctx->len)
+		aesti_encrypt(tctx, out, ctx->dg);
+	else
+		memcpy(out, ctx->dg, AES_BLOCK_SIZE);
+
+	return 0;
+}
+
+static struct shash_alg cbcmac_alg = {
+	.base.cra_name		= "cbcmac(aes)",
+	.base.cra_driver_name	= "cbcmac-aes-ti",
+	.base.cra_priority	= 100 + 1,
+	.base.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct aes_ti_ctx),
+	.base.cra_module	= THIS_MODULE,
+
+	.digestsize		= AES_BLOCK_SIZE,
+	.init			= aesti_cbcmac_init,
+	.update			= aesti_cbcmac_update,
+	.final			= aesti_cbcmac_final,
+	.setkey			= aesti_cbcmac_setkey,
+	.descsize		= sizeof(struct cbcmac_desc_ctx),
+};
+
+static int __init aes_init(void)
+{
+	int err;
+
+	err = crypto_register_skcipher(&ctr_alg);
+	if (err)
+		return err;
+
+	err = crypto_register_shash(&cbcmac_alg);
+	if (err)
+		crypto_unregister_skcipher(&ctr_alg);
+	return err;
+}
+
+static void __exit aes_fini(void)
+{
+	crypto_unregister_shash(&cbcmac_alg);
+	crypto_unregister_skcipher(&ctr_alg);
+}
+
+module_init(aes_init);
+module_exit(aes_fini);
+
+MODULE_DESCRIPTION("Generic fixed time AES transform in CTR and CBC-MAC modes");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("cbcmac(aes)");
+MODULE_ALIAS_CRYPTO("ctr(aes)");
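
As a first illustration, here is a minimal sketch of how kernel-side
client code might drive the new CTR transform through the crypto API.
It is not part of the patch: the helper name test_ctr_aes_ti and the
all-zero key/IV/data are placeholders, and the code assumes a
synchronous implementation (which this generic driver is), so no
completion callback is wired up.

#include <crypto/aes.h>
#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

static int test_ctr_aes_ti(void)
{
	u8 key[AES_MIN_KEY_SIZE] = { 0 };	/* placeholder AES-128 key */
	u8 iv[AES_BLOCK_SIZE] = { 0 };		/* placeholder counter block */
	u8 buf[2 * AES_BLOCK_SIZE] = { 0 };	/* data, encrypted in place */
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	int err;

	/* request this driver explicitly via its cra_driver_name */
	tfm = crypto_alloc_skcipher("ctr-aes-ti", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, sizeof(key));
	if (err)
		goto out_free_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	sg_init_one(&sg, buf, sizeof(buf));
	skcipher_request_set_callback(req, 0, NULL, NULL);
	skcipher_request_set_crypt(req, &sg, &sg, sizeof(buf), iv);

	/* in CTR mode, encryption and decryption are the same operation */
	err = crypto_skcipher_encrypt(req);

	skcipher_request_free(req);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}

Requesting "ctr-aes-ti" pins this implementation; a request for
"ctr(aes)" would select whichever provider has the highest priority,
which is the point of the modest "100 + 1" cra_priority above.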
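
The CBC-MAC transform can be exercised through the shash interface in a
similar way. Again a hedged sketch rather than part of the patch;
test_cbcmac_aes_ti is a hypothetical helper:

#include <crypto/aes.h>
#include <crypto/hash.h>
#include <linux/err.h>

static int test_cbcmac_aes_ti(const u8 *key, unsigned int key_len,
			      const u8 *msg, unsigned int len,
			      u8 mac[AES_BLOCK_SIZE])
{
	struct crypto_shash *tfm;
	int err;

	tfm = crypto_alloc_shash("cbcmac-aes-ti", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_setkey(tfm, key, key_len);
	if (!err) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		desc->flags = 0;

		/* init + update + final in a single call */
		err = crypto_shash_digest(desc, msg, len, mac);
	}

	crypto_free_shash(tfm);
	return err;
}

Keep in mind that raw CBC-MAC is only secure for fixed-length messages;
in practice this transform is meant to be consumed by the CCM template
rather than used directly.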
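
Finally, the packed mul_by_x() above computes four GF(2^8)
multiplications by 'x' in a single 32-bit operation. Its correctness
can be verified exhaustively against the byte-wise xtime() definition
from FIPS-197 with a standalone userspace program (a sanity-check
sketch, independent of the kernel code):

#include <stdint.h>
#include <stdio.h>

/* reference: multiply one byte by 'x', reducing modulo x^8+x^4+x^3+x+1 */
static uint8_t xtime(uint8_t b)
{
	return (b << 1) ^ ((b & 0x80) ? 0x1b : 0x00);
}

/* packed variant from the patch: four byte lanes at once */
static uint32_t mul_by_x(uint32_t w)
{
	return ((w & 0x80808080) >> 7) * 0x1b ^ ((w & 0x7f7f7f7f) << 1);
}

int main(void)
{
	unsigned int b;

	for (b = 0; b < 256; b++) {
		uint32_t w = b * 0x01010101u;	/* same byte in all lanes */
		uint32_t expect = xtime(b) * 0x01010101u;

		if (mul_by_x(w) != expect) {
			printf("mismatch at 0x%02x\n", b);
			return 1;
		}
	}
	printf("mul_by_x matches xtime for all 256 byte values\n");
	return 0;
}

Checking each byte value once is exhaustive because the lanes never
interact: the & 0x7f7f7f7f mask stops the left shift from carrying a
bit into the neighbouring lane, and multiplying the 0x01010101-style
mask by 0x1b keeps each product within its own lane.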