From patchwork Wed Jan 3 22:39:27 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Arnd Bergmann
X-Patchwork-Id: 123363
From: Arnd Bergmann
To: Herbert Xu, "David S. Miller"
Cc: Arnd Bergmann, Richard Biener, Jakub Jelinek, Ard Biesheuvel,
 linux-crypto@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] [v2] crypto: aes-generic - build with -Os on gcc-7+
Date: Wed, 3 Jan 2018 23:39:27 +0100
Message-Id: <20180103223940.3715372-1-arnd@arndb.de>
X-Mailer: git-send-email 2.9.0
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

While testing other changes, I discovered that gcc-7.2.1 produces badly
optimized code for aes_encrypt/aes_decrypt.
This is especially true when CONFIG_UBSAN_SANITIZE_ALL is enabled, where
it leads to extremely large stack usage that in turn might cause kernel
stack overflows:

  crypto/aes_generic.c: In function 'aes_encrypt':
  crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
  crypto/aes_generic.c: In function 'aes_decrypt':
  crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

I verified that this problem exists on all architectures that are
supported by gcc-7.2, though arm64 in particular is less affected than
the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
stack usage but still produce worse code than earlier versions for this
file, apparently because of optimization passes that generally provide
a substantial improvement in object code quality but understandably fail
to find any shortcuts in the AES algorithm.

Possible workarounds include:

a) Disabling the -ftree-pre and -ftree-sra optimizations. This was an
   earlier patch I tried, which reliably fixed the stack usage but
   caused a serious performance regression in some versions, as later
   testing found.

b) Disabling UBSAN on this file or all ciphers, as suggested by Ard
   Biesheuvel. This would lead to massively better crypto performance in
   UBSAN-enabled kernels and avoid the stack usage, but there is a
   concern over whether we should exclude arbitrary files from UBSAN at
   all.

c) Forcing the optimization level in a different way. Similar to a),
   but rather than deselecting specific optimization stages, this now
   uses "gcc -Os" for this file, regardless of the
   CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE/SIZE option.
   This is a reliable workaround for the stack consumption on all
   architectures, and I've now retested the performance on x86. Results
   are in cycles/byte (lower is better) for cbc(aes-generic) with
   256-bit keys:

                -O2     -Os
   gcc-6.3.1    14.9    15.1
   gcc-7.0.1    14.7    15.3
   gcc-7.1.1    15.3    14.7
   gcc-7.2.1    16.8    15.9
   gcc-8.0.0    15.5    15.6

This implements option c) by forcing -Os on all compiler versions
starting with gcc-7.1. As a workaround for PR83356, it would only be
needed for gcc-7.2+ with UBSAN enabled, but since it also shows better
performance on gcc-7.1 without UBSAN, it seems appropriate to use the
faster version here as well.

Side note: during testing, I also played with the AES code in libressl,
which had a similar performance regression from gcc-6 to gcc-7.2, but
was three times slower overall. It might be interesting to investigate
that further and possibly port the Linux implementation into that.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
Cc: Richard Biener
Cc: Jakub Jelinek
Cc: Ard Biesheuvel
Signed-off-by: Arnd Bergmann
---
 crypto/Makefile | 1 +
 1 file changed, 1 insertion(+)

-- 
2.9.0

diff --git a/crypto/Makefile b/crypto/Makefile
index d674884b2d51..daa69360e054 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -99,6 +99,7 @@ obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
 CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
+CFLAGS_aes_generic.o := $(call cc-ifversion, -ge, 0701, -Os) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
 obj-$(CONFIG_CRYPTO_AES_TI) += aes_ti.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
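
For comparison, workarounds a) and b) from the description could be
sketched as per-object kbuild settings in crypto/Makefile. This is an
illustration only and does not reproduce the earlier patch: the flag
spellings (-fno-tree-pre, -fno-tree-sra) and the placement of the lines
are assumptions. Each setting is an alternative to the
CFLAGS_aes_generic.o line in the diff, not something to combine with it.

```make
# Option a), assumed form: switch off the two GCC passes for this object
# only. cc-option drops a flag the compiler in use does not accept.
CFLAGS_aes_generic.o := $(call cc-option,-fno-tree-pre) $(call cc-option,-fno-tree-sra)

# Option b), assumed form: exclude this one object from UBSAN
# instrumentation even when CONFIG_UBSAN_SANITIZE_ALL=y.
UBSAN_SANITIZE_aes_generic.o := n
```

Per the description, a) fixed the stack usage but caused a serious
performance regression in some versions, and b) raises the question of
whether arbitrary files should be excluded from UBSAN at all, which is
why the patch forces -Os instead.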