From patchwork Thu Jan 18 20:59:35 2018
X-Patchwork-Submitter: Sasha Levin
X-Patchwork-Id: 125023
From: Sasha Levin
To: "stable@vger.kernel.org", "stable-commits@vger.kernel.org"
CC: Arnd Bergmann, Herbert Xu, Sasha Levin
Subject: [added to the 4.1 stable tree] crypto: improve gcc optimization flags for serpent and wp512
Date: Thu, 18 Jan 2018 20:59:35 +0000
Message-ID: <20180118205908.3220-23-alexander.levin@microsoft.com>
References: <20180118205908.3220-1-alexander.levin@microsoft.com>
In-Reply-To: <20180118205908.3220-1-alexander.levin@microsoft.com>
Sender: stable-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: stable@vger.kernel.org

From: Arnd Bergmann

This patch has been added to the stable tree. If you have any
objections, please let us know.

--
2.11.0

===============

[ Upstream commit 7d6e9105026788c497f0ab32fa16c82f4ab5ff61 ]

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frame size, up to 1500 bytes for what should be
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and
powerpc have a partial fix for this problem, by enabling
"-fsched-pressure", but even with that fix they suffer from the issue
to a certain degree. Some testing on arm64 shows that the time needed
to hash a given amount of data is roughly proportional to the stack
frame size here, which makes sense given that the wp512 implementation
is doing lots of loads for table lookups, and the problem with the
overly large stack is a result of doing a lot more loads and stores
for spilled registers (as seen from inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512.
In my collection of cross-compilers, the results are consistently
better or identical when comparing the stack sizes in this function,
though some architectures (notably x86) have schedule-insns disabled
by default.

The four columns are:
default: -O2
press:   -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -fno-schedule-insns (disables sched-pressure)

                             default   press nopress nosched
alpha-linux-gcc-4.9.3           1136     848    1136     176
am33_2.0-linux-gcc-4.9.3        2100    2076    2100    2104
arm-linux-gnueabi-gcc-4.9.3      848     848    1048     352
cris-linux-gcc-4.9.3             272     272     272     272
frv-linux-gcc-4.9.3             1128    1000    1128     280
hppa64-linux-gcc-4.9.3          1128     336    1128     184
hppa-linux-gcc-4.9.3             644     308     644     276
i386-linux-gcc-4.9.3             352     352     352     352
m32r-linux-gcc-4.9.3             720     656     720     268
microblaze-linux-gcc-4.9.3      1108     604    1108     256
mips64-linux-gcc-4.9.3          1328     592    1328     208
mips-linux-gcc-4.9.3            1096     624    1096     240
powerpc64-linux-gcc-4.9.3       1088     432    1088     160
powerpc-linux-gcc-4.9.3         1080     584    1080     224
s390-linux-gcc-4.9.3             456     456     624     360
sh3-linux-gcc-4.9.3              292     292     292     292
sparc64-linux-gcc-4.9.3          992     240     992     208
sparc-linux-gcc-4.9.3            680     592     680     312
x86_64-linux-gcc-4.9.3           224     240     272     224
xtensa-linux-gcc-4.9.3          1152     704    1152     304

aarch64-linux-gcc-7.0.0          224     224    1104     208
arm-linux-gnueabi-gcc-7.0.1      824     824    1048     352
mips-linux-gcc-7.0.0            1120     648    1120     272
x86_64-linux-gcc-7.0.1           240     240     304     240

arm-linux-gnueabi-gcc-4.4.7      840       -       -     392
arm-linux-gnueabi-gcc-4.5.4      784     728     784     320
arm-linux-gnueabi-gcc-4.6.4      736     728     736     304
arm-linux-gnueabi-gcc-4.7.4      944     784     944     352
arm-linux-gnueabi-gcc-4.8.5      464     464     760     352
arm-linux-gnueabi-gcc-4.9.3      848     848    1048     352
arm-linux-gnueabi-gcc-5.3.1      824     824    1064     336
arm-linux-gnueabi-gcc-6.1.1      808     808    1056     344
arm-linux-gnueabi-gcc-7.0.1      824     824    1048     352

Trying the same test for serpent-generic, the picture is a bit
different, and while -fno-schedule-insns is generally better here than
the default, -fsched-pressure wins overall, so I picked that instead.

                             default   press nopress nosched
alpha-linux-gcc-4.9.3           1392     864    1392     960
am33_2.0-linux-gcc-4.9.3         536     524     536     528
arm-linux-gnueabi-gcc-4.9.3      552     552     776     536
cris-linux-gcc-4.9.3             528     528     528     528
frv-linux-gcc-4.9.3              536     400     536     504
hppa64-linux-gcc-4.9.3           524     208     524     480
hppa-linux-gcc-4.9.3             768     472     768     508
i386-linux-gcc-4.9.3             564     564     564     564
m32r-linux-gcc-4.9.3             712     576     712     532
microblaze-linux-gcc-4.9.3       724     392     724     512
mips64-linux-gcc-4.9.3           720     384     720     496
mips-linux-gcc-4.9.3             728     384     728     496
powerpc64-linux-gcc-4.9.3        704     304     704     480
powerpc-linux-gcc-4.9.3          704     296     704     480
s390-linux-gcc-4.9.3             560     560     592     536
sh3-linux-gcc-4.9.3              540     540     540     540
sparc64-linux-gcc-4.9.3          544     352     544     496
sparc-linux-gcc-4.9.3            544     344     544     496
x86_64-linux-gcc-4.9.3           528     536     576     528
xtensa-linux-gcc-4.9.3           752     544     752     544

aarch64-linux-gcc-7.0.0          432     432     656     480
arm-linux-gnueabi-gcc-7.0.1      616     616     808     536
mips-linux-gcc-7.0.0             720     464     720     488
x86_64-linux-gcc-7.0.1           536     528     600     536

arm-linux-gnueabi-gcc-4.4.7      592       -       -     440
arm-linux-gnueabi-gcc-4.5.4      776     448     776     544
arm-linux-gnueabi-gcc-4.6.4      776     448     776     544
arm-linux-gnueabi-gcc-4.7.4      768     448     768     544
arm-linux-gnueabi-gcc-4.8.5      488     488     776     544
arm-linux-gnueabi-gcc-4.9.3      552     552     776     536
arm-linux-gnueabi-gcc-5.3.1      552     552     776     536
arm-linux-gnueabi-gcc-6.1.1      560     560     776     536
arm-linux-gnueabi-gcc-7.0.1      616     616     808     536

I did not do any runtime tests with serpent, so it is possible that
stack frame size does not directly correlate with runtime performance
here and it actually makes things worse, but it's more likely to help
here, and the reduced stack frame size is probably enough reason to
apply the patch, especially given that the crypto code is often used
in deep call chains.
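
As a rough cross-check of the "-fsched-pressure wins overall" reading for serpent, the gcc-4.9.3 rows of the serpent table can be tallied mechanically. This is only a sketch: the numbers are copied from the table above, while the names `SERPENT_GCC_4_9_3` and `tally` are made up for illustration and are not part of the patch.

```python
# Serpent frame sizes in bytes, copied from the gcc-4.9.3 rows above.
# Columns: toolchain, default, press, nopress, nosched.
SERPENT_GCC_4_9_3 = """\
alpha-linux-gcc-4.9.3 1392 864 1392 960
am33_2.0-linux-gcc-4.9.3 536 524 536 528
arm-linux-gnueabi-gcc-4.9.3 552 552 776 536
cris-linux-gcc-4.9.3 528 528 528 528
frv-linux-gcc-4.9.3 536 400 536 504
hppa64-linux-gcc-4.9.3 524 208 524 480
hppa-linux-gcc-4.9.3 768 472 768 508
i386-linux-gcc-4.9.3 564 564 564 564
m32r-linux-gcc-4.9.3 712 576 712 532
microblaze-linux-gcc-4.9.3 724 392 724 512
mips64-linux-gcc-4.9.3 720 384 720 496
mips-linux-gcc-4.9.3 728 384 728 496
powerpc64-linux-gcc-4.9.3 704 304 704 480
powerpc-linux-gcc-4.9.3 704 296 704 480
s390-linux-gcc-4.9.3 560 560 592 536
sh3-linux-gcc-4.9.3 540 540 540 540
sparc64-linux-gcc-4.9.3 544 352 544 496
sparc-linux-gcc-4.9.3 544 344 544 496
x86_64-linux-gcc-4.9.3 528 536 576 528
xtensa-linux-gcc-4.9.3 752 544 752 544
"""


def tally(table):
    """Count rows where 'press' or 'nosched' gives the strictly smaller frame."""
    counts = {"press": 0, "nosched": 0, "tie": 0}
    for line in table.splitlines():
        _name, _default, press, _nopress, nosched = line.split()
        press, nosched = int(press), int(nosched)
        if press < nosched:
            counts["press"] += 1
        elif nosched < press:
            counts["nosched"] += 1
        else:
            counts["tie"] += 1
    return counts


if __name__ == "__main__":
    print(tally(SERPENT_GCC_4_9_3))
```

On these 20 rows, -fsched-pressure gives the smaller frame on 12 toolchains, -fno-schedule-insns on 4, and the two tie on 4. The tally only compares frame sizes; it says nothing about runtime.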

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle
Signed-off-by: Arnd Bergmann
Signed-off-by: Herbert Xu
Signed-off-by: Sasha Levin
---
 crypto/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/crypto/Makefile b/crypto/Makefile
index 97b7d3ac87e7..16766ced6a44 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
+CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
 obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
 obj-$(CONFIG_CRYPTO_ECB) += ecb.o
@@ -68,6 +69,7 @@ obj-$(CONFIG_CRYPTO_BLOWFISH_COMMON) += blowfish_common.o
 obj-$(CONFIG_CRYPTO_TWOFISH) += twofish_generic.o
 obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
+CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
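
Both added Makefile lines wrap the flag in cc-option, so the flag is only passed when the compiler accepts it. The Python below is a rough model of that kind of probe, not kbuild's actual implementation: the function name `cc_option`, its parameters, and the exact probe command are assumptions made for illustration.

```python
import os
import subprocess


def cc_option(flag, fallback="", cc="gcc", run=subprocess.run):
    """Return `flag` if the compiler accepts it, otherwise `fallback`.

    Hypothetical model: compile an empty C input with -Werror so that an
    unsupported option shows up as a non-zero exit status. `run` can be
    replaced so the logic can be exercised without a compiler installed.
    """
    cmd = [cc, "-Werror", flag, "-c", "-x", "c", os.devnull, "-o", os.devnull]
    try:
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except OSError:
        # Compiler missing or not executable: treat the flag as unsupported.
        return fallback
    return flag if result.returncode == 0 else fallback


if __name__ == "__main__":
    for flag in ("-fno-schedule-insns", "-fsched-pressure"):
        print(flag, "->", repr(cc_option(flag)))
```

With a compiler that rejects one of the flags, the corresponding CFLAGS_*.o variable would simply end up empty and the object would be built with the default options.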