diff mbox series

[RFT] crypto: aes-generic - turn off -ftree-pre and -ftree-sra

Message ID 20171220205213.1025257-1-arnd@arndb.de
State New

Commit Message

Arnd Bergmann Dec. 20, 2017, 8:52 p.m. UTC
While testing other changes, I discovered that gcc-7.2.1 produces badly
optimized code for aes_encrypt/aes_decrypt. This is especially true when
CONFIG_UBSAN_SANITIZE_ALL is enabled, where it leads to extremely
large stack usage that in turn might cause kernel stack overflows:

crypto/aes_generic.c: In function 'aes_encrypt':
crypto/aes_generic.c:1371:1: warning: the frame size of 4880 bytes is larger than 2048 bytes [-Wframe-larger-than=]
crypto/aes_generic.c: In function 'aes_decrypt':
crypto/aes_generic.c:1441:1: warning: the frame size of 4864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

I verified that this problem exists on all architectures that are
supported by gcc-7.2, though arm64 in particular is less affected than
the others. I also found that gcc-7.1 and gcc-8 do not show the extreme
stack usage but still produce worse code than earlier versions for this
file, apparently because of optimization passes that generally provide
a substantial improvement in object code quality but understandably fail
to find any shortcuts in the AES algorithm.

Turning off the tree-pre and tree-sra optimization steps seems to
reverse the effect, and could be used as a workaround in case we
don't get a good gcc fix.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
Cc: Richard Biener <rguenther@suse.de>
Cc: Jakub Jelinek <jakub@gcc.gnu.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>

---
Jakub and Richard have done a more detailed analysis of this, and are
working on ways to improve the code again. In the meantime, I'm sending
this workaround to the Linux crypto maintainers to make them aware of
this issue and for testing.

What would be a good way to test the performance of the AES code with
the various combinations of compiler versions, as well as UBSAN and this
patch enabled or disabled?
---
 crypto/aes_generic.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

-- 
2.9.0

Comments

Arnd Bergmann Dec. 20, 2017, 9:31 p.m. UTC | #1
On Wed, Dec 20, 2017 at 10:14 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 20 December 2017 at 20:52, Arnd Bergmann <arnd@arndb.de> wrote:
>
> You can use the tcrypt.ko module to benchmark AES.
>
> modprobe tcrypt mode=200 sec=1

Ok, that's what I was looking for. I don't think I'll have time to
analyze this before my Christmas break (I'm only here one more day, and
have not set up a test environment for this).

> to run a (lengthy) AES benchmark in various modes. AES-128 in ECB mode
> using the largest block size tested is what I usually use for
> comparison.
>
> On my Cortex-A57, the generic AES code runs at ~18 cycles per byte.
> Note that we have alternative scalar implementations on ARM and arm64
> that are faster, so the performance of aes-generic is not really
> relevant (and so it is essentially dead code)

Ok. arm64 is also the least affected by this problem out of all
architectures, but most other architectures don't have an optimized
implementation.

      Arnd
Jakub Jelinek Dec. 20, 2017, 9:46 p.m. UTC | #2
On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote:
> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
> index ca554d57d01e..35f973ba9878 100644
> --- a/crypto/aes_generic.c
> +++ b/crypto/aes_generic.c
> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key);
>  	f_rl(bo, bi, 3, k);	\
>  } while (0)
>
> +#if __GNUC__ >= 7
> +/*
> + * Newer compilers try to optimize integer arithmetic more aggressively,
> + * which generally improves code quality a lot, but in this specific case
> + * ends up hurting more than it helps, in some configurations drastically
> + * so. This turns off two optimization steps that have been shown to
> + * lead to rather badly optimized code with gcc-7.
> + *
> + * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
> + */
> +#pragma GCC optimize("-fno-tree-pre")
> +#pragma GCC optimize("-fno-tree-sra")

So do it only when UBSAN is enabled?  GCC doesn't have a particular
predefined macro for those (only for asan and tsan), but either the
kernel does have something already, or could have something added in the
corresponding Makefile.

	Jakub
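
For reference, kbuild already supports per-object sanitizer opt-outs; a hypothetical crypto/Makefile line implementing Jakub's suggestion might look like this (a sketch, not part of the posted patch):

```make
# Build aes_generic.o without UBSAN instrumentation even when
# CONFIG_UBSAN_SANITIZE_ALL is enabled for the rest of the tree.
UBSAN_SANITIZE_aes_generic.o := n
```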
Arnd Bergmann Dec. 21, 2017, 10:20 a.m. UTC | #3
On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote:
>> diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
>> index ca554d57d01e..35f973ba9878 100644
>> --- a/crypto/aes_generic.c
>> +++ b/crypto/aes_generic.c
>> @@ -1331,6 +1331,20 @@ EXPORT_SYMBOL_GPL(crypto_aes_set_key);
>>       f_rl(bo, bi, 3, k);     \
>>  } while (0)
>>
>> +#if __GNUC__ >= 7
>> +[...]
>> +#pragma GCC optimize("-fno-tree-pre")
>> +#pragma GCC optimize("-fno-tree-sra")
>
> So do it only when UBSAN is enabled?  GCC doesn't have a particular
> predefined macro for those (only for asan and tsan), but either the kernel
> does have something already, or could have something added in the
> corresponding Makefile.

My original interpretation of the resulting object code suggested that
disabling those two optimizations produced better results for this
particular file even without UBSAN, on both gcc-7 and gcc-8 (but not
gcc-6), so my patch might have been better. I have now done the
measurements that Ard suggested, showing cycles/byte for AES256/CBC
with 8KB blocks:

               default      ubsan         patched        patched+ubsan
gcc-4.3.6        14.9        ----           14.9         ----
gcc-4.6.4        15.0        ----           15.8         ----
gcc-4.9.4        15.5        20.7           15.9         20.9
gcc-5.5.0        15.6        47.3           86.4         48.8
gcc-6.3.1        14.6        49.4           94.3         50.9
gcc-7.1.1        13.5        54.6           15.2         52.0
gcc-7.2.1        16.8       124.7           92.0         52.2
gcc-8.0.0        15.0      no boot          15.3        no boot

I checked that there are actually three significant digits on the
measurements; detailed output is available at https://pastebin.com/eFsWYjQp

It seems that I was wrong in my interpretation that disabling the
optimizations would be a win on gcc-7 and higher: it almost always makes
things worse, even with UBSAN. Making the check
"#if __GNUC__ == 7 && IS_ENABLED(CONFIG_UBSAN_SANITIZE_ALL)"
would help here, or we could list the file as an exception for
UBSAN and never sanitize it.

Looking at the 'default' column, I wonder if anyone would be interested
in looking at why the throughput regressed with gcc-7.2 and gcc-8.

       Arnd
PrasannaKumar Muralidharan Dec. 21, 2017, 1:47 p.m. UTC | #4
Hi Ard,

On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
>> On Wed, Dec 20, 2017 at 10:46 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>>> On Wed, Dec 20, 2017 at 09:52:05PM +0100, Arnd Bergmann wrote:
>>>> [patch quoted above]
>>>
>>> So do it only when UBSAN is enabled? [...]
>>
>> [benchmark results quoted above]
>>
>> It seems that I was wrong about the interpretation that disabling
>> the optimization would be a win on gcc-7 and higher, [...]
>
> Thanks for the elaborate benchmarks. Looking at the bugzilla entry, it
> appears the UBSAN code inserts runtime checks to ensure that certain
> u8 variables don't assume values <0 or >255, which seems like a rather
> pointless exercise to me. But even if it didn't, I think it is
> justified to disable UBSAN on all of the low-level cipher
> implementations, given that they are hardly ever modified, and
> typically don't suffer from the issues UBSAN tries to identify.
>
> So my vote is to disable UBSAN for all such cipher implementations:
> aes_generic, but also aes_ti, which has a similar 256 byte lookup
> table [although it does not seem to be affected by the same issue as
> aes_generic], and possibly others as well.
>
> Perhaps it makes sense to move core cipher code into a separate
> sub-directory, and disable UBSAN at the directory level?
>
> It would involve the following files:
>
> crypto/aes_generic.c
> crypto/aes_ti.c
> crypto/anubis.c
> crypto/arc4.c
> crypto/blowfish_generic.c
> crypto/camellia_generic.c
> crypto/cast5_generic.c
> crypto/cast6_generic.c
> crypto/des_generic.c
> crypto/fcrypt.c
> crypto/khazad.c
> crypto/seed.c
> crypto/serpent_generic.c
> crypto/tea.c
> crypto/twofish_generic.c

As *SAN is enabled only on developer setups, is such a change required?
It looks like I am missing something here. Can you explain what value it
provides?

Regards,
PrasannaKumar
Ard Biesheuvel Dec. 22, 2017, 3:47 p.m. UTC | #5
On 21 December 2017 at 13:47, PrasannaKumar Muralidharan
<prasannatsmkumar@gmail.com> wrote:
> Hi Ard,
>
> On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
>>> [earlier discussion and patch quoted above]
>>
>> So my vote is to disable UBSAN for all such cipher implementations:
>> aes_generic, but also aes_ti, which has a similar 256 byte lookup
>> table [although it does not seem to be affected by the same issue as
>> aes_generic], and possibly others as well.
>>
>> Perhaps it makes sense to move core cipher code into a separate
>> sub-directory, and disable UBSAN at the directory level?
>>
>> [file list quoted above]
>
> As *SAN is enabled only on developer setup, is such a change required?
> Looks like I am missing something here. Can you explain what value it
> provides?

Well, in this particular case, the value it provides is that the
kernel can still boot and invoke the AES code without overflowing the
kernel stack. Of course, this is a compiler issue that hopefully gets
fixed, but I think it may be reasonable to exclude some C code from
UBSAN by default.
Arnd Bergmann Jan. 3, 2018, 4:37 p.m. UTC | #6
On Fri, Dec 22, 2017 at 4:47 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 21 December 2017 at 13:47, PrasannaKumar Muralidharan <prasannatsmkumar@gmail.com> wrote:
>> On 21 December 2017 at 17:52, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>>> On 21 December 2017 at 10:20, Arnd Bergmann <arnd@arndb.de> wrote:
>>>
>>> [proposal to disable UBSAN for the generic cipher code, quoted above]
>>
>> As *SAN is enabled only on developer setup, is such a change required?
>> Looks like I am missing something here. Can you explain what value it
>> provides?
>
> Well, in this particular case, the value it provides is that the
> kernel can still boot and invoke the AES code without overflowing the
> kernel stack. Of course, this is a compiler issue that hopefully gets
> fixed, but I think it may be reasonable to exclude some C code from
> UBSAN by default.

Any idea how to proceed here? I've retested with the latest gcc snapshot
and verified that the problem is still there. No idea what the chance of
getting it fixed before the 7.3 release is. From the performance tests
I've done, the patch I posted is pretty much useless: it causes
significant performance regressions on most other compiler versions.

A minimal patch would be to disable UBSAN specifically for aes-generic.c
for gcc-7.2+ but not gcc-8, to avoid the potential stack overflow. We
could also force building with -Os on gcc-7 and leave UBSAN enabled;
this would improve performance some 3-5% on x86 with gcc-7 (both 7.1
and 7.2.1) and avoid the stack overflow.

For the performance regression in gcc-7.2.1 on this file, I've opened a
separate gcc PR now, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
I've also tested the libressl version of their generic AES code, with
mixed results: it appears to be much slower than the kernel version to
start with, and while it has further performance regressions with recent
compilers, those are with a different set of versions compared to the
kernel implementation, and it does not suffer from the high stack usage.

       Arnd
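
The -Os alternative mentioned above could likewise be expressed per object; a hypothetical sketch using kbuild's cc-ifversion helper (helper name and argument format assumed from the scripts/Kbuild.include of that era, not a posted patch):

```make
# Hypothetical: compile aes_generic.o with -Os only for gcc-7.x,
# leaving UBSAN instrumentation enabled.
CFLAGS_aes_generic.o += $(call cc-ifversion, -ge, 0700, \
                          $(call cc-ifversion, -lt, 0800, -Os))
```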
Ard Biesheuvel Jan. 3, 2018, 5:36 p.m. UTC | #7
On 3 January 2018 at 16:37, Arnd Bergmann <arnd@arndb.de> wrote:
> On Fri, Dec 22, 2017 at 4:47 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> [earlier discussion quoted above]
>
> Any idea how to proceed here? I've retested with the latest gcc snapshot
> and verified that the problem is still there. No idea what the chance of
> getting it fixed before the 7.3 release is. From the performance tests
> I've done, the patch I posted is pretty much useless, it causes significant
> performance regressions on most other compiler versions.
>
> A minimal patch would be to disable UBSAN specifically for aes-generic.c
> for gcc-7.2+ but not gcc-8 to avoid the potential stack overflow. We could
> also force building with -Os on gcc-7, and leave UBSAN enabled,
> this would improve performance some 3-5% on x86 with gcc-7 (both
> 7.1 and 7.2.1) and avoid the stack overflow.

Can't we just disable UBSAN for that file for all GCC versions and be
done with it? It is not a production feature, and that code is unlikely
to change in ways where UBSAN would make a difference anyway, nor is it
ever executed on 99.9% of systems running Linux.

> For the performance regression in gcc-7.2.1 on this file, I've opened
> a separate gcc PR now, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83651
> I've also tested the libressl version of their generic AES code, with
> mixed results (it appears to be much slower than the kernel version
> to start with, and while it has further performance regressions with recent
> compilers, those are with a different set of versions compared to the
> kernel implementation, and it does not suffer from the high stack usage).

Patch

diff --git a/crypto/aes_generic.c b/crypto/aes_generic.c
index ca554d57d01e..35f973ba9878 100644
--- a/crypto/aes_generic.c
+++ b/crypto/aes_generic.c
@@ -1331,6 +1331,20 @@  EXPORT_SYMBOL_GPL(crypto_aes_set_key);
 	f_rl(bo, bi, 3, k);	\
 } while (0)
 
+#if __GNUC__ >= 7
+/*
+ * Newer compilers try to optimize integer arithmetic more aggressively,
+ * which generally improves code quality a lot, but in this specific case
+ * ends up hurting more than it helps, in some configurations drastically
+ * so. This turns off two optimization steps that have been shown to
+ * lead to rather badly optimized code with gcc-7.
+ *
+ * See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83356
+ */
+#pragma GCC optimize("-fno-tree-pre")
+#pragma GCC optimize("-fno-tree-sra")
+#endif
+
 static void aes_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
 {
 	const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);