mbox series

[v3,0/7] crypto: aes - allow generic AES to be omitted

Message ID 1497950940-24243-1-git-send-email-ard.biesheuvel@linaro.org
Headers show
Series crypto: aes - allow generic AES to be omitted | expand

Message

Ard Biesheuvel June 20, 2017, 9:28 a.m. UTC
The generic AES driver uses 16 lookup tables of 1 KB each, and has
encryption and decryption routines that are fully unrolled. Given how
the dependencies between this code and other drivers are declared in
Kconfig files, this code is always pulled into the core kernel, even
if it is usually superseded at runtime by accelerated drivers that
exist for many architectures.

This leaves us with 25 KB of dead code in the kernel, which is negligible
in typical environments, but which is actually a big deal for the IoT
domain, where every kilobyte counts.

Also, the scalar, table based AES routines that exist for ARM, arm64, i586
and x86_64 share the lookup tables with AES generic, and may be invoked
occasionally when the time-invariant AES-NI or other special instruction
drivers are called in interrupt context, at which time the SIMD register
file cannot be used. Pulling 16 KB of code and 9 KB of instructions into
the L1s (and evicting what was already there) when a softirq happens to
be handled in the context of an interrupt taken from kernel mode (which
means no SIMD on x86) is also something that we may like to avoid, by
falling back to a much smaller and moderately less performant driver.
(Note that arm64 will be updated shortly to supply fallbacks for all
SIMD based AES implementations, which will be based on the core routines
[if they are accepted].)

For the reasons above, this series refactors the way the various AES
implementations are wired up, to allow the generic version in
crypto/aes_generic.c to be omitted from the build entirely.

Patch #1 removes some bogus 'select CRYPTO_AES' statement.

Patch #2 introduces CRYPTO_AES_CORE and its implementation crypto/aes_core.c,
which contains the existing key expansion routines, and default encrypt and
decrypt routines that are not exposed as a crypto_cipher themselves, but
can be pulled in by other AES drivers. These routines only depend on the two
256 byte Sboxes

Patch #3 switches the fallback in the AES-NI code to the new, generic encrypt
and decrypt routines so it no longer depends on the x86 scalar code or
[transitively] on AES-generic.

Patch #4 removes the local copy of the AES sboxes from the arm64 NEON driver,
and switches to the ones exposed by the new AES core module instead.

Patch #5 repurposes the CRYPTO_AES Kconfig symbol as an abstract symbol that
indicates whether some implementation of AES needs to be available. The
existing generic code is now controlled by CRYPTO_AES_GENERIC.

Patch #6 updates the Kconfig help text to be more descriptive of what they
actually control, rather than duplicating AES's wikipedia entry a number of
times.

Patch #7 updates the Kconfig logic so CRYPTO_AES_GENERIC can be disabled if
any CRYPTO_AES dependencies are satisfied by the fixed time driver.

v3: - fix big-endian issue in refactored fixed-time AES driver
    - improve Kconfig help texts
    - add patch #4

v2: - repurpose CRYPTO_AES and avoid HAVE_AES/NEED_AES Kconfig symbols
    - don't factor out tables from AES generic to be reused by per arch drivers,
      since the space saving is moderate (the generic code only), and the
      drivers weren't made to be small anyway


Ard Biesheuvel (7):
  drivers/crypto/Kconfig: drop bogus CRYPTO_AES dependencies
  crypto: aes - refactor shared routines into separate core module
  crypto: x86/aes-ni - switch to generic fallback
  crypto: arm64/aes-neon - reuse Sboxes from AES core module
  crypto: aes - repurpose CRYPTO_AES and introduce CRYPTO_AES_GENERIC
  crypto: aes - add meaningful help text to the various AES drivers
  crypto: aes - allow generic AES to be replaced by fixed time AES

 arch/arm/crypto/Kconfig            |  20 +-
 arch/arm64/crypto/Kconfig          |  34 +-
 arch/arm64/crypto/aes-neon.S       |  74 +----
 arch/x86/crypto/aesni-intel_glue.c |   4 +-
 crypto/Kconfig                     | 161 ++++------
 crypto/Makefile                    |   3 +-
 crypto/aes_core.c                  | 333 ++++++++++++++++++++
 crypto/aes_generic.c               | 178 -----------
 crypto/aes_ti.c                    | 315 ++----------------
 drivers/crypto/Kconfig             |  13 +-
 include/crypto/aes.h               |   6 +
 net/sunrpc/Kconfig                 |   3 +-
 12 files changed, 486 insertions(+), 658 deletions(-)
 create mode 100644 crypto/aes_core.c

-- 
2.7.4

Comments

Herbert Xu July 18, 2017, 5:25 a.m. UTC | #1
On Tue, Jun 20, 2017 at 11:28:53AM +0200, Ard Biesheuvel wrote:
> The generic AES driver uses 16 lookup tables of 1 KB each, and has

> encryption and decryption routines that are fully unrolled. Given how

> the dependencies between this code and other drivers are declared in

> Kconfig files, this code is always pulled into the core kernel, even

> if it is usually superseded at runtime by accelerated drivers that

> exist for many architectures.


Why can't we simply replace aes-generic with aes-ti?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Ard Biesheuvel July 18, 2017, 6:32 a.m. UTC | #2
On 18 July 2017 at 06:25, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Jun 20, 2017 at 11:28:53AM +0200, Ard Biesheuvel wrote:

>> The generic AES driver uses 16 lookup tables of 1 KB each, and has

>> encryption and decryption routines that are fully unrolled. Given how

>> the dependencies between this code and other drivers are declared in

>> Kconfig files, this code is always pulled into the core kernel, even

>> if it is usually superseded at runtime by accelerated drivers that

>> exist for many architectures.

>

> Why can't we simply replace aes-generic with aes-ti?

>


Because it is slower, and how much slower is architecture dependent
(if your arch has slow multiplication, aes-ti decryption will be dog
slow compared to aes-generic)

Also, quite a few architectures have table based implementations that
reuse crypto_ft_tab/crypto_fl_tab etc so we'd need to factor out those
into a separate module if we were to remove aes-generic.
Herbert Xu July 18, 2017, 7:18 a.m. UTC | #3
On Tue, Jul 18, 2017 at 07:32:41AM +0100, Ard Biesheuvel wrote:
>

> Because it is slower, and how much slower is architecture dependent

> (if your arch has slow multiplication, aes-ti decryption will be dog

> slow compared to aes-generic)


Right, but does anybody actually care? My guess is that on such a
platform aes-generic is going to be dog-slow anyway.

> Also, quite a few architectures have table based implementations that

> reuse crypto_ft_tab/crypto_fl_tab etc so we'd need to factor out those

> into a separate module if we were to remove aes-generic.


You mean x86 and arm, right? Isn't the idea of your patch-set to allow
aes-generic to be disabled on x86/arm, no?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Ard Biesheuvel July 18, 2017, 7:57 a.m. UTC | #4
On 18 July 2017 at 08:18, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Jul 18, 2017 at 07:32:41AM +0100, Ard Biesheuvel wrote:

>>

>> Because it is slower, and how much slower is architecture dependent

>> (if your arch has slow multiplication, aes-ti decryption will be dog

>> slow compared to aes-generic)

>

> Right, but does anybody actually care? My guess is that on such a

> platform aes-generic is going to be dog-slow anyway.

>


Could be, yes, but it is not trivial to find out.

>> Also, quite a few architectures have table based implementations that

>> reuse crypto_ft_tab/crypto_fl_tab etc so we'd need to factor out those

>> into a separate module if we were to remove aes-generic.

>

> You mean x86 and arm, right? Isn't the idea of your patch-set to allow

> aes-generic to be disabled on x86/arm, no?

>


Yes. The original aes-ti implemented encryption only, to provide
AES-CBCMAC, CMAC etc, i.e., algorithms that rely on encryption only,
and cannot be parallelized, the use case being 32-bit ARM, which has a
fast, NEON-based, time invariant implementation for parallel modes,
but had to fall back to the table based AES for the MAC part of CCM
etc. As suggested by Eric, AES-TI was converted into a full cipher,
allowing it to replace AES generic entirely. (I have documented the
rationale more elaborately here:
http://www.workofard.com/2017/02/time-invariant-aes/)

So if you care about security and/or the cache/memory footprint more
than about speed, you can disable the table based implementations that
exist for i586, x86, ARM and arm64 (all of which have faster and time
invariant implementations based on SIMD or special instructions
anyway, so for 95% of the cases, it does not really matter).

But I'd be happy to rework the code so that the small time invariant
routines implement the core AES code, fulfilling the existing
dependencies on CRYPTO_AES. Then, we could still provide AES generic
on top of that, or only expose the tables (I don't think we should
remove the x86/arm table based drivers altogether). This fits well
with my non-SIMD fallback changes for arm64, which invoke
crypto_aes_encrypt() directly (rather than going via
CRYPTO_ALG_NEED_FALLBACK resolution, which itself may produce another
SIMD based algo which requires a non-SIMD fallback itself)
Herbert Xu July 18, 2017, 8:30 a.m. UTC | #5
On Tue, Jul 18, 2017 at 08:57:28AM +0100, Ard Biesheuvel wrote:
>

> So if you care about security and/or the cache/memory footprint more

> than about speed, you can disable the table based implementations that

> exist for i586, x86, ARM and arm64 (all of which have faster and time

> invariant implementations based on SIMD or special instructions

> anyway, so for 95% of the cases, it does not really matter).


The thing is that anybody who cares about speed won't be using
aes-generic anyway.  We have way too many AES implementations
as it is, and having two C implementations is really getting
silly.

So would it be possible for you to proceed with your work in
such a way that we end up with just aes-ti as the generic C
implementation?

As for the table-based asm implementations yes they can stay and
work out some way of sharing that table at the source-code level.
At run-time the table can just go into the asm module directly
since you'd only have one on each platform, right?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Ard Biesheuvel July 18, 2017, 8:35 a.m. UTC | #6
On 18 July 2017 at 09:30, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Jul 18, 2017 at 08:57:28AM +0100, Ard Biesheuvel wrote:

>>

>> So if you care about security and/or the cache/memory footprint more

>> than about speed, you can disable the table based implementations that

>> exist for i586, x86, ARM and arm64 (all of which have faster and time

>> invariant implementations based on SIMD or special instructions

>> anyway, so for 95% of the cases, it does not really matter).

>

> The thing is that anybody who cares about speed won't be using

> aes-generic anyway.  We have way too many AES implementations

> as it is, and having two C implementations is really getting

> silly.

>

> So would it be possible for you to proceed with your work in

> such a way that we end up with just aes-ti as the generic C

> implementation?

>


Sure.

> As for the table-based asm implementations yes they can stay and

> work out some way of sharing that table at the source-code level.

> At run-time the table can just go into the asm module directly

> since you'd only have one on each platform, right?

>


Indeed. And ARM only uses 4 of those 16 tables anyway (and really only
needs two of them, so I will fix that as well)