Message ID: 20180930085859.15038-3-ard.biesheuvel@linaro.org
State: New
Series: crypto - fix aegis/morus for big endian systems
On 30 September 2018 at 10:58, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Use the correct __le32 annotation and accessors to perform the
> single round of AES encryption performed inside the AEGIS transform.
> Otherwise, tcrypt reports:
>
[snip]
>
> -       d[0] = d0;
> -       d[1] = d1;
> -       d[2] = d2;
> -       d[3] = d3;
> +       d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0]));
> +       d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1]));
> +       d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2]));
> +       d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3]));

I suppose this

> +       d[0] = cpu_to_le32(d0) ^ k[0];
> +       d[1] = cpu_to_le32(d1) ^ k[1];
> +       d[2] = cpu_to_le32(d2) ^ k[2];
> +       d[3] = cpu_to_le32(d3) ^ k[3];

should work fine as well
Hi Ard,

On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> Use the correct __le32 annotation and accessors to perform the
> single round of AES encryption performed inside the AEGIS transform.
> Otherwise, tcrypt reports:
>
> alg: aead: Test 1 failed on encryption for aegis128-generic
> 00000000: 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e
> alg: aead: Test 1 failed on encryption for aegis128l-generic
> 00000000: cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28
> alg: aead: Test 1 failed on encryption for aegis256-generic
> 00000000: aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c

Hm... I think the reason I made a mistake here is that I first had a
version with the AES table hard-coded, with an #ifdef <is big endian>
#else #endif selecting between little-endian and big-endian variants.
Then I realized the aes_generic module exports crypto_ft_tab and
rewrote the code to use that. Somewhere along the way I forgot to
check whether the aes_generic table uses the same trick, and to
correct the code...

It would be nice to apply the same optimization to aes_generic.c, but
unfortunately the current tables are exported, so changing the
convention would break external modules that use them :/

> While at it, let's refer to the first precomputed table only, and
> derive the other ones by rotation. This reduces the D-cache footprint
> by 75%, and shouldn't be too costly, and may even be free, on
> load/store architectures (x86 has its own AES-NI based implementation).

Could you maybe extract this into a separate patch? I don't think we
should mix functional and performance fixes together.

[snip]

Thanks,

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.
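[Editor's note: to make the failure mode under discussion concrete, here is a
minimal userspace sketch (plain C with a hypothetical union, not the kernel
code itself) of why storing a host-endian u32 into a byte-addressed union only
produces the byte order the AEGIS test vectors expect on little-endian hosts:]

#include <stdint.h>
#include <stdio.h>

/* Simplified stand-in for union aegis_block. */
union block {
	uint32_t words32[4];
	uint8_t bytes[16];
};

int main(void)
{
	union block b = { 0 };

	/* Host-endian store, like the old "d[0] = d0;". */
	b.words32[0] = 0x03020100u;

	/*
	 * A little-endian host prints "00 01 02 03", matching the
	 * little-endian word layout the __le32/__le64 union members
	 * promise; a big-endian host prints "03 02 01 00". Hence the
	 * fix: store cpu_to_le32(d0) into a __le32 slot instead.
	 */
	for (int i = 0; i < 4; i++)
		printf("%02x ", b.bytes[i]);
	printf("\n");
	return 0;
}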
On 1 October 2018 at 09:50, Ondrej Mosnacek <omosnace@redhat.com> wrote:
> Hi Ard,
>
> On Sun, Sep 30, 2018 at 10:59 AM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
[snip]
>
> It would be nice to apply the same optimization to aes_generic.c, but
> unfortunately the current tables are exported, so changing the
> convention would break external modules that use them :/

Indeed. I am doing some refactoring work on the AES code, which is how
I ran into this in the first place.

https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=for-kernelci

> Could you maybe extract this into a separate patch? I don't think we
> should mix functional and performance fixes together.

Yeah, good point. I will do that and fold in the simplification.
On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 30 September 2018 at 10:58, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> > Use the correct __le32 annotation and accessors to perform the
> > single round of AES encryption performed inside the AEGIS transform.
> > Otherwise, tcrypt reports:
> >
[snip]
> >
> > +       d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0]));
> > +       d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1]));
> > +       d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2]));
> > +       d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3]));
>
> I suppose this
>
> > +       d[0] = cpu_to_le32(d0) ^ k[0];
> > +       d[1] = cpu_to_le32(d1) ^ k[1];
> > +       d[2] = cpu_to_le32(d2) ^ k[2];
> > +       d[3] = cpu_to_le32(d3) ^ k[3];
>
> should work fine as well

Yeah, that looks nicer, but I'm not sure if it is completely OK to do
bitwise/arithmetic operations directly on the __[lb]e* types... Maybe
yes, but the code I've seen that used them usually seemed to treat
them as opaque types.

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.
On 1 October 2018 at 10:00, Ondrej Mosnacek <omosnace@redhat.com> wrote:
> On Sun, Sep 30, 2018 at 1:14 PM Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
[snip]
>
> Yeah, that looks nicer, but I'm not sure if it is completely OK to do
> bitwise/arithmetic operations directly on the __[lb]e* types... Maybe
> yes, but the code I've seen that used them usually seemed to treat
> them as opaque types.

No, xor is fine with __le/__be types.
On Mon, Oct 1, 2018 at 10:01 AM Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 1 October 2018 at 10:00, Ondrej Mosnacek <omosnace@redhat.com> wrote:
[snip]
> > Yeah, that looks nicer, but I'm not sure if it is completely OK to do
> > bitwise/arithmetic operations directly on the __[lb]e* types... Maybe
> > yes, but the code I've seen that used them usually seemed to treat
> > them as opaque types.
>
> No, xor is fine with __le/__be types

Ah, OK then. Good to know :)

--
Ondrej Mosnacek <omosnace at redhat dot com>
Associate Software Engineer, Security Technologies
Red Hat, Inc.
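[Editor's note: the equivalence settled on above can be checked directly.
Byte swapping is a pure permutation of bytes and XOR acts bitwise, so the swap
commutes with XOR. A small userspace sketch, with a local bswap32() standing
in for the byte swap that cpu_to_le32()/le32_to_cpu() perform on a big-endian
host (on little-endian they are no-ops and both forms are trivially equal):]

#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's swab32(). */
static uint32_t bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

int main(void)
{
	/* Arbitrary test values. */
	uint32_t d0 = 0xdeadbeefu, k0 = 0x01234567u;

	/*
	 * bswap32(a ^ b) == bswap32(a) ^ bswap32(b), and bswap32 is an
	 * involution, so bswap32(d0 ^ bswap32(k0)) == bswap32(d0) ^ k0.
	 * Hence "cpu_to_le32(d0 ^ le32_to_cpu(k[0]))" and
	 * "cpu_to_le32(d0) ^ k[0]" store the same bytes.
	 */
	assert(bswap32(d0 ^ bswap32(k0)) == (bswap32(d0) ^ k0));
	return 0;
}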
diff --git a/crypto/aegis.h b/crypto/aegis.h
index f1c6900ddb80..84d3e07a3c33 100644
--- a/crypto/aegis.h
+++ b/crypto/aegis.h
@@ -21,7 +21,7 @@

 union aegis_block {
 	__le64 words64[AEGIS_BLOCK_SIZE / sizeof(__le64)];
-	u32 words32[AEGIS_BLOCK_SIZE / sizeof(u32)];
+	__le32 words32[AEGIS_BLOCK_SIZE / sizeof(__le32)];
 	u8 bytes[AEGIS_BLOCK_SIZE];
 };

@@ -59,22 +59,19 @@ static void crypto_aegis_aesenc(union aegis_block *dst,
 {
 	u32 *d = dst->words32;
 	const u8 *s = src->bytes;
-	const u32 *k = key->words32;
+	const __le32 *k = key->words32;
 	const u32 *t0 = crypto_ft_tab[0];
-	const u32 *t1 = crypto_ft_tab[1];
-	const u32 *t2 = crypto_ft_tab[2];
-	const u32 *t3 = crypto_ft_tab[3];
 	u32 d0, d1, d2, d3;

-	d0 = t0[s[ 0]] ^ t1[s[ 5]] ^ t2[s[10]] ^ t3[s[15]] ^ k[0];
-	d1 = t0[s[ 4]] ^ t1[s[ 9]] ^ t2[s[14]] ^ t3[s[ 3]] ^ k[1];
-	d2 = t0[s[ 8]] ^ t1[s[13]] ^ t2[s[ 2]] ^ t3[s[ 7]] ^ k[2];
-	d3 = t0[s[12]] ^ t1[s[ 1]] ^ t2[s[ 6]] ^ t3[s[11]] ^ k[3];
+	d0 = t0[s[ 0]] ^ rol32(t0[s[ 5]], 8) ^ rol32(t0[s[10]], 16) ^ rol32(t0[s[15]], 24);
+	d1 = t0[s[ 4]] ^ rol32(t0[s[ 9]], 8) ^ rol32(t0[s[14]], 16) ^ rol32(t0[s[ 3]], 24);
+	d2 = t0[s[ 8]] ^ rol32(t0[s[13]], 8) ^ rol32(t0[s[ 2]], 16) ^ rol32(t0[s[ 7]], 24);
+	d3 = t0[s[12]] ^ rol32(t0[s[ 1]], 8) ^ rol32(t0[s[ 6]], 16) ^ rol32(t0[s[11]], 24);

-	d[0] = d0;
-	d[1] = d1;
-	d[2] = d2;
-	d[3] = d3;
+	d[0] = cpu_to_le32(d0 ^ le32_to_cpu(k[0]));
+	d[1] = cpu_to_le32(d1 ^ le32_to_cpu(k[1]));
+	d[2] = cpu_to_le32(d2 ^ le32_to_cpu(k[2]));
+	d[3] = cpu_to_le32(d3 ^ le32_to_cpu(k[3]));
 }

 #endif /* _CRYPTO_AEGIS_H */
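[Editor's note: the single-table form in the hunk above relies on the four
forward tables being byte rotations of one another, i.e.
crypto_ft_tab[i][x] == rol32(crypto_ft_tab[0][x], 8 * i). A one-time
self-check along these lines — a kernel-style sketch, assuming crypto_ft_tab
is visible via <crypto/aes.h> as it was in this era — would verify the
identity the patch depends on:]

#include <linux/bitops.h>
#include <linux/bug.h>
#include <crypto/aes.h>

/*
 * Sketch only: verify that tables 1..3 are 8/16/24-bit left rotations
 * of table 0, which is what lets crypto_aegis_aesenc() index t0 alone.
 */
static void __init check_ft_tab_rotation(void)
{
	int i, x;

	for (i = 1; i < 4; i++)
		for (x = 0; x < 256; x++)
			WARN_ON(crypto_ft_tab[i][x] !=
				rol32(crypto_ft_tab[0][x], 8 * i));
}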
Use the correct __le32 annotation and accessors to perform the
single round of AES encryption performed inside the AEGIS transform.
Otherwise, tcrypt reports:

alg: aead: Test 1 failed on encryption for aegis128-generic
00000000: 6c 25 25 4a 3c 10 1d 27 2b c1 d4 84 9a ef 7f 6e
alg: aead: Test 1 failed on encryption for aegis128l-generic
00000000: cd c6 e3 b8 a0 70 9d 8e c2 4f 6f fe 71 42 df 28
alg: aead: Test 1 failed on encryption for aegis256-generic
00000000: aa ed 07 b1 96 1d e9 e6 f2 ed b5 8e 1c 5f dc 1c

While at it, let's refer to the first precomputed table only, and
derive the other ones by rotation. This reduces the D-cache footprint
by 75%, and shouldn't be too costly, and may even be free, on
load/store architectures (x86 has its own AES-NI based implementation).

Fixes: f606a88e5823 ("crypto: aegis - Add generic AEGIS AEAD implementations")
Cc: <stable@vger.kernel.org> # v4.18+
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 crypto/aegis.h | 23 +++++++++-----------
 1 file changed, 10 insertions(+), 13 deletions(-)

--
2.19.0