[v2,0/6] crypto: lib/sha256 - cleanup/optimization

Message ID 20201020203957.3512851-1-nivedita@alum.mit.edu

Message

Arvind Sankar Oct. 20, 2020, 8:39 p.m. UTC
Patch 1 -- Use memzero_explicit() instead of structure assignment/plain
memset() to clear sensitive state.
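
The reason for patch 1 is that a plain memset() or structure assignment is an ordinary store, and the compiler may remove it as dead when it can see the object is never read again, for example a local variable or a context freed after inlining. The kernel's memzero_explicit() avoids this by following the memset() with a compiler barrier on the pointer. Below is a minimal userspace sketch of that idea. It assumes GCC or Clang inline-asm syntax, and zero_explicit(), struct demo_ctx and demo_ctx_wipe() are hypothetical names, not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for memzero_explicit(): memset() followed by an
 * empty asm that takes the pointer as an input and clobbers "memory".
 * The compiler must then assume the zeroed bytes may be read, so it
 * cannot discard the memset() as a dead store.
 */
static void zero_explicit(void *p, size_t n)
{
	memset(p, 0, n);
	__asm__ __volatile__("" : : "r"(p) : "memory");
}

/* Hypothetical context, loosely shaped like a hash state. */
struct demo_ctx {
	uint32_t state[8];
	uint64_t count;
	uint8_t buf[64];
};

static void demo_ctx_wipe(struct demo_ctx *ctx)
{
	/*
	 * "*ctx = (struct demo_ctx){ 0 };" or a plain memset() here could be
	 * removed once the compiler proves ctx is dead afterwards.
	 */
	zero_explicit(ctx, sizeof(*ctx));
}
```

The barrier costs nothing at runtime; it only forbids the optimizer from reasoning that nobody observes the zeroed bytes.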

Patch 2 -- I am not sure about this one: currently the temporary
variables used in the generic sha256 implementation are cleared, but the
clearing is optimized away due to lack of compiler barriers. I don't
think it's really necessary to clear them, but I'm not a cryptanalyst,
so I would like comments on whether it's indeed safe not to, or whether
we should instead add the required barriers to force clearing.

The last four patches are optimizations for generic sha256.

v2:
- Add patch to combine K and W arrays, suggested by David
- Reformat SHA256_ROUND() macro a little

Arvind Sankar (6):
  crypto: Use memzero_explicit() for clearing state
  crypto: lib/sha256 - Don't clear temporary variables
  crypto: lib/sha256 - Clear W[] in sha256_update() instead of
    sha256_transform()
  crypto: lib/sha256 - Unroll SHA256 loop 8 times instead of 64
  crypto: lib/sha256 - Unroll LOAD and BLEND loops
  crypto: lib/sha - Combine round constants and message schedule

 include/crypto/sha1_base.h   |   3 +-
 include/crypto/sha256_base.h |   3 +-
 include/crypto/sha512_base.h |   3 +-
 include/crypto/sm3_base.h    |   3 +-
 lib/crypto/sha256.c          | 211 +++++++++++------------------------
 5 files changed, 71 insertions(+), 152 deletions(-)
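
The 8x unroll (patch 4) and the combined round-constant/message-schedule array (patch 6) can be illustrated with a minimal sketch. It assumes the unrolled loop follows the common pattern of rotating the argument order of a round macro instead of shifting all eight working variables every round. ROUND(), rounds_rolled() and rounds_unrolled8() are hypothetical names, not the patch's SHA256_ROUND(), and kw[i] stands for K[i] + W[i] computed ahead of time:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t ror32(uint32_t x, unsigned int n)
{
	return (x >> n) | (x << (32 - n));
}

#define S0(x)		(ror32((x), 2) ^ ror32((x), 13) ^ ror32((x), 22))
#define S1(x)		(ror32((x), 6) ^ ror32((x), 11) ^ ror32((x), 25))
#define CH(x, y, z)	(((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z)	(((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* Reference: one round per iteration, shifting all eight variables. */
static void rounds_rolled(uint32_t s[8], const uint32_t kw[64])
{
	uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
	uint32_t e = s[4], f = s[5], g = s[6], h = s[7];

	for (int i = 0; i < 64; i++) {
		uint32_t t1 = h + S1(e) + CH(e, f, g) + kw[i];
		uint32_t t2 = S0(a) + MAJ(a, b, c);

		h = g; g = f; f = e; e = d + t1;
		d = c; c = b; b = a; a = t1 + t2;
	}
	s[0] = a; s[1] = b; s[2] = c; s[3] = d;
	s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}

/*
 * One round that writes only two variables: the new 'e' lands in d and
 * the new 'a' lands in h. The caller rotates the argument order instead
 * of moving data. Uses kw[] from the enclosing scope.
 */
#define ROUND(a, b, c, d, e, f, g, h, i) do {			\
	uint32_t t1 = (h) + S1(e) + CH(e, f, g) + kw[i];	\
	uint32_t t2 = S0(a) + MAJ(a, b, c);			\
	(d) += t1;						\
	(h) = t1 + t2;						\
} while (0)

static void rounds_unrolled8(uint32_t s[8], const uint32_t kw[64])
{
	uint32_t a = s[0], b = s[1], c = s[2], d = s[3];
	uint32_t e = s[4], f = s[5], g = s[6], h = s[7];

	/* After eight rotated calls every variable is back in its role. */
	for (int i = 0; i < 64; i += 8) {
		ROUND(a, b, c, d, e, f, g, h, i + 0);
		ROUND(h, a, b, c, d, e, f, g, i + 1);
		ROUND(g, h, a, b, c, d, e, f, i + 2);
		ROUND(f, g, h, a, b, c, d, e, i + 3);
		ROUND(e, f, g, h, a, b, c, d, i + 4);
		ROUND(d, e, f, g, h, a, b, c, i + 5);
		ROUND(c, d, e, f, g, h, a, b, i + 6);
		ROUND(b, c, d, e, f, g, h, a, i + 7);
	}
	s[0] = a; s[1] = b; s[2] = c; s[3] = d;
	s[4] = e; s[5] = f; s[6] = g; s[7] = h;
}
```

Both functions compute the same 64 rounds. The unrolled one keeps eight copies of the round body instead of sixty-four, which is where the code-size saving comes from, while still avoiding the per-round variable shuffle.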

Comments

Eric Biggers Oct. 22, 2020, 4:58 a.m. UTC | #1
On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> The assignments to clear a through h and t1/t2 are optimized out by the
> compiler because they are unused after the assignments.
> 
> These variables shouldn't be very sensitive: t1/t2 can be calculated
> from a through h, so they don't reveal any additional information.
> Knowing a through h is equivalent to knowing one 64-byte block's SHA256
> hash (with non-standard initial value) which, assuming SHA256 is secure,
> doesn't reveal any information about the input.
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

I don't entirely buy the second paragraph.  It could be the case that the input
is less than or equal to one SHA-256 block (64 bytes), in which case leaking
'a' through 'h' would reveal the final SHA-256 hash if the input length is
known.  And note that callers might consider either the input, the resulting
hash, or both to be sensitive information -- it depends.

> ---
>  lib/crypto/sha256.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index d43bc39ab05e..099cd11f83c1 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
>  	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
>  
>  	/* clear any sensitive info... */
> -	a = b = c = d = e = f = g = h = t1 = t2 = 0;
>  	memzero_explicit(W, 64 * sizeof(u32));
>  }

Your change itself is fine, though.  As you mentioned, these assignments get
optimized out, so they weren't accomplishing anything.

The fact is, there just isn't any way to guarantee in C code that all sensitive
variables get cleared.

So we shouldn't (and generally don't) bother trying to clear individual u32's,
ints, etc. like this, but rather only structs and arrays, as clearing those is
more likely to work as intended.

- Eric
Eric Biggers Oct. 22, 2020, 4:59 a.m. UTC | #2
On Tue, Oct 20, 2020 at 04:39:54PM -0400, Arvind Sankar wrote:
> The temporary W[] array is currently zeroed out once every call to
> sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
> sha256_update() instead so that it is cleared only once per update can
> save about 2-3% of the total time taken to compute the digest, with a
> reasonable memset() implementation, and considerably more (~20%) with a
> bad one (eg the x86 purgatory currently uses a memset() coded in C).
> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

Looks good,

Reviewed-by: Eric Biggers <ebiggers@google.com>
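
A minimal sketch of the restructuring in patch 3, assuming W[] is hoisted into the update routine and passed down to the per-block transform. demo_transform() is a hypothetical stand-in that only loads and mixes the block (it is NOT the SHA-256 compression function), and wipe_count exists only to show that a three-block update now clears W[] once instead of three times:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts wipes of the scratch schedule, for illustration only. */
static unsigned int wipe_count;

static void wipe(void *p, size_t n)
{
	memset(p, 0, n);
	__asm__ __volatile__("" : : "r"(p) : "memory");	/* GCC/Clang */
	wipe_count++;
}

/*
 * Hypothetical stand-in for sha256_transform(): loads one 64-byte block
 * into W[] and mixes it into state[]. It no longer clears W[]; the
 * caller owns that.
 */
static void demo_transform(uint32_t state[8], const uint8_t *block,
			   uint32_t W[64])
{
	for (int i = 0; i < 16; i++)
		W[i] = (uint32_t)block[4 * i] << 24 |
		       (uint32_t)block[4 * i + 1] << 16 |
		       (uint32_t)block[4 * i + 2] << 8 | block[4 * i + 3];
	for (int i = 0; i < 8; i++)
		state[i] += W[i] ^ W[i + 8];
}

/* Process whole blocks; W[] is wiped once per call, not once per block. */
static void demo_update(uint32_t state[8], const uint8_t *data, size_t len)
{
	uint32_t W[64];

	assert(len % 64 == 0);	/* the sketch ignores partial blocks */
	for (; len >= 64; data += 64, len -= 64)
		demo_transform(state, data, W);
	wipe(W, sizeof(W));
}
```

The schedule of the last block stays in W[] only until the end of the call, so the sensitive data is still cleared before returning; the per-block memset() cost is simply amortized over the whole update.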
Eric Biggers Oct. 22, 2020, 5:02 a.m. UTC | #3
On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> This reduces code size substantially (on x86_64 with gcc-10 the size of
> sha256_update() goes from 7593 bytes to 1952 bytes including the new
> SHA256_K array), and on x86 is slightly faster than the full unroll
> (tesed on Broadwell Xeon).

tesed => tested

> 
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> ---
>  lib/crypto/sha256.c | 166 ++++++++------------------------------------
>  1 file changed, 30 insertions(+), 136 deletions(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index c6bfeacc5b81..5efd390706c6 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -18,6 +18,17 @@
>  #include <crypto/sha.h>
>  #include <asm/unaligned.h>
>  
> +static const u32 SHA256_K[] = {
> +	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
> +	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
> +	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
> +	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
> +	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
> +	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
> +	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
> +	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
> +};

Limit this to 80 columns?

Otherwise this looks good.

- Eric
Arvind Sankar Oct. 23, 2020, 3:12 a.m. UTC | #4
On Wed, Oct 21, 2020 at 10:02:19PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> > This reduces code size substantially (on x86_64 with gcc-10 the size of
> > sha256_update() goes from 7593 bytes to 1952 bytes including the new
> > SHA256_K array), and on x86 is slightly faster than the full unroll
> > (tesed on Broadwell Xeon).
> 
> tesed => tested
> 
> > 
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> > ---
> >  lib/crypto/sha256.c | 166 ++++++++------------------------------------
> >  1 file changed, 30 insertions(+), 136 deletions(-)
> > 
> > diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> > index c6bfeacc5b81..5efd390706c6 100644
> > --- a/lib/crypto/sha256.c
> > +++ b/lib/crypto/sha256.c
> > @@ -18,6 +18,17 @@
> >  #include <crypto/sha.h>
> >  #include <asm/unaligned.h>
> >  
> > +static const u32 SHA256_K[] = {
> > +	0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
> > +	0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
> > +	0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
> > +	0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
> > +	0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
> > +	0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
> > +	0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
> > +	0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
> > +};
> 
> Limit this to 80 columns?

I was aiming for 8 columns per line to match all the other groupings by
eight. It does slightly exceed 100 columns but can this be an exception,
or should I maybe make it 4 columns per line?

> 
> Otherwise this looks good.
> 
> - Eric
Herbert Xu Oct. 23, 2020, 3:16 a.m. UTC | #5
On Thu, Oct 22, 2020 at 11:12:36PM -0400, Arvind Sankar wrote:
>
> I was aiming for 8 columns per line to match all the other groupings by
> eight. It does slightly exceed 100 columns but can this be an exception,
> or should I maybe make it 4 columns per line?

Please limit it to 4 columns.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Arvind Sankar Oct. 23, 2020, 3:17 a.m. UTC | #6
On Wed, Oct 21, 2020 at 09:58:50PM -0700, Eric Biggers wrote:
> On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> > The assignments to clear a through h and t1/t2 are optimized out by the
> > compiler because they are unused after the assignments.
> > 
> > These variables shouldn't be very sensitive: t1/t2 can be calculated
> > from a through h, so they don't reveal any additional information.
> > Knowing a through h is equivalent to knowing one 64-byte block's SHA256
> > hash (with non-standard initial value) which, assuming SHA256 is secure,
> > doesn't reveal any information about the input.
> > 
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> 
> I don't entirely buy the second paragraph.  It could be the case that the input
> is less than or equal to one SHA-256 block (64 bytes), in which case leaking
> 'a' through 'h' would reveal the final SHA-256 hash if the input length is
> known.  And note that callers might consider either the input, the resulting
> hash, or both to be sensitive information -- it depends.

The "non-standard initial value" was just parenthetical -- my thinking
was that revealing the hash, whether the real SHA hash or an
intermediate one starting at some other initial value, shouldn't reveal
the input; not that you get any additional security from being an
intermediate block. But if the hash itself could be sensitive, yeah then
a-h are sensitive anyway.

> 
> > ---
> >  lib/crypto/sha256.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> > index d43bc39ab05e..099cd11f83c1 100644
> > --- a/lib/crypto/sha256.c
> > +++ b/lib/crypto/sha256.c
> > @@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
> >  	state[4] += e; state[5] += f; state[6] += g; state[7] += h;
> >  
> >  	/* clear any sensitive info... */
> > -	a = b = c = d = e = f = g = h = t1 = t2 = 0;
> >  	memzero_explicit(W, 64 * sizeof(u32));
> >  }
> 
> Your change itself is fine, though.  As you mentioned, these assignments get
> optimized out, so they weren't accomplishing anything.
> 
> The fact is, there just isn't any way to guarantee in C code that all sensitive
> variables get cleared.
> 
> So we shouldn't (and generally don't) bother trying to clear individual u32's,
> ints, etc. like this, but rather only structs and arrays, as clearing those is
> more likely to work as intended.
> 
> - Eric

Ok, I'll just drop the second paragraph from the commit message then.