[v2,0/4] random: use computational hash for entropy extraction, and related fixes

Message ID 20220204135325.8327-1-Jason@zx2c4.com
Series: random: use computational hash for entropy extraction, and related fixes

Message

Jason A. Donenfeld Feb. 4, 2022, 1:53 p.m. UTC
The bulk of the motivation for this series, and the description of the
crypto vulnerabilities it addresses, is in the first patch. The following
two patches then fix up entropy accounting for the new model. The last
patch fixes a minor code safety issue.

This v2 improves some documentation comments and unifies these patches
into one patchset.

Jason A. Donenfeld (4):
  random: use computational hash for entropy extraction
  random: simplify entropy debiting
  random: use linear min-entropy accumulation crediting
  random: make credit_entropy_bits() always safe

 drivers/char/random.c         | 490 ++++++----------------------------
 include/trace/events/random.h |  30 +--
 2 files changed, 87 insertions(+), 433 deletions(-)

Comments

Eric Biggers Feb. 5, 2022, 7 a.m. UTC | #1
On Fri, Feb 04, 2022 at 02:53:24PM +0100, Jason A. Donenfeld wrote:
> What if we're wrong and the above is nonsense? It's not, but let's
> assume we don't want the actual _behavior_ of the code to change much.
> Currently that behavior is not extracting from the input pool until it
> has 128 bits of entropy in it. With the old algorithm, we'd hit that
> magic 128 number after roughly 256 calls to credit_entropy_bits(1). So,
> we can retain more or less the old behavior by waiting to extract from
> the input pool until it hits 256 bits of entropy using the new code. For
> people concerned about this change, it means that there's not that much
> practical behavioral change. And for folks actually trying to model
> the behavior rigorously, it means that we have an even higher margin
> against attacks.

I tested this, and it actually was 205 calls prior to patch 1 in this series,
and 267 calls after patch 1.  That's in contrast to 256 calls after this patch.
Not a big difference, but this is going to result in ~25% more single-bit calls
being needed compared to the old version.  It's unclear whether you're arguing
that's basically the same, or whether you thought it was a smaller difference.
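
(For anyone wanting to reproduce the post-series number: a minimal userspace
sketch of the new linear, saturating credit rule follows. It deliberately does
not model the pre-patch fixed-point code that produced the 205/267 figures.)

	/* Minimal sketch of the post-series crediting rule only: a plain
	 * saturating add, capped at POOL_BITS. Not the pre-patch code. */
	#include <stdio.h>

	#define POOL_BITS 256

	int main(void)
	{
		int entropy_count = 0, calls = 0;

		while (entropy_count < POOL_BITS) {
			entropy_count = entropy_count + 1 < POOL_BITS ?
					entropy_count + 1 : POOL_BITS;
			calls++;
		}
		printf("%d single-bit credits to fill %d bits\n",
		       calls, POOL_BITS);
		return 0;
	}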

> +enum {
>  	POOL_BITS = BLAKE2S_HASH_SIZE * 8,
> -	POOL_BITSHIFT = ilog2(POOL_BITS),
> -	POOL_MIN_BITS = POOL_BITS / 2,
> -
> -	/* To allow fractional bits to be tracked, the entropy_count field is
> -	 * denominated in units of 1/8th bits. */
> -	POOL_ENTROPY_SHIFT = 3,
> -#define POOL_ENTROPY_BITS() (input_pool.entropy_count >> POOL_ENTROPY_SHIFT)
> -	POOL_FRACBITS = POOL_BITS << POOL_ENTROPY_SHIFT,
> -	POOL_MIN_FRACBITS = POOL_MIN_BITS << POOL_ENTROPY_SHIFT
> +	POOL_MIN_BITS = POOL_BITS /* No point in settling for less. */
>  };

Doesn't the default value of random_write_wakeup_bits need to be increased to
this value?  Otherwise, the pool can get stuck with entropy_count greater than
or equal to random_write_wakeup_bits (192) but less than POOL_MIN_BITS (256).

In fact, the only correct value of random_write_wakeup_bits will be 256, i.e.
the entire pool.  Perhaps it should no longer be configurable via /proc/sys?
Note that there's also an off-by-one bug that will need to be fixed:
add_hwgenerator_randomness() checks entropy_count <= random_write_wakeup_bits
rather than entropy_count < random_write_wakeup_bits as random_poll() does.

Also: given all these considerations, perhaps the increase in POOL_MIN_BITS is
best done in a separate patch?
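
(To make the stuck condition and the off-by-one concrete, here is a rough
userspace sketch; the real code reads input_pool.entropy_count and the sysctl,
while this just uses plain ints to show where the two checks diverge.)

	/* Rough userspace sketch; the threshold names mirror the kernel's,
	 * but the plumbing (sysctl, pool locking) is left out. */
	#include <stdbool.h>
	#include <stdio.h>

	#define POOL_MIN_BITS 256

	static int random_write_wakeup_bits = 192;	/* current default */

	/* add_hwgenerator_randomness() keeps feeding while this holds */
	static bool hwrng_wants_more(int entropy_count)
	{
		return entropy_count <= random_write_wakeup_bits;
	}

	/* random_poll() reports the pool writable while this holds */
	static bool poll_says_writable(int entropy_count)
	{
		return entropy_count < random_write_wakeup_bits;
	}

	int main(void)
	{
		int entropy_count = 200;

		/* Stuck: past the wakeup threshold, so writers are never
		 * woken, yet still short of POOL_MIN_BITS for extraction. */
		printf("stuck: %d\n",
		       entropy_count >= random_write_wakeup_bits &&
		       entropy_count < POOL_MIN_BITS);

		/* Off-by-one: exactly at the threshold, the checks disagree. */
		entropy_count = random_write_wakeup_bits;
		printf("hwrng feeds: %d, poll writable: %d\n",
		       hwrng_wants_more(entropy_count),
		       poll_says_writable(entropy_count));
		return 0;
	}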

> +	do {
> +		entropy_count = orig = READ_ONCE(input_pool.entropy_count);
> +		entropy_count = min(POOL_BITS, entropy_count + nbits);
> +	} while (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig);

This can be simplified slightly:

	do {
		orig = READ_ONCE(input_pool.entropy_count);
		entropy_count = min(POOL_BITS, orig + nbits);
	} while (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig);

- Eric
Dominik Brodowski Feb. 5, 2022, 8:23 a.m. UTC | #2
On Fri, Feb 04, 2022 at 02:53:22PM +0100, Jason A. Donenfeld wrote:
> The current 4096-bit LFSR used for entropy collection had a few
> desirable attributes for the context in which it was created. For
> example, the state was huge, which meant that /dev/random would be able
> to output quite a bit of accumulated entropy before blocking. It was
> also, in its time, quite fast at accumulating entropy byte-by-byte,
> which matters given the varying contexts in which mix_pool_bytes() is
> called. And its diffusion was relatively high, which meant that changes
> would ripple across several words of state rather quickly.
> 
> However, it also suffers from a few security vulnerabilities. In
> particular, inputs learned by an attacker can be undone, but moreover,
> if the state of the pool leaks, its contents can be controlled and
> entirely zeroed out. I've demonstrated this attack with this SMT2
> script, <https://xn--4db.cc/5o9xO8pb>, which Boolector/CaDiCal solves in
> a matter of seconds on a single core of my laptop, resulting in small
> proof-of-concept C demonstrators such as <https://xn--4db.cc/jCkvvIaH/c>.
> 
> For basically all recent formal models of RNGs, these attacks represent
> a significant cryptographic flaw. But how does this manifest
> practically? If an attacker has access to the system to such a degree
> that he can learn the internal state of the RNG, arguably there are
> other lower hanging vulnerabilities -- side-channel, infoleak, or
> otherwise -- that might have higher priority. On the other hand, seed
> files are frequently used on systems that have a hard time generating
> much entropy on their own, and these seed files, being files, often leak
> or are duplicated and distributed accidentally, or are even seeded over
> the Internet intentionally, where their contents might be recorded or
> tampered with. Seen this way, an otherwise quasi-implausible
> vulnerability is a bit more practical than initially thought.
> 
> Another aspect of the current mix_pool_bytes() function is that, while
> its performance was arguably competitive for the time in which it was
> created, it's no longer considered so. This patch improves performance
> significantly: on a high-end CPU, an i7-11850H, it improves performance
> of mix_pool_bytes() by 225%, and on a low-end CPU, a Cortex-A7, it
> improves performance by 103%.
> 
> This commit replaces the LFSR of mix_pool_bytes() with a straight-
> forward cryptographic hash function, BLAKE2s, which is already in use
> for pool extraction. Universal hashing with a secret seed was considered
> too, something along the lines of <https://eprint.iacr.org/2013/338>,
> but the requirement for a secret seed makes for a chicken & egg problem.
> Instead we go with a formally proven scheme using a computational hash
> function, described in sections 5.1, 6.4, and B.1.8 of
> <https://eprint.iacr.org/2019/198>.
> 
> BLAKE2s outputs 256 bits, which should give us an appropriate amount of
> min-entropy accumulation, and a wide enough margin of collision
> resistance against active attacks. mix_pool_bytes() becomes a simple
> call to blake2s_update(), for accumulation, while the extraction step
> becomes a blake2s_final() to generate a seed, with which we can then do
> a HKDF-like or BLAKE2X-like expansion, the first part of which we fold
> back as an init key for subsequent blake2s_update()s, and the rest we
> produce to the caller. This then is provided to our CRNG like usual. In
> that expansion step, we make opportunistic use of 32 bytes of RDRAND

Why are we only using RDRAND here, and not RDSEED?

Thanks,
	Dominik
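
(Aside, for readers following the quoted commit message: the accumulate/
extract/re-key flow it describes boils down to roughly the condensed sketch
below, written against the kernel's <crypto/blake2s.h> API. The helper names
sketch_mix() and sketch_extract() are made up for the illustration, and
locking, the RDRAND/counter block, and zeroization are omitted, so treat it
as the shape of the scheme rather than the patch itself.)

	/* Condensed sketch of the accumulate/extract/re-key flow described
	 * above. Locking, the RDRAND/counter block, and zeroization are
	 * omitted; this is an illustration, not the patch. */
	#include <linux/types.h>
	#include <crypto/blake2s.h>

	static struct blake2s_state pool_hash;

	/* accumulation: mix_pool_bytes() becomes an update of the running hash */
	static void sketch_mix(const void *in, size_t len)
	{
		blake2s_update(&pool_hash, in, len);
	}

	/* extraction: finalize to a seed, fold part of it back as the next
	 * key, and expand the rest of it out to the caller */
	static void sketch_extract(u8 *out, size_t outlen)
	{
		u8 seed[BLAKE2S_HASH_SIZE], next_key[BLAKE2S_HASH_SIZE];
		u8 counter = 0;

		blake2s_final(&pool_hash, seed);

		/* next_key = H(seed, counter = 0), re-keys the pool hash */
		blake2s(next_key, &counter, seed, sizeof(next_key),
			sizeof(counter), sizeof(seed));
		blake2s_init_key(&pool_hash, BLAKE2S_HASH_SIZE,
				 next_key, sizeof(next_key));

		/* output blocks = H(seed, counter = 1..n), fed to the CRNG */
		while (outlen) {
			size_t i = outlen < BLAKE2S_HASH_SIZE ?
				   outlen : BLAKE2S_HASH_SIZE;

			++counter;
			blake2s(out, &counter, seed, i,
				sizeof(counter), sizeof(seed));
			out += i;
			outlen -= i;
		}
	}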
Jason A. Donenfeld Feb. 5, 2022, 11:42 a.m. UTC | #3
On 2/5/22, Dominik Brodowski <linux@dominikbrodowski.net> wrote:
> Why are we only using RDRAND here, and not RDSEED?

Simply because that's what was already used here; I didn't revisit the
old decision. It seems like any changes there should be made in a
separate patch with its own justification. If you think there's good
rationale, feel free to send that.

Part of why these changes are so gradual is because much of random.c
isn't my code originally. Were it mine, I'd presumably know all my
various rationales and be able to rapidly think within them and
reevaluate. But because that's not the case, I find that I'm spending
a lot of time trying to reconstruct the original rationales of its
authors. IOW, rather than saying, "I don't get this, must be bad," I'm
trying to do a little bit of archeology to at least make sure I know
what I'm disagreeing with, if I even disagree at all. That's
time-consuming, but it's also part of doing things evolutionarily.

With regards to RDRAND vs RDSEED, just off the top of my head -- I'm
writing this email on my phone -- I think extract_entropy/extract_buf
used to be used as part of /dev/random's blocking stream, which
ostensibly could mean more frequent calls, once every 10 bytes IIRC.
Nowadays it's only called once every 5 minutes (per numa node), so
maybe RDSEED could make sense? Or maybe there are other reasons to
unearth, or none at all. We'll have to look and see.

Jason
Jason A. Donenfeld Feb. 5, 2022, 11:43 a.m. UTC | #4
On 2/5/22, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Nowadays it's only called once every 5 minutes (per numa

Period, not per node.
Jason A. Donenfeld Feb. 5, 2022, 12:54 p.m. UTC | #5
Hi Eric,

On Sat, Feb 5, 2022 at 8:00 AM Eric Biggers <ebiggers@kernel.org> wrote:
> I tested this, and it actually was 205 calls prior to patch 1 in this series,
> and 267 calls after patch 1.  That's in contrast to 256 calls after this patch.
> Not a big difference, but this is going to result in ~25% more single-bit calls
> being needed compared to the old version.  It's unclear whether you're arguing
> that's basically the same, or whether you thought it was a smaller difference.

My argument is that we're not _decreasing_ the security in any
substantive way with this change.

> Doesn't the default value of random_write_wakeup_bits need to be increased to
> this value?  Otherwise, the pool can get stuck with entropy_count greater than
> or equal to random_write_wakeup_bits (192) but less than POOL_MIN_BITS (256).

Good catch, thanks.

> In fact, the only correct value of random_write_wakeup_bits will be 256, i.e.
> the entire pool.  Perhaps it should no longer be configurable via /proc/sys?

I think so, yeah. I'll change it in an add-on commit.

> Note that there's also an off-by one bug that will need to be fixed:
> add_hwgenerator_randomness() checks entropy_count <= random_write_wakeup_bits
> rather than entropy_count < random_write_wakeup_bits as random_poll() does.

Thanks.

> > +     do {
> > +             entropy_count = orig = READ_ONCE(input_pool.entropy_count);
> > +             entropy_count = min(POOL_BITS, entropy_count + nbits);
> > +     } while (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig);
>
> This can be simplified slightly:
>
>         do {
>                 orig = READ_ONCE(input_pool.entropy_count);
>                 entropy_count = min(POOL_BITS, orig + nbits);
>         } while (cmpxchg(&input_pool.entropy_count, orig, entropy_count) != orig);

That looks nicer indeed. Will do.
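
(For completeness, the shape of that loop is the usual read/compute/retry
pattern. A standalone C11 rendering of the same saturating credit, with
stdatomic standing in for the kernel's READ_ONCE()/cmpxchg(), shows why the
retry is needed when another CPU updates the count concurrently.)

	/* Standalone C11 sketch of the same saturating credit; stdatomic
	 * stands in for READ_ONCE()/cmpxchg(), so this only illustrates the
	 * retry pattern, not the kernel code itself. */
	#include <stdatomic.h>

	#define POOL_BITS 256

	static _Atomic int pool_entropy_count;

	static void credit_entropy_bits(int nbits)
	{
		int orig, new;

		do {
			/* snapshot the count and compute the capped value */
			orig = atomic_load(&pool_entropy_count);
			new = orig + nbits;
			if (new > POOL_BITS)
				new = POOL_BITS;
			/* retry if another CPU changed the count meanwhile */
		} while (!atomic_compare_exchange_weak(&pool_entropy_count,
						       &orig, new));
	}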

Thanks for your comments.

Jason