[v2] random: do crng pre-init loading in worker rather than irq

Message ID 20220224152937.12747-1-Jason@zx2c4.com
State New

Commit Message

Jason A. Donenfeld Feb. 24, 2022, 3:29 p.m. UTC
Taking spinlocks from IRQ context is problematic for PREEMPT_RT. That
is, in part, why we take trylocks instead. But apparently this still
trips up various lock dependency analyzers. That seems like a bug in the
analyzers that should be fixed, rather than having to change things
here.

But maybe there's another reason to change things up: by deferring the
crng pre-init loading to the worker, we can use the cryptographic hash
function rather than xor, which is perhaps a meaningful difference when
considering this data has only been through the relatively weak
fast_mix() function.
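
To illustrate, a condensed before/after sketch of the two injection
strategies (simplified from the patch below; base_crng locking and the
crng_init_cnt accounting are omitted):

#include <crypto/blake2s.h>
#include <linux/types.h>

#define KEY_LEN 32	/* matches sizeof(base_crng.key) */

/* Old fast path: xor the input into the key at a rolling offset. */
static void inject_xor(u8 key[KEY_LEN], size_t cnt, const u8 *in, size_t len)
{
	size_t i;

	for (i = 0; i < len; ++i)
		key[(cnt + i) % KEY_LEN] ^= in[i];
}

/* Unified path after this patch: hash the old key together with the input. */
static void inject_hash(u8 key[KEY_LEN], const u8 *in, size_t len)
{
	struct blake2s_state hash;

	blake2s_init(&hash, KEY_LEN);
	blake2s_update(&hash, key, KEY_LEN);
	blake2s_update(&hash, in, len);
	blake2s_final(&hash, key);
}

The hashed path folds the entire previous key into each update, which
is what deferring to the worker buys us over the xor fold.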

The biggest downside of this approach is that the pre-init loading is
now deferred until later, which means things that need random numbers
after interrupts are enabled, but before workqueues are running -- or
before this particular worker manages to run -- are going to get into
trouble. Hopefully in the real world, this window is rather small,
especially since this code won't run until 64 interrupts have occurred.

Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
- [Dominik] Call crng_pre_init_inject() before calling mix_pool_bytes().

 drivers/char/random.c | 65 +++++++++++++------------------------------
 1 file changed, 19 insertions(+), 46 deletions(-)

Comments

Sebastian Andrzej Siewior Feb. 28, 2022, 2:02 p.m. UTC | #1
On 2022-02-24 16:29:37 [+0100], Jason A. Donenfeld wrote:
> Taking spinlocks from IRQ context is problematic for PREEMPT_RT. That
> is, in part, why we take trylocks instead. But apparently this still
> trips up various lock dependency analyzers. That seems like a bug in the
> analyzers that should be fixed, rather than having to change things
> here.

Could you please post a lockdep report so I can take a look?

> But maybe there's another reason to change things up: by deferring the
> crng pre-init loading to the worker, we can use the cryptographic hash
> function rather than xor, which is perhaps a meaningful difference when
> considering this data has only been through the relatively weak
> fast_mix() function.
> 
> The biggest downside of this approach is that the pre-init loading is
> now deferred until later, which means things that need random numbers
> after interrupts are enabled, but before workqueues are running -- or
> before this particular worker manages to run -- are going to get into
> trouble. Hopefully in the real world, this window is rather small,
> especially since this code won't run until 64 interrupts have occurred.
> 
> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Eric Biggers <ebiggers@kernel.org>
> Cc: Theodore Ts'o <tytso@mit.edu>
> Reviewed-by: Dominik Brodowski <linux@dominikbrodowski.net>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Other than that:

Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian
Sebastian Andrzej Siewior Feb. 28, 2022, 2:29 p.m. UTC | #2
On 2022-02-28 15:17:19 [+0100], Jason A. Donenfeld wrote:
> Hey Sebastian,
Hi Jason,

> On 2/28/22, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> > On 2022-02-24 16:29:37 [+0100], Jason A. Donenfeld wrote:
> >> Taking spinlocks from IRQ context is problematic for PREEMPT_RT. That
> >> is, in part, why we take trylocks instead. But apparently this still
> >> trips up various lock dependency analyzers. That seems like a bug in the
> >> analyzers that should be fixed, rather than having to change things
> >> here.
> >
> > Could you please post a lockdep report so I can take a look?
> 
> I thought the problem with lockdep was stated by you somewhere in this thread?
> https://lore.kernel.org/lkml/YfOqsOiNfURyvFRX@linutronix.de/
> "But even then we need to find a way to move the crng init part
> (crng_fast_load()) out of the hard-IRQ."
> And Jonathan posted two related (?) splats he ran into.
> 
> I may have gotten that all wrong, in which case, I'll just excise that
> part from the commit message. I'm pretty sure you want this patch
> either way, right?

Oh, that report. So yes, I want that patch ;)

In this case lockdep is right, though the issue it reports affects only
PREEMPT_RT. The trylock is not what lockdep complains about; it is the
spin_lock_irqsave() within invalidate_batched_entropy().

Taking a spinlock_t from IRQ context is problematic for PREEMPT_RT,
correct. A spin_trylock() is also problematic, since another spin_lock()
invocation would PI-boost the wrong task: the spin_trylock() is invoked
from IRQ context, so the task on the CPU (a random task or idle) is not
the actual owner. I'm pointing this out because there was also _another_
problem with trylock from hard-IRQ context, which was fixed in the
meantime.
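
To make the failure mode concrete, a rough sketch of the pre-patch
pattern with a stand-in lock (illustrative only, not the actual driver
code):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);	/* stands in for base_crng.lock */

/* Called from hard-IRQ context, as the pre-patch code was. */
static void irq_inject_sketch(void)
{
	unsigned long flags;

	/*
	 * On PREEMPT_RT, spinlock_t is backed by an rtmutex. A hard IRQ
	 * has no task context of its own, so a successful trylock here
	 * records the interrupted task (idle, or whatever happened to
	 * be running) as the lock owner.
	 */
	if (!spin_trylock_irqsave(&demo_lock, flags))
		return;

	/* ... inject bytes into the key ... */

	spin_unlock_irqrestore(&demo_lock, flags);
}

/*
 * Meanwhile, a concurrent spin_lock(&demo_lock) on another CPU boosts
 * the recorded owner, i.e. the interrupted task, not the real holder.
 */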

Would it work for you to update the commit message? Basically I'm fine
with the first sentence, but the remaining part is misleading.

> Jason

Sebastian
Jason A. Donenfeld Feb. 28, 2022, 3:10 p.m. UTC | #3
Hi Sebastian,

On Mon, Feb 28, 2022 at 03:29:32PM +0100, Sebastian Andrzej Siewior wrote:
> > > Could you please post a lockdep report so I can take a look?
> > 
> > I thought the problem with lockdep was stated by you somewhere in this thread?
> > https://lore.kernel.org/lkml/YfOqsOiNfURyvFRX@linutronix.de/
> > "But even then we need to find a way to move the crng init part
> > (crng_fast_load()) out of the hard-IRQ."
> > And Jonathan posted two related (?) splats he ran into.
> > 
> > I may have gotten that all wrong, in which case, I'll just excise that
> > part from the commit message. I'm pretty sure you want this patch
> > either way, right?
> 
> Oh, that report. So yes, I want that patch ;)
> 
> In this case lockdep is right, though the issue it reports affects only
> PREEMPT_RT. The trylock is not what lockdep complains about; it is the
> spin_lock_irqsave() within invalidate_batched_entropy().
> 
> Taking a spinlock_t from IRQ context is problematic for PREEMPT_RT,
> correct. A spin_trylock() is also problematic, since another spin_lock()
> invocation would PI-boost the wrong task: the spin_trylock() is invoked
> from IRQ context, so the task on the CPU (a random task or idle) is not
> the actual owner. I'm pointing this out because there was also _another_
> problem with trylock from hard-IRQ context, which was fixed in the
> meantime.
> 
> Would it work for you to update the commit message? Basically I'm fine
> with the first sentence, but the remaining part is misleading.

Ahh, I understand, okay. Yes, I'll change that first paragraph to
incorporate your wording, as:

"""
Taking spinlocks from IRQ context is generally problematic for
PREEMPT_RT. That is, in part, why we take trylocks instead. However, a
spin_trylock() is also problematic since another spin_lock() invocation
can potentially PI-boost the wrong task, as the spin_trylock() is
invoked from IRQ context, so the task on the CPU (a random task or idle)
is not the actual owner.
"""

Jason

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 536237a0f073..19bf44b9ba0f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -443,10 +443,6 @@  static void crng_make_state(u32 chacha_state[CHACHA_STATE_WORDS],
  * boot time when it's better to have something there rather than
  * nothing.
  *
- * There are two paths, a slow one and a fast one. The slow one
- * hashes the input along with the current key. The fast one simply
- * xors it in, and should only be used from interrupt context.
- *
  * If account is set, then the crng_init_cnt counter is incremented.
  * This shouldn't be set by functions like add_device_randomness(),
  * where we can't trust the buffer passed to it is guaranteed to be
@@ -455,19 +451,15 @@  static void crng_make_state(u32 chacha_state[CHACHA_STATE_WORDS],
  * Returns the number of bytes processed from input, which is bounded
  * by CRNG_INIT_CNT_THRESH if account is true.
  */
-static size_t crng_pre_init_inject(const void *input, size_t len,
-				   bool fast, bool account)
+static size_t crng_pre_init_inject(const void *input, size_t len, bool account)
 {
 	static int crng_init_cnt = 0;
+	struct blake2s_state hash;
 	unsigned long flags;
 
-	if (fast) {
-		if (!spin_trylock_irqsave(&base_crng.lock, flags))
-			return 0;
-	} else {
-		spin_lock_irqsave(&base_crng.lock, flags);
-	}
+	blake2s_init(&hash, sizeof(base_crng.key));
 
+	spin_lock_irqsave(&base_crng.lock, flags);
 	if (crng_init != 0) {
 		spin_unlock_irqrestore(&base_crng.lock, flags);
 		return 0;
@@ -476,21 +468,9 @@  static size_t crng_pre_init_inject(const void *input, size_t len,
 	if (account)
 		len = min_t(size_t, len, CRNG_INIT_CNT_THRESH - crng_init_cnt);
 
-	if (fast) {
-		const u8 *src = input;
-		size_t i;
-
-		for (i = 0; i < len; ++i)
-			base_crng.key[(crng_init_cnt + i) %
-				      sizeof(base_crng.key)] ^= src[i];
-	} else {
-		struct blake2s_state hash;
-
-		blake2s_init(&hash, sizeof(base_crng.key));
-		blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
-		blake2s_update(&hash, input, len);
-		blake2s_final(&hash, base_crng.key);
-	}
+	blake2s_update(&hash, base_crng.key, sizeof(base_crng.key));
+	blake2s_update(&hash, input, len);
+	blake2s_final(&hash, base_crng.key);
 
 	if (account) {
 		crng_init_cnt += len;
@@ -1040,7 +1020,7 @@  void add_device_randomness(const void *buf, size_t size)
 	unsigned long flags;
 
 	if (crng_init == 0 && size)
-		crng_pre_init_inject(buf, size, false, false);
+		crng_pre_init_inject(buf, size, false);
 
 	spin_lock_irqsave(&input_pool.lock, flags);
 	_mix_pool_bytes(buf, size);
@@ -1157,7 +1137,7 @@  void add_hwgenerator_randomness(const void *buffer, size_t count,
 				size_t entropy)
 {
 	if (unlikely(crng_init == 0)) {
-		size_t ret = crng_pre_init_inject(buffer, count, false, true);
+		size_t ret = crng_pre_init_inject(buffer, count, true);
 		mix_pool_bytes(buffer, ret);
 		count -= ret;
 		buffer += ret;
@@ -1297,8 +1277,14 @@  static void mix_interrupt_randomness(struct work_struct *work)
 	fast_pool->last = jiffies;
 	local_irq_enable();
 
-	mix_pool_bytes(pool, sizeof(pool));
-	credit_entropy_bits(1);
+	if (unlikely(crng_init == 0)) {
+		crng_pre_init_inject(pool, sizeof(pool), true);
+		mix_pool_bytes(pool, sizeof(pool));
+	} else {
+		mix_pool_bytes(pool, sizeof(pool));
+		credit_entropy_bits(1);
+	}
+
 	memzero_explicit(pool, sizeof(pool));
 }
 
@@ -1331,24 +1317,11 @@  void add_interrupt_randomness(int irq)
 	fast_mix(fast_pool->pool32);
 	new_count = ++fast_pool->count;
 
-	if (unlikely(crng_init == 0)) {
-		if (new_count >= 64 &&
-		    crng_pre_init_inject(fast_pool->pool32, sizeof(fast_pool->pool32),
-					 true, true) > 0) {
-			fast_pool->count = 0;
-			fast_pool->last = now;
-			if (spin_trylock(&input_pool.lock)) {
-				_mix_pool_bytes(&fast_pool->pool32, sizeof(fast_pool->pool32));
-				spin_unlock(&input_pool.lock);
-			}
-		}
-		return;
-	}
-
 	if (new_count & MIX_INFLIGHT)
 		return;
 
-	if (new_count < 64 && !time_after(now, fast_pool->last + HZ))
+	if (new_count < 64 && (!time_after(now, fast_pool->last + HZ) ||
+			       unlikely(crng_init == 0)))
 		return;
 
 	if (unlikely(!fast_pool->mix.func))