diff mbox series

random: add chacha8_block and swtich the rng to it

Message ID 20240429134942.2873253-1-aaron.toponce@gmail.com
State New
Headers show
Series random: add chacha8_block and swtich the rng to it | expand

Commit Message

Aaron Toponce April 29, 2024, 1:48 p.m. UTC
According to Jean-Philippe Aumasson in his paper "Too Much Crypto" [1]:

> "The best result on ChaCha is a key recovery attack on the 7-round version
> with 2^237.7 time complexity using output data from 2^96 instances of ChaCha,
> that is, 2^105 bytes of data."

He then proposes that ChaCha use 8 rounds instead of 20, providing a 2.5x
speed-up. As such, this patch adds chacha8_block and chacha12_block and switches
the RNG from ChaCha20 to ChaCha8 to take advantage of that efficiency without
sacrificing security.

[1]: https://eprint.iacr.org/2019/1492

On my ThinkPad T480s with an Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz, the
speed-up is close to what would be expected.

Without the patch:

  $ dd if=/dev/urandom of=/dev/null bs=32M count=300
  300+0 records in
  300+0 records out
  10066329600 bytes (10 GB, 9.4 GiB) copied, 20.4806 s, 492 MB/s

With the patch:

  $ dd if=/dev/urandom of=/dev/null bs=32M count=300
  300+0 records in
  300+0 records out
  10066329600 bytes (10 GB, 9.4 GiB) copied, 11.5321 s, 873 MB/s

Signed-off-by: Aaron Toponce <aaron.toponce@gmail.com>
---
 drivers/char/random.c   |  8 ++++----
 include/crypto/chacha.h | 14 ++++++++++++--
 lib/crypto/chacha.c     |  6 +++---
 3 files changed, 19 insertions(+), 9 deletions(-)

Comments

Eric Biggers April 30, 2024, 3:11 a.m. UTC | #1
On Mon, Apr 29, 2024 at 07:48:49AM -0600, Aaron Toponce wrote:
> According to Jean-Philippe Aumasson in his paper "Too Much Crypto" [1]:
> 
> > "The best result on ChaCha is a key recovery attack on the 7-round version
> > with 2^237.7 time complexity using output data from 2^96 instances of ChaCha,
> > that is, 2^105 bytes of data."
> 
> He then proposes that ChaCha use 8 rounds instead of 20, providing a 2.5x
> speed-up. As such, this patch adds chacha8_block and chacha12_block and switches
> the RNG from ChaCha20 to ChaCha8 to take advantage of that efficiency without
> sacrificing security.
> 

I don't think there is consensus on ChaCha8 being recommended.  Adiantum uses
ChaCha12, but even that received some pushback.

The Linux RNG is also usually used only for small amounts of data, and its
security (and the perception of its security) is extremely important.

So just staying with ChaCha20 seems appropriate.

Note also that currently the Linux RNG is using a portable C implementation of
ChaCha20.  If there is actually a desire to accelerate large reads (which again,
aren't the main use case of the Linux RNG), it would be possible to use a SIMD
implementation of ChaCha20, which already exists in the kernel.  That would
speed up ChaCha20 by roughly 2-5x depending on the CPU.

- Eric
Aaron Toponce April 30, 2024, 4:41 a.m. UTC | #2
On Mon, Apr 29, 2024 at 08:11:05PM -0700, Eric Biggers wrote:
> I don't think there is consensus on ChaCha8 being recommended.  Adiantum uses
> ChaCha12, but even that received some pushback.
> 
> The Linux RNG is also usually used only for small amounts of data, and its
> security (and the perception of its security) is extremely important.
> 
> So just staying with ChaCha20 seems appropriate.

The 7-round attack does indeed fall short of the required 256-bits of security
per the stated goals of the ChaCha stream cipher, coming in at ~237 bits.
However, this attack is not catastrophic and of theoretical interest only. It's
well outside of practical reach. The 8-round version however reaches our
required security goal and is currently unbroken.

An interesting note in that paper is how we got ChaCha20 to begin with. ChaCha
is an evolution of Salsa20 which was included in the final eSTREAM portfolio [1]
The final eSTREAM portfolio recommends Salsa20/12, which is the 12 round
Salsa20. but with better diffusion [2]. In the "Too Much Crypto" paper, it
states [3]:

> "Regarding ChaCha, the eSTREAM actually recommended Salsa20/12, or ChaCha's
> predecessor with 12 rounds instead of 20, but ChaCha was de facto standardized
> with 20 rounds."

ChaCha20 is 13 additional rounds for extra security margin, more than have been
demonstrated for ChaCha to be secure.

[1]: https://www.ecrypt.eu.org/stream/
[2]: https://cr.yp.to/chacha/chacha-20080128.pdf
[3]: https://eprint.iacr.org/2019/1492

The reduced-round analysis of ChaCha is actually *better* than Salsa20.
Salsa20/8 has a known attack complexity with ~249 bits and Salsa20/7 has a known
attack complexity of ~153 bits. No known attacks exist against ChaCha8, and the
complexity against ChaCha7 is ~237 bits. This demonstrates to me that ChaCha's
security is very robust, and ChaCha8 is solid choice for a CSPRNG.

> Note also that currently the Linux RNG is using a portable C implementation of
> ChaCha20.  If there is actually a desire to accelerate large reads (which
> again, aren't the main use case of the Linux RNG), it would be possible to use
> a SIMD implementation of ChaCha20, which already exists in the kernel.  That
> would speed up ChaCha20 by roughly 2-5x depending on the CPU.

If ChaCha8 makes us uncomfortable, even though defensible, ChaCha12 is a good
compromise. As you mentioned, Google implemented ChaCha12 in Adiantum. It offers
a 1.67x speedup over ChaCha20 while still providing 5 additional rounds of
security over the best known attack.
Theodore Ts'o April 30, 2024, 4:26 p.m. UTC | #3
On Mon, Apr 29, 2024 at 10:41:10PM -0600, Aaron Toponce wrote:
> > Note also that currently the Linux RNG is using a portable C
> > implementation of ChaCha20.  If there is actually a desire to
> > accelerate large reads (which again, aren't the main use case of
> > the Linux RNG), it would be possible to use a SIMD implementation
> > of ChaCha20, which already exists in the kernel.  That would speed
> > up ChaCha20 by roughly 2-5x depending on the CPU.
> 
> If ChaCha8 makes us uncomfortable, even though defensible, ChaCha12
> is a good compromise. As you mentioned, Google implemented ChaCha12
> in Adiantum. It offers a 1.67x speedup over ChaCha20 while still
> providing 5 additional rounds of security over the best known
> attack.

I'm not sure I see the point of trying to accelerate the Linux RNG.
Sure, doing "dd if=/dev/urandom" is *fun*, but what's the real world
use case where this actually matters?  The kernel RNG is meant for key
generation, where a much larger safety margin is a good thing, and
where absolute performance is generally not a big deal.

After all, with key generation generally you are also performing some
kind of assymetric key crypto as part of the IKE or TLS setup, and the
time to generate the session key is in the noise.  And if you are
trying to wipe a disk, using something like shred(1) is really the
right tool.

Ultimately this boils down to a risk/benefit tradeoff.  I judge the
risk that you are a shill sent by a nation-state security agency ala
Jia Tan of xz infamy, trying to weaken Linux's RNG to be very low; but
what's the benefit?  If the risk is low, but the benefit is also low,
maybe it's not worth it?

Cheers,

					- Ted
Aaron Toponce April 30, 2024, 4:44 p.m. UTC | #4
On Tue, Apr 30, 2024 at 12:26:32PM -0400, Theodore Ts'o wrote:
> I'm not sure I see the point of trying to accelerate the Linux RNG.
> Sure, doing "dd if=/dev/urandom" is *fun*, but what's the real world
> use case where this actually matters?  The kernel RNG is meant for key
> generation, where a much larger safety margin is a good thing, and
> where absolute performance is generally not a big deal.

The goal is just to make the CSPRNG more efficient without sacrificing security.
Of course most reads will be small for cryptographic keys. ChaCha8 means even
those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example
was just to demonstrate the efficiency, not to be "fun".

> I judge the risk that you are a shill sent by a nation-state security agency
> ala Jia Tan of xz infamy, trying to weaken Linux's RNG to be very low; 

Unlike Jia Tan, my name is not anonymous. I've been very public and transparent
about who I am, the software I work on, the security research I've participated
in, and the communities I involve myself in. I don't work for a nation state nor
am I interested in compromising the kernel RNG.

In fact, I work for a local ISP out of Salt Lake City, Utah where we provide a
web hosting product with KVM. We are very interested in a secure Linux stack as
our business depends on it.

You and I have also had email communication about the kernel RNG in the paste.
I've also interacted with Jason Donenfeld about the RNG and putting together a
document on the evolution of the RNG from 1.3.30 to current.

I'll ignore the attempeted ad hominem. I understand the uneasy feeling due to
the xz(1) backdoor and the kneejerk reactions to not trust anyone with proposals
that might seem radical.
Theodore Ts'o May 1, 2024, 2:22 a.m. UTC | #5
So first of all, my apologies for giving you offense.  I really didn't
think you were a shill for the NSA or the MSS, but I have to admit
that when I get large set of patches which removes "unnecessary" code,
which is _technically_ safe, but which reduces the safety margin, I
find myself wondering whether it's part of a binary payload.  (This is
especially when I get patches from someone that I don't normally
receive patches from.)  Unfortunately, in the wake of the xz hack,
we're just all going to have to be a lot more careful.

On Tue, Apr 30, 2024 at 10:44:09AM -0600, Aaron Toponce wrote:
>
> The goal is just to make the CSPRNG more efficient without sacrificing security.
> Of course most reads will be small for cryptographic keys. ChaCha8 means even
> those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example
> was just to demonstrate the efficiency, not to be "fun".

This is a philosophical question; are we going for maximum efficiency,
or maximum safety so long as it meets the performance requirements for
the intended use case?  From an academic perspective, or if a
cryptographer is designing cipher for a NIST competition, there's a
strong desire for maximum efficiency, since that's one of the metrics
used in the competition.  But for the Linux RNG, my bias is to go for
safety, since we're not competing on who can do the fast bulk
encryption, but "sufficiently fast for keygen".

People of good will can disagree on what the approach should be.  I
tend to have much of a pragmatic engineer's perspective.  It's been
said that the Empire State Building is overbuilt by a factor of 10,
but that doesn't bother me.  People are now saying that perhaps the
Francis Scott Key bridge, when it is rebuilt, should have more safety
margin, since container ships have gotten so much bigger.  (And
apparently, cheap sh*t diesel fuel that is contaminated and the ship
owners buy fuel from the lowest bidder.)

Or we can talk about how Boeing has been trying to cheap-out on plane
manufacturing to save $$$; but I think you get the point of where I'm
coming from.  I'm not a big fan of trimming safety margins and making
things more efficient for it's own sake.  (At least in the case of
Boeing, the CEO at least got paid $22/million a year, so at least
there's that.  :-)

Now, if this is actually impacting the TLS connection termination for
a Facebook or Bing or Google's front end web server, then great, we
can try to optimize it.  But if it's not a bottleneck, what's the
point?  Making change for change's sake, especially when it's reducing
safety margins, is just one of those things that I find really hard to
get excited about.

Cheers,

					- Ted
Jason A. Donenfeld May 1, 2024, 12:21 p.m. UTC | #6
Hey Aaron,

There are probably better ways of speeding this up (e.g. my vDSO work,
which should be coming back soon) than just removing rounds and hoping
for the best.

The problem is that there's extremely broad consensus that ChaCha20 is
good at what it does. There's much less so for ChaCha8. JP's _probably_
right, and it all seems like a sensible risk analysis...maybe...but
also, why play with fire? Is it really worth it? I don't think there's
much harm done in being really conservative about all this.

Another consideration with the RNG is that most everybody else's crypto
relies on the RNG being good. If some consumer of the RNG wants to use
single DES, so be it. If another consumer wants to use a cascade of
ChaCha20 and AES and Serpent and Keccak for something, okay. Those
aren't our choices. But we shouldn't prevent those choices by weakening
the RNG.

So while it *might* be kinda overkill, there's also broad consensus that
what we've got is *definitely* sufficient for all uses. At the same
time, it's still pretty darn fast, there exist other ways to make it
faster, and I don't think it's /overly/ much.

Jason
Jean-Philippe Aumasson May 1, 2024, 12:38 p.m. UTC | #7
My 2 cents:

As a cryptanalyst, having discovered the 2008 attack on ChaCha that's
only been slightly improved in 16 years: the 20-round ChaCha20is a
clear waste of CPU cycles, but ChaCha8 is admittedly risky, though
more in terms of PR than pure crypto merits (plus, afaiu the threat
model of ChaCha in the Linux PRNG doesnt allow the kind of chosen-IV
"attack" known to work on reduced-round versions).

Switching from ChaCha20 to ChaCha12 might still raise eyebrows but I
dont think any respectable crypto/security expert will suspect a
JiaTan situation.

On Wed, May 1, 2024 at 2:28 PM Theodore Ts'o <tytso@mit.edu> wrote:
>
> So first of all, my apologies for giving you offense.  I really didn't
> think you were a shill for the NSA or the MSS, but I have to admit
> that when I get large set of patches which removes "unnecessary" code,
> which is _technically_ safe, but which reduces the safety margin, I
> find myself wondering whether it's part of a binary payload.  (This is
> especially when I get patches from someone that I don't normally
> receive patches from.)  Unfortunately, in the wake of the xz hack,
> we're just all going to have to be a lot more careful.
>
> On Tue, Apr 30, 2024 at 10:44:09AM -0600, Aaron Toponce wrote:
> >
> > The goal is just to make the CSPRNG more efficient without sacrificing security.
> > Of course most reads will be small for cryptographic keys. ChaCha8 means even
> > those small reads will be 2.5x more efficient than ChaCha20. The dd(1) example
> > was just to demonstrate the efficiency, not to be "fun".
>
> This is a philosophical question; are we going for maximum efficiency,
> or maximum safety so long as it meets the performance requirements for
> the intended use case?  From an academic perspective, or if a
> cryptographer is designing cipher for a NIST competition, there's a
> strong desire for maximum efficiency, since that's one of the metrics
> used in the competition.  But for the Linux RNG, my bias is to go for
> safety, since we're not competing on who can do the fast bulk
> encryption, but "sufficiently fast for keygen".
>
> People of good will can disagree on what the approach should be.  I
> tend to have much of a pragmatic engineer's perspective.  It's been
> said that the Empire State Building is overbuilt by a factor of 10,
> but that doesn't bother me.  People are now saying that perhaps the
> Francis Scott Key bridge, when it is rebuilt, should have more safety
> margin, since container ships have gotten so much bigger.  (And
> apparently, cheap sh*t diesel fuel that is contaminated and the ship
> owners buy fuel from the lowest bidder.)
>
> Or we can talk about how Boeing has been trying to cheap-out on plane
> manufacturing to save $$$; but I think you get the point of where I'm
> coming from.  I'm not a big fan of trimming safety margins and making
> things more efficient for it's own sake.  (At least in the case of
> Boeing, the CEO at least got paid $22/million a year, so at least
> there's that.  :-)
>
> Now, if this is actually impacting the TLS connection termination for
> a Facebook or Bing or Google's front end web server, then great, we
> can try to optimize it.  But if it's not a bottleneck, what's the
> point?  Making change for change's sake, especially when it's reducing
> safety margins, is just one of those things that I find really hard to
> get excited about.
>
> Cheers,
>
>                                         - Ted
Aaron Toponce May 1, 2024, 2:02 p.m. UTC | #8
On Wed, May 01, 2024 at 02:38:52PM +0200, Jean-Philippe Aumasson wrote:
> Switching from ChaCha20 to ChaCha12 might still raise eyebrows but I
> dont think any respectable crypto/security expert will suspect a
> JiaTan situation.

I also mentioned this earlier in the thread; that is, to switch to ChaCha12 if
ChaCha8 makes us uncomfortable. It's not without precedent also:

- eSTREAM recommends Salsa20/12 in their final portfolio
- Adiantum uses XChaCha12 
- Rust uses ChaCha12 rand::rngs::StdRng

There may be other precedent of ChaCha12 with from non-trivial projects I'm
unfamiliar with.
Aaron Toponce May 2, 2024, 1:41 p.m. UTC | #9
On Wed, May 01, 2024 at 02:21:35PM +0200, Jason A. Donenfeld wrote:
> There are probably better ways of speeding this up (e.g. my vDSO work,
> which should be coming back soon) than just removing rounds and hoping
> for the best.
> 
> The problem is that there's extremely broad consensus that ChaCha20 is
> good at what it does. There's much less so for ChaCha8. JP's _probably_
> right, and it all seems like a sensible risk analysis...maybe...but
> also, why play with fire? Is it really worth it? I don't think there's
> much harm done in being really conservative about all this.
> 
> Another consideration with the RNG is that most everybody else's crypto
> relies on the RNG being good. If some consumer of the RNG wants to use
> single DES, so be it. If another consumer wants to use a cascade of
> ChaCha20 and AES and Serpent and Keccak for something, okay. Those
> aren't our choices. But we shouldn't prevent those choices by weakening
> the RNG.
> 
> So while it *might* be kinda overkill, there's also broad consensus that
> what we've got is *definitely* sufficient for all uses. At the same
> time, it's still pretty darn fast, there exist other ways to make it
> faster, and I don't think it's /overly/ much.

ChaCha20 reminds me of cascading encryption actually. That's a good analogy.
VeraCrypt offers several cascading options choices, such as AES(Twofish),
AES(Twofish(Serpent)), Kuzneychik(Serpent(Camellia)), etc. While there isn't
anything technically wrong with the approach, most security-minded folks would
agree it's overkill. Using just AES, or just Twofish, or just Serpent is more
than sufficent. ChaCha20 is kind of like that. It's extra security because "just
in case".

ChaCha8 is certainly aggressive. As another analogy, it's a 10 character random
password. While a 10 character password hashed with MD5 is *probably* fine at 65
bits, 13 random characters (80 bits) would definitely be safer. But 20 random
characters (128 bits) is certainly overkill to protect against even the most
well-funded orgs with distributed GPU resources cracking password hashes.

ChaCha12 seems like a good compromise. It's 5 rounds of security away from the
latest known attack while also providing a solid performance improvement.

Cheers,
diff mbox series

Patch

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 2597cb43f438..2e14a30b795f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -302,7 +302,7 @@  static void crng_fast_key_erasure(u8 key[CHACHA_KEY_SIZE],
 	chacha_init_consts(chacha_state);
 	memcpy(&chacha_state[4], key, CHACHA_KEY_SIZE);
 	memset(&chacha_state[12], 0, sizeof(u32) * 4);
-	chacha20_block(chacha_state, first_block);
+	chacha8_block(chacha_state, first_block);
 
 	memcpy(key, first_block, CHACHA_KEY_SIZE);
 	memcpy(random_data, first_block + CHACHA_KEY_SIZE, random_data_len);
@@ -388,13 +388,13 @@  static void _get_random_bytes(void *buf, size_t len)
 
 	while (len) {
 		if (len < CHACHA_BLOCK_SIZE) {
-			chacha20_block(chacha_state, tmp);
+			chacha8_block(chacha_state, tmp);
 			memcpy(buf, tmp, len);
 			memzero_explicit(tmp, sizeof(tmp));
 			break;
 		}
 
-		chacha20_block(chacha_state, buf);
+		chacha8_block(chacha_state, buf);
 		if (unlikely(chacha_state[12] == 0))
 			++chacha_state[13];
 		len -= CHACHA_BLOCK_SIZE;
@@ -444,7 +444,7 @@  static ssize_t get_random_bytes_user(struct iov_iter *iter)
 	}
 
 	for (;;) {
-		chacha20_block(chacha_state, block);
+		chacha8_block(chacha_state, block);
 		if (unlikely(chacha_state[12] == 0))
 			++chacha_state[13];
 
diff --git a/include/crypto/chacha.h b/include/crypto/chacha.h
index b3ea73b81944..64c45121c69a 100644
--- a/include/crypto/chacha.h
+++ b/include/crypto/chacha.h
@@ -8,8 +8,7 @@ 
  *
  * The ChaCha paper specifies 20, 12, and 8-round variants.  In general, it is
  * recommended to use the 20-round variant ChaCha20.  However, the other
- * variants can be needed in some performance-sensitive scenarios.  The generic
- * ChaCha code currently allows only the 20 and 12-round variants.
+ * variants can be needed in some performance-sensitive scenarios.
  */
 
 #ifndef _CRYPTO_CHACHA_H
@@ -31,11 +30,22 @@ 
 #define XCHACHA_IV_SIZE		32
 
 void chacha_block_generic(u32 *state, u8 *stream, int nrounds);
+
 static inline void chacha20_block(u32 *state, u8 *stream)
 {
 	chacha_block_generic(state, stream, 20);
 }
 
+static inline void chacha12_block(u32 *state, u8 *stream)
+{
+	chacha_block_generic(state, stream, 12);
+}
+
+static inline void chacha8_block(u32 *state, u8 *stream)
+{
+	chacha_block_generic(state, stream, 8);
+}
+
 void hchacha_block_arch(const u32 *state, u32 *out, int nrounds);
 void hchacha_block_generic(const u32 *state, u32 *out, int nrounds);
 
diff --git a/lib/crypto/chacha.c b/lib/crypto/chacha.c
index b748fd3d256e..15e773629f1d 100644
--- a/lib/crypto/chacha.c
+++ b/lib/crypto/chacha.c
@@ -18,7 +18,7 @@  static void chacha_permute(u32 *x, int nrounds)
 	int i;
 
 	/* whitelist the allowed round counts */
-	WARN_ON_ONCE(nrounds != 20 && nrounds != 12);
+	WARN_ON_ONCE(nrounds != 20 && nrounds != 12 && nrounds != 8);
 
 	for (i = 0; i < nrounds; i += 2) {
 		x[0]  += x[4];    x[12] = rol32(x[12] ^ x[0],  16);
@@ -67,7 +67,7 @@  static void chacha_permute(u32 *x, int nrounds)
  * chacha_block_generic - generate one keystream block and increment block counter
  * @state: input state matrix (16 32-bit words)
  * @stream: output keystream block (64 bytes)
- * @nrounds: number of rounds (20 or 12; 20 is recommended)
+ * @nrounds: number of rounds (20, 12, or 8; 20 is recommended)
  *
  * This is the ChaCha core, a function from 64-byte strings to 64-byte strings.
  * The caller has already converted the endianness of the input.  This function
@@ -93,7 +93,7 @@  EXPORT_SYMBOL(chacha_block_generic);
  * hchacha_block_generic - abbreviated ChaCha core, for XChaCha
  * @state: input state matrix (16 32-bit words)
  * @stream: output (8 32-bit words)
- * @nrounds: number of rounds (20 or 12; 20 is recommended)
+ * @nrounds: number of rounds (20, 12, or 8; 20 is recommended)
  *
  * HChaCha is the ChaCha equivalent of HSalsa and is an intermediate step
  * towards XChaCha (see https://cr.yp.to/snuffle/xsalsa-20081128.pdf).  HChaCha