mbox series

[RFC,0/4] random: a simple vDSO mechanism for reseeding userspace CSPRNGs

Message ID cover.1673539719.git.ydroneaud@opteya.com
Headers show
Series random: a simple vDSO mechanism for reseeding userspace CSPRNGs | expand

Message

Yann Droneaud Jan. 12, 2023, 5:02 p.m. UTC
Hi,

Here's my humble hack at improving kernel for a faster secure arc4random()
userspace implementation, by allowing userspace to buffer getrandom()
generated entropy, discarding it as the kernel's own CSPRNG is reseeded.

It's largely built upon the vDSO work of Jason A. Donenfeld, as part of
its latest patchset "[PATCH v14 0/7] implement getrandom() in vDSO" [1]
but it's made simpler by making available only one of the missing tools
for the userspace to properly buffer the output of getrandom().

Using MADV_WIPEONFORK and mlock(), userspace can reasonably offer forward
secrecy*, until something like VM_DROPPABLE[2] is provided by the kernel,
to allow for the buffer memory to never, ever be written to the disk
before its used, being inherited accross fork(), and isn't limited by
RLIMIT_MEMLOCK.

 * provided userspace can mlock() the memory and calls mlock() on buffer
   after fork, as memory locks are not inherited accross fork().

As it's a hack, it's far from perfect. The main drawback I see is the
case where fresh entropy has to be discarded as the kernel's CSPRNG
generation is updated as the result of calling getrandom() to generate
the mentionned entropy. The workaround, is to limit the amount of fresh
entropy fetched when a kernel's CSPRNG generation change is detected,
and to increase the amount the data retrieved with getrandom() when
generation doesn't change between calls.

Performance wise, the improvements are here, as one can check with the
test program provided:

    getrandom(,,GRND_TIMESTAMP) test
    getrandom() support GRND_TIMESTAMP
    found getrandom() in vDSO at 0x7ffc3efccc60
    == direct syscall getrandom(), 16777216 u32, 2.866324020 s,   5.853 M u32/s, 170.846 ns/u32
    == direct vDSO getrandom(),    16777216 u32, 2.883473280 s,   5.818 M u32/s, 171.868 ns/u32
    == pooled syscall getrandom(), 16777216 u32, 1.152421219 s,  14.558 M u32/s,  68.690 ns/u32, (0 bytes discarded)
    == pooled vDSO getrandom(),    16777216 u32, 0.162477863 s, 103.258 M u32/s,   9.684 ns/u32, (0 bytes discarded)

With the requirement to mlock() the memory page(s) used to buffer
getrandom() output, I'm not sure userspace could afford to allocate
4KBytes per thread, before being hit by RLIMIT_MEMLOCK (or worse,
OOM killer). Thus, some form of sharing between threads would be
needed, which would require locking, reducing the performances
shown above.

Also I haven't studied the security impact of making the kernel base
CSPRNG seed generation available to userspace. It can be made more
opaque if needed.

Regards.

[1] https://lore.kernel.org/all/20230101162910.710293-1-Jason@zx2c4.com/
[2] https://lore.kernel.org/all/20230101162910.710293-3-Jason@zx2c4.com/

Jason A. Donenfeld (2):
  random: introduce generic vDSO getrandom(,, GRND_TIMESTAMP) fast path
  x86: vdso: Wire up getrandom() vDSO implementation.

Yann Droneaud (2):
  random: introduce getrandom() GRND_TIMESTAMP
  testing: add a getrandom() GRND_TIMESTAMP vDSO demonstration/benchmark

 MAINTAINERS                                   |   1 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/entry/vdso/Makefile                  |   3 +-
 arch/x86/entry/vdso/vdso.lds.S                |   2 +
 arch/x86/entry/vdso/vgetrandom.c              |  17 +
 arch/x86/include/asm/vdso/getrandom.h         |  42 +++
 arch/x86/include/asm/vdso/vsyscall.h          |   2 +
 arch/x86/include/asm/vvar.h                   |  16 +
 drivers/char/random.c                         |  52 ++-
 include/linux/random.h                        |  31 ++
 include/uapi/linux/random.h                   |   2 +
 include/vdso/datapage.h                       |   9 +
 lib/vdso/Kconfig                              |   5 +
 lib/vdso/getrandom.c                          |  51 +++
 tools/testing/crypto/getrandom/Makefile       |   4 +
 .../testing/crypto/getrandom/test-getrandom.c | 307 ++++++++++++++++++
 16 files changed, 543 insertions(+), 2 deletions(-)
 create mode 100644 arch/x86/entry/vdso/vgetrandom.c
 create mode 100644 arch/x86/include/asm/vdso/getrandom.h
 create mode 100644 lib/vdso/getrandom.c
 create mode 100644 tools/testing/crypto/getrandom/Makefile
 create mode 100644 tools/testing/crypto/getrandom/test-getrandom.c

Comments

Jason A. Donenfeld Jan. 12, 2023, 5:07 p.m. UTC | #1
Sorry Yann, but I'm not interested in this approach, and I don't think
reviewing the details of it are a good allocation of time. I don't
want to lock the kernel into having specific reseeding semantics that
are a contract with userspace, which is what this approach does.
Please just let me iterate on my original patchset for a little bit,
without adding more junk to the already overly large conversation.
Peter Lafreniere Jan. 13, 2023, 12:32 a.m. UTC | #2
[...]

>GRND_TIMESTAMP allows userspace to ask the kernel if previous
>"timestamp" has changed as the result of an event that
>triggered kernel CSPRNG reseed, and to update the "timestamp".

>In case the "timestamp" hasn't changed, userspace CSPRNG can
>consume a slice of its buffered random stream.

>If it has changed, remaining userspace buffered random values
>should be discarded. Userspace should call getrandom() to fill
>and/or generate its buffer with updated seed.
>It's advised to test again the "timestamp" to deal with the
>race condition, where the kernel reseed just after the call
>to getrandom() to get entropy.

This second check is the 'safe thing' to do, but if you're that
worried about race conditions then this api is useless. You can't
ignore the inherent TOCTOU problem with the GRND_TIMESTAMP calls.

That said, the race condition is so tiny that the added overhead
of a third syscall (without vDSO support) starts to negate the
value in buffering the random numbers (not to mention adding
significant latency to every nth arc4random() call, for example).

IMO, for callers that cannot accept the risk, the getrandom(,,0)
option is the perfect alternative.

[...]

>+static ssize_t get_random_timestamp(char __user *ubuf, size_t len, unsigned int flags)
>+{
>+	u64 ts;
>+
>+	/* other combination not supported */
>+	if (WARN(flags != GRND_TIMESTAMP, "GRND_TIMESTAMP cannot be used with other flags"))
>+		return -EINVAL;

If userspace messes up the flags then it's the problem of the
caller. Why clutter the logs in that case?

At the very least this should be WARN_ONCE() to avoid log spam.

>+	/* shorter structure not supported */
>+	if (len < sizeof(ts))
>+		return -EINVAL;

This should be sizeof(u64) to match the vDSO patch and to avoid
having to change this condition if ts becomes larger.

>+
>+	if (copy_from_user(&ts, ubuf, sizeof(ts)))
>+		return -EFAULT;
>+
>+	/* longer structure supported, only if 0-padded,
>+	   timestamp might be extended in the future with more fields */
>+	if (len > sizeof(ts)) {
>+		char __user *p = ubuf + sizeof(ts);
>+		size_t l = len - sizeof(ts);
>+
>+		while (l) {
>+			char b;
>+
>+			if (get_user(b, p++))
>+				return -EFAULT;
>+
>+			if (b)
>+				return -EINVAL;
>+		}
>+	}
>+
>+	if (!get_random_timestamp_update(&ts, READ_ONCE(base_crng.generation)))
>+		return 0;
>+
>+	if (copy_to_user(ubuf, &ts, sizeof(ts)))
>+		return -EFAULT;
>+
>+	return sizeof(ts);
>+}
>+
> SYSCALL_DEFINE3(getrandom, char __user *, ubuf, size_t, len, unsigned int, flags)
> {
> 	struct iov_iter iter;
> 	struct iovec iov;
> 	int ret;
>
>-	if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE))
>+	if (flags & ~(GRND_NONBLOCK | GRND_RANDOM | GRND_INSECURE | GRND_TIMESTAMP))
> 		return -EINVAL;
>
>+	if (unlikely(flags & GRND_TIMESTAMP))
>+		return get_random_timestamp(ubuf, len, flags);
>+

I'd remove the unlikely(). I don't like to assume usage patterns.

> 	/*
> 	 * Requesting insecure and blocking randomness at the same time makes
> 	 * no sense.
>diff --git a/include/linux/random.h b/include/linux/random.h
>index b0a940af4fff..bc219b5a96a5 100644
>--- a/include/linux/random.h
>+++ b/include/linux/random.h
>@@ -161,4 +161,35 @@ int random_online_cpu(unsigned int cpu);
> extern const struct file_operations random_fops, urandom_fops;
> #endif
>
>+/*
>+ * get_random_timestamp_update()
>+ *
>+ * @generation: current CRNG generation (from base_crng.generation
>+ *              or _vdso_rng_data.generation)
>+ *
>+ * Return: timestamp size if previous timestamp is expired and is updated,
>+ *         0 if not expired (and not updated)
>+ */
>+static inline bool get_random_timestamp_update(u64 *user_ts,
>+					       u64 generation)
>+{
>+	u64 ts;
>+
>+	/* base_crng.generation is never ULONG_MAX,
>+	 * OTOH userspace will initialize its timestamp
>+	 * to 0, so inverting base_crng.generation ensure
>+	 * first call to getrandom(,,GRND_TIMESTAMP) will
>+	 * update
>+	 */

Rather than assuming that the timestamp will start zero-initilized,
expect it to be uninitilized. Either way the code works.

>+	ts = ~generation;
>+
>+	/* not expired ? no refresh suggested */
>+	if (*user_ts == ts)
>+		return false;
>+
>+	*user_ts = ts;
>+
>+	return true;
>+}
>+

Not that it matters much, but you could make generation a u64* that
gets dereferenced by get_random_timestamp_update(). It's cleaner for
the caller, barely changes this function, and will get inlined anyway.
I might just be imposing my personal style in this case though.

After a cursory skimming of the rest of the series I think that this
is a worthwhile direction to pursue. Jason's series is growing bulky
and this provides the needed slimming while solving the root problem.

The only thing I see immediately is the TOCTOU problem and the extra
steps needed to guarantee forward secrecy.

I should mention that I'm not a security or rng expert at all.

Cheers,
Peter Lafreniere
<peter@n8pjl.ca>
H. Peter Anvin Jan. 14, 2023, 2:22 a.m. UTC | #3
On 1/12/23 11:55, Yann Droneaud wrote:
> Hi
> 
> 12 janvier 2023 à 18:07 "Jason A. Donenfeld" <Jason@zx2c4.com> a écrit:
>   
>> Sorry Yann, but I'm not interested in this approach, and I don't think
>> reviewing the details of it are a good allocation of time. I don't
>> want to lock the kernel into having specific reseeding semantics that
>> are a contract with userspace, which is what this approach does.
> 
> This patch adds a mean for the kernel to tell userspace: between the
> last time you call us with getrandom(timestamp,, GRND_TIMESTAMP),
> something happened that trigger an update to the opaque cookie given
> to getrandom(timestamp, GRND_TIMESTAMP). When such update happen,
> userspace is advised to discard buffered random data and retry.
> 
> The meaning of the timestamp cookie is up to the kernel, and can be
> changed anytime. Userspace is not expected to read the content of this
> blob. Userspace only acts on the length returned by getrandom(,, GRND_TIMESTAMP):
>   -1 : not supported
>    0 : cookie not updated, no need to discard buffered data
>   >0 : cookie updated, userspace should discard buffered data
> 
> For the cookie, I've used a single u64, but two u64 could be a better start,
> providing room for implementing improved behavior in future kernel versions.
> 
>> Please just let me iterate on my original patchset for a little bit,
>> without adding more junk to the already overly large conversation.
> 
> I like the simplicity of my so called "junk". It's streamlined, doesn't
> require a new syscall, doesn't require a new copy of ChaCha20 code.
> 
> I'm sorry it doesn't fit your expectations.
> 

Why would anything more than a 64-bit counter be ever necessary? It only 
needs to be incremented.

Let user space manage keeping track of the cookie matching its own 
buffers. You do NOT want this to be stateful, because that's just 
begging for multiple libraries to step on each other.

Export the cookie from the vdso and volià, a very cheap check around any 
user space randomness buffer will work:

	static clone_cookie_t last_cookie;
	clone_cookie_t this_cookie;

	this_cookie = get_clone_cookie();
	do {
		while (this_cookie != last_cookie) {
			last_cookie = this_cookie;
			reinit_randomness();
			this_cookie = get_clone_cookie();
		}

		extract_randomness_from_buffer();
		this_cookie = get_clone_cookie();
	} while (this_cookie != last_cookie);

	last_cookie = this_cookie;

	-hpa
Andy Lutomirski Jan. 16, 2023, 7:49 p.m. UTC | #4
> On Jan 13, 2023, at 7:16 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> 
> On 1/12/23 11:55, Yann Droneaud wrote:
>> Hi
>> 12 janvier 2023 à 18:07 "Jason A. Donenfeld" <Jason@zx2c4.com> a écrit:
>>  
>>> Sorry Yann, but I'm not interested in this approach, and I don't think
>>> reviewing the details of it are a good allocation of time. I don't
>>> want to lock the kernel into having specific reseeding semantics that
>>> are a contract with userspace, which is what this approach does.
>> This patch adds a mean for the kernel to tell userspace: between the
>> last time you call us with getrandom(timestamp,, GRND_TIMESTAMP),
>> something happened that trigger an update to the opaque cookie given
>> to getrandom(timestamp, GRND_TIMESTAMP). When such update happen,
>> userspace is advised to discard buffered random data and retry.
>> The meaning of the timestamp cookie is up to the kernel, and can be
>> changed anytime. Userspace is not expected to read the content of this
>> blob. Userspace only acts on the length returned by getrandom(,, GRND_TIMESTAMP):
>>  -1 : not supported
>>   0 : cookie not updated, no need to discard buffered data
>>  >0 : cookie updated, userspace should discard buffered data
>> For the cookie, I've used a single u64, but two u64 could be a better start,
>> providing room for implementing improved behavior in future kernel versions.
>>> Please just let me iterate on my original patchset for a little bit,
>>> without adding more junk to the already overly large conversation.
>> I like the simplicity of my so called "junk". It's streamlined, doesn't
>> require a new syscall, doesn't require a new copy of ChaCha20 code.
>> I'm sorry it doesn't fit your expectations.
> 
> Why would anything more than a 64-bit counter be ever necessary? It only needs to be incremented.

This is completely broken with CRIU or, for that matter, with VM forking.

> 
> Let user space manage keeping track of the cookie matching its own buffers. You do NOT want this to be stateful, because that's just begging for multiple libraries to step on each other.
> 
> Export the cookie from the vdso and volià, a very cheap check around any user space randomness buffer will work:
> 
>    static clone_cookie_t last_cookie;
>    clone_cookie_t this_cookie;
> 
>    this_cookie = get_clone_cookie();
>    do {
>        while (this_cookie != last_cookie) {
>            last_cookie = this_cookie;
>            reinit_randomness();
>            this_cookie = get_clone_cookie();
>        }
> 
>        extract_randomness_from_buffer();
>        this_cookie = get_clone_cookie();
>    } while (this_cookie != last_cookie);
> 
>    last_cookie = this_cookie;
> 
>    -hpa
Yann Droneaud Jan. 19, 2023, 11:19 a.m. UTC | #5
Hi,

16 janvier 2023 à 20:50 "Andy Lutomirski" <luto@amacapital.net> a écrit:
> > On Jan 13, 2023, at 7:16 PM, H. Peter Anvin <hpa@zytor.com> wrote:  
> >  On 1/12/23 11:55, Yann Droneaud wrote:
> > >  12 janvier 2023 à 18:07 "Jason A. Donenfeld" <Jason@zx2c4.com> a écrit:
> > > 
> > 
> >  Sorry Yann, but I'm not interested in this approach, and I don't think
> >  reviewing the details of it are a good allocation of time. I don't
> >  want to lock the kernel into having specific reseeding semantics that
> >  are a contract with userspace, which is what this approach does.
> > 
> > > 
> > > This patch adds a mean for the kernel to tell userspace: between the
> > >  last time you call us with getrandom(timestamp,, GRND_TIMESTAMP),
> > >  something happened that trigger an update to the opaque cookie given
> > >  to getrandom(timestamp, GRND_TIMESTAMP). When such update happen,
> > >  userspace is advised to discard buffered random data and retry.
> > >  The meaning of the timestamp cookie is up to the kernel, and can be
> > >  changed anytime. Userspace is not expected to read the content of this
> > >  blob. Userspace only acts on the length returned by getrandom(,, GRND_TIMESTAMP):
> > >  -1 : not supported
> > >  0 : cookie not updated, no need to discard buffered data
> > >  >0 : cookie updated, userspace should discard buffered data
> > >  For the cookie, I've used a single u64, but two u64 could be a better start,
> > >  providing room for implementing improved behavior in future kernel versions.
> > > 
> > 
> >  Please just let me iterate on my original patchset for a little bit,
> >  without adding more junk to the already overly large conversation.
> > 
> > > 
> > > I like the simplicity of my so called "junk". It's streamlined, doesn't
> > >  require a new syscall, doesn't require a new copy of ChaCha20 code.
> > >  I'm sorry it doesn't fit your expectations.
> > > 
> > 
> >  
> >  Why would anything more than a 64-bit counter be ever necessary? It only needs to be incremented.
> > 
> 
> This is completely broken with CRIU or, for that matter, with VM forking.
>

Which raise the question of the support of CRIU with Jason's vDSO proposal.

AFAIK CRIU handle vDSO[1] by interposing symbols so that, on restore, the process
will call the interposed functions, which will resolve the new vDSO's functions.

vgetrandom_alloc() would have been called before the checkpoint, allocating one
opaque state of size x. After the restore, the vDSO's getrandom() would be given
this opaque state, expecting it having size y. As the content of the opaque state
should have been cleared per MADV_WIPEONFORK, there's nothing in the state that
could help vDSO's getrandom() to achieve backward compatibility.

I think backward compatibility can be achieved by adding an opaque state size
argument to vDSO's getrandom().

What to think Jason ?

[1] https://criu.org/Vdso

Regards.