From patchwork Mon Aug 19 19:31:07 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 820278
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: "Jason A . Donenfeld", Florian Weimer
Subject: [PATCH v3] linux: Add support for getrandom vDSO
Date: Mon, 19 Aug 2024 16:31:07 -0300
Message-ID: <20240819193540.17711-1-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.43.0

Linux 6.11 will support calling getrandom() from the vDSO (commit
4ad10a5f5f78a5b3e525a63bd075a4eb1139dde1).  It operates on a thread-local
opaque state allocated with mmap using flags specified by the vDSO.
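
As an illustration of the query-then-mmap step described above, here is a
minimal sketch in plain C, outside of glibc.  The struct layout and the
function pointer signature are copied from this patch; fake_vgetrandom and
map_state_block are hypothetical names, and the size and flags the fake
reports are made-up values, since the real ones come from the kernel at
run time.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Layout used by the patch to receive the vDSO's allocation parameters.  */
struct vgetrandom_opaque_params
{
  uint32_t size_of_opaque_state;
  uint32_t mmap_prot;
  uint32_t mmap_flags;
  uint32_t reserved[13];
};

/* Same shape as the _dl_vdso_getrandom pointer declared in the patch.  */
typedef ssize_t (*vgetrandom_fn) (void *buffer, size_t len,
                                  unsigned int flags, void *state,
                                  size_t state_len);

/* HYPOTHETICAL stand-in for the kernel vDSO entry point.  It only answers
   the parameter query (state_len == ~0UL); the values it reports are
   made up for this example.  */
static ssize_t
fake_vgetrandom (void *buffer, size_t len, unsigned int flags, void *state,
                 size_t state_len)
{
  (void) flags;
  if (buffer != NULL || len != 0 || state_len != ~0UL)
    return -1;
  struct vgetrandom_opaque_params *p = state;
  p->size_of_opaque_state = 256;
  p->mmap_prot = PROT_READ | PROT_WRITE;
  p->mmap_flags = MAP_PRIVATE | MAP_ANONYMOUS;
  return 0;
}

/* Ask FN how states must be allocated, then map BLOCK_SIZE bytes with the
   protection and flags it reported.  Returns NULL on failure; on success
   *STATE_SIZE holds the size of a single opaque state.  */
static void *
map_state_block (vgetrandom_fn fn, size_t block_size, size_t *state_size)
{
  struct vgetrandom_opaque_params params = { 0 };
  if (fn (NULL, 0, 0, &params, ~0UL) != 0)
    return NULL;
  void *block = mmap (NULL, block_size, params.mmap_prot,
                      params.mmap_flags, -1, 0);
  if (block == MAP_FAILED)
    return NULL;
  *state_size = params.size_of_opaque_state;
  return block;
}
```

A real caller would pass the resolved __vdso_getrandom pointer instead of
the fake and then carve the returned block into state_size chunks.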

Multiple states are allocated at once, as many as fit into a page, and
these are held in an array of available states to be doled out to each
thread upon first use, and recycled when a thread terminates.  As these
states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the
LSB of the opaque state address, falling back to the syscall if there's
reentrancy contention.

Also, fork() is handled by blocking signals on opaque state allocation
(so _Fork() always sees a consistent state even if it interrupts a
getrandom() call) and by iterating over the thread stack cache in
reclaim_stacks.  Each opaque state will be either in the free states
list (grnd_alloc.states) or allocated to a running thread.

It's currently enabled for x86_64.  As the kernel support gains more
platforms (arm64 is in the works), this can be easily turned on for
those.

Checked on x86_64-linux-gnu.

Co-authored-by: Jason A. Donenfeld

NB: this potentially makes getrandom a non-cancellable entrypoint if the
vDSO issues the syscall for any reason.  I am not sure whether to make
the vDSO *not* issue the syscall in such a case, or to make getrandom()
a non-cancellable entrypoint.  However, this patch should be orthogonal
to whether the change is made in the kernel or in the getrandom()
semantics.
---
Changes from v2:
* Move the getrandom opaque state buffer to 'struct pthread'.
* Move the state release to start_thread, after signals are blocked, to
  avoid a possible concurrent update.
* Move the state reset on fork() to reclaim_stacks.  This keeps all the
  thread-handling logic in one place and simplifies
  __getrandom_fork_subprocess (no need for an extra argument).
* Fixed some style issues in comments.
* Do not use mremap to reallocate the free states.  This avoids a
  possible fork() issue where the allocator state pointer update is
  interrupted just after the mremap call returns, but before the
  assignment.
  This is done by mmap/munmap of a new buffer with the new size, so a
  fork() will not invalidate the old buffer.
* Block all signals before taking the lock to avoid the _Fork() issue
  if it interrupts getrandom while the lock is taken.
---
 include/sys/random.h                         |   4 +
 malloc/malloc.c                              |   4 +-
 nptl/allocatestack.c                         |   2 +
 nptl/descr.h                                 |   3 +
 nptl/pthread_create.c                        |   5 +
 sysdeps/generic/not-cancel.h                 |   4 +-
 sysdeps/mach/hurd/not-cancel.h               |   4 +-
 sysdeps/nptl/_Fork.c                         |   2 +
 sysdeps/nptl/fork.h                          |  12 +
 sysdeps/unix/sysv/linux/dl-vdso-setup.c      |   5 +
 sysdeps/unix/sysv/linux/dl-vdso-setup.h      |   3 +
 sysdeps/unix/sysv/linux/getrandom.c          | 235 ++++++++++++++++++-
 sysdeps/unix/sysv/linux/getrandom_vdso.h     |  36 +++
 sysdeps/unix/sysv/linux/include/sys/random.h |  11 +
 sysdeps/unix/sysv/linux/not-cancel.h         |   7 +-
 sysdeps/unix/sysv/linux/x86_64/sysdep.h      |   1 +
 16 files changed, 331 insertions(+), 7 deletions(-)
 create mode 100644 sysdeps/unix/sysv/linux/getrandom_vdso.h
 create mode 100644 sysdeps/unix/sysv/linux/include/sys/random.h

diff --git a/include/sys/random.h b/include/sys/random.h
index 6aa313d35d..35f64a0339 100644
--- a/include/sys/random.h
+++ b/include/sys/random.h
@@ -1,8 +1,12 @@
 #ifndef _SYS_RANDOM_H
 #include
 
+#include_next
+
 # ifndef _ISOMAC
 
+# include
+
 extern ssize_t __getrandom (void *__buffer, size_t __length,
                             unsigned int __flags) __wur;
 libc_hidden_proto (__getrandom)
diff --git a/malloc/malloc.c b/malloc/malloc.c
index bcb6e5b83c..9e577ab900 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -3140,8 +3140,8 @@ static void
 tcache_key_initialize (void)
 {
   /* We need to use the _nostatus version here, see BZ 29624.  */
-  if (__getrandom_nocancel_nostatus (&tcache_key, sizeof(tcache_key),
-                                     GRND_NONBLOCK)
+  if (__getrandom_nocancel_nostatus_direct (&tcache_key, sizeof(tcache_key),
+                                            GRND_NONBLOCK)
       != sizeof (tcache_key))
     {
       tcache_key = random_bits ();
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 2cb562f8ea..d9adb5856c 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -132,6 +132,8 @@ get_cached_stack (size_t *sizep, void **memp)
   __libc_lock_init (result->exit_lock);
   memset (&result->tls_state, 0, sizeof result->tls_state);
 
+  result->getrandom_buf = NULL;
+
   /* Clear the DTV.  */
   dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
   for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
diff --git a/nptl/descr.h b/nptl/descr.h
index 8cef95810c..4697f633e1 100644
--- a/nptl/descr.h
+++ b/nptl/descr.h
@@ -404,6 +404,9 @@ struct pthread
   /* Used on strsignal.  */
   struct tls_internal_t tls_state;
 
+  /* getrandom vDSO per-thread opaque state.  */
+  void *getrandom_buf;
+
   /* rseq area registered with the kernel.  Use a custom definition
      here to isolate from kernel struct rseq changes.  The
      implementation of sched_getcpu needs acccess to the cpu_id field;
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 1d3665d5ed..d1f5568b3b 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -549,6 +550,10 @@ start_thread (void *arg)
     }
 #endif
 
+  /* Release the vDSO getrandom per-thread buffer with all signals blocked,
+     to avoid creating a new free-state block during thread release.  */
+  __getrandom_vdso_release (pd);
+
   if (!pd->user_stack)
     advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
                         pd->guardsize);
diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h
index 2dd1064600..8e3f49cc07 100644
--- a/sysdeps/generic/not-cancel.h
+++ b/sysdeps/generic/not-cancel.h
@@ -51,7 +51,9 @@
   __fcntl64 (fd, cmd, __VA_ARGS__)
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
-#define __getrandom_nocancel_nostatus(buf, size, flags) \
+#define __getrandom_nocancel_direct(buf, size, flags) \
+  __getrandom (buf, size, flags)
+#define __getrandom_nocancel_nostatus_direct(buf, size, flags) \
   __getrandom (buf, size, flags)
 #define __poll_infinity_nocancel(fds, nfds) \
   __poll (fds, nfds, -1)
diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h
index 69fb3c00ef..ec5f5aa895 100644
--- a/sysdeps/mach/hurd/not-cancel.h
+++ b/sysdeps/mach/hurd/not-cancel.h
@@ -79,7 +79,7 @@ __typeof (__fcntl) __fcntl_nocancel;
 /* Non cancellable getrandom syscall that does not also set errno in case of
    failure.  */
 static inline ssize_t
-__getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags)
+__getrandom_nocancel_nostatus_direct (void *buf, size_t buflen, unsigned int flags)
 {
   int save_errno = errno;
   ssize_t r = __getrandom (buf, buflen, flags);
@@ -90,6 +90,8 @@ __getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags)
 
 #define __getrandom_nocancel(buf, size, flags) \
   __getrandom (buf, size, flags)
+#define __getrandom_nocancel_direct(buf, size, flags) \
+  __getrandom (buf, size, flags)
 
 #define __poll_infinity_nocancel(fds, nfds) \
   __poll (fds, nfds, -1)
diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c
index ef199ddbc3..adb7c18b29 100644
--- a/sysdeps/nptl/_Fork.c
+++ b/sysdeps/nptl/_Fork.c
@@ -18,6 +18,7 @@
 
 #include
 #include
+#include
 
 pid_t
 _Fork (void)
@@ -43,6 +44,7 @@ _Fork (void)
       self->robust_head.list = &self->robust_head;
       INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head,
                              sizeof (struct robust_list_head));
+      call_function_static_weak (__getrandom_fork_subprocess);
     }
   return pid;
 }
diff --git a/sysdeps/nptl/fork.h b/sysdeps/nptl/fork.h
index 7643926df9..106b2cf71d 100644
--- a/sysdeps/nptl/fork.h
+++ b/sysdeps/nptl/fork.h
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 
 static inline void
 fork_system_setup (void)
@@ -46,6 +47,7 @@ fork_system_setup_after_fork (void)
 
   call_function_static_weak (__mq_notify_fork_subprocess);
   call_function_static_weak (__timer_fork_subprocess);
+  call_function_static_weak (__getrandom_fork_subprocess);
 }
 
 /* In case of a fork() call the memory allocation in the child will be
@@ -128,9 +130,19 @@ reclaim_stacks (void)
                     curp->specific_used = true;
                   }
             }
+
+          call_function_static_weak (__getrandom_reset_state, curp);
         }
     }
 
+  /* Also reset stale getrandom states for user stack threads.  */
+  list_for_each (runp, &GL (dl_stack_user))
+    {
+      struct pthread *curp = list_entry (runp, struct pthread, list);
+      if (curp != self)
+        call_function_static_weak (__getrandom_reset_state, curp);
+    }
+
   /* Add the stack of all running threads to the cache.  */
   list_splice (&GL (dl_stack_used), &GL (dl_stack_cache));
 
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.c b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
index 3a44944dbb..476c6db75a 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.c
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.c
@@ -66,6 +66,11 @@ PROCINFO_CLASS int (*_dl_vdso_clock_getres) (clockid_t,
 PROCINFO_CLASS int (*_dl_vdso_clock_getres_time64) (clockid_t,
                                                     struct __timespec64 *) RELRO;
 # endif
+# ifdef HAVE_GETRANDOM_VSYSCALL
+PROCINFO_CLASS ssize_t (*_dl_vdso_getrandom) (void *buffer, size_t len,
+                                              unsigned int flags, void *state,
+                                              size_t state_len) RELRO;
+# endif
 
 /* PowerPC specific ones.  */
 # ifdef HAVE_GET_TBFREQ
diff --git a/sysdeps/unix/sysv/linux/dl-vdso-setup.h b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
index 8aee5a8212..cde99f608c 100644
--- a/sysdeps/unix/sysv/linux/dl-vdso-setup.h
+++ b/sysdeps/unix/sysv/linux/dl-vdso-setup.h
@@ -50,6 +50,9 @@ setup_vdso_pointers (void)
 #ifdef HAVE_RISCV_HWPROBE
   GLRO(dl_vdso_riscv_hwprobe) = dl_vdso_vsym (HAVE_RISCV_HWPROBE);
 #endif
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  GLRO(dl_vdso_getrandom) = dl_vdso_vsym (HAVE_GETRANDOM_VSYSCALL);
+#endif
 }
 
 #endif
diff --git a/sysdeps/unix/sysv/linux/getrandom.c b/sysdeps/unix/sysv/linux/getrandom.c
index 777d1decf0..cf44d90b44 100644
--- a/sysdeps/unix/sysv/linux/getrandom.c
+++ b/sysdeps/unix/sysv/linux/getrandom.c
@@ -21,12 +21,245 @@
 #include
 #include
 
+#ifdef HAVE_GETRANDOM_VSYSCALL
+# include
+# include
+# include
+# include
+# include
+# include
+# include
+# include
+
+# define ALIGN_PAGE(p)          PTR_ALIGN_UP (p, GLRO (dl_pagesize))
+# define READ_ONCE(p)           (*((volatile typeof (p) *) (&(p))))
+# define WRITE_ONCE(p, v)       (*((volatile typeof (p) *) (&(p))) = (v))
+# define RESERVE_PTR(p)         ((void *) ((uintptr_t) (p) | 1UL))
+# define RELEASE_PTR(p)         ((void *) ((uintptr_t) (p) & ~1UL))
+# define IS_RESERVED_PTR(p)     (!!((uintptr_t) (p) & 1UL))
+
+static struct
+{
+  /* Must be held always on access, as this is used by multiple threads.  */
+  __libc_lock_define (, lock);
+
+  /* Stack of opaque states for use in vgetrandom.  */
+  void **states;
+
+  /* Size of each opaque state, copied from vgetrandom_opaque_params.  */
+  size_t state_size;
+
+  /* Number of states available in the queue.  */
+  size_t len;
+
+  /* Number of states in the queue plus the number of states used in
+     threads.  */
+  size_t total;
+
+  /* Number of states that the states array can hold before needing to be
+     resized.  */
+  size_t cap;
+} grnd_alloc = { .lock = LLL_LOCK_INITIALIZER };
+
+/* Allocate an opaque state for vgetrandom.  If the allocator does not have
+   any, mmap() another page of them.  */
+static void *
+vgetrandom_get_state (void)
+{
+  void *state = NULL;
+
+  /* The signal blocking avoids the potential issue where _Fork() (which is
+     async-signal-safe) is called with the lock taken.  The function is called
+     only once during thread lifetime, so the overhead should be ok.  */
+  internal_sigset_t set;
+  internal_signal_block_all (&set);
+  __libc_lock_lock (grnd_alloc.lock);
+
+  if (grnd_alloc.len == 0)
+    {
+      struct vgetrandom_opaque_params params;
+      size_t block_size, num = __get_nprocs (); /* Just a decent heuristic.  */
+      void *block;
+
+      if (GLRO (dl_vdso_getrandom) (NULL, 0, 0, &params, ~0UL) != 0)
+        goto out;
+      grnd_alloc.state_size = params.size_of_opaque_state;
+
+      block_size = ALIGN_PAGE (num * grnd_alloc.state_size);
+      num = (GLRO (dl_pagesize) / grnd_alloc.state_size) *
+            (block_size / GLRO (dl_pagesize));
+      block = __mmap (0, block_size, params.mmap_prot,
+                      params.mmap_flags, -1, 0);
+      if (block == MAP_FAILED)
+        goto out;
+      __set_vma_name (block, block_size, " glibc: getrandom");
+
+      if (grnd_alloc.total + num > grnd_alloc.cap)
+        {
+          void **states;
+          size_t states_size;
+          /* Use a new mmap instead of trying to mremap.  It avoids a
+             potential multithread fork issue where fork is called just after
+             mremap returns but before assigning to the grnd_alloc.states,
+             thus making its value invalid in the child.  */
+          void *old_states = grnd_alloc.states;
+          size_t old_states_size = ALIGN_PAGE (sizeof (*grnd_alloc.states)
+                                               * grnd_alloc.cap);
+          /* Size the new array for every state: the ones already handed
+             out to threads plus the NUM new ones.  OLD_STATES_SIZE is only
+             used to unmap the previous array.  */
+          states_size = ALIGN_PAGE (sizeof (*grnd_alloc.states)
+                                    * (grnd_alloc.total + num));
+
+          states = __mmap (NULL, states_size, PROT_READ | PROT_WRITE,
+                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+          if (states == MAP_FAILED)
+            goto unmap;
+
+          /* Atomically replace the old state, so if a fork happens the child
+             process will see a consistent free state buffer.  The size might
+             not be updated, but it does not really matter since the buffer is
+             always increased.  */
+          atomic_store_relaxed (&grnd_alloc.states, states);
+          if (old_states != NULL)
+            __munmap (old_states, old_states_size);
+
+          __set_vma_name (states, states_size, " glibc: getrandom states");
+          grnd_alloc.cap = states_size / sizeof (*grnd_alloc.states);
+        }
+
+      for (size_t i = 0; i < num; ++i)
+        {
+          /* States should not straddle a page.  */
+          if (((uintptr_t) block & (GLRO (dl_pagesize) - 1))
+              + grnd_alloc.state_size > GLRO (dl_pagesize))
+            block = ALIGN_PAGE (block);
+          grnd_alloc.states[i] = block;
+          block += grnd_alloc.state_size;
+        }
+      grnd_alloc.len = num;
+      grnd_alloc.total += num;
+      goto success;
+
+    unmap:
+      __munmap (block, block_size);
+      goto out;
+    }
+
+success:
+  state = grnd_alloc.states[--grnd_alloc.len];
+
+out:
+  __libc_lock_unlock (grnd_alloc.lock);
+  internal_signal_restore_set (&set);
+  return state;
+}
+
+/* Returns true when vgetrandom is used successfully.  Returns false if the
+   syscall fallback should be issued in the case the vDSO is not present, in
+   the case of reentrancy, or if any memory allocation fails.  */
+static bool
+__getrandom_vdso (ssize_t *ret, void *buffer, size_t length,
+                  unsigned int flags)
+{
+  if (GLRO (dl_vdso_getrandom) == NULL)
+    return false;
+
+  struct pthread *self = THREAD_SELF;
+
+  /* If the LSB of getrandom_buf is set, then this function is already being
+     called, and we have a reentrant call from a signal handler.  In this case
+     fall back to the syscall.  */
+  void *state = READ_ONCE (self->getrandom_buf);
+  if (IS_RESERVED_PTR (state))
+    return false;
+  WRITE_ONCE (self->getrandom_buf, RESERVE_PTR (state));
+
+  bool r = false;
+  if (state == NULL)
+    {
+      state = vgetrandom_get_state ();
+      if (state == NULL)
+        goto out;
+    }
+
+  *ret = GLRO (dl_vdso_getrandom) (buffer, length, flags, state,
+                                   grnd_alloc.state_size);
+  if (INTERNAL_SYSCALL_ERROR_P (*ret))
+    {
+      __set_errno (INTERNAL_SYSCALL_ERRNO (*ret));
+      *ret = -1;
+    }
+  r = true;
+
+out:
+  WRITE_ONCE (self->getrandom_buf, state);
+  return r;
+}
+#endif
+
+/* Re-add the state from CURP to the free list.  */
+void
+__getrandom_reset_state (struct pthread *curp)
+{
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  if (curp->getrandom_buf == NULL)
+    return;
+  grnd_alloc.states[grnd_alloc.len++] = RELEASE_PTR (curp->getrandom_buf);
+  curp->getrandom_buf = NULL;
+#endif
+}
+
+/* Called when a thread terminates, and adds its random buffer back into the
+   allocator pool for use in a future thread.  */
+void
+__getrandom_vdso_release (struct pthread *curp)
+{
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  if (curp->getrandom_buf == NULL)
+    return;
+
+  __libc_lock_lock (grnd_alloc.lock);
+  grnd_alloc.states[grnd_alloc.len++] = curp->getrandom_buf;
+  __libc_lock_unlock (grnd_alloc.lock);
+#endif
+}
+
+/* Reset the internal lock state in case another thread has locked while
+   this thread calls fork.  The stale thread states will be handled by
+   reclaim_stacks which calls __getrandom_reset_state on each thread.  */
+void
+__getrandom_fork_subprocess (void)
+{
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  grnd_alloc.lock = LLL_LOCK_INITIALIZER;
+#endif
+}
+
+ssize_t
+__getrandom_nocancel (void *buffer, size_t length, unsigned int flags)
+{
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  ssize_t r;
+  if (__getrandom_vdso (&r, buffer, length, flags))
+    return r;
+#endif
+
+  return INLINE_SYSCALL_CALL (getrandom, buffer, length, flags);
+}
+
 /* Write up to LENGTH bytes of randomness starting at BUFFER.
    Return the number of bytes written, or -1 on error.  */
 ssize_t
 __getrandom (void *buffer, size_t length, unsigned int flags)
 {
-  return SYSCALL_CANCEL (getrandom, buffer, length, flags);
+#ifdef HAVE_GETRANDOM_VSYSCALL
+  ssize_t r;
+  if (__getrandom_vdso (&r, buffer, length, flags))
+    return r;
+#endif
+
+  return INTERNAL_SYSCALL_CALL (getrandom, buffer, length, flags);
 }
 libc_hidden_def (__getrandom)
 weak_alias (__getrandom, getrandom)
diff --git a/sysdeps/unix/sysv/linux/getrandom_vdso.h b/sysdeps/unix/sysv/linux/getrandom_vdso.h
new file mode 100644
index 0000000000..d1ef690e50
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/getrandom_vdso.h
@@ -0,0 +1,36 @@
+/* Linux getrandom vDSO support.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _GETRANDOM_VDSO_H
+#define _GETRANDOM_VDSO_H
+
+#include
+#include
+#include
+
+/* Used to query the vDSO for the required mmap flags and the opaque
+   per-thread state size.  Defined by linux/random.h.  */
+struct vgetrandom_opaque_params
+{
+  uint32_t size_of_opaque_state;
+  uint32_t mmap_prot;
+  uint32_t mmap_flags;
+  uint32_t reserved[13];
+};
+
+#endif
diff --git a/sysdeps/unix/sysv/linux/include/sys/random.h b/sysdeps/unix/sysv/linux/include/sys/random.h
new file mode 100644
index 0000000000..3b96f1f287
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/include/sys/random.h
@@ -0,0 +1,11 @@
+#ifndef _LINUX_SYS_RANDOM
+#define _LINUX_SYS_RANDOM
+
+# ifndef _ISOMAC
+# include
+
+extern void __getrandom_fork_subprocess (void) attribute_hidden;
+extern void __getrandom_vdso_release (struct pthread *curp) attribute_hidden;
+extern void __getrandom_reset_state (struct pthread *curp) attribute_hidden;
+# endif
+#endif
diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h
index 2a7585b73f..12f26912d3 100644
--- a/sysdeps/unix/sysv/linux/not-cancel.h
+++ b/sysdeps/unix/sysv/linux/not-cancel.h
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 
 /* Non cancellable open syscall.  */
 __typeof (open) __open_nocancel;
@@ -84,15 +85,17 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt)
 }
 
 static inline ssize_t
-__getrandom_nocancel (void *buf, size_t buflen, unsigned int flags)
+__getrandom_nocancel_direct (void *buf, size_t buflen, unsigned int flags)
 {
   return INLINE_SYSCALL_CALL (getrandom, buf, buflen, flags);
 }
 
+__typeof (getrandom) __getrandom_nocancel attribute_hidden;
+
 /* Non cancellable getrandom syscall that does not also set errno in case of
    failure.  */
 static inline ssize_t
-__getrandom_nocancel_nostatus (void *buf, size_t buflen, unsigned int flags)
+__getrandom_nocancel_nostatus_direct (void *buf, size_t buflen, unsigned int flags)
 {
   return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags);
 }
diff --git a/sysdeps/unix/sysv/linux/x86_64/sysdep.h b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
index a2b021bd86..7dc072ae2d 100644
--- a/sysdeps/unix/sysv/linux/x86_64/sysdep.h
+++ b/sysdeps/unix/sysv/linux/x86_64/sysdep.h
@@ -376,6 +376,7 @@
 # define HAVE_TIME_VSYSCALL             "__vdso_time"
 # define HAVE_GETCPU_VSYSCALL           "__vdso_getcpu"
 # define HAVE_CLOCK_GETRES64_VSYSCALL   "__vdso_clock_getres"
+# define HAVE_GETRANDOM_VSYSCALL        "__vdso_getrandom"
 
 # define HAVE_CLONE3_WRAPPER            1
 
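
For reference, the LSB guard that __getrandom_vdso applies to
getrandom_buf can be sketched in isolation.  This is a simplified
illustration, not the glibc code: the three macros have the same meaning
as in the patch, while slot, example_state, guard_enter and guard_leave
are hypothetical names, and a plain thread-local variable stands in for
the field in struct pthread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pointer-tagging helpers with the same meaning as in the patch.  */
#define RESERVE_PTR(p)     ((void *) ((uintptr_t) (p) | 1UL))
#define RELEASE_PTR(p)     ((void *) ((uintptr_t) (p) & ~1UL))
#define IS_RESERVED_PTR(p) (!!((uintptr_t) (p) & 1UL))

/* HYPOTHETICAL per-thread slot standing in for pthread->getrandom_buf.  */
static _Thread_local void *volatile slot;

/* Backing storage for an example state; aligned so its LSB is clear.  */
static _Alignas (16) unsigned char example_state[64];

/* Try to enter the guarded section.  Returns false when the slot is
   already reserved, meaning the caller interrupted a call in progress and
   must take the fallback path.  On success *STATE is the untagged
   pointer, which is NULL until a state has been assigned.  */
static bool
guard_enter (void **state)
{
  void *cur = slot;
  if (IS_RESERVED_PTR (cur))
    return false;
  slot = RESERVE_PTR (cur);
  *state = cur;
  return true;
}

/* Leave the guarded section, publishing STATE with the LSB cleared.  */
static void
guard_leave (void *state)
{
  slot = RELEASE_PTR (state);
}
```

Because RESERVE_PTR (NULL) is non-NULL, the slot is marked busy even
before the first state has been allocated, so a signal handler that runs
during the allocation also takes the fallback path.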