From patchwork Thu Jul 14 11:28:37 2022
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 590373
From: Adhemerval Zanella Netto
To: libc-alpha@sourceware.org, Florian Weimer
Cc: Yann Droneaud
Subject: [PATCH v10 1/9] stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
Date: Thu, 14 Jul 2022 08:28:37 -0300
Message-Id: <20220714112845.704678-2-adhemerval.zanella@linaro.org>
In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org>
References: <20220714112845.704678-1-adhemerval.zanella@linaro.org>

From: Adhemerval Zanella Netto

The implementation is based on scalar ChaCha20 with a per-thread cache.
It uses getrandom, or /dev/urandom as a fallback, to get the initial
entropy, and reseeds the internal state after every 16 MiB of output.

To improve performance and lower memory consumption, the per-thread
cache is allocated lazily on the first call to any arc4random function.
If that allocation fails, entropy is read directly from getrandom (or
/dev/urandom) instead.
The cache is also cleared on thread exit if (and only if) it was
initialized, so a thread that never calls arc4random never touches it.

Although it is lock-free, arc4random is still not async-signal-safe
(the per-thread state is not updated atomically).

The ChaCha20 implementation is based on RFC 8439 [1], omitting the
final XOR of the keystream with the plaintext because the plaintext is
a stream of zeros. This strategy is similar to what OpenBSD's
arc4random does.

arc4random_uniform is based on previous work by Florian Weimer. The
algorithm follows Jérémie Lumbroso's paper "Optimal Discrete Uniform
Generation from Coin Flips, and Applications" (2013) [2], which credits
Donald E. Knuth and Andrew C. Yao, "The complexity of nonuniform random
number generation" (1976), for solving the general case.

The main advantage of this method is that the unit of randomness is a
single random bit rather than a whole uniform random variable
(uint32_t). The implementation prefetches only as many random bytes as
are needed to cover the upper bound (at most 32 bits), then refills the
bit source one byte at a time. Depending on the requested upper bound,
this can lead to better CPU utilization.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
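For readers unfamiliar with RFC 8439, the following is a standalone C sketch of the "keystream only" use of ChaCha20 described above. It is illustrative only and is not the code in stdlib/chacha20.c. The names load_le32, chacha20_setup, quarter_round and chacha20_keystream_block are hypothetical. The sketch serializes output little-endian byte by byte instead of using the patch's set_state/bswap macros, and it makes no attempt at speed. Because the plaintext is all zeros, the 64-byte block is the output and there is no XOR step.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rotate a 32-bit word left by N bits; N must be in 1..31.  */
static uint32_t
rotl32 (uint32_t w, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (w << n) | (w >> (32 - n));
}

/* Read a little-endian 32-bit word (hypothetical helper).  */
static uint32_t
load_le32 (const uint8_t *p)
{
  return (uint32_t) p[0] | (uint32_t) p[1] << 8
         | (uint32_t) p[2] << 16 | (uint32_t) p[3] << 24;
}

/* Lay out the 16-word input state as in RFC 8439 section 2.3:
   constants, 256-bit key, 32-bit block counter, 96-bit nonce.  */
static void
chacha20_setup (uint32_t state[16], const uint8_t key[32],
                uint32_t counter, const uint8_t nonce[12])
{
  state[0] = 0x61707865;   /* "expa" */
  state[1] = 0x3320646e;   /* "nd 3" */
  state[2] = 0x79622d32;   /* "2-by" */
  state[3] = 0x6b206574;   /* "te k" */
  for (int i = 0; i < 8; i++)
    state[4 + i] = load_le32 (key + 4 * i);
  state[12] = counter;
  for (int i = 0; i < 3; i++)
    state[13 + i] = load_le32 (nonce + 4 * i);
}

/* One ChaCha quarter round (RFC 8439 section 2.1).  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}

/* Produce one 64-byte keystream block and advance the block counter.  */
static void
chacha20_keystream_block (uint32_t state[16], uint8_t out[64])
{
  uint32_t x[16];
  memcpy (x, state, sizeof x);
  for (int i = 0; i < 20; i += 2)
    {
      /* Column round.  */
      quarter_round (&x[0], &x[4], &x[8], &x[12]);
      quarter_round (&x[1], &x[5], &x[9], &x[13]);
      quarter_round (&x[2], &x[6], &x[10], &x[14]);
      quarter_round (&x[3], &x[7], &x[11], &x[15]);
      /* Diagonal round.  */
      quarter_round (&x[0], &x[5], &x[10], &x[15]);
      quarter_round (&x[1], &x[6], &x[11], &x[12]);
      quarter_round (&x[2], &x[7], &x[8], &x[13]);
      quarter_round (&x[3], &x[4], &x[9], &x[14]);
    }
  for (int i = 0; i < 16; i++)
    {
      /* Add the input state back in, then serialize little-endian.  */
      uint32_t v = x[i] + state[i];
      out[4 * i + 0] = (uint8_t) v;
      out[4 * i + 1] = (uint8_t) (v >> 8);
      out[4 * i + 2] = (uint8_t) (v >> 16);
      out[4 * i + 3] = (uint8_t) (v >> 24);
    }
  state[12]++;
}
```

The quarter round can be checked against the RFC 8439 section 2.1.1 test vector, and the full block against the section 2.3.2 vector (key 00..1f, block counter 1).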
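The coin-flip method behind arc4random_uniform is Lumbroso's "Fast Dice Roller". Below is a minimal C sketch of it that consumes one random bit at a time. It relies on two stated assumptions:

* The bit source is hypothetical. SplitMix64 stands in for arc4random_buf only so the example is deterministic and self-contained. It is not cryptographically secure.
* Unlike the patch, this version starts from v = 1 rather than unrolling the first iterations with a prefetched byte count. The names bit_source, flip and uniform_below are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit source: SplitMix64 output consumed one bit at a time.
   NOT cryptographically secure; it only stands in for arc4random_buf.  */
struct bit_source
{
  uint64_t state;   /* SplitMix64 state.  */
  uint64_t bits;    /* Unconsumed random bits.  */
  int avail;        /* Number of valid bits left in BITS.  */
};

static uint64_t
splitmix64 (uint64_t *s)
{
  uint64_t z = (*s += 0x9e3779b97f4a7c15ULL);
  z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
  z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
  return z ^ (z >> 31);
}

/* Return one random bit, refilling from the generator when empty.  */
static unsigned int
flip (struct bit_source *src)
{
  if (src->avail == 0)
    {
      src->bits = splitmix64 (&src->state);
      src->avail = 64;
    }
  unsigned int b = src->bits & 1;
  src->bits >>= 1;
  src->avail--;
  return b;
}

/* Fast Dice Roller: return a value uniformly distributed on [0, n).
   Invariant at the top of each iteration: c is uniform on [0, v) and
   v < n, so after doubling v < 2n <= 2**33 (hence 64-bit variables).  */
static uint32_t
uniform_below (struct bit_source *src, uint32_t n)
{
  if (n <= 1)
    return 0;        /* Same convention as arc4random_uniform.  */

  uint64_t v = 1, c = 0;
  for (;;)
    {
      /* Double the range and append one random bit to c.  */
      v <<= 1;
      c = (c << 1) | flip (src);
      assert (v < 2 * (uint64_t) n);
      if (v >= n)
        {
          if (c < n)
            return (uint32_t) c;   /* Accept: c is uniform on [0, n).  */
          /* c is uniform on [n, v): shift it down to [0, v - n) and keep
             the leftover entropy instead of discarding it.  */
          v -= n;
          c -= n;
        }
    }
}
```

Rejected candidates are not thrown away. The surplus range v - n carries over into the next iteration, which is why the expected number of bits consumed stays close to log2(n).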
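The buffer bookkeeping in arc4random.c can be illustrated with a toy model. It covers three mechanisms: the 'have' and 'count' counters, rekeying from fresh output after each refill, and reseeding once the byte budget runs out. Everything in this C sketch is hypothetical:

* A SplitMix64 counter stands in for ChaCha20.
* fake_entropy stands in for getrandom.
* memset stands in for explicit_bzero.

It models only the control flow, not a secure generator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KEY_BYTES 32                       /* key material taken per rekey */
#define BUF_BYTES 512                      /* keystream bytes per refill */
#define RESEED_BYTES (16u * 1024 * 1024)   /* output budget between reseeds */

/* Hypothetical per-thread state modeled on struct arc4random_state_t.  */
struct toy_state
{
  uint64_t key;          /* stand-in for the ChaCha20 context */
  size_t have;           /* unread bytes at the end of buf */
  size_t count;          /* bytes left before the next reseed */
  uint8_t buf[BUF_BYTES];
  unsigned int reseeds;  /* instrumentation for this sketch only */
};

/* Toy keystream: SplitMix64 keyed by KEY.  NOT secure; it only stands in
   for chacha20_crypt so that the bookkeeping can be exercised.  */
static void
toy_keystream (uint64_t *key, uint8_t *out, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      uint64_t z = (*key += 0x9e3779b97f4a7c15ULL);
      z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9ULL;
      z = (z ^ (z >> 27)) * 0x94d049bb133111ebULL;
      out[i] = (uint8_t) (z ^ (z >> 31));
    }
}

/* Hypothetical entropy hook; real code would call getrandom.  */
static uint64_t
fake_entropy (void)
{
  static uint64_t n;
  return ++n * 0x2545f4914f6cdd1dULL;
}

/* Refill buf, take the new key from the first KEY_BYTES of that output
   and erase them.  This mirrors the patch's fast-key-erasure step, which
   gives backtracking resistance with a real cipher.  */
static void
rekey (struct toy_state *s)
{
  toy_keystream (&s->key, s->buf, sizeof s->buf);
  memcpy (&s->key, s->buf, sizeof s->key);
  memset (s->buf, 0, KEY_BYTES);         /* real code: explicit_bzero */
  s->have = sizeof s->buf - KEY_BYTES;
}

/* Reseed on first use (count == SIZE_MAX) or when the budget runs out;
   otherwise charge LEN bytes against the budget.  */
static void
check_stir (struct toy_state *s, size_t len)
{
  if (s->count <= len || s->count == (size_t) -1)
    {
      s->key ^= fake_entropy ();
      s->reseeds++;
      s->have = 0;                       /* discard buffered output */
      memset (s->buf, 0, sizeof s->buf);
      s->count = RESEED_BYTES;
    }
  else
    s->count -= len;
}

static void
toy_random_buf (struct toy_state *s, void *buffer, size_t len)
{
  assert (buffer != NULL || len == 0);
  uint8_t *p = buffer;
  check_stir (s, len);
  while (len > 0)
    {
      if (s->have > 0)
        {
          size_t m = len < s->have ? len : s->have;
          uint8_t *ks = s->buf + sizeof s->buf - s->have;
          memcpy (p, ks, m);
          memset (ks, 0, m);             /* never hand out bytes twice */
          p += m;
          len -= m;
          s->have -= m;
        }
      if (s->have == 0)
        rekey (s);
    }
}
```

As in the patch, the request that triggers a reseed is not charged against the fresh budget. Buffered output is discarded on reseed so that no bytes derived from the old key are served afterwards.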
Co-authored-by: Florian Weimer
Reviewed-by: Yann Droneaud

[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
---
 NEWS | 4 +
 include/stdlib.h | 12 +
 malloc/thread-freeres.c | 2 +-
 nptl/allocatestack.c | 3 +-
 stdlib/Makefile | 2 +
 stdlib/Versions | 5 +
 stdlib/arc4random.c | 208 ++++++++++++++++++
 stdlib/arc4random.h | 48 ++++
 stdlib/arc4random_uniform.c | 140 ++++++++++++
 stdlib/chacha20.c | 187 ++++++++++++++++
 stdlib/stdlib.h | 13 ++
 sysdeps/generic/not-cancel.h | 2 +
 sysdeps/generic/tls-internal-struct.h | 1 +
 sysdeps/generic/tls-internal.c | 18 ++
 sysdeps/generic/tls-internal.h | 7 +-
 sysdeps/mach/hurd/_Fork.c | 2 +
 sysdeps/mach/hurd/i386/libc.abilist | 3 +
 sysdeps/mach/hurd/not-cancel.h | 3 +
 sysdeps/nptl/_Fork.c | 2 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arc/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/csky/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/i386/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist | 3 +
 .../sysv/linux/m68k/coldfire/libc.abilist | 3 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist | 3 +
 .../sysv/linux/microblaze/be/libc.abilist | 3 +
 .../sysv/linux/microblaze/le/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n32/libc.abilist | 3 +
 .../sysv/linux/mips/mips64/n64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/not-cancel.h | 7 +
 sysdeps/unix/sysv/linux/or1k/libc.abilist | 3 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist | 3 +
 .../powerpc/powerpc32/nofpu/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/be/libc.abilist | 3 +
 .../linux/powerpc/powerpc64/le/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv32/libc.abilist | 3 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist | 3 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc32/libc.abilist | 3 +
 .../sysv/linux/sparc/sparc64/libc.abilist | 3 +
 sysdeps/unix/sysv/linux/tls-internal.c | 39 +++-
 sysdeps/unix/sysv/linux/tls-internal.h | 8 +-
 .../unix/sysv/linux/x86_64/64/libc.abilist | 3 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist | 3 +
 55 files changed, 800 insertions(+), 15 deletions(-)
 create mode 100644 stdlib/arc4random.c
 create mode 100644 stdlib/arc4random.h
 create mode 100644 stdlib/arc4random_uniform.c
 create mode 100644 stdlib/chacha20.c

diff --git a/NEWS b/NEWS
index df882ec243..8420a65cd0 100644
--- a/NEWS
+++ b/NEWS
@@ -60,6 +60,10 @@ Major new features:
   _GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature test macro
   is not defined (if __cpp_char8_t is defined, then char8_t is a builtin type).
 
+* The functions arc4random, arc4random_buf, and arc4random_uniform have been
+  added.  The functions use a pseudo-random number generator along with
+  entropy from the kernel.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * Support for prelink will be removed in the next release; this includes
diff --git a/include/stdlib.h b/include/stdlib.h
index 1c6f70b082..cae7f7cdf8 100644
--- a/include/stdlib.h
+++ b/include/stdlib.h
@@ -144,6 +144,18 @@ libc_hidden_proto (__ptsname_r)
 libc_hidden_proto (grantpt)
 libc_hidden_proto (unlockpt)
 
+__typeof (arc4random) __arc4random;
+libc_hidden_proto (__arc4random);
+__typeof (arc4random_buf) __arc4random_buf;
+libc_hidden_proto (__arc4random_buf);
+__typeof (arc4random_uniform) __arc4random_uniform;
+libc_hidden_proto (__arc4random_uniform);
+extern void __arc4random_buf_internal (void *buffer, size_t len)
+     attribute_hidden;
+/* Called from the fork function to reinitialize the internal cipher state
+   in the child process.  */
+extern void __arc4random_fork_subprocess (void) attribute_hidden;
+
 extern double __strtod_internal (const char *__restrict __nptr,
                                  char **__restrict __endptr, int __group)
      __THROW __nonnull ((1)) __wur;
diff --git a/malloc/thread-freeres.c b/malloc/thread-freeres.c
index 3894652169..b22e1d789f 100644
--- a/malloc/thread-freeres.c
+++ b/malloc/thread-freeres.c
@@ -36,7 +36,7 @@ __libc_thread_freeres (void)
   __rpc_thread_destroy ();
 #endif
   call_function_static_weak (__res_thread_freeres);
-  __glibc_tls_internal_free ();
+  call_function_static_weak (__glibc_tls_internal_free);
   call_function_static_weak (__libc_dlerror_result_free);
 
   /* This should come last because it shuts down malloc for this
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 98f5f6dd85..219854f2cb 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -32,6 +32,7 @@
 #include
 #include
 #include
+#include
 
 /* Default alignment of stack.  */
 #ifndef STACK_ALIGN
@@ -127,7 +128,7 @@ get_cached_stack (size_t *sizep, void **memp)
 
   result->exiting = false;
   __libc_lock_init (result->exit_lock);
-  result->tls_state = (struct tls_internal_t) { 0 };
+  memset (&result->tls_state, 0, sizeof result->tls_state);
 
   /* Clear the DTV.  */
   dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
diff --git a/stdlib/Makefile b/stdlib/Makefile
index d4a4d5679a..62f8253225 100644
--- a/stdlib/Makefile
+++ b/stdlib/Makefile
@@ -53,6 +53,8 @@ routines := \
   a64l \
   abort \
   abs \
+  arc4random \
+  arc4random_uniform \
   at_quick_exit \
   atof \
   atoi \
diff --git a/stdlib/Versions b/stdlib/Versions
index 5e9099a153..d09a308fb5 100644
--- a/stdlib/Versions
+++ b/stdlib/Versions
@@ -136,6 +136,11 @@ libc {
     strtof32; strtof64; strtof32x;
     strtof32_l; strtof64_l; strtof32x_l;
   }
+  GLIBC_2.36 {
+    arc4random;
+    arc4random_buf;
+    arc4random_uniform;
+  }
   GLIBC_PRIVATE {
     # functions which have an additional interface since they are
     # are cancelable.
diff --git a/stdlib/arc4random.c b/stdlib/arc4random.c
new file mode 100644
index 0000000000..21142e639e
--- /dev/null
+++ b/stdlib/arc4random.c
@@ -0,0 +1,208 @@
+/* Pseudo Random Number Generator based on ChaCha20.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+/* arc4random keeps two counters: 'have' is the number of valid bytes not
+   yet consumed in 'buf' while 'count' is the maximum number of bytes until
+   a reseed.
+
+   Both the initial seed and reseed try to obtain entropy from the kernel
+   and abort the process if none could be obtained.
+
+   The state 'buf' improves the usage of the cipher calls, allowing to call
+   optimized implementations (if the architecture provides it) and minimize
+   function call overhead.  */
+
+#include
+
+/* Called from the fork function to reset the state.  */
+void
+__arc4random_fork_subprocess (void)
+{
+  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
+  if (state != NULL)
+    {
+      explicit_bzero (state, sizeof (*state));
+      /* Force key init.  */
+      state->count = -1;
+    }
+}
+
+/* Return the current thread random state or try to create one if there is
+   none available.  In the case malloc can not allocate a state, arc4random
+   will try to get entropy with arc4random_getentropy.  */
+static struct arc4random_state_t *
+arc4random_get_state (void)
+{
+  struct arc4random_state_t *state = __glibc_tls_internal ()->rand_state;
+  if (state == NULL)
+    {
+      state = malloc (sizeof (struct arc4random_state_t));
+      if (state != NULL)
+        {
+          /* Force key initialization on first call.  */
+          state->count = -1;
+          __glibc_tls_internal ()->rand_state = state;
+        }
+    }
+  return state;
+}
+
+static void
+arc4random_getrandom_failure (void)
+{
+  __libc_fatal ("Fatal glibc error: cannot get entropy for arc4random\n");
+}
+
+static void
+arc4random_rekey (struct arc4random_state_t *state, uint8_t *rnd, size_t rndlen)
+{
+  chacha20_crypt (state->ctx, state->buf, state->buf, sizeof state->buf);
+
+  /* Mix optional user provided data.  */
+  if (rnd != NULL)
+    {
+      size_t m = MIN (rndlen, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+      for (size_t i = 0; i < m; i++)
+        state->buf[i] ^= rnd[i];
+    }
+
+  /* Immediately reinit for backtracking resistance.  */
+  chacha20_init (state->ctx, state->buf, state->buf + CHACHA20_KEY_SIZE);
+  explicit_bzero (state->buf, CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+  state->have = sizeof (state->buf) - (CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE);
+}
+
+static void
+arc4random_getentropy (void *rnd, size_t len)
+{
+  if (__getrandom_nocancel (rnd, len, GRND_NONBLOCK) == len)
+    return;
+
+  int fd = TEMP_FAILURE_RETRY (__open64_nocancel ("/dev/urandom",
+                                                  O_RDONLY | O_CLOEXEC));
+  if (fd != -1)
+    {
+      uint8_t *p = rnd;
+      uint8_t *end = p + len;
+      do
+        {
+          ssize_t ret = TEMP_FAILURE_RETRY (__read_nocancel (fd, p, end - p));
+          if (ret <= 0)
+            arc4random_getrandom_failure ();
+          p += ret;
+        }
+      while (p < end);
+
+      if (__close_nocancel (fd) == 0)
+        return;
+    }
+  arc4random_getrandom_failure ();
+}
+
+/* Check if the thread context STATE should be reseeded with kernel entropy
+   depending on requested LEN bytes.  If there is less than requested,
+   the state is either initialized or reseeded, otherwise the internal
+   counter is decremented by the requested length.  */
+static void
+arc4random_check_stir (struct arc4random_state_t *state, size_t len)
+{
+  if (state->count <= len || state->count == -1)
+    {
+      uint8_t rnd[CHACHA20_KEY_SIZE + CHACHA20_IV_SIZE];
+      arc4random_getentropy (rnd, sizeof rnd);
+
+      if (state->count == -1)
+        chacha20_init (state->ctx, rnd, rnd + CHACHA20_KEY_SIZE);
+      else
+        arc4random_rekey (state, rnd, sizeof rnd);
+
+      explicit_bzero (rnd, sizeof rnd);
+
+      /* Invalidate the buf.  */
+      state->have = 0;
+      memset (state->buf, 0, sizeof state->buf);
+      state->count = CHACHA20_RESEED_SIZE;
+    }
+  else
+    state->count -= len;
+}
+
+void
+__arc4random_buf (void *buffer, size_t len)
+{
+  struct arc4random_state_t *state = arc4random_get_state ();
+  if (__glibc_unlikely (state == NULL))
+    {
+      arc4random_getentropy (buffer, len);
+      return;
+    }
+
+  arc4random_check_stir (state, len);
+  while (len > 0)
+    {
+      if (state->have > 0)
+        {
+          size_t m = MIN (len, state->have);
+          uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
+          memcpy (buffer, ks, m);
+          explicit_bzero (ks, m);
+          buffer += m;
+          len -= m;
+          state->have -= m;
+        }
+      if (state->have == 0)
+        arc4random_rekey (state, NULL, 0);
+    }
+}
+libc_hidden_def (__arc4random_buf)
+weak_alias (__arc4random_buf, arc4random_buf)
+
+uint32_t
+__arc4random (void)
+{
+  uint32_t r;
+
+  struct arc4random_state_t *state = arc4random_get_state ();
+  if (__glibc_unlikely (state == NULL))
+    {
+      arc4random_getentropy (&r, sizeof (uint32_t));
+      return r;
+    }
+
+  arc4random_check_stir (state, sizeof (uint32_t));
+  if (state->have < sizeof (uint32_t))
+    arc4random_rekey (state, NULL, 0);
+  uint8_t *ks = state->buf + sizeof (state->buf) - state->have;
+  memcpy (&r, ks, sizeof (uint32_t));
+  memset (ks, 0, sizeof (uint32_t));
+  state->have -= sizeof (uint32_t);
+
+  return r;
+}
+libc_hidden_def (__arc4random)
+weak_alias (__arc4random, arc4random)
diff --git a/stdlib/arc4random.h b/stdlib/arc4random.h
new file mode 100644
index 0000000000..9a16e2b63f
--- /dev/null
+++ b/stdlib/arc4random.h
@@ -0,0 +1,48 @@
+/* Arc4random definition used on TLS.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#ifndef _CHACHA20_H
+#define _CHACHA20_H
+
+#include
+#include
+
+/* Internal ChaCha20 state.  */
+#define CHACHA20_STATE_LEN 16
+#define CHACHA20_BLOCK_SIZE 64
+
+/* Maximum number of bytes until reseed (16 MB).  */
+#define CHACHA20_RESEED_SIZE (16 * 1024 * 1024)
+
+/* Internal arc4random buffer, used on each feedback step to offer some
+   backtracking protection and to allow better use of vectorized
+   chacha20 implementations.  */
+#define CHACHA20_BUFSIZE (8 * CHACHA20_BLOCK_SIZE)
+
+_Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE,
+                "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE + CHACHA20_BLOCK_SIZE");
+
+struct arc4random_state_t
+{
+  uint32_t ctx[CHACHA20_STATE_LEN];
+  size_t have;
+  size_t count;
+  uint8_t buf[CHACHA20_BUFSIZE];
+};
+
+#endif
diff --git a/stdlib/arc4random_uniform.c b/stdlib/arc4random_uniform.c
new file mode 100644
index 0000000000..83772de5cd
--- /dev/null
+++ b/stdlib/arc4random_uniform.c
@@ -0,0 +1,140 @@
+/* Random pseudo generator number which returns a single 32 bit value
+   uniformly distributed but with an upper_bound.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+
+/* Return the number of bytes which cover values up to the limit.  */
+__attribute__ ((const))
+static uint32_t
+byte_count (uint32_t n)
+{
+  if (n < (1U << 8))
+    return 1;
+  else if (n < (1U << 16))
+    return 2;
+  else if (n < (1U << 24))
+    return 3;
+  else
+    return 4;
+}
+
+/* Fill the lower bits of the result with randomness, according to the
+   number of bytes requested.  */
+static void
+random_bytes (uint32_t *result, uint32_t byte_count)
+{
+  *result = 0;
+  unsigned char *ptr = (unsigned char *) result;
+  if (__BYTE_ORDER == __BIG_ENDIAN)
+    ptr += 4 - byte_count;
+  __arc4random_buf (ptr, byte_count);
+}
+
+uint32_t
+__arc4random_uniform (uint32_t n)
+{
+  if (n <= 1)
+    /* There is no valid return value for a zero limit, and 0 is the
+       only possible result for limit 1.  */
+    return 0;
+
+  /* The bits variable serves as a source for bits.  Prefetch the
+     minimum number of bytes needed.  */
+  uint32_t count = byte_count (n);
+  uint32_t bits_length = count * CHAR_BIT;
+  uint32_t bits;
+  random_bytes (&bits, count);
+
+  /* Powers of two are easy.  */
+  if (powerof2 (n))
+    return bits & (n - 1);
+
+  /* The general case.  This algorithm follows Jérémie Lumbroso,
+     Optimal Discrete Uniform Generation from Coin Flips, and
+     Applications (2013), who credits Donald E. Knuth and Andrew
+     C. Yao, The complexity of nonuniform random number generation
+     (1976), for solving the general case.
+
+     The implementation below unrolls the initialization stage of the
+     loop, where v is less than n.  */
+
+  /* Use 64-bit variables even though the intermediate results are
+     never larger than 33 bits.  This ensures the code is easier to
+     compile on 64-bit architectures.  */
+  uint64_t v;
+  uint64_t c;
+
+  /* Initialize v and c.  v is the smallest power of 2 which is larger
+     than n.  */
+  {
+    uint32_t log2p1 = 32 - __builtin_clz (n);
+    v = 1ULL << log2p1;
+    c = bits & (v - 1);
+    bits >>= log2p1;
+    bits_length -= log2p1;
+  }
+
+  /* At the start of the loop, c is uniformly distributed within the
+     half-open interval [0, v), and v < 2n < 2**33.  */
+  while (true)
+    {
+      if (v >= n)
+        {
+          /* If the candidate is less than n, accept it.  */
+          if (c < n)
+            /* c is uniformly distributed on [0, n).  */
+            return c;
+          else
+            {
+              /* c is uniformly distributed on [n, v).  */
+              v -= n;
+              c -= n;
+              /* The distribution was shifted, so c is uniformly
+                 distributed on [0, v) again.  */
+            }
+        }
+      /* v < n here.  */
+
+      /* Replenish the bit source if necessary.  */
+      if (bits_length == 0)
+        {
+          /* Overwrite the least significant byte.  */
+          random_bytes (&bits, 1);
+          bits_length = CHAR_BIT;
+        }
+
+      /* Double the range.  No overflow because v < n < 2**32.  */
+      v *= 2;
+      /* v < 2n here.  */
+
+      /* Extract a bit and append it to c.  c remains less than v and
+         thus 2**33.  */
+      c = (c << 1) | (bits & 1);
+      bits >>= 1;
+      --bits_length;
+
+      /* At this point, c is uniformly distributed on [0, v) again,
+         and v < 2n < 2**33.  */
+    }
+}
+libc_hidden_def (__arc4random_uniform)
+weak_alias (__arc4random_uniform, arc4random_uniform)
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
new file mode 100644
index 0000000000..4549fc780f
--- /dev/null
+++ b/stdlib/chacha20.c
@@ -0,0 +1,187 @@
+/* Generic ChaCha20 implementation (used on arc4random).
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+#include
+#include
+#include
+
+/* 32-bit stream position, then 96-bit nonce.  */
+#define CHACHA20_IV_SIZE 16
+#define CHACHA20_KEY_SIZE 32
+
+#define CHACHA20_STATE_LEN 16
+
+/* The ChaCha20 implementation is based on RFC8439 [1], omitting the final
+   XOR of the keystream with the plaintext because the plaintext is a
+   stream of zeros.  */
+
+enum chacha20_constants
+{
+  CHACHA20_CONSTANT_EXPA = 0x61707865U,
+  CHACHA20_CONSTANT_ND_3 = 0x3320646eU,
+  CHACHA20_CONSTANT_2_BY = 0x79622d32U,
+  CHACHA20_CONSTANT_TE_K = 0x6b206574U
+};
+
+static inline uint32_t
+read_unaligned_32 (const uint8_t *p)
+{
+  uint32_t r;
+  memcpy (&r, p, sizeof (r));
+  return r;
+}
+
+static inline void
+write_unaligned_32 (uint8_t *p, uint32_t v)
+{
+  memcpy (p, &v, sizeof (v));
+}
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define read_unaligned_le32(p) __builtin_bswap32 (read_unaligned_32 (p))
+# define set_state(v) __builtin_bswap32 ((v))
+#else
+# define read_unaligned_le32(p) read_unaligned_32 ((p))
+# define set_state(v) (v)
+#endif
+
+static inline void
+chacha20_init (uint32_t *state, const uint8_t *key, const uint8_t *iv)
+{
+  state[0] = CHACHA20_CONSTANT_EXPA;
+  state[1] = CHACHA20_CONSTANT_ND_3;
+  state[2] = CHACHA20_CONSTANT_2_BY;
+  state[3] = CHACHA20_CONSTANT_TE_K;
+
+  state[4] = read_unaligned_le32 (key + 0 * sizeof (uint32_t));
+  state[5] = read_unaligned_le32 (key + 1 * sizeof (uint32_t));
+  state[6] = read_unaligned_le32 (key + 2 * sizeof (uint32_t));
+  state[7] = read_unaligned_le32 (key + 3 * sizeof (uint32_t));
+  state[8] = read_unaligned_le32 (key + 4 * sizeof (uint32_t));
+  state[9] = read_unaligned_le32 (key + 5 * sizeof (uint32_t));
+  state[10] = read_unaligned_le32 (key + 6 * sizeof (uint32_t));
+  state[11] = read_unaligned_le32 (key + 7 * sizeof (uint32_t));
+
+  state[12] = read_unaligned_le32 (iv + 0 * sizeof (uint32_t));
+  state[13] = read_unaligned_le32 (iv + 1 * sizeof (uint32_t));
+  state[14] = read_unaligned_le32 (iv + 2 * sizeof (uint32_t));
+  state[15] = read_unaligned_le32 (iv + 3 * sizeof (uint32_t));
+}
+
+static inline uint32_t
+rotl32 (unsigned int shift, uint32_t word)
+{
+  return (word << (shift & 31)) | (word >> ((-shift) & 31));
+}
+
+static void
+state_final (const uint8_t *src, uint8_t *dst, uint32_t v)
+{
+#ifdef CHACHA20_XOR_FINAL
+  v ^= read_unaligned_32 (src);
+#endif
+  write_unaligned_32 (dst, v);
+}
+
+static inline void
+chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
+{
+  uint32_t x0, x1, x2, x3, x4, x5, x6, x7;
+  uint32_t x8, x9, x10, x11, x12, x13, x14, x15;
+
+  x0 = state[0];
+  x1 = state[1];
+  x2 = state[2];
+  x3 = state[3];
+  x4 = state[4];
+  x5 = state[5];
+  x6 = state[6];
+  x7 = state[7];
+  x8 = state[8];
+  x9 = state[9];
+  x10 = state[10];
+  x11 = state[11];
+  x12 = state[12];
+  x13 = state[13];
+  x14 = state[14];
+  x15 = state[15];
+
+  for (int i = 0; i < 20; i += 2)
+    {
+#define QROUND(_x0, _x1, _x2, _x3)                            \
+  do {                                                        \
+   _x0 = _x0 + _x1; _x3 = rotl32 (16, (_x0 ^ _x3));           \
+   _x2 = _x2 + _x3; _x1 = rotl32 (12, (_x1 ^ _x2));           \
+   _x0 = _x0 + _x1; _x3 = rotl32 (8, (_x0 ^ _x3));            \
+   _x2 = _x2 + _x3; _x1 = rotl32 (7, (_x1 ^ _x2));            \
+  } while(0)
+
+      QROUND (x0, x4, x8, x12);
+      QROUND (x1, x5, x9, x13);
+      QROUND (x2, x6, x10, x14);
+      QROUND (x3, x7, x11, x15);
+
+      QROUND (x0, x5, x10, x15);
+      QROUND (x1, x6, x11, x12);
+      QROUND (x2, x7, x8, x13);
+      QROUND (x3, x4, x9, x14);
+    }
+
+  state_final (&src[0], &dst[0], set_state (x0 + state[0]));
+  state_final (&src[4], &dst[4], set_state (x1 + state[1]));
+  state_final (&src[8], &dst[8], set_state (x2 + state[2]));
+  state_final (&src[12], &dst[12], set_state (x3 + state[3]));
+  state_final (&src[16], &dst[16], set_state (x4 + state[4]));
+  state_final (&src[20], &dst[20], set_state (x5 + state[5]));
+  state_final (&src[24], &dst[24], set_state (x6 + state[6]));
+  state_final (&src[28], &dst[28], set_state (x7 + state[7]));
+  state_final (&src[32], &dst[32], set_state (x8 + state[8]));
+  state_final (&src[36], &dst[36], set_state (x9 + state[9]));
+  state_final (&src[40], &dst[40], set_state (x10 + state[10]));
+  state_final (&src[44], &dst[44], set_state (x11 + state[11]));
+  state_final (&src[48], &dst[48], set_state (x12 + state[12]));
+  state_final (&src[52], &dst[52], set_state (x13 + state[13]));
+  state_final (&src[56], &dst[56], set_state (x14 + state[14]));
+ state_final (&src[60], &dst[60], set_state (x15 + state[15])); + + state[12]++; +} + +static void +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, + size_t bytes) +{ + while (bytes >= CHACHA20_BLOCK_SIZE) + { + chacha20_block (state, dst, src); + + bytes -= CHACHA20_BLOCK_SIZE; + dst += CHACHA20_BLOCK_SIZE; + src += CHACHA20_BLOCK_SIZE; + } + + if (__glibc_unlikely (bytes != 0)) + { + uint8_t stream[CHACHA20_BLOCK_SIZE]; + chacha20_block (state, stream, src); + memcpy (dst, stream, bytes); + explicit_bzero (stream, sizeof stream); + } +} diff --git a/stdlib/stdlib.h b/stdlib/stdlib.h index bf7cd438e1..3a630a0ce8 100644 --- a/stdlib/stdlib.h +++ b/stdlib/stdlib.h @@ -533,6 +533,19 @@ extern int seed48_r (unsigned short int __seed16v[3], extern int lcong48_r (unsigned short int __param[7], struct drand48_data *__buffer) __THROW __nonnull ((1, 2)); + +/* Return a random integer between zero and 2**32-1 (inclusive). */ +extern __uint32_t arc4random (void) + __THROW __wur; + +/* Fill the buffer with random data. */ +extern void arc4random_buf (void *__buf, size_t __size) + __THROW __nonnull ((1)); + +/* Return a random number between zero (inclusive) and the specified + limit (exclusive). */ +extern __uint32_t arc4random_uniform (__uint32_t __upper_bound) + __THROW __wur; # endif /* Use misc. */ #endif /* Use misc or X/Open. */ diff --git a/sysdeps/generic/not-cancel.h b/sysdeps/generic/not-cancel.h index 2104efeb54..acceb9b67f 100644 --- a/sysdeps/generic/not-cancel.h +++ b/sysdeps/generic/not-cancel.h @@ -48,5 +48,7 @@ (void) __writev (fd, iov, n) #define __fcntl64_nocancel(fd, cmd, ...) 
\ __fcntl64 (fd, cmd, __VA_ARGS__) +#define __getrandom_nocancel(buf, size, flags) \ + __getrandom (buf, size, flags) #endif /* NOT_CANCEL_H */ diff --git a/sysdeps/generic/tls-internal-struct.h b/sysdeps/generic/tls-internal-struct.h index d76c715a96..a91915831b 100644 --- a/sysdeps/generic/tls-internal-struct.h +++ b/sysdeps/generic/tls-internal-struct.h @@ -23,6 +23,7 @@ struct tls_internal_t { char *strsignal_buf; char *strerror_l_buf; + struct arc4random_state_t *rand_state; }; #endif diff --git a/sysdeps/generic/tls-internal.c b/sysdeps/generic/tls-internal.c index 898c20b61c..8a0f37d509 100644 --- a/sysdeps/generic/tls-internal.c +++ b/sysdeps/generic/tls-internal.c @@ -16,6 +16,24 @@ License along with the GNU C Library; if not, see . */ +#include +#include #include __thread struct tls_internal_t __tls_internal; + +void +__glibc_tls_internal_free (void) +{ + free (__tls_internal.strsignal_buf); + free (__tls_internal.strerror_l_buf); + + if (__tls_internal.rand_state != NULL) + { + /* Clear any lingering random state before freeing it, so that + no data leaks if the thread stack is cached.
*/ + explicit_bzero (__tls_internal.rand_state, + sizeof (*__tls_internal.rand_state)); + free (__tls_internal.rand_state); + } +} diff --git a/sysdeps/generic/tls-internal.h b/sysdeps/generic/tls-internal.h index acb8ac9abe..3f53e4a1fa 100644 --- a/sysdeps/generic/tls-internal.h +++ b/sysdeps/generic/tls-internal.h @@ -30,11 +30,6 @@ __glibc_tls_internal (void) return &__tls_internal; } -static inline void -__glibc_tls_internal_free (void) -{ - free (__tls_internal.strsignal_buf); - free (__tls_internal.strerror_l_buf); -} +extern void __glibc_tls_internal_free (void) attribute_hidden; #endif diff --git a/sysdeps/mach/hurd/_Fork.c b/sysdeps/mach/hurd/_Fork.c index e60b86fab1..667068c8cf 100644 --- a/sysdeps/mach/hurd/_Fork.c +++ b/sysdeps/mach/hurd/_Fork.c @@ -662,6 +662,8 @@ retry: _hurd_malloc_fork_child (); call_function_static_weak (__malloc_fork_unlock_child); + call_function_static_weak (__arc4random_fork_subprocess); + /* Run things that want to run in the child task to set up. */ RUN_HOOK (_hurd_fork_child_hook, ()); diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist index 66fb0e28fa..4e3200ef55 100644 --- a/sysdeps/mach/hurd/i386/libc.abilist +++ b/sysdeps/mach/hurd/i386/libc.abilist @@ -2289,6 +2289,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 close_range F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 mbrtoc8 F GLIBC_2.4 __confstr_chk F diff --git a/sysdeps/mach/hurd/not-cancel.h b/sysdeps/mach/hurd/not-cancel.h index 6ec92ced84..9a3a7ed59a 100644 --- a/sysdeps/mach/hurd/not-cancel.h +++ b/sysdeps/mach/hurd/not-cancel.h @@ -74,6 +74,9 @@ __typeof (__fcntl) __fcntl_nocancel; #define __fcntl64_nocancel(...) 
\ __fcntl_nocancel (__VA_ARGS__) +#define __getrandom_nocancel(buf, size, flags) \ + __getrandom (buf, size, flags) + #if IS_IN (libc) hidden_proto (__close_nocancel) hidden_proto (__close_nocancel_nostatus) diff --git a/sysdeps/nptl/_Fork.c b/sysdeps/nptl/_Fork.c index dd568992e2..7dc02569f6 100644 --- a/sysdeps/nptl/_Fork.c +++ b/sysdeps/nptl/_Fork.c @@ -43,6 +43,8 @@ _Fork (void) self->robust_head.list = &self->robust_head; INTERNAL_SYSCALL_CALL (set_robust_list, &self->robust_head, sizeof (struct robust_list_head)); + + call_function_static_weak (__arc4random_fork_subprocess); } return pid; } diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist index 516b029d30..b66fadef40 100644 --- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist +++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist @@ -2616,6 +2616,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist index dde08899fe..f918bb2d48 100644 --- a/sysdeps/unix/sysv/linux/alpha/libc.abilist +++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist @@ -2713,6 +2713,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/arc/libc.abilist b/sysdeps/unix/sysv/linux/arc/libc.abilist index ade44d3029..093043a533 100644 --- a/sysdeps/unix/sysv/linux/arc/libc.abilist +++ b/sysdeps/unix/sysv/linux/arc/libc.abilist @@ -2377,6 +2377,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F 
GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist index 98b33708af..e0668a80cf 100644 --- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist @@ -496,6 +496,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist index 05dbbe5bcc..d28e7c60b7 100644 --- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist @@ -493,6 +493,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist index 430a24349e..922b05062f 100644 --- a/sysdeps/unix/sysv/linux/csky/libc.abilist +++ b/sysdeps/unix/sysv/linux/csky/libc.abilist @@ -2652,6 +2652,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist index de44616526..412144f94c 100644 --- 
a/sysdeps/unix/sysv/linux/hppa/libc.abilist +++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist @@ -2601,6 +2601,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist index 18b4fbf26e..134393900a 100644 --- a/sysdeps/unix/sysv/linux/i386/libc.abilist +++ b/sysdeps/unix/sysv/linux/i386/libc.abilist @@ -2785,6 +2785,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist index a8e959d417..02c65b6482 100644 --- a/sysdeps/unix/sysv/linux/ia64/libc.abilist +++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist @@ -2551,6 +2551,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist index 3a7e4ef6e4..0604029c68 100644 --- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist @@ -497,6 +497,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F 
GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist index 864ad2cdf8..af2be5c80d 100644 --- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist +++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist @@ -2728,6 +2728,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist index 163058420d..e090b8d48f 100644 --- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist @@ -2701,6 +2701,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist index a6debfda56..8c5b2db243 100644 --- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist @@ -2698,6 +2698,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist index 2b53d888de..68847134a2 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist +++ 
b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist @@ -2693,6 +2693,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist index 849aae4130..daa44e64fa 100644 --- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist @@ -2691,6 +2691,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist index 37f6c1bf58..6169188c96 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist @@ -2699,6 +2699,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist index ff1eb91e10..2f7f1ccaf7 100644 --- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist +++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist @@ -2602,6 +2602,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 
arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist index 361b91f547..58e9b486b0 100644 --- a/sysdeps/unix/sysv/linux/nios2/libc.abilist +++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist @@ -2740,6 +2740,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/not-cancel.h b/sysdeps/unix/sysv/linux/not-cancel.h index 75b9e0ee1e..2c58d5ae2f 100644 --- a/sysdeps/unix/sysv/linux/not-cancel.h +++ b/sysdeps/unix/sysv/linux/not-cancel.h @@ -67,6 +67,13 @@ __writev_nocancel_nostatus (int fd, const struct iovec *iov, int iovcnt) INTERNAL_SYSCALL_CALL (writev, fd, iov, iovcnt); } +static inline int +__getrandom_nocancel (void *buf, size_t buflen, unsigned int flags) +{ + return INTERNAL_SYSCALL_CALL (getrandom, buf, buflen, flags); +} + + /* Uncancelable fcntl. 
*/ __typeof (__fcntl) __fcntl64_nocancel; diff --git a/sysdeps/unix/sysv/linux/or1k/libc.abilist b/sysdeps/unix/sysv/linux/or1k/libc.abilist index cb91606377..ffdb8819d5 100644 --- a/sysdeps/unix/sysv/linux/or1k/libc.abilist +++ b/sysdeps/unix/sysv/linux/or1k/libc.abilist @@ -2123,6 +2123,9 @@ GLIBC_2.35 wprintf F GLIBC_2.35 write F GLIBC_2.35 writev F GLIBC_2.35 wscanf F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist index 1264aff6ef..8c9ca32cbe 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist @@ -2755,6 +2755,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist index f96d6e37b5..08a6604aab 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist @@ -2788,6 +2788,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist index e7082e1bd3..849863e639 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist +++ 
b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist @@ -2510,6 +2510,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist index 1032c7e46a..b2ccee08c6 100644 --- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist +++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist @@ -2812,6 +2812,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist index f932db7c22..ff90d1bff2 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv32/libc.abilist @@ -2379,6 +2379,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist index ccc53b0bb8..f1017f6ec5 100644 --- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist +++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist @@ -2579,6 +2579,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F 
+GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist index dbf6501007..009f22931e 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist @@ -2753,6 +2753,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist index 98f08a01b6..0e0b3df973 100644 --- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist +++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist @@ -2547,6 +2547,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist index df11cc8f13..afb5bc37b1 100644 --- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist +++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist @@ -2608,6 +2608,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist index 4ee5513d18..2b53a3cf92 100644 --- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist +++ 
b/sysdeps/unix/sysv/linux/sh/le/libc.abilist @@ -2605,6 +2605,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist index 3cefa76871..43b9844a99 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist @@ -2748,6 +2748,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist index e3ea5c4383..9ec4a0bc7f 100644 --- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist +++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist @@ -2574,6 +2574,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/tls-internal.c b/sysdeps/unix/sysv/linux/tls-internal.c index 6e25b021ab..0326ebb767 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.c +++ b/sysdeps/unix/sysv/linux/tls-internal.c @@ -1 +1,38 @@ -/* Empty. */ +/* Per-thread state. Linux version. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. 
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include + +void +__glibc_tls_internal_free (void) +{ + struct pthread *self = THREAD_SELF; + free (self->tls_state.strsignal_buf); + free (self->tls_state.strerror_l_buf); + + if (self->tls_state.rand_state != NULL) + { + /* Clear any lingering random state before freeing it, so that + no data leaks if the thread stack is cached. */ + explicit_bzero (self->tls_state.rand_state, + sizeof (*self->tls_state.rand_state)); + free (self->tls_state.rand_state); + } +} diff --git a/sysdeps/unix/sysv/linux/tls-internal.h b/sysdeps/unix/sysv/linux/tls-internal.h index f7a1a62135..ebc65d896a 100644 --- a/sysdeps/unix/sysv/linux/tls-internal.h +++ b/sysdeps/unix/sysv/linux/tls-internal.h @@ -28,11 +28,7 @@ __glibc_tls_internal (void) return &THREAD_SELF->tls_state; } -static inline void -__glibc_tls_internal_free (void) -{ - free (THREAD_SELF->tls_state.strsignal_buf); - free (THREAD_SELF->tls_state.strerror_l_buf); -} +/* Free the per-thread internal buffers, scrubbing any arc4random state.
*/ +extern void __glibc_tls_internal_free (void) attribute_hidden; #endif diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist index 2944bc7837..367c8d0a03 100644 --- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist @@ -2525,6 +2525,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist index 47296193af..6a614efb62 100644 --- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist +++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist @@ -2631,6 +2631,9 @@ GLIBC_2.35 __memcmpeq F GLIBC_2.35 _dl_find_object F GLIBC_2.35 epoll_pwait2 F GLIBC_2.35 posix_spawn_file_actions_addtcsetpgrp_np F +GLIBC_2.36 arc4random F +GLIBC_2.36 arc4random_buf F +GLIBC_2.36 arc4random_uniform F GLIBC_2.36 c8rtomb F GLIBC_2.36 fsconfig F GLIBC_2.36 fsmount F
From patchwork Thu Jul 14 11:28:38 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590371 To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v10 2/9] stdlib: Add arc4random tests Date: Thu, 14 Jul 2022 08:28:38 -0300 Message-Id: <20220714112845.704678-3-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org> References: <20220714112845.704678-1-adhemerval.zanella@linaro.org> From: Adhemerval Zanella Netto
The basic tst-arc4random-chacha20.c test checks whether the output of the ChaCha20 implementation matches the reference test vectors from RFC 8439. The tst-arc4random-fork.c test checks whether subprocesses generate distinct streams of randomness (i.e., whether fork handling is done correctly). The tst-arc4random-stats.c test is a statistical test of the randomness of arc4random, arc4random_buf, and arc4random_uniform. The tst-arc4random-thread.c test checks whether threads generate distinct streams of randomness (i.e., whether the functions are thread-safe). Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu. Co-authored-by: Florian Weimer
--- stdlib/Makefile | 7 + stdlib/tst-arc4random-chacha20.c | 167 +++++++++++++++ stdlib/tst-arc4random-fork.c | 198 ++++++++++++++++++ stdlib/tst-arc4random-stats.c | 147 +++++++++++++ stdlib/tst-arc4random-thread.c | 341 +++++++++++++++++++++++++++++++ 5 files changed, 860 insertions(+) create mode 100644 stdlib/tst-arc4random-chacha20.c create mode 100644 stdlib/tst-arc4random-fork.c create mode 100644 stdlib/tst-arc4random-stats.c create mode 100644 stdlib/tst-arc4random-thread.c diff --git a/stdlib/Makefile b/stdlib/Makefile index 62f8253225..a900962685 100644 --- a/stdlib/Makefile +++ b/stdlib/Makefile @@ -183,6 +183,9 @@ tests := \ testmb2 \ testrand \ testsort \ + tst-arc4random-fork \ + tst-arc4random-stats \ + tst-arc4random-thread \ tst-at_quick_exit \ tst-atexit \ tst-atof1 \ @@ -243,6 +246,7 @@ tests := \ # tests tests-internal := \ + tst-arc4random-chacha20 \ tst-strtod1i \ tst-strtod3 \ tst-strtod4 \ @@ -252,6 +256,7 @@ tests-internal := \ # tests-internal tests-static := \ +
tst-arc4random-chacha20 \ tst-secure-getenv \ # tests-static @@ -271,6 +276,8 @@ LDLIBS-test-cxa_atexit-race = $(shared-thread-library) LDLIBS-test-cxa_atexit-race2 = $(shared-thread-library) LDLIBS-test-on_exit-race = $(shared-thread-library) LDLIBS-tst-canon-bz26341 = $(shared-thread-library) +LDLIBS-tst-arc4random-fork = $(shared-thread-library) +LDLIBS-tst-arc4random-thread = $(shared-thread-library) LDLIBS-test-dlclose-exit-race = $(shared-thread-library) LDFLAGS-test-dlclose-exit-race = $(LDFLAGS-rdynamic) diff --git a/stdlib/tst-arc4random-chacha20.c b/stdlib/tst-arc4random-chacha20.c new file mode 100644 index 0000000000..45ba54920d --- /dev/null +++ b/stdlib/tst-arc4random-chacha20.c @@ -0,0 +1,167 @@ +/* Basic tests for the ChaCha20 cipher used in arc4random. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include + +/* The test does not define CHACHA20_XOR_FINAL, to mimic what arc4random + actually does.
*/ +#include + +static int +do_test (void) +{ + const uint8_t key[CHACHA20_KEY_SIZE] = + { + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + }; + const uint8_t iv[CHACHA20_IV_SIZE] = + { + 0x0, 0x0, 0x0, 0x0, + 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, + }; + const uint8_t expected1[CHACHA20_BUFSIZE] = + { + 0x76, 0xb8, 0xe0, 0xad, 0xa0, 0xf1, 0x3d, 0x90, 0x40, 0x5d, 0x6a, + 0xe5, 0x53, 0x86, 0xbd, 0x28, 0xbd, 0xd2, 0x19, 0xb8, 0xa0, 0x8d, + 0xed, 0x1a, 0xa8, 0x36, 0xef, 0xcc, 0x8b, 0x77, 0x0d, 0xc7, 0xda, + 0x41, 0x59, 0x7c, 0x51, 0x57, 0x48, 0x8d, 0x77, 0x24, 0xe0, 0x3f, + 0xb8, 0xd8, 0x4a, 0x37, 0x6a, 0x43, 0xb8, 0xf4, 0x15, 0x18, 0xa1, + 0x1c, 0xc3, 0x87, 0xb6, 0x69, 0xb2, 0xee, 0x65, 0x86, 0x9f, 0x07, + 0xe7, 0xbe, 0x55, 0x51, 0x38, 0x7a, 0x98, 0xba, 0x97, 0x7c, 0x73, + 0x2d, 0x08, 0x0d, 0xcb, 0x0f, 0x29, 0xa0, 0x48, 0xe3, 0x65, 0x69, + 0x12, 0xc6, 0x53, 0x3e, 0x32, 0xee, 0x7a, 0xed, 0x29, 0xb7, 0x21, + 0x76, 0x9c, 0xe6, 0x4e, 0x43, 0xd5, 0x71, 0x33, 0xb0, 0x74, 0xd8, + 0x39, 0xd5, 0x31, 0xed, 0x1f, 0x28, 0x51, 0x0a, 0xfb, 0x45, 0xac, + 0xe1, 0x0a, 0x1f, 0x4b, 0x79, 0x4d, 0x6f, 0x2d, 0x09, 0xa0, 0xe6, + 0x63, 0x26, 0x6c, 0xe1, 0xae, 0x7e, 0xd1, 0x08, 0x19, 0x68, 0xa0, + 0x75, 0x8e, 0x71, 0x8e, 0x99, 0x7b, 0xd3, 0x62, 0xc6, 0xb0, 0xc3, + 0x46, 0x34, 0xa9, 0xa0, 0xb3, 0x5d, 0x01, 0x27, 0x37, 0x68, 0x1f, + 0x7b, 0x5d, 0x0f, 0x28, 0x1e, 0x3a, 0xfd, 0xe4, 0x58, 0xbc, 0x1e, + 0x73, 0xd2, 0xd3, 0x13, 0xc9, 0xcf, 0x94, 0xc0, 0x5f, 0xf3, 0x71, + 0x62, 0x40, 0xa2, 0x48, 0xf2, 0x13, 0x20, 0xa0, 0x58, 0xd7, 0xb3, + 0x56, 0x6b, 0xd5, 0x20, 0xda, 0xaa, 0x3e, 0xd2, 0xbf, 0x0a, 0xc5, + 0xb8, 0xb1, 0x20, 0xfb, 0x85, 0x27, 0x73, 0xc3, 0x63, 0x97, 0x34, + 0xb4, 0x5c, 0x91, 0xa4, 0x2d, 0xd4, 0xcb, 0x83, 0xf8, 0x84, 0x0d, + 0x2e, 0xed, 0xb1, 0x58, 0x13, 0x10, 0x62, 0xac, 0x3f, 0x1f, 0x2c, + 0xf8, 0xff, 0x6d, 0xcd, 0x18, 0x56, 0xe8, 0x6a, 0x1e, 
0x6c, 0x31, + 0x67, 0x16, 0x7e, 0xe5, 0xa6, 0x88, 0x74, 0x2b, 0x47, 0xc5, 0xad, + 0xfb, 0x59, 0xd4, 0xdf, 0x76, 0xfd, 0x1d, 0xb1, 0xe5, 0x1e, 0xe0, + 0x3b, 0x1c, 0xa9, 0xf8, 0x2a, 0xca, 0x17, 0x3e, 0xdb, 0x8b, 0x72, + 0x93, 0x47, 0x4e, 0xbe, 0x98, 0x0f, 0x90, 0x4d, 0x10, 0xc9, 0x16, + 0x44, 0x2b, 0x47, 0x83, 0xa0, 0xe9, 0x84, 0x86, 0x0c, 0xb6, 0xc9, + 0x57, 0xb3, 0x9c, 0x38, 0xed, 0x8f, 0x51, 0xcf, 0xfa, 0xa6, 0x8a, + 0x4d, 0xe0, 0x10, 0x25, 0xa3, 0x9c, 0x50, 0x45, 0x46, 0xb9, 0xdc, + 0x14, 0x06, 0xa7, 0xeb, 0x28, 0x15, 0x1e, 0x51, 0x50, 0xd7, 0xb2, + 0x04, 0xba, 0xa7, 0x19, 0xd4, 0xf0, 0x91, 0x02, 0x12, 0x17, 0xdb, + 0x5c, 0xf1, 0xb5, 0xc8, 0x4c, 0x4f, 0xa7, 0x1a, 0x87, 0x96, 0x10, + 0xa1, 0xa6, 0x95, 0xac, 0x52, 0x7c, 0x5b, 0x56, 0x77, 0x4a, 0x6b, + 0x8a, 0x21, 0xaa, 0xe8, 0x86, 0x85, 0x86, 0x8e, 0x09, 0x4c, 0xf2, + 0x9e, 0xf4, 0x09, 0x0a, 0xf7, 0xa9, 0x0c, 0xc0, 0x7e, 0x88, 0x17, + 0xaa, 0x52, 0x87, 0x63, 0x79, 0x7d, 0x3c, 0x33, 0x2b, 0x67, 0xca, + 0x4b, 0xc1, 0x10, 0x64, 0x2c, 0x21, 0x51, 0xec, 0x47, 0xee, 0x84, + 0xcb, 0x8c, 0x42, 0xd8, 0x5f, 0x10, 0xe2, 0xa8, 0xcb, 0x18, 0xc3, + 0xb7, 0x33, 0x5f, 0x26, 0xe8, 0xc3, 0x9a, 0x12, 0xb1, 0xbc, 0xc1, + 0x70, 0x71, 0x77, 0xb7, 0x61, 0x38, 0x73, 0x2e, 0xed, 0xaa, 0xb7, + 0x4d, 0xa1, 0x41, 0x0f, 0xc0, 0x55, 0xea, 0x06, 0x8c, 0x99, 0xe9, + 0x26, 0x0a, 0xcb, 0xe3, 0x37, 0xcf, 0x5d, 0x3e, 0x00, 0xe5, 0xb3, + 0x23, 0x0f, 0xfe, 0xdb, 0x0b, 0x99, 0x07, 0x87, 0xd0, 0xc7, 0x0e, + 0x0b, 0xfe, 0x41, 0x98, 0xea, 0x67, 0x58, 0xdd, 0x5a, 0x61, 0xfb, + 0x5f, 0xec, 0x2d, 0xf9, 0x81, 0xf3, 0x1b, 0xef, 0xe1, 0x53, 0xf8, + 0x1d, 0x17, 0x16, 0x17, 0x84, 0xdb + }; + + const uint8_t expected2[CHACHA20_BUFSIZE] = + { + 0x1c, 0x88, 0x22, 0xd5, 0x3c, 0xd1, 0xee, 0x7d, 0xb5, 0x32, 0x36, + 0x48, 0x28, 0xbd, 0xf4, 0x04, 0xb0, 0x40, 0xa8, 0xdc, 0xc5, 0x22, + 0xf3, 0xd3, 0xd9, 0x9a, 0xec, 0x4b, 0x80, 0x57, 0xed, 0xb8, 0x50, + 0x09, 0x31, 0xa2, 0xc4, 0x2d, 0x2f, 0x0c, 0x57, 0x08, 0x47, 0x10, + 0x0b, 0x57, 0x54, 0xda, 0xfc, 0x5f, 0xbd, 0xb8, 0x94, 
0xbb, 0xef, + 0x1a, 0x2d, 0xe1, 0xa0, 0x7f, 0x8b, 0xa0, 0xc4, 0xb9, 0x19, 0x30, + 0x10, 0x66, 0xed, 0xbc, 0x05, 0x6b, 0x7b, 0x48, 0x1e, 0x7a, 0x0c, + 0x46, 0x29, 0x7b, 0xbb, 0x58, 0x9d, 0x9d, 0xa5, 0xb6, 0x75, 0xa6, + 0x72, 0x3e, 0x15, 0x2e, 0x5e, 0x63, 0xa4, 0xce, 0x03, 0x4e, 0x9e, + 0x83, 0xe5, 0x8a, 0x01, 0x3a, 0xf0, 0xe7, 0x35, 0x2f, 0xb7, 0x90, + 0x85, 0x14, 0xe3, 0xb3, 0xd1, 0x04, 0x0d, 0x0b, 0xb9, 0x63, 0xb3, + 0x95, 0x4b, 0x63, 0x6b, 0x5f, 0xd4, 0xbf, 0x6d, 0x0a, 0xad, 0xba, + 0xf8, 0x15, 0x7d, 0x06, 0x2a, 0xcb, 0x24, 0x18, 0xc1, 0x76, 0xa4, + 0x75, 0x51, 0x1b, 0x35, 0xc3, 0xf6, 0x21, 0x8a, 0x56, 0x68, 0xea, + 0x5b, 0xc6, 0xf5, 0x4b, 0x87, 0x82, 0xf8, 0xb3, 0x40, 0xf0, 0x0a, + 0xc1, 0xbe, 0xba, 0x5e, 0x62, 0xcd, 0x63, 0x2a, 0x7c, 0xe7, 0x80, + 0x9c, 0x72, 0x56, 0x08, 0xac, 0xa5, 0xef, 0xbf, 0x7c, 0x41, 0xf2, + 0x37, 0x64, 0x3f, 0x06, 0xc0, 0x99, 0x72, 0x07, 0x17, 0x1d, 0xe8, + 0x67, 0xf9, 0xd6, 0x97, 0xbf, 0x5e, 0xa6, 0x01, 0x1a, 0xbc, 0xce, + 0x6c, 0x8c, 0xdb, 0x21, 0x13, 0x94, 0xd2, 0xc0, 0x2d, 0xd0, 0xfb, + 0x60, 0xdb, 0x5a, 0x2c, 0x17, 0xac, 0x3d, 0xc8, 0x58, 0x78, 0xa9, + 0x0b, 0xed, 0x38, 0x09, 0xdb, 0xb9, 0x6e, 0xaa, 0x54, 0x26, 0xfc, + 0x8e, 0xae, 0x0d, 0x2d, 0x65, 0xc4, 0x2a, 0x47, 0x9f, 0x08, 0x86, + 0x48, 0xbe, 0x2d, 0xc8, 0x01, 0xd8, 0x2a, 0x36, 0x6f, 0xdd, 0xc0, + 0xef, 0x23, 0x42, 0x63, 0xc0, 0xb6, 0x41, 0x7d, 0x5f, 0x9d, 0xa4, + 0x18, 0x17, 0xb8, 0x8d, 0x68, 0xe5, 0xe6, 0x71, 0x95, 0xc5, 0xc1, + 0xee, 0x30, 0x95, 0xe8, 0x21, 0xf2, 0x25, 0x24, 0xb2, 0x0b, 0xe4, + 0x1c, 0xeb, 0x59, 0x04, 0x12, 0xe4, 0x1d, 0xc6, 0x48, 0x84, 0x3f, + 0xa9, 0xbf, 0xec, 0x7a, 0x3d, 0xcf, 0x61, 0xab, 0x05, 0x41, 0x57, + 0x33, 0x16, 0xd3, 0xfa, 0x81, 0x51, 0x62, 0x93, 0x03, 0xfe, 0x97, + 0x41, 0x56, 0x2e, 0xd0, 0x65, 0xdb, 0x4e, 0xbc, 0x00, 0x50, 0xef, + 0x55, 0x83, 0x64, 0xae, 0x81, 0x12, 0x4a, 0x28, 0xf5, 0xc0, 0x13, + 0x13, 0x23, 0x2f, 0xbc, 0x49, 0x6d, 0xfd, 0x8a, 0x25, 0x68, 0x65, + 0x7b, 0x68, 0x6d, 0x72, 0x14, 0x38, 0x2a, 0x1a, 0x00, 0x90, 0x30, + 0x17, 0xdd, 
0xa9, 0x69, 0x87, 0x84, 0x42, 0xba, 0x5a, 0xff, 0xf6, + 0x61, 0x3f, 0x55, 0x3c, 0xbb, 0x23, 0x3c, 0xe4, 0x6d, 0x9a, 0xee, + 0x93, 0xa7, 0x87, 0x6c, 0xf5, 0xe9, 0xe8, 0x29, 0x12, 0xb1, 0x8c, + 0xad, 0xf0, 0xb3, 0x43, 0x27, 0xb2, 0xe0, 0x42, 0x7e, 0xcf, 0x66, + 0xb7, 0xce, 0xb7, 0xc0, 0x91, 0x8d, 0xc4, 0x7b, 0xdf, 0xf1, 0x2a, + 0x06, 0x2a, 0xdf, 0x07, 0x13, 0x30, 0x09, 0xce, 0x7a, 0x5e, 0x5c, + 0x91, 0x7e, 0x01, 0x68, 0x30, 0x61, 0x09, 0xb7, 0xcb, 0x49, 0x65, + 0x3a, 0x6d, 0x2c, 0xae, 0xf0, 0x05, 0xde, 0x78, 0x3a, 0x9a, 0x9b, + 0xfe, 0x05, 0x38, 0x1e, 0xd1, 0x34, 0x8d, 0x94, 0xec, 0x65, 0x88, + 0x6f, 0x9c, 0x0b, 0x61, 0x9c, 0x52, 0xc5, 0x53, 0x38, 0x00, 0xb1, + 0x6c, 0x83, 0x61, 0x72, 0xb9, 0x51, 0x82, 0xdb, 0xc5, 0xee, 0xc0, + 0x42, 0xb8, 0x9e, 0x22, 0xf1, 0x1a, 0x08, 0x5b, 0x73, 0x9a, 0x36, + 0x11, 0xcd, 0x8d, 0x83, 0x60, 0x18 + }; + + /* Check against the expected internal arc4random keystream buffer. Some + architecture-specific optimizations expect a buffer whose minimum size + is a multiple of the ChaCha20 block size, so they might not be prepared + to handle smaller buffers. */ + + uint8_t output[CHACHA20_BUFSIZE]; + + uint32_t state[CHACHA20_STATE_LEN]; + chacha20_init (state, key, iv); + + /* Check with the initial state. */ + uint8_t input[CHACHA20_BUFSIZE] = { 0 }; + + chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); + TEST_COMPARE_BLOB (output, sizeof output, expected1, CHACHA20_BUFSIZE); + + /* And on the next round. */ + chacha20_crypt (state, output, input, CHACHA20_BUFSIZE); + TEST_COMPARE_BLOB (output, sizeof output, expected2, CHACHA20_BUFSIZE); + + return 0; +} + +#include diff --git a/stdlib/tst-arc4random-fork.c b/stdlib/tst-arc4random-fork.c new file mode 100644 index 0000000000..019c0a99de --- /dev/null +++ b/stdlib/tst-arc4random-fork.c @@ -0,0 +1,198 @@ +/* Test that subprocesses generate distinct streams of randomness. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library.
+ + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +/* Collect random data from subprocesses and check that all the + results are unique. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Perform multiple runs. Subsequent runs start with an + already-initialized random number generator. (The number 1500 was + seen to reproduce failures reliably in case of a race condition in + the fork detection code.) */ +enum { runs = 1500 }; + +/* Fifty processes in total (49 subprocesses plus the parent). This + should be high enough to expose any issues, but low enough not to + tax the overall system too much. */ +enum { subprocesses = 49 }; + +/* The total number of processes. */ +enum { processes = subprocesses + 1 }; + +/* Number of bytes of randomness to generate per process. Large + enough to make false-positive duplicates extremely unlikely. */ +enum { random_size = 16 }; + +/* Generated bytes of randomness. */ +struct result +{ + unsigned char bytes[random_size]; +}; + +/* Shared across all processes.
*/ +static struct shared_data +{ + pthread_barrier_t barrier; + struct result results[runs][processes]; +} *shared_data; + +static void +generate_arc4random (unsigned char *bytes) +{ + for (int i = 0; i < random_size / sizeof (uint32_t); i++) + { + uint32_t x = arc4random (); + memcpy (&bytes[4 * i], &x, sizeof x); + } +} + +static void +generate_arc4random_buf (unsigned char *bytes) +{ + arc4random_buf (bytes, random_size); +} + +static void +generate_arc4random_uniform (unsigned char *bytes) +{ + for (int i = 0; i < random_size; i++) + bytes[i] = arc4random_uniform (256); +} + +/* Invoked to collect data from a subprocess. */ +static void +subprocess (int run, int process_index, void (*func)(unsigned char *)) +{ + xpthread_barrier_wait (&shared_data->barrier); + func (shared_data->results[run][process_index].bytes); +} + +/* Used to sort the results. */ +struct index +{ + int run; + int process_index; +}; + +/* Used to sort an array of struct index values. */ +static int +index_compare (const void *left1, const void *right1) +{ + const struct index *left = left1; + const struct index *right = right1; + + return memcmp (shared_data->results[left->run][left->process_index].bytes, + shared_data->results[right->run][right->process_index].bytes, + random_size); +} + +static int +do_test_func (void (*func)(unsigned char *bytes)) +{ + /* Collect random data. */ + for (int run = 0; run < runs; ++run) + { + pid_t pids[subprocesses]; + for (int process_index = 0; process_index < subprocesses; + ++process_index) + { + pids[process_index] = xfork (); + if (pids[process_index] == 0) + { + subprocess (run, process_index, func); + _exit (0); + } + } + + /* Trigger all subprocesses. Also add data from the parent + process. 
*/ + subprocess (run, subprocesses, func); + + for (int process_index = 0; process_index < subprocesses; + ++process_index) + { + int status; + xwaitpid (pids[process_index], &status, 0); + if (status != 0) + FAIL_EXIT1 ("subprocess index %d (PID %d) exit status %d\n", + process_index, (int) pids[process_index], status); + } + } + + /* Check for duplicates. */ + struct index indexes[runs * processes]; + for (int run = 0; run < runs; ++run) + for (int process_index = 0; process_index < processes; ++process_index) + indexes[run * processes + process_index] + = (struct index) { .run = run, .process_index = process_index }; + qsort (indexes, array_length (indexes), sizeof (indexes[0]), index_compare); + for (size_t i = 1; i < array_length (indexes); ++i) + { + if (index_compare (indexes + i - 1, indexes + i) == 0) + { + support_record_failure (); + unsigned char *bytes + = shared_data->results[indexes[i].run] + [indexes[i].process_index].bytes; + char *quoted = support_quote_blob (bytes, random_size); + printf ("error: duplicate randomness data: \"%s\"\n" + " run %d, subprocess %d\n" + " run %d, subprocess %d\n", + quoted, indexes[i - 1].run, indexes[i - 1].process_index, + indexes[i].run, indexes[i].process_index); + free (quoted); + } + } + + return 0; +} + +static int +do_test (void) +{ + shared_data = support_shared_allocate (sizeof (*shared_data)); + { + pthread_barrierattr_t attr; + xpthread_barrierattr_init (&attr); + xpthread_barrierattr_setpshared (&attr, PTHREAD_PROCESS_SHARED); + xpthread_barrier_init (&shared_data->barrier, &attr, processes); + xpthread_barrierattr_destroy (&attr); + } + + do_test_func (generate_arc4random); + do_test_func (generate_arc4random_buf); + do_test_func (generate_arc4random_uniform); + + xpthread_barrier_destroy (&shared_data->barrier); + support_shared_free (shared_data); + shared_data = NULL; + + return 0; +} + +#define TIMEOUT 40 +#include diff --git a/stdlib/tst-arc4random-stats.c b/stdlib/tst-arc4random-stats.c new file mode 
100644 index 0000000000..c29be437ed --- /dev/null +++ b/stdlib/tst-arc4random-stats.c @@ -0,0 +1,147 @@ +/* Statistical tests for arc4random-related functions. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include +#include +#include +#include +#include + +enum +{ + arc4random_key_size = 32 +}; + +struct key +{ + unsigned char data[arc4random_key_size]; +}; + +/* With 12,000 keys, the probability that a byte in a predetermined + position does not have a predetermined value in all generated keys + is about 4e-21. The probability that this happens with any of the + 32 * 256 possible byte position/value pairs is about 3.3e-17. This + results in an acceptably low false-positive rate. */ +enum { key_count = 12000 }; + +static struct key keys[key_count]; + +/* Used to perform the distribution check. */ +static int byte_counts[arc4random_key_size][256]; + +/* Bail out after this many failures.
*/ +enum { failure_limit = 100 }; + +static void +find_stuck_bytes (bool (*func) (unsigned char *key)) +{ + memset (&keys, 0xcc, sizeof (keys)); + + int failures = 0; + for (int key = 0; key < key_count; ++key) + { + while (true) + { + if (func (keys[key].data)) + break; + ++failures; + if (failures >= failure_limit) + { + printf ("warning: bailing out after %d failures\n", failures); + return; + } + } + } + printf ("info: key generation finished with %d failures\n", failures); + + memset (&byte_counts, 0, sizeof (byte_counts)); + for (int key = 0; key < key_count; ++key) + for (int pos = 0; pos < arc4random_key_size; ++pos) + ++byte_counts[pos][keys[key].data[pos]]; + + for (int pos = 0; pos < arc4random_key_size; ++pos) + for (int byte = 0; byte < 256; ++byte) + if (byte_counts[pos][byte] == 0) + { + support_record_failure (); + printf ("error: byte %d never appeared at position %d\n", byte, pos); + } +} + +/* Test adapter for arc4random. */ +static bool +generate_arc4random (unsigned char *key) +{ + uint32_t words[arc4random_key_size / 4]; + _Static_assert (sizeof (words) == arc4random_key_size, "sizeof (words)"); + + for (int i = 0; i < array_length (words); ++i) + words[i] = arc4random (); + memcpy (key, &words, arc4random_key_size); + return true; +} + +/* Test adapter for arc4random_buf. */ +static bool +generate_arc4random_buf (unsigned char *key) +{ + arc4random_buf (key, arc4random_key_size); + return true; +} + +/* Test adapter for arc4random_uniform. */ +static bool +generate_arc4random_uniform (unsigned char *key) +{ + for (int i = 0; i < arc4random_key_size; ++i) + key[i] = arc4random_uniform (256); + return true; +} + +/* Test adapter for arc4random_uniform with argument 257. 
This means + that byte 0 occurs more often (256 is truncated to 0), but no such + statistical check is performed, so the test still passes. */ +static bool +generate_arc4random_uniform_257 (unsigned char *key) +{ + for (int i = 0; i < arc4random_key_size; ++i) + key[i] = arc4random_uniform (257); + return true; +} + +static int +do_test (void) +{ + puts ("info: arc4random implementation test"); + find_stuck_bytes (generate_arc4random); + + puts ("info: arc4random_buf implementation test"); + find_stuck_bytes (generate_arc4random_buf); + + puts ("info: arc4random_uniform implementation test"); + find_stuck_bytes (generate_arc4random_uniform); + + puts ("info: arc4random_uniform implementation test (257 variant)"); + find_stuck_bytes (generate_arc4random_uniform_257); + + return 0; +} + +#include diff --git a/stdlib/tst-arc4random-thread.c b/stdlib/tst-arc4random-thread.c new file mode 100644 index 0000000000..797048a7cb --- /dev/null +++ b/stdlib/tst-arc4random-thread.c @@ -0,0 +1,341 @@ +/* Test that threads generate distinct streams of randomness. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + <https://www.gnu.org/licenses/>. */ + +#include +#include +#include +#include +#include +#include +#include +#include + +/* Number of generator calls per thread.
*/ +enum { count_per_thread = 5000 }; + +/* Number of threads computing randomness. */ +enum { inner_threads = 5 }; + +/* Number of threads launching other threads. Chosen so as not to + overload the system. */ +enum { outer_threads = 7 }; + +/* Number of launching rounds performed by the outer threads. */ +enum { outer_rounds = 10 }; + +/* Maximum number of bytes generated in an arc4random call. */ +enum { max_size = 32 }; + +/* Sizes generated by threads. Must be long enough to be unique with + high probability. */ +static const int sizes[] = { 12, 15, 16, 17, 24, 31, max_size }; + +/* Data structure to capture randomness results. */ +struct blob +{ + unsigned int size; + int thread_id; + unsigned int index; + unsigned char bytes[max_size]; +}; + +struct subprocess_args +{ + struct blob *blob; + void (*func)(unsigned char *, size_t); +}; + +static void +generate_arc4random (unsigned char *bytes, size_t size) +{ + int i; + for (i = 0; i < size / sizeof (uint32_t); i++) + { + uint32_t x = arc4random (); + memcpy (&bytes[4 * i], &x, sizeof x); + } + int rem = size % sizeof (uint32_t); + if (rem > 0) + { + uint32_t x = arc4random (); + memcpy (&bytes[4 * i], &x, rem); + } +} + +static void +generate_arc4random_buf (unsigned char *bytes, size_t size) +{ + arc4random_buf (bytes, size); +} + +static void +generate_arc4random_uniform (unsigned char *bytes, size_t size) +{ + for (int i = 0; i < size; i++) + bytes[i] = arc4random_uniform (256); +} + +#define DYNARRAY_STRUCT dynarray_blob +#define DYNARRAY_ELEMENT struct blob +#define DYNARRAY_PREFIX dynarray_blob_ +#include + +/* Sort blob elements by length first, then by comparing the data + member. */ +static int +compare_blob (const void *left1, const void *right1) +{ + const struct blob *left = left1; + const struct blob *right = right1; + + if (left->size != right->size) + /* No overflow due to limited range.
*/ + return left->size - right->size; + return memcmp (left->bytes, right->bytes, left->size); +} + +/* Used to store the global result. */ +static pthread_mutex_t global_result_lock = PTHREAD_MUTEX_INITIALIZER; +static struct dynarray_blob global_result; + +/* Copy data to the global result, with locking. */ +static void +copy_result_to_global (struct dynarray_blob *result) +{ + xpthread_mutex_lock (&global_result_lock); + size_t old_size = dynarray_blob_size (&global_result); + TEST_VERIFY_EXIT + (dynarray_blob_resize (&global_result, + old_size + dynarray_blob_size (result))); + memcpy (dynarray_blob_begin (&global_result) + old_size, + dynarray_blob_begin (result), + dynarray_blob_size (result) * sizeof (struct blob)); + xpthread_mutex_unlock (&global_result_lock); +} + +/* Used to assign unique thread IDs. Accessed atomically. */ +static int next_thread_id; + +static void * +inner_thread (void *closure) +{ + void (*func) (unsigned char *, size_t) = closure; + + /* Use local result to avoid global lock contention while generating + randomness. */ + struct dynarray_blob result; + dynarray_blob_init (&result); + + int thread_id = __atomic_fetch_add (&next_thread_id, 1, __ATOMIC_RELAXED); + + /* Determine the sizes to be used by this thread. */ + int size_slot = thread_id % (array_length (sizes) + 1); + bool switch_sizes = size_slot == array_length (sizes); + if (switch_sizes) + size_slot = 0; + + /* Compute the random blobs. */ + for (int i = 0; i < count_per_thread; ++i) + { + struct blob *place = dynarray_blob_emplace (&result); + TEST_VERIFY_EXIT (place != NULL); + place->size = sizes[size_slot]; + place->thread_id = thread_id; + place->index = i; + func (place->bytes, place->size); + + if (switch_sizes) + size_slot = (size_slot + 1) % array_length (sizes); + } + + /* Store the blobs in the global result structure. 
*/ + copy_result_to_global (&result); + + dynarray_blob_free (&result); + + return NULL; +} + +/* Launch the inner threads and wait for their termination. */ +static void * +outer_thread (void *closure) +{ + void (*func) (unsigned char *, size_t) = closure; + + for (int round = 0; round < outer_rounds; ++round) + { + pthread_t threads[inner_threads]; + + for (int i = 0; i < inner_threads; ++i) + threads[i] = xpthread_create (NULL, inner_thread, func); + + for (int i = 0; i < inner_threads; ++i) + xpthread_join (threads[i]); + } + + return NULL; +} + +static bool termination_requested; + +/* Call the generator function (arg->func) to fill one blob with 16 + bytes. */ +static void * +get_one_blob_thread (void *closure) +{ + struct subprocess_args *arg = closure; + struct blob *result = arg->blob; + + result->size = 16; + arg->func (result->bytes, result->size); + return NULL; +} + +/* Invoked from fork_thread to actually obtain randomness data. */ +static void +fork_thread_subprocess (void *closure) +{ + struct subprocess_args *arg = closure; + struct blob *shared_result = arg->blob; + + struct subprocess_args args[3] = + { + { shared_result + 0, arg->func }, + { shared_result + 1, arg->func }, + { shared_result + 2, arg->func } + }; + + pthread_t thr1 = xpthread_create (NULL, get_one_blob_thread, &args[1]); + pthread_t thr2 = xpthread_create (NULL, get_one_blob_thread, &args[2]); + get_one_blob_thread (&args[0]); + xpthread_join (thr1); + xpthread_join (thr2); +} + +/* Continuously fork subprocesses to obtain a little bit of + randomness. */ +static void * +fork_thread (void *closure) +{ + void (*func)(unsigned char *, size_t) = closure; + + struct dynarray_blob result; + dynarray_blob_init (&result); + + /* Three blobs from each subprocess. */ + struct blob *shared_result + = support_shared_allocate (3 * sizeof (*shared_result)); + + while (!__atomic_load_n (&termination_requested, __ATOMIC_RELAXED)) + { + /* Obtain the results from a subprocess.
*/ + struct subprocess_args arg = { shared_result, func }; + support_isolate_in_subprocess (fork_thread_subprocess, &arg); + + for (int i = 0; i < 3; ++i) + { + struct blob *place = dynarray_blob_emplace (&result); + TEST_VERIFY_EXIT (place != NULL); + place->size = shared_result[i].size; + place->thread_id = -1; + place->index = i; + memcpy (place->bytes, shared_result[i].bytes, place->size); + } + } + + support_shared_free (shared_result); + + copy_result_to_global (&result); + dynarray_blob_free (&result); + + return NULL; +} + +/* Launch the outer threads and wait for their termination. */ +static void +run_outer_threads (void (*func)(unsigned char *, size_t)) +{ + /* Special thread that continuously calls fork. */ + pthread_t fork_thread_id = xpthread_create (NULL, fork_thread, func); + + pthread_t threads[outer_threads]; + for (int i = 0; i < outer_threads; ++i) + threads[i] = xpthread_create (NULL, outer_thread, func); + + for (int i = 0; i < outer_threads; ++i) + xpthread_join (threads[i]); + + __atomic_store_n (&termination_requested, true, __ATOMIC_RELAXED); + xpthread_join (fork_thread_id); +} + +static int +do_test_func (const char *fname, void (*func)(unsigned char *, size_t)) +{ + dynarray_blob_init (&global_result); + int expected_blobs + = count_per_thread * inner_threads * outer_threads * outer_rounds; + printf ("info: %s: minimum of %d blob results expected\n", + fname, expected_blobs); + + run_outer_threads (func); + + /* The forking thread delivers a non-deterministic number of + results, which is why expected_blobs is only a minimum number of + results. */ + printf ("info: %s: %zu blob results observed\n", fname, + dynarray_blob_size (&global_result)); + TEST_VERIFY (dynarray_blob_size (&global_result) >= expected_blobs); + + /* Verify that there are no duplicates.
*/ + qsort (dynarray_blob_begin (&global_result), + dynarray_blob_size (&global_result), + sizeof (struct blob), compare_blob); + struct blob *end = dynarray_blob_end (&global_result); + for (struct blob *p = dynarray_blob_begin (&global_result) + 1; + p < end; ++p) + { + if (compare_blob (p - 1, p) == 0) + { + support_record_failure (); + char *quoted = support_quote_blob (p->bytes, p->size); + printf ("error: %s: duplicate blob: \"%s\" (%d bytes)\n", + fname, quoted, (int) p->size); + printf (" first source: thread %d, index %u\n", + p[-1].thread_id, p[-1].index); + printf (" second source: thread %d, index %u\n", + p[0].thread_id, p[0].index); + free (quoted); + } + } + + dynarray_blob_free (&global_result); + + return 0; +} + +static int +do_test (void) +{ + do_test_func ("arc4random", generate_arc4random); + do_test_func ("arc4random_buf", generate_arc4random_buf); + do_test_func ("arc4random_uniform", generate_arc4random_uniform); + + return 0; +} + +#include From patchwork Thu Jul 14 11:28:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590370
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=TZgsPAuQ7b7NGHsaLjLHKd9XBdd0FAVE3a8IHzoFoCg=; b=SKXy259m0ex0saKPuofpAVwkfe+wb4qsdWU/dm27qBBNHWeW/E728ltQQBxC46mNqt zwCbAxtb4ShwzoOXWlBo6aEJmlVe0MfdctfrYjjN1vmEeTFm9SpZCayg+GtHlkLxhhaO 8skOensrIjGuiaszsqQFhryNoofhjj8VPhcV4S84Y9Iov3g8oDcM0Ca8ItrVhbXy0zIU nfRVTwqIpWyR3xCqBUCJfebfmUtQPUrzBbcIAqgUhzb/v8d+NYGKa3z/BUME/j5RPIdm GRlaYN6DoxTlg/FgsMAhH+smD7GOo9Z7YjJokYTB4ZL+sl/DEX71fKv1fTnUHD6tTLY8 Hu6A== X-Gm-Message-State: AJIora8qzCVZWotvgSSUwVGei4O3Tbes07/XIw275MaWRpchncrtpHMh jRADfhIayZpiSuf0k2WlKEhi9AmjC5ExtQ== X-Received: by 2002:a05:6808:1829:b0:33a:c33:5d54 with SMTP id bh41-20020a056808182900b0033a0c335d54mr6990595oib.299.1657798139310; Thu, 14 Jul 2022 04:28:59 -0700 (PDT) Received: from mandiga.. ([2804:431:c7ca:19c3:3696:7000:2f6a:a6f4]) by smtp.gmail.com with ESMTPSA id k25-20020a056830243900b0061c4761c8cbsm562266ots.24.2022.07.14.04.28.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:28:59 -0700 (PDT) To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v10 3/9] benchtests: Add arc4random benchtest Date: Thu, 14 Jul 2022 08:28:39 -0300 Message-Id: <20220714112845.704678-4-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org> References: <20220714112845.704678-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list 
From: Adhemerval Zanella Netto

From: Adhemerval Zanella Netto

It shows both the throughput (total bytes obtained during the test
duration) and the latency of arc4random and of arc4random_buf with
different buffer sizes.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and powerpc64le-linux-gnu.
---
 benchtests/Makefile                  |   5 +-
 benchtests/bench-arc4random.c        | 218 +++++++++++++++++++++++++++
 benchtests/bench-hash-funcs-kernel.h |   1 +
 benchtests/bench-hash-funcs.c        |   2 -
 benchtests/bench-util.h              |   7 +
 5 files changed, 230 insertions(+), 3 deletions(-)
 create mode 100644 benchtests/bench-arc4random.c

diff --git a/benchtests/Makefile b/benchtests/Makefile
index c279041e19..d99771be74 100644
--- a/benchtests/Makefile
+++ b/benchtests/Makefile
@@ -233,7 +233,10 @@ hash-benchset := \
   nss-hash \
   # hash-benchset
 
-stdlib-benchset := strtod
+stdlib-benchset := \
+  arc4random \
+  strtod \
+  # stdlib-benchset
 
 stdio-common-benchset := sprintf
 
diff --git a/benchtests/bench-arc4random.c b/benchtests/bench-arc4random.c
new file mode 100644
index 0000000000..d8fd40298e
--- /dev/null
+++ b/benchtests/bench-arc4random.c
@@ -0,0 +1,218 @@
+/* arc4random benchmarks.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include "bench-timing.h"
+#include "bench-util.h"
+#include "json-lib.h"
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+static volatile sig_atomic_t timer_finished;
+
+static void timer_callback (int unused)
+{
+  timer_finished = 1;
+}
+
+static timer_t timer;
+
+/* Run for approximately DURATION seconds; any thread may receive the
+   signal, so there is no need to mask it on the main thread.  */
+static void
+timer_start (void)
+{
+  timer_finished = 0;
+  timer = support_create_timer (DURATION, 0, false, timer_callback);
+}
+static void
+timer_stop (void)
+{
+  support_delete_timer (timer);
+}
+
+static const uint32_t sizes[] = { 0, 16, 32, 48, 64, 80, 96, 112, 128 };
+
+static double
+bench_throughput (void)
+{
+  uint64_t n = 0;
+
+  struct timespec start, end;
+  clock_gettime (CLOCK_MONOTONIC, &start);
+  while (1)
+    {
+      DO_NOT_OPTIMIZE_OUT (arc4random ());
+      n++;
+
+      if (timer_finished == 1)
+        break;
+    }
+  clock_gettime (CLOCK_MONOTONIC, &end);
+  struct timespec diff = timespec_sub (end, start);
+
+  double total = (double) n * sizeof (uint32_t);
+  double duration = (double) diff.tv_sec
+    + (double) diff.tv_nsec / TIMESPEC_HZ;
+
+  return total / duration;
+}
+
+static double
+bench_latency (void)
+{
+  timing_t start, stop, cur;
+  const size_t iters = 1024;
+
+  TIMING_NOW (start);
+  for (size_t i = 0; i < iters; i++)
+    DO_NOT_OPTIMIZE_OUT (arc4random ());
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  return (double) (cur) / (double) iters;
+}
+
+static double
+bench_buf_throughput (size_t len)
+{
+  uint8_t buf[len];
+  uint64_t n = 0;
+
+  struct timespec start, end;
+  clock_gettime (CLOCK_MONOTONIC, &start);
+  while (1)
+    {
+      arc4random_buf (buf, len);
+      n++;
+
+      if (timer_finished == 1)
+        break;
+    }
+  clock_gettime (CLOCK_MONOTONIC, &end);
+  struct timespec diff = timespec_sub (end, start);
+
+  double total = (double) n * len;
+  double duration = (double) diff.tv_sec
+    + (double) diff.tv_nsec / TIMESPEC_HZ;
+
+  return total / duration;
+}
+
+static double
+bench_buf_latency (size_t len)
+{
+  timing_t start, stop, cur;
+  const size_t iters = 1024;
+
+  uint8_t buf[len];
+
+  TIMING_NOW (start);
+  for (size_t i = 0; i < iters; i++)
+    arc4random_buf (buf, len);
+  TIMING_NOW (stop);
+
+  TIMING_DIFF (cur, start, stop);
+
+  return (double) (cur) / (double) iters;
+}
+
+static void
+bench_singlethread (json_ctx_t *json_ctx)
+{
+  json_element_object_begin (json_ctx);
+
+  json_array_begin (json_ctx, "throughput");
+  for (int i = 0; i < array_length (sizes); i++)
+    {
+      timer_start ();
+      double r = sizes[i] == 0
+        ? bench_throughput () : bench_buf_throughput (sizes[i]);
+      timer_stop ();
+
+      json_element_double (json_ctx, r);
+    }
+  json_array_end (json_ctx);
+
+  json_array_begin (json_ctx, "latency");
+  for (int i = 0; i < array_length (sizes); i++)
+    {
+      timer_start ();
+      double r = sizes[i] == 0
+        ? bench_latency () : bench_buf_latency (sizes[i]);
+      timer_stop ();
+
+      json_element_double (json_ctx, r);
+    }
+  json_array_end (json_ctx);
+
+  json_element_object_end (json_ctx);
+}
+
+static void
+run_bench (json_ctx_t *json_ctx, const char *name,
+           char *const*fnames, size_t fnameslen,
+           void (*bench) (json_ctx_t *ctx))
+{
+  json_attr_object_begin (json_ctx, name);
+  json_array_begin (json_ctx, "functions");
+  for (int i = 0; i < fnameslen; i++)
+    json_element_string (json_ctx, fnames[i]);
+  json_array_end (json_ctx);
+
+  json_array_begin (json_ctx, "results");
+  bench (json_ctx);
+  json_array_end (json_ctx);
+  json_attr_object_end (json_ctx);
+}
+
+static int
+do_test (void)
+{
+  char *fnames[array_length (sizes)];
+  for (int i = 0; i < array_length (sizes); i++)
+    if (sizes[i] == 0)
+      fnames[i] = xasprintf ("arc4random");
+    else
+      fnames[i] = xasprintf ("arc4random_buf(%u)", sizes[i]);
+
+  json_ctx_t json_ctx;
+  json_init (&json_ctx, 0, stdout);
+
+  json_document_begin (&json_ctx);
+  json_attr_string (&json_ctx, "timing_type", TIMING_TYPE);
+
+  run_bench (&json_ctx, "single-thread", fnames, array_length (fnames),
+             bench_singlethread);
+
+  json_document_end (&json_ctx);
+
+  for (int i = 0; i < array_length (sizes); i++)
+    free (fnames[i]);
+
+  return 0;
+}
+
+#include
diff --git a/benchtests/bench-hash-funcs-kernel.h b/benchtests/bench-hash-funcs-kernel.h
index 83995cc0ae..63034f7e44 100644
--- a/benchtests/bench-hash-funcs-kernel.h
+++ b/benchtests/bench-hash-funcs-kernel.h
@@ -17,6 +17,7 @@
    .  */
 
 
+#include "bench-util.h"
 
 /* We go through the trouble of using macros here because many of the
    hash functions are meant to be inlined so its not fair to benchmark
diff --git a/benchtests/bench-hash-funcs.c b/benchtests/bench-hash-funcs.c
index 578c5cbae2..44b349d30c 100644
--- a/benchtests/bench-hash-funcs.c
+++ b/benchtests/bench-hash-funcs.c
@@ -38,8 +38,6 @@
 #include
 #include
 
-#define DO_NOT_OPTIMIZE_OUT(x) __asm__ volatile("" : : "r,m"(x) : "memory")
-
 enum
 {
   NFIXED_ITERS = 1048576,
diff --git a/benchtests/bench-util.h b/benchtests/bench-util.h
index d0e29423aa..00f78d649f 100644
--- a/benchtests/bench-util.h
+++ b/benchtests/bench-util.h
@@ -16,6 +16,13 @@
    License along with the GNU C Library; if not, see
    .  */
 
+/* Prevent the compiler from optimizing away the call.  */
+#define DO_NOT_OPTIMIZE_OUT(value)                      \
+  ({                                                    \
+    __typeof (value) __v = (value);                     \
+    asm volatile ("" : : "r,m" (__v) : "memory");       \
+    __v;                                                \
+  })
 
 #ifndef START_ITER
 # define START_ITER (100000000)

From patchwork Thu Jul 14 11:28:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 590374
To: libc-alpha@sourceware.org, Florian Weimer
Subject: [PATCH v10 4/9] aarch64: Add optimized chacha20
Date: Thu, 14 Jul 2022 08:28:40 -0300
Message-Id: <20220714112845.704678-5-adhemerval.zanella@linaro.org>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org>
References: <20220714112845.704678-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
From: Adhemerval Zanella Netto

From: Adhemerval Zanella Netto

It adds a vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used as the default, and only
little-endian is supported (big-endian uses the generic code).

As for the generic implementation, the last step that XORs with the
input is omitted.  The final state register clearing is also
omitted.

On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               380.89
arc4random_buf(16) [single-thread]       500.73
arc4random_buf(32) [single-thread]       552.61
arc4random_buf(48) [single-thread]       566.82
arc4random_buf(64) [single-thread]       574.01
arc4random_buf(80) [single-thread]       581.02
arc4random_buf(96) [single-thread]       591.19
arc4random_buf(112) [single-thread]      592.29
arc4random_buf(128) [single-thread]      596.43
-----------------------------------------------

OPTIMIZED                                  MB/s
-----------------------------------------------
arc4random [single-thread]               569.60
arc4random_buf(16) [single-thread]       825.78
arc4random_buf(32) [single-thread]       987.03
arc4random_buf(48) [single-thread]      1042.39
arc4random_buf(64) [single-thread]      1075.50
arc4random_buf(80) [single-thread]      1094.68
arc4random_buf(96) [single-thread]      1130.16
arc4random_buf(112) [single-thread]     1129.58
arc4random_buf(128) [single-thread]     1137.91
-----------------------------------------------

Checked on aarch64-linux-gnu.
---
 LICENSES                           |  20 ++
 stdlib/chacha20.c                  |   8 +-
 sysdeps/aarch64/Makefile           |   4 +
 sysdeps/aarch64/chacha20-aarch64.S | 314 +++++++++++++++++++++++++++++
 sysdeps/aarch64/chacha20_arch.h    |  40 ++++
 sysdeps/generic/chacha20_arch.h    |  24 +++
 6 files changed, 408 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/aarch64/chacha20-aarch64.S
 create mode 100644 sysdeps/aarch64/chacha20_arch.h
 create mode 100644 sysdeps/generic/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 530893b1dc..a94ea89d0d 100644
--- a/LICENSES
+++ b/LICENSES
@@ -389,3 +389,23 @@ Copyright 2001 by Stephen L. Moshier
     You should have received a copy of the GNU Lesser General Public
     License along with this library; if not, see
     .  */
+
+sysdeps/aarch64/chacha20-aarch64.S imports code from libgcrypt, with
+the following notices:
+
+Copyright (C) 2017-2019 Jussi Kivilinna
+
+This file is part of Libgcrypt.
+
+Libgcrypt is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as
+published by the Free Software Foundation; either version 2.1 of
+the License, or (at your option) any later version.
+
+Libgcrypt is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU Lesser General Public License for more details.
+
+You should have received a copy of the GNU Lesser General Public
+License along with this program; if not, see .
diff --git a/stdlib/chacha20.c b/stdlib/chacha20.c
index 4549fc780f..fd4babe875 100644
--- a/stdlib/chacha20.c
+++ b/stdlib/chacha20.c
@@ -165,8 +165,9 @@ chacha20_block (uint32_t *state, uint8_t *dst, const uint8_t *src)
 }
 
 static void
-chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
-                size_t bytes)
+__attribute_maybe_unused__
+chacha20_crypt_generic (uint32_t *state, uint8_t *dst, const uint8_t *src,
+                        size_t bytes)
 {
   while (bytes >= CHACHA20_BLOCK_SIZE)
     {
@@ -185,3 +186,6 @@ chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
       explicit_bzero (stream, sizeof stream);
     }
 }
+
+/* Get the architecture optimized version.  */
+#include
diff --git a/sysdeps/aarch64/Makefile b/sysdeps/aarch64/Makefile
index 17fb1c5b72..7dfd1b62dd 100644
--- a/sysdeps/aarch64/Makefile
+++ b/sysdeps/aarch64/Makefile
@@ -51,6 +51,10 @@ ifeq ($(subdir),csu)
 gen-as-const-headers += tlsdesc.sym
 endif
 
+ifeq ($(subdir),stdlib)
+sysdep_routines += chacha20-aarch64
+endif
+
 ifeq ($(subdir),gmon)
 CFLAGS-mcount.c += -mgeneral-regs-only
 endif
diff --git a/sysdeps/aarch64/chacha20-aarch64.S b/sysdeps/aarch64/chacha20-aarch64.S
new file mode 100644
index 0000000000..0d0e9bfc1b
--- /dev/null
+++ b/sysdeps/aarch64/chacha20-aarch64.S
@@ -0,0 +1,314 @@
+/* Optimized AArch64 implementation of ChaCha20 cipher.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+/* Copyright (C) 2017-2019 Jussi Kivilinna
+
+   This file is part of Libgcrypt.
+
+   Libgcrypt is free software; you can redistribute it and/or modify
+   it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of
+   the License, or (at your option) any later version.
+
+   Libgcrypt is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with this program; if not, see .
+ */
+
+/* Based on D. J. Bernstein reference implementation at
+   http://cr.yp.to/chacha.html:
+
+   chacha-regs.c version 20080118
+   D. J. Bernstein
+   Public domain.  */
+
+#include
+
+/* Only LE is supported.  */
+#ifdef __AARCH64EL__
+
+#define GET_DATA_POINTER(reg, name) \
+        adrp    reg, name ; \
+        add     reg, reg, :lo12:name
+
+/* 'ret' instruction replacement for straight-line speculation mitigation */
+#define ret_spec_stop \
+        ret; dsb sy; isb;
+
+.cpu generic+simd
+
+.text
+
+/* register macros */
+#define INPUT     x0
+#define DST       x1
+#define SRC       x2
+#define NBLKS     x3
+#define ROUND     x4
+#define INPUT_CTR x5
+#define INPUT_POS x6
+#define CTR       x7
+
+/* vector registers */
+#define X0 v16
+#define X4 v17
+#define X8 v18
+#define X12 v19
+
+#define X1 v20
+#define X5 v21
+
+#define X9 v22
+#define X13 v23
+#define X2 v24
+#define X6 v25
+
+#define X3 v26
+#define X7 v27
+#define X11 v28
+#define X15 v29
+
+#define X10 v30
+#define X14 v31
+
+#define VCTR    v0
+#define VTMP0   v1
+#define VTMP1   v2
+#define VTMP2   v3
+#define VTMP3   v4
+#define X12_TMP v5
+#define X13_TMP v6
+#define ROT8    v7
+
+/**********************************************************************
+  helper macros
+ **********************************************************************/
+
+#define _(...) __VA_ARGS__
+
+#define vpunpckldq(s1, s2, dst) \
+        zip1 dst.4s, s2.4s, s1.4s;
+
+#define vpunpckhdq(s1, s2, dst) \
+        zip2 dst.4s, s2.4s, s1.4s;
+
+#define vpunpcklqdq(s1, s2, dst) \
+        zip1 dst.2d, s2.2d, s1.2d;
+
+#define vpunpckhqdq(s1, s2, dst) \
+        zip2 dst.2d, s2.2d, s1.2d;
+
+/* 4x4 32-bit integer matrix transpose */
+#define transpose_4x4(x0, x1, x2, x3, t1, t2, t3) \
+        vpunpckhdq(x1, x0, t2); \
+        vpunpckldq(x1, x0, x0); \
+        \
+        vpunpckldq(x3, x2, t1); \
+        vpunpckhdq(x3, x2, x2); \
+        \
+        vpunpckhqdq(t1, x0, x1); \
+        vpunpcklqdq(t1, x0, x0); \
+        \
+        vpunpckhqdq(x2, t2, x3); \
+        vpunpcklqdq(x2, t2, x2);
+
+/**********************************************************************
+  4-way chacha20
+ **********************************************************************/
+
+#define XOR(d,s1,s2) \
+        eor d.16b, s2.16b, s1.16b;
+
+#define PLUS(ds,s) \
+        add ds.4s, ds.4s, s.4s;
+
+#define ROTATE4(dst1,dst2,dst3,dst4,c,src1,src2,src3,src4) \
+        shl dst1.4s, src1.4s, #(c); \
+        shl dst2.4s, src2.4s, #(c); \
+        shl dst3.4s, src3.4s, #(c); \
+        shl dst4.4s, src4.4s, #(c); \
+        sri dst1.4s, src1.4s, #(32 - (c)); \
+        sri dst2.4s, src2.4s, #(32 - (c)); \
+        sri dst3.4s, src3.4s, #(32 - (c)); \
+        sri dst4.4s, src4.4s, #(32 - (c));
+
+#define ROTATE4_8(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+        tbl dst1.16b, {src1.16b}, ROT8.16b; \
+        tbl dst2.16b, {src2.16b}, ROT8.16b; \
+        tbl dst3.16b, {src3.16b}, ROT8.16b; \
+        tbl dst4.16b, {src4.16b}, ROT8.16b;
+
+#define ROTATE4_16(dst1,dst2,dst3,dst4,src1,src2,src3,src4) \
+        rev32 dst1.8h, src1.8h; \
+        rev32 dst2.8h, src2.8h; \
+        rev32 dst3.8h, src3.8h; \
+        rev32 dst4.8h, src4.8h;
+
+#define QUARTERROUND4(a1,b1,c1,d1,a2,b2,c2,d2,a3,b3,c3,d3,a4,b4,c4,d4,ign,tmp1,tmp2,tmp3,tmp4) \
+        PLUS(a1,b1); PLUS(a2,b2); \
+        PLUS(a3,b3); PLUS(a4,b4); \
+        XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
+        XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
+        ROTATE4_16(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4); \
+        PLUS(c1,d1); PLUS(c2,d2); \
+        PLUS(c3,d3); PLUS(c4,d4); \
+        XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
+        XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
+        ROTATE4(b1, b2, b3, b4, 12, tmp1, tmp2, tmp3, tmp4) \
+        PLUS(a1,b1); PLUS(a2,b2); \
+        PLUS(a3,b3); PLUS(a4,b4); \
+        XOR(tmp1,d1,a1); XOR(tmp2,d2,a2); \
+        XOR(tmp3,d3,a3); XOR(tmp4,d4,a4); \
+        ROTATE4_8(d1, d2, d3, d4, tmp1, tmp2, tmp3, tmp4) \
+        PLUS(c1,d1); PLUS(c2,d2); \
+        PLUS(c3,d3); PLUS(c4,d4); \
+        XOR(tmp1,b1,c1); XOR(tmp2,b2,c2); \
+        XOR(tmp3,b3,c3); XOR(tmp4,b4,c4); \
+        ROTATE4(b1, b2, b3, b4, 7, tmp1, tmp2, tmp3, tmp4) \
+
+.align 4
+L(__chacha20_blocks4_data_inc_counter):
+        .long 0,1,2,3
+
+.align 4
+L(__chacha20_blocks4_data_rot8):
+        .byte 3,0,1,2
+        .byte 7,4,5,6
+        .byte 11,8,9,10
+        .byte 15,12,13,14
+
+.hidden __chacha20_neon_blocks4
+ENTRY (__chacha20_neon_blocks4)
+        /* input:
+         *      x0: input
+         *      x1: dst
+         *      x2: src
+         *      x3: nblks (multiple of 4)
+         */
+
+        GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_rot8))
+        add INPUT_CTR, INPUT, #(12*4);
+        ld1 {ROT8.16b}, [CTR];
+        GET_DATA_POINTER(CTR, L(__chacha20_blocks4_data_inc_counter))
+        mov INPUT_POS, INPUT;
+        ld1 {VCTR.16b}, [CTR];
+
+L(loop4):
+        /* Construct counter vectors X12 and X13 */
+
+        ld1 {X15.16b}, [INPUT_CTR];
+        mov ROUND, #20;
+        ld1 {VTMP1.16b-VTMP3.16b}, [INPUT_POS];
+
+        dup X12.4s, X15.s[0];
+        dup X13.4s, X15.s[1];
+        ldr CTR, [INPUT_CTR];
+        add X12.4s, X12.4s, VCTR.4s;
+        dup X0.4s, VTMP1.s[0];
+        dup X1.4s, VTMP1.s[1];
+        dup X2.4s, VTMP1.s[2];
+        dup X3.4s, VTMP1.s[3];
+        dup X14.4s, X15.s[2];
+        cmhi VTMP0.4s, VCTR.4s, X12.4s;
+        dup X15.4s, X15.s[3];
+        add CTR, CTR, #4; /* Update counter */
+        dup X4.4s, VTMP2.s[0];
+        dup X5.4s, VTMP2.s[1];
+        dup X6.4s, VTMP2.s[2];
+        dup X7.4s, VTMP2.s[3];
+        sub X13.4s, X13.4s, VTMP0.4s;
+        dup X8.4s, VTMP3.s[0];
+        dup X9.4s, VTMP3.s[1];
+        dup X10.4s, VTMP3.s[2];
+        dup X11.4s, VTMP3.s[3];
+        mov X12_TMP.16b, X12.16b;
+        mov X13_TMP.16b, X13.16b;
+        str CTR, [INPUT_CTR];
+
+L(round2):
+        subs ROUND, ROUND, #2
+        QUARTERROUND4(X0, X4, X8, X12,   X1, X5, X9, X13,
+                      X2, X6, X10, X14,  X3, X7, X11, X15,
+                      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+        QUARTERROUND4(X0, X5, X10, X15,  X1, X6, X11, X12,
+                      X2, X7, X8, X13,   X3, X4, X9, X14,
+                      tmp:=,VTMP0,VTMP1,VTMP2,VTMP3)
+        b.ne L(round2);
+
+        ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS], #32;
+
+        PLUS(X12, X12_TMP);        /* INPUT + 12 * 4 + counter */
+        PLUS(X13, X13_TMP);        /* INPUT + 13 * 4 + counter */
+
+        dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 0 * 4 */
+        dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 1 * 4 */
+        dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 2 * 4 */
+        dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 3 * 4 */
+        PLUS(X0, VTMP2);
+        PLUS(X1, VTMP3);
+        PLUS(X2, X12_TMP);
+        PLUS(X3, X13_TMP);
+
+        dup VTMP2.4s, VTMP1.s[0]; /* INPUT + 4 * 4 */
+        dup VTMP3.4s, VTMP1.s[1]; /* INPUT + 5 * 4 */
+        dup X12_TMP.4s, VTMP1.s[2]; /* INPUT + 6 * 4 */
+        dup X13_TMP.4s, VTMP1.s[3]; /* INPUT + 7 * 4 */
+        ld1 {VTMP0.16b, VTMP1.16b}, [INPUT_POS];
+        mov INPUT_POS, INPUT;
+        PLUS(X4, VTMP2);
+        PLUS(X5, VTMP3);
+        PLUS(X6, X12_TMP);
+        PLUS(X7, X13_TMP);
+
+        dup VTMP2.4s, VTMP0.s[0]; /* INPUT + 8 * 4 */
+        dup VTMP3.4s, VTMP0.s[1]; /* INPUT + 9 * 4 */
+        dup X12_TMP.4s, VTMP0.s[2]; /* INPUT + 10 * 4 */
+        dup X13_TMP.4s, VTMP0.s[3]; /* INPUT + 11 * 4 */
+        dup VTMP0.4s, VTMP1.s[2]; /* INPUT + 14 * 4 */
+        dup VTMP1.4s, VTMP1.s[3]; /* INPUT + 15 * 4 */
+        PLUS(X8, VTMP2);
+        PLUS(X9, VTMP3);
+        PLUS(X10, X12_TMP);
+        PLUS(X11, X13_TMP);
+        PLUS(X14, VTMP0);
+        PLUS(X15, VTMP1);
+
+        transpose_4x4(X0, X1, X2, X3, VTMP0, VTMP1, VTMP2);
+        transpose_4x4(X4, X5, X6, X7, VTMP0, VTMP1, VTMP2);
+        transpose_4x4(X8, X9, X10, X11, VTMP0, VTMP1, VTMP2);
+        transpose_4x4(X12, X13, X14, X15, VTMP0, VTMP1, VTMP2);
+
+        subs NBLKS, NBLKS, #4;
+
+        st1 {X0.16b,X4.16B,X8.16b, X12.16b}, [DST], #64
+        st1 {X1.16b,X5.16b}, [DST], #32;
+        st1 {X9.16b, X13.16b, X2.16b, X6.16b}, [DST], #64
+        st1 {X10.16b,X14.16b}, [DST], #32;
+        st1 {X3.16b, X7.16b, X11.16b, X15.16b}, [DST], #64;
+
+        b.ne L(loop4);
+
+        ret_spec_stop
+END (__chacha20_neon_blocks4)
+
+#endif
diff --git a/sysdeps/aarch64/chacha20_arch.h b/sysdeps/aarch64/chacha20_arch.h
new file mode 100644
index 0000000000..9febee7bb6
--- /dev/null
+++ b/sysdeps/aarch64/chacha20_arch.h
@@ -0,0 +1,40 @@
+/* Chacha20 implementation, used on arc4random.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+#include
+#include
+
+unsigned int __chacha20_neon_blocks4 (uint32_t *state, uint8_t *dst,
+                                      const uint8_t *src, size_t nblks)
+     attribute_hidden;
+
+static void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+                size_t bytes)
+{
+  _Static_assert (CHACHA20_BUFSIZE % 4 == 0,
+                  "CHACHA20_BUFSIZE not multiple of 4");
+  _Static_assert (CHACHA20_BUFSIZE > CHACHA20_BLOCK_SIZE * 4,
+                  "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4");
+#ifdef __AARCH64EL__
+  __chacha20_neon_blocks4 (state, dst, src,
+                           CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE);
+#else
+  chacha20_crypt_generic (state, dst, src, bytes);
+#endif
+}
diff --git a/sysdeps/generic/chacha20_arch.h b/sysdeps/generic/chacha20_arch.h
new file mode 100644
index 0000000000..efad41d034
--- /dev/null
+++ b/sysdeps/generic/chacha20_arch.h
@@ -0,0 +1,24 @@
+/* Chacha20 implementation, generic interface for encrypt.
+   Copyright (C) 2022 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   .  */
+
+static inline void
+chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src,
+                size_t bytes)
+{
+  chacha20_crypt_generic (state, dst, src, bytes);
+}

From patchwork Thu Jul 14 11:28:41 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 590376
To: libc-alpha@sourceware.org, Florian Weimer
Subject: [PATCH v10 5/9] x86: Add SSE2 optimized chacha20
Date: Thu, 14 Jul 2022 08:28:41 -0300
From: Adhemerval Zanella Netto

This patch adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-ssse3.S.  It replaces ROTATE_SHUF_2 (which uses the
SSSE3 pshufb instruction) with the shift-based ROTATE2, so the resulting
implementation requires only SSE2.

As with the generic implementation, the final step that XORs the
keystream with the input is omitted, as is the final clearing of the
state registers.

On a Ryzen 9 5900X it shows the following improvements (using formatted
bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.
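The ROTATE2 substitution works because, for a rotation by c bits, `v << c` and `v >> (32 - c)` have no set bits in common, so combining them with a carry-free byte-wise add (paddb) gives the same result as an OR. The following portable C sketch is not part of the patch; it models one 32-bit lane of ROTATE2 and checks it against a plain rotate. The helper names `rotl32`, `rotl32_rotate2` and `check_rotate2` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Plain 32-bit rotate left; valid for 0 < c < 32.  */
static uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  return (v << c) | (v >> (32 - c));
}

/* Hypothetical scalar model of one 32-bit lane of the SSE2 ROTATE2
   macro: two shifts, then a byte-wise add instead of an OR.  */
static uint32_t
rotl32_rotate2 (uint32_t v, unsigned int c)
{
  uint32_t hi = v << c;         /* pslld $(c), tmp  */
  uint32_t lo = v >> (32 - c);  /* psrld $(32 - (c)), v  */
  uint32_t r = 0;

  /* paddb: add each byte independently and drop any carry out of the
     byte.  HI and LO share no set bits, so no carry ever occurs.  */
  for (unsigned int i = 0; i < 4; i++)
    {
      uint8_t a = (uint8_t) (hi >> (8 * i));
      uint8_t b = (uint8_t) (lo >> (8 * i));
      r |= (uint32_t) (uint8_t) (a + b) << (8 * i);
    }
  return r;
}

/* Compare both versions for the rotation counts ChaCha20 uses.  */
static void
check_rotate2 (void)
{
  static const unsigned int counts[] = { 16, 12, 8, 7 };
  uint32_t x = 0x12345678u;

  for (int n = 0; n < 1000; n++)
    {
      for (unsigned int k = 0; k < 4; k++)
        assert (rotl32_rotate2 (x, counts[k]) == rotl32 (x, counts[k]));
      x = x * 1664525u + 1013904223u;  /* Simple LCG for test inputs.  */
    }
}
```

In the assembly the same sequence runs on four lanes of an XMM register at once, and ROTATE2 applies it to two registers (v1, v2) because QUARTERROUND2 advances two quarter-rounds per invocation.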
--- LICENSES | 4 +- sysdeps/x86_64/Makefile | 6 + sysdeps/x86_64/chacha20-amd64-sse2.S | 306 +++++++++++++++++++++++++++ sysdeps/x86_64/chacha20_arch.h | 38 ++++ 4 files changed, 352 insertions(+), 2 deletions(-) create mode 100644 sysdeps/x86_64/chacha20-amd64-sse2.S create mode 100644 sysdeps/x86_64/chacha20_arch.h diff --git a/LICENSES b/LICENSES index a94ea89d0d..47e9cd8e31 100644 --- a/LICENSES +++ b/LICENSES @@ -390,8 +390,8 @@ Copyright 2001 by Stephen L. Moshier License along with this library; if not, see . */ -sysdeps/aarch64/chacha20-aarch64.S imports code from libgcrypt, with -the following notices: +sysdeps/aarch64/chacha20-aarch64.S and sysdeps/x86_64/chacha20-amd64-sse2.S +imports code from libgcrypt, with the following notices: Copyright (C) 2017-2019 Jussi Kivilinna diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index e597a4855f..a2e5af3ca9 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -5,6 +5,12 @@ ifeq ($(subdir),csu) gen-as-const-headers += link-defines.sym endif +ifeq ($(subdir),stdlib) +sysdep_routines += \ + chacha20-amd64-sse2 \ + # sysdep_routines +endif + ifeq ($(subdir),gmon) sysdep_routines += _mcount # We cannot compile _mcount.S with -pg because that would create diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S new file mode 100644 index 0000000000..7b30f61446 --- /dev/null +++ b/sysdeps/x86_64/chacha20-amd64-sse2.S @@ -0,0 +1,306 @@ +/* Optimized SSE2 implementation of ChaCha20 cipher. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* chacha20-amd64-ssse3.S - SSSE3 implementation of ChaCha20 cipher + + Copyright (C) 2017-2019 Jussi Kivilinna + + This file is part of Libgcrypt. + + Libgcrypt is free software; you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of + the License, or (at your option) any later version. + + Libgcrypt is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with this program; if not, see . +*/ + +/* Based on D. J. Bernstein reference implementation at + http://cr.yp.to/chacha.html: + + chacha-regs.c version 20080118 + D. J. Bernstein + Public domain. 
*/ + +#include + +#ifdef PIC +# define rRIP (%rip) +#else +# define rRIP +#endif + +/* 'ret' instruction replacement for straight-line speculation mitigation */ +#define ret_spec_stop \ + ret; int3; + +/* register macros */ +#define INPUT %rdi +#define DST %rsi +#define SRC %rdx +#define NBLKS %rcx +#define ROUND %eax + +/* stack structure */ +#define STACK_VEC_X12 (16) +#define STACK_VEC_X13 (16 + STACK_VEC_X12) +#define STACK_TMP (16 + STACK_VEC_X13) +#define STACK_TMP1 (16 + STACK_TMP) +#define STACK_TMP2 (16 + STACK_TMP1) + +#define STACK_MAX (16 + STACK_TMP2) + +/* vector registers */ +#define X0 %xmm0 +#define X1 %xmm1 +#define X2 %xmm2 +#define X3 %xmm3 +#define X4 %xmm4 +#define X5 %xmm5 +#define X6 %xmm6 +#define X7 %xmm7 +#define X8 %xmm8 +#define X9 %xmm9 +#define X10 %xmm10 +#define X11 %xmm11 +#define X12 %xmm12 +#define X13 %xmm13 +#define X14 %xmm14 +#define X15 %xmm15 + +/********************************************************************** + helper macros + **********************************************************************/ + +/* 4x4 32-bit integer matrix transpose */ +#define TRANSPOSE_4x4(x0, x1, x2, x3, t1, t2, t3) \ + movdqa x0, t2; \ + punpckhdq x1, t2; \ + punpckldq x1, x0; \ + \ + movdqa x2, t1; \ + punpckldq x3, t1; \ + punpckhdq x3, x2; \ + \ + movdqa x0, x1; \ + punpckhqdq t1, x1; \ + punpcklqdq t1, x0; \ + \ + movdqa t2, x3; \ + punpckhqdq x2, x3; \ + punpcklqdq x2, t2; \ + movdqa t2, x2; + +/* fill xmm register with 32-bit value from memory */ +#define PBROADCASTD(mem32, xreg) \ + movd mem32, xreg; \ + pshufd $0, xreg, xreg; + +/********************************************************************** + 4-way chacha20 + **********************************************************************/ + +#define ROTATE2(v1,v2,c,tmp1,tmp2) \ + movdqa v1, tmp1; \ + movdqa v2, tmp2; \ + psrld $(32 - (c)), v1; \ + pslld $(c), tmp1; \ + paddb tmp1, v1; \ + psrld $(32 - (c)), v2; \ + pslld $(c), tmp2; \ + paddb tmp2, v2; + +#define XOR(ds,s) \ + pxor 
s, ds; + +#define PLUS(ds,s) \ + paddd s, ds; + +#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,tmp2) \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE2(d1, d2, 16, tmp1, tmp2); \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE2(b1, b2, 12, tmp1, tmp2); \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE2(d1, d2, 8, tmp1, tmp2); \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE2(b1, b2, 7, tmp1, tmp2); + + .section .text.sse2,"ax",@progbits + +chacha20_data: + .align 16 +L(counter1): + .long 1,0,0,0 +L(inc_counter): + .long 0,1,2,3 +L(unsigned_cmp): + .long 0x80000000,0x80000000,0x80000000,0x80000000 + + .hidden __chacha20_sse2_blocks4 +ENTRY (__chacha20_sse2_blocks4) + /* input: + * %rdi: input + * %rsi: dst + * %rdx: src + * %rcx: nblks (multiple of 4) + */ + + pushq %rbp; + cfi_adjust_cfa_offset(8); + cfi_rel_offset(rbp, 0) + movq %rsp, %rbp; + cfi_def_cfa_register(%rbp); + + subq $STACK_MAX, %rsp; + andq $~15, %rsp; + +L(loop4): + mov $20, ROUND; + + /* Construct counter vectors X12 and X13 */ + movdqa L(inc_counter) rRIP, X0; + movdqa L(unsigned_cmp) rRIP, X2; + PBROADCASTD((12 * 4)(INPUT), X12); + PBROADCASTD((13 * 4)(INPUT), X13); + paddd X0, X12; + movdqa X12, X1; + pxor X2, X0; + pxor X2, X1; + pcmpgtd X1, X0; + psubd X0, X13; + movdqa X12, (STACK_VEC_X12)(%rsp); + movdqa X13, (STACK_VEC_X13)(%rsp); + + /* Load vectors */ + PBROADCASTD((0 * 4)(INPUT), X0); + PBROADCASTD((1 * 4)(INPUT), X1); + PBROADCASTD((2 * 4)(INPUT), X2); + PBROADCASTD((3 * 4)(INPUT), X3); + PBROADCASTD((4 * 4)(INPUT), X4); + PBROADCASTD((5 * 4)(INPUT), X5); + PBROADCASTD((6 * 4)(INPUT), X6); + PBROADCASTD((7 * 4)(INPUT), X7); + PBROADCASTD((8 * 4)(INPUT), X8); + PBROADCASTD((9 * 4)(INPUT), X9); + PBROADCASTD((10 * 4)(INPUT), X10); + PBROADCASTD((11 * 4)(INPUT), X11); + PBROADCASTD((14 * 4)(INPUT), X14); + PBROADCASTD((15 * 4)(INPUT), X15); + movdqa X11, (STACK_TMP)(%rsp); + movdqa X15, (STACK_TMP1)(%rsp); + +L(round2_4): 
+ QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X11,X15) + movdqa (STACK_TMP)(%rsp), X11; + movdqa (STACK_TMP1)(%rsp), X15; + movdqa X8, (STACK_TMP)(%rsp); + movdqa X9, (STACK_TMP1)(%rsp); + QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,X9) + QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,X9) + movdqa (STACK_TMP)(%rsp), X8; + movdqa (STACK_TMP1)(%rsp), X9; + movdqa X11, (STACK_TMP)(%rsp); + movdqa X15, (STACK_TMP1)(%rsp); + QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X11,X15) + sub $2, ROUND; + jnz L(round2_4); + + /* tmp := X15 */ + movdqa (STACK_TMP)(%rsp), X11; + PBROADCASTD((0 * 4)(INPUT), X15); + PLUS(X0, X15); + PBROADCASTD((1 * 4)(INPUT), X15); + PLUS(X1, X15); + PBROADCASTD((2 * 4)(INPUT), X15); + PLUS(X2, X15); + PBROADCASTD((3 * 4)(INPUT), X15); + PLUS(X3, X15); + PBROADCASTD((4 * 4)(INPUT), X15); + PLUS(X4, X15); + PBROADCASTD((5 * 4)(INPUT), X15); + PLUS(X5, X15); + PBROADCASTD((6 * 4)(INPUT), X15); + PLUS(X6, X15); + PBROADCASTD((7 * 4)(INPUT), X15); + PLUS(X7, X15); + PBROADCASTD((8 * 4)(INPUT), X15); + PLUS(X8, X15); + PBROADCASTD((9 * 4)(INPUT), X15); + PLUS(X9, X15); + PBROADCASTD((10 * 4)(INPUT), X15); + PLUS(X10, X15); + PBROADCASTD((11 * 4)(INPUT), X15); + PLUS(X11, X15); + movdqa (STACK_VEC_X12)(%rsp), X15; + PLUS(X12, X15); + movdqa (STACK_VEC_X13)(%rsp), X15; + PLUS(X13, X15); + movdqa X13, (STACK_TMP)(%rsp); + PBROADCASTD((14 * 4)(INPUT), X15); + PLUS(X14, X15); + movdqa (STACK_TMP1)(%rsp), X15; + movdqa X14, (STACK_TMP1)(%rsp); + PBROADCASTD((15 * 4)(INPUT), X13); + PLUS(X15, X13); + movdqa X15, (STACK_TMP2)(%rsp); + + /* Update counter */ + addq $4, (12 * 4)(INPUT); + + TRANSPOSE_4x4(X0, X1, X2, X3, X13, X14, X15); + movdqu X0, (64 * 0 + 16 * 0)(DST) + movdqu X1, (64 * 1 + 16 * 0)(DST) + movdqu X2, (64 * 2 + 16 * 0)(DST) + movdqu X3, (64 * 3 + 16 * 0)(DST) + TRANSPOSE_4x4(X4, X5, X6, X7, X0, X1, X2); + movdqa (STACK_TMP)(%rsp), X13; + movdqa (STACK_TMP1)(%rsp), X14; + movdqa (STACK_TMP2)(%rsp), 
X15; + movdqu X4, (64 * 0 + 16 * 1)(DST) + movdqu X5, (64 * 1 + 16 * 1)(DST) + movdqu X6, (64 * 2 + 16 * 1)(DST) + movdqu X7, (64 * 3 + 16 * 1)(DST) + TRANSPOSE_4x4(X8, X9, X10, X11, X0, X1, X2); + movdqu X8, (64 * 0 + 16 * 2)(DST) + movdqu X9, (64 * 1 + 16 * 2)(DST) + movdqu X10, (64 * 2 + 16 * 2)(DST) + movdqu X11, (64 * 3 + 16 * 2)(DST) + TRANSPOSE_4x4(X12, X13, X14, X15, X0, X1, X2); + movdqu X12, (64 * 0 + 16 * 3)(DST) + movdqu X13, (64 * 1 + 16 * 3)(DST) + movdqu X14, (64 * 2 + 16 * 3)(DST) + movdqu X15, (64 * 3 + 16 * 3)(DST) + + sub $4, NBLKS; + lea (4 * 64)(DST), DST; + lea (4 * 64)(SRC), SRC; + jnz L(loop4); + + /* eax zeroed by round loop. */ + leave; + cfi_adjust_cfa_offset(-8) + cfi_def_cfa_register(%rsp); + ret_spec_stop; +END (__chacha20_sse2_blocks4) diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h new file mode 100644 index 0000000000..5738c840a9 --- /dev/null +++ b/sysdeps/x86_64/chacha20_arch.h @@ -0,0 +1,38 @@ +/* Chacha20 implementation, used on arc4random. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include +#include + +unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t nblks) + attribute_hidden; + +static inline void +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, + size_t bytes) +{ + _Static_assert (CHACHA20_BUFSIZE % 4 == 0, + "CHACHA20_BUFSIZE not multiple of 4"); + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, + "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); + + __chacha20_sse2_blocks4 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); +} From patchwork Thu Jul 14 11:28:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590372 Delivered-To: patch@linaro.org Received: by 2002:a05:7000:5817:0:0:0:0 with SMTP id j23csp1476827max; Thu, 14 Jul 2022 04:29:45 -0700 (PDT) X-Google-Smtp-Source: AGRyM1vjdXiZ4yDV9v3RXwA7SzlISKp1AintexYXTJgzslAW+id10pGuXjhxWyalyeMzXHUYY6Kv X-Received: by 2002:a17:907:1def:b0:72b:33e6:46d6 with SMTP id og47-20020a1709071def00b0072b33e646d6mr8148164ejc.414.1657798185254; Thu, 14 Jul 2022 04:29:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657798185; cv=none; d=google.com; s=arc-20160816; b=SXayQxGFBD006NaVXyXsn6J/s/pnheLKHteYPV3TwL37Perq4yXCs58yB+ODhvvt7x mS+QihTl1QV9AizMyJhmfg4dTzN9sEiNyqm3KXp9GJ7fSH6Gj1ah0ptDGBHb0z6/rKdF Ts2M9YCvhIlC7ygG0Y4t/g+5nufp5lNx2JsXwqZUZkJgIR8axDhA5vm0IeyRkDHPxIOL frI5frGEE7rocgRrj9HzD15iIvtBXDwIH5tCQm5rbuI4K2hO4V+4Q3ldPCbq7NQ5C9oS utGIVgwlH/q+F80opv5/GrvpKAjYRu9H28tqop9XYA2z8XQhZq1jGaFwQjndpd+CZnTI UX1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:dmarc-filter:delivered-to:dkim-signature :dkim-filter; 
To: libc-alpha@sourceware.org, Florian Weimer
Subject: [PATCH v10 6/9] x86: Add AVX2 optimized chacha20
Date: Thu, 14 Jul 2022 08:28:42 -0300
From: Adhemerval Zanella Netto

This patch adds a vectorized ChaCha20 implementation based on libgcrypt's
cipher/chacha20-amd64-avx2.S.  It is used only when AVX2 is both
supported and enabled by the architecture.

As with the generic implementation, the final step that XORs the
keystream with the input is omitted, as is the final clearing of the
state registers.
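Unlike the SSE2 variant, the AVX2 file in this patch keeps libgcrypt's ROTATE_SHUF_2 macro for the rotations by 16 and 8 bits. Because those counts are multiples of 8, the rotation only permutes bytes within each 32-bit lane, so a single vpshufb with the L(shuf_rol16) or L(shuf_rol8) table replaces the shift/shift/add sequence. The portable C sketch below models one 128-bit lane of that shuffle on little-endian words and checks it against a plain rotate. The helpers (`pshufb128`, `rotate_by_shuffle`, `check_shuffle_rotates`) are hypothetical and not part of glibc; only the two byte tables are copied from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Byte tables copied from L(shuf_rol16) and L(shuf_rol8).  */
static const uint8_t shuf_rol16[16] =
  { 2, 3, 0, 1, 6, 7, 4, 5, 10, 11, 8, 9, 14, 15, 12, 13 };
static const uint8_t shuf_rol8[16] =
  { 3, 0, 1, 2, 7, 4, 5, 6, 11, 8, 9, 10, 15, 12, 13, 14 };

/* Scalar model of pshufb on one 128-bit lane: dst[i] = src[idx[i]],
   or 0 when bit 7 of idx[i] is set (never the case for these tables).  */
static void
pshufb128 (uint8_t dst[16], const uint8_t src[16], const uint8_t idx[16])
{
  for (int i = 0; i < 16; i++)
    dst[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0f];
}

/* Rotate each of four 32-bit words with a shuffle table, laying the
   words out little-endian as they are in an x86_64 register.  */
static void
rotate_by_shuffle (uint32_t out[4], const uint32_t in[4],
                   const uint8_t idx[16])
{
  uint8_t src[16], dst[16];

  for (int j = 0; j < 4; j++)
    for (int k = 0; k < 4; k++)
      src[4 * j + k] = (uint8_t) (in[j] >> (8 * k));
  pshufb128 (dst, src, idx);
  for (int j = 0; j < 4; j++)
    out[j] = (uint32_t) dst[4 * j]
             | (uint32_t) dst[4 * j + 1] << 8
             | (uint32_t) dst[4 * j + 2] << 16
             | (uint32_t) dst[4 * j + 3] << 24;
}

static uint32_t
rotl32 (uint32_t v, unsigned int c)
{
  return (v << c) | (v >> (32 - c));
}

/* Both tables must agree with a plain rotate for every word.  */
static void
check_shuffle_rotates (const uint32_t w[4])
{
  uint32_t r16[4], r8[4];

  rotate_by_shuffle (r16, w, shuf_rol16);
  rotate_by_shuffle (r8, w, shuf_rol8);
  for (int j = 0; j < 4; j++)
    {
      assert (r16[j] == rotl32 (w[j], 16));
      assert (r8[j] == rotl32 (w[j], 8));
    }
}
```

Rotations by 12 and 7 are not byte multiples, so QUARTERROUND2 in the AVX2 file still uses the shift-based ROTATE2 for them.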
On a Ryzen 9 5900X it shows the following improvements (using formatted
bench-arc4random data):

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

AVX2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               922.61
arc4random_buf(16) [single-thread]      1478.70
arc4random_buf(32) [single-thread]      2241.80
arc4random_buf(48) [single-thread]      2681.28
arc4random_buf(64) [single-thread]      2913.43
arc4random_buf(80) [single-thread]      3009.73
arc4random_buf(96) [single-thread]      3141.16
arc4random_buf(112) [single-thread]     3254.46
arc4random_buf(128) [single-thread]     3305.02
-----------------------------------------------

Checked on x86_64-linux-gnu.
---
 LICENSES                             |   5 +-
 sysdeps/x86_64/Makefile              |   1 +
 sysdeps/x86_64/chacha20-amd64-avx2.S | 328 +++++++++++++++++++++++++++
 sysdeps/x86_64/chacha20-amd64-sse2.S |   5 +
 sysdeps/x86_64/chacha20_arch.h       |  27 ++-
 5 files changed, 359 insertions(+), 7 deletions(-)
 create mode 100644 sysdeps/x86_64/chacha20-amd64-avx2.S

diff --git a/LICENSES b/LICENSES
index 47e9cd8e31..1617648813 100644
--- a/LICENSES
+++ b/LICENSES
@@ -390,8 +390,9 @@ Copyright 2001 by Stephen L. Moshier
    License along with this library; if not, see
    .
*/ -sysdeps/aarch64/chacha20-aarch64.S and sysdeps/x86_64/chacha20-amd64-sse2.S -imports code from libgcrypt, with the following notices: +sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, +and sysdeps/x86_64/chacha20-amd64-avx2.S imports code from libgcrypt, +with the following notices: Copyright (C) 2017-2019 Jussi Kivilinna diff --git a/sysdeps/x86_64/Makefile b/sysdeps/x86_64/Makefile index a2e5af3ca9..a02fb9a114 100644 --- a/sysdeps/x86_64/Makefile +++ b/sysdeps/x86_64/Makefile @@ -8,6 +8,7 @@ endif ifeq ($(subdir),stdlib) sysdep_routines += \ chacha20-amd64-sse2 \ + chacha20-amd64-avx2 \ # sysdep_routines endif diff --git a/sysdeps/x86_64/chacha20-amd64-avx2.S b/sysdeps/x86_64/chacha20-amd64-avx2.S new file mode 100644 index 0000000000..eb07b99f48 --- /dev/null +++ b/sysdeps/x86_64/chacha20-amd64-avx2.S @@ -0,0 +1,328 @@ +/* Optimized AVX2 implementation of ChaCha20 cipher. + Copyright (C) 2022 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* chacha20-amd64-avx2.S - AVX2 implementation of ChaCha20 cipher + + Copyright (C) 2017-2019 Jussi Kivilinna + + This file is part of Libgcrypt. 
+ + Libgcrypt is free software; you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of + the License, or (at your option) any later version. + + Libgcrypt is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with this program; if not, see . +*/ + +/* Based on D. J. Bernstein reference implementation at + http://cr.yp.to/chacha.html: + + chacha-regs.c version 20080118 + D. J. Bernstein + Public domain. */ + +#include + +#ifdef PIC +# define rRIP (%rip) +#else +# define rRIP +#endif + +/* register macros */ +#define INPUT %rdi +#define DST %rsi +#define SRC %rdx +#define NBLKS %rcx +#define ROUND %eax + +/* stack structure */ +#define STACK_VEC_X12 (32) +#define STACK_VEC_X13 (32 + STACK_VEC_X12) +#define STACK_TMP (32 + STACK_VEC_X13) +#define STACK_TMP1 (32 + STACK_TMP) + +#define STACK_MAX (32 + STACK_TMP1) + +/* vector registers */ +#define X0 %ymm0 +#define X1 %ymm1 +#define X2 %ymm2 +#define X3 %ymm3 +#define X4 %ymm4 +#define X5 %ymm5 +#define X6 %ymm6 +#define X7 %ymm7 +#define X8 %ymm8 +#define X9 %ymm9 +#define X10 %ymm10 +#define X11 %ymm11 +#define X12 %ymm12 +#define X13 %ymm13 +#define X14 %ymm14 +#define X15 %ymm15 + +#define X0h %xmm0 +#define X1h %xmm1 +#define X2h %xmm2 +#define X3h %xmm3 +#define X4h %xmm4 +#define X5h %xmm5 +#define X6h %xmm6 +#define X7h %xmm7 +#define X8h %xmm8 +#define X9h %xmm9 +#define X10h %xmm10 +#define X11h %xmm11 +#define X12h %xmm12 +#define X13h %xmm13 +#define X14h %xmm14 +#define X15h %xmm15 + +/********************************************************************** + helper macros + 
**********************************************************************/ + +/* 4x4 32-bit integer matrix transpose */ +#define transpose_4x4(x0,x1,x2,x3,t1,t2) \ + vpunpckhdq x1, x0, t2; \ + vpunpckldq x1, x0, x0; \ + \ + vpunpckldq x3, x2, t1; \ + vpunpckhdq x3, x2, x2; \ + \ + vpunpckhqdq t1, x0, x1; \ + vpunpcklqdq t1, x0, x0; \ + \ + vpunpckhqdq x2, t2, x3; \ + vpunpcklqdq x2, t2, x2; + +/* 2x2 128-bit matrix transpose */ +#define transpose_16byte_2x2(x0,x1,t1) \ + vmovdqa x0, t1; \ + vperm2i128 $0x20, x1, x0, x0; \ + vperm2i128 $0x31, x1, t1, x1; + +/********************************************************************** + 8-way chacha20 + **********************************************************************/ + +#define ROTATE2(v1,v2,c,tmp) \ + vpsrld $(32 - (c)), v1, tmp; \ + vpslld $(c), v1, v1; \ + vpaddb tmp, v1, v1; \ + vpsrld $(32 - (c)), v2, tmp; \ + vpslld $(c), v2, v2; \ + vpaddb tmp, v2, v2; + +#define ROTATE_SHUF_2(v1,v2,shuf) \ + vpshufb shuf, v1, v1; \ + vpshufb shuf, v2, v2; + +#define XOR(ds,s) \ + vpxor s, ds, ds; + +#define PLUS(ds,s) \ + vpaddd s, ds, ds; + +#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2,ign,tmp1,\ + interleave_op1,interleave_op2,\ + interleave_op3,interleave_op4) \ + vbroadcasti128 .Lshuf_rol16 rRIP, tmp1; \ + interleave_op1; \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE_SHUF_2(d1, d2, tmp1); \ + interleave_op2; \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE2(b1, b2, 12, tmp1); \ + vbroadcasti128 .Lshuf_rol8 rRIP, tmp1; \ + interleave_op3; \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE_SHUF_2(d1, d2, tmp1); \ + interleave_op4; \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE2(b1, b2, 7, tmp1); + + .section .text.avx2, "ax", @progbits + .align 32 +chacha20_data: +L(shuf_rol16): + .byte 2,3,0,1,6,7,4,5,10,11,8,9,14,15,12,13 +L(shuf_rol8): + .byte 3,0,1,2,7,4,5,6,11,8,9,10,15,12,13,14 +L(inc_counter): + .byte 0,1,2,3,4,5,6,7 +L(unsigned_cmp): + .long 
0x80000000 + + .hidden __chacha20_avx2_blocks8 +ENTRY (__chacha20_avx2_blocks8) + /* input: + * %rdi: input + * %rsi: dst + * %rdx: src + * %rcx: nblks (multiple of 8) + */ + vzeroupper; + + pushq %rbp; + cfi_adjust_cfa_offset(8); + cfi_rel_offset(rbp, 0) + movq %rsp, %rbp; + cfi_def_cfa_register(rbp); + + subq $STACK_MAX, %rsp; + andq $~31, %rsp; + +L(loop8): + mov $20, ROUND; + + /* Construct counter vectors X12 and X13 */ + vpmovzxbd L(inc_counter) rRIP, X0; + vpbroadcastd L(unsigned_cmp) rRIP, X2; + vpbroadcastd (12 * 4)(INPUT), X12; + vpbroadcastd (13 * 4)(INPUT), X13; + vpaddd X0, X12, X12; + vpxor X2, X0, X0; + vpxor X2, X12, X1; + vpcmpgtd X1, X0, X0; + vpsubd X0, X13, X13; + vmovdqa X12, (STACK_VEC_X12)(%rsp); + vmovdqa X13, (STACK_VEC_X13)(%rsp); + + /* Load vectors */ + vpbroadcastd (0 * 4)(INPUT), X0; + vpbroadcastd (1 * 4)(INPUT), X1; + vpbroadcastd (2 * 4)(INPUT), X2; + vpbroadcastd (3 * 4)(INPUT), X3; + vpbroadcastd (4 * 4)(INPUT), X4; + vpbroadcastd (5 * 4)(INPUT), X5; + vpbroadcastd (6 * 4)(INPUT), X6; + vpbroadcastd (7 * 4)(INPUT), X7; + vpbroadcastd (8 * 4)(INPUT), X8; + vpbroadcastd (9 * 4)(INPUT), X9; + vpbroadcastd (10 * 4)(INPUT), X10; + vpbroadcastd (11 * 4)(INPUT), X11; + vpbroadcastd (14 * 4)(INPUT), X14; + vpbroadcastd (15 * 4)(INPUT), X15; + vmovdqa X15, (STACK_TMP)(%rsp); + +L(round2): + QUARTERROUND2(X0, X4, X8, X12, X1, X5, X9, X13, tmp:=,X15,,,,) + vmovdqa (STACK_TMP)(%rsp), X15; + vmovdqa X8, (STACK_TMP)(%rsp); + QUARTERROUND2(X2, X6, X10, X14, X3, X7, X11, X15, tmp:=,X8,,,,) + QUARTERROUND2(X0, X5, X10, X15, X1, X6, X11, X12, tmp:=,X8,,,,) + vmovdqa (STACK_TMP)(%rsp), X8; + vmovdqa X15, (STACK_TMP)(%rsp); + QUARTERROUND2(X2, X7, X8, X13, X3, X4, X9, X14, tmp:=,X15,,,,) + sub $2, ROUND; + jnz L(round2); + + vmovdqa X8, (STACK_TMP1)(%rsp); + + /* tmp := X15 */ + vpbroadcastd (0 * 4)(INPUT), X15; + PLUS(X0, X15); + vpbroadcastd (1 * 4)(INPUT), X15; + PLUS(X1, X15); + vpbroadcastd (2 * 4)(INPUT), X15; + PLUS(X2, X15); + vpbroadcastd (3 
* 4)(INPUT), X15; + PLUS(X3, X15); + vpbroadcastd (4 * 4)(INPUT), X15; + PLUS(X4, X15); + vpbroadcastd (5 * 4)(INPUT), X15; + PLUS(X5, X15); + vpbroadcastd (6 * 4)(INPUT), X15; + PLUS(X6, X15); + vpbroadcastd (7 * 4)(INPUT), X15; + PLUS(X7, X15); + transpose_4x4(X0, X1, X2, X3, X8, X15); + transpose_4x4(X4, X5, X6, X7, X8, X15); + vmovdqa (STACK_TMP1)(%rsp), X8; + transpose_16byte_2x2(X0, X4, X15); + transpose_16byte_2x2(X1, X5, X15); + transpose_16byte_2x2(X2, X6, X15); + transpose_16byte_2x2(X3, X7, X15); + vmovdqa (STACK_TMP)(%rsp), X15; + vmovdqu X0, (64 * 0 + 16 * 0)(DST) + vmovdqu X1, (64 * 1 + 16 * 0)(DST) + vpbroadcastd (8 * 4)(INPUT), X0; + PLUS(X8, X0); + vpbroadcastd (9 * 4)(INPUT), X0; + PLUS(X9, X0); + vpbroadcastd (10 * 4)(INPUT), X0; + PLUS(X10, X0); + vpbroadcastd (11 * 4)(INPUT), X0; + PLUS(X11, X0); + vmovdqa (STACK_VEC_X12)(%rsp), X0; + PLUS(X12, X0); + vmovdqa (STACK_VEC_X13)(%rsp), X0; + PLUS(X13, X0); + vpbroadcastd (14 * 4)(INPUT), X0; + PLUS(X14, X0); + vpbroadcastd (15 * 4)(INPUT), X0; + PLUS(X15, X0); + vmovdqu X2, (64 * 2 + 16 * 0)(DST) + vmovdqu X3, (64 * 3 + 16 * 0)(DST) + + /* Update counter */ + addq $8, (12 * 4)(INPUT); + + transpose_4x4(X8, X9, X10, X11, X0, X1); + transpose_4x4(X12, X13, X14, X15, X0, X1); + vmovdqu X4, (64 * 4 + 16 * 0)(DST) + vmovdqu X5, (64 * 5 + 16 * 0)(DST) + transpose_16byte_2x2(X8, X12, X0); + transpose_16byte_2x2(X9, X13, X0); + transpose_16byte_2x2(X10, X14, X0); + transpose_16byte_2x2(X11, X15, X0); + vmovdqu X6, (64 * 6 + 16 * 0)(DST) + vmovdqu X7, (64 * 7 + 16 * 0)(DST) + vmovdqu X8, (64 * 0 + 16 * 2)(DST) + vmovdqu X9, (64 * 1 + 16 * 2)(DST) + vmovdqu X10, (64 * 2 + 16 * 2)(DST) + vmovdqu X11, (64 * 3 + 16 * 2)(DST) + vmovdqu X12, (64 * 4 + 16 * 2)(DST) + vmovdqu X13, (64 * 5 + 16 * 2)(DST) + vmovdqu X14, (64 * 6 + 16 * 2)(DST) + vmovdqu X15, (64 * 7 + 16 * 2)(DST) + + sub $8, NBLKS; + lea (8 * 64)(DST), DST; + lea (8 * 64)(SRC), SRC; + jnz L(loop8); + + vzeroupper; + + /* eax zeroed by round loop. 
*/ + leave; + cfi_adjust_cfa_offset(-8) + cfi_def_cfa_register(%rsp); + ret; + int3; +END(__chacha20_avx2_blocks8) diff --git a/sysdeps/x86_64/chacha20-amd64-sse2.S b/sysdeps/x86_64/chacha20-amd64-sse2.S index 7b30f61446..8910363e82 100644 --- a/sysdeps/x86_64/chacha20-amd64-sse2.S +++ b/sysdeps/x86_64/chacha20-amd64-sse2.S @@ -44,6 +44,9 @@ Public domain. */ #include +#include + +#if MINIMUM_X86_ISA_LEVEL <= 2 #ifdef PIC # define rRIP (%rip) @@ -304,3 +307,5 @@ L(round2_4): cfi_def_cfa_register(%rsp); ret_spec_stop; END (__chacha20_sse2_blocks4) + +#endif /* if MINIMUM_X86_ISA_LEVEL <= 2 */ diff --git a/sysdeps/x86_64/chacha20_arch.h b/sysdeps/x86_64/chacha20_arch.h index 5738c840a9..942aa3d5f6 100644 --- a/sysdeps/x86_64/chacha20_arch.h +++ b/sysdeps/x86_64/chacha20_arch.h @@ -16,6 +16,7 @@ License along with the GNU C Library; if not, see . */ +#include #include #include #include @@ -23,16 +24,32 @@ unsigned int __chacha20_sse2_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, size_t nblks) attribute_hidden; +unsigned int __chacha20_avx2_blocks8 (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t nblks) + attribute_hidden; static inline void chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, size_t bytes) { - _Static_assert (CHACHA20_BUFSIZE % 4 == 0, - "CHACHA20_BUFSIZE not multiple of 4"); - _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, - "CHACHA20_BUFSIZE <= CHACHA20_BLOCK_SIZE * 4"); + _Static_assert (CHACHA20_BUFSIZE % 4 == 0 && CHACHA20_BUFSIZE % 8 == 0, + "CHACHA20_BUFSIZE not multiple of 4 or 8"); + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); - __chacha20_sse2_blocks4 (state, dst, src, +#if MINIMUM_X86_ISA_LEVEL > 2 + __chacha20_avx2_blocks8 (state, dst, src, CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); +#else + const struct cpu_features* cpu_features = __get_cpu_features (); + + /* AVX2 version uses vzeroupper, so disable it if RTM is enabled. 
*/ + if (X86_ISA_CPU_FEATURE_USABLE_P (cpu_features, AVX2) + && X86_ISA_CPU_FEATURES_ARCH_P (cpu_features, Prefer_No_VZEROUPPER, !)) + __chacha20_avx2_blocks8 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); + else + __chacha20_sse2_blocks4 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); +#endif } From patchwork Thu Jul 14 11:28:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590375 Delivered-To: patch@linaro.org Received: by 2002:a05:7000:5817:0:0:0:0 with SMTP id j23csp1477477max; Thu, 14 Jul 2022 04:30:27 -0700 (PDT) X-Google-Smtp-Source: AGRyM1u0HdlIYWtt3+RXD6DVdHDuUyFAvD4I0zL6xSyp9NfcM/0KdTEoTv2Pz0yV5ftTrwZahH00 X-Received: by 2002:a05:6402:2742:b0:43a:bd75:5e82 with SMTP id z2-20020a056402274200b0043abd755e82mr11466710edd.274.1657798227641; Thu, 14 Jul 2022 04:30:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657798227; cv=none; d=google.com; s=arc-20160816; b=GS/80Zd4GS3H1n35unTZv8Kj2lZJ6pPAQ1gpWoE3Rz9yb/4zgdbyl2yrEs1xL4G3tz 4udNKfBHNIrRYrDsrPNoptq0URvXODxZgER2YMdRub/PjP1hJbLxlF0dNmObmyQH/Ix0 SnIit2YTVD7klOO7DYMm86dpqsvCpVl1PyNAXRrVq2ISsciCu+kktYRZKvjYFfNpgWW7 ms5HtF0TpiPWa9nrtZptRhcxKcyzk/inuvWm8vKhgnaPK6WhRT0p7X0JtPzryZJ+VuOS 1Hv4s1xaALvrQdIkoomJbIK6iNMIJecmzUfW5ey4MRMhQN7oLu1P9JFJRzhlDmeXj9PG oLVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:dmarc-filter:delivered-to:dkim-signature :dkim-filter; bh=+2/v//5NN9x/djyk0DxZ+5vUMP+/v/GzkGC7rR+dhZo=; b=0v05tfBuVx539r+ki6lNooc2nvkVVhkbic7H2mQc6+ojzNOcmZZ6loy0tmtWrVvwl7 ft6F4Np069bmkbXdB0KYhJcFwYKUX/IzrSKFJP5mY6SoHyxDQ/d+6iu22Wg9BsICFAOu rEv+LV+oOGbz1rJY339GvbFdHmv0aJ96DkU9TXv+Ez2FFIV/S8OTeuMfh7OErPJrj4fs 
ovfH4Y3CLJg7Xo0W+T2OObEwYvRaaV5I0Lq/ogV5R0vdHCWFsxb6dl4c7FZIYWtNaTHR /z2cQN97zKsuTqDzMtmj6SCwYe5xzM0C/7zeAfhtoXie+xXqoo3vKqCsrzLHaf/JWySJ MCOw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=Qg6Fwxb8; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Return-Path: Received: from sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id wj17-20020a170907051100b00726b8363630si1326103ejb.923.2022.07.14.04.30.27 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:30:27 -0700 (PDT) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=Qg6Fwxb8; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 7A3583818FCC for ; Thu, 14 Jul 2022 11:30:26 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 7A3583818FCC DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1657798226; bh=+2/v//5NN9x/djyk0DxZ+5vUMP+/v/GzkGC7rR+dhZo=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=Qg6Fwxb8eqxYzem0GuiqxYFmyBoMlNpNYMgvallV3NED6/x8Z5BAUMavt56qyfr0C qU6+lAEmVgyTYy2hKC6HyxGdHbyn8SuJm1MnvL1q/t8F6sLUg6KIJYnyR3Xsl8cDtQ 
0KVZeH/QtQT1ir9g+XJnyQQ0BGpR47HZgwsoC3bQ= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oi1-x236.google.com (mail-oi1-x236.google.com [IPv6:2607:f8b0:4864:20::236]) by sourceware.org (Postfix) with ESMTPS id 0E960388B69C for ; Thu, 14 Jul 2022 11:29:06 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 0E960388B69C Received: by mail-oi1-x236.google.com with SMTP id r191so1976622oie.7 for ; Thu, 14 Jul 2022 04:29:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=+2/v//5NN9x/djyk0DxZ+5vUMP+/v/GzkGC7rR+dhZo=; b=DU/1E4BD/Qn54tmiMjuhloCrBzC1l72dofSLQHOHKJdvPtI/4OR0/0noPEuN9Y483v EWtX8/6azko1gCI9X2j/kBK5rHtu1QjwcCuUdiVs6Np4/ml08bK+MBkWIRxEQVLV+6FH 8MRXAqtqb1QynxyLKjZPbKNcOSQDjvhndQXbTrPZSNAMDmWw2DuiZ7NI+Xz7eMcpm2Vd Rqp5tMwgOsOUEpidiwNyeYM779EAYCXHYfBGTocbh2I466sDJRUU3vfqy7qV0GGmrFxj 1qAQ5mX5l1TJ6sx7ewTy5ntklBKKXl8ptYbUr/js7nphHGAHjRFPQl35Tfa7l1gKld7w +Nyg== X-Gm-Message-State: AJIora/i19052l/1ZQgvQhSPzXhz7uzCsDG4SvEUAlQPoyBpRdVQ518i xqIl604+qvhIkBsU076tgwVUFbNAH8vUoQ== X-Received: by 2002:a05:6808:1ab4:b0:33a:1081:2498 with SMTP id bm52-20020a0568081ab400b0033a10812498mr6708064oib.103.1657798145169; Thu, 14 Jul 2022 04:29:05 -0700 (PDT) Received: from mandiga.. 
([2804:431:c7ca:19c3:3696:7000:2f6a:a6f4]) by smtp.gmail.com with ESMTPSA id k25-20020a056830243900b0061c4761c8cbsm562266ots.24.2022.07.14.04.29.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:29:04 -0700 (PDT) To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v10 7/9] powerpc64: Add optimized chacha20 Date: Thu, 14 Jul 2022 08:28:43 -0300 Message-Id: <20220714112845.704678-8-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org> References: <20220714112845.704678-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org Sender: "Libc-alpha" From: Adhemerval Zanella Netto

It adds a vectorized ChaCha20 implementation based on libgcrypt cipher/chacha20-ppc.c. It targets POWER8 and is used by default for LE.
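To make the "4-way" layout of __chacha20_power8_blocks4 easier to follow, here is a minimal portable C sketch. It is not the glibc code. It assumes the state layout implied by the patch, in which words 12 and 13 form a 64-bit block counter (see ADD_U64 (state3, counter_4) and the x86 `addq $8, (12 * 4)(INPUT)`). The helper names (quarter_round, chacha20_block, qr4, chacha20_blocks4_sliced) are hypothetical. Each of the 16 "vectors" holds the same state word for four consecutive blocks, so one quarter round across the four lanes advances four blocks at once. The per-lane carry into word 13 mirrors `v13 -= vec_cmplt (v12, counters_0123)`. Serialization to little-endian bytes (vec_store_le) is omitted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(v, c) (((v) << (c)) | ((v) >> (32 - (c))))

/* One ChaCha quarter round on four 32-bit words (RFC 8439, section 2.1).  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = ROTL32 (*d, 16);
  *c += *d; *b ^= *c; *b = ROTL32 (*b, 12);
  *a += *b; *d ^= *a; *d = ROTL32 (*d, 8);
  *c += *d; *b ^= *c; *b = ROTL32 (*b, 7);
}

/* Scalar reference: one block as 16 words (20 rounds plus feed-forward).  */
static void
chacha20_block (const uint32_t in[16], uint32_t out[16])
{
  uint32_t x[16];
  memcpy (x, in, sizeof x);
  for (int i = 0; i < 20; i += 2)
    {
      quarter_round (&x[0], &x[4], &x[8], &x[12]);   /* columns */
      quarter_round (&x[1], &x[5], &x[9], &x[13]);
      quarter_round (&x[2], &x[6], &x[10], &x[14]);
      quarter_round (&x[3], &x[7], &x[11], &x[15]);
      quarter_round (&x[0], &x[5], &x[10], &x[15]);  /* diagonals */
      quarter_round (&x[1], &x[6], &x[11], &x[12]);
      quarter_round (&x[2], &x[7], &x[8], &x[13]);
      quarter_round (&x[3], &x[4], &x[9], &x[14]);
    }
  for (int k = 0; k < 16; k++)
    out[k] = x[k] + in[k];
}

/* Quarter round over all four lanes; stands in for one vector operation.  */
static void
qr4 (uint32_t v[16][4], int a, int b, int c, int d)
{
  for (int l = 0; l < 4; l++)
    quarter_round (&v[a][l], &v[b][l], &v[c][l], &v[d][l]);
}

/* Four consecutive blocks, lane-sliced: v[k][l] is word k of block l.
   Advances the 64-bit counter in state[12..13] by 4 blocks.  */
static void
chacha20_blocks4_sliced (uint32_t state[16], uint32_t out[4][16])
{
  uint32_t v[16][4], init[16][4];
  for (int k = 0; k < 16; k++)
    for (int l = 0; l < 4; l++)
      v[k][l] = state[k];                          /* like vec_splat */
  for (int l = 0; l < 4; l++)
    {
      v[12][l] = state[12] + (uint32_t) l;         /* v12 += {0,1,2,3} */
      /* Carry into word 13 when the low word wrapped.  The vector code
         subtracts an all-ones vec_cmplt mask; here the compare yields 1.  */
      v[13][l] = state[13] + (v[12][l] < (uint32_t) l);
    }
  memcpy (init, v, sizeof v);
  for (int i = 0; i < 20; i += 2)
    {
      qr4 (v, 0, 4, 8, 12); qr4 (v, 1, 5, 9, 13);
      qr4 (v, 2, 6, 10, 14); qr4 (v, 3, 7, 11, 15);
      qr4 (v, 0, 5, 10, 15); qr4 (v, 1, 6, 11, 12);
      qr4 (v, 2, 7, 8, 13); qr4 (v, 3, 4, 9, 14);
    }
  /* Feed-forward, then transpose back to one block per row.  */
  for (int k = 0; k < 16; k++)
    for (int l = 0; l < 4; l++)
      out[l][k] = v[k][l] + init[k][l];
  uint64_t ctr = (((uint64_t) state[13] << 32) | state[12]) + 4;
  state[12] = (uint32_t) ctr;
  state[13] = (uint32_t) (ctr >> 32);
}
```

Starting the counter at 0xfffffffe exercises the carry path. Lanes 2 and 3 wrap the low word and must bump word 13, which is exactly the case the vec_cmplt subtraction handles.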
On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):

POWER8 GENERIC                             MB/s
-----------------------------------------------
arc4random [single-thread]               138.77
arc4random_buf(16) [single-thread]       174.36
arc4random_buf(32) [single-thread]       228.11
arc4random_buf(48) [single-thread]       252.31
arc4random_buf(64) [single-thread]       270.11
arc4random_buf(80) [single-thread]       278.97
arc4random_buf(96) [single-thread]       287.78
arc4random_buf(112) [single-thread]      291.92
arc4random_buf(128) [single-thread]      295.25

POWER8                                     MB/s
-----------------------------------------------
arc4random [single-thread]               198.06
arc4random_buf(16) [single-thread]       278.79
arc4random_buf(32) [single-thread]       448.89
arc4random_buf(48) [single-thread]       551.09
arc4random_buf(64) [single-thread]       646.12
arc4random_buf(80) [single-thread]       698.04
arc4random_buf(96) [single-thread]       756.06
arc4random_buf(112) [single-thread]      784.12
arc4random_buf(128) [single-thread]      808.04
-----------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.

Reviewed-by: Paul E. Murphy
---
 LICENSES                                     |   3 +-
 .../powerpc/powerpc64/be/multiarch/Makefile  |   4 +
 .../powerpc64/be/multiarch/chacha20-ppc.c    |   1 +
 .../powerpc64/be/multiarch/chacha20_arch.h   |  42 +++
 sysdeps/powerpc/powerpc64/power8/Makefile    |   5 +
 .../powerpc/powerpc64/power8/chacha20-ppc.c  | 256 ++++++++++++++++++
 .../powerpc/powerpc64/power8/chacha20_arch.h |  37 +++
 7 files changed, 347 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/Makefile
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c
 create mode 100644 sysdeps/powerpc/powerpc64/power8/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index 1617648813..bc18026411 100644
--- a/LICENSES
+++ b/LICENSES
@@ -391,7 +391,8 @@ Copyright 2001 by Stephen L. Moshier .
*/ sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, -and sysdeps/x86_64/chacha20-amd64-avx2.S imports code from libgcrypt, +sysdeps/x86_64/chacha20-amd64-avx2.S, and +sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c imports code from libgcrypt, with the following notices: Copyright (C) 2017-2019 Jussi Kivilinna diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/Makefile b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile new file mode 100644 index 0000000000..8c75165f7f --- /dev/null +++ b/sysdeps/powerpc/powerpc64/be/multiarch/Makefile @@ -0,0 +1,4 @@ +ifeq ($(subdir),stdlib) +sysdep_routines += chacha20-ppc +CFLAGS-chacha20-ppc.c += -mcpu=power8 +endif diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c new file mode 100644 index 0000000000..cf9e735326 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20-ppc.c @@ -0,0 +1 @@ +#include diff --git a/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h new file mode 100644 index 0000000000..6d2762d82b --- /dev/null +++ b/sysdeps/powerpc/powerpc64/be/multiarch/chacha20_arch.h @@ -0,0 +1,42 @@ +/* PowerPC optimization for ChaCha20. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include + +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t nblks) + attribute_hidden; + +static void +chacha20_crypt (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t bytes) +{ + _Static_assert (CHACHA20_BUFSIZE % 4 == 0, + "CHACHA20_BUFSIZE not multiple of 4"); + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); + + unsigned long int hwcap = GLRO(dl_hwcap); + unsigned long int hwcap2 = GLRO(dl_hwcap2); + if (hwcap2 & PPC_FEATURE2_ARCH_2_07 && hwcap & PPC_FEATURE_HAS_ALTIVEC) + __chacha20_power8_blocks4 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); + else + chacha20_crypt_generic (state, dst, src, bytes); +} diff --git a/sysdeps/powerpc/powerpc64/power8/Makefile b/sysdeps/powerpc/powerpc64/power8/Makefile index 71a59529f3..abb0aa3f11 100644 --- a/sysdeps/powerpc/powerpc64/power8/Makefile +++ b/sysdeps/powerpc/powerpc64/power8/Makefile @@ -1,3 +1,8 @@ ifeq ($(subdir),string) sysdep_routines += strcasestr-ppc64 endif + +ifeq ($(subdir),stdlib) +sysdep_routines += chacha20-ppc +CFLAGS-chacha20-ppc.c += -mcpu=power8 +endif diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c new file mode 100644 index 0000000000..4a5c963621 --- /dev/null +++ b/sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c @@ -0,0 +1,256 @@ +/* Optimized PowerPC implementation of ChaCha20 cipher. + Copyright (C) 2022 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. 
+ + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* chacha20-ppc.c - PowerPC vector implementation of ChaCha20 + Copyright (C) 2019 Jussi Kivilinna + + This file is part of Libgcrypt. + + Libgcrypt is free software; you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of + the License, or (at your option) any later version. + + Libgcrypt is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with this program; if not, see . 
+ */ + +#include +#include +#include +#include +#include + +typedef vector unsigned char vector16x_u8; +typedef vector unsigned int vector4x_u32; +typedef vector unsigned long long vector2x_u64; + +#if __BYTE_ORDER == __BIG_ENDIAN +static const vector16x_u8 le_bswap_const = + { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; +#endif + +static inline vector4x_u32 +vec_rol_elems (vector4x_u32 v, unsigned int idx) +{ +#if __BYTE_ORDER != __BIG_ENDIAN + return vec_sld (v, v, (16 - (4 * idx)) & 15); +#else + return vec_sld (v, v, (4 * idx) & 15); +#endif +} + +static inline vector4x_u32 +vec_load_le (unsigned long offset, const unsigned char *ptr) +{ + vector4x_u32 vec; + vec = vec_vsx_ld (offset, (const uint32_t *)ptr); +#if __BYTE_ORDER == __BIG_ENDIAN + vec = (vector4x_u32) vec_perm ((vector16x_u8)vec, (vector16x_u8)vec, + le_bswap_const); +#endif + return vec; +} + +static inline void +vec_store_le (vector4x_u32 vec, unsigned long offset, unsigned char *ptr) +{ +#if __BYTE_ORDER == __BIG_ENDIAN + vec = (vector4x_u32)vec_perm((vector16x_u8)vec, (vector16x_u8)vec, + le_bswap_const); +#endif + vec_vsx_st (vec, offset, (uint32_t *)ptr); +} + + +static inline vector4x_u32 +vec_add_ctr_u64 (vector4x_u32 v, vector4x_u32 a) +{ +#if __BYTE_ORDER == __BIG_ENDIAN + static const vector16x_u8 swap32 = + { 4, 5, 6, 7, 0, 1, 2, 3, 12, 13, 14, 15, 8, 9, 10, 11 }; + vector2x_u64 vec, add, sum; + + vec = (vector2x_u64)vec_perm ((vector16x_u8)v, (vector16x_u8)v, swap32); + add = (vector2x_u64)vec_perm ((vector16x_u8)a, (vector16x_u8)a, swap32); + sum = vec + add; + return (vector4x_u32)vec_perm ((vector16x_u8)sum, (vector16x_u8)sum, swap32); +#else + return (vector4x_u32)((vector2x_u64)(v) + (vector2x_u64)(a)); +#endif +} + +/********************************************************************** + 4-way chacha20 + **********************************************************************/ + +#define ROTATE(v1,rolv) \ + __asm__ ("vrlw %0,%1,%2\n\t" : "=v" (v1) : "v" (v1), "v" 
(rolv)) + +#define PLUS(ds,s) \ + ((ds) += (s)) + +#define XOR(ds,s) \ + ((ds) ^= (s)) + +#define ADD_U64(v,a) \ + (v = vec_add_ctr_u64(v, a)) + +/* 4x4 32-bit integer matrix transpose */ +#define transpose_4x4(x0, x1, x2, x3) ({ \ + vector4x_u32 t1 = vec_mergeh(x0, x2); \ + vector4x_u32 t2 = vec_mergel(x0, x2); \ + vector4x_u32 t3 = vec_mergeh(x1, x3); \ + x3 = vec_mergel(x1, x3); \ + x0 = vec_mergeh(t1, t3); \ + x1 = vec_mergel(t1, t3); \ + x2 = vec_mergeh(t2, x3); \ + x3 = vec_mergel(t2, x3); \ + }) + +#define QUARTERROUND2(a1,b1,c1,d1,a2,b2,c2,d2) \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE(d1, rotate_16); ROTATE(d2, rotate_16); \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE(b1, rotate_12); ROTATE(b2, rotate_12); \ + PLUS(a1,b1); PLUS(a2,b2); XOR(d1,a1); XOR(d2,a2); \ + ROTATE(d1, rotate_8); ROTATE(d2, rotate_8); \ + PLUS(c1,d1); PLUS(c2,d2); XOR(b1,c1); XOR(b2,c2); \ + ROTATE(b1, rotate_7); ROTATE(b2, rotate_7); + +unsigned int attribute_hidden +__chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, const uint8_t *src, + size_t nblks) +{ + vector4x_u32 counters_0123 = { 0, 1, 2, 3 }; + vector4x_u32 counter_4 = { 4, 0, 0, 0 }; + vector4x_u32 rotate_16 = { 16, 16, 16, 16 }; + vector4x_u32 rotate_12 = { 12, 12, 12, 12 }; + vector4x_u32 rotate_8 = { 8, 8, 8, 8 }; + vector4x_u32 rotate_7 = { 7, 7, 7, 7 }; + vector4x_u32 state0, state1, state2, state3; + vector4x_u32 v0, v1, v2, v3, v4, v5, v6, v7; + vector4x_u32 v8, v9, v10, v11, v12, v13, v14, v15; + vector4x_u32 tmp; + int i; + + /* Force preload of constants to vector registers. 
*/ + __asm__ ("": "+v" (counters_0123) :: "memory"); + __asm__ ("": "+v" (counter_4) :: "memory"); + __asm__ ("": "+v" (rotate_16) :: "memory"); + __asm__ ("": "+v" (rotate_12) :: "memory"); + __asm__ ("": "+v" (rotate_8) :: "memory"); + __asm__ ("": "+v" (rotate_7) :: "memory"); + + state0 = vec_vsx_ld (0 * 16, state); + state1 = vec_vsx_ld (1 * 16, state); + state2 = vec_vsx_ld (2 * 16, state); + state3 = vec_vsx_ld (3 * 16, state); + + do + { + v0 = vec_splat (state0, 0); + v1 = vec_splat (state0, 1); + v2 = vec_splat (state0, 2); + v3 = vec_splat (state0, 3); + v4 = vec_splat (state1, 0); + v5 = vec_splat (state1, 1); + v6 = vec_splat (state1, 2); + v7 = vec_splat (state1, 3); + v8 = vec_splat (state2, 0); + v9 = vec_splat (state2, 1); + v10 = vec_splat (state2, 2); + v11 = vec_splat (state2, 3); + v12 = vec_splat (state3, 0); + v13 = vec_splat (state3, 1); + v14 = vec_splat (state3, 2); + v15 = vec_splat (state3, 3); + + v12 += counters_0123; + v13 -= vec_cmplt (v12, counters_0123); + + for (i = 20; i > 0; i -= 2) + { + QUARTERROUND2 (v0, v4, v8, v12, v1, v5, v9, v13) + QUARTERROUND2 (v2, v6, v10, v14, v3, v7, v11, v15) + QUARTERROUND2 (v0, v5, v10, v15, v1, v6, v11, v12) + QUARTERROUND2 (v2, v7, v8, v13, v3, v4, v9, v14) + } + + v0 += vec_splat (state0, 0); + v1 += vec_splat (state0, 1); + v2 += vec_splat (state0, 2); + v3 += vec_splat (state0, 3); + v4 += vec_splat (state1, 0); + v5 += vec_splat (state1, 1); + v6 += vec_splat (state1, 2); + v7 += vec_splat (state1, 3); + v8 += vec_splat (state2, 0); + v9 += vec_splat (state2, 1); + v10 += vec_splat (state2, 2); + v11 += vec_splat (state2, 3); + tmp = vec_splat( state3, 0); + tmp += counters_0123; + v12 += tmp; + v13 += vec_splat (state3, 1) - vec_cmplt (tmp, counters_0123); + v14 += vec_splat (state3, 2); + v15 += vec_splat (state3, 3); + ADD_U64 (state3, counter_4); + + transpose_4x4 (v0, v1, v2, v3); + transpose_4x4 (v4, v5, v6, v7); + transpose_4x4 (v8, v9, v10, v11); + transpose_4x4 (v12, v13, v14, v15); 
+ + vec_store_le (v0, (64 * 0 + 16 * 0), dst); + vec_store_le (v1, (64 * 1 + 16 * 0), dst); + vec_store_le (v2, (64 * 2 + 16 * 0), dst); + vec_store_le (v3, (64 * 3 + 16 * 0), dst); + + vec_store_le (v4, (64 * 0 + 16 * 1), dst); + vec_store_le (v5, (64 * 1 + 16 * 1), dst); + vec_store_le (v6, (64 * 2 + 16 * 1), dst); + vec_store_le (v7, (64 * 3 + 16 * 1), dst); + + vec_store_le (v8, (64 * 0 + 16 * 2), dst); + vec_store_le (v9, (64 * 1 + 16 * 2), dst); + vec_store_le (v10, (64 * 2 + 16 * 2), dst); + vec_store_le (v11, (64 * 3 + 16 * 2), dst); + + vec_store_le (v12, (64 * 0 + 16 * 3), dst); + vec_store_le (v13, (64 * 1 + 16 * 3), dst); + vec_store_le (v14, (64 * 2 + 16 * 3), dst); + vec_store_le (v15, (64 * 3 + 16 * 3), dst); + + src += 4*64; + dst += 4*64; + + nblks -= 4; + } + while (nblks); + + vec_vsx_st (state3, 3 * 16, state); + + return 0; +} diff --git a/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h new file mode 100644 index 0000000000..270c71130f --- /dev/null +++ b/sysdeps/powerpc/powerpc64/power8/chacha20_arch.h @@ -0,0 +1,37 @@ +/* PowerPC optimization for ChaCha20. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . 
*/ + +#include +#include + +unsigned int __chacha20_power8_blocks4 (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t nblks) + attribute_hidden; + +static void +chacha20_crypt (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t bytes) +{ + _Static_assert (CHACHA20_BUFSIZE % 4 == 0, + "CHACHA20_BUFSIZE not multiple of 4"); + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 4, + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 4"); + + __chacha20_power8_blocks4 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); +} From patchwork Thu Jul 14 11:28:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590378 Delivered-To: patch@linaro.org Received: by 2002:a05:7000:5817:0:0:0:0 with SMTP id j23csp1478384max; Thu, 14 Jul 2022 04:31:27 -0700 (PDT) X-Google-Smtp-Source: AGRyM1uK2BpzdAD8G+ZHbwtG8kFmPS3cwzn1/SXIq8RnMg2A0TcrYWZpLNDniZNWybn19Kxm1d7t X-Received: by 2002:a17:907:869e:b0:72a:a008:82fc with SMTP id qa30-20020a170907869e00b0072aa00882fcmr8589896ejc.549.1657798287132; Thu, 14 Jul 2022 04:31:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657798287; cv=none; d=google.com; s=arc-20160816; b=d7+o66IkPQkJCgvm8ZBIKuE6xHT9sFgAVrtSQp6n8+QxDpSJrDtty+DDUdIwsbMBA4 UtSCqcba/r7eHILkD2Xf1fWZcOdM9WAUx7JUGtqPN9DfkzWApEIVl8d7s2LzglAaL9VD hW1tepbjmHL2e95Up9b8bPNJ+8m+li4AjXfujKPfQe096LhQsSqwQpw97zS8xfnGD8oV ZN2crWodB7uBj2T3vN48vVRyCPEQx/Mk7RGIygyOj5iyoRJmvYycJhU0KIIIWrJIqMB+ 3gw/ERGTsLZvHTMM8q1AAICyMQ0DW3lGMHSykcAmPDDiem1pwmVsJFr5RsuJVJF22DcM SGrA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:dmarc-filter:delivered-to:dkim-signature :dkim-filter; bh=0xcaZoiVb12Hf6QibtNeQipmAZLiGzylLH3KFiEgLSQ=; 
b=CH9iSMofByF2+TCoOjor2D/WYiiGK+NKwkcAoF6BwO7BEMn7gggLd86waNb3brWdih cQbAm+uK54BfrQhD2oRIWR4AOBuKpZZIbxwaatlaV0JlsMNIaZGrZH+9pC+hOS7vwPr9 7P4GlLu8xvspbaGyHqRlehiOuPq6EjBNtw9Ocmy7e27d16IZE5udmlF6dRV6dDD5Ng6x fFNYL49EdXc+A9zTVEMkxwjEx5tUhc7NAMO1G5AYZY28Oc1jcTVQ/o0iy3iTy5Hbju5t kCAMTyh4K8dB3yUJTC6uRkq2ngbfdcbr881nOvSM/L2V/VdQErGqogCpYV36l9XrtLGd g9Nw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=nbdN+4T7; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Return-Path: Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id q13-20020a1709064c8d00b0072ed9ed7276si1130482eju.203.2022.07.14.04.31.26 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:31:27 -0700 (PDT) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=nbdN+4T7; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id B791A3899B4A for ; Thu, 14 Jul 2022 11:31:25 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org B791A3899B4A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1657798285; 
bh=0xcaZoiVb12Hf6QibtNeQipmAZLiGzylLH3KFiEgLSQ=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=nbdN+4T7uAmMCN8ULznxeCxXio+iLdXnohT0XaBVB4nG8fCaUHoyFp+7UUsEpZh6U 0EZNETs32hbkJaZ+scV2k+HkxCvgEFy0W57IEWdyHeQtLQ/qjX4DzWhwNe5xlpImZV ce5vVVlhKUrTU2L5hCyIhgDULbh2sX7sHWey3sLo= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oi1-x22b.google.com (mail-oi1-x22b.google.com [IPv6:2607:f8b0:4864:20::22b]) by sourceware.org (Postfix) with ESMTPS id 99AD3388B6BA for ; Thu, 14 Jul 2022 11:29:07 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 99AD3388B6BA Received: by mail-oi1-x22b.google.com with SMTP id x185so2014112oig.1 for ; Thu, 14 Jul 2022 04:29:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=0xcaZoiVb12Hf6QibtNeQipmAZLiGzylLH3KFiEgLSQ=; b=2vvL3M6gQnlxfN8z7Yf7lwMFf6mCH7qvI4bT/86oRkKhHy8JOm2p6IgxURDUj6hivk CTjIjHli9st22Zi2ATeZdwdnumpTgkjSfA1tURVAszdKvJ486IKk9ugorgB7+youn7Fd iahhRk0vlPIhri4V0FvTyzFjCNRKBRwYi/eVfD1k+uCCC+2vMt0dIL5U+CPMJAS8kLId wU7sOy7G8ya7nQZhXvdktE5S5XPJ2L0fNWn2iKqWvsR4syT2AagCjPye4zi5Yi4yhH3t 2+rtFetjDli2djTs2fZXMn65xGGbUkXOaegWBKPTZPBSp3PSQcbwBWWXL8LEmVJ6bKYN fO3g== X-Gm-Message-State: AJIora8sQLv32DKN6RC3uzlCY1JXfEVQYIOBFyEQpeztp/hcwIdPAAls JXfiLR+3kh8WAF/oXQ4ZidIR5wsnSR/UKg== X-Received: by 2002:a05:6808:1153:b0:337:a486:f1ca with SMTP id u19-20020a056808115300b00337a486f1camr6815670oiu.264.1657798146607; Thu, 14 Jul 2022 04:29:06 -0700 (PDT) Received: from mandiga.. 
([2804:431:c7ca:19c3:3696:7000:2f6a:a6f4]) by smtp.gmail.com with ESMTPSA id k25-20020a056830243900b0061c4761c8cbsm562266ots.24.2022.07.14.04.29.05 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:29:06 -0700 (PDT) To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v10 8/9] s390x: Add optimized chacha20 Date: Thu, 14 Jul 2022 08:28:44 -0300 Message-Id: <20220714112845.704678-9-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org> References: <20220714112845.704678-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-10.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_NUMSUBJECT, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_10_SHORT_WORD_LINES, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org Sender: "Libc-alpha" From: Adhemerval Zanella Netto

This patch adds a vectorized ChaCha20 implementation based on libgcrypt's cipher/chacha20-s390x.S. The final clearing of the state registers is omitted.
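For reference, each QUARTERROUND4_V8 invocation in chacha20-s390x.S applies four ChaCha20 quarter rounds (one column or diagonal round, as defined in RFC 8439, section 2.1) to eight blocks at once, with PLUS mapped to vaf, XOR to vx and ROTATE to verllf. The scalar C sketch below shows the per-lane operation only; the helper names rotl32 and quarter_round are illustrative and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits.  n must be in 1..31 so that the
   right shift by (32 - n) stays defined; ChaCha20 only uses 16, 12, 8
   and 7.  The s390x code performs this with verllf (ROTATE macro).  */
static inline uint32_t
rotl32 (uint32_t x, unsigned int n)
{
  assert (n > 0 && n < 32);
  return (x << n) | (x >> (32 - n));
}

/* One ChaCha20 quarter round (RFC 8439, section 2.1).  The additions
   wrap modulo 2^32, matching the 32-bit lane adds done by vaf.  */
static void
quarter_round (uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
  *a += *b; *d ^= *a; *d = rotl32 (*d, 16);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 12);
  *a += *b; *d ^= *a; *d = rotl32 (*d, 8);
  *c += *d; *b ^= *c; *b = rotl32 (*b, 7);
}
```

Applied to the RFC 8439 section 2.1.1 test vector (a = 0x11111111, b = 0x01020304, c = 0x9b8d6f43, d = 0x01234567), this yields a = 0xea2a92f4, b = 0xcb1cf8ce, c = 0x4581472e, d = 0x5881c4bb.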
On a z15 it shows the following improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               198.92
arc4random_buf(16) [single-thread]       244.49
arc4random_buf(32) [single-thread]       282.73
arc4random_buf(48) [single-thread]       286.64
arc4random_buf(64) [single-thread]       320.06
arc4random_buf(80) [single-thread]       297.43
arc4random_buf(96) [single-thread]       310.96
arc4random_buf(112) [single-thread]      308.10
arc4random_buf(128) [single-thread]      309.90
-----------------------------------------------

VX                                         MB/s
-----------------------------------------------
arc4random [single-thread]               430.26
arc4random_buf(16) [single-thread]       735.14
arc4random_buf(32) [single-thread]      1029.99
arc4random_buf(48) [single-thread]      1206.76
arc4random_buf(64) [single-thread]      1311.92
arc4random_buf(80) [single-thread]      1378.74
arc4random_buf(96) [single-thread]      1445.06
arc4random_buf(112) [single-thread]     1484.32
arc4random_buf(128) [single-thread]     1517.30
-----------------------------------------------

Checked on s390x-linux-gnu.
---
 LICENSES                              |   3 +-
 sysdeps/s390/s390-64/Makefile         |   6 +
 sysdeps/s390/s390-64/chacha20-s390x.S | 573 ++++++++++++++++++++++++++
 sysdeps/s390/s390-64/chacha20_arch.h  |  45 ++
 4 files changed, 626 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/s390/s390-64/chacha20-s390x.S
 create mode 100644 sysdeps/s390/s390-64/chacha20_arch.h

diff --git a/LICENSES b/LICENSES
index bc18026411..710e8abf0a 100644
--- a/LICENSES
+++ b/LICENSES
@@ -392,7 +392,8 @@ Copyright 2001 by Stephen L.
Moshier sysdeps/aarch64/chacha20-aarch64.S, sysdeps/x86_64/chacha20-amd64-sse2.S, sysdeps/x86_64/chacha20-amd64-avx2.S, and -sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c imports code from libgcrypt, +sysdeps/powerpc/powerpc64/power8/chacha20-ppc.c, and +sysdeps/s390/s390-64/chacha20-s390x.S imports code from libgcrypt, with the following notices: Copyright (C) 2017-2019 Jussi Kivilinna diff --git a/sysdeps/s390/s390-64/Makefile b/sysdeps/s390/s390-64/Makefile index 66ed844e68..96c110f490 100644 --- a/sysdeps/s390/s390-64/Makefile +++ b/sysdeps/s390/s390-64/Makefile @@ -67,3 +67,9 @@ tests-container += tst-glibc-hwcaps-cache endif endif # $(subdir) == elf + +ifeq ($(subdir),stdlib) +sysdep_routines += \ + chacha20-s390x \ + # sysdep_routines +endif diff --git a/sysdeps/s390/s390-64/chacha20-s390x.S b/sysdeps/s390/s390-64/chacha20-s390x.S new file mode 100644 index 0000000000..796da845c9 --- /dev/null +++ b/sysdeps/s390/s390-64/chacha20-s390x.S @@ -0,0 +1,573 @@ +/* Optimized s390x implementation of ChaCha20 cipher. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* chacha20-s390x.S - zSeries implementation of ChaCha20 cipher + + Copyright (C) 2020 Jussi Kivilinna + + This file is part of Libgcrypt. 
+ + Libgcrypt is free software; you can redistribute it and/or modify + it under the terms of the GNU Lesser General Public License as + published by the Free Software Foundation; either version 2.1 of + the License, or (at your option) any later version. + + Libgcrypt is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with this program; if not, see . + */ + +#include + +#ifdef HAVE_S390_VX_ASM_SUPPORT + +/* CFA expressions are used for pointing CFA and registers to + * SP relative offsets. */ +# define DW_REGNO_SP 15 + +/* Fixed length encoding used for integers for now. */ +# define DW_SLEB128_7BIT(value) \ + 0x00|((value) & 0x7f) +# define DW_SLEB128_28BIT(value) \ + 0x80|((value)&0x7f), \ + 0x80|(((value)>>7)&0x7f), \ + 0x80|(((value)>>14)&0x7f), \ + 0x00|(((value)>>21)&0x7f) + +# define cfi_cfa_on_stack(rsp_offs,cfa_depth) \ + .cfi_escape \ + 0x0f, /* DW_CFA_def_cfa_expression */ \ + DW_SLEB128_7BIT(11), /* length */ \ + 0x7f, /* DW_OP_breg15, rsp + constant */ \ + DW_SLEB128_28BIT(rsp_offs), \ + 0x06, /* DW_OP_deref */ \ + 0x23, /* DW_OP_plus_constu */ \ + DW_SLEB128_28BIT((cfa_depth)+160) + +.machine "z13+vx" +.text + +.balign 16 +.Lconsts: +.Lwordswap: + .byte 12, 13, 14, 15, 8, 9, 10, 11, 4, 5, 6, 7, 0, 1, 2, 3 +.Lbswap128: + .byte 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0 +.Lbswap32: + .byte 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 +.Lone: + .long 0, 0, 0, 1 +.Ladd_counter_0123: + .long 0, 1, 2, 3 +.Ladd_counter_4567: + .long 4, 5, 6, 7 + +/* register macros */ +#define INPUT %r2 +#define DST %r3 +#define SRC %r4 +#define NBLKS %r0 +#define ROUND %r1 + +/* stack structure */ + +#define STACK_FRAME_STD (8 * 16 + 8 * 4) +#define STACK_FRAME_F8_F15 (8 * 8) +#define 
STACK_FRAME_Y0_Y15 (16 * 16) +#define STACK_FRAME_CTR (4 * 16) +#define STACK_FRAME_PARAMS (6 * 8) + +#define STACK_MAX (STACK_FRAME_STD + STACK_FRAME_F8_F15 + \ + STACK_FRAME_Y0_Y15 + STACK_FRAME_CTR + \ + STACK_FRAME_PARAMS) + +#define STACK_F8 (STACK_MAX - STACK_FRAME_F8_F15) +#define STACK_F9 (STACK_F8 + 8) +#define STACK_F10 (STACK_F9 + 8) +#define STACK_F11 (STACK_F10 + 8) +#define STACK_F12 (STACK_F11 + 8) +#define STACK_F13 (STACK_F12 + 8) +#define STACK_F14 (STACK_F13 + 8) +#define STACK_F15 (STACK_F14 + 8) +#define STACK_Y0_Y15 (STACK_F8 - STACK_FRAME_Y0_Y15) +#define STACK_CTR (STACK_Y0_Y15 - STACK_FRAME_CTR) +#define STACK_INPUT (STACK_CTR - STACK_FRAME_PARAMS) +#define STACK_DST (STACK_INPUT + 8) +#define STACK_SRC (STACK_DST + 8) +#define STACK_NBLKS (STACK_SRC + 8) +#define STACK_POCTX (STACK_NBLKS + 8) +#define STACK_POSRC (STACK_POCTX + 8) + +#define STACK_G0_H3 STACK_Y0_Y15 + +/* vector registers */ +#define A0 %v0 +#define A1 %v1 +#define A2 %v2 +#define A3 %v3 + +#define B0 %v4 +#define B1 %v5 +#define B2 %v6 +#define B3 %v7 + +#define C0 %v8 +#define C1 %v9 +#define C2 %v10 +#define C3 %v11 + +#define D0 %v12 +#define D1 %v13 +#define D2 %v14 +#define D3 %v15 + +#define E0 %v16 +#define E1 %v17 +#define E2 %v18 +#define E3 %v19 + +#define F0 %v20 +#define F1 %v21 +#define F2 %v22 +#define F3 %v23 + +#define G0 %v24 +#define G1 %v25 +#define G2 %v26 +#define G3 %v27 + +#define H0 %v28 +#define H1 %v29 +#define H2 %v30 +#define H3 %v31 + +#define IO0 E0 +#define IO1 E1 +#define IO2 E2 +#define IO3 E3 +#define IO4 F0 +#define IO5 F1 +#define IO6 F2 +#define IO7 F3 + +#define S0 G0 +#define S1 G1 +#define S2 G2 +#define S3 G3 + +#define TMP0 H0 +#define TMP1 H1 +#define TMP2 H2 +#define TMP3 H3 + +#define X0 A0 +#define X1 A1 +#define X2 A2 +#define X3 A3 +#define X4 B0 +#define X5 B1 +#define X6 B2 +#define X7 B3 +#define X8 C0 +#define X9 C1 +#define X10 C2 +#define X11 C3 +#define X12 D0 +#define X13 D1 +#define X14 D2 +#define X15 D3 + +#define 
Y0 E0 +#define Y1 E1 +#define Y2 E2 +#define Y3 E3 +#define Y4 F0 +#define Y5 F1 +#define Y6 F2 +#define Y7 F3 +#define Y8 G0 +#define Y9 G1 +#define Y10 G2 +#define Y11 G3 +#define Y12 H0 +#define Y13 H1 +#define Y14 H2 +#define Y15 H3 + +/********************************************************************** + helper macros + **********************************************************************/ + +#define _ /*_*/ + +#define START_STACK(last_r) \ + lgr %r0, %r15; \ + lghi %r1, ~15; \ + stmg %r6, last_r, 6 * 8(%r15); \ + aghi %r0, -STACK_MAX; \ + ngr %r0, %r1; \ + lgr %r1, %r15; \ + cfi_def_cfa_register(1); \ + lgr %r15, %r0; \ + stg %r1, 0(%r15); \ + cfi_cfa_on_stack(0, 0); \ + std %f8, STACK_F8(%r15); \ + std %f9, STACK_F9(%r15); \ + std %f10, STACK_F10(%r15); \ + std %f11, STACK_F11(%r15); \ + std %f12, STACK_F12(%r15); \ + std %f13, STACK_F13(%r15); \ + std %f14, STACK_F14(%r15); \ + std %f15, STACK_F15(%r15); + +#define END_STACK(last_r) \ + lg %r1, 0(%r15); \ + ld %f8, STACK_F8(%r15); \ + ld %f9, STACK_F9(%r15); \ + ld %f10, STACK_F10(%r15); \ + ld %f11, STACK_F11(%r15); \ + ld %f12, STACK_F12(%r15); \ + ld %f13, STACK_F13(%r15); \ + ld %f14, STACK_F14(%r15); \ + ld %f15, STACK_F15(%r15); \ + lmg %r6, last_r, 6 * 8(%r1); \ + lgr %r15, %r1; \ + cfi_def_cfa_register(DW_REGNO_SP); + +#define PLUS(dst,src) \ + vaf dst, dst, src; + +#define XOR(dst,src) \ + vx dst, dst, src; + +#define ROTATE(v1,c) \ + verllf v1, v1, (c)(0); + +#define WORD_ROTATE(v1,s) \ + vsldb v1, v1, v1, ((s) * 4); + +#define DST_8(OPER, I, J) \ + OPER(A##I, J); OPER(B##I, J); OPER(C##I, J); OPER(D##I, J); \ + OPER(E##I, J); OPER(F##I, J); OPER(G##I, J); OPER(H##I, J); + +/********************************************************************** + round macros + **********************************************************************/ + +/********************************************************************** + 8-way chacha20 ("vertical") + 
**********************************************************************/ + +#define QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ + x8,x9,x10,x11,x12,x13,x14,x15,\ + y0,y1,y2,y3,y4,y5,y6,y7,\ + y8,y9,y10,y11,y12,y13,y14,y15,\ + op1,op2,op3,op4,op5,op6,op7,op8,\ + op9,op10,op11,op12) \ + op1; \ + PLUS(x0, x1); PLUS(x4, x5); \ + PLUS(x8, x9); PLUS(x12, x13); \ + PLUS(y0, y1); PLUS(y4, y5); \ + PLUS(y8, y9); PLUS(y12, y13); \ + op2; \ + XOR(x3, x0); XOR(x7, x4); \ + XOR(x11, x8); XOR(x15, x12); \ + XOR(y3, y0); XOR(y7, y4); \ + XOR(y11, y8); XOR(y15, y12); \ + op3; \ + ROTATE(x3, 16); ROTATE(x7, 16); \ + ROTATE(x11, 16); ROTATE(x15, 16); \ + ROTATE(y3, 16); ROTATE(y7, 16); \ + ROTATE(y11, 16); ROTATE(y15, 16); \ + op4; \ + PLUS(x2, x3); PLUS(x6, x7); \ + PLUS(x10, x11); PLUS(x14, x15); \ + PLUS(y2, y3); PLUS(y6, y7); \ + PLUS(y10, y11); PLUS(y14, y15); \ + op5; \ + XOR(x1, x2); XOR(x5, x6); \ + XOR(x9, x10); XOR(x13, x14); \ + XOR(y1, y2); XOR(y5, y6); \ + XOR(y9, y10); XOR(y13, y14); \ + op6; \ + ROTATE(x1,12); ROTATE(x5,12); \ + ROTATE(x9,12); ROTATE(x13,12); \ + ROTATE(y1,12); ROTATE(y5,12); \ + ROTATE(y9,12); ROTATE(y13,12); \ + op7; \ + PLUS(x0, x1); PLUS(x4, x5); \ + PLUS(x8, x9); PLUS(x12, x13); \ + PLUS(y0, y1); PLUS(y4, y5); \ + PLUS(y8, y9); PLUS(y12, y13); \ + op8; \ + XOR(x3, x0); XOR(x7, x4); \ + XOR(x11, x8); XOR(x15, x12); \ + XOR(y3, y0); XOR(y7, y4); \ + XOR(y11, y8); XOR(y15, y12); \ + op9; \ + ROTATE(x3,8); ROTATE(x7,8); \ + ROTATE(x11,8); ROTATE(x15,8); \ + ROTATE(y3,8); ROTATE(y7,8); \ + ROTATE(y11,8); ROTATE(y15,8); \ + op10; \ + PLUS(x2, x3); PLUS(x6, x7); \ + PLUS(x10, x11); PLUS(x14, x15); \ + PLUS(y2, y3); PLUS(y6, y7); \ + PLUS(y10, y11); PLUS(y14, y15); \ + op11; \ + XOR(x1, x2); XOR(x5, x6); \ + XOR(x9, x10); XOR(x13, x14); \ + XOR(y1, y2); XOR(y5, y6); \ + XOR(y9, y10); XOR(y13, y14); \ + op12; \ + ROTATE(x1,7); ROTATE(x5,7); \ + ROTATE(x9,7); ROTATE(x13,7); \ + ROTATE(y1,7); ROTATE(y5,7); \ + ROTATE(y9,7); ROTATE(y13,7); + +#define 
QUARTERROUND4_V8(x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,\ + y0,y1,y2,y3,y4,y5,y6,y7,y8,y9,y10,y11,y12,y13,y14,y15) \ + QUARTERROUND4_V8_POLY(x0,x1,x2,x3,x4,x5,x6,x7,\ + x8,x9,x10,x11,x12,x13,x14,x15,\ + y0,y1,y2,y3,y4,y5,y6,y7,\ + y8,y9,y10,y11,y12,y13,y14,y15,\ + ,,,,,,,,,,,) + +#define TRANSPOSE_4X4_2(v0,v1,v2,v3,va,vb,vc,vd,tmp0,tmp1,tmp2,tmpa,tmpb,tmpc) \ + vmrhf tmp0, v0, v1; \ + vmrhf tmp1, v2, v3; \ + vmrlf tmp2, v0, v1; \ + vmrlf v3, v2, v3; \ + vmrhf tmpa, va, vb; \ + vmrhf tmpb, vc, vd; \ + vmrlf tmpc, va, vb; \ + vmrlf vd, vc, vd; \ + vpdi v0, tmp0, tmp1, 0; \ + vpdi v1, tmp0, tmp1, 5; \ + vpdi v2, tmp2, v3, 0; \ + vpdi v3, tmp2, v3, 5; \ + vpdi va, tmpa, tmpb, 0; \ + vpdi vb, tmpa, tmpb, 5; \ + vpdi vc, tmpc, vd, 0; \ + vpdi vd, tmpc, vd, 5; + +.balign 8 +.globl __chacha20_s390x_vx_blocks8 +ENTRY (__chacha20_s390x_vx_blocks8) + /* input: + * %r2: input + * %r3: dst + * %r4: src + * %r5: nblks (multiple of 8) + */ + + START_STACK(%r8); + lgr NBLKS, %r5; + + larl %r7, .Lconsts; + + /* Load counter. */ + lg %r8, (12 * 4)(INPUT); + rllg %r8, %r8, 32; + +.balign 4 + /* Process eight chacha20 blocks per loop. */ +.Lloop8: + vlm Y0, Y3, 0(INPUT); + + slgfi NBLKS, 8; + lghi ROUND, (20 / 2); + + /* Construct counter vectors X12/X13 & Y12/Y13. */ + vl X4, (.Ladd_counter_0123 - .Lconsts)(%r7); + vl Y4, (.Ladd_counter_4567 - .Lconsts)(%r7); + vrepf Y12, Y3, 0; + vrepf Y13, Y3, 1; + vaccf X5, Y12, X4; + vaccf Y5, Y12, Y4; + vaf X12, Y12, X4; + vaf Y12, Y12, Y4; + vaf X13, Y13, X5; + vaf Y13, Y13, Y5; + + vrepf X0, Y0, 0; + vrepf X1, Y0, 1; + vrepf X2, Y0, 2; + vrepf X3, Y0, 3; + vrepf X4, Y1, 0; + vrepf X5, Y1, 1; + vrepf X6, Y1, 2; + vrepf X7, Y1, 3; + vrepf X8, Y2, 0; + vrepf X9, Y2, 1; + vrepf X10, Y2, 2; + vrepf X11, Y2, 3; + vrepf X14, Y3, 2; + vrepf X15, Y3, 3; + + /* Store counters for blocks 0-7. 
*/ + vstm X12, X13, (STACK_CTR + 0 * 16)(%r15); + vstm Y12, Y13, (STACK_CTR + 2 * 16)(%r15); + + vlr Y0, X0; + vlr Y1, X1; + vlr Y2, X2; + vlr Y3, X3; + vlr Y4, X4; + vlr Y5, X5; + vlr Y6, X6; + vlr Y7, X7; + vlr Y8, X8; + vlr Y9, X9; + vlr Y10, X10; + vlr Y11, X11; + vlr Y14, X14; + vlr Y15, X15; + + /* Update and store counter. */ + agfi %r8, 8; + rllg %r5, %r8, 32; + stg %r5, (12 * 4)(INPUT); + +.balign 4 +.Lround2_8: + QUARTERROUND4_V8(X0, X4, X8, X12, X1, X5, X9, X13, + X2, X6, X10, X14, X3, X7, X11, X15, + Y0, Y4, Y8, Y12, Y1, Y5, Y9, Y13, + Y2, Y6, Y10, Y14, Y3, Y7, Y11, Y15); + QUARTERROUND4_V8(X0, X5, X10, X15, X1, X6, X11, X12, + X2, X7, X8, X13, X3, X4, X9, X14, + Y0, Y5, Y10, Y15, Y1, Y6, Y11, Y12, + Y2, Y7, Y8, Y13, Y3, Y4, Y9, Y14); + brctg ROUND, .Lround2_8; + + /* Store blocks 4-7. */ + vstm Y0, Y15, STACK_Y0_Y15(%r15); + + /* Load counters for blocks 0-3. */ + vlm Y0, Y1, (STACK_CTR + 0 * 16)(%r15); + + lghi ROUND, 1; + j .Lfirst_output_4blks_8; + +.balign 4 +.Lsecond_output_4blks_8: + /* Load blocks 4-7. */ + vlm X0, X15, STACK_Y0_Y15(%r15); + + /* Load counters for blocks 4-7. */ + vlm Y0, Y1, (STACK_CTR + 2 * 16)(%r15); + + lghi ROUND, 0; + +.balign 4 + /* Output four chacha20 blocks per loop. 
*/ +.Lfirst_output_4blks_8: + vlm Y12, Y15, 0(INPUT); + PLUS(X12, Y0); + PLUS(X13, Y1); + vrepf Y0, Y12, 0; + vrepf Y1, Y12, 1; + vrepf Y2, Y12, 2; + vrepf Y3, Y12, 3; + vrepf Y4, Y13, 0; + vrepf Y5, Y13, 1; + vrepf Y6, Y13, 2; + vrepf Y7, Y13, 3; + vrepf Y8, Y14, 0; + vrepf Y9, Y14, 1; + vrepf Y10, Y14, 2; + vrepf Y11, Y14, 3; + vrepf Y14, Y15, 2; + vrepf Y15, Y15, 3; + PLUS(X0, Y0); + PLUS(X1, Y1); + PLUS(X2, Y2); + PLUS(X3, Y3); + PLUS(X4, Y4); + PLUS(X5, Y5); + PLUS(X6, Y6); + PLUS(X7, Y7); + PLUS(X8, Y8); + PLUS(X9, Y9); + PLUS(X10, Y10); + PLUS(X11, Y11); + PLUS(X14, Y14); + PLUS(X15, Y15); + + vl Y15, (.Lbswap32 - .Lconsts)(%r7); + TRANSPOSE_4X4_2(X0, X1, X2, X3, X4, X5, X6, X7, + Y9, Y10, Y11, Y12, Y13, Y14); + TRANSPOSE_4X4_2(X8, X9, X10, X11, X12, X13, X14, X15, + Y9, Y10, Y11, Y12, Y13, Y14); + + vlm Y0, Y14, 0(SRC); + vperm X0, X0, X0, Y15; + vperm X1, X1, X1, Y15; + vperm X2, X2, X2, Y15; + vperm X3, X3, X3, Y15; + vperm X4, X4, X4, Y15; + vperm X5, X5, X5, Y15; + vperm X6, X6, X6, Y15; + vperm X7, X7, X7, Y15; + vperm X8, X8, X8, Y15; + vperm X9, X9, X9, Y15; + vperm X10, X10, X10, Y15; + vperm X11, X11, X11, Y15; + vperm X12, X12, X12, Y15; + vperm X13, X13, X13, Y15; + vperm X14, X14, X14, Y15; + vperm X15, X15, X15, Y15; + vl Y15, (15 * 16)(SRC); + + XOR(Y0, X0); + XOR(Y1, X4); + XOR(Y2, X8); + XOR(Y3, X12); + XOR(Y4, X1); + XOR(Y5, X5); + XOR(Y6, X9); + XOR(Y7, X13); + XOR(Y8, X2); + XOR(Y9, X6); + XOR(Y10, X10); + XOR(Y11, X14); + XOR(Y12, X3); + XOR(Y13, X7); + XOR(Y14, X11); + XOR(Y15, X15); + vstm Y0, Y15, 0(DST); + + aghi SRC, 256; + aghi DST, 256; + + clgije ROUND, 1, .Lsecond_output_4blks_8; + + clgijhe NBLKS, 8, .Lloop8; + + + END_STACK(%r8); + xgr %r2, %r2; + br %r14; +END (__chacha20_s390x_vx_blocks8) + +#endif /* HAVE_S390_VX_ASM_SUPPORT */ diff --git a/sysdeps/s390/s390-64/chacha20_arch.h b/sysdeps/s390/s390-64/chacha20_arch.h new file mode 100644 index 0000000000..78252c5488 --- /dev/null +++ b/sysdeps/s390/s390-64/chacha20_arch.h @@ 
-0,0 +1,45 @@ +/* s390x optimization for ChaCha20. + Copyright (C) 2022 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#include +#include +#include + +unsigned int __chacha20_s390x_vx_blocks8 (uint32_t *state, uint8_t *dst, + const uint8_t *src, size_t nblks) + attribute_hidden; + +static inline void +chacha20_crypt (uint32_t *state, uint8_t *dst, const uint8_t *src, + size_t bytes) +{ +#ifdef HAVE_S390_VX_ASM_SUPPORT + _Static_assert (CHACHA20_BUFSIZE % 8 == 0, + "CHACHA20_BUFSIZE not multiple of 8"); + _Static_assert (CHACHA20_BUFSIZE >= CHACHA20_BLOCK_SIZE * 8, + "CHACHA20_BUFSIZE < CHACHA20_BLOCK_SIZE * 8"); + + if (GLRO(dl_hwcap) & HWCAP_S390_VX) + { + __chacha20_s390x_vx_blocks8 (state, dst, src, + CHACHA20_BUFSIZE / CHACHA20_BLOCK_SIZE); + return; + } +#endif + chacha20_crypt_generic (state, dst, src, bytes); +} From patchwork Thu Jul 14 11:28:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 590377 Delivered-To: patch@linaro.org Received: by 2002:a05:7000:5817:0:0:0:0 with SMTP id j23csp1478158max; Thu, 14 Jul 2022 04:31:10 -0700 (PDT) X-Google-Smtp-Source: AGRyM1sfpeShMtc1MFuASlPsuwj7J60aWhSIBPJ7CBnnDAMHuzHbXht6MgmV3onWAUwYPLu25L4y X-Received: by
2002:a05:6402:1cc8:b0:437:a61a:5713 with SMTP id ds8-20020a0564021cc800b00437a61a5713mr11845071edb.340.1657798270537; Thu, 14 Jul 2022 04:31:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1657798270; cv=none; d=google.com; s=arc-20160816; b=ekArFO29mgZboIxcu7yUVuS4MnQpSvNo67fHp9r09jUjgbyLMtZRa8HKaKlFzHqoQ/ i2QBqTFRV3pc2A2ph3bQiMeYTeWVXeTgELGaYkCeTE/PiHyWkdv780Cf4HgcNUshkTpr mkZaT7TU/9hXHM90AFrECnFzXmJAVH9CY6S0GNoGv90HCvOyXfLFgz4/rX1kCg0KcV2C +q2fxFEYbYRuFgCsKp/a1uj4/Tx6fTZVHK28a6FiAzSiuwG7YRTa78aBpdJSbsVy9IYM /4XDfxDm6BvcdXWfTc90ptSBkcybs4fvlT3Hz5qtnHajNWd8wau6esz2CwL6UuLCsW+u moQg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:dmarc-filter:delivered-to:dkim-signature :dkim-filter; bh=aMDRCzskLjpC+tZYW99AO+E/0W0Kj1wjhROzDkqDpAY=; b=yEcvjorV62NBzuHAu2ICCxIXofMQKF0ti9jSsrA/yhBAxN6uE5l67PJ8daqdTPU4Lt Kw740UFcCNUbfiD8oda8a413Ofh1pEGLe/p7nl/rTXn0lodygmZp2miUdRXyRUxCLGqn YlaNOkKFM58Y8M7OBxZzrtKacJr69TMZ9yPKhL3BEbyEcCRQuGOVn8B4CWLCe9Iuttan IA6N50jAozsywYAhoWJ++15J6J8KRs2K9Hivz19oC9/N5nYUxx+iA5E0kMCWXGzbbjti 7isRE9yD72uxbreUy/Apbc/QjDSd9vieKVdQHLmwJmI9Rs4PqjXyInXZf+pB88KMsc0x HvAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=dIlOKa7z; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Return-Path: Received: from sourceware.org (server2.sourceware.org. 
[2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id g12-20020a056402424c00b004315dd76796si2272252edb.482.2022.07.14.04.31.10 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:31:10 -0700 (PDT) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@sourceware.org header.s=default header.b=dIlOKa7z; spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=sourceware.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 3AE6C3889E1C for ; Thu, 14 Jul 2022 11:31:09 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 3AE6C3889E1C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org; s=default; t=1657798269; bh=aMDRCzskLjpC+tZYW99AO+E/0W0Kj1wjhROzDkqDpAY=; h=To:Subject:Date:In-Reply-To:References:List-Id:List-Unsubscribe: List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To: From; b=dIlOKa7zDZQ40oWmdBF/nePMAR3FjkqnUiBObq1sxz6x4hRq7w6JG9feOmmsOToI/ fGCRfi7Hjdl8jRNPr5M9VW10WFtFt5rZn0rTBAwR9GrVCXPstE3jF5DKKzk/SdVq80 8GAMAyBzzslA9iRYGj3u2/KizLAP4CZXsL7BHTXg= X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by sourceware.org (Postfix) with ESMTPS id DFD02388CE97 for ; Thu, 14 Jul 2022 11:29:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org DFD02388CE97 Received: by mail-oi1-x22e.google.com with SMTP id r191so1976737oie.7 for ; Thu, 14 Jul 2022 04:29:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=aMDRCzskLjpC+tZYW99AO+E/0W0Kj1wjhROzDkqDpAY=; b=0SGFU2HTY6low9mrYGmQh07zHSUbNMNnRDSuW2EJuRY16/CF5EuPsu8sakXI+/jC32 B+yC5tFsb9+aP3vPawwsbE5pptVzdFDtpDhOlLI9fVkKg4v5O+QjBnUXsz/YpMVhMkoa USMbbiBt54owC3GH/pk5wQ1RjYvb5e1TEobKNO3FL0F6Q7hia7BmYAUy1xBVBm9SQJCl rzKffeucZtL37e56tNG8d03qKctPy/mOfI5AMYIhuTtVIS2BF+g7FkzxV4tJYUvBfK5a gu9eGZUPY71GUIWS/28Tbe7Qe8hrYhBpJyYQXJrphvh4qJb8NwcwbWOHrYJKnw4iDvNP 5IHQ== X-Gm-Message-State: AJIora/qaI2Ffgo2mbmaBNNB8PIkQdX06fgJhJXPl1aWwy5AZG4iWlz0 aZoKfjuWZDa4NOvKi5pXqupaMsRCzX0Rdw== X-Received: by 2002:a05:6808:23cf:b0:335:63ca:419c with SMTP id bq15-20020a05680823cf00b0033563ca419cmr3970610oib.281.1657798148169; Thu, 14 Jul 2022 04:29:08 -0700 (PDT) Received: from mandiga.. ([2804:431:c7ca:19c3:3696:7000:2f6a:a6f4]) by smtp.gmail.com with ESMTPSA id k25-20020a056830243900b0061c4761c8cbsm562266ots.24.2022.07.14.04.29.06 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Jul 2022 04:29:07 -0700 (PDT) To: libc-alpha@sourceware.org, Florian Weimer Subject: [PATCH v10 9/9] manual: Add documentation for arc4random functions Date: Thu, 14 Jul 2022 08:28:45 -0300 Message-Id: <20220714112845.704678-10-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20220714112845.704678-1-adhemerval.zanella@linaro.org> References: <20220714112845.704678-1-adhemerval.zanella@linaro.org> MIME-Version: 1.0 X-Spam-Status: No, score=-12.6 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list 
List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Adhemerval Zanella via Libc-alpha From: Adhemerval Zanella Netto Reply-To: Adhemerval Zanella Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org Sender: "Libc-alpha" From: Adhemerval Zanella Netto --- manual/math.texi | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/manual/math.texi b/manual/math.texi index 477a18b6d1..141695cc30 100644 --- a/manual/math.texi +++ b/manual/math.texi @@ -1447,6 +1447,7 @@ systems. * ISO Random:: @code{rand} and friends. * BSD Random:: @code{random} and friends. * SVID Random:: @code{drand48} and friends. +* High Quality Random:: @code{arc4random} and friends. @end menu @node ISO Random @@ -1985,6 +1986,51 @@ This function is a GNU extension and should not be used in portable programs. @end deftypefun +@node High Quality Random +@subsection High Quality Random Number Functions + +This section describes the random number functions provided as a GNU +extension, based on OpenBSD interfaces. + +@Theglibc{} uses kernel entropy obtained either through @code{getrandom} +or by reading @file{/dev/urandom} to seed and periodically reseed the +internal state. A per-thread data pool is used, which allows fast output +generation. + +Although these functions provide higher-quality randomness than the ISO, +BSD, and SVID functions, they still use a pseudo-random generator and +should not be used in cryptographic contexts. + +The internal state is cleared and reseeded with kernel entropy on @code{fork} +and @code{_Fork}. It is not cleared when @code{clone} is invoked as a raw +system call, whether directly or through the @code{syscall} function of @theglibc{}. + +The prototypes for these functions are in @file{stdlib.h}.
+@pindex stdlib.h + +@deftypefun uint32_t arc4random (void) +@standards{BSD, stdlib.h} +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}} +This function returns a single 32-bit value in the range @code{0} to +@code{2^32−1} (inclusive), twice the range of @code{rand} and +@code{random}, whose results never exceed @code{2^31−1} in @theglibc{}. +@end deftypefun + +@deftypefun void arc4random_buf (void *@var{buffer}, size_t @var{length}) +@standards{BSD, stdlib.h} +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}} +This function fills the @var{length} bytes starting at @var{buffer} +with random data. +@end deftypefun + +@deftypefun uint32_t arc4random_uniform (uint32_t @var{upper_bound}) +@standards{BSD, stdlib.h} +@safety{@mtsafe{}@asunsafe{@asucorrupt{}}@acsafe{}} +This function returns a single 32-bit value uniformly distributed over +the range @code{0} to @code{@var{upper_bound} - 1} (inclusive). Unlike +@code{arc4random () % @var{upper_bound}}, it has no modulo bias when @var{upper_bound} is not a power of two. +@end deftypefun + @node FP Function Optimizations @section Is Fast Code or Small Code preferred? @cindex Optimization
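The manual text for arc4random_uniform says it avoids modulo bias. As an illustration only, the classic OpenBSD-style rejection-sampling technique can be sketched in C as below; it is not necessarily the algorithm glibc implements, and uniform_below, rand32_fn, fixed_sequence and fixed_sequence_next are hypothetical names. The random source is passed in as a callback so the logic can be exercised deterministically; real code would simply call arc4random_uniform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type returning uniformly distributed 32-bit
   words (in real code, arc4random would fill this role).  */
typedef uint32_t (*rand32_fn) (void *ctx);

/* Return a value uniformly distributed in [0, upper_bound) by rejection
   sampling, in the style of OpenBSD's arc4random_uniform.  Illustrative
   only; glibc's implementation may use a different algorithm.  */
static uint32_t
uniform_below (uint32_t upper_bound, rand32_fn next, void *ctx)
{
  assert (next != NULL);

  /* No valid result exists for 0, and 0 is the only result for 1.  */
  if (upper_bound < 2)
    return 0;

  /* min = 2^32 mod upper_bound, computed in 32-bit unsigned arithmetic.
     Accepting only r >= min leaves 2^32 - min candidates, an exact
     multiple of upper_bound, so every residue is equally likely.  */
  uint32_t min = (uint32_t) -upper_bound % upper_bound;

  uint32_t r;
  do
    r = next (ctx);
  while (r < min);

  return r % upper_bound;
}

/* Hypothetical deterministic generator used only to demonstrate the
   function: returns the words of a fixed array in order.  */
struct fixed_sequence
{
  const uint32_t *words;
  size_t next_index;
};

static uint32_t
fixed_sequence_next (void *ctx)
{
  struct fixed_sequence *seq = ctx;
  return seq->words[seq->next_index++];
}
```

With a generator that yields 0 and then 5, uniform_below (3, ...) rejects 0 (because 2^32 mod 3 is 1) and returns 5 % 3 = 2. For a power-of-two bound, 2^32 mod upper_bound is 0, so no value is ever rejected and the result is just the low bits of the first word.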