From patchwork Tue Mar 5 17:02:57 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Adhemerval Zanella Netto X-Patchwork-Id: 778051 Delivered-To: patch@linaro.org Received: by 2002:a5d:48c8:0:b0:33e:474f:8c56 with SMTP id p8csp540976wrs; Tue, 5 Mar 2024 09:03:45 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCU7UzMknRTTl0/sMBVE3sI9Z42YYxiepslY3H2QSZjqZyueRdyX7mopipBX+AEwbtpeeywSpIc+REhLwmghbgx1 X-Google-Smtp-Source: AGHT+IFgSmaovbb+CwTzcHod4JDgmwZmCZBRPjmCkopCFBK0BMLq9d9CmIRQWkEuWpTzqFguEKUD X-Received: by 2002:a05:6214:5096:b0:68f:a0c2:7341 with SMTP id kk22-20020a056214509600b0068fa0c27341mr3712614qvb.13.1709658224718; Tue, 05 Mar 2024 09:03:44 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709658224; cv=pass; d=google.com; s=arc-20160816; b=i+vd0wxmGCG1jsAwH2aBvV78RYkx6DHQgJVjH777rNiCbrfNg626IyMnQ0GlcxJj65 K9yWe5pnWEIfr86Ahfzfcygie/97cs9GjkwqtUX80xyGdxr/6E7r/XqogfHiV6tb4F4/ i0JGiGk9kxHGDFuPho7kCaW7wSIrsALb541r8YVYNK/U9TuLPepK4k30qZpFlHB+sSES piMGRG6RgU5Xx6u9BnKRC+OnNT8SCDvexOuC/l/FO3cXOgwpuWIj0Pu4yB+YjKRktXY0 bUoWEhIDEXcQoJwk6N2I4RVLmoBuNLUZ577EthfJQI5HEJii9cCk5LflWl1H22hRinuv PsLw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:message-id:date:subject:cc:to:from:dkim-signature :arc-filter:dmarc-filter:delivered-to; bh=dBBpZYlv0He0NxaSQ0JvDk8vCfWDcR0G+e02fuRtEpU=; fh=JsS+P7lgaefH3JGAF1ZMkV7xk1BwSreG5tHenrOJbMY=; b=TkboqZVfwAJ2g8lzMX0iv1CHsNbGTGhJeapa8rhdyT82m8dFtJAN6DUxA+W0MBE8S8 wVm+swDah9EmmX85OpYj9VcqeCXd5WYufG+RpyFsrSriW/mkA01UgMGHSPpgiJqFzToh E09PhfmSzRmtJWeJ80Ixs73HnjxkyLnuU+6nIH2tGisUBlwaM1IqPYC6yIomyP0fvsAl 33DHpg88pOLeFE0AA+HUr9KbzGqfP2MrAs+4F3vLoLvyJq7Jyh4Td+7FxXbca9p8cUOH VC/xrzwYM84YVlOelFTlS6uEfOT7g/wz2BsaE7m77t/Lq7r0yKrX00OtQUz0/FUr7mpE NTcg==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jBWUiqz4; arc=pass (i=1); spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id p10-20020ad45f4a000000b006901a3b7869si12227618qvg.297.2024.03.05.09.03.44 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Mar 2024 09:03:44 -0800 (PST) Received-SPF: pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jBWUiqz4; arc=pass (i=1); spf=pass (google.com: domain of libc-alpha-bounces+patch=linaro.org@sourceware.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="libc-alpha-bounces+patch=linaro.org@sourceware.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 4C76438582A1 for ; Tue, 5 Mar 2024 17:03:44 +0000 (GMT) X-Original-To: libc-alpha@sourceware.org Delivered-To: libc-alpha@sourceware.org Received: from mail-pg1-x534.google.com (mail-pg1-x534.google.com [IPv6:2607:f8b0:4864:20::534]) by sourceware.org (Postfix) with ESMTPS id AE7053858426 for ; Tue, 5 Mar 2024 17:03:08 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org AE7053858426 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=linaro.org Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=linaro.org ARC-Filter: OpenARC Filter v1.0.0 sourceware.org AE7053858426 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::534 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709658193; cv=none; b=jB5CdWAoxfplnyxb3bwb4ufyvivd8on7odzP3Tnymr6PiTM1HF/h1DPO5CzgdV4ew8mVENJuY8iD5y+yC7H7yNuGetdutymk5f501Tz/tmVa2UVsutkffWbZ5dn5W0G1ciHss/XEEUV0W1vri/IXF3+hbgjmOYRCYC09flXMgGQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1709658193; c=relaxed/simple; bh=dbYUGwwsHJpn/JilY/XEfcw3D1McKxt4sv2ycqWfArs=; h=DKIM-Signature:From:To:Subject:Date:Message-Id:MIME-Version; b=LQNszGYM+LV7+HTVFubczNL0EUWmCXZXhN80JMFo0255YxxmlooqEVKq9CqzwawMr6vuBWWKoa/NmDurnQzPt/U6yMwlpH0Ohc7IR4ZmW76gNTxqWoKBnljZ0CCrvkDMPmWwWx5TCMbmS+BLFL8DhDr4h+bovj36JRhScOJA+Fs= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-pg1-x534.google.com with SMTP id 41be03b00d2f7-5c66b093b86so5438757a12.0 for ; Tue, 05 Mar 2024 09:03:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; t=1709658183; x=1710262983; darn=sourceware.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=dBBpZYlv0He0NxaSQ0JvDk8vCfWDcR0G+e02fuRtEpU=; b=jBWUiqz4Q9IsU9uEUBm7ogMXZmlN5ACobbfuWV8J2fdc7zY4BNvwbvVwKO3jKPYyfG OSMhS4RHwTeVrxpD6TYMEqcbdx8oFo52crHV5xP+fuoDBLT6fLvnPKAoFJh1mgtYf40N dBHjNdcEDVaGGNum1sts+s+OwRcTJFKCwSVg25jJ7PHOCmwxXzMZ2WtmPmG6ccmciAMt MFfiSrr/ml5G1DOLa4w63IDCcDfxNbo1apZelqr+vDdggf+JcZ3XAUjCZdeG6iUY9rfw bfdBdgTe9NtmuOgHnPVn7xzfp99uYjT8FvZTxUZ7dbxy/WoMTvITNJ82O/mwwHYWKPSM 2zTA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709658183; x=1710262983; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=dBBpZYlv0He0NxaSQ0JvDk8vCfWDcR0G+e02fuRtEpU=; b=DwYu9yY7SXbVytTNpqH6uFtoAP6hdzNftkGppvErwvJxo1x7d3k517xIb5xdGTihls RVUMlPCN5xYhkv+MIatLnwv6srwRcT/74VvmaVUPa39gIr2K/GuIaZStvASCB+6tHHoG AKNG2BLVNr7vfRbYC8wF1NB0w4H4pKd9IXQlsLXWmziU8dsGegPpyDrLs/dFQoewrThW XEAsZGkBdcc+zw2++ON/cbZb9z5kJNWNL2pqYeaVzLA0LBlF64r49dbCkjHWnT7mhHa+ qpIouUhXmT8W7OEdS336OrzEB4SSACP+NhmX0Wudn9jP1MD62SzWdQ6UoG6d16iLikWC 0LoQ== X-Gm-Message-State: AOJu0YxUeMfXKGS9NF8hObyUkKY8cwEnZLp9WPV7Oj1DSlH/QwS3hq1P wW+tehq2RnS8T2ljgDnv3r8VuVP7l+++oT4vf9x/Z26W3Cl9UqAPZA47PSEGsppPf/7V6EApUpy b X-Received: by 2002:a17:90a:b107:b0:29a:f199:1647 with SMTP id z7-20020a17090ab10700b0029af1991647mr4105728pjq.1.1709658182319; Tue, 05 Mar 2024 09:03:02 -0800 (PST) Received: from mandiga.. ([2804:1b3:a7c1:ec17:6677:fbdd:5bee:239d]) by smtp.gmail.com with ESMTPSA id k76-20020a633d4f000000b005dc8702f0a9sm9333205pga.1.2024.03.05.09.03.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Mar 2024 09:03:01 -0800 (PST) From: Adhemerval Zanella To: libc-alpha@sourceware.org Cc: vineetg@rivosinc.com, Evan Green , palmer@rivosinc.com, slewis@rivosinc.com, Andreas Schwab Subject: [PATCH v2] riscv: Fix alignment-ignorant memcpy implementation Date: Tue, 5 Mar 2024 14:02:57 -0300 Message-Id: <20240305170257.2639072-1-adhemerval.zanella@linaro.org> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Spam-Status: No, score=-12.2 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, RCVD_IN_DNSWL_NONE, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: libc-alpha-bounces+patch=linaro.org@sourceware.org The memcpy optimization (commit 587a1290a1af7bee6db) has a series of mistakes: - The implementation is wrong: the chunk size calculation is wrong leading to invalid memory access. - It adds ifunc supports as default, so --disable-multi-arch does not work as expected for riscv. - It mixes Linux files (memcpy ifunc selection which requires the vDSO/syscall mechanism) with generic support (the memcpy optimization itself). - There is no __libc_ifunc_impl_list, which makes testing only check the selected implementation instead of all supported by the system. This patch also simplifies the required bits to enable ifunc: there is no need to memcopy.h; nor to add Linux-specific files. The __memcpy_noalignment tail handling now uses a branchless strategy similar to aarch64 (overlap 32-bits copies for sizes 4..7 and byte copies for size 1..3). Checked on riscv64 and riscv32 by explicitly enabling the function on __libc_ifunc_impl_list on qemu-system. Changes from v1: * Implement the memcpy in assembly to correctly handle RISCV strict-alignment. Reviewed-by: Evan Green Acked-by: Palmer Dabbelt --- sysdeps/riscv/memcpy_noalignment.S | 136 --------------- .../multiarch}/memcpy-generic.c | 8 +- sysdeps/riscv/multiarch/memcpy_noalignment.S | 162 ++++++++++++++++++ sysdeps/unix/sysv/linux/riscv/Makefile | 9 - sysdeps/unix/sysv/linux/riscv/hwprobe.c | 1 + .../sysv/linux/riscv/include/sys/hwprobe.h | 8 + .../unix/sysv/linux/riscv/multiarch/Makefile | 9 + .../linux/riscv/multiarch/ifunc-impl-list.c} | 33 +++- .../sysv/linux/riscv/multiarch}/memcpy.c | 16 +- 9 files changed, 215 insertions(+), 167 deletions(-) delete mode 100644 sysdeps/riscv/memcpy_noalignment.S rename sysdeps/{unix/sysv/linux/riscv => riscv/multiarch}/memcpy-generic.c (87%) create mode 100644 sysdeps/riscv/multiarch/memcpy_noalignment.S create mode 100644 sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h create mode 100644 sysdeps/unix/sysv/linux/riscv/multiarch/Makefile rename sysdeps/{riscv/memcopy.h => unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c} (51%) rename sysdeps/{riscv => unix/sysv/linux/riscv/multiarch}/memcpy.c (90%) diff --git a/sysdeps/riscv/memcpy_noalignment.S b/sysdeps/riscv/memcpy_noalignment.S deleted file mode 100644 index 621f8d028f..0000000000 --- a/sysdeps/riscv/memcpy_noalignment.S +++ /dev/null @@ -1,136 +0,0 @@ -/* memcpy for RISC-V, ignoring buffer alignment - Copyright (C) 2024 Free Software Foundation, Inc. - This file is part of the GNU C Library. - - The GNU C Library is free software; you can redistribute it and/or - modify it under the terms of the GNU Lesser General Public - License as published by the Free Software Foundation; either - version 2.1 of the License, or (at your option) any later version. - - The GNU C Library is distributed in the hope that it will be useful, - but WITHOUT ANY WARRANTY; without even the implied warranty of - MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - Lesser General Public License for more details. - - You should have received a copy of the GNU Lesser General Public - License along with the GNU C Library. If not, see - . */ - -#include -#include - -/* void *memcpy(void *, const void *, size_t) */ -ENTRY (__memcpy_noalignment) - move t6, a0 /* Preserve return value */ - - /* Bail if 0 */ - beqz a2, 7f - - /* Jump to byte copy if size < SZREG */ - li a4, SZREG - bltu a2, a4, 5f - - /* Round down to the nearest "page" size */ - andi a4, a2, ~((16*SZREG)-1) - beqz a4, 2f - add a3, a1, a4 - - /* Copy the first word to get dest word aligned */ - andi a5, t6, SZREG-1 - beqz a5, 1f - REG_L a6, (a1) - REG_S a6, (t6) - - /* Align dst up to a word, move src and size as well. */ - addi t6, t6, SZREG-1 - andi t6, t6, ~(SZREG-1) - sub a5, t6, a0 - add a1, a1, a5 - sub a2, a2, a5 - - /* Recompute page count */ - andi a4, a2, ~((16*SZREG)-1) - beqz a4, 2f - -1: - /* Copy "pages" (chunks of 16 registers) */ - REG_L a4, 0(a1) - REG_L a5, SZREG(a1) - REG_L a6, 2*SZREG(a1) - REG_L a7, 3*SZREG(a1) - REG_L t0, 4*SZREG(a1) - REG_L t1, 5*SZREG(a1) - REG_L t2, 6*SZREG(a1) - REG_L t3, 7*SZREG(a1) - REG_L t4, 8*SZREG(a1) - REG_L t5, 9*SZREG(a1) - REG_S a4, 0(t6) - REG_S a5, SZREG(t6) - REG_S a6, 2*SZREG(t6) - REG_S a7, 3*SZREG(t6) - REG_S t0, 4*SZREG(t6) - REG_S t1, 5*SZREG(t6) - REG_S t2, 6*SZREG(t6) - REG_S t3, 7*SZREG(t6) - REG_S t4, 8*SZREG(t6) - REG_S t5, 9*SZREG(t6) - REG_L a4, 10*SZREG(a1) - REG_L a5, 11*SZREG(a1) - REG_L a6, 12*SZREG(a1) - REG_L a7, 13*SZREG(a1) - REG_L t0, 14*SZREG(a1) - REG_L t1, 15*SZREG(a1) - addi a1, a1, 16*SZREG - REG_S a4, 10*SZREG(t6) - REG_S a5, 11*SZREG(t6) - REG_S a6, 12*SZREG(t6) - REG_S a7, 13*SZREG(t6) - REG_S t0, 14*SZREG(t6) - REG_S t1, 15*SZREG(t6) - addi t6, t6, 16*SZREG - bltu a1, a3, 1b - andi a2, a2, (16*SZREG)-1 /* Update count */ - -2: - /* Remainder is smaller than a page, compute native word count */ - beqz a2, 7f - andi a5, a2, ~(SZREG-1) - andi a2, a2, (SZREG-1) - add a3, a1, a5 - /* Jump directly to last word if no words. */ - beqz a5, 4f - -3: - /* Use single native register copy */ - REG_L a4, 0(a1) - addi a1, a1, SZREG - REG_S a4, 0(t6) - addi t6, t6, SZREG - bltu a1, a3, 3b - - /* Jump directly out if no more bytes */ - beqz a2, 7f - -4: - /* Copy the last word unaligned */ - add a3, a1, a2 - add a4, t6, a2 - REG_L a5, -SZREG(a3) - REG_S a5, -SZREG(a4) - ret - -5: - /* Copy bytes when the total copy is -extern __typeof (memcpy) __memcpy_generic; -hidden_proto (__memcpy_generic) - +#if IS_IN(libc) +# define MEMCPY __memcpy_generic +# undef libc_hidden_builtin_def +# define libc_hidden_builtin_def(x) +#endif #include diff --git a/sysdeps/riscv/multiarch/memcpy_noalignment.S b/sysdeps/riscv/multiarch/memcpy_noalignment.S new file mode 100644 index 0000000000..fa39be21d6 --- /dev/null +++ b/sysdeps/riscv/multiarch/memcpy_noalignment.S @@ -0,0 +1,162 @@ +/* memcpy for RISC-V, ignoring buffer alignment + Copyright (C) 2024 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include +#include + +/* memcpy optimization for CPUs with fast unaligned support + (RISCV_HWPROBE_MISALIGNED_FAST). + + Copies are split into 3 main cases: small copies up to SZREG, copies up to + BLOCK_SIZE (128 for 64 bits, 64 for 32 bits), and copies larger than BLOCK_SIZE. + + Large copies use a software pipelined loop processing BLOCK_SIZE bytes per + iteration. The destination pointer is SZREG-byte aligned to minimize store + unaligned accesses. + + The tail is handled with branchless copies. */ + +#define BLOCK_SIZE (16 * SZREG) + + .attribute unaligned_access, 1 +ENTRY (__memcpy_noalignment) + beq a2, zero, L(ret) + + /* if LEN < SZREG jump to tail handling. */ + li a5, SZREG-1 + mv a6, a0 + bleu a2, a5, L(tail) + + /* Copy the first word, align DEST to word, and adjust DEST/SRC/LEN + based on the amount adjusted to align DEST. */ + REG_L a3, 0(a1) + andi a5, a0, SZREG-1 + addi a2, a2, -SZREG + li a4, SZREG + sub a4, a4, a5 + REG_S a3, 0(a0) + add a2, a5, a2 + + /* If LEN < BLOCK_SIZE jump to word copy. */ + li a3, BLOCK_SIZE-1 + add a5, a0, a4 + add a1, a1, a4 + bleu a2, a3, L(word_copy_adjust) + addi a7, a2, -BLOCK_SIZE + andi a7, a7, -BLOCK_SIZE + addi a7, a7, BLOCK_SIZE + add a3, a5, a7 + mv a4, a1 +L(block_copy): + REG_L a6, 0(a4) + REG_L t0, SZREG(a4) + REG_L t1, (2*SZREG)(a4) + REG_L t2, (3*SZREG)(a4) + REG_L t3, (4*SZREG)(a4) + REG_L t4, (5*SZREG)(a4) + REG_L t5, (6*SZREG)(a4) + REG_L t6, (7*SZREG)(a4) + REG_S a6, 0(a5) + REG_S t0, SZREG(a5) + REG_S t1, (2*SZREG)(a5) + REG_S t2, (3*SZREG)(a5) + REG_S t3, (4*SZREG)(a5) + REG_S t4, (5*SZREG)(a5) + REG_S t5, (6*SZREG)(a5) + REG_S t6, (7*SZREG)(a5) + REG_L a6, (8*SZREG)(a4) + REG_L t0, (9*SZREG)(a4) + REG_L t1, (10*SZREG)(a4) + REG_L t2, (11*SZREG)(a4) + REG_L t3, (12*SZREG)(a4) + REG_L t4, (13*SZREG)(a4) + REG_L t5, (14*SZREG)(a4) + REG_L t6, (15*SZREG)(a4) + addi a4, a4, BLOCK_SIZE + REG_S a6, (8*SZREG)(a5) + REG_S t0, (9*SZREG)(a5) + REG_S t1, (10*SZREG)(a5) + REG_S t2, (11*SZREG)(a5) + REG_S t3, (12*SZREG)(a5) + REG_S t4, (13*SZREG)(a5) + REG_S t5, (14*SZREG)(a5) + REG_S t6, (15*SZREG)(a5) + addi a5, a5, BLOCK_SIZE + bne a5, a3, L(block_copy) + add a1, a1, a7 + andi a2, a2, BLOCK_SIZE-1 + + /* 0 <= a2/LEN < BLOCK_SIZE. */ +L(word_copy): + li a5, SZREG-1 + /* if LEN < SZREG jump to tail handling. */ + bleu a2, a5, L(tail_adjust) + addi a7, a2, -SZREG + andi a7, a7, -SZREG + addi a7, a7, SZREG + add a6, a3, a7 + mv a5, a1 +L(word_copy_loop): + REG_L a4, 0(a5) + addi a3, a3, SZREG + addi a5, a5, SZREG + REG_S a4, -SZREG(a3) + bne a3, a6, L(word_copy_loop) + add a1, a1, a7 + andi a2, a2, SZREG-1 + + /* Copy the last word unaligned. */ + add a3, a1, a2 + add a4, a6, a2 + REG_L t0, -SZREG(a3) + REG_S t0, -SZREG(a4) + ret + +L(tail): + /* Copy 4-7 bytes. */ + andi a5, a2, 4 + add a3, a1, a2 + add a4, a6, a2 + beq a5, zero, L(copy_0_3) + lw t0, 0(a1) + lw t1, -4(a3) + sw t0, 0(a6) + sw t1, -4(a4) + ret + + /* Copy 0-3 bytes. */ +L(copy_0_3): + beq a2, zero, L(ret) + srli a2, a2, 1 + add t4, a1, a2 + add t5, a6, a2 + lbu t0, 0(a1) + lbu t1, -1(a3) + lbu t2, 0(t4) + sb t0, 0(a6) + sb t1, -1(a4) + sb t2, 0(t5) +L(ret): + ret +L(tail_adjust): + mv a6, a3 + j L(tail) +L(word_copy_adjust): + mv a3, a5 + j L(word_copy) +END (__memcpy_noalignment) diff --git a/sysdeps/unix/sysv/linux/riscv/Makefile b/sysdeps/unix/sysv/linux/riscv/Makefile index 398ff7418b..04abf226ad 100644 --- a/sysdeps/unix/sysv/linux/riscv/Makefile +++ b/sysdeps/unix/sysv/linux/riscv/Makefile @@ -15,15 +15,6 @@ ifeq ($(subdir),stdlib) gen-as-const-headers += ucontext_i.sym endif -ifeq ($(subdir),string) -sysdep_routines += \ - memcpy \ - memcpy-generic \ - memcpy_noalignment \ - # sysdep_routines - -endif - abi-variants := ilp32 ilp32d lp64 lp64d ifeq (,$(filter $(default-abi),$(abi-variants))) diff --git a/sysdeps/unix/sysv/linux/riscv/hwprobe.c b/sysdeps/unix/sysv/linux/riscv/hwprobe.c index e64c159eb3..9159045478 100644 --- a/sysdeps/unix/sysv/linux/riscv/hwprobe.c +++ b/sysdeps/unix/sysv/linux/riscv/hwprobe.c @@ -34,3 +34,4 @@ int __riscv_hwprobe (struct riscv_hwprobe *pairs, size_t pair_count, /* Negate negative errno values to match pthreads API. */ return -r; } +libc_hidden_def (__riscv_hwprobe) diff --git a/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h b/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h new file mode 100644 index 0000000000..cce91c1b53 --- /dev/null +++ b/sysdeps/unix/sysv/linux/riscv/include/sys/hwprobe.h @@ -0,0 +1,8 @@ +#ifndef _SYS_HWPROBE_H +# include_next + +#ifndef _ISOMAC +libc_hidden_proto (__riscv_hwprobe) +#endif + +#endif diff --git a/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile new file mode 100644 index 0000000000..fcef5659d4 --- /dev/null +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/Makefile @@ -0,0 +1,9 @@ +ifeq ($(subdir),string) +sysdep_routines += \ + memcpy \ + memcpy-generic \ + memcpy_noalignment \ + # sysdep_routines + +CFLAGS-memcpy_noalignment.c += -mno-strict-align +endif diff --git a/sysdeps/riscv/memcopy.h b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c similarity index 51% rename from sysdeps/riscv/memcopy.h rename to sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c index 27675964b0..9f806d7a9e 100644 --- a/sysdeps/riscv/memcopy.h +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/ifunc-impl-list.c @@ -1,4 +1,4 @@ -/* memcopy.h -- definitions for memory copy functions. RISC-V version. +/* Enumerate available IFUNC implementations of a function. RISCV version. Copyright (C) 2024 Free Software Foundation, Inc. This file is part of the GNU C Library. @@ -16,11 +16,28 @@ License along with the GNU C Library; if not, see . */ -#include +#include +#include +#include -/* Redefine the generic memcpy implementation to __memcpy_generic, so - the memcpy ifunc can select between generic and special versions. - In rtld, don't bother with all the ifunciness. */ -#if IS_IN (libc) -#define MEMCPY __memcpy_generic -#endif +size_t +__libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, + size_t max) +{ + size_t i = max; + + bool fast_unaligned = false; + + struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_CPUPERF_0 }; + if (__riscv_hwprobe (&pair, 1, 0, NULL, 0) == 0 + && (pair.value & RISCV_HWPROBE_MISALIGNED_MASK) + == RISCV_HWPROBE_MISALIGNED_FAST) + fast_unaligned = true; + + IFUNC_IMPL (i, name, memcpy, + IFUNC_IMPL_ADD (array, i, memcpy, fast_unaligned, + __memcpy_noalignment) + IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_generic)) + + return 0; +} diff --git a/sysdeps/riscv/memcpy.c b/sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c similarity index 90% rename from sysdeps/riscv/memcpy.c rename to sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c index 20f9548c44..51d8ace858 100644 --- a/sysdeps/riscv/memcpy.c +++ b/sysdeps/unix/sysv/linux/riscv/multiarch/memcpy.c @@ -28,8 +28,6 @@ # include # include -# define INIT_ARCH() - extern __typeof (__redirect_memcpy) __libc_memcpy; extern __typeof (__redirect_memcpy) __memcpy_generic attribute_hidden; @@ -38,14 +36,9 @@ extern __typeof (__redirect_memcpy) __memcpy_noalignment attribute_hidden; static inline __typeof (__redirect_memcpy) * select_memcpy_ifunc (uint64_t dl_hwcap, __riscv_hwprobe_t hwprobe_func) { - unsigned long long int value; - - INIT_ARCH (); - - if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &value) != 0) - return __memcpy_generic; - - if ((value & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST) + unsigned long long int v; + if (__riscv_hwprobe_one (hwprobe_func, RISCV_HWPROBE_KEY_CPUPERF_0, &v) == 0 + && (v & RISCV_HWPROBE_MISALIGNED_MASK) == RISCV_HWPROBE_MISALIGNED_FAST) return __memcpy_noalignment; return __memcpy_generic; @@ -59,5 +52,6 @@ strong_alias (__libc_memcpy, memcpy); __hidden_ver1 (memcpy, __GI_memcpy, __redirect_memcpy) __attribute__ ((visibility ("hidden"))) __attribute_copy__ (memcpy); # endif - +#else +# include #endif