From patchwork Fri Oct 25 18:21:38 2024
X-Patchwork-Submitter: Adhemerval Zanella Netto
X-Patchwork-Id: 838375
From: Adhemerval Zanella
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann, Alexei Sibidanov
Subject: [PATCH 00/17] Add more CORE-MATH on libm
Date: Fri, 25 Oct 2024 15:21:38 -0300
Message-ID: <20241025182614.2022697-1-adhemerval.zanella@linaro.org>

Following the tgammaf implementation (392b3f0971764) and its telling
performance improvement, I worked with Paul Zimmermann to check whether
we can integrate more routines into glibc. This patchset adds optimized
and correctly rounded implementations of exp10m1f, exp2m1f, expm1f,
log10f, log2p1f, log1pf, and log10p1f. I also added a benchmark to
evaluate each implementation.
I tested the implementation on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, and POWER10 for powerpc), and most
of the implementations show impressive performance improvements. Like
the implementations from the ARM Optimized Routines, the CORE-MATH ones
take advantage of recent ISA and platform support (such as FMA and
rounding instructions, along with FP throughput).

For two implementations, exp10m1f and exp2m1f, CORE-MATH shows slightly
worse performance on x86_64-v1. This is because the current glibc
generic implementation calls the optimized exp10f/exp2f; when a more
recent ISA is used (x86_64-v2 or x86_64-v3), CORE-MATH performs better
than the current implementation. For both cases I added IFUNC support
to use FMA on x86_64.

Adhemerval Zanella (17):
  math: Add e_gammaf_r to glibc code and style
  benchtests: Add exp10m1f benchmark
  benchtests: Add exp2m1f benchmark
  benchtests: Add expm1f benchmark
  benchtests: Add log10f benchmark
  benchtests: Add log2p1f benchmark
  benchtests: Add log1p benchmark
  benchtests: Add log10p1f benchmark
  math: Use exp10m1f from CORE-MATH
  math: Use exp2m1f from CORE-MATH
  math: Use expm1f from CORE-MATH
  math: Use log10f from CORE-MATH
  math: Use log2p1f from CORE-MATH
  math: Use log1pf from CORE-MATH
  math: Use log10p1f from CORE-MATH
  x86_64: Add exp10m1f with FMA
  x86_64: Add exp2m1f with FMA

 SHARED-FILES                                  |   16 +
 benchtests/Makefile                           |    7 +
 benchtests/exp10m1f-inputs                    | 2389 ++++++++++++++
 benchtests/exp2m1f-inputs                     | 2388 ++++++++++++++
 benchtests/expm1f-inputs                      |  799 +++++
 benchtests/log10f-inputs                      | 1005 ++++++
 benchtests/log10p1f-inputs                    | 2888 +++++++++++++++++
 benchtests/log1pf-inputs                      | 1005 ++++++
 benchtests/log2p1f-inputs                     | 2888 +++++++++++++++++
 sysdeps/aarch64/libm-test-ulps                |   29 +-
 sysdeps/alpha/fpu/libm-test-ulps              |   12 -
 sysdeps/arc/fpu/libm-test-ulps                |   25 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   31 +-
 sysdeps/csky/fpu/libm-test-ulps               |   12 -
 sysdeps/csky/nofpu/libm-test-ulps             |   12 -
 sysdeps/hppa/fpu/libm-test-ulps               |   28 -
 sysdeps/i386/fpu/e_log10f.S                   |   66 -
 sysdeps/i386/fpu/libm-test-ulps               |   25 -
 sysdeps/i386/fpu/s_expm1f.S                   |  112 -
 sysdeps/i386/fpu/s_log1pf.S                   |   66 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   25 -
 sysdeps/ieee754/flt-32/e_gammaf_r.c           |  178 +-
 sysdeps/ieee754/flt-32/e_log10f.c             |  196 +-
 sysdeps/ieee754/flt-32/s_exp10m1f.c           |  227 ++
 sysdeps/ieee754/flt-32/s_exp2m1f.c            |  194 ++
 sysdeps/ieee754/flt-32/s_expm1f.c             |  232 +-
 sysdeps/ieee754/flt-32/s_log10p1f.c           |  182 ++
 sysdeps/ieee754/flt-32/s_log1pf.c             |  271 +-
 sysdeps/ieee754/flt-32/s_log2p1f.c            |  248 ++
 .../math_errf.c => ieee754/flt-32/w_log1pf.c} |    0
 sysdeps/loongarch/lp64/libm-test-ulps         |   28 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    6 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   12 -
 sysdeps/m68k/m680x0/fpu/w_log1pf.c            |   20 +
 sysdeps/microblaze/libm-test-ulps             |    3 -
 sysdeps/mips/mips32/libm-test-ulps            |   28 -
 sysdeps/mips/mips64/libm-test-ulps            |   28 -
 sysdeps/nios2/libm-test-ulps                  |    3 -
 sysdeps/or1k/fpu/libm-test-ulps               |    4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   12 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   29 +-
 sysdeps/powerpc/nofpu/libm-test-ulps          |   28 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   16 -
 sysdeps/riscv/rvd/libm-test-ulps              |   28 -
 sysdeps/s390/fpu/libm-test-ulps               |   28 -
 sysdeps/sh/libm-test-ulps                     |    6 -
 sysdeps/sparc/fpu/libm-test-ulps              |   28 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   29 +-
 sysdeps/x86_64/fpu/multiarch/Makefile         |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c     |   33 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c  |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c      |   33 +
 54 files changed, 14873 insertions(+), 1104 deletions(-)
 create mode 100644 benchtests/exp10m1f-inputs
 create mode 100644 benchtests/exp2m1f-inputs
 create mode 100644 benchtests/expm1f-inputs
 create mode 100644 benchtests/log10f-inputs
 create mode 100644 benchtests/log10p1f-inputs
 create mode 100644 benchtests/log1pf-inputs
 create mode 100644 benchtests/log2p1f-inputs
 delete mode 100644 sysdeps/i386/fpu/e_log10f.S
 delete mode 100644 sysdeps/i386/fpu/s_expm1f.S
 delete mode 100644 sysdeps/i386/fpu/s_log1pf.S
 create mode 100644 sysdeps/ieee754/flt-32/s_exp10m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_exp2m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log10p1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log2p1f.c
 rename sysdeps/{m68k/m680x0/fpu/math_errf.c => ieee754/flt-32/w_log1pf.c} (100%)
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_log1pf.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c
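
The cover letter mentions selecting an FMA build of exp10m1f and exp2m1f on x86_64 at run time. As a rough illustration of that idea only, the sketch below picks between two builds of one routine depending on CPU support. It is not the code from the series: glibc uses IFUNC, which the dynamic loader resolves, whereas this sketch swaps in a plain lazily initialized function pointer so that it stays portable. The names poly, poly_generic, poly_fma, and select_poly are hypothetical, and the polynomial is a placeholder rather than any real libm kernel.

```c
#include <stddef.h>

/* Hypothetical generic build: evaluates 1 + x + x*x/2 with separate
   multiplies and adds.  */
static float
poly_generic (float x)
{
  return 1.0f + x * (1.0f + 0.5f * x);
}

#if defined (__x86_64__) && defined (__GNUC__)
/* Hypothetical FMA build of the same routine.  The target attribute
   lets the compiler emit FMA instructions for this function only.  */
__attribute__ ((target ("fma")))
static float
poly_fma (float x)
{
  return __builtin_fmaf (x, __builtin_fmaf (0.5f, x, 1.0f), 1.0f);
}
#endif

typedef float (*poly_fn) (float);

/* Choose an implementation from the CPU features seen at run time.
   Anything other than x86_64 with a GNU-compatible compiler falls
   back to the generic build.  */
static poly_fn
select_poly (void)
{
#if defined (__x86_64__) && defined (__GNUC__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("fma"))
    return poly_fma;
#endif
  return poly_generic;
}

/* Public entry point: resolves the implementation on first use.
   Assumption: single-threaded first call; a real IFUNC resolver runs
   once at relocation time and does not have this concern.  */
float
poly (float x)
{
  static poly_fn impl;
  if (impl == NULL)
    impl = select_poly ();
  return impl (x);
}
```

Callers only see poly; which build runs is decided once per process. For inputs where every intermediate value is exactly representable (for instance 0.0f or 2.0f) both builds return the same result, while in general the FMA build may differ in the last bit because it rounds once per fused operation.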