[00/17] Add more CORE-MATH on libm


Message ID

20241025182614.2022697-1-adhemerval.zanella@linaro.org

Headers

Received-SPF: pass (google.com: domain of
 libc-alpha-bounces~patch=linaro.org@sourceware.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org CBD983858D21
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: libc-alpha@sourceware.org
Cc: Paul Zimmermann <Paul.Zimmermann@inria.fr>,
 Alexei Sibidanov <sibid@uvic.ca>
Subject: [PATCH 00/17] Add more CORE-MATH on libm
Date: Fri, 25 Oct 2024 15:21:38 -0300
Message-ID: <20241025182614.2022697-1-adhemerval.zanella@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: libc-alpha-bounces~patch=linaro.org@sourceware.org

Message

Adhemerval Zanella Oct. 25, 2024, 6:21 p.m. UTC

Following the tgammaf implementation (392b3f0971764) and its telling
performance improvement, I worked with Paul Zimmermann to check if we
can integrate more routines into glibc.

This patchset adds the optimized and correctly rounded exp10m1f,
exp2m1f, expm1f, log10f, log2p1f, log1pf, and log10p1f. I also added
a benchmark to evaluate each implementation.

I tested the implementations on recent hardware (Ryzen 9 5900X for
x86_64, Ampere/Neoverse for aarch64, and POWER10 for powerpc), and
most of them show impressive performance improvements. Like the
implementations from ARM optimized routines, the CORE-MATH ones take
advantage of recent ISA and platform support (like fma and rounding
instructions, along with FP throughput).
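
As background on why fma support matters for accuracy, the sketch below
(an illustration only, not code from this series or from CORE-MATH)
contrasts a multiply-add that rounds twice with one that rounds once.
The function names are hypothetical. The single-rounding variant is
emulated in double, which works here because the product of two floats
is exact in double and, for the inputs in the usage note, the sum is
exact as well; in general this emulation is not identical to fmaf.

```c
/* a*b + c with two roundings: the product is rounded to float before
   the addition.  The volatile store keeps the compiler from fusing the
   two operations into one fma instruction.  */
float
mul_add_two_roundings (float a, float b, float c)
{
  volatile float product = a * b;
  return product + c;
}

/* a*b + c with the product kept exact: two 24-bit significands multiply
   into at most 48 bits, which fits in the 53-bit significand of double.
   Only the final conversion rounds to float.  */
float
mul_add_emulated_fma (float a, float b, float c)
{
  double exact_product = (double) a * (double) b;
  return (float) (exact_product + (double) c);
}
```

With a = b = 0x1.001p0f and c = -0x1.002p0f under round-to-nearest, the
first variant loses the low bit of the product and returns 0, while the
second returns 0x1p-24.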

For two of the routines, exp10m1f and exp2m1f, CORE-MATH shows
slightly worse performance on x86_64-v1. This is because the glibc
generic implementation calls the optimized exp10f/exp2f; when a more
recent ISA is used (x86_64-v2 or x86_64-v3), CORE-MATH performs
better than the current implementation. For both cases I added
iFUNC support to use FMA on x86_64.
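
The idea of selecting an FMA variant at run time can be sketched with a
plain function pointer. This is only an analogue under stated
assumptions: glibc's IFUNC is resolved by the dynamic linker and uses
glibc-internal machinery that is not shown here, and every name below
(poly, poly_generic, poly_fma, select_poly) is hypothetical. The sketch
relies on the GCC/Clang builtins __builtin_cpu_init and
__builtin_cpu_supports and on the target("fma") function attribute; on
other targets it falls back to the generic variant.

```c
#include <stddef.h>

/* Generic variant of a toy routine, p(x) = 1 + x + x*x/2.  */
static float
poly_generic (float x)
{
  return 1.0f + x * (1.0f + 0.5f * x);
}

#if defined (__x86_64__) && defined (__GNUC__)
/* Same polynomial, built with FMA enabled for this function only.  */
__attribute__ ((target ("fma")))
static float
poly_fma (float x)
{
  float inner = __builtin_fmaf (0.5f, x, 1.0f);
  return __builtin_fmaf (x, inner, 1.0f);
}

static int
cpu_has_fma (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("fma") != 0;
}
#else
/* No run-time detection in this sketch for other targets.  */
# define poly_fma poly_generic
static int
cpu_has_fma (void)
{
  return 0;
}
#endif

typedef float (*poly_fn) (float);

/* Plays the role of an IFUNC resolver: pick one variant per process.  */
static poly_fn
select_poly (void)
{
  return cpu_has_fma () ? poly_fma : poly_generic;
}

/* Public entry point; the choice is made on the first call.  */
float
poly (float x)
{
  static poly_fn impl = NULL;
  if (impl == NULL)
    impl = select_poly ();
  return impl (x);
}
```

Callers only see poly; for example, poly (2.0f) evaluates to 5 with
either variant. A real IFUNC avoids the per-call pointer check because
the symbol is bound to the chosen variant at relocation time.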

Adhemerval Zanella (17):
  math: Add e_gammaf_r to glibc code and style
  benchtests: Add exp10m1f benchmark
  benchtests: Add exp2m1f benchmark
  benchtests: Add expm1f benchmark
  benchtests: Add log10f benchmark
  benchtests: Add log2p1f benchmark
  benchtests: Add log1p benchmark
  benchtests: Add log10p1f benchmark
  math: Use exp10m1f from CORE-MATH
  math: Use exp2m1f from CORE-MATH
  math: Use expm1f from CORE-MATH
  math: Use log10f from CORE-MATH
  math: Use log2p1f from CORE-MATH
  math: Use log1pf from CORE-MATH
  math: Use log10p1f from CORE-MATH
  x86_64: Add exp10m1f with FMA
  x86_64: Add exp2m1f with FMA

 SHARED-FILES                                  |   16 +
 benchtests/Makefile                           |    7 +
 benchtests/exp10m1f-inputs                    | 2389 ++++++++++++++
 benchtests/exp2m1f-inputs                     | 2388 ++++++++++++++
 benchtests/expm1f-inputs                      |  799 +++++
 benchtests/log10f-inputs                      | 1005 ++++++
 benchtests/log10p1f-inputs                    | 2888 +++++++++++++++++
 benchtests/log1pf-inputs                      | 1005 ++++++
 benchtests/log2p1f-inputs                     | 2888 +++++++++++++++++
 sysdeps/aarch64/libm-test-ulps                |   29 +-
 sysdeps/alpha/fpu/libm-test-ulps              |   12 -
 sysdeps/arc/fpu/libm-test-ulps                |   25 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   31 +-
 sysdeps/csky/fpu/libm-test-ulps               |   12 -
 sysdeps/csky/nofpu/libm-test-ulps             |   12 -
 sysdeps/hppa/fpu/libm-test-ulps               |   28 -
 sysdeps/i386/fpu/e_log10f.S                   |   66 -
 sysdeps/i386/fpu/libm-test-ulps               |   25 -
 sysdeps/i386/fpu/s_expm1f.S                   |  112 -
 sysdeps/i386/fpu/s_log1pf.S                   |   66 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   25 -
 sysdeps/ieee754/flt-32/e_gammaf_r.c           |  178 +-
 sysdeps/ieee754/flt-32/e_log10f.c             |  196 +-
 sysdeps/ieee754/flt-32/s_exp10m1f.c           |  227 ++
 sysdeps/ieee754/flt-32/s_exp2m1f.c            |  194 ++
 sysdeps/ieee754/flt-32/s_expm1f.c             |  232 +-
 sysdeps/ieee754/flt-32/s_log10p1f.c           |  182 ++
 sysdeps/ieee754/flt-32/s_log1pf.c             |  271 +-
 sysdeps/ieee754/flt-32/s_log2p1f.c            |  248 ++
 .../math_errf.c => ieee754/flt-32/w_log1pf.c} |    0
 sysdeps/loongarch/lp64/libm-test-ulps         |   28 -
 sysdeps/m68k/coldfire/fpu/libm-test-ulps      |    6 -
 sysdeps/m68k/m680x0/fpu/libm-test-ulps        |   12 -
 sysdeps/m68k/m680x0/fpu/w_log1pf.c            |   20 +
 sysdeps/microblaze/libm-test-ulps             |    3 -
 sysdeps/mips/mips32/libm-test-ulps            |   28 -
 sysdeps/mips/mips64/libm-test-ulps            |   28 -
 sysdeps/nios2/libm-test-ulps                  |    3 -
 sysdeps/or1k/fpu/libm-test-ulps               |    4 -
 sysdeps/or1k/nofpu/libm-test-ulps             |   12 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   29 +-
 sysdeps/powerpc/nofpu/libm-test-ulps          |   28 -
 sysdeps/riscv/nofpu/libm-test-ulps            |   16 -
 sysdeps/riscv/rvd/libm-test-ulps              |   28 -
 sysdeps/s390/fpu/libm-test-ulps               |   28 -
 sysdeps/sh/libm-test-ulps                     |    6 -
 sysdeps/sparc/fpu/libm-test-ulps              |   28 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   29 +-
 sysdeps/x86_64/fpu/multiarch/Makefile         |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c     |   33 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c  |    4 +
 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c      |   33 +
 54 files changed, 14873 insertions(+), 1104 deletions(-)
 create mode 100644 benchtests/exp10m1f-inputs
 create mode 100644 benchtests/exp2m1f-inputs
 create mode 100644 benchtests/expm1f-inputs
 create mode 100644 benchtests/log10f-inputs
 create mode 100644 benchtests/log10p1f-inputs
 create mode 100644 benchtests/log1pf-inputs
 create mode 100644 benchtests/log2p1f-inputs
 delete mode 100644 sysdeps/i386/fpu/e_log10f.S
 delete mode 100644 sysdeps/i386/fpu/s_expm1f.S
 delete mode 100644 sysdeps/i386/fpu/s_log1pf.S
 create mode 100644 sysdeps/ieee754/flt-32/s_exp10m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_exp2m1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log10p1f.c
 create mode 100644 sysdeps/ieee754/flt-32/s_log2p1f.c
 rename sysdeps/{m68k/m680x0/fpu/math_errf.c => ieee754/flt-32/w_log1pf.c} (100%)
 create mode 100644 sysdeps/m68k/m680x0/fpu/w_log1pf.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp10m1f.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_exp2m1f.c

Comments

Joseph Myers Oct. 25, 2024, 7:12 p.m. UTC | #1

On Fri, 25 Oct 2024, Adhemerval Zanella wrote:

> The CORE-MATH implementation is correctly rounded (for any rounding mode)
> and shows slight better performance to the generic log10pf.

This commit message should refer to log1pf, not log10pf.

Adhemerval Zanella Oct. 25, 2024, 8:07 p.m. UTC | #2

On 25/10/24 16:12, Joseph Myers wrote:
> On Fri, 25 Oct 2024, Adhemerval Zanella wrote:
> 
>> The CORE-MATH implementation is correctly rounded (for any rounding mode)
>> and shows slight better performance to the generic log10pf.
> 
> This commit message should refer to log1pf, not log10pf.
> 

Sigh, I thought I had it fixed. Thanks for pointing it out; Paul also
mentioned it to me privately.

Noah Goldstein Oct. 26, 2024, 6:28 p.m. UTC | #3

Whitespace issues in some of your patches:
```
Applying: math: Add e_gammaf_r to glibc code and style
Applying: benchtests: Add exp10m1f benchmark
Applying: benchtests: Add exp2m1f benchmark
Applying: benchtests: Add expm1f benchmark
Applying: benchtests: Add log10f benchmark
Applying: benchtests: Add log2p1f benchmark
Applying: benchtests: Add log1p benchmark
Applying: benchtests: Add log10p1f benchmark
Applying: math: Use exp10m1f from CORE-MATH
Applying: math: Use exp2m1f from CORE-MATH
Applying: math: Use expm1f from CORE-MATH
Applying: math: Use log2p1f from CORE-MATH
.git/rebase-apply/patch:402: space before tab in indent.
               {
.git/rebase-apply/patch:456: space before tab in indent.
           };
warning: 2 lines add whitespace errors.
Applying: math: Use log10p1f from CORE-MATH
.git/rebase-apply/patch:352: trailing whitespace.
        {
.git/rebase-apply/patch:366: space before tab in indent.
           {
warning: 2 lines add whitespace errors.
Applying: x86_64: Add exp10m1f with FMA
Applying: x86_64: Add exp2m1f with FMA
```
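
The two warning kinds in the log above can be modelled with a small
check. This is a simplified sketch of what git's blank-at-eol and
space-before-tab rules look for, not git's actual implementation; the
enum and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum ws_error
{
  WS_OK = 0,
  WS_TRAILING = 1 << 0,		/* "trailing whitespace" */
  WS_SPACE_BEFORE_TAB = 1 << 1	/* "space before tab in indent" */
};

/* Classify one added line, given without its terminating newline.
   Returns a bitmask of ws_error values.  */
int
check_whitespace (const char *line)
{
  int errors = WS_OK;
  size_t len = strlen (line);

  /* A space or tab as the last character is trailing whitespace.  */
  if (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
    errors |= WS_TRAILING;

  /* Walk the leading indent only; a tab seen after any space means
     some space sits immediately before a tab.  */
  int seen_space = 0;
  for (size_t i = 0; i < len && (line[i] == ' ' || line[i] == '\t'); i++)
    {
      if (line[i] == ' ')
	seen_space = 1;
      else if (seen_space)
	errors |= WS_SPACE_BEFORE_TAB;
    }

  return errors;
}
```

For instance, an indent of two spaces followed by a tab is flagged as
WS_SPACE_BEFORE_TAB, while a tab followed by two spaces is accepted.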

Adhemerval Zanella Oct. 28, 2024, 12:09 p.m. UTC | #4

On 26/10/24 15:28, Noah Goldstein wrote:
> Whitespace issues in some of your patches:
> ```
> Applying: math: Add e_gammaf_r to glibc code and style
> Applying: benchtests: Add exp10m1f benchmark
> Applying: benchtests: Add exp2m1f benchmark
> Applying: benchtests: Add expm1f benchmark
> Applying: benchtests: Add log10f benchmark
> Applying: benchtests: Add log2p1f benchmark
> Applying: benchtests: Add log1p benchmark
> Applying: benchtests: Add log10p1f benchmark
> Applying: math: Use exp10m1f from CORE-MATH
> Applying: math: Use exp2m1f from CORE-MATH
> Applying: math: Use expm1f from CORE-MATH
> Applying: math: Use log2p1f from CORE-MATH
> .git/rebase-apply/patch:402: space before tab in indent.
>                {
> .git/rebase-apply/patch:456: space before tab in indent.
>            };
> warning: 2 lines add whitespace errors.
> Applying: math: Use log10p1f from CORE-MATH
> .git/rebase-apply/patch:352: trailing whitespace.
>         {
> .git/rebase-apply/patch:366: space before tab in indent.
>            {
> warning: 2 lines add whitespace errors.
> Applying: x86_64: Add exp10m1f with FMA
> Applying: x86_64: Add exp2m1f with FMA
> ```

Thanks, I have fixed it locally.