[00/15] Add c23 CORE-MATH binary32 implementations to libm

Message ID 20250131191844.2582716-1-adhemerval.zanella@linaro.org

Message

Adhemerval Zanella Netto Jan. 31, 2025, 7:17 p.m. UTC
This patchset adds the optimized and correctly rounded acospif, asinpif,
atan2pif, atanpif, cospif, sinpif, and tanpif from CORE-MATH [1].  Each
implementation has a benchmark to evaluate the performance improvements.

All implementations show performance improvements in all but one case:
asinpif on x86_64/x86_64-v2. This is due to the use of an FMA operation
in the fast path. Only x86_64-v3 provides it as an instruction rather
than a function call, so this is mitigated with an ifunc variant for
x86_64-v3.
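As an illustration of the mitigation described above, here is a minimal sketch of runtime selection between a generic kernel and an FMA-enabled one. It is not the glibc ifunc machinery: it assumes the GCC/Clang builtins `__builtin_cpu_supports` and `__builtin_fmaf` and the `target` function attribute, and `kernel_generic`, `kernel_fma`, `select_kernel` and `dispatch_example` are hypothetical names whose placeholder arithmetic stands in for the real asinpif code.

```c
#include <stddef.h>

/* Hypothetical names throughout; placeholder arithmetic only.  */
typedef float (*kernel_fn) (float);

/* Baseline kernel: separate multiply and add, valid on any CPU.  */
static float
kernel_generic (float x)
{
  return x * 2.0f + 1.0f;
}

#if defined (__x86_64__) && defined (__GNUC__)
/* Same arithmetic with FMA enabled for this function only, so the
   fused multiply-add can be emitted as a single instruction.  */
__attribute__ ((target ("fma")))
static float
kernel_fma (float x)
{
  return __builtin_fmaf (x, 2.0f, 1.0f);
}
#endif

/* Pick a kernel from the CPU features reported at run time.  */
static kernel_fn
select_kernel (void)
{
#if defined (__x86_64__) && defined (__GNUC__)
  __builtin_cpu_init ();
  if (__builtin_cpu_supports ("fma"))
    return kernel_fma;
#endif
  return kernel_generic;
}

/* Lazily cache the selected kernel (not thread-safe; sketch only).  */
static float
dispatch_example (float x)
{
  static kernel_fn impl = NULL;
  if (impl == NULL)
    impl = select_kernel ();
  return impl (x);
}
```

A real ifunc differs in that the symbol is resolved by the dynamic linker, so callers do not pay for a pointer check on each call.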

[1] https://gitlab.inria.fr/core-math/core-math

Adhemerval Zanella (15):
  benchtests: Add acospif
  benchtests: Add asinpif
  benchtests: Add atan2pif
  benchtests: Add atanpif
  benchtests: Add cospif
  benchtests: Add sinpif
  benchtests: Add tanpif
  math: Use acospif from CORE-MATH
  math: Use asinpif from CORE-MATH
  math: Use atan2pif from CORE-MATH
  math: Use atanpif from CORE-MATH
  math: Use cospif from CORE-MATH
  math: Use sinpif from CORE-MATH
  math: Use tanpif from CORE-MATH
  x86_64: Add asinpif with FMA

 SHARED-FILES                                  |   28 +
 benchtests/Makefile                           |    7 +
 benchtests/acospif-inputs                     | 2710 +++++++++++++++++
 benchtests/asinpif-inputs                     | 2710 +++++++++++++++++
 benchtests/atan2pif-inputs                    | 2005 ++++++++++++
 benchtests/atanpif-inputs                     | 2005 ++++++++++++
 benchtests/cospif-inputs                      | 2409 +++++++++++++++
 benchtests/sinpif-inputs                      | 2409 +++++++++++++++
 benchtests/tanpif-inputs                      | 2409 +++++++++++++++
 sysdeps/aarch64/libm-test-ulps                |   28 -
 sysdeps/arc/fpu/libm-test-ulps                |   28 -
 sysdeps/arc/nofpu/libm-test-ulps              |    7 -
 sysdeps/arm/libm-test-ulps                    |   28 -
 sysdeps/hppa/fpu/libm-test-ulps               |   28 -
 sysdeps/i386/fpu/libm-test-ulps               |   28 -
 .../i386/i686/fpu/multiarch/libm-test-ulps    |   28 -
 sysdeps/ieee754/flt-32/math_config.h          |   25 +
 sysdeps/ieee754/flt-32/s_acospif.c            |  137 +
 sysdeps/ieee754/flt-32/s_asinpif.c            |  138 +
 sysdeps/ieee754/flt-32/s_atan2pif.c           |  238 ++
 sysdeps/ieee754/flt-32/s_atanpif.c            |  109 +
 sysdeps/ieee754/flt-32/s_cospif.c             |  136 +
 sysdeps/ieee754/flt-32/s_sinpif.c             |  134 +
 sysdeps/ieee754/flt-32/s_tanpif.c             |   88 +
 sysdeps/loongarch/lp64/libm-test-ulps         |   28 -
 sysdeps/mips/mips64/libm-test-ulps            |   28 -
 sysdeps/or1k/fpu/libm-test-ulps               |   28 -
 sysdeps/or1k/nofpu/libm-test-ulps             |    7 -
 sysdeps/powerpc/fpu/libm-test-ulps            |   28 -
 sysdeps/powerpc/fpu/math_private.h            |    1 +
 sysdeps/riscv/nofpu/libm-test-ulps            |    7 -
 sysdeps/riscv/rvd/libm-test-ulps              |   28 -
 sysdeps/s390/fpu/libm-test-ulps               |   28 -
 sysdeps/sparc/fpu/libm-test-ulps              |   28 -
 sysdeps/x86_64/fpu/libm-test-ulps             |   28 -
 sysdeps/x86_64/fpu/multiarch/Makefile         |    2 +
 sysdeps/x86_64/fpu/multiarch/s_asinpif-fma.c  |    4 +
 sysdeps/x86_64/fpu/multiarch/s_asinpif.c      |   33 +
 38 files changed, 17737 insertions(+), 413 deletions(-)
 create mode 100644 benchtests/acospif-inputs
 create mode 100644 benchtests/asinpif-inputs
 create mode 100644 benchtests/atan2pif-inputs
 create mode 100644 benchtests/atanpif-inputs
 create mode 100644 benchtests/cospif-inputs
 create mode 100644 benchtests/sinpif-inputs
 create mode 100644 benchtests/tanpif-inputs
 create mode 100644 sysdeps/ieee754/flt-32/s_acospif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_asinpif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_atan2pif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_atanpif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_cospif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_sinpif.c
 create mode 100644 sysdeps/ieee754/flt-32/s_tanpif.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_asinpif-fma.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_asinpif.c

Comments

Paul Zimmermann Feb. 1, 2025, 8:14 a.m. UTC | #1
> From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
> Cc: DJ Delorie <dj@redhat.com>,
> 	Joseph Myers <josmyers@redhat.com>,
> 	Paul Zimmermann <Paul.Zimmermann@inria.fr>,
> 	Alexei Sibidanov <sibid@uvic.ca>
> Date: Fri, 31 Jan 2025 16:17:04 -0300
> 
> This patchset adds the optimized and correctly rounded acospif, asinpif,
> atan2pif, atanpif, cospif, sinpif, and tanpif from CORE-MATH [1].  Each
> implementation has a benchmark to evaluate the performance improvements.
> 
> All implementation shows performance improvements in all but one case:
> asinpif on x86_64/x86_64-v2. This is due to the use of a fma operation
> in the fast patch. Only x86_64-v3 provides it without a function call,

patch -> path

I will check that each patch provides correct rounding in all rounding modes on x86_64.

Paul

> and this is mitigated with an ifunc variant for x86_64-v3.
> 
> [1] https://gitlab.inria.fr/core-math/core-math