[v2] ARM: Improve armv7 memcpy performance.

Message ID	5220B5A1.40903@linaro.org
State	Superseded
Headers	show Return-Path: <patchwork-forward+bncBCXPZFGDUEJBBJPLQKIQKGQE3QJ6LRQ@linaro.org> Received-SPF: neutral (google.com: 209.85.128.181 is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=209.85.128.181; Received-SPF: neutral (google.com: 74.125.83.51 is neither permitted nor denied by best guess record for domain of will.newton@linaro.org) client-ip=74.125.83.51; Message-ID: <5220B5A1.40903@linaro.org> Date: Fri, 30 Aug 2013 16:09:21 +0100 From: Will Newton <will.newton@linaro.org> User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130805 Thunderbird/17.0.8 MIME-Version: 1.0 To: libc-ports@sourceware.org CC: patches@linaro.org Subject: [PATCH v2] ARM: Improve armv7 memcpy performance. Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit

Message ID

5220B5A1.40903@linaro.org

State

Superseded

Headers

Received-SPF: neutral (google.com: 209.85.128.181 is neither permitted nor
	denied by best guess record for domain of
	patch+caf_=patchwork-forward=linaro.org@linaro.org)
	client-ip=209.85.128.181; 
Received-SPF: neutral (google.com: 74.125.83.51 is neither permitted nor
	denied by best guess record for domain of
	will.newton@linaro.org) client-ip=74.125.83.51; 
Message-ID: <5220B5A1.40903@linaro.org>
Date: Fri, 30 Aug 2013 16:09:21 +0100
From: Will Newton <will.newton@linaro.org>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
	rv:17.0) Gecko/20130805 Thunderbird/17.0.8
MIME-Version: 1.0
To: libc-ports@sourceware.org
CC: patches@linaro.org
Subject: [PATCH v2] ARM: Improve armv7 memcpy performance.
Precedence: list
Mailing-list: list patchwork-forward@linaro.org;
	contact patchwork-forward+owners@linaro.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

Commit Message

Will Newton Aug. 30, 2013, 3:09 p.m. UTC

Only enter the aligned copy loop with buffers that can be 8-byte
aligned. This improves performance slightly on Cortex-A9 and
Cortex-A15 cores for large copies with buffers that are 4-byte
aligned but not 8-byte aligned.

ports/ChangeLog.arm:

2013-08-30  Will Newton  <will.newton@linaro.org>

	* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
	on entry to aligned copy loop to improve performance.
---
 ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Changes in v2:
 - Improved description

Comments

Carlos O'Donell Aug. 30, 2013, 5:16 p.m. UTC | #1

On 08/30/2013 11:09 AM, Will Newton wrote:
> 
> Only enter the aligned copy loop with buffers that can be 8-byte
> aligned. This improves performance slightly on Cortex-A9 and
> Cortex-A15 cores for large copies with buffers that are 4-byte
> aligned but not 8-byte aligned.
> 
> ports/ChangeLog.arm:
> 
> 2013-08-30  Will Newton  <will.newton@linaro.org>
> 
> 	* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
> 	on entry to aligned copy loop to improve performance.

How did you test this?

Did you use the glibc performance microbenchmark?

Does the microbenchmark show gains with this change? What are the numbers?

Cheers,
Carlos.

> ---
>  ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Changes in v2:
>  - Improved description
> 
> diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> index 3decad6..6e84173 100644
> --- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> +++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
> @@ -369,8 +369,8 @@ ENTRY(memcpy)
>  	cfi_adjust_cfa_offset (FRAME_SIZE)
>  	cfi_rel_offset (tmp2, 0)
>  	cfi_remember_state
> -	and	tmp2, src, #3
> -	and	tmp1, dst, #3
> +	and	tmp2, src, #7
> +	and	tmp1, dst, #7
>  	cmp	tmp1, tmp2
>  	bne	.Lcpy_notaligned
>

diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
index 3decad6..6e84173 100644
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -369,8 +369,8 @@  ENTRY(memcpy)
 	cfi_adjust_cfa_offset (FRAME_SIZE)
 	cfi_rel_offset (tmp2, 0)
 	cfi_remember_state
-	and	tmp2, src, #3
-	and	tmp1, dst, #3
+	and	tmp2, src, #7
+	and	tmp1, dst, #7
 	cmp	tmp1, tmp2
 	bne	.Lcpy_notaligned

[v2] ARM: Improve armv7 memcpy performance.

Commit Message

Comments

Patch