From patchwork Mon Sep 16 08:37:14 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Will Newton
X-Patchwork-Id: 20318
Message-ID: <5236C33A.7080802@linaro.org>
Date: Mon, 16 Sep 2013 09:37:14 +0100
From: Will Newton
To: libc-ports@sourceware.org
CC: patches@linaro.org
Subject: [PATCH v4] ARM: Improve armv7 memcpy performance.

Only enter the aligned copy loop with buffers that can be 8-byte
aligned. This improves performance slightly on Cortex-A9 and
Cortex-A15 cores for large copies with buffers that are 4-byte
aligned but not 8-byte aligned.

ports/ChangeLog.arm:

2013-08-30  Will Newton

	* sysdeps/arm/armv7/multiarch/memcpy_impl.S: Tighten check
	on entry to aligned copy loop to improve performance.
---
 ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Changes in v4:
 - More comment fixes

The output of the cortex-strings benchmark can be found here (where "this"
is the new code and "old" is the previous version):

http://people.linaro.org/~will.newton/glibc_memcpy/

diff --git a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
index 3decad6..ad43a3d 100644
--- a/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
+++ b/ports/sysdeps/arm/armv7/multiarch/memcpy_impl.S
@@ -24,7 +24,6 @@
     ARMv6 (ARMv7-a if using Neon)
     ARM state
     Unaligned accesses
-    LDRD/STRD support unaligned word accesses
 
  */
 
@@ -369,8 +368,8 @@ ENTRY(memcpy)
 	cfi_adjust_cfa_offset (FRAME_SIZE)
 	cfi_rel_offset (tmp2, 0)
 	cfi_remember_state
-	and	tmp2, src, #3
-	and	tmp1, dst, #3
+	and	tmp2, src, #7
+	and	tmp1, dst, #7
 	cmp	tmp1, tmp2
 	bne	.Lcpy_notaligned
 
@@ -381,9 +380,9 @@ ENTRY(memcpy)
 	vmov.f32	s0, s0
 #endif
 
-	/* SRC and DST have the same mutual 32-bit alignment, but we may
+	/* SRC and DST have the same mutual 64-bit alignment, but we may
 	   still need to pre-copy some bytes to get to natural alignment.
-	   We bring DST into full 64-bit alignment. */
+	   We bring SRC and DST into full 64-bit alignment. */
 	lsls	tmp2, dst, #29
 	beq	1f
 	rsbs	tmp2, tmp2, #0
@@ -515,7 +514,7 @@ ENTRY(memcpy)
 
 .Ltail63aligned:			/* Count in tmp2. */
 	/* Copy up to 7 d-words of data. Similar to Ltail63unaligned, but
-	   we know that the src and dest are 32-bit aligned so we can use
+	   we know that the src and dest are 64-bit aligned so we can use
 	   LDRD/STRD to improve efficiency. */
 	/* TMP2 is now negative, but we don't care about that. The bottom
 	   six bits still tell us how many bytes are left to copy. */
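
For readers who prefer C to ARM assembly, the tightened entry check can be
sketched roughly as below. This is an illustrative sketch only, not code from
the patch or from glibc: the helper names same_alignment_mod8 and
bytes_to_align8 are hypothetical, and the second helper only approximates
what the "pre-copy some bytes" comment describes (the count of leading bytes
needed before DST reaches an 8-byte boundary).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: nonzero when DST and SRC sit at
   the same offset within an 8-byte block.  This mirrors the patched
   "and ..., #7" / "cmp" sequence; the old code masked with #3 instead.  */
static int
same_alignment_mod8 (const void *dst, const void *src)
{
  return ((uintptr_t) dst & 7) == ((uintptr_t) src & 7);
}

/* Hypothetical helper: number of bytes (0 to 7) that must be copied before
   DST reaches an 8-byte boundary.  When same_alignment_mod8 holds, SRC
   reaches its own 8-byte boundary after the same number of bytes.  */
static size_t
bytes_to_align8 (const void *dst)
{
  return (size_t) ((0 - (uintptr_t) dst) & 7);
}
```

With addresses such as dst = 0x1000 and src = 0x1004, a check against the low
two bits would treat the buffers as mutually aligned, while the check against
the low three bits does not, so such copies would no longer take the aligned
path.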