From patchwork Mon Nov 5 12:42:17 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dave Martin
X-Patchwork-Id: 12664
From: Dave Martin
To: linux-arm-kernel@lists.infradead.org
Cc: patches@linaro.org, Jussi Kivilinna, David McCullough, Herbert Xu
Subject: [PATCH] arm/crypto: Make asm SHA-1 and AES code Thumb-2 compatible
Date: Mon, 5 Nov 2012 12:42:17 +0000
Message-Id: <1352119337-22619-1-git-send-email-dave.martin@linaro.org>
X-Mailer: git-send-email 1.7.4.1
In-Reply-To: <20121030100742.17910.70241.stgit@localhost6.localdomain6>
References: <20121030100742.17910.70241.stgit@localhost6.localdomain6>

This patch fixes aes-armv4.S and sha1-armv4-large.S to work natively in
Thumb.  This allows ARM/Thumb interworking workarounds to be removed.

I also take the opportunity to convert some explicit assembler directives
for exported functions to the standard ENTRY()/ENDPROC().

For the code itself:

  * In sha1_block_data_order, use of TEQ with sp is deprecated in ARMv7
    and not supported in Thumb.  For the branches back to .L_00_15 and
    .L_40_59, the TEQ is converted to a CMP, under the assumption that
    clobbering the C flag here will not cause incorrect behaviour.

    For the first branch back to .L_20_39_or_60_79 the C flag is
    important, so sp is moved temporarily into another register so that
    TEQ can be used for the comparison.

  * In the AES code, most forms of register-indexed addressing with
    shifts and rotates are not permitted for loads and stores in Thumb,
    so the address calculation is done using a separate instruction for
    the Thumb case.

The resulting code is unlikely to be optimally scheduled, but it should
not have a large impact given the overall size of the code.  I haven't
run any benchmarks.

Signed-off-by: Dave Martin
Acked-by: Nicolas Pitre
Tested-by: David McCullough
Acked-by: David McCullough
---
For now, I have built the code but not tested it.  I'll consider the
patch an RFC until someone gives me a Tested-by (or failing that, when I
get around to testing it myself...)

Cheers
---Dave
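To make the two Thumb-2 limitations described in the bullets above concrete, here is a minimal standalone sketch. It is not taken from the patch: the file name, the demo label and the local ARM()/THUMB() definitions are simplified stand-ins for the kernel's versions from asm/unified.h. It shows how a scaled register-offset byte load and a TEQ against sp can be rewritten so the same source assembles for both instruction sets.

/*
 * thumb2-demo.S -- illustrative only, not part of the patch.
 * The ARM()/THUMB() defines below are simplified stand-ins for the
 * kernel's macros; build e.g. with:
 *   arm-linux-gnueabi-gcc -c -march=armv7-a -mthumb thumb2-demo.S
 */
	.syntax	unified

#ifdef __thumb2__
	.thumb
#define ARM(x...)
#define THUMB(x...)	x
#else
	.arm
#define ARM(x...)	x
#define THUMB(x...)
#endif

	.text
	.global	demo
	.type	demo, %function
demo:
	@ ARM can scale the index register inside the load itself; Thumb-2
	@ cannot, so form the address with a separate add, then load.
 ARM(	ldrb	r1, [r3, r1, lsr #24]	)
 THUMB(	add	r1, r3, r1, lsr #24	)
 THUMB(	ldrb	r1, [r1]		)

	@ TEQ with sp is deprecated in ARMv7 and not encodable in Thumb, so
	@ copy sp into a scratch register when the C flag must be preserved.
 ARM(	teq	lr, sp			)
 THUMB(	mov	r2, sp			)
 THUMB(	teq	lr, r2			)

	bx	lr
	.size	demo, . - demo

Keeping both variants side by side under the macros, as the patch does, avoids having to duplicate whole functions for a Thumb-2 kernel configuration.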
 arch/arm/crypto/aes-armv4.S        |   64 +++++++++++------------------------
 arch/arm/crypto/sha1-armv4-large.S |   24 +++++--------
 2 files changed, 29 insertions(+), 59 deletions(-)

diff --git a/arch/arm/crypto/aes-armv4.S b/arch/arm/crypto/aes-armv4.S
index e59b1d5..19d6cd6 100644
--- a/arch/arm/crypto/aes-armv4.S
+++ b/arch/arm/crypto/aes-armv4.S
@@ -34,8 +34,9 @@
 @ A little glue here to select the correct code below for the ARM CPU
 @ that is being targetted.
 
+#include <linux/linkage.h>
+
 .text
-.code	32
 
 .type	AES_Te,%object
 .align	5
@@ -145,10 +146,8 @@ AES_Te:
 
 @ void AES_encrypt(const unsigned char *in, unsigned char *out,
 @ 		 const AES_KEY *key) {
-.global AES_encrypt
-.type   AES_encrypt,%function
 .align	5
-AES_encrypt:
+ENTRY(AES_encrypt)
 	sub	r3,pc,#8		@ AES_encrypt
 	stmdb   sp!,{r1,r4-r12,lr}
 	mov	r12,r0		@ inp
@@ -239,15 +238,8 @@ AES_encrypt:
 	strb	r6,[r12,#14]
 	strb	r3,[r12,#15]
 #endif
-#if __ARM_ARCH__>=5
 	ldmia	sp!,{r4-r12,pc}
-#else
-	ldmia   sp!,{r4-r12,lr}
-	tst	lr,#1
-	moveq	pc,lr			@ be binary compatible with V4, yet
-	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-#endif
-.size	AES_encrypt,.-AES_encrypt
+ENDPROC(AES_encrypt)
 
 .type   _armv4_AES_encrypt,%function
 .align	2
@@ -386,10 +378,8 @@ _armv4_AES_encrypt:
 	ldr	pc,[sp],#4		@ pop and return
 .size	_armv4_AES_encrypt,.-_armv4_AES_encrypt
 
-.global private_AES_set_encrypt_key
-.type   private_AES_set_encrypt_key,%function
 .align	5
-private_AES_set_encrypt_key:
+ENTRY(private_AES_set_encrypt_key)
 _armv4_AES_set_encrypt_key:
 	sub	r3,pc,#8		@ AES_set_encrypt_key
 	teq	r0,#0
@@ -658,15 +648,11 @@ _armv4_AES_set_encrypt_key:
 
 .Ldone:	mov	r0,#0
 	ldmia   sp!,{r4-r12,lr}
-.Labrt:	tst	lr,#1
-	moveq	pc,lr			@ be binary compatible with V4, yet
-	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-.size	private_AES_set_encrypt_key,.-private_AES_set_encrypt_key
+.Labrt:	mov	pc,lr
+ENDPROC(private_AES_set_encrypt_key)
 
-.global private_AES_set_decrypt_key
-.type   private_AES_set_decrypt_key,%function
 .align	5
-private_AES_set_decrypt_key:
+ENTRY(private_AES_set_decrypt_key)
 	str	lr,[sp,#-4]!            @ push lr
 #if 0
 	@ kernel does both of these in setkey so optimise this bit out by
@@ -748,15 +734,8 @@ private_AES_set_decrypt_key:
 	bne	.Lmix
 
 	mov	r0,#0
-#if __ARM_ARCH__>=5
 	ldmia	sp!,{r4-r12,pc}
-#else
-	ldmia   sp!,{r4-r12,lr}
-	tst	lr,#1
-	moveq	pc,lr			@ be binary compatible with V4, yet
-	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-#endif
-.size	private_AES_set_decrypt_key,.-private_AES_set_decrypt_key
+ENDPROC(private_AES_set_decrypt_key)
 
 .type	AES_Td,%object
 .align	5
@@ -862,10 +841,8 @@ AES_Td:
 
 @ void AES_decrypt(const unsigned char *in, unsigned char *out,
 @ 		 const AES_KEY *key) {
-.global AES_decrypt
-.type   AES_decrypt,%function
 .align	5
-AES_decrypt:
+ENTRY(AES_decrypt)
 	sub	r3,pc,#8		@ AES_decrypt
 	stmdb   sp!,{r1,r4-r12,lr}
 	mov	r12,r0		@ inp
@@ -956,15 +933,8 @@ AES_decrypt:
 	strb	r6,[r12,#14]
 	strb	r3,[r12,#15]
 #endif
-#if __ARM_ARCH__>=5
 	ldmia	sp!,{r4-r12,pc}
-#else
-	ldmia   sp!,{r4-r12,lr}
-	tst	lr,#1
-	moveq	pc,lr			@ be binary compatible with V4, yet
-	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-#endif
-.size	AES_decrypt,.-AES_decrypt
+ENDPROC(AES_decrypt)
 
 .type   _armv4_AES_decrypt,%function
 .align	2
@@ -1064,7 +1034,9 @@ _armv4_AES_decrypt:
 	and	r9,lr,r1,lsr#8
 
 	ldrb	r7,[r10,r7]		@ Td4[s1>>0]
-	ldrb	r1,[r10,r1,lsr#24]	@ Td4[s1>>24]
+ ARM(	ldrb	r1,[r10,r1,lsr#24]  )	@ Td4[s1>>24]
+ THUMB(	add	r1,r10,r1,lsr#24   )	@ Td4[s1>>24]
+ THUMB(	ldrb	r1,[r1]		   )
 	ldrb	r8,[r10,r8]		@ Td4[s1>>16]
 	eor	r0,r7,r0,lsl#24
 	ldrb	r9,[r10,r9]		@ Td4[s1>>8]
@@ -1077,7 +1049,9 @@ _armv4_AES_decrypt:
 	ldrb	r8,[r10,r8]		@ Td4[s2>>0]
 	and	r9,lr,r2,lsr#16
 
-	ldrb	r2,[r10,r2,lsr#24]	@ Td4[s2>>24]
+ ARM(	ldrb	r2,[r10,r2,lsr#24]  )	@ Td4[s2>>24]
+ THUMB(	add	r2,r10,r2,lsr#24   )	@ Td4[s2>>24]
+ THUMB(	ldrb	r2,[r2]		   )
 	eor	r0,r0,r7,lsl#8
 	ldrb	r9,[r10,r9]		@ Td4[s2>>16]
 	eor	r1,r8,r1,lsl#16
@@ -1090,7 +1064,9 @@ _armv4_AES_decrypt:
 	and	r9,lr,r3		@ i2
 
 	ldrb	r9,[r10,r9]		@ Td4[s3>>0]
-	ldrb	r3,[r10,r3,lsr#24]	@ Td4[s3>>24]
+ ARM(	ldrb	r3,[r10,r3,lsr#24]  )	@ Td4[s3>>24]
+ THUMB(	add	r3,r10,r3,lsr#24   )	@ Td4[s3>>24]
+ THUMB(	ldrb	r3,[r3]		   )
 	eor	r0,r0,r7,lsl#16
 	ldr	r7,[r11,#0]
 	eor	r1,r1,r8,lsl#8
diff --git a/arch/arm/crypto/sha1-armv4-large.S b/arch/arm/crypto/sha1-armv4-large.S
index 7050ab1..92c6eed 100644
--- a/arch/arm/crypto/sha1-armv4-large.S
+++ b/arch/arm/crypto/sha1-armv4-large.S
@@ -51,13 +51,12 @@
 @ Profiler-assisted and platform-specific optimization resulted in 10%
 @ improvement on Cortex A8 core and 12.2 cycles per byte.
 
-.text
+#include <linux/linkage.h>
 
-.global	sha1_block_data_order
-.type	sha1_block_data_order,%function
+.text
 
 .align	2
-sha1_block_data_order:
+ENTRY(sha1_block_data_order)
 	stmdb	sp!,{r4-r12,lr}
 	add	r2,r1,r2,lsl#6	@ r2 to point at the end of r1
 	ldmia	r0,{r3,r4,r5,r6,r7}
@@ -194,7 +193,7 @@ sha1_block_data_order:
 	eor	r10,r10,r7,ror#2		@ F_00_19(B,C,D)
 	str	r9,[r14,#-4]!
 	add	r3,r3,r10			@ E+=F_00_19(B,C,D)
-	teq	r14,sp
+	cmp	r14,sp
 	bne	.L_00_15		@ [((11+4)*5+2)*3]
 #if __ARM_ARCH__<7
 	ldrb	r10,[r1,#2]
@@ -374,7 +373,9 @@ sha1_block_data_order:
 						@ F_xx_xx
 	add	r3,r3,r9			@ E+=X[i]
 	add	r3,r3,r10			@ E+=F_20_39(B,C,D)
-	teq	r14,sp			@ preserve carry
+ ARM(	teq	r14,sp		)	@ preserve carry
+ THUMB(	mov	r11,sp		)
+ THUMB(	teq	r14,r11		)	@ preserve carry
 	bne	.L_20_39_or_60_79	@ [+((12+3)*5+2)*4]
 	bcs	.L_done			@ [+((12+3)*5+2)*4], spare 300 bytes
 
@@ -466,7 +467,7 @@ sha1_block_data_order:
 	add	r3,r3,r9			@ E+=X[i]
 	add	r3,r3,r10			@ E+=F_40_59(B,C,D)
 	add	r3,r3,r11,ror#2
-	teq	r14,sp
+	cmp	r14,sp
 	bne	.L_40_59		@ [+((12+5)*5+2)*4]
 
 	ldr	r8,.LK_60_79
@@ -485,19 +486,12 @@ sha1_block_data_order:
 	teq	r1,r2
 	bne	.Lloop			@ [+18], total 1307
 
-#if __ARM_ARCH__>=5
 	ldmia	sp!,{r4-r12,pc}
-#else
-	ldmia	sp!,{r4-r12,lr}
-	tst	lr,#1
-	moveq	pc,lr			@ be binary compatible with V4, yet
-	.word	0xe12fff1e			@ interoperable with Thumb ISA:-)
-#endif
 .align	2
 .LK_00_19:	.word	0x5a827999
 .LK_20_39:	.word	0x6ed9eba1
 .LK_40_59:	.word	0x8f1bbcdc
 .LK_60_79:	.word	0xca62c1d6
-.size	sha1_block_data_order,.-sha1_block_data_order
+ENDPROC(sha1_block_data_order)
 .asciz	"SHA1 block transform for ARMv4, CRYPTOGAMS by "
 .align	2
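A closing note on the ENTRY()/ENDPROC() conversion mentioned in the change log: the sketch below is illustrative only and not part of the patch. The foo_old/foo_new names are made up, and it assumes a kernel build environment where <linux/linkage.h> is available. It contrasts the open-coded directives the imported OpenSSL sources used with the equivalent kernel linkage macros, which keep the .globl/.type/.size bookkeeping in one place.

/* linkage-demo.S -- illustrative only, not part of the patch. */
#include <linux/linkage.h>

	.text

/* Open-coded linkage, as the imported OpenSSL sources had it: */
	.align	5
	.global	foo_old
	.type	foo_old, %function
foo_old:
	bx	lr
	.size	foo_old, . - foo_old

/* The same function using the kernel macros: ENTRY() emits .globl,
 * alignment and the label; ENDPROC() marks the symbol as a function
 * and emits the matching .size. */
	.align	5
ENTRY(foo_new)
	bx	lr
ENDPROC(foo_new)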