From patchwork Mon Dec  4 12:26:42 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 120526
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
    Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
    Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
    Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
    Thomas Gleixner
Subject: [PATCH v2 16/19] crypto: arm64/aes-ghash - yield after processing
    fixed number of blocks
Date: Mon, 4 Dec 2017 12:26:42 +0000
Message-Id: <20171204122645.31535-17-ard.biesheuvel@linaro.org>
In-Reply-To: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
References: <20171204122645.31535-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

This updates both the core GHASH implementation and the AES-GCM algorithm
to yield each time after processing a fixed chunk of input. For the GCM
driver, we align with the other AES/CE block mode drivers and use a block
size of 64 bytes. The core GHASH transform is much shorter, so let's use a
block size of 128 bytes for that one.
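The chunking pattern described above can be sketched in C. Note that
maybe_yield(), YIELD_BLOCKS and process_blocks() are illustrative names
only, not kernel APIs: the real code below uses the series'
yield_neon_pre/yield_neon_post asm macros, and memcpy() here merely stands
in for the actual GHASH/GCM transform:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scheduler hook; here we only count how
 * often the loop offers to reschedule. */
static int yields;

static void maybe_yield(void)
{
    yields++;           /* real code would let the scheduler run */
}

#define YIELD_BLOCKS 8  /* e.g. 8 x 16-byte AES blocks = 128 bytes */

/* Process nblocks 16-byte blocks, offering to yield after every fixed
 * chunk so that worst-case scheduling latency stays bounded no matter
 * how large the request is. */
static void process_blocks(unsigned char *dst, const unsigned char *src,
                           size_t nblocks)
{
    while (nblocks) {
        size_t n = nblocks < YIELD_BLOCKS ? nblocks : YIELD_BLOCKS;

        memcpy(dst, src, n * 16);   /* placeholder for the transform */
        dst += n * 16;
        src += n * 16;
        nblocks -= n;

        if (nblocks)                /* more work queued: offer to yield */
            maybe_yield();
    }
}
```

Bounding the work done between yield checks is what keeps the NEON unit
from being held across an entire large request.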

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-core.S | 128 ++++++++++++++------
 1 file changed, 92 insertions(+), 36 deletions(-)

-- 
2.11.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 11ebf1ae248a..fbfd4681675d 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -212,23 +212,36 @@
 	ushr		XL.2d, XL.2d, #1
 	.endm

-	.macro		__pmull_ghash, pn
-	ld1		{SHASH.2d}, [x3]
-	ld1		{XL.2d}, [x1]
+	.macro		__pmull_ghash, pn, yield
+	stp		x29, x30, [sp, #-64]!
+	mov		x29, sp
+	stp		x19, x20, [sp, #16]
+	stp		x21, x22, [sp, #32]
+	str		x23, [sp, #48]
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+
+0:	ld1		{SHASH.2d}, [x22]
+	ld1		{XL.2d}, [x20]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b
 	__pmull_pre_\pn

 	/* do the head block first, if supplied */
-	cbz		x4, 0f
-	ld1		{T1.2d}, [x4]
-	b		1f
+	cbz		x23, 1f
+	ld1		{T1.2d}, [x23]
+	mov		x23, xzr
+	b		2f

-0:	ld1		{T1.2d}, [x2], #16
-	sub		w0, w0, #1
+1:	ld1		{T1.2d}, [x21], #16
+	sub		w19, w19, #1

-1:	/* multiply XL by SHASH in GF(2^128) */
+2:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)

 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +263,19 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b

-	cbnz		w0, 0b
+	cbz		w19, 3f

-	st1		{XL.2d}, [x1]
+	yield_neon_pre	w19, \yield, 1, 1b
+	st1		{XL.2d}, [x20]
+	yield_neon_post	0b
+
+	b		1b
+
+3:	st1		{XL.2d}, [x20]
+	ldp		x19, x20, [sp, #16]
+	ldp		x21, x22, [sp, #32]
+	ldr		x23, [sp, #48]
+	ldp		x29, x30, [sp], #64
 	ret
 	.endm

@@ -261,11 +284,11 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	 *			   struct ghash_key const *k, const char *head)
 	 */
 ENTRY(pmull_ghash_update_p64)
-	__pmull_ghash	p64
+	__pmull_ghash	p64, 5
 ENDPROC(pmull_ghash_update_p64)

 ENTRY(pmull_ghash_update_p8)
-	__pmull_ghash	p8
+	__pmull_ghash	p8, 2
 ENDPROC(pmull_ghash_update_p8)

 	KS		.req	v8
@@ -304,38 +327,56 @@ ENDPROC(pmull_ghash_update_p8)
 	.endm

 	.macro		pmull_gcm_do_crypt, enc
-	ld1		{SHASH.2d}, [x4]
-	ld1		{XL.2d}, [x1]
-	ldr		x8, [x5, #8]			// load lower counter
+	stp		x29, x30, [sp, #-96]!
+	mov		x29, sp
+	stp		x19, x20, [sp, #16]
+	stp		x21, x22, [sp, #32]
+	stp		x23, x24, [sp, #48]
+	stp		x25, x26, [sp, #64]
+	str		x27, [sp, #80]
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
+	mov		x25, x6
+	mov		x26, x7
+
+	ldr		x27, [x24, #8]			// load lower counter
+CPU_LE(	rev		x27, x27	)
+
+0:	ld1		{SHASH.2d}, [x23]
+	ld1		{XL.2d}, [x20]
 	movi		MASK.16b, #0xe1
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-CPU_LE(	rev		x8, x8		)
 	shl		MASK.2d, MASK.2d, #57
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	.if		\enc == 1
-	ld1		{KS.16b}, [x7]
+	ld1		{KS.16b}, [x26]
 	.endif

-0:	ld1		{CTR.8b}, [x5]			// load upper counter
-	ld1		{INP.16b}, [x3], #16
-	rev		x9, x8
-	add		x8, x8, #1
-	sub		w0, w0, #1
+1:	ld1		{CTR.8b}, [x24]			// load upper counter
+	ld1		{INP.16b}, [x22], #16
+	rev		x9, x27
+	add		x27, x27, #1
+	sub		w19, w19, #1
 	ins		CTR.d[1], x9			// set lower counter

 	.if		\enc == 1
 	eor		INP.16b, INP.16b, KS.16b	// encrypt input
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif

 	rev64		T1.16b, INP.16b

-	cmp		w6, #12
-	b.ge		2f				// AES-192/256?
+	cmp		w25, #12
+	b.ge		4f				// AES-192/256?

-1:	enc_round	CTR, v21
+2:	enc_round	CTR, v21

 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8

@@ -390,27 +431,42 @@ CPU_LE(	rev	x8, x8	)

 	.if		\enc == 0
 	eor		INP.16b, INP.16b, KS.16b
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif

-	cbnz		w0, 0b
+	cbz		w19, 3f

-CPU_LE(	rev		x8, x8		)
-	st1		{XL.2d}, [x1]
-	str		x8, [x5, #8]			// store lower counter
+	yield_neon_pre	w19, 8, 1, 1b		// yield every 8 blocks
+	st1		{XL.2d}, [x20]
+	.if		\enc == 1
+	st1		{KS.16b}, [x26]
+	.endif
+	yield_neon_post	0b
+
+	b		1b
+
+3:	st1		{XL.2d}, [x20]
 	.if		\enc == 1
-	st1		{KS.16b}, [x7]
+	st1		{KS.16b}, [x26]
 	.endif

+CPU_LE(	rev		x27, x27	)
+	str		x27, [x24, #8]			// store lower counter
+
+	ldp		x19, x20, [sp, #16]
+	ldp		x21, x22, [sp, #32]
+	ldp		x23, x24, [sp, #48]
+	ldp		x25, x26, [sp, #64]
+	ldr		x27, [sp, #80]
+	ldp		x29, x30, [sp], #96
 	ret

-2:	b.eq		3f				// AES-192?
+4:	b.eq		5f				// AES-192?
 	enc_round	CTR, v17
 	enc_round	CTR, v18
-3:	enc_round	CTR, v19
+5:	enc_round	CTR, v19
 	enc_round	CTR, v20
-	b		1b
+	b		2b
 	.endm

 	/*