From patchwork Mon Apr 30 16:18:21 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134711
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
	will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 01/10] crypto: arm64/sha1-ce - yield NEON after every
	block of input
Date: Mon, 30 Apr 2018 18:18:21 +0200
Message-Id: <20180430161830.14892-2-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
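The control flow this series introduces can be sketched in C (a hypothetical userspace model with invented names, not the kernel's actual `if_will_cond_yield_neon` machinery): instead of looping over all blocks while holding the NEON unit, the transform processes one block, checks whether a reschedule is pending, and if so spills its live state to memory, yields, and reloads the state before continuing.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's preemption machinery:
 * pretend a reschedule is pending before each of the first two yields. */
static int yields;
static bool need_resched(void) { return yields < 2; }
static void cond_yield_neon(void) { yields++; } /* NEON regs may be reused */

/* Model of sha1_ce_transform's new shape: one block per iteration,
 * with an optional spill/yield/reload between blocks. */
static void transform(int *state, const int *in, int blocks)
{
    int acc = *state;                  /* "load state" into a register */
    while (blocks > 0) {
        acc += *in++;                  /* stand-in for the SHA-1 rounds */
        blocks--;
        if (blocks && need_resched()) {
            *state = acc;              /* spill live state to memory */
            cond_yield_neon();         /* NEON contents now unreliable */
            acc = *state;              /* reload after resuming */
        }
    }
    *state = acc;                      /* store new state */
}
```

In the patch itself the spill is the `st1`/`str` pair before `do_cond_yield_neon`, and the reload happens by branching back to label `0:`, which re-runs the round-constant and state loads.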
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha1-ce-core.S | 42 ++++++++++++++------
 1 file changed, 29 insertions(+), 13 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S
index 46049850727d..78eb35fb5056 100644
--- a/arch/arm64/crypto/sha1-ce-core.S
+++ b/arch/arm64/crypto/sha1-ce-core.S
@@ -69,30 +69,36 @@
  *			  int blocks)
  */
 ENTRY(sha1_ce_transform)
+	frame_push	3
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+
 	/* load round constants */
-	loadrc	k0.4s, 0x5a827999, w6
+0:	loadrc	k0.4s, 0x5a827999, w6
 	loadrc	k1.4s, 0x6ed9eba1, w6
 	loadrc	k2.4s, 0x8f1bbcdc, w6
 	loadrc	k3.4s, 0xca62c1d6, w6
 
 	/* load state */
-	ld1	{dgav.4s}, [x0]
-	ldr	dgb, [x0, #16]
+	ld1	{dgav.4s}, [x19]
+	ldr	dgb, [x19, #16]
 
 	/* load sha1_ce_state::finalize */
 	ldr_l	w4, sha1_ce_offsetof_finalize, x4
-	ldr	w4, [x0, x4]
+	ldr	w4, [x19, x4]
 
 	/* load input */
-0:	ld1	{v8.4s-v11.4s}, [x1], #64
-	sub	w2, w2, #1
+1:	ld1	{v8.4s-v11.4s}, [x20], #64
+	sub	w21, w21, #1
 
 CPU_LE(	rev32	v8.16b, v8.16b	)
 CPU_LE(	rev32	v9.16b, v9.16b	)
 CPU_LE(	rev32	v10.16b, v10.16b	)
 CPU_LE(	rev32	v11.16b, v11.16b	)
 
-1:	add	t0.4s, v8.4s, k0.4s
+2:	add	t0.4s, v8.4s, k0.4s
 	mov	dg0v.16b, dgav.16b
 
 	add_update	c, ev, k0, 8, 9, 10, 11, dgb
@@ -123,16 +129,25 @@ CPU_LE(	rev32	v11.16b, v11.16b	)
 	add	dgbv.2s, dgbv.2s, dg1v.2s
 	add	dgav.4s, dgav.4s, dg0v.4s
 
-	cbnz	w2, 0b
+	cbz	w21, 3f
+
+	if_will_cond_yield_neon
+	st1	{dgav.4s}, [x19]
+	str	dgb, [x19, #16]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b
 
 	/*
 	 * Final block: add padding and total bit count.
 	 * Skip if the input size was not a round multiple of the block size,
 	 * the padding is handled by the C code in that case.
 	 */
-	cbz	x4, 3f
+3:	cbz	x4, 4f
 	ldr_l	w4, sha1_ce_offsetof_count, x4
-	ldr	x4, [x0, x4]
+	ldr	x4, [x19, x4]
 	movi	v9.2d, #0
 	mov	x8, #0x80000000
 	movi	v10.2d, #0
@@ -141,10 +156,11 @@ CPU_LE(	rev32	v11.16b, v11.16b	)
 	mov	x4, #0
 	mov	v11.d[0], xzr
 	mov	v11.d[1], x7
-	b	1b
+	b	2b
 
 	/* store new state */
-3:	st1	{dgav.4s}, [x0]
-	str	dgb, [x0, #16]
+4:	st1	{dgav.4s}, [x19]
+	str	dgb, [x19, #16]
+	frame_pop
 	ret
 ENDPROC(sha1_ce_transform)

From patchwork Mon Apr 30 16:18:22 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134712
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
	will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 02/10] crypto: arm64/sha2-ce - yield NEON after every
	block of input
Date: Mon, 30 Apr 2018 18:18:22 +0200
Message-Id: <20180430161830.14892-3-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha2-ce-core.S | 37 ++++++++++++++------
 1 file changed, 26 insertions(+), 11 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S
index 4c3c89b812ce..cd8b36412469 100644
--- a/arch/arm64/crypto/sha2-ce-core.S
+++ b/arch/arm64/crypto/sha2-ce-core.S
@@ -79,30 +79,36 @@
  */
 	.text
 ENTRY(sha2_ce_transform)
+	frame_push	3
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+
 	/* load round constants */
-	adr_l	x8, .Lsha2_rcon
+0:	adr_l	x8, .Lsha2_rcon
 	ld1	{ v0.4s- v3.4s}, [x8], #64
 	ld1	{ v4.4s- v7.4s}, [x8], #64
 	ld1	{ v8.4s-v11.4s}, [x8], #64
 	ld1	{v12.4s-v15.4s}, [x8]
 
 	/* load state */
-	ld1	{dgav.4s, dgbv.4s}, [x0]
+	ld1	{dgav.4s, dgbv.4s}, [x19]
 
 	/* load sha256_ce_state::finalize */
 	ldr_l	w4, sha256_ce_offsetof_finalize, x4
-	ldr	w4, [x0, x4]
+	ldr	w4, [x19, x4]
 
 	/* load input */
-0:	ld1	{v16.4s-v19.4s}, [x1], #64
-	sub	w2, w2, #1
+1:	ld1	{v16.4s-v19.4s}, [x20], #64
+	sub	w21, w21, #1
 
 CPU_LE(	rev32	v16.16b, v16.16b	)
 CPU_LE(	rev32	v17.16b, v17.16b	)
 CPU_LE(	rev32	v18.16b, v18.16b	)
 CPU_LE(	rev32	v19.16b, v19.16b	)
 
-1:	add	t0.4s, v16.4s, v0.4s
+2:	add	t0.4s, v16.4s, v0.4s
 	mov	dg0v.16b, dgav.16b
 	mov	dg1v.16b, dgbv.16b
 
@@ -131,16 +137,24 @@ CPU_LE(	rev32	v19.16b, v19.16b	)
 	add	dgbv.4s, dgbv.4s, dg1v.4s
 
 	/* handled all input blocks? */
-	cbnz	w2, 0b
+	cbz	w21, 3f
+
+	if_will_cond_yield_neon
+	st1	{dgav.4s, dgbv.4s}, [x19]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b
 
 	/*
 	 * Final block: add padding and total bit count.
 	 * Skip if the input size was not a round multiple of the block size,
 	 * the padding is handled by the C code in that case.
 	 */
-	cbz	x4, 3f
+3:	cbz	x4, 4f
 	ldr_l	w4, sha256_ce_offsetof_count, x4
-	ldr	x4, [x0, x4]
+	ldr	x4, [x19, x4]
 	movi	v17.2d, #0
 	mov	x8, #0x80000000
 	movi	v18.2d, #0
@@ -149,9 +163,10 @@ CPU_LE(	rev32	v19.16b, v19.16b	)
 	mov	x4, #0
 	mov	v19.d[0], xzr
 	mov	v19.d[1], x7
-	b	1b
+	b	2b
 
 	/* store new state */
-3:	st1	{dgav.4s, dgbv.4s}, [x0]
+4:	st1	{dgav.4s, dgbv.4s}, [x19]
+	frame_pop
 	ret
 ENDPROC(sha2_ce_transform)

From patchwork Mon Apr 30 16:18:23 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134713
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
	will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 03/10] crypto: arm64/aes-ccm - yield NEON after every
	block of input
Date: Mon, 30 Apr 2018 18:18:23 +0200
Message-Id: <20180430161830.14892-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
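The CCM patch follows the same yield pattern, but here the value that must survive the yield is the running CBC-MAC held in NEON register v0: it is stored to memory before `do_cond_yield_neon` and explicitly reloaded afterwards, because another NEON user may have run in between. A rough C model (invented names, not the kernel API):

```c
#include <string.h>

static int neon_handovers;

/* Hypothetical yield: model the NEON register file being handed to
 * another task and coming back with arbitrary contents. */
static void do_cond_yield(unsigned char reg[16])
{
    memset(reg, 0xff, 16);             /* another task's leftovers */
    neon_handovers++;
}

/* Model of ce_aes_ccm_auth_data's yield bracket: store the MAC,
 * yield, reload the MAC before processing the next block. */
static void ccm_auth(unsigned char mac[16], const unsigned char *in,
                     int blocks)
{
    unsigned char v0[16];              /* stand-in for NEON register v0 */
    memcpy(v0, mac, 16);
    while (blocks > 0) {
        for (int i = 0; i < 16; i++)
            v0[i] ^= *in++;            /* stand-in for one CBC-MAC step */
        blocks--;
        if (blocks) {
            memcpy(mac, v0, 16);       /* st1 {v0.16b}, [x19]: store mac */
            do_cond_yield(v0);         /* v0 is now garbage */
            memcpy(v0, mac, 16);       /* ld1 {v0.16b}, [x19]: reload mac */
        }
    }
    memcpy(mac, v0, 16);
}
```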
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-ce-ccm-core.S | 150 +++++++++++++-------
 1 file changed, 95 insertions(+), 55 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S
index e3a375c4cb83..88f5aef7934c 100644
--- a/arch/arm64/crypto/aes-ce-ccm-core.S
+++ b/arch/arm64/crypto/aes-ce-ccm-core.S
@@ -19,24 +19,33 @@
  *			 u32 *macp, u8 const rk[], u32 rounds);
  */
 ENTRY(ce_aes_ccm_auth_data)
-	ldr	w8, [x3]			/* leftover from prev round? */
+	frame_push	7
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+
+	ldr	w25, [x22]			/* leftover from prev round? */
 	ld1	{v0.16b}, [x0]			/* load mac */
-	cbz	w8, 1f
-	sub	w8, w8, #16
+	cbz	w25, 1f
+	sub	w25, w25, #16
 	eor	v1.16b, v1.16b, v1.16b
-0:	ldrb	w7, [x1], #1			/* get 1 byte of input */
-	subs	w2, w2, #1
-	add	w8, w8, #1
+0:	ldrb	w7, [x20], #1			/* get 1 byte of input */
+	subs	w21, w21, #1
+	add	w25, w25, #1
 	ins	v1.b[0], w7
 	ext	v1.16b, v1.16b, v1.16b, #1	/* rotate in the input bytes */
 	beq	8f				/* out of input? */
-	cbnz	w8, 0b
+	cbnz	w25, 0b
 	eor	v0.16b, v0.16b, v1.16b
-1:	ld1	{v3.4s}, [x4]			/* load first round key */
-	prfm	pldl1strm, [x1]
-	cmp	w5, #12				/* which key size? */
-	add	x6, x4, #16
-	sub	w7, w5, #2			/* modified # of rounds */
+1:	ld1	{v3.4s}, [x23]			/* load first round key */
+	prfm	pldl1strm, [x20]
+	cmp	w24, #12			/* which key size? */
+	add	x6, x23, #16
+	sub	w7, w24, #2			/* modified # of rounds */
 	bmi	2f
 	bne	5f
 	mov	v5.16b, v3.16b
@@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data)
 	ld1	{v5.4s}, [x6], #16		/* load next round key */
 	bpl	3b
 	aese	v0.16b, v4.16b
-	subs	w2, w2, #16			/* last data? */
+	subs	w21, w21, #16			/* last data? */
 	eor	v0.16b, v0.16b, v5.16b		/* final round */
 	bmi	6f
-	ld1	{v1.16b}, [x1], #16		/* load next input block */
+	ld1	{v1.16b}, [x20], #16		/* load next input block */
 	eor	v0.16b, v0.16b, v1.16b		/* xor with mac */
-	bne	1b
-6:	st1	{v0.16b}, [x0]			/* store mac */
+	beq	6f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x19]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x19]			/* reload mac */
+	endif_yield_neon
+
+	b	1b
+6:	st1	{v0.16b}, [x19]			/* store mac */
 	beq	10f
-	adds	w2, w2, #16
+	adds	w21, w21, #16
 	beq	10f
-	mov	w8, w2
-7:	ldrb	w7, [x1], #1
+	mov	w25, w21
+7:	ldrb	w7, [x20], #1
 	umov	w6, v0.b[0]
 	eor	w6, w6, w7
-	strb	w6, [x0], #1
-	subs	w2, w2, #1
+	strb	w6, [x19], #1
+	subs	w21, w21, #1
 	beq	10f
 	ext	v0.16b, v0.16b, v0.16b, #1	/* rotate out the mac bytes */
 	b	7b
-8:	mov	w7, w8
-	add	w8, w8, #16
+8:	mov	w7, w25
+	add	w25, w25, #16
 9:	ext	v1.16b, v1.16b, v1.16b, #1
 	adds	w7, w7, #1
 	bne	9b
 	eor	v0.16b, v0.16b, v1.16b
-	st1	{v0.16b}, [x0]
-10:	str	w8, [x3]
+	st1	{v0.16b}, [x19]
+10:	str	w25, [x22]
+
+	frame_pop
 	ret
 ENDPROC(ce_aes_ccm_auth_data)
 
@@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final)
 ENDPROC(ce_aes_ccm_final)
 
 	.macro	aes_ccm_do_crypt,enc
-	ldr	x8, [x6, #8]			/* load lower ctr */
-	ld1	{v0.16b}, [x5]			/* load mac */
-CPU_LE(	rev	x8, x8			)	/* keep swabbed ctr in reg */
+	frame_push	8
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+	mov	x23, x4
+	mov	x24, x5
+	mov	x25, x6
+
+	ldr	x26, [x25, #8]			/* load lower ctr */
+	ld1	{v0.16b}, [x24]			/* load mac */
+CPU_LE(	rev	x26, x26		)	/* keep swabbed ctr in reg */
 0:	/* outer loop */
-	ld1	{v1.8b}, [x6]			/* load upper ctr */
-	prfm	pldl1strm, [x1]
-	add	x8, x8, #1
-	rev	x9, x8
-	cmp	w4, #12				/* which key size? */
-	sub	w7, w4, #2			/* get modified # of rounds */
+	ld1	{v1.8b}, [x25]			/* load upper ctr */
+	prfm	pldl1strm, [x20]
+	add	x26, x26, #1
+	rev	x9, x26
+	cmp	w23, #12			/* which key size? */
+	sub	w7, w23, #2			/* get modified # of rounds */
 	ins	v1.d[1], x9			/* no carry in lower ctr */
-	ld1	{v3.4s}, [x3]			/* load first round key */
-	add	x10, x3, #16
+	ld1	{v3.4s}, [x22]			/* load first round key */
+	add	x10, x22, #16
 	bmi	1f
 	bne	4f
 	mov	v5.16b, v3.16b
@@ -165,9 +194,9 @@ CPU_LE(	rev	x8, x8		)	/* keep swabbed ctr in reg */
 	bpl	2b
 	aese	v0.16b, v4.16b
 	aese	v1.16b, v4.16b
-	subs	w2, w2, #16
-	bmi	6f				/* partial block? */
-	ld1	{v2.16b}, [x1], #16		/* load next input block */
+	subs	w21, w21, #16
+	bmi	7f				/* partial block? */
+	ld1	{v2.16b}, [x20], #16		/* load next input block */
 	.if	\enc == 1
 	eor	v2.16b, v2.16b, v5.16b		/* final round enc+mac */
 	eor	v1.16b, v1.16b, v2.16b		/* xor with crypted ctr */
@@ -176,18 +205,29 @@ CPU_LE(	rev	x8, x8		)	/* keep swabbed ctr in reg */
 	eor	v1.16b, v2.16b, v5.16b		/* final round enc */
 	.endif
 	eor	v0.16b, v0.16b, v2.16b		/* xor mac with pt ^ rk[last] */
-	st1	{v1.16b}, [x0], #16		/* write output block */
-	bne	0b
-CPU_LE(	rev	x8, x8			)
-	st1	{v0.16b}, [x5]			/* store mac */
-	str	x8, [x6, #8]			/* store lsb end of ctr (BE) */
-5:	ret
-
-6:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
+	st1	{v1.16b}, [x19], #16		/* write output block */
+	beq	5f
+
+	if_will_cond_yield_neon
+	st1	{v0.16b}, [x24]			/* store mac */
+	do_cond_yield_neon
+	ld1	{v0.16b}, [x24]			/* reload mac */
+	endif_yield_neon
+
+	b	0b
+5:
+CPU_LE(	rev	x26, x26		)
+	st1	{v0.16b}, [x24]			/* store mac */
+	str	x26, [x25, #8]			/* store lsb end of ctr (BE) */
+
+6:	frame_pop
+	ret
+
+7:	eor	v0.16b, v0.16b, v5.16b		/* final round mac */
 	eor	v1.16b, v1.16b, v5.16b		/* final round enc */
-	st1	{v0.16b}, [x5]			/* store mac */
-	add	w2, w2, #16			/* process partial tail block */
-7:	ldrb	w9, [x1], #1			/* get 1 byte of input */
+	st1	{v0.16b}, [x24]			/* store mac */
+	add	w21, w21, #16			/* process partial tail block */
+8:	ldrb	w9, [x20], #1			/* get 1 byte of input */
 	umov	w6, v1.b[0]			/* get top crypted ctr byte */
 	umov	w7, v0.b[0]			/* get top mac byte */
 	.if	\enc == 1
@@ -197,13 +237,13 @@ CPU_LE(	rev	x8, x8			)
 	eor	w9, w9, w6
 	eor	w7, w7, w9
 	.endif
-	strb	w9, [x0], #1			/* store out byte */
-	strb	w7, [x5], #1			/* store mac byte */
-	subs	w2, w2, #1
-	beq	5b
+	strb	w9, [x19], #1			/* store out byte */
+	strb	w7, [x24], #1			/* store mac byte */
+	subs	w21, w21, #1
+	beq	6b
 	ext	v0.16b, v0.16b, v0.16b, #1	/* shift out mac byte */
 	ext	v1.16b, v1.16b, v1.16b, #1	/* shift out ctr byte */
-	b	7b
+	b	8b
 	.endm
 
 /*

From patchwork Mon Apr 30 16:18:24 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134714
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
	will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 04/10] crypto: arm64/aes-blk - yield NEON after every
	block of input
Date: Mon, 30 Apr 2018 18:18:24 +0200
Message-Id: <20180430161830.14892-5-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
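The aes-blk patch has a slightly different shape from the hash patches: the expanded round keys live in NEON registers, so after a yield they are invalid and the loop cannot simply resume. Instead, `cond_yield_neon` takes a restart label (e.g. `.Lecbencrestart`) that re-runs `enc_prepare` to reload the keys. A hypothetical C model of that control flow (invented names):

```c
#include <stdbool.h>

static int yields;
static int key_loads;                       /* counts enc_prepare runs */
static bool need_resched(void) { return yields < 1; }
static void yield_neon(void) { yields++; }  /* "NEON" round keys clobbered */

/* Model of aes_ecb_encrypt: on yield, jump back to the restart label so
 * the round keys are reloaded before any further blocks are processed. */
static void ecb_encrypt(int *out, const int *in, int key, int blocks)
{
    int rk;                                 /* stand-in for key registers */
restart:                                    /* .Lecbencrestart */
    rk = key * 2;                           /* enc_prepare: expand keys */
    key_loads++;
    while (blocks > 0) {
        *out++ = *in++ ^ rk;                /* stand-in for one AES block */
        blocks--;
        if (blocks && need_resched()) {
            yield_neon();                   /* rk is now invalid */
            goto restart;                   /* reload keys, then continue */
        }
    }
}
```

Note that the in/out pointers and block count are not reset by the restart; in the patch they survive the yield precisely because they are moved into callee-saved registers (x19 and up), which is why the whole file is renumbered.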
Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce.S | 15 +- arch/arm64/crypto/aes-modes.S | 331 ++++++++++++-------- 2 files changed, 216 insertions(+), 130 deletions(-) -- 2.17.0 diff --git a/arch/arm64/crypto/aes-ce.S b/arch/arm64/crypto/aes-ce.S index 50330f5c3adc..623e74ed1c67 100644 --- a/arch/arm64/crypto/aes-ce.S +++ b/arch/arm64/crypto/aes-ce.S @@ -30,18 +30,21 @@ .endm /* prepare for encryption with key in rk[] */ - .macro enc_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for encryption (again) but with new key in rk[] */ - .macro enc_switch_key, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro enc_switch_key, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm /* prepare for decryption with key in rk[] */ - .macro dec_prepare, rounds, rk, ignore - load_round_keys \rounds, \rk + .macro dec_prepare, rounds, rk, temp + mov \temp, \rk + load_round_keys \rounds, \temp .endm .macro do_enc_Nx, de, mc, k, i0, i1, i2, i3 diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index a68412e1e3a4..483a7130cf0e 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -14,12 +14,12 @@ .align 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + encrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 + decrypt_block4x v0, v1, v2, v3, w22, x21, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -31,57 +31,71 @@ ENDPROC(aes_decrypt_block4x) */ AES_ENTRY(aes_ecb_encrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 5 - enc_prepare w3, x2, x5 + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + +.Lecbencrestart: + enc_prepare w22, x21, x5 .LecbencloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ bl aes_encrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + cond_yield_neon .Lecbencrestart b .LecbencloopNx .Lecbenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbencout .Lecbencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ - encrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next pt block */ + encrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbencloop .Lecbencout: - ldp x29, x30, [sp], #16 + frame_pop ret AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 - dec_prepare w3, x2, x5 +.Lecbdecrestart: + dec_prepare w22, x21, x5 .LecbdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lecbdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ bl aes_decrypt_block4x - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + cond_yield_neon .Lecbdecrestart b .LecbdecloopNx .Lecbdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lecbdecout .Lecbdecloop: - ld1 {v0.16b}, [x1], #16 /* get next ct block */ - decrypt_block v0, w3, x2, x5, w6 - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + ld1 {v0.16b}, [x20], #16 /* get next ct block */ + decrypt_block v0, w22, x21, x5, w6 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lecbdecloop .Lecbdecout: - ldp x29, x30, [sp], #16 + frame_pop ret AES_ENDPROC(aes_ecb_decrypt) @@ -94,78 +108,100 @@ AES_ENDPROC(aes_ecb_decrypt) */ 
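The aes_cbc_encrypt entry point that follows XORs each plaintext block with the previous ciphertext block (the IV for the first one) before encrypting it, and stores the last ciphertext block back as the next IV. A mode-level sketch of that chaining — the 16-byte `enc_block`/`dec_block` callables are stand-ins for the real AES primitive, purely to show the structure:

```python
BLOCK = 16

def xor_block(a, b):
    # XOR two 16-byte blocks (the "eor v.16b" steps in the assembly)
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(iv, data, enc_block):
    """CBC encryption as structured in aes_cbc_encrypt: XOR each
    plaintext block with the previous ciphertext (IV first), encrypt,
    and return the last ciphertext block as the next IV."""
    assert len(data) % BLOCK == 0
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        prev = enc_block(xor_block(data[i:i + BLOCK], prev))
        out += prev
    return bytes(out), prev  # ciphertext, IV for the next call

def cbc_decrypt(iv, data, dec_block):
    assert len(data) % BLOCK == 0
    out, prev = bytearray(), iv
    for i in range(0, len(data), BLOCK):
        ct = data[i:i + BLOCK]
        out += xor_block(dec_block(ct), prev)
        prev = ct  # ciphertext is the next block's chaining value
    return bytes(out), prev
```

Returning the IV on every iteration (not just at the end) is what lets the assembly resume correctly after a yield.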
AES_ENTRY(aes_cbc_encrypt) - ld1 {v4.16b}, [x5] /* get iv */ - enc_prepare w3, x2, x6 + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lcbcencrestart: + ld1 {v4.16b}, [x24] /* get iv */ + enc_prepare w22, x21, x6 .Lcbcencloop4x: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ eor v0.16b, v0.16b, v4.16b /* ..and xor with iv */ - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w22, x21, x6, w7 eor v1.16b, v1.16b, v0.16b - encrypt_block v1, w3, x2, x6, w7 + encrypt_block v1, w22, x21, x6, w7 eor v2.16b, v2.16b, v1.16b - encrypt_block v2, w3, x2, x6, w7 + encrypt_block v2, w22, x21, x6, w7 eor v3.16b, v3.16b, v2.16b - encrypt_block v3, w3, x2, x6, w7 - st1 {v0.16b-v3.16b}, [x0], #64 + encrypt_block v3, w22, x21, x6, w7 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v3.16b + st1 {v4.16b}, [x24] /* return iv */ + cond_yield_neon .Lcbcencrestart b .Lcbcencloop4x .Lcbcenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcencout .Lcbcencloop: - ld1 {v0.16b}, [x1], #16 /* get next pt block */ + ld1 {v0.16b}, [x20], #16 /* get next pt block */ eor v4.16b, v4.16b, v0.16b /* ..and xor with iv */ - encrypt_block v4, w3, x2, x6, w7 - st1 {v4.16b}, [x0], #16 - subs w4, w4, #1 + encrypt_block v4, w22, x21, x6, w7 + st1 {v4.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcencloop .Lcbcencout: - st1 {v4.16b}, [x5] /* return iv */ + st1 {v4.16b}, [x24] /* return iv */ + frame_pop ret AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 - ld1 {v7.16b}, [x5] /* get iv */ - dec_prepare w3, x2, x6 +.Lcbcdecrestart: + ld1 {v7.16b}, [x24] /* get iv */ + dec_prepare w22, x21, x6 .LcbcdecloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lcbcdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ mov v4.16b, v0.16b mov v5.16b, v1.16b mov v6.16b, v2.16b bl aes_decrypt_block4x - sub x1, x1, #16 + sub x20, x20, #16 eor v0.16b, v0.16b, v7.16b eor v1.16b, v1.16b, v4.16b - ld1 {v7.16b}, [x1], #16 /* reload 1 ct block */ + ld1 {v7.16b}, [x20], #16 /* reload 1 ct block */ eor v2.16b, v2.16b, v5.16b eor v3.16b, v3.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 + st1 {v7.16b}, [x24] /* return iv */ + cond_yield_neon .Lcbcdecrestart b .LcbcdecloopNx .Lcbcdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lcbcdecout .Lcbcdecloop: - ld1 {v1.16b}, [x1], #16 /* get next ct block */ + ld1 {v1.16b}, [x20], #16 /* get next ct block */ mov v0.16b, v1.16b /* ...and copy to v0 */ - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w22, x21, x6, w7 eor v0.16b, v0.16b, v7.16b /* xor with iv => pt */ mov v7.16b, v1.16b /* ct is next iv */ - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 bne .Lcbcdecloop .Lcbcdecout: - st1 {v7.16b}, [x5] /* return iv */ - ldp x29, x30, [sp], #16 + st1 {v7.16b}, [x24] /* return iv */ + frame_pop ret AES_ENDPROC(aes_cbc_decrypt) @@ -176,19 +212,26 @@ AES_ENDPROC(aes_cbc_decrypt) */ AES_ENTRY(aes_ctr_encrypt) - stp x29, x30, [sp, #-16]! - mov x29, sp + frame_push 6 - enc_prepare w3, x2, x6 - ld1 {v4.16b}, [x5] + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +.Lctrrestart: + enc_prepare w22, x21, x6 + ld1 {v4.16b}, [x24] umov x6, v4.d[1] /* keep swabbed ctr in reg */ rev x6, x6 - cmn w6, w4 /* 32 bit overflow? 
*/ - bcs .Lctrloop .LctrloopNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lctr1x + cmn w6, #4 /* 32 bit overflow? */ + bcs .Lctr1x ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ dup v7.4s, w6 mov v0.16b, v4.16b @@ -200,25 +243,27 @@ AES_ENTRY(aes_ctr_encrypt) mov v1.s[3], v8.s[0] mov v2.s[3], v8.s[1] mov v3.s[3], v8.s[2] - ld1 {v5.16b-v7.16b}, [x1], #48 /* get 3 input blocks */ + ld1 {v5.16b-v7.16b}, [x20], #48 /* get 3 input blocks */ bl aes_encrypt_block4x eor v0.16b, v5.16b, v0.16b - ld1 {v5.16b}, [x1], #16 /* get 1 input block */ + ld1 {v5.16b}, [x20], #16 /* get 1 input block */ eor v1.16b, v6.16b, v1.16b eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 add x6, x6, #4 rev x7, x6 ins v4.d[1], x7 - cbz w4, .Lctrout + cbz w23, .Lctrout + st1 {v4.16b}, [x24] /* return next CTR value */ + cond_yield_neon .Lctrrestart b .LctrloopNx .Lctr1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lctrout .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 adds x6, x6, #1 /* increment BE ctr */ rev x7, x6 @@ -226,22 +271,22 @@ AES_ENTRY(aes_ctr_encrypt) bcs .Lctrcarry /* overflow? */ .Lctrcarrydone: - subs w4, w4, #1 + subs w23, w23, #1 bmi .Lctrtailblock /* blocks <0 means tail block */ - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 eor v3.16b, v0.16b, v3.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 bne .Lctrloop .Lctrout: - st1 {v4.16b}, [x5] /* return next CTR value */ - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] /* return next CTR value */ +.Lctrret: + frame_pop ret .Lctrtailblock: - st1 {v0.16b}, [x0] - ldp x29, x30, [sp], #16 - ret + st1 {v0.16b}, [x19] + b .Lctrret .Lctrcarry: umov x7, v4.d[0] /* load upper word of ctr */ @@ -274,10 +319,16 @@ CPU_LE( .quad 1, 0x87 ) CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 - ld1 {v4.16b}, [x6] + ld1 {v4.16b}, [x24] cbz w7, .Lxtsencnotfirst enc_prepare w3, x5, x8 @@ -286,15 +337,17 @@ AES_ENTRY(aes_xts_encrypt) ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencrestart: + ld1 {v4.16b}, [x24] .Lxtsencnotfirst: - enc_prepare w3, x2, x8 + enc_prepare w22, x21, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsencNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsenc1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 pt blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 pt blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -307,35 +360,43 @@ AES_ENTRY(aes_xts_encrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsencout + cbz w23, .Lxtsencout + st1 {v4.16b}, [x24] + cond_yield_neon .Lxtsencrestart b .LxtsencloopNx .Lxtsenc1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsencout .Lxtsencloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x8, w7 + encrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsencout next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + frame_pop ret AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) - stp x29, x30, [sp, #-16]! 
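Both XTS entry points above advance the tweak with the next_tweak macro and the .Lxts_mul_x constant (CPU_LE: .quad 1, 0x87), i.e. a carry-folding doubling in GF(2^128). The arithmetic, as a self-contained sketch:

```python
def next_tweak(tweak: bytes) -> bytes:
    """Multiply a 128-bit XTS tweak by x in GF(2^128).

    This is what the next_tweak macro computes with .Lxts_mul_x:
    shift the little-endian tweak left one bit and, if a bit fell off
    the top, fold it back in by XORing 0x87 (x^7 + x^2 + x + 1) into
    the low byte.
    """
    t = int.from_bytes(tweak, "little")
    carry = t >> 127
    t = (t << 1) & ((1 << 128) - 1)
    if carry:
        t ^= 0x87
    return t.to_bytes(16, "little")
```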
- mov x29, sp + frame_push 6 - ld1 {v4.16b}, [x6] + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v4.16b}, [x24] cbz w7, .Lxtsdecnotfirst enc_prepare w3, x5, x8 @@ -344,15 +405,17 @@ AES_ENTRY(aes_xts_decrypt) ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecrestart: + ld1 {v4.16b}, [x24] .Lxtsdecnotfirst: - dec_prepare w3, x2, x8 + dec_prepare w22, x21, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 .LxtsdecNx: - subs w4, w4, #4 + subs w23, w23, #4 bmi .Lxtsdec1x - ld1 {v0.16b-v3.16b}, [x1], #64 /* get 4 ct blocks */ + ld1 {v0.16b-v3.16b}, [x20], #64 /* get 4 ct blocks */ next_tweak v5, v4, v7, v8 eor v0.16b, v0.16b, v4.16b next_tweak v6, v5, v7, v8 @@ -365,26 +428,28 @@ AES_ENTRY(aes_xts_decrypt) eor v0.16b, v0.16b, v4.16b eor v1.16b, v1.16b, v5.16b eor v2.16b, v2.16b, v6.16b - st1 {v0.16b-v3.16b}, [x0], #64 + st1 {v0.16b-v3.16b}, [x19], #64 mov v4.16b, v7.16b - cbz w4, .Lxtsdecout + cbz w23, .Lxtsdecout + st1 {v4.16b}, [x24] + cond_yield_neon .Lxtsdecrestart b .LxtsdecloopNx .Lxtsdec1x: - adds w4, w4, #4 + adds w23, w23, #4 beq .Lxtsdecout .Lxtsdecloop: - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x8, w7 + decrypt_block v0, w22, x21, x8, w7 eor v0.16b, v0.16b, v4.16b - st1 {v0.16b}, [x0], #16 - subs w4, w4, #1 + st1 {v0.16b}, [x19], #16 + subs w23, w23, #1 beq .Lxtsdecout next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: - st1 {v4.16b}, [x6] - ldp x29, x30, [sp], #16 + st1 {v4.16b}, [x24] + frame_pop ret AES_ENDPROC(aes_xts_decrypt) @@ -393,43 +458,61 @@ AES_ENDPROC(aes_xts_decrypt) * int blocks, u8 dg[], int enc_before, int enc_after) */ AES_ENTRY(aes_mac_update) - ld1 {v0.16b}, [x4] /* get dg */ + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x6 + + ld1 {v0.16b}, [x23] /* get dg */ enc_prepare w2, x1, x7 cbz w5, .Lmacloop4x encrypt_block v0, w2, x1, x7, w8 .Lmacloop4x: - subs w3, w3, #4 + 
subs w22, w22, #4 bmi .Lmac1x - ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + ld1 {v1.16b-v4.16b}, [x19], #64 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v2.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v3.16b - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 eor v0.16b, v0.16b, v4.16b - cmp w3, wzr - csinv x5, x6, xzr, eq + cmp w22, wzr + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 + encrypt_block v0, w21, x20, x7, w8 + st1 {v0.16b}, [x23] /* return dg */ + cond_yield_neon .Lmacrestart b .Lmacloop4x .Lmac1x: - add w3, w3, #4 + add w22, w22, #4 .Lmacloop: - cbz w3, .Lmacout - ld1 {v1.16b}, [x0], #16 /* get next pt block */ + cbz w22, .Lmacout + ld1 {v1.16b}, [x19], #16 /* get next pt block */ eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ - subs w3, w3, #1 - csinv x5, x6, xzr, eq + subs w22, w22, #1 + csinv x5, x24, xzr, eq cbz w5, .Lmacout - encrypt_block v0, w2, x1, x7, w8 +.Lmacenc: + encrypt_block v0, w21, x20, x7, w8 b .Lmacloop .Lmacout: - st1 {v0.16b}, [x4] /* return dg */ + st1 {v0.16b}, [x23] /* return dg */ + frame_pop ret + +.Lmacrestart: + ld1 {v0.16b}, [x23] /* get dg */ + enc_prepare w21, x20, x0 + b .Lmacloop4x AES_ENDPROC(aes_mac_update)

From patchwork Mon Apr 30 16:18:25 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134715
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 05/10] crypto: arm64/aes-bs - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:25 +0200
Message-Id: <20180430161830.14892-6-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-neonbs-core.S | 305 +++++++++++--------- 1 file changed, 170 insertions(+), 135 deletions(-) -- 2.17.0 diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S index ca0472500433..e613a87f8b53 100644 --- a/arch/arm64/crypto/aes-neonbs-core.S +++ b/arch/arm64/crypto/aes-neonbs-core.S @@ -565,54 +565,61 @@ ENDPROC(aesbs_decrypt8) * int blocks) */ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-16]!
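The 99: loop of __ecb_crypt handles a partial final group with a bit trick: when fewer than eight blocks remain it sets x5 = 1 << remaining, and each `tbnz x5, #k` in the load/store ladders bails out after exactly that many registers; with eight or more left, x5 is zeroed and all eight slots are used. A sketch of that dispatch logic (`crypt8` stands in for the aesbs_encrypt8/aesbs_decrypt8 call):

```python
def ecb_crypt8(blocks, crypt8):
    """Consume up to 8 blocks per iteration, as __ecb_crypt does."""
    out = []
    remaining = len(blocks)
    pos = 0
    while remaining > 0:
        # mov x5, #1; lsl x5, x5, x23; csel x5, x5, xzr, mi
        mask = 0 if remaining >= 8 else 1 << remaining
        n = 8
        for k in range(1, 8):
            if (mask >> k) & 1:  # tbnz x5, #k: stop the ladder here
                n = k
                break
        out.extend(crypt8(blocks[pos:pos + n]))
        pos += n
        remaining = max(remaining - 8, 0)  # csel x23, x23, xzr, pl
    return out
```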
- mov x29, sp + frame_push 5 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 99: mov x5, #1 - lsl x5, x5, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x5, x5, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x5, x5, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 tbnz x5, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 tbnz x5, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 tbnz x5, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 tbnz x5, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 tbnz x5, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 tbnz x5, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 tbnz x5, #7, 0f - ld1 {v7.16b}, [x1], #16 + ld1 {v7.16b}, [x20], #16 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl \do8 - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 tbnz x5, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 tbnz x5, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 tbnz x5, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 tbnz x5, #4, 1f - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x5, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x5, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x5, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 - cbnz x4, 99b + cbz x23, 1f + cond_yield_neon + b 99b -1: ldp x29, x30, [sp], #16 +1: frame_pop ret .endm @@ -632,43 +639,49 @@ ENDPROC(aesbs_ecb_decrypt) */ .align 4 ENTRY(aesbs_cbc_decrypt) - stp x29, x30, [sp, #-16]! 
- mov x29, sp + frame_push 6 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 99: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 mov v25.16b, v0.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 mov v26.16b, v1.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 mov v27.16b, v2.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 mov v28.16b, v3.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 + ld1 {v4.16b}, [x20], #16 mov v29.16b, v4.16b tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 + ld1 {v5.16b}, [x20], #16 mov v30.16b, v5.16b tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 + ld1 {v6.16b}, [x20], #16 mov v31.16b, v6.16b tbnz x6, #7, 0f - ld1 {v7.16b}, [x1] + ld1 {v7.16b}, [x20] -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_decrypt8 - ld1 {v24.16b}, [x5] // load IV + ld1 {v24.16b}, [x24] // load IV eor v1.16b, v1.16b, v25.16b eor v6.16b, v6.16b, v26.16b @@ -679,34 +692,36 @@ ENTRY(aesbs_cbc_decrypt) eor v3.16b, v3.16b, v30.16b eor v5.16b, v5.16b, v31.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 mov v24.16b, v25.16b tbnz x6, #1, 1f - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 mov v24.16b, v26.16b tbnz x6, #2, 1f - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 mov v24.16b, v27.16b tbnz x6, #3, 1f - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 mov v24.16b, v28.16b tbnz x6, #4, 1f - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 mov v24.16b, v29.16b tbnz x6, #5, 1f - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 mov v24.16b, v30.16b tbnz x6, #6, 1f - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 mov v24.16b, v31.16b tbnz x6, #7, 1f - ld1 {v24.16b}, [x1], #16 - st1 {v5.16b}, [x0], #16 -1: st1 {v24.16b}, [x5] // store IV + ld1 {v24.16b}, [x20], 
#16 + st1 {v5.16b}, [x19], #16 +1: st1 {v24.16b}, [x24] // store IV - cbnz x4, 99b + cbz x23, 2f + cond_yield_neon + b 99b - ldp x29, x30, [sp], #16 +2: frame_pop ret ENDPROC(aesbs_cbc_decrypt) @@ -731,87 +746,93 @@ CPU_BE( .quad 0x87, 1 ) */ __xts_crypt8: mov x6, #1 - lsl x6, x6, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x6, x6, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x6, x6, xzr, mi - ld1 {v0.16b}, [x1], #16 + ld1 {v0.16b}, [x20], #16 next_tweak v26, v25, v30, v31 eor v0.16b, v0.16b, v25.16b tbnz x6, #1, 0f - ld1 {v1.16b}, [x1], #16 + ld1 {v1.16b}, [x20], #16 next_tweak v27, v26, v30, v31 eor v1.16b, v1.16b, v26.16b tbnz x6, #2, 0f - ld1 {v2.16b}, [x1], #16 + ld1 {v2.16b}, [x20], #16 next_tweak v28, v27, v30, v31 eor v2.16b, v2.16b, v27.16b tbnz x6, #3, 0f - ld1 {v3.16b}, [x1], #16 + ld1 {v3.16b}, [x20], #16 next_tweak v29, v28, v30, v31 eor v3.16b, v3.16b, v28.16b tbnz x6, #4, 0f - ld1 {v4.16b}, [x1], #16 - str q29, [sp, #16] + ld1 {v4.16b}, [x20], #16 + str q29, [sp, #.Lframe_local_offset] eor v4.16b, v4.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #5, 0f - ld1 {v5.16b}, [x1], #16 - str q29, [sp, #32] + ld1 {v5.16b}, [x20], #16 + str q29, [sp, #.Lframe_local_offset + 16] eor v5.16b, v5.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #6, 0f - ld1 {v6.16b}, [x1], #16 - str q29, [sp, #48] + ld1 {v6.16b}, [x20], #16 + str q29, [sp, #.Lframe_local_offset + 32] eor v6.16b, v6.16b, v29.16b next_tweak v29, v29, v30, v31 tbnz x6, #7, 0f - ld1 {v7.16b}, [x1], #16 - str q29, [sp, #64] + ld1 {v7.16b}, [x20], #16 + str q29, [sp, #.Lframe_local_offset + 48] eor v7.16b, v7.16b, v29.16b next_tweak v29, v29, v30, v31 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 br x7 ENDPROC(__xts_crypt8) .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7 - stp x29, x30, [sp, #-80]! 
- mov x29, sp + frame_push 6, 64 - ldr q30, .Lxts_mul_x - ld1 {v25.16b}, [x5] + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + +0: ldr q30, .Lxts_mul_x + ld1 {v25.16b}, [x24] 99: adr x7, \do8 bl __xts_crypt8 - ldp q16, q17, [sp, #16] - ldp q18, q19, [sp, #48] + ldp q16, q17, [sp, #.Lframe_local_offset] + ldp q18, q19, [sp, #.Lframe_local_offset + 32] eor \o0\().16b, \o0\().16b, v25.16b eor \o1\().16b, \o1\().16b, v26.16b eor \o2\().16b, \o2\().16b, v27.16b eor \o3\().16b, \o3\().16b, v28.16b - st1 {\o0\().16b}, [x0], #16 + st1 {\o0\().16b}, [x19], #16 mov v25.16b, v26.16b tbnz x6, #1, 1f - st1 {\o1\().16b}, [x0], #16 + st1 {\o1\().16b}, [x19], #16 mov v25.16b, v27.16b tbnz x6, #2, 1f - st1 {\o2\().16b}, [x0], #16 + st1 {\o2\().16b}, [x19], #16 mov v25.16b, v28.16b tbnz x6, #3, 1f - st1 {\o3\().16b}, [x0], #16 + st1 {\o3\().16b}, [x19], #16 mov v25.16b, v29.16b tbnz x6, #4, 1f @@ -820,18 +841,22 @@ ENDPROC(__xts_crypt8) eor \o6\().16b, \o6\().16b, v18.16b eor \o7\().16b, \o7\().16b, v19.16b - st1 {\o4\().16b}, [x0], #16 + st1 {\o4\().16b}, [x19], #16 tbnz x6, #5, 1f - st1 {\o5\().16b}, [x0], #16 + st1 {\o5\().16b}, [x19], #16 tbnz x6, #6, 1f - st1 {\o6\().16b}, [x0], #16 + st1 {\o6\().16b}, [x19], #16 tbnz x6, #7, 1f - st1 {\o7\().16b}, [x0], #16 + st1 {\o7\().16b}, [x19], #16 + + cbz x23, 1f + st1 {v25.16b}, [x24] - cbnz x4, 99b + cond_yield_neon 0b + b 99b -1: st1 {v25.16b}, [x5] - ldp x29, x30, [sp], #80 +1: st1 {v25.16b}, [x24] + frame_pop ret .endm @@ -856,24 +881,31 @@ ENDPROC(aesbs_xts_decrypt) * int rounds, int blocks, u8 iv[], u8 final[]) */ ENTRY(aesbs_ctr_encrypt) - stp x29, x30, [sp, #-16]! 
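aesbs_ctr_encrypt keeps the 128-bit counter as two 64-bit words (`ldp x7, x8`, byte-swapped on little-endian CPUs) and increments it with `adds x8, x8, #1` / `adc x7, x7, xzr`: add one to the low word and propagate the carry into the high word. The same arithmetic on a big-endian counter block, as a sketch:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def ctr_increment(ctr: bytes) -> bytes:
    """Increment a 128-bit big-endian CTR block with 64-bit carry."""
    hi = int.from_bytes(ctr[:8], "big")
    lo = int.from_bytes(ctr[8:], "big")
    lo = (lo + 1) & MASK64          # adds x8, x8, #1
    if lo == 0:                     # carry out of the low word
        hi = (hi + 1) & MASK64      # adc x7, x7, xzr
    return hi.to_bytes(8, "big") + lo.to_bytes(8, "big")
```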
- mov x29, sp - - cmp x6, #0 - cset x10, ne - add x4, x4, x10 // do one extra block if final - - ldp x7, x8, [x5] - ld1 {v0.16b}, [x5] + frame_push 8 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + cmp x25, #0 + cset x26, ne + add x23, x23, x26 // do one extra block if final + +98: ldp x7, x8, [x24] + ld1 {v0.16b}, [x24] CPU_LE( rev x7, x7 ) CPU_LE( rev x8, x8 ) adds x8, x8, #1 adc x7, x7, xzr 99: mov x9, #1 - lsl x9, x9, x4 - subs w4, w4, #8 - csel x4, x4, xzr, pl + lsl x9, x9, x23 + subs w23, w23, #8 + csel x23, x23, xzr, pl csel x9, x9, xzr, le tbnz x9, #1, 0f @@ -891,82 +923,85 @@ CPU_LE( rev x8, x8 ) tbnz x9, #7, 0f next_ctr v7 -0: mov bskey, x2 - mov rounds, x3 +0: mov bskey, x21 + mov rounds, x22 bl aesbs_encrypt8 - lsr x9, x9, x10 // disregard the extra block + lsr x9, x9, x26 // disregard the extra block tbnz x9, #0, 0f - ld1 {v8.16b}, [x1], #16 + ld1 {v8.16b}, [x20], #16 eor v0.16b, v0.16b, v8.16b - st1 {v0.16b}, [x0], #16 + st1 {v0.16b}, [x19], #16 tbnz x9, #1, 1f - ld1 {v9.16b}, [x1], #16 + ld1 {v9.16b}, [x20], #16 eor v1.16b, v1.16b, v9.16b - st1 {v1.16b}, [x0], #16 + st1 {v1.16b}, [x19], #16 tbnz x9, #2, 2f - ld1 {v10.16b}, [x1], #16 + ld1 {v10.16b}, [x20], #16 eor v4.16b, v4.16b, v10.16b - st1 {v4.16b}, [x0], #16 + st1 {v4.16b}, [x19], #16 tbnz x9, #3, 3f - ld1 {v11.16b}, [x1], #16 + ld1 {v11.16b}, [x20], #16 eor v6.16b, v6.16b, v11.16b - st1 {v6.16b}, [x0], #16 + st1 {v6.16b}, [x19], #16 tbnz x9, #4, 4f - ld1 {v12.16b}, [x1], #16 + ld1 {v12.16b}, [x20], #16 eor v3.16b, v3.16b, v12.16b - st1 {v3.16b}, [x0], #16 + st1 {v3.16b}, [x19], #16 tbnz x9, #5, 5f - ld1 {v13.16b}, [x1], #16 + ld1 {v13.16b}, [x20], #16 eor v7.16b, v7.16b, v13.16b - st1 {v7.16b}, [x0], #16 + st1 {v7.16b}, [x19], #16 tbnz x9, #6, 6f - ld1 {v14.16b}, [x1], #16 + ld1 {v14.16b}, [x20], #16 eor v2.16b, v2.16b, v14.16b - st1 {v2.16b}, [x0], #16 + st1 {v2.16b}, [x19], #16 tbnz x9, #7, 7f - ld1 {v15.16b}, [x1], #16 + ld1 {v15.16b}, 
[x20], #16 eor v5.16b, v5.16b, v15.16b - st1 {v5.16b}, [x0], #16 + st1 {v5.16b}, [x19], #16 8: next_ctr v0 - cbnz x4, 99b + st1 {v0.16b}, [x24] + cbz x23, 0f + + cond_yield_neon 98b + b 99b -0: st1 {v0.16b}, [x5] - ldp x29, x30, [sp], #16 +0: frame_pop ret /* * If we are handling the tail of the input (x6 != NULL), return the * final keystream block back to the caller. */ -1: cbz x6, 8b - st1 {v1.16b}, [x6] +1: cbz x25, 8b + st1 {v1.16b}, [x25] b 8b -2: cbz x6, 8b - st1 {v4.16b}, [x6] +2: cbz x25, 8b + st1 {v4.16b}, [x25] b 8b -3: cbz x6, 8b - st1 {v6.16b}, [x6] +3: cbz x25, 8b + st1 {v6.16b}, [x25] b 8b -4: cbz x6, 8b - st1 {v3.16b}, [x6] +4: cbz x25, 8b + st1 {v3.16b}, [x25] b 8b -5: cbz x6, 8b - st1 {v7.16b}, [x6] +5: cbz x25, 8b + st1 {v7.16b}, [x25] b 8b -6: cbz x6, 8b - st1 {v2.16b}, [x6] +6: cbz x25, 8b + st1 {v2.16b}, [x25] b 8b -7: cbz x6, 8b - st1 {v5.16b}, [x6] +7: cbz x25, 8b + st1 {v5.16b}, [x25] b 8b ENDPROC(aesbs_ctr_encrypt)

From patchwork Mon Apr 30 16:18:26 2018
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 134716
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 06/10] crypto: arm64/aes-ghash - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:26 +0200
Message-Id: <20180430161830.14892-7-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/ghash-ce-core.S | 113 ++++++++++++++------
 arch/arm64/crypto/ghash-ce-glue.c |  28 +++--
 2 files changed, 97 insertions(+), 44 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/ghash-ce-core.S b/arch/arm64/crypto/ghash-ce-core.S
index 11ebf1ae248a..dcffb9e77589 100644
--- a/arch/arm64/crypto/ghash-ce-core.S
+++ b/arch/arm64/crypto/ghash-ce-core.S
@@ -213,22 +213,31 @@
 	.endm

 	.macro		__pmull_ghash, pn
-	ld1		{SHASH.2d}, [x3]
-	ld1		{XL.2d}, [x1]
+	frame_push	5
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+
+0:	ld1		{SHASH.2d}, [x22]
+	ld1		{XL.2d}, [x20]
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	__pmull_pre_\pn

 	/* do the head block first, if supplied */
-	cbz		x4, 0f
-	ld1		{T1.2d}, [x4]
-	b		1f
+	cbz		x23, 1f
+	ld1		{T1.2d}, [x23]
+	mov		x23, xzr
+	b		2f

-0:	ld1		{T1.2d}, [x2], #16
-	sub		w0, w0, #1
+1:	ld1		{T1.2d}, [x21], #16
+	sub		w19, w19, #1

-1:	/* multiply XL by SHASH in GF(2^128) */
+2:	/* multiply XL by SHASH in GF(2^128) */
 CPU_LE(	rev64		T1.16b, T1.16b	)

 	ext		T2.16b, XL.16b, XL.16b, #8
@@ -250,9 +259,18 @@ CPU_LE(	rev64		T1.16b, T1.16b	)
 	eor		T2.16b, T2.16b, XH.16b
 	eor		XL.16b, XL.16b, T2.16b

-	cbnz		w0, 0b
+	cbz		w19, 3f
+
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon
+
+	b		1b

-	st1		{XL.2d}, [x1]
+3:	st1		{XL.2d}, [x20]
+	frame_pop
 	ret
 	.endm
@@ -304,38 +322,55 @@ ENDPROC(pmull_ghash_update_p8)
 	.endm

 	.macro		pmull_gcm_do_crypt, enc
-	ld1		{SHASH.2d}, [x4]
-	ld1		{XL.2d}, [x1]
-	ldr		x8, [x5, #8]			// load lower counter
+	frame_push	10
+
+	mov		x19, x0
+	mov		x20, x1
+	mov		x21, x2
+	mov		x22, x3
+	mov		x23, x4
+	mov		x24, x5
+	mov		x25, x6
+	mov		x26, x7
+	.if		\enc == 1
+	ldr		x27, [sp, #96]			// first stacked arg
+	.endif
+
+	ldr		x28, [x24, #8]			// load lower counter
+CPU_LE(	rev		x28, x28	)
+
+0:	mov		x0, x25
+	load_round_keys	w26, x0
+	ld1		{SHASH.2d}, [x23]
+	ld1		{XL.2d}, [x20]

 	movi		MASK.16b, #0xe1
 	ext		SHASH2.16b, SHASH.16b, SHASH.16b, #8
-CPU_LE(	rev		x8, x8		)
 	shl		MASK.2d, MASK.2d, #57
 	eor		SHASH2.16b, SHASH2.16b, SHASH.16b

 	.if		\enc == 1
-	ld1		{KS.16b}, [x7]
+	ld1		{KS.16b}, [x27]
 	.endif

-0:	ld1		{CTR.8b}, [x5]			// load upper counter
-	ld1		{INP.16b}, [x3], #16
-	rev		x9, x8
-	add		x8, x8, #1
-	sub		w0, w0, #1
+1:	ld1		{CTR.8b}, [x24]			// load upper counter
+	ld1		{INP.16b}, [x22], #16
+	rev		x9, x28
+	add		x28, x28, #1
+	sub		w19, w19, #1
 	ins		CTR.d[1], x9			// set lower counter

 	.if		\enc == 1
 	eor		INP.16b, INP.16b, KS.16b	// encrypt input
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif

 	rev64		T1.16b, INP.16b

-	cmp		w6, #12
-	b.ge		2f				// AES-192/256?
+	cmp		w26, #12
+	b.ge		4f				// AES-192/256?

-1:	enc_round	CTR, v21
+2:	enc_round	CTR, v21

 	ext		T2.16b, XL.16b, XL.16b, #8
 	ext		IN1.16b, T1.16b, T1.16b, #8
@@ -390,27 +425,39 @@ CPU_LE(	rev		x8, x8		)

 	.if		\enc == 0
 	eor		INP.16b, INP.16b, KS.16b
-	st1		{INP.16b}, [x2], #16
+	st1		{INP.16b}, [x21], #16
 	.endif

-	cbnz		w0, 0b
+	cbz		w19, 3f

-CPU_LE(	rev		x8, x8		)
-	st1		{XL.2d}, [x1]
-	str		x8, [x5, #8]			// store lower counter
+	if_will_cond_yield_neon
+	st1		{XL.2d}, [x20]
+	.if		\enc == 1
+	st1		{KS.16b}, [x27]
+	.endif
+	do_cond_yield_neon
+	b		0b
+	endif_yield_neon

+	b		1b
+
+3:	st1		{XL.2d}, [x20]
 	.if		\enc == 1
-	st1		{KS.16b}, [x7]
+	st1		{KS.16b}, [x27]
 	.endif

+CPU_LE(	rev		x28, x28	)
+	str		x28, [x24, #8]			// store lower counter
+
+	frame_pop
 	ret

-2:	b.eq		3f				// AES-192?
+4:	b.eq		5f				// AES-192?
 	enc_round	CTR, v17
 	enc_round	CTR, v18
-3:	enc_round	CTR, v19
+5:	enc_round	CTR, v19
 	enc_round	CTR, v20
-	b		1b
+	b		2b
 	.endm

 /*
diff --git a/arch/arm64/crypto/ghash-ce-glue.c b/arch/arm64/crypto/ghash-ce-glue.c
index cfc9c92814fd..7cf0b1aa6ea8 100644
--- a/arch/arm64/crypto/ghash-ce-glue.c
+++ b/arch/arm64/crypto/ghash-ce-glue.c
@@ -63,11 +63,12 @@ static void (*pmull_ghash_update)(int blocks, u64 dg[], const char *src,

 asmlinkage void pmull_gcm_encrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds, u8 ks[]);
+				  u8 ctr[], u32 const rk[], int rounds,
+				  u8 ks[]);

 asmlinkage void pmull_gcm_decrypt(int blocks, u64 dg[], u8 dst[],
 				  const u8 src[], struct ghash_key const *k,
-				  u8 ctr[], int rounds);
+				  u8 ctr[], u32 const rk[], int rounds);

 asmlinkage void pmull_gcm_encrypt_block(u8 dst[], u8 const src[],
 					u32 const rk[], int rounds);
@@ -368,26 +369,29 @@ static int gcm_encrypt(struct aead_request *req)
 		pmull_gcm_encrypt_block(ks, iv, NULL,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(3, iv + GCM_IV_SIZE);
+		kernel_neon_end();

-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);

 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;

+			kernel_neon_begin();
 			pmull_gcm_encrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key), ks);
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key), ks);
+			kernel_neon_end();

 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
 		}
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);

-		err = skcipher_walk_aead_encrypt(&walk, req, true);
+		err = skcipher_walk_aead_encrypt(&walk, req, false);

 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;
@@ -467,15 +471,19 @@
 		pmull_gcm_encrypt_block(tag, iv, ctx->aes_key.key_enc,
 					num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);
+		kernel_neon_end();

-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);

 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;

+			kernel_neon_begin();
 			pmull_gcm_decrypt(blocks, dg, walk.dst.virt.addr,
 					  walk.src.virt.addr, &ctx->ghash_key,
-					  iv, num_rounds(&ctx->aes_key));
+					  iv, ctx->aes_key.key_enc,
+					  num_rounds(&ctx->aes_key));
+			kernel_neon_end();

 			err = skcipher_walk_done(&walk,
 						 walk.nbytes % AES_BLOCK_SIZE);
@@ -483,14 +491,12 @@
 		if (walk.nbytes)
 			pmull_gcm_encrypt_block(iv, iv, NULL,
 						num_rounds(&ctx->aes_key));
-
-		kernel_neon_end();
 	} else {
 		__aes_arm64_encrypt(ctx->aes_key.key_enc, tag, iv,
 				    num_rounds(&ctx->aes_key));
 		put_unaligned_be32(2, iv + GCM_IV_SIZE);

-		err = skcipher_walk_aead_decrypt(&walk, req, true);
+		err = skcipher_walk_aead_decrypt(&walk, req, false);

 		while (walk.nbytes >= AES_BLOCK_SIZE) {
 			int blocks = walk.nbytes / AES_BLOCK_SIZE;

From patchwork Mon Apr 30 16:18:27 2018
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 07/10] crypto: arm64/crc32-ce - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:27 +0200
Message-Id: <20180430161830.14892-8-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/crc32-ce-core.S | 40 +++++++++++++++-----
 1 file changed, 30 insertions(+), 10 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/crc32-ce-core.S b/arch/arm64/crypto/crc32-ce-core.S
index 16ed3c7ebd37..8061bf0f9c66 100644
--- a/arch/arm64/crypto/crc32-ce-core.S
+++ b/arch/arm64/crypto/crc32-ce-core.S
@@ -100,9 +100,10 @@
 	dCONSTANT	.req	d0
 	qCONSTANT	.req	q0

-	BUF		.req	x0
-	LEN		.req	x1
-	CRC		.req	x2
+	BUF		.req	x19
+	LEN		.req	x20
+	CRC		.req	x21
+	CONST		.req	x22

 	vzr		.req	v9
@@ -123,7 +124,14 @@ ENTRY(crc32_pmull_le)
 ENTRY(crc32c_pmull_le)
 	adr_l		x3, .Lcrc32c_constants

-0:	bic		LEN, LEN, #15
+0:	frame_push	4, 64
+
+	mov		BUF, x0
+	mov		LEN, x1
+	mov		CRC, x2
+	mov		CONST, x3
+
+	bic		LEN, LEN, #15
 	ld1		{v1.16b-v4.16b}, [BUF], #0x40
 	movi		vzr.16b, #0
 	fmov		dCONSTANT, CRC
@@ -132,7 +140,7 @@ ENTRY(crc32c_pmull_le)
 	cmp		LEN, #0x40
 	b.lt		less_64

-	ldr		qCONSTANT, [x3]
+	ldr		qCONSTANT, [CONST]

 loop_64:	/* 64 bytes Full cache line folding */
 	sub		LEN, LEN, #0x40
@@ -162,10 +170,21 @@ loop_64:	/* 64 bytes Full cache line folding */
 	eor		v4.16b, v4.16b, v8.16b

 	cmp		LEN, #0x40
-	b.ge		loop_64
+	b.lt		less_64
+
+	if_will_cond_yield_neon
+	stp		q1, q2, [sp, #.Lframe_local_offset]
+	stp		q3, q4, [sp, #.Lframe_local_offset + 32]
+	do_cond_yield_neon
+	ldp		q1, q2, [sp, #.Lframe_local_offset]
+	ldp		q3, q4, [sp, #.Lframe_local_offset + 32]
+	ldr		qCONSTANT, [CONST]
+	movi		vzr.16b, #0
+	endif_yield_neon
+	b		loop_64

 less_64:	/* Folding cache line into 128bit */
-	ldr		qCONSTANT, [x3, #16]
+	ldr		qCONSTANT, [CONST, #16]

 	pmull2		v5.1q, v1.2d, vCONSTANT.2d
 	pmull		v1.1q, v1.1d, vCONSTANT.1d
@@ -204,8 +223,8 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b

 	/* final 32-bit fold */
-	ldr		dCONSTANT, [x3, #32]
-	ldr		d3, [x3, #40]
+	ldr		dCONSTANT, [CONST, #32]
+	ldr		d3, [CONST, #40]

 	ext		v2.16b, v1.16b, vzr.16b, #4
 	and		v1.16b, v1.16b, v3.16b
@@ -213,7 +232,7 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b

 	/* Finish up with the bit-reversed barrett reduction 64 ==> 32 bits */
-	ldr		qCONSTANT, [x3, #48]
+	ldr		qCONSTANT, [CONST, #48]

 	and		v2.16b, v1.16b, v3.16b
 	ext		v2.16b, vzr.16b, v2.16b, #8
@@ -223,6 +242,7 @@ fold_64:
 	eor		v1.16b, v1.16b, v2.16b
 	mov		w0, v1.s[1]

+	frame_pop
 	ret
ENDPROC(crc32_pmull_le)
ENDPROC(crc32c_pmull_le)

From patchwork Mon Apr 30 16:18:28 2018
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 08/10] crypto: arm64/crct10dif-ce - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:28 +0200
Message-Id: <20180430161830.14892-9-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/crct10dif-ce-core.S | 32 +++++++++++++++++---
 1 file changed, 28 insertions(+), 4 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
index f179c01bd55c..663ea71cdb38 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -74,13 +74,19 @@
 	.text
 	.cpu		generic+crypto

-	arg1_low32	.req	w0
-	arg2		.req	x1
-	arg3		.req	x2
+	arg1_low32	.req	w19
+	arg2		.req	x20
+	arg3		.req	x21

 	vzr		.req	v13

 ENTRY(crc_t10dif_pmull)
+	frame_push	3, 128
+
+	mov		arg1_low32, w0
+	mov		arg2, x1
+	mov		arg3, x2
+
 	movi		vzr.16b, #0		// init zero register

 	// adjust the 16-bit initial_crc value, scale it to 32 bits
@@ -175,8 +181,25 @@ CPU_LE(	ext		v12.16b, v12.16b, v12.16b, #8	)
 	subs		arg3, arg3, #128

 	// check if there is another 64B in the buffer to be able to fold
-	b.ge		_fold_64_B_loop
+	b.lt		_fold_64_B_end
+
+	if_will_cond_yield_neon
+	stp		q0, q1, [sp, #.Lframe_local_offset]
+	stp		q2, q3, [sp, #.Lframe_local_offset + 32]
+	stp		q4, q5, [sp, #.Lframe_local_offset + 64]
+	stp		q6, q7, [sp, #.Lframe_local_offset + 96]
+	do_cond_yield_neon
+	ldp		q0, q1, [sp, #.Lframe_local_offset]
+	ldp		q2, q3, [sp, #.Lframe_local_offset + 32]
+	ldp		q4, q5, [sp, #.Lframe_local_offset + 64]
+	ldp		q6, q7, [sp, #.Lframe_local_offset + 96]
+	ldr_l		q10, rk3, x8
+	movi		vzr.16b, #0		// init zero register
+	endif_yield_neon
+
+	b		_fold_64_B_loop

+_fold_64_B_end:
 	// at this point, the buffer pointer is pointing at the last y Bytes
 	// of the buffer the 64B of folded data is in 4 of the vector
 	// registers: v0, v1, v2, v3
@@ -304,6 +327,7 @@ _barrett:
 _cleanup:
 	// scale the result back to 16 bits
 	lsr		x0, x0, #16
+	frame_pop
 	ret

 _less_than_128:

From patchwork Mon Apr 30 16:18:29 2018
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 09/10] crypto: arm64/sha3-ce - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:29 +0200
Message-Id: <20180430161830.14892-10-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.17.0
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha3-ce-core.S | 77 +++++++++++++-------
 1 file changed, 50 insertions(+), 27 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 332ad7530690..a7d587fa54f6 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -41,9 +41,16 @@
  */
 	.text
 ENTRY(sha3_ce_transform)
-	/* load state */
-	add	x8, x0, #32
-	ld1	{ v0.1d- v3.1d}, [x0]
+	frame_push	4
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+
+0:	/* load state */
+	add	x8, x19, #32
+	ld1	{ v0.1d- v3.1d}, [x19]
 	ld1	{ v4.1d- v7.1d}, [x8], #32
 	ld1	{ v8.1d-v11.1d}, [x8], #32
 	ld1	{v12.1d-v15.1d}, [x8], #32
@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
 	ld1	{v20.1d-v23.1d}, [x8], #32
 	ld1	{v24.1d}, [x8]

-0:	sub	w2, w2, #1
+1:	sub	w21, w21, #1
 	mov	w8, #24
 	adr_l	x9, .Lsha3_rcon

 	/* load input */
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v31.8b}, [x1], #24
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v31.8b}, [x20], #24
 	eor	v0.8b, v0.8b, v25.8b
 	eor	v1.8b, v1.8b, v26.8b
 	eor	v2.8b, v2.8b, v27.8b
@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
 	eor	v5.8b, v5.8b, v30.8b
 	eor	v6.8b, v6.8b, v31.8b

-	tbnz	x3, #6, 2f		// SHA3-512
+	tbnz	x22, #6, 3f		// SHA3-512

-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v30.8b}, [x1], #16
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v30.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b
 	eor	v9.8b, v9.8b, v27.8b
@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
 	eor	v11.8b, v11.8b, v29.8b
 	eor	v12.8b, v12.8b, v30.8b

-	tbnz	x3, #4, 1f		// SHA3-384 or SHA3-224
+	tbnz	x22, #4, 2f		// SHA3-384 or SHA3-224

 	// SHA3-256
-	ld1	{v25.8b-v28.8b}, [x1], #32
+	ld1	{v25.8b-v28.8b}, [x20], #32
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
-	b	3f
+	b	4f

-1:	tbz	x3, #2, 3f		// bit 2 cleared? SHA-384
+2:	tbz	x22, #2, 4f		// bit 2 cleared? SHA-384

 	// SHA3-224
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b}, [x1], #8
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b}, [x20], #8
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
 	eor	v17.8b, v17.8b, v29.8b
-	b	3f
+	b	4f

 	// SHA3-512
-2:	ld1	{v25.8b-v26.8b}, [x1], #16
+3:	ld1	{v25.8b-v26.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b

-3:	sub	w8, w8, #1
+4:	sub	w8, w8, #1

 	eor3	v29.16b, v4.16b, v9.16b, v14.16b
 	eor3	v26.16b, v1.16b, v6.16b, v11.16b
@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)

 	eor	v0.16b, v0.16b, v31.16b

-	cbnz	w8, 3b
-	cbnz	w2, 0b
+	cbnz	w8, 4b
+	cbz	w21, 5f
+
+	if_will_cond_yield_neon
+	add	x8, x19, #32
+	st1	{ v0.1d- v3.1d}, [x19]
+	st1	{ v4.1d- v7.1d}, [x8], #32
+	st1	{ v8.1d-v11.1d}, [x8], #32
+	st1	{v12.1d-v15.1d}, [x8], #32
+	st1	{v16.1d-v19.1d}, [x8], #32
+	st1	{v20.1d-v23.1d}, [x8], #32
+	st1	{v24.1d}, [x8]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b

 	/* save state */
-	st1	{ v0.1d- v3.1d}, [x0], #32
-	st1	{ v4.1d- v7.1d}, [x0], #32
-	st1	{ v8.1d-v11.1d}, [x0], #32
-	st1	{v12.1d-v15.1d}, [x0], #32
-	st1	{v16.1d-v19.1d}, [x0], #32
-	st1	{v20.1d-v23.1d}, [x0], #32
-	st1	{v24.1d}, [x0]
+5:	st1	{ v0.1d- v3.1d}, [x19], #32
+	st1	{ v4.1d- v7.1d}, [x19], #32
+	st1	{ v8.1d-v11.1d}, [x19], #32
+	st1	{v12.1d-v15.1d}, [x19], #32
+	st1	{v16.1d-v19.1d}, [x19], #32
+	st1	{v20.1d-v23.1d}, [x19], #32
+	st1	{v24.1d}, [x19]
+	frame_pop
 	ret
ENDPROC(sha3_ce_transform)

From patchwork Mon Apr 30 16:18:30 2018
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org, herbert@gondor.apana.org.au
Cc: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com,
    will.deacon@arm.com, Ard Biesheuvel
Subject: [PATCH resend 10/10] crypto: arm64/sha512-ce - yield NEON after every block of input
Date: Mon, 30 Apr 2018 18:18:30 +0200
Message-Id: <20180430161830.14892-11-ard.biesheuvel@linaro.org>
In-Reply-To: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
References: <20180430161830.14892-1-ard.biesheuvel@linaro.org>
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha512-ce-core.S | 27 +++++++++++++++-----
 1 file changed, 21 insertions(+), 6 deletions(-)

-- 
2.17.0

diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S
index 7f3bca5c59a2..ce65e3abe4f2 100644
--- a/arch/arm64/crypto/sha512-ce-core.S
+++ b/arch/arm64/crypto/sha512-ce-core.S
@@ -107,17 +107,23 @@
 	 */
 	.text
 ENTRY(sha512_ce_transform)
+	frame_push	3
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+
 	/* load state */
-	ld1	{v8.2d-v11.2d}, [x0]
+0:	ld1	{v8.2d-v11.2d}, [x19]
 
 	/* load first 4 round constants */
 	adr_l	x3, .Lsha512_rcon
 	ld1	{v20.2d-v23.2d}, [x3], #64
 
 	/* load input */
-0:	ld1	{v12.2d-v15.2d}, [x1], #64
-	ld1	{v16.2d-v19.2d}, [x1], #64
-	sub	w2, w2, #1
+1:	ld1	{v12.2d-v15.2d}, [x20], #64
+	ld1	{v16.2d-v19.2d}, [x20], #64
+	sub	w21, w21, #1
 
 CPU_LE(	rev64	v12.16b, v12.16b	)
 CPU_LE(	rev64	v13.16b, v13.16b	)
@@ -196,9 +202,18 @@ CPU_LE(	rev64	v19.16b, v19.16b	)
 	add	v11.2d, v11.2d, v3.2d
 
 	/* handled all input blocks? */
-	cbnz	w2, 0b
+	cbz	w21, 3f
+
+	if_will_cond_yield_neon
+	st1	{v8.2d-v11.2d}, [x19]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b
 
 	/* store new state */
-3:	st1	{v8.2d-v11.2d}, [x0]
+3:	st1	{v8.2d-v11.2d}, [x19]
+	frame_pop
 	ret
 ENDPROC(sha512_ce_transform)