From patchwork Sat Mar 10 15:22:05 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 131310
From: Ard Biesheuvel
To: linux-crypto@vger.kernel.org
Cc: herbert@gondor.apana.org.au, linux-arm-kernel@lists.infradead.org,
 Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v5 20/23] crypto: arm64/sha3-ce - yield NEON after every block of input
Date: Sat, 10 Mar 2018 15:22:05 +0000
Message-Id: <20180310152208.10369-21-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.15.1
In-Reply-To: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
References: <20180310152208.10369-1-ard.biesheuvel@linaro.org>
Sender: linux-crypto-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-crypto@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by
conditionally yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha3-ce-core.S | 77 +++++++++++++-------
 1 file changed, 50 insertions(+), 27 deletions(-)

-- 
2.15.1

diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S
index 332ad7530690..a7d587fa54f6 100644
--- a/arch/arm64/crypto/sha3-ce-core.S
+++ b/arch/arm64/crypto/sha3-ce-core.S
@@ -41,9 +41,16 @@
 	 */
 	.text
 ENTRY(sha3_ce_transform)
-	/* load state */
-	add	x8, x0, #32
-	ld1	{ v0.1d- v3.1d}, [x0]
+	frame_push	4
+
+	mov	x19, x0
+	mov	x20, x1
+	mov	x21, x2
+	mov	x22, x3
+
+0:	/* load state */
+	add	x8, x19, #32
+	ld1	{ v0.1d- v3.1d}, [x19]
 	ld1	{ v4.1d- v7.1d}, [x8], #32
 	ld1	{ v8.1d-v11.1d}, [x8], #32
 	ld1	{v12.1d-v15.1d}, [x8], #32
@@ -51,13 +58,13 @@ ENTRY(sha3_ce_transform)
 	ld1	{v20.1d-v23.1d}, [x8], #32
 	ld1	{v24.1d}, [x8]

-0:	sub	w2, w2, #1
+1:	sub	w21, w21, #1
 	mov	w8, #24
 	adr_l	x9, .Lsha3_rcon

 	/* load input */
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v31.8b}, [x1], #24
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v31.8b}, [x20], #24
 	eor	v0.8b, v0.8b, v25.8b
 	eor	v1.8b, v1.8b, v26.8b
 	eor	v2.8b, v2.8b, v27.8b
@@ -66,10 +73,10 @@ ENTRY(sha3_ce_transform)
 	eor	v5.8b, v5.8b, v30.8b
 	eor	v6.8b, v6.8b, v31.8b

-	tbnz	x3, #6, 2f		// SHA3-512
+	tbnz	x22, #6, 3f		// SHA3-512

-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b-v30.8b}, [x1], #16
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b-v30.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b
 	eor	v9.8b, v9.8b, v27.8b
@@ -77,34 +84,34 @@ ENTRY(sha3_ce_transform)
 	eor	v11.8b, v11.8b, v29.8b
 	eor	v12.8b, v12.8b, v30.8b

-	tbnz	x3, #4, 1f		// SHA3-384 or SHA3-224
+	tbnz	x22, #4, 2f		// SHA3-384 or SHA3-224

 	// SHA3-256
-	ld1	{v25.8b-v28.8b}, [x1], #32
+	ld1	{v25.8b-v28.8b}, [x20], #32
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
-	b	3f
+	b	4f

-1:	tbz	x3, #2, 3f		// bit 2 cleared? SHA-384
+2:	tbz	x22, #2, 4f		// bit 2 cleared? SHA-384

 	// SHA3-224
-	ld1	{v25.8b-v28.8b}, [x1], #32
-	ld1	{v29.8b}, [x1], #8
+	ld1	{v25.8b-v28.8b}, [x20], #32
+	ld1	{v29.8b}, [x20], #8
 	eor	v13.8b, v13.8b, v25.8b
 	eor	v14.8b, v14.8b, v26.8b
 	eor	v15.8b, v15.8b, v27.8b
 	eor	v16.8b, v16.8b, v28.8b
 	eor	v17.8b, v17.8b, v29.8b
-	b	3f
+	b	4f

 	// SHA3-512
-2:	ld1	{v25.8b-v26.8b}, [x1], #16
+3:	ld1	{v25.8b-v26.8b}, [x20], #16
 	eor	v7.8b, v7.8b, v25.8b
 	eor	v8.8b, v8.8b, v26.8b

-3:	sub	w8, w8, #1
+4:	sub	w8, w8, #1

 	eor3	v29.16b, v4.16b, v9.16b, v14.16b
 	eor3	v26.16b, v1.16b, v6.16b, v11.16b
@@ -183,17 +190,33 @@ ENTRY(sha3_ce_transform)

 	eor	v0.16b, v0.16b, v31.16b

-	cbnz	w8, 3b
-	cbnz	w2, 0b
+	cbnz	w8, 4b
+	cbz	w21, 5f
+
+	if_will_cond_yield_neon
+	add	x8, x19, #32
+	st1	{ v0.1d- v3.1d}, [x19]
+	st1	{ v4.1d- v7.1d}, [x8], #32
+	st1	{ v8.1d-v11.1d}, [x8], #32
+	st1	{v12.1d-v15.1d}, [x8], #32
+	st1	{v16.1d-v19.1d}, [x8], #32
+	st1	{v20.1d-v23.1d}, [x8], #32
+	st1	{v24.1d}, [x8]
+	do_cond_yield_neon
+	b	0b
+	endif_yield_neon
+
+	b	1b

 	/* save state */
-	st1	{ v0.1d- v3.1d}, [x0], #32
-	st1	{ v4.1d- v7.1d}, [x0], #32
-	st1	{ v8.1d-v11.1d}, [x0], #32
-	st1	{v12.1d-v15.1d}, [x0], #32
-	st1	{v16.1d-v19.1d}, [x0], #32
-	st1	{v20.1d-v23.1d}, [x0], #32
-	st1	{v24.1d}, [x0]
+5:	st1	{ v0.1d- v3.1d}, [x19], #32
+	st1	{ v4.1d- v7.1d}, [x19], #32
+	st1	{ v8.1d-v11.1d}, [x19], #32
+	st1	{v12.1d-v15.1d}, [x19], #32
+	st1	{v16.1d-v19.1d}, [x19], #32
+	st1	{v20.1d-v23.1d}, [x19], #32
+	st1	{v24.1d}, [x19]
+	frame_pop
 	ret
 ENDPROC(sha3_ce_transform)

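
Reading aid, not part of the patch: the control flow added above can be modelled in plain C roughly as follows. This is only a sketch under stated assumptions. transform_model(), toy_block(), struct counters and the two yield policies are hypothetical names invented for illustration; toy_block() is a trivial mixing function and not Keccak; the block size is passed in directly rather than derived from the digest size as the assembly does. The comments map each step back to the labels and macros in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 25 /* 25 x 64-bit state words, held in v0-v24 in the assembly */

/* Hypothetical bookkeeping for the demo; nothing like it exists in the patch. */
struct counters {
	int loads;  /* times the state was loaded into "registers" (label 0) */
	int yields; /* times the NEON unit was given up mid-request */
};

/* Toy stand-in for "absorb one block, run 24 rounds". NOT Keccak. */
static void toy_block(uint64_t regs[LANES], const uint8_t *in, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		uint64_t r = regs[i % LANES];

		regs[i % LANES] = ((r << 1) | (r >> 63)) ^ in[i];
	}
}

/* Example yield policies, standing in for the kernel's reschedule check. */
static bool never_yield(void)  { return false; }
static bool always_yield(void) { return true; }

static void transform_model(uint64_t *st, const uint8_t *data, int blocks,
			    size_t block_size, bool (*should_yield)(void),
			    struct counters *c)
{
	uint64_t regs[LANES];

	/* Like the assembly loop, this decrements before testing. */
	assert(blocks >= 1);

	for (;;) {
		memcpy(regs, st, sizeof(regs));		/* 0: load state */
		c->loads++;

		for (;;) {
			blocks--;			/* 1: sub w21, w21, #1 */
			toy_block(regs, data, block_size);
			data += block_size;

			if (blocks == 0) {		/* cbz w21, 5f */
				memcpy(st, regs, sizeof(regs)); /* 5: save state */
				return;
			}
			if (should_yield())		/* if_will_cond_yield_neon */
				break;
		}					/* otherwise: b 1b */

		memcpy(st, regs, sizeof(regs));		/* spill state first */
		memset(regs, 0xa5, sizeof(regs));	/* model: registers are lost */
		c->yields++;				/* do_cond_yield_neon */
	}						/* b 0b: reload and go on */
}
```

In this model, running the same input with never_yield and with always_yield leaves identical final state; only the load and yield counts differ. That is the property the spill before do_cond_yield_neon and the reload at label 0 are there to preserve.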