From patchwork Tue Dec 26 10:29:24 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122726
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 04/20] crypto: arm64/aes-bs - move kernel mode neon en/disable into loop
Date: Tue, 26 Dec 2017 10:29:24 +0000
Message-Id: <20171226102940.26908-5-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
Sender: linux-rt-users-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rt-users@vger.kernel.org

When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled.
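The hazard described above can be modelled in plain C. This is a minimal user-space sketch, not kernel code: mock_neon_begin(), mock_neon_end(), mock_walk_step(), mock_cipher() and both counters are hypothetical stand-ins, and the walk step is simply assumed to be the place where memory allocation may happen. The sketch counts how many walk steps run while the NEON is held, first with the NEON held across the whole walk and then with it held only around the cipher call.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel APIs. */
static int neon_depth;		/* > 0 models "NEON held, preemption disabled" */
static int unsafe_steps;	/* walk steps taken while the NEON was held */

static void mock_neon_begin(void) { neon_depth++; }
static void mock_neon_end(void)   { neon_depth--; }

/* Models a walk step, assumed here to be where allocation may occur. */
static void mock_walk_step(void)
{
	if (neon_depth > 0)
		unsafe_steps++;
}

/* Models the NEON cipher routine: XOR every byte with a fixed value. */
static void mock_cipher(unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] ^= 0x5a;
}

/* NEON held across the whole walk; chunk must be non-zero. */
static int crypt_neon_held(unsigned char *buf, size_t len, size_t chunk)
{
	unsafe_steps = 0;
	mock_neon_begin();
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = len - off < chunk ? len - off : chunk;

		mock_cipher(buf + off, n);
		mock_walk_step();	/* runs with the NEON held */
	}
	mock_neon_end();
	return unsafe_steps;
}

/* NEON held only around the cipher call; chunk must be non-zero. */
static int crypt_neon_per_chunk(unsigned char *buf, size_t len, size_t chunk)
{
	unsafe_steps = 0;
	for (size_t off = 0; off < len; off += chunk) {
		size_t n = len - off < chunk ? len - off : chunk;

		mock_neon_begin();
		mock_cipher(buf + off, n);
		mock_neon_end();
		mock_walk_step();	/* runs with the NEON released */
	}
	return unsafe_steps;
}
```

In this model every walk step of crypt_neon_held() counts as unsafe, while crypt_neon_per_chunk() reports none. The price of the second form is one begin/end pair per chunk.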
Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel.

So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled).

Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-neonbs-glue.c | 36 +++++++++-----------
1 file changed, 17 insertions(+), 19 deletions(-)
--
2.11.0
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index 9d823c77ec84..e7a95a566462 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -99,9 +99,8 @@ static int __ecb_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -109,12 +108,13 @@ static int __ecb_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -158,19 +158,19 @@ static int cbc_encrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; /* fall back to the
non-bitsliced NEON implementation */ + kernel_neon_begin(); neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->enc, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -181,9 +181,8 @@ static int cbc_decrypt(struct skcipher_request *req) struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -191,13 +190,14 @@ static int cbc_decrypt(struct skcipher_request *req) blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); aesbs_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -229,9 +229,8 @@ static int ctr_encrypt(struct skcipher_request *req) u8 buf[AES_BLOCK_SIZE]; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; u8 *final = (walk.total % AES_BLOCK_SIZE) ? 
buf : NULL; @@ -242,8 +241,10 @@ static int ctr_encrypt(struct skcipher_request *req) final = NULL; } + kernel_neon_begin(); aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, ctx->rk, ctx->rounds, blocks, walk.iv, final); + kernel_neon_end(); if (final) { u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE; @@ -258,8 +259,6 @@ static int ctr_encrypt(struct skcipher_request *req) err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; } @@ -304,12 +303,11 @@ static int __xts_crypt(struct skcipher_request *req, struct skcipher_walk walk; int err; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); kernel_neon_begin(); - - neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1); + neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, ctx->key.rounds, 1); + kernel_neon_end(); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; @@ -318,13 +316,13 @@ static int __xts_crypt(struct skcipher_request *req, blocks = round_down(blocks, walk.stride / AES_BLOCK_SIZE); + kernel_neon_begin(); fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->key.rk, ctx->key.rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes - blocks * AES_BLOCK_SIZE); } - kernel_neon_end(); - return err; }
From patchwork Tue Dec 26 10:29:28 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122727
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 08/20] crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path
Date: Tue, 26 Dec 2017 10:29:28 +0000
Message-Id: <20171226102940.26908-9-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
Sender: linux-rt-users-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rt-users@vger.kernel.org

CBC MAC is strictly sequential, and so the current AES code simply processes the input one block at a time. However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC MAC which yields every 16 bytes).

So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores.
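The unrolling described above can be sketched in portable C. This is an illustration under stated assumptions, not the assembly from the patch: toy_encrypt_block() is a hypothetical stand-in for one AES block encryption, the function names are made up for this example, and the encrypt-before/encrypt-after handling of the real routine is left out. The point is only that the 4x loop fetches 64 bytes at once while each encryption still depends on the previous one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 16

/* Hypothetical toy permutation standing in for one AES block encryption. */
static void toy_encrypt_block(uint8_t dg[BLK], uint8_t key)
{
	for (int i = 0; i < BLK; i++)
		dg[i] = (uint8_t)((dg[i] ^ key) * 167u + (uint8_t)i);
}

/* XOR one input block into the running digest. */
static void xor_block(uint8_t dg[BLK], const uint8_t *in)
{
	for (int i = 0; i < BLK; i++)
		dg[i] ^= in[i];
}

/* Reference loop: one block per iteration. */
static void mac_update_1x(uint8_t dg[BLK], const uint8_t *in, size_t blocks,
			  uint8_t key)
{
	for (; blocks > 0; blocks--) {
		xor_block(dg, in);
		toy_encrypt_block(dg, key);
		in += BLK;
	}
}

/* Unrolled loop: one 64-byte load, then four chained encryptions. */
static void mac_update_4x(uint8_t dg[BLK], const uint8_t *in, size_t blocks,
			  uint8_t key)
{
	uint8_t buf[4 * BLK];

	while (blocks >= 4) {
		memcpy(buf, in, sizeof(buf));	/* merged load of four blocks */
		for (int i = 0; i < 4; i++) {
			xor_block(dg, buf + i * BLK);
			toy_encrypt_block(dg, key);	/* depends on previous step */
		}
		in += sizeof(buf);
		blocks -= 4;
	}
	mac_update_1x(dg, in, blocks, key);	/* tail of 0..3 blocks */
}
```

Both functions produce the same digest for any block count. With the unrolled form, a yield check placed at the bottom of the 4x loop would run once per 64 bytes, which is the granularity the message asks for.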
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-modes.S | 23 ++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
--
2.11.0
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index e86535a1329d..a68412e1e3a4 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -395,8 +395,28 @@ AES_ENDPROC(aes_xts_decrypt) AES_ENTRY(aes_mac_update) ld1 {v0.16b}, [x4] /* get dg */ enc_prepare w2, x1, x7 - cbnz w5, .Lmacenc + cbz w5, .Lmacloop4x + encrypt_block v0, w2, x1, x7, w8 + +.Lmacloop4x: + subs w3, w3, #4 + bmi .Lmac1x + ld1 {v1.16b-v4.16b}, [x0], #64 /* get next pt block */ + eor v0.16b, v0.16b, v1.16b /* ..and xor with dg */ + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v2.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v3.16b + encrypt_block v0, w2, x1, x7, w8 + eor v0.16b, v0.16b, v4.16b + cmp w3, wzr + csinv x5, x6, xzr, eq + cbz w5, .Lmacout + encrypt_block v0, w2, x1, x7, w8 + b .Lmacloop4x +.Lmac1x: + add w3, w3, #4 .Lmacloop: cbz w3, .Lmacout ld1 {v1.16b}, [x0], #16 /* get next pt block */ @@ -406,7 +426,6 @@ AES_ENTRY(aes_mac_update) csinv x5, x6, xzr, eq cbz w5, .Lmacout -.Lmacenc: encrypt_block v0, w2, x1, x7, w8 b .Lmacloop
From patchwork Tue Dec 26 10:29:32 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122738
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 12/20] crypto: arm64/sha1-ce - yield NEON after every block of input
Date: Tue, 26 Dec 2017 10:29:32 +0000
Message-Id: <20171226102940.26908-13-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
Sender: linux-rt-users-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rt-users@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
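The per-block yield can be illustrated with a small C model. Everything in it is hypothetical and only assumed for this sketch: toy_compress() stands in for the SHA-1 block function, the need_yield callback stands in for the "is a reschedule pending" check, and mock_yield() models the fact that register contents do not survive a yield. The shape it shows is: process one block, and if more input remains and a yield is wanted, spill the state to memory, yield, and reload the state before continuing.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 64

/* Hypothetical hash state; the real state lives in NEON registers. */
struct toy_state {
	uint32_t h;
};

/* Hypothetical policy hook: returns nonzero when a yield is wanted. */
typedef int (*need_yield_fn)(void);

static unsigned int yields_taken;

static int always_yield(void) { return 1; }
static int never_yield(void)  { return 0; }

/* Toy compression function (FNV-1a style step per byte), not SHA-1. */
static uint32_t toy_compress(uint32_t h, const uint8_t *block)
{
	for (size_t i = 0; i < BLOCK_SIZE; i++)
		h = (h ^ block[i]) * 16777619u;
	return h;
}

/* Models a yield: the live register copy is lost while we are away. */
static void mock_yield(uint32_t *live_reg)
{
	*live_reg = 0xdeadbeefu;
	yields_taken++;
}

static void toy_transform(struct toy_state *st, const uint8_t *src,
			  size_t blocks, need_yield_fn need_yield)
{
	uint32_t reg = st->h;		/* load state */

	while (blocks > 0) {
		reg = toy_compress(reg, src);
		src += BLOCK_SIZE;
		blocks--;

		/* Only consider yielding when more input remains. */
		if (blocks > 0 && need_yield()) {
			st->h = reg;	/* spill state before yielding */
			mock_yield(&reg);
			reg = st->h;	/* reload state after resuming */
		}
	}
	st->h = reg;			/* store new state */
}
```

Running the transform with never_yield and with always_yield over the same input gives the same final state, which is the property the spill and reload have to preserve. Any other register-resident values, such as constants, would need the same reload after a yield.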
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/sha1-ce-core.S | 42 ++++++++++++++------
1 file changed, 29 insertions(+), 13 deletions(-)
--
2.11.0
diff --git a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 8550408735a0..ded039415f4f 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S @@ -70,31 +70,37 @@ * int blocks) */ ENTRY(sha1_ce_transform) + frame_push 3 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + /* load round constants */ - adr x6, .Lsha1_rcon +0: adr x6, .Lsha1_rcon ld1r {k0.4s}, [x6], #4 ld1r {k1.4s}, [x6], #4 ld1r {k2.4s}, [x6], #4 ld1r {k3.4s}, [x6] /* load state */ - ld1 {dgav.4s}, [x0] - ldr dgb, [x0, #16] + ld1 {dgav.4s}, [x19] + ldr dgb, [x19, #16] /* load sha1_ce_state::finalize */ ldr_l w4, sha1_ce_offsetof_finalize, x4 - ldr w4, [x0, x4] + ldr w4, [x19, x4] /* load input */ -0: ld1 {v8.4s-v11.4s}, [x1], #64 - sub w2, w2, #1 +1: ld1 {v8.4s-v11.4s}, [x20], #64 + sub w21, w21, #1 CPU_LE( rev32 v8.16b, v8.16b ) CPU_LE( rev32 v9.16b, v9.16b ) CPU_LE( rev32 v10.16b, v10.16b ) CPU_LE( rev32 v11.16b, v11.16b ) -1: add t0.4s, v8.4s, k0.4s +2: add t0.4s, v8.4s, k0.4s mov dg0v.16b, dgav.16b add_update c, ev, k0, 8, 9, 10, 11, dgb @@ -125,16 +131,25 @@ CPU_LE( rev32 v11.16b, v11.16b ) add dgbv.2s, dgbv.2s, dg1v.2s add dgav.4s, dgav.4s, dg0v.4s - cbnz w2, 0b + cbz w21, 3f + + if_will_cond_yield_neon + st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + do_cond_yield_neon + b 0b + endif_yield_neon + + b 1b /* * Final block: add padding and total bit count. * Skip if the input size was not a round multiple of the block size, * the padding is handled by the C code in that case.
*/ - cbz x4, 3f +3: cbz x4, 4f ldr_l w4, sha1_ce_offsetof_count, x4 - ldr x4, [x0, x4] + ldr x4, [x19, x4] movi v9.2d, #0 mov x8, #0x80000000 movi v10.2d, #0 @@ -143,10 +158,11 @@ CPU_LE( rev32 v11.16b, v11.16b ) mov x4, #0 mov v11.d[0], xzr mov v11.d[1], x7 - b 1b + b 2b /* store new state */ -3: st1 {dgav.4s}, [x0] - str dgb, [x0, #16] +4: st1 {dgav.4s}, [x19] + str dgb, [x19, #16] + frame_pop ret ENDPROC(sha1_ce_transform)
From patchwork Tue Dec 26 10:29:34 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122731
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 14/20] crypto: arm64/aes-ccm - yield NEON after every block of input
Date: Tue, 26 Dec 2017 10:29:34 +0000
Message-Id: <20171226102940.26908-15-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
Sender: linux-rt-users-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rt-users@vger.kernel.org

Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input.
Signed-off-by: Ard Biesheuvel
---
arch/arm64/crypto/aes-ce-ccm-core.S | 150 +++++++++++++-------
1 file changed, 95 insertions(+), 55 deletions(-)
--
2.11.0
diff --git a/arch/arm64/crypto/aes-ce-ccm-core.S b/arch/arm64/crypto/aes-ce-ccm-core.S index e3a375c4cb83..88f5aef7934c 100644 --- a/arch/arm64/crypto/aes-ce-ccm-core.S +++ b/arch/arm64/crypto/aes-ce-ccm-core.S @@ -19,24 +19,33 @@ * u32 *macp, u8 const rk[], u32 rounds); */ ENTRY(ce_aes_ccm_auth_data) - ldr w8, [x3] /* leftover from prev round? */ + frame_push 7 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + + ldr w25, [x22] /* leftover from prev round? */ ld1 {v0.16b}, [x0] /* load mac */ - cbz w8, 1f - sub w8, w8, #16 + cbz w25, 1f + sub w25, w25, #16 eor v1.16b, v1.16b, v1.16b -0: ldrb w7, [x1], #1 /* get 1 byte of input */ - subs w2, w2, #1 - add w8, w8, #1 +0: ldrb w7, [x20], #1 /* get 1 byte of input */ + subs w21, w21, #1 + add w25, w25, #1 ins v1.b[0], w7 ext v1.16b, v1.16b, v1.16b, #1 /* rotate in the input bytes */ beq 8f /* out of input? */ - cbnz w8, 0b + cbnz w25, 0b eor v0.16b, v0.16b, v1.16b -1: ld1 {v3.4s}, [x4] /* load first round key */ - prfm pldl1strm, [x1] - cmp w5, #12 /* which key size? */ - add x6, x4, #16 - sub w7, w5, #2 /* modified # of rounds */ +1: ld1 {v3.4s}, [x23] /* load first round key */ + prfm pldl1strm, [x20] + cmp w24, #12 /* which key size? */ + add x6, x23, #16 + sub w7, w24, #2 /* modified # of rounds */ bmi 2f bne 5f mov v5.16b, v3.16b @@ -55,33 +64,43 @@ ENTRY(ce_aes_ccm_auth_data) ld1 {v5.4s}, [x6], #16 /* load next round key */ bpl 3b aese v0.16b, v4.16b - subs w2, w2, #16 /* last data? */ + subs w21, w21, #16 /* last data?
*/ eor v0.16b, v0.16b, v5.16b /* final round */ bmi 6f - ld1 {v1.16b}, [x1], #16 /* load next input block */ + ld1 {v1.16b}, [x20], #16 /* load next input block */ eor v0.16b, v0.16b, v1.16b /* xor with mac */ - bne 1b -6: st1 {v0.16b}, [x0] /* store mac */ + beq 6f + + if_will_cond_yield_neon + st1 {v0.16b}, [x19] /* store mac */ + do_cond_yield_neon + ld1 {v0.16b}, [x19] /* reload mac */ + endif_yield_neon + + b 1b +6: st1 {v0.16b}, [x19] /* store mac */ beq 10f - adds w2, w2, #16 + adds w21, w21, #16 beq 10f - mov w8, w2 -7: ldrb w7, [x1], #1 + mov w25, w21 +7: ldrb w7, [x20], #1 umov w6, v0.b[0] eor w6, w6, w7 - strb w6, [x0], #1 - subs w2, w2, #1 + strb w6, [x19], #1 + subs w21, w21, #1 beq 10f ext v0.16b, v0.16b, v0.16b, #1 /* rotate out the mac bytes */ b 7b -8: mov w7, w8 - add w8, w8, #16 +8: mov w7, w25 + add w25, w25, #16 9: ext v1.16b, v1.16b, v1.16b, #1 adds w7, w7, #1 bne 9b eor v0.16b, v0.16b, v1.16b - st1 {v0.16b}, [x0] -10: str w8, [x3] + st1 {v0.16b}, [x19] +10: str w25, [x22] + + frame_pop ret ENDPROC(ce_aes_ccm_auth_data) @@ -126,19 +145,29 @@ ENTRY(ce_aes_ccm_final) ENDPROC(ce_aes_ccm_final) .macro aes_ccm_do_crypt,enc - ldr x8, [x6, #8] /* load lower ctr */ - ld1 {v0.16b}, [x5] /* load mac */ -CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ + frame_push 8 + + mov x19, x0 + mov x20, x1 + mov x21, x2 + mov x22, x3 + mov x23, x4 + mov x24, x5 + mov x25, x6 + + ldr x26, [x25, #8] /* load lower ctr */ + ld1 {v0.16b}, [x24] /* load mac */ +CPU_LE( rev x26, x26 ) /* keep swabbed ctr in reg */ 0: /* outer loop */ - ld1 {v1.8b}, [x6] /* load upper ctr */ - prfm pldl1strm, [x1] - add x8, x8, #1 - rev x9, x8 - cmp w4, #12 /* which key size? */ - sub w7, w4, #2 /* get modified # of rounds */ + ld1 {v1.8b}, [x25] /* load upper ctr */ + prfm pldl1strm, [x20] + add x26, x26, #1 + rev x9, x26 + cmp w23, #12 /* which key size? 
*/ + sub w7, w23, #2 /* get modified # of rounds */ ins v1.d[1], x9 /* no carry in lower ctr */ - ld1 {v3.4s}, [x3] /* load first round key */ - add x10, x3, #16 + ld1 {v3.4s}, [x22] /* load first round key */ + add x10, x22, #16 bmi 1f bne 4f mov v5.16b, v3.16b @@ -165,9 +194,9 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ bpl 2b aese v0.16b, v4.16b aese v1.16b, v4.16b - subs w2, w2, #16 - bmi 6f /* partial block? */ - ld1 {v2.16b}, [x1], #16 /* load next input block */ + subs w21, w21, #16 + bmi 7f /* partial block? */ + ld1 {v2.16b}, [x20], #16 /* load next input block */ .if \enc == 1 eor v2.16b, v2.16b, v5.16b /* final round enc+mac */ eor v1.16b, v1.16b, v2.16b /* xor with crypted ctr */ @@ -176,18 +205,29 @@ CPU_LE( rev x8, x8 ) /* keep swabbed ctr in reg */ eor v1.16b, v2.16b, v5.16b /* final round enc */ .endif eor v0.16b, v0.16b, v2.16b /* xor mac with pt ^ rk[last] */ - st1 {v1.16b}, [x0], #16 /* write output block */ - bne 0b -CPU_LE( rev x8, x8 ) - st1 {v0.16b}, [x5] /* store mac */ - str x8, [x6, #8] /* store lsb end of ctr (BE) */ -5: ret - -6: eor v0.16b, v0.16b, v5.16b /* final round mac */ + st1 {v1.16b}, [x19], #16 /* write output block */ + beq 5f + + if_will_cond_yield_neon + st1 {v0.16b}, [x24] /* store mac */ + do_cond_yield_neon + ld1 {v0.16b}, [x24] /* reload mac */ + endif_yield_neon + + b 0b +5: +CPU_LE( rev x26, x26 ) + st1 {v0.16b}, [x24] /* store mac */ + str x26, [x25, #8] /* store lsb end of ctr (BE) */ + +6: frame_pop + ret + +7: eor v0.16b, v0.16b, v5.16b /* final round mac */ eor v1.16b, v1.16b, v5.16b /* final round enc */ - st1 {v0.16b}, [x5] /* store mac */ - add w2, w2, #16 /* process partial tail block */ -7: ldrb w9, [x1], #1 /* get 1 byte of input */ + st1 {v0.16b}, [x24] /* store mac */ + add w21, w21, #16 /* process partial tail block */ +8: ldrb w9, [x20], #1 /* get 1 byte of input */ umov w6, v1.b[0] /* get top crypted ctr byte */ umov w7, v0.b[0] /* get top mac byte */ .if \enc == 1 @@ -197,13 +237,13 @@ 
CPU_LE( rev x8, x8 ) eor w9, w9, w6 eor w7, w7, w9 .endif - strb w9, [x0], #1 /* store out byte */ - strb w7, [x5], #1 /* store mac byte */ - subs w2, w2, #1 - beq 5b + strb w9, [x19], #1 /* store out byte */ + strb w7, [x24], #1 /* store mac byte */ + subs w21, w21, #1 + beq 6b ext v0.16b, v0.16b, v0.16b, #1 /* shift out mac byte */ ext v1.16b, v1.16b, v1.16b, #1 /* shift out ctr byte */ - b 7b + b 8b .endm /* From patchwork Tue Dec 26 10:29:39 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 122733 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp779714qgn; Tue, 26 Dec 2017 02:31:39 -0800 (PST) X-Google-Smtp-Source: ACJfBotgP9R2v5dXNrAHFcFMRvX+0G3592vcrXIet4IbM2ElgP4vcD3NzUSRRXSRezmoZDeHIzCz X-Received: by 10.99.168.10 with SMTP id o10mr4129267pgf.153.1514284299561; Tue, 26 Dec 2017 02:31:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1514284299; cv=none; d=google.com; s=arc-20160816; b=TYT8xGCL30snvAHnzxIT2u3U6fRWASs1seociBapcAqOLp5QKapQfbRIy+drXT3jTC plBkq3uWj8vW1Qsej08CGF50zMR13L0X8obAduifx/J0LJIhJyiM2PNP8R8HB/z5VJJf kVb1uAWM90qJiZhkcn/YBw9pa+ltaLtjBf92ekTOWkW9VVXbU4noFeiwy7pKdZr03s66 HrrV7LCJvyjc/hdGD7gkuQqsWg/yieAY2FGtBMGcO8REYMtSdBheufz7DjbVPDjLxVGl E87gnU4Hsa0DilcWbBGvC2pfA2psXfrhXN0Ho5oxH+/IbJ9IIKKBRMf1uHuBKPINYjJh ryTg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=Na1KyWpjY/v6BSxy3TEYKMNO4Sf5t12BOwIFuDn/7bM=; b=qX6txAl7pSj/1ZvNfwXEImXMQtGPWxpdModf6qhOblLJJPzB5tsKQoai2yzTEQzEzx 24O0uTq6GoSxGcKQpkrcQtcUg6HY+luymrpbEOpgfQ9JtDsI2dDonhrRcojZC88E1Wvm mb+FTO3qZGiW6GBOEkWnC5iX6UScnh0tsr0x3HKlUzfLKsdOARq13L7Y0R/oMsJvIwB9 L+hklAS1V6u0d/HCI0m5JrTXaeQRlcPzt54TIJclMWuEK4o+9Vk+21V3QbdFnWO+no9+ 8digY1ZYPm0EtzUnvUk9gCAIaB+DPSGXhwZulthpZ9djQZaZG5jW+oF2VmYEIB1Q6r8B 
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v4 19/20] crypto: arm64/crct10dif-ce - yield NEON after every block of input
Date: Tue, 26 Dec 2017 10:29:39 +0000
Message-Id: <20171226102940.26908-20-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

Avoid excessive scheduling delays under a preemptible kernel by
yielding the NEON after every block of input.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/crct10dif-ce-core.S | 32 +++++++++++++++++---
 1 file changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/crypto/crct10dif-ce-core.S b/arch/arm64/crypto/crct10dif-ce-core.S
index d5b5a8c038c8..5bce7833ca5f 100644
--- a/arch/arm64/crypto/crct10dif-ce-core.S
+++ b/arch/arm64/crypto/crct10dif-ce-core.S
@@ -74,13 +74,19 @@
 	.text
 	.cpu		generic+crypto
 
-	arg1_low32	.req	w0
-	arg2		.req	x1
-	arg3		.req	x2
+	arg1_low32	.req	w19
+	arg2		.req	x20
+	arg3		.req	x21
 
 	vzr		.req	v13
 
 ENTRY(crc_t10dif_pmull)
+	frame_push	3, 128
+
+	mov		arg1_low32, w0
+	mov		arg2, x1
+	mov		arg3, x2
+
 	movi		vzr.16b, #0		// init zero register
 
 	// adjust the 16-bit initial_crc value, scale it to 32 bits
@@ -175,8 +181,25 @@ CPU_LE( ext v12.16b, v12.16b, v12.16b, #8 )
 	subs		arg3, arg3, #128
 
 	// check if there is another 64B in the buffer to be able to fold
-	b.ge		_fold_64_B_loop
+	b.lt		_fold_64_B_end
+
+	if_will_cond_yield_neon
+	stp		q0, q1, [sp, #.Lframe_local_offset]
+	stp		q2, q3, [sp, #.Lframe_local_offset + 32]
+	stp		q4, q5, [sp, #.Lframe_local_offset + 64]
+	stp		q6, q7, [sp, #.Lframe_local_offset + 96]
+	do_cond_yield_neon
+	ldp		q0, q1, [sp, #.Lframe_local_offset]
+	ldp		q2, q3, [sp, #.Lframe_local_offset + 32]
+	ldp		q4, q5, [sp, #.Lframe_local_offset + 64]
+	ldp		q6, q7, [sp, #.Lframe_local_offset + 96]
+	ldr		q10, rk3
+	movi		vzr.16b, #0		// init zero register
+	endif_yield_neon
+
+	b		_fold_64_B_loop
+
+_fold_64_B_end:
 	// at this point, the buffer pointer is pointing at the last y Bytes
 	// of the buffer the 64B of folded data is in 4 of the vector
 	// registers: v0, v1, v2, v3
@@ -304,6 +327,7 @@ _barrett:
 _cleanup:
 	// scale the result back to 16 bits
 	lsr		x0, x0, #16
+	frame_pop
 	ret
 
 _less_than_128:

From patchwork Tue Dec 26 10:29:40 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122735
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux,
 Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org,
 Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt,
 Thomas Gleixner
Subject: [PATCH v4 20/20] DO NOT MERGE
Date: Tue, 26 Dec 2017 10:29:40 +0000
Message-Id: <20171226102940.26908-21-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

Test code to force a kernel_neon_end+begin sequence at every yield
point, and wipe the entire NEON state before resuming the algorithm.

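The idea behind this test patch can be illustrated outside the kernel.
The following is a hedged sketch in plain C, not the kernel macros:
reg_acc, reg_const, forced_yield() and fold() are hypothetical names,
and the multiply-add is only a stand-in for a real folding step. Forcing
a yield after every block and poisoning every register turns a missing
reload into a wrong result, which a comparison against a run without
yields detects.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POISON 0x5555555555555555ULL

/* Hypothetical stand-ins for two live NEON registers. */
static uint64_t reg_acc;	/* running state */
static uint64_t reg_const;	/* loop-invariant constant */

/* Models the test patch: yield unconditionally, then wipe every register. */
static void forced_yield(void)
{
	reg_acc = POISON;
	reg_const = POISON;
}

/* Toy fold loop; reload_const == false models a forgotten reload. */
static uint64_t fold(const uint64_t *in, size_t n, uint64_t k,
		     bool force_yield, bool reload_const)
{
	uint64_t saved;

	reg_acc = 0;
	reg_const = k;
	for (size_t i = 0; i < n; i++) {
		reg_acc = reg_acc * reg_const + in[i];
		if (force_yield && i + 1 < n) {
			saved = reg_acc;	/* spill live state */
			forced_yield();
			reg_acc = saved;	/* restore live state */
			if (reload_const)
				reg_const = k;	/* constants need reloading too */
		}
	}
	return reg_acc;
}
```

With both the state and the constant restored, the forced-yield run
matches the reference; leaving out the constant reload makes the results
diverge after the first yield.
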
---
 arch/arm64/include/asm/assembler.h | 33 ++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 575d0f065d28..295937b2f39b 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -622,6 +622,7 @@ alternative_else_nop_endif
 	cmp		w1, #PREEMPT_DISABLE_OFFSET
 	csel		x0, x0, xzr, eq
 	tbnz		x0, #TIF_NEED_RESCHED, .Lyield_\@	// needs rescheduling?
+	b		.Lyield_\@
 #endif
 	/* fall through to endif_yield_neon */
 	.subsection	1
@@ -631,6 +632,38 @@ alternative_else_nop_endif
 	.macro		do_cond_yield_neon
 	bl		kernel_neon_end
 	bl		kernel_neon_begin
+	movi		v0.16b, #0x55
+	movi		v1.16b, #0x55
+	movi		v2.16b, #0x55
+	movi		v3.16b, #0x55
+	movi		v4.16b, #0x55
+	movi		v5.16b, #0x55
+	movi		v6.16b, #0x55
+	movi		v7.16b, #0x55
+	movi		v8.16b, #0x55
+	movi		v9.16b, #0x55
+	movi		v10.16b, #0x55
+	movi		v11.16b, #0x55
+	movi		v12.16b, #0x55
+	movi		v13.16b, #0x55
+	movi		v14.16b, #0x55
+	movi		v15.16b, #0x55
+	movi		v16.16b, #0x55
+	movi		v17.16b, #0x55
+	movi		v18.16b, #0x55
+	movi		v19.16b, #0x55
+	movi		v20.16b, #0x55
+	movi		v21.16b, #0x55
+	movi		v22.16b, #0x55
+	movi		v23.16b, #0x55
+	movi		v24.16b, #0x55
+	movi		v25.16b, #0x55
+	movi		v26.16b, #0x55
+	movi		v27.16b, #0x55
+	movi		v28.16b, #0x55
+	movi		v29.16b, #0x55
+	movi		v30.16b, #0x55
+	movi		v31.16b, #0x55
 	.endm
 
 	.macro		endif_yield_neon, lbl