From patchwork Tue Dec 26 10:29:21 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122725
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior,
 Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 01/20] crypto: testmgr - add a new test case for CRC-T10DIF
Date: Tue, 26 Dec 2017 10:29:21 +0000
Message-Id: <20171226102940.26908-2-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

In order to be able to test yield support under preempt, add a test
vector for CRC-T10DIF that is long enough to take multiple iterations
(and thus possible preemption between them) of the primary loop of the
accelerated x86 and arm64 implementations.
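For reference, CRC-T10DIF is the 16-bit CRC used by the SCSI T10 Data
Integrity Field: generator polynomial 0x8BB7, initial value 0, no bit
reflection. A minimal bitwise sketch of the generic computation in C
(the function name here is ours, not a kernel symbol; the accelerated
x86/arm64 drivers use carryless-multiply instructions over wide blocks
instead) looks like this:

        #include <stddef.h>
        #include <stdint.h>

        /*
         * Reference CRC-T10DIF: poly 0x8bb7, init 0, MSB-first, unreflected.
         * Deliberately slow and simple; only for cross-checking test vectors.
         */
        static uint16_t crc_t10dif_ref(uint16_t crc, const uint8_t *buf,
                                       size_t len)
        {
                while (len--) {
                        crc ^= (uint16_t)(*buf++) << 8;
                        for (int i = 0; i < 8; i++)
                                crc = (crc & 0x8000) ? (crc << 1) ^ 0x8bb7
                                                     : crc << 1;
                }
                return crc;
        }

Fed the 2048-byte plaintext below with crc = 0, this should reproduce
the 0x23ca digest recorded in the new vector. The point of the length is
that the vectorized main loops need many passes over the buffer, which
creates the preemption window the yield patches later in this series
rely on.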
Signed-off-by: Ard Biesheuvel --- crypto/testmgr.h | 259 ++++++++++++++++++++ 1 file changed, 259 insertions(+) -- 2.11.0 diff --git a/crypto/testmgr.h b/crypto/testmgr.h index a714b6293959..0c849aec161d 100644 --- a/crypto/testmgr.h +++ b/crypto/testmgr.h @@ -1494,6 +1494,265 @@ static const struct hash_testvec crct10dif_tv_template[] = { .digest = (u8 *)(u16 []){ 0x44c6 }, .np = 4, .tap = { 1, 255, 57, 6 }, + }, { + .plaintext = "\x6e\x05\x79\x10\xa7\x1b\xb2\x49" + "\xe0\x54\xeb\x82\x19\x8d\x24\xbb" + "\x2f\xc6\x5d\xf4\x68\xff\x96\x0a" + "\xa1\x38\xcf\x43\xda\x71\x08\x7c" + "\x13\xaa\x1e\xb5\x4c\xe3\x57\xee" + "\x85\x1c\x90\x27\xbe\x32\xc9\x60" + "\xf7\x6b\x02\x99\x0d\xa4\x3b\xd2" + "\x46\xdd\x74\x0b\x7f\x16\xad\x21" + "\xb8\x4f\xe6\x5a\xf1\x88\x1f\x93" + "\x2a\xc1\x35\xcc\x63\xfa\x6e\x05" + "\x9c\x10\xa7\x3e\xd5\x49\xe0\x77" + "\x0e\x82\x19\xb0\x24\xbb\x52\xe9" + "\x5d\xf4\x8b\x22\x96\x2d\xc4\x38" + "\xcf\x66\xfd\x71\x08\x9f\x13\xaa" + "\x41\xd8\x4c\xe3\x7a\x11\x85\x1c" + "\xb3\x27\xbe\x55\xec\x60\xf7\x8e" + "\x02\x99\x30\xc7\x3b\xd2\x69\x00" + "\x74\x0b\xa2\x16\xad\x44\xdb\x4f" + "\xe6\x7d\x14\x88\x1f\xb6\x2a\xc1" + "\x58\xef\x63\xfa\x91\x05\x9c\x33" + "\xca\x3e\xd5\x6c\x03\x77\x0e\xa5" + "\x19\xb0\x47\xde\x52\xe9\x80\x17" + "\x8b\x22\xb9\x2d\xc4\x5b\xf2\x66" + "\xfd\x94\x08\x9f\x36\xcd\x41\xd8" + "\x6f\x06\x7a\x11\xa8\x1c\xb3\x4a" + "\xe1\x55\xec\x83\x1a\x8e\x25\xbc" + "\x30\xc7\x5e\xf5\x69\x00\x97\x0b" + "\xa2\x39\xd0\x44\xdb\x72\x09\x7d" + "\x14\xab\x1f\xb6\x4d\xe4\x58\xef" + "\x86\x1d\x91\x28\xbf\x33\xca\x61" + "\xf8\x6c\x03\x9a\x0e\xa5\x3c\xd3" + "\x47\xde\x75\x0c\x80\x17\xae\x22" + "\xb9\x50\xe7\x5b\xf2\x89\x20\x94" + "\x2b\xc2\x36\xcd\x64\xfb\x6f\x06" + "\x9d\x11\xa8\x3f\xd6\x4a\xe1\x78" + "\x0f\x83\x1a\xb1\x25\xbc\x53\xea" + "\x5e\xf5\x8c\x00\x97\x2e\xc5\x39" + "\xd0\x67\xfe\x72\x09\xa0\x14\xab" + "\x42\xd9\x4d\xe4\x7b\x12\x86\x1d" + "\xb4\x28\xbf\x56\xed\x61\xf8\x8f" + "\x03\x9a\x31\xc8\x3c\xd3\x6a\x01" + "\x75\x0c\xa3\x17\xae\x45\xdc\x50" + "\xe7\x7e\x15\x89\x20\xb7\x2b\xc2" + "\x59\xf0\x64\xfb\x92\x06\x9d\x34" + "\xcb\x3f\xd6\x6d\x04\x78\x0f\xa6" + "\x1a\xb1\x48\xdf\x53\xea\x81\x18" + "\x8c\x23\xba\x2e\xc5\x5c\xf3\x67" + "\xfe\x95\x09\xa0\x37\xce\x42\xd9" + "\x70\x07\x7b\x12\xa9\x1d\xb4\x4b" + "\xe2\x56\xed\x84\x1b\x8f\x26\xbd" + "\x31\xc8\x5f\xf6\x6a\x01\x98\x0c" + "\xa3\x3a\xd1\x45\xdc\x73\x0a\x7e" + "\x15\xac\x20\xb7\x4e\xe5\x59\xf0" + "\x87\x1e\x92\x29\xc0\x34\xcb\x62" + "\xf9\x6d\x04\x9b\x0f\xa6\x3d\xd4" + "\x48\xdf\x76\x0d\x81\x18\xaf\x23" + "\xba\x51\xe8\x5c\xf3\x8a\x21\x95" + "\x2c\xc3\x37\xce\x65\xfc\x70\x07" + "\x9e\x12\xa9\x40\xd7\x4b\xe2\x79" + "\x10\x84\x1b\xb2\x26\xbd\x54\xeb" + "\x5f\xf6\x8d\x01\x98\x2f\xc6\x3a" + "\xd1\x68\xff\x73\x0a\xa1\x15\xac" + "\x43\xda\x4e\xe5\x7c\x13\x87\x1e" + "\xb5\x29\xc0\x57\xee\x62\xf9\x90" + "\x04\x9b\x32\xc9\x3d\xd4\x6b\x02" + "\x76\x0d\xa4\x18\xaf\x46\xdd\x51" + "\xe8\x7f\x16\x8a\x21\xb8\x2c\xc3" + "\x5a\xf1\x65\xfc\x93\x07\x9e\x35" + "\xcc\x40\xd7\x6e\x05\x79\x10\xa7" + "\x1b\xb2\x49\xe0\x54\xeb\x82\x19" + "\x8d\x24\xbb\x2f\xc6\x5d\xf4\x68" + "\xff\x96\x0a\xa1\x38\xcf\x43\xda" + "\x71\x08\x7c\x13\xaa\x1e\xb5\x4c" + "\xe3\x57\xee\x85\x1c\x90\x27\xbe" + "\x32\xc9\x60\xf7\x6b\x02\x99\x0d" + "\xa4\x3b\xd2\x46\xdd\x74\x0b\x7f" + "\x16\xad\x21\xb8\x4f\xe6\x5a\xf1" + "\x88\x1f\x93\x2a\xc1\x35\xcc\x63" + "\xfa\x6e\x05\x9c\x10\xa7\x3e\xd5" + "\x49\xe0\x77\x0e\x82\x19\xb0\x24" + "\xbb\x52\xe9\x5d\xf4\x8b\x22\x96" + "\x2d\xc4\x38\xcf\x66\xfd\x71\x08" + "\x9f\x13\xaa\x41\xd8\x4c\xe3\x7a" + "\x11\x85\x1c\xb3\x27\xbe\x55\xec" + 
"\x60\xf7\x8e\x02\x99\x30\xc7\x3b" + "\xd2\x69\x00\x74\x0b\xa2\x16\xad" + "\x44\xdb\x4f\xe6\x7d\x14\x88\x1f" + "\xb6\x2a\xc1\x58\xef\x63\xfa\x91" + "\x05\x9c\x33\xca\x3e\xd5\x6c\x03" + "\x77\x0e\xa5\x19\xb0\x47\xde\x52" + "\xe9\x80\x17\x8b\x22\xb9\x2d\xc4" + "\x5b\xf2\x66\xfd\x94\x08\x9f\x36" + "\xcd\x41\xd8\x6f\x06\x7a\x11\xa8" + "\x1c\xb3\x4a\xe1\x55\xec\x83\x1a" + "\x8e\x25\xbc\x30\xc7\x5e\xf5\x69" + "\x00\x97\x0b\xa2\x39\xd0\x44\xdb" + "\x72\x09\x7d\x14\xab\x1f\xb6\x4d" + "\xe4\x58\xef\x86\x1d\x91\x28\xbf" + "\x33\xca\x61\xf8\x6c\x03\x9a\x0e" + "\xa5\x3c\xd3\x47\xde\x75\x0c\x80" + "\x17\xae\x22\xb9\x50\xe7\x5b\xf2" + "\x89\x20\x94\x2b\xc2\x36\xcd\x64" + "\xfb\x6f\x06\x9d\x11\xa8\x3f\xd6" + "\x4a\xe1\x78\x0f\x83\x1a\xb1\x25" + "\xbc\x53\xea\x5e\xf5\x8c\x00\x97" + "\x2e\xc5\x39\xd0\x67\xfe\x72\x09" + "\xa0\x14\xab\x42\xd9\x4d\xe4\x7b" + "\x12\x86\x1d\xb4\x28\xbf\x56\xed" + "\x61\xf8\x8f\x03\x9a\x31\xc8\x3c" + "\xd3\x6a\x01\x75\x0c\xa3\x17\xae" + "\x45\xdc\x50\xe7\x7e\x15\x89\x20" + "\xb7\x2b\xc2\x59\xf0\x64\xfb\x92" + "\x06\x9d\x34\xcb\x3f\xd6\x6d\x04" + "\x78\x0f\xa6\x1a\xb1\x48\xdf\x53" + "\xea\x81\x18\x8c\x23\xba\x2e\xc5" + "\x5c\xf3\x67\xfe\x95\x09\xa0\x37" + "\xce\x42\xd9\x70\x07\x7b\x12\xa9" + "\x1d\xb4\x4b\xe2\x56\xed\x84\x1b" + "\x8f\x26\xbd\x31\xc8\x5f\xf6\x6a" + "\x01\x98\x0c\xa3\x3a\xd1\x45\xdc" + "\x73\x0a\x7e\x15\xac\x20\xb7\x4e" + "\xe5\x59\xf0\x87\x1e\x92\x29\xc0" + "\x34\xcb\x62\xf9\x6d\x04\x9b\x0f" + "\xa6\x3d\xd4\x48\xdf\x76\x0d\x81" + "\x18\xaf\x23\xba\x51\xe8\x5c\xf3" + "\x8a\x21\x95\x2c\xc3\x37\xce\x65" + "\xfc\x70\x07\x9e\x12\xa9\x40\xd7" + "\x4b\xe2\x79\x10\x84\x1b\xb2\x26" + "\xbd\x54\xeb\x5f\xf6\x8d\x01\x98" + "\x2f\xc6\x3a\xd1\x68\xff\x73\x0a" + "\xa1\x15\xac\x43\xda\x4e\xe5\x7c" + "\x13\x87\x1e\xb5\x29\xc0\x57\xee" + "\x62\xf9\x90\x04\x9b\x32\xc9\x3d" + "\xd4\x6b\x02\x76\x0d\xa4\x18\xaf" + "\x46\xdd\x51\xe8\x7f\x16\x8a\x21" + "\xb8\x2c\xc3\x5a\xf1\x65\xfc\x93" + "\x07\x9e\x35\xcc\x40\xd7\x6e\x05" + "\x79\x10\xa7\x1b\xb2\x49\xe0\x54" + "\xeb\x82\x19\x8d\x24\xbb\x2f\xc6" + "\x5d\xf4\x68\xff\x96\x0a\xa1\x38" + "\xcf\x43\xda\x71\x08\x7c\x13\xaa" + "\x1e\xb5\x4c\xe3\x57\xee\x85\x1c" + "\x90\x27\xbe\x32\xc9\x60\xf7\x6b" + "\x02\x99\x0d\xa4\x3b\xd2\x46\xdd" + "\x74\x0b\x7f\x16\xad\x21\xb8\x4f" + "\xe6\x5a\xf1\x88\x1f\x93\x2a\xc1" + "\x35\xcc\x63\xfa\x6e\x05\x9c\x10" + "\xa7\x3e\xd5\x49\xe0\x77\x0e\x82" + "\x19\xb0\x24\xbb\x52\xe9\x5d\xf4" + "\x8b\x22\x96\x2d\xc4\x38\xcf\x66" + "\xfd\x71\x08\x9f\x13\xaa\x41\xd8" + "\x4c\xe3\x7a\x11\x85\x1c\xb3\x27" + "\xbe\x55\xec\x60\xf7\x8e\x02\x99" + "\x30\xc7\x3b\xd2\x69\x00\x74\x0b" + "\xa2\x16\xad\x44\xdb\x4f\xe6\x7d" + "\x14\x88\x1f\xb6\x2a\xc1\x58\xef" + "\x63\xfa\x91\x05\x9c\x33\xca\x3e" + "\xd5\x6c\x03\x77\x0e\xa5\x19\xb0" + "\x47\xde\x52\xe9\x80\x17\x8b\x22" + "\xb9\x2d\xc4\x5b\xf2\x66\xfd\x94" + "\x08\x9f\x36\xcd\x41\xd8\x6f\x06" + "\x7a\x11\xa8\x1c\xb3\x4a\xe1\x55" + "\xec\x83\x1a\x8e\x25\xbc\x30\xc7" + "\x5e\xf5\x69\x00\x97\x0b\xa2\x39" + "\xd0\x44\xdb\x72\x09\x7d\x14\xab" + "\x1f\xb6\x4d\xe4\x58\xef\x86\x1d" + "\x91\x28\xbf\x33\xca\x61\xf8\x6c" + "\x03\x9a\x0e\xa5\x3c\xd3\x47\xde" + "\x75\x0c\x80\x17\xae\x22\xb9\x50" + "\xe7\x5b\xf2\x89\x20\x94\x2b\xc2" + "\x36\xcd\x64\xfb\x6f\x06\x9d\x11" + "\xa8\x3f\xd6\x4a\xe1\x78\x0f\x83" + "\x1a\xb1\x25\xbc\x53\xea\x5e\xf5" + "\x8c\x00\x97\x2e\xc5\x39\xd0\x67" + "\xfe\x72\x09\xa0\x14\xab\x42\xd9" + "\x4d\xe4\x7b\x12\x86\x1d\xb4\x28" + "\xbf\x56\xed\x61\xf8\x8f\x03\x9a" + "\x31\xc8\x3c\xd3\x6a\x01\x75\x0c" + "\xa3\x17\xae\x45\xdc\x50\xe7\x7e" + "\x15\x89\x20\xb7\x2b\xc2\x59\xf0" + 
"\x64\xfb\x92\x06\x9d\x34\xcb\x3f" + "\xd6\x6d\x04\x78\x0f\xa6\x1a\xb1" + "\x48\xdf\x53\xea\x81\x18\x8c\x23" + "\xba\x2e\xc5\x5c\xf3\x67\xfe\x95" + "\x09\xa0\x37\xce\x42\xd9\x70\x07" + "\x7b\x12\xa9\x1d\xb4\x4b\xe2\x56" + "\xed\x84\x1b\x8f\x26\xbd\x31\xc8" + "\x5f\xf6\x6a\x01\x98\x0c\xa3\x3a" + "\xd1\x45\xdc\x73\x0a\x7e\x15\xac" + "\x20\xb7\x4e\xe5\x59\xf0\x87\x1e" + "\x92\x29\xc0\x34\xcb\x62\xf9\x6d" + "\x04\x9b\x0f\xa6\x3d\xd4\x48\xdf" + "\x76\x0d\x81\x18\xaf\x23\xba\x51" + "\xe8\x5c\xf3\x8a\x21\x95\x2c\xc3" + "\x37\xce\x65\xfc\x70\x07\x9e\x12" + "\xa9\x40\xd7\x4b\xe2\x79\x10\x84" + "\x1b\xb2\x26\xbd\x54\xeb\x5f\xf6" + "\x8d\x01\x98\x2f\xc6\x3a\xd1\x68" + "\xff\x73\x0a\xa1\x15\xac\x43\xda" + "\x4e\xe5\x7c\x13\x87\x1e\xb5\x29" + "\xc0\x57\xee\x62\xf9\x90\x04\x9b" + "\x32\xc9\x3d\xd4\x6b\x02\x76\x0d" + "\xa4\x18\xaf\x46\xdd\x51\xe8\x7f" + "\x16\x8a\x21\xb8\x2c\xc3\x5a\xf1" + "\x65\xfc\x93\x07\x9e\x35\xcc\x40" + "\xd7\x6e\x05\x79\x10\xa7\x1b\xb2" + "\x49\xe0\x54\xeb\x82\x19\x8d\x24" + "\xbb\x2f\xc6\x5d\xf4\x68\xff\x96" + "\x0a\xa1\x38\xcf\x43\xda\x71\x08" + "\x7c\x13\xaa\x1e\xb5\x4c\xe3\x57" + "\xee\x85\x1c\x90\x27\xbe\x32\xc9" + "\x60\xf7\x6b\x02\x99\x0d\xa4\x3b" + "\xd2\x46\xdd\x74\x0b\x7f\x16\xad" + "\x21\xb8\x4f\xe6\x5a\xf1\x88\x1f" + "\x93\x2a\xc1\x35\xcc\x63\xfa\x6e" + "\x05\x9c\x10\xa7\x3e\xd5\x49\xe0" + "\x77\x0e\x82\x19\xb0\x24\xbb\x52" + "\xe9\x5d\xf4\x8b\x22\x96\x2d\xc4" + "\x38\xcf\x66\xfd\x71\x08\x9f\x13" + "\xaa\x41\xd8\x4c\xe3\x7a\x11\x85" + "\x1c\xb3\x27\xbe\x55\xec\x60\xf7" + "\x8e\x02\x99\x30\xc7\x3b\xd2\x69" + "\x00\x74\x0b\xa2\x16\xad\x44\xdb" + "\x4f\xe6\x7d\x14\x88\x1f\xb6\x2a" + "\xc1\x58\xef\x63\xfa\x91\x05\x9c" + "\x33\xca\x3e\xd5\x6c\x03\x77\x0e" + "\xa5\x19\xb0\x47\xde\x52\xe9\x80" + "\x17\x8b\x22\xb9\x2d\xc4\x5b\xf2" + "\x66\xfd\x94\x08\x9f\x36\xcd\x41" + "\xd8\x6f\x06\x7a\x11\xa8\x1c\xb3" + "\x4a\xe1\x55\xec\x83\x1a\x8e\x25" + "\xbc\x30\xc7\x5e\xf5\x69\x00\x97" + "\x0b\xa2\x39\xd0\x44\xdb\x72\x09" + "\x7d\x14\xab\x1f\xb6\x4d\xe4\x58" + "\xef\x86\x1d\x91\x28\xbf\x33\xca" + "\x61\xf8\x6c\x03\x9a\x0e\xa5\x3c" + "\xd3\x47\xde\x75\x0c\x80\x17\xae" + "\x22\xb9\x50\xe7\x5b\xf2\x89\x20" + "\x94\x2b\xc2\x36\xcd\x64\xfb\x6f" + "\x06\x9d\x11\xa8\x3f\xd6\x4a\xe1" + "\x78\x0f\x83\x1a\xb1\x25\xbc\x53" + "\xea\x5e\xf5\x8c\x00\x97\x2e\xc5" + "\x39\xd0\x67\xfe\x72\x09\xa0\x14" + "\xab\x42\xd9\x4d\xe4\x7b\x12\x86" + "\x1d\xb4\x28\xbf\x56\xed\x61\xf8" + "\x8f\x03\x9a\x31\xc8\x3c\xd3\x6a" + "\x01\x75\x0c\xa3\x17\xae\x45\xdc" + "\x50\xe7\x7e\x15\x89\x20\xb7\x2b" + "\xc2\x59\xf0\x64\xfb\x92\x06\x9d" + "\x34\xcb\x3f\xd6\x6d\x04\x78\x0f" + "\xa6\x1a\xb1\x48\xdf\x53\xea\x81" + "\x18\x8c\x23\xba\x2e\xc5\x5c\xf3" + "\x67\xfe\x95\x09\xa0\x37\xce\x42" + "\xd9\x70\x07\x7b\x12\xa9\x1d\xb4" + "\x4b\xe2\x56\xed\x84\x1b\x8f\x26" + "\xbd\x31\xc8\x5f\xf6\x6a\x01\x98", + .psize = 2048, + .digest = (u8 *)(u16 []){ 0x23ca }, } }; From patchwork Tue Dec 26 10:29:22 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 122744 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp783311qgn; Tue, 26 Dec 2017 02:35:38 -0800 (PST) X-Google-Smtp-Source: ACJfBoumzBO8n9vaa4ywvgTqPrUFnoIAmhBqupL87EIFTZ04w0cJtcUMhtKZDjlDECmkKgwmwiik X-Received: by 10.98.69.209 with SMTP id n78mr24804704pfi.28.1514284537922; Tue, 26 Dec 2017 02:35:37 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1514284537; cv=none; d=google.com; s=arc-20160816; b=F6C9M0o89yWJvYmvJFFqk1QWe4ZE+CjrDoTSGNeJPHkYMEFTQ7Lbn1DxkN7B+GoOxU 
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior,
 Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 02/20] crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop
Date: Tue, 26 Dec 2017 10:29:22 +0000
Message-Id: <20171226102940.26908-3-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end(). For this reason, the
NEON crypto code that was introduced at the time keeps the NEON enabled
throughout the execution of the crypto API methods, which may include
calls back into the crypto API that could result in memory allocation
or other actions that we should avoid when running with preemption
disabled.

Since then, we have optimized the kernel mode NEON handling, which now
restores lazily (upon return to userland), and so the preserve action
is only costly the first time it is called after entering the kernel.
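Concretely, taking the encrypt path from the diff below, each pass over
the scatterwalk now brackets only the asmlinkage NEON routine (this is
a condensed sketch of the new shape, not the complete function):

        err = skcipher_walk_aead_encrypt(&walk, req, true);

        while (walk.nbytes) {
                u32 tail = walk.nbytes % AES_BLOCK_SIZE;

                if (walk.nbytes == walk.total)
                        tail = 0;

                kernel_neon_begin();    /* NEON on, preemption off */
                ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
                                   walk.nbytes - tail, ctx->key_enc,
                                   num_rounds(ctx), mac, walk.iv);
                kernel_neon_end();      /* preemption possible again */

                err = skcipher_walk_done(&walk, tail);
        }

The non-SIMD fallback, taken when may_use_simd() returns false, never
enables NEON at all, which is why the use_neon parameter can be dropped
and ccm_update_mac() can make the decision per call.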
So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 47 ++++++++++---------- 1 file changed, 23 insertions(+), 24 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index a1254036f2b1..68b11aa690e4 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -107,11 +107,13 @@ static int ccm_init_mac(struct aead_request *req, u8 maciv[], u32 msglen) } static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[], - u32 abytes, u32 *macp, bool use_neon) + u32 abytes, u32 *macp) { - if (likely(use_neon)) { + if (may_use_simd()) { + kernel_neon_begin(); ce_aes_ccm_auth_data(mac, in, abytes, macp, key->key_enc, num_rounds(key)); + kernel_neon_end(); } else { if (*macp > 0 && *macp < AES_BLOCK_SIZE) { int added = min(abytes, AES_BLOCK_SIZE - *macp); @@ -143,8 +145,7 @@ static void ccm_update_mac(struct crypto_aes_ctx *key, u8 mac[], u8 const in[], } } -static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[], - bool use_neon) +static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) { struct crypto_aead *aead = crypto_aead_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_aead_ctx(aead); @@ -163,7 +164,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[], ltag.len = 6; } - ccm_update_mac(ctx, mac, (u8 *)<ag, ltag.len, &macp, use_neon); + ccm_update_mac(ctx, mac, (u8 *)<ag, ltag.len, &macp); scatterwalk_start(&walk, req->src); do { @@ -175,7 +176,7 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[], n = scatterwalk_clamp(&walk, len); } p = scatterwalk_map(&walk); - ccm_update_mac(ctx, mac, p, n, &macp, use_neon); + ccm_update_mac(ctx, mac, p, n, &macp); len -= n; scatterwalk_unmap(p); @@ -242,43 +243,42 @@ static int ccm_encrypt(struct aead_request *req) u8 __aligned(8) mac[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE]; u32 len = req->cryptlen; - bool use_neon = may_use_simd(); int err; err = ccm_init_mac(req, mac, len); if (err) return err; - if (likely(use_neon)) - kernel_neon_begin(); - if (req->assoclen) - ccm_calculate_auth_mac(req, mac, use_neon); + ccm_calculate_auth_mac(req, mac); /* preserve the original iv for the final round */ memcpy(buf, req->iv, AES_BLOCK_SIZE); err = skcipher_walk_aead_encrypt(&walk, req, true); - if (likely(use_neon)) { + if (may_use_simd()) { while (walk.nbytes) { u32 tail = walk.nbytes % AES_BLOCK_SIZE; if (walk.nbytes == walk.total) tail = 0; + kernel_neon_begin(); ce_aes_ccm_encrypt(walk.dst.virt.addr, walk.src.virt.addr, walk.nbytes - tail, ctx->key_enc, num_rounds(ctx), mac, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, tail); } - if (!err) + if (!err) { + kernel_neon_begin(); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); - - kernel_neon_end(); + kernel_neon_end(); + } } else { err = ccm_crypt_fallback(&walk, mac, buf, ctx, true); } @@ -301,43 +301,42 @@ static int ccm_decrypt(struct aead_request *req) u8 __aligned(8) mac[AES_BLOCK_SIZE]; u8 buf[AES_BLOCK_SIZE]; u32 len = req->cryptlen - authsize; - bool use_neon = may_use_simd(); int err; err = ccm_init_mac(req, mac, len); if (err) return err; - if (likely(use_neon)) - kernel_neon_begin(); - if (req->assoclen) - ccm_calculate_auth_mac(req, mac, use_neon); + 
ccm_calculate_auth_mac(req, mac); /* preserve the original iv for the final round */ memcpy(buf, req->iv, AES_BLOCK_SIZE); err = skcipher_walk_aead_decrypt(&walk, req, true); - if (likely(use_neon)) { + if (may_use_simd()) { while (walk.nbytes) { u32 tail = walk.nbytes % AES_BLOCK_SIZE; if (walk.nbytes == walk.total) tail = 0; + kernel_neon_begin(); ce_aes_ccm_decrypt(walk.dst.virt.addr, walk.src.virt.addr, walk.nbytes - tail, ctx->key_enc, num_rounds(ctx), mac, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, tail); } - if (!err) + if (!err) { + kernel_neon_begin(); ce_aes_ccm_final(mac, buf, ctx->key_enc, num_rounds(ctx)); - - kernel_neon_end(); + kernel_neon_end(); + } } else { err = ccm_crypt_fallback(&walk, mac, buf, ctx, false); } From patchwork Tue Dec 26 10:29:23 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 122743 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp782984qgn; Tue, 26 Dec 2017 02:35:15 -0800 (PST) X-Google-Smtp-Source: ACJfBou9Jy+RHjBgf8w+l9LiFooiqrLBexFwR2G7gI/OjRugLDkudLeW4i4AwPE6CSjf+F7FdX3f X-Received: by 10.159.229.10 with SMTP id s10mr25370113plq.386.1514284515583; Tue, 26 Dec 2017 02:35:15 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1514284515; cv=none; d=google.com; s=arc-20160816; b=Yz5kIjGi7z19eYO6w2T2fET4W7oc415LtI5pz4MbpT+0BTKDCsFx7Ia0IOxBXFanfm LZA0dwyu+gBsFgu8JOekJYQvBei6ceLBT4VrnLv+APxuLkU5I/9rvb6nviwD4RZzLbG+ zMNTDLOXZkWB7XHSYWhgfTwslhIGWwTHMvn35ije5f9C5C535hbHfF3dVQ53Cooq+D2t AdHiKDq8RQsnYI71v9+wvw8hKYKnzd7mDOIQI4kb9dY/uZNc5MmM/xUb9+uxEcCSpYig U14uorbt1XAObc1jmvwYTUHWOkWd2cdymZ1b6GTfWGgk3hMUruhOmnYLQLlpVREwlB9u DU3w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:dkim-signature:arc-authentication-results; bh=R2cQ4/fosfe4jzVbyYEDZjyUsNZ8Zda7gHi/Beszb0E=; b=O9KtkPHW86gXRrtPKKw6WdWokDizkGUrJiWKqgb4Vm91jOePTVXisaUublMljqu4Wm 1HXHy+qv6uyFo+hsfeXwT2EMRp/SsvQ6VxdJ/T00Yy/P2amTd21OBu8p+yhr72iFCr/6 azhUGgaInT28Hoy2iu/tLl+E0cSDVHy5joSnfFfYVJCfQAuVhu6HSu6V++W2xFELPkSq D8ydxObJZXIUWvX4O2LE1FEU7u/zkDslHPCwwp2phMVRwR+7r1FWByC4vOXJ6qznyQwl RrFZRleoZVc3kT96bH5DsMonnxP264AKEOY+AAjWxdn/5J6rzJ0EtlU8vPN3jQUBaLud Fwyg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=jNkuzrIP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. 
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior,
 Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 03/20] crypto: arm64/aes-blk - move kernel mode neon en/disable into loop
Date: Tue, 26 Dec 2017 10:29:23 +0000
Message-Id: <20171226102940.26908-4-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 95 ++++++++++---------- arch/arm64/crypto/aes-modes.S | 90 +++++++++---------- arch/arm64/crypto/aes-neonbs-glue.c | 14 ++- 3 files changed, 97 insertions(+), 102 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 998ba519a026..00a3e2fd6a48 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -64,17 +64,17 @@ MODULE_LICENSE("GPL v2"); /* defined in aes-modes.S */ asmlinkage void aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 iv[], int first); + int rounds, int blocks, u8 iv[]); asmlinkage void aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], - int rounds, int blocks, u8 ctr[], int first); + int rounds, int blocks, u8 ctr[]); asmlinkage void aes_xts_encrypt(u8 out[], u8 const in[], u8 const rk1[], int rounds, int blocks, u8 const rk2[], u8 iv[], @@ -133,19 +133,19 @@ static int ecb_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, first); + (u8 *)ctx->key_enc, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -153,19 +153,19 @@ static int ecb_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = 
skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ecb_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, first); + (u8 *)ctx->key_dec, rounds, blocks); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -173,20 +173,19 @@ static int cbc_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -194,20 +193,19 @@ static int cbc_decrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); - for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_cbc_decrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_dec, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_dec, rounds, blocks, walk.iv); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -215,20 +213,18 @@ static int ctr_encrypt(struct skcipher_request *req) { struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm); - int err, first, rounds = 6 + ctx->key_length / 4; + int err, rounds = 6 + ctx->key_length / 4; struct skcipher_walk walk; int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - first = 1; - kernel_neon_begin(); while ((blocks = (walk.nbytes / AES_BLOCK_SIZE))) { + kernel_neon_begin(); aes_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - (u8 *)ctx->key_enc, rounds, blocks, walk.iv, - first); + (u8 *)ctx->key_enc, rounds, blocks, walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; + kernel_neon_end(); } if (walk.nbytes) { u8 __aligned(8) tail[AES_BLOCK_SIZE]; @@ -241,12 +237,13 @@ static int ctr_encrypt(struct skcipher_request *req) */ blocks = -1; + kernel_neon_begin(); aes_ctr_encrypt(tail, NULL, (u8 *)ctx->key_enc, rounds, - blocks, walk.iv, first); + blocks, walk.iv); + kernel_neon_end(); crypto_xor_cpy(tdst, tsrc, tail, nbytes); err = skcipher_walk_done(&walk, 0); } - kernel_neon_end(); return err; } @@ -270,16 +267,16 @@ static int xts_encrypt(struct skcipher_request *req) struct skcipher_walk walk; 
unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_encrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_enc, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -292,16 +289,16 @@ static int xts_decrypt(struct skcipher_request *req) struct skcipher_walk walk; unsigned int blocks; - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); - kernel_neon_begin(); for (first = 1; (blocks = (walk.nbytes / AES_BLOCK_SIZE)); first = 0) { + kernel_neon_begin(); aes_xts_decrypt(walk.dst.virt.addr, walk.src.virt.addr, (u8 *)ctx->key1.key_dec, rounds, blocks, (u8 *)ctx->key2.key_enc, walk.iv, first); + kernel_neon_end(); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); } - kernel_neon_end(); return err; } @@ -425,7 +422,7 @@ static int cmac_setkey(struct crypto_shash *tfm, const u8 *in_key, /* encrypt the zero vector */ kernel_neon_begin(); - aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1, 1); + aes_ecb_encrypt(ctx->consts, (u8[AES_BLOCK_SIZE]){}, rk, rounds, 1); kernel_neon_end(); cmac_gf128_mul_by_x(consts, consts); @@ -454,8 +451,8 @@ static int xcbc_setkey(struct crypto_shash *tfm, const u8 *in_key, return err; kernel_neon_begin(); - aes_ecb_encrypt(key, ks[0], rk, rounds, 1, 1); - aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2, 0); + aes_ecb_encrypt(key, ks[0], rk, rounds, 1); + aes_ecb_encrypt(ctx->consts, ks[1], rk, rounds, 2); kernel_neon_end(); return cbcmac_setkey(tfm, key, sizeof(key)); diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 2674d43d1384..65b273667b34 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -40,24 +40,24 @@ #if INTERLEAVE == 2 aes_encrypt_block2x: - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block2x) aes_decrypt_block2x: - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block2x) #elif INTERLEAVE == 4 aes_encrypt_block4x: - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_encrypt_block4x) aes_decrypt_block4x: - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 ret ENDPROC(aes_decrypt_block4x) @@ -86,33 +86,32 @@ ENDPROC(aes_decrypt_block4x) #define FRAME_POP .macro do_encrypt_block2x - encrypt_block2x v0, v1, w3, x2, x6, w7 + encrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_decrypt_block2x - decrypt_block2x v0, v1, w3, x2, x6, w7 + decrypt_block2x v0, v1, w3, x2, x8, w7 .endm .macro do_encrypt_block4x - encrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + encrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm .macro do_decrypt_block4x - decrypt_block4x v0, v1, v2, v3, w3, x2, x6, w7 + decrypt_block4x v0, v1, v2, v3, w3, x2, x8, w7 .endm #endif /* * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) * aes_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, int first) + * int blocks) */ AES_ENTRY(aes_ecb_encrypt) FRAME_PUSH - cbz w5, .LecbencloopNx enc_prepare w3, x2, x5 @@ -148,7 +147,6 @@ 
AES_ENDPROC(aes_ecb_encrypt) AES_ENTRY(aes_ecb_decrypt) FRAME_PUSH - cbz w5, .LecbdecloopNx dec_prepare w3, x2, x5 @@ -184,14 +182,12 @@ AES_ENDPROC(aes_ecb_decrypt) /* * aes_cbc_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) * aes_cbc_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 iv[], int first) + * int blocks, u8 iv[]) */ AES_ENTRY(aes_cbc_encrypt) - cbz w6, .Lcbcencloop - ld1 {v0.16b}, [x5] /* get iv */ enc_prepare w3, x2, x6 @@ -209,7 +205,6 @@ AES_ENDPROC(aes_cbc_encrypt) AES_ENTRY(aes_cbc_decrypt) FRAME_PUSH - cbz w6, .LcbcdecloopNx ld1 {v7.16b}, [x5] /* get iv */ dec_prepare w3, x2, x6 @@ -264,20 +259,19 @@ AES_ENDPROC(aes_cbc_decrypt) /* * aes_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds, - * int blocks, u8 ctr[], int first) + * int blocks, u8 ctr[]) */ AES_ENTRY(aes_ctr_encrypt) FRAME_PUSH - cbz w6, .Lctrnotfirst /* 1st time around? */ + enc_prepare w3, x2, x6 ld1 {v4.16b}, [x5] -.Lctrnotfirst: - umov x8, v4.d[1] /* keep swabbed ctr in reg */ - rev x8, x8 + umov x6, v4.d[1] /* keep swabbed ctr in reg */ + rev x6, x6 #if INTERLEAVE >= 2 - cmn w8, w4 /* 32 bit overflow? */ + cmn w6, w4 /* 32 bit overflow? */ bcs .Lctrloop .LctrloopNx: subs w4, w4, #INTERLEAVE @@ -285,11 +279,11 @@ AES_ENTRY(aes_ctr_encrypt) #if INTERLEAVE == 2 mov v0.8b, v4.8b mov v1.8b, v4.8b - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v0.d[1], x7 - rev x7, x8 - add x8, x8, #1 + rev x7, x6 + add x6, x6, #1 ins v1.d[1], x7 ld1 {v2.16b-v3.16b}, [x1], #32 /* get 2 input blocks */ do_encrypt_block2x @@ -298,7 +292,7 @@ AES_ENTRY(aes_ctr_encrypt) st1 {v0.16b-v1.16b}, [x0], #32 #else ldr q8, =0x30000000200000001 /* addends 1,2,3[,0] */ - dup v7.4s, w8 + dup v7.4s, w6 mov v0.16b, v4.16b add v7.4s, v7.4s, v8.4s mov v1.16b, v4.16b @@ -316,9 +310,9 @@ AES_ENTRY(aes_ctr_encrypt) eor v2.16b, v7.16b, v2.16b eor v3.16b, v5.16b, v3.16b st1 {v0.16b-v3.16b}, [x0], #64 - add x8, x8, #INTERLEAVE + add x6, x6, #INTERLEAVE #endif - rev x7, x8 + rev x7, x6 ins v4.d[1], x7 cbz w4, .Lctrout b .LctrloopNx @@ -328,10 +322,10 @@ AES_ENTRY(aes_ctr_encrypt) #endif .Lctrloop: mov v0.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 - adds x8, x8, #1 /* increment BE ctr */ - rev x7, x8 + adds x6, x6, #1 /* increment BE ctr */ + rev x7, x6 ins v4.d[1], x7 bcs .Lctrcarry /* overflow? 
*/ @@ -385,15 +379,17 @@ CPU_BE( .quad 0x87, 1 ) AES_ENTRY(aes_xts_encrypt) FRAME_PUSH - cbz w7, .LxtsencloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - enc_switch_key w3, x2, x6 + cbz w7, .Lxtsencnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + enc_switch_key w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsencNx +.Lxtsencnotfirst: + enc_prepare w3, x2, x8 .LxtsencloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -442,7 +438,7 @@ AES_ENTRY(aes_xts_encrypt) .Lxtsencloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - encrypt_block v0, w3, x2, x6, w7 + encrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -450,6 +446,7 @@ AES_ENTRY(aes_xts_encrypt) next_tweak v4, v4, v7, v8 b .Lxtsencloop .Lxtsencout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_encrypt) @@ -457,15 +454,17 @@ AES_ENDPROC(aes_xts_encrypt) AES_ENTRY(aes_xts_decrypt) FRAME_PUSH - cbz w7, .LxtsdecloopNx - ld1 {v4.16b}, [x6] - enc_prepare w3, x5, x6 - encrypt_block v4, w3, x5, x6, w7 /* first tweak */ - dec_prepare w3, x2, x6 + cbz w7, .Lxtsdecnotfirst + + enc_prepare w3, x5, x8 + encrypt_block v4, w3, x5, x8, w7 /* first tweak */ + dec_prepare w3, x2, x8 ldr q7, .Lxts_mul_x b .LxtsdecNx +.Lxtsdecnotfirst: + dec_prepare w3, x2, x8 .LxtsdecloopNx: ldr q7, .Lxts_mul_x next_tweak v4, v4, v7, v8 @@ -514,7 +513,7 @@ AES_ENTRY(aes_xts_decrypt) .Lxtsdecloop: ld1 {v1.16b}, [x1], #16 eor v0.16b, v1.16b, v4.16b - decrypt_block v0, w3, x2, x6, w7 + decrypt_block v0, w3, x2, x8, w7 eor v0.16b, v0.16b, v4.16b st1 {v0.16b}, [x0], #16 subs w4, w4, #1 @@ -522,6 +521,7 @@ AES_ENTRY(aes_xts_decrypt) next_tweak v4, v4, v7, v8 b .Lxtsdecloop .Lxtsdecout: + st1 {v4.16b}, [x6] FRAME_POP ret AES_ENDPROC(aes_xts_decrypt) diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c index c55d68ccb89f..9d823c77ec84 100644 --- a/arch/arm64/crypto/aes-neonbs-glue.c +++ b/arch/arm64/crypto/aes-neonbs-glue.c @@ -46,10 +46,9 @@ asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], /* borrowed from aes-neon-blk.ko */ asmlinkage void neon_aes_ecb_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, int first); + int rounds, int blocks); asmlinkage void neon_aes_cbc_encrypt(u8 out[], u8 const in[], u32 const rk[], - int rounds, int blocks, u8 iv[], - int first); + int rounds, int blocks, u8 iv[]); struct aesbs_ctx { u8 rk[13 * (8 * AES_BLOCK_SIZE) + 32]; @@ -157,7 +156,7 @@ static int cbc_encrypt(struct skcipher_request *req) struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); struct aesbs_cbc_ctx *ctx = crypto_skcipher_ctx(tfm); struct skcipher_walk walk; - int err, first = 1; + int err; err = skcipher_walk_virt(&walk, req, true); @@ -167,10 +166,9 @@ static int cbc_encrypt(struct skcipher_request *req) /* fall back to the non-bitsliced NEON implementation */ neon_aes_cbc_encrypt(walk.dst.virt.addr, walk.src.virt.addr, - ctx->enc, ctx->key.rounds, blocks, walk.iv, - first); + ctx->enc, ctx->key.rounds, blocks, + walk.iv); err = skcipher_walk_done(&walk, walk.nbytes % AES_BLOCK_SIZE); - first = 0; } kernel_neon_end(); return err; @@ -311,7 +309,7 @@ static int __xts_crypt(struct skcipher_request *req, kernel_neon_begin(); neon_aes_ecb_encrypt(walk.iv, walk.iv, ctx->twkey, - ctx->key.rounds, 1, 1); + ctx->key.rounds, 1); while (walk.nbytes >= AES_BLOCK_SIZE) { unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE; From patchwork Tue 
Dec 26 10:29:25 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122742
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior,
 Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas,
 Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 05/20] crypto: arm64/chacha20 - move kernel mode neon en/disable into loop
Date: Tue, 26 Dec 2017 10:29:25 +0000
Message-Id: <20171226102940.26908-6-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

When kernel mode NEON was first introduced on arm64, the preserve and
restore of the userland NEON state was completely unoptimized, and
involved saving all registers on each call to kernel_neon_begin(), and
restoring them on each call to kernel_neon_end().
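The rationale continues below; the upshot for this driver, visible in
the diff that follows, is that chacha20_doneon() ends up shaped roughly
like this, with each four-block (256 byte) stride bracketed separately
and the short remainder kept under a single begin/end pair:

        while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
                kernel_neon_begin();
                chacha20_4block_xor_neon(state, dst, src);
                kernel_neon_end();
                bytes -= CHACHA20_BLOCK_SIZE * 4;
                src += CHACHA20_BLOCK_SIZE * 4;
                dst += CHACHA20_BLOCK_SIZE * 4;
                state[12] += 4;
        }

        if (!bytes)
                return;

        kernel_neon_begin();
        while (bytes >= CHACHA20_BLOCK_SIZE) {
                chacha20_block_xor_neon(state, dst, src);
                bytes -= CHACHA20_BLOCK_SIZE;
                src += CHACHA20_BLOCK_SIZE;
                dst += CHACHA20_BLOCK_SIZE;
                state[12]++;
        }
        if (bytes) {
                /* partial final block through a bounce buffer */
                memcpy(buf, src, bytes);
                chacha20_block_xor_neon(state, buf, buf);
                memcpy(dst, buf, bytes);
        }
        kernel_neon_end();

Since the remainder is strictly less than four blocks, the second
bracketed section covers less than 256 bytes of work.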
For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/chacha20-neon-glue.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) -- 2.11.0 diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c index cbdb75d15cd0..727579c93ded 100644 --- a/arch/arm64/crypto/chacha20-neon-glue.c +++ b/arch/arm64/crypto/chacha20-neon-glue.c @@ -37,12 +37,19 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, u8 buf[CHACHA20_BLOCK_SIZE]; while (bytes >= CHACHA20_BLOCK_SIZE * 4) { + kernel_neon_begin(); chacha20_4block_xor_neon(state, dst, src); + kernel_neon_end(); bytes -= CHACHA20_BLOCK_SIZE * 4; src += CHACHA20_BLOCK_SIZE * 4; dst += CHACHA20_BLOCK_SIZE * 4; state[12] += 4; } + + if (!bytes) + return; + + kernel_neon_begin(); while (bytes >= CHACHA20_BLOCK_SIZE) { chacha20_block_xor_neon(state, dst, src); bytes -= CHACHA20_BLOCK_SIZE; @@ -55,6 +62,7 @@ static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src, chacha20_block_xor_neon(state, buf, buf); memcpy(dst, buf, bytes); } + kernel_neon_end(); } static int chacha20_neon(struct skcipher_request *req) @@ -68,11 +76,10 @@ static int chacha20_neon(struct skcipher_request *req) if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE) return crypto_chacha20_crypt(req); - err = skcipher_walk_virt(&walk, req, true); + err = skcipher_walk_virt(&walk, req, false); crypto_chacha20_init(state, ctx, walk.iv); - kernel_neon_begin(); while (walk.nbytes > 0) { unsigned int nbytes = walk.nbytes; @@ -83,7 +90,6 @@ static int chacha20_neon(struct skcipher_request *req) nbytes); err = skcipher_walk_done(&walk, walk.nbytes - nbytes); } - kernel_neon_end(); return err; } From patchwork Tue Dec 26 10:29:26 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 122728 Delivered-To: patch@linaro.org Received: by 10.140.22.227 with SMTP id 90csp779127qgn; Tue, 26 Dec 2017 02:31:08 -0800 (PST) X-Google-Smtp-Source: ACJfBoscwD29hfWyWIFUYvQO2AgyJFSd2k1lfjtd5n+Ki3RnhCm8uDHecEx+H8lkHx6XtauFpWso X-Received: by 10.99.138.68 with SMTP id y65mr21826000pgd.160.1514284268020; Tue, 26 Dec 2017 02:31:08 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1514284268; cv=none; d=google.com; s=arc-20160816; b=Qr28r/UG3qRrDKy0q9OF03eYSdjjc6Xeq34pn0h0j3FbKsCoGAswVG9wPZoNNEOc6U JyCpipc/g5sUjMY632ZKm/QY1h7gQu/cCjZuEZzCPTk6gkiepBfwHSmXfXGKSBJgv6po dTiUp4oFmmEjem8acHlDn1b6THflIQeBhCsiYEkezv5Ez1E1w6/seXS0TOe7kdO9JOWy JKl+0ZDe94dPGhPf++xt0NGp14h9bU/yQL8VqiGT6Zn+uKScjIpc86R+Gnz2woAGXHTF aUHMk5R/8MnhnJaQONB28yy5HrNo6IjVVO1jdPML8dhktw/0o9KLJBOxnKzjkLiD2Xgd EpKg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date 
From patchwork Tue Dec 26 10:29:26 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122728
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 06/20] crypto: arm64/aes-blk - remove configurable interleave
Date: Tue, 26 Dec 2017 10:29:26 +0000
Message-Id: <20171226102940.26908-7-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

The AES block mode implementation using Crypto Extensions or plain NEON was written before real hardware existed, and so its interleave factor was made build-time configurable (as well as an option to instantiate all interleaved sequences inline rather than as subroutines).

We ended up using INTERLEAVE=4 with inlining disabled for both flavors of the core AES routines, so let's stick with that, and remove the option to configure this at build time. This makes the code easier to modify, which is nice now that we're adding yield support.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/Makefile    |   3 -
 arch/arm64/crypto/aes-modes.S | 237 ++++----------------
 2 files changed, 40 insertions(+), 200 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index b5edc5918c28..aaf4e9afd750 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -50,9 +50,6 @@ aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o
 aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
 
-AFLAGS_aes-ce.o := -DINTERLEAVE=4
-AFLAGS_aes-neon.o := -DINTERLEAVE=4
-
 CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS
 
 $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 65b273667b34..27a235b2ddee 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -13,44 +13,6 @@
 	.text
 	.align	4
 
-/*
- * There are several ways to instantiate this code:
- * - no interleave, all inline
- * - 2-way interleave, 2x calls out of line (-DINTERLEAVE=2)
- * - 2-way interleave, all inline (-DINTERLEAVE=2 -DINTERLEAVE_INLINE)
- * - 4-way interleave, 4x calls out of line (-DINTERLEAVE=4)
- * - 4-way interleave, all inline (-DINTERLEAVE=4 -DINTERLEAVE_INLINE)
- *
- * Macros imported by this code:
- * - enc_prepare - setup NEON registers for encryption
- * - dec_prepare - setup NEON registers for decryption
- * - enc_switch_key - change to new key after having prepared for encryption
- * - encrypt_block - encrypt a single block
- * - decrypt block - decrypt a single block
- * - encrypt_block2x - encrypt 2 blocks in parallel (if INTERLEAVE == 2)
- * - decrypt_block2x - decrypt 2 blocks in parallel (if INTERLEAVE == 2)
- * - encrypt_block4x - encrypt 4 blocks in parallel (if INTERLEAVE == 4)
- * - decrypt_block4x - decrypt 4 blocks in parallel (if INTERLEAVE == 4)
- */
-
-#if defined(INTERLEAVE) && !defined(INTERLEAVE_INLINE)
-#define FRAME_PUSH	stp x29, x30, [sp,#-16]! ; mov x29, sp
-#define FRAME_POP	ldp x29, x30, [sp],#16
-
-#if INTERLEAVE == 2
-
-aes_encrypt_block2x:
-	encrypt_block2x	v0, v1, w3, x2, x8, w7
-	ret
-ENDPROC(aes_encrypt_block2x)
-
-aes_decrypt_block2x:
-	decrypt_block2x	v0, v1, w3, x2, x8, w7
-	ret
-ENDPROC(aes_decrypt_block2x)
-
-#elif INTERLEAVE == 4
-
 aes_encrypt_block4x:
 	encrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
 	ret
@@ -61,48 +23,6 @@ aes_decrypt_block4x:
 	ret
 ENDPROC(aes_decrypt_block4x)
 
-#else
-#error INTERLEAVE should equal 2 or 4
-#endif
-
-	.macro		do_encrypt_block2x
-	bl		aes_encrypt_block2x
-	.endm
-
-	.macro		do_decrypt_block2x
-	bl		aes_decrypt_block2x
-	.endm
-
-	.macro		do_encrypt_block4x
-	bl		aes_encrypt_block4x
-	.endm
-
-	.macro		do_decrypt_block4x
-	bl		aes_decrypt_block4x
-	.endm
-
-#else
-#define FRAME_PUSH
-#define FRAME_POP
-
-	.macro		do_encrypt_block2x
-	encrypt_block2x	v0, v1, w3, x2, x8, w7
-	.endm
-
-	.macro		do_decrypt_block2x
-	decrypt_block2x	v0, v1, w3, x2, x8, w7
-	.endm
-
-	.macro		do_encrypt_block4x
-	encrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
-	.endm
-
-	.macro		do_decrypt_block4x
-	decrypt_block4x	v0, v1, v2, v3, w3, x2, x8, w7
-	.endm
-
-#endif
-
 /*
  * aes_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
  *		   int blocks)
@@ -111,28 +31,21 @@ ENDPROC(aes_decrypt_block4x)
  */
 
 AES_ENTRY(aes_ecb_encrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
 
 	enc_prepare	w3, x2, x5
 
 .LecbencloopNx:
-#if INTERLEAVE >= 2
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lecbenc1x
-#if INTERLEAVE == 2
-	ld1	{v0.16b-v1.16b}, [x1], #32	/* get 2 pt blocks */
-	do_encrypt_block2x
-	st1	{v0.16b-v1.16b}, [x0], #32
-#else
 	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
-	do_encrypt_block4x
+	bl	aes_encrypt_block4x
 	st1	{v0.16b-v3.16b}, [x0], #64
-#endif
 	b	.LecbencloopNx
 .Lecbenc1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lecbencout
-#endif
 .Lecbencloop:
 	ld1	{v0.16b}, [x1], #16		/* get next pt block */
 	encrypt_block	v0, w3, x2, x5, w6
@@ -140,34 +53,27 @@ AES_ENTRY(aes_ecb_encrypt)
 	subs	w4, w4, #1
 	bne	.Lecbencloop
 .Lecbencout:
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 AES_ENDPROC(aes_ecb_encrypt)
 
 
 AES_ENTRY(aes_ecb_decrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
 
 	dec_prepare	w3, x2, x5
 
 .LecbdecloopNx:
-#if INTERLEAVE >= 2
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lecbdec1x
-#if INTERLEAVE == 2
-	ld1	{v0.16b-v1.16b}, [x1], #32	/* get 2 ct blocks */
-	do_decrypt_block2x
-	st1	{v0.16b-v1.16b}, [x0], #32
-#else
 	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 ct blocks */
-	do_decrypt_block4x
+	bl	aes_decrypt_block4x
 	st1	{v0.16b-v3.16b}, [x0], #64
-#endif
 	b	.LecbdecloopNx
 .Lecbdec1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lecbdecout
-#endif
 .Lecbdecloop:
 	ld1	{v0.16b}, [x1], #16		/* get next ct block */
 	decrypt_block	v0, w3, x2, x5, w6
@@ -175,7 +81,7 @@ AES_ENTRY(aes_ecb_decrypt)
 	subs	w4, w4, #1
 	bne	.Lecbdecloop
 .Lecbdecout:
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 AES_ENDPROC(aes_ecb_decrypt)
 
@@ -204,30 +110,20 @@ AES_ENDPROC(aes_cbc_encrypt)
 
 AES_ENTRY(aes_cbc_decrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
 
 	ld1	{v7.16b}, [x5]			/* get iv */
 	dec_prepare	w3, x2, x6
 
 .LcbcdecloopNx:
-#if INTERLEAVE >= 2
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lcbcdec1x
-#if INTERLEAVE == 2
-	ld1	{v0.16b-v1.16b}, [x1], #32	/* get 2 ct blocks */
-	mov	v2.16b, v0.16b
-	mov	v3.16b, v1.16b
-	do_decrypt_block2x
-	eor	v0.16b, v0.16b, v7.16b
-	eor	v1.16b, v1.16b, v2.16b
-	mov	v7.16b, v3.16b
-	st1	{v0.16b-v1.16b}, [x0], #32
-#else
 	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 ct blocks */
 	mov	v4.16b, v0.16b
 	mov	v5.16b, v1.16b
 	mov	v6.16b, v2.16b
-	do_decrypt_block4x
+	bl	aes_decrypt_block4x
 	sub	x1, x1, #16
 	eor	v0.16b, v0.16b, v7.16b
 	eor	v1.16b, v1.16b, v4.16b
@@ -235,12 +131,10 @@ AES_ENTRY(aes_cbc_decrypt)
 	eor	v2.16b, v2.16b, v5.16b
 	eor	v3.16b, v3.16b, v6.16b
 	st1	{v0.16b-v3.16b}, [x0], #64
-#endif
 	b	.LcbcdecloopNx
 .Lcbcdec1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lcbcdecout
-#endif
 .Lcbcdecloop:
 	ld1	{v1.16b}, [x1], #16		/* get next ct block */
 	mov	v0.16b, v1.16b			/* ...and copy to v0 */
@@ -251,8 +145,8 @@ AES_ENTRY(aes_cbc_decrypt)
 	subs	w4, w4, #1
 	bne	.Lcbcdecloop
 .Lcbcdecout:
-	FRAME_POP
 	st1	{v7.16b}, [x5]			/* return iv */
+	ldp	x29, x30, [sp], #16
 	ret
 AES_ENDPROC(aes_cbc_decrypt)
 
@@ -263,34 +157,19 @@ AES_ENDPROC(aes_cbc_decrypt)
  */
 
 AES_ENTRY(aes_ctr_encrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
 
 	enc_prepare	w3, x2, x6
 	ld1	{v4.16b}, [x5]
 
 	umov	x6, v4.d[1]		/* keep swabbed ctr in reg */
 	rev	x6, x6
-#if INTERLEAVE >= 2
 	cmn	w6, w4			/* 32 bit overflow? */
 	bcs	.Lctrloop
 .LctrloopNx:
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lctr1x
-#if INTERLEAVE == 2
-	mov	v0.8b, v4.8b
-	mov	v1.8b, v4.8b
-	rev	x7, x6
-	add	x6, x6, #1
-	ins	v0.d[1], x7
-	rev	x7, x6
-	add	x6, x6, #1
-	ins	v1.d[1], x7
-	ld1	{v2.16b-v3.16b}, [x1], #32	/* get 2 input blocks */
-	do_encrypt_block2x
-	eor	v0.16b, v0.16b, v2.16b
-	eor	v1.16b, v1.16b, v3.16b
-	st1	{v0.16b-v1.16b}, [x0], #32
-#else
 	ldr	q8, =0x30000000200000001	/* addends 1,2,3[,0] */
 	dup	v7.4s, w6
 	mov	v0.16b, v4.16b
@@ -303,23 +182,21 @@ AES_ENTRY(aes_ctr_encrypt)
 	mov	v2.s[3], v8.s[1]
 	mov	v3.s[3], v8.s[2]
 	ld1	{v5.16b-v7.16b}, [x1], #48	/* get 3 input blocks */
-	do_encrypt_block4x
+	bl	aes_encrypt_block4x
 	eor	v0.16b, v5.16b, v0.16b
 	ld1	{v5.16b}, [x1], #16		/* get 1 input block  */
 	eor	v1.16b, v6.16b, v1.16b
 	eor	v2.16b, v7.16b, v2.16b
 	eor	v3.16b, v5.16b, v3.16b
 	st1	{v0.16b-v3.16b}, [x0], #64
-	add	x6, x6, #INTERLEAVE
-#endif
+	add	x6, x6, #4
 	rev	x7, x6
 	ins	v4.d[1], x7
 	cbz	w4, .Lctrout
 	b	.LctrloopNx
 .Lctr1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lctrout
-#endif
 .Lctrloop:
 	mov	v0.16b, v4.16b
 	encrypt_block	v0, w3, x2, x8, w7
@@ -339,12 +216,12 @@ AES_ENTRY(aes_ctr_encrypt)
 
 .Lctrout:
 	st1	{v4.16b}, [x5]		/* return next CTR value */
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 
 .Lctrtailblock:
 	st1	{v0.16b}, [x0]
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 
 .Lctrcarry:
@@ -378,7 +255,9 @@ CPU_LE(	.quad	1, 0x87		)
 CPU_BE(	.quad	0x87, 1		)
 
 AES_ENTRY(aes_xts_encrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+
 	ld1	{v4.16b}, [x6]
 	cbz	w7, .Lxtsencnotfirst
 
@@ -394,25 +273,8 @@ AES_ENTRY(aes_xts_encrypt)
 	ldr	q7, .Lxts_mul_x
 	next_tweak	v4, v4, v7, v8
 .LxtsencNx:
-#if INTERLEAVE >= 2
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lxtsenc1x
-#if INTERLEAVE == 2
-	ld1	{v0.16b-v1.16b}, [x1], #32	/* get 2 pt blocks */
-	next_tweak	v5, v4, v7, v8
-	eor	v0.16b, v0.16b, v4.16b
-	eor	v1.16b, v1.16b, v5.16b
-	do_encrypt_block2x
-	eor	v0.16b, v0.16b, v4.16b
-	eor	v1.16b, v1.16b, v5.16b
-	st1	{v0.16b-v1.16b}, [x0], #32
-	cbz	w4, .LxtsencoutNx
-	next_tweak	v4, v5, v7, v8
-	b	.LxtsencNx
-.LxtsencoutNx:
-	mov	v4.16b, v5.16b
-	b	.Lxtsencout
-#else
 	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
 	next_tweak	v5, v4, v7, v8
 	eor	v0.16b, v0.16b, v4.16b
@@ -421,7 +283,7 @@ AES_ENTRY(aes_xts_encrypt)
 	eor	v2.16b, v2.16b, v6.16b
 	next_tweak	v7, v6, v7, v8
 	eor	v3.16b, v3.16b, v7.16b
-	do_encrypt_block4x
+	bl	aes_encrypt_block4x
 	eor	v3.16b, v3.16b, v7.16b
 	eor	v0.16b, v0.16b, v4.16b
 	eor	v1.16b, v1.16b, v5.16b
@@ -430,11 +292,9 @@ AES_ENTRY(aes_xts_encrypt)
 	mov	v4.16b, v7.16b
 	cbz	w4, .Lxtsencout
 	b	.LxtsencloopNx
-#endif
 .Lxtsenc1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lxtsencout
-#endif
 .Lxtsencloop:
 	ld1	{v1.16b}, [x1], #16
 	eor	v0.16b, v1.16b, v4.16b
@@ -447,13 +307,15 @@ AES_ENTRY(aes_xts_encrypt)
 	b	.Lxtsencloop
 .Lxtsencout:
 	st1	{v4.16b}, [x6]
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 AES_ENDPROC(aes_xts_encrypt)
 
 
 AES_ENTRY(aes_xts_decrypt)
-	FRAME_PUSH
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+
 	ld1	{v4.16b}, [x6]
 	cbz	w7, .Lxtsdecnotfirst
 
@@ -469,25 +331,8 @@ AES_ENTRY(aes_xts_decrypt)
 	ldr	q7, .Lxts_mul_x
 	next_tweak	v4, v4, v7, v8
 .LxtsdecNx:
-#if INTERLEAVE >= 2
-	subs	w4, w4, #INTERLEAVE
+	subs	w4, w4, #4
 	bmi	.Lxtsdec1x
-#if INTERLEAVE == 2
-	ld1	{v0.16b-v1.16b}, [x1], #32	/* get 2 ct blocks */
-	next_tweak	v5, v4, v7, v8
-	eor	v0.16b, v0.16b, v4.16b
-	eor	v1.16b, v1.16b, v5.16b
-	do_decrypt_block2x
-	eor	v0.16b, v0.16b, v4.16b
-	eor	v1.16b, v1.16b, v5.16b
-	st1	{v0.16b-v1.16b}, [x0], #32
-	cbz	w4, .LxtsdecoutNx
-	next_tweak	v4, v5, v7, v8
-	b	.LxtsdecNx
-.LxtsdecoutNx:
-	mov	v4.16b, v5.16b
-	b	.Lxtsdecout
-#else
 	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 ct blocks */
 	next_tweak	v5, v4, v7, v8
 	eor	v0.16b, v0.16b, v4.16b
@@ -496,7 +341,7 @@ AES_ENTRY(aes_xts_decrypt)
 	eor	v2.16b, v2.16b, v6.16b
 	next_tweak	v7, v6, v7, v8
 	eor	v3.16b, v3.16b, v7.16b
-	do_decrypt_block4x
+	bl	aes_decrypt_block4x
 	eor	v3.16b, v3.16b, v7.16b
 	eor	v0.16b, v0.16b, v4.16b
 	eor	v1.16b, v1.16b, v5.16b
@@ -505,11 +350,9 @@ AES_ENTRY(aes_xts_decrypt)
 	mov	v4.16b, v7.16b
 	cbz	w4, .Lxtsdecout
 	b	.LxtsdecloopNx
-#endif
 .Lxtsdec1x:
-	adds	w4, w4, #INTERLEAVE
+	adds	w4, w4, #4
 	beq	.Lxtsdecout
-#endif
 .Lxtsdecloop:
 	ld1	{v1.16b}, [x1], #16
 	eor	v0.16b, v1.16b, v4.16b
@@ -522,7 +365,7 @@ AES_ENTRY(aes_xts_decrypt)
 	b	.Lxtsdecloop
 .Lxtsdecout:
 	st1	{v4.16b}, [x6]
-	FRAME_POP
+	ldp	x29, x30, [sp], #16
 	ret
 AES_ENDPROC(aes_xts_decrypt)
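With the interleave fixed at 4, every block mode now shares the same loop shape, which can be sketched in C as follows (hypothetical helper names, for illustration only; the real code is the assembler above):

	while (blocks >= 4) {
		process_block4x(in, out);	/* out of line, reached via bl */
		in += 4 * AES_BLOCK_SIZE;
		out += 4 * AES_BLOCK_SIZE;
		blocks -= 4;
	}
	while (blocks > 0) {
		process_block1x(in, out);	/* handle the <4 block tail */
		in += AES_BLOCK_SIZE;
		out += AES_BLOCK_SIZE;
		blocks -= 1;
	}

Note also that because the 4-way helpers are now always called out of line, each entry point unconditionally sets up a frame record (the stp/ldp pairs that replace FRAME_PUSH/FRAME_POP above), so that the link register survives the bl.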
From patchwork Tue Dec 26 10:29:27 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122741
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 07/20] crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path
Date: Tue, 26 Dec 2017 10:29:27 +0000
Message-Id: <20171226102940.26908-8-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

CBC encryption is strictly sequential, and so the current AES code simply processes the input one block at a time. However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC encrypt which yields every 16 bytes).

So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/aes-modes.S | 31 +++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S
index 27a235b2ddee..e86535a1329d 100644
--- a/arch/arm64/crypto/aes-modes.S
+++ b/arch/arm64/crypto/aes-modes.S
@@ -94,17 +94,36 @@ AES_ENDPROC(aes_ecb_decrypt)
  */
 
 AES_ENTRY(aes_cbc_encrypt)
-	ld1	{v0.16b}, [x5]			/* get iv */
+	ld1	{v4.16b}, [x5]			/* get iv */
 	enc_prepare	w3, x2, x6
 
-.Lcbcencloop:
-	ld1	{v1.16b}, [x1], #16		/* get next pt block */
-	eor	v0.16b, v0.16b, v1.16b		/* ..and xor with iv */
+.Lcbcencloop4x:
+	subs	w4, w4, #4
+	bmi	.Lcbcenc1x
+	ld1	{v0.16b-v3.16b}, [x1], #64	/* get 4 pt blocks */
+	eor	v0.16b, v0.16b, v4.16b		/* ..and xor with iv */
 	encrypt_block	v0, w3, x2, x6, w7
-	st1	{v0.16b}, [x0], #16
+	eor	v1.16b, v1.16b, v0.16b
+	encrypt_block	v1, w3, x2, x6, w7
+	eor	v2.16b, v2.16b, v1.16b
+	encrypt_block	v2, w3, x2, x6, w7
+	eor	v3.16b, v3.16b, v2.16b
+	encrypt_block	v3, w3, x2, x6, w7
+	st1	{v0.16b-v3.16b}, [x0], #64
+	mov	v4.16b, v3.16b
+	b	.Lcbcencloop4x
+.Lcbcenc1x:
+	adds	w4, w4, #4
+	beq	.Lcbcencout
+.Lcbcencloop:
+	ld1	{v0.16b}, [x1], #16		/* get next pt block */
+	eor	v4.16b, v4.16b, v0.16b		/* ..and xor with iv */
+	encrypt_block	v4, w3, x2, x6, w7
+	st1	{v4.16b}, [x0], #16
 	subs	w4, w4, #1
 	bne	.Lcbcencloop
-	st1	{v0.16b}, [x5]			/* return iv */
+.Lcbcencout:
+	st1	{v4.16b}, [x5]			/* return iv */
 	ret
 AES_ENDPROC(aes_cbc_encrypt)
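The reason the cipher invocations themselves cannot be interleaved is the CBC dependency chain: each ciphertext block is an input to the encryption of the next one. In pseudo-C (hypothetical helpers, for illustration only, not code from the patch):

	/* C[i] = E(K, P[i] ^ C[i-1]), with C[-1] = IV */
	for (i = 0; i < nblocks; i++) {
		xor_block(tmp, pt[i], iv);	/* feed previous ct in   */
		encrypt_block(ct[i], tmp);	/* serialized on ct[i-1] */
		iv = ct[i];
	}

So the unrolled loop above still runs the four encrypt_block sequences back to back; what it merges are the loads and stores, which now move 64 bytes at a time with single ld1/st1 instructions.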
From patchwork Tue Dec 26 10:29:29 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122729
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 09/20] crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels
Date: Tue, 26 Dec 2017 10:29:29 +0000
Message-Id: <20171226102940.26908-10-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON algorithm running with preemption disabled.

Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/crypto/sha256-glue.c | 36 +++++++++++++-------
 1 file changed, 23 insertions(+), 13 deletions(-)

--
2.11.0

diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index b064d925fe2a..e8880ccdc71f 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -89,21 +89,32 @@ static struct shash_alg algs[] = { {
 static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
 			      unsigned int len)
 {
-	/*
-	 * Stacking and unstacking a substantial slice of the NEON register
-	 * file may significantly affect performance for small updates when
-	 * executing in interrupt context, so fall back to the scalar code
-	 * in that case.
-	 */
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	if (!may_use_simd())
 		return sha256_base_do_update(desc, data, len,
 				(sha256_block_fn *)sha256_block_data_order);
 
-	kernel_neon_begin();
-	sha256_base_do_update(desc, data, len,
-				(sha256_block_fn *)sha256_block_neon);
-	kernel_neon_end();
+	while (len > 0) {
+		unsigned int chunk = len;
+
+		/*
+		 * Don't hog the CPU for the entire time it takes to process all
+		 * input when running on a preemptible kernel, but process the
+		 * data block by block instead.
+		 */
+		if (IS_ENABLED(CONFIG_PREEMPT) &&
+		    chunk + sctx->count % SHA256_BLOCK_SIZE > SHA256_BLOCK_SIZE)
+			chunk = SHA256_BLOCK_SIZE -
+				sctx->count % SHA256_BLOCK_SIZE;
 
+		kernel_neon_begin();
+		sha256_base_do_update(desc, data, chunk,
+				      (sha256_block_fn *)sha256_block_neon);
+		kernel_neon_end();
+		data += chunk;
+		len -= chunk;
+	}
 	return 0;
 }
 
@@ -117,10 +128,9 @@ static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
 		sha256_base_do_finalize(desc,
 				(sha256_block_fn *)sha256_block_data_order);
 	} else {
-		kernel_neon_begin();
 		if (len)
-			sha256_base_do_update(desc, data, len,
-				(sha256_block_fn *)sha256_block_neon);
+			sha256_update_neon(desc, data, len);
+		kernel_neon_begin();
 		sha256_base_do_finalize(desc,
 				(sha256_block_fn *)sha256_block_neon);
 		kernel_neon_end();
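To see how the bound works, consider a worked example (values assumed purely for illustration): a 100 byte update arriving when 40 bytes are already buffered in the partial block, on a CONFIG_PREEMPT kernel with SHA256_BLOCK_SIZE == 64:

	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;	/* 40 */

	/*
	 * Trip 1: chunk = 100, 100 + 40 > 64  ->  chunk = 64 - 40 = 24
	 *         (completes the buffered block)
	 * Trip 2: chunk = 76,  76 + 0 > 64    ->  chunk = 64 (one block)
	 * Trip 3: chunk = 12,  12 + 0 <= 64   ->  buffered, no transform
	 */

Each trip through the loop is bracketed by its own kernel_neon_begin()/kernel_neon_end() pair, so the scheduler gets a chance to run between blocks.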
From patchwork Tue Dec 26 10:29:30 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122740

From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 10/20] arm64: assembler: add utility macros to push/pop stack frames
Date: Tue, 26 Dec 2017 10:29:30 +0000
Message-Id: <20171226102940.26908-11-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

We are going to add code to all the NEON crypto routines that will turn them into non-leaf functions, so we need to manage the stack frames.
To make this less tedious and error-prone, add some macros that take the number of callee-saved registers to preserve and the extra size to allocate in the stack frame (for locals), and emit the ldp/stp sequences.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/include/asm/assembler.h | 57 ++++++++++++++++++++
 1 file changed, 57 insertions(+)

--
2.11.0

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 8b168280976f..9130be742a32 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -499,6 +499,63 @@ alternative_else_nop_endif
 #endif
 	.endm
 
+	/*
+	 * frame_push - Push @regcount callee saved registers to the stack,
+	 *              starting at x19, as well as x29/x30, and set x29 to
+	 *              the new value of sp. Add @extra bytes of stack space
+	 *              for locals.
+	 */
+	.macro		frame_push, regcount:req, extra
+	__frame		st, \regcount, \extra
+	.endm
+
+	/*
+	 * frame_pop  - Pop the callee saved registers from the stack that were
+	 *              pushed in the most recent call to frame_push, as well
+	 *              as x29/x30 and any extra stack space that may have been
+	 *              allocated.
+	 */
+	.macro		frame_pop
+	__frame		ld
+	.endm
+
+	.macro		__frame_regs, reg1, reg2, op, num
+	.if		.Lframe_regcount == \num
+	\op\()r		\reg1, [sp, #(\num + 1) * 8]
+	.elseif		.Lframe_regcount > \num
+	\op\()p		\reg1, \reg2, [sp, #(\num + 1) * 8]
+	.endif
+	.endm
+
+	.macro		__frame, op, regcount, extra=0
+	.ifc		\op, st
+	.if		(\regcount) < 0 || (\regcount) > 10
+	.error		"regcount should be in the range [0 ... 10]"
+	.endif
+	.if		((\extra) % 16) != 0
+	.error		"extra should be a multiple of 16 bytes"
+	.endif
+	.set		.Lframe_regcount, \regcount
+	.set		.Lframe_extra, \extra
+	.set		.Lframe_local_offset, ((\regcount + 3) / 2) * 16
+	stp		x29, x30, [sp, #-.Lframe_local_offset - .Lframe_extra]!
+	mov		x29, sp
+	.elseif		.Lframe_regcount == -1	// && op == 'ld'
+	.error		"frame_push/frame_pop may not be nested"
+	.endif
+
+	__frame_regs	x19, x20, \op, 1
+	__frame_regs	x21, x22, \op, 3
+	__frame_regs	x23, x24, \op, 5
+	__frame_regs	x25, x26, \op, 7
+	__frame_regs	x27, x28, \op, 9
+
+	.ifc		\op, ld
+	ldp		x29, x30, [sp], #.Lframe_local_offset + .Lframe_extra
+	.set		.Lframe_regcount, -1
+	.endif
+	.endm
+
 /*
  * Errata workaround post TTBR0_EL1 update.
 */
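A hypothetical user of the new macros might look like this (for illustration only; example_func and some_subroutine are made-up names, and the actual users appear later in this series):

	ENTRY(example_func)
		frame_push	2, 16		// save x29/x30 and x19/x20,
						// plus 16 bytes for locals
		mov	x19, x0			// x19/x20 survive the bl below
		mov	x20, x1
		bl	some_subroutine		// clobbers x30, hence the frame
		mov	x0, x19
		frame_pop			// restore x19/x20 and x29/x30
		ret
	ENDPROC(example_func)

With regcount == 2 and extra == 16, frame_push reserves ((2 + 3) / 2) * 16 + 16 = 48 bytes: x29/x30 at [sp], x19/x20 at [sp, #16], and the local storage starting at [sp, #.Lframe_local_offset].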
From patchwork Tue Dec 26 10:29:31 2017
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 122730
From: Ard Biesheuvel
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel, Dave Martin, Russell King - ARM Linux, Sebastian Andrzej Siewior, Mark Rutland, linux-rt-users@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Will Deacon, Steven Rostedt, Thomas Gleixner
Subject: [PATCH v4 11/20] arm64: assembler: add macros to conditionally yield the NEON under PREEMPT
Date: Tue, 26 Dec 2017 10:29:31 +0000
Message-Id: <20171226102940.26908-12-ard.biesheuvel@linaro.org>
In-Reply-To: <20171226102940.26908-1-ard.biesheuvel@linaro.org>
References: <20171226102940.26908-1-ard.biesheuvel@linaro.org>

Add support macros to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code.
In some cases, yielding the NEON involves saving and restoring a non-trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into three, and the code in between is only executed when the yield path is taken, allowing the context to be preserved. The third macro takes an optional label argument that marks the resume path after a yield has been performed.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/include/asm/assembler.h | 64 ++++++++++++++++++++
 arch/arm64/kernel/asm-offsets.c    |  2 +
 2 files changed, 66 insertions(+)

--
2.11.0

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 9130be742a32..575d0f065d28 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -579,4 +579,68 @@ alternative_else_nop_endif
 #endif
 	.endm
 
+/*
+ * Check whether to yield to another runnable task from kernel mode NEON code
+ * (which runs with preemption disabled).
+ *
+ * if_will_cond_yield_neon
+ * // pre-yield patchup code
+ * do_cond_yield_neon
+ * // post-yield patchup code
+ * endif_yield_neon