From patchwork Tue Sep 25 14:56:01 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147488 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp829845lji; Tue, 25 Sep 2018 07:56:40 -0700 (PDT) X-Google-Smtp-Source: ACcGV63pB2VYSHnT3mjieSATAcVBvYFOGoKV6lvlDD4bAY4Ety2bpoOnN03jDMOe8tC4IGZRjR4O X-Received: by 2002:aa7:881a:: with SMTP id c26-v6mr1611221pfo.82.1537887399886; Tue, 25 Sep 2018 07:56:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887399; cv=none; d=google.com; s=arc-20160816; b=wWr2DlLSbGWXD5tcuOjXwyXROR8n4POPNwztrORVwhKphlQVMYppfHARKzv7BC1DWs TXOgw7FLzohy0PKtwdSNg0oJTZBTaallVncNU7XPzM13e/YMMqRXDTJSounWh0CJWiS0 WIqwCyUy4MR1v23MWFRJnvuza3tQvajVwnkzOvFNaeYLdSw4vwHS0h+NQARAy3gkeqGF qC8bd5vf0w9/uiskxrwcaV9aS36mBUherL1GoEhq6b7eqs9nYIRIYymV6nYXyYBrUbYj 5YsEifNrdaQdP92326jCWFLkICmHZ9yvmZmYx7hD1H8CYq8owenxpVzIZ6rqxOgDpzUT KItQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Lw6tgFVnYKZm+0V0DA7UAcyEXzWvZV9Zf4KFrDBZaTs=; b=vWEcuhm/I1tjVsBQEr6kmyJNO7m2wJ98BGyui7+Zt7kKdizlxE3DhwLSu6/J+C6835 gwoXiuhTmNbbRAXthRJAF2MF9Dfu84+UtenWjYcZ5IuKLArCV3RKlYRzAGFIkAop9zs+ hq1pPo2n4E680reA+5P1RMhybsnoRuwYgHc9MH9dpleO0IbtEpEYS87TeoGXXySSbqbJ OZpb40sktZxBjISWSBwYVOZt6gbOijfOc9EyJFBlfzjVN10+aUK1GNeRSkXbQcyqqZGM c/Og9goPxKoHg2sIXZyf7xEX55p2ph49dk78/NP4Zgo0MeQxeCs+4+FV+hrH84JzF2EX yfiw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=wtj9omrB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s36-v6si2781392pld.8.2018.09.25.07.56.39; Tue, 25 Sep 2018 07:56:39 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=wtj9omrB; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729563AbeIYVEa (ORCPT + 32 others); Tue, 25 Sep 2018 17:04:30 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729506AbeIYVE3 (ORCPT ); Tue, 25 Sep 2018 17:04:29 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 8812145b; Tue, 25 Sep 2018 14:38:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=Q7syU/h/28EvQy9aLoBYxyAwK wI=; b=wtj9omrBzKcKNwkCPgiJn5FULV983XYx8LDI4wsGCyXmPIPawylKGWZQJ 5y9qYt0655UyxQ66qM2V9IIzhUun6Hmuryocy7b0w33jF1e6jp+5Nkmu8VcUCMEl Gsr/acOgMHspO3Weg9yDgApajkP09H1Hr8QEf0+S2mLxjLfy7n/i878rXMmqFZmu 2xl0NYb3QGrbIN9oyyHNzWHmtVoIvYMZJ2Jush5GzJTO4e+a5TxI+BSnpemcvwEE IzvkCg8uWHboDZwWV5zpx5h09mmese7pzHV/o9yhuqwA0bbnFTjfhaMgJZ7tm2oc vQfo1cApCAMGCBXcRiB4FyVgWzPxQ== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id c15420fa (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:07 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson Subject: [PATCH net-next v6 02/23] zinc: introduce minimal cryptography library Date: Tue, 25 Sep 2018 16:56:01 +0200 Message-Id: <20180925145622.29959-3-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Zinc stands for "Zinc Is Neat Crypto" or "Zinc as IN Crypto" or maybe just "Zx2c4's INsane Cryptolib." It's also short, easy to type, and plays nicely with the recent trend of naming crypto libraries after elements. The guiding principle is "don't overdo it". It's less of a library and more of a directory tree for organizing well-curated direct implementations of cryptography primitives. Zinc is a new cryptography API that is much more minimal and lower-level than the current one. It intends to complement it and provide a basis upon which the current crypto API might build, as the provider of software implementations of cryptographic primitives. It is motivated by three primary observations in crypto API design: * Highly composable "cipher modes" and related abstractions from 90s cryptographers did not turn out to be as terrific an idea as hoped, leading to a host of API misuse problems. * Most programmers are afraid of crypto code, and so prefer to integrate it into libraries in a highly abstracted manner, so as to shield themselves from implementation details. Cryptographers, on the other hand, prefer simple direct implementations, which they're able to verify for high assurance and optimize in accordance with their expertise. * Overly abstracted and flexible cryptography APIs lead to a host of dangerous problems and performance issues. The kernel is in the business usually not of coming up with new uses of crypto, but rather implementing various constructions, which means it essentially needs a library of primitives, not a highly abstracted enterprise-ready pluggable system, with a few particular exceptions. This last observation has seen itself play out several times over and over again within the kernel: * The perennial move of actual primitives away from crypto/ and into lib/, so that users can actually call these functions directly with no overhead and without lots of allocations, function pointers, string specifier parsing, and general clunkiness. For example: sha256, chacha20, siphash, sha1, and so forth live in lib/ rather than in crypto/. Zinc intends to stop the cluttering of lib/ and introduce these direct primitives into their proper place, lib/zinc/. * An abundance of misuse bugs with the present crypto API that have been very unpleasant to clean up. * A hesitance to even use cryptography, because of the overhead and headaches involved in accessing the routines. Zinc goes in a rather different direction. Rather than providing a thoroughly designed and abstracted API, Zinc gives you simple functions, which implement some primitive, or some particular and specific construction of primitives. It is not dynamic in the least, though one could imagine implementing a complex dynamic dispatch mechanism (such as the current crypto API) on top of these basic functions. After all, dynamic dispatch is usually needed for applications with cipher agility, such as IPsec, dm-crypt, AF_ALG, and so forth, and the existing crypto API will continue to play that role. However, Zinc will provide a non- haphazard way of directly utilizing crypto routines in applications that do have neither the need nor desire for abstraction and dynamic dispatch. It also organizes the implementations in a simple, straight-forward, and direct manner, making it enjoyable and intuitive to work on. Rather than moving optimized assembly implementations into arch/, it keeps them all together in lib/zinc/, making it simple and obvious to compare and contrast what's happening. This is, notably, exactly what the lib/raid6/ tree does, and that seems to work out rather well. It's also the pattern of most successful crypto libraries. The architecture- specific glue-code is made a part of each translation unit, rather than being in a separate one, so that generic and architecture-optimized code are combined at compile-time, and incompatibility branches compiled out by the optimizer. All implementations have been extensively tested and fuzzed, and are selected for their quality, trustworthiness, and performance. Wherever possible and performant, formally verified implementations are used, such as those from HACL* [1] and Fiat-Crypto [2]. The routines also take special care to zero out secrets using memzero_explicit (and future work is planned to have gcc do this more reliably and performantly with compiler plugins). The performance of the selected implementations is state-of-the-art and unrivaled on a broad array of hardware, though of course we will continue to fine tune these to the hardware demands needed by kernel contributors. Each implementation also comes with extensive self-tests and crafted test vectors, pulled from various places such as Wycheproof [9]. Regularity of function signatures is important, so that users can easily "guess" the name of the function they want. Though, individual primitives are oftentimes not trivially interchangeable, having been designed for different things and requiring different parameters and semantics, and so the function signatures they provide will directly reflect the realities of the primitives' usages, rather than hiding it behind (inevitably leaky) abstractions. Also, in contrast to the current crypto API, Zinc functions can work on stack buffers, and can be called with different keys, without requiring allocations or locking. SIMD is used automatically when available, though some routines may benefit from either having their SIMD disabled for particular invocations, or to have the SIMD initialization calls amortized over several invocations of the function, and so Zinc utilizes function signatures enabling that in conjunction with the recently introduced simd_context_t. More generally, Zinc provides function signatures that allow just what is required by the various callers. This isn't to say that users of the functions will be permitted to pollute the function semantics with weird particular needs, but we are trying very hard not to overdo it, and that means looking carefully at what's actually necessary, and doing just that, and not much more than that. Remember: practicality and cleanliness rather than over-zealous infrastructure. Zinc provides also an opening for the best implementers in academia to contribute their time and effort to the kernel, by being sufficiently simple and inviting. In discussing this commit with some of the best and brightest over the last few years, there are many who are eager to devote rare talent and energy to this effort. Following the merging of this, I expect for the primitives that currently exist in lib/ to work their way into lib/zinc/, after intense scrutiny of each implementation, potentially replacing them with either formally-verified implementations, or better studied and faster state-of-the-art implementations. Also following the merging of this, I expect for the old crypto API implementations to be ported over to use Zinc for their software-based implementations. As Zinc is simply library code, its config options are un-menued, with the exception of CONFIG_ZINC_DEBUG, which enables various selftests and BUG_ONs. [1] https://github.com/project-everest/hacl-star [2] https://github.com/mit-plv/fiat-crypto [3] https://cr.yp.to/ecdh.html [4] https://cr.yp.to/chacha.html [5] https://cr.yp.to/snuffle/xsalsa-20081128.pdf [6] https://cr.yp.to/mac.html [7] https://blake2.net/ [8] https://tools.ietf.org/html/rfc8439 [9] https://github.com/google/wycheproof Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson --- MAINTAINERS | 8 ++++++++ lib/Kconfig | 2 ++ lib/Makefile | 2 ++ lib/zinc/Kconfig | 32 ++++++++++++++++++++++++++++++++ lib/zinc/Makefile | 3 +++ 5 files changed, 47 insertions(+) create mode 100644 lib/zinc/Kconfig create mode 100644 lib/zinc/Makefile -- 2.19.0 diff --git a/MAINTAINERS b/MAINTAINERS index 4327777dce57..5967c737f3ce 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16170,6 +16170,14 @@ Q: https://patchwork.linuxtv.org/project/linux-media/list/ S: Maintained F: drivers/media/dvb-frontends/zd1301_demod* +ZINC CRYPTOGRAPHY LIBRARY +M: Jason A. Donenfeld +M: Samuel Neves +S: Maintained +F: lib/zinc/ +F: include/zinc/ +L: linux-crypto@vger.kernel.org + ZPOOL COMPRESSED PAGE STORAGE API M: Dan Streetman L: linux-mm@kvack.org diff --git a/lib/Kconfig b/lib/Kconfig index a3928d4438b5..3e6848269c66 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -485,6 +485,8 @@ config GLOB_SELFTEST module load) by a small amount, so you're welcome to play with it, but you probably don't need it. +source "lib/zinc/Kconfig" + # # Netlink attribute parsing support is select'ed if needed # diff --git a/lib/Makefile b/lib/Makefile index ca3f7ebb900d..571d28d3f143 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -214,6 +214,8 @@ obj-$(CONFIG_PERCPU_TEST) += percpu_test.o obj-$(CONFIG_ASN1) += asn1_decoder.o +obj-y += zinc/ + obj-$(CONFIG_FONT_SUPPORT) += fonts/ obj-$(CONFIG_PRIME_NUMBERS) += prime_numbers.o diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig new file mode 100644 index 000000000000..4e2e59126a67 --- /dev/null +++ b/lib/zinc/Kconfig @@ -0,0 +1,32 @@ +config ZINC_DEBUG + bool "Zinc cryptography library debugging and self-tests" + help + This builds a series of self-tests for the Zinc crypto library, which + help diagnose any cryptographic algorithm implementation issues that + might be at the root cause of potential bugs. It also adds various + debugging traps. + + Unless you're developing and testing cryptographic routines, or are + especially paranoid about correctness on your hardware, you may say + N here. + +config ZINC_ARCH_ARM + def_bool y + depends on ARM && !64BIT + +config ZINC_ARCH_ARM64 + def_bool y + depends on ARM64 && 64BIT + +config ZINC_ARCH_X86_64 + def_bool y + depends on X86_64 && 64BIT + depends on !UML + +config ZINC_ARCH_MIPS + def_bool y + depends on MIPS && CPU_MIPS32_R2 && !64BIT + +config ZINC_ARCH_MIPS64 + def_bool y + depends on MIPS && 64BIT diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile new file mode 100644 index 000000000000..a61c80d676cb --- /dev/null +++ b/lib/zinc/Makefile @@ -0,0 +1,3 @@ +ccflags-y := -O2 +ccflags-y += -D'pr_fmt(fmt)="zinc: " fmt' +ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG From patchwork Tue Sep 25 14:56:05 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147491 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830278lji; Tue, 25 Sep 2018 07:57:04 -0700 (PDT) X-Google-Smtp-Source: ACcGV60rnLs0NHE2A0Zo8PFQnHQXHcXoBG7VB6W/GP9VkmIxNrL7/8c9dVcFsCikDY3L8bEjCiK6 X-Received: by 2002:a17:902:566:: with SMTP id 93-v6mr1676564plf.184.1537887424817; Tue, 25 Sep 2018 07:57:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887424; cv=none; d=google.com; s=arc-20160816; b=dG2eYSd0KuPji66hzPAMKIR0tFWmlCp1mMFJfTdcC6pG9/pq6hqZLdRdS/Wx/UZ3Tu IiHD+wiMwGBUumW/5CRzGH9NmfSXp+QYyN6qq3kR0eafrN9AcLnTPRwiZim57EIPVTSn nLIjeKL1eXGg4aTqWSMCJaqqtONYlU9Tvi/npyF/vo6oyjgADKKsdEvEd3E/g+2+nF3Q FUZlKFnqE2FkO4CQoKzhMGIkbPvGRN69vNvprvKYZzk7+O8G+eV5rXmQSsyF4wFQC6zY MowPudDCRE3pGJ2wH1jeFHHnzQ9J8a40SUPyrVgZtVBG0Iv2hT4LMKTmDTV5/2W7cphK c96g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=A9O70zmX3azi6a1y2+xLs8jLeNAxuD8mLP32oRAlLCM=; b=RT3WLlL2lqe7+BT4zWrMvm38XMz8uQGlijAJkfmQ8Tx9qJDfu7zKaJc+nqYhuhbtE0 ddKAdmadGDKR2dzSg6ENrQvS3b8F7OkJIjXn1WlxiTs4jkTJxpGPjFtWB6OsxwqXOEHK Lq6CNptuVWe+CYLG7OAkBnZemvry5dZb4GFocIJPs7BJjgyxM6t48lBlopVdG0gcvIio geTPqeJQkQInVv/Oiwzh/fRIyc7Qk9UVEVhHt63pST73pBbpX8fuL+EHHdgXnic6QXfp oBqWAsgDqJ+BcR8u4kzCBA6tR0Z5T58tEeb7AallShUJ0iGut0F2ijzjVLZixkpmhajg aD0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=hP+UUxAm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id b5-v6si2677461pgd.54.2018.09.25.07.57.04; Tue, 25 Sep 2018 07:57:04 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=hP+UUxAm; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729649AbeIYVEx (ORCPT + 32 others); Tue, 25 Sep 2018 17:04:53 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729616AbeIYVEw (ORCPT ); Tue, 25 Sep 2018 17:04:52 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 639f8d34; Tue, 25 Sep 2018 14:38:19 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=lYddJx2HxnDZCvJTFgo5CNoL4 Gc=; b=hP+UUxAmUURvBGkoo7SRE5h8cIreFFlLB9v2rWDxQlW4PNkU05ppnrb9i 8CW0xv4WmzqtDjkEJQNETCCyA5WKkQNcPf7dWla3ZIiwCDPvYQwhBt+bYILHYZnZ 79bLhqn1wGK1iFt4CQFn29vgbud/f2+U/+iTH8btyUSpBycsjodLX5QXM8krPR9F suRuefdW6Aenqrm2AUmnIKpormdoGyhrJDofd1uCtqTbW9TqJ7YN3yRyHft+PZJW y6cWozWfL9dPbriAOyZlbkGC/ULjBU7+qhg1LL2PbduCV0h75PRCWRZN/bTOvyK1 48YpJyLdZD98/1WaRQNw/AqI3VniA== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 742109bf (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:19 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Russell King , linux-arm-kernel@lists.infradead.org Subject: [PATCH net-next v6 06/23] zinc: port Andy Polyakov's ChaCha20 ARM and ARM64 implementations Date: Tue, 25 Sep 2018 16:56:05 +0200 Message-Id: <20180925145622.29959-7-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These port and prepare Andy Polyakov's implementations for the kernel, but don't actually wire up any of the code yet. The wiring will be done in a subsequent commit, since we'll need to merge these implementations with another one. We make a few small changes to the assembly: - Entries and exits use the proper kernel convention macro. - CPU feature checking is done in C by the glue code, so that has been removed from the assembly. - The function names have been renamed to fit kernel conventions. - Labels have been renamed (prefixed with .L) to fit kernel conventions. - Constants have been rearranged so that they are closer to the code that is using them. [ARM only] - The neon code can jump to the scalar code when it makes sense to do so. - The neon_512 function as a separate function has been removed, leaving the decision up to the main neon entry point. [ARM64 only] Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Russell King Cc: linux-arm-kernel@lists.infradead.org --- lib/zinc/chacha20/chacha20-arm-cryptogams.S | 367 +++++++++--------- lib/zinc/chacha20/chacha20-arm64-cryptogams.S | 57 +-- 2 files changed, 193 insertions(+), 231 deletions(-) -- 2.19.0 diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm-cryptogams.S index 05a3a9e6e93f..770bab469171 100644 --- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S +++ b/lib/zinc/chacha20/chacha20-arm-cryptogams.S @@ -1,9 +1,12 @@ /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ /* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. * Copyright (C) 2006-2017 CRYPTOGAMS by . All Rights Reserved. + * + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS. */ -#include "arm_arch.h" +#include .text #if defined(__thumb2__) || defined(__clang__) @@ -24,48 +27,25 @@ .long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral .Lone: .long 1,0,0,0 -.Lrot8: -.long 0x02010003,0x06050407 -#if __ARM_MAX_ARCH__>=7 -.LOPENSSL_armcap: -.word OPENSSL_armcap_P-.LChaCha20_ctr32 -#else .word -1 -#endif -.globl ChaCha20_ctr32 -.type ChaCha20_ctr32,%function .align 5 -ChaCha20_ctr32: -.LChaCha20_ctr32: +ENTRY(chacha20_arm) ldr r12,[sp,#0] @ pull pointer to counter and nonce stmdb sp!,{r0-r2,r4-r11,lr} -#if __ARM_ARCH__<7 && !defined(__thumb2__) - sub r14,pc,#16 @ ChaCha20_ctr32 -#else - adr r14,.LChaCha20_ctr32 -#endif cmp r2,#0 @ len==0? -#ifdef __thumb2__ +#ifdef __thumb2__ itt eq #endif addeq sp,sp,#4*3 - beq .Lno_data -#if __ARM_MAX_ARCH__>=7 - cmp r2,#192 @ test len - bls .Lshort - ldr r4,[r14,#-24] - ldr r4,[r14,r4] -# ifdef __APPLE__ - ldr r4,[r4] -# endif - tst r4,#ARMV7_NEON - bne .LChaCha20_neon -.Lshort: -#endif + beq .Lno_data_arm ldmia r12,{r4-r7} @ load counter and nonce sub sp,sp,#4*(16) @ off-load area - sub r14,r14,#64 @ .Lsigma +#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__) + sub r14,pc,#100 @ .Lsigma +#else + adr r14,.Lsigma @ .Lsigma +#endif stmdb sp!,{r4-r7} @ copy counter and nonce ldmia r3,{r4-r11} @ load key ldmia r14,{r0-r3} @ load sigma @@ -191,7 +171,7 @@ ChaCha20_ctr32: @ rx and second half at sp+4*(16+8) cmp r11,#64 @ done yet? -#ifdef __thumb2__ +#ifdef __thumb2__ itete lo #endif addlo r12,sp,#4*(0) @ shortcut or ... @@ -202,49 +182,49 @@ ChaCha20_ctr32: ldr r8,[sp,#4*(0)] @ load key material ldr r9,[sp,#4*(1)] -#if __ARM_ARCH__>=6 || !defined(__ARMEB__) -# if __ARM_ARCH__<7 +#if __LINUX_ARM_ARCH__ >= 6 || !defined(__ARMEB__) +#if __LINUX_ARM_ARCH__ < 7 orr r10,r12,r14 tst r10,#3 @ are input and output aligned? ldr r10,[sp,#4*(2)] bne .Lunaligned cmp r11,#64 @ restore flags -# else +#else ldr r10,[sp,#4*(2)] -# endif +#endif ldr r11,[sp,#4*(3)] add r0,r0,r8 @ accumulate key material add r1,r1,r9 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r8,[r12],#16 @ load input ldrhs r9,[r12,#-12] add r2,r2,r10 add r3,r3,r11 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r10,[r12,#-8] ldrhs r11,[r12,#-4] -# if __ARM_ARCH__>=6 && defined(__ARMEB__) +#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__) rev r0,r0 rev r1,r1 rev r2,r2 rev r3,r3 -# endif -# ifdef __thumb2__ +#endif +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r0,r0,r8 @ xor with input eorhs r1,r1,r9 add r8,sp,#4*(4) str r0,[r14],#16 @ store output -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r2,r2,r10 eorhs r3,r3,r11 ldmia r8,{r8-r11} @ load key material @@ -254,34 +234,34 @@ ChaCha20_ctr32: add r4,r8,r4,ror#13 @ accumulate key material add r5,r9,r5,ror#13 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r8,[r12],#16 @ load input ldrhs r9,[r12,#-12] add r6,r10,r6,ror#13 add r7,r11,r7,ror#13 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r10,[r12,#-8] ldrhs r11,[r12,#-4] -# if __ARM_ARCH__>=6 && defined(__ARMEB__) +#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__) rev r4,r4 rev r5,r5 rev r6,r6 rev r7,r7 -# endif -# ifdef __thumb2__ +#endif +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r4,r4,r8 eorhs r5,r5,r9 add r8,sp,#4*(8) str r4,[r14],#16 @ store output -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r6,r6,r10 eorhs r7,r7,r11 str r5,[r14,#-12] @@ -294,39 +274,39 @@ ChaCha20_ctr32: add r0,r0,r8 @ accumulate key material add r1,r1,r9 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r8,[r12],#16 @ load input ldrhs r9,[r12,#-12] -# ifdef __thumb2__ +#ifdef __thumb2__ itt hi -# endif +#endif strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it add r2,r2,r10 add r3,r3,r11 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r10,[r12,#-8] ldrhs r11,[r12,#-4] -# if __ARM_ARCH__>=6 && defined(__ARMEB__) +#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__) rev r0,r0 rev r1,r1 rev r2,r2 rev r3,r3 -# endif -# ifdef __thumb2__ +#endif +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r0,r0,r8 eorhs r1,r1,r9 add r8,sp,#4*(12) str r0,[r14],#16 @ store output -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r2,r2,r10 eorhs r3,r3,r11 str r1,[r14,#-12] @@ -336,79 +316,79 @@ ChaCha20_ctr32: add r4,r8,r4,ror#24 @ accumulate key material add r5,r9,r5,ror#24 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hi -# endif +#endif addhi r8,r8,#1 @ next counter value strhi r8,[sp,#4*(12)] @ save next counter value -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r8,[r12],#16 @ load input ldrhs r9,[r12,#-12] add r6,r10,r6,ror#24 add r7,r11,r7,ror#24 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhs r10,[r12,#-8] ldrhs r11,[r12,#-4] -# if __ARM_ARCH__>=6 && defined(__ARMEB__) +#if __LINUX_ARM_ARCH__ >= 6 && defined(__ARMEB__) rev r4,r4 rev r5,r5 rev r6,r6 rev r7,r7 -# endif -# ifdef __thumb2__ +#endif +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r4,r4,r8 eorhs r5,r5,r9 -# ifdef __thumb2__ +#ifdef __thumb2__ it ne -# endif +#endif ldrne r8,[sp,#4*(32+2)] @ re-load len -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif eorhs r6,r6,r10 eorhs r7,r7,r11 str r4,[r14],#16 @ store output str r5,[r14,#-12] -# ifdef __thumb2__ +#ifdef __thumb2__ it hs -# endif +#endif subhs r11,r8,#64 @ len-=64 str r6,[r14,#-8] str r7,[r14,#-4] bhi .Loop_outer beq .Ldone -# if __ARM_ARCH__<7 +#if __LINUX_ARM_ARCH__ < 7 b .Ltail .align 4 .Lunaligned: @ unaligned endian-neutral path cmp r11,#64 @ restore flags -# endif #endif -#if __ARM_ARCH__<7 +#endif +#if __LINUX_ARM_ARCH__ < 7 ldr r11,[sp,#4*(3)] add r0,r8,r0 @ accumulate key material add r1,r9,r1 add r2,r10,r2 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r8,r8,r8 @ zero or ... ldrhsb r8,[r12],#16 @ ... load input eorlo r9,r9,r9 ldrhsb r9,[r12,#-12] add r3,r11,r3 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r10,r10,r10 ldrhsb r10,[r12,#-8] eorlo r11,r11,r11 @@ -416,53 +396,53 @@ ChaCha20_ctr32: eor r0,r8,r0 @ xor with input (or zero) eor r1,r9,r1 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-15] @ load more input ldrhsb r9,[r12,#-11] eor r2,r10,r2 strb r0,[r14],#16 @ store output eor r3,r11,r3 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-7] ldrhsb r11,[r12,#-3] strb r1,[r14,#-12] eor r0,r8,r0,lsr#8 strb r2,[r14,#-8] eor r1,r9,r1,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-14] @ load more input ldrhsb r9,[r12,#-10] strb r3,[r14,#-4] eor r2,r10,r2,lsr#8 strb r0,[r14,#-15] eor r3,r11,r3,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-6] ldrhsb r11,[r12,#-2] strb r1,[r14,#-11] eor r0,r8,r0,lsr#8 strb r2,[r14,#-7] eor r1,r9,r1,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-13] @ load more input ldrhsb r9,[r12,#-9] strb r3,[r14,#-3] eor r2,r10,r2,lsr#8 strb r0,[r14,#-14] eor r3,r11,r3,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-5] ldrhsb r11,[r12,#-1] strb r1,[r14,#-10] @@ -482,18 +462,18 @@ ChaCha20_ctr32: add r4,r8,r4,ror#13 @ accumulate key material add r5,r9,r5,ror#13 add r6,r10,r6,ror#13 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r8,r8,r8 @ zero or ... ldrhsb r8,[r12],#16 @ ... load input eorlo r9,r9,r9 ldrhsb r9,[r12,#-12] add r7,r11,r7,ror#13 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r10,r10,r10 ldrhsb r10,[r12,#-8] eorlo r11,r11,r11 @@ -501,53 +481,53 @@ ChaCha20_ctr32: eor r4,r8,r4 @ xor with input (or zero) eor r5,r9,r5 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-15] @ load more input ldrhsb r9,[r12,#-11] eor r6,r10,r6 strb r4,[r14],#16 @ store output eor r7,r11,r7 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-7] ldrhsb r11,[r12,#-3] strb r5,[r14,#-12] eor r4,r8,r4,lsr#8 strb r6,[r14,#-8] eor r5,r9,r5,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-14] @ load more input ldrhsb r9,[r12,#-10] strb r7,[r14,#-4] eor r6,r10,r6,lsr#8 strb r4,[r14,#-15] eor r7,r11,r7,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-6] ldrhsb r11,[r12,#-2] strb r5,[r14,#-11] eor r4,r8,r4,lsr#8 strb r6,[r14,#-7] eor r5,r9,r5,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-13] @ load more input ldrhsb r9,[r12,#-9] strb r7,[r14,#-3] eor r6,r10,r6,lsr#8 strb r4,[r14,#-14] eor r7,r11,r7,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-5] ldrhsb r11,[r12,#-1] strb r5,[r14,#-10] @@ -564,26 +544,26 @@ ChaCha20_ctr32: add r8,sp,#4*(4+4) ldmia r8,{r8-r11} @ load key material ldmia r0,{r0-r7} @ load second half -# ifdef __thumb2__ +#ifdef __thumb2__ itt hi -# endif +#endif strhi r10,[sp,#4*(16+10)] @ copy "rx" strhi r11,[sp,#4*(16+11)] @ copy "rx" add r0,r8,r0 @ accumulate key material add r1,r9,r1 add r2,r10,r2 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r8,r8,r8 @ zero or ... ldrhsb r8,[r12],#16 @ ... load input eorlo r9,r9,r9 ldrhsb r9,[r12,#-12] add r3,r11,r3 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r10,r10,r10 ldrhsb r10,[r12,#-8] eorlo r11,r11,r11 @@ -591,53 +571,53 @@ ChaCha20_ctr32: eor r0,r8,r0 @ xor with input (or zero) eor r1,r9,r1 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-15] @ load more input ldrhsb r9,[r12,#-11] eor r2,r10,r2 strb r0,[r14],#16 @ store output eor r3,r11,r3 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-7] ldrhsb r11,[r12,#-3] strb r1,[r14,#-12] eor r0,r8,r0,lsr#8 strb r2,[r14,#-8] eor r1,r9,r1,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-14] @ load more input ldrhsb r9,[r12,#-10] strb r3,[r14,#-4] eor r2,r10,r2,lsr#8 strb r0,[r14,#-15] eor r3,r11,r3,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-6] ldrhsb r11,[r12,#-2] strb r1,[r14,#-11] eor r0,r8,r0,lsr#8 strb r2,[r14,#-7] eor r1,r9,r1,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-13] @ load more input ldrhsb r9,[r12,#-9] strb r3,[r14,#-3] eor r2,r10,r2,lsr#8 strb r0,[r14,#-14] eor r3,r11,r3,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-5] ldrhsb r11,[r12,#-1] strb r1,[r14,#-10] @@ -654,25 +634,25 @@ ChaCha20_ctr32: add r8,sp,#4*(4+8) ldmia r8,{r8-r11} @ load key material add r4,r8,r4,ror#24 @ accumulate key material -# ifdef __thumb2__ +#ifdef __thumb2__ itt hi -# endif +#endif addhi r8,r8,#1 @ next counter value strhi r8,[sp,#4*(12)] @ save next counter value add r5,r9,r5,ror#24 add r6,r10,r6,ror#24 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r8,r8,r8 @ zero or ... ldrhsb r8,[r12],#16 @ ... load input eorlo r9,r9,r9 ldrhsb r9,[r12,#-12] add r7,r11,r7,ror#24 -# ifdef __thumb2__ +#ifdef __thumb2__ itete lo -# endif +#endif eorlo r10,r10,r10 ldrhsb r10,[r12,#-8] eorlo r11,r11,r11 @@ -680,53 +660,53 @@ ChaCha20_ctr32: eor r4,r8,r4 @ xor with input (or zero) eor r5,r9,r5 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-15] @ load more input ldrhsb r9,[r12,#-11] eor r6,r10,r6 strb r4,[r14],#16 @ store output eor r7,r11,r7 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-7] ldrhsb r11,[r12,#-3] strb r5,[r14,#-12] eor r4,r8,r4,lsr#8 strb r6,[r14,#-8] eor r5,r9,r5,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-14] @ load more input ldrhsb r9,[r12,#-10] strb r7,[r14,#-4] eor r6,r10,r6,lsr#8 strb r4,[r14,#-15] eor r7,r11,r7,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-6] ldrhsb r11,[r12,#-2] strb r5,[r14,#-11] eor r4,r8,r4,lsr#8 strb r6,[r14,#-7] eor r5,r9,r5,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r8,[r12,#-13] @ load more input ldrhsb r9,[r12,#-9] strb r7,[r14,#-3] eor r6,r10,r6,lsr#8 strb r4,[r14,#-14] eor r7,r11,r7,lsr#8 -# ifdef __thumb2__ +#ifdef __thumb2__ itt hs -# endif +#endif ldrhsb r10,[r12,#-5] ldrhsb r11,[r12,#-1] strb r5,[r14,#-10] @@ -740,13 +720,13 @@ ChaCha20_ctr32: eor r7,r11,r7,lsr#8 strb r6,[r14,#-5] strb r7,[r14,#-1] -# ifdef __thumb2__ +#ifdef __thumb2__ it ne -# endif +#endif ldrne r8,[sp,#4*(32+2)] @ re-load len -# ifdef __thumb2__ +#ifdef __thumb2__ it hs -# endif +#endif subhs r11,r8,#64 @ len-=64 bhi .Loop_outer @@ -768,20 +748,33 @@ ChaCha20_ctr32: .Ldone: add sp,sp,#4*(32+3) -.Lno_data: +.Lno_data_arm: ldmia sp!,{r4-r11,pc} -.size ChaCha20_ctr32,.-ChaCha20_ctr32 -#if __ARM_MAX_ARCH__>=7 +ENDPROC(chacha20_arm) + +#ifdef CONFIG_KERNEL_MODE_NEON +.align 5 +.Lsigma2: +.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral +.Lone2: +.long 1,0,0,0 +.word -1 + .arch armv7-a .fpu neon -.type ChaCha20_neon,%function .align 5 -ChaCha20_neon: +ENTRY(chacha20_neon) ldr r12,[sp,#0] @ pull pointer to counter and nonce stmdb sp!,{r0-r2,r4-r11,lr} -.LChaCha20_neon: - adr r14,.Lsigma + cmp r2,#0 @ len==0? +#ifdef __thumb2__ + itt eq +#endif + addeq sp,sp,#4*3 + beq .Lno_data_neon +.Lchacha20_neon_begin: + adr r14,.Lsigma2 vstmdb sp!,{d8-d15} @ ABI spec says so stmdb sp!,{r0-r3} @@ -1121,12 +1114,12 @@ ChaCha20_neon: ldr r10,[r12,#-8] add r3,r3,r11 ldr r11,[r12,#-4] -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r0,r0 rev r1,r1 rev r2,r2 rev r3,r3 -# endif +#endif eor r0,r0,r8 @ xor with input add r8,sp,#4*(4) eor r1,r1,r9 @@ -1146,12 +1139,12 @@ ChaCha20_neon: ldr r10,[r12,#-8] add r7,r11,r7,ror#13 ldr r11,[r12,#-4] -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r4,r4 rev r5,r5 rev r6,r6 rev r7,r7 -# endif +#endif eor r4,r4,r8 add r8,sp,#4*(8) eor r5,r5,r9 @@ -1170,24 +1163,24 @@ ChaCha20_neon: ldr r8,[r12],#16 @ load input add r1,r1,r9 ldr r9,[r12,#-12] -# ifdef __thumb2__ +#ifdef __thumb2__ it hi -# endif +#endif strhi r10,[sp,#4*(16+10)] @ copy "rx" while at it add r2,r2,r10 ldr r10,[r12,#-8] -# ifdef __thumb2__ +#ifdef __thumb2__ it hi -# endif +#endif strhi r11,[sp,#4*(16+11)] @ copy "rx" while at it add r3,r3,r11 ldr r11,[r12,#-4] -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r0,r0 rev r1,r1 rev r2,r2 rev r3,r3 -# endif +#endif eor r0,r0,r8 add r8,sp,#4*(12) eor r1,r1,r9 @@ -1210,16 +1203,16 @@ ChaCha20_neon: add r7,r11,r7,ror#24 ldr r10,[r12,#-8] ldr r11,[r12,#-4] -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r4,r4 rev r5,r5 rev r6,r6 rev r7,r7 -# endif +#endif eor r4,r4,r8 -# ifdef __thumb2__ +#ifdef __thumb2__ it hi -# endif +#endif ldrhi r8,[sp,#4*(32+2)] @ re-load len eor r5,r5,r9 eor r6,r6,r10 @@ -1379,7 +1372,7 @@ ChaCha20_neon: add r6,r10,r6,ror#13 add r7,r11,r7,ror#13 ldmia r8,{r8-r11} @ load key material -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r0,r0 rev r1,r1 rev r2,r2 @@ -1388,7 +1381,7 @@ ChaCha20_neon: rev r5,r5 rev r6,r6 rev r7,r7 -# endif +#endif stmia sp,{r0-r7} add r0,sp,#4*(16+8) @@ -1408,7 +1401,7 @@ ChaCha20_neon: add r6,r10,r6,ror#24 add r7,r11,r7,ror#24 ldr r11,[sp,#4*(32+2)] @ re-load len -# ifdef __ARMEB__ +#ifdef __ARMEB__ rev r0,r0 rev r1,r1 rev r2,r2 @@ -1417,7 +1410,7 @@ ChaCha20_neon: rev r5,r5 rev r6,r6 rev r7,r7 -# endif +#endif stmia r8,{r0-r7} add r10,sp,#4*(0) sub r11,r11,#64*3 @ len-=64*3 @@ -1434,7 +1427,7 @@ ChaCha20_neon: add sp,sp,#4*(32+4) vldmia sp,{d8-d15} add sp,sp,#4*(16+3) +.Lno_data_neon: ldmia sp!,{r4-r11,pc} -.size ChaCha20_neon,.-ChaCha20_neon -.comm OPENSSL_armcap_P,4,4 +ENDPROC(chacha20_neon) #endif diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S index 4d029bfdad3a..5037510edc6f 100644 --- a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S +++ b/lib/zinc/chacha20/chacha20-arm64-cryptogams.S @@ -1,46 +1,24 @@ /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ /* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. * Copyright (C) 2006-2017 CRYPTOGAMS by . All Rights Reserved. + * + * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS. */ -#include "arm_arch.h" +#include .text - - - .align 5 .Lsigma: .quad 0x3320646e61707865,0x6b20657479622d32 // endian-neutral .Lone: .long 1,0,0,0 -.LOPENSSL_armcap_P: -#ifdef __ILP32__ -.long OPENSSL_armcap_P-. -#else -.quad OPENSSL_armcap_P-. -#endif -.byte 67,104,97,67,104,97,50,48,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 -.align 2 -.globl ChaCha20_ctr32 -.type ChaCha20_ctr32,%function .align 5 -ChaCha20_ctr32: +ENTRY(chacha20_arm) cbz x2,.Labort - adr x5,.LOPENSSL_armcap_P - cmp x2,#192 - b.lo .Lshort -#ifdef __ILP32__ - ldrsw x6,[x5] -#else - ldr x6,[x5] -#endif - ldr w17,[x6,x5] - tst w17,#ARMV7_NEON - b.ne ChaCha20_neon -.Lshort: stp x29,x30,[sp,#-96]! add x29,sp,#0 @@ -309,11 +287,13 @@ ChaCha20_ctr32: ldp x27,x28,[x29,#80] ldp x29,x30,[sp],#96 ret -.size ChaCha20_ctr32,.-ChaCha20_ctr32 +ENDPROC(chacha20_arm) -.type ChaCha20_neon,%function +#ifdef CONFIG_KERNEL_MODE_NEON .align 5 -ChaCha20_neon: +ENTRY(chacha20_neon) + cbz x2,.Labort_neon + stp x29,x30,[sp,#-96]! add x29,sp,#0 @@ -803,19 +783,6 @@ ChaCha20_neon: ldp x27,x28,[x29,#80] ldp x29,x30,[sp],#96 ret -.size ChaCha20_neon,.-ChaCha20_neon -.type ChaCha20_512_neon,%function -.align 5 -ChaCha20_512_neon: - stp x29,x30,[sp,#-96]! - add x29,sp,#0 - - adr x5,.Lsigma - stp x19,x20,[sp,#16] - stp x21,x22,[sp,#32] - stp x23,x24,[sp,#48] - stp x25,x26,[sp,#64] - stp x27,x28,[sp,#80] .L512_or_more_neon: sub sp,sp,#128+64 @@ -1969,5 +1936,7 @@ ChaCha20_512_neon: ldp x25,x26,[x29,#64] ldp x27,x28,[x29,#80] ldp x29,x30,[sp],#96 +.Labort_neon: ret -.size ChaCha20_512_neon,.-ChaCha20_512_neon +ENDPROC(chacha20_neon) +#endif From patchwork Tue Sep 25 14:56:06 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147509 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp833558lji; Tue, 25 Sep 2018 08:00:10 -0700 (PDT) X-Google-Smtp-Source: ACcGV61e/eZMEdmYh0qtnHbWXgvSR17QMT7s44jelS4OGlUkzhVoAwkTtyOVspvN8zKNhkMnkFIz X-Received: by 2002:a62:be03:: with SMTP id l3-v6mr1640870pff.138.1537887610572; Tue, 25 Sep 2018 08:00:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887610; cv=none; d=google.com; s=arc-20160816; b=rWzRHedGgdlXZZaPeTw1yzxR0SYpxTQk/iB0zTjYPF83A9w6gb0ICPIKcsy5a6iepz tYfp0GATCekXi4+ZOe9AVEVBJm81qTg3M0kOW+4D0LfCF6Ju/vo+SaRDheDE3SxyCzDp +tLsW8T4d2xy0y/wbtpfnsRA5hiY4w0IcTu+LWUPfUa3cJH37o/qfYdPMYOPc3KXOQKD lJa8c8Xi+C+AXkxmTxGT0K+SUWztlKCTQxAyXbkSqYyNVY6XJ2Q9Alos8jjDDb8WWPKv 3fxX/JfIQUE8kaLof6DfV6uwLffeX2Omw3qAcq7pGhvljGr6e16BtMEvc/kLL17Dhhd6 8OdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=jmZ88CgiMDS3ZAZHgLD2tZzsa2lMlYMzdbKa82N7lOA=; b=Ph61iTmj9tHGnS2St6RfUuse46BIXz4w3jzea98Zw1vfrE49snpPv4RYJ8nJBciTJ1 XxD0nkjuzaO024k3qf42kCmcwqoQcSllt6YbjQclEV3q2xPOusR7u+B3K6394xjE6ADO Tcw67oqdGVHnJbWizJryc1y0HOcOSkmgHA8erjGmeNARTg2/ISqjoEmFELZYJZH4mxad 8OUvffvaeeHV9ypAhMjHuF1wuQ57JLg7bgSXDmmjhVtRvKkUuxycuZr9d+7g1MFZAcib T1sr/R1I3rskGizURrrp3zGaBiGHLzdBexnUXKA3OkFx86h5WWkq073uynFyIG9riuKr Za0A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b="Y/wMFwqB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id h17-v6si2472751pgj.214.2018.09.25.08.00.10; Tue, 25 Sep 2018 08:00:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b="Y/wMFwqB"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729672AbeIYVIB (ORCPT + 32 others); Tue, 25 Sep 2018 17:08:01 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:36361 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729260AbeIYVEw (ORCPT ); Tue, 25 Sep 2018 17:04:52 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 53d34653; Tue, 25 Sep 2018 14:38:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=PddYXe56o3iu4lbPdIxhid5Of Oo=; b=Y/wMFwqB6RjHKwjd9cAADW/n+oNxFE3qIfKNPZKgpTfP6Eu3xQajRn1r3 K+BYVsnMzbTjZHYoSM6qMDJmEtxG7zWnizR6IXNC3eFar/LZWxfU74ejUIGXiloe Lb9tFZwF8oLq1bY82pnvu0rGbgNKIEpjnwgISAPaWyCCd8iRfmUXiW6Tyk0L8VOs gKZXX+qczK+4KKfK7KOvvmVG/oBW6BDDearCoU9g4OWeUnq3weTMOMe6uOIy8xsc uhlRCsmN0ezp3Ps+YV13U1H7Ci0nzwhxjBbVECAIaEgtXdAPgNdyEReQq9N0B3zR 7KNMFKMU/lA657gPFvVYFMUsN+GQw== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 22d85e22 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:21 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Russell King , linux-arm-kernel@lists.infradead.org Subject: [PATCH net-next v6 07/23] zinc: ChaCha20 ARM and ARM64 implementations Date: Tue, 25 Sep 2018 16:56:06 +0200 Message-Id: <20180925145622.29959-8-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These wire Andy Polyakov's implementations up to the kernel for ARMv7,8 NEON, and introduce Eric Biggers' ultra-fast scalar implementation for CPUs without NEON or for CPUs with slow NEON (Cortex-A5,7). This commit does the following: - Adds the glue code for the assembly implementations. - Renames the ARMv8 code into place, since it can at this point be used wholesale. - Merges Andy Polyakov's ARMv7 NEON code with Eric Biggers' <=ARMv7 scalar code. Commit note: Eric Biggers' scalar code is brand new, and quite possibly prematurely added to this commit, and so it may require a bit of revision. This commit delivers approximately the same or much better performance than the existing crypto API's code and has been measured to do as such on: - ARM1176JZF-S [ARMv6] - Cortex-A7 [ARMv7] - Cortex-A8 [ARMv7] - Cortex-A9 [ARMv7] - Cortex-A17 [ARMv7] - Cortex-A53 [ARMv8] - Cortex-A55 [ARMv8] - Cortex-A73 [ARMv8] - Cortex-A75 [ARMv8] Interestingly, Andy Polyakov's scalar code is slower than Eric Biggers', but is also significantly shorter. This has the advantage that it does not evict other code from L1 cache -- particularly on ARM11 chips -- and so in certain circumstances it can actually be faster. However, it wasn't found that this had an affect on any code existing in the kernel today. Signed-off-by: Jason A. Donenfeld Co-authored-by: Eric Biggers Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Russell King Cc: linux-arm-kernel@lists.infradead.org --- lib/zinc/Makefile | 2 + lib/zinc/chacha20/chacha20-arm-glue.h | 88 +++ ...acha20-arm-cryptogams.S => chacha20-arm.S} | 502 ++++++++++++++++-- ...20-arm64-cryptogams.S => chacha20-arm64.S} | 0 lib/zinc/chacha20/chacha20.c | 2 + 5 files changed, 556 insertions(+), 38 deletions(-) create mode 100644 lib/zinc/chacha20/chacha20-arm-glue.h rename lib/zinc/chacha20/{chacha20-arm-cryptogams.S => chacha20-arm.S} (71%) rename lib/zinc/chacha20/{chacha20-arm64-cryptogams.S => chacha20-arm64.S} (100%) -- 2.19.0 diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index 223a0816c918..e47f64e12bbd 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -4,4 +4,6 @@ ccflags-$(CONFIG_ZINC_DEBUG) += -DDEBUG zinc_chacha20-y := chacha20/chacha20.o zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o +zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM) += chacha20/chacha20-arm.o +zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o diff --git a/lib/zinc/chacha20/chacha20-arm-glue.h b/lib/zinc/chacha20/chacha20-arm-glue.h new file mode 100644 index 000000000000..86cce851ed02 --- /dev/null +++ b/lib/zinc/chacha20/chacha20-arm-glue.h @@ -0,0 +1,88 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include +#if defined(CONFIG_ARM) +#include +#include +#endif + +asmlinkage void chacha20_arm(u8 *out, const u8 *in, const size_t len, + const u32 key[8], const u32 counter[4]); +#if defined(CONFIG_ARM) +asmlinkage void hchacha20_arm(const u32 state[16], u32 out[8]); +#endif +#if defined(CONFIG_KERNEL_MODE_NEON) +asmlinkage void chacha20_neon(u8 *out, const u8 *in, const size_t len, + const u32 key[8], const u32 counter[4]); +#endif + +static bool chacha20_use_neon __ro_after_init; + +static void __init chacha20_fpu_init(void) +{ +#if defined(CONFIG_ARM64) + chacha20_use_neon = elf_hwcap & HWCAP_ASIMD; +#elif defined(CONFIG_ARM) + switch (read_cpuid_part()) { + case ARM_CPU_PART_CORTEX_A7: + case ARM_CPU_PART_CORTEX_A5: + /* The Cortex-A7 and Cortex-A5 do not perform well with the NEON + * implementation but do incredibly with the scalar one and use + * less power. + */ + break; + default: + chacha20_use_neon = elf_hwcap & HWCAP_NEON; + } +#endif +} + +static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst, + const u8 *src, size_t len, + simd_context_t *simd_context) +{ +#if defined(CONFIG_KERNEL_MODE_NEON) + if (chacha20_use_neon && len >= CHACHA20_BLOCK_SIZE * 3 && + simd_use(simd_context)) + chacha20_neon(dst, src, len, state->key, state->counter); + else +#endif + chacha20_arm(dst, src, len, state->key, state->counter); + + state->counter[0] += (len + 63) / 64; + return true; +} + +static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS], + const u8 nonce[HCHACHA20_NONCE_SIZE], + const u8 key[HCHACHA20_KEY_SIZE], + simd_context_t *simd_context) +{ +#if defined(CONFIG_ARM) + u32 x[] = { CHACHA20_CONSTANT_EXPA, + CHACHA20_CONSTANT_ND_3, + CHACHA20_CONSTANT_2_BY, + CHACHA20_CONSTANT_TE_K, + get_unaligned_le32(key + 0), + get_unaligned_le32(key + 4), + get_unaligned_le32(key + 8), + get_unaligned_le32(key + 12), + get_unaligned_le32(key + 16), + get_unaligned_le32(key + 20), + get_unaligned_le32(key + 24), + get_unaligned_le32(key + 28), + get_unaligned_le32(nonce + 0), + get_unaligned_le32(nonce + 4), + get_unaligned_le32(nonce + 8), + get_unaligned_le32(nonce + 12) + }; + hchacha20_arm(x, derived_key); + return true; +#else + return false; +#endif +} diff --git a/lib/zinc/chacha20/chacha20-arm-cryptogams.S b/lib/zinc/chacha20/chacha20-arm.S similarity index 71% rename from lib/zinc/chacha20/chacha20-arm-cryptogams.S rename to lib/zinc/chacha20/chacha20-arm.S index 770bab469171..5abedafcf129 100644 --- a/lib/zinc/chacha20/chacha20-arm-cryptogams.S +++ b/lib/zinc/chacha20/chacha20-arm.S @@ -1,13 +1,475 @@ /* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ /* + * Copyright (C) 2018 Google, Inc. * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. * Copyright (C) 2006-2017 CRYPTOGAMS by . All Rights Reserved. - * - * This is based in part on Andy Polyakov's implementation from CRYPTOGAMS. */ #include +/* + * The following scalar routine was written by Eric Biggers. + * + * Design notes: + * + * 16 registers would be needed to hold the state matrix, but only 14 are + * available because 'sp' and 'pc' cannot be used. So we spill the elements + * (x8, x9) to the stack and swap them out with (x10, x11). This adds one + * 'ldrd' and one 'strd' instruction per round. + * + * All rotates are performed using the implicit rotate operand accepted by the + * 'add' and 'eor' instructions. This is faster than using explicit rotate + * instructions. To make this work, we allow the values in the second and last + * rows of the ChaCha state matrix (rows 'b' and 'd') to temporarily have the + * wrong rotation amount. The rotation amount is then fixed up just in time + * when the values are used. 'brot' is the number of bits the values in row 'b' + * need to be rotated right to arrive at the correct values, and 'drot' + * similarly for row 'd'. (brot, drot) start out as (0, 0) but we make it such + * that they end up as (25, 24) after every round. + */ + + // ChaCha state registers + X0 .req r0 + X1 .req r1 + X2 .req r2 + X3 .req r3 + X4 .req r4 + X5 .req r5 + X6 .req r6 + X7 .req r7 + X8_X10 .req r8 // shared by x8 and x10 + X9_X11 .req r9 // shared by x9 and x11 + X12 .req r10 + X13 .req r11 + X14 .req r12 + X15 .req r14 + +.Lexpand_32byte_k: + // "expand 32-byte k" + .word 0x61707865, 0x3320646e, 0x79622d32, 0x6b206574 + +#ifdef __thumb2__ +# define adrl adr +#endif + +.macro __rev out, in, t0, t1, t2 +.if __LINUX_ARM_ARCH__ >= 6 + rev \out, \in +.else + lsl \t0, \in, #24 + and \t1, \in, #0xff00 + and \t2, \in, #0xff0000 + orr \out, \t0, \in, lsr #24 + orr \out, \out, \t1, lsl #8 + orr \out, \out, \t2, lsr #8 +.endif +.endm + +.macro _le32_bswap x, t0, t1, t2 +#ifdef __ARMEB__ + __rev \x, \x, \t0, \t1, \t2 +#endif +.endm + +.macro _le32_bswap_4x a, b, c, d, t0, t1, t2 + _le32_bswap \a, \t0, \t1, \t2 + _le32_bswap \b, \t0, \t1, \t2 + _le32_bswap \c, \t0, \t1, \t2 + _le32_bswap \d, \t0, \t1, \t2 +.endm + +.macro __ldrd a, b, src, offset +#if __LINUX_ARM_ARCH__ >= 6 + ldrd \a, \b, [\src, #\offset] +#else + ldr \a, [\src, #\offset] + ldr \b, [\src, #\offset + 4] +#endif +.endm + +.macro __strd a, b, dst, offset +#if __LINUX_ARM_ARCH__ >= 6 + strd \a, \b, [\dst, #\offset] +#else + str \a, [\dst, #\offset] + str \b, [\dst, #\offset + 4] +#endif +.endm + +.macro _halfround a1, b1, c1, d1, a2, b2, c2, d2 + + // a += b; d ^= a; d = rol(d, 16); + add \a1, \a1, \b1, ror #brot + add \a2, \a2, \b2, ror #brot + eor \d1, \a1, \d1, ror #drot + eor \d2, \a2, \d2, ror #drot + // drot == 32 - 16 == 16 + + // c += d; b ^= c; b = rol(b, 12); + add \c1, \c1, \d1, ror #16 + add \c2, \c2, \d2, ror #16 + eor \b1, \c1, \b1, ror #brot + eor \b2, \c2, \b2, ror #brot + // brot == 32 - 12 == 20 + + // a += b; d ^= a; d = rol(d, 8); + add \a1, \a1, \b1, ror #20 + add \a2, \a2, \b2, ror #20 + eor \d1, \a1, \d1, ror #16 + eor \d2, \a2, \d2, ror #16 + // drot == 32 - 8 == 24 + + // c += d; b ^= c; b = rol(b, 7); + add \c1, \c1, \d1, ror #24 + add \c2, \c2, \d2, ror #24 + eor \b1, \c1, \b1, ror #20 + eor \b2, \c2, \b2, ror #20 + // brot == 32 - 7 == 25 +.endm + +.macro _doubleround + + // column round + + // quarterrounds: (x0, x4, x8, x12) and (x1, x5, x9, x13) + _halfround X0, X4, X8_X10, X12, X1, X5, X9_X11, X13 + + // save (x8, x9); restore (x10, x11) + __strd X8_X10, X9_X11, sp, 0 + __ldrd X8_X10, X9_X11, sp, 8 + + // quarterrounds: (x2, x6, x10, x14) and (x3, x7, x11, x15) + _halfround X2, X6, X8_X10, X14, X3, X7, X9_X11, X15 + + .set brot, 25 + .set drot, 24 + + // diagonal round + + // quarterrounds: (x0, x5, x10, x15) and (x1, x6, x11, x12) + _halfround X0, X5, X8_X10, X15, X1, X6, X9_X11, X12 + + // save (x10, x11); restore (x8, x9) + __strd X8_X10, X9_X11, sp, 8 + __ldrd X8_X10, X9_X11, sp, 0 + + // quarterrounds: (x2, x7, x8, x13) and (x3, x4, x9, x14) + _halfround X2, X7, X8_X10, X13, X3, X4, X9_X11, X14 +.endm + +.macro _chacha_permute nrounds + .set brot, 0 + .set drot, 0 + .rept \nrounds / 2 + _doubleround + .endr +.endm + +.macro _chacha nrounds + +.Lnext_block\@: + // Stack: unused0-unused1 x10-x11 x0-x15 OUT IN LEN + // Registers contain x0-x9,x12-x15. + + // Do the core ChaCha permutation to update x0-x15. + _chacha_permute \nrounds + + add sp, #8 + // Stack: x10-x11 orig_x0-orig_x15 OUT IN LEN + // Registers contain x0-x9,x12-x15. + // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'. + + // Free up some registers (r8-r12,r14) by pushing (x8-x9,x12-x15). + push {X8_X10, X9_X11, X12, X13, X14, X15} + + // Load (OUT, IN, LEN). + ldr r14, [sp, #96] + ldr r12, [sp, #100] + ldr r11, [sp, #104] + + orr r10, r14, r12 + + // Use slow path if fewer than 64 bytes remain. + cmp r11, #64 + blt .Lxor_slowpath\@ + + // Use slow path if IN and/or OUT isn't 4-byte aligned. Needed even on + // ARMv6+, since ldmia and stmia (used below) still require alignment. + tst r10, #3 + bne .Lxor_slowpath\@ + + // Fast path: XOR 64 bytes of aligned data. + + // Stack: x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN + // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is OUT. + // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'. + + // x0-x3 + __ldrd r8, r9, sp, 32 + __ldrd r10, r11, sp, 40 + add X0, X0, r8 + add X1, X1, r9 + add X2, X2, r10 + add X3, X3, r11 + _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10 + ldmia r12!, {r8-r11} + eor X0, X0, r8 + eor X1, X1, r9 + eor X2, X2, r10 + eor X3, X3, r11 + stmia r14!, {X0-X3} + + // x4-x7 + __ldrd r8, r9, sp, 48 + __ldrd r10, r11, sp, 56 + add X4, r8, X4, ror #brot + add X5, r9, X5, ror #brot + ldmia r12!, {X0-X3} + add X6, r10, X6, ror #brot + add X7, r11, X7, ror #brot + _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10 + eor X4, X4, X0 + eor X5, X5, X1 + eor X6, X6, X2 + eor X7, X7, X3 + stmia r14!, {X4-X7} + + // x8-x15 + pop {r0-r7} // (x8-x9,x12-x15,x10-x11) + __ldrd r8, r9, sp, 32 + __ldrd r10, r11, sp, 40 + add r0, r0, r8 // x8 + add r1, r1, r9 // x9 + add r6, r6, r10 // x10 + add r7, r7, r11 // x11 + _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10 + ldmia r12!, {r8-r11} + eor r0, r0, r8 // x8 + eor r1, r1, r9 // x9 + eor r6, r6, r10 // x10 + eor r7, r7, r11 // x11 + stmia r14!, {r0,r1,r6,r7} + ldmia r12!, {r0,r1,r6,r7} + __ldrd r8, r9, sp, 48 + __ldrd r10, r11, sp, 56 + add r2, r8, r2, ror #drot // x12 + add r3, r9, r3, ror #drot // x13 + add r4, r10, r4, ror #drot // x14 + add r5, r11, r5, ror #drot // x15 + _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11 + ldr r9, [sp, #72] // load LEN + eor r2, r2, r0 // x12 + eor r3, r3, r1 // x13 + eor r4, r4, r6 // x14 + eor r5, r5, r7 // x15 + subs r9, #64 // decrement and check LEN + stmia r14!, {r2-r5} + + beq .Ldone\@ + +.Lprepare_for_next_block\@: + + // Stack: x0-x15 OUT IN LEN + + // Increment block counter (x12) + add r8, #1 + + // Store updated (OUT, IN, LEN) + str r14, [sp, #64] + str r12, [sp, #68] + str r9, [sp, #72] + + mov r14, sp + + // Store updated block counter (x12) + str r8, [sp, #48] + + sub sp, #16 + + // Reload state and do next block + ldmia r14!, {r0-r11} // load x0-x11 + __strd r10, r11, sp, 8 // store x10-x11 before state + ldmia r14, {r10-r12,r14} // load x12-x15 + b .Lnext_block\@ + +.Lxor_slowpath\@: + // Slow path: < 64 bytes remaining, or unaligned input or output buffer. + // We handle it by storing the 64 bytes of keystream to the stack, then + // XOR-ing the needed portion with the data. + + // Allocate keystream buffer + sub sp, #64 + mov r14, sp + + // Stack: ks0-ks15 x8-x9 x12-x15 x10-x11 orig_x0-orig_x15 OUT IN LEN + // Registers: r0-r7 are x0-x7; r8-r11 are free; r12 is IN; r14 is &ks0. + // x4-x7 are rotated by 'brot'; x12-x15 are rotated by 'drot'. + + // Save keystream for x0-x3 + __ldrd r8, r9, sp, 96 + __ldrd r10, r11, sp, 104 + add X0, X0, r8 + add X1, X1, r9 + add X2, X2, r10 + add X3, X3, r11 + _le32_bswap_4x X0, X1, X2, X3, r8, r9, r10 + stmia r14!, {X0-X3} + + // Save keystream for x4-x7 + __ldrd r8, r9, sp, 112 + __ldrd r10, r11, sp, 120 + add X4, r8, X4, ror #brot + add X5, r9, X5, ror #brot + add X6, r10, X6, ror #brot + add X7, r11, X7, ror #brot + _le32_bswap_4x X4, X5, X6, X7, r8, r9, r10 + add r8, sp, #64 + stmia r14!, {X4-X7} + + // Save keystream for x8-x15 + ldm r8, {r0-r7} // (x8-x9,x12-x15,x10-x11) + __ldrd r8, r9, sp, 128 + __ldrd r10, r11, sp, 136 + add r0, r0, r8 // x8 + add r1, r1, r9 // x9 + add r6, r6, r10 // x10 + add r7, r7, r11 // x11 + _le32_bswap_4x r0, r1, r6, r7, r8, r9, r10 + stmia r14!, {r0,r1,r6,r7} + __ldrd r8, r9, sp, 144 + __ldrd r10, r11, sp, 152 + add r2, r8, r2, ror #drot // x12 + add r3, r9, r3, ror #drot // x13 + add r4, r10, r4, ror #drot // x14 + add r5, r11, r5, ror #drot // x15 + _le32_bswap_4x r2, r3, r4, r5, r9, r10, r11 + stmia r14, {r2-r5} + + // Stack: ks0-ks15 unused0-unused7 x0-x15 OUT IN LEN + // Registers: r8 is block counter, r12 is IN. + + ldr r9, [sp, #168] // LEN + ldr r14, [sp, #160] // OUT + cmp r9, #64 + mov r0, sp + movle r1, r9 + movgt r1, #64 + // r1 is number of bytes to XOR, in range [1, 64] + +.if __LINUX_ARM_ARCH__ < 6 + orr r2, r12, r14 + tst r2, #3 // IN or OUT misaligned? + bne .Lxor_next_byte\@ +.endif + + // XOR a word at a time +.rept 16 + subs r1, #4 + blt .Lxor_words_done\@ + ldr r2, [r12], #4 + ldr r3, [r0], #4 + eor r2, r2, r3 + str r2, [r14], #4 +.endr + b .Lxor_slowpath_done\@ +.Lxor_words_done\@: + ands r1, r1, #3 + beq .Lxor_slowpath_done\@ + + // XOR a byte at a time +.Lxor_next_byte\@: + ldrb r2, [r12], #1 + ldrb r3, [r0], #1 + eor r2, r2, r3 + strb r2, [r14], #1 + subs r1, #1 + bne .Lxor_next_byte\@ + +.Lxor_slowpath_done\@: + subs r9, #64 + add sp, #96 + bgt .Lprepare_for_next_block\@ + +.Ldone\@: +.endm // _chacha + +/* + * void chacha20_arm(u8 *out, const u8 *in, size_t len, const u32 key[8], + * const u32 iv[4]); + */ +ENTRY(chacha20_arm) + cmp r2, #0 // len == 0? + bxeq lr + + push {r0-r2,r4-r11,lr} + + // Push state x0-x15 onto stack. + // Also store an extra copy of x10-x11 just before the state. + + ldr r4, [sp, #48] // iv + mov r0, sp + sub sp, #80 + + // iv: x12-x15 + ldm r4, {X12,X13,X14,X15} + stmdb r0!, {X12,X13,X14,X15} + + // key: x4-x11 + __ldrd X8_X10, X9_X11, r3, 24 + __strd X8_X10, X9_X11, sp, 8 + stmdb r0!, {X8_X10, X9_X11} + ldm r3, {X4-X9_X11} + stmdb r0!, {X4-X9_X11} + + // constants: x0-x3 + adrl X3, .Lexpand_32byte_k + ldm X3, {X0-X3} + __strd X0, X1, sp, 16 + __strd X2, X3, sp, 24 + + _chacha 20 + + add sp, #76 + pop {r4-r11, pc} +ENDPROC(chacha20_arm) + +/* + * void hchacha20_arm(const u32 state[16], u32 out[8]); + */ +ENTRY(hchacha20_arm) + push {r1,r4-r11,lr} + + mov r14, r0 + ldmia r14!, {r0-r11} // load x0-x11 + push {r10-r11} // store x10-x11 to stack + ldm r14, {r10-r12,r14} // load x12-x15 + sub sp, #8 + + _chacha_permute 20 + + // Skip over (unused0-unused1, x10-x11) + add sp, #16 + + // Fix up rotations of x12-x15 + ror X12, X12, #drot + ror X13, X13, #drot + pop {r4} // load 'out' + ror X14, X14, #drot + ror X15, X15, #drot + + // Store (x0-x3,x12-x15) to 'out' + stm r4, {X0,X1,X2,X3,X12,X13,X14,X15} + + pop {r4-r11,pc} +ENDPROC(hchacha20_arm) + +#ifdef CONFIG_KERNEL_MODE_NEON +/* + * This following NEON routine was ported from Andy Polyakov's implementation + * from CRYPTOGAMS. It begins with parts of the CRYPTOGAMS scalar routine, + * since certain NEON code paths actually branch to it. + */ + .text #if defined(__thumb2__) || defined(__clang__) .syntax unified @@ -22,39 +484,6 @@ #define ldrhsb ldrbhs #endif -.align 5 -.Lsigma: -.long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral -.Lone: -.long 1,0,0,0 -.word -1 - -.align 5 -ENTRY(chacha20_arm) - ldr r12,[sp,#0] @ pull pointer to counter and nonce - stmdb sp!,{r0-r2,r4-r11,lr} - cmp r2,#0 @ len==0? -#ifdef __thumb2__ - itt eq -#endif - addeq sp,sp,#4*3 - beq .Lno_data_arm - ldmia r12,{r4-r7} @ load counter and nonce - sub sp,sp,#4*(16) @ off-load area -#if __LINUX_ARM_ARCH__ < 7 && !defined(__thumb2__) - sub r14,pc,#100 @ .Lsigma -#else - adr r14,.Lsigma @ .Lsigma -#endif - stmdb sp!,{r4-r7} @ copy counter and nonce - ldmia r3,{r4-r11} @ load key - ldmia r14,{r0-r3} @ load sigma - stmdb sp!,{r4-r11} @ copy key - stmdb sp!,{r0-r3} @ copy sigma - str r10,[sp,#4*(16+10)] @ off-load "rx" - str r11,[sp,#4*(16+11)] @ off-load "rx" - b .Loop_outer_enter - .align 4 .Loop_outer: ldmia sp,{r0-r9} @ load key material @@ -748,11 +1177,8 @@ ENTRY(chacha20_arm) .Ldone: add sp,sp,#4*(32+3) -.Lno_data_arm: ldmia sp!,{r4-r11,pc} -ENDPROC(chacha20_arm) -#ifdef CONFIG_KERNEL_MODE_NEON .align 5 .Lsigma2: .long 0x61707865,0x3320646e,0x79622d32,0x6b206574 @ endian-neutral diff --git a/lib/zinc/chacha20/chacha20-arm64-cryptogams.S b/lib/zinc/chacha20/chacha20-arm64.S similarity index 100% rename from lib/zinc/chacha20/chacha20-arm64-cryptogams.S rename to lib/zinc/chacha20/chacha20-arm64.S diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c index 4354b874a6a5..fc4f74fca653 100644 --- a/lib/zinc/chacha20/chacha20.c +++ b/lib/zinc/chacha20/chacha20.c @@ -16,6 +16,8 @@ #if defined(CONFIG_ZINC_ARCH_X86_64) #include "chacha20-x86_64-glue.h" +#elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64) +#include "chacha20-arm-glue.h" #else void __init chacha20_fpu_init(void) { From patchwork Tue Sep 25 14:56:07 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147493 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830446lji; Tue, 25 Sep 2018 07:57:13 -0700 (PDT) X-Google-Smtp-Source: ACcGV61EPLL9kqe26L7XmFBb2CMjyUOTKzM7VDnOtncYhPBzIpFVC5hktxBe3vOsI8ZEpbSfmk7C X-Received: by 2002:a62:cc0e:: with SMTP id a14-v6mr1535251pfg.131.1537887433010; Tue, 25 Sep 2018 07:57:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887433; cv=none; d=google.com; s=arc-20160816; b=Ya0LhCyoG+vaQ1uw/vs7CZBKQyvraoeSIcaWLiWlrOIVDymIv3wRzlEvUJKAzV2LhX UbSUGnbVUL2n2XG1dl3NKDz6GZsZTyVrgKqvOU+bA1p13NUH0rxdVRQ/8s8SOoyDMSjt Y6VxHGoBUf25dLzazpfK916hrcbG9X+q+6bM6XohAx0bBlGrKTO6ARttJaHwWRaC2KdZ 3tH106IB4kGuWB/xSSxeTuzGkUM0QAew/CJcQmk5IATqImTwewEDcmotof/RzMvgmUAY FxFzAtvYxst5S0qCY7pCw2a7wSCgJArelYcH/Z63q/zJ6HmBrccnL882z3VtrXYsfosp A1dg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=WVQp6xGQb5YgaeeuGxblUFcpIXgobvpxi+1KTg5wr70=; b=cAmXHqvkP8crbiLMUkTbphkRSzXomgLBw1AKd5I/U/E0Tc6/78Yiot43Qq6oOjmfTI QuQ7FkaRdhtMsSUQ0DqJwNjfS9k350KksZBTNjL5B33af+CQg3uq6kLN6YdIcwgwAiW+ uuHbpkfOO9xjUx5GA9SZoimPYBA6dp/VRREWOQHsCVfgg/HmLqUAgmIbk2KYKpLWhsrs KIF86KKNi7PNHxllTT3XeWlIOjECnu/9X/3MXNUjxugmqpV1+CeZy21Lqn6i9KpISD0I RaUuZOsBGDtEyB2fj7JSD4+CjdPfODMbG5y5lqMHpNbcC5MtoZVItTgnUj1fKcsnzQ+f olJg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=VZQf9YoQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d9-v6si1129157pgm.109.2018.09.25.07.57.12; Tue, 25 Sep 2018 07:57:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=VZQf9YoQ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729744AbeIYVFC (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:02 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729622AbeIYVE4 (ORCPT ); Tue, 25 Sep 2018 17:04:56 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 4f461687; Tue, 25 Sep 2018 14:38:25 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-type:content-transfer-encoding; s=mail; bh=nMspqwfANYff FGnVL2YBgVtndLU=; b=VZQf9YoQ9koFHyGxNfG7VmXUzFHLfsqmspRHOHwAFibz 1fHugC1dkGVr/7PnVmJoMwW7PLtfnHObCwrNvX1V0FA2B9Y5hgUQW5b/7y6/ePEi OJRmmOzGH2PQxG1/i8hP17QxViNthR505vdjCqowUUBXIZKCmGcIawYkSZbj7Pa6 sWCKKfmZzU08dQNqAmD0s2S2l/eaxkAgw6KRDmbtrVjnMiPPTSQxCZJmJNxJKe1+ jshwjrUjyYvWoDzDYyaTBtwo31sSpzkNDxZXRYSQ3YA3gJuqZAq6Jdtd0o6vxtlK z17WaJyyAX2SliOUArOk59jIb02Nj6Jh94uhKpvM8Q== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 8e96da90 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:25 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , =?utf-8?q?Ren=C3=A9_van_Dorst?= , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Ralf Baechle , Paul Burton , James Hogan , linux-mips@linux-mips.org Subject: [PATCH net-next v6 08/23] zinc: ChaCha20 MIPS32r2 implementation Date: Tue, 25 Sep 2018 16:56:07 +0200 Message-Id: <20180925145622.29959-9-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This MIPS32r2 implementation comes from René van Dorst and me and results in a nice speedup on the usual OpenWRT targets. Signed-off-by: Jason A. Donenfeld Signed-off-by: René van Dorst Co-developed-by: René van Dorst Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Ralf Baechle Cc: Paul Burton Cc: James Hogan Cc: linux-mips@linux-mips.org --- lib/zinc/Makefile | 2 + lib/zinc/chacha20/chacha20-mips-glue.h | 27 ++ lib/zinc/chacha20/chacha20-mips.S | 424 +++++++++++++++++++++++++ lib/zinc/chacha20/chacha20.c | 2 + 4 files changed, 455 insertions(+) create mode 100644 lib/zinc/chacha20/chacha20-mips-glue.h create mode 100644 lib/zinc/chacha20/chacha20-mips.S -- 2.19.0 diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index e47f64e12bbd..60d568cf5206 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -6,4 +6,6 @@ zinc_chacha20-y := chacha20/chacha20.o zinc_chacha20-$(CONFIG_ZINC_ARCH_X86_64) += chacha20/chacha20-x86_64.o zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM) += chacha20/chacha20-arm.o zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o +zinc_chacha20-$(CONFIG_ZINC_ARCH_MIPS) += chacha20/chacha20-mips.o +AFLAGS_chacha20-mips.o += -O2 # This is required to fill the branch delay slots obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o diff --git a/lib/zinc/chacha20/chacha20-mips-glue.h b/lib/zinc/chacha20/chacha20-mips-glue.h new file mode 100644 index 000000000000..6e70dd6817d2 --- /dev/null +++ b/lib/zinc/chacha20/chacha20-mips-glue.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +asmlinkage void chacha20_mips(u32 state[16], u8 *out, const u8 *in, + const size_t len); +static void __init chacha20_fpu_init(void) +{ +} + +static inline bool chacha20_arch(struct chacha20_ctx *state, u8 *dst, + const u8 *src, const size_t len, + simd_context_t *simd_context) +{ + chacha20_mips((u32 *)state, dst, src, len); + return true; +} + + +static inline bool hchacha20_arch(u32 derived_key[CHACHA20_KEY_WORDS], + const u8 nonce[HCHACHA20_NONCE_SIZE], + const u8 key[HCHACHA20_KEY_SIZE], + simd_context_t *simd_context) +{ + return false; +} diff --git a/lib/zinc/chacha20/chacha20-mips.S b/lib/zinc/chacha20/chacha20-mips.S new file mode 100644 index 000000000000..031ee5e794df --- /dev/null +++ b/lib/zinc/chacha20/chacha20-mips.S @@ -0,0 +1,424 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2016-2018 René van Dorst . All Rights Reserved. + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#define MASK_U32 0x3c +#define CHACHA20_BLOCK_SIZE 64 +#define STACK_SIZE 32 + +#define X0 $t0 +#define X1 $t1 +#define X2 $t2 +#define X3 $t3 +#define X4 $t4 +#define X5 $t5 +#define X6 $t6 +#define X7 $t7 +#define X8 $t8 +#define X9 $t9 +#define X10 $v1 +#define X11 $s6 +#define X12 $s5 +#define X13 $s4 +#define X14 $s3 +#define X15 $s2 +/* Use regs which are overwritten on exit for Tx so we don't leak clear data. */ +#define T0 $s1 +#define T1 $s0 +#define T(n) T ## n +#define X(n) X ## n + +/* Input arguments */ +#define STATE $a0 +#define OUT $a1 +#define IN $a2 +#define BYTES $a3 + +/* Output argument */ +/* NONCE[0] is kept in a register and not in memory. + * We don't want to touch original value in memory. + * Must be incremented every loop iteration. + */ +#define NONCE_0 $v0 + +/* SAVED_X and SAVED_CA are set in the jump table. + * Use regs which are overwritten on exit else we don't leak clear data. + * They are used to handling the last bytes which are not multiple of 4. + */ +#define SAVED_X X15 +#define SAVED_CA $s7 + +#define IS_UNALIGNED $s7 + +#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ +#define MSB 0 +#define LSB 3 +#define ROTx rotl +#define ROTR(n) rotr n, 24 +#define CPU_TO_LE32(n) \ + wsbh n; \ + rotr n, 16; +#else +#define MSB 3 +#define LSB 0 +#define ROTx rotr +#define CPU_TO_LE32(n) +#define ROTR(n) +#endif + +#define FOR_EACH_WORD(x) \ + x( 0); \ + x( 1); \ + x( 2); \ + x( 3); \ + x( 4); \ + x( 5); \ + x( 6); \ + x( 7); \ + x( 8); \ + x( 9); \ + x(10); \ + x(11); \ + x(12); \ + x(13); \ + x(14); \ + x(15); + +#define FOR_EACH_WORD_REV(x) \ + x(15); \ + x(14); \ + x(13); \ + x(12); \ + x(11); \ + x(10); \ + x( 9); \ + x( 8); \ + x( 7); \ + x( 6); \ + x( 5); \ + x( 4); \ + x( 3); \ + x( 2); \ + x( 1); \ + x( 0); + +#define PLUS_ONE_0 1 +#define PLUS_ONE_1 2 +#define PLUS_ONE_2 3 +#define PLUS_ONE_3 4 +#define PLUS_ONE_4 5 +#define PLUS_ONE_5 6 +#define PLUS_ONE_6 7 +#define PLUS_ONE_7 8 +#define PLUS_ONE_8 9 +#define PLUS_ONE_9 10 +#define PLUS_ONE_10 11 +#define PLUS_ONE_11 12 +#define PLUS_ONE_12 13 +#define PLUS_ONE_13 14 +#define PLUS_ONE_14 15 +#define PLUS_ONE_15 16 +#define PLUS_ONE(x) PLUS_ONE_ ## x +#define _CONCAT3(a,b,c) a ## b ## c +#define CONCAT3(a,b,c) _CONCAT3(a,b,c) + +#define STORE_UNALIGNED(x) \ +CONCAT3(.Lchacha20_mips_xor_unaligned_, PLUS_ONE(x), _b: ;) \ + .if (x != 12); \ + lw T0, (x*4)(STATE); \ + .endif; \ + lwl T1, (x*4)+MSB ## (IN); \ + lwr T1, (x*4)+LSB ## (IN); \ + .if (x == 12); \ + addu X ## x, NONCE_0; \ + .else; \ + addu X ## x, T0; \ + .endif; \ + CPU_TO_LE32(X ## x); \ + xor X ## x, T1; \ + swl X ## x, (x*4)+MSB ## (OUT); \ + swr X ## x, (x*4)+LSB ## (OUT); + +#define STORE_ALIGNED(x) \ +CONCAT3(.Lchacha20_mips_xor_aligned_, PLUS_ONE(x), _b: ;) \ + .if (x != 12); \ + lw T0, (x*4)(STATE); \ + .endif; \ + lw T1, (x*4) ## (IN); \ + .if (x == 12); \ + addu X ## x, NONCE_0; \ + .else; \ + addu X ## x, T0; \ + .endif; \ + CPU_TO_LE32(X ## x); \ + xor X ## x, T1; \ + sw X ## x, (x*4) ## (OUT); + +/* Jump table macro. + * Used for setup and handling the last bytes, which are not multiple of 4. + * X15 is free to store Xn + * Every jumptable entry must be equal in size. + */ +#define JMPTBL_ALIGNED(x) \ +.Lchacha20_mips_jmptbl_aligned_ ## x: ; \ + .set noreorder; \ + b .Lchacha20_mips_xor_aligned_ ## x ## _b; \ + .if (x == 12); \ + addu SAVED_X, X ## x, NONCE_0; \ + .else; \ + addu SAVED_X, X ## x, SAVED_CA; \ + .endif; \ + .set reorder + +#define JMPTBL_UNALIGNED(x) \ +.Lchacha20_mips_jmptbl_unaligned_ ## x: ; \ + .set noreorder; \ + b .Lchacha20_mips_xor_unaligned_ ## x ## _b; \ + .if (x == 12); \ + addu SAVED_X, X ## x, NONCE_0; \ + .else; \ + addu SAVED_X, X ## x, SAVED_CA; \ + .endif; \ + .set reorder + +#define AXR(A, B, C, D, K, L, M, N, V, W, Y, Z, S) \ + addu X(A), X(K); \ + addu X(B), X(L); \ + addu X(C), X(M); \ + addu X(D), X(N); \ + xor X(V), X(A); \ + xor X(W), X(B); \ + xor X(Y), X(C); \ + xor X(Z), X(D); \ + rotl X(V), S; \ + rotl X(W), S; \ + rotl X(Y), S; \ + rotl X(Z), S; + +.text +.set reorder +.set noat +.globl chacha20_mips +.ent chacha20_mips +chacha20_mips: + .frame $sp, STACK_SIZE, $ra + + addiu $sp, -STACK_SIZE + + /* Return bytes = 0. */ + beqz BYTES, .Lchacha20_mips_end + + lw NONCE_0, 48(STATE) + + /* Save s0-s7 */ + sw $s0, 0($sp) + sw $s1, 4($sp) + sw $s2, 8($sp) + sw $s3, 12($sp) + sw $s4, 16($sp) + sw $s5, 20($sp) + sw $s6, 24($sp) + sw $s7, 28($sp) + + /* Test IN or OUT is unaligned. + * IS_UNALIGNED = ( IN | OUT ) & 0x00000003 + */ + or IS_UNALIGNED, IN, OUT + andi IS_UNALIGNED, 0x3 + + /* Set number of rounds */ + li $at, 20 + + b .Lchacha20_rounds_start + +.align 4 +.Loop_chacha20_rounds: + addiu IN, CHACHA20_BLOCK_SIZE + addiu OUT, CHACHA20_BLOCK_SIZE + addiu NONCE_0, 1 + +.Lchacha20_rounds_start: + lw X0, 0(STATE) + lw X1, 4(STATE) + lw X2, 8(STATE) + lw X3, 12(STATE) + + lw X4, 16(STATE) + lw X5, 20(STATE) + lw X6, 24(STATE) + lw X7, 28(STATE) + lw X8, 32(STATE) + lw X9, 36(STATE) + lw X10, 40(STATE) + lw X11, 44(STATE) + + move X12, NONCE_0 + lw X13, 52(STATE) + lw X14, 56(STATE) + lw X15, 60(STATE) + +.Loop_chacha20_xor_rounds: + addiu $at, -2 + AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 16); + AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 12); + AXR( 0, 1, 2, 3, 4, 5, 6, 7, 12,13,14,15, 8); + AXR( 8, 9,10,11, 12,13,14,15, 4, 5, 6, 7, 7); + AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 16); + AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 12); + AXR( 0, 1, 2, 3, 5, 6, 7, 4, 15,12,13,14, 8); + AXR(10,11, 8, 9, 15,12,13,14, 5, 6, 7, 4, 7); + bnez $at, .Loop_chacha20_xor_rounds + + addiu BYTES, -(CHACHA20_BLOCK_SIZE) + + /* Is data src/dst unaligned? Jump */ + bnez IS_UNALIGNED, .Loop_chacha20_unaligned + + /* Set number rounds here to fill delayslot. */ + li $at, 20 + + /* BYTES < 0, it has no full block. */ + bltz BYTES, .Lchacha20_mips_no_full_block_aligned + + FOR_EACH_WORD_REV(STORE_ALIGNED) + + /* BYTES > 0? Loop again. */ + bgtz BYTES, .Loop_chacha20_rounds + + /* Place this here to fill delay slot */ + addiu NONCE_0, 1 + + /* BYTES < 0? Handle last bytes */ + bltz BYTES, .Lchacha20_mips_xor_bytes + +.Lchacha20_mips_xor_done: + /* Restore used registers */ + lw $s0, 0($sp) + lw $s1, 4($sp) + lw $s2, 8($sp) + lw $s3, 12($sp) + lw $s4, 16($sp) + lw $s5, 20($sp) + lw $s6, 24($sp) + lw $s7, 28($sp) + + /* Write NONCE_0 back to right location in state */ + sw NONCE_0, 48(STATE) + +.Lchacha20_mips_end: + addiu $sp, STACK_SIZE + jr $ra + +.Lchacha20_mips_no_full_block_aligned: + /* Restore the offset on BYTES */ + addiu BYTES, CHACHA20_BLOCK_SIZE + + /* Get number of full WORDS */ + andi $at, BYTES, MASK_U32 + + /* Load upper half of jump table addr */ + lui T0, %hi(.Lchacha20_mips_jmptbl_aligned_0) + + /* Calculate lower half jump table offset */ + ins T0, $at, 1, 6 + + /* Add offset to STATE */ + addu T1, STATE, $at + + /* Add lower half jump table addr */ + addiu T0, %lo(.Lchacha20_mips_jmptbl_aligned_0) + + /* Read value from STATE */ + lw SAVED_CA, 0(T1) + + /* Store remaining bytecounter as negative value */ + subu BYTES, $at, BYTES + + jr T0 + + /* Jump table */ + FOR_EACH_WORD(JMPTBL_ALIGNED) + + +.Loop_chacha20_unaligned: + /* Set number rounds here to fill delayslot. */ + li $at, 20 + + /* BYTES > 0, it has no full block. */ + bltz BYTES, .Lchacha20_mips_no_full_block_unaligned + + FOR_EACH_WORD_REV(STORE_UNALIGNED) + + /* BYTES > 0? Loop again. */ + bgtz BYTES, .Loop_chacha20_rounds + + /* Write NONCE_0 back to right location in state */ + sw NONCE_0, 48(STATE) + + .set noreorder + /* Fall through to byte handling */ + bgez BYTES, .Lchacha20_mips_xor_done +.Lchacha20_mips_xor_unaligned_0_b: +.Lchacha20_mips_xor_aligned_0_b: + /* Place this here to fill delay slot */ + addiu NONCE_0, 1 + .set reorder + +.Lchacha20_mips_xor_bytes: + addu IN, $at + addu OUT, $at + /* First byte */ + lbu T1, 0(IN) + addiu $at, BYTES, 1 + CPU_TO_LE32(SAVED_X) + ROTR(SAVED_X) + xor T1, SAVED_X + sb T1, 0(OUT) + beqz $at, .Lchacha20_mips_xor_done + /* Second byte */ + lbu T1, 1(IN) + addiu $at, BYTES, 2 + ROTx SAVED_X, 8 + xor T1, SAVED_X + sb T1, 1(OUT) + beqz $at, .Lchacha20_mips_xor_done + /* Third byte */ + lbu T1, 2(IN) + ROTx SAVED_X, 8 + xor T1, SAVED_X + sb T1, 2(OUT) + b .Lchacha20_mips_xor_done + +.Lchacha20_mips_no_full_block_unaligned: + /* Restore the offset on BYTES */ + addiu BYTES, CHACHA20_BLOCK_SIZE + + /* Get number of full WORDS */ + andi $at, BYTES, MASK_U32 + + /* Load upper half of jump table addr */ + lui T0, %hi(.Lchacha20_mips_jmptbl_unaligned_0) + + /* Calculate lower half jump table offset */ + ins T0, $at, 1, 6 + + /* Add offset to STATE */ + addu T1, STATE, $at + + /* Add lower half jump table addr */ + addiu T0, %lo(.Lchacha20_mips_jmptbl_unaligned_0) + + /* Read value from STATE */ + lw SAVED_CA, 0(T1) + + /* Store remaining bytecounter as negative value */ + subu BYTES, $at, BYTES + + jr T0 + + /* Jump table */ + FOR_EACH_WORD(JMPTBL_UNALIGNED) +.end chacha20_mips +.set at diff --git a/lib/zinc/chacha20/chacha20.c b/lib/zinc/chacha20/chacha20.c index fc4f74fca653..34cb35528c3d 100644 --- a/lib/zinc/chacha20/chacha20.c +++ b/lib/zinc/chacha20/chacha20.c @@ -18,6 +18,8 @@ #include "chacha20-x86_64-glue.h" #elif defined(CONFIG_ZINC_ARCH_ARM) || defined(CONFIG_ZINC_ARCH_ARM64) #include "chacha20-arm-glue.h" +#elif defined(CONFIG_ZINC_ARCH_MIPS) +#include "chacha20-mips-glue.h" #else void __init chacha20_fpu_init(void) { From patchwork Tue Sep 25 14:56:08 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147494 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830436lji; Tue, 25 Sep 2018 07:57:12 -0700 (PDT) X-Google-Smtp-Source: ACcGV61B55LzExV2RxDsYvPpgrvcmi7Eex2lQ2LAjkID6LCIHcYR1MgusYtdnXib/BVSi/rljziP X-Received: by 2002:a62:f909:: with SMTP id o9-v6mr1529585pfh.141.1537887432351; Tue, 25 Sep 2018 07:57:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887432; cv=none; d=google.com; s=arc-20160816; b=LI13h30ow+VUSp61kb3M2tvVwE6ma7GIGgAKv+emYDW4wGNBCWpqbwYV5s04bwiXhG A3vqLU34iIqskGpDng2ZdYylyi1g9XPjrpkC+qd+uioGVs0YzAmn6nlEHeTwwnu9N3e/ tlEDUy99ebxaBKvUOe5OsUGbTWZLNTOMkNJoN6YgHQNRUII+F/El4O/KhB3PJ0ckzMET Auhy2BVp63e80tug92XEGcB/zUCKKBAVpdah/88+MyvIXnVZDoy7lZ/iuAvAJ2QF7uVk ZvXzgkaQDUmBi0wDkIOVQHDLTI3QysajlFFI7Q2tFRe/0Lp4KrRrhu5e8cskbknfQe9s LPWg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=UujhdmVSHWX5zmoBAMGlUZOovL5vy3zXvosP6o7/lj4=; b=srFyVrPTKr1loE7pg76KI2HZlq440LBC2SlmRNlhdWpg8q/nkUsxbgKef+kevNLDIi IZ7BqXsmpFtTG31UpXRzqlUJu0hYDH6MGY1nIt3bmthYJebFK3xKvRwxEwzsXMMh4ctA dD77iCGCvDDVC9hSw51oav1A3pdt5yaHRwE2nhv/qWLF+GeWTpUAMJ4KzYLtIeuTWABZ iNWv7DXVple3eukzYaKoqtqVi9l0MGXBaIhjgelbJeUbZ5pZ13b+/cHrR+pYVbfzAwAl zctgVKkZa7OIVVa094ZsIpor2bVu8whbyASdD+JoGl4oPj8HVT4Lwhh8G+AfD51iRbdw Brfw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b="kF2/TKgK"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d9-v6si1129157pgm.109.2018.09.25.07.57.11; Tue, 25 Sep 2018 07:57:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b="kF2/TKgK"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729713AbeIYVFC (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:02 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:36361 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729616AbeIYVFA (ORCPT ); Tue, 25 Sep 2018 17:05:00 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id d52191ea; Tue, 25 Sep 2018 14:38:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-type:content-transfer-encoding; s=mail; bh=zNb15dQfHBfv zdiHdsqvoa4nyGY=; b=kF2/TKgKfRSp4TlbFin2fR9QSqBrX4f8H6s94ROxqoXv xZ9wV60yfylp48Wabw5x1JgHicJyb2mUvMarAOdm5nbntaaATl7Z3qkIIWOJFNIm dNQNFMFrJFrMW3KOBWaT5gTR5vsxDaiwwyuukccDjBlLR7w7v0UgdTrQgpOn4+4c Zfe3mlqSewdMDjBaRBzQTx/P+boXYmjfdc8KNuzCfFpfcsVgH+WQNnRqsYEr3Mn1 4L+Sii4QL2lYTFPRKg+MEadqO9hdv/wsuCf6Jwwouna4ub+V0jfWVNW+bBCV8vyF tuavcTgFbSJWHMX6L8hcAM1P4BtTX3IBABjTF/8hDg== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 7275e9ae (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:27 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson Subject: [PATCH net-next v6 09/23] zinc: Poly1305 generic C implementations and selftest Date: Tue, 25 Sep 2018 16:56:08 +0200 Message-Id: <20180925145622.29959-10-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These two C implementations -- a 32x32 one and a 64x64 one, depending on the platform -- come from Andrew Moon's public domain poly1305-donna portable code, modified for usage in the kernel and for usage with accelerated primitives. Information: https://cr.yp.to/mac.html Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson --- include/zinc/poly1305.h | 31 + lib/zinc/Kconfig | 3 + lib/zinc/Makefile | 3 + lib/zinc/poly1305/poly1305-donna32.h | 205 +++++ lib/zinc/poly1305/poly1305-donna64.h | 182 +++++ lib/zinc/poly1305/poly1305.c | 155 ++++ lib/zinc/selftest/poly1305.h | 1112 ++++++++++++++++++++++++++ 7 files changed, 1691 insertions(+) create mode 100644 include/zinc/poly1305.h create mode 100644 lib/zinc/poly1305/poly1305-donna32.h create mode 100644 lib/zinc/poly1305/poly1305-donna64.h create mode 100644 lib/zinc/poly1305/poly1305.c create mode 100644 lib/zinc/selftest/poly1305.h -- 2.19.0 diff --git a/include/zinc/poly1305.h b/include/zinc/poly1305.h new file mode 100644 index 000000000000..13fe0e50fc3c --- /dev/null +++ b/include/zinc/poly1305.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _ZINC_POLY1305_H +#define _ZINC_POLY1305_H + +#include +#include + +enum poly1305_lengths { + POLY1305_BLOCK_SIZE = 16, + POLY1305_KEY_SIZE = 32, + POLY1305_MAC_SIZE = 16 +}; + +struct poly1305_ctx { + u8 opaque[24 * sizeof(u64)]; + u32 nonce[4]; + u8 data[POLY1305_BLOCK_SIZE]; + size_t num; +} __aligned(8); + +void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE]); +void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len, + simd_context_t *simd_context); +void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE], + simd_context_t *simd_context); + +#endif /* _ZINC_POLY1305_H */ diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig index 1ca1ae1e9ea9..f9ef8f7e3c25 100644 --- a/lib/zinc/Kconfig +++ b/lib/zinc/Kconfig @@ -2,6 +2,9 @@ config ZINC_CHACHA20 tristate select CRYPTO_ALGAPI +config ZINC_POLY1305 + tristate + config ZINC_DEBUG bool "Zinc cryptography library debugging and self-tests" help diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index 60d568cf5206..6fc9626c55fa 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -9,3 +9,6 @@ zinc_chacha20-$(CONFIG_ZINC_ARCH_ARM64) += chacha20/chacha20-arm64.o zinc_chacha20-$(CONFIG_ZINC_ARCH_MIPS) += chacha20/chacha20-mips.o AFLAGS_chacha20-mips.o += -O2 # This is required to fill the branch delay slots obj-$(CONFIG_ZINC_CHACHA20) += zinc_chacha20.o + +zinc_poly1305-y := poly1305/poly1305.o +obj-$(CONFIG_ZINC_POLY1305) += zinc_poly1305.o diff --git a/lib/zinc/poly1305/poly1305-donna32.h b/lib/zinc/poly1305/poly1305-donna32.h new file mode 100644 index 000000000000..2e7565dd9679 --- /dev/null +++ b/lib/zinc/poly1305/poly1305-donna32.h @@ -0,0 +1,205 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * This is based in part on Andrew Moon's poly1305-donna, which is in the + * public domain. + */ + +struct poly1305_internal { + u32 h[5]; + u32 r[5]; + u32 s[4]; +}; + +static void poly1305_init_generic(void *ctx, const u8 key[16]) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + + /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */ + st->r[0] = (get_unaligned_le32(&key[0])) & 0x3ffffff; + st->r[1] = (get_unaligned_le32(&key[3]) >> 2) & 0x3ffff03; + st->r[2] = (get_unaligned_le32(&key[6]) >> 4) & 0x3ffc0ff; + st->r[3] = (get_unaligned_le32(&key[9]) >> 6) & 0x3f03fff; + st->r[4] = (get_unaligned_le32(&key[12]) >> 8) & 0x00fffff; + + /* s = 5*r */ + st->s[0] = st->r[1] * 5; + st->s[1] = st->r[2] * 5; + st->s[2] = st->r[3] * 5; + st->s[3] = st->r[4] * 5; + + /* h = 0 */ + st->h[0] = 0; + st->h[1] = 0; + st->h[2] = 0; + st->h[3] = 0; + st->h[4] = 0; +} + +static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len, + const u32 padbit) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + const u32 hibit = padbit << 24; + u32 r0, r1, r2, r3, r4; + u32 s1, s2, s3, s4; + u32 h0, h1, h2, h3, h4; + u64 d0, d1, d2, d3, d4; + u32 c; + + r0 = st->r[0]; + r1 = st->r[1]; + r2 = st->r[2]; + r3 = st->r[3]; + r4 = st->r[4]; + + s1 = st->s[0]; + s2 = st->s[1]; + s3 = st->s[2]; + s4 = st->s[3]; + + h0 = st->h[0]; + h1 = st->h[1]; + h2 = st->h[2]; + h3 = st->h[3]; + h4 = st->h[4]; + + while (len >= POLY1305_BLOCK_SIZE) { + /* h += m[i] */ + h0 += (get_unaligned_le32(&input[0])) & 0x3ffffff; + h1 += (get_unaligned_le32(&input[3]) >> 2) & 0x3ffffff; + h2 += (get_unaligned_le32(&input[6]) >> 4) & 0x3ffffff; + h3 += (get_unaligned_le32(&input[9]) >> 6) & 0x3ffffff; + h4 += (get_unaligned_le32(&input[12]) >> 8) | hibit; + + /* h *= r */ + d0 = ((u64)h0 * r0) + ((u64)h1 * s4) + + ((u64)h2 * s3) + ((u64)h3 * s2) + + ((u64)h4 * s1); + d1 = ((u64)h0 * r1) + ((u64)h1 * r0) + + ((u64)h2 * s4) + ((u64)h3 * s3) + + ((u64)h4 * s2); + d2 = ((u64)h0 * r2) + ((u64)h1 * r1) + + ((u64)h2 * r0) + ((u64)h3 * s4) + + ((u64)h4 * s3); + d3 = ((u64)h0 * r3) + ((u64)h1 * r2) + + ((u64)h2 * r1) + ((u64)h3 * r0) + + ((u64)h4 * s4); + d4 = ((u64)h0 * r4) + ((u64)h1 * r3) + + ((u64)h2 * r2) + ((u64)h3 * r1) + + ((u64)h4 * r0); + + /* (partial) h %= p */ + c = (u32)(d0 >> 26); + h0 = (u32)d0 & 0x3ffffff; + d1 += c; + c = (u32)(d1 >> 26); + h1 = (u32)d1 & 0x3ffffff; + d2 += c; + c = (u32)(d2 >> 26); + h2 = (u32)d2 & 0x3ffffff; + d3 += c; + c = (u32)(d3 >> 26); + h3 = (u32)d3 & 0x3ffffff; + d4 += c; + c = (u32)(d4 >> 26); + h4 = (u32)d4 & 0x3ffffff; + h0 += c * 5; + c = (h0 >> 26); + h0 = h0 & 0x3ffffff; + h1 += c; + + input += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + } + + st->h[0] = h0; + st->h[1] = h1; + st->h[2] = h2; + st->h[3] = h3; + st->h[4] = h4; +} + +static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4]) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + u32 h0, h1, h2, h3, h4, c; + u32 g0, g1, g2, g3, g4; + u64 f; + u32 mask; + + /* fully carry h */ + h0 = st->h[0]; + h1 = st->h[1]; + h2 = st->h[2]; + h3 = st->h[3]; + h4 = st->h[4]; + + c = h1 >> 26; + h1 = h1 & 0x3ffffff; + h2 += c; + c = h2 >> 26; + h2 = h2 & 0x3ffffff; + h3 += c; + c = h3 >> 26; + h3 = h3 & 0x3ffffff; + h4 += c; + c = h4 >> 26; + h4 = h4 & 0x3ffffff; + h0 += c * 5; + c = h0 >> 26; + h0 = h0 & 0x3ffffff; + h1 += c; + + /* compute h + -p */ + g0 = h0 + 5; + c = g0 >> 26; + g0 &= 0x3ffffff; + g1 = h1 + c; + c = g1 >> 26; + g1 &= 0x3ffffff; + g2 = h2 + c; + c = g2 >> 26; + g2 &= 0x3ffffff; + g3 = h3 + c; + c = g3 >> 26; + g3 &= 0x3ffffff; + g4 = h4 + c - (1UL << 26); + + /* select h if h < p, or h + -p if h >= p */ + mask = (g4 >> ((sizeof(u32) * 8) - 1)) - 1; + g0 &= mask; + g1 &= mask; + g2 &= mask; + g3 &= mask; + g4 &= mask; + mask = ~mask; + + h0 = (h0 & mask) | g0; + h1 = (h1 & mask) | g1; + h2 = (h2 & mask) | g2; + h3 = (h3 & mask) | g3; + h4 = (h4 & mask) | g4; + + /* h = h % (2^128) */ + h0 = ((h0) | (h1 << 26)) & 0xffffffff; + h1 = ((h1 >> 6) | (h2 << 20)) & 0xffffffff; + h2 = ((h2 >> 12) | (h3 << 14)) & 0xffffffff; + h3 = ((h3 >> 18) | (h4 << 8)) & 0xffffffff; + + /* mac = (h + nonce) % (2^128) */ + f = (u64)h0 + nonce[0]; + h0 = (u32)f; + f = (u64)h1 + nonce[1] + (f >> 32); + h1 = (u32)f; + f = (u64)h2 + nonce[2] + (f >> 32); + h2 = (u32)f; + f = (u64)h3 + nonce[3] + (f >> 32); + h3 = (u32)f; + + put_unaligned_le32(h0, &mac[0]); + put_unaligned_le32(h1, &mac[4]); + put_unaligned_le32(h2, &mac[8]); + put_unaligned_le32(h3, &mac[12]); +} diff --git a/lib/zinc/poly1305/poly1305-donna64.h b/lib/zinc/poly1305/poly1305-donna64.h new file mode 100644 index 000000000000..7850e0a11600 --- /dev/null +++ b/lib/zinc/poly1305/poly1305-donna64.h @@ -0,0 +1,182 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * This is based in part on Andrew Moon's poly1305-donna, which is in the + * public domain. + */ + +typedef __uint128_t u128; + +struct poly1305_internal { + u64 r[3]; + u64 h[3]; + u64 s[2]; +}; + +static void poly1305_init_generic(void *ctx, const u8 key[16]) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + u64 t0, t1; + + /* r &= 0xffffffc0ffffffc0ffffffc0fffffff */ + t0 = get_unaligned_le64(&key[0]); + t1 = get_unaligned_le64(&key[8]); + + st->r[0] = t0 & 0xffc0fffffff; + st->r[1] = ((t0 >> 44) | (t1 << 20)) & 0xfffffc0ffff; + st->r[2] = ((t1 >> 24)) & 0x00ffffffc0f; + + /* s = 20*r */ + st->s[0] = st->r[1] * 20; + st->s[1] = st->r[2] * 20; + + /* h = 0 */ + st->h[0] = 0; + st->h[1] = 0; + st->h[2] = 0; +} + +static void poly1305_blocks_generic(void *ctx, const u8 *input, size_t len, + const u32 padbit) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + const u64 hibit = ((u64)padbit) << 40; + u64 r0, r1, r2; + u64 s1, s2; + u64 h0, h1, h2; + u64 c; + u128 d0, d1, d2, d; + + r0 = st->r[0]; + r1 = st->r[1]; + r2 = st->r[2]; + + h0 = st->h[0]; + h1 = st->h[1]; + h2 = st->h[2]; + + s1 = st->s[0]; + s2 = st->s[1]; + + while (len >= POLY1305_BLOCK_SIZE) { + u64 t0, t1; + + /* h += m[i] */ + t0 = get_unaligned_le64(&input[0]); + t1 = get_unaligned_le64(&input[8]); + + h0 += t0 & 0xfffffffffff; + h1 += ((t0 >> 44) | (t1 << 20)) & 0xfffffffffff; + h2 += (((t1 >> 24)) & 0x3ffffffffff) | hibit; + + /* h *= r */ + d0 = (u128)h0 * r0; + d = (u128)h1 * s2; + d0 += d; + d = (u128)h2 * s1; + d0 += d; + d1 = (u128)h0 * r1; + d = (u128)h1 * r0; + d1 += d; + d = (u128)h2 * s2; + d1 += d; + d2 = (u128)h0 * r2; + d = (u128)h1 * r1; + d2 += d; + d = (u128)h2 * r0; + d2 += d; + + /* (partial) h %= p */ + c = (u64)(d0 >> 44); + h0 = (u64)d0 & 0xfffffffffff; + d1 += c; + c = (u64)(d1 >> 44); + h1 = (u64)d1 & 0xfffffffffff; + d2 += c; + c = (u64)(d2 >> 42); + h2 = (u64)d2 & 0x3ffffffffff; + h0 += c * 5; + c = h0 >> 44; + h0 = h0 & 0xfffffffffff; + h1 += c; + + input += POLY1305_BLOCK_SIZE; + len -= POLY1305_BLOCK_SIZE; + } + + st->h[0] = h0; + st->h[1] = h1; + st->h[2] = h2; +} + +static void poly1305_emit_generic(void *ctx, u8 mac[16], const u32 nonce[4]) +{ + struct poly1305_internal *st = (struct poly1305_internal *)ctx; + u64 h0, h1, h2, c; + u64 g0, g1, g2; + u64 t0, t1; + + /* fully carry h */ + h0 = st->h[0]; + h1 = st->h[1]; + h2 = st->h[2]; + + c = h1 >> 44; + h1 &= 0xfffffffffff; + h2 += c; + c = h2 >> 42; + h2 &= 0x3ffffffffff; + h0 += c * 5; + c = h0 >> 44; + h0 &= 0xfffffffffff; + h1 += c; + c = h1 >> 44; + h1 &= 0xfffffffffff; + h2 += c; + c = h2 >> 42; + h2 &= 0x3ffffffffff; + h0 += c * 5; + c = h0 >> 44; + h0 &= 0xfffffffffff; + h1 += c; + + /* compute h + -p */ + g0 = h0 + 5; + c = g0 >> 44; + g0 &= 0xfffffffffff; + g1 = h1 + c; + c = g1 >> 44; + g1 &= 0xfffffffffff; + g2 = h2 + c - (1ULL << 42); + + /* select h if h < p, or h + -p if h >= p */ + c = (g2 >> ((sizeof(u64) * 8) - 1)) - 1; + g0 &= c; + g1 &= c; + g2 &= c; + c = ~c; + h0 = (h0 & c) | g0; + h1 = (h1 & c) | g1; + h2 = (h2 & c) | g2; + + /* h = (h + nonce) */ + t0 = ((u64)nonce[1] << 32) | nonce[0]; + t1 = ((u64)nonce[3] << 32) | nonce[2]; + + h0 += t0 & 0xfffffffffff; + c = h0 >> 44; + h0 &= 0xfffffffffff; + h1 += (((t0 >> 44) | (t1 << 20)) & 0xfffffffffff) + c; + c = h1 >> 44; + h1 &= 0xfffffffffff; + h2 += (((t1 >> 24)) & 0x3ffffffffff) + c; + h2 &= 0x3ffffffffff; + + /* mac = h % (2^128) */ + h0 = h0 | (h1 << 44); + h1 = (h1 >> 20) | (h2 << 24); + + put_unaligned_le64(h0, &mac[0]); + put_unaligned_le64(h1, &mac[8]); +} diff --git a/lib/zinc/poly1305/poly1305.c b/lib/zinc/poly1305/poly1305.c new file mode 100644 index 000000000000..538abc359d1d --- /dev/null +++ b/lib/zinc/poly1305/poly1305.c @@ -0,0 +1,155 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * Implementation of the Poly1305 message authenticator. + * + * Information: https://cr.yp.to/mac.html + */ + +#include + +#include +#include +#include +#include +#include + +#ifndef HAVE_POLY1305_ARCH_IMPLEMENTATION +static inline bool poly1305_init_arch(void *ctx, + const u8 key[POLY1305_KEY_SIZE]) +{ + return false; +} +static inline bool poly1305_blocks_arch(void *ctx, const u8 *input, + const size_t len, const u32 padbit, + simd_context_t *simd_context) +{ + return false; +} +static inline bool poly1305_emit_arch(void *ctx, u8 mac[POLY1305_MAC_SIZE], + const u32 nonce[4], + simd_context_t *simd_context) +{ + return false; +} +void __init poly1305_fpu_init(void) +{ +} +#endif + +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) +#include "poly1305-donna64.h" +#else +#include "poly1305-donna32.h" +#endif + +void poly1305_init(struct poly1305_ctx *ctx, const u8 key[POLY1305_KEY_SIZE]) +{ + ctx->nonce[0] = get_unaligned_le32(&key[16]); + ctx->nonce[1] = get_unaligned_le32(&key[20]); + ctx->nonce[2] = get_unaligned_le32(&key[24]); + ctx->nonce[3] = get_unaligned_le32(&key[28]); + + if (!poly1305_init_arch(ctx->opaque, key)) + poly1305_init_generic(ctx->opaque, key); + + ctx->num = 0; +} +EXPORT_SYMBOL(poly1305_init); + +static inline void poly1305_blocks(void *ctx, const u8 *input, const size_t len, + const u32 padbit, + simd_context_t *simd_context) +{ + if (!poly1305_blocks_arch(ctx, input, len, padbit, simd_context)) + poly1305_blocks_generic(ctx, input, len, padbit); +} + +static inline void poly1305_emit(void *ctx, u8 mac[POLY1305_KEY_SIZE], + const u32 nonce[4], + simd_context_t *simd_context) +{ + if (!poly1305_emit_arch(ctx, mac, nonce, simd_context)) + poly1305_emit_generic(ctx, mac, nonce); +} + +void poly1305_update(struct poly1305_ctx *ctx, const u8 *input, size_t len, + simd_context_t *simd_context) +{ + const size_t num = ctx->num; + size_t rem; + + if (num) { + rem = POLY1305_BLOCK_SIZE - num; + if (len < rem) { + memcpy(ctx->data + num, input, len); + ctx->num = num + len; + return; + } + memcpy(ctx->data + num, input, rem); + poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 1, + simd_context); + input += rem; + len -= rem; + } + + rem = len % POLY1305_BLOCK_SIZE; + len -= rem; + + if (len >= POLY1305_BLOCK_SIZE) { + poly1305_blocks(ctx->opaque, input, len, 1, simd_context); + input += len; + } + + if (rem) + memcpy(ctx->data, input, rem); + + ctx->num = rem; +} +EXPORT_SYMBOL(poly1305_update); + +void poly1305_final(struct poly1305_ctx *ctx, u8 mac[POLY1305_MAC_SIZE], + simd_context_t *simd_context) +{ + size_t num = ctx->num; + + if (num) { + ctx->data[num++] = 1; + while (num < POLY1305_BLOCK_SIZE) + ctx->data[num++] = 0; + poly1305_blocks(ctx->opaque, ctx->data, POLY1305_BLOCK_SIZE, 0, + simd_context); + } + + poly1305_emit(ctx->opaque, mac, ctx->nonce, simd_context); + + memzero_explicit(ctx, sizeof(*ctx)); +} +EXPORT_SYMBOL(poly1305_final); + +#include "../selftest/poly1305.h" + +static bool nosimd __initdata = false; + +static int __init mod_init(void) +{ + if (!nosimd) + poly1305_fpu_init(); +#ifdef DEBUG + if (!poly1305_selftest()) + return -ENOTRECOVERABLE; +#endif + return 0; +} + +static void __exit mod_exit(void) +{ +} + +module_param(nosimd, bool, 0); +module_init(mod_init); +module_exit(mod_exit); +MODULE_LICENSE("GPL v2"); +MODULE_DESCRIPTION("Poly1305 one-time authenticator"); +MODULE_AUTHOR("Jason A. Donenfeld "); diff --git a/lib/zinc/selftest/poly1305.h b/lib/zinc/selftest/poly1305.h new file mode 100644 index 000000000000..e806ab2856fe --- /dev/null +++ b/lib/zinc/selftest/poly1305.h @@ -0,0 +1,1112 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifdef DEBUG +struct poly1305_testvec { + const u8 *input, *output, *key; + size_t ilen; +}; + +/* RFC7539 */ +static const u8 input01[] __initconst = { + 0x43, 0x72, 0x79, 0x70, 0x74, 0x6f, 0x67, 0x72, + 0x61, 0x70, 0x68, 0x69, 0x63, 0x20, 0x46, 0x6f, + 0x72, 0x75, 0x6d, 0x20, 0x52, 0x65, 0x73, 0x65, + 0x61, 0x72, 0x63, 0x68, 0x20, 0x47, 0x72, 0x6f, + 0x75, 0x70 +}; +static const u8 output01[] __initconst = { + 0xa8, 0x06, 0x1d, 0xc1, 0x30, 0x51, 0x36, 0xc6, + 0xc2, 0x2b, 0x8b, 0xaf, 0x0c, 0x01, 0x27, 0xa9 +}; +static const u8 key01[] __initconst = { + 0x85, 0xd6, 0xbe, 0x78, 0x57, 0x55, 0x6d, 0x33, + 0x7f, 0x44, 0x52, 0xfe, 0x42, 0xd5, 0x06, 0xa8, + 0x01, 0x03, 0x80, 0x8a, 0xfb, 0x0d, 0xb2, 0xfd, + 0x4a, 0xbf, 0xf6, 0xaf, 0x41, 0x49, 0xf5, 0x1b +}; + +/* "The Poly1305-AES message-authentication code" */ +static const u8 input02[] __initconst = { + 0xf3, 0xf6 +}; +static const u8 output02[] __initconst = { + 0xf4, 0xc6, 0x33, 0xc3, 0x04, 0x4f, 0xc1, 0x45, + 0xf8, 0x4f, 0x33, 0x5c, 0xb8, 0x19, 0x53, 0xde +}; +static const u8 key02[] __initconst = { + 0x85, 0x1f, 0xc4, 0x0c, 0x34, 0x67, 0xac, 0x0b, + 0xe0, 0x5c, 0xc2, 0x04, 0x04, 0xf3, 0xf7, 0x00, + 0x58, 0x0b, 0x3b, 0x0f, 0x94, 0x47, 0xbb, 0x1e, + 0x69, 0xd0, 0x95, 0xb5, 0x92, 0x8b, 0x6d, 0xbc +}; + +static const u8 input03[] __initconst = { }; +static const u8 output03[] __initconst = { + 0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7, + 0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7 +}; +static const u8 key03[] __initconst = { + 0xa0, 0xf3, 0x08, 0x00, 0x00, 0xf4, 0x64, 0x00, + 0xd0, 0xc7, 0xe9, 0x07, 0x6c, 0x83, 0x44, 0x03, + 0xdd, 0x3f, 0xab, 0x22, 0x51, 0xf1, 0x1a, 0xc7, + 0x59, 0xf0, 0x88, 0x71, 0x29, 0xcc, 0x2e, 0xe7 +}; + +static const u8 input04[] __initconst = { + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 +}; +static const u8 output04[] __initconst = { + 0x0e, 0xe1, 0xc1, 0x6b, 0xb7, 0x3f, 0x0f, 0x4f, + 0xd1, 0x98, 0x81, 0x75, 0x3c, 0x01, 0xcd, 0xbe +}; +static const u8 key04[] __initconst = { + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef +}; + +static const u8 input05[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9 +}; +static const u8 output05[] __initconst = { + 0x51, 0x54, 0xad, 0x0d, 0x2c, 0xb2, 0x6e, 0x01, + 0x27, 0x4f, 0xc5, 0x11, 0x48, 0x49, 0x1f, 0x1b +}; +static const u8 key05[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +/* self-generated vectors exercise "significant" lengths, such that they + * are handled by different code paths */ +static const u8 input06[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf +}; +static const u8 output06[] __initconst = { + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66 +}; +static const u8 key06[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input07[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67 +}; +static const u8 output07[] __initconst = { + 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2, + 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 +}; +static const u8 key07[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input08[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 +}; +static const u8 output08[] __initconst = { + 0xbb, 0xb6, 0x13, 0xb2, 0xb6, 0xd7, 0x53, 0xba, + 0x07, 0x39, 0x5b, 0x91, 0x6a, 0xae, 0xce, 0x15 +}; +static const u8 key08[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input09[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24 +}; +static const u8 output09[] __initconst = { + 0xc7, 0x94, 0xd7, 0x05, 0x7d, 0x17, 0x78, 0xc4, + 0xbb, 0xee, 0x0a, 0x39, 0xb3, 0xd9, 0x73, 0x42 +}; +static const u8 key09[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input10[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 +}; +static const u8 output10[] __initconst = { + 0xff, 0xbc, 0xb9, 0xb3, 0x71, 0x42, 0x31, 0x52, + 0xd7, 0xfc, 0xa5, 0xad, 0x04, 0x2f, 0xba, 0xa9 +}; +static const u8 key10[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input11[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36, + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66 +}; +static const u8 output11[] __initconst = { + 0x06, 0x9e, 0xd6, 0xb8, 0xef, 0x0f, 0x20, 0x7b, + 0x3e, 0x24, 0x3b, 0xb1, 0x01, 0x9f, 0xe6, 0x32 +}; +static const u8 key11[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input12[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36, + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66, + 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2, + 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 +}; +static const u8 output12[] __initconst = { + 0xcc, 0xa3, 0x39, 0xd9, 0xa4, 0x5f, 0xa2, 0x36, + 0x8c, 0x2c, 0x68, 0xb3, 0xa4, 0x17, 0x91, 0x33 +}; +static const u8 key12[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input13[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36, + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66, + 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2, + 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61, + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36 +}; +static const u8 output13[] __initconst = { + 0x53, 0xf6, 0xe8, 0x28, 0xa2, 0xf0, 0xfe, 0x0e, + 0xe8, 0x15, 0xbf, 0x0b, 0xd5, 0x84, 0x1a, 0x34 +}; +static const u8 key13[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +static const u8 input14[] __initconst = { + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36, + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66, + 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2, + 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61, + 0xab, 0x08, 0x12, 0x72, 0x4a, 0x7f, 0x1e, 0x34, + 0x27, 0x42, 0xcb, 0xed, 0x37, 0x4d, 0x94, 0xd1, + 0x36, 0xc6, 0xb8, 0x79, 0x5d, 0x45, 0xb3, 0x81, + 0x98, 0x30, 0xf2, 0xc0, 0x44, 0x91, 0xfa, 0xf0, + 0x99, 0x0c, 0x62, 0xe4, 0x8b, 0x80, 0x18, 0xb2, + 0xc3, 0xe4, 0xa0, 0xfa, 0x31, 0x34, 0xcb, 0x67, + 0xfa, 0x83, 0xe1, 0x58, 0xc9, 0x94, 0xd9, 0x61, + 0xc4, 0xcb, 0x21, 0x09, 0x5c, 0x1b, 0xf9, 0xaf, + 0x48, 0x44, 0x3d, 0x0b, 0xb0, 0xd2, 0x11, 0x09, + 0xc8, 0x9a, 0x10, 0x0b, 0x5c, 0xe2, 0xc2, 0x08, + 0x83, 0x14, 0x9c, 0x69, 0xb5, 0x61, 0xdd, 0x88, + 0x29, 0x8a, 0x17, 0x98, 0xb1, 0x07, 0x16, 0xef, + 0x66, 0x3c, 0xea, 0x19, 0x0f, 0xfb, 0x83, 0xd8, + 0x95, 0x93, 0xf3, 0xf4, 0x76, 0xb6, 0xbc, 0x24, + 0xd7, 0xe6, 0x79, 0x10, 0x7e, 0xa2, 0x6a, 0xdb, + 0x8c, 0xaf, 0x66, 0x52, 0xd0, 0x65, 0x61, 0x36, + 0x81, 0x20, 0x59, 0xa5, 0xda, 0x19, 0x86, 0x37, + 0xca, 0xc7, 0xc4, 0xa6, 0x31, 0xbe, 0xe4, 0x66, + 0x5b, 0x88, 0xd7, 0xf6, 0x22, 0x8b, 0x11, 0xe2, + 0xe2, 0x85, 0x79, 0xa5, 0xc0, 0xc1, 0xf7, 0x61 +}; +static const u8 output14[] __initconst = { + 0xb8, 0x46, 0xd4, 0x4e, 0x9b, 0xbd, 0x53, 0xce, + 0xdf, 0xfb, 0xfb, 0xb6, 0xb7, 0xfa, 0x49, 0x33 +}; +static const u8 key14[] __initconst = { + 0x12, 0x97, 0x6a, 0x08, 0xc4, 0x42, 0x6d, 0x0c, + 0xe8, 0xa8, 0x24, 0x07, 0xc4, 0xf4, 0x82, 0x07, + 0x80, 0xf8, 0xc2, 0x0a, 0xa7, 0x12, 0x02, 0xd1, + 0xe2, 0x91, 0x79, 0xcb, 0xcb, 0x55, 0x5a, 0x57 +}; + +/* 4th power of the key spills to 131th bit in SIMD key setup */ +static const u8 input15[] __initconst = { + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +}; +static const u8 output15[] __initconst = { + 0x07, 0x14, 0x5a, 0x4c, 0x02, 0xfe, 0x5f, 0xa3, + 0x20, 0x36, 0xde, 0x68, 0xfa, 0xbe, 0x90, 0x66 +}; +static const u8 key15[] __initconst = { + 0xad, 0x62, 0x81, 0x07, 0xe8, 0x35, 0x1d, 0x0f, + 0x2c, 0x23, 0x1a, 0x05, 0xdc, 0x4a, 0x41, 0x06, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* OpenSSL's poly1305_ieee754.c failed this in final stage */ +static const u8 input16[] __initconst = { + 0x84, 0x23, 0x64, 0xe1, 0x56, 0x33, 0x6c, 0x09, + 0x98, 0xb9, 0x33, 0xa6, 0x23, 0x77, 0x26, 0x18, + 0x0d, 0x9e, 0x3f, 0xdc, 0xbd, 0xe4, 0xcd, 0x5d, + 0x17, 0x08, 0x0f, 0xc3, 0xbe, 0xb4, 0x96, 0x14, + 0xd7, 0x12, 0x2c, 0x03, 0x74, 0x63, 0xff, 0x10, + 0x4d, 0x73, 0xf1, 0x9c, 0x12, 0x70, 0x46, 0x28, + 0xd4, 0x17, 0xc4, 0xc5, 0x4a, 0x3f, 0xe3, 0x0d, + 0x3c, 0x3d, 0x77, 0x14, 0x38, 0x2d, 0x43, 0xb0, + 0x38, 0x2a, 0x50, 0xa5, 0xde, 0xe5, 0x4b, 0xe8, + 0x44, 0xb0, 0x76, 0xe8, 0xdf, 0x88, 0x20, 0x1a, + 0x1c, 0xd4, 0x3b, 0x90, 0xeb, 0x21, 0x64, 0x3f, + 0xa9, 0x6f, 0x39, 0xb5, 0x18, 0xaa, 0x83, 0x40, + 0xc9, 0x42, 0xff, 0x3c, 0x31, 0xba, 0xf7, 0xc9, + 0xbd, 0xbf, 0x0f, 0x31, 0xae, 0x3f, 0xa0, 0x96, + 0xbf, 0x8c, 0x63, 0x03, 0x06, 0x09, 0x82, 0x9f, + 0xe7, 0x2e, 0x17, 0x98, 0x24, 0x89, 0x0b, 0xc8, + 0xe0, 0x8c, 0x31, 0x5c, 0x1c, 0xce, 0x2a, 0x83, + 0x14, 0x4d, 0xbb, 0xff, 0x09, 0xf7, 0x4e, 0x3e, + 0xfc, 0x77, 0x0b, 0x54, 0xd0, 0x98, 0x4a, 0x8f, + 0x19, 0xb1, 0x47, 0x19, 0xe6, 0x36, 0x35, 0x64, + 0x1d, 0x6b, 0x1e, 0xed, 0xf6, 0x3e, 0xfb, 0xf0, + 0x80, 0xe1, 0x78, 0x3d, 0x32, 0x44, 0x54, 0x12, + 0x11, 0x4c, 0x20, 0xde, 0x0b, 0x83, 0x7a, 0x0d, + 0xfa, 0x33, 0xd6, 0xb8, 0x28, 0x25, 0xff, 0xf4, + 0x4c, 0x9a, 0x70, 0xea, 0x54, 0xce, 0x47, 0xf0, + 0x7d, 0xf6, 0x98, 0xe6, 0xb0, 0x33, 0x23, 0xb5, + 0x30, 0x79, 0x36, 0x4a, 0x5f, 0xc3, 0xe9, 0xdd, + 0x03, 0x43, 0x92, 0xbd, 0xde, 0x86, 0xdc, 0xcd, + 0xda, 0x94, 0x32, 0x1c, 0x5e, 0x44, 0x06, 0x04, + 0x89, 0x33, 0x6c, 0xb6, 0x5b, 0xf3, 0x98, 0x9c, + 0x36, 0xf7, 0x28, 0x2c, 0x2f, 0x5d, 0x2b, 0x88, + 0x2c, 0x17, 0x1e, 0x74 +}; +static const u8 output16[] __initconst = { + 0xf2, 0x48, 0x31, 0x2e, 0x57, 0x8d, 0x9d, 0x58, + 0xf8, 0xb7, 0xbb, 0x4d, 0x19, 0x10, 0x54, 0x31 +}; +static const u8 key16[] __initconst = { + 0x95, 0xd5, 0xc0, 0x05, 0x50, 0x3e, 0x51, 0x0d, + 0x8c, 0xd0, 0xaa, 0x07, 0x2c, 0x4a, 0x4d, 0x06, + 0x6e, 0xab, 0xc5, 0x2d, 0x11, 0x65, 0x3d, 0xf4, + 0x7f, 0xbf, 0x63, 0xab, 0x19, 0x8b, 0xcc, 0x26 +}; + +/* AVX2 in OpenSSL's poly1305-x86.pl failed this with 176+32 split */ +static const u8 input17[] __initconst = { + 0x24, 0x8a, 0xc3, 0x10, 0x85, 0xb6, 0xc2, 0xad, + 0xaa, 0xa3, 0x82, 0x59, 0xa0, 0xd7, 0x19, 0x2c, + 0x5c, 0x35, 0xd1, 0xbb, 0x4e, 0xf3, 0x9a, 0xd9, + 0x4c, 0x38, 0xd1, 0xc8, 0x24, 0x79, 0xe2, 0xdd, + 0x21, 0x59, 0xa0, 0x77, 0x02, 0x4b, 0x05, 0x89, + 0xbc, 0x8a, 0x20, 0x10, 0x1b, 0x50, 0x6f, 0x0a, + 0x1a, 0xd0, 0xbb, 0xab, 0x76, 0xe8, 0x3a, 0x83, + 0xf1, 0xb9, 0x4b, 0xe6, 0xbe, 0xae, 0x74, 0xe8, + 0x74, 0xca, 0xb6, 0x92, 0xc5, 0x96, 0x3a, 0x75, + 0x43, 0x6b, 0x77, 0x61, 0x21, 0xec, 0x9f, 0x62, + 0x39, 0x9a, 0x3e, 0x66, 0xb2, 0xd2, 0x27, 0x07, + 0xda, 0xe8, 0x19, 0x33, 0xb6, 0x27, 0x7f, 0x3c, + 0x85, 0x16, 0xbc, 0xbe, 0x26, 0xdb, 0xbd, 0x86, + 0xf3, 0x73, 0x10, 0x3d, 0x7c, 0xf4, 0xca, 0xd1, + 0x88, 0x8c, 0x95, 0x21, 0x18, 0xfb, 0xfb, 0xd0, + 0xd7, 0xb4, 0xbe, 0xdc, 0x4a, 0xe4, 0x93, 0x6a, + 0xff, 0x91, 0x15, 0x7e, 0x7a, 0xa4, 0x7c, 0x54, + 0x44, 0x2e, 0xa7, 0x8d, 0x6a, 0xc2, 0x51, 0xd3, + 0x24, 0xa0, 0xfb, 0xe4, 0x9d, 0x89, 0xcc, 0x35, + 0x21, 0xb6, 0x6d, 0x16, 0xe9, 0xc6, 0x6a, 0x37, + 0x09, 0x89, 0x4e, 0x4e, 0xb0, 0xa4, 0xee, 0xdc, + 0x4a, 0xe1, 0x94, 0x68, 0xe6, 0x6b, 0x81, 0xf2, + 0x71, 0x35, 0x1b, 0x1d, 0x92, 0x1e, 0xa5, 0x51, + 0x04, 0x7a, 0xbc, 0xc6, 0xb8, 0x7a, 0x90, 0x1f, + 0xde, 0x7d, 0xb7, 0x9f, 0xa1, 0x81, 0x8c, 0x11, + 0x33, 0x6d, 0xbc, 0x07, 0x24, 0x4a, 0x40, 0xeb +}; +static const u8 output17[] __initconst = { + 0xbc, 0x93, 0x9b, 0xc5, 0x28, 0x14, 0x80, 0xfa, + 0x99, 0xc6, 0xd6, 0x8c, 0x25, 0x8e, 0xc4, 0x2f +}; +static const u8 key17[] __initconst = { + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* test vectors from Google */ +static const u8 input18[] __initconst = { }; +static const u8 output18[] __initconst = { + 0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d, + 0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c +}; +static const u8 key18[] __initconst = { + 0xc8, 0xaf, 0xaa, 0xc3, 0x31, 0xee, 0x37, 0x2c, + 0xd6, 0x08, 0x2d, 0xe1, 0x34, 0x94, 0x3b, 0x17, + 0x47, 0x10, 0x13, 0x0e, 0x9f, 0x6f, 0xea, 0x8d, + 0x72, 0x29, 0x38, 0x50, 0xa6, 0x67, 0xd8, 0x6c +}; + +static const u8 input19[] __initconst = { + 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x77, 0x6f, + 0x72, 0x6c, 0x64, 0x21 +}; +static const u8 output19[] __initconst = { + 0xa6, 0xf7, 0x45, 0x00, 0x8f, 0x81, 0xc9, 0x16, + 0xa2, 0x0d, 0xcc, 0x74, 0xee, 0xf2, 0xb2, 0xf0 +}; +static const u8 key19[] __initconst = { + 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, + 0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20, + 0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20, + 0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35 +}; + +static const u8 input20[] __initconst = { + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 output20[] __initconst = { + 0x49, 0xec, 0x78, 0x09, 0x0e, 0x48, 0x1e, 0xc6, + 0xc2, 0x6b, 0x33, 0xb9, 0x1c, 0xcc, 0x03, 0x07 +}; +static const u8 key20[] __initconst = { + 0x74, 0x68, 0x69, 0x73, 0x20, 0x69, 0x73, 0x20, + 0x33, 0x32, 0x2d, 0x62, 0x79, 0x74, 0x65, 0x20, + 0x6b, 0x65, 0x79, 0x20, 0x66, 0x6f, 0x72, 0x20, + 0x50, 0x6f, 0x6c, 0x79, 0x31, 0x33, 0x30, 0x35 +}; + +static const u8 input21[] __initconst = { + 0x89, 0xda, 0xb8, 0x0b, 0x77, 0x17, 0xc1, 0xdb, + 0x5d, 0xb4, 0x37, 0x86, 0x0a, 0x3f, 0x70, 0x21, + 0x8e, 0x93, 0xe1, 0xb8, 0xf4, 0x61, 0xfb, 0x67, + 0x7f, 0x16, 0xf3, 0x5f, 0x6f, 0x87, 0xe2, 0xa9, + 0x1c, 0x99, 0xbc, 0x3a, 0x47, 0xac, 0xe4, 0x76, + 0x40, 0xcc, 0x95, 0xc3, 0x45, 0xbe, 0x5e, 0xcc, + 0xa5, 0xa3, 0x52, 0x3c, 0x35, 0xcc, 0x01, 0x89, + 0x3a, 0xf0, 0xb6, 0x4a, 0x62, 0x03, 0x34, 0x27, + 0x03, 0x72, 0xec, 0x12, 0x48, 0x2d, 0x1b, 0x1e, + 0x36, 0x35, 0x61, 0x69, 0x8a, 0x57, 0x8b, 0x35, + 0x98, 0x03, 0x49, 0x5b, 0xb4, 0xe2, 0xef, 0x19, + 0x30, 0xb1, 0x7a, 0x51, 0x90, 0xb5, 0x80, 0xf1, + 0x41, 0x30, 0x0d, 0xf3, 0x0a, 0xdb, 0xec, 0xa2, + 0x8f, 0x64, 0x27, 0xa8, 0xbc, 0x1a, 0x99, 0x9f, + 0xd5, 0x1c, 0x55, 0x4a, 0x01, 0x7d, 0x09, 0x5d, + 0x8c, 0x3e, 0x31, 0x27, 0xda, 0xf9, 0xf5, 0x95 +}; +static const u8 output21[] __initconst = { + 0xc8, 0x5d, 0x15, 0xed, 0x44, 0xc3, 0x78, 0xd6, + 0xb0, 0x0e, 0x23, 0x06, 0x4c, 0x7b, 0xcd, 0x51 +}; +static const u8 key21[] __initconst = { + 0x2d, 0x77, 0x3b, 0xe3, 0x7a, 0xdb, 0x1e, 0x4d, + 0x68, 0x3b, 0xf0, 0x07, 0x5e, 0x79, 0xc4, 0xee, + 0x03, 0x79, 0x18, 0x53, 0x5a, 0x7f, 0x99, 0xcc, + 0xb7, 0x04, 0x0f, 0xb5, 0xf5, 0xf4, 0x3a, 0xea +}; + +static const u8 input22[] __initconst = { + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0b, + 0x17, 0x03, 0x03, 0x02, 0x00, 0x00, 0x00, 0x00, + 0x06, 0xdb, 0x1f, 0x1f, 0x36, 0x8d, 0x69, 0x6a, + 0x81, 0x0a, 0x34, 0x9c, 0x0c, 0x71, 0x4c, 0x9a, + 0x5e, 0x78, 0x50, 0xc2, 0x40, 0x7d, 0x72, 0x1a, + 0xcd, 0xed, 0x95, 0xe0, 0x18, 0xd7, 0xa8, 0x52, + 0x66, 0xa6, 0xe1, 0x28, 0x9c, 0xdb, 0x4a, 0xeb, + 0x18, 0xda, 0x5a, 0xc8, 0xa2, 0xb0, 0x02, 0x6d, + 0x24, 0xa5, 0x9a, 0xd4, 0x85, 0x22, 0x7f, 0x3e, + 0xae, 0xdb, 0xb2, 0xe7, 0xe3, 0x5e, 0x1c, 0x66, + 0xcd, 0x60, 0xf9, 0xab, 0xf7, 0x16, 0xdc, 0xc9, + 0xac, 0x42, 0x68, 0x2d, 0xd7, 0xda, 0xb2, 0x87, + 0xa7, 0x02, 0x4c, 0x4e, 0xef, 0xc3, 0x21, 0xcc, + 0x05, 0x74, 0xe1, 0x67, 0x93, 0xe3, 0x7c, 0xec, + 0x03, 0xc5, 0xbd, 0xa4, 0x2b, 0x54, 0xc1, 0x14, + 0xa8, 0x0b, 0x57, 0xaf, 0x26, 0x41, 0x6c, 0x7b, + 0xe7, 0x42, 0x00, 0x5e, 0x20, 0x85, 0x5c, 0x73, + 0xe2, 0x1d, 0xc8, 0xe2, 0xed, 0xc9, 0xd4, 0x35, + 0xcb, 0x6f, 0x60, 0x59, 0x28, 0x00, 0x11, 0xc2, + 0x70, 0xb7, 0x15, 0x70, 0x05, 0x1c, 0x1c, 0x9b, + 0x30, 0x52, 0x12, 0x66, 0x20, 0xbc, 0x1e, 0x27, + 0x30, 0xfa, 0x06, 0x6c, 0x7a, 0x50, 0x9d, 0x53, + 0xc6, 0x0e, 0x5a, 0xe1, 0xb4, 0x0a, 0xa6, 0xe3, + 0x9e, 0x49, 0x66, 0x92, 0x28, 0xc9, 0x0e, 0xec, + 0xb4, 0xa5, 0x0d, 0xb3, 0x2a, 0x50, 0xbc, 0x49, + 0xe9, 0x0b, 0x4f, 0x4b, 0x35, 0x9a, 0x1d, 0xfd, + 0x11, 0x74, 0x9c, 0xd3, 0x86, 0x7f, 0xcf, 0x2f, + 0xb7, 0xbb, 0x6c, 0xd4, 0x73, 0x8f, 0x6a, 0x4a, + 0xd6, 0xf7, 0xca, 0x50, 0x58, 0xf7, 0x61, 0x88, + 0x45, 0xaf, 0x9f, 0x02, 0x0f, 0x6c, 0x3b, 0x96, + 0x7b, 0x8f, 0x4c, 0xd4, 0xa9, 0x1e, 0x28, 0x13, + 0xb5, 0x07, 0xae, 0x66, 0xf2, 0xd3, 0x5c, 0x18, + 0x28, 0x4f, 0x72, 0x92, 0x18, 0x60, 0x62, 0xe1, + 0x0f, 0xd5, 0x51, 0x0d, 0x18, 0x77, 0x53, 0x51, + 0xef, 0x33, 0x4e, 0x76, 0x34, 0xab, 0x47, 0x43, + 0xf5, 0xb6, 0x8f, 0x49, 0xad, 0xca, 0xb3, 0x84, + 0xd3, 0xfd, 0x75, 0xf7, 0x39, 0x0f, 0x40, 0x06, + 0xef, 0x2a, 0x29, 0x5c, 0x8c, 0x7a, 0x07, 0x6a, + 0xd5, 0x45, 0x46, 0xcd, 0x25, 0xd2, 0x10, 0x7f, + 0xbe, 0x14, 0x36, 0xc8, 0x40, 0x92, 0x4a, 0xae, + 0xbe, 0x5b, 0x37, 0x08, 0x93, 0xcd, 0x63, 0xd1, + 0x32, 0x5b, 0x86, 0x16, 0xfc, 0x48, 0x10, 0x88, + 0x6b, 0xc1, 0x52, 0xc5, 0x32, 0x21, 0xb6, 0xdf, + 0x37, 0x31, 0x19, 0x39, 0x32, 0x55, 0xee, 0x72, + 0xbc, 0xaa, 0x88, 0x01, 0x74, 0xf1, 0x71, 0x7f, + 0x91, 0x84, 0xfa, 0x91, 0x64, 0x6f, 0x17, 0xa2, + 0x4a, 0xc5, 0x5d, 0x16, 0xbf, 0xdd, 0xca, 0x95, + 0x81, 0xa9, 0x2e, 0xda, 0x47, 0x92, 0x01, 0xf0, + 0xed, 0xbf, 0x63, 0x36, 0x00, 0xd6, 0x06, 0x6d, + 0x1a, 0xb3, 0x6d, 0x5d, 0x24, 0x15, 0xd7, 0x13, + 0x51, 0xbb, 0xcd, 0x60, 0x8a, 0x25, 0x10, 0x8d, + 0x25, 0x64, 0x19, 0x92, 0xc1, 0xf2, 0x6c, 0x53, + 0x1c, 0xf9, 0xf9, 0x02, 0x03, 0xbc, 0x4c, 0xc1, + 0x9f, 0x59, 0x27, 0xd8, 0x34, 0xb0, 0xa4, 0x71, + 0x16, 0xd3, 0x88, 0x4b, 0xbb, 0x16, 0x4b, 0x8e, + 0xc8, 0x83, 0xd1, 0xac, 0x83, 0x2e, 0x56, 0xb3, + 0x91, 0x8a, 0x98, 0x60, 0x1a, 0x08, 0xd1, 0x71, + 0x88, 0x15, 0x41, 0xd5, 0x94, 0xdb, 0x39, 0x9c, + 0x6a, 0xe6, 0x15, 0x12, 0x21, 0x74, 0x5a, 0xec, + 0x81, 0x4c, 0x45, 0xb0, 0xb0, 0x5b, 0x56, 0x54, + 0x36, 0xfd, 0x6f, 0x13, 0x7a, 0xa1, 0x0a, 0x0c, + 0x0b, 0x64, 0x37, 0x61, 0xdb, 0xd6, 0xf9, 0xa9, + 0xdc, 0xb9, 0x9b, 0x1a, 0x6e, 0x69, 0x08, 0x54, + 0xce, 0x07, 0x69, 0xcd, 0xe3, 0x97, 0x61, 0xd8, + 0x2f, 0xcd, 0xec, 0x15, 0xf0, 0xd9, 0x2d, 0x7d, + 0x8e, 0x94, 0xad, 0xe8, 0xeb, 0x83, 0xfb, 0xe0 +}; +static const u8 output22[] __initconst = { + 0x26, 0x37, 0x40, 0x8f, 0xe1, 0x30, 0x86, 0xea, + 0x73, 0xf9, 0x71, 0xe3, 0x42, 0x5e, 0x28, 0x20 +}; +static const u8 key22[] __initconst = { + 0x99, 0xe5, 0x82, 0x2d, 0xd4, 0x17, 0x3c, 0x99, + 0x5e, 0x3d, 0xae, 0x0d, 0xde, 0xfb, 0x97, 0x74, + 0x3f, 0xde, 0x3b, 0x08, 0x01, 0x34, 0xb3, 0x9f, + 0x76, 0xe9, 0xbf, 0x8d, 0x0e, 0x88, 0xd5, 0x46 +}; + +/* test vectors from Hanno Böck */ +static const u8 input23[] __initconst = { + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0x80, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xce, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xc5, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe3, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xac, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xe6, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0x00, 0x00, 0x00, + 0xaf, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, + 0xcc, 0xcc, 0xff, 0xff, 0xff, 0xf5, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0xff, 0xff, 0xff, 0xe7, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x71, 0x92, 0x05, 0xa8, 0x52, 0x1d, + 0xfc +}; +static const u8 output23[] __initconst = { + 0x85, 0x59, 0xb8, 0x76, 0xec, 0xee, 0xd6, 0x6e, + 0xb3, 0x77, 0x98, 0xc0, 0x45, 0x7b, 0xaf, 0xf9 +}; +static const u8 key23[] __initconst = { + 0x7f, 0x1b, 0x02, 0x64, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc, 0xcc +}; + +static const u8 input24[] __initconst = { + 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x80, 0x02, 0x64 +}; +static const u8 output24[] __initconst = { + 0x00, 0xbd, 0x12, 0x58, 0x97, 0x8e, 0x20, 0x54, + 0x44, 0xc9, 0xaa, 0xaa, 0x82, 0x00, 0x6f, 0xed +}; +static const u8 key24[] __initconst = { + 0xe0, 0x00, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, + 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa, 0xaa +}; + +static const u8 input25[] __initconst = { + 0x02, 0xfc +}; +static const u8 output25[] __initconst = { + 0x06, 0x12, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, + 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c +}; +static const u8 key25[] __initconst = { + 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, + 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, + 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, + 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c, 0x0c +}; + +static const u8 input26[] __initconst = { + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7a, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x5c, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x6e, 0x7b, 0x00, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7a, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x5c, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, 0x7b, + 0x7b, 0x6e, 0x7b, 0x00, 0x13, 0x00, 0x00, 0x00, + 0x00, 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0xf2, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00, + 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, + 0x00, 0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x13, 0x00, 0x00, 0x00, 0x00, + 0xb3, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf2, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x20, 0x00, 0xef, 0xff, 0x00, 0x09, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x7a, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, + 0x00, 0x09, 0x00, 0x00, 0x00, 0x64, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xfc +}; +static const u8 output26[] __initconst = { + 0x33, 0x20, 0x5b, 0xbf, 0x9e, 0x9f, 0x8f, 0x72, + 0x12, 0xab, 0x9e, 0x2a, 0xb9, 0xb7, 0xe4, 0xa5 +}; +static const u8 key26[] __initconst = { + 0x00, 0xff, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x1e, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x7b, 0x7b +}; + +static const u8 input27[] __initconst = { + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0xff, 0xff, 0xff, 0xe9, + 0xe9, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, + 0xac, 0xac, 0xac, 0xac, 0x00, 0x00, 0xac, 0xac, + 0xec, 0x01, 0x00, 0xac, 0xac, 0xac, 0x2c, 0xac, + 0xa2, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, 0xac, + 0xac, 0xac, 0xac, 0xac, 0x64, 0xf2 +}; +static const u8 output27[] __initconst = { + 0x02, 0xee, 0x7c, 0x8c, 0x54, 0x6d, 0xde, 0xb1, + 0xa4, 0x67, 0xe4, 0xc3, 0x98, 0x11, 0x58, 0xb9 +}; +static const u8 key27[] __initconst = { + 0x00, 0x00, 0x00, 0x7f, 0x00, 0x00, 0x00, 0x7f, + 0x01, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0xcf, 0x77, 0x77, 0x77, 0x77, 0x77, + 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77, 0x77 +}; + +/* nacl */ +static const u8 input28[] __initconst = { + 0x8e, 0x99, 0x3b, 0x9f, 0x48, 0x68, 0x12, 0x73, + 0xc2, 0x96, 0x50, 0xba, 0x32, 0xfc, 0x76, 0xce, + 0x48, 0x33, 0x2e, 0xa7, 0x16, 0x4d, 0x96, 0xa4, + 0x47, 0x6f, 0xb8, 0xc5, 0x31, 0xa1, 0x18, 0x6a, + 0xc0, 0xdf, 0xc1, 0x7c, 0x98, 0xdc, 0xe8, 0x7b, + 0x4d, 0xa7, 0xf0, 0x11, 0xec, 0x48, 0xc9, 0x72, + 0x71, 0xd2, 0xc2, 0x0f, 0x9b, 0x92, 0x8f, 0xe2, + 0x27, 0x0d, 0x6f, 0xb8, 0x63, 0xd5, 0x17, 0x38, + 0xb4, 0x8e, 0xee, 0xe3, 0x14, 0xa7, 0xcc, 0x8a, + 0xb9, 0x32, 0x16, 0x45, 0x48, 0xe5, 0x26, 0xae, + 0x90, 0x22, 0x43, 0x68, 0x51, 0x7a, 0xcf, 0xea, + 0xbd, 0x6b, 0xb3, 0x73, 0x2b, 0xc0, 0xe9, 0xda, + 0x99, 0x83, 0x2b, 0x61, 0xca, 0x01, 0xb6, 0xde, + 0x56, 0x24, 0x4a, 0x9e, 0x88, 0xd5, 0xf9, 0xb3, + 0x79, 0x73, 0xf6, 0x22, 0xa4, 0x3d, 0x14, 0xa6, + 0x59, 0x9b, 0x1f, 0x65, 0x4c, 0xb4, 0x5a, 0x74, + 0xe3, 0x55, 0xa5 +}; +static const u8 output28[] __initconst = { + 0xf3, 0xff, 0xc7, 0x70, 0x3f, 0x94, 0x00, 0xe5, + 0x2a, 0x7d, 0xfb, 0x4b, 0x3d, 0x33, 0x05, 0xd9 +}; +static const u8 key28[] __initconst = { + 0xee, 0xa6, 0xa7, 0x25, 0x1c, 0x1e, 0x72, 0x91, + 0x6d, 0x11, 0xc2, 0xcb, 0x21, 0x4d, 0x3c, 0x25, + 0x25, 0x39, 0x12, 0x1d, 0x8e, 0x23, 0x4e, 0x65, + 0x2d, 0x65, 0x1f, 0xa4, 0xc8, 0xcf, 0xf8, 0x80 +}; + +/* wrap 2^130-5 */ +static const u8 input29[] __initconst = { + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +}; +static const u8 output29[] __initconst = { + 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key29[] __initconst = { + 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* wrap 2^128 */ +static const u8 input30[] __initconst = { + 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 output30[] __initconst = { + 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key30[] __initconst = { + 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +}; + +/* limb carry */ +static const u8 input31[] __initconst = { + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0x11, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 output31[] __initconst = { + 0x05, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key31[] __initconst = { + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* 2^130-5 */ +static const u8 input32[] __initconst = { + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xfb, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, + 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, 0xfe, + 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, + 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01, 0x01 +}; +static const u8 output32[] __initconst = { + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key32[] __initconst = { + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* 2^130-6 */ +static const u8 input33[] __initconst = { + 0xfd, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +}; +static const u8 output33[] __initconst = { + 0xfa, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff +}; +static const u8 key33[] __initconst = { + 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* 5*H+L reduction intermediate */ +static const u8 input34[] __initconst = { + 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd, + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 output34[] __initconst = { + 0x14, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x55, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key34[] __initconst = { + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +/* 5*H+L reduction final */ +static const u8 input35[] __initconst = { + 0xe3, 0x35, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0xb9, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x33, 0x94, 0xd7, 0x50, 0x5e, 0x43, 0x79, 0xcd, + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 output35[] __initconst = { + 0x13, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; +static const u8 key35[] __initconst = { + 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 +}; + +static const struct poly1305_testvec poly1305_testvecs[] __initconst = { + { input01, output01, key01, sizeof(input01) }, + { input02, output02, key02, sizeof(input02) }, + { input03, output03, key03, sizeof(input03) }, + { input04, output04, key04, sizeof(input04) }, + { input05, output05, key05, sizeof(input05) }, + { input06, output06, key06, sizeof(input06) }, + { input07, output07, key07, sizeof(input07) }, + { input08, output08, key08, sizeof(input08) }, + { input09, output09, key09, sizeof(input09) }, + { input10, output10, key10, sizeof(input10) }, + { input11, output11, key11, sizeof(input11) }, + { input12, output12, key12, sizeof(input12) }, + { input13, output13, key13, sizeof(input13) }, + { input14, output14, key14, sizeof(input14) }, + { input15, output15, key15, sizeof(input15) }, + { input16, output16, key16, sizeof(input16) }, + { input17, output17, key17, sizeof(input17) }, + { input18, output18, key18, sizeof(input18) }, + { input19, output19, key19, sizeof(input19) }, + { input20, output20, key20, sizeof(input20) }, + { input21, output21, key21, sizeof(input21) }, + { input22, output22, key22, sizeof(input22) }, + { input23, output23, key23, sizeof(input23) }, + { input24, output24, key24, sizeof(input24) }, + { input25, output25, key25, sizeof(input25) }, + { input26, output26, key26, sizeof(input26) }, + { input27, output27, key27, sizeof(input27) }, + { input28, output28, key28, sizeof(input28) }, + { input29, output29, key29, sizeof(input29) }, + { input30, output30, key30, sizeof(input30) }, + { input31, output31, key31, sizeof(input31) }, + { input32, output32, key32, sizeof(input32) }, + { input33, output33, key33, sizeof(input33) }, + { input34, output34, key34, sizeof(input34) }, + { input35, output35, key35, sizeof(input35) } +}; + +static bool __init poly1305_selftest(void) +{ + simd_context_t simd_context; + bool success = true; + size_t i, j; + + simd_get(&simd_context); + for (i = 0; i < ARRAY_SIZE(poly1305_testvecs); ++i) { + struct poly1305_ctx poly1305; + u8 out[POLY1305_MAC_SIZE]; + + memset(out, 0, sizeof(out)); + memset(&poly1305, 0, sizeof(poly1305)); + poly1305_init(&poly1305, poly1305_testvecs[i].key); + poly1305_update(&poly1305, poly1305_testvecs[i].input, + poly1305_testvecs[i].ilen, &simd_context); + poly1305_final(&poly1305, out, &simd_context); + if (memcmp(out, poly1305_testvecs[i].output, + POLY1305_MAC_SIZE)) { + pr_info("poly1305 self-test %zu: FAIL\n", i + 1); + success = false; + } + simd_relax(&simd_context); + + if (poly1305_testvecs[i].ilen <= 1) + continue; + + for (j = 1; j < poly1305_testvecs[i].ilen - 1; ++j) { + memset(out, 0, sizeof(out)); + memset(&poly1305, 0, sizeof(poly1305)); + poly1305_init(&poly1305, poly1305_testvecs[i].key); + poly1305_update(&poly1305, poly1305_testvecs[i].input, + j, &simd_context); + poly1305_update(&poly1305, + poly1305_testvecs[i].input + j, + poly1305_testvecs[i].ilen - j, + &simd_context); + poly1305_final(&poly1305, out, &simd_context); + if (memcmp(out, poly1305_testvecs[i].output, + POLY1305_MAC_SIZE)) { + pr_info("poly1305 self-test %zu (split %zu): FAIL\n", + i + 1, j); + success = false; + } + + memset(out, 0, sizeof(out)); + memset(&poly1305, 0, sizeof(poly1305)); + poly1305_init(&poly1305, poly1305_testvecs[i].key); + poly1305_update(&poly1305, poly1305_testvecs[i].input, + j, &simd_context); + poly1305_update(&poly1305, + poly1305_testvecs[i].input + j, + poly1305_testvecs[i].ilen - j, + (simd_context_t []){ HAVE_NO_SIMD }); + poly1305_final(&poly1305, out, &simd_context); + if (memcmp(out, poly1305_testvecs[i].output, + POLY1305_MAC_SIZE)) { + pr_info("poly1305 self-test %zu (split %zu, mixed simd): FAIL\n", + i + 1, j); + success = false; + } + simd_relax(&simd_context); + } + } + simd_put(&simd_context); + + if (success) + pr_info("poly1305 self-tests: pass\n"); + + return success; +} +#endif From patchwork Tue Sep 25 14:56:10 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147495 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830514lji; Tue, 25 Sep 2018 07:57:17 -0700 (PDT) X-Google-Smtp-Source: ACcGV63RCpDQfc8PUmBAT2pyVE8cK5vudq2DMBSDOMF5UOw/VvPZeBh+p9Yyb0BnZmCn0l6SE5qx X-Received: by 2002:a63:2348:: with SMTP id u8-v6mr1508914pgm.122.1537887437153; Tue, 25 Sep 2018 07:57:17 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887437; cv=none; d=google.com; s=arc-20160816; b=dbc6mOlt8q0I3+2qjc3+nx/qe+8xATmNifF/4DjzWNuY44EXHAqbBa5AC4dLykTszz xlSNlDM7M3SG5WiWVnhdBOlQuHQdtzxA5pT9YB63/9k0Gh9wP++OlLLbwxW30qJKywmq gH79DoHjBCXpxYHdv9DvaeCOvltAM5vWWm0fC70FE3QpuORHiDwCYkNET8MoFq3JBlN3 3rfoq1W6RL5DsnPGRUXy53nB96u4EQlgDmCwWxsEhLfxYYysolLeoVuuTcRgNurBJgeu YBXxGN6ittDHvGsIOoYkTFAQ+bGeLMMnFf3+ZwTfzKiOfyLUpxr61IoIBi72XYgy1I49 r2mQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=2H0+R2IR4NUzy0jgGgK+p3GdB/JtMO8T3qrEKquF+wA=; b=Z/qFuHtRIU19RJ8ZrZcKHReQNGYZljmUUyAPnYxrXyOodbrRss345IN1tYKdy9tshV nT/8kQBB5CLCewVyzUex7CXlhjmPaWbEnmMvxdF2PE8GJPlgDBrMijjgH9TrfBb9noI1 N3xx6oyEwp15FkbweTrKzkxXADohOszRJS8ZxbvyfyRxEZ/RMHxUxx90p9f8nItb9l6G rUgr6RB7C8yOH8zJ1j8zxPO488TP63MGyXKoGviAQtwwrVAMptbY0f6CTJK/ciZKhLjK IZG/Xzx73iTVkWhh3W5a2SPqhqj4mujSJ+rjjbRe4mh5hesG4qK0w9rmaWz0C39TpsCC V5Rg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=aOu8h1BJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id t1-v6si2577158pgf.262.2018.09.25.07.57.16; Tue, 25 Sep 2018 07:57:17 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=aOu8h1BJ; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729801AbeIYVFH (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:07 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729224AbeIYVFD (ORCPT ); Tue, 25 Sep 2018 17:05:03 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 6e621a9e; Tue, 25 Sep 2018 14:38:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=3GX/EH/nqz8d2Qki5agxxK39K ZM=; b=aOu8h1BJs4XB8EzJFWnv+bwdDwEE553UV5vyd7vAvLOwiJChIvWIriq4q cFttROYRj/ibw34ShvunIBuJUUUUHuVdwwgTrUjHnuWOtYO7MOKtYThpMYdBD6Rc nYZzfm7ppmC2gQLXcSU1E/zx8ttiRlPMmosynHFeOOzCzVw2aJO6iWlgTa1vmjRV Jpez5x6k2NQQ3nJAlwgHO12ldx5S8Sn1CkYFzHk+v7cALkHIpJsY3xwi/icbVC/J MCpXT0W05+iuSxQm4ydN0ItNw6w5FrXcbI/ubp5bl+/Tk+gk/ThsjjUlbeW0fpMy QCG28oiCWfLlao+OMQqr9a4pt6uVg== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 9388fa5f (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:33 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Andy Polyakov , Russell King , linux-arm-kernel@lists.infradead.org Subject: [PATCH net-next v6 11/23] zinc: import Andy Polyakov's Poly1305 ARM and ARM64 implementations Date: Tue, 25 Sep 2018 16:56:10 +0200 Message-Id: <20180925145622.29959-12-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These NEON and non-NEON implementations come from Andy Polyakov's implementation, and are included here in raw form without modification, so that subsequent commits that fix these up for the kernel can see how it has changed. This awkward commit splitting has been requested for the ARM[64] implementations in particular. While this is CRYPTOGAMS code, the originating code for this happens to be the same as OpenSSL's commit 5bb1cd2292b388263a0cc05392bb99141212aa53 Signed-off-by: Jason A. Donenfeld Based-on-code-from: Andy Polyakov Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Andy Polyakov Cc: Russell King Cc: linux-arm-kernel@lists.infradead.org --- lib/zinc/poly1305/poly1305-arm-cryptogams.S | 1172 +++++++++++++++++ lib/zinc/poly1305/poly1305-arm64-cryptogams.S | 869 ++++++++++++ 2 files changed, 2041 insertions(+) create mode 100644 lib/zinc/poly1305/poly1305-arm-cryptogams.S create mode 100644 lib/zinc/poly1305/poly1305-arm64-cryptogams.S -- 2.19.0 diff --git a/lib/zinc/poly1305/poly1305-arm-cryptogams.S b/lib/zinc/poly1305/poly1305-arm-cryptogams.S new file mode 100644 index 000000000000..884b465030e4 --- /dev/null +++ b/lib/zinc/poly1305/poly1305-arm-cryptogams.S @@ -0,0 +1,1172 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ +/* + * Copyright (C) 2006-2017 CRYPTOGAMS by . All Rights Reserved. + */ + +#include "arm_arch.h" + +.text +#if defined(__thumb2__) +.syntax unified +.thumb +#else +.code 32 +#endif + +.globl poly1305_emit +.globl poly1305_blocks +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: +.Lpoly1305_init: + stmdb sp!,{r4-r11} + + eor r3,r3,r3 + cmp r1,#0 + str r3,[r0,#0] @ zero hash value + str r3,[r0,#4] + str r3,[r0,#8] + str r3,[r0,#12] + str r3,[r0,#16] + str r3,[r0,#36] @ is_base2_26 + add r0,r0,#20 + +#ifdef __thumb2__ + it eq +#endif + moveq r0,#0 + beq .Lno_key + +#if __ARM_MAX_ARCH__>=7 + adr r11,.Lpoly1305_init + ldr r12,.LOPENSSL_armcap +#endif + ldrb r4,[r1,#0] + mov r10,#0x0fffffff + ldrb r5,[r1,#1] + and r3,r10,#-4 @ 0x0ffffffc + ldrb r6,[r1,#2] + ldrb r7,[r1,#3] + orr r4,r4,r5,lsl#8 + ldrb r5,[r1,#4] + orr r4,r4,r6,lsl#16 + ldrb r6,[r1,#5] + orr r4,r4,r7,lsl#24 + ldrb r7,[r1,#6] + and r4,r4,r10 + +#if __ARM_MAX_ARCH__>=7 + ldr r12,[r11,r12] @ OPENSSL_armcap_P +# ifdef __APPLE__ + ldr r12,[r12] +# endif +#endif + ldrb r8,[r1,#7] + orr r5,r5,r6,lsl#8 + ldrb r6,[r1,#8] + orr r5,r5,r7,lsl#16 + ldrb r7,[r1,#9] + orr r5,r5,r8,lsl#24 + ldrb r8,[r1,#10] + and r5,r5,r3 + +#if __ARM_MAX_ARCH__>=7 + tst r12,#ARMV7_NEON @ check for NEON +# ifdef __APPLE__ + adr r9,poly1305_blocks_neon + adr r11,poly1305_blocks +# ifdef __thumb2__ + it ne +# endif + movne r11,r9 + adr r12,poly1305_emit + adr r10,poly1305_emit_neon +# ifdef __thumb2__ + it ne +# endif + movne r12,r10 +# else +# ifdef __thumb2__ + itete eq +# endif + addeq r12,r11,#(poly1305_emit-.Lpoly1305_init) + addne r12,r11,#(poly1305_emit_neon-.Lpoly1305_init) + addeq r11,r11,#(poly1305_blocks-.Lpoly1305_init) + addne r11,r11,#(poly1305_blocks_neon-.Lpoly1305_init) +# endif +# ifdef __thumb2__ + orr r12,r12,#1 @ thumb-ify address + orr r11,r11,#1 +# endif +#endif + ldrb r9,[r1,#11] + orr r6,r6,r7,lsl#8 + ldrb r7,[r1,#12] + orr r6,r6,r8,lsl#16 + ldrb r8,[r1,#13] + orr r6,r6,r9,lsl#24 + ldrb r9,[r1,#14] + and r6,r6,r3 + + ldrb r10,[r1,#15] + orr r7,r7,r8,lsl#8 + str r4,[r0,#0] + orr r7,r7,r9,lsl#16 + str r5,[r0,#4] + orr r7,r7,r10,lsl#24 + str r6,[r0,#8] + and r7,r7,r3 + str r7,[r0,#12] +#if __ARM_MAX_ARCH__>=7 + stmia r2,{r11,r12} @ fill functions table + mov r0,#1 +#else + mov r0,#0 +#endif +.Lno_key: + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_init,.-poly1305_init +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: +.Lpoly1305_blocks: + stmdb sp!,{r3-r11,lr} + + ands r2,r2,#-16 + beq .Lno_data + + cmp r3,#0 + add r2,r2,r1 @ end pointer + sub sp,sp,#32 + + ldmia r0,{r4-r12} @ load context + + str r0,[sp,#12] @ offload stuff + mov lr,r1 + str r2,[sp,#16] + str r10,[sp,#20] + str r11,[sp,#24] + str r12,[sp,#28] + b .Loop + +.Loop: +#if __ARM_ARCH__<7 + ldrb r0,[lr],#16 @ load input +# ifdef __thumb2__ + it hi +# endif + addhi r8,r8,#1 @ 1<<128 + ldrb r1,[lr,#-15] + ldrb r2,[lr,#-14] + ldrb r3,[lr,#-13] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-12] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-11] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-10] + adds r4,r4,r3 @ accumulate input + + ldrb r3,[lr,#-9] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-8] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-7] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-6] + adcs r5,r5,r3 + + ldrb r3,[lr,#-5] + orr r1,r0,r1,lsl#8 + ldrb r0,[lr,#-4] + orr r2,r1,r2,lsl#16 + ldrb r1,[lr,#-3] + orr r3,r2,r3,lsl#24 + ldrb r2,[lr,#-2] + adcs r6,r6,r3 + + ldrb r3,[lr,#-1] + orr r1,r0,r1,lsl#8 + str lr,[sp,#8] @ offload input pointer + orr r2,r1,r2,lsl#16 + add r10,r10,r10,lsr#2 + orr r3,r2,r3,lsl#24 +#else + ldr r0,[lr],#16 @ load input +# ifdef __thumb2__ + it hi +# endif + addhi r8,r8,#1 @ padbit + ldr r1,[lr,#-12] + ldr r2,[lr,#-8] + ldr r3,[lr,#-4] +# ifdef __ARMEB__ + rev r0,r0 + rev r1,r1 + rev r2,r2 + rev r3,r3 +# endif + adds r4,r4,r0 @ accumulate input + str lr,[sp,#8] @ offload input pointer + adcs r5,r5,r1 + add r10,r10,r10,lsr#2 + adcs r6,r6,r2 +#endif + add r11,r11,r11,lsr#2 + adcs r7,r7,r3 + add r12,r12,r12,lsr#2 + + umull r2,r3,r5,r9 + adc r8,r8,#0 + umull r0,r1,r4,r9 + umlal r2,r3,r8,r10 + umlal r0,r1,r7,r10 + ldr r10,[sp,#20] @ reload r10 + umlal r2,r3,r6,r12 + umlal r0,r1,r5,r12 + umlal r2,r3,r7,r11 + umlal r0,r1,r6,r11 + umlal r2,r3,r4,r10 + str r0,[sp,#0] @ future r4 + mul r0,r11,r8 + ldr r11,[sp,#24] @ reload r11 + adds r2,r2,r1 @ d1+=d0>>32 + eor r1,r1,r1 + adc lr,r3,#0 @ future r6 + str r2,[sp,#4] @ future r5 + + mul r2,r12,r8 + eor r3,r3,r3 + umlal r0,r1,r7,r12 + ldr r12,[sp,#28] @ reload r12 + umlal r2,r3,r7,r9 + umlal r0,r1,r6,r9 + umlal r2,r3,r6,r10 + umlal r0,r1,r5,r10 + umlal r2,r3,r5,r11 + umlal r0,r1,r4,r11 + umlal r2,r3,r4,r12 + ldr r4,[sp,#0] + mul r8,r9,r8 + ldr r5,[sp,#4] + + adds r6,lr,r0 @ d2+=d1>>32 + ldr lr,[sp,#8] @ reload input pointer + adc r1,r1,#0 + adds r7,r2,r1 @ d3+=d2>>32 + ldr r0,[sp,#16] @ reload end pointer + adc r3,r3,#0 + add r8,r8,r3 @ h4+=d3>>32 + + and r1,r8,#-4 + and r8,r8,#3 + add r1,r1,r1,lsr#2 @ *=5 + adds r4,r4,r1 + adcs r5,r5,#0 + adcs r6,r6,#0 + adcs r7,r7,#0 + adc r8,r8,#0 + + cmp r0,lr @ done yet? + bhi .Loop + + ldr r0,[sp,#12] + add sp,sp,#32 + stmia r0,{r4-r8} @ store the result + +.Lno_data: +#if __ARM_ARCH__>=5 + ldmia sp!,{r3-r11,pc} +#else + ldmia sp!,{r3-r11,lr} + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_blocks,.-poly1305_blocks +.type poly1305_emit,%function +.align 5 +poly1305_emit: + stmdb sp!,{r4-r11} +.Lpoly1305_emit_enter: + + ldmia r0,{r3-r7} + adds r8,r3,#5 @ compare to modulus + adcs r9,r4,#0 + adcs r10,r5,#0 + adcs r11,r6,#0 + adc r7,r7,#0 + tst r7,#4 @ did it carry/borrow? + +#ifdef __thumb2__ + it ne +#endif + movne r3,r8 + ldr r8,[r2,#0] +#ifdef __thumb2__ + it ne +#endif + movne r4,r9 + ldr r9,[r2,#4] +#ifdef __thumb2__ + it ne +#endif + movne r5,r10 + ldr r10,[r2,#8] +#ifdef __thumb2__ + it ne +#endif + movne r6,r11 + ldr r11,[r2,#12] + + adds r3,r3,r8 + adcs r4,r4,r9 + adcs r5,r5,r10 + adc r6,r6,r11 + +#if __ARM_ARCH__>=7 +# ifdef __ARMEB__ + rev r3,r3 + rev r4,r4 + rev r5,r5 + rev r6,r6 +# endif + str r3,[r1,#0] + str r4,[r1,#4] + str r5,[r1,#8] + str r6,[r1,#12] +#else + strb r3,[r1,#0] + mov r3,r3,lsr#8 + strb r4,[r1,#4] + mov r4,r4,lsr#8 + strb r5,[r1,#8] + mov r5,r5,lsr#8 + strb r6,[r1,#12] + mov r6,r6,lsr#8 + + strb r3,[r1,#1] + mov r3,r3,lsr#8 + strb r4,[r1,#5] + mov r4,r4,lsr#8 + strb r5,[r1,#9] + mov r5,r5,lsr#8 + strb r6,[r1,#13] + mov r6,r6,lsr#8 + + strb r3,[r1,#2] + mov r3,r3,lsr#8 + strb r4,[r1,#6] + mov r4,r4,lsr#8 + strb r5,[r1,#10] + mov r5,r5,lsr#8 + strb r6,[r1,#14] + mov r6,r6,lsr#8 + + strb r3,[r1,#3] + strb r4,[r1,#7] + strb r5,[r1,#11] + strb r6,[r1,#15] +#endif + ldmia sp!,{r4-r11} +#if __ARM_ARCH__>=5 + bx lr @ bx lr +#else + tst lr,#1 + moveq pc,lr @ be binary compatible with V4, yet + .word 0xe12fff1e @ interoperable with Thumb ISA:-) +#endif +.size poly1305_emit,.-poly1305_emit +#if __ARM_MAX_ARCH__>=7 +.fpu neon + +.type poly1305_init_neon,%function +.align 5 +poly1305_init_neon: + ldr r4,[r0,#20] @ load key base 2^32 + ldr r5,[r0,#24] + ldr r6,[r0,#28] + ldr r7,[r0,#32] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + and r3,r3,#0x03ffffff + and r4,r4,#0x03ffffff + and r5,r5,#0x03ffffff + + vdup.32 d0,r2 @ r^1 in both lanes + add r2,r3,r3,lsl#2 @ *5 + vdup.32 d1,r3 + add r3,r4,r4,lsl#2 + vdup.32 d2,r2 + vdup.32 d3,r4 + add r4,r5,r5,lsl#2 + vdup.32 d4,r3 + vdup.32 d5,r5 + add r5,r6,r6,lsl#2 + vdup.32 d6,r4 + vdup.32 d7,r6 + vdup.32 d8,r5 + + mov r5,#2 @ counter + +.Lsquare_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + + vmull.u32 q5,d0,d0[1] + vmull.u32 q6,d1,d0[1] + vmull.u32 q7,d3,d0[1] + vmull.u32 q8,d5,d0[1] + vmull.u32 q9,d7,d0[1] + + vmlal.u32 q5,d7,d2[1] + vmlal.u32 q6,d0,d1[1] + vmlal.u32 q7,d1,d1[1] + vmlal.u32 q8,d3,d1[1] + vmlal.u32 q9,d5,d1[1] + + vmlal.u32 q5,d5,d4[1] + vmlal.u32 q6,d7,d4[1] + vmlal.u32 q8,d1,d3[1] + vmlal.u32 q7,d0,d3[1] + vmlal.u32 q9,d3,d3[1] + + vmlal.u32 q5,d3,d6[1] + vmlal.u32 q8,d0,d5[1] + vmlal.u32 q6,d5,d6[1] + vmlal.u32 q7,d7,d6[1] + vmlal.u32 q9,d1,d5[1] + + vmlal.u32 q8,d7,d8[1] + vmlal.u32 q5,d1,d8[1] + vmlal.u32 q6,d3,d8[1] + vmlal.u32 q7,d5,d8[1] + vmlal.u32 q9,d0,d7[1] + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + @ and P. Schwabe + @ + @ H0>>+H1>>+H2>>+H3>>+H4 + @ H3>>+H4>>*5+H0>>+H1 + @ + @ Trivia. + @ + @ Result of multiplication of n-bit number by m-bit number is + @ n+m bits wide. However! Even though 2^n is a n+1-bit number, + @ m-bit number multiplied by 2^n is still n+m bits wide. + @ + @ Sum of two n-bit numbers is n+1 bits wide, sum of three - n+2, + @ and so is sum of four. Sum of 2^m n-m-bit numbers and n-bit + @ one is n+1 bits wide. + @ + @ >>+ denotes Hnext += Hn>>26, Hn &= 0x3ffffff. This means that + @ H0, H2, H3 are guaranteed to be 26 bits wide, while H1 and H4 + @ can be 27. However! In cases when their width exceeds 26 bits + @ they are limited by 2^26+2^6. This in turn means that *sum* + @ of the products with these values can still be viewed as sum + @ of 52-bit numbers as long as the amount of addends is not a + @ power of 2. For example, + @ + @ H4 = H4*R0 + H3*R1 + H2*R2 + H1*R3 + H0 * R4, + @ + @ which can't be larger than 5 * (2^26 + 2^6) * (2^26 + 2^6), or + @ 5 * (2^52 + 2*2^32 + 2^12), which in turn is smaller than + @ 8 * (2^52) or 2^55. However, the value is then multiplied by + @ by 5, so we should be looking at 5 * 5 * (2^52 + 2^33 + 2^12), + @ which is less than 32 * (2^52) or 2^57. And when processing + @ data we are looking at triple as many addends... + @ + @ In key setup procedure pre-reduced H0 is limited by 5*4+1 and + @ 5*H4 - by 5*5 52-bit addends, or 57 bits. But when hashing the + @ input H0 is limited by (5*4+1)*3 addends, or 58 bits, while + @ 5*H4 by 5*5*3, or 59[!] bits. How is this relevant? vmlal.u32 + @ instruction accepts 2x32-bit input and writes 2x64-bit result. + @ This means that result of reduction have to be compressed upon + @ loop wrap-around. This can be done in the process of reduction + @ to minimize amount of instructions [as well as amount of + @ 128-bit instructions, which benefits low-end processors], but + @ one has to watch for H2 (which is narrower than H0) and 5*H4 + @ not being wider than 58 bits, so that result of right shift + @ by 26 bits fits in 32 bits. This is also useful on x86, + @ because it allows to use paddd in place for paddq, which + @ benefits Atom, where paddq is ridiculously slow. + + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 @ &=0x03ffffff + vadd.i64 q6,q6,q4 @ h0 -> h1 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vbic.i32 d18,#0xfc000000 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vadd.i32 d10,d10,d30 @ h4 -> h0 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vbic.i32 d14,#0xfc000000 + + vshr.u32 d30,d10,#26 + vbic.i32 d10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + + subs r5,r5,#1 + beq .Lsquare_break_neon + + add r6,r0,#(48+0*9*4) + add r7,r0,#(48+1*9*4) + + vtrn.32 d0,d10 @ r^2:r^1 + vtrn.32 d3,d14 + vtrn.32 d5,d16 + vtrn.32 d1,d12 + vtrn.32 d7,d18 + + vshl.u32 d4,d3,#2 @ *5 + vshl.u32 d6,d5,#2 + vshl.u32 d2,d1,#2 + vshl.u32 d8,d7,#2 + vadd.i32 d4,d4,d3 + vadd.i32 d2,d2,d1 + vadd.i32 d6,d6,d5 + vadd.i32 d8,d8,d7 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6,:32] + vst1.32 {d8[1]},[r7,:32] + + b .Lsquare_neon + +.align 4 +.Lsquare_break_neon: + add r6,r0,#(48+2*4*9) + add r7,r0,#(48+3*4*9) + + vmov d0,d10 @ r^4:r^3 + vshl.u32 d2,d12,#2 @ *5 + vmov d1,d12 + vshl.u32 d4,d14,#2 + vmov d3,d14 + vshl.u32 d6,d16,#2 + vmov d5,d16 + vshl.u32 d8,d18,#2 + vmov d7,d18 + vadd.i32 d2,d2,d12 + vadd.i32 d4,d4,d14 + vadd.i32 d6,d6,d16 + vadd.i32 d8,d8,d18 + + vst4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! + vst4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! + vst4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vst4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vst1.32 {d8[0]},[r6] + vst1.32 {d8[1]},[r7] + + bx lr @ bx lr +.size poly1305_init_neon,.-poly1305_init_neon + +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: + ldr ip,[r0,#36] @ is_base2_26 + ands r2,r2,#-16 + beq .Lno_data_neon + + cmp r2,#64 + bhs .Lenter_neon + tst ip,ip @ is_base2_26? + beq .Lpoly1305_blocks + +.Lenter_neon: + stmdb sp!,{r4-r7} + vstmdb sp!,{d8-d15} @ ABI specification says so + + tst ip,ip @ is_base2_26? + bne .Lbase2_26_neon + + stmdb sp!,{r1-r3,lr} + bl poly1305_init_neon + + ldr r4,[r0,#0] @ load hash value base 2^32 + ldr r5,[r0,#4] + ldr r6,[r0,#8] + ldr r7,[r0,#12] + ldr ip,[r0,#16] + + and r2,r4,#0x03ffffff @ base 2^32 -> base 2^26 + mov r3,r4,lsr#26 + veor d10,d10,d10 + mov r4,r5,lsr#20 + orr r3,r3,r5,lsl#6 + veor d12,d12,d12 + mov r5,r6,lsr#14 + orr r4,r4,r6,lsl#12 + veor d14,d14,d14 + mov r6,r7,lsr#8 + orr r5,r5,r7,lsl#18 + veor d16,d16,d16 + and r3,r3,#0x03ffffff + orr r6,r6,ip,lsl#24 + veor d18,d18,d18 + and r4,r4,#0x03ffffff + mov r1,#1 + and r5,r5,#0x03ffffff + str r1,[r0,#36] @ is_base2_26 + + vmov.32 d10[0],r2 + vmov.32 d12[0],r3 + vmov.32 d14[0],r4 + vmov.32 d16[0],r5 + vmov.32 d18[0],r6 + adr r5,.Lzeros + + ldmia sp!,{r1-r3,lr} + b .Lbase2_32_neon + +.align 4 +.Lbase2_26_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ load hash value + + veor d10,d10,d10 + veor d12,d12,d12 + veor d14,d14,d14 + veor d16,d16,d16 + veor d18,d18,d18 + vld4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + adr r5,.Lzeros + vld1.32 {d18[0]},[r0] + sub r0,r0,#16 @ rewind + +.Lbase2_32_neon: + add r4,r1,#32 + mov r3,r3,lsl#24 + tst r2,#31 + beq .Leven + + vld4.32 {d20[0],d22[0],d24[0],d26[0]},[r1]! + vmov.32 d28[0],r3 + sub r2,r2,#16 + add r4,r1,#32 + +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +# endif + vsri.u32 d28,d26,#8 @ base 2^32 -> base 2^26 + vshl.u32 d26,d26,#18 + + vsri.u32 d26,d24,#14 + vshl.u32 d24,d24,#12 + vadd.i32 d29,d28,d18 @ add hash value and move to #hi + + vbic.i32 d26,#0xfc000000 + vsri.u32 d24,d22,#20 + vshl.u32 d22,d22,#6 + + vbic.i32 d24,#0xfc000000 + vsri.u32 d22,d20,#26 + vadd.i32 d27,d26,d16 + + vbic.i32 d20,#0xfc000000 + vbic.i32 d22,#0xfc000000 + vadd.i32 d25,d24,d14 + + vadd.i32 d21,d20,d10 + vadd.i32 d23,d22,d12 + + mov r7,r5 + add r6,r0,#48 + + cmp r2,r2 + b .Long_tail + +.align 4 +.Leven: + subs r2,r2,#64 + it lo + movlo r4,r5 + + vmov.i32 q14,#1<<24 @ padbit, yes, always + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + itt hi + addhi r7,r0,#(48+1*9*4) + addhi r6,r0,#(48+3*9*4) + +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q13,q13 + vrev32.8 q11,q11 + vrev32.8 q12,q12 +# endif + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vshl.u32 q13,q13,#18 + + vsri.u32 q13,q12,#14 + vshl.u32 q12,q12,#12 + + vbic.i32 q13,#0xfc000000 + vsri.u32 q12,q11,#20 + vshl.u32 q11,q11,#6 + + vbic.i32 q12,#0xfc000000 + vsri.u32 q11,q10,#26 + + vbic.i32 q10,#0xfc000000 + vbic.i32 q11,#0xfc000000 + + bls .Lskip_loop + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^2 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + b .Loop_neon + +.align 5 +.Loop_neon: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + @ ___________________/ + @ ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + @ ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + @ ___________________/ ____________________/ + @ + @ Note that we start with inp[2:3]*r^2. This is because it + @ doesn't depend on reduction in previous iteration. + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ d4 = h4*r0 + h3*r1 + h2*r2 + h1*r3 + h0*r4 + @ d3 = h3*r0 + h2*r1 + h1*r2 + h0*r3 + h4*5*r4 + @ d2 = h2*r0 + h1*r1 + h0*r2 + h4*5*r3 + h3*5*r4 + @ d1 = h1*r0 + h0*r1 + h4*5*r2 + h3*5*r3 + h2*5*r4 + @ d0 = h0*r0 + h4*5*r1 + h3*5*r2 + h2*5*r3 + h1*5*r4 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ inp[2:3]*r^2 + + vadd.i32 d24,d24,d14 @ accumulate inp[0:1] + vmull.u32 q7,d25,d0[1] + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0[1] + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0[1] + vmlal.u32 q7,d23,d1[1] + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0[1] + + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0[1] + subs r2,r2,#64 + vmlal.u32 q5,d29,d2[1] + it lo + movlo r4,r5 + vmlal.u32 q8,d25,d1[1] + vld1.32 d8[1],[r7,:32] + vmlal.u32 q6,d21,d1[1] + vmlal.u32 q9,d27,d1[1] + + vmlal.u32 q5,d27,d4[1] + vmlal.u32 q8,d23,d3[1] + vmlal.u32 q9,d25,d3[1] + vmlal.u32 q6,d29,d4[1] + vmlal.u32 q7,d21,d3[1] + + vmlal.u32 q8,d21,d5[1] + vmlal.u32 q5,d25,d6[1] + vmlal.u32 q9,d23,d5[1] + vmlal.u32 q6,d27,d6[1] + vmlal.u32 q7,d29,d6[1] + + vmlal.u32 q8,d29,d8[1] + vmlal.u32 q5,d23,d8[1] + vmlal.u32 q9,d21,d7[1] + vmlal.u32 q6,d25,d8[1] + vmlal.u32 q7,d27,d8[1] + + vld4.32 {d21,d23,d25,d27},[r4] @ inp[2:3] (or 0) + add r4,r4,#64 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4 and accumulate + + vmlal.u32 q8,d26,d0[0] + vmlal.u32 q5,d20,d0[0] + vmlal.u32 q9,d28,d0[0] + vmlal.u32 q6,d22,d0[0] + vmlal.u32 q7,d24,d0[0] + vld1.32 d8[0],[r6,:32] + + vmlal.u32 q8,d24,d1[0] + vmlal.u32 q5,d28,d2[0] + vmlal.u32 q9,d26,d1[0] + vmlal.u32 q6,d20,d1[0] + vmlal.u32 q7,d22,d1[0] + + vmlal.u32 q8,d22,d3[0] + vmlal.u32 q5,d26,d4[0] + vmlal.u32 q9,d24,d3[0] + vmlal.u32 q6,d28,d4[0] + vmlal.u32 q7,d20,d3[0] + + vmlal.u32 q8,d20,d5[0] + vmlal.u32 q5,d24,d6[0] + vmlal.u32 q9,d22,d5[0] + vmlal.u32 q6,d26,d6[0] + vmlal.u32 q8,d28,d8[0] + + vmlal.u32 q7,d28,d6[0] + vmlal.u32 q5,d22,d8[0] + vmlal.u32 q9,d20,d7[0] + vmov.i32 q14,#1<<24 @ padbit, yes, always + vmlal.u32 q6,d24,d8[0] + vmlal.u32 q7,d26,d8[0] + + vld4.32 {d20,d22,d24,d26},[r1] @ inp[0:1] + add r1,r1,#64 +# ifdef __ARMEB__ + vrev32.8 q10,q10 + vrev32.8 q11,q11 + vrev32.8 q12,q12 + vrev32.8 q13,q13 +# endif + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction interleaved with base 2^32 -> base 2^26 of + @ inp[0:3] previously loaded to q10-q13 and smashed to q10-q14. + + vshr.u64 q15,q8,#26 + vmovn.i64 d16,q8 + vshr.u64 q4,q5,#26 + vmovn.i64 d10,q5 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vbic.i32 d16,#0xfc000000 + vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26 + vadd.i64 q6,q6,q4 @ h0 -> h1 + vshl.u32 q13,q13,#18 + vbic.i32 d10,#0xfc000000 + + vshrn.u64 d30,q9,#26 + vmovn.i64 d18,q9 + vshr.u64 q4,q6,#26 + vmovn.i64 d12,q6 + vadd.i64 q7,q7,q4 @ h1 -> h2 + vsri.u32 q13,q12,#14 + vbic.i32 d18,#0xfc000000 + vshl.u32 q12,q12,#12 + vbic.i32 d12,#0xfc000000 + + vadd.i32 d10,d10,d30 + vshl.u32 d30,d30,#2 + vbic.i32 q13,#0xfc000000 + vshrn.u64 d8,q7,#26 + vmovn.i64 d14,q7 + vaddl.u32 q5,d10,d30 @ h4 -> h0 [widen for a sec] + vsri.u32 q12,q11,#20 + vadd.i32 d16,d16,d8 @ h2 -> h3 + vshl.u32 q11,q11,#6 + vbic.i32 d14,#0xfc000000 + vbic.i32 q12,#0xfc000000 + + vshrn.u64 d30,q5,#26 @ re-narrow + vmovn.i64 d10,q5 + vsri.u32 q11,q10,#26 + vbic.i32 q10,#0xfc000000 + vshr.u32 d8,d16,#26 + vbic.i32 d16,#0xfc000000 + vbic.i32 d10,#0xfc000000 + vadd.i32 d12,d12,d30 @ h0 -> h1 + vadd.i32 d18,d18,d8 @ h3 -> h4 + vbic.i32 q11,#0xfc000000 + + bhi .Loop_neon + +.Lskip_loop: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + add r7,r0,#(48+0*9*4) + add r6,r0,#(48+1*9*4) + adds r2,r2,#32 + it ne + movne r2,#0 + bne .Long_tail + + vadd.i32 d25,d24,d14 @ add hash value and move to #hi + vadd.i32 d21,d20,d10 + vadd.i32 d27,d26,d16 + vadd.i32 d23,d22,d12 + vadd.i32 d29,d28,d18 + +.Long_tail: + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^1 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^2 + + vadd.i32 d24,d24,d14 @ can be redundant + vmull.u32 q7,d25,d0 + vadd.i32 d20,d20,d10 + vmull.u32 q5,d21,d0 + vadd.i32 d26,d26,d16 + vmull.u32 q8,d27,d0 + vadd.i32 d22,d22,d12 + vmull.u32 q6,d23,d0 + vadd.i32 d28,d28,d18 + vmull.u32 q9,d29,d0 + + vmlal.u32 q5,d29,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d25,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vmlal.u32 q6,d21,d1 + vmlal.u32 q9,d27,d1 + vmlal.u32 q7,d23,d1 + + vmlal.u32 q8,d23,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d27,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d25,d3 + vmlal.u32 q6,d29,d4 + vmlal.u32 q7,d21,d3 + + vmlal.u32 q8,d21,d5 + it ne + addne r7,r0,#(48+2*9*4) + vmlal.u32 q5,d25,d6 + it ne + addne r6,r0,#(48+3*9*4) + vmlal.u32 q9,d23,d5 + vmlal.u32 q6,d27,d6 + vmlal.u32 q7,d29,d6 + + vmlal.u32 q8,d29,d8 + vorn q0,q0,q0 @ all-ones, can be redundant + vmlal.u32 q5,d23,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d21,d7 + vmlal.u32 q6,d25,d8 + vmlal.u32 q7,d27,d8 + + beq .Lshort_tail + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ (hash+inp[0:1])*r^4:r^3 and accumulate + + vld4.32 {d0[1],d1[1],d2[1],d3[1]},[r7]! @ load r^3 + vld4.32 {d0[0],d1[0],d2[0],d3[0]},[r6]! @ load r^4 + + vmlal.u32 q7,d24,d0 + vmlal.u32 q5,d20,d0 + vmlal.u32 q8,d26,d0 + vmlal.u32 q6,d22,d0 + vmlal.u32 q9,d28,d0 + + vmlal.u32 q5,d28,d2 + vld4.32 {d4[1],d5[1],d6[1],d7[1]},[r7]! + vmlal.u32 q8,d24,d1 + vld4.32 {d4[0],d5[0],d6[0],d7[0]},[r6]! + vmlal.u32 q6,d20,d1 + vmlal.u32 q9,d26,d1 + vmlal.u32 q7,d22,d1 + + vmlal.u32 q8,d22,d3 + vld1.32 d8[1],[r7,:32] + vmlal.u32 q5,d26,d4 + vld1.32 d8[0],[r6,:32] + vmlal.u32 q9,d24,d3 + vmlal.u32 q6,d28,d4 + vmlal.u32 q7,d20,d3 + + vmlal.u32 q8,d20,d5 + vmlal.u32 q5,d24,d6 + vmlal.u32 q9,d22,d5 + vmlal.u32 q6,d26,d6 + vmlal.u32 q7,d28,d6 + + vmlal.u32 q8,d28,d8 + vorn q0,q0,q0 @ all-ones + vmlal.u32 q5,d22,d8 + vshr.u64 q0,q0,#38 + vmlal.u32 q9,d20,d7 + vmlal.u32 q6,d24,d8 + vmlal.u32 q7,d26,d8 + +.Lshort_tail: + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ horizontal addition + + vadd.i64 d16,d16,d17 + vadd.i64 d10,d10,d11 + vadd.i64 d18,d18,d19 + vadd.i64 d12,d12,d13 + vadd.i64 d14,d14,d15 + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ lazy reduction, but without narrowing + + vshr.u64 q15,q8,#26 + vand.i64 q8,q8,q0 + vshr.u64 q4,q5,#26 + vand.i64 q5,q5,q0 + vadd.i64 q9,q9,q15 @ h3 -> h4 + vadd.i64 q6,q6,q4 @ h0 -> h1 + + vshr.u64 q15,q9,#26 + vand.i64 q9,q9,q0 + vshr.u64 q4,q6,#26 + vand.i64 q6,q6,q0 + vadd.i64 q7,q7,q4 @ h1 -> h2 + + vadd.i64 q5,q5,q15 + vshl.u64 q15,q15,#2 + vshr.u64 q4,q7,#26 + vand.i64 q7,q7,q0 + vadd.i64 q5,q5,q15 @ h4 -> h0 + vadd.i64 q8,q8,q4 @ h2 -> h3 + + vshr.u64 q15,q5,#26 + vand.i64 q5,q5,q0 + vshr.u64 q4,q8,#26 + vand.i64 q8,q8,q0 + vadd.i64 q6,q6,q15 @ h0 -> h1 + vadd.i64 q9,q9,q4 @ h3 -> h4 + + cmp r2,#0 + bne .Leven + + @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ + @ store hash value + + vst4.32 {d10[0],d12[0],d14[0],d16[0]},[r0]! + vst1.32 {d18[0]},[r0] + + vldmia sp!,{d8-d15} @ epilogue + ldmia sp!,{r4-r7} +.Lno_data_neon: + bx lr @ bx lr +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.type poly1305_emit_neon,%function +.align 5 +poly1305_emit_neon: + ldr ip,[r0,#36] @ is_base2_26 + + stmdb sp!,{r4-r11} + + tst ip,ip + beq .Lpoly1305_emit_enter + + ldmia r0,{r3-r7} + eor r8,r8,r8 + + adds r3,r3,r4,lsl#26 @ base 2^26 -> base 2^32 + mov r4,r4,lsr#6 + adcs r4,r4,r5,lsl#20 + mov r5,r5,lsr#12 + adcs r5,r5,r6,lsl#14 + mov r6,r6,lsr#18 + adcs r6,r6,r7,lsl#8 + adc r7,r8,r7,lsr#24 @ can be partially reduced ... + + and r8,r7,#-4 @ ... so reduce + and r7,r6,#3 + add r8,r8,r8,lsr#2 @ *= 5 + adds r3,r3,r8 + adcs r4,r4,#0 + adcs r5,r5,#0 + adcs r6,r6,#0 + adc r7,r7,#0 + + adds r8,r3,#5 @ compare to modulus + adcs r9,r4,#0 + adcs r10,r5,#0 + adcs r11,r6,#0 + adc r7,r7,#0 + tst r7,#4 @ did it carry/borrow? + + it ne + movne r3,r8 + ldr r8,[r2,#0] + it ne + movne r4,r9 + ldr r9,[r2,#4] + it ne + movne r5,r10 + ldr r10,[r2,#8] + it ne + movne r6,r11 + ldr r11,[r2,#12] + + adds r3,r3,r8 @ accumulate nonce + adcs r4,r4,r9 + adcs r5,r5,r10 + adc r6,r6,r11 + +# ifdef __ARMEB__ + rev r3,r3 + rev r4,r4 + rev r5,r5 + rev r6,r6 +# endif + str r3,[r1,#0] @ store the result + str r4,[r1,#4] + str r5,[r1,#8] + str r6,[r1,#12] + + ldmia sp!,{r4-r11} + bx lr @ bx lr +.size poly1305_emit_neon,.-poly1305_emit_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 +.LOPENSSL_armcap: +.word OPENSSL_armcap_P-.Lpoly1305_init +#endif +.asciz "Poly1305 for ARMv4/NEON, CRYPTOGAMS by " +.align 2 +#if __ARM_MAX_ARCH__>=7 +.comm OPENSSL_armcap_P,4,4 +#endif diff --git a/lib/zinc/poly1305/poly1305-arm64-cryptogams.S b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S new file mode 100644 index 000000000000..0ecb50a83ec0 --- /dev/null +++ b/lib/zinc/poly1305/poly1305-arm64-cryptogams.S @@ -0,0 +1,869 @@ +/* SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause */ +/* + * Copyright (C) 2006-2017 CRYPTOGAMS by . All Rights Reserved. + */ + +#include "arm_arch.h" + +.text + +// forward "declarations" are required for Apple + +.globl poly1305_blocks +.globl poly1305_emit + +.globl poly1305_init +.type poly1305_init,%function +.align 5 +poly1305_init: + cmp x1,xzr + stp xzr,xzr,[x0] // zero hash value + stp xzr,xzr,[x0,#16] // [along with is_base2_26] + + csel x0,xzr,x0,eq + b.eq .Lno_key + +#ifdef __ILP32__ + ldrsw x11,.LOPENSSL_armcap_P +#else + ldr x11,.LOPENSSL_armcap_P +#endif + adr x10,.LOPENSSL_armcap_P + + ldp x7,x8,[x1] // load key + mov x9,#0xfffffffc0fffffff + movk x9,#0x0fff,lsl#48 + ldr w17,[x10,x11] +#ifdef __ARMEB__ + rev x7,x7 // flip bytes + rev x8,x8 +#endif + and x7,x7,x9 // &=0ffffffc0fffffff + and x9,x9,#-4 + and x8,x8,x9 // &=0ffffffc0ffffffc + stp x7,x8,[x0,#32] // save key value + + tst w17,#ARMV7_NEON + + adr x12,poly1305_blocks + adr x7,poly1305_blocks_neon + adr x13,poly1305_emit + adr x8,poly1305_emit_neon + + csel x12,x12,x7,eq + csel x13,x13,x8,eq + +#ifdef __ILP32__ + stp w12,w13,[x2] +#else + stp x12,x13,[x2] +#endif + + mov x0,#1 +.Lno_key: + ret +.size poly1305_init,.-poly1305_init + +.type poly1305_blocks,%function +.align 5 +poly1305_blocks: + ands x2,x2,#-16 + b.eq .Lno_data + + ldp x4,x5,[x0] // load hash value + ldp x7,x8,[x0,#32] // load key value + ldr x6,[x0,#16] + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + b .Loop + +.align 5 +.Loop: + ldp x10,x11,[x1],#16 // load input + sub x2,x2,#16 +#ifdef __ARMEB__ + rev x10,x10 + rev x11,x11 +#endif + adds x4,x4,x10 // accumulate input + adcs x5,x5,x11 + + mul x12,x4,x7 // h0*r0 + adc x6,x6,x3 + umulh x13,x4,x7 + + mul x10,x5,x9 // h1*5*r1 + umulh x11,x5,x9 + + adds x12,x12,x10 + mul x10,x4,x8 // h0*r1 + adc x13,x13,x11 + umulh x14,x4,x8 + + adds x13,x13,x10 + mul x10,x5,x7 // h1*r0 + adc x14,x14,xzr + umulh x11,x5,x7 + + adds x13,x13,x10 + mul x10,x6,x9 // h2*5*r1 + adc x14,x14,x11 + mul x11,x6,x7 // h2*r0 + + adds x13,x13,x10 + adc x14,x14,x11 + + and x10,x14,#-4 // final reduction + and x6,x14,#3 + add x10,x10,x14,lsr#2 + adds x4,x12,x10 + adcs x5,x13,xzr + adc x6,x6,xzr + + cbnz x2,.Loop + + stp x4,x5,[x0] // store hash value + str x6,[x0,#16] + +.Lno_data: + ret +.size poly1305_blocks,.-poly1305_blocks + +.type poly1305_emit,%function +.align 5 +poly1305_emit: + ldp x4,x5,[x0] // load hash base 2^64 + ldr x6,[x0,#16] + ldp x10,x11,[x2] // load nonce + + adds x12,x4,#5 // compare to modulus + adcs x13,x5,xzr + adc x14,x6,xzr + + tst x14,#-4 // see if it's carried/borrowed + + csel x4,x4,x12,eq + csel x5,x5,x13,eq + +#ifdef __ARMEB__ + ror x10,x10,#32 // flip nonce words + ror x11,x11,#32 +#endif + adds x4,x4,x10 // accumulate nonce + adc x5,x5,x11 +#ifdef __ARMEB__ + rev x4,x4 // flip output bytes + rev x5,x5 +#endif + stp x4,x5,[x1] // write result + + ret +.size poly1305_emit,.-poly1305_emit +.type poly1305_mult,%function +.align 5 +poly1305_mult: + mul x12,x4,x7 // h0*r0 + umulh x13,x4,x7 + + mul x10,x5,x9 // h1*5*r1 + umulh x11,x5,x9 + + adds x12,x12,x10 + mul x10,x4,x8 // h0*r1 + adc x13,x13,x11 + umulh x14,x4,x8 + + adds x13,x13,x10 + mul x10,x5,x7 // h1*r0 + adc x14,x14,xzr + umulh x11,x5,x7 + + adds x13,x13,x10 + mul x10,x6,x9 // h2*5*r1 + adc x14,x14,x11 + mul x11,x6,x7 // h2*r0 + + adds x13,x13,x10 + adc x14,x14,x11 + + and x10,x14,#-4 // final reduction + and x6,x14,#3 + add x10,x10,x14,lsr#2 + adds x4,x12,x10 + adcs x5,x13,xzr + adc x6,x6,xzr + + ret +.size poly1305_mult,.-poly1305_mult + +.type poly1305_splat,%function +.align 5 +poly1305_splat: + and x12,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x13,x4,#26,#26 + extr x14,x5,x4,#52 + and x14,x14,#0x03ffffff + ubfx x15,x5,#14,#26 + extr x16,x6,x5,#40 + + str w12,[x0,#16*0] // r0 + add w12,w13,w13,lsl#2 // r1*5 + str w13,[x0,#16*1] // r1 + add w13,w14,w14,lsl#2 // r2*5 + str w12,[x0,#16*2] // s1 + str w14,[x0,#16*3] // r2 + add w14,w15,w15,lsl#2 // r3*5 + str w13,[x0,#16*4] // s2 + str w15,[x0,#16*5] // r3 + add w15,w16,w16,lsl#2 // r4*5 + str w14,[x0,#16*6] // s3 + str w16,[x0,#16*7] // r4 + str w15,[x0,#16*8] // s4 + + ret +.size poly1305_splat,.-poly1305_splat + +.type poly1305_blocks_neon,%function +.align 5 +poly1305_blocks_neon: + ldr x17,[x0,#24] + cmp x2,#128 + b.hs .Lblocks_neon + cbz x17,poly1305_blocks + +.Lblocks_neon: + stp x29,x30,[sp,#-80]! + add x29,sp,#0 + + ands x2,x2,#-16 + b.eq .Lno_data_neon + + cbz x17,.Lbase2_64_neon + + ldp w10,w11,[x0] // load hash value base 2^26 + ldp w12,w13,[x0,#8] + ldr w14,[x0,#16] + + tst x2,#31 + b.eq .Leven_neon + + ldp x7,x8,[x0,#32] // load key value + + add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64 + lsr x5,x12,#12 + adds x4,x4,x12,lsl#52 + add x5,x5,x13,lsl#14 + adc x5,x5,xzr + lsr x6,x14,#24 + adds x5,x5,x14,lsl#40 + adc x14,x6,xzr // can be partially reduced... + + ldp x12,x13,[x1],#16 // load input + sub x2,x2,#16 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + + and x10,x14,#-4 // ... so reduce + and x6,x14,#3 + add x10,x10,x14,lsr#2 + adds x4,x4,x10 + adcs x5,x5,xzr + adc x6,x6,xzr + +#ifdef __ARMEB__ + rev x12,x12 + rev x13,x13 +#endif + adds x4,x4,x12 // accumulate input + adcs x5,x5,x13 + adc x6,x6,x3 + + bl poly1305_mult + ldr x30,[sp,#8] + + cbz x3,.Lstore_base2_64_neon + + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,x4,#26,#26 + extr x12,x5,x4,#52 + and x12,x12,#0x03ffffff + ubfx x13,x5,#14,#26 + extr x14,x6,x5,#40 + + cbnz x2,.Leven_neon + + stp w10,w11,[x0] // store hash value base 2^26 + stp w12,w13,[x0,#8] + str w14,[x0,#16] + b .Lno_data_neon + +.align 4 +.Lstore_base2_64_neon: + stp x4,x5,[x0] // store hash value base 2^64 + stp x6,xzr,[x0,#16] // note that is_base2_26 is zeroed + b .Lno_data_neon + +.align 4 +.Lbase2_64_neon: + ldp x7,x8,[x0,#32] // load key value + + ldp x4,x5,[x0] // load hash value base 2^64 + ldr x6,[x0,#16] + + tst x2,#31 + b.eq .Linit_neon + + ldp x12,x13,[x1],#16 // load input + sub x2,x2,#16 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) +#ifdef __ARMEB__ + rev x12,x12 + rev x13,x13 +#endif + adds x4,x4,x12 // accumulate input + adcs x5,x5,x13 + adc x6,x6,x3 + + bl poly1305_mult + +.Linit_neon: + and x10,x4,#0x03ffffff // base 2^64 -> base 2^26 + ubfx x11,x4,#26,#26 + extr x12,x5,x4,#52 + and x12,x12,#0x03ffffff + ubfx x13,x5,#14,#26 + extr x14,x6,x5,#40 + + stp d8,d9,[sp,#16] // meet ABI requirements + stp d10,d11,[sp,#32] + stp d12,d13,[sp,#48] + stp d14,d15,[sp,#64] + + fmov d24,x10 + fmov d25,x11 + fmov d26,x12 + fmov d27,x13 + fmov d28,x14 + + ////////////////////////////////// initialize r^n table + mov x4,x7 // r^1 + add x9,x8,x8,lsr#2 // s1 = r1 + (r1 >> 2) + mov x5,x8 + mov x6,xzr + add x0,x0,#48+12 + bl poly1305_splat + + bl poly1305_mult // r^2 + sub x0,x0,#4 + bl poly1305_splat + + bl poly1305_mult // r^3 + sub x0,x0,#4 + bl poly1305_splat + + bl poly1305_mult // r^4 + sub x0,x0,#4 + bl poly1305_splat + ldr x30,[sp,#8] + + add x16,x1,#32 + adr x17,.Lzeros + subs x2,x2,#64 + csel x16,x17,x16,lo + + mov x4,#1 + str x4,[x0,#-24] // set is_base2_26 + sub x0,x0,#48 // restore original x0 + b .Ldo_neon + +.align 4 +.Leven_neon: + add x16,x1,#32 + adr x17,.Lzeros + subs x2,x2,#64 + csel x16,x17,x16,lo + + stp d8,d9,[sp,#16] // meet ABI requirements + stp d10,d11,[sp,#32] + stp d12,d13,[sp,#48] + stp d14,d15,[sp,#64] + + fmov d24,x10 + fmov d25,x11 + fmov d26,x12 + fmov d27,x13 + fmov d28,x14 + +.Ldo_neon: + ldp x8,x12,[x16],#16 // inp[2:3] (or zero) + ldp x9,x13,[x16],#48 + + lsl x3,x3,#24 + add x15,x0,#48 + +#ifdef __ARMEB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov d14,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,x3,x12,lsr#40 + add x13,x3,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov d15,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + fmov d16,x8 + fmov d17,x10 + fmov d18,x12 + + ldp x8,x12,[x1],#16 // inp[0:1] + ldp x9,x13,[x1],#48 + + ld1 {v0.4s,v1.4s,v2.4s,v3.4s},[x15],#64 + ld1 {v4.4s,v5.4s,v6.4s,v7.4s},[x15],#64 + ld1 {v8.4s},[x15] + +#ifdef __ARMEB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + and x5,x9,#0x03ffffff + ubfx x6,x8,#26,#26 + ubfx x7,x9,#26,#26 + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + extr x8,x12,x8,#52 + extr x9,x13,x9,#52 + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + fmov d9,x4 + and x8,x8,#0x03ffffff + and x9,x9,#0x03ffffff + ubfx x10,x12,#14,#26 + ubfx x11,x13,#14,#26 + add x12,x3,x12,lsr#40 + add x13,x3,x13,lsr#40 + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + fmov d10,x6 + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + movi v31.2d,#-1 + fmov d11,x8 + fmov d12,x10 + fmov d13,x12 + ushr v31.2d,v31.2d,#38 + + b.ls .Lskip_loop + +.align 4 +.Loop_neon: + //////////////////////////////////////////////////////////////// + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^3+inp[7]*r + // ___________________/ + // ((inp[0]*r^4+inp[2]*r^2+inp[4])*r^4+inp[6]*r^2+inp[8])*r^2 + // ((inp[1]*r^4+inp[3]*r^2+inp[5])*r^4+inp[7]*r^2+inp[9])*r + // ___________________/ ____________________/ + // + // Note that we start with inp[2:3]*r^2. This is because it + // doesn't depend on reduction in previous iteration. + //////////////////////////////////////////////////////////////// + // d4 = h0*r4 + h1*r3 + h2*r2 + h3*r1 + h4*r0 + // d3 = h0*r3 + h1*r2 + h2*r1 + h3*r0 + h4*5*r4 + // d2 = h0*r2 + h1*r1 + h2*r0 + h3*5*r4 + h4*5*r3 + // d1 = h0*r1 + h1*r0 + h2*5*r4 + h3*5*r3 + h4*5*r2 + // d0 = h0*r0 + h1*5*r4 + h2*5*r3 + h3*5*r2 + h4*5*r1 + + subs x2,x2,#64 + umull v23.2d,v14.2s,v7.s[2] + csel x16,x17,x16,lo + umull v22.2d,v14.2s,v5.s[2] + umull v21.2d,v14.2s,v3.s[2] + ldp x8,x12,[x16],#16 // inp[2:3] (or zero) + umull v20.2d,v14.2s,v1.s[2] + ldp x9,x13,[x16],#48 + umull v19.2d,v14.2s,v0.s[2] +#ifdef __ARMEB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + umlal v23.2d,v15.2s,v5.s[2] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal v22.2d,v15.2s,v3.s[2] + and x5,x9,#0x03ffffff + umlal v21.2d,v15.2s,v1.s[2] + ubfx x6,x8,#26,#26 + umlal v20.2d,v15.2s,v0.s[2] + ubfx x7,x9,#26,#26 + umlal v19.2d,v15.2s,v8.s[2] + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + + umlal v23.2d,v16.2s,v3.s[2] + extr x8,x12,x8,#52 + umlal v22.2d,v16.2s,v1.s[2] + extr x9,x13,x9,#52 + umlal v21.2d,v16.2s,v0.s[2] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal v20.2d,v16.2s,v8.s[2] + fmov d14,x4 + umlal v19.2d,v16.2s,v6.s[2] + and x8,x8,#0x03ffffff + + umlal v23.2d,v17.2s,v1.s[2] + and x9,x9,#0x03ffffff + umlal v22.2d,v17.2s,v0.s[2] + ubfx x10,x12,#14,#26 + umlal v21.2d,v17.2s,v8.s[2] + ubfx x11,x13,#14,#26 + umlal v20.2d,v17.2s,v6.s[2] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal v19.2d,v17.2s,v4.s[2] + fmov d15,x6 + + add v11.2s,v11.2s,v26.2s + add x12,x3,x12,lsr#40 + umlal v23.2d,v18.2s,v0.s[2] + add x13,x3,x13,lsr#40 + umlal v22.2d,v18.2s,v8.s[2] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal v21.2d,v18.2s,v6.s[2] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal v20.2d,v18.2s,v4.s[2] + fmov d16,x8 + umlal v19.2d,v18.2s,v2.s[2] + fmov d17,x10 + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4 and accumulate + + add v9.2s,v9.2s,v24.2s + fmov d18,x12 + umlal v22.2d,v11.2s,v1.s[0] + ldp x8,x12,[x1],#16 // inp[0:1] + umlal v19.2d,v11.2s,v6.s[0] + ldp x9,x13,[x1],#48 + umlal v23.2d,v11.2s,v3.s[0] + umlal v20.2d,v11.2s,v8.s[0] + umlal v21.2d,v11.2s,v0.s[0] +#ifdef __ARMEB__ + rev x8,x8 + rev x12,x12 + rev x9,x9 + rev x13,x13 +#endif + + add v10.2s,v10.2s,v25.2s + umlal v22.2d,v9.2s,v5.s[0] + umlal v23.2d,v9.2s,v7.s[0] + and x4,x8,#0x03ffffff // base 2^64 -> base 2^26 + umlal v21.2d,v9.2s,v3.s[0] + and x5,x9,#0x03ffffff + umlal v19.2d,v9.2s,v0.s[0] + ubfx x6,x8,#26,#26 + umlal v20.2d,v9.2s,v1.s[0] + ubfx x7,x9,#26,#26 + + add v12.2s,v12.2s,v27.2s + add x4,x4,x5,lsl#32 // bfi x4,x5,#32,#32 + umlal v22.2d,v10.2s,v3.s[0] + extr x8,x12,x8,#52 + umlal v23.2d,v10.2s,v5.s[0] + extr x9,x13,x9,#52 + umlal v19.2d,v10.2s,v8.s[0] + add x6,x6,x7,lsl#32 // bfi x6,x7,#32,#32 + umlal v21.2d,v10.2s,v1.s[0] + fmov d9,x4 + umlal v20.2d,v10.2s,v0.s[0] + and x8,x8,#0x03ffffff + + add v13.2s,v13.2s,v28.2s + and x9,x9,#0x03ffffff + umlal v22.2d,v12.2s,v0.s[0] + ubfx x10,x12,#14,#26 + umlal v19.2d,v12.2s,v4.s[0] + ubfx x11,x13,#14,#26 + umlal v23.2d,v12.2s,v1.s[0] + add x8,x8,x9,lsl#32 // bfi x8,x9,#32,#32 + umlal v20.2d,v12.2s,v6.s[0] + fmov d10,x6 + umlal v21.2d,v12.2s,v8.s[0] + add x12,x3,x12,lsr#40 + + umlal v22.2d,v13.2s,v8.s[0] + add x13,x3,x13,lsr#40 + umlal v19.2d,v13.2s,v2.s[0] + add x10,x10,x11,lsl#32 // bfi x10,x11,#32,#32 + umlal v23.2d,v13.2s,v0.s[0] + add x12,x12,x13,lsl#32 // bfi x12,x13,#32,#32 + umlal v20.2d,v13.2s,v4.s[0] + fmov d11,x8 + umlal v21.2d,v13.2s,v6.s[0] + fmov d12,x10 + fmov d13,x12 + + ///////////////////////////////////////////////////////////////// + // lazy reduction as discussed in "NEON crypto" by D.J. Bernstein + // and P. Schwabe + // + // [see discussion in poly1305-armv4 module] + + ushr v29.2d,v22.2d,#26 + xtn v27.2s,v22.2d + ushr v30.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + add v23.2d,v23.2d,v29.2d // h3 -> h4 + bic v27.2s,#0xfc,lsl#24 // &=0x03ffffff + add v20.2d,v20.2d,v30.2d // h0 -> h1 + + ushr v29.2d,v23.2d,#26 + xtn v28.2s,v23.2d + ushr v30.2d,v20.2d,#26 + xtn v25.2s,v20.2d + bic v28.2s,#0xfc,lsl#24 + add v21.2d,v21.2d,v30.2d // h1 -> h2 + + add v19.2d,v19.2d,v29.2d + shl v29.2d,v29.2d,#2 + shrn v30.2s,v21.2d,#26 + xtn v26.2s,v21.2d + add v19.2d,v19.2d,v29.2d // h4 -> h0 + bic v25.2s,#0xfc,lsl#24 + add v27.2s,v27.2s,v30.2s // h2 -> h3 + bic v26.2s,#0xfc,lsl#24 + + shrn v29.2s,v19.2d,#26 + xtn v24.2s,v19.2d + ushr v30.2s,v27.2s,#26 + bic v27.2s,#0xfc,lsl#24 + bic v24.2s,#0xfc,lsl#24 + add v25.2s,v25.2s,v29.2s // h0 -> h1 + add v28.2s,v28.2s,v30.2s // h3 -> h4 + + b.hi .Loop_neon + +.Lskip_loop: + dup v16.2d,v16.d[0] + add v11.2s,v11.2s,v26.2s + + //////////////////////////////////////////////////////////////// + // multiply (inp[0:1]+hash) or inp[2:3] by r^2:r^1 + + adds x2,x2,#32 + b.ne .Long_tail + + dup v16.2d,v11.d[0] + add v14.2s,v9.2s,v24.2s + add v17.2s,v12.2s,v27.2s + add v15.2s,v10.2s,v25.2s + add v18.2s,v13.2s,v28.2s + +.Long_tail: + dup v14.2d,v14.d[0] + umull2 v19.2d,v16.4s,v6.4s + umull2 v22.2d,v16.4s,v1.4s + umull2 v23.2d,v16.4s,v3.4s + umull2 v21.2d,v16.4s,v0.4s + umull2 v20.2d,v16.4s,v8.4s + + dup v15.2d,v15.d[0] + umlal2 v19.2d,v14.4s,v0.4s + umlal2 v21.2d,v14.4s,v3.4s + umlal2 v22.2d,v14.4s,v5.4s + umlal2 v23.2d,v14.4s,v7.4s + umlal2 v20.2d,v14.4s,v1.4s + + dup v17.2d,v17.d[0] + umlal2 v19.2d,v15.4s,v8.4s + umlal2 v22.2d,v15.4s,v3.4s + umlal2 v21.2d,v15.4s,v1.4s + umlal2 v23.2d,v15.4s,v5.4s + umlal2 v20.2d,v15.4s,v0.4s + + dup v18.2d,v18.d[0] + umlal2 v22.2d,v17.4s,v0.4s + umlal2 v23.2d,v17.4s,v1.4s + umlal2 v19.2d,v17.4s,v4.4s + umlal2 v20.2d,v17.4s,v6.4s + umlal2 v21.2d,v17.4s,v8.4s + + umlal2 v22.2d,v18.4s,v8.4s + umlal2 v19.2d,v18.4s,v2.4s + umlal2 v23.2d,v18.4s,v0.4s + umlal2 v20.2d,v18.4s,v4.4s + umlal2 v21.2d,v18.4s,v6.4s + + b.eq .Lshort_tail + + //////////////////////////////////////////////////////////////// + // (hash+inp[0:1])*r^4:r^3 and accumulate + + add v9.2s,v9.2s,v24.2s + umlal v22.2d,v11.2s,v1.2s + umlal v19.2d,v11.2s,v6.2s + umlal v23.2d,v11.2s,v3.2s + umlal v20.2d,v11.2s,v8.2s + umlal v21.2d,v11.2s,v0.2s + + add v10.2s,v10.2s,v25.2s + umlal v22.2d,v9.2s,v5.2s + umlal v19.2d,v9.2s,v0.2s + umlal v23.2d,v9.2s,v7.2s + umlal v20.2d,v9.2s,v1.2s + umlal v21.2d,v9.2s,v3.2s + + add v12.2s,v12.2s,v27.2s + umlal v22.2d,v10.2s,v3.2s + umlal v19.2d,v10.2s,v8.2s + umlal v23.2d,v10.2s,v5.2s + umlal v20.2d,v10.2s,v0.2s + umlal v21.2d,v10.2s,v1.2s + + add v13.2s,v13.2s,v28.2s + umlal v22.2d,v12.2s,v0.2s + umlal v19.2d,v12.2s,v4.2s + umlal v23.2d,v12.2s,v1.2s + umlal v20.2d,v12.2s,v6.2s + umlal v21.2d,v12.2s,v8.2s + + umlal v22.2d,v13.2s,v8.2s + umlal v19.2d,v13.2s,v2.2s + umlal v23.2d,v13.2s,v0.2s + umlal v20.2d,v13.2s,v4.2s + umlal v21.2d,v13.2s,v6.2s + +.Lshort_tail: + //////////////////////////////////////////////////////////////// + // horizontal add + + addp v22.2d,v22.2d,v22.2d + ldp d8,d9,[sp,#16] // meet ABI requirements + addp v19.2d,v19.2d,v19.2d + ldp d10,d11,[sp,#32] + addp v23.2d,v23.2d,v23.2d + ldp d12,d13,[sp,#48] + addp v20.2d,v20.2d,v20.2d + ldp d14,d15,[sp,#64] + addp v21.2d,v21.2d,v21.2d + + //////////////////////////////////////////////////////////////// + // lazy reduction, but without narrowing + + ushr v29.2d,v22.2d,#26 + and v22.16b,v22.16b,v31.16b + ushr v30.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + + add v23.2d,v23.2d,v29.2d // h3 -> h4 + add v20.2d,v20.2d,v30.2d // h0 -> h1 + + ushr v29.2d,v23.2d,#26 + and v23.16b,v23.16b,v31.16b + ushr v30.2d,v20.2d,#26 + and v20.16b,v20.16b,v31.16b + add v21.2d,v21.2d,v30.2d // h1 -> h2 + + add v19.2d,v19.2d,v29.2d + shl v29.2d,v29.2d,#2 + ushr v30.2d,v21.2d,#26 + and v21.16b,v21.16b,v31.16b + add v19.2d,v19.2d,v29.2d // h4 -> h0 + add v22.2d,v22.2d,v30.2d // h2 -> h3 + + ushr v29.2d,v19.2d,#26 + and v19.16b,v19.16b,v31.16b + ushr v30.2d,v22.2d,#26 + and v22.16b,v22.16b,v31.16b + add v20.2d,v20.2d,v29.2d // h0 -> h1 + add v23.2d,v23.2d,v30.2d // h3 -> h4 + + //////////////////////////////////////////////////////////////// + // write the result, can be partially reduced + + st4 {v19.s,v20.s,v21.s,v22.s}[0],[x0],#16 + st1 {v23.s}[0],[x0] + +.Lno_data_neon: + ldr x29,[sp],#80 + ret +.size poly1305_blocks_neon,.-poly1305_blocks_neon + +.type poly1305_emit_neon,%function +.align 5 +poly1305_emit_neon: + ldr x17,[x0,#24] + cbz x17,poly1305_emit + + ldp w10,w11,[x0] // load hash value base 2^26 + ldp w12,w13,[x0,#8] + ldr w14,[x0,#16] + + add x4,x10,x11,lsl#26 // base 2^26 -> base 2^64 + lsr x5,x12,#12 + adds x4,x4,x12,lsl#52 + add x5,x5,x13,lsl#14 + adc x5,x5,xzr + lsr x6,x14,#24 + adds x5,x5,x14,lsl#40 + adc x6,x6,xzr // can be partially reduced... + + ldp x10,x11,[x2] // load nonce + + and x12,x6,#-4 // ... so reduce + add x12,x12,x6,lsr#2 + and x6,x6,#3 + adds x4,x4,x12 + adcs x5,x5,xzr + adc x6,x6,xzr + + adds x12,x4,#5 // compare to modulus + adcs x13,x5,xzr + adc x14,x6,xzr + + tst x14,#-4 // see if it's carried/borrowed + + csel x4,x4,x12,eq + csel x5,x5,x13,eq + +#ifdef __ARMEB__ + ror x10,x10,#32 // flip nonce words + ror x11,x11,#32 +#endif + adds x4,x4,x10 // accumulate nonce + adc x5,x5,x11 +#ifdef __ARMEB__ + rev x4,x4 // flip output bytes + rev x5,x5 +#endif + stp x4,x5,[x1] // write result + + ret +.size poly1305_emit_neon,.-poly1305_emit_neon + +.align 5 +.Lzeros: +.long 0,0,0,0,0,0,0,0 +.LOPENSSL_armcap_P: +#ifdef __ILP32__ +.long OPENSSL_armcap_P-. +#else +.quad OPENSSL_armcap_P-. +#endif +.byte 80,111,108,121,49,51,48,53,32,102,111,114,32,65,82,77,118,56,44,32,67,82,89,80,84,79,71,65,77,83,32,98,121,32,60,97,112,112,114,111,64,111,112,101,110,115,115,108,46,111,114,103,62,0 +.align 2 +.align 2 From patchwork Tue Sep 25 14:56:15 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147497 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830666lji; Tue, 25 Sep 2018 07:57:24 -0700 (PDT) X-Google-Smtp-Source: ACcGV615M8XOsaeBXAS6Ac7mQIJQrL3n49aRPNs2a8oTGJwPW3jsnw0VIo4YNaZ4tPPETt9lC6fE X-Received: by 2002:a62:18a:: with SMTP id 132-v6mr1539989pfb.207.1537887444283; Tue, 25 Sep 2018 07:57:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887444; cv=none; d=google.com; s=arc-20160816; b=q/2BDp05J3pc20njE68YZfhBI8PIjc4sP2nb6kWVaUTVscJx9ur+OHqnxYZv1twTyg /qsBfhoum0gL5oDBGtUBX9cxYRp9iOwnxg9wz4MeOBdYNe5JZcv8nmGqYjNglSKxsumY vpbxCcEDTTwYsUbaLwQ68zfbdppEb1t/0PFJHzaH6DNy3ULB9PqB6WXeqTMbwq0YBmwI bRFKJ2m5GviQhbWFLj950K97nmJz2u2dGCXgAHY2xAmPzigeKTnh3VTzDd++x7/8sOaO 8fYt/GUCnraebtFMdUwggk9QUk91axQ1QvWh/TcMoUm6IM+QjOwVcWTdKQ9kJmuQK8bS muhA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=VmsU88sSVSldbHrpXk0E2YgD9+DRHLepvg1c4mFuIts=; b=gaVsyd/KhGJhwllv4JPv7ojvtXQVVmCZs5Hy2Tx1IGM5TZv4i5w5jdha2zNybFQ2Q7 8vnHv3ttrEqjKjZ+OyfteJPZvR6PrRge0EYh3zA1TBsOcRnj3WVTS7V5FMwuL8/3ucE5 zyk4UEKM4usvEKeXeql6cCQDARe0qhcIA2NTIAlBdxFY2fOjD3v3F/1tqy6r/S18u6Qw 9H1S2Fey4FL9mrdAbh6trrx/+E2gOTS91mQlUOsHjYfoTlZ1kylAsk38IZWTnARiHhN3 UJbMXGykeBSIDYmTInzkU+FhNluE6NOTunOjg3wiGt+sMXLYXr7yWC8PAtOzNGw/Wdr2 NUgQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=Xp3aYMWi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id p188-v6si395936pfg.197.2018.09.25.07.57.23; Tue, 25 Sep 2018 07:57:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=Xp3aYMWi; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729851AbeIYVFO (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:14 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729224AbeIYVFK (ORCPT ); Tue, 25 Sep 2018 17:05:10 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 9ef61da1; Tue, 25 Sep 2018 14:38:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=LJnJTQ9WGHeZZheKgSq4zCWR6 Zc=; b=Xp3aYMWizU9URZqWyDOHWOdQcTMg9nAPd0k6odOemOS+2KywIztG2AuJM msnyooa/U8Db2jwQlT+YVoSagR11TL7jnlRumPKYq6bVHQcRmit8iJ3U/aDp6MsS y+Qk6FBiJ1BdSBdcEOCheBXSEe1RRwIvNN0uRcEfhTb1yKF64+taMbbzhO06isO6 PJxSX8n99iVUkcFK8qemPNVFvdxskF1yoxAW+SQ4GXqyaldkHX0AwxiTEOuEAmkE C6ZQK3H19HpSMQGj6NCWrDcLzKBygICNBv4EohBHLdsONgKy1R4JeqmVicw8ni8Q pFohja+p3o0uwSEpkcORS5tOIMJJA== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 49d94dc7 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:48 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Thomas Gleixner , Ingo Molnar , x86@kernel.org Subject: [PATCH net-next v6 16/23] zinc: BLAKE2s x86_64 implementation Date: Tue, 25 Sep 2018 16:56:15 +0200 Message-Id: <20180925145622.29959-17-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org These implementations from Samuel Neves support AVX and AVX-512VL. Originally this used AVX-512F, but Skylake thermal throttling made AVX-512VL more attractive and possible to do with negligable difference. Signed-off-by: Jason A. Donenfeld Signed-off-by: Samuel Neves Co-developed-by: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Thomas Gleixner Cc: Ingo Molnar Cc: x86@kernel.org --- lib/zinc/Makefile | 1 + lib/zinc/blake2s/blake2s-x86_64-glue.h | 59 +++ lib/zinc/blake2s/blake2s-x86_64.S | 685 +++++++++++++++++++++++++ lib/zinc/blake2s/blake2s.c | 4 +- 4 files changed, 748 insertions(+), 1 deletion(-) create mode 100644 lib/zinc/blake2s/blake2s-x86_64-glue.h create mode 100644 lib/zinc/blake2s/blake2s-x86_64.S -- 2.19.0 diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index d2ec55c33ef0..67ad837c822c 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -23,4 +23,5 @@ zinc_chacha20poly1305-y := chacha20poly1305.o obj-$(CONFIG_ZINC_CHACHA20POLY1305) += zinc_chacha20poly1305.o zinc_blake2s-y := blake2s/blake2s.o +zinc_blake2s-$(CONFIG_ZINC_ARCH_X86_64) += blake2s/blake2s-x86_64.o obj-$(CONFIG_ZINC_BLAKE2S) += zinc_blake2s.o diff --git a/lib/zinc/blake2s/blake2s-x86_64-glue.h b/lib/zinc/blake2s/blake2s-x86_64-glue.h new file mode 100644 index 000000000000..8f2079ffcd79 --- /dev/null +++ b/lib/zinc/blake2s/blake2s-x86_64-glue.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include +#include +#include + +#ifdef CONFIG_AS_AVX +asmlinkage void blake2s_compress_avx(struct blake2s_state *state, + const u8 *block, const size_t nblocks, + const u32 inc); +#endif +#ifdef CONFIG_AS_AVX512 +asmlinkage void blake2s_compress_avx512(struct blake2s_state *state, + const u8 *block, const size_t nblocks, + const u32 inc); +#endif + +static bool blake2s_use_avx __ro_after_init; +static bool blake2s_use_avx512 __ro_after_init; + +static void __init blake2s_fpu_init(void) +{ + blake2s_use_avx = + boot_cpu_has(X86_FEATURE_AVX) && + cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL); + blake2s_use_avx512 = + boot_cpu_has(X86_FEATURE_AVX) && + boot_cpu_has(X86_FEATURE_AVX2) && + boot_cpu_has(X86_FEATURE_AVX512F) && + boot_cpu_has(X86_FEATURE_AVX512VL) && + cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM | + XFEATURE_MASK_AVX512, NULL); +} + +static inline bool blake2s_arch(struct blake2s_state *state, const u8 *block, + size_t nblocks, const u32 inc) +{ +#ifdef CONFIG_AS_AVX512 + if (blake2s_use_avx512 && irq_fpu_usable()) { + kernel_fpu_begin(); + blake2s_compress_avx512(state, block, nblocks, inc); + kernel_fpu_end(); + return true; + } +#endif +#ifdef CONFIG_AS_AVX + if (blake2s_use_avx && irq_fpu_usable()) { + kernel_fpu_begin(); + blake2s_compress_avx(state, block, nblocks, inc); + kernel_fpu_end(); + return true; + } +#endif + return false; +} diff --git a/lib/zinc/blake2s/blake2s-x86_64.S b/lib/zinc/blake2s/blake2s-x86_64.S new file mode 100644 index 000000000000..1407a5ffb5c2 --- /dev/null +++ b/lib/zinc/blake2s/blake2s-x86_64.S @@ -0,0 +1,685 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * Copyright (C) 2017 Samuel Neves . All Rights Reserved. + */ + +#include + +.section .rodata.cst32.BLAKE2S_IV, "aM", @progbits, 32 +.align 32 +IV: .octa 0xA54FF53A3C6EF372BB67AE856A09E667 + .octa 0x5BE0CD191F83D9AB9B05688C510E527F +.section .rodata.cst16.ROT16, "aM", @progbits, 16 +.align 16 +ROT16: .octa 0x0D0C0F0E09080B0A0504070601000302 +.section .rodata.cst16.ROR328, "aM", @progbits, 16 +.align 16 +ROR328: .octa 0x0C0F0E0D080B0A090407060500030201 +#ifdef CONFIG_AS_AVX512 +.section .rodata.cst64.BLAKE2S_SIGMA, "aM", @progbits, 640 +.align 64 +SIGMA: +.long 0, 2, 4, 6, 1, 3, 5, 7, 8, 10, 12, 14, 9, 11, 13, 15 +.long 11, 2, 12, 14, 9, 8, 15, 3, 4, 0, 13, 6, 10, 1, 7, 5 +.long 10, 12, 11, 6, 5, 9, 13, 3, 4, 15, 14, 2, 0, 7, 8, 1 +.long 10, 9, 7, 0, 11, 14, 1, 12, 6, 2, 15, 3, 13, 8, 5, 4 +.long 4, 9, 8, 13, 14, 0, 10, 11, 7, 3, 12, 1, 5, 6, 15, 2 +.long 2, 10, 4, 14, 13, 3, 9, 11, 6, 5, 7, 12, 15, 1, 8, 0 +.long 4, 11, 14, 8, 13, 10, 12, 5, 2, 1, 15, 3, 9, 7, 0, 6 +.long 6, 12, 0, 13, 15, 2, 1, 10, 4, 5, 11, 14, 8, 3, 9, 7 +.long 14, 5, 4, 12, 9, 7, 3, 10, 2, 0, 6, 15, 11, 1, 13, 8 +.long 11, 7, 13, 10, 12, 14, 0, 15, 4, 5, 6, 9, 2, 1, 8, 3 +#endif /* CONFIG_AS_AVX512 */ + +.text +#ifdef CONFIG_AS_AVX +ENTRY(blake2s_compress_avx) + movl %ecx, %ecx + testq %rdx, %rdx + je .Lendofloop + .align 32 +.Lbeginofloop: + addq %rcx, 32(%rdi) + vmovdqu IV+16(%rip), %xmm1 + vmovdqu (%rsi), %xmm4 + vpxor 32(%rdi), %xmm1, %xmm1 + vmovdqu 16(%rsi), %xmm3 + vshufps $136, %xmm3, %xmm4, %xmm6 + vmovdqa ROT16(%rip), %xmm7 + vpaddd (%rdi), %xmm6, %xmm6 + vpaddd 16(%rdi), %xmm6, %xmm6 + vpxor %xmm6, %xmm1, %xmm1 + vmovdqu IV(%rip), %xmm8 + vpshufb %xmm7, %xmm1, %xmm1 + vmovdqu 48(%rsi), %xmm5 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor 16(%rdi), %xmm8, %xmm9 + vmovdqu 32(%rsi), %xmm2 + vpblendw $12, %xmm3, %xmm5, %xmm13 + vshufps $221, %xmm5, %xmm2, %xmm12 + vpunpckhqdq %xmm2, %xmm4, %xmm14 + vpslld $20, %xmm9, %xmm0 + vpsrld $12, %xmm9, %xmm9 + vpxor %xmm0, %xmm9, %xmm0 + vshufps $221, %xmm3, %xmm4, %xmm9 + vpaddd %xmm9, %xmm6, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vmovdqa ROR328(%rip), %xmm6 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor %xmm8, %xmm0, %xmm0 + vpshufd $147, %xmm1, %xmm1 + vpshufd $78, %xmm8, %xmm8 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm10, %xmm0, %xmm0 + vshufps $136, %xmm5, %xmm2, %xmm10 + vpshufd $57, %xmm0, %xmm0 + vpaddd %xmm10, %xmm9, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpaddd %xmm12, %xmm9, %xmm9 + vpblendw $12, %xmm2, %xmm3, %xmm12 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor %xmm8, %xmm0, %xmm10 + vpslld $20, %xmm10, %xmm0 + vpsrld $12, %xmm10, %xmm10 + vpxor %xmm0, %xmm10, %xmm0 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor %xmm8, %xmm0, %xmm0 + vpshufd $57, %xmm1, %xmm1 + vpshufd $78, %xmm8, %xmm8 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm10, %xmm0, %xmm0 + vpslldq $4, %xmm5, %xmm10 + vpblendw $240, %xmm10, %xmm12, %xmm12 + vpshufd $147, %xmm0, %xmm0 + vpshufd $147, %xmm12, %xmm12 + vpaddd %xmm9, %xmm12, %xmm12 + vpaddd %xmm0, %xmm12, %xmm12 + vpxor %xmm12, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor %xmm8, %xmm0, %xmm11 + vpslld $20, %xmm11, %xmm9 + vpsrld $12, %xmm11, %xmm11 + vpxor %xmm9, %xmm11, %xmm0 + vpshufd $8, %xmm2, %xmm9 + vpblendw $192, %xmm5, %xmm3, %xmm11 + vpblendw $240, %xmm11, %xmm9, %xmm9 + vpshufd $177, %xmm9, %xmm9 + vpaddd %xmm12, %xmm9, %xmm9 + vpaddd %xmm0, %xmm9, %xmm11 + vpxor %xmm11, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm8 + vpxor %xmm8, %xmm0, %xmm9 + vpshufd $147, %xmm1, %xmm1 + vpshufd $78, %xmm8, %xmm8 + vpslld $25, %xmm9, %xmm0 + vpsrld $7, %xmm9, %xmm9 + vpxor %xmm0, %xmm9, %xmm0 + vpslldq $4, %xmm3, %xmm9 + vpblendw $48, %xmm9, %xmm2, %xmm9 + vpblendw $240, %xmm9, %xmm4, %xmm9 + vpshufd $57, %xmm0, %xmm0 + vpshufd $177, %xmm9, %xmm9 + vpaddd %xmm11, %xmm9, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm8, %xmm11 + vpxor %xmm11, %xmm0, %xmm0 + vpslld $20, %xmm0, %xmm8 + vpsrld $12, %xmm0, %xmm0 + vpxor %xmm8, %xmm0, %xmm0 + vpunpckhdq %xmm3, %xmm4, %xmm8 + vpblendw $12, %xmm10, %xmm8, %xmm12 + vpshufd $177, %xmm12, %xmm12 + vpaddd %xmm9, %xmm12, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm0 + vpshufd $57, %xmm1, %xmm1 + vpshufd $78, %xmm11, %xmm11 + vpslld $25, %xmm0, %xmm12 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm12, %xmm0, %xmm0 + vpunpckhdq %xmm5, %xmm2, %xmm12 + vpshufd $147, %xmm0, %xmm0 + vpblendw $15, %xmm13, %xmm12, %xmm12 + vpslldq $8, %xmm5, %xmm13 + vpshufd $210, %xmm12, %xmm12 + vpaddd %xmm9, %xmm12, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm0 + vpslld $20, %xmm0, %xmm12 + vpsrld $12, %xmm0, %xmm0 + vpxor %xmm12, %xmm0, %xmm0 + vpunpckldq %xmm4, %xmm2, %xmm12 + vpblendw $240, %xmm4, %xmm12, %xmm12 + vpblendw $192, %xmm13, %xmm12, %xmm12 + vpsrldq $12, %xmm3, %xmm13 + vpaddd %xmm12, %xmm9, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm0 + vpshufd $147, %xmm1, %xmm1 + vpshufd $78, %xmm11, %xmm11 + vpslld $25, %xmm0, %xmm12 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm12, %xmm0, %xmm0 + vpblendw $60, %xmm2, %xmm4, %xmm12 + vpblendw $3, %xmm13, %xmm12, %xmm12 + vpshufd $57, %xmm0, %xmm0 + vpshufd $78, %xmm12, %xmm12 + vpaddd %xmm9, %xmm12, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm12 + vpslld $20, %xmm12, %xmm13 + vpsrld $12, %xmm12, %xmm0 + vpblendw $51, %xmm3, %xmm4, %xmm12 + vpxor %xmm13, %xmm0, %xmm0 + vpblendw $192, %xmm10, %xmm12, %xmm10 + vpslldq $8, %xmm2, %xmm12 + vpshufd $27, %xmm10, %xmm10 + vpaddd %xmm9, %xmm10, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm0 + vpshufd $57, %xmm1, %xmm1 + vpshufd $78, %xmm11, %xmm11 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm10, %xmm0, %xmm0 + vpunpckhdq %xmm2, %xmm8, %xmm10 + vpshufd $147, %xmm0, %xmm0 + vpblendw $12, %xmm5, %xmm10, %xmm10 + vpshufd $210, %xmm10, %xmm10 + vpaddd %xmm9, %xmm10, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm11 + vpxor %xmm11, %xmm0, %xmm10 + vpslld $20, %xmm10, %xmm0 + vpsrld $12, %xmm10, %xmm10 + vpxor %xmm0, %xmm10, %xmm0 + vpblendw $12, %xmm4, %xmm5, %xmm10 + vpblendw $192, %xmm12, %xmm10, %xmm10 + vpunpckldq %xmm2, %xmm4, %xmm12 + vpshufd $135, %xmm10, %xmm10 + vpaddd %xmm9, %xmm10, %xmm9 + vpaddd %xmm0, %xmm9, %xmm9 + vpxor %xmm9, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm11, %xmm13 + vpxor %xmm13, %xmm0, %xmm0 + vpshufd $147, %xmm1, %xmm1 + vpshufd $78, %xmm13, %xmm13 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm10, %xmm0, %xmm0 + vpblendw $15, %xmm3, %xmm4, %xmm10 + vpblendw $192, %xmm5, %xmm10, %xmm10 + vpshufd $57, %xmm0, %xmm0 + vpshufd $198, %xmm10, %xmm10 + vpaddd %xmm9, %xmm10, %xmm10 + vpaddd %xmm0, %xmm10, %xmm10 + vpxor %xmm10, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm13, %xmm13 + vpxor %xmm13, %xmm0, %xmm9 + vpslld $20, %xmm9, %xmm0 + vpsrld $12, %xmm9, %xmm9 + vpxor %xmm0, %xmm9, %xmm0 + vpunpckhdq %xmm2, %xmm3, %xmm9 + vpunpcklqdq %xmm12, %xmm9, %xmm15 + vpunpcklqdq %xmm12, %xmm8, %xmm12 + vpblendw $15, %xmm5, %xmm8, %xmm8 + vpaddd %xmm15, %xmm10, %xmm15 + vpaddd %xmm0, %xmm15, %xmm15 + vpxor %xmm15, %xmm1, %xmm1 + vpshufd $141, %xmm8, %xmm8 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm13, %xmm13 + vpxor %xmm13, %xmm0, %xmm0 + vpshufd $57, %xmm1, %xmm1 + vpshufd $78, %xmm13, %xmm13 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm10, %xmm0, %xmm0 + vpunpcklqdq %xmm2, %xmm3, %xmm10 + vpshufd $147, %xmm0, %xmm0 + vpblendw $51, %xmm14, %xmm10, %xmm14 + vpshufd $135, %xmm14, %xmm14 + vpaddd %xmm15, %xmm14, %xmm14 + vpaddd %xmm0, %xmm14, %xmm14 + vpxor %xmm14, %xmm1, %xmm1 + vpunpcklqdq %xmm3, %xmm4, %xmm15 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm13, %xmm13 + vpxor %xmm13, %xmm0, %xmm0 + vpslld $20, %xmm0, %xmm11 + vpsrld $12, %xmm0, %xmm0 + vpxor %xmm11, %xmm0, %xmm0 + vpunpckhqdq %xmm5, %xmm3, %xmm11 + vpblendw $51, %xmm15, %xmm11, %xmm11 + vpunpckhqdq %xmm3, %xmm5, %xmm15 + vpaddd %xmm11, %xmm14, %xmm11 + vpaddd %xmm0, %xmm11, %xmm11 + vpxor %xmm11, %xmm1, %xmm1 + vpshufb %xmm6, %xmm1, %xmm1 + vpaddd %xmm1, %xmm13, %xmm13 + vpxor %xmm13, %xmm0, %xmm0 + vpshufd $147, %xmm1, %xmm1 + vpshufd $78, %xmm13, %xmm13 + vpslld $25, %xmm0, %xmm14 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm14, %xmm0, %xmm14 + vpunpckhqdq %xmm4, %xmm2, %xmm0 + vpshufd $57, %xmm14, %xmm14 + vpblendw $51, %xmm15, %xmm0, %xmm15 + vpaddd %xmm15, %xmm11, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm1, %xmm1 + vpshufb %xmm7, %xmm1, %xmm1 + vpaddd %xmm1, %xmm13, %xmm13 + vpxor %xmm13, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm11 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm11, %xmm14, %xmm14 + vpblendw $3, %xmm2, %xmm4, %xmm11 + vpslldq $8, %xmm11, %xmm0 + vpblendw $15, %xmm5, %xmm0, %xmm0 + vpshufd $99, %xmm0, %xmm0 + vpaddd %xmm15, %xmm0, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm1, %xmm0 + vpaddd %xmm12, %xmm15, %xmm15 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm0, %xmm13, %xmm13 + vpxor %xmm13, %xmm14, %xmm14 + vpshufd $57, %xmm0, %xmm0 + vpshufd $78, %xmm13, %xmm13 + vpslld $25, %xmm14, %xmm1 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm1, %xmm14, %xmm14 + vpblendw $3, %xmm5, %xmm4, %xmm1 + vpshufd $147, %xmm14, %xmm14 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm13, %xmm13 + vpxor %xmm13, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm12 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm12, %xmm14, %xmm14 + vpsrldq $4, %xmm2, %xmm12 + vpblendw $60, %xmm12, %xmm1, %xmm1 + vpaddd %xmm1, %xmm15, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpblendw $12, %xmm4, %xmm3, %xmm1 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm0, %xmm13, %xmm13 + vpxor %xmm13, %xmm14, %xmm14 + vpshufd $147, %xmm0, %xmm0 + vpshufd $78, %xmm13, %xmm13 + vpslld $25, %xmm14, %xmm12 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm12, %xmm14, %xmm14 + vpsrldq $4, %xmm5, %xmm12 + vpblendw $48, %xmm12, %xmm1, %xmm1 + vpshufd $33, %xmm5, %xmm12 + vpshufd $57, %xmm14, %xmm14 + vpshufd $108, %xmm1, %xmm1 + vpblendw $51, %xmm12, %xmm10, %xmm12 + vpaddd %xmm15, %xmm1, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpaddd %xmm12, %xmm15, %xmm15 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm13, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm13 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm13, %xmm14, %xmm14 + vpslldq $12, %xmm3, %xmm13 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpshufd $57, %xmm0, %xmm0 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm14, %xmm12 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm12, %xmm14, %xmm14 + vpblendw $51, %xmm5, %xmm4, %xmm12 + vpshufd $147, %xmm14, %xmm14 + vpblendw $192, %xmm13, %xmm12, %xmm12 + vpaddd %xmm12, %xmm15, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpsrldq $4, %xmm3, %xmm12 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm13 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm13, %xmm14, %xmm14 + vpblendw $48, %xmm2, %xmm5, %xmm13 + vpblendw $3, %xmm12, %xmm13, %xmm13 + vpshufd $156, %xmm13, %xmm13 + vpaddd %xmm15, %xmm13, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpshufd $147, %xmm0, %xmm0 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm14, %xmm13 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm13, %xmm14, %xmm14 + vpunpcklqdq %xmm2, %xmm4, %xmm13 + vpshufd $57, %xmm14, %xmm14 + vpblendw $12, %xmm12, %xmm13, %xmm12 + vpshufd $180, %xmm12, %xmm12 + vpaddd %xmm15, %xmm12, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm12 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm12, %xmm14, %xmm14 + vpunpckhqdq %xmm9, %xmm4, %xmm12 + vpshufd $198, %xmm12, %xmm12 + vpaddd %xmm15, %xmm12, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpaddd %xmm15, %xmm8, %xmm15 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpshufd $57, %xmm0, %xmm0 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm14, %xmm12 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm12, %xmm14, %xmm14 + vpsrldq $4, %xmm4, %xmm12 + vpshufd $147, %xmm14, %xmm14 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm15, %xmm0, %xmm0 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpslld $20, %xmm14, %xmm8 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm14, %xmm8, %xmm14 + vpblendw $48, %xmm5, %xmm2, %xmm8 + vpblendw $3, %xmm12, %xmm8, %xmm8 + vpunpckhqdq %xmm5, %xmm4, %xmm12 + vpshufd $75, %xmm8, %xmm8 + vpblendw $60, %xmm10, %xmm12, %xmm10 + vpaddd %xmm15, %xmm8, %xmm15 + vpaddd %xmm14, %xmm15, %xmm15 + vpxor %xmm0, %xmm15, %xmm0 + vpshufd $45, %xmm10, %xmm10 + vpshufb %xmm6, %xmm0, %xmm0 + vpaddd %xmm15, %xmm10, %xmm15 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm1, %xmm14, %xmm14 + vpshufd $147, %xmm0, %xmm0 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm14, %xmm8 + vpsrld $7, %xmm14, %xmm14 + vpxor %xmm14, %xmm8, %xmm8 + vpshufd $57, %xmm8, %xmm8 + vpaddd %xmm8, %xmm15, %xmm15 + vpxor %xmm0, %xmm15, %xmm0 + vpshufb %xmm7, %xmm0, %xmm0 + vpaddd %xmm0, %xmm1, %xmm1 + vpxor %xmm8, %xmm1, %xmm8 + vpslld $20, %xmm8, %xmm10 + vpsrld $12, %xmm8, %xmm8 + vpxor %xmm8, %xmm10, %xmm10 + vpunpckldq %xmm3, %xmm4, %xmm8 + vpunpcklqdq %xmm9, %xmm8, %xmm9 + vpaddd %xmm9, %xmm15, %xmm9 + vpaddd %xmm10, %xmm9, %xmm9 + vpxor %xmm0, %xmm9, %xmm8 + vpshufb %xmm6, %xmm8, %xmm8 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm1, %xmm10, %xmm10 + vpshufd $57, %xmm8, %xmm8 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm10, %xmm12 + vpsrld $7, %xmm10, %xmm10 + vpxor %xmm10, %xmm12, %xmm10 + vpblendw $48, %xmm4, %xmm3, %xmm12 + vpshufd $147, %xmm10, %xmm0 + vpunpckhdq %xmm5, %xmm3, %xmm10 + vpshufd $78, %xmm12, %xmm12 + vpunpcklqdq %xmm4, %xmm10, %xmm10 + vpblendw $192, %xmm2, %xmm10, %xmm10 + vpshufhw $78, %xmm10, %xmm10 + vpaddd %xmm10, %xmm9, %xmm10 + vpaddd %xmm0, %xmm10, %xmm10 + vpxor %xmm8, %xmm10, %xmm8 + vpshufb %xmm7, %xmm8, %xmm8 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm0, %xmm1, %xmm9 + vpslld $20, %xmm9, %xmm0 + vpsrld $12, %xmm9, %xmm9 + vpxor %xmm9, %xmm0, %xmm0 + vpunpckhdq %xmm5, %xmm4, %xmm9 + vpblendw $240, %xmm9, %xmm2, %xmm13 + vpshufd $39, %xmm13, %xmm13 + vpaddd %xmm10, %xmm13, %xmm10 + vpaddd %xmm0, %xmm10, %xmm10 + vpxor %xmm8, %xmm10, %xmm8 + vpblendw $12, %xmm4, %xmm2, %xmm13 + vpshufb %xmm6, %xmm8, %xmm8 + vpslldq $4, %xmm13, %xmm13 + vpblendw $15, %xmm5, %xmm13, %xmm13 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm1, %xmm0, %xmm0 + vpaddd %xmm13, %xmm10, %xmm13 + vpshufd $147, %xmm8, %xmm8 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm0, %xmm14 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm0, %xmm14, %xmm14 + vpshufd $57, %xmm14, %xmm14 + vpaddd %xmm14, %xmm13, %xmm13 + vpxor %xmm8, %xmm13, %xmm8 + vpaddd %xmm13, %xmm12, %xmm12 + vpshufb %xmm7, %xmm8, %xmm8 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm14, %xmm1, %xmm14 + vpslld $20, %xmm14, %xmm10 + vpsrld $12, %xmm14, %xmm14 + vpxor %xmm14, %xmm10, %xmm10 + vpaddd %xmm10, %xmm12, %xmm12 + vpxor %xmm8, %xmm12, %xmm8 + vpshufb %xmm6, %xmm8, %xmm8 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm1, %xmm10, %xmm0 + vpshufd $57, %xmm8, %xmm8 + vpshufd $78, %xmm1, %xmm1 + vpslld $25, %xmm0, %xmm10 + vpsrld $7, %xmm0, %xmm0 + vpxor %xmm0, %xmm10, %xmm10 + vpblendw $48, %xmm2, %xmm3, %xmm0 + vpblendw $15, %xmm11, %xmm0, %xmm0 + vpshufd $147, %xmm10, %xmm10 + vpshufd $114, %xmm0, %xmm0 + vpaddd %xmm12, %xmm0, %xmm0 + vpaddd %xmm10, %xmm0, %xmm0 + vpxor %xmm8, %xmm0, %xmm8 + vpshufb %xmm7, %xmm8, %xmm8 + vpaddd %xmm8, %xmm1, %xmm1 + vpxor %xmm10, %xmm1, %xmm10 + vpslld $20, %xmm10, %xmm11 + vpsrld $12, %xmm10, %xmm10 + vpxor %xmm10, %xmm11, %xmm10 + vpslldq $4, %xmm4, %xmm11 + vpblendw $192, %xmm11, %xmm3, %xmm3 + vpunpckldq %xmm5, %xmm4, %xmm4 + vpshufd $99, %xmm3, %xmm3 + vpaddd %xmm0, %xmm3, %xmm3 + vpaddd %xmm10, %xmm3, %xmm3 + vpxor %xmm8, %xmm3, %xmm11 + vpunpckldq %xmm5, %xmm2, %xmm0 + vpblendw $192, %xmm2, %xmm5, %xmm2 + vpshufb %xmm6, %xmm11, %xmm11 + vpunpckhqdq %xmm0, %xmm9, %xmm0 + vpblendw $15, %xmm4, %xmm2, %xmm4 + vpaddd %xmm11, %xmm1, %xmm1 + vpxor %xmm1, %xmm10, %xmm10 + vpshufd $147, %xmm11, %xmm11 + vpshufd $201, %xmm0, %xmm0 + vpslld $25, %xmm10, %xmm8 + vpsrld $7, %xmm10, %xmm10 + vpxor %xmm10, %xmm8, %xmm10 + vpshufd $78, %xmm1, %xmm1 + vpaddd %xmm3, %xmm0, %xmm0 + vpshufd $27, %xmm4, %xmm4 + vpshufd $57, %xmm10, %xmm10 + vpaddd %xmm10, %xmm0, %xmm0 + vpxor %xmm11, %xmm0, %xmm11 + vpaddd %xmm0, %xmm4, %xmm0 + vpshufb %xmm7, %xmm11, %xmm7 + vpaddd %xmm7, %xmm1, %xmm1 + vpxor %xmm10, %xmm1, %xmm10 + vpslld $20, %xmm10, %xmm8 + vpsrld $12, %xmm10, %xmm10 + vpxor %xmm10, %xmm8, %xmm8 + vpaddd %xmm8, %xmm0, %xmm0 + vpxor %xmm7, %xmm0, %xmm7 + vpshufb %xmm6, %xmm7, %xmm6 + vpaddd %xmm6, %xmm1, %xmm1 + vpxor %xmm1, %xmm8, %xmm8 + vpshufd $78, %xmm1, %xmm1 + vpshufd $57, %xmm6, %xmm6 + vpslld $25, %xmm8, %xmm2 + vpsrld $7, %xmm8, %xmm8 + vpxor %xmm8, %xmm2, %xmm8 + vpxor (%rdi), %xmm1, %xmm1 + vpshufd $147, %xmm8, %xmm8 + vpxor %xmm0, %xmm1, %xmm0 + vmovups %xmm0, (%rdi) + vpxor 16(%rdi), %xmm8, %xmm0 + vpxor %xmm6, %xmm0, %xmm6 + vmovups %xmm6, 16(%rdi) + addq $64, %rsi + decq %rdx + jnz .Lbeginofloop +.Lendofloop: + ret +ENDPROC(blake2s_compress_avx) +#endif /* CONFIG_AS_AVX */ + +#ifdef CONFIG_AS_AVX512 +ENTRY(blake2s_compress_avx512) + vmovdqu (%rdi),%xmm0 + vmovdqu 0x10(%rdi),%xmm1 + vmovdqu 0x20(%rdi),%xmm4 + vmovq %rcx,%xmm5 + vmovdqa IV(%rip),%xmm14 + vmovdqa IV+16(%rip),%xmm15 + jmp .Lblake2s_compress_avx512_mainloop +.align 32 +.Lblake2s_compress_avx512_mainloop: + vmovdqa %xmm0,%xmm10 + vmovdqa %xmm1,%xmm11 + vpaddq %xmm5,%xmm4,%xmm4 + vmovdqa %xmm14,%xmm2 + vpxor %xmm15,%xmm4,%xmm3 + vmovdqu (%rsi),%ymm6 + vmovdqu 0x20(%rsi),%ymm7 + addq $0x40,%rsi + leaq SIGMA(%rip),%rax + movb $0xa,%cl +.Lblake2s_compress_avx512_roundloop: + addq $0x40,%rax + vmovdqa -0x40(%rax),%ymm8 + vmovdqa -0x20(%rax),%ymm9 + vpermi2d %ymm7,%ymm6,%ymm8 + vpermi2d %ymm7,%ymm6,%ymm9 + vmovdqa %ymm8,%ymm6 + vmovdqa %ymm9,%ymm7 + vpaddd %xmm8,%xmm0,%xmm0 + vpaddd %xmm1,%xmm0,%xmm0 + vpxor %xmm0,%xmm3,%xmm3 + vprord $0x10,%xmm3,%xmm3 + vpaddd %xmm3,%xmm2,%xmm2 + vpxor %xmm2,%xmm1,%xmm1 + vprord $0xc,%xmm1,%xmm1 + vextracti128 $0x1,%ymm8,%xmm8 + vpaddd %xmm8,%xmm0,%xmm0 + vpaddd %xmm1,%xmm0,%xmm0 + vpxor %xmm0,%xmm3,%xmm3 + vprord $0x8,%xmm3,%xmm3 + vpaddd %xmm3,%xmm2,%xmm2 + vpxor %xmm2,%xmm1,%xmm1 + vprord $0x7,%xmm1,%xmm1 + vpshufd $0x39,%xmm1,%xmm1 + vpshufd $0x4e,%xmm2,%xmm2 + vpshufd $0x93,%xmm3,%xmm3 + vpaddd %xmm9,%xmm0,%xmm0 + vpaddd %xmm1,%xmm0,%xmm0 + vpxor %xmm0,%xmm3,%xmm3 + vprord $0x10,%xmm3,%xmm3 + vpaddd %xmm3,%xmm2,%xmm2 + vpxor %xmm2,%xmm1,%xmm1 + vprord $0xc,%xmm1,%xmm1 + vextracti128 $0x1,%ymm9,%xmm9 + vpaddd %xmm9,%xmm0,%xmm0 + vpaddd %xmm1,%xmm0,%xmm0 + vpxor %xmm0,%xmm3,%xmm3 + vprord $0x8,%xmm3,%xmm3 + vpaddd %xmm3,%xmm2,%xmm2 + vpxor %xmm2,%xmm1,%xmm1 + vprord $0x7,%xmm1,%xmm1 + vpshufd $0x93,%xmm1,%xmm1 + vpshufd $0x4e,%xmm2,%xmm2 + vpshufd $0x39,%xmm3,%xmm3 + decb %cl + jne .Lblake2s_compress_avx512_roundloop + vpxor %xmm10,%xmm0,%xmm0 + vpxor %xmm11,%xmm1,%xmm1 + vpxor %xmm2,%xmm0,%xmm0 + vpxor %xmm3,%xmm1,%xmm1 + decq %rdx + jne .Lblake2s_compress_avx512_mainloop + vmovdqu %xmm0,(%rdi) + vmovdqu %xmm1,0x10(%rdi) + vmovdqu %xmm4,0x20(%rdi) + vzeroupper + retq +ENDPROC(blake2s_compress_avx512) +#endif /* CONFIG_AS_AVX512 */ diff --git a/lib/zinc/blake2s/blake2s.c b/lib/zinc/blake2s/blake2s.c index 1b64bee20bc4..b40bde23e5e4 100644 --- a/lib/zinc/blake2s/blake2s.c +++ b/lib/zinc/blake2s/blake2s.c @@ -113,7 +113,9 @@ void blake2s_init_key(struct blake2s_state *state, const size_t outlen, } EXPORT_SYMBOL(blake2s_init_key); -#ifndef HAVE_BLAKE2S_ARCH_IMPLEMENTATION +#if defined(CONFIG_ZINC_ARCH_X86_64) +#include "blake2s-x86_64-glue.h" +#else static void __init blake2s_fpu_init(void) { } From patchwork Tue Sep 25 14:56:16 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147499 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp830852lji; Tue, 25 Sep 2018 07:57:34 -0700 (PDT) X-Google-Smtp-Source: ACcGV61kHLUDDKP4Aqv/HlEZWykWSkuIYOt2zjldMk/VBatf74jAjJ+TNWqRz6gZCE6ApqQW08u4 X-Received: by 2002:a62:54c7:: with SMTP id i190-v6mr1525413pfb.155.1537887454415; Tue, 25 Sep 2018 07:57:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887454; cv=none; d=google.com; s=arc-20160816; b=uU1aLYyESfgWC+Yno3/lz/EeLnOFSwfatUdI57+M7q2HdiFUIEJK/hY1YUlOz4YTFC 532/SP8QZsf+eX2aEiJy5/kS0jMiClwZ+ZApO5mb129Qbd/400YF7TedVqeG7ht/UO5G dDUez7FrRliFH3Q77zUJy8gkjSWZ2vWNTpPAYPhca2/yNtAs+eW0c4yFW5nCBFWsxaNe lOlir5Yl5GTZVmb/Ia3vAgiB+GSVoBUPECGto8PzMz5m8LDxsKpeopnfBWcfdPC3P03q R6ETijbken/nRAL7vrhjOo5ruIr4s9se+lQ6ypEd1eY92VXhe+4M1U2wqhI2RKvHwsaT LSbQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=LfyQRdZfblzOcL/7pOqL/utTVY6u2RjCgJ1WozqE/aU=; b=NUMvWyBp9kbu6uny9FyJXJ4AVAt8ryoytFRRTekwthyPkdZbts/dCBYDLsv9Onihdm C3hk9Yef+nDAMqvbe7oduWEgyQWYuECWZXkvw5zD3EiJwZ/hNyzPT4zwb3DrSwU1Php6 0epoknkqVYgpHyqK3g31p4CdORGOIHrCUf+54Tf8cGJWkwTFeX3wrIxvF5ipJvWJGW6E 5zsmyYuCqSyoNYVEv618wzbtI5VUCB2AfFL1q5Mgcu5eKS3ou7LcAGoT0V+fZI09phki eOzP/q544d6Gq/r3+oD5Z6PFuzm0BFVN1ktPvDWzzOv5oELvu67EBQXua95lmOQ07E7a YRIw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=T3qT9UqT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id f95-v6si2923496plb.373.2018.09.25.07.57.33; Tue, 25 Sep 2018 07:57:34 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=T3qT9UqT; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729913AbeIYVFW (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:22 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729622AbeIYVFV (ORCPT ); Tue, 25 Sep 2018 17:05:21 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id c8f32a95; Tue, 25 Sep 2018 14:38:51 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=auD5nqhnIA+rn+8Lti46PSiVh yI=; b=T3qT9UqTIQLKoXcBBpXD91AVzxNaVbflJxs04cLo/lpBkaf20363b8Bq5 Upi0lQK1hcw7GoIPbfUTgM7g7tXALPGLmS9+rscYo63EjroITPWrf2xPBe93CQlO DOt9j5Is8qOnAP54NpDdV1KARwAzibJ2uGD5mdWW+eHfOeIzNbiwauIgJ7BaAorO e4hOm+OrMoaYwi1K5Xm4GDz03mBFJeX33PoSw8w06+/e/jD4wvlVnVAbFcfUI/Yx b2AnHaw5D3sabJjaWI2jHdu1AkluCxRVPjFgyTCC+ogcxnySZ8H95iD9lJkCycZ9 MAcovAGtJQWqnwwdO8DZyBKkalTFg== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id fa0615de (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:50 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Karthikeyan Bhargavan Subject: [PATCH net-next v6 17/23] zinc: Curve25519 generic C implementations and selftest Date: Tue, 25 Sep 2018 16:56:16 +0200 Message-Id: <20180925145622.29959-18-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This contains two formally verified C implementations of the Curve25519 scalar multiplication function, one for 32-bit systems, and one for 64-bit systems whose compiler supports efficient 128-bit integer types. Not only are these implementations formally verified, but they are also the fastest available C implementations. They have been modified to be friendly to kernel space and to be generally less horrendous looking, but still an effort has been made to retain their formally verified characteristic, and so the C might look slightly unidiomatic. The 64-bit version comes from HACL*: https://github.com/project-everest/hacl-star The 32-bit version comes from Fiat: https://github.com/mit-plv/fiat-crypto Information: https://cr.yp.to/ecdh.html Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Karthikeyan Bhargavan --- include/zinc/curve25519.h | 22 + lib/zinc/Kconfig | 4 + lib/zinc/Makefile | 3 + lib/zinc/curve25519/curve25519-fiat32.h | 860 +++++++++++++++ lib/zinc/curve25519/curve25519-hacl64.h | 784 ++++++++++++++ lib/zinc/curve25519/curve25519.c | 109 ++ lib/zinc/selftest/curve25519.h | 1321 +++++++++++++++++++++++ 7 files changed, 3103 insertions(+) create mode 100644 include/zinc/curve25519.h create mode 100644 lib/zinc/curve25519/curve25519-fiat32.h create mode 100644 lib/zinc/curve25519/curve25519-hacl64.h create mode 100644 lib/zinc/curve25519/curve25519.c create mode 100644 lib/zinc/selftest/curve25519.h -- 2.19.0 diff --git a/include/zinc/curve25519.h b/include/zinc/curve25519.h new file mode 100644 index 000000000000..def173e736fc --- /dev/null +++ b/include/zinc/curve25519.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _ZINC_CURVE25519_H +#define _ZINC_CURVE25519_H + +#include + +enum curve25519_lengths { + CURVE25519_KEY_SIZE = 32 +}; + +bool __must_check curve25519(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]); +void curve25519_generate_secret(u8 secret[CURVE25519_KEY_SIZE]); +bool __must_check curve25519_generate_public( + u8 pub[CURVE25519_KEY_SIZE], const u8 secret[CURVE25519_KEY_SIZE]); + +#endif /* _ZINC_CURVE25519_H */ diff --git a/lib/zinc/Kconfig b/lib/zinc/Kconfig index 7bf4bc88f81f..0d8a94dd73a8 100644 --- a/lib/zinc/Kconfig +++ b/lib/zinc/Kconfig @@ -14,6 +14,10 @@ config ZINC_CHACHA20POLY1305 config ZINC_BLAKE2S tristate +config ZINC_CURVE25519 + tristate + select CONFIG_CRYPTO + config ZINC_DEBUG bool "Zinc cryptography library debugging and self-tests" help diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index 67ad837c822c..65440438c6e5 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -25,3 +25,6 @@ obj-$(CONFIG_ZINC_CHACHA20POLY1305) += zinc_chacha20poly1305.o zinc_blake2s-y := blake2s/blake2s.o zinc_blake2s-$(CONFIG_ZINC_ARCH_X86_64) += blake2s/blake2s-x86_64.o obj-$(CONFIG_ZINC_BLAKE2S) += zinc_blake2s.o + +zinc_curve25519-y := curve25519/curve25519.o +obj-$(CONFIG_ZINC_CURVE25519) += zinc_curve25519.o diff --git a/lib/zinc/curve25519/curve25519-fiat32.h b/lib/zinc/curve25519/curve25519-fiat32.h new file mode 100644 index 000000000000..32b5ec7aa040 --- /dev/null +++ b/lib/zinc/curve25519/curve25519-fiat32.h @@ -0,0 +1,860 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2016 The fiat-crypto Authors. + * Copyright (C) 2018 Jason A. Donenfeld . All Rights Reserved. + * + * This is a machine-generated formally verified implementation of Curve25519 + * ECDH from: . Though originally + * machine generated, it has been tweaked to be suitable for use in the kernel. + * It is optimized for 32-bit machines and machines that cannot work efficiently + * with 128-bit integer types. + */ + +/* fe means field element. Here the field is \Z/(2^255-19). An element t, + * entries t[0]...t[9], represents the integer t[0]+2^26 t[1]+2^51 t[2]+2^77 + * t[3]+2^102 t[4]+...+2^230 t[9]. + * fe limbs are bounded by 1.125*2^26,1.125*2^25,1.125*2^26,1.125*2^25,etc. + * Multiplication and carrying produce fe from fe_loose. + */ +typedef struct fe { u32 v[10]; } fe; + +/* fe_loose limbs are bounded by 3.375*2^26,3.375*2^25,3.375*2^26,3.375*2^25,etc + * Addition and subtraction produce fe_loose from (fe, fe). + */ +typedef struct fe_loose { u32 v[10]; } fe_loose; + +static __always_inline void fe_frombytes_impl(u32 h[10], const u8 *s) +{ + /* Ignores top bit of s. */ + u32 a0 = get_unaligned_le32(s); + u32 a1 = get_unaligned_le32(s+4); + u32 a2 = get_unaligned_le32(s+8); + u32 a3 = get_unaligned_le32(s+12); + u32 a4 = get_unaligned_le32(s+16); + u32 a5 = get_unaligned_le32(s+20); + u32 a6 = get_unaligned_le32(s+24); + u32 a7 = get_unaligned_le32(s+28); + h[0] = a0&((1<<26)-1); /* 26 used, 32-26 left. 26 */ + h[1] = (a0>>26) | ((a1&((1<<19)-1))<< 6); /* (32-26) + 19 = 6+19 = 25 */ + h[2] = (a1>>19) | ((a2&((1<<13)-1))<<13); /* (32-19) + 13 = 13+13 = 26 */ + h[3] = (a2>>13) | ((a3&((1<< 6)-1))<<19); /* (32-13) + 6 = 19+ 6 = 25 */ + h[4] = (a3>> 6); /* (32- 6) = 26 */ + h[5] = a4&((1<<25)-1); /* 25 */ + h[6] = (a4>>25) | ((a5&((1<<19)-1))<< 7); /* (32-25) + 19 = 7+19 = 26 */ + h[7] = (a5>>19) | ((a6&((1<<12)-1))<<13); /* (32-19) + 12 = 13+12 = 25 */ + h[8] = (a6>>12) | ((a7&((1<< 6)-1))<<20); /* (32-12) + 6 = 20+ 6 = 26 */ + h[9] = (a7>> 6)&((1<<25)-1); /* 25 */ +} + +static __always_inline void fe_frombytes(fe *h, const u8 *s) +{ + fe_frombytes_impl(h->v, s); +} + +static __always_inline u8 /*bool*/ +addcarryx_u25(u8 /*bool*/ c, u32 a, u32 b, u32 *low) +{ + /* This function extracts 25 bits of result and 1 bit of carry + * (26 total), so a 32-bit intermediate is sufficient. + */ + u32 x = a + b + c; + *low = x & ((1 << 25) - 1); + return (x >> 25) & 1; +} + +static __always_inline u8 /*bool*/ +addcarryx_u26(u8 /*bool*/ c, u32 a, u32 b, u32 *low) +{ + /* This function extracts 26 bits of result and 1 bit of carry + * (27 total), so a 32-bit intermediate is sufficient. + */ + u32 x = a + b + c; + *low = x & ((1 << 26) - 1); + return (x >> 26) & 1; +} + +static __always_inline u8 /*bool*/ +subborrow_u25(u8 /*bool*/ c, u32 a, u32 b, u32 *low) +{ + /* This function extracts 25 bits of result and 1 bit of borrow + * (26 total), so a 32-bit intermediate is sufficient. + */ + u32 x = a - b - c; + *low = x & ((1 << 25) - 1); + return x >> 31; +} + +static __always_inline u8 /*bool*/ +subborrow_u26(u8 /*bool*/ c, u32 a, u32 b, u32 *low) +{ + /* This function extracts 26 bits of result and 1 bit of borrow + *(27 total), so a 32-bit intermediate is sufficient. + */ + u32 x = a - b - c; + *low = x & ((1 << 26) - 1); + return x >> 31; +} + +static __always_inline u32 cmovznz32(u32 t, u32 z, u32 nz) +{ + t = -!!t; /* all set if nonzero, 0 if 0 */ + return (t&nz) | ((~t)&z); +} + +static __always_inline void fe_freeze(u32 out[10], const u32 in1[10]) +{ + { const u32 x17 = in1[9]; + { const u32 x18 = in1[8]; + { const u32 x16 = in1[7]; + { const u32 x14 = in1[6]; + { const u32 x12 = in1[5]; + { const u32 x10 = in1[4]; + { const u32 x8 = in1[3]; + { const u32 x6 = in1[2]; + { const u32 x4 = in1[1]; + { const u32 x2 = in1[0]; + { u32 x20; u8/*bool*/ x21 = subborrow_u26(0x0, x2, 0x3ffffed, &x20); + { u32 x23; u8/*bool*/ x24 = subborrow_u25(x21, x4, 0x1ffffff, &x23); + { u32 x26; u8/*bool*/ x27 = subborrow_u26(x24, x6, 0x3ffffff, &x26); + { u32 x29; u8/*bool*/ x30 = subborrow_u25(x27, x8, 0x1ffffff, &x29); + { u32 x32; u8/*bool*/ x33 = subborrow_u26(x30, x10, 0x3ffffff, &x32); + { u32 x35; u8/*bool*/ x36 = subborrow_u25(x33, x12, 0x1ffffff, &x35); + { u32 x38; u8/*bool*/ x39 = subborrow_u26(x36, x14, 0x3ffffff, &x38); + { u32 x41; u8/*bool*/ x42 = subborrow_u25(x39, x16, 0x1ffffff, &x41); + { u32 x44; u8/*bool*/ x45 = subborrow_u26(x42, x18, 0x3ffffff, &x44); + { u32 x47; u8/*bool*/ x48 = subborrow_u25(x45, x17, 0x1ffffff, &x47); + { u32 x49 = cmovznz32(x48, 0x0, 0xffffffff); + { u32 x50 = (x49 & 0x3ffffed); + { u32 x52; u8/*bool*/ x53 = addcarryx_u26(0x0, x20, x50, &x52); + { u32 x54 = (x49 & 0x1ffffff); + { u32 x56; u8/*bool*/ x57 = addcarryx_u25(x53, x23, x54, &x56); + { u32 x58 = (x49 & 0x3ffffff); + { u32 x60; u8/*bool*/ x61 = addcarryx_u26(x57, x26, x58, &x60); + { u32 x62 = (x49 & 0x1ffffff); + { u32 x64; u8/*bool*/ x65 = addcarryx_u25(x61, x29, x62, &x64); + { u32 x66 = (x49 & 0x3ffffff); + { u32 x68; u8/*bool*/ x69 = addcarryx_u26(x65, x32, x66, &x68); + { u32 x70 = (x49 & 0x1ffffff); + { u32 x72; u8/*bool*/ x73 = addcarryx_u25(x69, x35, x70, &x72); + { u32 x74 = (x49 & 0x3ffffff); + { u32 x76; u8/*bool*/ x77 = addcarryx_u26(x73, x38, x74, &x76); + { u32 x78 = (x49 & 0x1ffffff); + { u32 x80; u8/*bool*/ x81 = addcarryx_u25(x77, x41, x78, &x80); + { u32 x82 = (x49 & 0x3ffffff); + { u32 x84; u8/*bool*/ x85 = addcarryx_u26(x81, x44, x82, &x84); + { u32 x86 = (x49 & 0x1ffffff); + { u32 x88; addcarryx_u25(x85, x47, x86, &x88); + out[0] = x52; + out[1] = x56; + out[2] = x60; + out[3] = x64; + out[4] = x68; + out[5] = x72; + out[6] = x76; + out[7] = x80; + out[8] = x84; + out[9] = x88; + }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} +} + +static __always_inline void fe_tobytes(u8 s[32], const fe *f) +{ + u32 h[10]; + fe_freeze(h, f->v); + s[0] = h[0] >> 0; + s[1] = h[0] >> 8; + s[2] = h[0] >> 16; + s[3] = (h[0] >> 24) | (h[1] << 2); + s[4] = h[1] >> 6; + s[5] = h[1] >> 14; + s[6] = (h[1] >> 22) | (h[2] << 3); + s[7] = h[2] >> 5; + s[8] = h[2] >> 13; + s[9] = (h[2] >> 21) | (h[3] << 5); + s[10] = h[3] >> 3; + s[11] = h[3] >> 11; + s[12] = (h[3] >> 19) | (h[4] << 6); + s[13] = h[4] >> 2; + s[14] = h[4] >> 10; + s[15] = h[4] >> 18; + s[16] = h[5] >> 0; + s[17] = h[5] >> 8; + s[18] = h[5] >> 16; + s[19] = (h[5] >> 24) | (h[6] << 1); + s[20] = h[6] >> 7; + s[21] = h[6] >> 15; + s[22] = (h[6] >> 23) | (h[7] << 3); + s[23] = h[7] >> 5; + s[24] = h[7] >> 13; + s[25] = (h[7] >> 21) | (h[8] << 4); + s[26] = h[8] >> 4; + s[27] = h[8] >> 12; + s[28] = (h[8] >> 20) | (h[9] << 6); + s[29] = h[9] >> 2; + s[30] = h[9] >> 10; + s[31] = h[9] >> 18; +} + +/* h = f */ +static __always_inline void fe_copy(fe *h, const fe *f) +{ + memmove(h, f, sizeof(u32) * 10); +} + +static __always_inline void fe_copy_lt(fe_loose *h, const fe *f) +{ + memmove(h, f, sizeof(u32) * 10); +} + +/* h = 0 */ +static __always_inline void fe_0(fe *h) +{ + memset(h, 0, sizeof(u32) * 10); +} + +/* h = 1 */ +static __always_inline void fe_1(fe *h) +{ + memset(h, 0, sizeof(u32) * 10); + h->v[0] = 1; +} + +static void fe_add_impl(u32 out[10], const u32 in1[10], const u32 in2[10]) +{ + { const u32 x20 = in1[9]; + { const u32 x21 = in1[8]; + { const u32 x19 = in1[7]; + { const u32 x17 = in1[6]; + { const u32 x15 = in1[5]; + { const u32 x13 = in1[4]; + { const u32 x11 = in1[3]; + { const u32 x9 = in1[2]; + { const u32 x7 = in1[1]; + { const u32 x5 = in1[0]; + { const u32 x38 = in2[9]; + { const u32 x39 = in2[8]; + { const u32 x37 = in2[7]; + { const u32 x35 = in2[6]; + { const u32 x33 = in2[5]; + { const u32 x31 = in2[4]; + { const u32 x29 = in2[3]; + { const u32 x27 = in2[2]; + { const u32 x25 = in2[1]; + { const u32 x23 = in2[0]; + out[0] = (x5 + x23); + out[1] = (x7 + x25); + out[2] = (x9 + x27); + out[3] = (x11 + x29); + out[4] = (x13 + x31); + out[5] = (x15 + x33); + out[6] = (x17 + x35); + out[7] = (x19 + x37); + out[8] = (x21 + x39); + out[9] = (x20 + x38); + }}}}}}}}}}}}}}}}}}}} +} + +/* h = f + g + * Can overlap h with f or g. + */ +static __always_inline void fe_add(fe_loose *h, const fe *f, const fe *g) +{ + fe_add_impl(h->v, f->v, g->v); +} + +static void fe_sub_impl(u32 out[10], const u32 in1[10], const u32 in2[10]) +{ + { const u32 x20 = in1[9]; + { const u32 x21 = in1[8]; + { const u32 x19 = in1[7]; + { const u32 x17 = in1[6]; + { const u32 x15 = in1[5]; + { const u32 x13 = in1[4]; + { const u32 x11 = in1[3]; + { const u32 x9 = in1[2]; + { const u32 x7 = in1[1]; + { const u32 x5 = in1[0]; + { const u32 x38 = in2[9]; + { const u32 x39 = in2[8]; + { const u32 x37 = in2[7]; + { const u32 x35 = in2[6]; + { const u32 x33 = in2[5]; + { const u32 x31 = in2[4]; + { const u32 x29 = in2[3]; + { const u32 x27 = in2[2]; + { const u32 x25 = in2[1]; + { const u32 x23 = in2[0]; + out[0] = ((0x7ffffda + x5) - x23); + out[1] = ((0x3fffffe + x7) - x25); + out[2] = ((0x7fffffe + x9) - x27); + out[3] = ((0x3fffffe + x11) - x29); + out[4] = ((0x7fffffe + x13) - x31); + out[5] = ((0x3fffffe + x15) - x33); + out[6] = ((0x7fffffe + x17) - x35); + out[7] = ((0x3fffffe + x19) - x37); + out[8] = ((0x7fffffe + x21) - x39); + out[9] = ((0x3fffffe + x20) - x38); + }}}}}}}}}}}}}}}}}}}} +} + +/* h = f - g + * Can overlap h with f or g. + */ +static __always_inline void fe_sub(fe_loose *h, const fe *f, const fe *g) +{ + fe_sub_impl(h->v, f->v, g->v); +} + +static void fe_mul_impl(u32 out[10], const u32 in1[10], const u32 in2[10]) +{ + { const u32 x20 = in1[9]; + { const u32 x21 = in1[8]; + { const u32 x19 = in1[7]; + { const u32 x17 = in1[6]; + { const u32 x15 = in1[5]; + { const u32 x13 = in1[4]; + { const u32 x11 = in1[3]; + { const u32 x9 = in1[2]; + { const u32 x7 = in1[1]; + { const u32 x5 = in1[0]; + { const u32 x38 = in2[9]; + { const u32 x39 = in2[8]; + { const u32 x37 = in2[7]; + { const u32 x35 = in2[6]; + { const u32 x33 = in2[5]; + { const u32 x31 = in2[4]; + { const u32 x29 = in2[3]; + { const u32 x27 = in2[2]; + { const u32 x25 = in2[1]; + { const u32 x23 = in2[0]; + { u64 x40 = ((u64)x23 * x5); + { u64 x41 = (((u64)x23 * x7) + ((u64)x25 * x5)); + { u64 x42 = ((((u64)(0x2 * x25) * x7) + ((u64)x23 * x9)) + ((u64)x27 * x5)); + { u64 x43 = (((((u64)x25 * x9) + ((u64)x27 * x7)) + ((u64)x23 * x11)) + ((u64)x29 * x5)); + { u64 x44 = (((((u64)x27 * x9) + (0x2 * (((u64)x25 * x11) + ((u64)x29 * x7)))) + ((u64)x23 * x13)) + ((u64)x31 * x5)); + { u64 x45 = (((((((u64)x27 * x11) + ((u64)x29 * x9)) + ((u64)x25 * x13)) + ((u64)x31 * x7)) + ((u64)x23 * x15)) + ((u64)x33 * x5)); + { u64 x46 = (((((0x2 * ((((u64)x29 * x11) + ((u64)x25 * x15)) + ((u64)x33 * x7))) + ((u64)x27 * x13)) + ((u64)x31 * x9)) + ((u64)x23 * x17)) + ((u64)x35 * x5)); + { u64 x47 = (((((((((u64)x29 * x13) + ((u64)x31 * x11)) + ((u64)x27 * x15)) + ((u64)x33 * x9)) + ((u64)x25 * x17)) + ((u64)x35 * x7)) + ((u64)x23 * x19)) + ((u64)x37 * x5)); + { u64 x48 = (((((((u64)x31 * x13) + (0x2 * (((((u64)x29 * x15) + ((u64)x33 * x11)) + ((u64)x25 * x19)) + ((u64)x37 * x7)))) + ((u64)x27 * x17)) + ((u64)x35 * x9)) + ((u64)x23 * x21)) + ((u64)x39 * x5)); + { u64 x49 = (((((((((((u64)x31 * x15) + ((u64)x33 * x13)) + ((u64)x29 * x17)) + ((u64)x35 * x11)) + ((u64)x27 * x19)) + ((u64)x37 * x9)) + ((u64)x25 * x21)) + ((u64)x39 * x7)) + ((u64)x23 * x20)) + ((u64)x38 * x5)); + { u64 x50 = (((((0x2 * ((((((u64)x33 * x15) + ((u64)x29 * x19)) + ((u64)x37 * x11)) + ((u64)x25 * x20)) + ((u64)x38 * x7))) + ((u64)x31 * x17)) + ((u64)x35 * x13)) + ((u64)x27 * x21)) + ((u64)x39 * x9)); + { u64 x51 = (((((((((u64)x33 * x17) + ((u64)x35 * x15)) + ((u64)x31 * x19)) + ((u64)x37 * x13)) + ((u64)x29 * x21)) + ((u64)x39 * x11)) + ((u64)x27 * x20)) + ((u64)x38 * x9)); + { u64 x52 = (((((u64)x35 * x17) + (0x2 * (((((u64)x33 * x19) + ((u64)x37 * x15)) + ((u64)x29 * x20)) + ((u64)x38 * x11)))) + ((u64)x31 * x21)) + ((u64)x39 * x13)); + { u64 x53 = (((((((u64)x35 * x19) + ((u64)x37 * x17)) + ((u64)x33 * x21)) + ((u64)x39 * x15)) + ((u64)x31 * x20)) + ((u64)x38 * x13)); + { u64 x54 = (((0x2 * ((((u64)x37 * x19) + ((u64)x33 * x20)) + ((u64)x38 * x15))) + ((u64)x35 * x21)) + ((u64)x39 * x17)); + { u64 x55 = (((((u64)x37 * x21) + ((u64)x39 * x19)) + ((u64)x35 * x20)) + ((u64)x38 * x17)); + { u64 x56 = (((u64)x39 * x21) + (0x2 * (((u64)x37 * x20) + ((u64)x38 * x19)))); + { u64 x57 = (((u64)x39 * x20) + ((u64)x38 * x21)); + { u64 x58 = ((u64)(0x2 * x38) * x20); + { u64 x59 = (x48 + (x58 << 0x4)); + { u64 x60 = (x59 + (x58 << 0x1)); + { u64 x61 = (x60 + x58); + { u64 x62 = (x47 + (x57 << 0x4)); + { u64 x63 = (x62 + (x57 << 0x1)); + { u64 x64 = (x63 + x57); + { u64 x65 = (x46 + (x56 << 0x4)); + { u64 x66 = (x65 + (x56 << 0x1)); + { u64 x67 = (x66 + x56); + { u64 x68 = (x45 + (x55 << 0x4)); + { u64 x69 = (x68 + (x55 << 0x1)); + { u64 x70 = (x69 + x55); + { u64 x71 = (x44 + (x54 << 0x4)); + { u64 x72 = (x71 + (x54 << 0x1)); + { u64 x73 = (x72 + x54); + { u64 x74 = (x43 + (x53 << 0x4)); + { u64 x75 = (x74 + (x53 << 0x1)); + { u64 x76 = (x75 + x53); + { u64 x77 = (x42 + (x52 << 0x4)); + { u64 x78 = (x77 + (x52 << 0x1)); + { u64 x79 = (x78 + x52); + { u64 x80 = (x41 + (x51 << 0x4)); + { u64 x81 = (x80 + (x51 << 0x1)); + { u64 x82 = (x81 + x51); + { u64 x83 = (x40 + (x50 << 0x4)); + { u64 x84 = (x83 + (x50 << 0x1)); + { u64 x85 = (x84 + x50); + { u64 x86 = (x85 >> 0x1a); + { u32 x87 = ((u32)x85 & 0x3ffffff); + { u64 x88 = (x86 + x82); + { u64 x89 = (x88 >> 0x19); + { u32 x90 = ((u32)x88 & 0x1ffffff); + { u64 x91 = (x89 + x79); + { u64 x92 = (x91 >> 0x1a); + { u32 x93 = ((u32)x91 & 0x3ffffff); + { u64 x94 = (x92 + x76); + { u64 x95 = (x94 >> 0x19); + { u32 x96 = ((u32)x94 & 0x1ffffff); + { u64 x97 = (x95 + x73); + { u64 x98 = (x97 >> 0x1a); + { u32 x99 = ((u32)x97 & 0x3ffffff); + { u64 x100 = (x98 + x70); + { u64 x101 = (x100 >> 0x19); + { u32 x102 = ((u32)x100 & 0x1ffffff); + { u64 x103 = (x101 + x67); + { u64 x104 = (x103 >> 0x1a); + { u32 x105 = ((u32)x103 & 0x3ffffff); + { u64 x106 = (x104 + x64); + { u64 x107 = (x106 >> 0x19); + { u32 x108 = ((u32)x106 & 0x1ffffff); + { u64 x109 = (x107 + x61); + { u64 x110 = (x109 >> 0x1a); + { u32 x111 = ((u32)x109 & 0x3ffffff); + { u64 x112 = (x110 + x49); + { u64 x113 = (x112 >> 0x19); + { u32 x114 = ((u32)x112 & 0x1ffffff); + { u64 x115 = (x87 + (0x13 * x113)); + { u32 x116 = (u32) (x115 >> 0x1a); + { u32 x117 = ((u32)x115 & 0x3ffffff); + { u32 x118 = (x116 + x90); + { u32 x119 = (x118 >> 0x19); + { u32 x120 = (x118 & 0x1ffffff); + out[0] = x117; + out[1] = x120; + out[2] = (x119 + x93); + out[3] = x96; + out[4] = x99; + out[5] = x102; + out[6] = x105; + out[7] = x108; + out[8] = x111; + out[9] = x114; + }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} +} + +static __always_inline void fe_mul_ttt(fe *h, const fe *f, const fe *g) +{ + fe_mul_impl(h->v, f->v, g->v); +} + +static __always_inline void fe_mul_tlt(fe *h, const fe_loose *f, const fe *g) +{ + fe_mul_impl(h->v, f->v, g->v); +} + +static __always_inline void +fe_mul_tll(fe *h, const fe_loose *f, const fe_loose *g) +{ + fe_mul_impl(h->v, f->v, g->v); +} + +static void fe_sqr_impl(u32 out[10], const u32 in1[10]) +{ + { const u32 x17 = in1[9]; + { const u32 x18 = in1[8]; + { const u32 x16 = in1[7]; + { const u32 x14 = in1[6]; + { const u32 x12 = in1[5]; + { const u32 x10 = in1[4]; + { const u32 x8 = in1[3]; + { const u32 x6 = in1[2]; + { const u32 x4 = in1[1]; + { const u32 x2 = in1[0]; + { u64 x19 = ((u64)x2 * x2); + { u64 x20 = ((u64)(0x2 * x2) * x4); + { u64 x21 = (0x2 * (((u64)x4 * x4) + ((u64)x2 * x6))); + { u64 x22 = (0x2 * (((u64)x4 * x6) + ((u64)x2 * x8))); + { u64 x23 = ((((u64)x6 * x6) + ((u64)(0x4 * x4) * x8)) + ((u64)(0x2 * x2) * x10)); + { u64 x24 = (0x2 * ((((u64)x6 * x8) + ((u64)x4 * x10)) + ((u64)x2 * x12))); + { u64 x25 = (0x2 * (((((u64)x8 * x8) + ((u64)x6 * x10)) + ((u64)x2 * x14)) + ((u64)(0x2 * x4) * x12))); + { u64 x26 = (0x2 * (((((u64)x8 * x10) + ((u64)x6 * x12)) + ((u64)x4 * x14)) + ((u64)x2 * x16))); + { u64 x27 = (((u64)x10 * x10) + (0x2 * ((((u64)x6 * x14) + ((u64)x2 * x18)) + (0x2 * (((u64)x4 * x16) + ((u64)x8 * x12)))))); + { u64 x28 = (0x2 * ((((((u64)x10 * x12) + ((u64)x8 * x14)) + ((u64)x6 * x16)) + ((u64)x4 * x18)) + ((u64)x2 * x17))); + { u64 x29 = (0x2 * (((((u64)x12 * x12) + ((u64)x10 * x14)) + ((u64)x6 * x18)) + (0x2 * (((u64)x8 * x16) + ((u64)x4 * x17))))); + { u64 x30 = (0x2 * (((((u64)x12 * x14) + ((u64)x10 * x16)) + ((u64)x8 * x18)) + ((u64)x6 * x17))); + { u64 x31 = (((u64)x14 * x14) + (0x2 * (((u64)x10 * x18) + (0x2 * (((u64)x12 * x16) + ((u64)x8 * x17)))))); + { u64 x32 = (0x2 * ((((u64)x14 * x16) + ((u64)x12 * x18)) + ((u64)x10 * x17))); + { u64 x33 = (0x2 * ((((u64)x16 * x16) + ((u64)x14 * x18)) + ((u64)(0x2 * x12) * x17))); + { u64 x34 = (0x2 * (((u64)x16 * x18) + ((u64)x14 * x17))); + { u64 x35 = (((u64)x18 * x18) + ((u64)(0x4 * x16) * x17)); + { u64 x36 = ((u64)(0x2 * x18) * x17); + { u64 x37 = ((u64)(0x2 * x17) * x17); + { u64 x38 = (x27 + (x37 << 0x4)); + { u64 x39 = (x38 + (x37 << 0x1)); + { u64 x40 = (x39 + x37); + { u64 x41 = (x26 + (x36 << 0x4)); + { u64 x42 = (x41 + (x36 << 0x1)); + { u64 x43 = (x42 + x36); + { u64 x44 = (x25 + (x35 << 0x4)); + { u64 x45 = (x44 + (x35 << 0x1)); + { u64 x46 = (x45 + x35); + { u64 x47 = (x24 + (x34 << 0x4)); + { u64 x48 = (x47 + (x34 << 0x1)); + { u64 x49 = (x48 + x34); + { u64 x50 = (x23 + (x33 << 0x4)); + { u64 x51 = (x50 + (x33 << 0x1)); + { u64 x52 = (x51 + x33); + { u64 x53 = (x22 + (x32 << 0x4)); + { u64 x54 = (x53 + (x32 << 0x1)); + { u64 x55 = (x54 + x32); + { u64 x56 = (x21 + (x31 << 0x4)); + { u64 x57 = (x56 + (x31 << 0x1)); + { u64 x58 = (x57 + x31); + { u64 x59 = (x20 + (x30 << 0x4)); + { u64 x60 = (x59 + (x30 << 0x1)); + { u64 x61 = (x60 + x30); + { u64 x62 = (x19 + (x29 << 0x4)); + { u64 x63 = (x62 + (x29 << 0x1)); + { u64 x64 = (x63 + x29); + { u64 x65 = (x64 >> 0x1a); + { u32 x66 = ((u32)x64 & 0x3ffffff); + { u64 x67 = (x65 + x61); + { u64 x68 = (x67 >> 0x19); + { u32 x69 = ((u32)x67 & 0x1ffffff); + { u64 x70 = (x68 + x58); + { u64 x71 = (x70 >> 0x1a); + { u32 x72 = ((u32)x70 & 0x3ffffff); + { u64 x73 = (x71 + x55); + { u64 x74 = (x73 >> 0x19); + { u32 x75 = ((u32)x73 & 0x1ffffff); + { u64 x76 = (x74 + x52); + { u64 x77 = (x76 >> 0x1a); + { u32 x78 = ((u32)x76 & 0x3ffffff); + { u64 x79 = (x77 + x49); + { u64 x80 = (x79 >> 0x19); + { u32 x81 = ((u32)x79 & 0x1ffffff); + { u64 x82 = (x80 + x46); + { u64 x83 = (x82 >> 0x1a); + { u32 x84 = ((u32)x82 & 0x3ffffff); + { u64 x85 = (x83 + x43); + { u64 x86 = (x85 >> 0x19); + { u32 x87 = ((u32)x85 & 0x1ffffff); + { u64 x88 = (x86 + x40); + { u64 x89 = (x88 >> 0x1a); + { u32 x90 = ((u32)x88 & 0x3ffffff); + { u64 x91 = (x89 + x28); + { u64 x92 = (x91 >> 0x19); + { u32 x93 = ((u32)x91 & 0x1ffffff); + { u64 x94 = (x66 + (0x13 * x92)); + { u32 x95 = (u32) (x94 >> 0x1a); + { u32 x96 = ((u32)x94 & 0x3ffffff); + { u32 x97 = (x95 + x69); + { u32 x98 = (x97 >> 0x19); + { u32 x99 = (x97 & 0x1ffffff); + out[0] = x96; + out[1] = x99; + out[2] = (x98 + x72); + out[3] = x75; + out[4] = x78; + out[5] = x81; + out[6] = x84; + out[7] = x87; + out[8] = x90; + out[9] = x93; + }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} +} + +static __always_inline void fe_sq_tl(fe *h, const fe_loose *f) +{ + fe_sqr_impl(h->v, f->v); +} + +static __always_inline void fe_sq_tt(fe *h, const fe *f) +{ + fe_sqr_impl(h->v, f->v); +} + +static __always_inline void fe_loose_invert(fe *out, const fe_loose *z) +{ + fe t0; + fe t1; + fe t2; + fe t3; + int i; + + fe_sq_tl(&t0, z); + fe_sq_tt(&t1, &t0); + for (i = 1; i < 2; ++i) + fe_sq_tt(&t1, &t1); + fe_mul_tlt(&t1, z, &t1); + fe_mul_ttt(&t0, &t0, &t1); + fe_sq_tt(&t2, &t0); + fe_mul_ttt(&t1, &t1, &t2); + fe_sq_tt(&t2, &t1); + for (i = 1; i < 5; ++i) + fe_sq_tt(&t2, &t2); + fe_mul_ttt(&t1, &t2, &t1); + fe_sq_tt(&t2, &t1); + for (i = 1; i < 10; ++i) + fe_sq_tt(&t2, &t2); + fe_mul_ttt(&t2, &t2, &t1); + fe_sq_tt(&t3, &t2); + for (i = 1; i < 20; ++i) + fe_sq_tt(&t3, &t3); + fe_mul_ttt(&t2, &t3, &t2); + fe_sq_tt(&t2, &t2); + for (i = 1; i < 10; ++i) + fe_sq_tt(&t2, &t2); + fe_mul_ttt(&t1, &t2, &t1); + fe_sq_tt(&t2, &t1); + for (i = 1; i < 50; ++i) + fe_sq_tt(&t2, &t2); + fe_mul_ttt(&t2, &t2, &t1); + fe_sq_tt(&t3, &t2); + for (i = 1; i < 100; ++i) + fe_sq_tt(&t3, &t3); + fe_mul_ttt(&t2, &t3, &t2); + fe_sq_tt(&t2, &t2); + for (i = 1; i < 50; ++i) + fe_sq_tt(&t2, &t2); + fe_mul_ttt(&t1, &t2, &t1); + fe_sq_tt(&t1, &t1); + for (i = 1; i < 5; ++i) + fe_sq_tt(&t1, &t1); + fe_mul_ttt(out, &t1, &t0); +} + +static __always_inline void fe_invert(fe *out, const fe *z) +{ + fe_loose l; + fe_copy_lt(&l, z); + fe_loose_invert(out, &l); +} + +/* Replace (f,g) with (g,f) if b == 1; + * replace (f,g) with (f,g) if b == 0. + * + * Preconditions: b in {0,1} + */ +static __always_inline void fe_cswap(fe *f, fe *g, unsigned int b) +{ + unsigned i; + b = 0-b; + for (i = 0; i < 10; i++) { + u32 x = f->v[i] ^ g->v[i]; + x &= b; + f->v[i] ^= x; + g->v[i] ^= x; + } +} + +/* NOTE: based on fiat-crypto fe_mul, edited for in2=121666, 0, 0.*/ +static __always_inline void fe_mul_121666_impl(u32 out[10], const u32 in1[10]) +{ + { const u32 x20 = in1[9]; + { const u32 x21 = in1[8]; + { const u32 x19 = in1[7]; + { const u32 x17 = in1[6]; + { const u32 x15 = in1[5]; + { const u32 x13 = in1[4]; + { const u32 x11 = in1[3]; + { const u32 x9 = in1[2]; + { const u32 x7 = in1[1]; + { const u32 x5 = in1[0]; + { const u32 x38 = 0; + { const u32 x39 = 0; + { const u32 x37 = 0; + { const u32 x35 = 0; + { const u32 x33 = 0; + { const u32 x31 = 0; + { const u32 x29 = 0; + { const u32 x27 = 0; + { const u32 x25 = 0; + { const u32 x23 = 121666; + { u64 x40 = ((u64)x23 * x5); + { u64 x41 = (((u64)x23 * x7) + ((u64)x25 * x5)); + { u64 x42 = ((((u64)(0x2 * x25) * x7) + ((u64)x23 * x9)) + ((u64)x27 * x5)); + { u64 x43 = (((((u64)x25 * x9) + ((u64)x27 * x7)) + ((u64)x23 * x11)) + ((u64)x29 * x5)); + { u64 x44 = (((((u64)x27 * x9) + (0x2 * (((u64)x25 * x11) + ((u64)x29 * x7)))) + ((u64)x23 * x13)) + ((u64)x31 * x5)); + { u64 x45 = (((((((u64)x27 * x11) + ((u64)x29 * x9)) + ((u64)x25 * x13)) + ((u64)x31 * x7)) + ((u64)x23 * x15)) + ((u64)x33 * x5)); + { u64 x46 = (((((0x2 * ((((u64)x29 * x11) + ((u64)x25 * x15)) + ((u64)x33 * x7))) + ((u64)x27 * x13)) + ((u64)x31 * x9)) + ((u64)x23 * x17)) + ((u64)x35 * x5)); + { u64 x47 = (((((((((u64)x29 * x13) + ((u64)x31 * x11)) + ((u64)x27 * x15)) + ((u64)x33 * x9)) + ((u64)x25 * x17)) + ((u64)x35 * x7)) + ((u64)x23 * x19)) + ((u64)x37 * x5)); + { u64 x48 = (((((((u64)x31 * x13) + (0x2 * (((((u64)x29 * x15) + ((u64)x33 * x11)) + ((u64)x25 * x19)) + ((u64)x37 * x7)))) + ((u64)x27 * x17)) + ((u64)x35 * x9)) + ((u64)x23 * x21)) + ((u64)x39 * x5)); + { u64 x49 = (((((((((((u64)x31 * x15) + ((u64)x33 * x13)) + ((u64)x29 * x17)) + ((u64)x35 * x11)) + ((u64)x27 * x19)) + ((u64)x37 * x9)) + ((u64)x25 * x21)) + ((u64)x39 * x7)) + ((u64)x23 * x20)) + ((u64)x38 * x5)); + { u64 x50 = (((((0x2 * ((((((u64)x33 * x15) + ((u64)x29 * x19)) + ((u64)x37 * x11)) + ((u64)x25 * x20)) + ((u64)x38 * x7))) + ((u64)x31 * x17)) + ((u64)x35 * x13)) + ((u64)x27 * x21)) + ((u64)x39 * x9)); + { u64 x51 = (((((((((u64)x33 * x17) + ((u64)x35 * x15)) + ((u64)x31 * x19)) + ((u64)x37 * x13)) + ((u64)x29 * x21)) + ((u64)x39 * x11)) + ((u64)x27 * x20)) + ((u64)x38 * x9)); + { u64 x52 = (((((u64)x35 * x17) + (0x2 * (((((u64)x33 * x19) + ((u64)x37 * x15)) + ((u64)x29 * x20)) + ((u64)x38 * x11)))) + ((u64)x31 * x21)) + ((u64)x39 * x13)); + { u64 x53 = (((((((u64)x35 * x19) + ((u64)x37 * x17)) + ((u64)x33 * x21)) + ((u64)x39 * x15)) + ((u64)x31 * x20)) + ((u64)x38 * x13)); + { u64 x54 = (((0x2 * ((((u64)x37 * x19) + ((u64)x33 * x20)) + ((u64)x38 * x15))) + ((u64)x35 * x21)) + ((u64)x39 * x17)); + { u64 x55 = (((((u64)x37 * x21) + ((u64)x39 * x19)) + ((u64)x35 * x20)) + ((u64)x38 * x17)); + { u64 x56 = (((u64)x39 * x21) + (0x2 * (((u64)x37 * x20) + ((u64)x38 * x19)))); + { u64 x57 = (((u64)x39 * x20) + ((u64)x38 * x21)); + { u64 x58 = ((u64)(0x2 * x38) * x20); + { u64 x59 = (x48 + (x58 << 0x4)); + { u64 x60 = (x59 + (x58 << 0x1)); + { u64 x61 = (x60 + x58); + { u64 x62 = (x47 + (x57 << 0x4)); + { u64 x63 = (x62 + (x57 << 0x1)); + { u64 x64 = (x63 + x57); + { u64 x65 = (x46 + (x56 << 0x4)); + { u64 x66 = (x65 + (x56 << 0x1)); + { u64 x67 = (x66 + x56); + { u64 x68 = (x45 + (x55 << 0x4)); + { u64 x69 = (x68 + (x55 << 0x1)); + { u64 x70 = (x69 + x55); + { u64 x71 = (x44 + (x54 << 0x4)); + { u64 x72 = (x71 + (x54 << 0x1)); + { u64 x73 = (x72 + x54); + { u64 x74 = (x43 + (x53 << 0x4)); + { u64 x75 = (x74 + (x53 << 0x1)); + { u64 x76 = (x75 + x53); + { u64 x77 = (x42 + (x52 << 0x4)); + { u64 x78 = (x77 + (x52 << 0x1)); + { u64 x79 = (x78 + x52); + { u64 x80 = (x41 + (x51 << 0x4)); + { u64 x81 = (x80 + (x51 << 0x1)); + { u64 x82 = (x81 + x51); + { u64 x83 = (x40 + (x50 << 0x4)); + { u64 x84 = (x83 + (x50 << 0x1)); + { u64 x85 = (x84 + x50); + { u64 x86 = (x85 >> 0x1a); + { u32 x87 = ((u32)x85 & 0x3ffffff); + { u64 x88 = (x86 + x82); + { u64 x89 = (x88 >> 0x19); + { u32 x90 = ((u32)x88 & 0x1ffffff); + { u64 x91 = (x89 + x79); + { u64 x92 = (x91 >> 0x1a); + { u32 x93 = ((u32)x91 & 0x3ffffff); + { u64 x94 = (x92 + x76); + { u64 x95 = (x94 >> 0x19); + { u32 x96 = ((u32)x94 & 0x1ffffff); + { u64 x97 = (x95 + x73); + { u64 x98 = (x97 >> 0x1a); + { u32 x99 = ((u32)x97 & 0x3ffffff); + { u64 x100 = (x98 + x70); + { u64 x101 = (x100 >> 0x19); + { u32 x102 = ((u32)x100 & 0x1ffffff); + { u64 x103 = (x101 + x67); + { u64 x104 = (x103 >> 0x1a); + { u32 x105 = ((u32)x103 & 0x3ffffff); + { u64 x106 = (x104 + x64); + { u64 x107 = (x106 >> 0x19); + { u32 x108 = ((u32)x106 & 0x1ffffff); + { u64 x109 = (x107 + x61); + { u64 x110 = (x109 >> 0x1a); + { u32 x111 = ((u32)x109 & 0x3ffffff); + { u64 x112 = (x110 + x49); + { u64 x113 = (x112 >> 0x19); + { u32 x114 = ((u32)x112 & 0x1ffffff); + { u64 x115 = (x87 + (0x13 * x113)); + { u32 x116 = (u32) (x115 >> 0x1a); + { u32 x117 = ((u32)x115 & 0x3ffffff); + { u32 x118 = (x116 + x90); + { u32 x119 = (x118 >> 0x19); + { u32 x120 = (x118 & 0x1ffffff); + out[0] = x117; + out[1] = x120; + out[2] = (x119 + x93); + out[3] = x96; + out[4] = x99; + out[5] = x102; + out[6] = x105; + out[7] = x108; + out[8] = x111; + out[9] = x114; + }}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}} +} + +static __always_inline void fe_mul121666(fe *h, const fe_loose *f) +{ + fe_mul_121666_impl(h->v, f->v); +} + +static void curve25519_generic(u8 out[CURVE25519_KEY_SIZE], + const u8 scalar[CURVE25519_KEY_SIZE], + const u8 point[CURVE25519_KEY_SIZE]) +{ + fe x1, x2, z2, x3, z3; + fe_loose x2l, z2l, x3l; + unsigned swap = 0; + int pos; + u8 e[32]; + + memcpy(e, scalar, 32); + normalize_secret(e); + + /* The following implementation was transcribed to Coq and proven to + * correspond to unary scalar multiplication in affine coordinates given + * that x1 != 0 is the x coordinate of some point on the curve. It was + * also checked in Coq that doing a ladderstep with x1 = x3 = 0 gives + * z2' = z3' = 0, and z2 = z3 = 0 gives z2' = z3' = 0. The statement was + * quantified over the underlying field, so it applies to Curve25519 + * itself and the quadratic twist of Curve25519. It was not proven in + * Coq that prime-field arithmetic correctly simulates extension-field + * arithmetic on prime-field values. The decoding of the byte array + * representation of e was not considered. + * + * Specification of Montgomery curves in affine coordinates: + * + * + * Proof that these form a group that is isomorphic to a Weierstrass + * curve: + * + * + * Coq transcription and correctness proof of the loop + * (where scalarbits=255): + * + * + * preconditions: 0 <= e < 2^255 (not necessarily e < order), + * fe_invert(0) = 0 + */ + fe_frombytes(&x1, point); + fe_1(&x2); + fe_0(&z2); + fe_copy(&x3, &x1); + fe_1(&z3); + + for (pos = 254; pos >= 0; --pos) { + fe tmp0, tmp1; + fe_loose tmp0l, tmp1l; + /* loop invariant as of right before the test, for the case + * where x1 != 0: + * pos >= -1; if z2 = 0 then x2 is nonzero; if z3 = 0 then x3 + * is nonzero + * let r := e >> (pos+1) in the following equalities of + * projective points: + * to_xz (r*P) === if swap then (x3, z3) else (x2, z2) + * to_xz ((r+1)*P) === if swap then (x2, z2) else (x3, z3) + * x1 is the nonzero x coordinate of the nonzero + * point (r*P-(r+1)*P) + */ + unsigned b = 1 & (e[pos / 8] >> (pos & 7)); + swap ^= b; + fe_cswap(&x2, &x3, swap); + fe_cswap(&z2, &z3, swap); + swap = b; + /* Coq transcription of ladderstep formula (called from + * transcribed loop): + * + * + * x1 != 0 + * x1 = 0 + */ + fe_sub(&tmp0l, &x3, &z3); + fe_sub(&tmp1l, &x2, &z2); + fe_add(&x2l, &x2, &z2); + fe_add(&z2l, &x3, &z3); + fe_mul_tll(&z3, &tmp0l, &x2l); + fe_mul_tll(&z2, &z2l, &tmp1l); + fe_sq_tl(&tmp0, &tmp1l); + fe_sq_tl(&tmp1, &x2l); + fe_add(&x3l, &z3, &z2); + fe_sub(&z2l, &z3, &z2); + fe_mul_ttt(&x2, &tmp1, &tmp0); + fe_sub(&tmp1l, &tmp1, &tmp0); + fe_sq_tl(&z2, &z2l); + fe_mul121666(&z3, &tmp1l); + fe_sq_tl(&x3, &x3l); + fe_add(&tmp0l, &tmp0, &z3); + fe_mul_ttt(&z3, &x1, &z2); + fe_mul_tll(&z2, &tmp1l, &tmp0l); + } + /* here pos=-1, so r=e, so to_xz (e*P) === if swap then (x3, z3) + * else (x2, z2) + */ + fe_cswap(&x2, &x3, swap); + fe_cswap(&z2, &z3, swap); + + fe_invert(&z2, &z2); + fe_mul_ttt(&x2, &x2, &z2); + fe_tobytes(out, &x2); + + memzero_explicit(&x1, sizeof(x1)); + memzero_explicit(&x2, sizeof(x2)); + memzero_explicit(&z2, sizeof(z2)); + memzero_explicit(&x3, sizeof(x3)); + memzero_explicit(&z3, sizeof(z3)); + memzero_explicit(&x2l, sizeof(x2l)); + memzero_explicit(&z2l, sizeof(z2l)); + memzero_explicit(&x3l, sizeof(x3l)); + memzero_explicit(&e, sizeof(e)); +} diff --git a/lib/zinc/curve25519/curve25519-hacl64.h b/lib/zinc/curve25519/curve25519-hacl64.h new file mode 100644 index 000000000000..598be44622a2 --- /dev/null +++ b/lib/zinc/curve25519/curve25519-hacl64.h @@ -0,0 +1,784 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2016-2017 INRIA and Microsoft Corporation. + * Copyright (C) 2018 Jason A. Donenfeld . All Rights Reserved. + * + * This is a machine-generated formally verified implementation of Curve25519 + * ECDH from: . Though originally machine + * generated, it has been tweaked to be suitable for use in the kernel. It is + * optimized for 64-bit machines that can efficiently work with 128-bit + * integer types. + */ + +typedef __uint128_t u128; + +static __always_inline u64 u64_eq_mask(u64 a, u64 b) +{ + u64 x = a ^ b; + u64 minus_x = ~x + (u64)1U; + u64 x_or_minus_x = x | minus_x; + u64 xnx = x_or_minus_x >> (u32)63U; + u64 c = xnx - (u64)1U; + return c; +} + +static __always_inline u64 u64_gte_mask(u64 a, u64 b) +{ + u64 x = a; + u64 y = b; + u64 x_xor_y = x ^ y; + u64 x_sub_y = x - y; + u64 x_sub_y_xor_y = x_sub_y ^ y; + u64 q = x_xor_y | x_sub_y_xor_y; + u64 x_xor_q = x ^ q; + u64 x_xor_q_ = x_xor_q >> (u32)63U; + u64 c = x_xor_q_ - (u64)1U; + return c; +} + +static __always_inline void modulo_carry_top(u64 *b) +{ + u64 b4 = b[4]; + u64 b0 = b[0]; + u64 b4_ = b4 & 0x7ffffffffffffLLU; + u64 b0_ = b0 + 19 * (b4 >> 51); + b[4] = b4_; + b[0] = b0_; +} + +static __always_inline void fproduct_copy_from_wide_(u64 *output, u128 *input) +{ + { + u128 xi = input[0]; + output[0] = ((u64)(xi)); + } + { + u128 xi = input[1]; + output[1] = ((u64)(xi)); + } + { + u128 xi = input[2]; + output[2] = ((u64)(xi)); + } + { + u128 xi = input[3]; + output[3] = ((u64)(xi)); + } + { + u128 xi = input[4]; + output[4] = ((u64)(xi)); + } +} + +static __always_inline void +fproduct_sum_scalar_multiplication_(u128 *output, u64 *input, u64 s) +{ + output[0] += (u128)input[0] * s; + output[1] += (u128)input[1] * s; + output[2] += (u128)input[2] * s; + output[3] += (u128)input[3] * s; + output[4] += (u128)input[4] * s; +} + +static __always_inline void fproduct_carry_wide_(u128 *tmp) +{ + { + u32 ctr = 0; + u128 tctr = tmp[ctr]; + u128 tctrp1 = tmp[ctr + 1]; + u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU; + u128 c = ((tctr) >> (51)); + tmp[ctr] = ((u128)(r0)); + tmp[ctr + 1] = ((tctrp1) + (c)); + } + { + u32 ctr = 1; + u128 tctr = tmp[ctr]; + u128 tctrp1 = tmp[ctr + 1]; + u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU; + u128 c = ((tctr) >> (51)); + tmp[ctr] = ((u128)(r0)); + tmp[ctr + 1] = ((tctrp1) + (c)); + } + + { + u32 ctr = 2; + u128 tctr = tmp[ctr]; + u128 tctrp1 = tmp[ctr + 1]; + u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU; + u128 c = ((tctr) >> (51)); + tmp[ctr] = ((u128)(r0)); + tmp[ctr + 1] = ((tctrp1) + (c)); + } + { + u32 ctr = 3; + u128 tctr = tmp[ctr]; + u128 tctrp1 = tmp[ctr + 1]; + u64 r0 = ((u64)(tctr)) & 0x7ffffffffffffLLU; + u128 c = ((tctr) >> (51)); + tmp[ctr] = ((u128)(r0)); + tmp[ctr + 1] = ((tctrp1) + (c)); + } +} + +static __always_inline void fmul_shift_reduce(u64 *output) +{ + u64 tmp = output[4]; + u64 b0; + { + u32 ctr = 5 - 0 - 1; + u64 z = output[ctr - 1]; + output[ctr] = z; + } + { + u32 ctr = 5 - 1 - 1; + u64 z = output[ctr - 1]; + output[ctr] = z; + } + { + u32 ctr = 5 - 2 - 1; + u64 z = output[ctr - 1]; + output[ctr] = z; + } + { + u32 ctr = 5 - 3 - 1; + u64 z = output[ctr - 1]; + output[ctr] = z; + } + output[0] = tmp; + b0 = output[0]; + output[0] = 19 * b0; +} + +static __always_inline void fmul_mul_shift_reduce_(u128 *output, u64 *input, + u64 *input21) +{ + u32 i; + u64 input2i; + { + u64 input2i = input21[0]; + fproduct_sum_scalar_multiplication_(output, input, input2i); + fmul_shift_reduce(input); + } + { + u64 input2i = input21[1]; + fproduct_sum_scalar_multiplication_(output, input, input2i); + fmul_shift_reduce(input); + } + { + u64 input2i = input21[2]; + fproduct_sum_scalar_multiplication_(output, input, input2i); + fmul_shift_reduce(input); + } + { + u64 input2i = input21[3]; + fproduct_sum_scalar_multiplication_(output, input, input2i); + fmul_shift_reduce(input); + } + i = 4; + input2i = input21[i]; + fproduct_sum_scalar_multiplication_(output, input, input2i); +} + +static __always_inline void fmul_fmul(u64 *output, u64 *input, u64 *input21) +{ + u64 tmp[5] = { input[0], input[1], input[2], input[3], input[4] }; + { + u128 b4; + u128 b0; + u128 b4_; + u128 b0_; + u64 i0; + u64 i1; + u64 i0_; + u64 i1_; + u128 t[5] = { 0 }; + fmul_mul_shift_reduce_(t, tmp, input21); + fproduct_carry_wide_(t); + b4 = t[4]; + b0 = t[0]; + b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU)))); + b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51)))))))); + t[4] = b4_; + t[0] = b0_; + fproduct_copy_from_wide_(output, t); + i0 = output[0]; + i1 = output[1]; + i0_ = i0 & 0x7ffffffffffffLLU; + i1_ = i1 + (i0 >> 51); + output[0] = i0_; + output[1] = i1_; + } +} + +static __always_inline void fsquare_fsquare__(u128 *tmp, u64 *output) +{ + u64 r0 = output[0]; + u64 r1 = output[1]; + u64 r2 = output[2]; + u64 r3 = output[3]; + u64 r4 = output[4]; + u64 d0 = r0 * 2; + u64 d1 = r1 * 2; + u64 d2 = r2 * 2 * 19; + u64 d419 = r4 * 19; + u64 d4 = d419 * 2; + u128 s0 = ((((((u128)(r0) * (r0))) + (((u128)(d4) * (r1))))) + + (((u128)(d2) * (r3)))); + u128 s1 = ((((((u128)(d0) * (r1))) + (((u128)(d4) * (r2))))) + + (((u128)(r3 * 19) * (r3)))); + u128 s2 = ((((((u128)(d0) * (r2))) + (((u128)(r1) * (r1))))) + + (((u128)(d4) * (r3)))); + u128 s3 = ((((((u128)(d0) * (r3))) + (((u128)(d1) * (r2))))) + + (((u128)(r4) * (d419)))); + u128 s4 = ((((((u128)(d0) * (r4))) + (((u128)(d1) * (r3))))) + + (((u128)(r2) * (r2)))); + tmp[0] = s0; + tmp[1] = s1; + tmp[2] = s2; + tmp[3] = s3; + tmp[4] = s4; +} + +static __always_inline void fsquare_fsquare_(u128 *tmp, u64 *output) +{ + u128 b4; + u128 b0; + u128 b4_; + u128 b0_; + u64 i0; + u64 i1; + u64 i0_; + u64 i1_; + fsquare_fsquare__(tmp, output); + fproduct_carry_wide_(tmp); + b4 = tmp[4]; + b0 = tmp[0]; + b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU)))); + b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51)))))))); + tmp[4] = b4_; + tmp[0] = b0_; + fproduct_copy_from_wide_(output, tmp); + i0 = output[0]; + i1 = output[1]; + i0_ = i0 & 0x7ffffffffffffLLU; + i1_ = i1 + (i0 >> 51); + output[0] = i0_; + output[1] = i1_; +} + +static __always_inline void fsquare_fsquare_times_(u64 *output, u128 *tmp, + u32 count1) +{ + u32 i; + fsquare_fsquare_(tmp, output); + for (i = 1; i < count1; ++i) + fsquare_fsquare_(tmp, output); +} + +static __always_inline void fsquare_fsquare_times(u64 *output, u64 *input, + u32 count1) +{ + u128 t[5]; + memcpy(output, input, 5 * sizeof(*input)); + fsquare_fsquare_times_(output, t, count1); +} + +static __always_inline void fsquare_fsquare_times_inplace(u64 *output, + u32 count1) +{ + u128 t[5]; + fsquare_fsquare_times_(output, t, count1); +} + +static __always_inline void crecip_crecip(u64 *out, u64 *z) +{ + u64 buf[20] = { 0 }; + u64 *a0 = buf; + u64 *t00 = buf + 5; + u64 *b0 = buf + 10; + u64 *t01; + u64 *b1; + u64 *c0; + u64 *a; + u64 *t0; + u64 *b; + u64 *c; + fsquare_fsquare_times(a0, z, 1); + fsquare_fsquare_times(t00, a0, 2); + fmul_fmul(b0, t00, z); + fmul_fmul(a0, b0, a0); + fsquare_fsquare_times(t00, a0, 1); + fmul_fmul(b0, t00, b0); + fsquare_fsquare_times(t00, b0, 5); + t01 = buf + 5; + b1 = buf + 10; + c0 = buf + 15; + fmul_fmul(b1, t01, b1); + fsquare_fsquare_times(t01, b1, 10); + fmul_fmul(c0, t01, b1); + fsquare_fsquare_times(t01, c0, 20); + fmul_fmul(t01, t01, c0); + fsquare_fsquare_times_inplace(t01, 10); + fmul_fmul(b1, t01, b1); + fsquare_fsquare_times(t01, b1, 50); + a = buf; + t0 = buf + 5; + b = buf + 10; + c = buf + 15; + fmul_fmul(c, t0, b); + fsquare_fsquare_times(t0, c, 100); + fmul_fmul(t0, t0, c); + fsquare_fsquare_times_inplace(t0, 50); + fmul_fmul(t0, t0, b); + fsquare_fsquare_times_inplace(t0, 5); + fmul_fmul(out, t0, a); +} + +static __always_inline void fsum(u64 *a, u64 *b) +{ + a[0] += b[0]; + a[1] += b[1]; + a[2] += b[2]; + a[3] += b[3]; + a[4] += b[4]; +} + +static __always_inline void fdifference(u64 *a, u64 *b) +{ + u64 tmp[5] = { 0 }; + u64 b0; + u64 b1; + u64 b2; + u64 b3; + u64 b4; + memcpy(tmp, b, 5 * sizeof(*b)); + b0 = tmp[0]; + b1 = tmp[1]; + b2 = tmp[2]; + b3 = tmp[3]; + b4 = tmp[4]; + tmp[0] = b0 + 0x3fffffffffff68LLU; + tmp[1] = b1 + 0x3ffffffffffff8LLU; + tmp[2] = b2 + 0x3ffffffffffff8LLU; + tmp[3] = b3 + 0x3ffffffffffff8LLU; + tmp[4] = b4 + 0x3ffffffffffff8LLU; + { + u64 xi = a[0]; + u64 yi = tmp[0]; + a[0] = yi - xi; + } + { + u64 xi = a[1]; + u64 yi = tmp[1]; + a[1] = yi - xi; + } + { + u64 xi = a[2]; + u64 yi = tmp[2]; + a[2] = yi - xi; + } + { + u64 xi = a[3]; + u64 yi = tmp[3]; + a[3] = yi - xi; + } + { + u64 xi = a[4]; + u64 yi = tmp[4]; + a[4] = yi - xi; + } +} + +static __always_inline void fscalar(u64 *output, u64 *b, u64 s) +{ + u128 tmp[5]; + u128 b4; + u128 b0; + u128 b4_; + u128 b0_; + { + u64 xi = b[0]; + tmp[0] = ((u128)(xi) * (s)); + } + { + u64 xi = b[1]; + tmp[1] = ((u128)(xi) * (s)); + } + { + u64 xi = b[2]; + tmp[2] = ((u128)(xi) * (s)); + } + { + u64 xi = b[3]; + tmp[3] = ((u128)(xi) * (s)); + } + { + u64 xi = b[4]; + tmp[4] = ((u128)(xi) * (s)); + } + fproduct_carry_wide_(tmp); + b4 = tmp[4]; + b0 = tmp[0]; + b4_ = ((b4) & (((u128)(0x7ffffffffffffLLU)))); + b0_ = ((b0) + (((u128)(19) * (((u64)(((b4) >> (51)))))))); + tmp[4] = b4_; + tmp[0] = b0_; + fproduct_copy_from_wide_(output, tmp); +} + +static __always_inline void fmul(u64 *output, u64 *a, u64 *b) +{ + fmul_fmul(output, a, b); +} + +static __always_inline void crecip(u64 *output, u64 *input) +{ + crecip_crecip(output, input); +} + +static __always_inline void point_swap_conditional_step(u64 *a, u64 *b, + u64 swap1, u32 ctr) +{ + u32 i = ctr - 1; + u64 ai = a[i]; + u64 bi = b[i]; + u64 x = swap1 & (ai ^ bi); + u64 ai1 = ai ^ x; + u64 bi1 = bi ^ x; + a[i] = ai1; + b[i] = bi1; +} + +static __always_inline void point_swap_conditional5(u64 *a, u64 *b, u64 swap1) +{ + point_swap_conditional_step(a, b, swap1, 5); + point_swap_conditional_step(a, b, swap1, 4); + point_swap_conditional_step(a, b, swap1, 3); + point_swap_conditional_step(a, b, swap1, 2); + point_swap_conditional_step(a, b, swap1, 1); +} + +static __always_inline void point_swap_conditional(u64 *a, u64 *b, u64 iswap) +{ + u64 swap1 = 0 - iswap; + point_swap_conditional5(a, b, swap1); + point_swap_conditional5(a + 5, b + 5, swap1); +} + +static __always_inline void point_copy(u64 *output, u64 *input) +{ + memcpy(output, input, 5 * sizeof(*input)); + memcpy(output + 5, input + 5, 5 * sizeof(*input)); +} + +static __always_inline void addanddouble_fmonty(u64 *pp, u64 *ppq, u64 *p, + u64 *pq, u64 *qmqp) +{ + u64 *qx = qmqp; + u64 *x2 = pp; + u64 *z2 = pp + 5; + u64 *x3 = ppq; + u64 *z3 = ppq + 5; + u64 *x = p; + u64 *z = p + 5; + u64 *xprime = pq; + u64 *zprime = pq + 5; + u64 buf[40] = { 0 }; + u64 *origx = buf; + u64 *origxprime0 = buf + 5; + u64 *xxprime0; + u64 *zzprime0; + u64 *origxprime; + xxprime0 = buf + 25; + zzprime0 = buf + 30; + memcpy(origx, x, 5 * sizeof(*x)); + fsum(x, z); + fdifference(z, origx); + memcpy(origxprime0, xprime, 5 * sizeof(*xprime)); + fsum(xprime, zprime); + fdifference(zprime, origxprime0); + fmul(xxprime0, xprime, z); + fmul(zzprime0, x, zprime); + origxprime = buf + 5; + { + u64 *xx0; + u64 *zz0; + u64 *xxprime; + u64 *zzprime; + u64 *zzzprime; + xx0 = buf + 15; + zz0 = buf + 20; + xxprime = buf + 25; + zzprime = buf + 30; + zzzprime = buf + 35; + memcpy(origxprime, xxprime, 5 * sizeof(*xxprime)); + fsum(xxprime, zzprime); + fdifference(zzprime, origxprime); + fsquare_fsquare_times(x3, xxprime, 1); + fsquare_fsquare_times(zzzprime, zzprime, 1); + fmul(z3, zzzprime, qx); + fsquare_fsquare_times(xx0, x, 1); + fsquare_fsquare_times(zz0, z, 1); + { + u64 *zzz; + u64 *xx; + u64 *zz; + u64 scalar; + zzz = buf + 10; + xx = buf + 15; + zz = buf + 20; + fmul(x2, xx, zz); + fdifference(zz, xx); + scalar = 121665; + fscalar(zzz, zz, scalar); + fsum(zzz, xx); + fmul(z2, zzz, zz); + } + } +} + +static __always_inline void +ladder_smallloop_cmult_small_loop_step(u64 *nq, u64 *nqpq, u64 *nq2, u64 *nqpq2, + u64 *q, u8 byt) +{ + u64 bit0 = (u64)(byt >> 7); + u64 bit; + point_swap_conditional(nq, nqpq, bit0); + addanddouble_fmonty(nq2, nqpq2, nq, nqpq, q); + bit = (u64)(byt >> 7); + point_swap_conditional(nq2, nqpq2, bit); +} + +static __always_inline void +ladder_smallloop_cmult_small_loop_double_step(u64 *nq, u64 *nqpq, u64 *nq2, + u64 *nqpq2, u64 *q, u8 byt) +{ + u8 byt1; + ladder_smallloop_cmult_small_loop_step(nq, nqpq, nq2, nqpq2, q, byt); + byt1 = byt << 1; + ladder_smallloop_cmult_small_loop_step(nq2, nqpq2, nq, nqpq, q, byt1); +} + +static __always_inline void +ladder_smallloop_cmult_small_loop(u64 *nq, u64 *nqpq, u64 *nq2, u64 *nqpq2, + u64 *q, u8 byt, u32 i) +{ + while (i--) { + ladder_smallloop_cmult_small_loop_double_step(nq, nqpq, nq2, + nqpq2, q, byt); + byt <<= 2; + } +} + +static __always_inline void ladder_bigloop_cmult_big_loop(u8 *n1, u64 *nq, + u64 *nqpq, u64 *nq2, + u64 *nqpq2, u64 *q, + u32 i) +{ + while (i--) { + u8 byte = n1[i]; + ladder_smallloop_cmult_small_loop(nq, nqpq, nq2, nqpq2, q, + byte, 4); + } +} + +static void ladder_cmult(u64 *result, u8 *n1, u64 *q) +{ + u64 point_buf[40] = { 0 }; + u64 *nq = point_buf; + u64 *nqpq = point_buf + 10; + u64 *nq2 = point_buf + 20; + u64 *nqpq2 = point_buf + 30; + point_copy(nqpq, q); + nq[0] = 1; + ladder_bigloop_cmult_big_loop(n1, nq, nqpq, nq2, nqpq2, q, 32); + point_copy(result, nq); +} + +static __always_inline void format_fexpand(u64 *output, const u8 *input) +{ + const u8 *x00 = input + 6; + const u8 *x01 = input + 12; + const u8 *x02 = input + 19; + const u8 *x0 = input + 24; + u64 i0, i1, i2, i3, i4, output0, output1, output2, output3, output4; + i0 = get_unaligned_le64(input); + i1 = get_unaligned_le64(x00); + i2 = get_unaligned_le64(x01); + i3 = get_unaligned_le64(x02); + i4 = get_unaligned_le64(x0); + output0 = i0 & 0x7ffffffffffffLLU; + output1 = i1 >> 3 & 0x7ffffffffffffLLU; + output2 = i2 >> 6 & 0x7ffffffffffffLLU; + output3 = i3 >> 1 & 0x7ffffffffffffLLU; + output4 = i4 >> 12 & 0x7ffffffffffffLLU; + output[0] = output0; + output[1] = output1; + output[2] = output2; + output[3] = output3; + output[4] = output4; +} + +static __always_inline void format_fcontract_first_carry_pass(u64 *input) +{ + u64 t0 = input[0]; + u64 t1 = input[1]; + u64 t2 = input[2]; + u64 t3 = input[3]; + u64 t4 = input[4]; + u64 t1_ = t1 + (t0 >> 51); + u64 t0_ = t0 & 0x7ffffffffffffLLU; + u64 t2_ = t2 + (t1_ >> 51); + u64 t1__ = t1_ & 0x7ffffffffffffLLU; + u64 t3_ = t3 + (t2_ >> 51); + u64 t2__ = t2_ & 0x7ffffffffffffLLU; + u64 t4_ = t4 + (t3_ >> 51); + u64 t3__ = t3_ & 0x7ffffffffffffLLU; + input[0] = t0_; + input[1] = t1__; + input[2] = t2__; + input[3] = t3__; + input[4] = t4_; +} + +static __always_inline void format_fcontract_first_carry_full(u64 *input) +{ + format_fcontract_first_carry_pass(input); + modulo_carry_top(input); +} + +static __always_inline void format_fcontract_second_carry_pass(u64 *input) +{ + u64 t0 = input[0]; + u64 t1 = input[1]; + u64 t2 = input[2]; + u64 t3 = input[3]; + u64 t4 = input[4]; + u64 t1_ = t1 + (t0 >> 51); + u64 t0_ = t0 & 0x7ffffffffffffLLU; + u64 t2_ = t2 + (t1_ >> 51); + u64 t1__ = t1_ & 0x7ffffffffffffLLU; + u64 t3_ = t3 + (t2_ >> 51); + u64 t2__ = t2_ & 0x7ffffffffffffLLU; + u64 t4_ = t4 + (t3_ >> 51); + u64 t3__ = t3_ & 0x7ffffffffffffLLU; + input[0] = t0_; + input[1] = t1__; + input[2] = t2__; + input[3] = t3__; + input[4] = t4_; +} + +static __always_inline void format_fcontract_second_carry_full(u64 *input) +{ + u64 i0; + u64 i1; + u64 i0_; + u64 i1_; + format_fcontract_second_carry_pass(input); + modulo_carry_top(input); + i0 = input[0]; + i1 = input[1]; + i0_ = i0 & 0x7ffffffffffffLLU; + i1_ = i1 + (i0 >> 51); + input[0] = i0_; + input[1] = i1_; +} + +static __always_inline void format_fcontract_trim(u64 *input) +{ + u64 a0 = input[0]; + u64 a1 = input[1]; + u64 a2 = input[2]; + u64 a3 = input[3]; + u64 a4 = input[4]; + u64 mask0 = u64_gte_mask(a0, 0x7ffffffffffedLLU); + u64 mask1 = u64_eq_mask(a1, 0x7ffffffffffffLLU); + u64 mask2 = u64_eq_mask(a2, 0x7ffffffffffffLLU); + u64 mask3 = u64_eq_mask(a3, 0x7ffffffffffffLLU); + u64 mask4 = u64_eq_mask(a4, 0x7ffffffffffffLLU); + u64 mask = (((mask0 & mask1) & mask2) & mask3) & mask4; + u64 a0_ = a0 - (0x7ffffffffffedLLU & mask); + u64 a1_ = a1 - (0x7ffffffffffffLLU & mask); + u64 a2_ = a2 - (0x7ffffffffffffLLU & mask); + u64 a3_ = a3 - (0x7ffffffffffffLLU & mask); + u64 a4_ = a4 - (0x7ffffffffffffLLU & mask); + input[0] = a0_; + input[1] = a1_; + input[2] = a2_; + input[3] = a3_; + input[4] = a4_; +} + +static __always_inline void format_fcontract_store(u8 *output, u64 *input) +{ + u64 t0 = input[0]; + u64 t1 = input[1]; + u64 t2 = input[2]; + u64 t3 = input[3]; + u64 t4 = input[4]; + u64 o0 = t1 << 51 | t0; + u64 o1 = t2 << 38 | t1 >> 13; + u64 o2 = t3 << 25 | t2 >> 26; + u64 o3 = t4 << 12 | t3 >> 39; + u8 *b0 = output; + u8 *b1 = output + 8; + u8 *b2 = output + 16; + u8 *b3 = output + 24; + put_unaligned_le64(o0, b0); + put_unaligned_le64(o1, b1); + put_unaligned_le64(o2, b2); + put_unaligned_le64(o3, b3); +} + +static __always_inline void format_fcontract(u8 *output, u64 *input) +{ + format_fcontract_first_carry_full(input); + format_fcontract_second_carry_full(input); + format_fcontract_trim(input); + format_fcontract_store(output, input); +} + +static __always_inline void format_scalar_of_point(u8 *scalar, u64 *point) +{ + u64 *x = point; + u64 *z = point + 5; + u64 buf[10] __aligned(32) = { 0 }; + u64 *zmone = buf; + u64 *sc = buf + 5; + crecip(zmone, z); + fmul(sc, x, zmone); + format_fcontract(scalar, sc); +} + +static void curve25519_generic(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]) +{ + u64 buf0[10] __aligned(32) = { 0 }; + u64 *x0 = buf0; + u64 *z = buf0 + 5; + u64 *q; + format_fexpand(x0, basepoint); + z[0] = 1; + q = buf0; + { + u8 e[32] __aligned(32) = { 0 }; + u8 *scalar; + memcpy(e, secret, 32); + normalize_secret(e); + scalar = e; + { + u64 buf[15] = { 0 }; + u64 *nq = buf; + u64 *x = nq; + x[0] = 1; + ladder_cmult(nq, scalar, q); + format_scalar_of_point(mypublic, nq); + memzero_explicit(buf, sizeof(buf)); + } + memzero_explicit(e, sizeof(e)); + } + memzero_explicit(buf0, sizeof(buf0)); +} diff --git a/lib/zinc/curve25519/curve25519.c b/lib/zinc/curve25519/curve25519.c new file mode 100644 index 000000000000..203fe67c0c9d --- /dev/null +++ b/lib/zinc/curve25519/curve25519.c @@ -0,0 +1,109 @@ +// SPDX-License-Identifier: GPL-2.0 OR MIT +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * This is an implementation of the Curve25519 ECDH algorithm, using either + * a 32-bit implementation or a 64-bit implementation with 128-bit integers, + * depending on what is supported by the target compiler. + * + * Information: https://cr.yp.to/ecdh.html + */ + +#include + +#include +#include +#include +#include +#include +#include +#include + +#ifndef HAVE_CURVE25519_ARCH_IMPLEMENTATION +void __init curve25519_fpu_init(void) +{ +} +static inline bool curve25519_arch(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]) +{ + return false; +} +static inline bool curve25519_base_arch(u8 pub[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE]) +{ + return false; +} +#endif + +static __always_inline void normalize_secret(u8 secret[CURVE25519_KEY_SIZE]) +{ + secret[0] &= 248; + secret[31] &= 127; + secret[31] |= 64; +} + +#if defined(CONFIG_ARCH_SUPPORTS_INT128) && defined(__SIZEOF_INT128__) +#include "curve25519-hacl64.h" +#else +#include "curve25519-fiat32.h" +#endif + +static const u8 null_point[CURVE25519_KEY_SIZE] = { 0 }; + +bool curve25519(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]) +{ + if (!curve25519_arch(mypublic, secret, basepoint)) + curve25519_generic(mypublic, secret, basepoint); + return crypto_memneq(mypublic, null_point, CURVE25519_KEY_SIZE); +} +EXPORT_SYMBOL(curve25519); + +bool curve25519_generate_public(u8 pub[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE]) +{ + static const u8 basepoint[CURVE25519_KEY_SIZE] __aligned(32) = { 9 }; + + if (unlikely(!crypto_memneq(secret, null_point, CURVE25519_KEY_SIZE))) + return false; + + if (curve25519_base_arch(pub, secret)) + return crypto_memneq(pub, null_point, CURVE25519_KEY_SIZE); + return curve25519(pub, secret, basepoint); +} +EXPORT_SYMBOL(curve25519_generate_public); + +void curve25519_generate_secret(u8 secret[CURVE25519_KEY_SIZE]) +{ + get_random_bytes_wait(secret, CURVE25519_KEY_SIZE); + normalize_secret(secret); +} +EXPORT_SYMBOL(curve25519_generate_secret); + +#include "../selftest/curve25519.h" + +static bool nosimd __initdata = false; + +static int __init mod_init(void) +{ + if (!nosimd) + curve25519_fpu_init(); +#ifdef DEBUG + if (!curve25519_selftest()) + return -ENOTRECOVERABLE; +#endif + return 0; +} + +static void __exit mod_exit(void) +{ +} + +module_param(nosimd, bool, 0); +module_init(mod_init); +module_exit(mod_exit); +MODULE_LICENSE("GPL v2"); +MODULE_DESCRIPTION("Curve25519 scalar multiplication"); +MODULE_AUTHOR("Jason A. Donenfeld "); diff --git a/lib/zinc/selftest/curve25519.h b/lib/zinc/selftest/curve25519.h new file mode 100644 index 000000000000..8c11d806c9f6 --- /dev/null +++ b/lib/zinc/selftest/curve25519.h @@ -0,0 +1,1321 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifdef DEBUG +struct curve25519_test_vector { + u8 private[CURVE25519_KEY_SIZE]; + u8 public[CURVE25519_KEY_SIZE]; + u8 result[CURVE25519_KEY_SIZE]; + bool valid; +}; +static const struct curve25519_test_vector curve25519_test_vectors[] __initconst = { + { + .private = { 0x77, 0x07, 0x6d, 0x0a, 0x73, 0x18, 0xa5, 0x7d, + 0x3c, 0x16, 0xc1, 0x72, 0x51, 0xb2, 0x66, 0x45, + 0xdf, 0x4c, 0x2f, 0x87, 0xeb, 0xc0, 0x99, 0x2a, + 0xb1, 0x77, 0xfb, 0xa5, 0x1d, 0xb9, 0x2c, 0x2a }, + .public = { 0xde, 0x9e, 0xdb, 0x7d, 0x7b, 0x7d, 0xc1, 0xb4, + 0xd3, 0x5b, 0x61, 0xc2, 0xec, 0xe4, 0x35, 0x37, + 0x3f, 0x83, 0x43, 0xc8, 0x5b, 0x78, 0x67, 0x4d, + 0xad, 0xfc, 0x7e, 0x14, 0x6f, 0x88, 0x2b, 0x4f }, + .result = { 0x4a, 0x5d, 0x9d, 0x5b, 0xa4, 0xce, 0x2d, 0xe1, + 0x72, 0x8e, 0x3b, 0xf4, 0x80, 0x35, 0x0f, 0x25, + 0xe0, 0x7e, 0x21, 0xc9, 0x47, 0xd1, 0x9e, 0x33, + 0x76, 0xf0, 0x9b, 0x3c, 0x1e, 0x16, 0x17, 0x42 }, + .valid = true + }, + { + .private = { 0x5d, 0xab, 0x08, 0x7e, 0x62, 0x4a, 0x8a, 0x4b, + 0x79, 0xe1, 0x7f, 0x8b, 0x83, 0x80, 0x0e, 0xe6, + 0x6f, 0x3b, 0xb1, 0x29, 0x26, 0x18, 0xb6, 0xfd, + 0x1c, 0x2f, 0x8b, 0x27, 0xff, 0x88, 0xe0, 0xeb }, + .public = { 0x85, 0x20, 0xf0, 0x09, 0x89, 0x30, 0xa7, 0x54, + 0x74, 0x8b, 0x7d, 0xdc, 0xb4, 0x3e, 0xf7, 0x5a, + 0x0d, 0xbf, 0x3a, 0x0d, 0x26, 0x38, 0x1a, 0xf4, + 0xeb, 0xa4, 0xa9, 0x8e, 0xaa, 0x9b, 0x4e, 0x6a }, + .result = { 0x4a, 0x5d, 0x9d, 0x5b, 0xa4, 0xce, 0x2d, 0xe1, + 0x72, 0x8e, 0x3b, 0xf4, 0x80, 0x35, 0x0f, 0x25, + 0xe0, 0x7e, 0x21, 0xc9, 0x47, 0xd1, 0x9e, 0x33, + 0x76, 0xf0, 0x9b, 0x3c, 0x1e, 0x16, 0x17, 0x42 }, + .valid = true + }, + { + .private = { 1 }, + .public = { 0x25, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x3c, 0x77, 0x77, 0xca, 0xf9, 0x97, 0xb2, 0x64, + 0x41, 0x60, 0x77, 0x66, 0x5b, 0x4e, 0x22, 0x9d, + 0x0b, 0x95, 0x48, 0xdc, 0x0c, 0xd8, 0x19, 0x98, + 0xdd, 0xcd, 0xc5, 0xc8, 0x53, 0x3c, 0x79, 0x7f }, + .valid = true + }, + { + .private = { 1 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0xb3, 0x2d, 0x13, 0x62, 0xc2, 0x48, 0xd6, 0x2f, + 0xe6, 0x26, 0x19, 0xcf, 0xf0, 0x4d, 0xd4, 0x3d, + 0xb7, 0x3f, 0xfc, 0x1b, 0x63, 0x08, 0xed, 0xe3, + 0x0b, 0x78, 0xd8, 0x73, 0x80, 0xf1, 0xe8, 0x34 }, + .valid = true + }, + { + .private = { 0xa5, 0x46, 0xe3, 0x6b, 0xf0, 0x52, 0x7c, 0x9d, + 0x3b, 0x16, 0x15, 0x4b, 0x82, 0x46, 0x5e, 0xdd, + 0x62, 0x14, 0x4c, 0x0a, 0xc1, 0xfc, 0x5a, 0x18, + 0x50, 0x6a, 0x22, 0x44, 0xba, 0x44, 0x9a, 0xc4 }, + .public = { 0xe6, 0xdb, 0x68, 0x67, 0x58, 0x30, 0x30, 0xdb, + 0x35, 0x94, 0xc1, 0xa4, 0x24, 0xb1, 0x5f, 0x7c, + 0x72, 0x66, 0x24, 0xec, 0x26, 0xb3, 0x35, 0x3b, + 0x10, 0xa9, 0x03, 0xa6, 0xd0, 0xab, 0x1c, 0x4c }, + .result = { 0xc3, 0xda, 0x55, 0x37, 0x9d, 0xe9, 0xc6, 0x90, + 0x8e, 0x94, 0xea, 0x4d, 0xf2, 0x8d, 0x08, 0x4f, + 0x32, 0xec, 0xcf, 0x03, 0x49, 0x1c, 0x71, 0xf7, + 0x54, 0xb4, 0x07, 0x55, 0x77, 0xa2, 0x85, 0x52 }, + .valid = true + }, + { + .private = { 1, 2, 3, 4 }, + .public = { 0 }, + .result = { 0 }, + .valid = false + }, + { + .private = { 2, 4, 6, 8 }, + .public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae, + 0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a, + 0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd, + 0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8 }, + .result = { 0 }, + .valid = false + }, + { + .private = { 0xff, 0xff, 0xff, 0xff, 0x0a, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0x0a, 0x00, 0xfb, 0x9f }, + .result = { 0x77, 0x52, 0xb6, 0x18, 0xc1, 0x2d, 0x48, 0xd2, + 0xc6, 0x93, 0x46, 0x83, 0x81, 0x7c, 0xc6, 0x57, + 0xf3, 0x31, 0x03, 0x19, 0x49, 0x48, 0x20, 0x05, + 0x42, 0x2b, 0x4e, 0xae, 0x8d, 0x1d, 0x43, 0x23 }, + .valid = true + }, + { + .private = { 0x8e, 0x0a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x8e, 0x06 }, + .result = { 0x5a, 0xdf, 0xaa, 0x25, 0x86, 0x8e, 0x32, 0x3d, + 0xae, 0x49, 0x62, 0xc1, 0x01, 0x5c, 0xb3, 0x12, + 0xe1, 0xc5, 0xc7, 0x9e, 0x95, 0x3f, 0x03, 0x99, + 0xb0, 0xba, 0x16, 0x22, 0xf3, 0xb6, 0xf7, 0x0c }, + .valid = true + }, + /* wycheproof - normal case */ + { + .private = { 0x48, 0x52, 0x83, 0x4d, 0x9d, 0x6b, 0x77, 0xda, + 0xde, 0xab, 0xaa, 0xf2, 0xe1, 0x1d, 0xca, 0x66, + 0xd1, 0x9f, 0xe7, 0x49, 0x93, 0xa7, 0xbe, 0xc3, + 0x6c, 0x6e, 0x16, 0xa0, 0x98, 0x3f, 0xea, 0xba }, + .public = { 0x9c, 0x64, 0x7d, 0x9a, 0xe5, 0x89, 0xb9, 0xf5, + 0x8f, 0xdc, 0x3c, 0xa4, 0x94, 0x7e, 0xfb, 0xc9, + 0x15, 0xc4, 0xb2, 0xe0, 0x8e, 0x74, 0x4a, 0x0e, + 0xdf, 0x46, 0x9d, 0xac, 0x59, 0xc8, 0xf8, 0x5a }, + .result = { 0x87, 0xb7, 0xf2, 0x12, 0xb6, 0x27, 0xf7, 0xa5, + 0x4c, 0xa5, 0xe0, 0xbc, 0xda, 0xdd, 0xd5, 0x38, + 0x9d, 0x9d, 0xe6, 0x15, 0x6c, 0xdb, 0xcf, 0x8e, + 0xbe, 0x14, 0xff, 0xbc, 0xfb, 0x43, 0x65, 0x51 }, + .valid = true + }, + /* wycheproof - public key on twist */ + { + .private = { 0x58, 0x8c, 0x06, 0x1a, 0x50, 0x80, 0x4a, 0xc4, + 0x88, 0xad, 0x77, 0x4a, 0xc7, 0x16, 0xc3, 0xf5, + 0xba, 0x71, 0x4b, 0x27, 0x12, 0xe0, 0x48, 0x49, + 0x13, 0x79, 0xa5, 0x00, 0x21, 0x19, 0x98, 0xa8 }, + .public = { 0x63, 0xaa, 0x40, 0xc6, 0xe3, 0x83, 0x46, 0xc5, + 0xca, 0xf2, 0x3a, 0x6d, 0xf0, 0xa5, 0xe6, 0xc8, + 0x08, 0x89, 0xa0, 0x86, 0x47, 0xe5, 0x51, 0xb3, + 0x56, 0x34, 0x49, 0xbe, 0xfc, 0xfc, 0x97, 0x33 }, + .result = { 0xb1, 0xa7, 0x07, 0x51, 0x94, 0x95, 0xff, 0xff, + 0xb2, 0x98, 0xff, 0x94, 0x17, 0x16, 0xb0, 0x6d, + 0xfa, 0xb8, 0x7c, 0xf8, 0xd9, 0x11, 0x23, 0xfe, + 0x2b, 0xe9, 0xa2, 0x33, 0xdd, 0xa2, 0x22, 0x12 }, + .valid = true + }, + /* wycheproof - public key on twist */ + { + .private = { 0xb0, 0x5b, 0xfd, 0x32, 0xe5, 0x53, 0x25, 0xd9, + 0xfd, 0x64, 0x8c, 0xb3, 0x02, 0x84, 0x80, 0x39, + 0x00, 0x0b, 0x39, 0x0e, 0x44, 0xd5, 0x21, 0xe5, + 0x8a, 0xab, 0x3b, 0x29, 0xa6, 0x96, 0x0b, 0xa8 }, + .public = { 0x0f, 0x83, 0xc3, 0x6f, 0xde, 0xd9, 0xd3, 0x2f, + 0xad, 0xf4, 0xef, 0xa3, 0xae, 0x93, 0xa9, 0x0b, + 0xb5, 0xcf, 0xa6, 0x68, 0x93, 0xbc, 0x41, 0x2c, + 0x43, 0xfa, 0x72, 0x87, 0xdb, 0xb9, 0x97, 0x79 }, + .result = { 0x67, 0xdd, 0x4a, 0x6e, 0x16, 0x55, 0x33, 0x53, + 0x4c, 0x0e, 0x3f, 0x17, 0x2e, 0x4a, 0xb8, 0x57, + 0x6b, 0xca, 0x92, 0x3a, 0x5f, 0x07, 0xb2, 0xc0, + 0x69, 0xb4, 0xc3, 0x10, 0xff, 0x2e, 0x93, 0x5b }, + .valid = true + }, + /* wycheproof - public key on twist */ + { + .private = { 0x70, 0xe3, 0x4b, 0xcb, 0xe1, 0xf4, 0x7f, 0xbc, + 0x0f, 0xdd, 0xfd, 0x7c, 0x1e, 0x1a, 0xa5, 0x3d, + 0x57, 0xbf, 0xe0, 0xf6, 0x6d, 0x24, 0x30, 0x67, + 0xb4, 0x24, 0xbb, 0x62, 0x10, 0xbe, 0xd1, 0x9c }, + .public = { 0x0b, 0x82, 0x11, 0xa2, 0xb6, 0x04, 0x90, 0x97, + 0xf6, 0x87, 0x1c, 0x6c, 0x05, 0x2d, 0x3c, 0x5f, + 0xc1, 0xba, 0x17, 0xda, 0x9e, 0x32, 0xae, 0x45, + 0x84, 0x03, 0xb0, 0x5b, 0xb2, 0x83, 0x09, 0x2a }, + .result = { 0x4a, 0x06, 0x38, 0xcf, 0xaa, 0x9e, 0xf1, 0x93, + 0x3b, 0x47, 0xf8, 0x93, 0x92, 0x96, 0xa6, 0xb2, + 0x5b, 0xe5, 0x41, 0xef, 0x7f, 0x70, 0xe8, 0x44, + 0xc0, 0xbc, 0xc0, 0x0b, 0x13, 0x4d, 0xe6, 0x4a }, + .valid = true + }, + /* wycheproof - public key on twist */ + { + .private = { 0x68, 0xc1, 0xf3, 0xa6, 0x53, 0xa4, 0xcd, 0xb1, + 0xd3, 0x7b, 0xba, 0x94, 0x73, 0x8f, 0x8b, 0x95, + 0x7a, 0x57, 0xbe, 0xb2, 0x4d, 0x64, 0x6e, 0x99, + 0x4d, 0xc2, 0x9a, 0x27, 0x6a, 0xad, 0x45, 0x8d }, + .public = { 0x34, 0x3a, 0xc2, 0x0a, 0x3b, 0x9c, 0x6a, 0x27, + 0xb1, 0x00, 0x81, 0x76, 0x50, 0x9a, 0xd3, 0x07, + 0x35, 0x85, 0x6e, 0xc1, 0xc8, 0xd8, 0xfc, 0xae, + 0x13, 0x91, 0x2d, 0x08, 0xd1, 0x52, 0xf4, 0x6c }, + .result = { 0x39, 0x94, 0x91, 0xfc, 0xe8, 0xdf, 0xab, 0x73, + 0xb4, 0xf9, 0xf6, 0x11, 0xde, 0x8e, 0xa0, 0xb2, + 0x7b, 0x28, 0xf8, 0x59, 0x94, 0x25, 0x0b, 0x0f, + 0x47, 0x5d, 0x58, 0x5d, 0x04, 0x2a, 0xc2, 0x07 }, + .valid = true + }, + /* wycheproof - public key on twist */ + { + .private = { 0xd8, 0x77, 0xb2, 0x6d, 0x06, 0xdf, 0xf9, 0xd9, + 0xf7, 0xfd, 0x4c, 0x5b, 0x37, 0x69, 0xf8, 0xcd, + 0xd5, 0xb3, 0x05, 0x16, 0xa5, 0xab, 0x80, 0x6b, + 0xe3, 0x24, 0xff, 0x3e, 0xb6, 0x9e, 0xa0, 0xb2 }, + .public = { 0xfa, 0x69, 0x5f, 0xc7, 0xbe, 0x8d, 0x1b, 0xe5, + 0xbf, 0x70, 0x48, 0x98, 0xf3, 0x88, 0xc4, 0x52, + 0xba, 0xfd, 0xd3, 0xb8, 0xea, 0xe8, 0x05, 0xf8, + 0x68, 0x1a, 0x8d, 0x15, 0xc2, 0xd4, 0xe1, 0x42 }, + .result = { 0x2c, 0x4f, 0xe1, 0x1d, 0x49, 0x0a, 0x53, 0x86, + 0x17, 0x76, 0xb1, 0x3b, 0x43, 0x54, 0xab, 0xd4, + 0xcf, 0x5a, 0x97, 0x69, 0x9d, 0xb6, 0xe6, 0xc6, + 0x8c, 0x16, 0x26, 0xd0, 0x76, 0x62, 0xf7, 0x58 }, + .valid = true + }, + /* wycheproof - public key = 0 */ + { + .private = { 0x20, 0x74, 0x94, 0x03, 0x8f, 0x2b, 0xb8, 0x11, + 0xd4, 0x78, 0x05, 0xbc, 0xdf, 0x04, 0xa2, 0xac, + 0x58, 0x5a, 0xda, 0x7f, 0x2f, 0x23, 0x38, 0x9b, + 0xfd, 0x46, 0x58, 0xf9, 0xdd, 0xd4, 0xde, 0xbc }, + .public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key = 1 */ + { + .private = { 0x20, 0x2e, 0x89, 0x72, 0xb6, 0x1c, 0x7e, 0x61, + 0x93, 0x0e, 0xb9, 0x45, 0x0b, 0x50, 0x70, 0xea, + 0xe1, 0xc6, 0x70, 0x47, 0x56, 0x85, 0x54, 0x1f, + 0x04, 0x76, 0x21, 0x7e, 0x48, 0x18, 0xcf, 0xab }, + .public = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - edge case on twist */ + { + .private = { 0x38, 0xdd, 0xe9, 0xf3, 0xe7, 0xb7, 0x99, 0x04, + 0x5f, 0x9a, 0xc3, 0x79, 0x3d, 0x4a, 0x92, 0x77, + 0xda, 0xde, 0xad, 0xc4, 0x1b, 0xec, 0x02, 0x90, + 0xf8, 0x1f, 0x74, 0x4f, 0x73, 0x77, 0x5f, 0x84 }, + .public = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x9a, 0x2c, 0xfe, 0x84, 0xff, 0x9c, 0x4a, 0x97, + 0x39, 0x62, 0x5c, 0xae, 0x4a, 0x3b, 0x82, 0xa9, + 0x06, 0x87, 0x7a, 0x44, 0x19, 0x46, 0xf8, 0xd7, + 0xb3, 0xd7, 0x95, 0xfe, 0x8f, 0x5d, 0x16, 0x39 }, + .valid = true + }, + /* wycheproof - edge case on twist */ + { + .private = { 0x98, 0x57, 0xa9, 0x14, 0xe3, 0xc2, 0x90, 0x36, + 0xfd, 0x9a, 0x44, 0x2b, 0xa5, 0x26, 0xb5, 0xcd, + 0xcd, 0xf2, 0x82, 0x16, 0x15, 0x3e, 0x63, 0x6c, + 0x10, 0x67, 0x7a, 0xca, 0xb6, 0xbd, 0x6a, 0xa5 }, + .public = { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x4d, 0xa4, 0xe0, 0xaa, 0x07, 0x2c, 0x23, 0x2e, + 0xe2, 0xf0, 0xfa, 0x4e, 0x51, 0x9a, 0xe5, 0x0b, + 0x52, 0xc1, 0xed, 0xd0, 0x8a, 0x53, 0x4d, 0x4e, + 0xf3, 0x46, 0xc2, 0xe1, 0x06, 0xd2, 0x1d, 0x60 }, + .valid = true + }, + /* wycheproof - edge case on twist */ + { + .private = { 0x48, 0xe2, 0x13, 0x0d, 0x72, 0x33, 0x05, 0xed, + 0x05, 0xe6, 0xe5, 0x89, 0x4d, 0x39, 0x8a, 0x5e, + 0x33, 0x36, 0x7a, 0x8c, 0x6a, 0xac, 0x8f, 0xcd, + 0xf0, 0xa8, 0x8e, 0x4b, 0x42, 0x82, 0x0d, 0xb7 }, + .public = { 0xff, 0xff, 0xff, 0x03, 0x00, 0x00, 0xf8, 0xff, + 0xff, 0x1f, 0x00, 0x00, 0xc0, 0xff, 0xff, 0xff, + 0x00, 0x00, 0x00, 0xfe, 0xff, 0xff, 0x07, 0x00, + 0x00, 0xf0, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00 }, + .result = { 0x9e, 0xd1, 0x0c, 0x53, 0x74, 0x7f, 0x64, 0x7f, + 0x82, 0xf4, 0x51, 0x25, 0xd3, 0xde, 0x15, 0xa1, + 0xe6, 0xb8, 0x24, 0x49, 0x6a, 0xb4, 0x04, 0x10, + 0xff, 0xcc, 0x3c, 0xfe, 0x95, 0x76, 0x0f, 0x3b }, + .valid = true + }, + /* wycheproof - edge case on twist */ + { + .private = { 0x28, 0xf4, 0x10, 0x11, 0x69, 0x18, 0x51, 0xb3, + 0xa6, 0x2b, 0x64, 0x15, 0x53, 0xb3, 0x0d, 0x0d, + 0xfd, 0xdc, 0xb8, 0xff, 0xfc, 0xf5, 0x37, 0x00, + 0xa7, 0xbe, 0x2f, 0x6a, 0x87, 0x2e, 0x9f, 0xb0 }, + .public = { 0x00, 0x00, 0x00, 0xfc, 0xff, 0xff, 0x07, 0x00, + 0x00, 0xe0, 0xff, 0xff, 0x3f, 0x00, 0x00, 0x00, + 0xff, 0xff, 0xff, 0x01, 0x00, 0x00, 0xf8, 0xff, + 0xff, 0x0f, 0x00, 0x00, 0xc0, 0xff, 0xff, 0x7f }, + .result = { 0xcf, 0x72, 0xb4, 0xaa, 0x6a, 0xa1, 0xc9, 0xf8, + 0x94, 0xf4, 0x16, 0x5b, 0x86, 0x10, 0x9a, 0xa4, + 0x68, 0x51, 0x76, 0x48, 0xe1, 0xf0, 0xcc, 0x70, + 0xe1, 0xab, 0x08, 0x46, 0x01, 0x76, 0x50, 0x6b }, + .valid = true + }, + /* wycheproof - edge case on twist */ + { + .private = { 0x18, 0xa9, 0x3b, 0x64, 0x99, 0xb9, 0xf6, 0xb3, + 0x22, 0x5c, 0xa0, 0x2f, 0xef, 0x41, 0x0e, 0x0a, + 0xde, 0xc2, 0x35, 0x32, 0x32, 0x1d, 0x2d, 0x8e, + 0xf1, 0xa6, 0xd6, 0x02, 0xa8, 0xc6, 0x5b, 0x83 }, + .public = { 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0xff, + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x5d, 0x50, 0xb6, 0x28, 0x36, 0xbb, 0x69, 0x57, + 0x94, 0x10, 0x38, 0x6c, 0xf7, 0xbb, 0x81, 0x1c, + 0x14, 0xbf, 0x85, 0xb1, 0xc7, 0xb1, 0x7e, 0x59, + 0x24, 0xc7, 0xff, 0xea, 0x91, 0xef, 0x9e, 0x12 }, + .valid = true + }, + /* wycheproof - edge case on twist */ + { + .private = { 0xc0, 0x1d, 0x13, 0x05, 0xa1, 0x33, 0x8a, 0x1f, + 0xca, 0xc2, 0xba, 0x7e, 0x2e, 0x03, 0x2b, 0x42, + 0x7e, 0x0b, 0x04, 0x90, 0x31, 0x65, 0xac, 0xa9, + 0x57, 0xd8, 0xd0, 0x55, 0x3d, 0x87, 0x17, 0xb0 }, + .public = { 0xea, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x19, 0x23, 0x0e, 0xb1, 0x48, 0xd5, 0xd6, 0x7c, + 0x3c, 0x22, 0xab, 0x1d, 0xae, 0xff, 0x80, 0xa5, + 0x7e, 0xae, 0x42, 0x65, 0xce, 0x28, 0x72, 0x65, + 0x7b, 0x2c, 0x80, 0x99, 0xfc, 0x69, 0x8e, 0x50 }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0x38, 0x6f, 0x7f, 0x16, 0xc5, 0x07, 0x31, 0xd6, + 0x4f, 0x82, 0xe6, 0xa1, 0x70, 0xb1, 0x42, 0xa4, + 0xe3, 0x4f, 0x31, 0xfd, 0x77, 0x68, 0xfc, 0xb8, + 0x90, 0x29, 0x25, 0xe7, 0xd1, 0xe2, 0x1a, 0xbe }, + .public = { 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x0f, 0xca, 0xb5, 0xd8, 0x42, 0xa0, 0x78, 0xd7, + 0xa7, 0x1f, 0xc5, 0x9b, 0x57, 0xbf, 0xb4, 0xca, + 0x0b, 0xe6, 0x87, 0x3b, 0x49, 0xdc, 0xdb, 0x9f, + 0x44, 0xe1, 0x4a, 0xe8, 0xfb, 0xdf, 0xa5, 0x42 }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0xe0, 0x23, 0xa2, 0x89, 0xbd, 0x5e, 0x90, 0xfa, + 0x28, 0x04, 0xdd, 0xc0, 0x19, 0xa0, 0x5e, 0xf3, + 0xe7, 0x9d, 0x43, 0x4b, 0xb6, 0xea, 0x2f, 0x52, + 0x2e, 0xcb, 0x64, 0x3a, 0x75, 0x29, 0x6e, 0x95 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, + 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, + 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00, + 0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 }, + .result = { 0x54, 0xce, 0x8f, 0x22, 0x75, 0xc0, 0x77, 0xe3, + 0xb1, 0x30, 0x6a, 0x39, 0x39, 0xc5, 0xe0, 0x3e, + 0xef, 0x6b, 0xbb, 0x88, 0x06, 0x05, 0x44, 0x75, + 0x8d, 0x9f, 0xef, 0x59, 0xb0, 0xbc, 0x3e, 0x4f }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0x68, 0xf0, 0x10, 0xd6, 0x2e, 0xe8, 0xd9, 0x26, + 0x05, 0x3a, 0x36, 0x1c, 0x3a, 0x75, 0xc6, 0xea, + 0x4e, 0xbd, 0xc8, 0x60, 0x6a, 0xb2, 0x85, 0x00, + 0x3a, 0x6f, 0x8f, 0x40, 0x76, 0xb0, 0x1e, 0x83 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x03 }, + .result = { 0xf1, 0x36, 0x77, 0x5c, 0x5b, 0xeb, 0x0a, 0xf8, + 0x11, 0x0a, 0xf1, 0x0b, 0x20, 0x37, 0x23, 0x32, + 0x04, 0x3c, 0xab, 0x75, 0x24, 0x19, 0x67, 0x87, + 0x75, 0xa2, 0x23, 0xdf, 0x57, 0xc9, 0xd3, 0x0d }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0x58, 0xeb, 0xcb, 0x35, 0xb0, 0xf8, 0x84, 0x5c, + 0xaf, 0x1e, 0xc6, 0x30, 0xf9, 0x65, 0x76, 0xb6, + 0x2c, 0x4b, 0x7b, 0x6c, 0x36, 0xb2, 0x9d, 0xeb, + 0x2c, 0xb0, 0x08, 0x46, 0x51, 0x75, 0x5c, 0x96 }, + .public = { 0xff, 0xff, 0xff, 0xfb, 0xff, 0xff, 0xfb, 0xff, + 0xff, 0xdf, 0xff, 0xff, 0xdf, 0xff, 0xff, 0xff, + 0xfe, 0xff, 0xff, 0xfe, 0xff, 0xff, 0xf7, 0xff, + 0xff, 0xf7, 0xff, 0xff, 0xbf, 0xff, 0xff, 0x3f }, + .result = { 0xbf, 0x9a, 0xff, 0xd0, 0x6b, 0x84, 0x40, 0x85, + 0x58, 0x64, 0x60, 0x96, 0x2e, 0xf2, 0x14, 0x6f, + 0xf3, 0xd4, 0x53, 0x3d, 0x94, 0x44, 0xaa, 0xb0, + 0x06, 0xeb, 0x88, 0xcc, 0x30, 0x54, 0x40, 0x7d }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0x18, 0x8c, 0x4b, 0xc5, 0xb9, 0xc4, 0x4b, 0x38, + 0xbb, 0x65, 0x8b, 0x9b, 0x2a, 0xe8, 0x2d, 0x5b, + 0x01, 0x01, 0x5e, 0x09, 0x31, 0x84, 0xb1, 0x7c, + 0xb7, 0x86, 0x35, 0x03, 0xa7, 0x83, 0xe1, 0xbb }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f }, + .result = { 0xd4, 0x80, 0xde, 0x04, 0xf6, 0x99, 0xcb, 0x3b, + 0xe0, 0x68, 0x4a, 0x9c, 0xc2, 0xe3, 0x12, 0x81, + 0xea, 0x0b, 0xc5, 0xa9, 0xdc, 0xc1, 0x57, 0xd3, + 0xd2, 0x01, 0x58, 0xd4, 0x6c, 0xa5, 0x24, 0x6d }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0xe0, 0x6c, 0x11, 0xbb, 0x2e, 0x13, 0xce, 0x3d, + 0xc7, 0x67, 0x3f, 0x67, 0xf5, 0x48, 0x22, 0x42, + 0x90, 0x94, 0x23, 0xa9, 0xae, 0x95, 0xee, 0x98, + 0x6a, 0x98, 0x8d, 0x98, 0xfa, 0xee, 0x23, 0xa2 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f, + 0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f, + 0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f, + 0xff, 0xff, 0xff, 0xff, 0xfe, 0xff, 0xff, 0x7f }, + .result = { 0x4c, 0x44, 0x01, 0xcc, 0xe6, 0xb5, 0x1e, 0x4c, + 0xb1, 0x8f, 0x27, 0x90, 0x24, 0x6c, 0x9b, 0xf9, + 0x14, 0xdb, 0x66, 0x77, 0x50, 0xa1, 0xcb, 0x89, + 0x06, 0x90, 0x92, 0xaf, 0x07, 0x29, 0x22, 0x76 }, + .valid = true + }, + /* wycheproof - edge case for public key */ + { + .private = { 0xc0, 0x65, 0x8c, 0x46, 0xdd, 0xe1, 0x81, 0x29, + 0x29, 0x38, 0x77, 0x53, 0x5b, 0x11, 0x62, 0xb6, + 0xf9, 0xf5, 0x41, 0x4a, 0x23, 0xcf, 0x4d, 0x2c, + 0xbc, 0x14, 0x0a, 0x4d, 0x99, 0xda, 0x2b, 0x8f }, + .public = { 0xeb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x57, 0x8b, 0xa8, 0xcc, 0x2d, 0xbd, 0xc5, 0x75, + 0xaf, 0xcf, 0x9d, 0xf2, 0xb3, 0xee, 0x61, 0x89, + 0xf5, 0x33, 0x7d, 0x68, 0x54, 0xc7, 0x9b, 0x4c, + 0xe1, 0x65, 0xea, 0x12, 0x29, 0x3b, 0x3a, 0x0f }, + .valid = true + }, + /* wycheproof - public key with low order */ + { + .private = { 0x10, 0x25, 0x5c, 0x92, 0x30, 0xa9, 0x7a, 0x30, + 0xa4, 0x58, 0xca, 0x28, 0x4a, 0x62, 0x96, 0x69, + 0x29, 0x3a, 0x31, 0x89, 0x0c, 0xda, 0x9d, 0x14, + 0x7f, 0xeb, 0xc7, 0xd1, 0xe2, 0x2d, 0x6b, 0xb1 }, + .public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae, + 0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a, + 0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd, + 0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8, 0x00 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x78, 0xf1, 0xe8, 0xed, 0xf1, 0x44, 0x81, 0xb3, + 0x89, 0x44, 0x8d, 0xac, 0x8f, 0x59, 0xc7, 0x0b, + 0x03, 0x8e, 0x7c, 0xf9, 0x2e, 0xf2, 0xc7, 0xef, + 0xf5, 0x7a, 0x72, 0x46, 0x6e, 0x11, 0x52, 0x96 }, + .public = { 0x5f, 0x9c, 0x95, 0xbc, 0xa3, 0x50, 0x8c, 0x24, + 0xb1, 0xd0, 0xb1, 0x55, 0x9c, 0x83, 0xef, 0x5b, + 0x04, 0x44, 0x5c, 0xc4, 0x58, 0x1c, 0x8e, 0x86, + 0xd8, 0x22, 0x4e, 0xdd, 0xd0, 0x9f, 0x11, 0x57 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0xa0, 0xa0, 0x5a, 0x3e, 0x8f, 0x9f, 0x44, 0x20, + 0x4d, 0x5f, 0x80, 0x59, 0xa9, 0x4a, 0xc7, 0xdf, + 0xc3, 0x9a, 0x49, 0xac, 0x01, 0x6d, 0xd7, 0x43, + 0xdb, 0xfa, 0x43, 0xc5, 0xd6, 0x71, 0xfd, 0x88 }, + .public = { 0xec, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0xd0, 0xdb, 0xb3, 0xed, 0x19, 0x06, 0x66, 0x3f, + 0x15, 0x42, 0x0a, 0xf3, 0x1f, 0x4e, 0xaf, 0x65, + 0x09, 0xd9, 0xa9, 0x94, 0x97, 0x23, 0x50, 0x06, + 0x05, 0xad, 0x7c, 0x1c, 0x6e, 0x74, 0x50, 0xa9 }, + .public = { 0xed, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0xc0, 0xb1, 0xd0, 0xeb, 0x22, 0xb2, 0x44, 0xfe, + 0x32, 0x91, 0x14, 0x00, 0x72, 0xcd, 0xd9, 0xd9, + 0x89, 0xb5, 0xf0, 0xec, 0xd9, 0x6c, 0x10, 0x0f, + 0xeb, 0x5b, 0xca, 0x24, 0x1c, 0x1d, 0x9f, 0x8f }, + .public = { 0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x48, 0x0b, 0xf4, 0x5f, 0x59, 0x49, 0x42, 0xa8, + 0xbc, 0x0f, 0x33, 0x53, 0xc6, 0xe8, 0xb8, 0x85, + 0x3d, 0x77, 0xf3, 0x51, 0xf1, 0xc2, 0xca, 0x6c, + 0x2d, 0x1a, 0xbf, 0x8a, 0x00, 0xb4, 0x22, 0x9c }, + .public = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x30, 0xf9, 0x93, 0xfc, 0xf8, 0x51, 0x4f, 0xc8, + 0x9b, 0xd8, 0xdb, 0x14, 0xcd, 0x43, 0xba, 0x0d, + 0x4b, 0x25, 0x30, 0xe7, 0x3c, 0x42, 0x76, 0xa0, + 0x5e, 0x1b, 0x14, 0x5d, 0x42, 0x0c, 0xed, 0xb4 }, + .public = { 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0xc0, 0x49, 0x74, 0xb7, 0x58, 0x38, 0x0e, 0x2a, + 0x5b, 0x5d, 0xf6, 0xeb, 0x09, 0xbb, 0x2f, 0x6b, + 0x34, 0x34, 0xf9, 0x82, 0x72, 0x2a, 0x8e, 0x67, + 0x6d, 0x3d, 0xa2, 0x51, 0xd1, 0xb3, 0xde, 0x83 }, + .public = { 0xe0, 0xeb, 0x7a, 0x7c, 0x3b, 0x41, 0xb8, 0xae, + 0x16, 0x56, 0xe3, 0xfa, 0xf1, 0x9f, 0xc4, 0x6a, + 0xda, 0x09, 0x8d, 0xeb, 0x9c, 0x32, 0xb1, 0xfd, + 0x86, 0x62, 0x05, 0x16, 0x5f, 0x49, 0xb8, 0x80 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x50, 0x2a, 0x31, 0x37, 0x3d, 0xb3, 0x24, 0x46, + 0x84, 0x2f, 0xe5, 0xad, 0xd3, 0xe0, 0x24, 0x02, + 0x2e, 0xa5, 0x4f, 0x27, 0x41, 0x82, 0xaf, 0xc3, + 0xd9, 0xf1, 0xbb, 0x3d, 0x39, 0x53, 0x4e, 0xb5 }, + .public = { 0x5f, 0x9c, 0x95, 0xbc, 0xa3, 0x50, 0x8c, 0x24, + 0xb1, 0xd0, 0xb1, 0x55, 0x9c, 0x83, 0xef, 0x5b, + 0x04, 0x44, 0x5c, 0xc4, 0x58, 0x1c, 0x8e, 0x86, + 0xd8, 0x22, 0x4e, 0xdd, 0xd0, 0x9f, 0x11, 0xd7 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x90, 0xfa, 0x64, 0x17, 0xb0, 0xe3, 0x70, 0x30, + 0xfd, 0x6e, 0x43, 0xef, 0xf2, 0xab, 0xae, 0xf1, + 0x4c, 0x67, 0x93, 0x11, 0x7a, 0x03, 0x9c, 0xf6, + 0x21, 0x31, 0x8b, 0xa9, 0x0f, 0x4e, 0x98, 0xbe }, + .public = { 0xec, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x78, 0xad, 0x3f, 0x26, 0x02, 0x7f, 0x1c, 0x9f, + 0xdd, 0x97, 0x5a, 0x16, 0x13, 0xb9, 0x47, 0x77, + 0x9b, 0xad, 0x2c, 0xf2, 0xb7, 0x41, 0xad, 0xe0, + 0x18, 0x40, 0x88, 0x5a, 0x30, 0xbb, 0x97, 0x9c }, + .public = { 0xed, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key with low order */ + { + .private = { 0x98, 0xe2, 0x3d, 0xe7, 0xb1, 0xe0, 0x92, 0x6e, + 0xd9, 0xc8, 0x7e, 0x7b, 0x14, 0xba, 0xf5, 0x5f, + 0x49, 0x7a, 0x1d, 0x70, 0x96, 0xf9, 0x39, 0x77, + 0x68, 0x0e, 0x44, 0xdc, 0x1c, 0x7b, 0x7b, 0x8b }, + .public = { 0xee, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = false + }, + /* wycheproof - public key >= p */ + { + .private = { 0xf0, 0x1e, 0x48, 0xda, 0xfa, 0xc9, 0xd7, 0xbc, + 0xf5, 0x89, 0xcb, 0xc3, 0x82, 0xc8, 0x78, 0xd1, + 0x8b, 0xda, 0x35, 0x50, 0x58, 0x9f, 0xfb, 0x5d, + 0x50, 0xb5, 0x23, 0xbe, 0xbe, 0x32, 0x9d, 0xae }, + .public = { 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0xbd, 0x36, 0xa0, 0x79, 0x0e, 0xb8, 0x83, 0x09, + 0x8c, 0x98, 0x8b, 0x21, 0x78, 0x67, 0x73, 0xde, + 0x0b, 0x3a, 0x4d, 0xf1, 0x62, 0x28, 0x2c, 0xf1, + 0x10, 0xde, 0x18, 0xdd, 0x48, 0x4c, 0xe7, 0x4b }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x28, 0x87, 0x96, 0xbc, 0x5a, 0xff, 0x4b, 0x81, + 0xa3, 0x75, 0x01, 0x75, 0x7b, 0xc0, 0x75, 0x3a, + 0x3c, 0x21, 0x96, 0x47, 0x90, 0xd3, 0x86, 0x99, + 0x30, 0x8d, 0xeb, 0xc1, 0x7a, 0x6e, 0xaf, 0x8d }, + .public = { 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0xb4, 0xe0, 0xdd, 0x76, 0xda, 0x7b, 0x07, 0x17, + 0x28, 0xb6, 0x1f, 0x85, 0x67, 0x71, 0xaa, 0x35, + 0x6e, 0x57, 0xed, 0xa7, 0x8a, 0x5b, 0x16, 0x55, + 0xcc, 0x38, 0x20, 0xfb, 0x5f, 0x85, 0x4c, 0x5c }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x98, 0xdf, 0x84, 0x5f, 0x66, 0x51, 0xbf, 0x11, + 0x38, 0x22, 0x1f, 0x11, 0x90, 0x41, 0xf7, 0x2b, + 0x6d, 0xbc, 0x3c, 0x4a, 0xce, 0x71, 0x43, 0xd9, + 0x9f, 0xd5, 0x5a, 0xd8, 0x67, 0x48, 0x0d, 0xa8 }, + .public = { 0xf1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x6f, 0xdf, 0x6c, 0x37, 0x61, 0x1d, 0xbd, 0x53, + 0x04, 0xdc, 0x0f, 0x2e, 0xb7, 0xc9, 0x51, 0x7e, + 0xb3, 0xc5, 0x0e, 0x12, 0xfd, 0x05, 0x0a, 0xc6, + 0xde, 0xc2, 0x70, 0x71, 0xd4, 0xbf, 0xc0, 0x34 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0xf0, 0x94, 0x98, 0xe4, 0x6f, 0x02, 0xf8, 0x78, + 0x82, 0x9e, 0x78, 0xb8, 0x03, 0xd3, 0x16, 0xa2, + 0xed, 0x69, 0x5d, 0x04, 0x98, 0xa0, 0x8a, 0xbd, + 0xf8, 0x27, 0x69, 0x30, 0xe2, 0x4e, 0xdc, 0xb0 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .result = { 0x4c, 0x8f, 0xc4, 0xb1, 0xc6, 0xab, 0x88, 0xfb, + 0x21, 0xf1, 0x8f, 0x6d, 0x4c, 0x81, 0x02, 0x40, + 0xd4, 0xe9, 0x46, 0x51, 0xba, 0x44, 0xf7, 0xa2, + 0xc8, 0x63, 0xce, 0xc7, 0xdc, 0x56, 0x60, 0x2d }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x18, 0x13, 0xc1, 0x0a, 0x5c, 0x7f, 0x21, 0xf9, + 0x6e, 0x17, 0xf2, 0x88, 0xc0, 0xcc, 0x37, 0x60, + 0x7c, 0x04, 0xc5, 0xf5, 0xae, 0xa2, 0xdb, 0x13, + 0x4f, 0x9e, 0x2f, 0xfc, 0x66, 0xbd, 0x9d, 0xb8 }, + .public = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 }, + .result = { 0x1c, 0xd0, 0xb2, 0x82, 0x67, 0xdc, 0x54, 0x1c, + 0x64, 0x2d, 0x6d, 0x7d, 0xca, 0x44, 0xa8, 0xb3, + 0x8a, 0x63, 0x73, 0x6e, 0xef, 0x5c, 0x4e, 0x65, + 0x01, 0xff, 0xbb, 0xb1, 0x78, 0x0c, 0x03, 0x3c }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x78, 0x57, 0xfb, 0x80, 0x86, 0x53, 0x64, 0x5a, + 0x0b, 0xeb, 0x13, 0x8a, 0x64, 0xf5, 0xf4, 0xd7, + 0x33, 0xa4, 0x5e, 0xa8, 0x4c, 0x3c, 0xda, 0x11, + 0xa9, 0xc0, 0x6f, 0x7e, 0x71, 0x39, 0x14, 0x9e }, + .public = { 0x03, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 }, + .result = { 0x87, 0x55, 0xbe, 0x01, 0xc6, 0x0a, 0x7e, 0x82, + 0x5c, 0xff, 0x3e, 0x0e, 0x78, 0xcb, 0x3a, 0xa4, + 0x33, 0x38, 0x61, 0x51, 0x6a, 0xa5, 0x9b, 0x1c, + 0x51, 0xa8, 0xb2, 0xa5, 0x43, 0xdf, 0xa8, 0x22 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0xe0, 0x3a, 0xa8, 0x42, 0xe2, 0xab, 0xc5, 0x6e, + 0x81, 0xe8, 0x7b, 0x8b, 0x9f, 0x41, 0x7b, 0x2a, + 0x1e, 0x59, 0x13, 0xc7, 0x23, 0xee, 0xd2, 0x8d, + 0x75, 0x2f, 0x8d, 0x47, 0xa5, 0x9f, 0x49, 0x8f }, + .public = { 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80 }, + .result = { 0x54, 0xc9, 0xa1, 0xed, 0x95, 0xe5, 0x46, 0xd2, + 0x78, 0x22, 0xa3, 0x60, 0x93, 0x1d, 0xda, 0x60, + 0xa1, 0xdf, 0x04, 0x9d, 0xa6, 0xf9, 0x04, 0x25, + 0x3c, 0x06, 0x12, 0xbb, 0xdc, 0x08, 0x74, 0x76 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0xf8, 0xf7, 0x07, 0xb7, 0x99, 0x9b, 0x18, 0xcb, + 0x0d, 0x6b, 0x96, 0x12, 0x4f, 0x20, 0x45, 0x97, + 0x2c, 0xa2, 0x74, 0xbf, 0xc1, 0x54, 0xad, 0x0c, + 0x87, 0x03, 0x8c, 0x24, 0xc6, 0xd0, 0xd4, 0xb2 }, + .public = { 0xda, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0xcc, 0x1f, 0x40, 0xd7, 0x43, 0xcd, 0xc2, 0x23, + 0x0e, 0x10, 0x43, 0xda, 0xba, 0x8b, 0x75, 0xe8, + 0x10, 0xf1, 0xfb, 0xab, 0x7f, 0x25, 0x52, 0x69, + 0xbd, 0x9e, 0xbb, 0x29, 0xe6, 0xbf, 0x49, 0x4f }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0xa0, 0x34, 0xf6, 0x84, 0xfa, 0x63, 0x1e, 0x1a, + 0x34, 0x81, 0x18, 0xc1, 0xce, 0x4c, 0x98, 0x23, + 0x1f, 0x2d, 0x9e, 0xec, 0x9b, 0xa5, 0x36, 0x5b, + 0x4a, 0x05, 0xd6, 0x9a, 0x78, 0x5b, 0x07, 0x96 }, + .public = { 0xdb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x54, 0x99, 0x8e, 0xe4, 0x3a, 0x5b, 0x00, 0x7b, + 0xf4, 0x99, 0xf0, 0x78, 0xe7, 0x36, 0x52, 0x44, + 0x00, 0xa8, 0xb5, 0xc7, 0xe9, 0xb9, 0xb4, 0x37, + 0x71, 0x74, 0x8c, 0x7c, 0xdf, 0x88, 0x04, 0x12 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x30, 0xb6, 0xc6, 0xa0, 0xf2, 0xff, 0xa6, 0x80, + 0x76, 0x8f, 0x99, 0x2b, 0xa8, 0x9e, 0x15, 0x2d, + 0x5b, 0xc9, 0x89, 0x3d, 0x38, 0xc9, 0x11, 0x9b, + 0xe4, 0xf7, 0x67, 0xbf, 0xab, 0x6e, 0x0c, 0xa5 }, + .public = { 0xdc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0xea, 0xd9, 0xb3, 0x8e, 0xfd, 0xd7, 0x23, 0x63, + 0x79, 0x34, 0xe5, 0x5a, 0xb7, 0x17, 0xa7, 0xae, + 0x09, 0xeb, 0x86, 0xa2, 0x1d, 0xc3, 0x6a, 0x3f, + 0xee, 0xb8, 0x8b, 0x75, 0x9e, 0x39, 0x1e, 0x09 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x90, 0x1b, 0x9d, 0xcf, 0x88, 0x1e, 0x01, 0xe0, + 0x27, 0x57, 0x50, 0x35, 0xd4, 0x0b, 0x43, 0xbd, + 0xc1, 0xc5, 0x24, 0x2e, 0x03, 0x08, 0x47, 0x49, + 0x5b, 0x0c, 0x72, 0x86, 0x46, 0x9b, 0x65, 0x91 }, + .public = { 0xea, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x60, 0x2f, 0xf4, 0x07, 0x89, 0xb5, 0x4b, 0x41, + 0x80, 0x59, 0x15, 0xfe, 0x2a, 0x62, 0x21, 0xf0, + 0x7a, 0x50, 0xff, 0xc2, 0xc3, 0xfc, 0x94, 0xcf, + 0x61, 0xf1, 0x3d, 0x79, 0x04, 0xe8, 0x8e, 0x0e }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x80, 0x46, 0x67, 0x7c, 0x28, 0xfd, 0x82, 0xc9, + 0xa1, 0xbd, 0xb7, 0x1a, 0x1a, 0x1a, 0x34, 0xfa, + 0xba, 0x12, 0x25, 0xe2, 0x50, 0x7f, 0xe3, 0xf5, + 0x4d, 0x10, 0xbd, 0x5b, 0x0d, 0x86, 0x5f, 0x8e }, + .public = { 0xeb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0xe0, 0x0a, 0xe8, 0xb1, 0x43, 0x47, 0x12, 0x47, + 0xba, 0x24, 0xf1, 0x2c, 0x88, 0x55, 0x36, 0xc3, + 0xcb, 0x98, 0x1b, 0x58, 0xe1, 0xe5, 0x6b, 0x2b, + 0xaf, 0x35, 0xc1, 0x2a, 0xe1, 0xf7, 0x9c, 0x26 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x60, 0x2f, 0x7e, 0x2f, 0x68, 0xa8, 0x46, 0xb8, + 0x2c, 0xc2, 0x69, 0xb1, 0xd4, 0x8e, 0x93, 0x98, + 0x86, 0xae, 0x54, 0xfd, 0x63, 0x6c, 0x1f, 0xe0, + 0x74, 0xd7, 0x10, 0x12, 0x7d, 0x47, 0x24, 0x91 }, + .public = { 0xef, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x98, 0xcb, 0x9b, 0x50, 0xdd, 0x3f, 0xc2, 0xb0, + 0xd4, 0xf2, 0xd2, 0xbf, 0x7c, 0x5c, 0xfd, 0xd1, + 0x0c, 0x8f, 0xcd, 0x31, 0xfc, 0x40, 0xaf, 0x1a, + 0xd4, 0x4f, 0x47, 0xc1, 0x31, 0x37, 0x63, 0x62 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x60, 0x88, 0x7b, 0x3d, 0xc7, 0x24, 0x43, 0x02, + 0x6e, 0xbe, 0xdb, 0xbb, 0xb7, 0x06, 0x65, 0xf4, + 0x2b, 0x87, 0xad, 0xd1, 0x44, 0x0e, 0x77, 0x68, + 0xfb, 0xd7, 0xe8, 0xe2, 0xce, 0x5f, 0x63, 0x9d }, + .public = { 0xf0, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x38, 0xd6, 0x30, 0x4c, 0x4a, 0x7e, 0x6d, 0x9f, + 0x79, 0x59, 0x33, 0x4f, 0xb5, 0x24, 0x5b, 0xd2, + 0xc7, 0x54, 0x52, 0x5d, 0x4c, 0x91, 0xdb, 0x95, + 0x02, 0x06, 0x92, 0x62, 0x34, 0xc1, 0xf6, 0x33 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0x78, 0xd3, 0x1d, 0xfa, 0x85, 0x44, 0x97, 0xd7, + 0x2d, 0x8d, 0xef, 0x8a, 0x1b, 0x7f, 0xb0, 0x06, + 0xce, 0xc2, 0xd8, 0xc4, 0x92, 0x46, 0x47, 0xc9, + 0x38, 0x14, 0xae, 0x56, 0xfa, 0xed, 0xa4, 0x95 }, + .public = { 0xf1, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x78, 0x6c, 0xd5, 0x49, 0x96, 0xf0, 0x14, 0xa5, + 0xa0, 0x31, 0xec, 0x14, 0xdb, 0x81, 0x2e, 0xd0, + 0x83, 0x55, 0x06, 0x1f, 0xdb, 0x5d, 0xe6, 0x80, + 0xa8, 0x00, 0xac, 0x52, 0x1f, 0x31, 0x8e, 0x23 }, + .valid = true + }, + /* wycheproof - public key >= p */ + { + .private = { 0xc0, 0x4c, 0x5b, 0xae, 0xfa, 0x83, 0x02, 0xdd, + 0xde, 0xd6, 0xa4, 0xbb, 0x95, 0x77, 0x61, 0xb4, + 0xeb, 0x97, 0xae, 0xfa, 0x4f, 0xc3, 0xb8, 0x04, + 0x30, 0x85, 0xf9, 0x6a, 0x56, 0x59, 0xb3, 0xa5 }, + .public = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }, + .result = { 0x29, 0xae, 0x8b, 0xc7, 0x3e, 0x9b, 0x10, 0xa0, + 0x8b, 0x4f, 0x68, 0x1c, 0x43, 0xc3, 0xe0, 0xac, + 0x1a, 0x17, 0x1d, 0x31, 0xb3, 0x8f, 0x1a, 0x48, + 0xef, 0xba, 0x29, 0xae, 0x63, 0x9e, 0xa1, 0x34 }, + .valid = true + }, + /* wycheproof - RFC 7748 */ + { + .private = { 0xa0, 0x46, 0xe3, 0x6b, 0xf0, 0x52, 0x7c, 0x9d, + 0x3b, 0x16, 0x15, 0x4b, 0x82, 0x46, 0x5e, 0xdd, + 0x62, 0x14, 0x4c, 0x0a, 0xc1, 0xfc, 0x5a, 0x18, + 0x50, 0x6a, 0x22, 0x44, 0xba, 0x44, 0x9a, 0x44 }, + .public = { 0xe6, 0xdb, 0x68, 0x67, 0x58, 0x30, 0x30, 0xdb, + 0x35, 0x94, 0xc1, 0xa4, 0x24, 0xb1, 0x5f, 0x7c, + 0x72, 0x66, 0x24, 0xec, 0x26, 0xb3, 0x35, 0x3b, + 0x10, 0xa9, 0x03, 0xa6, 0xd0, 0xab, 0x1c, 0x4c }, + .result = { 0xc3, 0xda, 0x55, 0x37, 0x9d, 0xe9, 0xc6, 0x90, + 0x8e, 0x94, 0xea, 0x4d, 0xf2, 0x8d, 0x08, 0x4f, + 0x32, 0xec, 0xcf, 0x03, 0x49, 0x1c, 0x71, 0xf7, + 0x54, 0xb4, 0x07, 0x55, 0x77, 0xa2, 0x85, 0x52 }, + .valid = true + }, + /* wycheproof - RFC 7748 */ + { + .private = { 0x48, 0x66, 0xe9, 0xd4, 0xd1, 0xb4, 0x67, 0x3c, + 0x5a, 0xd2, 0x26, 0x91, 0x95, 0x7d, 0x6a, 0xf5, + 0xc1, 0x1b, 0x64, 0x21, 0xe0, 0xea, 0x01, 0xd4, + 0x2c, 0xa4, 0x16, 0x9e, 0x79, 0x18, 0xba, 0x4d }, + .public = { 0xe5, 0x21, 0x0f, 0x12, 0x78, 0x68, 0x11, 0xd3, + 0xf4, 0xb7, 0x95, 0x9d, 0x05, 0x38, 0xae, 0x2c, + 0x31, 0xdb, 0xe7, 0x10, 0x6f, 0xc0, 0x3c, 0x3e, + 0xfc, 0x4c, 0xd5, 0x49, 0xc7, 0x15, 0xa4, 0x13 }, + .result = { 0x95, 0xcb, 0xde, 0x94, 0x76, 0xe8, 0x90, 0x7d, + 0x7a, 0xad, 0xe4, 0x5c, 0xb4, 0xb8, 0x73, 0xf8, + 0x8b, 0x59, 0x5a, 0x68, 0x79, 0x9f, 0xa1, 0x52, + 0xe6, 0xf8, 0xf7, 0x64, 0x7a, 0xac, 0x79, 0x57 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x0a, 0xb4, 0xe7, 0x63, 0x80, 0xd8, 0x4d, 0xde, + 0x4f, 0x68, 0x33, 0xc5, 0x8f, 0x2a, 0x9f, 0xb8, + 0xf8, 0x3b, 0xb0, 0x16, 0x9b, 0x17, 0x2b, 0xe4, + 0xb6, 0xe0, 0x59, 0x28, 0x87, 0x74, 0x1a, 0x36 }, + .result = { 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x89, 0xe1, 0x0d, 0x57, 0x01, 0xb4, 0x33, 0x7d, + 0x2d, 0x03, 0x21, 0x81, 0x53, 0x8b, 0x10, 0x64, + 0xbd, 0x40, 0x84, 0x40, 0x1c, 0xec, 0xa1, 0xfd, + 0x12, 0x66, 0x3a, 0x19, 0x59, 0x38, 0x80, 0x00 }, + .result = { 0x09, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x2b, 0x55, 0xd3, 0xaa, 0x4a, 0x8f, 0x80, 0xc8, + 0xc0, 0xb2, 0xae, 0x5f, 0x93, 0x3e, 0x85, 0xaf, + 0x49, 0xbe, 0xac, 0x36, 0xc2, 0xfa, 0x73, 0x94, + 0xba, 0xb7, 0x6c, 0x89, 0x33, 0xf8, 0xf8, 0x1d }, + .result = { 0x10, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x63, 0xe5, 0xb1, 0xfe, 0x96, 0x01, 0xfe, 0x84, + 0x38, 0x5d, 0x88, 0x66, 0xb0, 0x42, 0x12, 0x62, + 0xf7, 0x8f, 0xbf, 0xa5, 0xaf, 0xf9, 0x58, 0x5e, + 0x62, 0x66, 0x79, 0xb1, 0x85, 0x47, 0xd9, 0x59 }, + .result = { 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0xe4, 0x28, 0xf3, 0xda, 0xc1, 0x78, 0x09, 0xf8, + 0x27, 0xa5, 0x22, 0xce, 0x32, 0x35, 0x50, 0x58, + 0xd0, 0x73, 0x69, 0x36, 0x4a, 0xa7, 0x89, 0x02, + 0xee, 0x10, 0x13, 0x9b, 0x9f, 0x9d, 0xd6, 0x53 }, + .result = { 0xfc, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0xb3, 0xb5, 0x0e, 0x3e, 0xd3, 0xa4, 0x07, 0xb9, + 0x5d, 0xe9, 0x42, 0xef, 0x74, 0x57, 0x5b, 0x5a, + 0xb8, 0xa1, 0x0c, 0x09, 0xee, 0x10, 0x35, 0x44, + 0xd6, 0x0b, 0xdf, 0xed, 0x81, 0x38, 0xab, 0x2b }, + .result = { 0xf9, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x21, 0x3f, 0xff, 0xe9, 0x3d, 0x5e, 0xa8, 0xcd, + 0x24, 0x2e, 0x46, 0x28, 0x44, 0x02, 0x99, 0x22, + 0xc4, 0x3c, 0x77, 0xc9, 0xe3, 0xe4, 0x2f, 0x56, + 0x2f, 0x48, 0x5d, 0x24, 0xc5, 0x01, 0xa2, 0x0b }, + .result = { 0xf3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x3f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x91, 0xb2, 0x32, 0xa1, 0x78, 0xb3, 0xcd, 0x53, + 0x09, 0x32, 0x44, 0x1e, 0x61, 0x39, 0x41, 0x8f, + 0x72, 0x17, 0x22, 0x92, 0xf1, 0xda, 0x4c, 0x18, + 0x34, 0xfc, 0x5e, 0xbf, 0xef, 0xb5, 0x1e, 0x3f }, + .result = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x03 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x04, 0x5c, 0x6e, 0x11, 0xc5, 0xd3, 0x32, 0x55, + 0x6c, 0x78, 0x22, 0xfe, 0x94, 0xeb, 0xf8, 0x9b, + 0x56, 0xa3, 0x87, 0x8d, 0xc2, 0x7c, 0xa0, 0x79, + 0x10, 0x30, 0x58, 0x84, 0x9f, 0xab, 0xcb, 0x4f }, + .result = { 0xe5, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x1c, 0xa2, 0x19, 0x0b, 0x71, 0x16, 0x35, 0x39, + 0x06, 0x3c, 0x35, 0x77, 0x3b, 0xda, 0x0c, 0x9c, + 0x92, 0x8e, 0x91, 0x36, 0xf0, 0x62, 0x0a, 0xeb, + 0x09, 0x3f, 0x09, 0x91, 0x97, 0xb7, 0xf7, 0x4e }, + .result = { 0xe3, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0xf7, 0x6e, 0x90, 0x10, 0xac, 0x33, 0xc5, 0x04, + 0x3b, 0x2d, 0x3b, 0x76, 0xa8, 0x42, 0x17, 0x10, + 0x00, 0xc4, 0x91, 0x62, 0x22, 0xe9, 0xe8, 0x58, + 0x97, 0xa0, 0xae, 0xc7, 0xf6, 0x35, 0x0b, 0x3c }, + .result = { 0xdd, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0xbb, 0x72, 0x68, 0x8d, 0x8f, 0x8a, 0xa7, 0xa3, + 0x9c, 0xd6, 0x06, 0x0c, 0xd5, 0xc8, 0x09, 0x3c, + 0xde, 0xc6, 0xfe, 0x34, 0x19, 0x37, 0xc3, 0x88, + 0x6a, 0x99, 0x34, 0x6c, 0xd0, 0x7f, 0xaa, 0x55 }, + .result = { 0xdb, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x7f }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x88, 0xfd, 0xde, 0xa1, 0x93, 0x39, 0x1c, 0x6a, + 0x59, 0x33, 0xef, 0x9b, 0x71, 0x90, 0x15, 0x49, + 0x44, 0x72, 0x05, 0xaa, 0xe9, 0xda, 0x92, 0x8a, + 0x6b, 0x91, 0xa3, 0x52, 0xba, 0x10, 0xf4, 0x1f }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02 }, + .valid = true + }, + /* wycheproof - edge case for shared secret */ + { + .private = { 0xa0, 0xa4, 0xf1, 0x30, 0xb9, 0x8a, 0x5b, 0xe4, + 0xb1, 0xce, 0xdb, 0x7c, 0xb8, 0x55, 0x84, 0xa3, + 0x52, 0x0e, 0x14, 0x2d, 0x47, 0x4d, 0xc9, 0xcc, + 0xb9, 0x09, 0xa0, 0x73, 0xa9, 0x76, 0xbf, 0x63 }, + .public = { 0x30, 0x3b, 0x39, 0x2f, 0x15, 0x31, 0x16, 0xca, + 0xd9, 0xcc, 0x68, 0x2a, 0x00, 0xcc, 0xc4, 0x4c, + 0x95, 0xff, 0x0d, 0x3b, 0xbe, 0x56, 0x8b, 0xeb, + 0x6c, 0x4e, 0x73, 0x9b, 0xaf, 0xdc, 0x2c, 0x68 }, + .result = { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x00 }, + .valid = true + }, + /* wycheproof - checking for overflow */ + { + .private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d, + 0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d, + 0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c, + 0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 }, + .public = { 0xfd, 0x30, 0x0a, 0xeb, 0x40, 0xe1, 0xfa, 0x58, + 0x25, 0x18, 0x41, 0x2b, 0x49, 0xb2, 0x08, 0xa7, + 0x84, 0x2b, 0x1e, 0x1f, 0x05, 0x6a, 0x04, 0x01, + 0x78, 0xea, 0x41, 0x41, 0x53, 0x4f, 0x65, 0x2d }, + .result = { 0xb7, 0x34, 0x10, 0x5d, 0xc2, 0x57, 0x58, 0x5d, + 0x73, 0xb5, 0x66, 0xcc, 0xb7, 0x6f, 0x06, 0x27, + 0x95, 0xcc, 0xbe, 0xc8, 0x91, 0x28, 0xe5, 0x2b, + 0x02, 0xf3, 0xe5, 0x96, 0x39, 0xf1, 0x3c, 0x46 }, + .valid = true + }, + /* wycheproof - checking for overflow */ + { + .private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d, + 0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d, + 0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c, + 0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 }, + .public = { 0xc8, 0xef, 0x79, 0xb5, 0x14, 0xd7, 0x68, 0x26, + 0x77, 0xbc, 0x79, 0x31, 0xe0, 0x6e, 0xe5, 0xc2, + 0x7c, 0x9b, 0x39, 0x2b, 0x4a, 0xe9, 0x48, 0x44, + 0x73, 0xf5, 0x54, 0xe6, 0x67, 0x8e, 0xcc, 0x2e }, + .result = { 0x64, 0x7a, 0x46, 0xb6, 0xfc, 0x3f, 0x40, 0xd6, + 0x21, 0x41, 0xee, 0x3c, 0xee, 0x70, 0x6b, 0x4d, + 0x7a, 0x92, 0x71, 0x59, 0x3a, 0x7b, 0x14, 0x3e, + 0x8e, 0x2e, 0x22, 0x79, 0x88, 0x3e, 0x45, 0x50 }, + .valid = true + }, + /* wycheproof - checking for overflow */ + { + .private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d, + 0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d, + 0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c, + 0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 }, + .public = { 0x64, 0xae, 0xac, 0x25, 0x04, 0x14, 0x48, 0x61, + 0x53, 0x2b, 0x7b, 0xbc, 0xb6, 0xc8, 0x7d, 0x67, + 0xdd, 0x4c, 0x1f, 0x07, 0xeb, 0xc2, 0xe0, 0x6e, + 0xff, 0xb9, 0x5a, 0xec, 0xc6, 0x17, 0x0b, 0x2c }, + .result = { 0x4f, 0xf0, 0x3d, 0x5f, 0xb4, 0x3c, 0xd8, 0x65, + 0x7a, 0x3c, 0xf3, 0x7c, 0x13, 0x8c, 0xad, 0xce, + 0xcc, 0xe5, 0x09, 0xe4, 0xeb, 0xa0, 0x89, 0xd0, + 0xef, 0x40, 0xb4, 0xe4, 0xfb, 0x94, 0x61, 0x55 }, + .valid = true + }, + /* wycheproof - checking for overflow */ + { + .private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d, + 0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d, + 0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c, + 0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 }, + .public = { 0xbf, 0x68, 0xe3, 0x5e, 0x9b, 0xdb, 0x7e, 0xee, + 0x1b, 0x50, 0x57, 0x02, 0x21, 0x86, 0x0f, 0x5d, + 0xcd, 0xad, 0x8a, 0xcb, 0xab, 0x03, 0x1b, 0x14, + 0x97, 0x4c, 0xc4, 0x90, 0x13, 0xc4, 0x98, 0x31 }, + .result = { 0x21, 0xce, 0xe5, 0x2e, 0xfd, 0xbc, 0x81, 0x2e, + 0x1d, 0x02, 0x1a, 0x4a, 0xf1, 0xe1, 0xd8, 0xbc, + 0x4d, 0xb3, 0xc4, 0x00, 0xe4, 0xd2, 0xa2, 0xc5, + 0x6a, 0x39, 0x26, 0xdb, 0x4d, 0x99, 0xc6, 0x5b }, + .valid = true + }, + /* wycheproof - checking for overflow */ + { + .private = { 0xc8, 0x17, 0x24, 0x70, 0x40, 0x00, 0xb2, 0x6d, + 0x31, 0x70, 0x3c, 0xc9, 0x7e, 0x3a, 0x37, 0x8d, + 0x56, 0xfa, 0xd8, 0x21, 0x93, 0x61, 0xc8, 0x8c, + 0xca, 0x8b, 0xd7, 0xc5, 0x71, 0x9b, 0x12, 0xb2 }, + .public = { 0x53, 0x47, 0xc4, 0x91, 0x33, 0x1a, 0x64, 0xb4, + 0x3d, 0xdc, 0x68, 0x30, 0x34, 0xe6, 0x77, 0xf5, + 0x3d, 0xc3, 0x2b, 0x52, 0xa5, 0x2a, 0x57, 0x7c, + 0x15, 0xa8, 0x3b, 0xf2, 0x98, 0xe9, 0x9f, 0x19 }, + .result = { 0x18, 0xcb, 0x89, 0xe4, 0xe2, 0x0c, 0x0c, 0x2b, + 0xd3, 0x24, 0x30, 0x52, 0x45, 0x26, 0x6c, 0x93, + 0x27, 0x69, 0x0b, 0xbe, 0x79, 0xac, 0xb8, 0x8f, + 0x5b, 0x8f, 0xb3, 0xf7, 0x4e, 0xca, 0x3e, 0x52 }, + .valid = true + }, + /* wycheproof - private key == -1 (mod order) */ + { + .private = { 0xa0, 0x23, 0xcd, 0xd0, 0x83, 0xef, 0x5b, 0xb8, + 0x2f, 0x10, 0xd6, 0x2e, 0x59, 0xe1, 0x5a, 0x68, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x50 }, + .public = { 0x25, 0x8e, 0x04, 0x52, 0x3b, 0x8d, 0x25, 0x3e, + 0xe6, 0x57, 0x19, 0xfc, 0x69, 0x06, 0xc6, 0x57, + 0x19, 0x2d, 0x80, 0x71, 0x7e, 0xdc, 0x82, 0x8f, + 0xa0, 0xaf, 0x21, 0x68, 0x6e, 0x2f, 0xaa, 0x75 }, + .result = { 0x25, 0x8e, 0x04, 0x52, 0x3b, 0x8d, 0x25, 0x3e, + 0xe6, 0x57, 0x19, 0xfc, 0x69, 0x06, 0xc6, 0x57, + 0x19, 0x2d, 0x80, 0x71, 0x7e, 0xdc, 0x82, 0x8f, + 0xa0, 0xaf, 0x21, 0x68, 0x6e, 0x2f, 0xaa, 0x75 }, + .valid = true + }, + /* wycheproof - private key == 1 (mod order) on twist */ + { + .private = { 0x58, 0x08, 0x3d, 0xd2, 0x61, 0xad, 0x91, 0xef, + 0xf9, 0x52, 0x32, 0x2e, 0xc8, 0x24, 0xc6, 0x82, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, + 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x5f }, + .public = { 0x2e, 0xae, 0x5e, 0xc3, 0xdd, 0x49, 0x4e, 0x9f, + 0x2d, 0x37, 0xd2, 0x58, 0xf8, 0x73, 0xa8, 0xe6, + 0xe9, 0xd0, 0xdb, 0xd1, 0xe3, 0x83, 0xef, 0x64, + 0xd9, 0x8b, 0xb9, 0x1b, 0x3e, 0x0b, 0xe0, 0x35 }, + .result = { 0x2e, 0xae, 0x5e, 0xc3, 0xdd, 0x49, 0x4e, 0x9f, + 0x2d, 0x37, 0xd2, 0x58, 0xf8, 0x73, 0xa8, 0xe6, + 0xe9, 0xd0, 0xdb, 0xd1, 0xe3, 0x83, 0xef, 0x64, + 0xd9, 0x8b, 0xb9, 0x1b, 0x3e, 0x0b, 0xe0, 0x35 }, + .valid = true + } +}; + +static bool __init curve25519_selftest(void) +{ + bool success = true, ret, ret2; + size_t i = 0, j; + u8 in[CURVE25519_KEY_SIZE]; + u8 out[CURVE25519_KEY_SIZE], out2[CURVE25519_KEY_SIZE]; + + for (i = 0; i < ARRAY_SIZE(curve25519_test_vectors); ++i) { + memset(out, 0, CURVE25519_KEY_SIZE); + ret = curve25519(out, curve25519_test_vectors[i].private, + curve25519_test_vectors[i].public); + if (ret != curve25519_test_vectors[i].valid || + memcmp(out, curve25519_test_vectors[i].result, + CURVE25519_KEY_SIZE)) { + pr_info("curve25519 self-test %zu: FAIL\n", i + 1); + success = false; + break; + } + } + + for (i = 0; i < 5; ++i) { + get_random_bytes(in, sizeof(in)); + ret = curve25519_generate_public(out, in); + ret2 = curve25519(out2, in, (u8[CURVE25519_KEY_SIZE]){ 9 }); + if (ret != ret2 || memcmp(out, out2, CURVE25519_KEY_SIZE)) { + pr_info("curve25519 basepoint self-test %zu: FAIL: input - 0x", + i + 1); + for (j = CURVE25519_KEY_SIZE; j-- > 0;) + printk(KERN_CONT "%02x", in[j]); + printk(KERN_CONT "\n"); + success = false; + break; + } + } + + if (success) + pr_info("curve25519 self-tests: pass\n"); + return success; +} +#endif From patchwork Tue Sep 25 14:56:18 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147501 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp831002lji; Tue, 25 Sep 2018 07:57:44 -0700 (PDT) X-Google-Smtp-Source: ACcGV61XEaj6/QBz+BQ3jmzbtFOXGusdqN52uVFGnzqzGiB7o4oVGoygbrA2kaU9DfiTpUq0+4bw X-Received: by 2002:a63:1b52:: with SMTP id b18-v6mr1432315pgm.303.1537887464441; Tue, 25 Sep 2018 07:57:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887464; cv=none; d=google.com; s=arc-20160816; b=giS8QiVPW8jTVDulgtlRqIOT9bfJkM0LbEPzvKaeBCzyu6ckJGdcipqSfqd/b0s1dw sLnkmuoNXkDmuOF2zASdvwCtkuspxmu0Otds167+lU264lmq63V5yHxlCdsGVU6iKzvS HlorQPpLBfbLOz8mFgyK9qzaCOrGHZUGAQQ1ZGUWJcSaag49Z1LnGMWGItOaQsAHThbe inJyM2yJusgzBVuGZdkwevgFsOkdTG0sf2FZIaL0Ttz21QHdbEMJiW1im4BTuSGGbuMH 3rYY9B/RQx0MB/RubKPTusYFm4X3m/qnyZDN67cWClnmoSX1zZtq9CJHV/eIQhJ31RX/ UktA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6yByEMolZmE9b+Dz/yJ5G74fjZC9OYcboaJ+GP3xVmE=; b=Y1NzkHfPbzj2JBOwQtutsNHRDpJVWogNIZr5nEkV12+pUGQijnIDUTwZ2q2CLzZd+y joQI1dxR2PF8Z2Gx/Kd7fod/wzOvKLb7OnZBVnor3xRvfBF2oFedOW2dzHTQkH5U1cYd LD5VI6MtZDk9DgRa2lXw47rdBHNNcbOwLVu8wDyYjGZAlbk/DkViCxlqrpoU7sFGx5Np yZNjt8mxkGWnyeUtsH0C4yyLS3awyiITL0X4t23FmHappBIMVWogW/IOXzeWCD+nQTwF VrE/zNsSeuf16YRkXQD28J1eSm86zP+n9QSEDAE5A2ZI1ZdjBV7y6KXWFmWXZKQ8BzTV mAMw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=aAu1R6rl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z1-v6si2410381pfc.11.2018.09.25.07.57.43; Tue, 25 Sep 2018 07:57:44 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=aAu1R6rl; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729963AbeIYVFc (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:32 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:58261 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729877AbeIYVFa (ORCPT ); Tue, 25 Sep 2018 17:05:30 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id e2d6bea0; Tue, 25 Sep 2018 14:38:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=AcrGrFn2zPeXjVWfUsiqa1uLU Nk=; b=aAu1R6rla8CjGM6EixNBnXdZ24VLJZ0q1AcmrOTiR7aFiYBs0G30YHDzo GuE7hBVSljJi7sCeW1dUNr7MEnkWoElhk0MqJjghd54KkKeC11qev2UQ3lMfA25H uinNAkEZnxtk8P2RtgteKsY0z6GJKbzdSGNW0AKz3HTvS3e7JgyZKLJ8Knb8lebA ud22hJeg4yTzg9pIG1RQHxgsfskCsMhRVE/DS6JumHmkAOTG4Mz7U5g/wNaKwIbM vp2CX8EkrZEfGFhqGWzHbwK3oM1bnl1YmUUpJWHFiZhlScT1MJpIVjkp5xvCx8Vf O7ZFFCyoHCu3LMtM3Y4XdLelwnbvQ== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 974421ce (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:38:56 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Russell King , linux-arm-kernel@lists.infradead.org Subject: [PATCH net-next v6 19/23] zinc: Curve25519 ARM implementation Date: Tue, 25 Sep 2018 16:56:18 +0200 Message-Id: <20180925145622.29959-20-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This comes from Dan Bernstein and Peter Schwabe's public domain NEON code, and has been modified to be friendly for kernel space, as well as removing some qhasm strangeness to be more idiomatic. Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Russell King Cc: linux-arm-kernel@lists.infradead.org --- lib/zinc/Makefile | 1 + lib/zinc/curve25519/curve25519-arm-glue.h | 42 + lib/zinc/curve25519/curve25519-arm.S | 2095 +++++++++++++++++++++ lib/zinc/curve25519/curve25519.c | 2 + 4 files changed, 2140 insertions(+) create mode 100644 lib/zinc/curve25519/curve25519-arm-glue.h create mode 100644 lib/zinc/curve25519/curve25519-arm.S -- 2.19.0 diff --git a/lib/zinc/Makefile b/lib/zinc/Makefile index 65440438c6e5..be73c342f9ba 100644 --- a/lib/zinc/Makefile +++ b/lib/zinc/Makefile @@ -27,4 +27,5 @@ zinc_blake2s-$(CONFIG_ZINC_ARCH_X86_64) += blake2s/blake2s-x86_64.o obj-$(CONFIG_ZINC_BLAKE2S) += zinc_blake2s.o zinc_curve25519-y := curve25519/curve25519.o +zinc_curve25519-$(CONFIG_ZINC_ARCH_ARM) += curve25519/curve25519-arm.o obj-$(CONFIG_ZINC_CURVE25519) += zinc_curve25519.o diff --git a/lib/zinc/curve25519/curve25519-arm-glue.h b/lib/zinc/curve25519/curve25519-arm-glue.h new file mode 100644 index 000000000000..9211bcab5615 --- /dev/null +++ b/lib/zinc/curve25519/curve25519-arm-glue.h @@ -0,0 +1,42 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include +#include +#include + +#if defined(CONFIG_KERNEL_MODE_NEON) +asmlinkage void curve25519_neon(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]); +#endif + +static bool curve25519_use_neon __ro_after_init; + +static void __init curve25519_fpu_init(void) +{ + curve25519_use_neon = elf_hwcap & HWCAP_NEON; +} + +static inline bool curve25519_arch(u8 mypublic[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE], + const u8 basepoint[CURVE25519_KEY_SIZE]) +{ +#if defined(CONFIG_KERNEL_MODE_NEON) + if (curve25519_use_neon && may_use_simd()) { + kernel_neon_begin(); + curve25519_neon(mypublic, secret, basepoint); + kernel_neon_end(); + return true; + } +#endif + return false; +} + +static inline bool curve25519_base_arch(u8 pub[CURVE25519_KEY_SIZE], + const u8 secret[CURVE25519_KEY_SIZE]) +{ + return false; +} diff --git a/lib/zinc/curve25519/curve25519-arm.S b/lib/zinc/curve25519/curve25519-arm.S new file mode 100644 index 000000000000..db6570c20fd1 --- /dev/null +++ b/lib/zinc/curve25519/curve25519-arm.S @@ -0,0 +1,2095 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * Based on public domain code from Daniel J. Bernstein and Peter Schwabe. This + * has been built from SUPERCOP's curve25519/neon2/scalarmult.pq using qhasm, + * but has subsequently been manually reworked for use in kernel space. + */ + +#ifdef CONFIG_KERNEL_MODE_NEON +#include + +.text +.fpu neon +.arch armv7-a +.align 4 + +ENTRY(curve25519_neon) + push {r4-r11, lr} + mov ip, sp + sub r3, sp, #704 + and r3, r3, #0xfffffff0 + mov sp, r3 + movw r4, #0 + movw r5, #254 + vmov.i32 q0, #1 + vshr.u64 q1, q0, #7 + vshr.u64 q0, q0, #8 + vmov.i32 d4, #19 + vmov.i32 d5, #38 + add r6, sp, #480 + vst1.8 {d2-d3}, [r6, : 128] + add r6, sp, #496 + vst1.8 {d0-d1}, [r6, : 128] + add r6, sp, #512 + vst1.8 {d4-d5}, [r6, : 128] + add r6, r3, #0 + vmov.i32 q2, #0 + vst1.8 {d4-d5}, [r6, : 128]! + vst1.8 {d4-d5}, [r6, : 128]! + vst1.8 d4, [r6, : 64] + add r6, r3, #0 + movw r7, #960 + sub r7, r7, #2 + neg r7, r7 + sub r7, r7, r7, LSL #7 + str r7, [r6] + add r6, sp, #672 + vld1.8 {d4-d5}, [r1]! + vld1.8 {d6-d7}, [r1] + vst1.8 {d4-d5}, [r6, : 128]! + vst1.8 {d6-d7}, [r6, : 128] + sub r1, r6, #16 + ldrb r6, [r1] + and r6, r6, #248 + strb r6, [r1] + ldrb r6, [r1, #31] + and r6, r6, #127 + orr r6, r6, #64 + strb r6, [r1, #31] + vmov.i64 q2, #0xffffffff + vshr.u64 q3, q2, #7 + vshr.u64 q2, q2, #6 + vld1.8 {d8}, [r2] + vld1.8 {d10}, [r2] + add r2, r2, #6 + vld1.8 {d12}, [r2] + vld1.8 {d14}, [r2] + add r2, r2, #6 + vld1.8 {d16}, [r2] + add r2, r2, #4 + vld1.8 {d18}, [r2] + vld1.8 {d20}, [r2] + add r2, r2, #6 + vld1.8 {d22}, [r2] + add r2, r2, #2 + vld1.8 {d24}, [r2] + vld1.8 {d26}, [r2] + vshr.u64 q5, q5, #26 + vshr.u64 q6, q6, #3 + vshr.u64 q7, q7, #29 + vshr.u64 q8, q8, #6 + vshr.u64 q10, q10, #25 + vshr.u64 q11, q11, #3 + vshr.u64 q12, q12, #12 + vshr.u64 q13, q13, #38 + vand q4, q4, q2 + vand q6, q6, q2 + vand q8, q8, q2 + vand q10, q10, q2 + vand q2, q12, q2 + vand q5, q5, q3 + vand q7, q7, q3 + vand q9, q9, q3 + vand q11, q11, q3 + vand q3, q13, q3 + add r2, r3, #48 + vadd.i64 q12, q4, q1 + vadd.i64 q13, q10, q1 + vshr.s64 q12, q12, #26 + vshr.s64 q13, q13, #26 + vadd.i64 q5, q5, q12 + vshl.i64 q12, q12, #26 + vadd.i64 q14, q5, q0 + vadd.i64 q11, q11, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q15, q11, q0 + vsub.i64 q4, q4, q12 + vshr.s64 q12, q14, #25 + vsub.i64 q10, q10, q13 + vshr.s64 q13, q15, #25 + vadd.i64 q6, q6, q12 + vshl.i64 q12, q12, #25 + vadd.i64 q14, q6, q1 + vadd.i64 q2, q2, q13 + vsub.i64 q5, q5, q12 + vshr.s64 q12, q14, #26 + vshl.i64 q13, q13, #25 + vadd.i64 q14, q2, q1 + vadd.i64 q7, q7, q12 + vshl.i64 q12, q12, #26 + vadd.i64 q15, q7, q0 + vsub.i64 q11, q11, q13 + vshr.s64 q13, q14, #26 + vsub.i64 q6, q6, q12 + vshr.s64 q12, q15, #25 + vadd.i64 q3, q3, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q14, q3, q0 + vadd.i64 q8, q8, q12 + vshl.i64 q12, q12, #25 + vadd.i64 q15, q8, q1 + add r2, r2, #8 + vsub.i64 q2, q2, q13 + vshr.s64 q13, q14, #25 + vsub.i64 q7, q7, q12 + vshr.s64 q12, q15, #26 + vadd.i64 q14, q13, q13 + vadd.i64 q9, q9, q12 + vtrn.32 d12, d14 + vshl.i64 q12, q12, #26 + vtrn.32 d13, d15 + vadd.i64 q0, q9, q0 + vadd.i64 q4, q4, q14 + vst1.8 d12, [r2, : 64]! + vshl.i64 q6, q13, #4 + vsub.i64 q7, q8, q12 + vshr.s64 q0, q0, #25 + vadd.i64 q4, q4, q6 + vadd.i64 q6, q10, q0 + vshl.i64 q0, q0, #25 + vadd.i64 q8, q6, q1 + vadd.i64 q4, q4, q13 + vshl.i64 q10, q13, #25 + vadd.i64 q1, q4, q1 + vsub.i64 q0, q9, q0 + vshr.s64 q8, q8, #26 + vsub.i64 q3, q3, q10 + vtrn.32 d14, d0 + vshr.s64 q1, q1, #26 + vtrn.32 d15, d1 + vadd.i64 q0, q11, q8 + vst1.8 d14, [r2, : 64] + vshl.i64 q7, q8, #26 + vadd.i64 q5, q5, q1 + vtrn.32 d4, d6 + vshl.i64 q1, q1, #26 + vtrn.32 d5, d7 + vsub.i64 q3, q6, q7 + add r2, r2, #16 + vsub.i64 q1, q4, q1 + vst1.8 d4, [r2, : 64] + vtrn.32 d6, d0 + vtrn.32 d7, d1 + sub r2, r2, #8 + vtrn.32 d2, d10 + vtrn.32 d3, d11 + vst1.8 d6, [r2, : 64] + sub r2, r2, #24 + vst1.8 d2, [r2, : 64] + add r2, r3, #96 + vmov.i32 q0, #0 + vmov.i64 d2, #0xff + vmov.i64 d3, #0 + vshr.u32 q1, q1, #7 + vst1.8 {d2-d3}, [r2, : 128]! + vst1.8 {d0-d1}, [r2, : 128]! + vst1.8 d0, [r2, : 64] + add r2, r3, #144 + vmov.i32 q0, #0 + vst1.8 {d0-d1}, [r2, : 128]! + vst1.8 {d0-d1}, [r2, : 128]! + vst1.8 d0, [r2, : 64] + add r2, r3, #240 + vmov.i32 q0, #0 + vmov.i64 d2, #0xff + vmov.i64 d3, #0 + vshr.u32 q1, q1, #7 + vst1.8 {d2-d3}, [r2, : 128]! + vst1.8 {d0-d1}, [r2, : 128]! + vst1.8 d0, [r2, : 64] + add r2, r3, #48 + add r6, r3, #192 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d4}, [r2, : 64] + vst1.8 {d0-d1}, [r6, : 128]! + vst1.8 {d2-d3}, [r6, : 128]! + vst1.8 d4, [r6, : 64] +.Lmainloop: + mov r2, r5, LSR #3 + and r6, r5, #7 + ldrb r2, [r1, r2] + mov r2, r2, LSR r6 + and r2, r2, #1 + str r5, [sp, #456] + eor r4, r4, r2 + str r2, [sp, #460] + neg r2, r4 + add r4, r3, #96 + add r5, r3, #192 + add r6, r3, #144 + vld1.8 {d8-d9}, [r4, : 128]! + add r7, r3, #240 + vld1.8 {d10-d11}, [r5, : 128]! + veor q6, q4, q5 + vld1.8 {d14-d15}, [r6, : 128]! + vdup.i32 q8, r2 + vld1.8 {d18-d19}, [r7, : 128]! + veor q10, q7, q9 + vld1.8 {d22-d23}, [r4, : 128]! + vand q6, q6, q8 + vld1.8 {d24-d25}, [r5, : 128]! + vand q10, q10, q8 + vld1.8 {d26-d27}, [r6, : 128]! + veor q4, q4, q6 + vld1.8 {d28-d29}, [r7, : 128]! + veor q5, q5, q6 + vld1.8 {d0}, [r4, : 64] + veor q6, q7, q10 + vld1.8 {d2}, [r5, : 64] + veor q7, q9, q10 + vld1.8 {d4}, [r6, : 64] + veor q9, q11, q12 + vld1.8 {d6}, [r7, : 64] + veor q10, q0, q1 + sub r2, r4, #32 + vand q9, q9, q8 + sub r4, r5, #32 + vand q10, q10, q8 + sub r5, r6, #32 + veor q11, q11, q9 + sub r6, r7, #32 + veor q0, q0, q10 + veor q9, q12, q9 + veor q1, q1, q10 + veor q10, q13, q14 + veor q12, q2, q3 + vand q10, q10, q8 + vand q8, q12, q8 + veor q12, q13, q10 + veor q2, q2, q8 + veor q10, q14, q10 + veor q3, q3, q8 + vadd.i32 q8, q4, q6 + vsub.i32 q4, q4, q6 + vst1.8 {d16-d17}, [r2, : 128]! + vadd.i32 q6, q11, q12 + vst1.8 {d8-d9}, [r5, : 128]! + vsub.i32 q4, q11, q12 + vst1.8 {d12-d13}, [r2, : 128]! + vadd.i32 q6, q0, q2 + vst1.8 {d8-d9}, [r5, : 128]! + vsub.i32 q0, q0, q2 + vst1.8 d12, [r2, : 64] + vadd.i32 q2, q5, q7 + vst1.8 d0, [r5, : 64] + vsub.i32 q0, q5, q7 + vst1.8 {d4-d5}, [r4, : 128]! + vadd.i32 q2, q9, q10 + vst1.8 {d0-d1}, [r6, : 128]! + vsub.i32 q0, q9, q10 + vst1.8 {d4-d5}, [r4, : 128]! + vadd.i32 q2, q1, q3 + vst1.8 {d0-d1}, [r6, : 128]! + vsub.i32 q0, q1, q3 + vst1.8 d4, [r4, : 64] + vst1.8 d0, [r6, : 64] + add r2, sp, #512 + add r4, r3, #96 + add r5, r3, #144 + vld1.8 {d0-d1}, [r2, : 128] + vld1.8 {d2-d3}, [r4, : 128]! + vld1.8 {d4-d5}, [r5, : 128]! + vzip.i32 q1, q2 + vld1.8 {d6-d7}, [r4, : 128]! + vld1.8 {d8-d9}, [r5, : 128]! + vshl.i32 q5, q1, #1 + vzip.i32 q3, q4 + vshl.i32 q6, q2, #1 + vld1.8 {d14}, [r4, : 64] + vshl.i32 q8, q3, #1 + vld1.8 {d15}, [r5, : 64] + vshl.i32 q9, q4, #1 + vmul.i32 d21, d7, d1 + vtrn.32 d14, d15 + vmul.i32 q11, q4, q0 + vmul.i32 q0, q7, q0 + vmull.s32 q12, d2, d2 + vmlal.s32 q12, d11, d1 + vmlal.s32 q12, d12, d0 + vmlal.s32 q12, d13, d23 + vmlal.s32 q12, d16, d22 + vmlal.s32 q12, d7, d21 + vmull.s32 q10, d2, d11 + vmlal.s32 q10, d4, d1 + vmlal.s32 q10, d13, d0 + vmlal.s32 q10, d6, d23 + vmlal.s32 q10, d17, d22 + vmull.s32 q13, d10, d4 + vmlal.s32 q13, d11, d3 + vmlal.s32 q13, d13, d1 + vmlal.s32 q13, d16, d0 + vmlal.s32 q13, d17, d23 + vmlal.s32 q13, d8, d22 + vmull.s32 q1, d10, d5 + vmlal.s32 q1, d11, d4 + vmlal.s32 q1, d6, d1 + vmlal.s32 q1, d17, d0 + vmlal.s32 q1, d8, d23 + vmull.s32 q14, d10, d6 + vmlal.s32 q14, d11, d13 + vmlal.s32 q14, d4, d4 + vmlal.s32 q14, d17, d1 + vmlal.s32 q14, d18, d0 + vmlal.s32 q14, d9, d23 + vmull.s32 q11, d10, d7 + vmlal.s32 q11, d11, d6 + vmlal.s32 q11, d12, d5 + vmlal.s32 q11, d8, d1 + vmlal.s32 q11, d19, d0 + vmull.s32 q15, d10, d8 + vmlal.s32 q15, d11, d17 + vmlal.s32 q15, d12, d6 + vmlal.s32 q15, d13, d5 + vmlal.s32 q15, d19, d1 + vmlal.s32 q15, d14, d0 + vmull.s32 q2, d10, d9 + vmlal.s32 q2, d11, d8 + vmlal.s32 q2, d12, d7 + vmlal.s32 q2, d13, d6 + vmlal.s32 q2, d14, d1 + vmull.s32 q0, d15, d1 + vmlal.s32 q0, d10, d14 + vmlal.s32 q0, d11, d19 + vmlal.s32 q0, d12, d8 + vmlal.s32 q0, d13, d17 + vmlal.s32 q0, d6, d6 + add r2, sp, #480 + vld1.8 {d18-d19}, [r2, : 128] + vmull.s32 q3, d16, d7 + vmlal.s32 q3, d10, d15 + vmlal.s32 q3, d11, d14 + vmlal.s32 q3, d12, d9 + vmlal.s32 q3, d13, d8 + add r2, sp, #496 + vld1.8 {d8-d9}, [r2, : 128] + vadd.i64 q5, q12, q9 + vadd.i64 q6, q15, q9 + vshr.s64 q5, q5, #26 + vshr.s64 q6, q6, #26 + vadd.i64 q7, q10, q5 + vshl.i64 q5, q5, #26 + vadd.i64 q8, q7, q4 + vadd.i64 q2, q2, q6 + vshl.i64 q6, q6, #26 + vadd.i64 q10, q2, q4 + vsub.i64 q5, q12, q5 + vshr.s64 q8, q8, #25 + vsub.i64 q6, q15, q6 + vshr.s64 q10, q10, #25 + vadd.i64 q12, q13, q8 + vshl.i64 q8, q8, #25 + vadd.i64 q13, q12, q9 + vadd.i64 q0, q0, q10 + vsub.i64 q7, q7, q8 + vshr.s64 q8, q13, #26 + vshl.i64 q10, q10, #25 + vadd.i64 q13, q0, q9 + vadd.i64 q1, q1, q8 + vshl.i64 q8, q8, #26 + vadd.i64 q15, q1, q4 + vsub.i64 q2, q2, q10 + vshr.s64 q10, q13, #26 + vsub.i64 q8, q12, q8 + vshr.s64 q12, q15, #25 + vadd.i64 q3, q3, q10 + vshl.i64 q10, q10, #26 + vadd.i64 q13, q3, q4 + vadd.i64 q14, q14, q12 + add r2, r3, #288 + vshl.i64 q12, q12, #25 + add r4, r3, #336 + vadd.i64 q15, q14, q9 + add r2, r2, #8 + vsub.i64 q0, q0, q10 + add r4, r4, #8 + vshr.s64 q10, q13, #25 + vsub.i64 q1, q1, q12 + vshr.s64 q12, q15, #26 + vadd.i64 q13, q10, q10 + vadd.i64 q11, q11, q12 + vtrn.32 d16, d2 + vshl.i64 q12, q12, #26 + vtrn.32 d17, d3 + vadd.i64 q1, q11, q4 + vadd.i64 q4, q5, q13 + vst1.8 d16, [r2, : 64]! + vshl.i64 q5, q10, #4 + vst1.8 d17, [r4, : 64]! + vsub.i64 q8, q14, q12 + vshr.s64 q1, q1, #25 + vadd.i64 q4, q4, q5 + vadd.i64 q5, q6, q1 + vshl.i64 q1, q1, #25 + vadd.i64 q6, q5, q9 + vadd.i64 q4, q4, q10 + vshl.i64 q10, q10, #25 + vadd.i64 q9, q4, q9 + vsub.i64 q1, q11, q1 + vshr.s64 q6, q6, #26 + vsub.i64 q3, q3, q10 + vtrn.32 d16, d2 + vshr.s64 q9, q9, #26 + vtrn.32 d17, d3 + vadd.i64 q1, q2, q6 + vst1.8 d16, [r2, : 64] + vshl.i64 q2, q6, #26 + vst1.8 d17, [r4, : 64] + vadd.i64 q6, q7, q9 + vtrn.32 d0, d6 + vshl.i64 q7, q9, #26 + vtrn.32 d1, d7 + vsub.i64 q2, q5, q2 + add r2, r2, #16 + vsub.i64 q3, q4, q7 + vst1.8 d0, [r2, : 64] + add r4, r4, #16 + vst1.8 d1, [r4, : 64] + vtrn.32 d4, d2 + vtrn.32 d5, d3 + sub r2, r2, #8 + sub r4, r4, #8 + vtrn.32 d6, d12 + vtrn.32 d7, d13 + vst1.8 d4, [r2, : 64] + vst1.8 d5, [r4, : 64] + sub r2, r2, #24 + sub r4, r4, #24 + vst1.8 d6, [r2, : 64] + vst1.8 d7, [r4, : 64] + add r2, r3, #240 + add r4, r3, #96 + vld1.8 {d0-d1}, [r4, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vld1.8 {d4}, [r4, : 64] + add r4, r3, #144 + vld1.8 {d6-d7}, [r4, : 128]! + vtrn.32 q0, q3 + vld1.8 {d8-d9}, [r4, : 128]! + vshl.i32 q5, q0, #4 + vtrn.32 q1, q4 + vshl.i32 q6, q3, #4 + vadd.i32 q5, q5, q0 + vadd.i32 q6, q6, q3 + vshl.i32 q7, q1, #4 + vld1.8 {d5}, [r4, : 64] + vshl.i32 q8, q4, #4 + vtrn.32 d4, d5 + vadd.i32 q7, q7, q1 + vadd.i32 q8, q8, q4 + vld1.8 {d18-d19}, [r2, : 128]! + vshl.i32 q10, q2, #4 + vld1.8 {d22-d23}, [r2, : 128]! + vadd.i32 q10, q10, q2 + vld1.8 {d24}, [r2, : 64] + vadd.i32 q5, q5, q0 + add r2, r3, #192 + vld1.8 {d26-d27}, [r2, : 128]! + vadd.i32 q6, q6, q3 + vld1.8 {d28-d29}, [r2, : 128]! + vadd.i32 q8, q8, q4 + vld1.8 {d25}, [r2, : 64] + vadd.i32 q10, q10, q2 + vtrn.32 q9, q13 + vadd.i32 q7, q7, q1 + vadd.i32 q5, q5, q0 + vtrn.32 q11, q14 + vadd.i32 q6, q6, q3 + add r2, sp, #528 + vadd.i32 q10, q10, q2 + vtrn.32 d24, d25 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q6, q13, #1 + add r2, sp, #544 + vst1.8 {d20-d21}, [r2, : 128] + vshl.i32 q10, q14, #1 + add r2, sp, #560 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q15, q12, #1 + vadd.i32 q8, q8, q4 + vext.32 d10, d31, d30, #0 + vadd.i32 q7, q7, q1 + add r2, sp, #576 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q8, d18, d5 + vmlal.s32 q8, d26, d4 + vmlal.s32 q8, d19, d9 + vmlal.s32 q8, d27, d3 + vmlal.s32 q8, d22, d8 + vmlal.s32 q8, d28, d2 + vmlal.s32 q8, d23, d7 + vmlal.s32 q8, d29, d1 + vmlal.s32 q8, d24, d6 + vmlal.s32 q8, d25, d0 + add r2, sp, #592 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q2, d18, d4 + vmlal.s32 q2, d12, d9 + vmlal.s32 q2, d13, d8 + vmlal.s32 q2, d19, d3 + vmlal.s32 q2, d22, d2 + vmlal.s32 q2, d23, d1 + vmlal.s32 q2, d24, d0 + add r2, sp, #608 + vst1.8 {d20-d21}, [r2, : 128] + vmull.s32 q7, d18, d9 + vmlal.s32 q7, d26, d3 + vmlal.s32 q7, d19, d8 + vmlal.s32 q7, d27, d2 + vmlal.s32 q7, d22, d7 + vmlal.s32 q7, d28, d1 + vmlal.s32 q7, d23, d6 + vmlal.s32 q7, d29, d0 + add r2, sp, #624 + vst1.8 {d10-d11}, [r2, : 128] + vmull.s32 q5, d18, d3 + vmlal.s32 q5, d19, d2 + vmlal.s32 q5, d22, d1 + vmlal.s32 q5, d23, d0 + vmlal.s32 q5, d12, d8 + add r2, sp, #640 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q4, d18, d8 + vmlal.s32 q4, d26, d2 + vmlal.s32 q4, d19, d7 + vmlal.s32 q4, d27, d1 + vmlal.s32 q4, d22, d6 + vmlal.s32 q4, d28, d0 + vmull.s32 q8, d18, d7 + vmlal.s32 q8, d26, d1 + vmlal.s32 q8, d19, d6 + vmlal.s32 q8, d27, d0 + add r2, sp, #544 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q7, d24, d21 + vmlal.s32 q7, d25, d20 + vmlal.s32 q4, d23, d21 + vmlal.s32 q4, d29, d20 + vmlal.s32 q8, d22, d21 + vmlal.s32 q8, d28, d20 + vmlal.s32 q5, d24, d20 + add r2, sp, #544 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q7, d18, d6 + vmlal.s32 q7, d26, d0 + add r2, sp, #624 + vld1.8 {d30-d31}, [r2, : 128] + vmlal.s32 q2, d30, d21 + vmlal.s32 q7, d19, d21 + vmlal.s32 q7, d27, d20 + add r2, sp, #592 + vld1.8 {d26-d27}, [r2, : 128] + vmlal.s32 q4, d25, d27 + vmlal.s32 q8, d29, d27 + vmlal.s32 q8, d25, d26 + vmlal.s32 q7, d28, d27 + vmlal.s32 q7, d29, d26 + add r2, sp, #576 + vld1.8 {d28-d29}, [r2, : 128] + vmlal.s32 q4, d24, d29 + vmlal.s32 q8, d23, d29 + vmlal.s32 q8, d24, d28 + vmlal.s32 q7, d22, d29 + vmlal.s32 q7, d23, d28 + add r2, sp, #576 + vst1.8 {d8-d9}, [r2, : 128] + add r2, sp, #528 + vld1.8 {d8-d9}, [r2, : 128] + vmlal.s32 q7, d24, d9 + vmlal.s32 q7, d25, d31 + vmull.s32 q1, d18, d2 + vmlal.s32 q1, d19, d1 + vmlal.s32 q1, d22, d0 + vmlal.s32 q1, d24, d27 + vmlal.s32 q1, d23, d20 + vmlal.s32 q1, d12, d7 + vmlal.s32 q1, d13, d6 + vmull.s32 q6, d18, d1 + vmlal.s32 q6, d19, d0 + vmlal.s32 q6, d23, d27 + vmlal.s32 q6, d22, d20 + vmlal.s32 q6, d24, d26 + vmull.s32 q0, d18, d0 + vmlal.s32 q0, d22, d27 + vmlal.s32 q0, d23, d26 + vmlal.s32 q0, d24, d31 + vmlal.s32 q0, d19, d20 + add r2, sp, #608 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q2, d18, d7 + vmlal.s32 q2, d19, d6 + vmlal.s32 q5, d18, d6 + vmlal.s32 q5, d19, d21 + vmlal.s32 q1, d18, d21 + vmlal.s32 q1, d19, d29 + vmlal.s32 q0, d18, d28 + vmlal.s32 q0, d19, d9 + vmlal.s32 q6, d18, d29 + vmlal.s32 q6, d19, d28 + add r2, sp, #560 + vld1.8 {d18-d19}, [r2, : 128] + add r2, sp, #480 + vld1.8 {d22-d23}, [r2, : 128] + vmlal.s32 q5, d19, d7 + vmlal.s32 q0, d18, d21 + vmlal.s32 q0, d19, d29 + vmlal.s32 q6, d18, d6 + add r2, sp, #496 + vld1.8 {d6-d7}, [r2, : 128] + vmlal.s32 q6, d19, d21 + add r2, sp, #544 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q0, d30, d8 + add r2, sp, #640 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q5, d30, d29 + add r2, sp, #576 + vld1.8 {d24-d25}, [r2, : 128] + vmlal.s32 q1, d30, d28 + vadd.i64 q13, q0, q11 + vadd.i64 q14, q5, q11 + vmlal.s32 q6, d30, d9 + vshr.s64 q4, q13, #26 + vshr.s64 q13, q14, #26 + vadd.i64 q7, q7, q4 + vshl.i64 q4, q4, #26 + vadd.i64 q14, q7, q3 + vadd.i64 q9, q9, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q15, q9, q3 + vsub.i64 q0, q0, q4 + vshr.s64 q4, q14, #25 + vsub.i64 q5, q5, q13 + vshr.s64 q13, q15, #25 + vadd.i64 q6, q6, q4 + vshl.i64 q4, q4, #25 + vadd.i64 q14, q6, q11 + vadd.i64 q2, q2, q13 + vsub.i64 q4, q7, q4 + vshr.s64 q7, q14, #26 + vshl.i64 q13, q13, #25 + vadd.i64 q14, q2, q11 + vadd.i64 q8, q8, q7 + vshl.i64 q7, q7, #26 + vadd.i64 q15, q8, q3 + vsub.i64 q9, q9, q13 + vshr.s64 q13, q14, #26 + vsub.i64 q6, q6, q7 + vshr.s64 q7, q15, #25 + vadd.i64 q10, q10, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q14, q10, q3 + vadd.i64 q1, q1, q7 + add r2, r3, #144 + vshl.i64 q7, q7, #25 + add r4, r3, #96 + vadd.i64 q15, q1, q11 + add r2, r2, #8 + vsub.i64 q2, q2, q13 + add r4, r4, #8 + vshr.s64 q13, q14, #25 + vsub.i64 q7, q8, q7 + vshr.s64 q8, q15, #26 + vadd.i64 q14, q13, q13 + vadd.i64 q12, q12, q8 + vtrn.32 d12, d14 + vshl.i64 q8, q8, #26 + vtrn.32 d13, d15 + vadd.i64 q3, q12, q3 + vadd.i64 q0, q0, q14 + vst1.8 d12, [r2, : 64]! + vshl.i64 q7, q13, #4 + vst1.8 d13, [r4, : 64]! + vsub.i64 q1, q1, q8 + vshr.s64 q3, q3, #25 + vadd.i64 q0, q0, q7 + vadd.i64 q5, q5, q3 + vshl.i64 q3, q3, #25 + vadd.i64 q6, q5, q11 + vadd.i64 q0, q0, q13 + vshl.i64 q7, q13, #25 + vadd.i64 q8, q0, q11 + vsub.i64 q3, q12, q3 + vshr.s64 q6, q6, #26 + vsub.i64 q7, q10, q7 + vtrn.32 d2, d6 + vshr.s64 q8, q8, #26 + vtrn.32 d3, d7 + vadd.i64 q3, q9, q6 + vst1.8 d2, [r2, : 64] + vshl.i64 q6, q6, #26 + vst1.8 d3, [r4, : 64] + vadd.i64 q1, q4, q8 + vtrn.32 d4, d14 + vshl.i64 q4, q8, #26 + vtrn.32 d5, d15 + vsub.i64 q5, q5, q6 + add r2, r2, #16 + vsub.i64 q0, q0, q4 + vst1.8 d4, [r2, : 64] + add r4, r4, #16 + vst1.8 d5, [r4, : 64] + vtrn.32 d10, d6 + vtrn.32 d11, d7 + sub r2, r2, #8 + sub r4, r4, #8 + vtrn.32 d0, d2 + vtrn.32 d1, d3 + vst1.8 d10, [r2, : 64] + vst1.8 d11, [r4, : 64] + sub r2, r2, #24 + sub r4, r4, #24 + vst1.8 d0, [r2, : 64] + vst1.8 d1, [r4, : 64] + add r2, r3, #288 + add r4, r3, #336 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vsub.i32 q0, q0, q1 + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d4-d5}, [r4, : 128]! + vsub.i32 q1, q1, q2 + add r5, r3, #240 + vld1.8 {d4}, [r2, : 64] + vld1.8 {d6}, [r4, : 64] + vsub.i32 q2, q2, q3 + vst1.8 {d0-d1}, [r5, : 128]! + vst1.8 {d2-d3}, [r5, : 128]! + vst1.8 d4, [r5, : 64] + add r2, r3, #144 + add r4, r3, #96 + add r5, r3, #144 + add r6, r3, #192 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vsub.i32 q2, q0, q1 + vadd.i32 q0, q0, q1 + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d6-d7}, [r4, : 128]! + vsub.i32 q4, q1, q3 + vadd.i32 q1, q1, q3 + vld1.8 {d6}, [r2, : 64] + vld1.8 {d10}, [r4, : 64] + vsub.i32 q6, q3, q5 + vadd.i32 q3, q3, q5 + vst1.8 {d4-d5}, [r5, : 128]! + vst1.8 {d0-d1}, [r6, : 128]! + vst1.8 {d8-d9}, [r5, : 128]! + vst1.8 {d2-d3}, [r6, : 128]! + vst1.8 d12, [r5, : 64] + vst1.8 d6, [r6, : 64] + add r2, r3, #0 + add r4, r3, #240 + vld1.8 {d0-d1}, [r4, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vld1.8 {d4}, [r4, : 64] + add r4, r3, #336 + vld1.8 {d6-d7}, [r4, : 128]! + vtrn.32 q0, q3 + vld1.8 {d8-d9}, [r4, : 128]! + vshl.i32 q5, q0, #4 + vtrn.32 q1, q4 + vshl.i32 q6, q3, #4 + vadd.i32 q5, q5, q0 + vadd.i32 q6, q6, q3 + vshl.i32 q7, q1, #4 + vld1.8 {d5}, [r4, : 64] + vshl.i32 q8, q4, #4 + vtrn.32 d4, d5 + vadd.i32 q7, q7, q1 + vadd.i32 q8, q8, q4 + vld1.8 {d18-d19}, [r2, : 128]! + vshl.i32 q10, q2, #4 + vld1.8 {d22-d23}, [r2, : 128]! + vadd.i32 q10, q10, q2 + vld1.8 {d24}, [r2, : 64] + vadd.i32 q5, q5, q0 + add r2, r3, #288 + vld1.8 {d26-d27}, [r2, : 128]! + vadd.i32 q6, q6, q3 + vld1.8 {d28-d29}, [r2, : 128]! + vadd.i32 q8, q8, q4 + vld1.8 {d25}, [r2, : 64] + vadd.i32 q10, q10, q2 + vtrn.32 q9, q13 + vadd.i32 q7, q7, q1 + vadd.i32 q5, q5, q0 + vtrn.32 q11, q14 + vadd.i32 q6, q6, q3 + add r2, sp, #528 + vadd.i32 q10, q10, q2 + vtrn.32 d24, d25 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q6, q13, #1 + add r2, sp, #544 + vst1.8 {d20-d21}, [r2, : 128] + vshl.i32 q10, q14, #1 + add r2, sp, #560 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q15, q12, #1 + vadd.i32 q8, q8, q4 + vext.32 d10, d31, d30, #0 + vadd.i32 q7, q7, q1 + add r2, sp, #576 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q8, d18, d5 + vmlal.s32 q8, d26, d4 + vmlal.s32 q8, d19, d9 + vmlal.s32 q8, d27, d3 + vmlal.s32 q8, d22, d8 + vmlal.s32 q8, d28, d2 + vmlal.s32 q8, d23, d7 + vmlal.s32 q8, d29, d1 + vmlal.s32 q8, d24, d6 + vmlal.s32 q8, d25, d0 + add r2, sp, #592 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q2, d18, d4 + vmlal.s32 q2, d12, d9 + vmlal.s32 q2, d13, d8 + vmlal.s32 q2, d19, d3 + vmlal.s32 q2, d22, d2 + vmlal.s32 q2, d23, d1 + vmlal.s32 q2, d24, d0 + add r2, sp, #608 + vst1.8 {d20-d21}, [r2, : 128] + vmull.s32 q7, d18, d9 + vmlal.s32 q7, d26, d3 + vmlal.s32 q7, d19, d8 + vmlal.s32 q7, d27, d2 + vmlal.s32 q7, d22, d7 + vmlal.s32 q7, d28, d1 + vmlal.s32 q7, d23, d6 + vmlal.s32 q7, d29, d0 + add r2, sp, #624 + vst1.8 {d10-d11}, [r2, : 128] + vmull.s32 q5, d18, d3 + vmlal.s32 q5, d19, d2 + vmlal.s32 q5, d22, d1 + vmlal.s32 q5, d23, d0 + vmlal.s32 q5, d12, d8 + add r2, sp, #640 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q4, d18, d8 + vmlal.s32 q4, d26, d2 + vmlal.s32 q4, d19, d7 + vmlal.s32 q4, d27, d1 + vmlal.s32 q4, d22, d6 + vmlal.s32 q4, d28, d0 + vmull.s32 q8, d18, d7 + vmlal.s32 q8, d26, d1 + vmlal.s32 q8, d19, d6 + vmlal.s32 q8, d27, d0 + add r2, sp, #544 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q7, d24, d21 + vmlal.s32 q7, d25, d20 + vmlal.s32 q4, d23, d21 + vmlal.s32 q4, d29, d20 + vmlal.s32 q8, d22, d21 + vmlal.s32 q8, d28, d20 + vmlal.s32 q5, d24, d20 + add r2, sp, #544 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q7, d18, d6 + vmlal.s32 q7, d26, d0 + add r2, sp, #624 + vld1.8 {d30-d31}, [r2, : 128] + vmlal.s32 q2, d30, d21 + vmlal.s32 q7, d19, d21 + vmlal.s32 q7, d27, d20 + add r2, sp, #592 + vld1.8 {d26-d27}, [r2, : 128] + vmlal.s32 q4, d25, d27 + vmlal.s32 q8, d29, d27 + vmlal.s32 q8, d25, d26 + vmlal.s32 q7, d28, d27 + vmlal.s32 q7, d29, d26 + add r2, sp, #576 + vld1.8 {d28-d29}, [r2, : 128] + vmlal.s32 q4, d24, d29 + vmlal.s32 q8, d23, d29 + vmlal.s32 q8, d24, d28 + vmlal.s32 q7, d22, d29 + vmlal.s32 q7, d23, d28 + add r2, sp, #576 + vst1.8 {d8-d9}, [r2, : 128] + add r2, sp, #528 + vld1.8 {d8-d9}, [r2, : 128] + vmlal.s32 q7, d24, d9 + vmlal.s32 q7, d25, d31 + vmull.s32 q1, d18, d2 + vmlal.s32 q1, d19, d1 + vmlal.s32 q1, d22, d0 + vmlal.s32 q1, d24, d27 + vmlal.s32 q1, d23, d20 + vmlal.s32 q1, d12, d7 + vmlal.s32 q1, d13, d6 + vmull.s32 q6, d18, d1 + vmlal.s32 q6, d19, d0 + vmlal.s32 q6, d23, d27 + vmlal.s32 q6, d22, d20 + vmlal.s32 q6, d24, d26 + vmull.s32 q0, d18, d0 + vmlal.s32 q0, d22, d27 + vmlal.s32 q0, d23, d26 + vmlal.s32 q0, d24, d31 + vmlal.s32 q0, d19, d20 + add r2, sp, #608 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q2, d18, d7 + vmlal.s32 q2, d19, d6 + vmlal.s32 q5, d18, d6 + vmlal.s32 q5, d19, d21 + vmlal.s32 q1, d18, d21 + vmlal.s32 q1, d19, d29 + vmlal.s32 q0, d18, d28 + vmlal.s32 q0, d19, d9 + vmlal.s32 q6, d18, d29 + vmlal.s32 q6, d19, d28 + add r2, sp, #560 + vld1.8 {d18-d19}, [r2, : 128] + add r2, sp, #480 + vld1.8 {d22-d23}, [r2, : 128] + vmlal.s32 q5, d19, d7 + vmlal.s32 q0, d18, d21 + vmlal.s32 q0, d19, d29 + vmlal.s32 q6, d18, d6 + add r2, sp, #496 + vld1.8 {d6-d7}, [r2, : 128] + vmlal.s32 q6, d19, d21 + add r2, sp, #544 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q0, d30, d8 + add r2, sp, #640 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q5, d30, d29 + add r2, sp, #576 + vld1.8 {d24-d25}, [r2, : 128] + vmlal.s32 q1, d30, d28 + vadd.i64 q13, q0, q11 + vadd.i64 q14, q5, q11 + vmlal.s32 q6, d30, d9 + vshr.s64 q4, q13, #26 + vshr.s64 q13, q14, #26 + vadd.i64 q7, q7, q4 + vshl.i64 q4, q4, #26 + vadd.i64 q14, q7, q3 + vadd.i64 q9, q9, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q15, q9, q3 + vsub.i64 q0, q0, q4 + vshr.s64 q4, q14, #25 + vsub.i64 q5, q5, q13 + vshr.s64 q13, q15, #25 + vadd.i64 q6, q6, q4 + vshl.i64 q4, q4, #25 + vadd.i64 q14, q6, q11 + vadd.i64 q2, q2, q13 + vsub.i64 q4, q7, q4 + vshr.s64 q7, q14, #26 + vshl.i64 q13, q13, #25 + vadd.i64 q14, q2, q11 + vadd.i64 q8, q8, q7 + vshl.i64 q7, q7, #26 + vadd.i64 q15, q8, q3 + vsub.i64 q9, q9, q13 + vshr.s64 q13, q14, #26 + vsub.i64 q6, q6, q7 + vshr.s64 q7, q15, #25 + vadd.i64 q10, q10, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q14, q10, q3 + vadd.i64 q1, q1, q7 + add r2, r3, #288 + vshl.i64 q7, q7, #25 + add r4, r3, #96 + vadd.i64 q15, q1, q11 + add r2, r2, #8 + vsub.i64 q2, q2, q13 + add r4, r4, #8 + vshr.s64 q13, q14, #25 + vsub.i64 q7, q8, q7 + vshr.s64 q8, q15, #26 + vadd.i64 q14, q13, q13 + vadd.i64 q12, q12, q8 + vtrn.32 d12, d14 + vshl.i64 q8, q8, #26 + vtrn.32 d13, d15 + vadd.i64 q3, q12, q3 + vadd.i64 q0, q0, q14 + vst1.8 d12, [r2, : 64]! + vshl.i64 q7, q13, #4 + vst1.8 d13, [r4, : 64]! + vsub.i64 q1, q1, q8 + vshr.s64 q3, q3, #25 + vadd.i64 q0, q0, q7 + vadd.i64 q5, q5, q3 + vshl.i64 q3, q3, #25 + vadd.i64 q6, q5, q11 + vadd.i64 q0, q0, q13 + vshl.i64 q7, q13, #25 + vadd.i64 q8, q0, q11 + vsub.i64 q3, q12, q3 + vshr.s64 q6, q6, #26 + vsub.i64 q7, q10, q7 + vtrn.32 d2, d6 + vshr.s64 q8, q8, #26 + vtrn.32 d3, d7 + vadd.i64 q3, q9, q6 + vst1.8 d2, [r2, : 64] + vshl.i64 q6, q6, #26 + vst1.8 d3, [r4, : 64] + vadd.i64 q1, q4, q8 + vtrn.32 d4, d14 + vshl.i64 q4, q8, #26 + vtrn.32 d5, d15 + vsub.i64 q5, q5, q6 + add r2, r2, #16 + vsub.i64 q0, q0, q4 + vst1.8 d4, [r2, : 64] + add r4, r4, #16 + vst1.8 d5, [r4, : 64] + vtrn.32 d10, d6 + vtrn.32 d11, d7 + sub r2, r2, #8 + sub r4, r4, #8 + vtrn.32 d0, d2 + vtrn.32 d1, d3 + vst1.8 d10, [r2, : 64] + vst1.8 d11, [r4, : 64] + sub r2, r2, #24 + sub r4, r4, #24 + vst1.8 d0, [r2, : 64] + vst1.8 d1, [r4, : 64] + add r2, sp, #512 + add r4, r3, #144 + add r5, r3, #192 + vld1.8 {d0-d1}, [r2, : 128] + vld1.8 {d2-d3}, [r4, : 128]! + vld1.8 {d4-d5}, [r5, : 128]! + vzip.i32 q1, q2 + vld1.8 {d6-d7}, [r4, : 128]! + vld1.8 {d8-d9}, [r5, : 128]! + vshl.i32 q5, q1, #1 + vzip.i32 q3, q4 + vshl.i32 q6, q2, #1 + vld1.8 {d14}, [r4, : 64] + vshl.i32 q8, q3, #1 + vld1.8 {d15}, [r5, : 64] + vshl.i32 q9, q4, #1 + vmul.i32 d21, d7, d1 + vtrn.32 d14, d15 + vmul.i32 q11, q4, q0 + vmul.i32 q0, q7, q0 + vmull.s32 q12, d2, d2 + vmlal.s32 q12, d11, d1 + vmlal.s32 q12, d12, d0 + vmlal.s32 q12, d13, d23 + vmlal.s32 q12, d16, d22 + vmlal.s32 q12, d7, d21 + vmull.s32 q10, d2, d11 + vmlal.s32 q10, d4, d1 + vmlal.s32 q10, d13, d0 + vmlal.s32 q10, d6, d23 + vmlal.s32 q10, d17, d22 + vmull.s32 q13, d10, d4 + vmlal.s32 q13, d11, d3 + vmlal.s32 q13, d13, d1 + vmlal.s32 q13, d16, d0 + vmlal.s32 q13, d17, d23 + vmlal.s32 q13, d8, d22 + vmull.s32 q1, d10, d5 + vmlal.s32 q1, d11, d4 + vmlal.s32 q1, d6, d1 + vmlal.s32 q1, d17, d0 + vmlal.s32 q1, d8, d23 + vmull.s32 q14, d10, d6 + vmlal.s32 q14, d11, d13 + vmlal.s32 q14, d4, d4 + vmlal.s32 q14, d17, d1 + vmlal.s32 q14, d18, d0 + vmlal.s32 q14, d9, d23 + vmull.s32 q11, d10, d7 + vmlal.s32 q11, d11, d6 + vmlal.s32 q11, d12, d5 + vmlal.s32 q11, d8, d1 + vmlal.s32 q11, d19, d0 + vmull.s32 q15, d10, d8 + vmlal.s32 q15, d11, d17 + vmlal.s32 q15, d12, d6 + vmlal.s32 q15, d13, d5 + vmlal.s32 q15, d19, d1 + vmlal.s32 q15, d14, d0 + vmull.s32 q2, d10, d9 + vmlal.s32 q2, d11, d8 + vmlal.s32 q2, d12, d7 + vmlal.s32 q2, d13, d6 + vmlal.s32 q2, d14, d1 + vmull.s32 q0, d15, d1 + vmlal.s32 q0, d10, d14 + vmlal.s32 q0, d11, d19 + vmlal.s32 q0, d12, d8 + vmlal.s32 q0, d13, d17 + vmlal.s32 q0, d6, d6 + add r2, sp, #480 + vld1.8 {d18-d19}, [r2, : 128] + vmull.s32 q3, d16, d7 + vmlal.s32 q3, d10, d15 + vmlal.s32 q3, d11, d14 + vmlal.s32 q3, d12, d9 + vmlal.s32 q3, d13, d8 + add r2, sp, #496 + vld1.8 {d8-d9}, [r2, : 128] + vadd.i64 q5, q12, q9 + vadd.i64 q6, q15, q9 + vshr.s64 q5, q5, #26 + vshr.s64 q6, q6, #26 + vadd.i64 q7, q10, q5 + vshl.i64 q5, q5, #26 + vadd.i64 q8, q7, q4 + vadd.i64 q2, q2, q6 + vshl.i64 q6, q6, #26 + vadd.i64 q10, q2, q4 + vsub.i64 q5, q12, q5 + vshr.s64 q8, q8, #25 + vsub.i64 q6, q15, q6 + vshr.s64 q10, q10, #25 + vadd.i64 q12, q13, q8 + vshl.i64 q8, q8, #25 + vadd.i64 q13, q12, q9 + vadd.i64 q0, q0, q10 + vsub.i64 q7, q7, q8 + vshr.s64 q8, q13, #26 + vshl.i64 q10, q10, #25 + vadd.i64 q13, q0, q9 + vadd.i64 q1, q1, q8 + vshl.i64 q8, q8, #26 + vadd.i64 q15, q1, q4 + vsub.i64 q2, q2, q10 + vshr.s64 q10, q13, #26 + vsub.i64 q8, q12, q8 + vshr.s64 q12, q15, #25 + vadd.i64 q3, q3, q10 + vshl.i64 q10, q10, #26 + vadd.i64 q13, q3, q4 + vadd.i64 q14, q14, q12 + add r2, r3, #144 + vshl.i64 q12, q12, #25 + add r4, r3, #192 + vadd.i64 q15, q14, q9 + add r2, r2, #8 + vsub.i64 q0, q0, q10 + add r4, r4, #8 + vshr.s64 q10, q13, #25 + vsub.i64 q1, q1, q12 + vshr.s64 q12, q15, #26 + vadd.i64 q13, q10, q10 + vadd.i64 q11, q11, q12 + vtrn.32 d16, d2 + vshl.i64 q12, q12, #26 + vtrn.32 d17, d3 + vadd.i64 q1, q11, q4 + vadd.i64 q4, q5, q13 + vst1.8 d16, [r2, : 64]! + vshl.i64 q5, q10, #4 + vst1.8 d17, [r4, : 64]! + vsub.i64 q8, q14, q12 + vshr.s64 q1, q1, #25 + vadd.i64 q4, q4, q5 + vadd.i64 q5, q6, q1 + vshl.i64 q1, q1, #25 + vadd.i64 q6, q5, q9 + vadd.i64 q4, q4, q10 + vshl.i64 q10, q10, #25 + vadd.i64 q9, q4, q9 + vsub.i64 q1, q11, q1 + vshr.s64 q6, q6, #26 + vsub.i64 q3, q3, q10 + vtrn.32 d16, d2 + vshr.s64 q9, q9, #26 + vtrn.32 d17, d3 + vadd.i64 q1, q2, q6 + vst1.8 d16, [r2, : 64] + vshl.i64 q2, q6, #26 + vst1.8 d17, [r4, : 64] + vadd.i64 q6, q7, q9 + vtrn.32 d0, d6 + vshl.i64 q7, q9, #26 + vtrn.32 d1, d7 + vsub.i64 q2, q5, q2 + add r2, r2, #16 + vsub.i64 q3, q4, q7 + vst1.8 d0, [r2, : 64] + add r4, r4, #16 + vst1.8 d1, [r4, : 64] + vtrn.32 d4, d2 + vtrn.32 d5, d3 + sub r2, r2, #8 + sub r4, r4, #8 + vtrn.32 d6, d12 + vtrn.32 d7, d13 + vst1.8 d4, [r2, : 64] + vst1.8 d5, [r4, : 64] + sub r2, r2, #24 + sub r4, r4, #24 + vst1.8 d6, [r2, : 64] + vst1.8 d7, [r4, : 64] + add r2, r3, #336 + add r4, r3, #288 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vadd.i32 q0, q0, q1 + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d4-d5}, [r4, : 128]! + vadd.i32 q1, q1, q2 + add r5, r3, #288 + vld1.8 {d4}, [r2, : 64] + vld1.8 {d6}, [r4, : 64] + vadd.i32 q2, q2, q3 + vst1.8 {d0-d1}, [r5, : 128]! + vst1.8 {d2-d3}, [r5, : 128]! + vst1.8 d4, [r5, : 64] + add r2, r3, #48 + add r4, r3, #144 + vld1.8 {d0-d1}, [r4, : 128]! + vld1.8 {d2-d3}, [r4, : 128]! + vld1.8 {d4}, [r4, : 64] + add r4, r3, #288 + vld1.8 {d6-d7}, [r4, : 128]! + vtrn.32 q0, q3 + vld1.8 {d8-d9}, [r4, : 128]! + vshl.i32 q5, q0, #4 + vtrn.32 q1, q4 + vshl.i32 q6, q3, #4 + vadd.i32 q5, q5, q0 + vadd.i32 q6, q6, q3 + vshl.i32 q7, q1, #4 + vld1.8 {d5}, [r4, : 64] + vshl.i32 q8, q4, #4 + vtrn.32 d4, d5 + vadd.i32 q7, q7, q1 + vadd.i32 q8, q8, q4 + vld1.8 {d18-d19}, [r2, : 128]! + vshl.i32 q10, q2, #4 + vld1.8 {d22-d23}, [r2, : 128]! + vadd.i32 q10, q10, q2 + vld1.8 {d24}, [r2, : 64] + vadd.i32 q5, q5, q0 + add r2, r3, #240 + vld1.8 {d26-d27}, [r2, : 128]! + vadd.i32 q6, q6, q3 + vld1.8 {d28-d29}, [r2, : 128]! + vadd.i32 q8, q8, q4 + vld1.8 {d25}, [r2, : 64] + vadd.i32 q10, q10, q2 + vtrn.32 q9, q13 + vadd.i32 q7, q7, q1 + vadd.i32 q5, q5, q0 + vtrn.32 q11, q14 + vadd.i32 q6, q6, q3 + add r2, sp, #528 + vadd.i32 q10, q10, q2 + vtrn.32 d24, d25 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q6, q13, #1 + add r2, sp, #544 + vst1.8 {d20-d21}, [r2, : 128] + vshl.i32 q10, q14, #1 + add r2, sp, #560 + vst1.8 {d12-d13}, [r2, : 128] + vshl.i32 q15, q12, #1 + vadd.i32 q8, q8, q4 + vext.32 d10, d31, d30, #0 + vadd.i32 q7, q7, q1 + add r2, sp, #576 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q8, d18, d5 + vmlal.s32 q8, d26, d4 + vmlal.s32 q8, d19, d9 + vmlal.s32 q8, d27, d3 + vmlal.s32 q8, d22, d8 + vmlal.s32 q8, d28, d2 + vmlal.s32 q8, d23, d7 + vmlal.s32 q8, d29, d1 + vmlal.s32 q8, d24, d6 + vmlal.s32 q8, d25, d0 + add r2, sp, #592 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q2, d18, d4 + vmlal.s32 q2, d12, d9 + vmlal.s32 q2, d13, d8 + vmlal.s32 q2, d19, d3 + vmlal.s32 q2, d22, d2 + vmlal.s32 q2, d23, d1 + vmlal.s32 q2, d24, d0 + add r2, sp, #608 + vst1.8 {d20-d21}, [r2, : 128] + vmull.s32 q7, d18, d9 + vmlal.s32 q7, d26, d3 + vmlal.s32 q7, d19, d8 + vmlal.s32 q7, d27, d2 + vmlal.s32 q7, d22, d7 + vmlal.s32 q7, d28, d1 + vmlal.s32 q7, d23, d6 + vmlal.s32 q7, d29, d0 + add r2, sp, #624 + vst1.8 {d10-d11}, [r2, : 128] + vmull.s32 q5, d18, d3 + vmlal.s32 q5, d19, d2 + vmlal.s32 q5, d22, d1 + vmlal.s32 q5, d23, d0 + vmlal.s32 q5, d12, d8 + add r2, sp, #640 + vst1.8 {d16-d17}, [r2, : 128] + vmull.s32 q4, d18, d8 + vmlal.s32 q4, d26, d2 + vmlal.s32 q4, d19, d7 + vmlal.s32 q4, d27, d1 + vmlal.s32 q4, d22, d6 + vmlal.s32 q4, d28, d0 + vmull.s32 q8, d18, d7 + vmlal.s32 q8, d26, d1 + vmlal.s32 q8, d19, d6 + vmlal.s32 q8, d27, d0 + add r2, sp, #544 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q7, d24, d21 + vmlal.s32 q7, d25, d20 + vmlal.s32 q4, d23, d21 + vmlal.s32 q4, d29, d20 + vmlal.s32 q8, d22, d21 + vmlal.s32 q8, d28, d20 + vmlal.s32 q5, d24, d20 + add r2, sp, #544 + vst1.8 {d14-d15}, [r2, : 128] + vmull.s32 q7, d18, d6 + vmlal.s32 q7, d26, d0 + add r2, sp, #624 + vld1.8 {d30-d31}, [r2, : 128] + vmlal.s32 q2, d30, d21 + vmlal.s32 q7, d19, d21 + vmlal.s32 q7, d27, d20 + add r2, sp, #592 + vld1.8 {d26-d27}, [r2, : 128] + vmlal.s32 q4, d25, d27 + vmlal.s32 q8, d29, d27 + vmlal.s32 q8, d25, d26 + vmlal.s32 q7, d28, d27 + vmlal.s32 q7, d29, d26 + add r2, sp, #576 + vld1.8 {d28-d29}, [r2, : 128] + vmlal.s32 q4, d24, d29 + vmlal.s32 q8, d23, d29 + vmlal.s32 q8, d24, d28 + vmlal.s32 q7, d22, d29 + vmlal.s32 q7, d23, d28 + add r2, sp, #576 + vst1.8 {d8-d9}, [r2, : 128] + add r2, sp, #528 + vld1.8 {d8-d9}, [r2, : 128] + vmlal.s32 q7, d24, d9 + vmlal.s32 q7, d25, d31 + vmull.s32 q1, d18, d2 + vmlal.s32 q1, d19, d1 + vmlal.s32 q1, d22, d0 + vmlal.s32 q1, d24, d27 + vmlal.s32 q1, d23, d20 + vmlal.s32 q1, d12, d7 + vmlal.s32 q1, d13, d6 + vmull.s32 q6, d18, d1 + vmlal.s32 q6, d19, d0 + vmlal.s32 q6, d23, d27 + vmlal.s32 q6, d22, d20 + vmlal.s32 q6, d24, d26 + vmull.s32 q0, d18, d0 + vmlal.s32 q0, d22, d27 + vmlal.s32 q0, d23, d26 + vmlal.s32 q0, d24, d31 + vmlal.s32 q0, d19, d20 + add r2, sp, #608 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q2, d18, d7 + vmlal.s32 q2, d19, d6 + vmlal.s32 q5, d18, d6 + vmlal.s32 q5, d19, d21 + vmlal.s32 q1, d18, d21 + vmlal.s32 q1, d19, d29 + vmlal.s32 q0, d18, d28 + vmlal.s32 q0, d19, d9 + vmlal.s32 q6, d18, d29 + vmlal.s32 q6, d19, d28 + add r2, sp, #560 + vld1.8 {d18-d19}, [r2, : 128] + add r2, sp, #480 + vld1.8 {d22-d23}, [r2, : 128] + vmlal.s32 q5, d19, d7 + vmlal.s32 q0, d18, d21 + vmlal.s32 q0, d19, d29 + vmlal.s32 q6, d18, d6 + add r2, sp, #496 + vld1.8 {d6-d7}, [r2, : 128] + vmlal.s32 q6, d19, d21 + add r2, sp, #544 + vld1.8 {d18-d19}, [r2, : 128] + vmlal.s32 q0, d30, d8 + add r2, sp, #640 + vld1.8 {d20-d21}, [r2, : 128] + vmlal.s32 q5, d30, d29 + add r2, sp, #576 + vld1.8 {d24-d25}, [r2, : 128] + vmlal.s32 q1, d30, d28 + vadd.i64 q13, q0, q11 + vadd.i64 q14, q5, q11 + vmlal.s32 q6, d30, d9 + vshr.s64 q4, q13, #26 + vshr.s64 q13, q14, #26 + vadd.i64 q7, q7, q4 + vshl.i64 q4, q4, #26 + vadd.i64 q14, q7, q3 + vadd.i64 q9, q9, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q15, q9, q3 + vsub.i64 q0, q0, q4 + vshr.s64 q4, q14, #25 + vsub.i64 q5, q5, q13 + vshr.s64 q13, q15, #25 + vadd.i64 q6, q6, q4 + vshl.i64 q4, q4, #25 + vadd.i64 q14, q6, q11 + vadd.i64 q2, q2, q13 + vsub.i64 q4, q7, q4 + vshr.s64 q7, q14, #26 + vshl.i64 q13, q13, #25 + vadd.i64 q14, q2, q11 + vadd.i64 q8, q8, q7 + vshl.i64 q7, q7, #26 + vadd.i64 q15, q8, q3 + vsub.i64 q9, q9, q13 + vshr.s64 q13, q14, #26 + vsub.i64 q6, q6, q7 + vshr.s64 q7, q15, #25 + vadd.i64 q10, q10, q13 + vshl.i64 q13, q13, #26 + vadd.i64 q14, q10, q3 + vadd.i64 q1, q1, q7 + add r2, r3, #240 + vshl.i64 q7, q7, #25 + add r4, r3, #144 + vadd.i64 q15, q1, q11 + add r2, r2, #8 + vsub.i64 q2, q2, q13 + add r4, r4, #8 + vshr.s64 q13, q14, #25 + vsub.i64 q7, q8, q7 + vshr.s64 q8, q15, #26 + vadd.i64 q14, q13, q13 + vadd.i64 q12, q12, q8 + vtrn.32 d12, d14 + vshl.i64 q8, q8, #26 + vtrn.32 d13, d15 + vadd.i64 q3, q12, q3 + vadd.i64 q0, q0, q14 + vst1.8 d12, [r2, : 64]! + vshl.i64 q7, q13, #4 + vst1.8 d13, [r4, : 64]! + vsub.i64 q1, q1, q8 + vshr.s64 q3, q3, #25 + vadd.i64 q0, q0, q7 + vadd.i64 q5, q5, q3 + vshl.i64 q3, q3, #25 + vadd.i64 q6, q5, q11 + vadd.i64 q0, q0, q13 + vshl.i64 q7, q13, #25 + vadd.i64 q8, q0, q11 + vsub.i64 q3, q12, q3 + vshr.s64 q6, q6, #26 + vsub.i64 q7, q10, q7 + vtrn.32 d2, d6 + vshr.s64 q8, q8, #26 + vtrn.32 d3, d7 + vadd.i64 q3, q9, q6 + vst1.8 d2, [r2, : 64] + vshl.i64 q6, q6, #26 + vst1.8 d3, [r4, : 64] + vadd.i64 q1, q4, q8 + vtrn.32 d4, d14 + vshl.i64 q4, q8, #26 + vtrn.32 d5, d15 + vsub.i64 q5, q5, q6 + add r2, r2, #16 + vsub.i64 q0, q0, q4 + vst1.8 d4, [r2, : 64] + add r4, r4, #16 + vst1.8 d5, [r4, : 64] + vtrn.32 d10, d6 + vtrn.32 d11, d7 + sub r2, r2, #8 + sub r4, r4, #8 + vtrn.32 d0, d2 + vtrn.32 d1, d3 + vst1.8 d10, [r2, : 64] + vst1.8 d11, [r4, : 64] + sub r2, r2, #24 + sub r4, r4, #24 + vst1.8 d0, [r2, : 64] + vst1.8 d1, [r4, : 64] + ldr r2, [sp, #456] + ldr r4, [sp, #460] + subs r5, r2, #1 + bge .Lmainloop + add r1, r3, #144 + add r2, r3, #336 + vld1.8 {d0-d1}, [r1, : 128]! + vld1.8 {d2-d3}, [r1, : 128]! + vld1.8 {d4}, [r1, : 64] + vst1.8 {d0-d1}, [r2, : 128]! + vst1.8 {d2-d3}, [r2, : 128]! + vst1.8 d4, [r2, : 64] + movw r1, #0 +.Linvertloop: + add r2, r3, #144 + movw r4, #0 + movw r5, #2 + cmp r1, #1 + moveq r5, #1 + addeq r2, r3, #336 + addeq r4, r3, #48 + cmp r1, #2 + moveq r5, #1 + addeq r2, r3, #48 + cmp r1, #3 + moveq r5, #5 + addeq r4, r3, #336 + cmp r1, #4 + moveq r5, #10 + cmp r1, #5 + moveq r5, #20 + cmp r1, #6 + moveq r5, #10 + addeq r2, r3, #336 + addeq r4, r3, #336 + cmp r1, #7 + moveq r5, #50 + cmp r1, #8 + moveq r5, #100 + cmp r1, #9 + moveq r5, #50 + addeq r2, r3, #336 + cmp r1, #10 + moveq r5, #5 + addeq r2, r3, #48 + cmp r1, #11 + moveq r5, #0 + addeq r2, r3, #96 + add r6, r3, #144 + add r7, r3, #288 + vld1.8 {d0-d1}, [r6, : 128]! + vld1.8 {d2-d3}, [r6, : 128]! + vld1.8 {d4}, [r6, : 64] + vst1.8 {d0-d1}, [r7, : 128]! + vst1.8 {d2-d3}, [r7, : 128]! + vst1.8 d4, [r7, : 64] + cmp r5, #0 + beq .Lskipsquaringloop +.Lsquaringloop: + add r6, r3, #288 + add r7, r3, #288 + add r8, r3, #288 + vmov.i32 q0, #19 + vmov.i32 q1, #0 + vmov.i32 q2, #1 + vzip.i32 q1, q2 + vld1.8 {d4-d5}, [r7, : 128]! + vld1.8 {d6-d7}, [r7, : 128]! + vld1.8 {d9}, [r7, : 64] + vld1.8 {d10-d11}, [r6, : 128]! + add r7, sp, #384 + vld1.8 {d12-d13}, [r6, : 128]! + vmul.i32 q7, q2, q0 + vld1.8 {d8}, [r6, : 64] + vext.32 d17, d11, d10, #1 + vmul.i32 q9, q3, q0 + vext.32 d16, d10, d8, #1 + vshl.u32 q10, q5, q1 + vext.32 d22, d14, d4, #1 + vext.32 d24, d18, d6, #1 + vshl.u32 q13, q6, q1 + vshl.u32 d28, d8, d2 + vrev64.i32 d22, d22 + vmul.i32 d1, d9, d1 + vrev64.i32 d24, d24 + vext.32 d29, d8, d13, #1 + vext.32 d0, d1, d9, #1 + vrev64.i32 d0, d0 + vext.32 d2, d9, d1, #1 + vext.32 d23, d15, d5, #1 + vmull.s32 q4, d20, d4 + vrev64.i32 d23, d23 + vmlal.s32 q4, d21, d1 + vrev64.i32 d2, d2 + vmlal.s32 q4, d26, d19 + vext.32 d3, d5, d15, #1 + vmlal.s32 q4, d27, d18 + vrev64.i32 d3, d3 + vmlal.s32 q4, d28, d15 + vext.32 d14, d12, d11, #1 + vmull.s32 q5, d16, d23 + vext.32 d15, d13, d12, #1 + vmlal.s32 q5, d17, d4 + vst1.8 d8, [r7, : 64]! + vmlal.s32 q5, d14, d1 + vext.32 d12, d9, d8, #0 + vmlal.s32 q5, d15, d19 + vmov.i64 d13, #0 + vmlal.s32 q5, d29, d18 + vext.32 d25, d19, d7, #1 + vmlal.s32 q6, d20, d5 + vrev64.i32 d25, d25 + vmlal.s32 q6, d21, d4 + vst1.8 d11, [r7, : 64]! + vmlal.s32 q6, d26, d1 + vext.32 d9, d10, d10, #0 + vmlal.s32 q6, d27, d19 + vmov.i64 d8, #0 + vmlal.s32 q6, d28, d18 + vmlal.s32 q4, d16, d24 + vmlal.s32 q4, d17, d5 + vmlal.s32 q4, d14, d4 + vst1.8 d12, [r7, : 64]! + vmlal.s32 q4, d15, d1 + vext.32 d10, d13, d12, #0 + vmlal.s32 q4, d29, d19 + vmov.i64 d11, #0 + vmlal.s32 q5, d20, d6 + vmlal.s32 q5, d21, d5 + vmlal.s32 q5, d26, d4 + vext.32 d13, d8, d8, #0 + vmlal.s32 q5, d27, d1 + vmov.i64 d12, #0 + vmlal.s32 q5, d28, d19 + vst1.8 d9, [r7, : 64]! + vmlal.s32 q6, d16, d25 + vmlal.s32 q6, d17, d6 + vst1.8 d10, [r7, : 64] + vmlal.s32 q6, d14, d5 + vext.32 d8, d11, d10, #0 + vmlal.s32 q6, d15, d4 + vmov.i64 d9, #0 + vmlal.s32 q6, d29, d1 + vmlal.s32 q4, d20, d7 + vmlal.s32 q4, d21, d6 + vmlal.s32 q4, d26, d5 + vext.32 d11, d12, d12, #0 + vmlal.s32 q4, d27, d4 + vmov.i64 d10, #0 + vmlal.s32 q4, d28, d1 + vmlal.s32 q5, d16, d0 + sub r6, r7, #32 + vmlal.s32 q5, d17, d7 + vmlal.s32 q5, d14, d6 + vext.32 d30, d9, d8, #0 + vmlal.s32 q5, d15, d5 + vld1.8 {d31}, [r6, : 64]! + vmlal.s32 q5, d29, d4 + vmlal.s32 q15, d20, d0 + vext.32 d0, d6, d18, #1 + vmlal.s32 q15, d21, d25 + vrev64.i32 d0, d0 + vmlal.s32 q15, d26, d24 + vext.32 d1, d7, d19, #1 + vext.32 d7, d10, d10, #0 + vmlal.s32 q15, d27, d23 + vrev64.i32 d1, d1 + vld1.8 {d6}, [r6, : 64] + vmlal.s32 q15, d28, d22 + vmlal.s32 q3, d16, d4 + add r6, r6, #24 + vmlal.s32 q3, d17, d2 + vext.32 d4, d31, d30, #0 + vmov d17, d11 + vmlal.s32 q3, d14, d1 + vext.32 d11, d13, d13, #0 + vext.32 d13, d30, d30, #0 + vmlal.s32 q3, d15, d0 + vext.32 d1, d8, d8, #0 + vmlal.s32 q3, d29, d3 + vld1.8 {d5}, [r6, : 64] + sub r6, r6, #16 + vext.32 d10, d6, d6, #0 + vmov.i32 q1, #0xffffffff + vshl.i64 q4, q1, #25 + add r7, sp, #480 + vld1.8 {d14-d15}, [r7, : 128] + vadd.i64 q9, q2, q7 + vshl.i64 q1, q1, #26 + vshr.s64 q10, q9, #26 + vld1.8 {d0}, [r6, : 64]! + vadd.i64 q5, q5, q10 + vand q9, q9, q1 + vld1.8 {d16}, [r6, : 64]! + add r6, sp, #496 + vld1.8 {d20-d21}, [r6, : 128] + vadd.i64 q11, q5, q10 + vsub.i64 q2, q2, q9 + vshr.s64 q9, q11, #25 + vext.32 d12, d5, d4, #0 + vand q11, q11, q4 + vadd.i64 q0, q0, q9 + vmov d19, d7 + vadd.i64 q3, q0, q7 + vsub.i64 q5, q5, q11 + vshr.s64 q11, q3, #26 + vext.32 d18, d11, d10, #0 + vand q3, q3, q1 + vadd.i64 q8, q8, q11 + vadd.i64 q11, q8, q10 + vsub.i64 q0, q0, q3 + vshr.s64 q3, q11, #25 + vand q11, q11, q4 + vadd.i64 q3, q6, q3 + vadd.i64 q6, q3, q7 + vsub.i64 q8, q8, q11 + vshr.s64 q11, q6, #26 + vand q6, q6, q1 + vadd.i64 q9, q9, q11 + vadd.i64 d25, d19, d21 + vsub.i64 q3, q3, q6 + vshr.s64 d23, d25, #25 + vand q4, q12, q4 + vadd.i64 d21, d23, d23 + vshl.i64 d25, d23, #4 + vadd.i64 d21, d21, d23 + vadd.i64 d25, d25, d21 + vadd.i64 d4, d4, d25 + vzip.i32 q0, q8 + vadd.i64 d12, d4, d14 + add r6, r8, #8 + vst1.8 d0, [r6, : 64] + vsub.i64 d19, d19, d9 + add r6, r6, #16 + vst1.8 d16, [r6, : 64] + vshr.s64 d22, d12, #26 + vand q0, q6, q1 + vadd.i64 d10, d10, d22 + vzip.i32 q3, q9 + vsub.i64 d4, d4, d0 + sub r6, r6, #8 + vst1.8 d6, [r6, : 64] + add r6, r6, #16 + vst1.8 d18, [r6, : 64] + vzip.i32 q2, q5 + sub r6, r6, #32 + vst1.8 d4, [r6, : 64] + subs r5, r5, #1 + bhi .Lsquaringloop +.Lskipsquaringloop: + mov r2, r2 + add r5, r3, #288 + add r6, r3, #144 + vmov.i32 q0, #19 + vmov.i32 q1, #0 + vmov.i32 q2, #1 + vzip.i32 q1, q2 + vld1.8 {d4-d5}, [r5, : 128]! + vld1.8 {d6-d7}, [r5, : 128]! + vld1.8 {d9}, [r5, : 64] + vld1.8 {d10-d11}, [r2, : 128]! + add r5, sp, #384 + vld1.8 {d12-d13}, [r2, : 128]! + vmul.i32 q7, q2, q0 + vld1.8 {d8}, [r2, : 64] + vext.32 d17, d11, d10, #1 + vmul.i32 q9, q3, q0 + vext.32 d16, d10, d8, #1 + vshl.u32 q10, q5, q1 + vext.32 d22, d14, d4, #1 + vext.32 d24, d18, d6, #1 + vshl.u32 q13, q6, q1 + vshl.u32 d28, d8, d2 + vrev64.i32 d22, d22 + vmul.i32 d1, d9, d1 + vrev64.i32 d24, d24 + vext.32 d29, d8, d13, #1 + vext.32 d0, d1, d9, #1 + vrev64.i32 d0, d0 + vext.32 d2, d9, d1, #1 + vext.32 d23, d15, d5, #1 + vmull.s32 q4, d20, d4 + vrev64.i32 d23, d23 + vmlal.s32 q4, d21, d1 + vrev64.i32 d2, d2 + vmlal.s32 q4, d26, d19 + vext.32 d3, d5, d15, #1 + vmlal.s32 q4, d27, d18 + vrev64.i32 d3, d3 + vmlal.s32 q4, d28, d15 + vext.32 d14, d12, d11, #1 + vmull.s32 q5, d16, d23 + vext.32 d15, d13, d12, #1 + vmlal.s32 q5, d17, d4 + vst1.8 d8, [r5, : 64]! + vmlal.s32 q5, d14, d1 + vext.32 d12, d9, d8, #0 + vmlal.s32 q5, d15, d19 + vmov.i64 d13, #0 + vmlal.s32 q5, d29, d18 + vext.32 d25, d19, d7, #1 + vmlal.s32 q6, d20, d5 + vrev64.i32 d25, d25 + vmlal.s32 q6, d21, d4 + vst1.8 d11, [r5, : 64]! + vmlal.s32 q6, d26, d1 + vext.32 d9, d10, d10, #0 + vmlal.s32 q6, d27, d19 + vmov.i64 d8, #0 + vmlal.s32 q6, d28, d18 + vmlal.s32 q4, d16, d24 + vmlal.s32 q4, d17, d5 + vmlal.s32 q4, d14, d4 + vst1.8 d12, [r5, : 64]! + vmlal.s32 q4, d15, d1 + vext.32 d10, d13, d12, #0 + vmlal.s32 q4, d29, d19 + vmov.i64 d11, #0 + vmlal.s32 q5, d20, d6 + vmlal.s32 q5, d21, d5 + vmlal.s32 q5, d26, d4 + vext.32 d13, d8, d8, #0 + vmlal.s32 q5, d27, d1 + vmov.i64 d12, #0 + vmlal.s32 q5, d28, d19 + vst1.8 d9, [r5, : 64]! + vmlal.s32 q6, d16, d25 + vmlal.s32 q6, d17, d6 + vst1.8 d10, [r5, : 64] + vmlal.s32 q6, d14, d5 + vext.32 d8, d11, d10, #0 + vmlal.s32 q6, d15, d4 + vmov.i64 d9, #0 + vmlal.s32 q6, d29, d1 + vmlal.s32 q4, d20, d7 + vmlal.s32 q4, d21, d6 + vmlal.s32 q4, d26, d5 + vext.32 d11, d12, d12, #0 + vmlal.s32 q4, d27, d4 + vmov.i64 d10, #0 + vmlal.s32 q4, d28, d1 + vmlal.s32 q5, d16, d0 + sub r2, r5, #32 + vmlal.s32 q5, d17, d7 + vmlal.s32 q5, d14, d6 + vext.32 d30, d9, d8, #0 + vmlal.s32 q5, d15, d5 + vld1.8 {d31}, [r2, : 64]! + vmlal.s32 q5, d29, d4 + vmlal.s32 q15, d20, d0 + vext.32 d0, d6, d18, #1 + vmlal.s32 q15, d21, d25 + vrev64.i32 d0, d0 + vmlal.s32 q15, d26, d24 + vext.32 d1, d7, d19, #1 + vext.32 d7, d10, d10, #0 + vmlal.s32 q15, d27, d23 + vrev64.i32 d1, d1 + vld1.8 {d6}, [r2, : 64] + vmlal.s32 q15, d28, d22 + vmlal.s32 q3, d16, d4 + add r2, r2, #24 + vmlal.s32 q3, d17, d2 + vext.32 d4, d31, d30, #0 + vmov d17, d11 + vmlal.s32 q3, d14, d1 + vext.32 d11, d13, d13, #0 + vext.32 d13, d30, d30, #0 + vmlal.s32 q3, d15, d0 + vext.32 d1, d8, d8, #0 + vmlal.s32 q3, d29, d3 + vld1.8 {d5}, [r2, : 64] + sub r2, r2, #16 + vext.32 d10, d6, d6, #0 + vmov.i32 q1, #0xffffffff + vshl.i64 q4, q1, #25 + add r5, sp, #480 + vld1.8 {d14-d15}, [r5, : 128] + vadd.i64 q9, q2, q7 + vshl.i64 q1, q1, #26 + vshr.s64 q10, q9, #26 + vld1.8 {d0}, [r2, : 64]! + vadd.i64 q5, q5, q10 + vand q9, q9, q1 + vld1.8 {d16}, [r2, : 64]! + add r2, sp, #496 + vld1.8 {d20-d21}, [r2, : 128] + vadd.i64 q11, q5, q10 + vsub.i64 q2, q2, q9 + vshr.s64 q9, q11, #25 + vext.32 d12, d5, d4, #0 + vand q11, q11, q4 + vadd.i64 q0, q0, q9 + vmov d19, d7 + vadd.i64 q3, q0, q7 + vsub.i64 q5, q5, q11 + vshr.s64 q11, q3, #26 + vext.32 d18, d11, d10, #0 + vand q3, q3, q1 + vadd.i64 q8, q8, q11 + vadd.i64 q11, q8, q10 + vsub.i64 q0, q0, q3 + vshr.s64 q3, q11, #25 + vand q11, q11, q4 + vadd.i64 q3, q6, q3 + vadd.i64 q6, q3, q7 + vsub.i64 q8, q8, q11 + vshr.s64 q11, q6, #26 + vand q6, q6, q1 + vadd.i64 q9, q9, q11 + vadd.i64 d25, d19, d21 + vsub.i64 q3, q3, q6 + vshr.s64 d23, d25, #25 + vand q4, q12, q4 + vadd.i64 d21, d23, d23 + vshl.i64 d25, d23, #4 + vadd.i64 d21, d21, d23 + vadd.i64 d25, d25, d21 + vadd.i64 d4, d4, d25 + vzip.i32 q0, q8 + vadd.i64 d12, d4, d14 + add r2, r6, #8 + vst1.8 d0, [r2, : 64] + vsub.i64 d19, d19, d9 + add r2, r2, #16 + vst1.8 d16, [r2, : 64] + vshr.s64 d22, d12, #26 + vand q0, q6, q1 + vadd.i64 d10, d10, d22 + vzip.i32 q3, q9 + vsub.i64 d4, d4, d0 + sub r2, r2, #8 + vst1.8 d6, [r2, : 64] + add r2, r2, #16 + vst1.8 d18, [r2, : 64] + vzip.i32 q2, q5 + sub r2, r2, #32 + vst1.8 d4, [r2, : 64] + cmp r4, #0 + beq .Lskippostcopy + add r2, r3, #144 + mov r4, r4 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d4}, [r2, : 64] + vst1.8 {d0-d1}, [r4, : 128]! + vst1.8 {d2-d3}, [r4, : 128]! + vst1.8 d4, [r4, : 64] +.Lskippostcopy: + cmp r1, #1 + bne .Lskipfinalcopy + add r2, r3, #288 + add r4, r3, #144 + vld1.8 {d0-d1}, [r2, : 128]! + vld1.8 {d2-d3}, [r2, : 128]! + vld1.8 {d4}, [r2, : 64] + vst1.8 {d0-d1}, [r4, : 128]! + vst1.8 {d2-d3}, [r4, : 128]! + vst1.8 d4, [r4, : 64] +.Lskipfinalcopy: + add r1, r1, #1 + cmp r1, #12 + blo .Linvertloop + add r1, r3, #144 + ldr r2, [r1], #4 + ldr r3, [r1], #4 + ldr r4, [r1], #4 + ldr r5, [r1], #4 + ldr r6, [r1], #4 + ldr r7, [r1], #4 + ldr r8, [r1], #4 + ldr r9, [r1], #4 + ldr r10, [r1], #4 + ldr r1, [r1] + add r11, r1, r1, LSL #4 + add r11, r11, r1, LSL #1 + add r11, r11, #16777216 + mov r11, r11, ASR #25 + add r11, r11, r2 + mov r11, r11, ASR #26 + add r11, r11, r3 + mov r11, r11, ASR #25 + add r11, r11, r4 + mov r11, r11, ASR #26 + add r11, r11, r5 + mov r11, r11, ASR #25 + add r11, r11, r6 + mov r11, r11, ASR #26 + add r11, r11, r7 + mov r11, r11, ASR #25 + add r11, r11, r8 + mov r11, r11, ASR #26 + add r11, r11, r9 + mov r11, r11, ASR #25 + add r11, r11, r10 + mov r11, r11, ASR #26 + add r11, r11, r1 + mov r11, r11, ASR #25 + add r2, r2, r11 + add r2, r2, r11, LSL #1 + add r2, r2, r11, LSL #4 + mov r11, r2, ASR #26 + add r3, r3, r11 + sub r2, r2, r11, LSL #26 + mov r11, r3, ASR #25 + add r4, r4, r11 + sub r3, r3, r11, LSL #25 + mov r11, r4, ASR #26 + add r5, r5, r11 + sub r4, r4, r11, LSL #26 + mov r11, r5, ASR #25 + add r6, r6, r11 + sub r5, r5, r11, LSL #25 + mov r11, r6, ASR #26 + add r7, r7, r11 + sub r6, r6, r11, LSL #26 + mov r11, r7, ASR #25 + add r8, r8, r11 + sub r7, r7, r11, LSL #25 + mov r11, r8, ASR #26 + add r9, r9, r11 + sub r8, r8, r11, LSL #26 + mov r11, r9, ASR #25 + add r10, r10, r11 + sub r9, r9, r11, LSL #25 + mov r11, r10, ASR #26 + add r1, r1, r11 + sub r10, r10, r11, LSL #26 + mov r11, r1, ASR #25 + sub r1, r1, r11, LSL #25 + add r2, r2, r3, LSL #26 + mov r3, r3, LSR #6 + add r3, r3, r4, LSL #19 + mov r4, r4, LSR #13 + add r4, r4, r5, LSL #13 + mov r5, r5, LSR #19 + add r5, r5, r6, LSL #6 + add r6, r7, r8, LSL #25 + mov r7, r8, LSR #7 + add r7, r7, r9, LSL #19 + mov r8, r9, LSR #13 + add r8, r8, r10, LSL #12 + mov r9, r10, LSR #20 + add r1, r9, r1, LSL #6 + str r2, [r0] + str r3, [r0, #4] + str r4, [r0, #8] + str r5, [r0, #12] + str r6, [r0, #16] + str r7, [r0, #20] + str r8, [r0, #24] + str r1, [r0, #28] + movw r0, #0 + mov sp, ip + pop {r4-r11, pc} +ENDPROC(curve25519_neon) +#endif diff --git a/lib/zinc/curve25519/curve25519.c b/lib/zinc/curve25519/curve25519.c index 32536340d39d..0d5ea97762d4 100644 --- a/lib/zinc/curve25519/curve25519.c +++ b/lib/zinc/curve25519/curve25519.c @@ -21,6 +21,8 @@ #if defined(CONFIG_ZINC_ARCH_X86_64) #include "curve25519-x86_64-glue.h" +#elif defined(CONFIG_ZINC_ARCH_ARM) +#include "curve25519-arm-glue.h" #else void __init curve25519_fpu_init(void) { From patchwork Tue Sep 25 14:56:21 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147504 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp831164lji; Tue, 25 Sep 2018 07:57:53 -0700 (PDT) X-Google-Smtp-Source: ACcGV605EjPXY2c05FfjUNuZWFE5Ie6VZoino8ANtkd0GciAWHS1k3g7rT+dI71usJIhMN2Trz1t X-Received: by 2002:a17:902:981:: with SMTP id 1-v6mr1663931pln.60.1537887473313; Tue, 25 Sep 2018 07:57:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887473; cv=none; d=google.com; s=arc-20160816; b=KC1NcqCw2SJzYV6wNIamie7XmkMc1Uzmlo+n46yjGYCCL6zWgb4sIS/ilsuDx3nP4g 35Iiy8Z1Nyxwxg3S/oAAs3QAejKI3NU/wHDKOAoaX0VQnRZHw4RD4+S5Xu+70Xt4s0Vx oGRFe6KYC/uU9IuckU+hd5nB/DtkR1MV7haJ8TglsJFfZY6jg41Q9ezf3mGyi3FzY3hL uKPoxndSeFauNmAuPG9fLcd0sW+G7c1RDZ4vtwiXiRvpZkb7nP4cOd8ivHxDLUDUJJVr xPWOfodxfBK8ioJzQhb4gKQpolfAfKMbkm1hJAiAshQ+DstrSkUnKS4DeAA7Wt3a+SFb p4WQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=HMfekfbg16Js0MY621VMzQ37aC3ncj8GGByb9X1A+rM=; b=PElE/IeK0YZ/GDKWtMnKb5hAoOTqVTpyvxXgz4+w9zcmCpLzQhIL1MCMPO5q6PkTtz /lZxi2Nyl4D93WxFZz3r5zaSDHCRqdr51aBt0R2K7Tax3hXTlZ0VeTQnfGp+Ln3FPA2p gaU7f+npJ3JT9Z2rXuwqoN9HW7fHOdzNQg6FfjGuVP+/FwIliK/NsHKwKm1gQS0t20Vj 1dIG+tt1Qqg0fb+bZeU1W8zPVbjNPmsS3NUl5pQeRTo7unwBKHlpLfHbu+M+guTUolwb xOvIOy1SnFnvD7+/M5ffMFaF3QtuLqHUO0u4UN/aSs7IJa8QA9zGg819+dGnsKL2z4pN 3QDA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=pJnzVFWn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id a32-v6si2377520pgm.24.2018.09.25.07.57.52; Tue, 25 Sep 2018 07:57:53 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=pJnzVFWn; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730033AbeIYVFn (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:43 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:36361 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729877AbeIYVFk (ORCPT ); Tue, 25 Sep 2018 17:05:40 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id b5db582c; Tue, 25 Sep 2018 14:39:05 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-transfer-encoding; s=mail; bh=CgwqL7BLhRalvfTRbOjy1L1fS Js=; b=pJnzVFWnBi3Mssqd9KGVkychRHFClS9godTFuPAIv/OMMLyqTr8SDnwjj vvorC6lXLD6ZRF2lhVccCa4kBxaNJpQKRQTPquZteMZzWzBBsQ4sTGEaUgGmgH5D iLne01AjP0HQO/iQXAeoHil5Tk9iyki96glOKohJ6NcJgMT6TnFXw0QgsR6OgM5B yn+26ytPwFARKtcExfaNKpUe7dliwkbs/rKH5Vm8oy6iONm0hoaXYF1xFgeegUq+ P0+hQvs0wtcEILP9YW61T76Q2b6dnNUGYpTA4RdsulDe59SiVEaUVDp8Wg4U7lfo EP65G2AFV97BgiWyijxX4z2+QmFuQ== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 8b04d298 (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:39:05 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" , Samuel Neves , Andy Lutomirski , Jean-Philippe Aumasson , Eric Biggers , David Howells Subject: [PATCH net-next v6 22/23] security/keys: rewrite big_key crypto to use Zinc Date: Tue, 25 Sep 2018 16:56:21 +0200 Message-Id: <20180925145622.29959-23-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org A while back, I noticed that the crypto and crypto API usage in big_keys were entirely broken in multiple ways, so I rewrote it. Now, I'm rewriting it again, but this time using Zinc's ChaCha20Poly1305 function. This makes the file considerably more simple; the diffstat alone should justify this commit. It also should be faster, since it no longer requires a mutex around the "aead api object" (nor allocations), allowing us to encrypt multiple items in parallel. We also benefit from being able to pass any type of pointer to Zinc, so we can get rid of the ridiculously complex custom page allocator that big_key really doesn't need. Signed-off-by: Jason A. Donenfeld Cc: Samuel Neves Cc: Andy Lutomirski Cc: Greg KH Cc: Jean-Philippe Aumasson Cc: Eric Biggers Cc: David Howells --- security/keys/Kconfig | 4 +- security/keys/big_key.c | 230 +++++----------------------------------- 2 files changed, 28 insertions(+), 206 deletions(-) -- 2.19.0 diff --git a/security/keys/Kconfig b/security/keys/Kconfig index 6462e6654ccf..66ff26298fb3 100644 --- a/security/keys/Kconfig +++ b/security/keys/Kconfig @@ -45,9 +45,7 @@ config BIG_KEYS bool "Large payload keys" depends on KEYS depends on TMPFS - select CRYPTO - select CRYPTO_AES - select CRYPTO_GCM + select ZINC_CHACHA20POLY1305 help This option provides support for holding large keys within the kernel (for example Kerberos ticket caches). The data may be stored out to diff --git a/security/keys/big_key.c b/security/keys/big_key.c index 2806e70d7f8f..bb15360221d8 100644 --- a/security/keys/big_key.c +++ b/security/keys/big_key.c @@ -1,6 +1,6 @@ /* Large capacity key type * - * Copyright (C) 2017 Jason A. Donenfeld . All Rights Reserved. + * Copyright (C) 2017-2018 Jason A. Donenfeld . All Rights Reserved. * Copyright (C) 2013 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * @@ -16,20 +16,10 @@ #include #include #include -#include #include -#include #include #include -#include -#include - -struct big_key_buf { - unsigned int nr_pages; - void *virt; - struct scatterlist *sg; - struct page *pages[]; -}; +#include /* * Layout of key payload words. @@ -41,14 +31,6 @@ enum { big_key_len, }; -/* - * Crypto operation with big_key data - */ -enum big_key_op { - BIG_KEY_ENC, - BIG_KEY_DEC, -}; - /* * If the data is under this limit, there's no point creating a shm file to * hold it as the permanently resident metadata for the shmem fs will be at @@ -56,16 +38,6 @@ enum big_key_op { */ #define BIG_KEY_FILE_THRESHOLD (sizeof(struct inode) + sizeof(struct dentry)) -/* - * Key size for big_key data encryption - */ -#define ENC_KEY_SIZE 32 - -/* - * Authentication tag length - */ -#define ENC_AUTHTAG_SIZE 16 - /* * big_key defined keys take an arbitrary string as the description and an * arbitrary blob of data as the payload @@ -79,136 +51,20 @@ struct key_type key_type_big_key = { .destroy = big_key_destroy, .describe = big_key_describe, .read = big_key_read, - /* no ->update(); don't add it without changing big_key_crypt() nonce */ + /* no ->update(); don't add it without changing chacha20poly1305's nonce */ }; -/* - * Crypto names for big_key data authenticated encryption - */ -static const char big_key_alg_name[] = "gcm(aes)"; -#define BIG_KEY_IV_SIZE GCM_AES_IV_SIZE - -/* - * Crypto algorithms for big_key data authenticated encryption - */ -static struct crypto_aead *big_key_aead; - -/* - * Since changing the key affects the entire object, we need a mutex. - */ -static DEFINE_MUTEX(big_key_aead_lock); - -/* - * Encrypt/decrypt big_key data - */ -static int big_key_crypt(enum big_key_op op, struct big_key_buf *buf, size_t datalen, u8 *key) -{ - int ret; - struct aead_request *aead_req; - /* We always use a zero nonce. The reason we can get away with this is - * because we're using a different randomly generated key for every - * different encryption. Notably, too, key_type_big_key doesn't define - * an .update function, so there's no chance we'll wind up reusing the - * key to encrypt updated data. Simply put: one key, one encryption. - */ - u8 zero_nonce[BIG_KEY_IV_SIZE]; - - aead_req = aead_request_alloc(big_key_aead, GFP_KERNEL); - if (!aead_req) - return -ENOMEM; - - memset(zero_nonce, 0, sizeof(zero_nonce)); - aead_request_set_crypt(aead_req, buf->sg, buf->sg, datalen, zero_nonce); - aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_SLEEP, NULL, NULL); - aead_request_set_ad(aead_req, 0); - - mutex_lock(&big_key_aead_lock); - if (crypto_aead_setkey(big_key_aead, key, ENC_KEY_SIZE)) { - ret = -EAGAIN; - goto error; - } - if (op == BIG_KEY_ENC) - ret = crypto_aead_encrypt(aead_req); - else - ret = crypto_aead_decrypt(aead_req); -error: - mutex_unlock(&big_key_aead_lock); - aead_request_free(aead_req); - return ret; -} - -/* - * Free up the buffer. - */ -static void big_key_free_buffer(struct big_key_buf *buf) -{ - unsigned int i; - - if (buf->virt) { - memset(buf->virt, 0, buf->nr_pages * PAGE_SIZE); - vunmap(buf->virt); - } - - for (i = 0; i < buf->nr_pages; i++) - if (buf->pages[i]) - __free_page(buf->pages[i]); - - kfree(buf); -} - -/* - * Allocate a buffer consisting of a set of pages with a virtual mapping - * applied over them. - */ -static void *big_key_alloc_buffer(size_t len) -{ - struct big_key_buf *buf; - unsigned int npg = (len + PAGE_SIZE - 1) >> PAGE_SHIFT; - unsigned int i, l; - - buf = kzalloc(sizeof(struct big_key_buf) + - sizeof(struct page) * npg + - sizeof(struct scatterlist) * npg, - GFP_KERNEL); - if (!buf) - return NULL; - - buf->nr_pages = npg; - buf->sg = (void *)(buf->pages + npg); - sg_init_table(buf->sg, npg); - - for (i = 0; i < buf->nr_pages; i++) { - buf->pages[i] = alloc_page(GFP_KERNEL); - if (!buf->pages[i]) - goto nomem; - - l = min_t(size_t, len, PAGE_SIZE); - sg_set_page(&buf->sg[i], buf->pages[i], l, 0); - len -= l; - } - - buf->virt = vmap(buf->pages, buf->nr_pages, VM_MAP, PAGE_KERNEL); - if (!buf->virt) - goto nomem; - - return buf; - -nomem: - big_key_free_buffer(buf); - return NULL; -} - /* * Preparse a big key */ int big_key_preparse(struct key_preparsed_payload *prep) { - struct big_key_buf *buf; struct path *path = (struct path *)&prep->payload.data[big_key_path]; struct file *file; - u8 *enckey; + u8 *buf, *enckey; ssize_t written; - size_t datalen = prep->datalen, enclen = datalen + ENC_AUTHTAG_SIZE; + size_t datalen = prep->datalen; + size_t enclen = datalen + CHACHA20POLY1305_AUTHTAG_SIZE; int ret; if (datalen <= 0 || datalen > 1024 * 1024 || !prep->data) @@ -224,28 +80,28 @@ int big_key_preparse(struct key_preparsed_payload *prep) * to be swapped out if needed. * * File content is stored encrypted with randomly generated key. + * Since the key is random for each file, we can set the nonce + * to zero, provided we never define a ->update() call. */ loff_t pos = 0; - buf = big_key_alloc_buffer(enclen); + buf = kvmalloc(enclen, GFP_KERNEL); if (!buf) return -ENOMEM; - memcpy(buf->virt, prep->data, datalen); /* generate random key */ - enckey = kmalloc(ENC_KEY_SIZE, GFP_KERNEL); + enckey = kmalloc(CHACHA20POLY1305_KEY_SIZE, GFP_KERNEL); if (!enckey) { ret = -ENOMEM; goto error; } - ret = get_random_bytes_wait(enckey, ENC_KEY_SIZE); + ret = get_random_bytes_wait(enckey, CHACHA20POLY1305_KEY_SIZE); if (unlikely(ret)) goto err_enckey; - /* encrypt aligned data */ - ret = big_key_crypt(BIG_KEY_ENC, buf, datalen, enckey); - if (ret) - goto err_enckey; + /* encrypt data */ + chacha20poly1305_encrypt(buf, prep->data, datalen, NULL, 0, + 0, enckey); /* save aligned data to file */ file = shmem_kernel_file_setup("", enclen, 0); @@ -254,7 +110,7 @@ int big_key_preparse(struct key_preparsed_payload *prep) goto err_enckey; } - written = kernel_write(file, buf->virt, enclen, &pos); + written = kernel_write(file, buf, enclen, &pos); if (written != enclen) { ret = written; if (written >= 0) @@ -269,7 +125,7 @@ int big_key_preparse(struct key_preparsed_payload *prep) *path = file->f_path; path_get(path); fput(file); - big_key_free_buffer(buf); + kvfree(buf); } else { /* Just store the data in a buffer */ void *data = kmalloc(datalen, GFP_KERNEL); @@ -287,7 +143,7 @@ int big_key_preparse(struct key_preparsed_payload *prep) err_enckey: kzfree(enckey); error: - big_key_free_buffer(buf); + kvfree(buf); return ret; } @@ -365,14 +221,13 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen) return datalen; if (datalen > BIG_KEY_FILE_THRESHOLD) { - struct big_key_buf *buf; struct path *path = (struct path *)&key->payload.data[big_key_path]; struct file *file; - u8 *enckey = (u8 *)key->payload.data[big_key_data]; - size_t enclen = datalen + ENC_AUTHTAG_SIZE; + u8 *buf, *enckey = (u8 *)key->payload.data[big_key_data]; + size_t enclen = datalen + CHACHA20POLY1305_AUTHTAG_SIZE; loff_t pos = 0; - buf = big_key_alloc_buffer(enclen); + buf = kvmalloc(enclen, GFP_KERNEL); if (!buf) return -ENOMEM; @@ -383,26 +238,27 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen) } /* read file to kernel and decrypt */ - ret = kernel_read(file, buf->virt, enclen, &pos); + ret = kernel_read(file, buf, enclen, &pos); if (ret >= 0 && ret != enclen) { ret = -EIO; goto err_fput; } - ret = big_key_crypt(BIG_KEY_DEC, buf, enclen, enckey); - if (ret) + ret = chacha20poly1305_decrypt(buf, buf, enclen, NULL, 0, 0, + enckey) ? 0 : -EINVAL; + if (unlikely(ret)) goto err_fput; ret = datalen; /* copy decrypted data to user */ - if (copy_to_user(buffer, buf->virt, datalen) != 0) + if (copy_to_user(buffer, buf, datalen) != 0) ret = -EFAULT; err_fput: fput(file); error: - big_key_free_buffer(buf); + kvfree(buf); } else { ret = datalen; if (copy_to_user(buffer, key->payload.data[big_key_data], @@ -418,39 +274,7 @@ long big_key_read(const struct key *key, char __user *buffer, size_t buflen) */ static int __init big_key_init(void) { - int ret; - - /* init block cipher */ - big_key_aead = crypto_alloc_aead(big_key_alg_name, 0, CRYPTO_ALG_ASYNC); - if (IS_ERR(big_key_aead)) { - ret = PTR_ERR(big_key_aead); - pr_err("Can't alloc crypto: %d\n", ret); - return ret; - } - - if (unlikely(crypto_aead_ivsize(big_key_aead) != BIG_KEY_IV_SIZE)) { - WARN(1, "big key algorithm changed?"); - ret = -EINVAL; - goto free_aead; - } - - ret = crypto_aead_setauthsize(big_key_aead, ENC_AUTHTAG_SIZE); - if (ret < 0) { - pr_err("Can't set crypto auth tag len: %d\n", ret); - goto free_aead; - } - - ret = register_key_type(&key_type_big_key); - if (ret < 0) { - pr_err("Can't register type: %d\n", ret); - goto free_aead; - } - - return 0; - -free_aead: - crypto_free_aead(big_key_aead); - return ret; + return register_key_type(&key_type_big_key); } late_initcall(big_key_init); From patchwork Tue Sep 25 14:56:22 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: "Jason A. Donenfeld" X-Patchwork-Id: 147506 Delivered-To: patch@linaro.org Received: by 2002:a2e:8595:0:0:0:0:0 with SMTP id b21-v6csp831438lji; Tue, 25 Sep 2018 07:58:11 -0700 (PDT) X-Google-Smtp-Source: ACcGV61lakcOuHpyK4VXsiYQX8l5eENcIrYc1UYwjqWS5ZYH7dLgxPP2iR8DoItRDnw0mqYgdf0m X-Received: by 2002:a63:91c9:: with SMTP id l192-v6mr1452115pge.433.1537887490801; Tue, 25 Sep 2018 07:58:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1537887490; cv=none; d=google.com; s=arc-20160816; b=aCZYWx4VkxKBJ59cAeAil1w0i267aMnemH1KYocr9UKA3NfICN4Mrd3z58ZonKa9za u+kymiOaErYXD0tAphLlXO6RRse2JtEpNvRjgGSeB+eV7vcW5vT7KlpaZzieEA/aeA/1 VyRuggFYYnlZDNZgAE19HkPebmv01Z7j8c6YnMcGGSaAhJ7einOyzwj7OGncv2sX9tOR 9TdYIFlJD0N/alm3s+2fdncCtEa5roAwu+wtUUjeHhzGpPQgSxwnYZhZJPbJ9+M94IC/ kjIpz4jBV66wZYXfL6wROiGYr2cjGe7t1ksxjZ+hMzb+uZ6xhiMOj7EFUEwtEf2da/ui vt3A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=AFtf3M8nUlE/53WXwCZFKLQ7XCc3Uok18erhGPuiMqE=; b=znld/uYUmR0rzeteCYTYFM/sQDiYrVVYiZzJmKmLs1YM6kyyG2gcIqLez7V0s2DL42 YTlo7LOa04WixMyVELOM4Oyn/dH9j9uVoA2coTS5jN32RhCgBXnHWFfjgqWK2Gf+P3oT 8ZbcF+w0fIANOsod3zvpIPrso0W+YbeZmVXbnUVDfgoAxSCDBi1OEpeneYZt5kf5BpcF MFoDRjBQ9D46MYOWyBQP4r8V/0NEzXupM9v3q4/XwjGWPQyWh9D8zimFdrVdMJxPznQ/ 14DglTxh+DQwE5O0B3HSLVvw6nMkWdSOok+Ufpx1Qq9g4zoERE0BinWjhNy1e5Bc84x4 1cjQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=A4ISOXWp; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id u1-v6si2623578pgq.1.2018.09.25.07.58.08; Tue, 25 Sep 2018 07:58:10 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@zx2c4.com header.s=mail header.b=A4ISOXWp; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=zx2c4.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730081AbeIYVF6 (ORCPT + 32 others); Tue, 25 Sep 2018 17:05:58 -0400 Received: from frisell.zx2c4.com ([192.95.5.64]:38531 "EHLO frisell.zx2c4.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729991AbeIYVF5 (ORCPT ); Tue, 25 Sep 2018 17:05:57 -0400 Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTP id 72dd11fc; Tue, 25 Sep 2018 14:39:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=zx2c4.com; h=from:to:cc :subject:date:message-id:in-reply-to:references:mime-version :content-type:content-transfer-encoding; s=mail; bh=rDpz/ZL0T8Oq KSCoLjFMsO7cj44=; b=A4ISOXWpsIDLwoxyZvhIMA32hJi04NmUf6De7T6Xyo5j UtLiYrk8FXgeJESsCqU+7Op7zJZhWCiYAq6SXbfb942gG/+YrV50ePYGmBwW1IB1 nAmODmDjDSAItbkvjVxoDut6xj1io49fmNgAkPdYcsI0BM7u8kUtBkzM0f7rh3/f gNRPdJ2OZEB21WfYMg6y0dCdH9omEjdsoQ9fplJWaiFBi1F5itK862N+MAWQ76Lf fRRs2cG6XIsPMgyabtL+hr3fwqO4wkD7qajVs+U/oslR0ezDHRoNE4qJeO4TYnv6 7xXSNFUqq/hQLqsGy/BVbAIkztdrUfOhRmagFsc6Ng== Received: by frisell.zx2c4.com (ZX2C4 Mail Server) with ESMTPSA id 2ff82d1e (TLSv1.2:ECDHE-RSA-AES256-GCM-SHA384:256:NO); Tue, 25 Sep 2018 14:39:07 +0000 (UTC) From: "Jason A. Donenfeld" To: linux-kernel@vger.kernel.org, netdev@vger.kernel.org, linux-crypto@vger.kernel.org, davem@davemloft.net, gregkh@linuxfoundation.org Cc: "Jason A. Donenfeld" Subject: [PATCH net-next v6 23/23] net: WireGuard secure network tunnel Date: Tue, 25 Sep 2018 16:56:22 +0200 Message-Id: <20180925145622.29959-24-Jason@zx2c4.com> In-Reply-To: <20180925145622.29959-1-Jason@zx2c4.com> References: <20180925145622.29959-1-Jason@zx2c4.com> MIME-Version: 1.0 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org WireGuard is a layer 3 secure networking tunnel made specifically for the kernel, that aims to be much simpler and easier to audit than IPsec. Extensive documentation and description of the protocol and considerations, along with formal proofs of the cryptography, are available at: * https://www.wireguard.com/ * https://www.wireguard.com/papers/wireguard.pdf This commit implements WireGuard as a simple network device driver, accessible in the usual RTNL way used by virtual network drivers. It makes use of the udp_tunnel APIs, GRO, GSO, NAPI, and the usual set of networking subsystem APIs. It has a somewhat novel multicore queueing system designed for maximum throughput and minimal latency of encryption operations, but it is implemented modestly using workqueues and NAPI. Configuration is done via generic Netlink, and following a review from the Netlink maintainer a year ago, several high profile userspace have already implemented the API. This commit also comes with several different tests, both in-kernel tests and out-of-kernel tests based on network namespaces, taking profit of the fact that sockets used by WireGuard intentionally stay in the namespace the WireGuard interface was originally created, exactly like the semantics of userspace tun devices. See wireguard.com/netns/ for pictures and examples. The source code is fairly short, but rather than combining everything into a single file, WireGuard is developed as cleanly separable files, making auditing and comprehension easier. Things are laid out as follows: * noise.[ch], cookie.[ch], messages.h: These implement the bulk of the cryptographic aspects of the protocol, and are mostly data-only in nature, taking in buffers of bytes and spitting out buffers of bytes. They also handle reference counting for their various shared pieces of data, like keys and key lists. * ratelimiter.[ch]: Used as an integral part of cookie.[ch] for ratelimiting certain types of cryptographic operations in accordance with particular WireGuard semantics. * allowedips.[ch], hashtables.[ch]: The main lookup structures of WireGuard, the former being trie-like with particular semantics, an integral part of the design of the protocol, and the latter just being nice helper functions around the specific hashtables we use. * device.[ch]: Implementation of functions for the netdevice and for rtnl, responsible for maintaining the life of a given interface and wiring it up to the rest of WireGuard. * peer.[ch]: Each interface has a list of peers, with helper functions available here for creation, destruction, and reference counting. * socket.[ch]: Implementation of functions related to udp_socket and the general set of kernel socket APIs, for sending and receiving ciphertext UDP packets, and taking care of WireGuard-specific sticky socket routing semantics for the automatic roaming. * netlink.[ch]: Userspace API entry point for configuring WireGuard peers and devices. The API has been implemented by several userspace tools and network management utility, and the WireGuard project distributes the basic wg(8) tool. * queueing.[ch]: Shared function on the rx and tx path for handling the various queues used in the multicore algorithms. * send.c: Handles encrypting outgoing packets in parallel on multiple cores, before sending them in order on a single core, via workqueues and ring buffers. Also handles sending handshake and cookie messages as part of the protocol, in parallel. * receive.c: Handles decrypting incoming packets in parallel on multiple cores, before passing them off in order to be ingested via the rest of the networking subsystem with GRO via the typical NAPI poll function. Also handles receiving handshake and cookie messages as part of the protocol, in parallel. * timers.[ch]: Uses the timer wheel to implement protocol particular event timeouts, and gives a set of very simple event-driven entry point functions for callers. * main.c, version.h: Initialization and deinitialization of the module. * selftest/*.h: Runtime unit tests for some of the most security sensitive functions. * tools/testing/selftests/wireguard/netns.sh: Aforementioned testing script using network namespaces. This commit aims to be as self-contained as possible, implementing WireGuard as a standalone module not needing much special handling or coordination from the network subsystem. I expect for future optimizations to the network stack to positively improve WireGuard, and vice-versa, but for the time being, this exists as intentionally standalone. We introduce a menu option for CONFIG_WIREGUARD, as well as providing a verbose debug log and self-tests via CONFIG_WIREGUARD_DEBUG. Signed-off-by: Jason A. Donenfeld Cc: David Miller Cc: Greg KH --- MAINTAINERS | 8 + drivers/net/Kconfig | 30 + drivers/net/Makefile | 1 + drivers/net/wireguard/Makefile | 18 + drivers/net/wireguard/allowedips.c | 404 ++++++++++ drivers/net/wireguard/allowedips.h | 55 ++ drivers/net/wireguard/cookie.c | 234 ++++++ drivers/net/wireguard/cookie.h | 59 ++ drivers/net/wireguard/device.c | 438 +++++++++++ drivers/net/wireguard/device.h | 65 ++ drivers/net/wireguard/hashtables.c | 209 +++++ drivers/net/wireguard/hashtables.h | 63 ++ drivers/net/wireguard/main.c | 65 ++ drivers/net/wireguard/messages.h | 128 +++ drivers/net/wireguard/netlink.c | 606 ++++++++++++++ drivers/net/wireguard/netlink.h | 12 + drivers/net/wireguard/noise.c | 784 +++++++++++++++++++ drivers/net/wireguard/noise.h | 129 +++ drivers/net/wireguard/peer.c | 191 +++++ drivers/net/wireguard/peer.h | 87 ++ drivers/net/wireguard/queueing.c | 52 ++ drivers/net/wireguard/queueing.h | 193 +++++ drivers/net/wireguard/ratelimiter.c | 220 ++++++ drivers/net/wireguard/ratelimiter.h | 19 + drivers/net/wireguard/receive.c | 595 ++++++++++++++ drivers/net/wireguard/selftest/allowedips.h | 663 ++++++++++++++++ drivers/net/wireguard/selftest/counter.h | 103 +++ drivers/net/wireguard/selftest/ratelimiter.h | 178 +++++ drivers/net/wireguard/send.c | 420 ++++++++++ drivers/net/wireguard/socket.c | 432 ++++++++++ drivers/net/wireguard/socket.h | 44 ++ drivers/net/wireguard/timers.c | 256 ++++++ drivers/net/wireguard/timers.h | 30 + drivers/net/wireguard/version.h | 1 + include/uapi/linux/wireguard.h | 190 +++++ tools/testing/selftests/wireguard/netns.sh | 499 ++++++++++++ 36 files changed, 7481 insertions(+) create mode 100644 drivers/net/wireguard/Makefile create mode 100644 drivers/net/wireguard/allowedips.c create mode 100644 drivers/net/wireguard/allowedips.h create mode 100644 drivers/net/wireguard/cookie.c create mode 100644 drivers/net/wireguard/cookie.h create mode 100644 drivers/net/wireguard/device.c create mode 100644 drivers/net/wireguard/device.h create mode 100644 drivers/net/wireguard/hashtables.c create mode 100644 drivers/net/wireguard/hashtables.h create mode 100644 drivers/net/wireguard/main.c create mode 100644 drivers/net/wireguard/messages.h create mode 100644 drivers/net/wireguard/netlink.c create mode 100644 drivers/net/wireguard/netlink.h create mode 100644 drivers/net/wireguard/noise.c create mode 100644 drivers/net/wireguard/noise.h create mode 100644 drivers/net/wireguard/peer.c create mode 100644 drivers/net/wireguard/peer.h create mode 100644 drivers/net/wireguard/queueing.c create mode 100644 drivers/net/wireguard/queueing.h create mode 100644 drivers/net/wireguard/ratelimiter.c create mode 100644 drivers/net/wireguard/ratelimiter.h create mode 100644 drivers/net/wireguard/receive.c create mode 100644 drivers/net/wireguard/selftest/allowedips.h create mode 100644 drivers/net/wireguard/selftest/counter.h create mode 100644 drivers/net/wireguard/selftest/ratelimiter.h create mode 100644 drivers/net/wireguard/send.c create mode 100644 drivers/net/wireguard/socket.c create mode 100644 drivers/net/wireguard/socket.h create mode 100644 drivers/net/wireguard/timers.c create mode 100644 drivers/net/wireguard/timers.h create mode 100644 drivers/net/wireguard/version.h create mode 100644 include/uapi/linux/wireguard.h create mode 100755 tools/testing/selftests/wireguard/netns.sh -- 2.19.0 diff --git a/MAINTAINERS b/MAINTAINERS index 5967c737f3ce..32db7ebad86e 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -15823,6 +15823,14 @@ L: linux-gpio@vger.kernel.org S: Maintained F: drivers/gpio/gpio-ws16c48.c +WIREGUARD SECURE NETWORK TUNNEL +M: Jason A. Donenfeld +S: Maintained +F: drivers/net/wireguard/ +F: tools/testing/selftests/wireguard/ +L: wireguard@lists.zx2c4.com +L: netdev@vger.kernel.org + WISTRON LAPTOP BUTTON DRIVER M: Miloslav Trmac S: Maintained diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig index d03775100f7d..aa631fe3b395 100644 --- a/drivers/net/Kconfig +++ b/drivers/net/Kconfig @@ -70,6 +70,36 @@ config DUMMY To compile this driver as a module, choose M here: the module will be called dummy. +config WIREGUARD + tristate "WireGuard secure network tunnel" + depends on NET && INET + select NET_UDP_TUNNEL + select DST_CACHE + select ZINC_CHACHA20POLY1305 + select ZINC_BLAKE2S + select ZINC_CURVE25519 + default m + help + WireGuard is a secure, fast, and easy to use replacement for IPSec + that uses modern cryptography and clever networking tricks. It's + designed to be fairly general purpose and abstract enough to fit most + use cases, while at the same time remaining extremely simple to + configure. See www.wireguard.com for more info. + + It's safe to say Y or M here, as the driver is very lightweight and + is only in use when an administrator chooses to add an interface. + +config WIREGUARD_DEBUG + bool "Debugging checks and verbose messages" + depends on WIREGUARD + help + This will write log messages for handshake and other events + that occur for a WireGuard interface. It will also perform some + extra validation checks and unit tests at various points. This is + only useful for debugging. + + Say N here unless you know what you're doing. + config EQUALIZER tristate "EQL (serial line load balancing) support" ---help--- diff --git a/drivers/net/Makefile b/drivers/net/Makefile index 21cde7e78621..f0acd11a143d 100644 --- a/drivers/net/Makefile +++ b/drivers/net/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_RIONET) += rionet.o obj-$(CONFIG_NET_TEAM) += team/ obj-$(CONFIG_TUN) += tun.o obj-$(CONFIG_TAP) += tap.o +obj-$(CONFIG_WIREGUARD) += wireguard/ obj-$(CONFIG_VETH) += veth.o obj-$(CONFIG_VIRTIO_NET) += virtio_net.o obj-$(CONFIG_VXLAN) += vxlan.o diff --git a/drivers/net/wireguard/Makefile b/drivers/net/wireguard/Makefile new file mode 100644 index 000000000000..d8856255bc9d --- /dev/null +++ b/drivers/net/wireguard/Makefile @@ -0,0 +1,18 @@ +ccflags-y := -O3 +ccflags-y += -D'pr_fmt(fmt)=KBUILD_MODNAME ": " fmt' +ccflags-$(CONFIG_WIREGUARD_DEBUG) += -DDEBUG +wireguard-y := main.o +wireguard-y += noise.o +wireguard-y += device.o +wireguard-y += peer.o +wireguard-y += timers.o +wireguard-y += queueing.o +wireguard-y += send.o +wireguard-y += receive.o +wireguard-y += socket.o +wireguard-y += hashtables.o +wireguard-y += allowedips.o +wireguard-y += ratelimiter.o +wireguard-y += cookie.o +wireguard-y += netlink.o +obj-$(CONFIG_WIREGUARD) := wireguard.o diff --git a/drivers/net/wireguard/allowedips.c b/drivers/net/wireguard/allowedips.c new file mode 100644 index 000000000000..f8f026c5ac1a --- /dev/null +++ b/drivers/net/wireguard/allowedips.c @@ -0,0 +1,404 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "allowedips.h" +#include "peer.h" + +struct allowedips_node { + struct wireguard_peer __rcu *peer; + struct rcu_head rcu; + struct allowedips_node __rcu *bit[2]; + /* While it may seem scandalous that we waste space for v4, + * we're alloc'ing to the nearest power of 2 anyway, so this + * doesn't actually make a difference. + */ + u8 bits[16] __aligned(__alignof(u64)); + u8 cidr, bit_at_a, bit_at_b; +}; + +static __always_inline void swap_endian(u8 *dst, const u8 *src, u8 bits) +{ + if (bits == 32) + *(u32 *)dst = be32_to_cpu(*(const __be32 *)src); + else if (bits == 128) { + ((u64 *)dst)[0] = be64_to_cpu(((const __be64 *)src)[0]); + ((u64 *)dst)[1] = be64_to_cpu(((const __be64 *)src)[1]); + } +} + +static void copy_and_assign_cidr(struct allowedips_node *node, const u8 *src, + u8 cidr, u8 bits) +{ + node->cidr = cidr; + node->bit_at_a = cidr / 8U; +#ifdef __LITTLE_ENDIAN + node->bit_at_a ^= (bits / 8U - 1U) % 8U; +#endif + node->bit_at_b = 7U - (cidr % 8U); + memcpy(node->bits, src, bits / 8U); +} + +#define choose_node(parent, key) \ + parent->bit[(key[parent->bit_at_a] >> parent->bit_at_b) & 1] + +static void node_free_rcu(struct rcu_head *rcu) +{ + kfree(container_of(rcu, struct allowedips_node, rcu)); +} + +#define push_rcu(stack, p, len) ({ \ + if (rcu_access_pointer(p)) { \ + WARN_ON(len >= 128); \ + stack[len++] = rcu_dereference_raw(p); \ + } \ + true; \ + }) +static void root_free_rcu(struct rcu_head *rcu) +{ + struct allowedips_node *node, *stack[128] = { + container_of(rcu, struct allowedips_node, rcu) }; + unsigned int len = 1; + + while (len > 0 && (node = stack[--len]) && + push_rcu(stack, node->bit[0], len) && + push_rcu(stack, node->bit[1], len)) + kfree(node); +} + +static int +walk_by_peer(struct allowedips_node __rcu *top, u8 bits, + struct allowedips_cursor *cursor, struct wireguard_peer *peer, + int (*func)(void *ctx, const u8 *ip, u8 cidr, int family), + void *ctx, struct mutex *lock) +{ + const int address_family = bits == 32 ? AF_INET : AF_INET6; + u8 ip[16] __aligned(__alignof(u64)); + struct allowedips_node *node; + int ret; + + if (!rcu_access_pointer(top)) + return 0; + + if (!cursor->len) + push_rcu(cursor->stack, top, cursor->len); + + for (; cursor->len > 0 && (node = cursor->stack[cursor->len - 1]); + --cursor->len, push_rcu(cursor->stack, node->bit[0], cursor->len), + push_rcu(cursor->stack, node->bit[1], cursor->len)) { + const unsigned int cidr_bytes = DIV_ROUND_UP(node->cidr, 8U); + + if (rcu_dereference_protected(node->peer, + lockdep_is_held(lock)) != peer) + continue; + + swap_endian(ip, node->bits, bits); + memset(ip + cidr_bytes, 0, bits / 8U - cidr_bytes); + if (node->cidr) + ip[cidr_bytes - 1U] &= ~0U << (-node->cidr % 8U); + + ret = func(ctx, ip, node->cidr, address_family); + if (ret) + return ret; + } + return 0; +} +#undef push_rcu + +#define ref(p) rcu_access_pointer(p) +#define deref(p) rcu_dereference_protected(*p, lockdep_is_held(lock)) +#define push(p) ({ \ + WARN_ON(len >= 128); \ + stack[len++] = p; \ + }) +static void walk_remove_by_peer(struct allowedips_node __rcu **top, + struct wireguard_peer *peer, struct mutex *lock) +{ + struct allowedips_node __rcu **stack[128], **nptr; + struct allowedips_node *node, *prev; + unsigned int len; + + if (unlikely(!peer || !ref(*top))) + return; + + for (prev = NULL, len = 0, push(top); len > 0; prev = node) { + nptr = stack[len - 1]; + node = deref(nptr); + if (!node) { + --len; + continue; + } + if (!prev || ref(prev->bit[0]) == node || + ref(prev->bit[1]) == node) { + if (ref(node->bit[0])) + push(&node->bit[0]); + else if (ref(node->bit[1])) + push(&node->bit[1]); + } else if (ref(node->bit[0]) == prev) { + if (ref(node->bit[1])) + push(&node->bit[1]); + } else { + if (rcu_dereference_protected(node->peer, + lockdep_is_held(lock)) == peer) { + RCU_INIT_POINTER(node->peer, NULL); + if (!node->bit[0] || !node->bit[1]) { + rcu_assign_pointer(*nptr, + deref(&node->bit[!ref(node->bit[0])])); + call_rcu_bh(&node->rcu, node_free_rcu); + node = deref(nptr); + } + } + --len; + } + } +} +#undef ref +#undef deref +#undef push + +static __always_inline unsigned int fls128(u64 a, u64 b) +{ + return a ? fls64(a) + 64U : fls64(b); +} + +static __always_inline u8 common_bits(const struct allowedips_node *node, + const u8 *key, u8 bits) +{ + if (bits == 32) + return 32U - fls(*(const u32 *)node->bits ^ *(const u32 *)key); + else if (bits == 128) + return 128U - fls128( + *(const u64 *)&node->bits[0] ^ *(const u64 *)&key[0], + *(const u64 *)&node->bits[8] ^ *(const u64 *)&key[8]); + return 0; +} + +/* This could be much faster if it actually just compared the common bits + * properly, by precomputing a mask bswap(~0 << (32 - cidr)), and the rest, but + * it turns out that common_bits is already super fast on modern processors, + * even taking into account the unfortunate bswap. So, we just inline it like + * this instead. + */ +#define prefix_matches(node, key, bits) \ + (common_bits(node, key, bits) >= node->cidr) + +static __always_inline struct allowedips_node * +find_node(struct allowedips_node *trie, u8 bits, const u8 *key) +{ + struct allowedips_node *node = trie, *found = NULL; + + while (node && prefix_matches(node, key, bits)) { + if (rcu_access_pointer(node->peer)) + found = node; + if (node->cidr == bits) + break; + node = rcu_dereference_bh(choose_node(node, key)); + } + return found; +} + +/* Returns a strong reference to a peer */ +static __always_inline struct wireguard_peer * +lookup(struct allowedips_node __rcu *root, u8 bits, const void *be_ip) +{ + u8 ip[16] __aligned(__alignof(u64)); + struct wireguard_peer *peer = NULL; + struct allowedips_node *node; + + swap_endian(ip, be_ip, bits); + + rcu_read_lock_bh(); +retry: + node = find_node(rcu_dereference_bh(root), bits, ip); + if (node) { + peer = peer_get_maybe_zero(rcu_dereference_bh(node->peer)); + if (!peer) + goto retry; + } + rcu_read_unlock_bh(); + return peer; +} + +__attribute__((nonnull(1))) static bool +node_placement(struct allowedips_node __rcu *trie, const u8 *key, u8 cidr, + u8 bits, struct allowedips_node **rnode, struct mutex *lock) +{ + struct allowedips_node *node = rcu_dereference_protected(trie, + lockdep_is_held(lock)); + struct allowedips_node *parent = NULL; + bool exact = false; + + while (node && node->cidr <= cidr && prefix_matches(node, key, bits)) { + parent = node; + if (parent->cidr == cidr) { + exact = true; + break; + } + node = rcu_dereference_protected(choose_node(parent, key), + lockdep_is_held(lock)); + } + *rnode = parent; + return exact; +} + +static int add(struct allowedips_node __rcu **trie, u8 bits, const u8 *be_key, + u8 cidr, struct wireguard_peer *peer, struct mutex *lock) +{ + struct allowedips_node *node, *parent, *down, *newnode; + u8 key[16] __aligned(__alignof(u64)); + + if (unlikely(cidr > bits || !peer)) + return -EINVAL; + + swap_endian(key, be_key, bits); + + if (!rcu_access_pointer(*trie)) { + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (unlikely(!node)) + return -ENOMEM; + RCU_INIT_POINTER(node->peer, peer); + copy_and_assign_cidr(node, key, cidr, bits); + rcu_assign_pointer(*trie, node); + return 0; + } + if (node_placement(*trie, key, cidr, bits, &node, lock)) { + rcu_assign_pointer(node->peer, peer); + return 0; + } + + newnode = kzalloc(sizeof(*newnode), GFP_KERNEL); + if (unlikely(!newnode)) + return -ENOMEM; + RCU_INIT_POINTER(newnode->peer, peer); + copy_and_assign_cidr(newnode, key, cidr, bits); + + if (!node) + down = rcu_dereference_protected(*trie, lockdep_is_held(lock)); + else { + down = rcu_dereference_protected(choose_node(node, key), + lockdep_is_held(lock)); + if (!down) { + rcu_assign_pointer(choose_node(node, key), newnode); + return 0; + } + } + cidr = min(cidr, common_bits(down, key, bits)); + parent = node; + + if (newnode->cidr == cidr) { + rcu_assign_pointer(choose_node(newnode, down->bits), down); + if (!parent) + rcu_assign_pointer(*trie, newnode); + else + rcu_assign_pointer(choose_node(parent, newnode->bits), + newnode); + } else { + node = kzalloc(sizeof(*node), GFP_KERNEL); + if (unlikely(!node)) { + kfree(newnode); + return -ENOMEM; + } + copy_and_assign_cidr(node, newnode->bits, cidr, bits); + + rcu_assign_pointer(choose_node(node, down->bits), down); + rcu_assign_pointer(choose_node(node, newnode->bits), newnode); + if (!parent) + rcu_assign_pointer(*trie, node); + else + rcu_assign_pointer(choose_node(parent, node->bits), + node); + } + return 0; +} + +void allowedips_init(struct allowedips *table) +{ + table->root4 = table->root6 = NULL; + table->seq = 1; +} + +void allowedips_free(struct allowedips *table, struct mutex *lock) +{ + struct allowedips_node __rcu *old4 = table->root4, *old6 = table->root6; + ++table->seq; + RCU_INIT_POINTER(table->root4, NULL); + RCU_INIT_POINTER(table->root6, NULL); + if (rcu_access_pointer(old4)) + call_rcu_bh(&rcu_dereference_protected(old4, + lockdep_is_held(lock))->rcu, root_free_rcu); + if (rcu_access_pointer(old6)) + call_rcu_bh(&rcu_dereference_protected(old6, + lockdep_is_held(lock))->rcu, root_free_rcu); +} + +int allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wireguard_peer *peer, + struct mutex *lock) +{ + ++table->seq; + return add(&table->root4, 32, (const u8 *)ip, cidr, peer, lock); +} + +int allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wireguard_peer *peer, + struct mutex *lock) +{ + ++table->seq; + return add(&table->root6, 128, (const u8 *)ip, cidr, peer, lock); +} + +void allowedips_remove_by_peer(struct allowedips *table, + struct wireguard_peer *peer, struct mutex *lock) +{ + ++table->seq; + walk_remove_by_peer(&table->root4, peer, lock); + walk_remove_by_peer(&table->root6, peer, lock); +} + +int allowedips_walk_by_peer(struct allowedips *table, + struct allowedips_cursor *cursor, + struct wireguard_peer *peer, + int (*func)(void *ctx, const u8 *ip, u8 cidr, int family), + void *ctx, struct mutex *lock) +{ + int ret; + + if (!cursor->seq) + cursor->seq = table->seq; + else if (cursor->seq != table->seq) + return 0; + + if (!cursor->second_half) { + ret = walk_by_peer(table->root4, 32, cursor, peer, func, ctx, lock); + if (ret) + return ret; + cursor->len = 0; + cursor->second_half = true; + } + return walk_by_peer(table->root6, 128, cursor, peer, func, ctx, lock); +} + +/* Returns a strong reference to a peer */ +struct wireguard_peer *allowedips_lookup_dst(struct allowedips *table, + struct sk_buff *skb) +{ + if (skb->protocol == htons(ETH_P_IP)) + return lookup(table->root4, 32, &ip_hdr(skb)->daddr); + else if (skb->protocol == htons(ETH_P_IPV6)) + return lookup(table->root6, 128, &ipv6_hdr(skb)->daddr); + return NULL; +} + +/* Returns a strong reference to a peer */ +struct wireguard_peer *allowedips_lookup_src(struct allowedips *table, + struct sk_buff *skb) +{ + if (skb->protocol == htons(ETH_P_IP)) + return lookup(table->root4, 32, &ip_hdr(skb)->saddr); + else if (skb->protocol == htons(ETH_P_IPV6)) + return lookup(table->root6, 128, &ipv6_hdr(skb)->saddr); + return NULL; +} + +#include "selftest/allowedips.h" diff --git a/drivers/net/wireguard/allowedips.h b/drivers/net/wireguard/allowedips.h new file mode 100644 index 000000000000..ace154254e6e --- /dev/null +++ b/drivers/net/wireguard/allowedips.h @@ -0,0 +1,55 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_ALLOWEDIPS_H +#define _WG_ALLOWEDIPS_H + +#include +#include +#include + +struct wireguard_peer; +struct allowedips_node; + +struct allowedips { + struct allowedips_node __rcu *root4; + struct allowedips_node __rcu *root6; + u64 seq; +}; + +struct allowedips_cursor { + u64 seq; + struct allowedips_node *stack[128]; + unsigned int len; + bool second_half; +}; + +void allowedips_init(struct allowedips *table); +void allowedips_free(struct allowedips *table, struct mutex *mutex); +int allowedips_insert_v4(struct allowedips *table, const struct in_addr *ip, + u8 cidr, struct wireguard_peer *peer, + struct mutex *lock); +int allowedips_insert_v6(struct allowedips *table, const struct in6_addr *ip, + u8 cidr, struct wireguard_peer *peer, + struct mutex *lock); +void allowedips_remove_by_peer(struct allowedips *table, + struct wireguard_peer *peer, struct mutex *lock); +int allowedips_walk_by_peer(struct allowedips *table, + struct allowedips_cursor *cursor, + struct wireguard_peer *peer, + int (*func)(void *ctx, const u8 *ip, u8 cidr, int family), + void *ctx, struct mutex *lock); + +/* These return a strong reference to a peer: */ +struct wireguard_peer *allowedips_lookup_dst(struct allowedips *table, + struct sk_buff *skb); +struct wireguard_peer *allowedips_lookup_src(struct allowedips *table, + struct sk_buff *skb); + +#ifdef DEBUG +bool allowedips_selftest(void); +#endif + +#endif /* _WG_ALLOWEDIPS_H */ diff --git a/drivers/net/wireguard/cookie.c b/drivers/net/wireguard/cookie.c new file mode 100644 index 000000000000..afd82a0d966d --- /dev/null +++ b/drivers/net/wireguard/cookie.c @@ -0,0 +1,234 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "cookie.h" +#include "peer.h" +#include "device.h" +#include "messages.h" +#include "ratelimiter.h" +#include "timers.h" + +#include +#include + +#include +#include + +void cookie_checker_init(struct cookie_checker *checker, + struct wireguard_device *wg) +{ + init_rwsem(&checker->secret_lock); + checker->secret_birthdate = ktime_get_boot_fast_ns(); + get_random_bytes(checker->secret, NOISE_HASH_LEN); + checker->device = wg; +} + +enum { COOKIE_KEY_LABEL_LEN = 8 }; +static const u8 mac1_key_label[COOKIE_KEY_LABEL_LEN] = "mac1----"; +static const u8 cookie_key_label[COOKIE_KEY_LABEL_LEN] = "cookie--"; + +static void precompute_key(u8 key[NOISE_SYMMETRIC_KEY_LEN], + const u8 pubkey[NOISE_PUBLIC_KEY_LEN], + const u8 label[COOKIE_KEY_LABEL_LEN]) +{ + struct blake2s_state blake; + + blake2s_init(&blake, NOISE_SYMMETRIC_KEY_LEN); + blake2s_update(&blake, label, COOKIE_KEY_LABEL_LEN); + blake2s_update(&blake, pubkey, NOISE_PUBLIC_KEY_LEN); + blake2s_final(&blake, key, NOISE_SYMMETRIC_KEY_LEN); +} + +/* Must hold peer->handshake.static_identity->lock */ +void cookie_checker_precompute_device_keys(struct cookie_checker *checker) +{ + if (likely(checker->device->static_identity.has_identity)) { + precompute_key(checker->cookie_encryption_key, + checker->device->static_identity.static_public, + cookie_key_label); + precompute_key(checker->message_mac1_key, + checker->device->static_identity.static_public, + mac1_key_label); + } else { + memset(checker->cookie_encryption_key, 0, + NOISE_SYMMETRIC_KEY_LEN); + memset(checker->message_mac1_key, 0, NOISE_SYMMETRIC_KEY_LEN); + } +} + +void cookie_checker_precompute_peer_keys(struct wireguard_peer *peer) +{ + precompute_key(peer->latest_cookie.cookie_decryption_key, + peer->handshake.remote_static, cookie_key_label); + precompute_key(peer->latest_cookie.message_mac1_key, + peer->handshake.remote_static, mac1_key_label); +} + +void cookie_init(struct cookie *cookie) +{ + memset(cookie, 0, sizeof(*cookie)); + init_rwsem(&cookie->lock); +} + +static void compute_mac1(u8 mac1[COOKIE_LEN], const void *message, size_t len, + const u8 key[NOISE_SYMMETRIC_KEY_LEN]) +{ + len = len - sizeof(struct message_macs) + + offsetof(struct message_macs, mac1); + blake2s(mac1, message, key, COOKIE_LEN, len, NOISE_SYMMETRIC_KEY_LEN); +} + +static void compute_mac2(u8 mac2[COOKIE_LEN], const void *message, size_t len, + const u8 cookie[COOKIE_LEN]) +{ + len = len - sizeof(struct message_macs) + + offsetof(struct message_macs, mac2); + blake2s(mac2, message, cookie, COOKIE_LEN, len, COOKIE_LEN); +} + +static void make_cookie(u8 cookie[COOKIE_LEN], struct sk_buff *skb, + struct cookie_checker *checker) +{ + struct blake2s_state state; + + if (has_expired(checker->secret_birthdate, COOKIE_SECRET_MAX_AGE)) { + down_write(&checker->secret_lock); + checker->secret_birthdate = ktime_get_boot_fast_ns(); + get_random_bytes(checker->secret, NOISE_HASH_LEN); + up_write(&checker->secret_lock); + } + + down_read(&checker->secret_lock); + + blake2s_init_key(&state, COOKIE_LEN, checker->secret, NOISE_HASH_LEN); + if (skb->protocol == htons(ETH_P_IP)) + blake2s_update(&state, (u8 *)&ip_hdr(skb)->saddr, + sizeof(struct in_addr)); + else if (skb->protocol == htons(ETH_P_IPV6)) + blake2s_update(&state, (u8 *)&ipv6_hdr(skb)->saddr, + sizeof(struct in6_addr)); + blake2s_update(&state, (u8 *)&udp_hdr(skb)->source, sizeof(__be16)); + blake2s_final(&state, cookie, COOKIE_LEN); + + up_read(&checker->secret_lock); +} + +enum cookie_mac_state cookie_validate_packet(struct cookie_checker *checker, + struct sk_buff *skb, + bool check_cookie) +{ + struct message_macs *macs = (struct message_macs *) + (skb->data + skb->len - sizeof(*macs)); + enum cookie_mac_state ret; + u8 computed_mac[COOKIE_LEN]; + u8 cookie[COOKIE_LEN]; + + ret = INVALID_MAC; + compute_mac1(computed_mac, skb->data, skb->len, + checker->message_mac1_key); + if (crypto_memneq(computed_mac, macs->mac1, COOKIE_LEN)) + goto out; + + ret = VALID_MAC_BUT_NO_COOKIE; + + if (!check_cookie) + goto out; + + make_cookie(cookie, skb, checker); + + compute_mac2(computed_mac, skb->data, skb->len, cookie); + if (crypto_memneq(computed_mac, macs->mac2, COOKIE_LEN)) + goto out; + + ret = VALID_MAC_WITH_COOKIE_BUT_RATELIMITED; + if (!ratelimiter_allow(skb, dev_net(checker->device->dev))) + goto out; + + ret = VALID_MAC_WITH_COOKIE; + +out: + return ret; +} + +void cookie_add_mac_to_packet(void *message, size_t len, + struct wireguard_peer *peer) +{ + struct message_macs *macs = (struct message_macs *) + ((u8 *)message + len - sizeof(*macs)); + + down_write(&peer->latest_cookie.lock); + compute_mac1(macs->mac1, message, len, + peer->latest_cookie.message_mac1_key); + memcpy(peer->latest_cookie.last_mac1_sent, macs->mac1, COOKIE_LEN); + peer->latest_cookie.have_sent_mac1 = true; + up_write(&peer->latest_cookie.lock); + + down_read(&peer->latest_cookie.lock); + if (peer->latest_cookie.is_valid && + !has_expired(peer->latest_cookie.birthdate, + COOKIE_SECRET_MAX_AGE - COOKIE_SECRET_LATENCY)) + compute_mac2(macs->mac2, message, len, + peer->latest_cookie.cookie); + else + memset(macs->mac2, 0, COOKIE_LEN); + up_read(&peer->latest_cookie.lock); +} + +void cookie_message_create(struct message_handshake_cookie *dst, + struct sk_buff *skb, __le32 index, + struct cookie_checker *checker) +{ + struct message_macs *macs = (struct message_macs *) + ((u8 *)skb->data + skb->len - sizeof(*macs)); + u8 cookie[COOKIE_LEN]; + + dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE); + dst->receiver_index = index; + get_random_bytes_wait(dst->nonce, COOKIE_NONCE_LEN); + + make_cookie(cookie, skb, checker); + xchacha20poly1305_encrypt(dst->encrypted_cookie, cookie, COOKIE_LEN, + macs->mac1, COOKIE_LEN, dst->nonce, + checker->cookie_encryption_key); +} + +void cookie_message_consume(struct message_handshake_cookie *src, + struct wireguard_device *wg) +{ + struct wireguard_peer *peer = NULL; + u8 cookie[COOKIE_LEN]; + bool ret; + + if (unlikely(!index_hashtable_lookup(&wg->index_hashtable, + INDEX_HASHTABLE_HANDSHAKE | + INDEX_HASHTABLE_KEYPAIR, + src->receiver_index, &peer))) + return; + + down_read(&peer->latest_cookie.lock); + if (unlikely(!peer->latest_cookie.have_sent_mac1)) { + up_read(&peer->latest_cookie.lock); + goto out; + } + ret = xchacha20poly1305_decrypt( + cookie, src->encrypted_cookie, sizeof(src->encrypted_cookie), + peer->latest_cookie.last_mac1_sent, COOKIE_LEN, src->nonce, + peer->latest_cookie.cookie_decryption_key); + up_read(&peer->latest_cookie.lock); + + if (ret) { + down_write(&peer->latest_cookie.lock); + memcpy(peer->latest_cookie.cookie, cookie, COOKIE_LEN); + peer->latest_cookie.birthdate = ktime_get_boot_fast_ns(); + peer->latest_cookie.is_valid = true; + peer->latest_cookie.have_sent_mac1 = false; + up_write(&peer->latest_cookie.lock); + } else + net_dbg_ratelimited("%s: Could not decrypt invalid cookie response\n", + wg->dev->name); + +out: + peer_put(peer); +} diff --git a/drivers/net/wireguard/cookie.h b/drivers/net/wireguard/cookie.h new file mode 100644 index 000000000000..41122fffb894 --- /dev/null +++ b/drivers/net/wireguard/cookie.h @@ -0,0 +1,59 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_COOKIE_H +#define _WG_COOKIE_H + +#include "messages.h" +#include + +struct wireguard_peer; + +struct cookie_checker { + u8 secret[NOISE_HASH_LEN]; + u8 cookie_encryption_key[NOISE_SYMMETRIC_KEY_LEN]; + u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN]; + u64 secret_birthdate; + struct rw_semaphore secret_lock; + struct wireguard_device *device; +}; + +struct cookie { + u64 birthdate; + bool is_valid; + u8 cookie[COOKIE_LEN]; + bool have_sent_mac1; + u8 last_mac1_sent[COOKIE_LEN]; + u8 cookie_decryption_key[NOISE_SYMMETRIC_KEY_LEN]; + u8 message_mac1_key[NOISE_SYMMETRIC_KEY_LEN]; + struct rw_semaphore lock; +}; + +enum cookie_mac_state { + INVALID_MAC, + VALID_MAC_BUT_NO_COOKIE, + VALID_MAC_WITH_COOKIE_BUT_RATELIMITED, + VALID_MAC_WITH_COOKIE +}; + +void cookie_checker_init(struct cookie_checker *checker, + struct wireguard_device *wg); +void cookie_checker_precompute_device_keys(struct cookie_checker *checker); +void cookie_checker_precompute_peer_keys(struct wireguard_peer *peer); +void cookie_init(struct cookie *cookie); + +enum cookie_mac_state cookie_validate_packet(struct cookie_checker *checker, + struct sk_buff *skb, + bool check_cookie); +void cookie_add_mac_to_packet(void *message, size_t len, + struct wireguard_peer *peer); + +void cookie_message_create(struct message_handshake_cookie *src, + struct sk_buff *skb, __le32 index, + struct cookie_checker *checker); +void cookie_message_consume(struct message_handshake_cookie *src, + struct wireguard_device *wg); + +#endif /* _WG_COOKIE_H */ diff --git a/drivers/net/wireguard/device.c b/drivers/net/wireguard/device.c new file mode 100644 index 000000000000..ba03ac6922ea --- /dev/null +++ b/drivers/net/wireguard/device.c @@ -0,0 +1,438 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "queueing.h" +#include "socket.h" +#include "timers.h" +#include "device.h" +#include "ratelimiter.h" +#include "peer.h" +#include "messages.h" + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +static LIST_HEAD(device_list); + +static int open(struct net_device *dev) +{ + struct in_device *dev_v4 = __in_dev_get_rtnl(dev); + struct wireguard_device *wg = netdev_priv(dev); + struct inet6_dev *dev_v6 = __in6_dev_get(dev); + struct wireguard_peer *peer; + int ret; + + if (dev_v4) { + /* At some point we might put this check near the ip_rt_send_ + * redirect call of ip_forward in net/ipv4/ip_forward.c, similar + * to the current secpath check. + */ + IN_DEV_CONF_SET(dev_v4, SEND_REDIRECTS, false); + IPV4_DEVCONF_ALL(dev_net(dev), SEND_REDIRECTS) = false; + } + if (dev_v6) + dev_v6->cnf.addr_gen_mode = IN6_ADDR_GEN_MODE_NONE; + + ret = socket_init(wg, wg->incoming_port); + if (ret < 0) + return ret; + mutex_lock(&wg->device_update_lock); + list_for_each_entry (peer, &wg->peer_list, peer_list) { + packet_send_staged_packets(peer); + if (peer->persistent_keepalive_interval) + packet_send_keepalive(peer); + } + mutex_unlock(&wg->device_update_lock); + return 0; +} + +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID) +static int pm_notification(struct notifier_block *nb, unsigned long action, + void *data) +{ + struct wireguard_device *wg; + struct wireguard_peer *peer; + + if (action != PM_HIBERNATION_PREPARE && action != PM_SUSPEND_PREPARE) + return 0; + + rtnl_lock(); + list_for_each_entry (wg, &device_list, device_list) { + mutex_lock(&wg->device_update_lock); + list_for_each_entry (peer, &wg->peer_list, peer_list) { + noise_handshake_clear(&peer->handshake); + noise_keypairs_clear(&peer->keypairs); + if (peer->timers_enabled) + del_timer(&peer->timer_zero_key_material); + } + mutex_unlock(&wg->device_update_lock); + } + rtnl_unlock(); + rcu_barrier_bh(); + return 0; +} +static struct notifier_block pm_notifier = { .notifier_call = pm_notification }; +#endif + +static int stop(struct net_device *dev) +{ + struct wireguard_device *wg = netdev_priv(dev); + struct wireguard_peer *peer; + + mutex_lock(&wg->device_update_lock); + list_for_each_entry (peer, &wg->peer_list, peer_list) { + skb_queue_purge(&peer->staged_packet_queue); + timers_stop(peer); + noise_handshake_clear(&peer->handshake); + noise_keypairs_clear(&peer->keypairs); + atomic64_set(&peer->last_sent_handshake, + ktime_get_boot_fast_ns() - + (u64)(REKEY_TIMEOUT + 1) * NSEC_PER_SEC); + } + mutex_unlock(&wg->device_update_lock); + skb_queue_purge(&wg->incoming_handshakes); + socket_reinit(wg, NULL, NULL); + return 0; +} + +static netdev_tx_t xmit(struct sk_buff *skb, struct net_device *dev) +{ + struct wireguard_device *wg = netdev_priv(dev); + struct wireguard_peer *peer; + struct sk_buff *next; + struct sk_buff_head packets; + sa_family_t family; + u32 mtu; + int ret; + + if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol)) { + ret = -EPROTONOSUPPORT; + net_dbg_ratelimited("%s: Invalid IP packet\n", dev->name); + goto err; + } + + peer = allowedips_lookup_dst(&wg->peer_allowedips, skb); + if (unlikely(!peer)) { + ret = -ENOKEY; + if (skb->protocol == htons(ETH_P_IP)) + net_dbg_ratelimited("%s: No peer has allowed IPs matching %pI4\n", + dev->name, &ip_hdr(skb)->daddr); + else if (skb->protocol == htons(ETH_P_IPV6)) + net_dbg_ratelimited("%s: No peer has allowed IPs matching %pI6\n", + dev->name, &ipv6_hdr(skb)->daddr); + goto err; + } + + family = READ_ONCE(peer->endpoint.addr.sa_family); + if (unlikely(family != AF_INET && family != AF_INET6)) { + ret = -EDESTADDRREQ; + net_dbg_ratelimited("%s: No valid endpoint has been configured or discovered for peer %llu\n", + dev->name, peer->internal_id); + goto err_peer; + } + + mtu = skb_dst(skb) ? dst_mtu(skb_dst(skb)) : dev->mtu; + + __skb_queue_head_init(&packets); + if (!skb_is_gso(skb)) + skb->next = NULL; + else { + struct sk_buff *segs = skb_gso_segment(skb, 0); + + if (unlikely(IS_ERR(segs))) { + ret = PTR_ERR(segs); + goto err_peer; + } + dev_kfree_skb(skb); + skb = segs; + } + do { + next = skb->next; + skb->next = skb->prev = NULL; + + skb = skb_share_check(skb, GFP_ATOMIC); + if (unlikely(!skb)) + continue; + + /* We only need to keep the original dst around for icmp, + * so at this point we're in a position to drop it. + */ + skb_dst_drop(skb); + + PACKET_CB(skb)->mtu = mtu; + + __skb_queue_tail(&packets, skb); + } while ((skb = next) != NULL); + + spin_lock_bh(&peer->staged_packet_queue.lock); + /* If the queue is getting too big, we start removing the oldest packets + * until it's small again. We do this before adding the new packet, so + * we don't remove GSO segments that are in excess. + */ + while (skb_queue_len(&peer->staged_packet_queue) > MAX_STAGED_PACKETS) + dev_kfree_skb(__skb_dequeue(&peer->staged_packet_queue)); + skb_queue_splice_tail(&packets, &peer->staged_packet_queue); + spin_unlock_bh(&peer->staged_packet_queue.lock); + + packet_send_staged_packets(peer); + + peer_put(peer); + return NETDEV_TX_OK; + +err_peer: + peer_put(peer); +err: + ++dev->stats.tx_errors; + if (skb->protocol == htons(ETH_P_IP)) + icmp_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0); + else if (skb->protocol == htons(ETH_P_IPV6)) + icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_ADDR_UNREACH, 0); + kfree_skb(skb); + return ret; +} + +static const struct net_device_ops netdev_ops = { + .ndo_open = open, + .ndo_stop = stop, + .ndo_start_xmit = xmit, + .ndo_get_stats64 = ip_tunnel_get_stats64 +}; + +static void destruct(struct net_device *dev) +{ + struct wireguard_device *wg = netdev_priv(dev); + + rtnl_lock(); + list_del(&wg->device_list); + rtnl_unlock(); + mutex_lock(&wg->device_update_lock); + wg->incoming_port = 0; + socket_reinit(wg, NULL, NULL); + allowedips_free(&wg->peer_allowedips, &wg->device_update_lock); + /* The final references are cleared in the below calls to destroy_workqueue. */ + peer_remove_all(wg); + destroy_workqueue(wg->handshake_receive_wq); + destroy_workqueue(wg->handshake_send_wq); + destroy_workqueue(wg->packet_crypt_wq); + packet_queue_free(&wg->decrypt_queue, true); + packet_queue_free(&wg->encrypt_queue, true); + rcu_barrier_bh(); /* Wait for all the peers to be actually freed. */ + ratelimiter_uninit(); + memzero_explicit(&wg->static_identity, sizeof(wg->static_identity)); + skb_queue_purge(&wg->incoming_handshakes); + free_percpu(dev->tstats); + free_percpu(wg->incoming_handshakes_worker); + if (wg->have_creating_net_ref) + put_net(wg->creating_net); + mutex_unlock(&wg->device_update_lock); + + pr_debug("%s: Interface deleted\n", dev->name); + free_netdev(dev); +} + +static const struct device_type device_type = { .name = KBUILD_MODNAME }; + +static void setup(struct net_device *dev) +{ + struct wireguard_device *wg = netdev_priv(dev); + enum { WG_NETDEV_FEATURES = NETIF_F_HW_CSUM | NETIF_F_RXCSUM | + NETIF_F_SG | NETIF_F_GSO | + NETIF_F_GSO_SOFTWARE | NETIF_F_HIGHDMA }; + + dev->netdev_ops = &netdev_ops; + dev->hard_header_len = 0; + dev->addr_len = 0; + dev->needed_headroom = DATA_PACKET_HEAD_ROOM; + dev->needed_tailroom = noise_encrypted_len(MESSAGE_PADDING_MULTIPLE); + dev->type = ARPHRD_NONE; + dev->flags = IFF_POINTOPOINT | IFF_NOARP; + dev->priv_flags |= IFF_NO_QUEUE; + dev->features |= NETIF_F_LLTX; + dev->features |= WG_NETDEV_FEATURES; + dev->hw_features |= WG_NETDEV_FEATURES; + dev->hw_enc_features |= WG_NETDEV_FEATURES; + dev->mtu = ETH_DATA_LEN - MESSAGE_MINIMUM_LENGTH - + sizeof(struct udphdr) - + max(sizeof(struct ipv6hdr), sizeof(struct iphdr)); + + SET_NETDEV_DEVTYPE(dev, &device_type); + + /* We need to keep the dst around in case of icmp replies. */ + netif_keep_dst(dev); + + memset(wg, 0, sizeof(*wg)); + wg->dev = dev; +} + +static int newlink(struct net *src_net, struct net_device *dev, + struct nlattr *tb[], struct nlattr *data[], + struct netlink_ext_ack *extack) +{ + int ret = -ENOMEM; + struct wireguard_device *wg = netdev_priv(dev); + + wg->creating_net = src_net; + init_rwsem(&wg->static_identity.lock); + mutex_init(&wg->socket_update_lock); + mutex_init(&wg->device_update_lock); + skb_queue_head_init(&wg->incoming_handshakes); + pubkey_hashtable_init(&wg->peer_hashtable); + index_hashtable_init(&wg->index_hashtable); + allowedips_init(&wg->peer_allowedips); + cookie_checker_init(&wg->cookie_checker, wg); + INIT_LIST_HEAD(&wg->peer_list); + wg->device_update_gen = 1; + + dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats); + if (!dev->tstats) + goto error_1; + + wg->incoming_handshakes_worker = packet_alloc_percpu_multicore_worker( + packet_handshake_receive_worker, wg); + if (!wg->incoming_handshakes_worker) + goto error_2; + + wg->handshake_receive_wq = alloc_workqueue("wg-kex-%s", + WQ_CPU_INTENSIVE | WQ_FREEZABLE, 0, dev->name); + if (!wg->handshake_receive_wq) + goto error_3; + + wg->handshake_send_wq = alloc_workqueue("wg-kex-%s", + WQ_UNBOUND | WQ_FREEZABLE, 0, dev->name); + if (!wg->handshake_send_wq) + goto error_4; + + wg->packet_crypt_wq = alloc_workqueue("wg-crypt-%s", + WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 0, dev->name); + if (!wg->packet_crypt_wq) + goto error_5; + + if (packet_queue_init(&wg->encrypt_queue, packet_encrypt_worker, true, + MAX_QUEUED_PACKETS) < 0) + goto error_6; + + if (packet_queue_init(&wg->decrypt_queue, packet_decrypt_worker, true, + MAX_QUEUED_PACKETS) < 0) + goto error_7; + + ret = ratelimiter_init(); + if (ret < 0) + goto error_8; + + ret = register_netdevice(dev); + if (ret < 0) + goto error_9; + + list_add(&wg->device_list, &device_list); + + /* We wait until the end to assign priv_destructor, so that + * register_netdevice doesn't call it for us if it fails. + */ + dev->priv_destructor = destruct; + + pr_debug("%s: Interface created\n", dev->name); + return ret; + +error_9: + ratelimiter_uninit(); +error_8: + packet_queue_free(&wg->decrypt_queue, true); +error_7: + packet_queue_free(&wg->encrypt_queue, true); +error_6: + destroy_workqueue(wg->packet_crypt_wq); +error_5: + destroy_workqueue(wg->handshake_send_wq); +error_4: + destroy_workqueue(wg->handshake_receive_wq); +error_3: + free_percpu(wg->incoming_handshakes_worker); +error_2: + free_percpu(dev->tstats); +error_1: + return ret; +} + +static struct rtnl_link_ops link_ops __read_mostly = { + .kind = KBUILD_MODNAME, + .priv_size = sizeof(struct wireguard_device), + .setup = setup, + .newlink = newlink, +}; + +static int netdevice_notification(struct notifier_block *nb, + unsigned long action, void *data) +{ + struct net_device *dev = ((struct netdev_notifier_info *)data)->dev; + struct wireguard_device *wg = netdev_priv(dev); + + ASSERT_RTNL(); + + if (action != NETDEV_REGISTER || dev->netdev_ops != &netdev_ops) + return 0; + + if (dev_net(dev) == wg->creating_net && wg->have_creating_net_ref) { + put_net(wg->creating_net); + wg->have_creating_net_ref = false; + } else if (dev_net(dev) != wg->creating_net && + !wg->have_creating_net_ref) { + wg->have_creating_net_ref = true; + get_net(wg->creating_net); + } + return 0; +} + +static struct notifier_block netdevice_notifier = { + .notifier_call = netdevice_notification +}; + +int __init device_init(void) +{ + int ret; + +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID) + ret = register_pm_notifier(&pm_notifier); + if (ret) + return ret; +#endif + + ret = register_netdevice_notifier(&netdevice_notifier); + if (ret) + goto error_pm; + + ret = rtnl_link_register(&link_ops); + if (ret) + goto error_netdevice; + + return 0; + +error_netdevice: + unregister_netdevice_notifier(&netdevice_notifier); +error_pm: +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID) + unregister_pm_notifier(&pm_notifier); +#endif + return ret; +} + +void device_uninit(void) +{ + rtnl_link_unregister(&link_ops); + unregister_netdevice_notifier(&netdevice_notifier); +#if defined(CONFIG_PM_SLEEP) && !defined(CONFIG_ANDROID) + unregister_pm_notifier(&pm_notifier); +#endif + rcu_barrier_bh(); +} diff --git a/drivers/net/wireguard/device.h b/drivers/net/wireguard/device.h new file mode 100644 index 000000000000..6589c9d0c36e --- /dev/null +++ b/drivers/net/wireguard/device.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_DEVICE_H +#define _WG_DEVICE_H + +#include "noise.h" +#include "allowedips.h" +#include "hashtables.h" +#include "cookie.h" + +#include +#include +#include +#include +#include +#include + +struct wireguard_device; + +struct multicore_worker { + void *ptr; + struct work_struct work; +}; + +struct crypt_queue { + struct ptr_ring ring; + union { + struct { + struct multicore_worker __percpu *worker; + int last_cpu; + }; + struct work_struct work; + }; +}; + +struct wireguard_device { + struct net_device *dev; + struct crypt_queue encrypt_queue, decrypt_queue; + struct sock __rcu *sock4, *sock6; + struct net *creating_net; + struct noise_static_identity static_identity; + struct workqueue_struct *handshake_receive_wq, *handshake_send_wq; + struct workqueue_struct *packet_crypt_wq; + struct sk_buff_head incoming_handshakes; + int incoming_handshake_cpu; + struct multicore_worker __percpu *incoming_handshakes_worker; + struct cookie_checker cookie_checker; + struct pubkey_hashtable peer_hashtable; + struct index_hashtable index_hashtable; + struct allowedips peer_allowedips; + struct mutex device_update_lock, socket_update_lock; + struct list_head device_list, peer_list; + unsigned int num_peers, device_update_gen; + u32 fwmark; + u16 incoming_port; + bool have_creating_net_ref; +}; + +int device_init(void); +void device_uninit(void); + +#endif /* _WG_DEVICE_H */ diff --git a/drivers/net/wireguard/hashtables.c b/drivers/net/wireguard/hashtables.c new file mode 100644 index 000000000000..afe17e91427e --- /dev/null +++ b/drivers/net/wireguard/hashtables.c @@ -0,0 +1,209 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "hashtables.h" +#include "peer.h" +#include "noise.h" + +static struct hlist_head *pubkey_bucket(struct pubkey_hashtable *table, + const u8 pubkey[NOISE_PUBLIC_KEY_LEN]) +{ + /* siphash gives us a secure 64bit number based on a random key. Since + * the bits are uniformly distributed, we can then mask off to get the + * bits we need. + */ + return &table->hashtable[ + siphash(pubkey, NOISE_PUBLIC_KEY_LEN, &table->key) & + (HASH_SIZE(table->hashtable) - 1)]; +} + +void pubkey_hashtable_init(struct pubkey_hashtable *table) +{ + get_random_bytes(&table->key, sizeof(table->key)); + hash_init(table->hashtable); + mutex_init(&table->lock); +} + +void pubkey_hashtable_add(struct pubkey_hashtable *table, + struct wireguard_peer *peer) +{ + mutex_lock(&table->lock); + hlist_add_head_rcu(&peer->pubkey_hash, + pubkey_bucket(table, peer->handshake.remote_static)); + mutex_unlock(&table->lock); +} + +void pubkey_hashtable_remove(struct pubkey_hashtable *table, + struct wireguard_peer *peer) +{ + mutex_lock(&table->lock); + hlist_del_init_rcu(&peer->pubkey_hash); + mutex_unlock(&table->lock); +} + +/* Returns a strong reference to a peer */ +struct wireguard_peer * +pubkey_hashtable_lookup(struct pubkey_hashtable *table, + const u8 pubkey[NOISE_PUBLIC_KEY_LEN]) +{ + struct wireguard_peer *iter_peer, *peer = NULL; + + rcu_read_lock_bh(); + hlist_for_each_entry_rcu_bh (iter_peer, pubkey_bucket(table, pubkey), + pubkey_hash) { + if (!memcmp(pubkey, iter_peer->handshake.remote_static, + NOISE_PUBLIC_KEY_LEN)) { + peer = iter_peer; + break; + } + } + peer = peer_get_maybe_zero(peer); + rcu_read_unlock_bh(); + return peer; +} + +static struct hlist_head *index_bucket(struct index_hashtable *table, + const __le32 index) +{ + /* Since the indices are random and thus all bits are uniformly + * distributed, we can find its bucket simply by masking. + */ + return &table->hashtable[(__force u32)index & + (HASH_SIZE(table->hashtable) - 1)]; +} + +void index_hashtable_init(struct index_hashtable *table) +{ + hash_init(table->hashtable); + spin_lock_init(&table->lock); +} + +/* At the moment, we limit ourselves to 2^20 total peers, which generally might + * amount to 2^20*3 items in this hashtable. The algorithm below works by + * picking a random number and testing it. We can see that these limits mean we + * usually succeed pretty quickly: + * + * >>> def calculation(tries, size): + * ... return (size / 2**32)**(tries - 1) * (1 - (size / 2**32)) + * ... + * >>> calculation(1, 2**20 * 3) + * 0.999267578125 + * >>> calculation(2, 2**20 * 3) + * 0.0007318854331970215 + * >>> calculation(3, 2**20 * 3) + * 5.360489012673497e-07 + * >>> calculation(4, 2**20 * 3) + * 3.9261394135792216e-10 + * + * At the moment, we don't do any masking, so this algorithm isn't exactly + * constant time in either the random guessing or in the hash list lookup. We + * could require a minimum of 3 tries, which would successfully mask the + * guessing. this would not, however, help with the growing hash lengths, which + * is another thing to consider moving forward. + */ + +__le32 index_hashtable_insert(struct index_hashtable *table, + struct index_hashtable_entry *entry) +{ + struct index_hashtable_entry *existing_entry; + + spin_lock_bh(&table->lock); + hlist_del_init_rcu(&entry->index_hash); + spin_unlock_bh(&table->lock); + + rcu_read_lock_bh(); + +search_unused_slot: + /* First we try to find an unused slot, randomly, while unlocked. */ + entry->index = (__force __le32)get_random_u32(); + hlist_for_each_entry_rcu_bh (existing_entry, + index_bucket(table, entry->index), + index_hash) { + if (existing_entry->index == entry->index) + /* If it's already in use, we continue searching. */ + goto search_unused_slot; + } + + /* Once we've found an unused slot, we lock it, and then double-check + * that nobody else stole it from us. + */ + spin_lock_bh(&table->lock); + hlist_for_each_entry_rcu_bh (existing_entry, + index_bucket(table, entry->index), + index_hash) { + if (existing_entry->index == entry->index) { + spin_unlock_bh(&table->lock); + /* If it was stolen, we start over. */ + goto search_unused_slot; + } + } + /* Otherwise, we know we have it exclusively (since we're locked), + * so we insert. + */ + hlist_add_head_rcu(&entry->index_hash, + index_bucket(table, entry->index)); + spin_unlock_bh(&table->lock); + + rcu_read_unlock_bh(); + + return entry->index; +} + +bool index_hashtable_replace(struct index_hashtable *table, + struct index_hashtable_entry *old, + struct index_hashtable_entry *new) +{ + if (unlikely(hlist_unhashed(&old->index_hash))) + return false; + spin_lock_bh(&table->lock); + new->index = old->index; + hlist_replace_rcu(&old->index_hash, &new->index_hash); + + /* Calling init here NULLs out index_hash, and in fact after this + * function returns, it's theoretically possible for this to get + * reinserted elsewhere. That means the RCU lookup below might either + * terminate early or jump between buckets, in which case the packet + * simply gets dropped, which isn't terrible. + */ + INIT_HLIST_NODE(&old->index_hash); + spin_unlock_bh(&table->lock); + return true; +} + +void index_hashtable_remove(struct index_hashtable *table, + struct index_hashtable_entry *entry) +{ + spin_lock_bh(&table->lock); + hlist_del_init_rcu(&entry->index_hash); + spin_unlock_bh(&table->lock); +} + +/* Returns a strong reference to a entry->peer */ +struct index_hashtable_entry * +index_hashtable_lookup(struct index_hashtable *table, + const enum index_hashtable_type type_mask, + const __le32 index, struct wireguard_peer **peer) +{ + struct index_hashtable_entry *iter_entry, *entry = NULL; + + rcu_read_lock_bh(); + hlist_for_each_entry_rcu_bh (iter_entry, index_bucket(table, index), + index_hash) { + if (iter_entry->index == index) { + if (likely(iter_entry->type & type_mask)) + entry = iter_entry; + break; + } + } + if (likely(entry)) { + entry->peer = peer_get_maybe_zero(entry->peer); + if (likely(entry->peer)) + *peer = entry->peer; + else + entry = NULL; + } + rcu_read_unlock_bh(); + return entry; +} diff --git a/drivers/net/wireguard/hashtables.h b/drivers/net/wireguard/hashtables.h new file mode 100644 index 000000000000..263a2a5f65af --- /dev/null +++ b/drivers/net/wireguard/hashtables.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_HASHTABLES_H +#define _WG_HASHTABLES_H + +#include "messages.h" + +#include +#include +#include + +struct wireguard_peer; + +struct pubkey_hashtable { + /* TODO: move to rhashtable */ + DECLARE_HASHTABLE(hashtable, 11); + siphash_key_t key; + struct mutex lock; +}; + +void pubkey_hashtable_init(struct pubkey_hashtable *table); +void pubkey_hashtable_add(struct pubkey_hashtable *table, + struct wireguard_peer *peer); +void pubkey_hashtable_remove(struct pubkey_hashtable *table, + struct wireguard_peer *peer); +struct wireguard_peer * +pubkey_hashtable_lookup(struct pubkey_hashtable *table, + const u8 pubkey[NOISE_PUBLIC_KEY_LEN]); + +struct index_hashtable { + /* TODO: move to rhashtable */ + DECLARE_HASHTABLE(hashtable, 13); + spinlock_t lock; +}; + +enum index_hashtable_type { + INDEX_HASHTABLE_HANDSHAKE = 1U << 0, + INDEX_HASHTABLE_KEYPAIR = 1U << 1 +}; + +struct index_hashtable_entry { + struct wireguard_peer *peer; + struct hlist_node index_hash; + enum index_hashtable_type type; + __le32 index; +}; +void index_hashtable_init(struct index_hashtable *table); +__le32 index_hashtable_insert(struct index_hashtable *table, + struct index_hashtable_entry *entry); +bool index_hashtable_replace(struct index_hashtable *table, + struct index_hashtable_entry *old, + struct index_hashtable_entry *new); +void index_hashtable_remove(struct index_hashtable *table, + struct index_hashtable_entry *entry); +struct index_hashtable_entry * +index_hashtable_lookup(struct index_hashtable *table, + const enum index_hashtable_type type_mask, + const __le32 index, struct wireguard_peer **peer); + +#endif /* _WG_HASHTABLES_H */ diff --git a/drivers/net/wireguard/main.c b/drivers/net/wireguard/main.c new file mode 100644 index 000000000000..7f902102f524 --- /dev/null +++ b/drivers/net/wireguard/main.c @@ -0,0 +1,65 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "version.h" +#include "device.h" +#include "noise.h" +#include "queueing.h" +#include "ratelimiter.h" +#include "netlink.h" + +#include + +#include +#include +#include +#include +#include + +static int __init mod_init(void) +{ + int ret; + +#ifdef DEBUG + if (!allowedips_selftest() || !packet_counter_selftest() || + !ratelimiter_selftest()) + return -ENOTRECOVERABLE; +#endif + noise_init(); + + ret = device_init(); + if (ret < 0) + goto err_device; + + ret = genetlink_init(); + if (ret < 0) + goto err_netlink; + + pr_info("WireGuard " WIREGUARD_VERSION " loaded. See www.wireguard.com for information.\n"); + pr_info("Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved.\n"); + + return 0; + +err_netlink: + device_uninit(); +err_device: + return ret; +} + +static void __exit mod_exit(void) +{ + genetlink_uninit(); + device_uninit(); + pr_debug("WireGuard unloaded\n"); +} + +module_init(mod_init); +module_exit(mod_exit); +MODULE_LICENSE("GPL v2"); +MODULE_DESCRIPTION("Fast, modern, and secure VPN tunnel"); +MODULE_AUTHOR("Jason A. Donenfeld "); +MODULE_VERSION(WIREGUARD_VERSION); +MODULE_ALIAS_RTNL_LINK(KBUILD_MODNAME); +MODULE_ALIAS_GENL_FAMILY(WG_GENL_NAME); diff --git a/drivers/net/wireguard/messages.h b/drivers/net/wireguard/messages.h new file mode 100644 index 000000000000..090e6f09c7df --- /dev/null +++ b/drivers/net/wireguard/messages.h @@ -0,0 +1,128 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_MESSAGES_H +#define _WG_MESSAGES_H + +#include +#include +#include + +#include +#include +#include + +enum noise_lengths { + NOISE_PUBLIC_KEY_LEN = CURVE25519_KEY_SIZE, + NOISE_SYMMETRIC_KEY_LEN = CHACHA20POLY1305_KEY_SIZE, + NOISE_TIMESTAMP_LEN = sizeof(u64) + sizeof(u32), + NOISE_AUTHTAG_LEN = CHACHA20POLY1305_AUTHTAG_SIZE, + NOISE_HASH_LEN = BLAKE2S_HASH_SIZE +}; + +#define noise_encrypted_len(plain_len) (plain_len + NOISE_AUTHTAG_LEN) + +enum cookie_values { + COOKIE_SECRET_MAX_AGE = 2 * 60, + COOKIE_SECRET_LATENCY = 5, + COOKIE_NONCE_LEN = XCHACHA20POLY1305_NONCE_SIZE, + COOKIE_LEN = 16 +}; + +enum counter_values { + COUNTER_BITS_TOTAL = 2048, + COUNTER_REDUNDANT_BITS = BITS_PER_LONG, + COUNTER_WINDOW_SIZE = COUNTER_BITS_TOTAL - COUNTER_REDUNDANT_BITS +}; + +enum limits { + REKEY_AFTER_MESSAGES = U64_MAX - 0xffff, + REJECT_AFTER_MESSAGES = U64_MAX - COUNTER_WINDOW_SIZE - 1, + REKEY_TIMEOUT = 5, + REKEY_TIMEOUT_JITTER_MAX_JIFFIES = HZ / 3, + REKEY_AFTER_TIME = 120, + REJECT_AFTER_TIME = 180, + INITIATIONS_PER_SECOND = 50, + MAX_PEERS_PER_DEVICE = 1U << 20, + KEEPALIVE_TIMEOUT = 10, + MAX_TIMER_HANDSHAKES = 90 / REKEY_TIMEOUT, + MAX_QUEUED_INCOMING_HANDSHAKES = 4096, /* TODO: replace this with DQL */ + MAX_STAGED_PACKETS = 128, + MAX_QUEUED_PACKETS = 1024 /* TODO: replace this with DQL */ +}; + +enum message_type { + MESSAGE_INVALID = 0, + MESSAGE_HANDSHAKE_INITIATION = 1, + MESSAGE_HANDSHAKE_RESPONSE = 2, + MESSAGE_HANDSHAKE_COOKIE = 3, + MESSAGE_DATA = 4 +}; + +struct message_header { + /* The actual layout of this that we want is: + * u8 type + * u8 reserved_zero[3] + * + * But it turns out that by encoding this as little endian, + * we achieve the same thing, and it makes checking faster. + */ + __le32 type; +}; + +struct message_macs { + u8 mac1[COOKIE_LEN]; + u8 mac2[COOKIE_LEN]; +}; + +struct message_handshake_initiation { + struct message_header header; + __le32 sender_index; + u8 unencrypted_ephemeral[NOISE_PUBLIC_KEY_LEN]; + u8 encrypted_static[noise_encrypted_len(NOISE_PUBLIC_KEY_LEN)]; + u8 encrypted_timestamp[noise_encrypted_len(NOISE_TIMESTAMP_LEN)]; + struct message_macs macs; +}; + +struct message_handshake_response { + struct message_header header; + __le32 sender_index; + __le32 receiver_index; + u8 unencrypted_ephemeral[NOISE_PUBLIC_KEY_LEN]; + u8 encrypted_nothing[noise_encrypted_len(0)]; + struct message_macs macs; +}; + +struct message_handshake_cookie { + struct message_header header; + __le32 receiver_index; + u8 nonce[COOKIE_NONCE_LEN]; + u8 encrypted_cookie[noise_encrypted_len(COOKIE_LEN)]; +}; + +struct message_data { + struct message_header header; + __le32 key_idx; + __le64 counter; + u8 encrypted_data[]; +}; + +#define message_data_len(plain_len) \ + (noise_encrypted_len(plain_len) + sizeof(struct message_data)) + +enum message_alignments { + MESSAGE_PADDING_MULTIPLE = 16, + MESSAGE_MINIMUM_LENGTH = message_data_len(0) +}; + +#define SKB_HEADER_LEN \ + (max(sizeof(struct iphdr), sizeof(struct ipv6hdr)) + \ + sizeof(struct udphdr) + NET_SKB_PAD) +#define DATA_PACKET_HEAD_ROOM \ + ALIGN(sizeof(struct message_data) + SKB_HEADER_LEN, 4) + +enum { HANDSHAKE_DSCP = 0x88 /* AF41, plus 00 ECN */ }; + +#endif /* _WG_MESSAGES_H */ diff --git a/drivers/net/wireguard/netlink.c b/drivers/net/wireguard/netlink.c new file mode 100644 index 000000000000..cb292b345c5c --- /dev/null +++ b/drivers/net/wireguard/netlink.c @@ -0,0 +1,606 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "netlink.h" +#include "device.h" +#include "peer.h" +#include "socket.h" +#include "queueing.h" +#include "messages.h" + +#include + +#include +#include +#include + +static struct genl_family genl_family; + +static const struct nla_policy device_policy[WGDEVICE_A_MAX + 1] = { + [WGDEVICE_A_IFINDEX] = { .type = NLA_U32 }, + [WGDEVICE_A_IFNAME] = { .type = NLA_NUL_STRING, .len = IFNAMSIZ - 1 }, + [WGDEVICE_A_PRIVATE_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, + [WGDEVICE_A_PUBLIC_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, + [WGDEVICE_A_FLAGS] = { .type = NLA_U32 }, + [WGDEVICE_A_LISTEN_PORT] = { .type = NLA_U16 }, + [WGDEVICE_A_FWMARK] = { .type = NLA_U32 }, + [WGDEVICE_A_PEERS] = { .type = NLA_NESTED } +}; + +static const struct nla_policy peer_policy[WGPEER_A_MAX + 1] = { + [WGPEER_A_PUBLIC_KEY] = { .len = NOISE_PUBLIC_KEY_LEN }, + [WGPEER_A_PRESHARED_KEY] = { .len = NOISE_SYMMETRIC_KEY_LEN }, + [WGPEER_A_FLAGS] = { .type = NLA_U32 }, + [WGPEER_A_ENDPOINT] = { .len = sizeof(struct sockaddr) }, + [WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL] = { .type = NLA_U16 }, + [WGPEER_A_LAST_HANDSHAKE_TIME] = { .len = sizeof(struct timespec) }, + [WGPEER_A_RX_BYTES] = { .type = NLA_U64 }, + [WGPEER_A_TX_BYTES] = { .type = NLA_U64 }, + [WGPEER_A_ALLOWEDIPS] = { .type = NLA_NESTED }, + [WGPEER_A_PROTOCOL_VERSION] = { .type = NLA_U32 } +}; + +static const struct nla_policy allowedip_policy[WGALLOWEDIP_A_MAX + 1] = { + [WGALLOWEDIP_A_FAMILY] = { .type = NLA_U16 }, + [WGALLOWEDIP_A_IPADDR] = { .len = sizeof(struct in_addr) }, + [WGALLOWEDIP_A_CIDR_MASK] = { .type = NLA_U8 } +}; + +static struct wireguard_device *lookup_interface(struct nlattr **attrs, + struct sk_buff *skb) +{ + struct net_device *dev = NULL; + + if (!attrs[WGDEVICE_A_IFINDEX] == !attrs[WGDEVICE_A_IFNAME]) + return ERR_PTR(-EBADR); + if (attrs[WGDEVICE_A_IFINDEX]) + dev = dev_get_by_index(sock_net(skb->sk), + nla_get_u32(attrs[WGDEVICE_A_IFINDEX])); + else if (attrs[WGDEVICE_A_IFNAME]) + dev = dev_get_by_name(sock_net(skb->sk), + nla_data(attrs[WGDEVICE_A_IFNAME])); + if (!dev) + return ERR_PTR(-ENODEV); + if (!dev->rtnl_link_ops || !dev->rtnl_link_ops->kind || + strcmp(dev->rtnl_link_ops->kind, KBUILD_MODNAME)) { + dev_put(dev); + return ERR_PTR(-EOPNOTSUPP); + } + return netdev_priv(dev); +} + +struct allowedips_ctx { + struct sk_buff *skb; + unsigned int i; +}; + +static int get_allowedips(void *ctx, const u8 *ip, u8 cidr, int family) +{ + struct allowedips_ctx *actx = ctx; + struct nlattr *allowedip_nest; + + allowedip_nest = nla_nest_start(actx->skb, actx->i++); + if (!allowedip_nest) + return -EMSGSIZE; + + if (nla_put_u8(actx->skb, WGALLOWEDIP_A_CIDR_MASK, cidr) || + nla_put_u16(actx->skb, WGALLOWEDIP_A_FAMILY, family) || + nla_put(actx->skb, WGALLOWEDIP_A_IPADDR, family == AF_INET6 ? + sizeof(struct in6_addr) : sizeof(struct in_addr), ip)) { + nla_nest_cancel(actx->skb, allowedip_nest); + return -EMSGSIZE; + } + + nla_nest_end(actx->skb, allowedip_nest); + return 0; +} + +static int get_peer(struct wireguard_peer *peer, unsigned int index, + struct allowedips_cursor *rt_cursor, struct sk_buff *skb) +{ + struct nlattr *allowedips_nest, *peer_nest = nla_nest_start(skb, index); + struct allowedips_ctx ctx = { .skb = skb }; + bool fail; + + if (!peer_nest) + return -EMSGSIZE; + + down_read(&peer->handshake.lock); + fail = nla_put(skb, WGPEER_A_PUBLIC_KEY, NOISE_PUBLIC_KEY_LEN, + peer->handshake.remote_static); + up_read(&peer->handshake.lock); + if (fail) + goto err; + + if (!rt_cursor->seq) { + down_read(&peer->handshake.lock); + fail = nla_put(skb, WGPEER_A_PRESHARED_KEY, + NOISE_SYMMETRIC_KEY_LEN, + peer->handshake.preshared_key); + up_read(&peer->handshake.lock); + if (fail) + goto err; + + if (nla_put(skb, WGPEER_A_LAST_HANDSHAKE_TIME, + sizeof(peer->walltime_last_handshake), + &peer->walltime_last_handshake) || + nla_put_u16(skb, WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, + peer->persistent_keepalive_interval) || + nla_put_u64_64bit(skb, WGPEER_A_TX_BYTES, peer->tx_bytes, + WGPEER_A_UNSPEC) || + nla_put_u64_64bit(skb, WGPEER_A_RX_BYTES, peer->rx_bytes, + WGPEER_A_UNSPEC) || + nla_put_u32(skb, WGPEER_A_PROTOCOL_VERSION, 1)) + goto err; + + read_lock_bh(&peer->endpoint_lock); + if (peer->endpoint.addr.sa_family == AF_INET) + fail = nla_put(skb, WGPEER_A_ENDPOINT, + sizeof(peer->endpoint.addr4), + &peer->endpoint.addr4); + else if (peer->endpoint.addr.sa_family == AF_INET6) + fail = nla_put(skb, WGPEER_A_ENDPOINT, + sizeof(peer->endpoint.addr6), + &peer->endpoint.addr6); + read_unlock_bh(&peer->endpoint_lock); + if (fail) + goto err; + } + + allowedips_nest = nla_nest_start(skb, WGPEER_A_ALLOWEDIPS); + if (!allowedips_nest) + goto err; + if (allowedips_walk_by_peer(&peer->device->peer_allowedips, rt_cursor, + peer, get_allowedips, &ctx, + &peer->device->device_update_lock)) { + nla_nest_end(skb, allowedips_nest); + nla_nest_end(skb, peer_nest); + return -EMSGSIZE; + } + memset(rt_cursor, 0, sizeof(*rt_cursor)); + nla_nest_end(skb, allowedips_nest); + nla_nest_end(skb, peer_nest); + return 0; +err: + nla_nest_cancel(skb, peer_nest); + return -EMSGSIZE; +} + +static int get_device_start(struct netlink_callback *cb) +{ + struct nlattr **attrs = genl_family_attrbuf(&genl_family); + struct wireguard_device *wg; + int ret; + + ret = nlmsg_parse(cb->nlh, GENL_HDRLEN + genl_family.hdrsize, attrs, + genl_family.maxattr, device_policy, NULL); + if (ret < 0) + return ret; + cb->args[2] = (long)kzalloc(sizeof(struct allowedips_cursor), + GFP_KERNEL); + if (unlikely(!cb->args[2])) + return -ENOMEM; + wg = lookup_interface(attrs, cb->skb); + if (IS_ERR(wg)) { + kfree((void *)cb->args[2]); + cb->args[2] = 0; + return PTR_ERR(wg); + } + cb->args[0] = (long)wg; + return 0; +} + +static int get_device_dump(struct sk_buff *skb, struct netlink_callback *cb) +{ + struct wireguard_peer *peer, *next_peer_cursor, *last_peer_cursor; + struct allowedips_cursor *rt_cursor; + struct wireguard_device *wg; + unsigned int peer_idx = 0; + struct nlattr *peers_nest; + int ret = -EMSGSIZE; + bool done = true; + void *hdr; + + wg = (struct wireguard_device *)cb->args[0]; + next_peer_cursor = (struct wireguard_peer *)cb->args[1]; + last_peer_cursor = (struct wireguard_peer *)cb->args[1]; + rt_cursor = (struct allowedips_cursor *)cb->args[2]; + + rtnl_lock(); + mutex_lock(&wg->device_update_lock); + cb->seq = wg->device_update_gen; + + hdr = genlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq, + &genl_family, NLM_F_MULTI, WG_CMD_GET_DEVICE); + if (!hdr) + goto out; + genl_dump_check_consistent(cb, hdr); + + if (!last_peer_cursor) { + if (nla_put_u16(skb, WGDEVICE_A_LISTEN_PORT, + wg->incoming_port) || + nla_put_u32(skb, WGDEVICE_A_FWMARK, wg->fwmark) || + nla_put_u32(skb, WGDEVICE_A_IFINDEX, wg->dev->ifindex) || + nla_put_string(skb, WGDEVICE_A_IFNAME, wg->dev->name)) + goto out; + + down_read(&wg->static_identity.lock); + if (wg->static_identity.has_identity) { + if (nla_put(skb, WGDEVICE_A_PRIVATE_KEY, + NOISE_PUBLIC_KEY_LEN, + wg->static_identity.static_private) || + nla_put(skb, WGDEVICE_A_PUBLIC_KEY, + NOISE_PUBLIC_KEY_LEN, + wg->static_identity.static_public)) { + up_read(&wg->static_identity.lock); + goto out; + } + } + up_read(&wg->static_identity.lock); + } + + peers_nest = nla_nest_start(skb, WGDEVICE_A_PEERS); + if (!peers_nest) + goto out; + ret = 0; + /* If the last cursor was removed via list_del_init in peer_remove, then + * we just treat this the same as there being no more peers left. The + * reason is that seq_nr should indicate to userspace that this isn't a + * coherent dump anyway, so they'll try again. + */ + if (list_empty(&wg->peer_list) || + (last_peer_cursor && list_empty(&last_peer_cursor->peer_list))) { + nla_nest_cancel(skb, peers_nest); + goto out; + } + lockdep_assert_held(&wg->device_update_lock); + peer = list_prepare_entry(last_peer_cursor, &wg->peer_list, peer_list); + list_for_each_entry_continue (peer, &wg->peer_list, peer_list) { + if (get_peer(peer, peer_idx++, rt_cursor, skb)) { + done = false; + break; + } + next_peer_cursor = peer; + } + nla_nest_end(skb, peers_nest); + +out: + if (!ret && !done && next_peer_cursor) + peer_get(next_peer_cursor); + peer_put(last_peer_cursor); + mutex_unlock(&wg->device_update_lock); + rtnl_unlock(); + + if (ret) { + genlmsg_cancel(skb, hdr); + return ret; + } + genlmsg_end(skb, hdr); + if (done) { + cb->args[1] = 0; + return 0; + } + cb->args[1] = (long)next_peer_cursor; + return skb->len; + + /* At this point, we can't really deal ourselves with safely zeroing out + * the private key material after usage. This will need an additional API + * in the kernel for marking skbs as zero_on_free. + */ +} + +static int get_device_done(struct netlink_callback *cb) +{ + struct wireguard_device *wg = (struct wireguard_device *)cb->args[0]; + struct wireguard_peer *peer = (struct wireguard_peer *)cb->args[1]; + struct allowedips_cursor *rt_cursor = + (struct allowedips_cursor *)cb->args[2]; + + if (wg) + dev_put(wg->dev); + kfree(rt_cursor); + peer_put(peer); + return 0; +} + +static int set_port(struct wireguard_device *wg, u16 port) +{ + struct wireguard_peer *peer; + + if (wg->incoming_port == port) + return 0; + list_for_each_entry (peer, &wg->peer_list, peer_list) + socket_clear_peer_endpoint_src(peer); + if (!netif_running(wg->dev)) { + wg->incoming_port = port; + return 0; + } + return socket_init(wg, port); +} + +static int set_allowedip(struct wireguard_peer *peer, struct nlattr **attrs) +{ + int ret = -EINVAL; + u16 family; + u8 cidr; + + if (!attrs[WGALLOWEDIP_A_FAMILY] || !attrs[WGALLOWEDIP_A_IPADDR] || + !attrs[WGALLOWEDIP_A_CIDR_MASK]) + return ret; + family = nla_get_u16(attrs[WGALLOWEDIP_A_FAMILY]); + cidr = nla_get_u8(attrs[WGALLOWEDIP_A_CIDR_MASK]); + + if (family == AF_INET && cidr <= 32 && + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in_addr)) + ret = allowedips_insert_v4( + &peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, + &peer->device->device_update_lock); + else if (family == AF_INET6 && cidr <= 128 && + nla_len(attrs[WGALLOWEDIP_A_IPADDR]) == sizeof(struct in6_addr)) + ret = allowedips_insert_v6( + &peer->device->peer_allowedips, + nla_data(attrs[WGALLOWEDIP_A_IPADDR]), cidr, peer, + &peer->device->device_update_lock); + + return ret; +} + +static int set_peer(struct wireguard_device *wg, struct nlattr **attrs) +{ + u8 *public_key = NULL, *preshared_key = NULL; + struct wireguard_peer *peer = NULL; + u32 flags = 0; + int ret; + + ret = -EINVAL; + if (attrs[WGPEER_A_PUBLIC_KEY] && + nla_len(attrs[WGPEER_A_PUBLIC_KEY]) == NOISE_PUBLIC_KEY_LEN) + public_key = nla_data(attrs[WGPEER_A_PUBLIC_KEY]); + else + goto out; + if (attrs[WGPEER_A_PRESHARED_KEY] && + nla_len(attrs[WGPEER_A_PRESHARED_KEY]) == NOISE_SYMMETRIC_KEY_LEN) + preshared_key = nla_data(attrs[WGPEER_A_PRESHARED_KEY]); + if (attrs[WGPEER_A_FLAGS]) + flags = nla_get_u32(attrs[WGPEER_A_FLAGS]); + + ret = -EPFNOSUPPORT; + if (attrs[WGPEER_A_PROTOCOL_VERSION]) { + if (nla_get_u32(attrs[WGPEER_A_PROTOCOL_VERSION]) != 1) + goto out; + } + + peer = pubkey_hashtable_lookup(&wg->peer_hashtable, + nla_data(attrs[WGPEER_A_PUBLIC_KEY])); + if (!peer) { /* Peer doesn't exist yet. Add a new one. */ + ret = -ENODEV; + if (flags & WGPEER_F_REMOVE_ME) + goto out; /* Tried to remove a non-existing peer. */ + + down_read(&wg->static_identity.lock); + if (wg->static_identity.has_identity && + !memcmp(nla_data(attrs[WGPEER_A_PUBLIC_KEY]), + wg->static_identity.static_public, + NOISE_PUBLIC_KEY_LEN)) { + /* We silently ignore peers that have the same public + * key as the device. The reason we do it silently is + * that we'd like for people to be able to reuse the + * same set of API calls across peers. + */ + up_read(&wg->static_identity.lock); + ret = 0; + goto out; + } + up_read(&wg->static_identity.lock); + + ret = -ENOMEM; + peer = peer_create(wg, public_key, preshared_key); + if (!peer) + goto out; + /* Take additional reference, as though we've just been + * looked up. + */ + peer_get(peer); + } + + ret = 0; + if (flags & WGPEER_F_REMOVE_ME) { + peer_remove(peer); + goto out; + } + + if (preshared_key) { + down_write(&peer->handshake.lock); + memcpy(&peer->handshake.preshared_key, preshared_key, + NOISE_SYMMETRIC_KEY_LEN); + up_write(&peer->handshake.lock); + } + + if (attrs[WGPEER_A_ENDPOINT]) { + struct sockaddr *addr = nla_data(attrs[WGPEER_A_ENDPOINT]); + size_t len = nla_len(attrs[WGPEER_A_ENDPOINT]); + + if ((len == sizeof(struct sockaddr_in) && + addr->sa_family == AF_INET) || + (len == sizeof(struct sockaddr_in6) && + addr->sa_family == AF_INET6)) { + struct endpoint endpoint = { { { 0 } } }; + + memcpy(&endpoint.addr, addr, len); + socket_set_peer_endpoint(peer, &endpoint); + } + } + + if (flags & WGPEER_F_REPLACE_ALLOWEDIPS) + allowedips_remove_by_peer(&wg->peer_allowedips, peer, + &wg->device_update_lock); + + if (attrs[WGPEER_A_ALLOWEDIPS]) { + struct nlattr *attr, *allowedip[WGALLOWEDIP_A_MAX + 1]; + int rem; + + nla_for_each_nested (attr, attrs[WGPEER_A_ALLOWEDIPS], rem) { + ret = nla_parse_nested(allowedip, WGALLOWEDIP_A_MAX, + attr, allowedip_policy, NULL); + if (ret < 0) + goto out; + ret = set_allowedip(peer, allowedip); + if (ret < 0) + goto out; + } + } + + if (attrs[WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL]) { + const u16 persistent_keepalive_interval = nla_get_u16( + attrs[WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL]); + const bool send_keepalive = + !peer->persistent_keepalive_interval && + persistent_keepalive_interval && + netif_running(wg->dev); + + peer->persistent_keepalive_interval = persistent_keepalive_interval; + if (send_keepalive) + packet_send_keepalive(peer); + } + + if (netif_running(wg->dev)) + packet_send_staged_packets(peer); + +out: + peer_put(peer); + if (attrs[WGPEER_A_PRESHARED_KEY]) + memzero_explicit(nla_data(attrs[WGPEER_A_PRESHARED_KEY]), + nla_len(attrs[WGPEER_A_PRESHARED_KEY])); + return ret; +} + +static int set_device(struct sk_buff *skb, struct genl_info *info) +{ + struct wireguard_device *wg = lookup_interface(info->attrs, skb); + int ret; + + if (IS_ERR(wg)) { + ret = PTR_ERR(wg); + goto out_nodev; + } + + rtnl_lock(); + mutex_lock(&wg->device_update_lock); + ++wg->device_update_gen; + + if (info->attrs[WGDEVICE_A_FWMARK]) { + struct wireguard_peer *peer; + + wg->fwmark = nla_get_u32(info->attrs[WGDEVICE_A_FWMARK]); + list_for_each_entry (peer, &wg->peer_list, peer_list) + socket_clear_peer_endpoint_src(peer); + } + + if (info->attrs[WGDEVICE_A_LISTEN_PORT]) { + ret = set_port(wg, + nla_get_u16(info->attrs[WGDEVICE_A_LISTEN_PORT])); + if (ret) + goto out; + } + + if (info->attrs[WGDEVICE_A_FLAGS] && + nla_get_u32(info->attrs[WGDEVICE_A_FLAGS]) & + WGDEVICE_F_REPLACE_PEERS) + peer_remove_all(wg); + + if (info->attrs[WGDEVICE_A_PRIVATE_KEY] && + nla_len(info->attrs[WGDEVICE_A_PRIVATE_KEY]) == + NOISE_PUBLIC_KEY_LEN) { + u8 *private_key = nla_data(info->attrs[WGDEVICE_A_PRIVATE_KEY]); + u8 public_key[NOISE_PUBLIC_KEY_LEN]; + struct wireguard_peer *peer, *temp; + + /* We remove before setting, to prevent race, which means doing + * two 25519-genpub ops. + */ + if (curve25519_generate_public(public_key, private_key)) { + peer = pubkey_hashtable_lookup(&wg->peer_hashtable, + public_key); + if (peer) { + peer_put(peer); + peer_remove(peer); + } + } + + down_write(&wg->static_identity.lock); + noise_set_static_identity_private_key(&wg->static_identity, + private_key); + list_for_each_entry_safe (peer, temp, &wg->peer_list, + peer_list) { + if (!noise_precompute_static_static(peer)) + peer_remove(peer); + } + cookie_checker_precompute_device_keys(&wg->cookie_checker); + up_write(&wg->static_identity.lock); + } + + if (info->attrs[WGDEVICE_A_PEERS]) { + struct nlattr *attr, *peer[WGPEER_A_MAX + 1]; + int rem; + + nla_for_each_nested (attr, info->attrs[WGDEVICE_A_PEERS], rem) { + ret = nla_parse_nested(peer, WGPEER_A_MAX, attr, + peer_policy, NULL); + if (ret < 0) + goto out; + ret = set_peer(wg, peer); + if (ret < 0) + goto out; + } + } + ret = 0; + +out: + mutex_unlock(&wg->device_update_lock); + rtnl_unlock(); + dev_put(wg->dev); +out_nodev: + if (info->attrs[WGDEVICE_A_PRIVATE_KEY]) + memzero_explicit(nla_data(info->attrs[WGDEVICE_A_PRIVATE_KEY]), + nla_len(info->attrs[WGDEVICE_A_PRIVATE_KEY])); + return ret; +} + +static const struct genl_ops genl_ops[] = { + { + .cmd = WG_CMD_GET_DEVICE, + .start = get_device_start, + .dumpit = get_device_dump, + .done = get_device_done, + .policy = device_policy, + .flags = GENL_UNS_ADMIN_PERM + }, { + .cmd = WG_CMD_SET_DEVICE, + .doit = set_device, + .policy = device_policy, + .flags = GENL_UNS_ADMIN_PERM + } +}; + +static struct genl_family genl_family __ro_after_init = { + .ops = genl_ops, + .n_ops = ARRAY_SIZE(genl_ops), + .name = WG_GENL_NAME, + .version = WG_GENL_VERSION, + .maxattr = WGDEVICE_A_MAX, + .module = THIS_MODULE, + .netnsok = true +}; + +int __init genetlink_init(void) +{ + return genl_register_family(&genl_family); +} + +void __exit genetlink_uninit(void) +{ + genl_unregister_family(&genl_family); +} diff --git a/drivers/net/wireguard/netlink.h b/drivers/net/wireguard/netlink.h new file mode 100644 index 000000000000..657fe1ad3786 --- /dev/null +++ b/drivers/net/wireguard/netlink.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_NETLINK_H +#define _WG_NETLINK_H + +int genetlink_init(void); +void genetlink_uninit(void); + +#endif /* _WG_NETLINK_H */ diff --git a/drivers/net/wireguard/noise.c b/drivers/net/wireguard/noise.c new file mode 100644 index 000000000000..7ab389091c80 --- /dev/null +++ b/drivers/net/wireguard/noise.c @@ -0,0 +1,784 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "noise.h" +#include "device.h" +#include "peer.h" +#include "messages.h" +#include "queueing.h" +#include "hashtables.h" + +#include +#include +#include +#include +#include +#include + +/* This implements Noise_IKpsk2: + * + * <- s + * ****** + * -> e, es, s, ss, {t} + * <- e, ee, se, psk, {} + */ + +static const u8 handshake_name[37] = "Noise_IKpsk2_25519_ChaChaPoly_BLAKE2s"; +static const u8 identifier_name[34] = "WireGuard v1 zx2c4 Jason@zx2c4.com"; +static u8 handshake_init_hash[NOISE_HASH_LEN] __ro_after_init; +static u8 handshake_init_chaining_key[NOISE_HASH_LEN] __ro_after_init; +static atomic64_t keypair_counter = ATOMIC64_INIT(0); + +void __init noise_init(void) +{ + struct blake2s_state blake; + + blake2s(handshake_init_chaining_key, handshake_name, NULL, + NOISE_HASH_LEN, sizeof(handshake_name), 0); + blake2s_init(&blake, NOISE_HASH_LEN); + blake2s_update(&blake, handshake_init_chaining_key, NOISE_HASH_LEN); + blake2s_update(&blake, identifier_name, sizeof(identifier_name)); + blake2s_final(&blake, handshake_init_hash, NOISE_HASH_LEN); +} + +/* Must hold peer->handshake.static_identity->lock */ +bool noise_precompute_static_static(struct wireguard_peer *peer) +{ + bool ret = true; + + down_write(&peer->handshake.lock); + if (peer->handshake.static_identity->has_identity) + ret = curve25519( + peer->handshake.precomputed_static_static, + peer->handshake.static_identity->static_private, + peer->handshake.remote_static); + else + memset(peer->handshake.precomputed_static_static, 0, + NOISE_PUBLIC_KEY_LEN); + up_write(&peer->handshake.lock); + return ret; +} + +bool noise_handshake_init(struct noise_handshake *handshake, + struct noise_static_identity *static_identity, + const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN], + const u8 peer_preshared_key[NOISE_SYMMETRIC_KEY_LEN], + struct wireguard_peer *peer) +{ + memset(handshake, 0, sizeof(*handshake)); + init_rwsem(&handshake->lock); + handshake->entry.type = INDEX_HASHTABLE_HANDSHAKE; + handshake->entry.peer = peer; + memcpy(handshake->remote_static, peer_public_key, NOISE_PUBLIC_KEY_LEN); + if (peer_preshared_key) + memcpy(handshake->preshared_key, peer_preshared_key, + NOISE_SYMMETRIC_KEY_LEN); + handshake->static_identity = static_identity; + handshake->state = HANDSHAKE_ZEROED; + return noise_precompute_static_static(peer); +} + +static void handshake_zero(struct noise_handshake *handshake) +{ + memset(&handshake->ephemeral_private, 0, NOISE_PUBLIC_KEY_LEN); + memset(&handshake->remote_ephemeral, 0, NOISE_PUBLIC_KEY_LEN); + memset(&handshake->hash, 0, NOISE_HASH_LEN); + memset(&handshake->chaining_key, 0, NOISE_HASH_LEN); + handshake->remote_index = 0; + handshake->state = HANDSHAKE_ZEROED; +} + +void noise_handshake_clear(struct noise_handshake *handshake) +{ + index_hashtable_remove(&handshake->entry.peer->device->index_hashtable, + &handshake->entry); + down_write(&handshake->lock); + handshake_zero(handshake); + up_write(&handshake->lock); + index_hashtable_remove(&handshake->entry.peer->device->index_hashtable, + &handshake->entry); +} + +static struct noise_keypair *keypair_create(struct wireguard_peer *peer) +{ + struct noise_keypair *keypair = kzalloc(sizeof(*keypair), GFP_KERNEL); + + if (unlikely(!keypair)) + return NULL; + keypair->internal_id = atomic64_inc_return(&keypair_counter); + keypair->entry.type = INDEX_HASHTABLE_KEYPAIR; + keypair->entry.peer = peer; + kref_init(&keypair->refcount); + return keypair; +} + +static void keypair_free_rcu(struct rcu_head *rcu) +{ + kzfree(container_of(rcu, struct noise_keypair, rcu)); +} + +static void keypair_free_kref(struct kref *kref) +{ + struct noise_keypair *keypair = + container_of(kref, struct noise_keypair, refcount); + net_dbg_ratelimited("%s: Keypair %llu destroyed for peer %llu\n", + keypair->entry.peer->device->dev->name, + keypair->internal_id, + keypair->entry.peer->internal_id); + index_hashtable_remove(&keypair->entry.peer->device->index_hashtable, + &keypair->entry); + call_rcu_bh(&keypair->rcu, keypair_free_rcu); +} + +void noise_keypair_put(struct noise_keypair *keypair, bool unreference_now) +{ + if (unlikely(!keypair)) + return; + if (unlikely(unreference_now)) + index_hashtable_remove( + &keypair->entry.peer->device->index_hashtable, + &keypair->entry); + kref_put(&keypair->refcount, keypair_free_kref); +} + +struct noise_keypair *noise_keypair_get(struct noise_keypair *keypair) +{ + RCU_LOCKDEP_WARN(!rcu_read_lock_bh_held(), + "Taking noise keypair reference without holding the RCU BH read lock"); + if (unlikely(!keypair || !kref_get_unless_zero(&keypair->refcount))) + return NULL; + return keypair; +} + +void noise_keypairs_clear(struct noise_keypairs *keypairs) +{ + struct noise_keypair *old; + + spin_lock_bh(&keypairs->keypair_update_lock); + old = rcu_dereference_protected(keypairs->previous_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + RCU_INIT_POINTER(keypairs->previous_keypair, NULL); + noise_keypair_put(old, true); + old = rcu_dereference_protected(keypairs->next_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + RCU_INIT_POINTER(keypairs->next_keypair, NULL); + noise_keypair_put(old, true); + old = rcu_dereference_protected(keypairs->current_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + RCU_INIT_POINTER(keypairs->current_keypair, NULL); + noise_keypair_put(old, true); + spin_unlock_bh(&keypairs->keypair_update_lock); +} + +static void add_new_keypair(struct noise_keypairs *keypairs, + struct noise_keypair *new_keypair) +{ + struct noise_keypair *previous_keypair, *next_keypair, *current_keypair; + + spin_lock_bh(&keypairs->keypair_update_lock); + previous_keypair = rcu_dereference_protected(keypairs->previous_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + next_keypair = rcu_dereference_protected(keypairs->next_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + current_keypair = rcu_dereference_protected(keypairs->current_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + if (new_keypair->i_am_the_initiator) { + /* If we're the initiator, it means we've sent a handshake, and + * received a confirmation response, which means this new + * keypair can now be used. + */ + if (next_keypair) { + /* If there already was a next keypair pending, we + * demote it to be the previous keypair, and free the + * existing current. Note that this means KCI can result + * in this transition. It would perhaps be more sound to + * always just get rid of the unused next keypair + * instead of putting it in the previous slot, but this + * might be a bit less robust. Something to think about + * for the future. + */ + RCU_INIT_POINTER(keypairs->next_keypair, NULL); + rcu_assign_pointer(keypairs->previous_keypair, + next_keypair); + noise_keypair_put(current_keypair, true); + } else /* If there wasn't an existing next keypair, we replace + * the previous with the current one. + */ + rcu_assign_pointer(keypairs->previous_keypair, + current_keypair); + /* At this point we can get rid of the old previous keypair, and + * set up the new keypair. + */ + noise_keypair_put(previous_keypair, true); + rcu_assign_pointer(keypairs->current_keypair, new_keypair); + } else { + /* If we're the responder, it means we can't use the new keypair + * until we receive confirmation via the first data packet, so + * we get rid of the existing previous one, the possibly + * existing next one, and slide in the new next one. + */ + rcu_assign_pointer(keypairs->next_keypair, new_keypair); + noise_keypair_put(next_keypair, true); + RCU_INIT_POINTER(keypairs->previous_keypair, NULL); + noise_keypair_put(previous_keypair, true); + } + spin_unlock_bh(&keypairs->keypair_update_lock); +} + +bool noise_received_with_keypair(struct noise_keypairs *keypairs, + struct noise_keypair *received_keypair) +{ + struct noise_keypair *old_keypair; + bool key_is_new; + + /* We first check without taking the spinlock. */ + key_is_new = received_keypair == + rcu_access_pointer(keypairs->next_keypair); + if (likely(!key_is_new)) + return false; + + spin_lock_bh(&keypairs->keypair_update_lock); + /* After locking, we double check that things didn't change from + * beneath us. + */ + if (unlikely(received_keypair != + rcu_dereference_protected(keypairs->next_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)))) { + spin_unlock_bh(&keypairs->keypair_update_lock); + return false; + } + + /* When we've finally received the confirmation, we slide the next + * into the current, the current into the previous, and get rid of + * the old previous. + */ + old_keypair = rcu_dereference_protected(keypairs->previous_keypair, + lockdep_is_held(&keypairs->keypair_update_lock)); + rcu_assign_pointer(keypairs->previous_keypair, + rcu_dereference_protected(keypairs->current_keypair, + lockdep_is_held(&keypairs->keypair_update_lock))); + noise_keypair_put(old_keypair, true); + rcu_assign_pointer(keypairs->current_keypair, received_keypair); + RCU_INIT_POINTER(keypairs->next_keypair, NULL); + + spin_unlock_bh(&keypairs->keypair_update_lock); + return true; +} + +/* Must hold static_identity->lock */ +void noise_set_static_identity_private_key( + struct noise_static_identity *static_identity, + const u8 private_key[NOISE_PUBLIC_KEY_LEN]) +{ + memcpy(static_identity->static_private, private_key, + NOISE_PUBLIC_KEY_LEN); + static_identity->has_identity = curve25519_generate_public( + static_identity->static_public, private_key); +} + +/* This is Hugo Krawczyk's HKDF: + * - https://eprint.iacr.org/2010/264.pdf + * - https://tools.ietf.org/html/rfc5869 + */ +static void kdf(u8 *first_dst, u8 *second_dst, u8 *third_dst, const u8 *data, + size_t first_len, size_t second_len, size_t third_len, + size_t data_len, const u8 chaining_key[NOISE_HASH_LEN]) +{ + u8 output[BLAKE2S_HASH_SIZE + 1]; + u8 secret[BLAKE2S_HASH_SIZE]; + +#ifdef DEBUG + BUG_ON(first_len > BLAKE2S_HASH_SIZE || second_len > BLAKE2S_HASH_SIZE || + third_len > BLAKE2S_HASH_SIZE || + ((second_len || second_dst || third_len || third_dst) && + (!first_len || !first_dst)) || + ((third_len || third_dst) && (!second_len || !second_dst))); +#endif + + /* Extract entropy from data into secret */ + blake2s_hmac(secret, data, chaining_key, BLAKE2S_HASH_SIZE, data_len, + NOISE_HASH_LEN); + + if (!first_dst || !first_len) + goto out; + + /* Expand first key: key = secret, data = 0x1 */ + output[0] = 1; + blake2s_hmac(output, output, secret, BLAKE2S_HASH_SIZE, 1, + BLAKE2S_HASH_SIZE); + memcpy(first_dst, output, first_len); + + if (!second_dst || !second_len) + goto out; + + /* Expand second key: key = secret, data = first-key || 0x2 */ + output[BLAKE2S_HASH_SIZE] = 2; + blake2s_hmac(output, output, secret, BLAKE2S_HASH_SIZE, + BLAKE2S_HASH_SIZE + 1, BLAKE2S_HASH_SIZE); + memcpy(second_dst, output, second_len); + + if (!third_dst || !third_len) + goto out; + + /* Expand third key: key = secret, data = second-key || 0x3 */ + output[BLAKE2S_HASH_SIZE] = 3; + blake2s_hmac(output, output, secret, BLAKE2S_HASH_SIZE, + BLAKE2S_HASH_SIZE + 1, BLAKE2S_HASH_SIZE); + memcpy(third_dst, output, third_len); + +out: + /* Clear sensitive data from stack */ + memzero_explicit(secret, BLAKE2S_HASH_SIZE); + memzero_explicit(output, BLAKE2S_HASH_SIZE + 1); +} + +static void symmetric_key_init(struct noise_symmetric_key *key) +{ + spin_lock_init(&key->counter.receive.lock); + atomic64_set(&key->counter.counter, 0); + memset(key->counter.receive.backtrack, 0, + sizeof(key->counter.receive.backtrack)); + key->birthdate = ktime_get_boot_fast_ns(); + key->is_valid = true; +} + +static void derive_keys(struct noise_symmetric_key *first_dst, + struct noise_symmetric_key *second_dst, + const u8 chaining_key[NOISE_HASH_LEN]) +{ + kdf(first_dst->key, second_dst->key, NULL, NULL, + NOISE_SYMMETRIC_KEY_LEN, NOISE_SYMMETRIC_KEY_LEN, 0, 0, + chaining_key); + symmetric_key_init(first_dst); + symmetric_key_init(second_dst); +} + +static bool __must_check mix_dh(u8 chaining_key[NOISE_HASH_LEN], + u8 key[NOISE_SYMMETRIC_KEY_LEN], + const u8 private[NOISE_PUBLIC_KEY_LEN], + const u8 public[NOISE_PUBLIC_KEY_LEN]) +{ + u8 dh_calculation[NOISE_PUBLIC_KEY_LEN]; + + if (unlikely(!curve25519(dh_calculation, private, public))) + return false; + kdf(chaining_key, key, NULL, dh_calculation, NOISE_HASH_LEN, + NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN, chaining_key); + memzero_explicit(dh_calculation, NOISE_PUBLIC_KEY_LEN); + return true; +} + +static void mix_hash(u8 hash[NOISE_HASH_LEN], const u8 *src, size_t src_len) +{ + struct blake2s_state blake; + + blake2s_init(&blake, NOISE_HASH_LEN); + blake2s_update(&blake, hash, NOISE_HASH_LEN); + blake2s_update(&blake, src, src_len); + blake2s_final(&blake, hash, NOISE_HASH_LEN); +} + +static void mix_psk(u8 chaining_key[NOISE_HASH_LEN], u8 hash[NOISE_HASH_LEN], + u8 key[NOISE_SYMMETRIC_KEY_LEN], + const u8 psk[NOISE_SYMMETRIC_KEY_LEN]) +{ + u8 temp_hash[NOISE_HASH_LEN]; + + kdf(chaining_key, temp_hash, key, psk, NOISE_HASH_LEN, NOISE_HASH_LEN, + NOISE_SYMMETRIC_KEY_LEN, NOISE_SYMMETRIC_KEY_LEN, chaining_key); + mix_hash(hash, temp_hash, NOISE_HASH_LEN); + memzero_explicit(temp_hash, NOISE_HASH_LEN); +} + +static void handshake_init(u8 chaining_key[NOISE_HASH_LEN], + u8 hash[NOISE_HASH_LEN], + const u8 remote_static[NOISE_PUBLIC_KEY_LEN]) +{ + memcpy(hash, handshake_init_hash, NOISE_HASH_LEN); + memcpy(chaining_key, handshake_init_chaining_key, NOISE_HASH_LEN); + mix_hash(hash, remote_static, NOISE_PUBLIC_KEY_LEN); +} + +static void message_encrypt(u8 *dst_ciphertext, const u8 *src_plaintext, + size_t src_len, u8 key[NOISE_SYMMETRIC_KEY_LEN], + u8 hash[NOISE_HASH_LEN]) +{ + chacha20poly1305_encrypt(dst_ciphertext, src_plaintext, src_len, hash, + NOISE_HASH_LEN, + 0 /* Always zero for Noise_IK */, key); + mix_hash(hash, dst_ciphertext, noise_encrypted_len(src_len)); +} + +static bool message_decrypt(u8 *dst_plaintext, const u8 *src_ciphertext, + size_t src_len, u8 key[NOISE_SYMMETRIC_KEY_LEN], + u8 hash[NOISE_HASH_LEN]) +{ + if (!chacha20poly1305_decrypt(dst_plaintext, src_ciphertext, src_len, + hash, NOISE_HASH_LEN, + 0 /* Always zero for Noise_IK */, key)) + return false; + mix_hash(hash, src_ciphertext, src_len); + return true; +} + +static void message_ephemeral(u8 ephemeral_dst[NOISE_PUBLIC_KEY_LEN], + const u8 ephemeral_src[NOISE_PUBLIC_KEY_LEN], + u8 chaining_key[NOISE_HASH_LEN], + u8 hash[NOISE_HASH_LEN]) +{ + if (ephemeral_dst != ephemeral_src) + memcpy(ephemeral_dst, ephemeral_src, NOISE_PUBLIC_KEY_LEN); + mix_hash(hash, ephemeral_src, NOISE_PUBLIC_KEY_LEN); + kdf(chaining_key, NULL, NULL, ephemeral_src, NOISE_HASH_LEN, 0, 0, + NOISE_PUBLIC_KEY_LEN, chaining_key); +} + +static void tai64n_now(u8 output[NOISE_TIMESTAMP_LEN]) +{ + struct timespec64 now; + + getnstimeofday64(&now); + /* https://cr.yp.to/libtai/tai64.html */ + *(__be64 *)output = cpu_to_be64(0x400000000000000aULL + now.tv_sec); + *(__be32 *)(output + sizeof(__be64)) = cpu_to_be32(now.tv_nsec); +} + +bool noise_handshake_create_initiation(struct message_handshake_initiation *dst, + struct noise_handshake *handshake) +{ + u8 timestamp[NOISE_TIMESTAMP_LEN]; + u8 key[NOISE_SYMMETRIC_KEY_LEN]; + bool ret = false; + + /* We need to wait for crng _before_ taking any locks, since + * curve25519_generate_secret uses get_random_bytes_wait. + */ + wait_for_random_bytes(); + + down_read(&handshake->static_identity->lock); + down_write(&handshake->lock); + + if (unlikely(!handshake->static_identity->has_identity)) + goto out; + + dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION); + + handshake_init(handshake->chaining_key, handshake->hash, + handshake->remote_static); + + /* e */ + curve25519_generate_secret(handshake->ephemeral_private); + if (!curve25519_generate_public(dst->unencrypted_ephemeral, + handshake->ephemeral_private)) + goto out; + message_ephemeral(dst->unencrypted_ephemeral, + dst->unencrypted_ephemeral, handshake->chaining_key, + handshake->hash); + + /* es */ + if (!mix_dh(handshake->chaining_key, key, handshake->ephemeral_private, + handshake->remote_static)) + goto out; + + /* s */ + message_encrypt(dst->encrypted_static, + handshake->static_identity->static_public, + NOISE_PUBLIC_KEY_LEN, key, handshake->hash); + + /* ss */ + kdf(handshake->chaining_key, key, NULL, + handshake->precomputed_static_static, NOISE_HASH_LEN, + NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN, + handshake->chaining_key); + + /* {t} */ + tai64n_now(timestamp); + message_encrypt(dst->encrypted_timestamp, timestamp, + NOISE_TIMESTAMP_LEN, key, handshake->hash); + + dst->sender_index = index_hashtable_insert( + &handshake->entry.peer->device->index_hashtable, + &handshake->entry); + + handshake->state = HANDSHAKE_CREATED_INITIATION; + ret = true; + +out: + up_write(&handshake->lock); + up_read(&handshake->static_identity->lock); + memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN); + return ret; +} + +struct wireguard_peer * +noise_handshake_consume_initiation(struct message_handshake_initiation *src, + struct wireguard_device *wg) +{ + struct wireguard_peer *peer = NULL, *ret_peer = NULL; + struct noise_handshake *handshake; + bool replay_attack, flood_attack; + u8 key[NOISE_SYMMETRIC_KEY_LEN]; + u8 chaining_key[NOISE_HASH_LEN]; + u8 hash[NOISE_HASH_LEN]; + u8 s[NOISE_PUBLIC_KEY_LEN]; + u8 e[NOISE_PUBLIC_KEY_LEN]; + u8 t[NOISE_TIMESTAMP_LEN]; + + down_read(&wg->static_identity.lock); + if (unlikely(!wg->static_identity.has_identity)) + goto out; + + handshake_init(chaining_key, hash, wg->static_identity.static_public); + + /* e */ + message_ephemeral(e, src->unencrypted_ephemeral, chaining_key, hash); + + /* es */ + if (!mix_dh(chaining_key, key, wg->static_identity.static_private, e)) + goto out; + + /* s */ + if (!message_decrypt(s, src->encrypted_static, + sizeof(src->encrypted_static), key, hash)) + goto out; + + /* Lookup which peer we're actually talking to */ + peer = pubkey_hashtable_lookup(&wg->peer_hashtable, s); + if (!peer) + goto out; + handshake = &peer->handshake; + + /* ss */ + kdf(chaining_key, key, NULL, handshake->precomputed_static_static, + NOISE_HASH_LEN, NOISE_SYMMETRIC_KEY_LEN, 0, NOISE_PUBLIC_KEY_LEN, + chaining_key); + + /* {t} */ + if (!message_decrypt(t, src->encrypted_timestamp, + sizeof(src->encrypted_timestamp), key, hash)) + goto out; + + down_read(&handshake->lock); + replay_attack = memcmp(t, handshake->latest_timestamp, + NOISE_TIMESTAMP_LEN) <= 0; + flood_attack = handshake->last_initiation_consumption + + NSEC_PER_SEC / INITIATIONS_PER_SECOND > + ktime_get_boot_fast_ns(); + up_read(&handshake->lock); + if (replay_attack || flood_attack) + goto out; + + /* Success! Copy everything to peer */ + down_write(&handshake->lock); + memcpy(handshake->remote_ephemeral, e, NOISE_PUBLIC_KEY_LEN); + memcpy(handshake->latest_timestamp, t, NOISE_TIMESTAMP_LEN); + memcpy(handshake->hash, hash, NOISE_HASH_LEN); + memcpy(handshake->chaining_key, chaining_key, NOISE_HASH_LEN); + handshake->remote_index = src->sender_index; + handshake->last_initiation_consumption = ktime_get_boot_fast_ns(); + handshake->state = HANDSHAKE_CONSUMED_INITIATION; + up_write(&handshake->lock); + ret_peer = peer; + +out: + memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN); + memzero_explicit(hash, NOISE_HASH_LEN); + memzero_explicit(chaining_key, NOISE_HASH_LEN); + up_read(&wg->static_identity.lock); + if (!ret_peer) + peer_put(peer); + return ret_peer; +} + +bool noise_handshake_create_response(struct message_handshake_response *dst, + struct noise_handshake *handshake) +{ + bool ret = false; + u8 key[NOISE_SYMMETRIC_KEY_LEN]; + + /* We need to wait for crng _before_ taking any locks, since + * curve25519_generate_secret uses get_random_bytes_wait. + */ + wait_for_random_bytes(); + + down_read(&handshake->static_identity->lock); + down_write(&handshake->lock); + + if (handshake->state != HANDSHAKE_CONSUMED_INITIATION) + goto out; + + dst->header.type = cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE); + dst->receiver_index = handshake->remote_index; + + /* e */ + curve25519_generate_secret(handshake->ephemeral_private); + if (!curve25519_generate_public(dst->unencrypted_ephemeral, + handshake->ephemeral_private)) + goto out; + message_ephemeral(dst->unencrypted_ephemeral, + dst->unencrypted_ephemeral, handshake->chaining_key, + handshake->hash); + + /* ee */ + if (!mix_dh(handshake->chaining_key, NULL, handshake->ephemeral_private, + handshake->remote_ephemeral)) + goto out; + + /* se */ + if (!mix_dh(handshake->chaining_key, NULL, handshake->ephemeral_private, + handshake->remote_static)) + goto out; + + /* psk */ + mix_psk(handshake->chaining_key, handshake->hash, key, + handshake->preshared_key); + + /* {} */ + message_encrypt(dst->encrypted_nothing, NULL, 0, key, handshake->hash); + + dst->sender_index = index_hashtable_insert( + &handshake->entry.peer->device->index_hashtable, + &handshake->entry); + + handshake->state = HANDSHAKE_CREATED_RESPONSE; + ret = true; + +out: + up_write(&handshake->lock); + up_read(&handshake->static_identity->lock); + memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN); + return ret; +} + +struct wireguard_peer * +noise_handshake_consume_response(struct message_handshake_response *src, + struct wireguard_device *wg) +{ + struct noise_handshake *handshake; + struct wireguard_peer *peer = NULL, *ret_peer = NULL; + u8 key[NOISE_SYMMETRIC_KEY_LEN]; + u8 hash[NOISE_HASH_LEN]; + u8 chaining_key[NOISE_HASH_LEN]; + u8 e[NOISE_PUBLIC_KEY_LEN]; + u8 ephemeral_private[NOISE_PUBLIC_KEY_LEN]; + u8 static_private[NOISE_PUBLIC_KEY_LEN]; + enum noise_handshake_state state = HANDSHAKE_ZEROED; + + down_read(&wg->static_identity.lock); + + if (unlikely(!wg->static_identity.has_identity)) + goto out; + + handshake = (struct noise_handshake *)index_hashtable_lookup( + &wg->index_hashtable, INDEX_HASHTABLE_HANDSHAKE, + src->receiver_index, &peer); + if (unlikely(!handshake)) + goto out; + + down_read(&handshake->lock); + state = handshake->state; + memcpy(hash, handshake->hash, NOISE_HASH_LEN); + memcpy(chaining_key, handshake->chaining_key, NOISE_HASH_LEN); + memcpy(ephemeral_private, handshake->ephemeral_private, + NOISE_PUBLIC_KEY_LEN); + up_read(&handshake->lock); + + if (state != HANDSHAKE_CREATED_INITIATION) + goto fail; + + /* e */ + message_ephemeral(e, src->unencrypted_ephemeral, chaining_key, hash); + + /* ee */ + if (!mix_dh(chaining_key, NULL, ephemeral_private, e)) + goto fail; + + /* se */ + if (!mix_dh(chaining_key, NULL, wg->static_identity.static_private, e)) + goto fail; + + /* psk */ + mix_psk(chaining_key, hash, key, handshake->preshared_key); + + /* {} */ + if (!message_decrypt(NULL, src->encrypted_nothing, + sizeof(src->encrypted_nothing), key, hash)) + goto fail; + + /* Success! Copy everything to peer */ + down_write(&handshake->lock); + /* It's important to check that the state is still the same, while we + * have an exclusive lock. + */ + if (handshake->state != state) { + up_write(&handshake->lock); + goto fail; + } + memcpy(handshake->remote_ephemeral, e, NOISE_PUBLIC_KEY_LEN); + memcpy(handshake->hash, hash, NOISE_HASH_LEN); + memcpy(handshake->chaining_key, chaining_key, NOISE_HASH_LEN); + handshake->remote_index = src->sender_index; + handshake->state = HANDSHAKE_CONSUMED_RESPONSE; + up_write(&handshake->lock); + ret_peer = peer; + goto out; + +fail: + peer_put(peer); +out: + memzero_explicit(key, NOISE_SYMMETRIC_KEY_LEN); + memzero_explicit(hash, NOISE_HASH_LEN); + memzero_explicit(chaining_key, NOISE_HASH_LEN); + memzero_explicit(ephemeral_private, NOISE_PUBLIC_KEY_LEN); + memzero_explicit(static_private, NOISE_PUBLIC_KEY_LEN); + up_read(&wg->static_identity.lock); + return ret_peer; +} + +bool noise_handshake_begin_session(struct noise_handshake *handshake, + struct noise_keypairs *keypairs) +{ + struct noise_keypair *new_keypair; + bool ret = false; + + down_write(&handshake->lock); + if (handshake->state != HANDSHAKE_CREATED_RESPONSE && + handshake->state != HANDSHAKE_CONSUMED_RESPONSE) + goto out; + + new_keypair = keypair_create(handshake->entry.peer); + if (!new_keypair) + goto out; + new_keypair->i_am_the_initiator = handshake->state == + HANDSHAKE_CONSUMED_RESPONSE; + new_keypair->remote_index = handshake->remote_index; + + if (new_keypair->i_am_the_initiator) + derive_keys(&new_keypair->sending, &new_keypair->receiving, + handshake->chaining_key); + else + derive_keys(&new_keypair->receiving, &new_keypair->sending, + handshake->chaining_key); + + handshake_zero(handshake); + rcu_read_lock_bh(); + if (likely(!container_of(handshake, struct wireguard_peer, + handshake)->is_dead)) { + add_new_keypair(keypairs, new_keypair); + net_dbg_ratelimited("%s: Keypair %llu created for peer %llu\n", + handshake->entry.peer->device->dev->name, + new_keypair->internal_id, + handshake->entry.peer->internal_id); + ret = index_hashtable_replace( + &handshake->entry.peer->device->index_hashtable, + &handshake->entry, &new_keypair->entry); + } else + kzfree(new_keypair); + rcu_read_unlock_bh(); + +out: + up_write(&handshake->lock); + return ret; +} diff --git a/drivers/net/wireguard/noise.h b/drivers/net/wireguard/noise.h new file mode 100644 index 000000000000..1fc25efc2d51 --- /dev/null +++ b/drivers/net/wireguard/noise.h @@ -0,0 +1,129 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ +#ifndef _WG_NOISE_H +#define _WG_NOISE_H + +#include "messages.h" +#include "hashtables.h" + +#include +#include +#include +#include +#include +#include +#include + +union noise_counter { + struct { + u64 counter; + unsigned long backtrack[COUNTER_BITS_TOTAL / BITS_PER_LONG]; + spinlock_t lock; + } receive; + atomic64_t counter; +}; + +struct noise_symmetric_key { + u8 key[NOISE_SYMMETRIC_KEY_LEN]; + union noise_counter counter; + u64 birthdate; + bool is_valid; +}; + +struct noise_keypair { + struct index_hashtable_entry entry; + struct noise_symmetric_key sending; + struct noise_symmetric_key receiving; + __le32 remote_index; + bool i_am_the_initiator; + struct kref refcount; + struct rcu_head rcu; + u64 internal_id; +}; + +struct noise_keypairs { + struct noise_keypair __rcu *current_keypair; + struct noise_keypair __rcu *previous_keypair; + struct noise_keypair __rcu *next_keypair; + spinlock_t keypair_update_lock; +}; + +struct noise_static_identity { + u8 static_public[NOISE_PUBLIC_KEY_LEN]; + u8 static_private[NOISE_PUBLIC_KEY_LEN]; + struct rw_semaphore lock; + bool has_identity; +}; + +enum noise_handshake_state { + HANDSHAKE_ZEROED, + HANDSHAKE_CREATED_INITIATION, + HANDSHAKE_CONSUMED_INITIATION, + HANDSHAKE_CREATED_RESPONSE, + HANDSHAKE_CONSUMED_RESPONSE +}; + +struct noise_handshake { + struct index_hashtable_entry entry; + + enum noise_handshake_state state; + u64 last_initiation_consumption; + + struct noise_static_identity *static_identity; + + u8 ephemeral_private[NOISE_PUBLIC_KEY_LEN]; + u8 remote_static[NOISE_PUBLIC_KEY_LEN]; + u8 remote_ephemeral[NOISE_PUBLIC_KEY_LEN]; + u8 precomputed_static_static[NOISE_PUBLIC_KEY_LEN]; + + u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]; + + u8 hash[NOISE_HASH_LEN]; + u8 chaining_key[NOISE_HASH_LEN]; + + u8 latest_timestamp[NOISE_TIMESTAMP_LEN]; + __le32 remote_index; + + /* Protects all members except the immutable (after noise_handshake_ + * init): remote_static, precomputed_static_static, static_identity. */ + struct rw_semaphore lock; +}; + +struct wireguard_device; + +void noise_init(void); +bool noise_handshake_init(struct noise_handshake *handshake, + struct noise_static_identity *static_identity, + const u8 peer_public_key[NOISE_PUBLIC_KEY_LEN], + const u8 peer_preshared_key[NOISE_SYMMETRIC_KEY_LEN], + struct wireguard_peer *peer); +void noise_handshake_clear(struct noise_handshake *handshake); +void noise_keypair_put(struct noise_keypair *keypair, bool unreference_now); +struct noise_keypair *noise_keypair_get(struct noise_keypair *keypair); +void noise_keypairs_clear(struct noise_keypairs *keypairs); +bool noise_received_with_keypair(struct noise_keypairs *keypairs, + struct noise_keypair *received_keypair); + +void noise_set_static_identity_private_key( + struct noise_static_identity *static_identity, + const u8 private_key[NOISE_PUBLIC_KEY_LEN]); +bool noise_precompute_static_static(struct wireguard_peer *peer); + +bool noise_handshake_create_initiation(struct message_handshake_initiation *dst, + struct noise_handshake *handshake); +struct wireguard_peer * +noise_handshake_consume_initiation(struct message_handshake_initiation *src, + struct wireguard_device *wg); + +bool noise_handshake_create_response(struct message_handshake_response *dst, + struct noise_handshake *handshake); +struct wireguard_peer * +noise_handshake_consume_response(struct message_handshake_response *src, + struct wireguard_device *wg); + +bool noise_handshake_begin_session(struct noise_handshake *handshake, + struct noise_keypairs *keypairs); + +#endif /* _WG_NOISE_H */ diff --git a/drivers/net/wireguard/peer.c b/drivers/net/wireguard/peer.c new file mode 100644 index 000000000000..d9ac366826e0 --- /dev/null +++ b/drivers/net/wireguard/peer.c @@ -0,0 +1,191 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "peer.h" +#include "device.h" +#include "queueing.h" +#include "timers.h" +#include "hashtables.h" +#include "noise.h" + +#include +#include +#include +#include + +static atomic64_t peer_counter = ATOMIC64_INIT(0); + +struct wireguard_peer * +peer_create(struct wireguard_device *wg, + const u8 public_key[NOISE_PUBLIC_KEY_LEN], + const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]) +{ + struct wireguard_peer *peer; + + lockdep_assert_held(&wg->device_update_lock); + + if (wg->num_peers >= MAX_PEERS_PER_DEVICE) + return NULL; + + peer = kzalloc(sizeof(*peer), GFP_KERNEL); + if (unlikely(!peer)) + return NULL; + peer->device = wg; + + if (!noise_handshake_init(&peer->handshake, &wg->static_identity, + public_key, preshared_key, peer)) + goto err_1; + if (dst_cache_init(&peer->endpoint_cache, GFP_KERNEL)) + goto err_1; + if (packet_queue_init(&peer->tx_queue, packet_tx_worker, false, + MAX_QUEUED_PACKETS)) + goto err_2; + if (packet_queue_init(&peer->rx_queue, NULL, false, MAX_QUEUED_PACKETS)) + goto err_3; + + peer->internal_id = atomic64_inc_return(&peer_counter); + peer->serial_work_cpu = nr_cpumask_bits; + cookie_init(&peer->latest_cookie); + timers_init(peer); + cookie_checker_precompute_peer_keys(peer); + spin_lock_init(&peer->keypairs.keypair_update_lock); + INIT_WORK(&peer->transmit_handshake_work, packet_handshake_send_worker); + rwlock_init(&peer->endpoint_lock); + kref_init(&peer->refcount); + skb_queue_head_init(&peer->staged_packet_queue); + atomic64_set(&peer->last_sent_handshake, + ktime_get_boot_fast_ns() - + (u64)(REKEY_TIMEOUT + 1) * NSEC_PER_SEC); + set_bit(NAPI_STATE_NO_BUSY_POLL, &peer->napi.state); + netif_napi_add(wg->dev, &peer->napi, packet_rx_poll, NAPI_POLL_WEIGHT); + napi_enable(&peer->napi); + list_add_tail(&peer->peer_list, &wg->peer_list); + pubkey_hashtable_add(&wg->peer_hashtable, peer); + ++wg->num_peers; + pr_debug("%s: Peer %llu created\n", wg->dev->name, peer->internal_id); + return peer; + +err_3: + packet_queue_free(&peer->tx_queue, false); +err_2: + dst_cache_destroy(&peer->endpoint_cache); +err_1: + kfree(peer); + return NULL; +} + +struct wireguard_peer *peer_get_maybe_zero(struct wireguard_peer *peer) +{ + RCU_LOCKDEP_WARN(!rcu_read_lock_bh_held(), + "Taking peer reference without holding the RCU read lock"); + if (unlikely(!peer || !kref_get_unless_zero(&peer->refcount))) + return NULL; + return peer; +} + +/* We have a separate "remove" function to get rid of the final reference + * because peer_list, clearing handshakes, and flushing all require mutexes + * which requires sleeping, which must only be done from certain contexts. + */ +void peer_remove(struct wireguard_peer *peer) +{ + if (unlikely(!peer)) + return; + lockdep_assert_held(&peer->device->device_update_lock); + + /* Remove from configuration-time lookup structures so new packets + * can't enter. + */ + list_del_init(&peer->peer_list); + allowedips_remove_by_peer(&peer->device->peer_allowedips, peer, + &peer->device->device_update_lock); + pubkey_hashtable_remove(&peer->device->peer_hashtable, peer); + + /* Mark as dead, so that we don't allow jumping contexts after. */ + WRITE_ONCE(peer->is_dead, true); + synchronize_rcu_bh(); + + /* Now that no more keypairs can be created for this peer, we destroy + * existing ones. + */ + noise_keypairs_clear(&peer->keypairs); + + /* Destroy all ongoing timers that were in-flight at the beginning of + * this function. + */ + timers_stop(peer); + + /* The transition between packet encryption/decryption queues isn't + * guarded by is_dead, but each reference's life is strictly bounded by + * two generations: once for parallel crypto and once for serial + * ingestion, so we can simply flush twice, and be sure that we no + * longer have references inside these queues. + */ + + /* a) For encrypt/decrypt. */ + flush_workqueue(peer->device->packet_crypt_wq); + /* b.1) For send (but not receive, since that's napi). */ + flush_workqueue(peer->device->packet_crypt_wq); + /* b.2.1) For receive (but not send, since that's wq). */ + napi_disable(&peer->napi); + /* b.2.1) It's now safe to remove the napi struct, which must be done + * here from process context. + */ + netif_napi_del(&peer->napi); + + /* Ensure any workstructs we own (like transmit_handshake_work or + * clear_peer_work) no longer are in use. + */ + flush_workqueue(peer->device->handshake_send_wq); + + --peer->device->num_peers; + peer_put(peer); +} + +static void rcu_release(struct rcu_head *rcu) +{ + struct wireguard_peer *peer = + container_of(rcu, struct wireguard_peer, rcu); + dst_cache_destroy(&peer->endpoint_cache); + packet_queue_free(&peer->rx_queue, false); + packet_queue_free(&peer->tx_queue, false); + kzfree(peer); +} + +static void kref_release(struct kref *refcount) +{ + struct wireguard_peer *peer = + container_of(refcount, struct wireguard_peer, refcount); + pr_debug("%s: Peer %llu (%pISpfsc) destroyed\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr); + /* Remove ourself from dynamic runtime lookup structures, now that the + * last reference is gone. + */ + index_hashtable_remove(&peer->device->index_hashtable, + &peer->handshake.entry); + /* Remove any lingering packets that didn't have a chance to be + * transmitted. + */ + skb_queue_purge(&peer->staged_packet_queue); + /* Free the memory used. */ + call_rcu_bh(&peer->rcu, rcu_release); +} + +void peer_put(struct wireguard_peer *peer) +{ + if (unlikely(!peer)) + return; + kref_put(&peer->refcount, kref_release); +} + +void peer_remove_all(struct wireguard_device *wg) +{ + struct wireguard_peer *peer, *temp; + + lockdep_assert_held(&wg->device_update_lock); + list_for_each_entry_safe (peer, temp, &wg->peer_list, peer_list) + peer_remove(peer); +} diff --git a/drivers/net/wireguard/peer.h b/drivers/net/wireguard/peer.h new file mode 100644 index 000000000000..b95c3ed57801 --- /dev/null +++ b/drivers/net/wireguard/peer.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_PEER_H +#define _WG_PEER_H + +#include "device.h" +#include "noise.h" +#include "cookie.h" + +#include +#include +#include +#include +#include + +struct wireguard_device; + +struct endpoint { + union { + struct sockaddr addr; + struct sockaddr_in addr4; + struct sockaddr_in6 addr6; + }; + union { + struct { + struct in_addr src4; + /* Essentially the same as addr6->scope_id */ + int src_if4; + }; + struct in6_addr src6; + }; +}; + +struct wireguard_peer { + struct wireguard_device *device; + struct crypt_queue tx_queue, rx_queue; + struct sk_buff_head staged_packet_queue; + int serial_work_cpu; + struct noise_keypairs keypairs; + struct endpoint endpoint; + struct dst_cache endpoint_cache; + rwlock_t endpoint_lock; + struct noise_handshake handshake; + atomic64_t last_sent_handshake; + struct work_struct transmit_handshake_work, clear_peer_work; + struct cookie latest_cookie; + struct hlist_node pubkey_hash; + u64 rx_bytes, tx_bytes; + struct timer_list timer_retransmit_handshake, timer_send_keepalive; + struct timer_list timer_new_handshake, timer_zero_key_material; + struct timer_list timer_persistent_keepalive; + unsigned int timer_handshake_attempts; + u16 persistent_keepalive_interval; + bool timers_enabled, timer_need_another_keepalive; + bool sent_lastminute_handshake; + struct timespec walltime_last_handshake; + struct kref refcount; + struct rcu_head rcu; + struct list_head peer_list; + u64 internal_id; + struct napi_struct napi; + bool is_dead; +}; + +struct wireguard_peer * +peer_create(struct wireguard_device *wg, + const u8 public_key[NOISE_PUBLIC_KEY_LEN], + const u8 preshared_key[NOISE_SYMMETRIC_KEY_LEN]); + +struct wireguard_peer *__must_check +peer_get_maybe_zero(struct wireguard_peer *peer); +static inline struct wireguard_peer *peer_get(struct wireguard_peer *peer) +{ + kref_get(&peer->refcount); + return peer; +} +void peer_put(struct wireguard_peer *peer); +void peer_remove(struct wireguard_peer *peer); +void peer_remove_all(struct wireguard_device *wg); + +struct wireguard_peer *peer_lookup_by_index(struct wireguard_device *wg, + u32 index); + +#endif /* _WG_PEER_H */ diff --git a/drivers/net/wireguard/queueing.c b/drivers/net/wireguard/queueing.c new file mode 100644 index 000000000000..09eb93eab1b1 --- /dev/null +++ b/drivers/net/wireguard/queueing.c @@ -0,0 +1,52 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "queueing.h" + +struct multicore_worker __percpu * +packet_alloc_percpu_multicore_worker(work_func_t function, void *ptr) +{ + int cpu; + struct multicore_worker __percpu *worker = + alloc_percpu(struct multicore_worker); + + if (!worker) + return NULL; + + for_each_possible_cpu (cpu) { + per_cpu_ptr(worker, cpu)->ptr = ptr; + INIT_WORK(&per_cpu_ptr(worker, cpu)->work, function); + } + return worker; +} + +int packet_queue_init(struct crypt_queue *queue, work_func_t function, + bool multicore, unsigned int len) +{ + int ret; + + memset(queue, 0, sizeof(*queue)); + ret = ptr_ring_init(&queue->ring, len, GFP_KERNEL); + if (ret) + return ret; + if (function) { + if (multicore) { + queue->worker = packet_alloc_percpu_multicore_worker( + function, queue); + if (!queue->worker) + return -ENOMEM; + } else + INIT_WORK(&queue->work, function); + } + return 0; +} + +void packet_queue_free(struct crypt_queue *queue, bool multicore) +{ + if (multicore) + free_percpu(queue->worker); + WARN_ON(!__ptr_ring_empty(&queue->ring)); + ptr_ring_cleanup(&queue->ring, NULL); +} diff --git a/drivers/net/wireguard/queueing.h b/drivers/net/wireguard/queueing.h new file mode 100644 index 000000000000..758a57daced2 --- /dev/null +++ b/drivers/net/wireguard/queueing.h @@ -0,0 +1,193 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_QUEUEING_H +#define _WG_QUEUEING_H + +#include "peer.h" +#include +#include +#include +#include + +struct wireguard_device; +struct wireguard_peer; +struct multicore_worker; +struct crypt_queue; +struct sk_buff; + +/* queueing.c APIs: */ +int packet_queue_init(struct crypt_queue *queue, work_func_t function, + bool multicore, unsigned int len); +void packet_queue_free(struct crypt_queue *queue, bool multicore); +struct multicore_worker __percpu * +packet_alloc_percpu_multicore_worker(work_func_t function, void *ptr); + +/* receive.c APIs: */ +void packet_receive(struct wireguard_device *wg, struct sk_buff *skb); +void packet_handshake_receive_worker(struct work_struct *work); +/* NAPI poll function: */ +int packet_rx_poll(struct napi_struct *napi, int budget); +/* Workqueue worker: */ +void packet_decrypt_worker(struct work_struct *work); + +/* send.c APIs: */ +void packet_send_queued_handshake_initiation(struct wireguard_peer *peer, + bool is_retry); +void packet_send_handshake_response(struct wireguard_peer *peer); +void packet_send_handshake_cookie(struct wireguard_device *wg, + struct sk_buff *initiating_skb, + __le32 sender_index); +void packet_send_keepalive(struct wireguard_peer *peer); +void packet_send_staged_packets(struct wireguard_peer *peer); +/* Workqueue workers: */ +void packet_handshake_send_worker(struct work_struct *work); +void packet_tx_worker(struct work_struct *work); +void packet_encrypt_worker(struct work_struct *work); + +enum packet_state { + PACKET_STATE_UNCRYPTED, + PACKET_STATE_CRYPTED, + PACKET_STATE_DEAD +}; + +struct packet_cb { + u64 nonce; + struct noise_keypair *keypair; + atomic_t state; + u32 mtu; + u8 ds; +}; + +#define PACKET_PEER(skb) (((struct packet_cb *)skb->cb)->keypair->entry.peer) +#define PACKET_CB(skb) ((struct packet_cb *)skb->cb) + +/* Returns either the correct skb->protocol value, or 0 if invalid. */ +static inline __be16 skb_examine_untrusted_ip_hdr(struct sk_buff *skb) +{ + if (skb_network_header(skb) >= skb->head && + (skb_network_header(skb) + sizeof(struct iphdr)) <= + skb_tail_pointer(skb) && + ip_hdr(skb)->version == 4) + return htons(ETH_P_IP); + if (skb_network_header(skb) >= skb->head && + (skb_network_header(skb) + sizeof(struct ipv6hdr)) <= + skb_tail_pointer(skb) && + ipv6_hdr(skb)->version == 6) + return htons(ETH_P_IPV6); + return 0; +} + +static inline void skb_reset(struct sk_buff *skb) +{ + const int pfmemalloc = skb->pfmemalloc; + skb_scrub_packet(skb, true); + memset(&skb->headers_start, 0, + offsetof(struct sk_buff, headers_end) - + offsetof(struct sk_buff, headers_start)); + skb->pfmemalloc = pfmemalloc; + skb->queue_mapping = 0; + skb->nohdr = 0; + skb->peeked = 0; + skb->mac_len = 0; + skb->dev = NULL; +#ifdef CONFIG_NET_SCHED + skb->tc_index = 0; + skb_reset_tc(skb); +#endif + skb->hdr_len = skb_headroom(skb); + skb_reset_mac_header(skb); + skb_reset_network_header(skb); + skb_probe_transport_header(skb, 0); + skb_reset_inner_headers(skb); +} + +static inline int cpumask_choose_online(int *stored_cpu, unsigned int id) +{ + unsigned int cpu = *stored_cpu, cpu_index, i; + + if (unlikely(cpu == nr_cpumask_bits || + !cpumask_test_cpu(cpu, cpu_online_mask))) { + cpu_index = id % cpumask_weight(cpu_online_mask); + cpu = cpumask_first(cpu_online_mask); + for (i = 0; i < cpu_index; ++i) + cpu = cpumask_next(cpu, cpu_online_mask); + *stored_cpu = cpu; + } + return cpu; +} + +/* This function is racy, in the sense that next is unlocked, so it could return + * the same CPU twice. A race-free version of this would be to instead store an + * atomic sequence number, do an increment-and-return, and then iterate through + * every possible CPU until we get to that index -- choose_cpu. However that's + * a bit slower, and it doesn't seem like this potential race actually + * introduces any performance loss, so we live with it. + */ +static inline int cpumask_next_online(int *next) +{ + int cpu = *next; + + while (unlikely(!cpumask_test_cpu(cpu, cpu_online_mask))) + cpu = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits; + *next = cpumask_next(cpu, cpu_online_mask) % nr_cpumask_bits; + return cpu; +} + +static inline int queue_enqueue_per_device_and_peer( + struct crypt_queue *device_queue, struct crypt_queue *peer_queue, + struct sk_buff *skb, struct workqueue_struct *wq, int *next_cpu) +{ + int cpu; + + atomic_set_release(&PACKET_CB(skb)->state, PACKET_STATE_UNCRYPTED); + /* We first queue this up for the peer ingestion, but the consumer + * will wait for the state to change to CRYPTED or DEAD before. + */ + if (unlikely(ptr_ring_produce_bh(&peer_queue->ring, skb))) + return -ENOSPC; + /* Then we queue it up in the device queue, which consumes the + * packet as soon as it can. + */ + cpu = cpumask_next_online(next_cpu); + if (unlikely(ptr_ring_produce_bh(&device_queue->ring, skb))) + return -EPIPE; + queue_work_on(cpu, wq, &per_cpu_ptr(device_queue->worker, cpu)->work); + return 0; +} + +static inline void queue_enqueue_per_peer(struct crypt_queue *queue, + struct sk_buff *skb, + enum packet_state state) +{ + /* We take a reference, because as soon as we call atomic_set, the + * peer can be freed from below us. + */ + struct wireguard_peer *peer = peer_get(PACKET_PEER(skb)); + atomic_set_release(&PACKET_CB(skb)->state, state); + queue_work_on(cpumask_choose_online(&peer->serial_work_cpu, + peer->internal_id), + peer->device->packet_crypt_wq, &queue->work); + peer_put(peer); +} + +static inline void queue_enqueue_per_peer_napi(struct crypt_queue *queue, + struct sk_buff *skb, + enum packet_state state) +{ + /* We take a reference, because as soon as we call atomic_set, the + * peer can be freed from below us. + */ + struct wireguard_peer *peer = peer_get(PACKET_PEER(skb)); + atomic_set_release(&PACKET_CB(skb)->state, state); + napi_schedule(&peer->napi); + peer_put(peer); +} + +#ifdef DEBUG +bool packet_counter_selftest(void); +#endif + +#endif /* _WG_QUEUEING_H */ diff --git a/drivers/net/wireguard/ratelimiter.c b/drivers/net/wireguard/ratelimiter.c new file mode 100644 index 000000000000..6e43d6e8047b --- /dev/null +++ b/drivers/net/wireguard/ratelimiter.c @@ -0,0 +1,220 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "ratelimiter.h" +#include +#include +#include +#include + +static struct kmem_cache *entry_cache; +static hsiphash_key_t key; +static spinlock_t table_lock = __SPIN_LOCK_UNLOCKED("ratelimiter_table_lock"); +static DEFINE_MUTEX(init_lock); +static atomic64_t refcnt = ATOMIC64_INIT(0); +static atomic_t total_entries = ATOMIC_INIT(0); +static unsigned int max_entries, table_size; +static void gc_entries(struct work_struct *); +static DECLARE_DEFERRABLE_WORK(gc_work, gc_entries); +static struct hlist_head *table_v4; +#if IS_ENABLED(CONFIG_IPV6) +static struct hlist_head *table_v6; +#endif + +struct ratelimiter_entry { + u64 last_time_ns, tokens; + __be64 ip; + void *net; + spinlock_t lock; + struct hlist_node hash; + struct rcu_head rcu; +}; + +enum { + PACKETS_PER_SECOND = 20, + PACKETS_BURSTABLE = 5, + PACKET_COST = NSEC_PER_SEC / PACKETS_PER_SECOND, + TOKEN_MAX = PACKET_COST * PACKETS_BURSTABLE +}; + +static void entry_free(struct rcu_head *rcu) +{ + kmem_cache_free(entry_cache, + container_of(rcu, struct ratelimiter_entry, rcu)); + atomic_dec(&total_entries); +} + +static void entry_uninit(struct ratelimiter_entry *entry) +{ + hlist_del_rcu(&entry->hash); + call_rcu(&entry->rcu, entry_free); +} + +/* Calling this function with a NULL work uninits all entries. */ +static void gc_entries(struct work_struct *work) +{ + const u64 now = ktime_get_boot_fast_ns(); + struct ratelimiter_entry *entry; + struct hlist_node *temp; + unsigned int i; + + for (i = 0; i < table_size; ++i) { + spin_lock(&table_lock); + hlist_for_each_entry_safe (entry, temp, &table_v4[i], hash) { + if (unlikely(!work) || + now - entry->last_time_ns > NSEC_PER_SEC) + entry_uninit(entry); + } +#if IS_ENABLED(CONFIG_IPV6) + hlist_for_each_entry_safe (entry, temp, &table_v6[i], hash) { + if (unlikely(!work) || + now - entry->last_time_ns > NSEC_PER_SEC) + entry_uninit(entry); + } +#endif + spin_unlock(&table_lock); + if (likely(work)) + cond_resched(); + } + if (likely(work)) + queue_delayed_work(system_power_efficient_wq, &gc_work, HZ); +} + +bool ratelimiter_allow(struct sk_buff *skb, struct net *net) +{ + struct { __be64 ip; u32 net; } data = { + .net = (unsigned long)net & 0xffffffff }; + struct ratelimiter_entry *entry; + struct hlist_head *bucket; + + if (skb->protocol == htons(ETH_P_IP)) { + data.ip = (__force __be64)ip_hdr(skb)->saddr; + bucket = &table_v4[hsiphash(&data, sizeof(u32) * 3, &key) & + (table_size - 1)]; + } +#if IS_ENABLED(CONFIG_IPV6) + else if (skb->protocol == htons(ETH_P_IPV6)) { + memcpy(&data.ip, &ipv6_hdr(skb)->saddr, + sizeof(__be64)); /* Only 64 bits */ + bucket = &table_v6[hsiphash(&data, sizeof(u32) * 3, &key) & + (table_size - 1)]; + } +#endif + else + return false; + rcu_read_lock(); + hlist_for_each_entry_rcu (entry, bucket, hash) { + if (entry->net == net && entry->ip == data.ip) { + u64 now, tokens; + bool ret; + /* Quasi-inspired by nft_limit.c, but this is actually a + * slightly different algorithm. Namely, we incorporate + * the burst as part of the maximum tokens, rather than + * as part of the rate. + */ + spin_lock(&entry->lock); + now = ktime_get_boot_fast_ns(); + tokens = min_t(u64, TOKEN_MAX, + entry->tokens + now - + entry->last_time_ns); + entry->last_time_ns = now; + ret = tokens >= PACKET_COST; + entry->tokens = ret ? tokens - PACKET_COST : tokens; + spin_unlock(&entry->lock); + rcu_read_unlock(); + return ret; + } + } + rcu_read_unlock(); + + if (atomic_inc_return(&total_entries) > max_entries) + goto err_oom; + + entry = kmem_cache_alloc(entry_cache, GFP_KERNEL); + if (unlikely(!entry)) + goto err_oom; + + entry->net = net; + entry->ip = data.ip; + INIT_HLIST_NODE(&entry->hash); + spin_lock_init(&entry->lock); + entry->last_time_ns = ktime_get_boot_fast_ns(); + entry->tokens = TOKEN_MAX - PACKET_COST; + spin_lock(&table_lock); + hlist_add_head_rcu(&entry->hash, bucket); + spin_unlock(&table_lock); + return true; + +err_oom: + atomic_dec(&total_entries); + return false; +} + +int ratelimiter_init(void) +{ + mutex_lock(&init_lock); + if (atomic64_inc_return(&refcnt) != 1) + goto out; + + entry_cache = KMEM_CACHE(ratelimiter_entry, 0); + if (!entry_cache) + goto err; + + /* xt_hashlimit.c uses a slightly different algorithm for ratelimiting, + * but what it shares in common is that it uses a massive hashtable. So, + * we borrow their wisdom about good table sizes on different systems + * dependent on RAM. This calculation here comes from there. + */ + table_size = (totalram_pages > (1U << 30) / PAGE_SIZE) ? 8192 : + max_t(unsigned long, 16, roundup_pow_of_two( + (totalram_pages << PAGE_SHIFT) / + (1U << 14) / sizeof(struct hlist_head))); + max_entries = table_size * 8; + + table_v4 = kvzalloc(table_size * sizeof(*table_v4), GFP_KERNEL); + if (unlikely(!table_v4)) + goto err_kmemcache; + +#if IS_ENABLED(CONFIG_IPV6) + table_v6 = kvzalloc(table_size * sizeof(*table_v6), GFP_KERNEL); + if (unlikely(!table_v6)) { + kvfree(table_v4); + goto err_kmemcache; + } +#endif + + queue_delayed_work(system_power_efficient_wq, &gc_work, HZ); + get_random_bytes(&key, sizeof(key)); +out: + mutex_unlock(&init_lock); + return 0; + +err_kmemcache: + kmem_cache_destroy(entry_cache); +err: + atomic64_dec(&refcnt); + mutex_unlock(&init_lock); + return -ENOMEM; +} + +void ratelimiter_uninit(void) +{ + mutex_lock(&init_lock); + if (atomic64_dec_if_positive(&refcnt)) + goto out; + + cancel_delayed_work_sync(&gc_work); + gc_entries(NULL); + rcu_barrier(); + kvfree(table_v4); +#if IS_ENABLED(CONFIG_IPV6) + kvfree(table_v6); +#endif + kmem_cache_destroy(entry_cache); +out: + mutex_unlock(&init_lock); +} + +#include "selftest/ratelimiter.h" diff --git a/drivers/net/wireguard/ratelimiter.h b/drivers/net/wireguard/ratelimiter.h new file mode 100644 index 000000000000..83e420310c78 --- /dev/null +++ b/drivers/net/wireguard/ratelimiter.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_RATELIMITER_H +#define _WG_RATELIMITER_H + +#include + +int ratelimiter_init(void); +void ratelimiter_uninit(void); +bool ratelimiter_allow(struct sk_buff *skb, struct net *net); + +#ifdef DEBUG +bool ratelimiter_selftest(void); +#endif + +#endif /* _WG_RATELIMITER_H */ diff --git a/drivers/net/wireguard/receive.c b/drivers/net/wireguard/receive.c new file mode 100644 index 000000000000..6a27bddf65f2 --- /dev/null +++ b/drivers/net/wireguard/receive.c @@ -0,0 +1,595 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "queueing.h" +#include "device.h" +#include "peer.h" +#include "timers.h" +#include "messages.h" +#include "cookie.h" +#include "socket.h" + +#include +#include +#include +#include +#include + +/* Must be called with bh disabled. */ +static void rx_stats(struct wireguard_peer *peer, size_t len) +{ + struct pcpu_sw_netstats *tstats = + get_cpu_ptr(peer->device->dev->tstats); + + u64_stats_update_begin(&tstats->syncp); + ++tstats->rx_packets; + tstats->rx_bytes += len; + peer->rx_bytes += len; + u64_stats_update_end(&tstats->syncp); + put_cpu_ptr(tstats); +} + +#define SKB_TYPE_LE32(skb) (((struct message_header *)(skb)->data)->type) + +static size_t validate_header_len(struct sk_buff *skb) +{ + if (unlikely(skb->len < sizeof(struct message_header))) + return 0; + if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_DATA) && + skb->len >= MESSAGE_MINIMUM_LENGTH) + return sizeof(struct message_data); + if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION) && + skb->len == sizeof(struct message_handshake_initiation)) + return sizeof(struct message_handshake_initiation); + if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE) && + skb->len == sizeof(struct message_handshake_response)) + return sizeof(struct message_handshake_response); + if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE) && + skb->len == sizeof(struct message_handshake_cookie)) + return sizeof(struct message_handshake_cookie); + return 0; +} + +static int skb_prepare_header(struct sk_buff *skb, struct wireguard_device *wg) +{ + size_t data_offset, data_len, header_len; + struct udphdr *udp; + + if (unlikely(skb_examine_untrusted_ip_hdr(skb) != skb->protocol || + skb_transport_header(skb) < skb->head || + (skb_transport_header(skb) + sizeof(struct udphdr)) > + skb_tail_pointer(skb))) + return -EINVAL; /* Bogus IP header */ + udp = udp_hdr(skb); + data_offset = (u8 *)udp - skb->data; + if (unlikely(data_offset > U16_MAX || + data_offset + sizeof(struct udphdr) > skb->len)) + /* Packet has offset at impossible location or isn't big enough + * to have UDP fields. + */ + return -EINVAL; + data_len = ntohs(udp->len); + if (unlikely(data_len < sizeof(struct udphdr) || + data_len > skb->len - data_offset)) + /* UDP packet is reporting too small of a size or lying about + * its size. + */ + return -EINVAL; + data_len -= sizeof(struct udphdr); + data_offset = (u8 *)udp + sizeof(struct udphdr) - skb->data; + if (unlikely(!pskb_may_pull(skb, + data_offset + sizeof(struct message_header)) || + pskb_trim(skb, data_len + data_offset) < 0)) + return -EINVAL; + skb_pull(skb, data_offset); + if (unlikely(skb->len != data_len)) + /* Final len does not agree with calculated len */ + return -EINVAL; + header_len = validate_header_len(skb); + if (unlikely(!header_len)) + return -EINVAL; + __skb_push(skb, data_offset); + if (unlikely(!pskb_may_pull(skb, data_offset + header_len))) + return -EINVAL; + __skb_pull(skb, data_offset); + return 0; +} + +static void receive_handshake_packet(struct wireguard_device *wg, + struct sk_buff *skb) +{ + struct wireguard_peer *peer = NULL; + enum cookie_mac_state mac_state; + /* This is global, so that our load calculation applies to + * the whole system. + */ + static u64 last_under_load; + bool packet_needs_cookie; + bool under_load; + + if (SKB_TYPE_LE32(skb) == cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE)) { + net_dbg_skb_ratelimited("%s: Receiving cookie response from %pISpfsc\n", + wg->dev->name, skb); + cookie_message_consume( + (struct message_handshake_cookie *)skb->data, wg); + return; + } + + under_load = skb_queue_len(&wg->incoming_handshakes) >= + MAX_QUEUED_INCOMING_HANDSHAKES / 8; + if (under_load) + last_under_load = ktime_get_boot_fast_ns(); + else if (last_under_load) + under_load = !has_expired(last_under_load, 1); + mac_state = cookie_validate_packet(&wg->cookie_checker, skb, + under_load); + if ((under_load && mac_state == VALID_MAC_WITH_COOKIE) || + (!under_load && mac_state == VALID_MAC_BUT_NO_COOKIE)) + packet_needs_cookie = false; + else if (under_load && mac_state == VALID_MAC_BUT_NO_COOKIE) + packet_needs_cookie = true; + else { + net_dbg_skb_ratelimited("%s: Invalid MAC of handshake, dropping packet from %pISpfsc\n", + wg->dev->name, skb); + return; + } + + switch (SKB_TYPE_LE32(skb)) { + case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION): { + struct message_handshake_initiation *message = + (struct message_handshake_initiation *)skb->data; + + if (packet_needs_cookie) { + packet_send_handshake_cookie(wg, skb, + message->sender_index); + return; + } + peer = noise_handshake_consume_initiation(message, wg); + if (unlikely(!peer)) { + net_dbg_skb_ratelimited("%s: Invalid handshake initiation from %pISpfsc\n", + wg->dev->name, skb); + return; + } + socket_set_peer_endpoint_from_skb(peer, skb); + net_dbg_ratelimited("%s: Receiving handshake initiation from peer %llu (%pISpfsc)\n", + wg->dev->name, peer->internal_id, + &peer->endpoint.addr); + packet_send_handshake_response(peer); + break; + } + case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE): { + struct message_handshake_response *message = + (struct message_handshake_response *)skb->data; + + if (packet_needs_cookie) { + packet_send_handshake_cookie(wg, skb, + message->sender_index); + return; + } + peer = noise_handshake_consume_response(message, wg); + if (unlikely(!peer)) { + net_dbg_skb_ratelimited("%s: Invalid handshake response from %pISpfsc\n", + wg->dev->name, skb); + return; + } + socket_set_peer_endpoint_from_skb(peer, skb); + net_dbg_ratelimited("%s: Receiving handshake response from peer %llu (%pISpfsc)\n", + wg->dev->name, peer->internal_id, + &peer->endpoint.addr); + if (noise_handshake_begin_session(&peer->handshake, + &peer->keypairs)) { + timers_session_derived(peer); + timers_handshake_complete(peer); + /* Calling this function will either send any existing + * packets in the queue and not send a keepalive, which + * is the best case, Or, if there's nothing in the + * queue, it will send a keepalive, in order to give + * immediate confirmation of the session. + */ + packet_send_keepalive(peer); + } + break; + } + } + + if (unlikely(!peer)) { + WARN(1, "Somehow a wrong type of packet wound up in the handshake queue!\n"); + return; + } + + local_bh_disable(); + rx_stats(peer, skb->len); + local_bh_enable(); + + timers_any_authenticated_packet_received(peer); + timers_any_authenticated_packet_traversal(peer); + peer_put(peer); +} + +void packet_handshake_receive_worker(struct work_struct *work) +{ + struct wireguard_device *wg = + container_of(work, struct multicore_worker, work)->ptr; + struct sk_buff *skb; + + while ((skb = skb_dequeue(&wg->incoming_handshakes)) != NULL) { + receive_handshake_packet(wg, skb); + dev_kfree_skb(skb); + cond_resched(); + } +} + +static void keep_key_fresh(struct wireguard_peer *peer) +{ + struct noise_keypair *keypair; + bool send = false; + + if (peer->sent_lastminute_handshake) + return; + + rcu_read_lock_bh(); + keypair = rcu_dereference_bh(peer->keypairs.current_keypair); + if (likely(keypair && keypair->sending.is_valid) && + keypair->i_am_the_initiator && + unlikely(has_expired(keypair->sending.birthdate, + REJECT_AFTER_TIME - KEEPALIVE_TIMEOUT - REKEY_TIMEOUT))) + send = true; + rcu_read_unlock_bh(); + + if (send) { + peer->sent_lastminute_handshake = true; + packet_send_queued_handshake_initiation(peer, false); + } +} + +static bool skb_decrypt(struct sk_buff *skb, struct noise_symmetric_key *key, + simd_context_t *simd_context) +{ + struct scatterlist sg[MAX_SKB_FRAGS + 8]; + struct sk_buff *trailer; + unsigned int offset; + int num_frags; + + if (unlikely(!key)) + return false; + + if (unlikely(!key->is_valid || + has_expired(key->birthdate, REJECT_AFTER_TIME) || + key->counter.receive.counter >= REJECT_AFTER_MESSAGES)) { + key->is_valid = false; + return false; + } + + PACKET_CB(skb)->nonce = + le64_to_cpu(((struct message_data *)skb->data)->counter); + + /* We ensure that the network header is part of the packet before we + * call skb_cow_data, so that there's no chance that data is removed + * from the skb, so that later we can extract the original endpoint. + */ + offset = skb->data - skb_network_header(skb); + skb_push(skb, offset); + num_frags = skb_cow_data(skb, 0, &trailer); + offset += sizeof(struct message_data); + skb_pull(skb, offset); + if (unlikely(num_frags < 0 || num_frags > ARRAY_SIZE(sg))) + return false; + + sg_init_table(sg, num_frags); + if (skb_to_sgvec(skb, sg, 0, skb->len) <= 0) + return false; + + if (!chacha20poly1305_decrypt_sg(sg, sg, skb->len, NULL, 0, + PACKET_CB(skb)->nonce, key->key, + simd_context)) + return false; + + /* Another ugly situation of pushing and pulling the header so as to + * keep endpoint information intact. + */ + skb_push(skb, offset); + if (pskb_trim(skb, skb->len - noise_encrypted_len(0))) + return false; + skb_pull(skb, offset); + + return true; +} + +/* This is RFC6479, a replay detection bitmap algorithm that avoids bitshifts */ +static bool counter_validate(union noise_counter *counter, u64 their_counter) +{ + unsigned long index, index_current, top, i; + bool ret = false; + + spin_lock_bh(&counter->receive.lock); + + if (unlikely(counter->receive.counter >= REJECT_AFTER_MESSAGES + 1 || + their_counter >= REJECT_AFTER_MESSAGES)) + goto out; + + ++their_counter; + + if (unlikely((COUNTER_WINDOW_SIZE + their_counter) < + counter->receive.counter)) + goto out; + + index = their_counter >> ilog2(BITS_PER_LONG); + + if (likely(their_counter > counter->receive.counter)) { + index_current = counter->receive.counter >> ilog2(BITS_PER_LONG); + top = min_t(unsigned long, index - index_current, + COUNTER_BITS_TOTAL / BITS_PER_LONG); + for (i = 1; i <= top; ++i) + counter->receive.backtrack[(i + index_current) & + ((COUNTER_BITS_TOTAL / BITS_PER_LONG) - 1)] = 0; + counter->receive.counter = their_counter; + } + + index &= (COUNTER_BITS_TOTAL / BITS_PER_LONG) - 1; + ret = !test_and_set_bit(their_counter & (BITS_PER_LONG - 1), + &counter->receive.backtrack[index]); + +out: + spin_unlock_bh(&counter->receive.lock); + return ret; +} +#include "selftest/counter.h" + +static void packet_consume_data_done(struct wireguard_peer *peer, + struct sk_buff *skb, + struct endpoint *endpoint) +{ + struct net_device *dev = peer->device->dev; + struct wireguard_peer *routed_peer; + unsigned int len, len_before_trim; + + socket_set_peer_endpoint(peer, endpoint); + + if (unlikely(noise_received_with_keypair(&peer->keypairs, + PACKET_CB(skb)->keypair))) { + timers_handshake_complete(peer); + packet_send_staged_packets(peer); + } + + keep_key_fresh(peer); + + timers_any_authenticated_packet_received(peer); + timers_any_authenticated_packet_traversal(peer); + + /* A packet with length 0 is a keepalive packet */ + if (unlikely(!skb->len)) { + rx_stats(peer, message_data_len(0)); + net_dbg_ratelimited("%s: Receiving keepalive packet from peer %llu (%pISpfsc)\n", + dev->name, peer->internal_id, + &peer->endpoint.addr); + goto packet_processed; + } + + timers_data_received(peer); + + if (unlikely(skb_network_header(skb) < skb->head)) + goto dishonest_packet_size; + if (unlikely(!(pskb_network_may_pull(skb, sizeof(struct iphdr)) && + (ip_hdr(skb)->version == 4 || + (ip_hdr(skb)->version == 6 && + pskb_network_may_pull(skb, sizeof(struct ipv6hdr))))))) + goto dishonest_packet_type; + + skb->dev = dev; + skb->ip_summed = CHECKSUM_UNNECESSARY; + skb->protocol = skb_examine_untrusted_ip_hdr(skb); + if (skb->protocol == htons(ETH_P_IP)) { + len = ntohs(ip_hdr(skb)->tot_len); + if (unlikely(len < sizeof(struct iphdr))) + goto dishonest_packet_size; + if (INET_ECN_is_ce(PACKET_CB(skb)->ds)) + IP_ECN_set_ce(ip_hdr(skb)); + } else if (skb->protocol == htons(ETH_P_IPV6)) { + len = ntohs(ipv6_hdr(skb)->payload_len) + + sizeof(struct ipv6hdr); + if (INET_ECN_is_ce(PACKET_CB(skb)->ds)) + IP6_ECN_set_ce(skb, ipv6_hdr(skb)); + } else + goto dishonest_packet_type; + + if (unlikely(len > skb->len)) + goto dishonest_packet_size; + len_before_trim = skb->len; + if (unlikely(pskb_trim(skb, len))) + goto packet_processed; + + routed_peer = allowedips_lookup_src(&peer->device->peer_allowedips, skb); + peer_put(routed_peer); /* We don't need the extra reference. */ + + if (unlikely(routed_peer != peer)) + goto dishonest_packet_peer; + + if (unlikely(napi_gro_receive(&peer->napi, skb) == GRO_DROP)) { + ++dev->stats.rx_dropped; + net_dbg_ratelimited("%s: Failed to give packet to userspace from peer %llu (%pISpfsc)\n", + dev->name, peer->internal_id, + &peer->endpoint.addr); + } else + rx_stats(peer, message_data_len(len_before_trim)); + return; + +dishonest_packet_peer: + net_dbg_skb_ratelimited("%s: Packet has unallowed src IP (%pISc) from peer %llu (%pISpfsc)\n", + dev->name, skb, peer->internal_id, + &peer->endpoint.addr); + ++dev->stats.rx_errors; + ++dev->stats.rx_frame_errors; + goto packet_processed; +dishonest_packet_type: + net_dbg_ratelimited("%s: Packet is neither ipv4 nor ipv6 from peer %llu (%pISpfsc)\n", + dev->name, peer->internal_id, &peer->endpoint.addr); + ++dev->stats.rx_errors; + ++dev->stats.rx_frame_errors; + goto packet_processed; +dishonest_packet_size: + net_dbg_ratelimited("%s: Packet has incorrect size from peer %llu (%pISpfsc)\n", + dev->name, peer->internal_id, &peer->endpoint.addr); + ++dev->stats.rx_errors; + ++dev->stats.rx_length_errors; + goto packet_processed; +packet_processed: + dev_kfree_skb(skb); +} + +int packet_rx_poll(struct napi_struct *napi, int budget) +{ + struct wireguard_peer *peer = + container_of(napi, struct wireguard_peer, napi); + struct crypt_queue *queue = &peer->rx_queue; + struct noise_keypair *keypair; + struct endpoint endpoint; + enum packet_state state; + struct sk_buff *skb; + int work_done = 0; + bool free; + + if (unlikely(budget <= 0)) + return 0; + + while ((skb = __ptr_ring_peek(&queue->ring)) != NULL && + (state = atomic_read_acquire(&PACKET_CB(skb)->state)) != + PACKET_STATE_UNCRYPTED) { + __ptr_ring_discard_one(&queue->ring); + peer = PACKET_PEER(skb); + keypair = PACKET_CB(skb)->keypair; + free = true; + + if (unlikely(state != PACKET_STATE_CRYPTED)) + goto next; + + if (unlikely(!counter_validate(&keypair->receiving.counter, + PACKET_CB(skb)->nonce))) { + net_dbg_ratelimited("%s: Packet has invalid nonce %llu (max %llu)\n", + peer->device->dev->name, + PACKET_CB(skb)->nonce, + keypair->receiving.counter.receive.counter); + goto next; + } + + if (unlikely(socket_endpoint_from_skb(&endpoint, skb))) + goto next; + + skb_reset(skb); + packet_consume_data_done(peer, skb, &endpoint); + free = false; + + next: + noise_keypair_put(keypair, false); + peer_put(peer); + if (unlikely(free)) + dev_kfree_skb(skb); + + if (++work_done >= budget) + break; + } + + if (work_done < budget) + napi_complete_done(napi, work_done); + + return work_done; +} + +void packet_decrypt_worker(struct work_struct *work) +{ + struct crypt_queue *queue = + container_of(work, struct multicore_worker, work)->ptr; + simd_context_t simd_context; + struct sk_buff *skb; + + simd_get(&simd_context); + while ((skb = ptr_ring_consume_bh(&queue->ring)) != NULL) { + enum packet_state state = likely(skb_decrypt(skb, + &PACKET_CB(skb)->keypair->receiving, + &simd_context)) ? + PACKET_STATE_CRYPTED : PACKET_STATE_DEAD; + queue_enqueue_per_peer_napi(&PACKET_PEER(skb)->rx_queue, skb, + state); + simd_relax(&simd_context); + } + + simd_put(&simd_context); +} + +static void packet_consume_data(struct wireguard_device *wg, + struct sk_buff *skb) +{ + __le32 idx = ((struct message_data *)skb->data)->key_idx; + struct wireguard_peer *peer = NULL; + int ret; + + rcu_read_lock_bh(); + PACKET_CB(skb)->keypair = + (struct noise_keypair *)index_hashtable_lookup( + &wg->index_hashtable, INDEX_HASHTABLE_KEYPAIR, idx, + &peer); + if (unlikely(!noise_keypair_get(PACKET_CB(skb)->keypair))) + goto err_keypair; + + if (unlikely(peer->is_dead)) + goto err; + + ret = queue_enqueue_per_device_and_peer(&wg->decrypt_queue, + &peer->rx_queue, skb, + wg->packet_crypt_wq, + &wg->decrypt_queue.last_cpu); + if (unlikely(ret == -EPIPE)) + queue_enqueue_per_peer(&peer->rx_queue, skb, PACKET_STATE_DEAD); + if (likely(!ret || ret == -EPIPE)) { + rcu_read_unlock_bh(); + return; + } +err: + noise_keypair_put(PACKET_CB(skb)->keypair, false); +err_keypair: + rcu_read_unlock_bh(); + peer_put(peer); + dev_kfree_skb(skb); +} + +void packet_receive(struct wireguard_device *wg, struct sk_buff *skb) +{ + if (unlikely(skb_prepare_header(skb, wg) < 0)) + goto err; + switch (SKB_TYPE_LE32(skb)) { + case cpu_to_le32(MESSAGE_HANDSHAKE_INITIATION): + case cpu_to_le32(MESSAGE_HANDSHAKE_RESPONSE): + case cpu_to_le32(MESSAGE_HANDSHAKE_COOKIE): { + int cpu; + + if (skb_queue_len(&wg->incoming_handshakes) > + MAX_QUEUED_INCOMING_HANDSHAKES || + unlikely(!rng_is_initialized())) { + net_dbg_skb_ratelimited("%s: Dropping handshake packet from %pISpfsc\n", + wg->dev->name, skb); + goto err; + } + skb_queue_tail(&wg->incoming_handshakes, skb); + /* Queues up a call to packet_process_queued_handshake_ + * packets(skb): + */ + cpu = cpumask_next_online(&wg->incoming_handshake_cpu); + queue_work_on(cpu, wg->handshake_receive_wq, + &per_cpu_ptr(wg->incoming_handshakes_worker, cpu)->work); + break; + } + case cpu_to_le32(MESSAGE_DATA): + PACKET_CB(skb)->ds = ip_tunnel_get_dsfield(ip_hdr(skb), skb); + packet_consume_data(wg, skb); + break; + default: + net_dbg_skb_ratelimited("%s: Invalid packet from %pISpfsc\n", + wg->dev->name, skb); + goto err; + } + return; + +err: + dev_kfree_skb(skb); +} diff --git a/drivers/net/wireguard/selftest/allowedips.h b/drivers/net/wireguard/selftest/allowedips.h new file mode 100644 index 000000000000..ca5f256c81a8 --- /dev/null +++ b/drivers/net/wireguard/selftest/allowedips.h @@ -0,0 +1,663 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifdef DEBUG + +#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ +#include + +static __init void swap_endian_and_apply_cidr(u8 *dst, const u8 *src, u8 bits, + u8 cidr) +{ + swap_endian(dst, src, bits); + memset(dst + (cidr + 7) / 8, 0, bits / 8 - (cidr + 7) / 8); + if (cidr) + dst[(cidr + 7) / 8 - 1] &= ~0U << ((8 - (cidr % 8)) % 8); +} + +static __init void print_node(struct allowedips_node *node, u8 bits) +{ + char *fmt_connection = KERN_DEBUG "\t\"%p/%d\" -> \"%p/%d\";\n"; + char *fmt_declaration = KERN_DEBUG + "\t\"%p/%d\"[style=%s, color=\"#%06x\"];\n"; + char *style = "dotted"; + u8 ip1[16], ip2[16]; + u32 color = 0; + + if (bits == 32) { + fmt_connection = KERN_DEBUG "\t\"%pI4/%d\" -> \"%pI4/%d\";\n"; + fmt_declaration = KERN_DEBUG + "\t\"%pI4/%d\"[style=%s, color=\"#%06x\"];\n"; + } else if (bits == 128) { + fmt_connection = KERN_DEBUG "\t\"%pI6/%d\" -> \"%pI6/%d\";\n"; + fmt_declaration = KERN_DEBUG + "\t\"%pI6/%d\"[style=%s, color=\"#%06x\"];\n"; + } + if (node->peer) { + hsiphash_key_t key = { 0 }; + memcpy(&key, &node->peer, sizeof(node->peer)); + color = hsiphash_1u32(0xdeadbeef, &key) % 200 << 16 | + hsiphash_1u32(0xbabecafe, &key) % 200 << 8 | + hsiphash_1u32(0xabad1dea, &key) % 200; + style = "bold"; + } + swap_endian_and_apply_cidr(ip1, node->bits, bits, node->cidr); + printk(fmt_declaration, ip1, node->cidr, style, color); + if (node->bit[0]) { + swap_endian_and_apply_cidr(ip2, node->bit[0]->bits, bits, + node->cidr); + printk(fmt_connection, ip1, node->cidr, ip2, + node->bit[0]->cidr); + print_node(node->bit[0], bits); + } + if (node->bit[1]) { + swap_endian_and_apply_cidr(ip2, node->bit[1]->bits, bits, + node->cidr); + printk(fmt_connection, ip1, node->cidr, ip2, + node->bit[1]->cidr); + print_node(node->bit[1], bits); + } +} +static __init void print_tree(struct allowedips_node *top, u8 bits) +{ + printk(KERN_DEBUG "digraph trie {\n"); + print_node(top, bits); + printk(KERN_DEBUG "}\n"); +} +#endif + +#ifdef DEBUG_RANDOM_TRIE +#define NUM_PEERS 2000 +#define NUM_RAND_ROUTES 400 +#define NUM_MUTATED_ROUTES 100 +#define NUM_QUERIES (NUM_RAND_ROUTES * NUM_MUTATED_ROUTES * 30) +#include +struct horrible_allowedips { + struct hlist_head head; +}; +struct horrible_allowedips_node { + struct hlist_node table; + union nf_inet_addr ip; + union nf_inet_addr mask; + uint8_t ip_version; + void *value; +}; +static __init void horrible_allowedips_init(struct horrible_allowedips *table) +{ + INIT_HLIST_HEAD(&table->head); +} +static __init void horrible_allowedips_free(struct horrible_allowedips *table) +{ + struct horrible_allowedips_node *node; + struct hlist_node *h; + + hlist_for_each_entry_safe (node, h, &table->head, table) { + hlist_del(&node->table); + kfree(node); + } +} +static __init inline union nf_inet_addr horrible_cidr_to_mask(uint8_t cidr) +{ + union nf_inet_addr mask; + + memset(&mask, 0x00, 128 / 8); + memset(&mask, 0xff, cidr / 8); + if (cidr % 32) + mask.all[cidr / 32] = htonl( + (0xFFFFFFFFUL << (32 - (cidr % 32))) & 0xFFFFFFFFUL); + return mask; +} +static __init inline uint8_t horrible_mask_to_cidr(union nf_inet_addr subnet) +{ + return hweight32(subnet.all[0]) + hweight32(subnet.all[1]) + + hweight32(subnet.all[2]) + hweight32(subnet.all[3]); +} +static __init inline void +horrible_mask_self(struct horrible_allowedips_node *node) +{ + if (node->ip_version == 4) + node->ip.ip &= node->mask.ip; + else if (node->ip_version == 6) { + node->ip.ip6[0] &= node->mask.ip6[0]; + node->ip.ip6[1] &= node->mask.ip6[1]; + node->ip.ip6[2] &= node->mask.ip6[2]; + node->ip.ip6[3] &= node->mask.ip6[3]; + } +} +static __init inline bool +horrible_match_v4(const struct horrible_allowedips_node *node, + struct in_addr *ip) +{ + return (ip->s_addr & node->mask.ip) == node->ip.ip; +} +static __init inline bool +horrible_match_v6(const struct horrible_allowedips_node *node, + struct in6_addr *ip) +{ + return (ip->in6_u.u6_addr32[0] & node->mask.ip6[0]) == + node->ip.ip6[0] && + (ip->in6_u.u6_addr32[1] & node->mask.ip6[1]) == + node->ip.ip6[1] && + (ip->in6_u.u6_addr32[2] & node->mask.ip6[2]) == + node->ip.ip6[2] && + (ip->in6_u.u6_addr32[3] & node->mask.ip6[3]) == node->ip.ip6[3]; +} +static __init void +horrible_insert_ordered(struct horrible_allowedips *table, + struct horrible_allowedips_node *node) +{ + struct horrible_allowedips_node *other = NULL, *where = NULL; + uint8_t my_cidr = horrible_mask_to_cidr(node->mask); + + hlist_for_each_entry (other, &table->head, table) { + if (!memcmp(&other->mask, &node->mask, + sizeof(union nf_inet_addr)) && + !memcmp(&other->ip, &node->ip, + sizeof(union nf_inet_addr)) && + other->ip_version == node->ip_version) { + other->value = node->value; + kfree(node); + return; + } + where = other; + if (horrible_mask_to_cidr(other->mask) <= my_cidr) + break; + } + if (!other && !where) + hlist_add_head(&node->table, &table->head); + else if (!other) + hlist_add_behind(&node->table, &where->table); + else + hlist_add_before(&node->table, &where->table); +} +static __init int +horrible_allowedips_insert_v4(struct horrible_allowedips *table, + struct in_addr *ip, uint8_t cidr, void *value) +{ + struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL); + + if (unlikely(!node)) + return -ENOMEM; + node->ip.in = *ip; + node->mask = horrible_cidr_to_mask(cidr); + node->ip_version = 4; + node->value = value; + horrible_mask_self(node); + horrible_insert_ordered(table, node); + return 0; +} +static __init int +horrible_allowedips_insert_v6(struct horrible_allowedips *table, + struct in6_addr *ip, uint8_t cidr, void *value) +{ + struct horrible_allowedips_node *node = kzalloc(sizeof(*node), GFP_KERNEL); + + if (unlikely(!node)) + return -ENOMEM; + node->ip.in6 = *ip; + node->mask = horrible_cidr_to_mask(cidr); + node->ip_version = 6; + node->value = value; + horrible_mask_self(node); + horrible_insert_ordered(table, node); + return 0; +} +static __init void * +horrible_allowedips_lookup_v4(struct horrible_allowedips *table, + struct in_addr *ip) +{ + struct horrible_allowedips_node *node; + void *ret = NULL; + + hlist_for_each_entry (node, &table->head, table) { + if (node->ip_version != 4) + continue; + if (horrible_match_v4(node, ip)) { + ret = node->value; + break; + } + } + return ret; +} +static __init void * +horrible_allowedips_lookup_v6(struct horrible_allowedips *table, + struct in6_addr *ip) +{ + struct horrible_allowedips_node *node; + void *ret = NULL; + + hlist_for_each_entry (node, &table->head, table) { + if (node->ip_version != 6) + continue; + if (horrible_match_v6(node, ip)) { + ret = node->value; + break; + } + } + return ret; +} + +static __init bool randomized_test(void) +{ + unsigned int i, j, k, mutate_amount, cidr; + u8 ip[16], mutate_mask[16], mutated[16]; + struct wireguard_peer **peers, *peer; + struct horrible_allowedips h; + DEFINE_MUTEX(mutex); + struct allowedips t; + bool ret = false; + + mutex_init(&mutex); + + allowedips_init(&t); + horrible_allowedips_init(&h); + + peers = kcalloc(NUM_PEERS, sizeof(*peers), GFP_KERNEL); + if (unlikely(!peers)) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + for (i = 0; i < NUM_PEERS; ++i) { + peers[i] = kzalloc(sizeof(*peers[i]), GFP_KERNEL); + if (unlikely(!peers[i])) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + kref_init(&peers[i]->refcount); + } + + mutex_lock(&mutex); + + for (i = 0; i < NUM_RAND_ROUTES; ++i) { + prandom_bytes(ip, 4); + cidr = prandom_u32_max(32) + 1; + peer = peers[prandom_u32_max(NUM_PEERS)]; + if (allowedips_insert_v4(&t, (struct in_addr *)ip, cidr, peer, + &mutex) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + if (horrible_allowedips_insert_v4(&h, (struct in_addr *)ip, + cidr, peer) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + for (j = 0; j < NUM_MUTATED_ROUTES; ++j) { + memcpy(mutated, ip, 4); + prandom_bytes(mutate_mask, 4); + mutate_amount = prandom_u32_max(32); + for (k = 0; k < mutate_amount / 8; ++k) + mutate_mask[k] = 0xff; + mutate_mask[k] = 0xff + << ((8 - (mutate_amount % 8)) % 8); + for (; k < 4; ++k) + mutate_mask[k] = 0; + for (k = 0; k < 4; ++k) + mutated[k] = (mutated[k] & mutate_mask[k]) | + (~mutate_mask[k] & + prandom_u32_max(256)); + cidr = prandom_u32_max(32) + 1; + peer = peers[prandom_u32_max(NUM_PEERS)]; + if (allowedips_insert_v4(&t, (struct in_addr *)mutated, + cidr, peer, &mutex) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + if (horrible_allowedips_insert_v4(&h, + (struct in_addr *)mutated, cidr, peer)) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + } + } + + for (i = 0; i < NUM_RAND_ROUTES; ++i) { + prandom_bytes(ip, 16); + cidr = prandom_u32_max(128) + 1; + peer = peers[prandom_u32_max(NUM_PEERS)]; + if (allowedips_insert_v6(&t, (struct in6_addr *)ip, cidr, peer, + &mutex) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + if (horrible_allowedips_insert_v6(&h, (struct in6_addr *)ip, + cidr, peer) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + for (j = 0; j < NUM_MUTATED_ROUTES; ++j) { + memcpy(mutated, ip, 16); + prandom_bytes(mutate_mask, 16); + mutate_amount = prandom_u32_max(128); + for (k = 0; k < mutate_amount / 8; ++k) + mutate_mask[k] = 0xff; + mutate_mask[k] = 0xff + << ((8 - (mutate_amount % 8)) % 8); + for (; k < 4; ++k) + mutate_mask[k] = 0; + for (k = 0; k < 4; ++k) + mutated[k] = (mutated[k] & mutate_mask[k]) | + (~mutate_mask[k] & + prandom_u32_max(256)); + cidr = prandom_u32_max(128) + 1; + peer = peers[prandom_u32_max(NUM_PEERS)]; + if (allowedips_insert_v6(&t, (struct in6_addr *)mutated, + cidr, peer, &mutex) < 0) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + if (horrible_allowedips_insert_v6( + &h, (struct in6_addr *)mutated, cidr, + peer)) { + pr_info("allowedips random self-test: out of memory\n"); + goto free; + } + } + } + + mutex_unlock(&mutex); + +#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ + print_tree(t.root4, 32); + print_tree(t.root6, 128); +#endif + + for (i = 0; i < NUM_QUERIES; ++i) { + prandom_bytes(ip, 4); + if (lookup(t.root4, 32, ip) != + horrible_allowedips_lookup_v4(&h, (struct in_addr *)ip)) { + pr_info("allowedips random self-test: FAIL\n"); + goto free; + } + } + + for (i = 0; i < NUM_QUERIES; ++i) { + prandom_bytes(ip, 16); + if (lookup(t.root6, 128, ip) != + horrible_allowedips_lookup_v6(&h, (struct in6_addr *)ip)) { + pr_info("allowedips random self-test: FAIL\n"); + goto free; + } + } + ret = true; + +free: + mutex_lock(&mutex); + allowedips_free(&t, &mutex); + mutex_unlock(&mutex); + horrible_allowedips_free(&h); + if (peers) { + for (i = 0; i < NUM_PEERS; ++i) + kfree(peers[i]); + } + kfree(peers); + return ret; +} +#endif + +static __init inline struct in_addr *ip4(u8 a, u8 b, u8 c, u8 d) +{ + static struct in_addr ip; + u8 *split = (u8 *)&ip; + split[0] = a; + split[1] = b; + split[2] = c; + split[3] = d; + return &ip; +} +static __init inline struct in6_addr *ip6(u32 a, u32 b, u32 c, u32 d) +{ + static struct in6_addr ip; + __be32 *split = (__be32 *)&ip; + split[0] = cpu_to_be32(a); + split[1] = cpu_to_be32(b); + split[2] = cpu_to_be32(c); + split[3] = cpu_to_be32(d); + return &ip; +} + +struct walk_ctx { + int count; + bool found_a, found_b, found_c, found_d, found_e; + bool found_other; +}; + +static __init int walk_callback(void *ctx, const u8 *ip, u8 cidr, int family) +{ + struct walk_ctx *wctx = ctx; + + wctx->count++; + + if (cidr == 27 && + !memcmp(ip, ip4(192, 95, 5, 64), sizeof(struct in_addr))) + wctx->found_a = true; + else if (cidr == 128 && + !memcmp(ip, ip6(0x26075300, 0x60006b00, 0, 0xc05f0543), + sizeof(struct in6_addr))) + wctx->found_b = true; + else if (cidr == 29 && + !memcmp(ip, ip4(10, 1, 0, 16), sizeof(struct in_addr))) + wctx->found_c = true; + else if (cidr == 83 && + !memcmp(ip, ip6(0x26075300, 0x6d8a6bf8, 0xdab1e000, 0), + sizeof(struct in6_addr))) + wctx->found_d = true; + else if (cidr == 21 && + !memcmp(ip, ip6(0x26075000, 0, 0, 0), sizeof(struct in6_addr))) + wctx->found_e = true; + else + wctx->found_other = true; + + return 0; +} + +#define init_peer(name) do { \ + name = kzalloc(sizeof(*name), GFP_KERNEL); \ + if (unlikely(!name)) { \ + pr_info("allowedips self-test: out of memory\n"); \ + goto free; \ + } \ + kref_init(&name->refcount); \ + } while (0) + +#define insert(version, mem, ipa, ipb, ipc, ipd, cidr) \ + allowedips_insert_v##version(&t, ip##version(ipa, ipb, ipc, ipd), \ + cidr, mem, &mutex) + +#define maybe_fail() do { \ + ++i; \ + if (!_s) { \ + pr_info("allowedips self-test %zu: FAIL\n", i); \ + success = false; \ + } \ + } while (0) + +#define test(version, mem, ipa, ipb, ipc, ipd) do { \ + bool _s = lookup(t.root##version, version == 4 ? 32 : 128, \ + ip##version(ipa, ipb, ipc, ipd)) == mem; \ + maybe_fail(); \ + } while (0) + +#define test_negative(version, mem, ipa, ipb, ipc, ipd) do { \ + bool _s = lookup(t.root##version, version == 4 ? 32 : 128, \ + ip##version(ipa, ipb, ipc, ipd)) != mem; \ + maybe_fail(); \ + } while (0) + +#define test_boolean(cond) do { \ + bool _s = (cond); \ + maybe_fail(); \ + } while (0) + +bool __init allowedips_selftest(void) +{ + struct wireguard_peer *a = NULL, *b = NULL, *c = NULL, *d = NULL, + *e = NULL, *f = NULL, *g = NULL, *h = NULL; + struct allowedips_cursor *cursor; + struct walk_ctx wctx = { 0 }; + bool success = false; + struct allowedips t; + DEFINE_MUTEX(mutex); + struct in6_addr ip; + size_t i = 0; + __be64 part; + + cursor = kzalloc(sizeof(*cursor), GFP_KERNEL); + if (!cursor) { + pr_info("allowedips self-test malloc: FAIL\n"); + return false; + } + + mutex_init(&mutex); + mutex_lock(&mutex); + + allowedips_init(&t); + init_peer(a); + init_peer(b); + init_peer(c); + init_peer(d); + init_peer(e); + init_peer(f); + init_peer(g); + init_peer(h); + + insert(4, a, 192, 168, 4, 0, 24); + insert(4, b, 192, 168, 4, 4, 32); + insert(4, c, 192, 168, 0, 0, 16); + insert(4, d, 192, 95, 5, 64, 27); + /* replaces previous entry, and maskself is required */ + insert(4, c, 192, 95, 5, 65, 27); + insert(6, d, 0x26075300, 0x60006b00, 0, 0xc05f0543, 128); + insert(6, c, 0x26075300, 0x60006b00, 0, 0, 64); + insert(4, e, 0, 0, 0, 0, 0); + insert(6, e, 0, 0, 0, 0, 0); + /* replaces previous entry */ + insert(6, f, 0, 0, 0, 0, 0); + insert(6, g, 0x24046800, 0, 0, 0, 32); + /* maskself is required */ + insert(6, h, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef, 64); + insert(6, a, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef, 128); + insert(6, c, 0x24446800, 0x40e40800, 0xdeaebeef, 0xdefbeef, 128); + insert(6, b, 0x24446800, 0xf0e40800, 0xeeaebeef, 0, 98); + insert(4, g, 64, 15, 112, 0, 20); + /* maskself is required */ + insert(4, h, 64, 15, 123, 211, 25); + insert(4, a, 10, 0, 0, 0, 25); + insert(4, b, 10, 0, 0, 128, 25); + insert(4, a, 10, 1, 0, 0, 30); + insert(4, b, 10, 1, 0, 4, 30); + insert(4, c, 10, 1, 0, 8, 29); + insert(4, d, 10, 1, 0, 16, 29); + +#ifdef DEBUG_PRINT_TRIE_GRAPHVIZ + print_tree(t.root4, 32); + print_tree(t.root6, 128); +#endif + + success = true; + + test(4, a, 192, 168, 4, 20); + test(4, a, 192, 168, 4, 0); + test(4, b, 192, 168, 4, 4); + test(4, c, 192, 168, 200, 182); + test(4, c, 192, 95, 5, 68); + test(4, e, 192, 95, 5, 96); + test(6, d, 0x26075300, 0x60006b00, 0, 0xc05f0543); + test(6, c, 0x26075300, 0x60006b00, 0, 0xc02e01ee); + test(6, f, 0x26075300, 0x60006b01, 0, 0); + test(6, g, 0x24046800, 0x40040806, 0, 0x1006); + test(6, g, 0x24046800, 0x40040806, 0x1234, 0x5678); + test(6, f, 0x240467ff, 0x40040806, 0x1234, 0x5678); + test(6, f, 0x24046801, 0x40040806, 0x1234, 0x5678); + test(6, h, 0x24046800, 0x40040800, 0x1234, 0x5678); + test(6, h, 0x24046800, 0x40040800, 0, 0); + test(6, h, 0x24046800, 0x40040800, 0x10101010, 0x10101010); + test(6, a, 0x24046800, 0x40040800, 0xdeadbeef, 0xdeadbeef); + test(4, g, 64, 15, 116, 26); + test(4, g, 64, 15, 127, 3); + test(4, g, 64, 15, 123, 1); + test(4, h, 64, 15, 123, 128); + test(4, h, 64, 15, 123, 129); + test(4, a, 10, 0, 0, 52); + test(4, b, 10, 0, 0, 220); + test(4, a, 10, 1, 0, 2); + test(4, b, 10, 1, 0, 6); + test(4, c, 10, 1, 0, 10); + test(4, d, 10, 1, 0, 20); + + insert(4, a, 1, 0, 0, 0, 32); + insert(4, a, 64, 0, 0, 0, 32); + insert(4, a, 128, 0, 0, 0, 32); + insert(4, a, 192, 0, 0, 0, 32); + insert(4, a, 255, 0, 0, 0, 32); + allowedips_remove_by_peer(&t, a, &mutex); + test_negative(4, a, 1, 0, 0, 0); + test_negative(4, a, 64, 0, 0, 0); + test_negative(4, a, 128, 0, 0, 0); + test_negative(4, a, 192, 0, 0, 0); + test_negative(4, a, 255, 0, 0, 0); + + allowedips_free(&t, &mutex); + allowedips_init(&t); + insert(4, a, 192, 168, 0, 0, 16); + insert(4, a, 192, 168, 0, 0, 24); + allowedips_remove_by_peer(&t, a, &mutex); + test_negative(4, a, 192, 168, 0, 1); + + /* These will hit the WARN_ON(len >= 128) in free_node if something goes wrong. */ + for (i = 0; i < 128; ++i) { + part = cpu_to_be64(~(1LLU << (i % 64))); + memset(&ip, 0xff, 16); + memcpy((u8 *)&ip + (i < 64) * 8, &part, 8); + allowedips_insert_v6(&t, &ip, 128, a, &mutex); + } + + allowedips_free(&t, &mutex); + + allowedips_init(&t); + insert(4, a, 192, 95, 5, 93, 27); + insert(6, a, 0x26075300, 0x60006b00, 0, 0xc05f0543, 128); + insert(4, a, 10, 1, 0, 20, 29); + insert(6, a, 0x26075300, 0x6d8a6bf8, 0xdab1f1df, 0xc05f1523, 83); + insert(6, a, 0x26075300, 0x6d8a6bf8, 0xdab1f1df, 0xc05f1523, 21); + allowedips_walk_by_peer(&t, cursor, a, walk_callback, &wctx, &mutex); + test_boolean(wctx.count == 5); + test_boolean(wctx.found_a); + test_boolean(wctx.found_b); + test_boolean(wctx.found_c); + test_boolean(wctx.found_d); + test_boolean(wctx.found_e); + test_boolean(!wctx.found_other); + +#ifdef DEBUG_RANDOM_TRIE + if (success) + success = randomized_test(); +#endif + + if (success) + pr_info("allowedips self-tests: pass\n"); + +free: + allowedips_free(&t, &mutex); + kfree(a); + kfree(b); + kfree(c); + kfree(d); + kfree(e); + kfree(f); + kfree(g); + kfree(h); + mutex_unlock(&mutex); + kfree(cursor); + + return success; +} +#undef test_negative +#undef test +#undef remove +#undef insert +#undef init_peer + +#endif diff --git a/drivers/net/wireguard/selftest/counter.h b/drivers/net/wireguard/selftest/counter.h new file mode 100644 index 000000000000..2a0e5a5680c8 --- /dev/null +++ b/drivers/net/wireguard/selftest/counter.h @@ -0,0 +1,103 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifdef DEBUG +bool __init packet_counter_selftest(void) +{ + unsigned int test_num = 0, i; + union noise_counter counter; + bool success = true; + +#define T_INIT do { \ + memset(&counter, 0, sizeof(union noise_counter)); \ + spin_lock_init(&counter.receive.lock); \ + } while (0) +#define T_LIM (COUNTER_WINDOW_SIZE + 1) +#define T(n, v) do { \ + ++test_num; \ + if (counter_validate(&counter, n) != v) { \ + pr_info("nonce counter self-test %u: FAIL\n", \ + test_num); \ + success = false; \ + } \ + } while (0) + + T_INIT; + /* 1 */ T(0, true); + /* 2 */ T(1, true); + /* 3 */ T(1, false); + /* 4 */ T(9, true); + /* 5 */ T(8, true); + /* 6 */ T(7, true); + /* 7 */ T(7, false); + /* 8 */ T(T_LIM, true); + /* 9 */ T(T_LIM - 1, true); + /* 10 */ T(T_LIM - 1, false); + /* 11 */ T(T_LIM - 2, true); + /* 12 */ T(2, true); + /* 13 */ T(2, false); + /* 14 */ T(T_LIM + 16, true); + /* 15 */ T(3, false); + /* 16 */ T(T_LIM + 16, false); + /* 17 */ T(T_LIM * 4, true); + /* 18 */ T(T_LIM * 4 - (T_LIM - 1), true); + /* 19 */ T(10, false); + /* 20 */ T(T_LIM * 4 - T_LIM, false); + /* 21 */ T(T_LIM * 4 - (T_LIM + 1), false); + /* 22 */ T(T_LIM * 4 - (T_LIM - 2), true); + /* 23 */ T(T_LIM * 4 + 1 - T_LIM, false); + /* 24 */ T(0, false); + /* 25 */ T(REJECT_AFTER_MESSAGES, false); + /* 26 */ T(REJECT_AFTER_MESSAGES - 1, true); + /* 27 */ T(REJECT_AFTER_MESSAGES, false); + /* 28 */ T(REJECT_AFTER_MESSAGES - 1, false); + /* 29 */ T(REJECT_AFTER_MESSAGES - 2, true); + /* 30 */ T(REJECT_AFTER_MESSAGES + 1, false); + /* 31 */ T(REJECT_AFTER_MESSAGES + 2, false); + /* 32 */ T(REJECT_AFTER_MESSAGES - 2, false); + /* 33 */ T(REJECT_AFTER_MESSAGES - 3, true); + /* 34 */ T(0, false); + + T_INIT; + for (i = 1; i <= COUNTER_WINDOW_SIZE; ++i) + T(i, true); + T(0, true); + T(0, false); + + T_INIT; + for (i = 2; i <= COUNTER_WINDOW_SIZE + 1; ++i) + T(i, true); + T(1, true); + T(0, false); + + T_INIT; + for (i = COUNTER_WINDOW_SIZE + 1; i-- > 0;) + T(i, true); + + T_INIT; + for (i = COUNTER_WINDOW_SIZE + 2; i-- > 1;) + T(i, true); + T(0, false); + + T_INIT; + for (i = COUNTER_WINDOW_SIZE + 1; i-- > 1;) + T(i, true); + T(COUNTER_WINDOW_SIZE + 1, true); + T(0, false); + + T_INIT; + for (i = COUNTER_WINDOW_SIZE + 1; i-- > 1;) + T(i, true); + T(0, true); + T(COUNTER_WINDOW_SIZE + 1, true); +#undef T +#undef T_LIM +#undef T_INIT + + if (success) + pr_info("nonce counter self-tests: pass\n"); + return success; +} +#endif diff --git a/drivers/net/wireguard/selftest/ratelimiter.h b/drivers/net/wireguard/selftest/ratelimiter.h new file mode 100644 index 000000000000..e0c65eafb4a7 --- /dev/null +++ b/drivers/net/wireguard/selftest/ratelimiter.h @@ -0,0 +1,178 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifdef DEBUG + +#include + +static const struct { + bool result; + unsigned int msec_to_sleep_before; +} expected_results[] __initconst = { + [0 ... PACKETS_BURSTABLE - 1] = { true, 0 }, + [PACKETS_BURSTABLE] = { false, 0 }, + [PACKETS_BURSTABLE + 1] = { true, MSEC_PER_SEC / PACKETS_PER_SECOND }, + [PACKETS_BURSTABLE + 2] = { false, 0 }, + [PACKETS_BURSTABLE + 3] = { true, (MSEC_PER_SEC / PACKETS_PER_SECOND) * 2 }, + [PACKETS_BURSTABLE + 4] = { true, 0 }, + [PACKETS_BURSTABLE + 5] = { false, 0 } +}; + +static __init unsigned int maximum_jiffies_at_index(int index) +{ + unsigned int total_msecs = 2 * MSEC_PER_SEC / PACKETS_PER_SECOND / 3; + int i; + + for (i = 0; i <= index; ++i) + total_msecs += expected_results[i].msec_to_sleep_before; + return msecs_to_jiffies(total_msecs); +} + +bool __init ratelimiter_selftest(void) +{ + int i, test = 0, tries = 0, ret = false; + unsigned long loop_start_time; +#if IS_ENABLED(CONFIG_IPV6) + struct sk_buff *skb6; + struct ipv6hdr *hdr6; +#endif + struct sk_buff *skb4; + struct iphdr *hdr4; + +#if defined(CONFIG_KASAN) || defined(CONFIG_UBSAN) + return true; +#endif + + BUILD_BUG_ON(MSEC_PER_SEC % PACKETS_PER_SECOND != 0); + + if (ratelimiter_init()) + goto out; + ++test; + if (ratelimiter_init()) { + ratelimiter_uninit(); + goto out; + } + ++test; + if (ratelimiter_init()) { + ratelimiter_uninit(); + ratelimiter_uninit(); + goto out; + } + ++test; + + skb4 = alloc_skb(sizeof(struct iphdr), GFP_KERNEL); + if (unlikely(!skb4)) + goto err_nofree; + skb4->protocol = htons(ETH_P_IP); + hdr4 = (struct iphdr *)skb_put(skb4, sizeof(*hdr4)); + hdr4->saddr = htonl(8182); + skb_reset_network_header(skb4); + ++test; + +#if IS_ENABLED(CONFIG_IPV6) + skb6 = alloc_skb(sizeof(struct ipv6hdr), GFP_KERNEL); + if (unlikely(!skb6)) { + kfree_skb(skb4); + goto err_nofree; + } + skb6->protocol = htons(ETH_P_IPV6); + hdr6 = (struct ipv6hdr *)skb_put(skb6, sizeof(*hdr6)); + hdr6->saddr.in6_u.u6_addr32[0] = htonl(1212); + hdr6->saddr.in6_u.u6_addr32[1] = htonl(289188); + skb_reset_network_header(skb6); + ++test; +#endif + +restart: + loop_start_time = jiffies; + for (i = 0; i < ARRAY_SIZE(expected_results); ++i) { +#define ensure_time do { \ + if (time_is_before_jiffies(loop_start_time + \ + maximum_jiffies_at_index(i))) { \ + if (++tries >= 5000) \ + goto err; \ + gc_entries(NULL); \ + rcu_barrier(); \ + msleep(500); \ + goto restart; \ + } \ + } while (0) + + if (expected_results[i].msec_to_sleep_before) + msleep(expected_results[i].msec_to_sleep_before); + + ensure_time; + if (ratelimiter_allow(skb4, &init_net) != + expected_results[i].result) + goto err; + ++test; + hdr4->saddr = htonl(ntohl(hdr4->saddr) + i + 1); + ensure_time; + if (!ratelimiter_allow(skb4, &init_net)) + goto err; + ++test; + hdr4->saddr = htonl(ntohl(hdr4->saddr) - i - 1); + +#if IS_ENABLED(CONFIG_IPV6) + hdr6->saddr.in6_u.u6_addr32[2] = + hdr6->saddr.in6_u.u6_addr32[3] = htonl(i); + ensure_time; + if (ratelimiter_allow(skb6, &init_net) != + expected_results[i].result) + goto err; + ++test; + hdr6->saddr.in6_u.u6_addr32[0] = + htonl(ntohl(hdr6->saddr.in6_u.u6_addr32[0]) + i + 1); + ensure_time; + if (!ratelimiter_allow(skb6, &init_net)) + goto err; + ++test; + hdr6->saddr.in6_u.u6_addr32[0] = + htonl(ntohl(hdr6->saddr.in6_u.u6_addr32[0]) - i - 1); + ensure_time; +#endif + } + + tries = 0; +restart2: + gc_entries(NULL); + rcu_barrier(); + + if (atomic_read(&total_entries)) + goto err; + ++test; + + for (i = 0; i <= max_entries; ++i) { + hdr4->saddr = htonl(i); + if (ratelimiter_allow(skb4, &init_net) != (i != max_entries)) { + if (++tries < 5000) + goto restart2; + goto err; + } + ++test; + } + + ret = true; + +err: + kfree_skb(skb4); +#if IS_ENABLED(CONFIG_IPV6) + kfree_skb(skb6); +#endif +err_nofree: + ratelimiter_uninit(); + ratelimiter_uninit(); + ratelimiter_uninit(); + /* Uninit one extra time to check underflow detection. */ + ratelimiter_uninit(); +out: + if (ret) + pr_info("ratelimiter self-tests: pass\n"); + else + pr_info("ratelimiter self-test %d: fail\n", test); + + return ret; +} +#endif diff --git a/drivers/net/wireguard/send.c b/drivers/net/wireguard/send.c new file mode 100644 index 000000000000..5dde5a3c5844 --- /dev/null +++ b/drivers/net/wireguard/send.c @@ -0,0 +1,420 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "queueing.h" +#include "timers.h" +#include "device.h" +#include "peer.h" +#include "socket.h" +#include "messages.h" +#include "cookie.h" + +#include +#include +#include +#include +#include +#include +#include + +static void packet_send_handshake_initiation(struct wireguard_peer *peer) +{ + struct message_handshake_initiation packet; + + if (!has_expired(atomic64_read(&peer->last_sent_handshake), + REKEY_TIMEOUT)) + return; /* This function is rate limited. */ + + atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns()); + net_dbg_ratelimited("%s: Sending handshake initiation to peer %llu (%pISpfsc)\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr); + + if (noise_handshake_create_initiation(&packet, &peer->handshake)) { + cookie_add_mac_to_packet(&packet, sizeof(packet), peer); + timers_any_authenticated_packet_traversal(peer); + timers_any_authenticated_packet_sent(peer); + atomic64_set(&peer->last_sent_handshake, + ktime_get_boot_fast_ns()); + socket_send_buffer_to_peer(peer, &packet, sizeof(packet), + HANDSHAKE_DSCP); + timers_handshake_initiated(peer); + } +} + +void packet_handshake_send_worker(struct work_struct *work) +{ + struct wireguard_peer *peer = container_of(work, struct wireguard_peer, + transmit_handshake_work); + + packet_send_handshake_initiation(peer); + peer_put(peer); +} + +void packet_send_queued_handshake_initiation(struct wireguard_peer *peer, + bool is_retry) +{ + if (!is_retry) + peer->timer_handshake_attempts = 0; + + rcu_read_lock_bh(); + /* We check last_sent_handshake here in addition to the actual function + * we're queueing up, so that we don't queue things if not strictly + * necessary: + */ + if (!has_expired(atomic64_read(&peer->last_sent_handshake), + REKEY_TIMEOUT) || unlikely(peer->is_dead)) + goto out; + + peer_get(peer); + /* Queues up calling packet_send_queued_handshakes(peer), where we do a + * peer_put(peer) after: + */ + if (!queue_work(peer->device->handshake_send_wq, + &peer->transmit_handshake_work)) + /* If the work was already queued, we want to drop the + * extra reference: + */ + peer_put(peer); +out: + rcu_read_unlock_bh(); +} + +void packet_send_handshake_response(struct wireguard_peer *peer) +{ + struct message_handshake_response packet; + + atomic64_set(&peer->last_sent_handshake, ktime_get_boot_fast_ns()); + net_dbg_ratelimited("%s: Sending handshake response to peer %llu (%pISpfsc)\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr); + + if (noise_handshake_create_response(&packet, &peer->handshake)) { + cookie_add_mac_to_packet(&packet, sizeof(packet), peer); + if (noise_handshake_begin_session(&peer->handshake, + &peer->keypairs)) { + timers_session_derived(peer); + timers_any_authenticated_packet_traversal(peer); + timers_any_authenticated_packet_sent(peer); + atomic64_set(&peer->last_sent_handshake, + ktime_get_boot_fast_ns()); + socket_send_buffer_to_peer(peer, &packet, + sizeof(packet), + HANDSHAKE_DSCP); + } + } +} + +void packet_send_handshake_cookie(struct wireguard_device *wg, + struct sk_buff *initiating_skb, + __le32 sender_index) +{ + struct message_handshake_cookie packet; + + net_dbg_skb_ratelimited("%s: Sending cookie response for denied handshake message for %pISpfsc\n", + wg->dev->name, initiating_skb); + cookie_message_create(&packet, initiating_skb, sender_index, + &wg->cookie_checker); + socket_send_buffer_as_reply_to_skb(wg, initiating_skb, &packet, + sizeof(packet)); +} + +static void keep_key_fresh(struct wireguard_peer *peer) +{ + struct noise_keypair *keypair; + bool send = false; + + rcu_read_lock_bh(); + keypair = rcu_dereference_bh(peer->keypairs.current_keypair); + if (likely(keypair && keypair->sending.is_valid) && + (unlikely(atomic64_read(&keypair->sending.counter.counter) > + REKEY_AFTER_MESSAGES) || + (keypair->i_am_the_initiator && + unlikely(has_expired(keypair->sending.birthdate, + REKEY_AFTER_TIME))))) + send = true; + rcu_read_unlock_bh(); + + if (send) + packet_send_queued_handshake_initiation(peer, false); +} + +static unsigned int skb_padding(struct sk_buff *skb) +{ + /* We do this modulo business with the MTU, just in case the networking + * layer gives us a packet that's bigger than the MTU. In that case, we + * wouldn't want the final subtraction to overflow in the case of the + * padded_size being clamped. + */ + unsigned int last_unit = skb->len % PACKET_CB(skb)->mtu; + unsigned int padded_size = ALIGN(last_unit, MESSAGE_PADDING_MULTIPLE); + + if (padded_size > PACKET_CB(skb)->mtu) + padded_size = PACKET_CB(skb)->mtu; + return padded_size - last_unit; +} + +static bool skb_encrypt(struct sk_buff *skb, struct noise_keypair *keypair, + simd_context_t *simd_context) +{ + unsigned int padding_len, plaintext_len, trailer_len; + struct scatterlist sg[MAX_SKB_FRAGS + 8]; + struct message_data *header; + struct sk_buff *trailer; + int num_frags; + + /* Calculate lengths. */ + padding_len = skb_padding(skb); + trailer_len = padding_len + noise_encrypted_len(0); + plaintext_len = skb->len + padding_len; + + /* Expand data section to have room for padding and auth tag. */ + num_frags = skb_cow_data(skb, trailer_len, &trailer); + if (unlikely(num_frags < 0 || num_frags > ARRAY_SIZE(sg))) + return false; + + /* Set the padding to zeros, and make sure it and the auth tag are part + * of the skb. + */ + memset(skb_tail_pointer(trailer), 0, padding_len); + + /* Expand head section to have room for our header and the network + * stack's headers. + */ + if (unlikely(skb_cow_head(skb, DATA_PACKET_HEAD_ROOM) < 0)) + return false; + + /* We have to remember to add the checksum to the innerpacket, in case + * the receiver forwards it. + */ + if (likely(!skb_checksum_setup(skb, true))) + skb_checksum_help(skb); + + /* Only after checksumming can we safely add on the padding at the end + * and the header. + */ + skb_set_inner_network_header(skb, 0); + header = (struct message_data *)skb_push(skb, sizeof(*header)); + header->header.type = cpu_to_le32(MESSAGE_DATA); + header->key_idx = keypair->remote_index; + header->counter = cpu_to_le64(PACKET_CB(skb)->nonce); + pskb_put(skb, trailer, trailer_len); + + /* Now we can encrypt the scattergather segments */ + sg_init_table(sg, num_frags); + if (skb_to_sgvec(skb, sg, sizeof(struct message_data), + noise_encrypted_len(plaintext_len)) <= 0) + return false; + return chacha20poly1305_encrypt_sg(sg, sg, plaintext_len, NULL, 0, + PACKET_CB(skb)->nonce, + keypair->sending.key, simd_context); +} + +void packet_send_keepalive(struct wireguard_peer *peer) +{ + struct sk_buff *skb; + + if (skb_queue_empty(&peer->staged_packet_queue)) { + skb = alloc_skb(DATA_PACKET_HEAD_ROOM + MESSAGE_MINIMUM_LENGTH, + GFP_ATOMIC); + if (unlikely(!skb)) + return; + skb_reserve(skb, DATA_PACKET_HEAD_ROOM); + skb->dev = peer->device->dev; + PACKET_CB(skb)->mtu = skb->dev->mtu; + skb_queue_tail(&peer->staged_packet_queue, skb); + net_dbg_ratelimited("%s: Sending keepalive packet to peer %llu (%pISpfsc)\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr); + } + + packet_send_staged_packets(peer); +} + +#define skb_walk_null_queue_safe(first, skb, next) \ + for (skb = first, next = skb->next; skb; \ + skb = next, next = skb ? skb->next : NULL) +static void skb_free_null_queue(struct sk_buff *first) +{ + struct sk_buff *skb, *next; + + skb_walk_null_queue_safe (first, skb, next) + dev_kfree_skb(skb); +} + +static void packet_create_data_done(struct sk_buff *first, + struct wireguard_peer *peer) +{ + struct sk_buff *skb, *next; + bool is_keepalive, data_sent = false; + + timers_any_authenticated_packet_traversal(peer); + timers_any_authenticated_packet_sent(peer); + skb_walk_null_queue_safe (first, skb, next) { + is_keepalive = skb->len == message_data_len(0); + if (likely(!socket_send_skb_to_peer(peer, skb, + PACKET_CB(skb)->ds) && !is_keepalive)) + data_sent = true; + } + + if (likely(data_sent)) + timers_data_sent(peer); + + keep_key_fresh(peer); +} + +void packet_tx_worker(struct work_struct *work) +{ + struct crypt_queue *queue = + container_of(work, struct crypt_queue, work); + struct wireguard_peer *peer; + struct noise_keypair *keypair; + struct sk_buff *first; + enum packet_state state; + + while ((first = __ptr_ring_peek(&queue->ring)) != NULL && + (state = atomic_read_acquire(&PACKET_CB(first)->state)) != + PACKET_STATE_UNCRYPTED) { + __ptr_ring_discard_one(&queue->ring); + peer = PACKET_PEER(first); + keypair = PACKET_CB(first)->keypair; + + if (likely(state == PACKET_STATE_CRYPTED)) + packet_create_data_done(first, peer); + else + skb_free_null_queue(first); + + noise_keypair_put(keypair, false); + peer_put(peer); + } +} + +void packet_encrypt_worker(struct work_struct *work) +{ + struct crypt_queue *queue = + container_of(work, struct multicore_worker, work)->ptr; + struct sk_buff *first, *skb, *next; + simd_context_t simd_context; + + simd_get(&simd_context); + while ((first = ptr_ring_consume_bh(&queue->ring)) != NULL) { + enum packet_state state = PACKET_STATE_CRYPTED; + + skb_walk_null_queue_safe (first, skb, next) { + if (likely(skb_encrypt(skb, PACKET_CB(first)->keypair, + &simd_context))) + skb_reset(skb); + else { + state = PACKET_STATE_DEAD; + break; + } + } + queue_enqueue_per_peer(&PACKET_PEER(first)->tx_queue, first, + state); + + simd_relax(&simd_context); + } + simd_put(&simd_context); +} + +static void packet_create_data(struct sk_buff *first) +{ + struct wireguard_peer *peer = PACKET_PEER(first); + struct wireguard_device *wg = peer->device; + int ret = -EINVAL; + + rcu_read_lock_bh(); + if (unlikely(peer->is_dead)) + goto err; + + ret = queue_enqueue_per_device_and_peer(&wg->encrypt_queue, + &peer->tx_queue, first, + wg->packet_crypt_wq, + &wg->encrypt_queue.last_cpu); + if (unlikely(ret == -EPIPE)) + queue_enqueue_per_peer(&peer->tx_queue, first, + PACKET_STATE_DEAD); +err: + rcu_read_unlock_bh(); + if (likely(!ret || ret == -EPIPE)) + return; + noise_keypair_put(PACKET_CB(first)->keypair, false); + peer_put(peer); + skb_free_null_queue(first); +} + +void packet_send_staged_packets(struct wireguard_peer *peer) +{ + struct noise_symmetric_key *key; + struct noise_keypair *keypair; + struct sk_buff_head packets; + struct sk_buff *skb; + + /* Steal the current queue into our local one. */ + __skb_queue_head_init(&packets); + spin_lock_bh(&peer->staged_packet_queue.lock); + skb_queue_splice_init(&peer->staged_packet_queue, &packets); + spin_unlock_bh(&peer->staged_packet_queue.lock); + if (unlikely(skb_queue_empty(&packets))) + return; + + /* First we make sure we have a valid reference to a valid key. */ + rcu_read_lock_bh(); + keypair = noise_keypair_get( + rcu_dereference_bh(peer->keypairs.current_keypair)); + rcu_read_unlock_bh(); + if (unlikely(!keypair)) + goto out_nokey; + key = &keypair->sending; + if (unlikely(!key->is_valid)) + goto out_nokey; + if (unlikely(has_expired(key->birthdate, REJECT_AFTER_TIME))) + goto out_invalid; + + /* After we know we have a somewhat valid key, we now try to assign + * nonces to all of the packets in the queue. If we can't assign nonces + * for all of them, we just consider it a failure and wait for the next + * handshake. + */ + skb_queue_walk (&packets, skb) { + /* 0 for no outer TOS: no leak. TODO: should we use flowi->tos + * as outer? */ + PACKET_CB(skb)->ds = ip_tunnel_ecn_encap(0, ip_hdr(skb), skb); + PACKET_CB(skb)->nonce = + atomic64_inc_return(&key->counter.counter) - 1; + if (unlikely(PACKET_CB(skb)->nonce >= REJECT_AFTER_MESSAGES)) + goto out_invalid; + } + + packets.prev->next = NULL; + peer_get(keypair->entry.peer); + PACKET_CB(packets.next)->keypair = keypair; + packet_create_data(packets.next); + return; + +out_invalid: + key->is_valid = false; +out_nokey: + noise_keypair_put(keypair, false); + + /* We orphan the packets if we're waiting on a handshake, so that they + * don't block a socket's pool. + */ + skb_queue_walk (&packets, skb) + skb_orphan(skb); + /* Then we put them back on the top of the queue. We're not too + * concerned about accidentally getting things a little out of order if + * packets are being added really fast, because this queue is for before + * packets can even be sent and it's small anyway. + */ + spin_lock_bh(&peer->staged_packet_queue.lock); + skb_queue_splice(&packets, &peer->staged_packet_queue); + spin_unlock_bh(&peer->staged_packet_queue.lock); + + /* If we're exiting because there's something wrong with the key, it + * means we should initiate a new handshake. + */ + packet_send_queued_handshake_initiation(peer, false); +} diff --git a/drivers/net/wireguard/socket.c b/drivers/net/wireguard/socket.c new file mode 100644 index 000000000000..e87fc4cdec9f --- /dev/null +++ b/drivers/net/wireguard/socket.c @@ -0,0 +1,432 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "device.h" +#include "peer.h" +#include "socket.h" +#include "queueing.h" +#include "messages.h" + +#include +#include +#include +#include +#include +#include +#include + +static int send4(struct wireguard_device *wg, struct sk_buff *skb, + struct endpoint *endpoint, u8 ds, struct dst_cache *cache) +{ + struct flowi4 fl = { + .saddr = endpoint->src4.s_addr, + .daddr = endpoint->addr4.sin_addr.s_addr, + .fl4_dport = endpoint->addr4.sin_port, + .flowi4_mark = wg->fwmark, + .flowi4_proto = IPPROTO_UDP + }; + struct rtable *rt = NULL; + struct sock *sock; + int ret = 0; + + skb->next = skb->prev = NULL; + skb->dev = wg->dev; + skb->mark = wg->fwmark; + + rcu_read_lock_bh(); + sock = rcu_dereference_bh(wg->sock4); + + if (unlikely(!sock)) { + ret = -ENONET; + goto err; + } + + fl.fl4_sport = inet_sk(sock)->inet_sport; + + if (cache) + rt = dst_cache_get_ip4(cache, &fl.saddr); + + if (!rt) { + security_sk_classify_flow(sock, flowi4_to_flowi(&fl)); + if (unlikely(!inet_confirm_addr(sock_net(sock), NULL, 0, + fl.saddr, RT_SCOPE_HOST))) { + endpoint->src4.s_addr = 0; + *(__force __be32 *)&endpoint->src_if4 = 0; + fl.saddr = 0; + if (cache) + dst_cache_reset(cache); + } + rt = ip_route_output_flow(sock_net(sock), &fl, sock); + if (unlikely(endpoint->src_if4 && ((IS_ERR(rt) && + PTR_ERR(rt) == -EINVAL) || (!IS_ERR(rt) && + rt->dst.dev->ifindex != endpoint->src_if4)))) { + endpoint->src4.s_addr = 0; + *(__force __be32 *)&endpoint->src_if4 = 0; + fl.saddr = 0; + if (cache) + dst_cache_reset(cache); + if (!IS_ERR(rt)) + ip_rt_put(rt); + rt = ip_route_output_flow(sock_net(sock), &fl, sock); + } + if (unlikely(IS_ERR(rt))) { + ret = PTR_ERR(rt); + net_dbg_ratelimited("%s: No route to %pISpfsc, error %d\n", + wg->dev->name, &endpoint->addr, ret); + goto err; + } else if (unlikely(rt->dst.dev == skb->dev)) { + ip_rt_put(rt); + ret = -ELOOP; + net_dbg_ratelimited("%s: Avoiding routing loop to %pISpfsc\n", + wg->dev->name, &endpoint->addr); + goto err; + } + if (cache) + dst_cache_set_ip4(cache, &rt->dst, fl.saddr); + } + udp_tunnel_xmit_skb(rt, sock, skb, fl.saddr, fl.daddr, ds, + ip4_dst_hoplimit(&rt->dst), 0, fl.fl4_sport, + fl.fl4_dport, false, false); + goto out; + +err: + kfree_skb(skb); +out: + rcu_read_unlock_bh(); + return ret; +} + +static int send6(struct wireguard_device *wg, struct sk_buff *skb, + struct endpoint *endpoint, u8 ds, struct dst_cache *cache) +{ +#if IS_ENABLED(CONFIG_IPV6) + struct flowi6 fl = { + .saddr = endpoint->src6, + .daddr = endpoint->addr6.sin6_addr, + .fl6_dport = endpoint->addr6.sin6_port, + .flowi6_mark = wg->fwmark, + .flowi6_oif = endpoint->addr6.sin6_scope_id, + .flowi6_proto = IPPROTO_UDP + /* TODO: addr->sin6_flowinfo */ + }; + struct dst_entry *dst = NULL; + struct sock *sock; + int ret = 0; + + skb->next = skb->prev = NULL; + skb->dev = wg->dev; + skb->mark = wg->fwmark; + + rcu_read_lock_bh(); + sock = rcu_dereference_bh(wg->sock6); + + if (unlikely(!sock)) { + ret = -ENONET; + goto err; + } + + fl.fl6_sport = inet_sk(sock)->inet_sport; + + if (cache) + dst = dst_cache_get_ip6(cache, &fl.saddr); + + if (!dst) { + security_sk_classify_flow(sock, flowi6_to_flowi(&fl)); + if (unlikely(!ipv6_addr_any(&fl.saddr) && + !ipv6_chk_addr(sock_net(sock), &fl.saddr, NULL, 0))) { + endpoint->src6 = fl.saddr = in6addr_any; + if (cache) + dst_cache_reset(cache); + } + ret = ipv6_stub->ipv6_dst_lookup(sock_net(sock), sock, &dst, + &fl); + if (unlikely(ret)) { + net_dbg_ratelimited("%s: No route to %pISpfsc, error %d\n", + wg->dev->name, &endpoint->addr, ret); + goto err; + } else if (unlikely(dst->dev == skb->dev)) { + dst_release(dst); + ret = -ELOOP; + net_dbg_ratelimited("%s: Avoiding routing loop to %pISpfsc\n", + wg->dev->name, &endpoint->addr); + goto err; + } + if (cache) + dst_cache_set_ip6(cache, dst, &fl.saddr); + } + + udp_tunnel6_xmit_skb(dst, sock, skb, skb->dev, &fl.saddr, &fl.daddr, ds, + ip6_dst_hoplimit(dst), 0, fl.fl6_sport, + fl.fl6_dport, false); + goto out; + +err: + kfree_skb(skb); +out: + rcu_read_unlock_bh(); + return ret; +#else + return -EAFNOSUPPORT; +#endif +} + +int socket_send_skb_to_peer(struct wireguard_peer *peer, struct sk_buff *skb, + u8 ds) +{ + size_t skb_len = skb->len; + int ret = -EAFNOSUPPORT; + + read_lock_bh(&peer->endpoint_lock); + if (peer->endpoint.addr.sa_family == AF_INET) + ret = send4(peer->device, skb, &peer->endpoint, ds, + &peer->endpoint_cache); + else if (peer->endpoint.addr.sa_family == AF_INET6) + ret = send6(peer->device, skb, &peer->endpoint, ds, + &peer->endpoint_cache); + else + dev_kfree_skb(skb); + if (likely(!ret)) + peer->tx_bytes += skb_len; + read_unlock_bh(&peer->endpoint_lock); + + return ret; +} + +int socket_send_buffer_to_peer(struct wireguard_peer *peer, void *buffer, + size_t len, u8 ds) +{ + struct sk_buff *skb = alloc_skb(len + SKB_HEADER_LEN, GFP_ATOMIC); + + if (unlikely(!skb)) + return -ENOMEM; + + skb_reserve(skb, SKB_HEADER_LEN); + skb_set_inner_network_header(skb, 0); + skb_put_data(skb, buffer, len); + return socket_send_skb_to_peer(peer, skb, ds); +} + +int socket_send_buffer_as_reply_to_skb(struct wireguard_device *wg, + struct sk_buff *in_skb, void *buffer, + size_t len) +{ + int ret = 0; + struct sk_buff *skb; + struct endpoint endpoint; + + if (unlikely(!in_skb)) + return -EINVAL; + ret = socket_endpoint_from_skb(&endpoint, in_skb); + if (unlikely(ret < 0)) + return ret; + + skb = alloc_skb(len + SKB_HEADER_LEN, GFP_ATOMIC); + if (unlikely(!skb)) + return -ENOMEM; + skb_reserve(skb, SKB_HEADER_LEN); + skb_set_inner_network_header(skb, 0); + skb_put_data(skb, buffer, len); + + if (endpoint.addr.sa_family == AF_INET) + ret = send4(wg, skb, &endpoint, 0, NULL); + else if (endpoint.addr.sa_family == AF_INET6) + ret = send6(wg, skb, &endpoint, 0, NULL); + /* No other possibilities if the endpoint is valid, which it is, + * as we checked above. + */ + + return ret; +} + +int socket_endpoint_from_skb(struct endpoint *endpoint, + const struct sk_buff *skb) +{ + memset(endpoint, 0, sizeof(*endpoint)); + if (skb->protocol == htons(ETH_P_IP)) { + endpoint->addr4.sin_family = AF_INET; + endpoint->addr4.sin_port = udp_hdr(skb)->source; + endpoint->addr4.sin_addr.s_addr = ip_hdr(skb)->saddr; + endpoint->src4.s_addr = ip_hdr(skb)->daddr; + endpoint->src_if4 = skb->skb_iif; + } else if (skb->protocol == htons(ETH_P_IPV6)) { + endpoint->addr6.sin6_family = AF_INET6; + endpoint->addr6.sin6_port = udp_hdr(skb)->source; + endpoint->addr6.sin6_addr = ipv6_hdr(skb)->saddr; + endpoint->addr6.sin6_scope_id = ipv6_iface_scope_id( + &ipv6_hdr(skb)->saddr, skb->skb_iif); + endpoint->src6 = ipv6_hdr(skb)->daddr; + } else + return -EINVAL; + return 0; +} + +static bool endpoint_eq(const struct endpoint *a, const struct endpoint *b) +{ + return (a->addr.sa_family == AF_INET && b->addr.sa_family == AF_INET && + a->addr4.sin_port == b->addr4.sin_port && + a->addr4.sin_addr.s_addr == b->addr4.sin_addr.s_addr && + a->src4.s_addr == b->src4.s_addr && a->src_if4 == b->src_if4) || + (a->addr.sa_family == AF_INET6 && + b->addr.sa_family == AF_INET6 && + a->addr6.sin6_port == b->addr6.sin6_port && + ipv6_addr_equal(&a->addr6.sin6_addr, &b->addr6.sin6_addr) && + a->addr6.sin6_scope_id == b->addr6.sin6_scope_id && + ipv6_addr_equal(&a->src6, &b->src6)) || + unlikely(!a->addr.sa_family && !b->addr.sa_family); +} + +void socket_set_peer_endpoint(struct wireguard_peer *peer, + const struct endpoint *endpoint) +{ + /* First we check unlocked, in order to optimize, since it's pretty rare + * that an endpoint will change. If we happen to be mid-write, and two + * CPUs wind up writing the same thing or something slightly different, + * it doesn't really matter much either. + */ + if (endpoint_eq(endpoint, &peer->endpoint)) + return; + write_lock_bh(&peer->endpoint_lock); + if (endpoint->addr.sa_family == AF_INET) { + peer->endpoint.addr4 = endpoint->addr4; + peer->endpoint.src4 = endpoint->src4; + peer->endpoint.src_if4 = endpoint->src_if4; + } else if (endpoint->addr.sa_family == AF_INET6) { + peer->endpoint.addr6 = endpoint->addr6; + peer->endpoint.src6 = endpoint->src6; + } else + goto out; + dst_cache_reset(&peer->endpoint_cache); +out: + write_unlock_bh(&peer->endpoint_lock); +} + +void socket_set_peer_endpoint_from_skb(struct wireguard_peer *peer, + const struct sk_buff *skb) +{ + struct endpoint endpoint; + + if (!socket_endpoint_from_skb(&endpoint, skb)) + socket_set_peer_endpoint(peer, &endpoint); +} + +void socket_clear_peer_endpoint_src(struct wireguard_peer *peer) +{ + write_lock_bh(&peer->endpoint_lock); + memset(&peer->endpoint.src6, 0, sizeof(peer->endpoint.src6)); + dst_cache_reset(&peer->endpoint_cache); + write_unlock_bh(&peer->endpoint_lock); +} + +static int receive(struct sock *sk, struct sk_buff *skb) +{ + struct wireguard_device *wg; + + if (unlikely(!sk)) + goto err; + wg = sk->sk_user_data; + if (unlikely(!wg)) + goto err; + packet_receive(wg, skb); + return 0; + +err: + kfree_skb(skb); + return 0; +} + +static void sock_free(struct sock *sock) +{ + if (unlikely(!sock)) + return; + sk_clear_memalloc(sock); + udp_tunnel_sock_release(sock->sk_socket); +} + +static void set_sock_opts(struct socket *sock) +{ + sock->sk->sk_allocation = GFP_ATOMIC; + sock->sk->sk_sndbuf = INT_MAX; + sk_set_memalloc(sock->sk); +} + +int socket_init(struct wireguard_device *wg, u16 port) +{ + int ret; + struct udp_tunnel_sock_cfg cfg = { + .sk_user_data = wg, + .encap_type = 1, + .encap_rcv = receive + }; + struct socket *new4 = NULL, *new6 = NULL; + struct udp_port_cfg port4 = { + .family = AF_INET, + .local_ip.s_addr = htonl(INADDR_ANY), + .local_udp_port = htons(port), + .use_udp_checksums = true + }; +#if IS_ENABLED(CONFIG_IPV6) + int retries = 0; + struct udp_port_cfg port6 = { + .family = AF_INET6, + .local_ip6 = IN6ADDR_ANY_INIT, + .use_udp6_tx_checksums = true, + .use_udp6_rx_checksums = true, + .ipv6_v6only = true + }; +#endif + +#if IS_ENABLED(CONFIG_IPV6) +retry: +#endif + + ret = udp_sock_create(wg->creating_net, &port4, &new4); + if (ret < 0) { + pr_err("%s: Could not create IPv4 socket\n", wg->dev->name); + return ret; + } + set_sock_opts(new4); + setup_udp_tunnel_sock(wg->creating_net, new4, &cfg); + +#if IS_ENABLED(CONFIG_IPV6) + if (ipv6_mod_enabled()) { + port6.local_udp_port = inet_sk(new4->sk)->inet_sport; + ret = udp_sock_create(wg->creating_net, &port6, &new6); + if (ret < 0) { + udp_tunnel_sock_release(new4); + if (ret == -EADDRINUSE && !port && retries++ < 100) + goto retry; + pr_err("%s: Could not create IPv6 socket\n", + wg->dev->name); + return ret; + } + set_sock_opts(new6); + setup_udp_tunnel_sock(wg->creating_net, new6, &cfg); + } +#endif + + socket_reinit(wg, new4 ? new4->sk : NULL, new6 ? new6->sk : NULL); + return 0; +} + +void socket_reinit(struct wireguard_device *wg, struct sock *new4, + struct sock *new6) +{ + struct sock *old4, *old6; + + mutex_lock(&wg->socket_update_lock); + old4 = rcu_dereference_protected(wg->sock4, + lockdep_is_held(&wg->socket_update_lock)); + old6 = rcu_dereference_protected(wg->sock6, + lockdep_is_held(&wg->socket_update_lock)); + rcu_assign_pointer(wg->sock4, new4); + rcu_assign_pointer(wg->sock6, new6); + if (new4) + wg->incoming_port = ntohs(inet_sk(new4)->inet_sport); + mutex_unlock(&wg->socket_update_lock); + synchronize_rcu_bh(); + synchronize_net(); + sock_free(old4); + sock_free(old6); +} diff --git a/drivers/net/wireguard/socket.h b/drivers/net/wireguard/socket.h new file mode 100644 index 000000000000..49784990ea6e --- /dev/null +++ b/drivers/net/wireguard/socket.h @@ -0,0 +1,44 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_SOCKET_H +#define _WG_SOCKET_H + +#include +#include +#include +#include + +int socket_init(struct wireguard_device *wg, u16 port); +void socket_reinit(struct wireguard_device *wg, struct sock *new4, + struct sock *new6); +int socket_send_buffer_to_peer(struct wireguard_peer *peer, void *data, + size_t len, u8 ds); +int socket_send_skb_to_peer(struct wireguard_peer *peer, struct sk_buff *skb, + u8 ds); +int socket_send_buffer_as_reply_to_skb(struct wireguard_device *wg, + struct sk_buff *in_skb, void *out_buffer, + size_t len); + +int socket_endpoint_from_skb(struct endpoint *endpoint, + const struct sk_buff *skb); +void socket_set_peer_endpoint(struct wireguard_peer *peer, + const struct endpoint *endpoint); +void socket_set_peer_endpoint_from_skb(struct wireguard_peer *peer, + const struct sk_buff *skb); +void socket_clear_peer_endpoint_src(struct wireguard_peer *peer); + +#if defined(CONFIG_DYNAMIC_DEBUG) || defined(DEBUG) +#define net_dbg_skb_ratelimited(fmt, dev, skb, ...) do { \ + struct endpoint __endpoint; \ + socket_endpoint_from_skb(&__endpoint, skb); \ + net_dbg_ratelimited(fmt, dev, &__endpoint.addr, \ + ##__VA_ARGS__); \ + } while (0) +#else +#define net_dbg_skb_ratelimited(fmt, skb, ...) +#endif + +#endif /* _WG_SOCKET_H */ diff --git a/drivers/net/wireguard/timers.c b/drivers/net/wireguard/timers.c new file mode 100644 index 000000000000..fdc8c380af2a --- /dev/null +++ b/drivers/net/wireguard/timers.c @@ -0,0 +1,256 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#include "timers.h" +#include "device.h" +#include "peer.h" +#include "queueing.h" +#include "socket.h" + +/* + * - Timer for retransmitting the handshake if we don't hear back after + * `REKEY_TIMEOUT + jitter` ms. + * + * - Timer for sending empty packet if we have received a packet but after have + * not sent one for `KEEPALIVE_TIMEOUT` ms. + * + * - Timer for initiating new handshake if we have sent a packet but after have + * not received one (even empty) for `(KEEPALIVE_TIMEOUT + REKEY_TIMEOUT)` ms. + * + * - Timer for zeroing out all ephemeral keys after `(REJECT_AFTER_TIME * 3)` ms + * if no new keys have been received. + * + * - Timer for, if enabled, sending an empty authenticated packet every user- + * specified seconds. + */ + +#define peer_get_from_timer(timer_name) \ + struct wireguard_peer *peer; \ + rcu_read_lock_bh(); \ + peer = peer_get_maybe_zero(from_timer(peer, timer, timer_name)); \ + rcu_read_unlock_bh(); \ + if (unlikely(!peer)) \ + return; + +static inline void mod_peer_timer(struct wireguard_peer *peer, + struct timer_list *timer, + unsigned long expires) +{ + rcu_read_lock_bh(); + if (likely(netif_running(peer->device->dev) && !peer->is_dead)) + mod_timer(timer, expires); + rcu_read_unlock_bh(); +} + +static inline void del_peer_timer(struct wireguard_peer *peer, + struct timer_list *timer) +{ + rcu_read_lock_bh(); + if (likely(netif_running(peer->device->dev) && !peer->is_dead)) + del_timer(timer); + rcu_read_unlock_bh(); +} + +static void expired_retransmit_handshake(struct timer_list *timer) +{ + peer_get_from_timer(timer_retransmit_handshake); + + if (peer->timer_handshake_attempts > MAX_TIMER_HANDSHAKES) { + pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d attempts, giving up\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr, MAX_TIMER_HANDSHAKES + 2); + + del_peer_timer(peer, &peer->timer_send_keepalive); + /* We drop all packets without a keypair and don't try again, + * if we try unsuccessfully for too long to make a handshake. + */ + skb_queue_purge(&peer->staged_packet_queue); + + /* We set a timer for destroying any residue that might be left + * of a partial exchange. + */ + if (!timer_pending(&peer->timer_zero_key_material)) + mod_peer_timer(peer, &peer->timer_zero_key_material, + jiffies + REJECT_AFTER_TIME * 3 * HZ); + } else { + ++peer->timer_handshake_attempts; + pr_debug("%s: Handshake for peer %llu (%pISpfsc) did not complete after %d seconds, retrying (try %d)\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr, REKEY_TIMEOUT, + peer->timer_handshake_attempts + 1); + + /* We clear the endpoint address src address, in case this is + * the cause of trouble. + */ + socket_clear_peer_endpoint_src(peer); + + packet_send_queued_handshake_initiation(peer, true); + } + peer_put(peer); +} + +static void expired_send_keepalive(struct timer_list *timer) +{ + peer_get_from_timer(timer_send_keepalive); + + packet_send_keepalive(peer); + if (peer->timer_need_another_keepalive) { + peer->timer_need_another_keepalive = false; + mod_peer_timer(peer, &peer->timer_send_keepalive, + jiffies + KEEPALIVE_TIMEOUT * HZ); + } + peer_put(peer); +} + +static void expired_new_handshake(struct timer_list *timer) +{ + peer_get_from_timer(timer_new_handshake); + + pr_debug("%s: Retrying handshake with peer %llu (%pISpfsc) because we stopped hearing back after %d seconds\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr, KEEPALIVE_TIMEOUT + REKEY_TIMEOUT); + /* We clear the endpoint address src address, in case this is the cause + * of trouble. + */ + socket_clear_peer_endpoint_src(peer); + packet_send_queued_handshake_initiation(peer, false); + peer_put(peer); +} + +static void expired_zero_key_material(struct timer_list *timer) +{ + peer_get_from_timer(timer_zero_key_material); + + rcu_read_lock_bh(); + if (!peer->is_dead) { + /* Should take our reference. */ + if (!queue_work(peer->device->handshake_send_wq, + &peer->clear_peer_work)) + /* If the work was already on the queue, we want to drop the extra reference */ + peer_put(peer); + } + rcu_read_unlock_bh(); +} +static void queued_expired_zero_key_material(struct work_struct *work) +{ + struct wireguard_peer *peer = + container_of(work, struct wireguard_peer, clear_peer_work); + + pr_debug("%s: Zeroing out all keys for peer %llu (%pISpfsc), since we haven't received a new one in %d seconds\n", + peer->device->dev->name, peer->internal_id, + &peer->endpoint.addr, REJECT_AFTER_TIME * 3); + noise_handshake_clear(&peer->handshake); + noise_keypairs_clear(&peer->keypairs); + peer_put(peer); +} + +static void expired_send_persistent_keepalive(struct timer_list *timer) +{ + peer_get_from_timer(timer_persistent_keepalive); + + if (likely(peer->persistent_keepalive_interval)) + packet_send_keepalive(peer); + peer_put(peer); +} + +/* Should be called after an authenticated data packet is sent. */ +void timers_data_sent(struct wireguard_peer *peer) +{ + if (!timer_pending(&peer->timer_new_handshake)) + mod_peer_timer(peer, &peer->timer_new_handshake, + jiffies + (KEEPALIVE_TIMEOUT + REKEY_TIMEOUT) * HZ); +} + +/* Should be called after an authenticated data packet is received. */ +void timers_data_received(struct wireguard_peer *peer) +{ + if (likely(netif_running(peer->device->dev))) { + if (!timer_pending(&peer->timer_send_keepalive)) + mod_peer_timer(peer, &peer->timer_send_keepalive, + jiffies + KEEPALIVE_TIMEOUT * HZ); + else + peer->timer_need_another_keepalive = true; + } +} + +/* Should be called after any type of authenticated packet is sent, whether + * keepalive, data, or handshake. + */ +void timers_any_authenticated_packet_sent(struct wireguard_peer *peer) +{ + del_peer_timer(peer, &peer->timer_send_keepalive); +} + +/* Should be called after any type of authenticated packet is received, whether + * keepalive, data, or handshake. + */ +void timers_any_authenticated_packet_received(struct wireguard_peer *peer) +{ + del_peer_timer(peer, &peer->timer_new_handshake); +} + +/* Should be called after a handshake initiation message is sent. */ +void timers_handshake_initiated(struct wireguard_peer *peer) +{ + mod_peer_timer( + peer, &peer->timer_retransmit_handshake, + jiffies + REKEY_TIMEOUT * HZ + + prandom_u32_max(REKEY_TIMEOUT_JITTER_MAX_JIFFIES)); +} + +/* Should be called after a handshake response message is received and processed + * or when getting key confirmation via the first data message. + */ +void timers_handshake_complete(struct wireguard_peer *peer) +{ + del_peer_timer(peer, &peer->timer_retransmit_handshake); + peer->timer_handshake_attempts = 0; + peer->sent_lastminute_handshake = false; + getnstimeofday(&peer->walltime_last_handshake); +} + +/* Should be called after an ephemeral key is created, which is before sending a + * handshake response or after receiving a handshake response. + */ +void timers_session_derived(struct wireguard_peer *peer) +{ + mod_peer_timer(peer, &peer->timer_zero_key_material, + jiffies + REJECT_AFTER_TIME * 3 * HZ); +} + +/* Should be called before a packet with authentication, whether + * keepalive, data, or handshakem is sent, or after one is received. + */ +void timers_any_authenticated_packet_traversal(struct wireguard_peer *peer) +{ + if (peer->persistent_keepalive_interval) + mod_peer_timer(peer, &peer->timer_persistent_keepalive, + jiffies + peer->persistent_keepalive_interval * HZ); +} + +void timers_init(struct wireguard_peer *peer) +{ + timer_setup(&peer->timer_retransmit_handshake, + expired_retransmit_handshake, 0); + timer_setup(&peer->timer_send_keepalive, expired_send_keepalive, 0); + timer_setup(&peer->timer_new_handshake, expired_new_handshake, 0); + timer_setup(&peer->timer_zero_key_material, expired_zero_key_material, 0); + timer_setup(&peer->timer_persistent_keepalive, + expired_send_persistent_keepalive, 0); + INIT_WORK(&peer->clear_peer_work, queued_expired_zero_key_material); + peer->timer_handshake_attempts = 0; + peer->sent_lastminute_handshake = false; + peer->timer_need_another_keepalive = false; +} + +void timers_stop(struct wireguard_peer *peer) +{ + del_timer_sync(&peer->timer_retransmit_handshake); + del_timer_sync(&peer->timer_send_keepalive); + del_timer_sync(&peer->timer_new_handshake); + del_timer_sync(&peer->timer_zero_key_material); + del_timer_sync(&peer->timer_persistent_keepalive); + flush_work(&peer->clear_peer_work); +} diff --git a/drivers/net/wireguard/timers.h b/drivers/net/wireguard/timers.h new file mode 100644 index 000000000000..50014d284668 --- /dev/null +++ b/drivers/net/wireguard/timers.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + */ + +#ifndef _WG_TIMERS_H +#define _WG_TIMERS_H + +#include + +struct wireguard_peer; + +void timers_init(struct wireguard_peer *peer); +void timers_stop(struct wireguard_peer *peer); +void timers_data_sent(struct wireguard_peer *peer); +void timers_data_received(struct wireguard_peer *peer); +void timers_any_authenticated_packet_sent(struct wireguard_peer *peer); +void timers_any_authenticated_packet_received(struct wireguard_peer *peer); +void timers_handshake_initiated(struct wireguard_peer *peer); +void timers_handshake_complete(struct wireguard_peer *peer); +void timers_session_derived(struct wireguard_peer *peer); +void timers_any_authenticated_packet_traversal(struct wireguard_peer *peer); + +static inline bool has_expired(u64 birthday_nanoseconds, u64 expiration_seconds) +{ + return (s64)(birthday_nanoseconds + expiration_seconds * NSEC_PER_SEC) + <= (s64)ktime_get_boot_fast_ns(); +} + +#endif /* _WG_TIMERS_H */ diff --git a/drivers/net/wireguard/version.h b/drivers/net/wireguard/version.h new file mode 100644 index 000000000000..fe8af5adc073 --- /dev/null +++ b/drivers/net/wireguard/version.h @@ -0,0 +1 @@ +#define WIREGUARD_VERSION "0.0.20180925" diff --git a/include/uapi/linux/wireguard.h b/include/uapi/linux/wireguard.h new file mode 100644 index 000000000000..ab72766a07ff --- /dev/null +++ b/include/uapi/linux/wireguard.h @@ -0,0 +1,190 @@ +/* SPDX-License-Identifier: (GPL-2.0 WITH Linux-syscall-note) OR MIT */ +/* + * Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. + * + * Documentation + * ============= + * + * The below enums and macros are for interfacing with WireGuard, using generic + * netlink, with family WG_GENL_NAME and version WG_GENL_VERSION. It defines two + * methods: get and set. Note that while they share many common attributes, + * these two functions actually accept a slightly different set of inputs and + * outputs. + * + * WG_CMD_GET_DEVICE + * ----------------- + * + * May only be called via NLM_F_REQUEST | NLM_F_DUMP. The command should contain + * one but not both of: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * + * The kernel will then return several messages (NLM_F_MULTI) containing the + * following tree of nested items: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN + * WGDEVICE_A_PUBLIC_KEY: len WG_KEY_LEN + * WGDEVICE_A_LISTEN_PORT: NLA_U16 + * WGDEVICE_A_FWMARK: NLA_U32 + * WGDEVICE_A_PEERS: NLA_NESTED + * 0: NLA_NESTED + * WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN + * WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN + * WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6 + * WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16 + * WGPEER_A_LAST_HANDSHAKE_TIME: struct timespec + * WGPEER_A_RX_BYTES: NLA_U64 + * WGPEER_A_TX_BYTES: NLA_U64 + * WGPEER_A_ALLOWEDIPS: NLA_NESTED + * 0: NLA_NESTED + * WGALLOWEDIP_A_FAMILY: NLA_U16 + * WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr + * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * 1: NLA_NESTED + * ... + * 2: NLA_NESTED + * ... + * ... + * WGPEER_A_PROTOCOL_VERSION: NLA_U32 + * 1: NLA_NESTED + * ... + * ... + * + * It is possible that all of the allowed IPs of a single peer will not + * fit within a single netlink message. In that case, the same peer will + * be written in the following message, except it will only contain + * WGPEER_A_PUBLIC_KEY and WGPEER_A_ALLOWEDIPS. This may occur several + * times in a row for the same peer. It is then up to the receiver to + * coalesce adjacent peers. Likewise, it is possible that all peers will + * not fit within a single message. So, subsequent peers will be sent + * in following messages, except those will only contain WGDEVICE_A_IFNAME + * and WGDEVICE_A_PEERS. It is then up to the receiver to coalesce these + * messages to form the complete list of peers. + * + * Since this is an NLA_F_DUMP command, the final message will always be + * NLMSG_DONE, even if an error occurs. However, this NLMSG_DONE message + * contains an integer error code. It is either zero or a negative error + * code corresponding to the errno. + * + * WG_CMD_SET_DEVICE + * ----------------- + * + * May only be called via NLM_F_REQUEST. The command should contain the + * following tree of nested items, containing one but not both of + * WGDEVICE_A_IFINDEX and WGDEVICE_A_IFNAME: + * + * WGDEVICE_A_IFINDEX: NLA_U32 + * WGDEVICE_A_IFNAME: NLA_NUL_STRING, maxlen IFNAMESIZ - 1 + * WGDEVICE_A_FLAGS: NLA_U32, 0 or WGDEVICE_F_REPLACE_PEERS if all current + * peers should be removed prior to adding the list below. + * WGDEVICE_A_PRIVATE_KEY: len WG_KEY_LEN, all zeros to remove + * WGDEVICE_A_LISTEN_PORT: NLA_U16, 0 to choose randomly + * WGDEVICE_A_FWMARK: NLA_U32, 0 to disable + * WGDEVICE_A_PEERS: NLA_NESTED + * 0: NLA_NESTED + * WGPEER_A_PUBLIC_KEY: len WG_KEY_LEN + * WGPEER_A_FLAGS: NLA_U32, 0 and/or WGPEER_F_REMOVE_ME if the + * specified peer should be removed rather than + * added/updated and/or WGPEER_F_REPLACE_ALLOWEDIPS + * if all current allowed IPs of this peer should be + * removed prior to adding the list below. + * WGPEER_A_PRESHARED_KEY: len WG_KEY_LEN, all zeros to remove + * WGPEER_A_ENDPOINT: struct sockaddr_in or struct sockaddr_in6 + * WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL: NLA_U16, 0 to disable + * WGPEER_A_ALLOWEDIPS: NLA_NESTED + * 0: NLA_NESTED + * WGALLOWEDIP_A_FAMILY: NLA_U16 + * WGALLOWEDIP_A_IPADDR: struct in_addr or struct in6_addr + * WGALLOWEDIP_A_CIDR_MASK: NLA_U8 + * 1: NLA_NESTED + * ... + * 2: NLA_NESTED + * ... + * ... + * WGPEER_A_PROTOCOL_VERSION: NLA_U32, should not be set or used at + * all by most users of this API, as the + * most recent protocol will be used when + * this is unset. Otherwise, must be set + * to 1. + * 1: NLA_NESTED + * ... + * ... + * + * It is possible that the amount of configuration data exceeds that of + * the maximum message length accepted by the kernel. In that case, several + * messages should be sent one after another, with each successive one + * filling in information not contained in the prior. Note that if + * WGDEVICE_F_REPLACE_PEERS is specified in the first message, it probably + * should not be specified in fragments that come after, so that the list + * of peers is only cleared the first time but appened after. Likewise for + * peers, if WGPEER_F_REPLACE_ALLOWEDIPS is specified in the first message + * of a peer, it likely should not be specified in subsequent fragments. + * + * If an error occurs, NLMSG_ERROR will reply containing an errno. + */ + +#ifndef _WG_UAPI_WIREGUARD_H +#define _WG_UAPI_WIREGUARD_H + +#define WG_GENL_NAME "wireguard" +#define WG_GENL_VERSION 1 + +#define WG_KEY_LEN 32 + +enum wg_cmd { + WG_CMD_GET_DEVICE, + WG_CMD_SET_DEVICE, + __WG_CMD_MAX +}; +#define WG_CMD_MAX (__WG_CMD_MAX - 1) + +enum wgdevice_flag { + WGDEVICE_F_REPLACE_PEERS = 1U << 0 +}; +enum wgdevice_attribute { + WGDEVICE_A_UNSPEC, + WGDEVICE_A_IFINDEX, + WGDEVICE_A_IFNAME, + WGDEVICE_A_PRIVATE_KEY, + WGDEVICE_A_PUBLIC_KEY, + WGDEVICE_A_FLAGS, + WGDEVICE_A_LISTEN_PORT, + WGDEVICE_A_FWMARK, + WGDEVICE_A_PEERS, + __WGDEVICE_A_LAST +}; +#define WGDEVICE_A_MAX (__WGDEVICE_A_LAST - 1) + +enum wgpeer_flag { + WGPEER_F_REMOVE_ME = 1U << 0, + WGPEER_F_REPLACE_ALLOWEDIPS = 1U << 1 +}; +enum wgpeer_attribute { + WGPEER_A_UNSPEC, + WGPEER_A_PUBLIC_KEY, + WGPEER_A_PRESHARED_KEY, + WGPEER_A_FLAGS, + WGPEER_A_ENDPOINT, + WGPEER_A_PERSISTENT_KEEPALIVE_INTERVAL, + WGPEER_A_LAST_HANDSHAKE_TIME, + WGPEER_A_RX_BYTES, + WGPEER_A_TX_BYTES, + WGPEER_A_ALLOWEDIPS, + WGPEER_A_PROTOCOL_VERSION, + __WGPEER_A_LAST +}; +#define WGPEER_A_MAX (__WGPEER_A_LAST - 1) + +enum wgallowedip_attribute { + WGALLOWEDIP_A_UNSPEC, + WGALLOWEDIP_A_FAMILY, + WGALLOWEDIP_A_IPADDR, + WGALLOWEDIP_A_CIDR_MASK, + __WGALLOWEDIP_A_LAST +}; +#define WGALLOWEDIP_A_MAX (__WGALLOWEDIP_A_LAST - 1) + +#endif /* _WG_UAPI_WIREGUARD_H */ diff --git a/tools/testing/selftests/wireguard/netns.sh b/tools/testing/selftests/wireguard/netns.sh new file mode 100755 index 000000000000..568612c45acc --- /dev/null +++ b/tools/testing/selftests/wireguard/netns.sh @@ -0,0 +1,499 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Copyright (C) 2015-2018 Jason A. Donenfeld . All Rights Reserved. +# +# This script tests the below topology: +# +# ┌─────────────────────┐ ┌──────────────────────────────────┐ ┌─────────────────────┐ +# │ $ns1 namespace │ │ $ns0 namespace │ │ $ns2 namespace │ +# │ │ │ │ │ │ +# │┌────────┐ │ │ ┌────────┐ │ │ ┌────────┐│ +# ││ wg0 │───────────┼───┼────────────│ lo │────────────┼───┼───────────│ wg0 ││ +# │├────────┴──────────┐│ │ ┌───────┴────────┴────────┐ │ │┌──────────┴────────┤│ +# ││192.168.241.1/24 ││ │ │(ns1) (ns2) │ │ ││192.168.241.2/24 ││ +# ││fd00::1/24 ││ │ │127.0.0.1:1 127.0.0.1:2│ │ ││fd00::2/24 ││ +# │└───────────────────┘│ │ │[::]:1 [::]:2 │ │ │└───────────────────┘│ +# └─────────────────────┘ │ └─────────────────────────┘ │ └─────────────────────┘ +# └──────────────────────────────────┘ +# +# After the topology is prepared we run a series of TCP/UDP iperf3 tests between the +# wireguard peers in $ns1 and $ns2. Note that $ns0 is the endpoint for the wg0 +# interfaces in $ns1 and $ns2. See https://www.wireguard.com/netns/ for further +# details on how this is accomplished. +set -e + +exec 3>&1 +export WG_HIDE_KEYS=never +netns0="wg-test-$$-0" +netns1="wg-test-$$-1" +netns2="wg-test-$$-2" +pretty() { echo -e "\x1b[32m\x1b[1m[+] ${1:+NS$1: }${2}\x1b[0m" >&3; } +pp() { pretty "" "$*"; "$@"; } +maybe_exec() { if [[ $BASHPID -eq $$ ]]; then "$@"; else exec "$@"; fi; } +n0() { pretty 0 "$*"; maybe_exec ip netns exec $netns0 "$@"; } +n1() { pretty 1 "$*"; maybe_exec ip netns exec $netns1 "$@"; } +n2() { pretty 2 "$*"; maybe_exec ip netns exec $netns2 "$@"; } +ip0() { pretty 0 "ip $*"; ip -n $netns0 "$@"; } +ip1() { pretty 1 "ip $*"; ip -n $netns1 "$@"; } +ip2() { pretty 2 "ip $*"; ip -n $netns2 "$@"; } +sleep() { read -t "$1" -N 0 || true; } +waitiperf() { pretty "${1//*-}" "wait for iperf:5201"; while [[ $(ss -N "$1" -tlp 'sport = 5201') != *iperf3* ]]; do sleep 0.1; done; } +waitncatudp() { pretty "${1//*-}" "wait for udp:1111"; while [[ $(ss -N "$1" -ulp 'sport = 1111') != *ncat* ]]; do sleep 0.1; done; } +waitncattcp() { pretty "${1//*-}" "wait for tcp:1111"; while [[ $(ss -N "$1" -tlp 'sport = 1111') != *ncat* ]]; do sleep 0.1; done; } +waitiface() { pretty "${1//*-}" "wait for $2 to come up"; ip netns exec "$1" bash -c "while [[ \$(< \"/sys/class/net/$2/operstate\") != up ]]; do read -t .1 -N 0 || true; done;"; } + +cleanup() { + set +e + exec 2>/dev/null + printf "$orig_message_cost" > /proc/sys/net/core/message_cost + ip0 link del dev wg0 + ip1 link del dev wg0 + ip2 link del dev wg0 + local to_kill="$(ip netns pids $netns0) $(ip netns pids $netns1) $(ip netns pids $netns2)" + [[ -n $to_kill ]] && kill $to_kill + pp ip netns del $netns1 + pp ip netns del $netns2 + pp ip netns del $netns0 + exit +} + +orig_message_cost="$(< /proc/sys/net/core/message_cost)" +trap cleanup EXIT +printf 0 > /proc/sys/net/core/message_cost + +ip netns del $netns0 2>/dev/null || true +ip netns del $netns1 2>/dev/null || true +ip netns del $netns2 2>/dev/null || true +pp ip netns add $netns0 +pp ip netns add $netns1 +pp ip netns add $netns2 +ip0 link set up dev lo + +ip0 link add dev wg0 type wireguard +ip0 link set wg0 netns $netns1 +ip0 link add dev wg0 type wireguard +ip0 link set wg0 netns $netns2 +key1="$(pp wg genkey)" +key2="$(pp wg genkey)" +pub1="$(pp wg pubkey <<<"$key1")" +pub2="$(pp wg pubkey <<<"$key2")" +psk="$(pp wg genpsk)" +[[ -n $key1 && -n $key2 && -n $psk ]] + +configure_peers() { + ip1 addr add 192.168.241.1/24 dev wg0 + ip1 addr add fd00::1/24 dev wg0 + + ip2 addr add 192.168.241.2/24 dev wg0 + ip2 addr add fd00::2/24 dev wg0 + + n1 wg set wg0 \ + private-key <(echo "$key1") \ + listen-port 1 \ + peer "$pub2" \ + preshared-key <(echo "$psk") \ + allowed-ips 192.168.241.2/32,fd00::2/128 + n2 wg set wg0 \ + private-key <(echo "$key2") \ + listen-port 2 \ + peer "$pub1" \ + preshared-key <(echo "$psk") \ + allowed-ips 192.168.241.1/32,fd00::1/128 + + ip1 link set up dev wg0 + ip2 link set up dev wg0 +} +configure_peers + +tests() { + # Ping over IPv4 + n2 ping -c 10 -f -W 1 192.168.241.1 + n1 ping -c 10 -f -W 1 192.168.241.2 + + # Ping over IPv6 + n2 ping6 -c 10 -f -W 1 fd00::1 + n1 ping6 -c 10 -f -W 1 fd00::2 + + # TCP over IPv4 + n2 iperf3 -s -1 -B 192.168.241.2 & + waitiperf $netns2 + n1 iperf3 -Z -t 3 -c 192.168.241.2 + + # TCP over IPv6 + n1 iperf3 -s -1 -B fd00::1 & + waitiperf $netns1 + n2 iperf3 -Z -t 3 -c fd00::1 + + # UDP over IPv4 + n1 iperf3 -s -1 -B 192.168.241.1 & + waitiperf $netns1 + n2 iperf3 -Z -t 3 -b 0 -u -c 192.168.241.1 + + # UDP over IPv6 + n2 iperf3 -s -1 -B fd00::2 & + waitiperf $netns2 + n1 iperf3 -Z -t 3 -b 0 -u -c fd00::2 +} + +[[ $(ip1 link show dev wg0) =~ mtu\ ([0-9]+) ]] && orig_mtu="${BASH_REMATCH[1]}" +big_mtu=$(( 34816 - 1500 + $orig_mtu )) + +# Test using IPv4 as outer transport +n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 +n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1 +# Before calling tests, we first make sure that the stats counters are working +n2 ping -c 10 -f -W 1 192.168.241.1 +{ read _; read _; read _; read rx_bytes _; read _; read tx_bytes _; } < <(ip2 -stats link show dev wg0) +(( rx_bytes == 1372 && (tx_bytes == 1428 || tx_bytes == 1460) )) +{ read _; read _; read _; read rx_bytes _; read _; read tx_bytes _; } < <(ip1 -stats link show dev wg0) +(( tx_bytes == 1372 && (rx_bytes == 1428 || rx_bytes == 1460) )) +read _ rx_bytes tx_bytes < <(n2 wg show wg0 transfer) +(( rx_bytes == 1372 && (tx_bytes == 1428 || tx_bytes == 1460) )) +read _ rx_bytes tx_bytes < <(n1 wg show wg0 transfer) +(( tx_bytes == 1372 && (rx_bytes == 1428 || rx_bytes == 1460) )) + +tests +ip1 link set wg0 mtu $big_mtu +ip2 link set wg0 mtu $big_mtu +tests + +ip1 link set wg0 mtu $orig_mtu +ip2 link set wg0 mtu $orig_mtu + +# Test using IPv6 as outer transport +n1 wg set wg0 peer "$pub2" endpoint [::1]:2 +n2 wg set wg0 peer "$pub1" endpoint [::1]:1 +tests +ip1 link set wg0 mtu $big_mtu +ip2 link set wg0 mtu $big_mtu +tests + +# Test that route MTUs work with the padding +ip1 link set wg0 mtu 1300 +ip2 link set wg0 mtu 1300 +n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 +n2 wg set wg0 peer "$pub1" endpoint 127.0.0.1:1 +n0 iptables -A INPUT -m length --length 1360 -j DROP +n1 ip route add 192.168.241.2/32 dev wg0 mtu 1299 +n2 ip route add 192.168.241.1/32 dev wg0 mtu 1299 +n2 ping -c 1 -W 1 -s 1269 192.168.241.1 +n2 ip route delete 192.168.241.1/32 dev wg0 mtu 1299 +n1 ip route delete 192.168.241.2/32 dev wg0 mtu 1299 +n0 iptables -F INPUT + +ip1 link set wg0 mtu $orig_mtu +ip2 link set wg0 mtu $orig_mtu + +# Test using IPv4 that roaming works +ip0 -4 addr del 127.0.0.1/8 dev lo +ip0 -4 addr add 127.212.121.99/8 dev lo +n1 wg set wg0 listen-port 9999 +n1 wg set wg0 peer "$pub2" endpoint 127.0.0.1:2 +n1 ping6 -W 1 -c 1 fd00::2 +[[ $(n2 wg show wg0 endpoints) == "$pub1 127.212.121.99:9999" ]] + +# Test using IPv6 that roaming works +n1 wg set wg0 listen-port 9998 +n1 wg set wg0 peer "$pub2" endpoint [::1]:2 +n1 ping -W 1 -c 1 192.168.241.2 +[[ $(n2 wg show wg0 endpoints) == "$pub1 [::1]:9998" ]] + +# Test that crypto-RP filter works +n1 wg set wg0 peer "$pub2" allowed-ips 192.168.241.0/24 +exec 4< <(n1 ncat -l -u -p 1111) +nmap_pid=$! +waitncatudp $netns1 +n2 ncat -u 192.168.241.1 1111 <<<"X" +read -r -N 1 -t 1 out <&4 && [[ $out == "X" ]] +kill $nmap_pid +more_specific_key="$(pp wg genkey | pp wg pubkey)" +n1 wg set wg0 peer "$more_specific_key" allowed-ips 192.168.241.2/32 +n2 wg set wg0 listen-port 9997 +exec 4< <(n1 ncat -l -u -p 1111) +nmap_pid=$! +waitncatudp $netns1 +n2 ncat -u 192.168.241.1 1111 <<<"X" +! read -r -N 1 -t 1 out <&4 || false +kill $nmap_pid +n1 wg set wg0 peer "$more_specific_key" remove +[[ $(n1 wg show wg0 endpoints) == "$pub2 [::1]:9997" ]] + +ip1 link del wg0 +ip2 link del wg0 + +# Test using NAT. We now change the topology to this: +# ┌────────────────────────────────────────┐ ┌────────────────────────────────────────────────┐ ┌────────────────────────────────────────┐ +# │ $ns1 namespace │ │ $ns0 namespace │ │ $ns2 namespace │ +# │ │ │ │ │ │ +# │ ┌─────┐ ┌─────┐ │ │ ┌──────┐ ┌──────┐ │ │ ┌─────┐ ┌─────┐ │ +# │ │ wg0 │─────────────│vethc│───────────┼────┼────│vethrc│ │vethrs│──────────────┼─────┼──│veths│────────────│ wg0 │ │ +# │ ├─────┴──────────┐ ├─────┴──────────┐│ │ ├──────┴─────────┐ ├──────┴────────────┐ │ │ ├─────┴──────────┐ ├─────┴──────────┐ │ +# │ │192.168.241.1/24│ │192.168.1.100/24││ │ │192.168.1.100/24│ │10.0.0.1/24 │ │ │ │10.0.0.100/24 │ │192.168.241.2/24│ │ +# │ │fd00::1/24 │ │ ││ │ │ │ │SNAT:192.168.1.0/24│ │ │ │ │ │fd00::2/24 │ │ +# │ └────────────────┘ └────────────────┘│ │ └────────────────┘ └───────────────────┘ │ │ └────────────────┘ └────────────────┘ │ +# └────────────────────────────────────────┘ └────────────────────────────────────────────────┘ └────────────────────────────────────────┘ + +ip1 link add dev wg0 type wireguard +ip2 link add dev wg0 type wireguard +configure_peers + +ip0 link add vethrc type veth peer name vethc +ip0 link add vethrs type veth peer name veths +ip0 link set vethc netns $netns1 +ip0 link set veths netns $netns2 +ip0 link set vethrc up +ip0 link set vethrs up +ip0 addr add 192.168.1.1/24 dev vethrc +ip0 addr add 10.0.0.1/24 dev vethrs +ip1 addr add 192.168.1.100/24 dev vethc +ip1 link set vethc up +ip1 route add default via 192.168.1.1 +ip2 addr add 10.0.0.100/24 dev veths +ip2 link set veths up +waitiface $netns0 vethrc +waitiface $netns0 vethrs +waitiface $netns1 vethc +waitiface $netns2 veths + +n0 bash -c 'printf 1 > /proc/sys/net/ipv4/ip_forward' +n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout' +n0 bash -c 'printf 2 > /proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream' +n0 iptables -t nat -A POSTROUTING -s 192.168.1.0/24 -d 10.0.0.0/24 -j SNAT --to 10.0.0.1 + +n1 wg set wg0 peer "$pub2" endpoint 10.0.0.100:2 persistent-keepalive 1 +n1 ping -W 1 -c 1 192.168.241.2 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.0.0.1:1" ]] +# Demonstrate n2 can still send packets to n1, since persistent-keepalive will prevent connection tracking entry from expiring (to see entries: `n0 conntrack -L`). +pp sleep 3 +n2 ping -W 1 -c 1 192.168.241.1 + +n0 iptables -t nat -F +ip0 link del vethrc +ip0 link del vethrs +ip1 link del wg0 +ip2 link del wg0 + +# Test that saddr routing is sticky but not too sticky, changing to this topology: +# ┌────────────────────────────────────────┐ ┌────────────────────────────────────────┐ +# │ $ns1 namespace │ │ $ns2 namespace │ +# │ │ │ │ +# │ ┌─────┐ ┌─────┐ │ │ ┌─────┐ ┌─────┐ │ +# │ │ wg0 │─────────────│veth1│───────────┼────┼──│veth2│────────────│ wg0 │ │ +# │ ├─────┴──────────┐ ├─────┴──────────┐│ │ ├─────┴──────────┐ ├─────┴──────────┐ │ +# │ │192.168.241.1/24│ │10.0.0.1/24 ││ │ │10.0.0.2/24 │ │192.168.241.2/24│ │ +# │ │fd00::1/24 │ │fd00:aa::1/96 ││ │ │fd00:aa::2/96 │ │fd00::2/24 │ │ +# │ └────────────────┘ └────────────────┘│ │ └────────────────┘ └────────────────┘ │ +# └────────────────────────────────────────┘ └────────────────────────────────────────┘ + +ip1 link add dev wg0 type wireguard +ip2 link add dev wg0 type wireguard +configure_peers +ip1 link add veth1 type veth peer name veth2 +ip1 link set veth2 netns $netns2 +n1 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/all/accept_dad' +n2 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/all/accept_dad' +n1 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/veth1/accept_dad' +n2 bash -c 'printf 0 > /proc/sys/net/ipv6/conf/veth2/accept_dad' +n1 bash -c 'printf 1 > /proc/sys/net/ipv4/conf/veth1/promote_secondaries' + +# First we check that we aren't overly sticky and can fall over to new IPs when old ones are removed +ip1 addr add 10.0.0.1/24 dev veth1 +ip1 addr add fd00:aa::1/96 dev veth1 +ip2 addr add 10.0.0.2/24 dev veth2 +ip2 addr add fd00:aa::2/96 dev veth2 +ip1 link set veth1 up +ip2 link set veth2 up +waitiface $netns1 veth1 +waitiface $netns2 veth2 +n1 wg set wg0 peer "$pub2" endpoint 10.0.0.2:2 +n1 ping -W 1 -c 1 192.168.241.2 +ip1 addr add 10.0.0.10/24 dev veth1 +ip1 addr del 10.0.0.1/24 dev veth1 +n1 ping -W 1 -c 1 192.168.241.2 +n1 wg set wg0 peer "$pub2" endpoint [fd00:aa::2]:2 +n1 ping -W 1 -c 1 192.168.241.2 +ip1 addr add fd00:aa::10/96 dev veth1 +ip1 addr del fd00:aa::1/96 dev veth1 +n1 ping -W 1 -c 1 192.168.241.2 + +# Now we show that we can successfully do reply to sender routing +ip1 link set veth1 down +ip2 link set veth2 down +ip1 addr flush dev veth1 +ip2 addr flush dev veth2 +ip1 addr add 10.0.0.1/24 dev veth1 +ip1 addr add 10.0.0.2/24 dev veth1 +ip1 addr add fd00:aa::1/96 dev veth1 +ip1 addr add fd00:aa::2/96 dev veth1 +ip2 addr add 10.0.0.3/24 dev veth2 +ip2 addr add fd00:aa::3/96 dev veth2 +ip1 link set veth1 up +ip2 link set veth2 up +waitiface $netns1 veth1 +waitiface $netns2 veth2 +n2 wg set wg0 peer "$pub1" endpoint 10.0.0.1:1 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.0.0.1:1" ]] +n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::1]:1 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 [fd00:aa::1]:1" ]] +n2 wg set wg0 peer "$pub1" endpoint 10.0.0.2:1 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.0.0.2:1" ]] +n2 wg set wg0 peer "$pub1" endpoint [fd00:aa::2]:1 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 [fd00:aa::2]:1" ]] + +# What happens if the inbound destination address belongs to a different interface as the default route? +ip1 link add dummy0 type dummy +ip1 addr add 10.50.0.1/24 dev dummy0 +ip1 link set dummy0 up +ip2 route add 10.50.0.0/24 dev veth2 +n2 wg set wg0 peer "$pub1" endpoint 10.50.0.1:1 +n2 ping -W 1 -c 1 192.168.241.1 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.50.0.1:1" ]] + +ip1 link del dummy0 +ip1 addr flush dev veth1 +ip2 addr flush dev veth2 +ip1 route flush dev veth1 +ip2 route flush dev veth2 + +# Now we see what happens if another interface route takes precedence over an ongoing one +ip1 link add veth3 type veth peer name veth4 +ip1 link set veth4 netns $netns2 +ip1 addr add 10.0.0.1/24 dev veth1 +ip2 addr add 10.0.0.2/24 dev veth2 +ip1 addr add 10.0.0.3/24 dev veth3 +ip1 link set veth1 up +ip2 link set veth2 up +ip1 link set veth3 up +ip2 link set veth4 up +waitiface $netns1 veth1 +waitiface $netns2 veth2 +waitiface $netns1 veth3 +waitiface $netns2 veth4 +ip1 route flush dev veth1 +ip1 route flush dev veth3 +ip1 route add 10.0.0.0/24 dev veth1 src 10.0.0.1 metric 2 +n1 wg set wg0 peer "$pub2" endpoint 10.0.0.2:2 +n1 ping -W 1 -c 1 192.168.241.2 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.0.0.1:1" ]] +ip1 route add 10.0.0.0/24 dev veth3 src 10.0.0.3 metric 1 +n1 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter' +n2 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/veth4/rp_filter' +n1 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/all/rp_filter' +n2 bash -c 'printf 0 > /proc/sys/net/ipv4/conf/all/rp_filter' +n1 ping -W 1 -c 1 192.168.241.2 +[[ $(n2 wg show wg0 endpoints) == "$pub1 10.0.0.3:1" ]] + +ip1 link del veth1 +ip1 link del veth3 +ip1 link del wg0 +ip2 link del wg0 + +# We test that Netlink/IPC is working properly by doing things that usually cause split responses +ip0 link add dev wg0 type wireguard +config=( "[Interface]" "PrivateKey=$(wg genkey)" "[Peer]" "PublicKey=$(wg genkey)" ) +for a in {1..255}; do + for b in {0..255}; do + config+=( "AllowedIPs=$a.$b.0.0/16,$a::$b/128" ) + done +done +n0 wg setconf wg0 <(printf '%s\n' "${config[@]}") +i=0 +for ip in $(n0 wg show wg0 allowed-ips); do + ((++i)) +done +((i == 255*256*2+1)) +ip0 link del wg0 +ip0 link add dev wg0 type wireguard +config=( "[Interface]" "PrivateKey=$(wg genkey)" ) +for a in {1..40}; do + config+=( "[Peer]" "PublicKey=$(wg genkey)" ) + for b in {1..52}; do + config+=( "AllowedIPs=$a.$b.0.0/16" ) + done +done +n0 wg setconf wg0 <(printf '%s\n' "${config[@]}") +i=0 +while read -r line; do + j=0 + for ip in $line; do + ((++j)) + done + ((j == 53)) + ((++i)) +done < <(n0 wg show wg0 allowed-ips) +((i == 40)) +ip0 link del wg0 +ip0 link add wg0 type wireguard +config=( ) +for i in {1..29}; do + config+=( "[Peer]" "PublicKey=$(wg genkey)" ) +done +config+=( "[Peer]" "PublicKey=$(wg genkey)" "AllowedIPs=255.2.3.4/32,abcd::255/128" ) +n0 wg setconf wg0 <(printf '%s\n' "${config[@]}") +n0 wg showconf wg0 > /dev/null +ip0 link del wg0 + +allowedips=( ) +for i in {1..197}; do + allowedips+=( abcd::$i ) +done +saved_ifs="$IFS" +IFS=, +allowedips="${allowedips[*]}" +IFS="$saved_ifs" +ip0 link add wg0 type wireguard +n0 wg set wg0 peer "$pub1" +n0 wg set wg0 peer "$pub2" allowed-ips "$allowedips" +{ + read -r pub allowedips + [[ $pub == "$pub1" && $allowedips == "(none)" ]] + read -r pub allowedips + [[ $pub == "$pub2" ]] + i=0 + for _ in $allowedips; do + ((++i)) + done + ((i == 197)) +} < <(n0 wg show wg0 allowed-ips) +ip0 link del wg0 + +! n0 wg show doesnotexist || false + +ip0 link add wg0 type wireguard +n0 wg set wg0 private-key <(echo "$key1") peer "$pub2" preshared-key <(echo "$psk") +[[ $(n0 wg show wg0 private-key) == "$key1" ]] +[[ $(n0 wg show wg0 preshared-keys) == "$pub2 $psk" ]] +n0 wg set wg0 private-key /dev/null peer "$pub2" preshared-key /dev/null +[[ $(n0 wg show wg0 private-key) == "(none)" ]] +[[ $(n0 wg show wg0 preshared-keys) == "$pub2 (none)" ]] +n0 wg set wg0 peer "$pub2" +n0 wg set wg0 private-key <(echo "$key2") +[[ $(n0 wg show wg0 public-key) == "$pub2" ]] +[[ -z $(n0 wg show wg0 peers) ]] +n0 wg set wg0 peer "$pub2" +[[ -z $(n0 wg show wg0 peers) ]] +n0 wg set wg0 private-key <(echo "$key1") +n0 wg set wg0 peer "$pub2" +[[ $(n0 wg show wg0 peers) == "$pub2" ]] +ip0 link del wg0 + +declare -A objects +while read -t 0.1 -r line 2>/dev/null || [[ $? -ne 142 ]]; do + [[ $line =~ .*(wg[0-9]+:\ [A-Z][a-z]+\ [0-9]+)\ .*(created|destroyed).* ]] || continue + objects["${BASH_REMATCH[1]}"]+="${BASH_REMATCH[2]}" +done < /dev/kmsg +alldeleted=1 +for object in "${!objects[@]}"; do + if [[ ${objects["$object"]} != *createddestroyed ]]; then + echo "Error: $object: merely ${objects["$object"]}" >&3 + alldeleted=0 + fi +done +[[ $alldeleted -eq 1 ]] +pretty "" "Objects that were created were also destroyed."