From patchwork Mon Nov 27 12:23:01 2023
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 747647
X-Mailing-List: linux-crypto@vger.kernel.org
Date: Mon, 27 Nov 2023 13:23:01 +0100
Message-ID: <20231127122259.2265164-8-ardb@google.com>
In-Reply-To: <20231127122259.2265164-7-ardb@google.com>
References: <20231127122259.2265164-7-ardb@google.com>
Subject: [PATCH v3 1/5] arm64: fpsimd: Drop unneeded 'busy' flag
From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org
Cc: linux-crypto@vger.kernel.org, Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown, Eric Biggers, Sebastian Andrzej Siewior
From: Ard Biesheuvel

Kernel mode NEON will preserve the user mode FPSIMD state by saving it into the task struct before clobbering the registers. In order to avoid the need for preserving kernel mode state too, we disallow nested use of kernel mode NEON, i.e., use in softirq context while the interrupted task context was using kernel mode NEON too.

Originally, this policy was implemented using a per-CPU flag which was exposed via may_use_simd(), requiring users of kernel mode NEON to deal with the possibility that it might return false, and to provide both NEON and non-NEON code paths. This policy was changed by commit 13150149aa6ded1 ("arm64: fpsimd: run kernel mode NEON with softirqs disabled"): softirq processing is now disabled entirely instead, so may_use_simd() can never fail when called from task or softirq context.

This means we can drop the fpsimd_context_busy flag entirely, and instead ensure that softirq processing is disabled in places where we formerly relied on the flag to prevent races in the FPSIMD preserve routines.

Signed-off-by: Ard Biesheuvel
Reviewed-by: Mark Brown
---
 arch/arm64/include/asm/simd.h | 11 +---
 arch/arm64/kernel/fpsimd.c | 53 +++++---------------
 2 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/arch/arm64/include/asm/simd.h b/arch/arm64/include/asm/simd.h index 6a75d7ecdcaa..8e86c9e70e48 100644 --- a/arch/arm64/include/asm/simd.h +++ b/arch/arm64/include/asm/simd.h @@ -12,8 +12,6 @@ #include #include -DECLARE_PER_CPU(bool, fpsimd_context_busy); - #ifdef CONFIG_KERNEL_MODE_NEON /* @@ -28,17 +26,10 @@ static __must_check inline bool may_use_simd(void) /* * We must make sure that the SVE has been initialized properly * before using the SIMD in kernel. - * fpsimd_context_busy is only set while preemption is disabled, - * and is clear whenever preemption is enabled. Since - * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy - * cannot change under our feet -- if it's set we cannot be - * migrated, and if it's clear we cannot be migrated to a CPU - * where it is set. */ return !WARN_ON(!system_capabilities_finalized()) && system_supports_fpsimd() && - !in_hardirq() && !irqs_disabled() && !in_nmi() && - !this_cpu_read(fpsimd_context_busy); + !in_hardirq() && !irqs_disabled() && !in_nmi(); } #else /* ! CONFIG_KERNEL_MODE_NEON */ diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 1559c706d32d..ccc4a78a70e4 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -85,13 +85,13 @@ * softirq kicks in. Upon vcpu_put(), KVM will save the vcpu FP state and * flag the register state as invalid. * - * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may - * save the task's FPSIMD context back to task_struct from softirq context. - * To prevent this from racing with the manipulation of the task's FPSIMD state - * from task context and thereby corrupting the state, it is necessary to - * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE - * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to - * run but prevent them to use FPSIMD. + * In order to allow softirq handlers to use FPSIMD, kernel_neon_begin() may be + * called from softirq context, which will save the task's FPSIMD context back + * to task_struct.
To prevent this from racing with the manipulation of the + * task's FPSIMD state from task context and thereby corrupting the state, it + * is necessary to protect any manipulation of a task's fpsimd_state or + * TIF_FOREIGN_FPSTATE flag with get_cpu_fpsimd_context(), which will suspend + * softirq servicing entirely until put_cpu_fpsimd_context() is called. * * For a certain task, the sequence may look something like this: * - the task gets scheduled in; if both the task's fpsimd_cpu field @@ -209,27 +209,14 @@ static inline void sme_free(struct task_struct *t) { } #endif -DEFINE_PER_CPU(bool, fpsimd_context_busy); -EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy); - static void fpsimd_bind_task_to_cpu(void); -static void __get_cpu_fpsimd_context(void) -{ - bool busy = __this_cpu_xchg(fpsimd_context_busy, true); - - WARN_ON(busy); -} - /* * Claim ownership of the CPU FPSIMD context for use by the calling context. * * The caller may freely manipulate the FPSIMD context metadata until * put_cpu_fpsimd_context() is called. * - * The double-underscore version must only be called if you know the task - * can't be preempted. - * * On RT kernels local_bh_disable() is not sufficient because it only * serializes soft interrupt related sections via a local lock, but stays * preemptible. Disabling preemption is the right choice here as bottom @@ -242,14 +229,6 @@ static void get_cpu_fpsimd_context(void) local_bh_disable(); else preempt_disable(); - __get_cpu_fpsimd_context(); -} - -static void __put_cpu_fpsimd_context(void) -{ - bool busy = __this_cpu_xchg(fpsimd_context_busy, false); - - WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */ } /* @@ -261,18 +240,12 @@ static void __put_cpu_fpsimd_context(void) */ static void put_cpu_fpsimd_context(void) { - __put_cpu_fpsimd_context(); if (!IS_ENABLED(CONFIG_PREEMPT_RT)) local_bh_enable(); else preempt_enable(); } -static bool have_cpu_fpsimd_context(void) -{ - return !preemptible() && __this_cpu_read(fpsimd_context_busy); -} - unsigned int task_get_vl(const struct task_struct *task, enum vec_type type) { return task->thread.vl[type]; @@ -383,7 +356,7 @@ static void task_fpsimd_load(void) bool restore_ffr; WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); + WARN_ON(preemptible()); if (system_supports_sve() || system_supports_sme()) { switch (current->thread.fp_type) { @@ -467,7 +440,7 @@ static void fpsimd_save(void) unsigned int vl; WARN_ON(!system_supports_fpsimd()); - WARN_ON(!have_cpu_fpsimd_context()); + WARN_ON(preemptible()); if (test_thread_flag(TIF_FOREIGN_FPSTATE)) return; @@ -1507,7 +1480,7 @@ void fpsimd_thread_switch(struct task_struct *next) if (!system_supports_fpsimd()) return; - __get_cpu_fpsimd_context(); + WARN_ON_ONCE(!irqs_disabled()); /* Save unsaved fpsimd state, if any: */ fpsimd_save(); @@ -1523,8 +1496,6 @@ void fpsimd_thread_switch(struct task_struct *next) update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, wrong_task || wrong_cpu); - - __put_cpu_fpsimd_context(); } static void fpsimd_flush_thread_vl(enum vec_type type) @@ -1829,10 +1800,10 @@ void fpsimd_save_and_flush_cpu_state(void) if (!system_supports_fpsimd()) return; WARN_ON(preemptible()); - __get_cpu_fpsimd_context(); + get_cpu_fpsimd_context(); fpsimd_save(); fpsimd_flush_cpu_state(); - __put_cpu_fpsimd_context(); + put_cpu_fpsimd_context(); } #ifdef CONFIG_KERNEL_MODE_NEON From patchwork Mon Nov 27 12:23:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel 
X-Patchwork-Id: 749269
X-Mailing-List: linux-crypto@vger.kernel.org
Date: Mon, 27 Nov 2023 13:23:02 +0100
Message-ID: <20231127122259.2265164-9-ardb@google.com>
In-Reply-To: <20231127122259.2265164-7-ardb@google.com>
References: <20231127122259.2265164-7-ardb@google.com>
Subject: [PATCH v3 2/5] arm64: fpsimd: Preserve/restore kernel mode NEON at context switch
From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org
Cc: linux-crypto@vger.kernel.org, Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown, Eric Biggers, Sebastian Andrzej Siewior

From: Ard Biesheuvel

Currently, the FPSIMD register file is not preserved and restored along with
the general registers on exception entry/exit or context switch. For this reason, we disable preemption when enabling FPSIMD for kernel mode use in task context, and suspend the processing of softirqs so that there are no concurrent uses in the kernel. (Kernel mode FPSIMD may not be used at all in other contexts). Disabling preemption while doing CPU intensive work on inputs of potentially unbounded size is bad for real-time performance, which is why we try and ensure that SIMD crypto code does not operate on more than ~4k at a time, which is an arbitrary limit and requires assembler code to implement efficiently. We can avoid the need for disabling preemption if we can ensure that any in-kernel users of the NEON will not lose the FPSIMD register state across a context switch. And given that disabling softirqs implicitly disables preemption as well, we will also have to ensure that a softirq that runs code using FPSIMD can safely interrupt an in-kernel user. So introduce a thread_info flag TIF_USING_KMODE_FPSIMD, and modify the context switch hook for FPSIMD to preserve and restore the kernel mode FPSIMD to/from struct thread_struct when it is set. This avoids any scheduling blackouts due to prolonged use of FPSIMD in kernel mode, without the need for manual yielding. In order to support softirq processing while FPSIMD is being used in kernel task context, use the same flag to decide whether the kernel mode FPSIMD state needs to be preserved and restored before allowing FPSIMD to be used in softirq context. Signed-off-by: Ard Biesheuvel Reviewed-by: Mark Brown --- arch/arm64/include/asm/processor.h | 2 + arch/arm64/include/asm/thread_info.h | 1 + arch/arm64/kernel/fpsimd.c | 92 ++++++++++++++++---- 3 files changed, 77 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index e5bc54522e71..dcb51c0571af 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -167,6 +167,8 @@ struct thread_struct { unsigned long fault_address; /* fault info */ unsigned long fault_code; /* ESR_EL1 value */ struct debug_info debug; /* debugging */ + + struct user_fpsimd_state kmode_fpsimd_state; #ifdef CONFIG_ARM64_PTR_AUTH struct ptrauth_keys_user keys_user; #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h index 553d1bc559c6..6b254cf90e8b 100644 --- a/arch/arm64/include/asm/thread_info.h +++ b/arch/arm64/include/asm/thread_info.h @@ -80,6 +80,7 @@ void arch_setup_new_exec(void); #define TIF_TAGGED_ADDR 26 /* Allow tagged user addresses */ #define TIF_SME 27 /* SME in use */ #define TIF_SME_VL_INHERIT 28 /* Inherit SME vl_onexec across exec */ +#define TIF_USING_KMODE_FPSIMD 29 /* Task is in a kernel mode FPSIMD section */ #define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED) diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index ccc4a78a70e4..198918805bf6 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -357,6 +357,7 @@ static void task_fpsimd_load(void) WARN_ON(!system_supports_fpsimd()); WARN_ON(preemptible()); + WARN_ON(test_thread_flag(TIF_USING_KMODE_FPSIMD)); if (system_supports_sve() || system_supports_sme()) { switch (current->thread.fp_type) { @@ -379,7 +380,7 @@ static void task_fpsimd_load(void) default: /* * This indicates either a bug in - * fpsimd_save() or memory corruption, we + * fpsimd_save_user_state() or memory corruption, we * 
should always record an explicit format * when we save. We always at least have the * memory allocated for FPSMID registers so @@ -430,7 +431,7 @@ static void task_fpsimd_load(void) * than via current, if we are saving KVM state then it will have * ensured that the type of registers to save is set in last->to_save. */ -static void fpsimd_save(void) +static void fpsimd_save_user_state(void) { struct cpu_fp_state const *last = this_cpu_ptr(&fpsimd_last_state); @@ -861,7 +862,7 @@ int vec_set_vector_length(struct task_struct *task, enum vec_type type, if (task == current) { get_cpu_fpsimd_context(); - fpsimd_save(); + fpsimd_save_user_state(); } fpsimd_flush_task_state(task); @@ -1473,6 +1474,16 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs) current); } +static void fpsimd_load_kernel_state(struct task_struct *task) +{ + fpsimd_load_state(&task->thread.kmode_fpsimd_state); +} + +static void fpsimd_save_kernel_state(struct task_struct *task) +{ + fpsimd_save_state(&task->thread.kmode_fpsimd_state); +} + void fpsimd_thread_switch(struct task_struct *next) { bool wrong_task, wrong_cpu; @@ -1483,19 +1494,28 @@ void fpsimd_thread_switch(struct task_struct *next) WARN_ON_ONCE(!irqs_disabled()); /* Save unsaved fpsimd state, if any: */ - fpsimd_save(); + if (!test_thread_flag(TIF_USING_KMODE_FPSIMD)) + fpsimd_save_user_state(); + else + fpsimd_save_kernel_state(current); - /* - * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's - * state. For kernel threads, FPSIMD registers are never loaded - * and wrong_task and wrong_cpu will always be true. - */ - wrong_task = __this_cpu_read(fpsimd_last_state.st) != - &next->thread.uw.fpsimd_state; - wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id(); + if (test_tsk_thread_flag(next, TIF_USING_KMODE_FPSIMD)) { + fpsimd_load_kernel_state(next); + set_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE); + } else { + /* + * Fix up TIF_FOREIGN_FPSTATE to correctly describe next's + * state. For kernel threads, FPSIMD registers are never + * loaded with user mode FPSIMD state and so wrong_task and + * wrong_cpu will always be true. + */ + wrong_task = __this_cpu_read(fpsimd_last_state.st) != + &next->thread.uw.fpsimd_state; + wrong_cpu = next->thread.fpsimd_cpu != smp_processor_id(); - update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, - wrong_task || wrong_cpu); + update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, + wrong_task || wrong_cpu); + } } static void fpsimd_flush_thread_vl(enum vec_type type) @@ -1585,7 +1605,7 @@ void fpsimd_preserve_current_state(void) return; get_cpu_fpsimd_context(); - fpsimd_save(); + fpsimd_save_user_state(); put_cpu_fpsimd_context(); } @@ -1801,7 +1821,7 @@ void fpsimd_save_and_flush_cpu_state(void) return; WARN_ON(preemptible()); get_cpu_fpsimd_context(); - fpsimd_save(); + fpsimd_save_user_state(); fpsimd_flush_cpu_state(); put_cpu_fpsimd_context(); } @@ -1835,10 +1855,37 @@ void kernel_neon_begin(void) get_cpu_fpsimd_context(); /* Save unsaved fpsimd state, if any: */ - fpsimd_save(); + if (!test_thread_flag(TIF_USING_KMODE_FPSIMD)) { + fpsimd_save_user_state(); + + /* + * Set the thread flag so that the kernel mode FPSIMD state + * will be context switched along with the rest of the task + * state. + * + * On non-PREEMPT_RT, softirqs may interrupt task level kernel + * mode FPSIMD, but the task will not be preemptible so setting + * TIF_USING_KMODE_FPSIMD for those would be both wrong (as it + * would mark the task context FPSIMD state as requiring a + * context switch) and unnecessary. 
+ * On PREEMPT_RT, softirqs are serviced from a separate thread, + * which is scheduled as usual, and this guarantees that these + * softirqs are not interrupting use of the FPSIMD in kernel + * mode in task context. So in this case, setting the flag here + * is always appropriate. + */ + if (IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()) + set_thread_flag(TIF_USING_KMODE_FPSIMD); + } else { + BUG_ON(IS_ENABLED(CONFIG_PREEMPT_RT) || !in_serving_softirq()); + fpsimd_save_kernel_state(current); + } /* Invalidate any task state remaining in the fpsimd regs: */ fpsimd_flush_cpu_state(); + + put_cpu_fpsimd_context(); } EXPORT_SYMBOL_GPL(kernel_neon_begin); @@ -1856,7 +1903,16 @@ void kernel_neon_end(void) if (!system_supports_fpsimd()) return; - put_cpu_fpsimd_context(); + /* + * If we are returning from a nested use of kernel mode FPSIMD, restore + * the task context kernel mode FPSIMD state. This can only happen when + * running in softirq context on non-PREEMPT_RT. + */ + if (!IS_ENABLED(CONFIG_PREEMPT_RT) && in_serving_softirq() && + test_thread_flag(TIF_USING_KMODE_FPSIMD)) + fpsimd_load_kernel_state(current); + else + clear_thread_flag(TIF_USING_KMODE_FPSIMD); } EXPORT_SYMBOL_GPL(kernel_neon_end);
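For reference, the nesting scenario that the new TIF_USING_KMODE_FPSIMD flag covers can be sketched roughly as follows. This is an illustration only and not part of the patch: do_neon_work() and both callers are made-up names, while kernel_neon_begin() and kernel_neon_end() are the real entry points whose behaviour changes here (non-PREEMPT_RT case).

#include <linux/types.h>
#include <asm/neon.h>

static void do_neon_work(u8 *buf, int len);	/* hypothetical NEON routine */

/* Illustration only: a task context NEON user that a softirq may interrupt. */
static void my_task_context_user(u8 *buf, int len)
{
	kernel_neon_begin();	/* saves user FPSIMD state, sets TIF_USING_KMODE_FPSIMD */
	do_neon_work(buf, len);	/* softirqs and preemption remain enabled here */
	kernel_neon_end();	/* clears TIF_USING_KMODE_FPSIMD */
}

/* Illustration only: runs in softirq context and may interrupt the above. */
static void my_softirq_user(u8 *buf, int len)
{
	kernel_neon_begin();	/* nested case: spills current->thread.kmode_fpsimd_state */
	do_neon_work(buf, len);
	kernel_neon_end();	/* nested case: reloads current->thread.kmode_fpsimd_state */
}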
From patchwork Mon Nov 27 12:23:03 2023
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 747646
X-Mailing-List: linux-crypto@vger.kernel.org
Date: Mon, 27 Nov 2023 13:23:03 +0100
Message-ID: <20231127122259.2265164-10-ardb@google.com>
In-Reply-To: <20231127122259.2265164-7-ardb@google.com>
References: <20231127122259.2265164-7-ardb@google.com>
Subject: [PATCH v3 3/5] arm64: fpsimd: Implement lazy restore for kernel mode FPSIMD
From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org
Cc: linux-crypto@vger.kernel.org, Ard Biesheuvel, Marc Zyngier, Will Deacon, Mark Rutland, Kees Cook, Catalin Marinas, Mark Brown, Eric Biggers, Sebastian Andrzej Siewior

From: Ard Biesheuvel

Now that kernel mode FPSIMD state is context switched along with other task state, we can enable the existing logic that keeps track of which task's FPSIMD state the CPU is holding in its registers. If it is the context of the task that we are switching to, we can elide the reload of the FPSIMD state from memory.

Note that we also need to check whether the FPSIMD state on this CPU is the most recent: if a task gets migrated away and back again, the state in memory may be more recent than the state in the CPU. So add another CPU id field to task_struct to keep track of this. (We could reuse the existing CPU id field used for user mode context, but that might result in user state being discarded unnecessarily, given that two distinct CPUs could be holding the most recent user mode state and the most recent kernel mode state.)

Signed-off-by: Ard Biesheuvel
Reviewed-by: Mark Brown
Acked-by: Mark Rutland
---
 arch/arm64/include/asm/processor.h | 1 +
 arch/arm64/kernel/fpsimd.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h index dcb51c0571af..332f15d0abcf 100644 --- a/arch/arm64/include/asm/processor.h +++ b/arch/arm64/include/asm/processor.h @@ -169,6 +169,7 @@ struct thread_struct { struct debug_info debug; /* debugging */ struct user_fpsimd_state kmode_fpsimd_state; + unsigned int kmode_fpsimd_cpu; #ifdef CONFIG_ARM64_PTR_AUTH struct ptrauth_keys_user keys_user; #ifdef CONFIG_ARM64_PTR_AUTH_KERNEL diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 198918805bf6..112111a078b6 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1476,12 +1476,30 @@ void do_fpsimd_exc(unsigned long esr, struct pt_regs *regs) static void fpsimd_load_kernel_state(struct task_struct *task) { + struct cpu_fp_state *last = this_cpu_ptr(&fpsimd_last_state); + + /* + * Elide the load if this CPU holds the most recent kernel mode + * FPSIMD context of the current task.
+ */ + if (last->st == &task->thread.kmode_fpsimd_state && + task->thread.kmode_fpsimd_cpu == smp_processor_id()) + return; + fpsimd_load_state(&task->thread.kmode_fpsimd_state); } static void fpsimd_save_kernel_state(struct task_struct *task) { + struct cpu_fp_state cpu_fp_state = { + .st = &task->thread.kmode_fpsimd_state, + .to_save = FP_STATE_FPSIMD, + }; + fpsimd_save_state(&task->thread.kmode_fpsimd_state); + fpsimd_bind_state_to_cpu(&cpu_fp_state); + + task->thread.kmode_fpsimd_cpu = smp_processor_id(); } void fpsimd_thread_switch(struct task_struct *next) From patchwork Mon Nov 27 12:23:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 749268 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="lg6b3Grf" Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EDF890 for ; Mon, 27 Nov 2023 04:23:23 -0800 (PST) Received: by mail-yb1-xb49.google.com with SMTP id 3f1490d57ef6-db40b699d0fso4531980276.2 for ; Mon, 27 Nov 2023 04:23:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1701087802; x=1701692602; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=A3Tct0MAFLzfpIwnBhpG04Rsq9x9oCOBU/4NJUxRL5g=; b=lg6b3Grf2URKARVskxiRnAhfBugrUQzr/BPv2rPD8QuH0kBTx6CsYW77xoIOigqGlb T0jLY6pQkXJDPadkU32mCdnQ1rat33RjB35kYKrg5ie3zu3NfwBUJ1nZBPqO7ju9hDNQ G8Zw0JIcPDeqRLTek5xZ+CmTcDfRXYDh4cST2iwllPHKW9vK5YB46r01pC9CKIh9CMx2 x6L181XaAPJFoJ1gbBFKJ6UCkpeEQ3wfVpHzKm8M82hgFibyAlYf0yalLkF9EJy3EKyV fCK7LZTPo/oxA6Bc5kmOiWxGkUlQb2jGZN5EaAsmZoRDY69wnr/IbCW2xKba4vMveXhL hhWA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701087802; x=1701692602; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=A3Tct0MAFLzfpIwnBhpG04Rsq9x9oCOBU/4NJUxRL5g=; b=wsuY/brPMfplVC0T+dGgW5sRAZIDHTmI/jrF3QvSjJyhtVW7mYutLnl9LppctE8Wg9 6oL2mnfTW0UnEWORDhl/wBaDNy/HItGU4hUPWoFP0egxPT2QkKYr1lRzyNk6ZC3IkqSy PPqW93ILehFRaOFspvX3HypvKmmUzUU9aTCOzzSvSd+YZRUYAN5UviIqAnzTMVbaY2tX GocvBwfxWjxGZ9KVUbNgGF4p+kPTZsvKRdoxFktfbB+tdUlSGYq6U3C0sene8q0LE8/C yiUCuae+q7N7B1ImuNXtTiEgJqgQbVlpB2ZTjMVKyfjoXGRTGbOxlhEqxhvLJaYXc+FG UmzQ== X-Gm-Message-State: AOJu0YzUYiN9HJFg90rkEfvopkY/0oahv8NqE2XHTae9a6MsOO0eeWJb ovFKzna6bX3QYZh6ShcQni81wd18 X-Google-Smtp-Source: AGHT+IHvdRnQyXaD2cZVGaoFh+SnCDYDvumLE/wIGS5qzsMtiSO7OlE8DeyFhoIqnQRkL3NHIF6ZGS/W X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a25:6c89:0:b0:d9a:58e1:bb52 with SMTP id h131-20020a256c89000000b00d9a58e1bb52mr333598ybc.6.1701087802245; Mon, 27 Nov 2023 04:23:22 -0800 (PST) Date: Mon, 27 Nov 2023 13:23:04 +0100 In-Reply-To: <20231127122259.2265164-7-ardb@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231127122259.2265164-7-ardb@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=13333; i=ardb@kernel.org; h=from:subject; bh=/mtAyA8UvG4Co8hPQgaVgFqOJxTTWYBgb/Sg4+IwoKU=; 
b=owGbwMvMwCFmkMcZplerG8N4Wi2JITWlSyNYe2vO2Tvqd3abzjDu2cGx33/OvHj/N+06jdeKP 2Usefipo5SFQYyDQVZMkUVg9t93O09PlKp1niULM4eVCWQIAxenAExkZi0jw9nGlKZJyXOOS31W zgyJqpYOfVmeWtyVU7DpqJqsk+TF7Qz/3S/u/uHOviNieeWmtQvvPvf27I49N/et0fLlmq1ZB/Z L8wAA X-Mailer: git-send-email 2.43.0.rc1.413.gea7ed67945-goog Message-ID: <20231127122259.2265164-11-ardb@google.com> Subject: [PATCH v3 4/5] arm64: crypto: Remove conditional yield logic From: Ard Biesheuvel To: linux-arm-kernel@lists.infradead.org Cc: linux-crypto@vger.kernel.org, Ard Biesheuvel , Marc Zyngier , Will Deacon , Mark Rutland , Kees Cook , Catalin Marinas , Mark Brown , Eric Biggers , Sebastian Andrzej Siewior From: Ard Biesheuvel Some classes of crypto algorithms (such as skciphers or aeads) have natural yield points, but SIMD based shashes yield the NEON unit manually to avoid causing scheduling blackouts when operating on large inputs. This is no longer necessary now that kernel mode NEON runs with preemption enabled, so remove this logic from the crypto assembler code, along with the macro that implements the TIF_NEED_RESCHED check. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-glue.c | 21 +++++--------- arch/arm64/crypto/aes-modes.S | 2 -- arch/arm64/crypto/sha1-ce-core.S | 6 ++-- arch/arm64/crypto/sha1-ce-glue.c | 19 ++++--------- arch/arm64/crypto/sha2-ce-core.S | 6 ++-- arch/arm64/crypto/sha2-ce-glue.c | 19 ++++--------- arch/arm64/crypto/sha3-ce-core.S | 6 ++-- arch/arm64/crypto/sha3-ce-glue.c | 14 ++++------ arch/arm64/crypto/sha512-ce-core.S | 8 ++---- arch/arm64/crypto/sha512-ce-glue.c | 16 ++++------- arch/arm64/include/asm/assembler.h | 29 -------------------- arch/arm64/kernel/asm-offsets.c | 4 --- 12 files changed, 38 insertions(+), 112 deletions(-) diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c index 162787c7aa86..c42c903b7d60 100644 --- a/arch/arm64/crypto/aes-glue.c +++ b/arch/arm64/crypto/aes-glue.c @@ -109,9 +109,9 @@ asmlinkage void aes_essiv_cbc_decrypt(u8 out[], u8 const in[], u32 const rk1[], int rounds, int blocks, u8 iv[], u32 const rk2[]); -asmlinkage int aes_mac_update(u8 const in[], u32 const rk[], int rounds, - int blocks, u8 dg[], int enc_before, - int enc_after); +asmlinkage void aes_mac_update(u8 const in[], u32 const rk[], int rounds, + int blocks, u8 dg[], int enc_before, + int enc_after); struct crypto_aes_xts_ctx { struct crypto_aes_ctx key1; @@ -880,17 +880,10 @@ static void mac_do_update(struct crypto_aes_ctx *ctx, u8 const in[], int blocks, int rounds = 6 + ctx->key_length / 4; if (crypto_simd_usable()) { - int rem; - - do { - kernel_neon_begin(); - rem = aes_mac_update(in, ctx->key_enc, rounds, blocks, - dg, enc_before, enc_after); - kernel_neon_end(); - in += (blocks - rem) * AES_BLOCK_SIZE; - blocks = rem; - enc_before = 0; - } while (blocks); + kernel_neon_begin(); + aes_mac_update(in, ctx->key_enc, rounds, blocks, dg, + enc_before, enc_after); + kernel_neon_end(); } else { if (enc_before) aes_encrypt(ctx, dg, dg); diff --git a/arch/arm64/crypto/aes-modes.S b/arch/arm64/crypto/aes-modes.S index 0e834a2c062c..4d68853d0caf 100644 --- a/arch/arm64/crypto/aes-modes.S +++ b/arch/arm64/crypto/aes-modes.S @@ -842,7 +842,6 @@ AES_FUNC_START(aes_mac_update) cbz w5, .Lmacout encrypt_block v0, w2, x1, x7, w8 st1 {v0.16b}, [x4] /* return dg */ - cond_yield .Lmacout, x7, x8 b .Lmacloop4x .Lmac1x: add w3, w3, #4 @@ -861,6 +860,5 @@ AES_FUNC_START(aes_mac_update) .Lmacout: st1 {v0.16b}, [x4] /* return dg */ - mov w0, w3 ret AES_FUNC_END(aes_mac_update) diff --git 
a/arch/arm64/crypto/sha1-ce-core.S b/arch/arm64/crypto/sha1-ce-core.S index 9b1f2d82a6fe..9e37bc09c3a5 100644 --- a/arch/arm64/crypto/sha1-ce-core.S +++ b/arch/arm64/crypto/sha1-ce-core.S @@ -62,8 +62,8 @@ .endm /* - * int __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src, - * int blocks) + * void __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src, + * int blocks) */ SYM_FUNC_START(__sha1_ce_transform) /* load round constants */ @@ -121,7 +121,6 @@ CPU_LE( rev32 v11.16b, v11.16b ) add dgav.4s, dgav.4s, dg0v.4s cbz w2, 2f - cond_yield 3f, x5, x6 b 0b /* @@ -145,6 +144,5 @@ CPU_LE( rev32 v11.16b, v11.16b ) /* store new state */ 3: st1 {dgav.4s}, [x0] str dgb, [x0, #16] - mov w0, w2 ret SYM_FUNC_END(__sha1_ce_transform) diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c index 1dd93e1fcb39..c1c5c5cb104b 100644 --- a/arch/arm64/crypto/sha1-ce-glue.c +++ b/arch/arm64/crypto/sha1-ce-glue.c @@ -29,23 +29,16 @@ struct sha1_ce_state { extern const u32 sha1_ce_offsetof_count; extern const u32 sha1_ce_offsetof_finalize; -asmlinkage int __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src, - int blocks); +asmlinkage void __sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src, + int blocks); static void sha1_ce_transform(struct sha1_state *sst, u8 const *src, int blocks) { - while (blocks) { - int rem; - - kernel_neon_begin(); - rem = __sha1_ce_transform(container_of(sst, - struct sha1_ce_state, - sst), src, blocks); - kernel_neon_end(); - src += (blocks - rem) * SHA1_BLOCK_SIZE; - blocks = rem; - } + kernel_neon_begin(); + __sha1_ce_transform(container_of(sst, struct sha1_ce_state, sst), src, + blocks); + kernel_neon_end(); } const u32 sha1_ce_offsetof_count = offsetof(struct sha1_ce_state, sst.count); diff --git a/arch/arm64/crypto/sha2-ce-core.S b/arch/arm64/crypto/sha2-ce-core.S index fce84d88ddb2..112d772b29db 100644 --- a/arch/arm64/crypto/sha2-ce-core.S +++ b/arch/arm64/crypto/sha2-ce-core.S @@ -71,8 +71,8 @@ .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 /* - * int __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src, - * int blocks) + * void __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src, + * int blocks) */ .text SYM_FUNC_START(__sha256_ce_transform) @@ -129,7 +129,6 @@ CPU_LE( rev32 v19.16b, v19.16b ) /* handled all input blocks? 
*/ cbz w2, 2f - cond_yield 3f, x5, x6 b 0b /* @@ -152,6 +151,5 @@ CPU_LE( rev32 v19.16b, v19.16b ) /* store new state */ 3: st1 {dgav.4s, dgbv.4s}, [x0] - mov w0, w2 ret SYM_FUNC_END(__sha256_ce_transform) diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c index 0a44d2e7ee1f..f785a66a1de4 100644 --- a/arch/arm64/crypto/sha2-ce-glue.c +++ b/arch/arm64/crypto/sha2-ce-glue.c @@ -30,23 +30,16 @@ struct sha256_ce_state { extern const u32 sha256_ce_offsetof_count; extern const u32 sha256_ce_offsetof_finalize; -asmlinkage int __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src, - int blocks); +asmlinkage void __sha256_ce_transform(struct sha256_ce_state *sst, u8 const *src, + int blocks); static void sha256_ce_transform(struct sha256_state *sst, u8 const *src, int blocks) { - while (blocks) { - int rem; - - kernel_neon_begin(); - rem = __sha256_ce_transform(container_of(sst, - struct sha256_ce_state, - sst), src, blocks); - kernel_neon_end(); - src += (blocks - rem) * SHA256_BLOCK_SIZE; - blocks = rem; - } + kernel_neon_begin(); + __sha256_ce_transform(container_of(sst, struct sha256_ce_state, sst), + src, blocks); + kernel_neon_end(); } const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state, diff --git a/arch/arm64/crypto/sha3-ce-core.S b/arch/arm64/crypto/sha3-ce-core.S index 9c77313f5a60..db64831ad35d 100644 --- a/arch/arm64/crypto/sha3-ce-core.S +++ b/arch/arm64/crypto/sha3-ce-core.S @@ -37,7 +37,7 @@ .endm /* - * int sha3_ce_transform(u64 *st, const u8 *data, int blocks, int dg_size) + * void sha3_ce_transform(u64 *st, const u8 *data, int blocks, int dg_size) */ .text SYM_FUNC_START(sha3_ce_transform) @@ -184,18 +184,16 @@ SYM_FUNC_START(sha3_ce_transform) eor v0.16b, v0.16b, v31.16b cbnz w8, 3b - cond_yield 4f, x8, x9 cbnz w2, 0b /* save state */ -4: st1 { v0.1d- v3.1d}, [x0], #32 + st1 { v0.1d- v3.1d}, [x0], #32 st1 { v4.1d- v7.1d}, [x0], #32 st1 { v8.1d-v11.1d}, [x0], #32 st1 {v12.1d-v15.1d}, [x0], #32 st1 {v16.1d-v19.1d}, [x0], #32 st1 {v20.1d-v23.1d}, [x0], #32 st1 {v24.1d}, [x0] - mov w0, w2 ret SYM_FUNC_END(sha3_ce_transform) diff --git a/arch/arm64/crypto/sha3-ce-glue.c b/arch/arm64/crypto/sha3-ce-glue.c index 250e1377c481..d689cd2bf4cf 100644 --- a/arch/arm64/crypto/sha3-ce-glue.c +++ b/arch/arm64/crypto/sha3-ce-glue.c @@ -28,8 +28,8 @@ MODULE_ALIAS_CRYPTO("sha3-256"); MODULE_ALIAS_CRYPTO("sha3-384"); MODULE_ALIAS_CRYPTO("sha3-512"); -asmlinkage int sha3_ce_transform(u64 *st, const u8 *data, int blocks, - int md_len); +asmlinkage void sha3_ce_transform(u64 *st, const u8 *data, int blocks, + int md_len); static int sha3_update(struct shash_desc *desc, const u8 *data, unsigned int len) @@ -59,15 +59,11 @@ static int sha3_update(struct shash_desc *desc, const u8 *data, blocks = len / sctx->rsiz; len %= sctx->rsiz; - while (blocks) { - int rem; - + if (blocks) { kernel_neon_begin(); - rem = sha3_ce_transform(sctx->st, data, blocks, - digest_size); + sha3_ce_transform(sctx->st, data, blocks, digest_size); kernel_neon_end(); - data += (blocks - rem) * sctx->rsiz; - blocks = rem; + data += blocks * sctx->rsiz; } } diff --git a/arch/arm64/crypto/sha512-ce-core.S b/arch/arm64/crypto/sha512-ce-core.S index 91ef68b15fcc..96acc9295230 100644 --- a/arch/arm64/crypto/sha512-ce-core.S +++ b/arch/arm64/crypto/sha512-ce-core.S @@ -102,8 +102,8 @@ .endm /* - * int __sha512_ce_transform(struct sha512_state *sst, u8 const *src, - * int blocks) + * void __sha512_ce_transform(struct sha512_state *sst, u8 const *src, + * int blocks) */ .text 
SYM_FUNC_START(__sha512_ce_transform) @@ -195,12 +195,10 @@ CPU_LE( rev64 v19.16b, v19.16b ) add v10.2d, v10.2d, v2.2d add v11.2d, v11.2d, v3.2d - cond_yield 3f, x4, x5 /* handled all input blocks? */ cbnz w2, 0b /* store new state */ -3: st1 {v8.2d-v11.2d}, [x0] - mov w0, w2 + st1 {v8.2d-v11.2d}, [x0] ret SYM_FUNC_END(__sha512_ce_transform) diff --git a/arch/arm64/crypto/sha512-ce-glue.c b/arch/arm64/crypto/sha512-ce-glue.c index f3431fc62315..70eef74fe031 100644 --- a/arch/arm64/crypto/sha512-ce-glue.c +++ b/arch/arm64/crypto/sha512-ce-glue.c @@ -26,23 +26,17 @@ MODULE_LICENSE("GPL v2"); MODULE_ALIAS_CRYPTO("sha384"); MODULE_ALIAS_CRYPTO("sha512"); -asmlinkage int __sha512_ce_transform(struct sha512_state *sst, u8 const *src, - int blocks); +asmlinkage void __sha512_ce_transform(struct sha512_state *sst, u8 const *src, + int blocks); asmlinkage void sha512_block_data_order(u64 *digest, u8 const *src, int blocks); static void sha512_ce_transform(struct sha512_state *sst, u8 const *src, int blocks) { - while (blocks) { - int rem; - - kernel_neon_begin(); - rem = __sha512_ce_transform(sst, src, blocks); - kernel_neon_end(); - src += (blocks - rem) * SHA512_BLOCK_SIZE; - blocks = rem; - } + kernel_neon_begin(); + __sha512_ce_transform(sst, src, blocks); + kernel_neon_end(); } static void sha512_arm64_transform(struct sha512_state *sst, u8 const *src, diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 376a980f2bad..f0da53a0388f 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -759,35 +759,6 @@ alternative_endif set_sctlr sctlr_el2, \reg .endm - /* - * Check whether preempt/bh-disabled asm code should yield as soon as - * it is able. This is the case if we are currently running in task - * context, and either a softirq is pending, or the TIF_NEED_RESCHED - * flag is set and re-enabling preemption a single time would result in - * a preempt count of zero. (Note that the TIF_NEED_RESCHED flag is - * stored negated in the top word of the thread_info::preempt_count - * field) - */ - .macro cond_yield, lbl:req, tmp:req, tmp2:req - get_current_task \tmp - ldr \tmp, [\tmp, #TSK_TI_PREEMPT] - /* - * If we are serving a softirq, there is no point in yielding: the - * softirq will not be preempted no matter what we do, so we should - * run to completion as quickly as we can. 
- */ - tbnz \tmp, #SOFTIRQ_SHIFT, .Lnoyield_\@ -#ifdef CONFIG_PREEMPTION - sub \tmp, \tmp, #PREEMPT_DISABLE_OFFSET - cbz \tmp, \lbl -#endif - adr_l \tmp, irq_stat + IRQ_CPUSTAT_SOFTIRQ_PENDING - get_this_cpu_offset \tmp2 - ldr w\tmp, [\tmp, \tmp2] - cbnz w\tmp, \lbl // yield on pending softirq in task context -.Lnoyield_\@: - .endm - /* * Branch Target Identifier (BTI) */ diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c index 5ff1942b04fc..fb9e9ef9b527 100644 --- a/arch/arm64/kernel/asm-offsets.c +++ b/arch/arm64/kernel/asm-offsets.c @@ -116,10 +116,6 @@ int main(void) DEFINE(DMA_TO_DEVICE, DMA_TO_DEVICE); DEFINE(DMA_FROM_DEVICE, DMA_FROM_DEVICE); BLANK(); - DEFINE(PREEMPT_DISABLE_OFFSET, PREEMPT_DISABLE_OFFSET); - DEFINE(SOFTIRQ_SHIFT, SOFTIRQ_SHIFT); - DEFINE(IRQ_CPUSTAT_SOFTIRQ_PENDING, offsetof(irq_cpustat_t, __softirq_pending)); - BLANK(); DEFINE(CPU_BOOT_TASK, offsetof(struct secondary_data, task)); BLANK(); DEFINE(FTR_OVR_VAL_OFFSET, offsetof(struct arm64_ftr_override, val)); From patchwork Mon Nov 27 12:23:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ard Biesheuvel X-Patchwork-Id: 747645 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="CahGLWlb" Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6986F111 for ; Mon, 27 Nov 2023 04:23:25 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id 3f1490d57ef6-d9a541b720aso4567429276.0 for ; Mon, 27 Nov 2023 04:23:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1701087804; x=1701692604; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=LtKj4jG97S5ROkd3H2gfeK9FLeAMjZ/gxMMmIHx9jAA=; b=CahGLWlb37HnP04bXZT6A1WoxjcDJRhYH+3bzz6Z4pM/YkD4LFkro64tMzGDbu5iB5 wJGuI0AYAEi06UK1/oQa+T7ZiGzBA+gQU+NtqwzzVMg7UHF7+OV6poU/ipB+7iTUCPlf zMreawL9/lQ264YkRFEg3jKuDuDnO7QGA0p0efOxbuDNJd5fHwi0WWSiMp1/AU1JLLe2 1pGWhrlypeoIz6uWa/h2MB9GFTNGq+O7QjGTjy9HtnmUFK4WsqUdzjZQIndPjaOOqP1k oLx3pVAb451Xvo4zNx9n7Gj3NDtBSdEAmg2PbbCUz0OuWmJmXqMngiDSdNpMDqdREqf1 m+WQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1701087804; x=1701692604; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LtKj4jG97S5ROkd3H2gfeK9FLeAMjZ/gxMMmIHx9jAA=; b=NjNpvwCIGSiGNxFLeoxJmSYU6VCyAnKPx40KZjM/11xCqHxJl6s7+G3Xuc//3u3JNH FbPx7Q16n1YwCBVc35N6Bl3xA+yH8cW+jHFTP09bQcpYX9TJMXKkXdf/umYpKNl5u17z XDN5mYz/4btHVlhH2vFVaqJDZ2Be3MIdUeNlfrd127Kf27SLBlROV83E9livgF8ZAdoD nYp/FoFmYxx7lvH6sqzMaCL7xlDEoJed4GXJ0mxBmUoiK0C2QfaXEmaj5a3c9fB9j0J+ wDg2QXp0LvED3stcNlUajjJ20g7IHgqMZlOkg3QSc+nm8uhQXd1n9/s4CYAyW6FNOs8N mWlw== X-Gm-Message-State: AOJu0YwtRZ1hwBtvKw8l7XBL5b08/qgX2RgS7wCuwe8+MqyoCts0XcqO FtpSgd1ywwCWJHO2WGUOmI0gBUB9 X-Google-Smtp-Source: AGHT+IEe7Hy+nDV+k7QymmY1pBDw6FA1EP22u6IitdInsZJo4dzubHu7ZgGyB0HUS9r6tQh4h5B1ojyq X-Received: from palermo.c.googlers.com ([fda3:e722:ac3:cc00:28:9cb1:c0a8:118a]) (user=ardb job=sendgmr) by 2002:a25:738c:0:b0:d90:e580:88e5 with SMTP id o134-20020a25738c000000b00d90e58088e5mr311965ybc.10.1701087804633; Mon, 27 Nov 2023 04:23:24 -0800 (PST) Date: Mon, 27 Nov 2023 13:23:05 +0100 In-Reply-To: 
<20231127122259.2265164-7-ardb@google.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20231127122259.2265164-7-ardb@google.com> X-Developer-Key: i=ardb@kernel.org; a=openpgp; fpr=F43D03328115A198C90016883D200E9CA6329909 X-Developer-Signature: v=1; a=openpgp-sha256; l=6539; i=ardb@kernel.org; h=from:subject; bh=KAO1mMfveVRlNLtpRgzO/obJW9IGCdblJhF8gKSOjpo=; b=owGbwMvMwCFmkMcZplerG8N4Wi2JITWlS3NjhFD+pbQDzg/KdTX2f7X5xen80lOvftfNdJFLH jyH9fU6SlkYxDgYZMUUWQRm/3238/REqVrnWbIwc1iZQIYwcHEKwESijjP8j58bxVy6sD5o9Rb+ rJYZHfv+nuOc0TNTqal5id3KdsZSVob/qaJXNblWRYaEHHTj5il+Hi+6xW2zUq7gFe5kjXzHGWb 8AA== X-Mailer: git-send-email 2.43.0.rc1.413.gea7ed67945-goog Message-ID: <20231127122259.2265164-12-ardb@google.com> Subject: [PATCH v3 5/5] arm64: crypto: Remove FPSIMD yield logic from glue code From: Ard Biesheuvel To: linux-arm-kernel@lists.infradead.org Cc: linux-crypto@vger.kernel.org, Ard Biesheuvel , Marc Zyngier , Will Deacon , Mark Rutland , Kees Cook , Catalin Marinas , Mark Brown , Eric Biggers , Sebastian Andrzej Siewior From: Ard Biesheuvel A previous patch already removed the assembler logic that was used to check periodically whether a task has its TIF_NEED_RESCHED set, and to yield the FPSIMD unit and the timeslice if this is the case. This is no longer necessary now that we no longer disable preemption when using the FPSIMD in kernel mode. Let's also remove the remaining C logic that yields the FPSIMD unit after every 4 KiB of input, which is arguably worse in terms of overhead, given that it is unconditional and therefore mostly unnecessary. Signed-off-by: Ard Biesheuvel --- arch/arm64/crypto/aes-ce-ccm-glue.c | 5 ---- arch/arm64/crypto/chacha-neon-glue.c | 14 ++------- arch/arm64/crypto/crct10dif-ce-glue.c | 30 ++++---------------- arch/arm64/crypto/nhpoly1305-neon-glue.c | 12 ++------ arch/arm64/crypto/poly1305-glue.c | 15 +++------- arch/arm64/crypto/polyval-ce-glue.c | 5 ++-- 6 files changed, 18 insertions(+), 63 deletions(-) diff --git a/arch/arm64/crypto/aes-ce-ccm-glue.c b/arch/arm64/crypto/aes-ce-ccm-glue.c index 25cd3808ecbe..a92ca6de1f96 100644 --- a/arch/arm64/crypto/aes-ce-ccm-glue.c +++ b/arch/arm64/crypto/aes-ce-ccm-glue.c @@ -125,16 +125,11 @@ static void ccm_calculate_auth_mac(struct aead_request *req, u8 mac[]) scatterwalk_start(&walk, sg_next(walk.sg)); n = scatterwalk_clamp(&walk, len); } - n = min_t(u32, n, SZ_4K); /* yield NEON at least every 4k */ p = scatterwalk_map(&walk); macp = ce_aes_ccm_auth_data(mac, p, n, macp, ctx->key_enc, num_rounds(ctx)); - if (len / SZ_4K > (len - n) / SZ_4K) { - kernel_neon_end(); - kernel_neon_begin(); - } len -= n; scatterwalk_unmap(p); diff --git a/arch/arm64/crypto/chacha-neon-glue.c b/arch/arm64/crypto/chacha-neon-glue.c index af2bbca38e70..37ca3e889848 100644 --- a/arch/arm64/crypto/chacha-neon-glue.c +++ b/arch/arm64/crypto/chacha-neon-glue.c @@ -87,17 +87,9 @@ void chacha_crypt_arch(u32 *state, u8 *dst, const u8 *src, unsigned int bytes, !crypto_simd_usable()) return chacha_crypt_generic(state, dst, src, bytes, nrounds); - do { - unsigned int todo = min_t(unsigned int, bytes, SZ_4K); - - kernel_neon_begin(); - chacha_doneon(state, dst, src, todo, nrounds); - kernel_neon_end(); - - bytes -= todo; - src += todo; - dst += todo; - } while (bytes); + kernel_neon_begin(); + chacha_doneon(state, dst, src, bytes, nrounds); + kernel_neon_end(); } EXPORT_SYMBOL(chacha_crypt_arch); diff --git a/arch/arm64/crypto/crct10dif-ce-glue.c 
b/arch/arm64/crypto/crct10dif-ce-glue.c index 09eb1456aed4..ccc3f6067742 100644 --- a/arch/arm64/crypto/crct10dif-ce-glue.c +++ b/arch/arm64/crypto/crct10dif-ce-glue.c @@ -37,18 +37,9 @@ static int crct10dif_update_pmull_p8(struct shash_desc *desc, const u8 *data, u16 *crc = shash_desc_ctx(desc); if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) { - do { - unsigned int chunk = length; - - if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE) - chunk = SZ_4K; - - kernel_neon_begin(); - *crc = crc_t10dif_pmull_p8(*crc, data, chunk); - kernel_neon_end(); - data += chunk; - length -= chunk; - } while (length); + kernel_neon_begin(); + *crc = crc_t10dif_pmull_p8(*crc, data, length); + kernel_neon_end(); } else { *crc = crc_t10dif_generic(*crc, data, length); } @@ -62,18 +53,9 @@ static int crct10dif_update_pmull_p64(struct shash_desc *desc, const u8 *data, u16 *crc = shash_desc_ctx(desc); if (length >= CRC_T10DIF_PMULL_CHUNK_SIZE && crypto_simd_usable()) { - do { - unsigned int chunk = length; - - if (chunk > SZ_4K + CRC_T10DIF_PMULL_CHUNK_SIZE) - chunk = SZ_4K; - - kernel_neon_begin(); - *crc = crc_t10dif_pmull_p64(*crc, data, chunk); - kernel_neon_end(); - data += chunk; - length -= chunk; - } while (length); + kernel_neon_begin(); + *crc = crc_t10dif_pmull_p64(*crc, data, length); + kernel_neon_end(); } else { *crc = crc_t10dif_generic(*crc, data, length); } diff --git a/arch/arm64/crypto/nhpoly1305-neon-glue.c b/arch/arm64/crypto/nhpoly1305-neon-glue.c index e4a0b463f080..7df0ab811c4e 100644 --- a/arch/arm64/crypto/nhpoly1305-neon-glue.c +++ b/arch/arm64/crypto/nhpoly1305-neon-glue.c @@ -22,15 +22,9 @@ static int nhpoly1305_neon_update(struct shash_desc *desc, if (srclen < 64 || !crypto_simd_usable()) return crypto_nhpoly1305_update(desc, src, srclen); - do { - unsigned int n = min_t(unsigned int, srclen, SZ_4K); - - kernel_neon_begin(); - crypto_nhpoly1305_update_helper(desc, src, n, nh_neon); - kernel_neon_end(); - src += n; - srclen -= n; - } while (srclen); + kernel_neon_begin(); + crypto_nhpoly1305_update_helper(desc, src, srclen, nh_neon); + kernel_neon_end(); return 0; } diff --git a/arch/arm64/crypto/poly1305-glue.c b/arch/arm64/crypto/poly1305-glue.c index 1fae18ba11ed..326871897d5d 100644 --- a/arch/arm64/crypto/poly1305-glue.c +++ b/arch/arm64/crypto/poly1305-glue.c @@ -143,20 +143,13 @@ void poly1305_update_arch(struct poly1305_desc_ctx *dctx, const u8 *src, unsigned int len = round_down(nbytes, POLY1305_BLOCK_SIZE); if (static_branch_likely(&have_neon) && crypto_simd_usable()) { - do { - unsigned int todo = min_t(unsigned int, len, SZ_4K); - - kernel_neon_begin(); - poly1305_blocks_neon(&dctx->h, src, todo, 1); - kernel_neon_end(); - - len -= todo; - src += todo; - } while (len); + kernel_neon_begin(); + poly1305_blocks_neon(&dctx->h, src, len, 1); + kernel_neon_end(); } else { poly1305_blocks(&dctx->h, src, len, 1); - src += len; } + src += len; nbytes %= POLY1305_BLOCK_SIZE; } diff --git a/arch/arm64/crypto/polyval-ce-glue.c b/arch/arm64/crypto/polyval-ce-glue.c index 0a3b5718df85..8c83e5f44e51 100644 --- a/arch/arm64/crypto/polyval-ce-glue.c +++ b/arch/arm64/crypto/polyval-ce-glue.c @@ -122,9 +122,8 @@ static int polyval_arm64_update(struct shash_desc *desc, tctx->key_powers[NUM_KEY_POWERS-1]); } - while (srclen >= POLYVAL_BLOCK_SIZE) { - /* allow rescheduling every 4K bytes */ - nblocks = min(srclen, 4096U) / POLYVAL_BLOCK_SIZE; + if (srclen >= POLYVAL_BLOCK_SIZE) { + nblocks = srclen / POLYVAL_BLOCK_SIZE; internal_polyval_update(tctx, src, nblocks, 
dctx->buffer); srclen -= nblocks * POLYVAL_BLOCK_SIZE; src += nblocks * POLYVAL_BLOCK_SIZE;
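Taken together, the last two patches reduce the arm64 SIMD glue pattern to a single kernel_neon_begin()/kernel_neon_end() section regardless of input size. A rough sketch of the resulting idiom is shown below; my_hash_update_neon(), my_hash_update_scalar() and struct my_hash_ctx are hypothetical, while crypto_simd_usable(), kernel_neon_begin() and kernel_neon_end() are the real interfaces already used by the glue code above.

#include <linux/types.h>
#include <asm/neon.h>
#include <crypto/internal/simd.h>

struct my_hash_ctx { u32 state[8]; };	/* hypothetical context */

static void my_hash_update_neon(struct my_hash_ctx *ctx, const u8 *src, unsigned int len);	/* hypothetical */
static void my_hash_update_scalar(struct my_hash_ctx *ctx, const u8 *src, unsigned int len);	/* hypothetical */

static void my_hash_update(struct my_hash_ctx *ctx, const u8 *src, unsigned int len)
{
	if (!crypto_simd_usable()) {
		my_hash_update_scalar(ctx, src, len);
		return;
	}

	/*
	 * No 4 KiB chunking any more: the NEON section stays preemptible,
	 * so arbitrarily large inputs can be processed in one go.
	 */
	kernel_neon_begin();
	my_hash_update_neon(ctx, src, len);
	kernel_neon_end();
}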