From patchwork Tue Dec 6 15:39:00 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Ard Biesheuvel
X-Patchwork-Id: 86865
From: Ard Biesheuvel
To: linux-arm-kernel@lists.infradead.org, dave.martin@arm.com
Cc: catalin.marinas@arm.com, will.deacon@arm.com, Ard Biesheuvel
Subject: [RFC PATCH] arm64: fpsimd: improve stacking logic in non-interruptible context
Date: Tue, 6 Dec 2016 15:39:00 +0000
Message-Id: <1481038740-11633-1-git-send-email-ard.biesheuvel@linaro.org>
X-Mailer: git-send-email 2.7.4

Currently, we allow kernel mode NEON in softirq or hardirq context by
stacking and unstacking a slice of the NEON register file for each call
to kernel_neon_begin() and kernel_neon_end(), respectively.

Given that
a) a CPU typically spends most of its time in userland, during which time
   no kernel mode NEON in process context is in progress,
b) a CPU spends most of its time in the kernel doing other things than
   kernel mode NEON when it gets interrupted to perform kernel mode NEON
   in softirq context,

the stacking and subsequent unstacking is only necessary if we are
interrupting a thread while it is performing kernel mode NEON in process
context. In all other cases, we can simply preserve the userland FPSIMD
state once, and only restore it upon return to userland, even if we are
being invoked from softirq or hardirq context.

So instead of checking whether we are running in interrupt context, keep
track of the level of nested kernel mode NEON calls in progress, and only
perform the eager stack/unstack if the level exceeds 1.

Signed-off-by: Ard Biesheuvel
---
 arch/arm64/kernel/fpsimd.c | 40 +++++++++++++++++++++++++++-------------
 1 file changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
index 394c61db5566..2614a216ac5d 100644
--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -220,20 +220,31 @@ void fpsimd_flush_task_state(struct task_struct *t)
 
 #ifdef CONFIG_KERNEL_MODE_NEON
 
-static DEFINE_PER_CPU(struct fpsimd_partial_state, hardirq_fpsimdstate);
-static DEFINE_PER_CPU(struct fpsimd_partial_state, softirq_fpsimdstate);
+/*
+ * Although unlikely, it is possible for three kernel mode NEON contexts to
+ * be live at the same time: process context, softirq context and hardirq
+ * context. So while the userland context is stashed in the thread's fpsimd
+ * state structure, we need two additional levels of storage.
+ */
+static DEFINE_PER_CPU(struct fpsimd_partial_state, nested_fpsimdstate[2]);
+static DEFINE_PER_CPU(int, kernel_neon_nesting_level);
 
 /*
  * Kernel-side NEON support functions
  */
 void kernel_neon_begin_partial(u32 num_regs)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
+	struct fpsimd_partial_state *s;
+	int level;
+
+	preempt_disable();
+
+	level = this_cpu_read(kernel_neon_nesting_level);
+	if (level > 0) {
+		s = this_cpu_ptr(nested_fpsimdstate);
 
 		BUG_ON(num_regs > 32);
-		fpsimd_save_partial_state(s, roundup(num_regs, 2));
+		fpsimd_save_partial_state(&s[level - 1], roundup(num_regs, 2));
 	} else {
 		/*
 		 * Save the userland FPSIMD state if we have one and if we
@@ -241,24 +252,27 @@ void kernel_neon_begin_partial(u32 num_regs)
 		 * that there is no longer userland FPSIMD state in the
 		 * registers.
 		 */
-		preempt_disable();
 		if (current->mm &&
 		    !test_and_set_thread_flag(TIF_FOREIGN_FPSTATE))
 			fpsimd_save_state(&current->thread.fpsimd_state);
 		this_cpu_write(fpsimd_last_state, NULL);
 	}
+	this_cpu_write(kernel_neon_nesting_level, level + 1);
 }
 EXPORT_SYMBOL(kernel_neon_begin_partial);
 
 void kernel_neon_end(void)
 {
-	if (in_interrupt()) {
-		struct fpsimd_partial_state *s = this_cpu_ptr(
-			in_irq() ? &hardirq_fpsimdstate : &softirq_fpsimdstate);
-		fpsimd_load_partial_state(s);
-	} else {
-		preempt_enable();
+	struct fpsimd_partial_state *s;
+	int level;
+
+	level = this_cpu_read(kernel_neon_nesting_level);
+	if (level > 1) {
+		s = this_cpu_ptr(nested_fpsimdstate);
+		fpsimd_load_partial_state(&s[level - 2]);
 	}
+	this_cpu_write(kernel_neon_nesting_level, level - 1);
+	preempt_enable();
 }
 EXPORT_SYMBOL(kernel_neon_end);
 
-- 
2.7.4
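
For illustration only, the nesting-level bookkeeping described in the commit
message can be modelled in plain user-space C. This is a rough sketch under
stated assumptions, not kernel code: the names model_neon_begin(),
model_neon_end(), hw, user_state, nested and eager_saves are hypothetical, the
register file is shrunk to four integers, and per-CPU storage, preemption
control, the current->mm and TIF_FOREIGN_FPSTATE checks and the lazy restore
on return to userland are all left out.

```c
#include <assert.h>

#define NUM_REGS   4	/* toy register file, not the real 32 NEON registers */
#define MAX_NESTED 2	/* mirrors nested_fpsimdstate[2] in the patch */

struct regfile {
	int v[NUM_REGS];
};

static struct regfile hw;			/* stands in for the live NEON registers */
static struct regfile user_state;		/* stands in for thread.fpsimd_state */
static struct regfile nested[MAX_NESTED];	/* stands in for nested_fpsimdstate[] */
static int nesting_level;			/* stands in for kernel_neon_nesting_level */
static int eager_saves;				/* hypothetical counter of saves into nested[] */

/* Model of kernel_neon_begin_partial(): only a nested call stacks eagerly. */
static void model_neon_begin(void)
{
	int level = nesting_level;

	if (level > 0) {
		/* At most process + softirq + hardirq can be live at once. */
		assert(level <= MAX_NESTED);
		nested[level - 1] = hw;	/* preserve the interrupted kernel user's registers */
		eager_saves++;
	} else {
		/* Simplification: the patch saves only if there is userland state. */
		user_state = hw;
	}
	nesting_level = level + 1;
}

/* Model of kernel_neon_end(): unstack only when returning to a NEON user. */
static void model_neon_end(void)
{
	int level = nesting_level;

	assert(level > 0);
	if (level > 1)
		hw = nested[level - 2];	/* restore what the outer level was using */
	nesting_level = level - 1;
}
```

In this model a single begin/end pair never touches nested[], whatever context
it notionally runs in; only a begin issued while another one is still open
stores into nested[level - 1], and the matching end reloads the same slot via
nested[level - 2], because the level has been incremented in between.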