From patchwork Thu Apr  1 22:58:26 2021
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 413957
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, Dan Williams, Dave Hansen, x86@kernel.org,
    linux-kernel@vger.kernel.org, Fenghua Yu, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [PATCH V6 03/10] x86/pks: Add additional PKEY helper macros
Date: Thu,  1 Apr 2021 15:58:26 -0700
Message-Id: <20210401225833.566238-4-ira.weiny@intel.com>
In-Reply-To: <20210401225833.566238-1-ira.weiny@intel.com>
References: <20210401225833.566238-1-ira.weiny@intel.com>

From: Ira Weiny

Avoid open coding shift and mask operations by defining and using helper
macros for PKey operations.

Reviewed-by: Dan Williams
Signed-off-by: Ira Weiny

---
Changes from V3:
	new patch suggested by Dan Williams to use macros better.
---
 arch/x86/include/asm/pgtable.h      |  7 ++-----
 arch/x86/include/asm/pkeys_common.h | 11 ++++++++---
 arch/x86/mm/pkeys.c                 |  8 +++-----
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 53bbde334193..07e9779b76d2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1370,16 +1370,13 @@ extern u32 init_pkru_value;
 
 static inline bool __pkru_allows_read(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
-
-	return !(pkru & (PKR_AD_BIT << pkru_pkey_bits));
+	return !(pkru & PKR_AD_KEY(pkey));
 }
 
 static inline bool __pkru_allows_write(u32 pkru, u16 pkey)
 {
-	int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY;
 	/* Access-disable disables writes too so check both bits here. */
-	return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits));
+	return !(pkru & (PKR_AD_KEY(pkey) | PKR_WD_KEY(pkey)));
 }
 
 static inline u16 pte_flags_pkey(unsigned long pte_flags)

diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index e40b0ced733f..0681522974ba 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -6,10 +6,15 @@
 #define PKR_WD_BIT		0x2
 #define PKR_BITS_PER_PKEY	2
 
+#define PKR_PKEY_SHIFT(pkey)	(pkey * PKR_BITS_PER_PKEY)
+#define PKR_PKEY_MASK(pkey)	(((1 << PKR_BITS_PER_PKEY) - 1) << PKR_PKEY_SHIFT(pkey))
+
 /*
- * Generate an Access-Disable mask for the given pkey. Several of these can be
- * OR'd together to generate pkey register values.
+ * Generate an Access-Disable and Write-Disable mask for the given pkey.
+ * Several of the AD's are OR'd together to generate a default pkey register
+ * value.
  */
-#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY))
+#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
+#define PKR_WD_KEY(pkey)	(PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))
 
 #endif /*_ASM_X86_PKEYS_COMMON_H */

diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index d1dfe743e79f..fc8c7e2bb21b 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -218,16 +218,14 @@ __setup("init_pkru=", setup_init_pkru);
  */
 u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
 {
-	int pkey_shift = pkey * PKR_BITS_PER_PKEY;
-
 	/* Mask out old bit values */
-	pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+	pk_reg &= ~PKR_PKEY_MASK(pkey);
 
 	/* Or in new values */
 	if (flags & PKEY_DISABLE_ACCESS)
-		pk_reg |= PKR_AD_BIT << pkey_shift;
+		pk_reg |= PKR_AD_KEY(pkey);
 	if (flags & PKEY_DISABLE_WRITE)
-		pk_reg |= PKR_WD_BIT << pkey_shift;
+		pk_reg |= PKR_WD_KEY(pkey);
 
 	return pk_reg;
 }
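
For illustration only (an editor's stand-alone sketch, not part of the patch),
the new helpers compose and clear the two-bit per-key fields like so:

	#include <stdio.h>
	#include <stdint.h>

	/* Local re-definitions of the patch's helpers, for illustration. */
	#define PKR_AD_BIT		0x1
	#define PKR_WD_BIT		0x2
	#define PKR_BITS_PER_PKEY	2

	#define PKR_PKEY_SHIFT(pkey)	((pkey) * PKR_BITS_PER_PKEY)
	#define PKR_PKEY_MASK(pkey)	(((1 << PKR_BITS_PER_PKEY) - 1) << PKR_PKEY_SHIFT(pkey))
	#define PKR_AD_KEY(pkey)	(PKR_AD_BIT << PKR_PKEY_SHIFT(pkey))
	#define PKR_WD_KEY(pkey)	(PKR_WD_BIT << PKR_PKEY_SHIFT(pkey))

	int main(void)
	{
		uint32_t reg = 0;

		/* Disable all access for pkey 1 and writes for pkey 2. */
		reg |= PKR_AD_KEY(1) | PKR_WD_KEY(2);
		printf("reg = 0x%x\n", reg);	/* 0x24: AD in bits 2-3, WD in bits 4-5 */

		/* Clear pkey 1's field, restoring full access for that key. */
		reg &= ~PKR_PKEY_MASK(1);
		printf("reg = 0x%x\n", reg);	/* 0x20 */
		return 0;
	}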
From patchwork Thu Apr  1 22:58:29 2021
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 413955
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, Sean Christopherson, Dan Williams, Dave Hansen,
    x86@kernel.org, linux-kernel@vger.kernel.org, Fenghua Yu,
    linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: [PATCH V6 06/10] x86/fault: Adjust WARN_ON for PKey fault
Date: Thu,  1 Apr 2021 15:58:29 -0700
Message-Id: <20210401225833.566238-7-ira.weiny@intel.com>
In-Reply-To: <20210401225833.566238-1-ira.weiny@intel.com>
References: <20210401225833.566238-1-ira.weiny@intel.com>

From: Ira Weiny

Previously, a Protection key fault indicated something very wrong, because
user page mappings are not supposed to be in the kernel address space.

Now PKey faults may happen on kernel mappings if the feature is enabled.

Remove the warning in the fault path and allow the oops to occur without
extra debugging if PKS is enabled.

Cc: Sean Christopherson
Cc: Dan Williams
Signed-off-by: Ira Weiny

---
Changes from V5:
	From Dave Hansen
		Remove 'we' from comment
Changes from V4:
	From Sean Christopherson
		Clean up commit message and comment
		Change cpu_feature_enabled to be in WARN_ON check
---
 arch/x86/mm/fault.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index a73347e2cdfc..0e0e90968f57 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1141,11 +1141,15 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 		   unsigned long address)
 {
 	/*
-	 * Protection keys exceptions only happen on user pages.  We
-	 * have no user pages in the kernel portion of the address
-	 * space, so do not expect them here.
+	 * X86_PF_PK (Protection key exceptions) may occur on kernel addresses
+	 * when PKS (PKeys Supervisor) is enabled.
+	 *
+	 * However, if PKS is not enabled WARN if this exception is seen
+	 * because there are no user pages in the kernel portion of the address
+	 * space.
	 */
-	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+	WARN_ON_ONCE(!cpu_feature_enabled(X86_FEATURE_PKS) &&
+		     (hw_error_code & X86_PF_PK));
 
 #ifdef CONFIG_X86_32
 	/*

From patchwork Thu Apr  1 22:58:30 2021
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 413956
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, Dan Williams, Fenghua Yu, Dave Hansen, x86@kernel.org,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [PATCH V6 07/10] x86/pks: Preserve the PKRS MSR on context switch
Date: Thu,  1 Apr 2021 15:58:30 -0700
Message-Id: <20210401225833.566238-8-ira.weiny@intel.com>
In-Reply-To: <20210401225833.566238-1-ira.weiny@intel.com>
References: <20210401225833.566238-1-ira.weiny@intel.com>

From: Ira Weiny

The PKRS MSR is defined as a per-logical-processor register.  This
isolates memory access by logical CPU.  Unfortunately, the MSR is not
managed by XSAVE.  Therefore, tasks must save/restore the MSR value on
context switch.

Define a saved PKRS value in the task struct, as well as a cached
per-logical-processor MSR value which mirrors the MSR value of the
current CPU.  Initialize all tasks with the default MSR value.  Then, on
schedule in, call write_pkrs() which automatically avoids the overhead
of the MSR write if possible.
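
The write_pkrs() helper referenced above lands in an earlier patch of this
series (patch 5, per the change log below, which also mentions the static
pkrs_cache).  A minimal sketch of the skip-the-WRMSR optimization, assuming
a per-CPU pkrs_cache that mirrors the last value written on each CPU
(an editor's illustration, not the verbatim patch):

	#include <linux/types.h>
	#include <linux/percpu.h>
	#include <asm/msr.h>

	/* Mirrors the last value written to MSR_IA32_PKRS on this CPU. */
	static DEFINE_PER_CPU(u32, pkrs_cache);

	void write_pkrs(u32 new_pkrs)
	{
		u32 *pkrs = get_cpu_ptr(&pkrs_cache);

		/* Skip the (serializing) WRMSR when the value is unchanged. */
		if (*pkrs != new_pkrs) {
			*pkrs = new_pkrs;
			wrmsrl(MSR_IA32_PKRS, new_pkrs);
		}
		put_cpu_ptr(&pkrs_cache);
	}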
Reviewed-by: Dan Williams
Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Ira Weiny

---
Changes from V4:
	From kernel test robot
		Fix i386 build: pks_init_task not found
	Move MSR_IA32_PKRS and INIT_PKRS_VALUE to patch 5 where they are
	'used'.  (Technically nothing is used until the final test patch
	but this organization makes review better.)
	Fix checkpatch errors
Changes from V3:
	From Dan Williams
		make pks_init_task() and pks_sched_in() macros
			To avoid Supervisor PKey '#ifdefery' in process.c
			and process_64.c
		Split write_pkrs() to an earlier patch to be used in
		setup_pks()
			Move Peter's authorship to that patch.
	From Dan Williams
		Use ARCH_ENABLE_SUPERVISOR_PKEYS
		Remove kernel doc comment from write_pkrs
	From Thomas Gleixner
		Fix where pks_sched_in() is called from.  Should be called
		from __switch_to()
		NOTE: PKS requires x86_64 so there is no need to update
		process_32.c
		Make pkrs_cache static
		Remove unnecessary pkrs_cache declaration
		Clean up formatting
Changes from V2:
	Adjust for PKS enable being final patch.
Changes from V1:
	Rebase to latest tip/master
	Resolve conflicts with INIT_THREAD changes
Changes since RFC V3:
	Per Dave Hansen
		Update commit message
		move saved_pkrs to be in a nicer place
	Per Peter Zijlstra
		Add Comment from Peter
		Clean up white space
		Update authorship
---
 arch/x86/include/asm/processor.h | 47 +++++++++++++++++++++++++++++++-
 arch/x86/kernel/process.c        |  3 ++
 arch/x86/kernel/process_64.c     |  2 ++
 3 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index dc6d149bf851..e0ffb9c849c5 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -18,6 +18,7 @@ struct vm86;
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -519,6 +520,12 @@ struct thread_struct {
 	unsigned long		cr2;
 	unsigned long		trap_nr;
 	unsigned long		error_code;
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* Saved Protection key register for supervisor mappings */
+	u32			saved_pkrs;
+#endif
+
 #ifdef CONFIG_VM86
 	/* Virtual 86 mode info */
 	struct vm86		*vm86;
@@ -775,6 +782,37 @@ static inline void spin_lock_prefetch(const void *x)
 	((struct pt_regs *)__ptr) - 1;					\
 })
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void write_pkrs(u32 new_pkrs);
+
+/*
+ * Define pks_init_task and pks_sched_in as macros to avoid requiring the
+ * definition of struct task_struct in this header while keeping the
+ * supervisor pkey #ifdefery out of process.c and process_64.c
+ */
+
+/*
+ * New tasks get the most restrictive PKRS value.
+ */
+#define pks_init_task(tsk) \
+	tsk->thread.saved_pkrs = INIT_PKRS_VALUE
+
+/*
+ * PKRS is only temporarily changed during specific code paths.  Only a
+ * preemption during these windows away from the default value would
+ * require updating the MSR.  write_pkrs() handles this optimization.
+ */
+#define pks_sched_in() \
+	write_pkrs(current->thread.saved_pkrs)
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#define pks_init_task(tsk)
+#define pks_sched_in()
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #ifdef CONFIG_X86_32
 #define INIT_THREAD  {							\
 	.sp0 = TOP_OF_INIT_STACK,					\
@@ -784,7 +822,14 @@ static inline void spin_lock_prefetch(const void *x)
 #define KSTK_ESP(task)		(task_pt_regs(task)->sp)
 
 #else
-#define INIT_THREAD { }
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+#define INIT_THREAD  {						\
+	.saved_pkrs = INIT_PKRS_VALUE,				\
+}
+#else
+#define INIT_THREAD { }
+#endif
 
 extern unsigned long KSTK_ESP(struct task_struct *task);

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9c214d7085a4..89f8454a8541 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -43,6 +43,7 @@
 #include
 #include
 #include
+#include
 
 #include "process.h"
 
@@ -195,6 +196,8 @@ void flush_thread(void)
 	memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array));
 
 	fpu__clear_all(&tsk->thread.fpu);
+
+	pks_init_task(tsk);
 }
 
 void disable_TSC(void)

diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index d08307df69ad..e590ecac1650 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -632,6 +632,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	/* Load the Intel cache allocation PQR MSR. */
 	resctrl_sched_in();
 
+	pks_sched_in();
+
 	return prev_p;
 }
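
To see why the common case avoids the MSR write entirely, consider this
stand-alone mock of the scheduling flow (an editor's illustration only;
INIT_PKRS_VALUE below is a placeholder, the real default is defined in
patch 5):

	#include <stdio.h>
	#include <stdint.h>

	#define INIT_PKRS_VALUE 0x55555554u	/* placeholder default */

	static uint32_t pkrs_cache;	/* mock of the per-CPU MSR mirror */
	static unsigned int msr_writes;	/* counts simulated WRMSRs */

	static void write_pkrs(uint32_t new_pkrs)
	{
		if (pkrs_cache != new_pkrs) {	/* the optimization at work */
			pkrs_cache = new_pkrs;
			msr_writes++;
		}
	}

	struct task { uint32_t saved_pkrs; };

	/* What pks_sched_in() expands to for the incoming task. */
	static void pks_sched_in(struct task *next)
	{
		write_pkrs(next->saved_pkrs);
	}

	int main(void)
	{
		struct task a = { INIT_PKRS_VALUE }, b = { INIT_PKRS_VALUE };

		pkrs_cache = INIT_PKRS_VALUE;	/* boot-time default */

		pks_sched_in(&a);		/* no write: value unchanged */
		pks_sched_in(&b);		/* no write */
		b.saved_pkrs = 0x55555550u;	/* b opens a temporary window */
		write_pkrs(b.saved_pkrs);	/* 1st write */
		pks_sched_in(&a);		/* preempted mid-window: 2nd write */
		printf("simulated MSR writes: %u\n", msr_writes);	/* 2 */
		return 0;
	}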
From patchwork Thu Apr  1 22:58:31 2021
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 413954
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, Dave Hansen, Dan Williams, x86@kernel.org,
    linux-kernel@vger.kernel.org, Fenghua Yu, linux-doc@vger.kernel.org,
    linux-kselftest@vger.kernel.org
Subject: [PATCH V6 08/10] x86/entry: Preserve PKRS MSR across exceptions
Date: Thu,  1 Apr 2021 15:58:31 -0700
Message-Id: <20210401225833.566238-9-ira.weiny@intel.com>
In-Reply-To: <20210401225833.566238-1-ira.weiny@intel.com>
References: <20210401225833.566238-1-ira.weiny@intel.com>

From: Ira Weiny

The PKRS MSR is not managed by XSAVE.  It is preserved through a context
switch, but this support leaves exception handling code open to memory
accesses during exceptions.

Two possible places for preserving this state were considered,
irqentry_state_t or pt_regs.[1]  pt_regs was much more complicated and was
potentially fraught with unintended consequences.[2]  However, Andy came
up with a way to hide additional values on the stack which could be
accessed as "extended_pt_regs".[3]  This method allows for: any place
which has struct pt_regs can get access to the extra information; no extra
information is added to irq_state; and pt_regs is left intact for
compatibility with outside tools like BPF.

To simplify, the assembly code only adds space on the stack.  The setting
or use of any needed values is left to the C code.  While some entry
points may not use this space, it is still added wherever pt_regs is
passed to the C code, for consistency.

Each nested exception gets another copy of this extended space, allowing
for any number of levels of exception handling.

In the assembly, a macro is defined to allow a central place to add space
for other uses should the need arise.

Finally, export pkrs_{save_set|restore}_irq to the common code to allow it
to preserve the current task's PKRS in the new extended pt_regs if
enabled.

Peter, Thomas, Andy, Dave, and Dan all suggested parts of the patch or
aided in the development of the patch.

[1] https://lore.kernel.org/lkml/CALCETrVe1i5JdyzD_BcctxQJn+ZE3T38EFPgjxN1F577M36g+w@mail.gmail.com/
[2] https://lore.kernel.org/lkml/874kpxx4jf.fsf@nanos.tec.linutronix.de/#t
[3] https://lore.kernel.org/lkml/CALCETrUHwZPic89oExMMe-WyDY8-O3W68NcZvse3=PGW+iW5=w@mail.gmail.com/

Cc: Dave Hansen
Reviewed-by: Dan Williams
Suggested-by: Dave Hansen
Suggested-by: Dan Williams
Suggested-by: Peter Zijlstra
Suggested-by: Thomas Gleixner
Suggested-by: Andy Lutomirski
Signed-off-by: Ira Weiny

---
Changes from V5:
	Dave Hansen
		Remove 'we' from comments
Changes from V4:
	Fix checkpatch errors
Changes from V3:
	Fix 0-day issues
	Move all extended regs stuff to pks.h
	From Dan Williams
		Move show_extended_regs_oops ifdefery to pks.h
		Remove a bad comment
		s/irq_save_set_pkrs/pkrs_save_set_irq
		s/irq_restore_pkrs/pkrs_restore_irq
		s/ARCH_HAS/ARCH_ENABLE_SUPERVISOR_PKEYS
	From Dave Hansen:
		remove extra macro parameter for most calls
		clarify with comments
		Add BUILD check for extend regs size
		use subq/addq vs push/pop
	From Dan Williams and Dave Hansen:
		Use a macro call to wrap the c function calls with
		push/pop extended_pt_regs
	From Dave Hansen:
		Guidance on where to find each of the pt_regs being passed
		to C code
	From Thomas Gleixner:
		Remove unnecessary noinstr's
	From Andy Lutomirski:
		Convert to using the extended pt_regs
		Add in showing pks on fault through the extended pt_regs
Changes from V1:
	remove redundant irq_state->pkrs
		This value is only needed for the global tracking.
		So it should be included in that patch and not in this
		one.
Changes from RFC V3:
	Standardize on 'irq_state' variable name
	Per Dave Hansen
		irq_save_pkrs() -> irq_save_set_pkrs()
	Rebased based on clean up patch by Thomas Gleixner
		This includes moving irq_[save_set|restore]_pkrs() to
		the core as well.
---
 arch/x86/entry/calling.h               | 26 ++++++++++++
 arch/x86/entry/common.c                | 57 ++++++++++++++++++++++++++
 arch/x86/entry/entry_64.S              | 22 +++++-----
 arch/x86/entry/entry_64_compat.S       |  6 +--
 arch/x86/include/asm/pks.h             | 16 ++++++++
 arch/x86/include/asm/processor-flags.h |  2 +
 arch/x86/kernel/head_64.S              |  7 ++--
 arch/x86/mm/fault.c                    |  3 ++
 include/linux/pkeys.h                  | 17 ++++++++
 kernel/entry/common.c                  | 14 ++++++-
 10 files changed, 151 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 07a9331d55e7..ec85f8f675be 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -97,6 +97,32 @@ For 32-bit we have the following conventions - kernel is built with
 
 #define SIZEOF_PTREGS	21*8
 
+/*
+ * __call_ext_ptregs - Helper macro to call into C with extended pt_regs
+ * @cfunc: C function to be called
+ *
+ * This will ensure that extended_ptregs is added and removed as needed
+ * during a call into C code.
+ */
+.macro __call_ext_ptregs cfunc annotate_retpoline_safe:req
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* add space for extended_pt_regs */
+	subq	$EXTENDED_PT_REGS_SIZE, %rsp
+#endif
+	.if \annotate_retpoline_safe == 1
+		ANNOTATE_RETPOLINE_SAFE
+	.endif
+	call	\cfunc
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+	/* remove space for extended_pt_regs */
+	addq	$EXTENDED_PT_REGS_SIZE, %rsp
+#endif
+.endm
+
+.macro call_ext_ptregs cfunc
+	__call_ext_ptregs \cfunc, annotate_retpoline_safe=0
+.endm
+
 .macro PUSH_AND_CLEAR_REGS rdx=%rdx rax=%rax save_ret=0
 	.if \save_ret
 	pushq	%rsi		/* pt_regs->si */

diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 4efd39aacb9f..239f01d710c5 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -19,6 +19,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_XEN_PV
 #include
@@ -34,6 +35,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_X86_64
 __visible noinstr void do_syscall_64(unsigned long nr, struct pt_regs *regs)
@@ -214,6 +216,59 @@ SYSCALL_DEFINE0(ni_syscall)
 	return -ENOSYS;
 }
 
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void show_extended_regs_oops(struct pt_regs *regs, unsigned long error_code)
+{
+	struct extended_pt_regs *ept_regs = extended_pt_regs(regs);
+
+	if (cpu_feature_enabled(X86_FEATURE_PKS) && (error_code & X86_PF_PK))
+		pr_alert("PKRS: 0x%x\n", ept_regs->thread_pkrs);
+}
+
+/*
+ * PKRS is a per-logical-processor MSR which overlays additional protection
+ * for pages which have been mapped with a protection key.
+ *
+ * The register is not maintained with XSAVE, so the MSR value is maintained
+ * in software during context switch and exception handling.
+ *
+ * Context switches save the MSR in the task struct thus taking that value
+ * to other processors if necessary.
+ *
+ * To protect against exceptions having access to this memory save the
+ * current running value and set the PKRS value to be used during the
+ * exception.
+ */
+void pkrs_save_set_irq(struct pt_regs *regs, u32 val)
+{
+	struct extended_pt_regs *ept_regs;
+
+	BUILD_BUG_ON(sizeof(struct extended_pt_regs)
+		     != EXTENDED_PT_REGS_SIZE + sizeof(struct pt_regs));
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	ept_regs = extended_pt_regs(regs);
+	ept_regs->thread_pkrs = current->thread.saved_pkrs;
+	write_pkrs(val);
+}
+
+void pkrs_restore_irq(struct pt_regs *regs)
+{
+	struct extended_pt_regs *ept_regs;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	ept_regs = extended_pt_regs(regs);
+	write_pkrs(ept_regs->thread_pkrs);
+	current->thread.saved_pkrs = ept_regs->thread_pkrs;
+}
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #ifdef CONFIG_XEN_PV
 #ifndef CONFIG_PREEMPTION
 /*
@@ -270,6 +325,8 @@ __visible noinstr void xen_pv_evtchn_do_upcall(struct pt_regs *regs)
 
 	inhcall = get_and_clear_inhcall();
 	if (inhcall && !WARN_ON_ONCE(state.exit_rcu)) {
+		/* Normally called by irqentry_exit, restore pkrs here */
+		pkrs_restore_irq(regs);
 		instrumentation_begin();
 		irqentry_exit_cond_resched();
 		instrumentation_end();

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 400908dff42e..d65952a18ad7 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -331,7 +331,7 @@ SYM_CODE_END(ret_from_fork)
 		movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
 	.endif
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	error_return
 .endm
@@ -434,7 +434,7 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	paranoid_exit
 
@@ -495,7 +495,7 @@ SYM_CODE_START(\asmsym)
 	 * stack.
	 */
 	movq	%rsp, %rdi		/* pt_regs pointer */
-	call	vc_switch_off_ist
+	call_ext_ptregs vc_switch_off_ist
 	movq	%rax, %rsp		/* Switch to new stack */
 
 	UNWIND_HINT_REGS
@@ -506,7 +506,7 @@ SYM_CODE_START(\asmsym)
 
 	movq	%rsp, %rdi		/* pt_regs pointer */
 
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	/*
 	 * No need to switch back to the IST stack. The current stack is either
@@ -541,7 +541,7 @@ SYM_CODE_START(\asmsym)
 	movq	%rsp, %rdi		/* pt_regs pointer into first argument */
 	movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
 	movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
-	call	\cfunc
+	call_ext_ptregs \cfunc
 
 	jmp	paranoid_exit
 
@@ -780,7 +780,7 @@ SYM_CODE_START_LOCAL(exc_xen_hypervisor_callback)
 	movq	%rdi, %rsp			/* we don't return, adjust the stack frame */
 	UNWIND_HINT_REGS
 
-	call	xen_pv_evtchn_do_upcall
+	call_ext_ptregs xen_pv_evtchn_do_upcall
 
 	jmp	error_return
 SYM_CODE_END(exc_xen_hypervisor_callback)
@@ -986,7 +986,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	/* Put us onto the real thread stack. */
 	popq	%r12				/* save return addr in %12 */
 	movq	%rsp, %rdi			/* arg0 = pt_regs pointer */
-	call	sync_regs
+	call_ext_ptregs sync_regs
 	movq	%rax, %rsp			/* switch stack */
 	ENCODE_FRAME_POINTER
 	pushq	%r12
@@ -1041,7 +1041,7 @@ SYM_CODE_START_LOCAL(error_entry)
 	 * as if we faulted immediately after IRET.
 	 */
 	mov	%rsp, %rdi
-	call	fixup_bad_iret
+	call_ext_ptregs fixup_bad_iret
 	mov	%rax, %rsp
 	jmp	.Lerror_entry_from_usermode_after_swapgs
SYM_CODE_END(error_entry)
@@ -1147,7 +1147,7 @@ SYM_CODE_START(asm_exc_nmi)
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-	call	exc_nmi
+	call_ext_ptregs exc_nmi
 
 	/*
	 * Return back to user mode. We must *not* do the normal exit
@@ -1183,6 +1183,8 @@ SYM_CODE_START(asm_exc_nmi)
 	 * +---------------------------------------------------------+
 	 * | pt_regs                                                 |
 	 * +---------------------------------------------------------+
+	 * | (Optionally) extended_pt_regs                           |
+	 * +---------------------------------------------------------+
 	 *
 	 * The "original" frame is used by hardware. Before re-enabling
 	 * NMIs, we need to be done with it, and we need to leave enough
@@ -1359,7 +1361,7 @@ end_repeat_nmi:
 
 	movq	%rsp, %rdi
 	movq	$-1, %rsi
-	call	exc_nmi
+	call_ext_ptregs exc_nmi
 
 	/* Always restore stashed CR3 value (see paranoid_entry) */
 	RESTORE_CR3 scratch_reg=%r15 save_reg=%r14

diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index 0051cf5c792d..53254d29d5c7 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -136,7 +136,7 @@ SYM_INNER_LABEL(entry_SYSENTER_compat_after_hwframe, SYM_L_GLOBAL)
 .Lsysenter_flags_fixed:
 
 	movq	%rsp, %rdi
-	call	do_SYSENTER_32
+	call_ext_ptregs do_SYSENTER_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -253,7 +253,7 @@ SYM_INNER_LABEL(entry_SYSCALL_compat_after_hwframe, SYM_L_GLOBAL)
 	UNWIND_HINT_REGS
 
 	movq	%rsp, %rdi
-	call	do_fast_syscall_32
+	call_ext_ptregs do_fast_syscall_32
 	/* XEN PV guests always use IRET path */
 	ALTERNATIVE "testl %eax, %eax; jz swapgs_restore_regs_and_return_to_usermode", \
 		    "jmp swapgs_restore_regs_and_return_to_usermode", X86_FEATURE_XENPV
@@ -410,6 +410,6 @@ SYM_CODE_START(entry_INT80_compat)
 	cld
 
 	movq	%rsp, %rdi
-	call	do_int80_syscall_32
+	call_ext_ptregs do_int80_syscall_32
 	jmp	swapgs_restore_regs_and_return_to_usermode
SYM_CODE_END(entry_INT80_compat)

diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h
index 5d7067ada8fb..516d82f032b6 100644
--- a/arch/x86/include/asm/pks.h
+++ b/arch/x86/include/asm/pks.h
@@ -4,11 +4,27 @@
 
 #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
 
+struct extended_pt_regs {
+	u32 thread_pkrs;
+	/* Keep stack 8 byte aligned */
+	u32 pad;
+	struct pt_regs pt_regs;
+};
+
 void setup_pks(void);
 
+static inline struct extended_pt_regs *extended_pt_regs(struct pt_regs *regs)
+{
+	return container_of(regs, struct extended_pt_regs, pt_regs);
+}
+
+void show_extended_regs_oops(struct pt_regs *regs, unsigned long error_code);
+
 #else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
 
 static inline void setup_pks(void) { }
+static inline void show_extended_regs_oops(struct pt_regs *regs,
+					   unsigned long error_code) { }
 
 #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */

diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h
index 02c2cbda4a74..4a41fc4cf028 100644
--- a/arch/x86/include/asm/processor-flags.h
+++ b/arch/x86/include/asm/processor-flags.h
@@ -53,4 +53,6 @@
 # define X86_CR3_PTI_PCID_USER_BIT	11
 #endif
 
+#define EXTENDED_PT_REGS_SIZE	8
+
 #endif /* _ASM_X86_PROCESSOR_FLAGS_H */

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 04bddaaba8e2..80531526b0d2 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -319,8 +319,7 @@ SYM_CODE_START_NOALIGN(vc_boot_ghcb)
 	movq	%rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
 	movq	initial_vc_handler(%rip), %rax
-	ANNOTATE_RETPOLINE_SAFE
-	call	*%rax
+	__call_ext_ptregs *%rax, annotate_retpoline_safe=1
 
 	/* Unwind pt_regs */
 	POP_REGS
@@ -397,7 +396,7 @@ SYM_CODE_START_LOCAL(early_idt_handler_common)
 	UNWIND_HINT_REGS
 	movq	%rsp,%rdi	/* RDI = pt_regs; RSI is already trapnr */
-	call	do_early_exception
+	call_ext_ptregs do_early_exception
 
 	decl	early_recursion_flag(%rip)
 	jmp	restore_regs_and_return_to_kernel
@@ -421,7 +420,7 @@ SYM_CODE_START_NOALIGN(vc_no_ghcb)
 	/* Call C handler */
 	movq	%rsp, %rdi
 	movq	ORIG_RAX(%rsp), %rsi
-	call	do_vc_no_ghcb
+	call_ext_ptregs do_vc_no_ghcb
 
 	/* Unwind pt_regs */
 	POP_REGS

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 0e0e90968f57..f694fbf9dcb8 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -32,6 +32,7 @@
 #include		/* VMALLOC_START, ... */
 #include		/* kvm_handle_async_pf */
 #include		/* fixup_vdso_exception() */
+#include
 
 #define CREATE_TRACE_POINTS
 #include
@@ -547,6 +548,8 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 (error_code & X86_PF_PK)    ? "protection keys violation" :
 					       "permissions violation");
 
+	show_extended_regs_oops(regs, error_code);
+
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;

diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba976048..a3d17a8e4e81 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -50,4 +50,21 @@ static inline void copy_init_pkru_to_fpregs(void)
 
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
+
+#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS
+
+void pkrs_save_set_irq(struct pt_regs *regs, u32 val);
+void pkrs_restore_irq(struct pt_regs *regs);
+
+#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
+#ifndef INIT_PKRS_VALUE
+#define INIT_PKRS_VALUE 0
+#endif
+
+static inline void pkrs_save_set_irq(struct pt_regs *regs, u32 val) { }
+static inline void pkrs_restore_irq(struct pt_regs *regs) { }
+
+#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */
+
 #endif /* _LINUX_PKEYS_H */

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index 8442e5c9cfa2..b50bcc2d3ea5 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -5,6 +5,7 @@
 #include
 #include
 #include
+#include
 
 #include "common.h"
 
@@ -363,7 +364,7 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 		instrumentation_end();
 
 		ret.exit_rcu = true;
-		return ret;
+		goto done;
 	}
 
 	/*
@@ -378,6 +379,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
 	trace_hardirqs_off_finish();
 	instrumentation_end();
 
+done:
+	pkrs_save_set_irq(regs, INIT_PKRS_VALUE);
 	return ret;
 }
 
@@ -403,7 +406,12 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
 	/* Check whether this returns to user mode */
 	if (user_mode(regs)) {
 		irqentry_exit_to_user_mode(regs);
-	} else if (!regs_irqs_disabled(regs)) {
+		return;
+	}
+
+	pkrs_restore_irq(regs);
+
+	if (!regs_irqs_disabled(regs)) {
 		/*
 		 * If RCU was not watching on entry this needs to be done
 		 * carefully and needs the same ordering of lockdep/tracing
@@ -457,11 +465,13 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
 	ftrace_nmi_enter();
 	instrumentation_end();
 
+	pkrs_save_set_irq(regs, INIT_PKRS_VALUE);
 	return irq_state;
 }
 
 void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
 {
+	pkrs_restore_irq(regs);
 	instrumentation_begin();
 	ftrace_nmi_exit();
 	if (irq_state.lockdep) {
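
As a closing illustration of the extended_pt_regs trick (an editor's
stand-alone mock, not part of the patch), reserving a word directly
alongside pt_regs and recovering it with container_of looks like this in
user space:

	#include <stddef.h>
	#include <stdio.h>
	#include <stdint.h>
	#include <string.h>

	#define container_of(ptr, type, member) \
		((type *)((char *)(ptr) - offsetof(type, member)))

	/* Stand-in for the kernel's pt_regs; 21 slots as in SIZEOF_PTREGS. */
	struct pt_regs { uint64_t regs[21]; };

	/* Layout mirrors the patch's struct in asm/pks.h. */
	struct extended_pt_regs {
		uint32_t thread_pkrs;
		uint32_t pad;			/* keep stack 8 byte aligned */
		struct pt_regs pt_regs;
	};

	/* A handler that only sees pt_regs can still reach the extra word. */
	static void handler(struct pt_regs *regs)
	{
		struct extended_pt_regs *ept =
			container_of(regs, struct extended_pt_regs, pt_regs);

		printf("saved pkrs: 0x%x\n", ept->thread_pkrs);
	}

	int main(void)
	{
		/* The call_ext_ptregs macro effectively builds this frame. */
		struct extended_pt_regs frame;

		memset(&frame, 0, sizeof(frame));
		frame.thread_pkrs = 0x1234;	/* arbitrary illustrative value */
		handler(&frame.pt_regs);	/* prints: saved pkrs: 0x1234 */
		return 0;
	}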