From patchwork Mon Mar 22 05:30:11 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406342 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 18A18C433DB for ; Mon, 22 Mar 2021 05:31:21 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D149A6196B for ; Mon, 22 Mar 2021 05:31:20 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229951AbhCVFan (ORCPT ); Mon, 22 Mar 2021 01:30:43 -0400 Received: from mga09.intel.com ([134.134.136.24]:18626 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229840AbhCVFab (ORCPT ); Mon, 22 Mar 2021 01:30:31 -0400 IronPort-SDR: uPTTWwtYrRLZR2DYALEoF9YI+M/89pd1CDCFsShyr+8cDyYqh0hCbq1tIHNPJLsK03iDYmYdP7 6JABSQEvsRZA== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298137" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298137" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:30 -0700 IronPort-SDR: 7G1YiyELvMrY9ztDPPU6Q4Y4kN8qGNAh1krxTM6ooi7Ab3JmVBS20Gjr4rE0UOPjQG7UKoCgCf ld7UeFPwVEDw== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238724" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:30 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, Fenghua Yu , linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 01/10] x86/pkeys: Create pkeys_common.h Date: Sun, 21 Mar 2021 22:30:11 -0700 Message-Id: <20210322053020.2287058-2-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protection Keys User (PKU) and Protection Keys Supervisor (PKS) work in similar fashions and can share common defines. Specifically PKS and PKU each have: 1. A single control register 2. The same number of keys 3. The same number of bits in the register per key 4. Access and Write disable in the same bit locations Given the above, share all the macros that synthesize and manipulate register values between the two features. Unlike PKU the PKS definitions are needed in both pgtable.h and pkeys.h. Create a common header for those 2 headers to share. The alternative, including pgtable.h in pkeys.h, triggers complex header dependencies. Share these defines by moving them into a new header, change their names to reflect the common use, and include the header where needed. Reviewed-by: Dan Williams Signed-off-by: Ira Weiny --- NOTE: The initialization of init_pkru_value cause checkpatch errors because of the space after the '(' in the macros. We leave this as is because it is more readable in this format. And it was existing code. --- Changes from V3: From Dan Williams Fix guard macro names Reword commit message. Changes from RFC V3 Per Dave Hansen Update commit message Add comment to PKR_AD_KEY macro --- arch/x86/include/asm/pgtable.h | 13 ++++++------- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/include/asm/pkeys_common.h | 15 +++++++++++++++ arch/x86/kernel/fpu/xstate.c | 8 ++++---- arch/x86/mm/pkeys.c | 14 ++++++-------- 5 files changed, 33 insertions(+), 19 deletions(-) create mode 100644 arch/x86/include/asm/pkeys_common.h diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index a02c67291cfc..bfbfb951fe65 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1360,9 +1360,7 @@ static inline pmd_t pmd_swp_clear_uffd_wp(pmd_t pmd) } #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */ -#define PKRU_AD_BIT 0x1 -#define PKRU_WD_BIT 0x2 -#define PKRU_BITS_PER_PKEY 2 +#include #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS extern u32 init_pkru_value; @@ -1372,18 +1370,19 @@ extern u32 init_pkru_value; static inline bool __pkru_allows_read(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; - return !(pkru & (PKRU_AD_BIT << pkru_pkey_bits)); + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; + + return !(pkru & (PKR_AD_BIT << pkru_pkey_bits)); } static inline bool __pkru_allows_write(u32 pkru, u16 pkey) { - int pkru_pkey_bits = pkey * PKRU_BITS_PER_PKEY; + int pkru_pkey_bits = pkey * PKR_BITS_PER_PKEY; /* * Access-disable disables writes too so we need to check * both bits here. */ - return !(pkru & ((PKRU_AD_BIT|PKRU_WD_BIT) << pkru_pkey_bits)); + return !(pkru & ((PKR_AD_BIT|PKR_WD_BIT) << pkru_pkey_bits)); } static inline u16 pte_flags_pkey(unsigned long pte_flags) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 2ff9b98812b7..f9feba80894b 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PKEYS_H #define _ASM_X86_PKEYS_H +#include + #define ARCH_DEFAULT_PKEY 0 /* diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h new file mode 100644 index 000000000000..e40b0ced733f --- /dev/null +++ b/arch/x86/include/asm/pkeys_common.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PKEYS_COMMON_H +#define _ASM_X86_PKEYS_COMMON_H + +#define PKR_AD_BIT 0x1 +#define PKR_WD_BIT 0x2 +#define PKR_BITS_PER_PKEY 2 + +/* + * Generate an Access-Disable mask for the given pkey. Several of these can be + * OR'd together to generate pkey register values. + */ +#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) + +#endif /*_ASM_X86_PKEYS_COMMON_H */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index 683749b80ae2..face29dab0e3 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -995,7 +995,7 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { u32 old_pkru; - int pkey_shift = (pkey * PKRU_BITS_PER_PKEY); + int pkey_shift = (pkey * PKR_BITS_PER_PKEY); u32 new_pkru_bits = 0; /* @@ -1014,16 +1014,16 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, /* Set the bits we need in PKRU: */ if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKRU_AD_BIT; + new_pkru_bits |= PKR_AD_BIT; if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKRU_WD_BIT; + new_pkru_bits |= PKR_WD_BIT; /* Shift the bits in to the correct place in PKRU for pkey: */ new_pkru_bits <<= pkey_shift; /* Get old PKRU and mask off any old bits in place: */ old_pkru = read_pkru(); - old_pkru &= ~((PKRU_AD_BIT|PKRU_WD_BIT) << pkey_shift); + old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); /* Write old part along with new part: */ write_pkru(old_pkru | new_pkru_bits); diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 8873ed1438a9..f5efb4007e74 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -111,19 +111,17 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey return vma_pkey(vma); } -#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY)) - /* * Make the default PKRU value (at execve() time) as restrictive * as possible. This ensures that any threads clone()'d early * in the process's lifetime will not accidentally get access * to data which is pkey-protected later on. */ -u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) | - PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) | - PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) | - PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) | - PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15); +u32 init_pkru_value = PKR_AD_KEY( 1) | PKR_AD_KEY( 2) | PKR_AD_KEY( 3) | + PKR_AD_KEY( 4) | PKR_AD_KEY( 5) | PKR_AD_KEY( 6) | + PKR_AD_KEY( 7) | PKR_AD_KEY( 8) | PKR_AD_KEY( 9) | + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15); /* * Called from the FPU code when creating a fresh set of FPU @@ -173,7 +171,7 @@ static ssize_t init_pkru_write_file(struct file *file, * up immediately if someone attempts to disable access * or writes to pkey 0. */ - if (new_init_pkru & (PKRU_AD_BIT|PKRU_WD_BIT)) + if (new_init_pkru & (PKR_AD_BIT|PKR_WD_BIT)) return -EINVAL; WRITE_ONCE(init_pkru_value, new_init_pkru); From patchwork Mon Mar 22 05:30:12 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 15EC7C433E4 for ; Mon, 22 Mar 2021 05:31:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E2EEF6196B for ; Mon, 22 Mar 2021 05:31:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230010AbhCVFan (ORCPT ); Mon, 22 Mar 2021 01:30:43 -0400 Received: from mga09.intel.com ([134.134.136.24]:18629 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229904AbhCVFac (ORCPT ); Mon, 22 Mar 2021 01:30:32 -0400 IronPort-SDR: WVNFDs9dXluhrNRc+GC/zG2DhJWEB4C1jZHFkehdP0p/VgRzSArTwaFKGJEhLhFhTOfwVaMPeZ kITzxvx3rnPQ== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298141" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298141" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:32 -0700 IronPort-SDR: L5JJAP7dflOGzcIGhap+STJa69Dp82VrMouOOAjpxcHi3Qi2r0DhtSJu2D11B82V296Zs0ZGan 339t3hrSqsew== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238734" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:31 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, Fenghua Yu , linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 02/10] x86/fpu: Refactor arch_set_user_pkey_access() for PKS support Date: Sun, 21 Mar 2021 22:30:12 -0700 Message-Id: <20210322053020.2287058-3-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Define a helper, update_pkey_val(), which will be used to support both Protection Key User (PKU) and the new Protection Key for Supervisor (PKS) in subsequent patches. Reviewed-by: Dan Williams Co-developed-by: Peter Zijlstra Signed-off-by: Peter Zijlstra Signed-off-by: Ira Weiny --- Changes from RFC V3: Per Dave Hansen Update and add comments per Dave's review Per Peter Correct attribution --- arch/x86/include/asm/pkeys.h | 2 ++ arch/x86/kernel/fpu/xstate.c | 22 ++++------------------ arch/x86/mm/pkeys.c | 23 +++++++++++++++++++++++ 3 files changed, 29 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index f9feba80894b..4526245b03e5 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -136,4 +136,6 @@ static inline int vma_pkey(struct vm_area_struct *vma) return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT; } +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags); + #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index face29dab0e3..00251bdf759b 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -994,9 +994,7 @@ const void *get_xsave_field_ptr(int xfeature_nr) int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val) { - u32 old_pkru; - int pkey_shift = (pkey * PKR_BITS_PER_PKEY); - u32 new_pkru_bits = 0; + u32 pkru; /* * This check implies XSAVE support. OSPKE only gets @@ -1012,21 +1010,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, */ WARN_ON_ONCE(pkey >= arch_max_pkey()); - /* Set the bits we need in PKRU: */ - if (init_val & PKEY_DISABLE_ACCESS) - new_pkru_bits |= PKR_AD_BIT; - if (init_val & PKEY_DISABLE_WRITE) - new_pkru_bits |= PKR_WD_BIT; - - /* Shift the bits in to the correct place in PKRU for pkey: */ - new_pkru_bits <<= pkey_shift; - - /* Get old PKRU and mask off any old bits in place: */ - old_pkru = read_pkru(); - old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift); - - /* Write old part along with new part: */ - write_pkru(old_pkru | new_pkru_bits); + pkru = read_pkru(); + pkru = update_pkey_val(pkru, pkey, init_val); + write_pkru(pkru); return 0; } diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index f5efb4007e74..d1dfe743e79f 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -208,3 +208,26 @@ static __init int setup_init_pkru(char *opt) return 1; } __setup("init_pkru=", setup_init_pkru); + +/* + * Replace disable bits for @pkey with values from @flags + * + * Kernel users use the same flags as user space: + * PKEY_DISABLE_ACCESS + * PKEY_DISABLE_WRITE + */ +u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags) +{ + int pkey_shift = pkey * PKR_BITS_PER_PKEY; + + /* Mask out old bit values */ + pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift); + + /* Or in new values */ + if (flags & PKEY_DISABLE_ACCESS) + pk_reg |= PKR_AD_BIT << pkey_shift; + if (flags & PKEY_DISABLE_WRITE) + pk_reg |= PKR_WD_BIT << pkey_shift; + + return pk_reg; +} From patchwork Mon Mar 22 05:30:14 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B8CADC433EC for ; Mon, 22 Mar 2021 05:31:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 98C7661966 for ; Mon, 22 Mar 2021 05:31:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230018AbhCVFao (ORCPT ); Mon, 22 Mar 2021 01:30:44 -0400 Received: from mga09.intel.com ([134.134.136.24]:18626 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229926AbhCVFaf (ORCPT ); Mon, 22 Mar 2021 01:30:35 -0400 IronPort-SDR: XXDNoAyd256LRlrmV0ch3MCZiqhID+XZCufTk31Rv4kdiHGEQDOZfsB3m4ebjZtDqdFQ4PwVwE t7HTHhi7ZAjg== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298143" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298143" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:35 -0700 IronPort-SDR: cQ2hdS88R4RIflBow77nGTW9TIPO3K0TsZNbDTLafGoEi4oauhlBPJdCd1gZoaIjz65JYbaMT7 LzFYexxPfcVQ== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238746" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:34 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Fenghua Yu , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 04/10] x86/pks: Add PKS defines and Kconfig options Date: Sun, 21 Mar 2021 22:30:14 -0700 Message-Id: <20210322053020.2287058-5-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protection Keys for Supervisor pages (PKS) enables fast, hardware thread specific, manipulation of permission restrictions on supervisor page mappings. It uses the same mechanism of Protection Keys as those on User mappings but applies that mechanism to supervisor mappings using a supervisor specific MSR. Kernel users can define domains of page mappings which have an extra level of protection beyond those specified in the supervisor page table entries. Define the PKS CPU feature bits. Add the Kconfig ARCH_HAS_SUPERVISOR_PKEYS to indicate to consumers that an architecture supports pkeys. Introduce ARCH_ENABLE_SUPERVISOR_PKEYS to allow kernel users to specify to the arch that they wish to use the supervisor key support if ARCH_HAS_SUPERVISOR_PKEYS is available. ARCH_ENABLE_SUPERVISOR_PKEYS remains off until the first use case sets it. Reviewed-by: Dan Williams Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- Changes from V3: From Dan Clean up commit message Add ARCH_ENABLE_SUPERVISOR_PKEYS option so we don't have the overhead of PKS unless there is a user Clean up commit message grammar Changes from V2 New patch for V3: Split this off from the enable patch to be able to create cleaner bisectability --- arch/x86/Kconfig | 1 + arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/include/uapi/asm/processor-flags.h | 2 ++ mm/Kconfig | 4 ++++ 5 files changed, 15 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2792879d398e..5e3a7c2bc342 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1870,6 +1870,7 @@ config X86_INTEL_MEMORY_PROTECTION_KEYS depends on X86_64 && (CPU_SUP_INTEL || CPU_SUP_AMD) select ARCH_USES_HIGH_VMA_FLAGS select ARCH_HAS_PKEYS + select ARCH_HAS_SUPERVISOR_PKEYS help Memory Protection Keys provides a mechanism for enforcing page-based protections, but without requiring modification of the diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cc96e26d69f7..83ed73407417 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -359,6 +359,7 @@ #define X86_FEATURE_MOVDIR64B (16*32+28) /* MOVDIR64B instruction */ #define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS instructions */ #define X86_FEATURE_SGX_LC (16*32+30) /* Software Guard Extensions Launch Control */ +#define X86_FEATURE_PKS (16*32+31) /* Protection Keys for Supervisor pages */ /* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ #define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index b7dd944dc867..fd09ae852c04 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -44,6 +44,12 @@ # define DISABLE_OSPKE (1<<(X86_FEATURE_OSPKE & 31)) #endif /* CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS */ +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +# define DISABLE_PKS 0 +#else +# define DISABLE_PKS (1<<(X86_FEATURE_PKS & 31)) +#endif + #ifdef CONFIG_X86_5LEVEL # define DISABLE_LA57 0 #else @@ -88,7 +94,7 @@ #define DISABLED_MASK14 0 #define DISABLED_MASK15 0 #define DISABLED_MASK16 (DISABLE_PKU|DISABLE_OSPKE|DISABLE_LA57|DISABLE_UMIP| \ - DISABLE_ENQCMD) + DISABLE_ENQCMD|DISABLE_PKS) #define DISABLED_MASK17 0 #define DISABLED_MASK18 0 #define DISABLED_MASK19 0 diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index bcba3c643e63..191c574b2390 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -130,6 +130,8 @@ #define X86_CR4_SMAP _BITUL(X86_CR4_SMAP_BIT) #define X86_CR4_PKE_BIT 22 /* enable Protection Keys support */ #define X86_CR4_PKE _BITUL(X86_CR4_PKE_BIT) +#define X86_CR4_PKS_BIT 24 /* enable Protection Keys for Supervisor */ +#define X86_CR4_PKS _BITUL(X86_CR4_PKS_BIT) /* * x86-64 Task Priority Register, CR8 diff --git a/mm/Kconfig b/mm/Kconfig index 24c045b24b95..c7d1fc780358 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -808,6 +808,10 @@ config ARCH_USES_HIGH_VMA_FLAGS bool config ARCH_HAS_PKEYS bool +config ARCH_HAS_SUPERVISOR_PKEYS + bool +config ARCH_ENABLE_SUPERVISOR_PKEYS + bool config PERCPU_STATS bool "Collect percpu memory statistics" From patchwork Mon Mar 22 05:30:15 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406340 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 63702C433E9 for ; Mon, 22 Mar 2021 05:31:22 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 3E1E66197A for ; Mon, 22 Mar 2021 05:31:22 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230040AbhCVFap (ORCPT ); Mon, 22 Mar 2021 01:30:45 -0400 Received: from mga09.intel.com ([134.134.136.24]:18629 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229956AbhCVFai (ORCPT ); Mon, 22 Mar 2021 01:30:38 -0400 IronPort-SDR: Gp9+Rr9yzGfozM1uamYPy7c/hP2/DqB6yJed2xbYN0DDuI5Dxb38+csOEBhe4PvtFfJgwb1DNd 5blkncZP+mdg== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298147" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298147" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:36 -0700 IronPort-SDR: rDhgwVuZx9D+75Si1AW9dPH/ADq4eJdnyQVNmXAPrRmaxfLDKnJDC+Ad/numNe9XIelhpqx3mf ig9qRqPLcAhQ== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238754" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:36 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Fenghua Yu , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 05/10] x86/pks: Add PKS setup code Date: Sun, 21 Mar 2021 22:30:15 -0700 Message-Id: <20210322053020.2287058-6-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny Protection Keys for Supervisor pages (PKS) enables fast, hardware thread specific, manipulation of permission restrictions on supervisor page mappings. It uses the same mechanism of Protection Keys as those on User mappings but applies that mechanism to supervisor mappings using a supervisor specific MSR. Add setup code and the lowest level of PKS MSR write support. The write value is cached per-cpu to avoid the overhead of the MSR write if the value has not changed. That said, it should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not serializing but still maintains ordering properties similar to WRPKRU. The current SDM section on PKRS needs updating but should be the same as that of WRPKRU. So to quote from the WRPKRU text: WRPKRU will never execute transiently. Memory accesses affected by PKRU register will not execute (even transiently) until all prior executions of WRPKRU have completed execution and updated the PKRU register. write_pkrs() contributed by Peter Zijlstra. Introduce asm/pks.h to declare setup_pks() as an internal function call. Later patches will also need this new header as a place to declare internal structures and functions. Reviewed-by: Dan Williams Co-developed-by: Peter Zijlstra Signed-off-by: Peter Zijlstra Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- Changes from V3: From Dan Williams: Update commit message Add pks.h to hold ifdefery out of *.c files s/ARCH_HAS.../SUPERVISOR_PKEYS move setup_pks to pkeys.c (remove more ifdefery) Remove 'domain' language from commit message Clarify comment in fault handler Move the removal of the WARN_ON_ONCE in the fault path to this patch. Previously it was in: [07/10] x86/fault: Report the PKRS state on fault Changes from V2 From Thomas: Make this patch last so PKS is not enabled until all the PKS mechanisms are in place. Specifically: 1) Modify setup_pks() to call write_pkrs() to properly set up the initial value when enabled. 2) Split this patch into two. 1) a precursor patch with the required defines/config options and 2) this patch which actually enables feature on CPUs which support it. Changes since RFC V3 Per Dave Hansen Update comment Add X86_FEATURE_PKS to disabled-features.h Rebase based on latest TIP tree --- arch/x86/include/asm/pks.h | 15 +++++++++++ arch/x86/kernel/cpu/common.c | 2 ++ arch/x86/mm/pkeys.c | 48 ++++++++++++++++++++++++++++++++++++ 3 files changed, 65 insertions(+) create mode 100644 arch/x86/include/asm/pks.h diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h new file mode 100644 index 000000000000..5d7067ada8fb --- /dev/null +++ b/arch/x86/include/asm/pks.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_PKS_H +#define _ASM_X86_PKS_H + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +void setup_pks(void); + +#else /* !CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +static inline void setup_pks(void) { } + +#endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +#endif /* _ASM_X86_PKS_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index ab640abe26b6..de49d0c0f4e0 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -58,6 +58,7 @@ #include #include #include +#include #include "cpu.h" @@ -1594,6 +1595,7 @@ static void identify_cpu(struct cpuinfo_x86 *c) x86_init_rdrand(c); setup_pku(c); + setup_pks(); /* * Clear/Set all flags overridden by options, need do it diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index fc8c7e2bb21b..f6a3a54b8d7d 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -229,3 +229,51 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags) return pk_reg; } + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + +static DEFINE_PER_CPU(u32, pkrs_cache); + +/* + * write_pkrs() optimizes MSR writes by maintaining a per cpu cache which can + * be checked quickly. + * + * It should also be noted that the underlying WRMSR(MSR_IA32_PKRS) is not + * serializing but still maintains ordering properties similar to WRPKRU. + * The current SDM section on PKRS needs updating but should be the same as + * that of WRPKRU. So to quote from the WRPKRU text: + * + * WRPKRU will never execute transiently. Memory accesses + * affected by PKRU register will not execute (even transiently) + * until all prior executions of WRPKRU have completed execution + * and updated the PKRU register. + */ +void write_pkrs(u32 new_pkrs) +{ + u32 *pkrs; + + if (!static_cpu_has(X86_FEATURE_PKS)) + return; + + pkrs = get_cpu_ptr(&pkrs_cache); + if (*pkrs != new_pkrs) { + *pkrs = new_pkrs; + wrmsrl(MSR_IA32_PKRS, new_pkrs); + } + put_cpu_ptr(pkrs); +} + +/* + * PKS is independent of PKU and either or both may be supported on a CPU. + * Configure PKS if the CPU supports the feature. + */ +void setup_pks(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_PKS)) + return; + + write_pkrs(INIT_PKRS_VALUE); + cr4_set_bits(X86_CR4_PKS); +} + +#endif From patchwork Mon Mar 22 05:30:17 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406338 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3024CC433E1 for ; Mon, 22 Mar 2021 05:32:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0D01E61966 for ; Mon, 22 Mar 2021 05:32:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229944AbhCVFbV (ORCPT ); Mon, 22 Mar 2021 01:31:21 -0400 Received: from mga09.intel.com ([134.134.136.24]:18626 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229455AbhCVFal (ORCPT ); Mon, 22 Mar 2021 01:30:41 -0400 IronPort-SDR: EulzqdgenwXhadwJe7Wt1T6KRAM2kU2LS8A6Iv3sPMuDuYguBDa2Rer67OKR6JqSoJ3vfBMWMl dsxF/GRATZMg== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298153" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298153" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:38 -0700 IronPort-SDR: gwDLtwMN7+Qjzz+r79uIPFMUurCkAwG/hW6+7yXqSU3u4MOpzuSqCcoz+PAezv5+DKTIi9qHfq oVZGsJQTpWAw== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238769" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:38 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Fenghua Yu , Dave Hansen , x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 07/10] x86/pks: Preserve the PKRS MSR on context switch Date: Sun, 21 Mar 2021 22:30:17 -0700 Message-Id: <20210322053020.2287058-8-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The PKRS MSR is defined as a per-logical-processor register. This isolates memory access by logical CPU. Unfortunately, the MSR is not managed by XSAVE. Therefore, tasks must save/restore the MSR value on context switch. Define a saved PKRS value in the task struct, as well as a cached per-logical-processor MSR value which mirrors the MSR value of the current CPU. Initialize all tasks with the default MSR value. Then, on schedule in, call write_pkrs() which automatically avoids the overhead of the MSR write if possible. Reviewed-by: Dan Williams Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- Changes from V3 From Dan Williams make pks_init_task() and pks_sched_in() macros To avoid Supervisor PKey '#ifdefery' in process.c and process_64.c Use ARCH_ENABLE_SUPERVISOR_PKEYS Split write_pkrs() to an earlier patch to be used in setup_pks() Move Peter's authorship to that patch. Remove kernel doc comment from write_pkrs From Thomas Gleixner Fix where pks_sched_in() is called from. Should be called from __switch_to() NOTE: PKS requires x86_64 so there is no need to update process_32.c Make pkrs_cache static Remove unnecessary pkrs_cache declaration Clean up formatting Changes from V2 Adjust for PKS enable being final patch. Changes from V1 Rebase to latest tip/master Resolve conflicts with INIT_THREAD changes Changes since RFC V3 Per Dave Hansen Update commit message move saved_pkrs to be in a nicer place Per Peter Zijlstra Add Comment from Peter Clean up white space Update authorship --- arch/x86/include/asm/msr-index.h | 1 + arch/x86/include/asm/pkeys_common.h | 14 ++++++++++ arch/x86/include/asm/processor.h | 43 ++++++++++++++++++++++++++++- arch/x86/kernel/process.c | 3 ++ arch/x86/kernel/process_64.c | 2 ++ 5 files changed, 62 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 546d6ecf0a35..c15a049bf6ac 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -765,6 +765,7 @@ #define MSR_IA32_TSC_DEADLINE 0x000006E0 +#define MSR_IA32_PKRS 0x000006E1 #define MSR_TSX_FORCE_ABORT 0x0000010F diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h index 0681522974ba..6917f1a27479 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -17,4 +17,18 @@ #define PKR_AD_KEY(pkey) (PKR_AD_BIT << PKR_PKEY_SHIFT(pkey)) #define PKR_WD_KEY(pkey) (PKR_WD_BIT << PKR_PKEY_SHIFT(pkey)) +/* + * Define a default PKRS value for each task. + * + * Key 0 has no restriction. All other keys are set to the most restrictive + * value which is access disabled (AD=1). + * + * NOTE: This needs to be a macro to be used as part of the INIT_THREAD macro. + */ +#define INIT_PKRS_VALUE (PKR_AD_KEY(1) | PKR_AD_KEY(2) | PKR_AD_KEY(3) | \ + PKR_AD_KEY(4) | PKR_AD_KEY(5) | PKR_AD_KEY(6) | \ + PKR_AD_KEY(7) | PKR_AD_KEY(8) | PKR_AD_KEY(9) | \ + PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \ + PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15)) + #endif /*_ASM_X86_PKEYS_COMMON_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index dc6d149bf851..b7ae396285dd 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -18,6 +18,7 @@ struct vm86; #include #include #include +#include #include #include #include @@ -519,6 +520,12 @@ struct thread_struct { unsigned long cr2; unsigned long trap_nr; unsigned long error_code; + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS + /* Saved Protection key register for supervisor mappings */ + u32 saved_pkrs; +#endif + #ifdef CONFIG_VM86 /* Virtual 86 mode info */ struct vm86 *vm86; @@ -784,7 +791,41 @@ static inline void spin_lock_prefetch(const void *x) #define KSTK_ESP(task) (task_pt_regs(task)->sp) #else -#define INIT_THREAD { } + +#ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS +#define INIT_THREAD_PKRS .saved_pkrs = INIT_PKRS_VALUE + +void write_pkrs(u32 new_pkrs); + +/* + * Define pks_init_task and pks_sched_in as macros to avoid requiring the + * definition of struct task_struct in this header while keeping the supervisor + * pkey #ifdefery out of process.c and process_64.c + */ + +/* + * New tasks get the most restrictive PKRS value. + */ +#define pks_init_task(tsk) \ + tsk->thread.saved_pkrs = INIT_PKRS_VALUE; + +/* + * PKRS is only temporarily changed during specific code paths. Only a + * preemption during these windows away from the default value would + * require updating the MSR. write_pkrs() handles this optimization. + */ +#define pks_sched_in() \ + write_pkrs(current->thread.saved_pkrs); + +#else +#define INIT_THREAD_PKRS 0 +#define pks_init_task(tsk) +#define pks_sched_in() +#endif + +#define INIT_THREAD { \ + INIT_THREAD_PKRS, \ +} extern unsigned long KSTK_ESP(struct task_struct *task); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 9c214d7085a4..89f8454a8541 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -43,6 +43,7 @@ #include #include #include +#include #include "process.h" @@ -195,6 +196,8 @@ void flush_thread(void) memset(tsk->thread.tls_array, 0, sizeof(tsk->thread.tls_array)); fpu__clear_all(&tsk->thread.fpu); + + pks_init_task(tsk); } void disable_TSC(void) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index d08307df69ad..e590ecac1650 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -632,6 +632,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) /* Load the Intel cache allocation PQR MSR. */ resctrl_sched_in(); + pks_sched_in(); + return prev_p; } From patchwork Mon Mar 22 05:30:20 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ira Weiny X-Patchwork-Id: 406337 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4DDC7C433E5 for ; Mon, 22 Mar 2021 05:32:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2F03B61967 for ; Mon, 22 Mar 2021 05:32:13 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229987AbhCVFbW (ORCPT ); Mon, 22 Mar 2021 01:31:22 -0400 Received: from mga09.intel.com ([134.134.136.24]:18631 "EHLO mga09.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229746AbhCVFam (ORCPT ); Mon, 22 Mar 2021 01:30:42 -0400 IronPort-SDR: /4EHhHOkK17Q0u0LBvWt+MPzCLKva89H8vpsJORAdrFJxAiVrsrGaX7J4cgZPsNTMb83Mv/uJm y1wbFv8p7GuA== X-IronPort-AV: E=McAfee;i="6000,8403,9930"; a="190298160" X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="190298160" Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by orsmga102.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:42 -0700 IronPort-SDR: fHv+yYQTx8ocgejX05n2Avw4sFqYaSmwbYl50butH0z8pg3jM29SHjzd6YtnoEWnGjsCSlY1NJ 997Kery3pQxg== X-IronPort-AV: E=Sophos;i="5.81,268,1610438400"; d="scan'208";a="607238788" Received: from iweiny-desk2.sc.intel.com (HELO localhost) ([10.3.52.147]) by fmsmga005-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 21 Mar 2021 22:30:41 -0700 From: ira.weiny@intel.com To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Andy Lutomirski , Peter Zijlstra Cc: Ira Weiny , Dan Williams , Dave Hansen , Fenghua Yu , x86@kernel.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, linux-kselftest@vger.kernel.org Subject: [PATCH V4 10/10] x86/pks: Add PKS test code Date: Sun, 21 Mar 2021 22:30:20 -0700 Message-Id: <20210322053020.2287058-11-ira.weiny@intel.com> X-Mailer: git-send-email 2.28.0.rc0.12.gb6a658bd00c9 In-Reply-To: <20210322053020.2287058-1-ira.weiny@intel.com> References: <20210322053020.2287058-1-ira.weiny@intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-kselftest@vger.kernel.org From: Ira Weiny The core PKS functionality provides an interface for kernel users to reserve keys to their domains set up the page tables with those keys and control access to those domains when needed. Define test code which exercises the core functionality of PKS via a debugfs entry. Basic checks can be triggered on boot with a kernel command line option while both basic and preemption checks can be triggered with separate debugfs values. debugfs controls are: '0' -- Run access tests with a single pkey '1' -- Set up the pkey register with no access for the pkey allocated to this fd '2' -- Check that the pkey register updated in '1' is still the same. (To be used after a forced context switch.) '3' -- Allocate all pkeys possible and run tests on each pkey allocated. DEFAULT when run at boot. Closing the fd will cleanup and release the pkey, therefore to exercise context switch testing a user space program is provided in: .../tools/testing/selftests/x86/test_pks.c Reviewed-by: Dan Williams Reviewed-by: Dave Hansen Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Ira Weiny --- Changes from V3 Add test into ARCH_ENABLE_SUPERVISOR_PKEYS Fix allocate context error handling Callback must now take pt_regs instead of irq_state Use pipes to ensure code switches contexts Add more verbose output Add --debug opt to trigger more kernel debug output Reduce kernel output by default Use #defines for the various options Add ability to chose cpu for testing Work out how to make pkrs_cache global when CONFIG_PKS_TEST=y Comments from Dan Williams: Remove walk_table in favor of follow_pte Adjust for new MASK and SHIFT macros Remove unneeded pkey.h header Handle_pks_testing -> handle_pks_test Retain static pkrs_cache when not test s/PKS_TESTING/PKS_TEST/ Put pks_test_callback declaration in pks_common.h Don't export pks_test_callback Add comment explaining context creation Remove module boilerplate Changes for V2 Fix compilation errors Changes for V1 Update for new pks_key_alloc() Changes from RFC V3 Comments from Dave Hansen clean up whitespace dmanage Clean up Kconfig help Clean up user test error output s/pks_mknoaccess/pks_mk_noaccess/ s/pks_mkread/pks_mk_readonly/ s/pks_mkrdwr/pks_mk_readwrite/ Comments from Jing Han Remove duplicate stdio.h --- Documentation/core-api/protection-keys.rst | 5 +- arch/x86/include/asm/pks.h | 19 + arch/x86/mm/fault.c | 15 + arch/x86/mm/pkeys.c | 2 +- lib/Kconfig.debug | 11 + lib/Makefile | 3 + lib/pks/Makefile | 3 + lib/pks/pks_test.c | 693 +++++++++++++++++++++ mm/Kconfig | 3 +- tools/testing/selftests/x86/Makefile | 3 +- tools/testing/selftests/x86/test_pks.c | 150 +++++ 11 files changed, 903 insertions(+), 4 deletions(-) create mode 100644 lib/pks/Makefile create mode 100644 lib/pks/pks_test.c create mode 100644 tools/testing/selftests/x86/test_pks.c diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index 6d6c4f25080c..2bcbb991231b 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -120,7 +120,8 @@ PTE adds this additional protection to the page. Kernel users intending to use PKS support should check (depend on) ARCH_HAS_SUPERVISOR_PKEYS and add their config to ARCH_ENABLE_SUPERVISOR_PKEYS -to turn on this support within the core. +to turn on this support within the core. See the test configuration option +'PKS_TEST' for an example. int pks_key_alloc(const char * const pkey_user); #define PAGE_KERNEL_PKEY(pkey) @@ -170,3 +171,5 @@ text: affected by PKRU register will not execute (even transiently) until all prior executions of WRPKRU have completed execution and updated the PKRU register. + +Example code can be found in lib/pks/pks_test.c diff --git a/arch/x86/include/asm/pks.h b/arch/x86/include/asm/pks.h index 4891c9aa8fc7..9e71322b0cf2 100644 --- a/arch/x86/include/asm/pks.h +++ b/arch/x86/include/asm/pks.h @@ -32,4 +32,23 @@ static inline void show_extended_regs_oops(struct pt_regs *regs, #endif /* CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS */ + +#ifdef CONFIG_PKS_TEST + +#define __static_or_pks_test + +bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs); +bool pks_test_callback(struct pt_regs *regs); + +#else /* !CONFIG_PKS_TEST */ + +#define __static_or_pks_test static + +static inline bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs) +{ + return false; +} + +#endif /* CONFIG_PKS_TEST */ + #endif /* _ASM_X86_PKS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 0c36ce2f6abf..764d2fbb6c72 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1134,6 +1134,19 @@ bool fault_in_kernel_space(unsigned long address) return address >= TASK_SIZE_MAX; } +#ifdef CONFIG_PKS_TEST +bool handle_pks_test(unsigned long hw_error_code, struct pt_regs *regs) +{ + /* + * If we get a protection key exception it could be because we + * are running the PKS test. If so, pks_test_callback() will + * clear the protection mechanism and return true to indicate + * the fault was handled. + */ + return (hw_error_code & X86_PF_PK) && pks_test_callback(regs); +} +#endif + /* * Called for all faults where 'address' is part of the kernel address * space. Might get called for faults that originate from *code* that @@ -1150,6 +1163,8 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, if (!cpu_feature_enabled(X86_FEATURE_PKS)) WARN_ON_ONCE(hw_error_code & X86_PF_PK); + if (handle_pks_test(hw_error_code, regs)) + return; #ifdef CONFIG_X86_32 /* diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 47d29707ac39..2dd1feeff9f6 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -236,7 +236,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags) #ifdef CONFIG_ARCH_ENABLE_SUPERVISOR_PKEYS -static DEFINE_PER_CPU(u32, pkrs_cache); +__static_or_pks_test DEFINE_PER_CPU(u32, pkrs_cache); /* * write_pkrs() optimizes MSR writes by maintaining a per cpu cache which can diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 2779c29d9981..b7728ed139f9 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -2535,6 +2535,17 @@ config HYPERV_TESTING help Select this option to enable Hyper-V vmbus testing. +config PKS_TEST + bool "PKey (S)upervisor testing" + depends on ARCH_HAS_SUPERVISOR_PKEYS + help + Select this option to enable testing of PKS core software and + hardware. The PKS core provides a mechanism to allocate keys as well + as maintain the protection settings across context switches. + Answer N if you don't know what supervisor keys are. + + If unsure, say N. + endmenu # "Kernel Testing and Coverage" source "Documentation/Kconfig" diff --git a/lib/Makefile b/lib/Makefile index b5307d3eec1a..3606b97e12e0 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -354,3 +354,6 @@ obj-$(CONFIG_BITS_TEST) += test_bits.o obj-$(CONFIG_CMDLINE_KUNIT_TEST) += cmdline_kunit.o obj-$(CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED) += devmem_is_allowed.o + +# PKS test +obj-y += pks/ diff --git a/lib/pks/Makefile b/lib/pks/Makefile new file mode 100644 index 000000000000..9daccba4f7c4 --- /dev/null +++ b/lib/pks/Makefile @@ -0,0 +1,3 @@ +# SPDX-License-Identifier: GPL-2.0 + +obj-$(CONFIG_PKS_TEST) += pks_test.o diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c new file mode 100644 index 000000000000..ca308f3ff5aa --- /dev/null +++ b/lib/pks/pks_test.c @@ -0,0 +1,693 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright(c) 2020 Intel Corporation. All rights reserved. + * + * Implement PKS testing + * Access to run this test can be with a command line parameter + * ("pks-test-on-boot") or more detailed tests can be triggered through: + * + * /sys/kernel/debug/x86/run_pks + * + * debugfs controls are: + * + * '0' -- Run access tests with a single pkey + * + * '1' -- Set up the pkey register with no access for the pkey allocated to + * this fd + * '2' -- Check that the pkey register updated in '1' is still the same. (To + * be used after a forced context switch.) + * + * '3' -- Allocate all pkeys possible and run tests on each pkey allocated. + * DEFAULT when run at boot. + * + * Closing the fd will cleanup and release the pkey. + * + * A companion user space program is provided in: + * + * .../tools/testing/selftests/x86/test_pks.c + * + * which will better test the context switching. + * + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include /* for struct pt_regs */ +#include +#include +#include + +#define PKS_TEST_MEM_SIZE (PAGE_SIZE) + +#define RUN_ALLOCATE "0" +#define ARM_CTX_SWITCH "1" +#define CHECK_CTX_SWITCH "2" +#define RUN_ALLOCATE_ALL "3" +#define RUN_ALLOCATE_DEBUG "4" +#define RUN_ALLOCATE_ALL_DEBUG "5" +#define RUN_CRASH_TEST "9" + +DECLARE_PER_CPU(u32, pkrs_cache); + +/* + * run_on_boot default '= false' which checkpatch complains about initializing; + * so we don't + */ +static bool run_on_boot; +static struct dentry *pks_test_dentry; +static bool run_9; + +/* + * We must lock the following globals for brief periods while the fault handler + * checks/updates them. + */ +static DEFINE_SPINLOCK(test_lock); +static int test_armed_key; +static unsigned long prev_cnt; +static unsigned long fault_cnt; + +struct pks_test_ctx { + bool pass; + bool pks_cpu_enabled; + bool debug; + int pkey; + char data[64]; +}; +static struct pks_test_ctx *test_exception_ctx; + +static bool check_pkey_val(u32 pk_reg, int pkey, u32 expected) +{ + pk_reg = (pk_reg & PKR_PKEY_MASK(pkey)) >> PKR_PKEY_SHIFT(pkey); + return (pk_reg == expected); +} + +/* + * Check if the register @pkey value matches @expected value + * + * Both the cached and actual MSR must match. + */ +static bool check_pkrs(int pkey, u32 expected) +{ + bool ret = true; + u64 pkrs; + u32 *tmp_cache; + + tmp_cache = get_cpu_ptr(&pkrs_cache); + if (!check_pkey_val(*tmp_cache, pkey, expected)) + ret = false; + put_cpu_ptr(tmp_cache); + + rdmsrl(MSR_IA32_PKRS, pkrs); + if (!check_pkey_val(pkrs, pkey, expected)) + ret = false; + + return ret; +} + +static void check_exception(u32 thread_pkrs) +{ + /* Check the thread saved state */ + if (!check_pkey_val(thread_pkrs, test_armed_key, PKEY_DISABLE_WRITE)) { + pr_err(" FAIL: checking ept_regs->thread_pkrs\n"); + test_exception_ctx->pass = false; + } + + /* Check the exception state */ + if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) { + pr_err(" FAIL: PKRS cache and MSR\n"); + test_exception_ctx->pass = false; + } + + /* + * Check we can update the value during exception without affecting the + * calling thread. The calling thread is checked after exception... + */ + pks_mk_readwrite(test_armed_key); + if (!check_pkrs(test_armed_key, 0)) { + pr_err(" FAIL: exception did not change register to 0\n"); + test_exception_ctx->pass = false; + } + pks_mk_noaccess(test_armed_key); + if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) { + pr_err(" FAIL: exception did not change register to 0x%x\n", + PKEY_DISABLE_ACCESS); + test_exception_ctx->pass = false; + } +} + +/** + * pks_test_callback() is exported so that the fault handler can detect + * and report back status of intentional faults. + * + * NOTE: It clears the protection key from the page such that the fault handler + * will not re-trigger. + */ +bool pks_test_callback(struct pt_regs *regs) +{ + struct extended_pt_regs *ept_regs = extended_pt_regs(regs); + bool armed = (test_armed_key != 0); + + if (test_exception_ctx) { + check_exception(ept_regs->thread_pkrs); + /* + * We stop this check within the exception because the + * fault handler clean up code will call us 2x while checking + * the PMD entry and we don't need to check this again + */ + test_exception_ctx = NULL; + } + + if (armed) { + /* Enable read and write to stop faults */ + ept_regs->thread_pkrs = update_pkey_val(ept_regs->thread_pkrs, + test_armed_key, 0); + fault_cnt++; + } + + return armed; +} + +static bool exception_caught(void) +{ + bool ret = (fault_cnt != prev_cnt); + + prev_cnt = fault_cnt; + return ret; +} + +static void report_pkey_settings(void *info) +{ + u8 pkey; + unsigned long long msr = 0; + unsigned int cpu = smp_processor_id(); + struct pks_test_ctx *ctx = info; + + rdmsrl(MSR_IA32_PKRS, msr); + + pr_info("for CPU %d : 0x%llx\n", cpu, msr); + + if (ctx->debug) { + for (pkey = 0; pkey < PKS_NUM_KEYS; pkey++) { + int ad, wd; + + ad = (msr >> PKR_PKEY_SHIFT(pkey)) & PKEY_DISABLE_ACCESS; + wd = (msr >> PKR_PKEY_SHIFT(pkey)) & PKEY_DISABLE_WRITE; + pr_info(" %u: A:%d W:%d\n", pkey, ad, wd); + } + } +} + +enum pks_access_mode { + PKS_TEST_NO_ACCESS, + PKS_TEST_RDWR, + PKS_TEST_RDONLY +}; + +static char *get_mode_str(enum pks_access_mode mode) +{ + switch (mode) { + case PKS_TEST_NO_ACCESS: + return "No Access"; + case PKS_TEST_RDWR: + return "Read Write"; + case PKS_TEST_RDONLY: + return "Read Only"; + default: + pr_err("BUG in test invalid mode\n"); + break; + } + + return ""; +} + +struct pks_access_test { + enum pks_access_mode mode; + bool write; + bool exception; +}; + +static struct pks_access_test pkey_test_ary[] = { + /* disable both */ + { PKS_TEST_NO_ACCESS, true, true }, + { PKS_TEST_NO_ACCESS, false, true }, + + /* enable both */ + { PKS_TEST_RDWR, true, false }, + { PKS_TEST_RDWR, false, false }, + + /* enable read only */ + { PKS_TEST_RDONLY, true, true }, + { PKS_TEST_RDONLY, false, false }, +}; + +static int test_it(struct pks_test_ctx *ctx, struct pks_access_test *test, void *ptr) +{ + bool exception; + int ret = 0; + + spin_lock(&test_lock); + WRITE_ONCE(test_armed_key, ctx->pkey); + + if (test->write) + memcpy(ptr, ctx->data, 8); + else + memcpy(ctx->data, ptr, 8); + + exception = exception_caught(); + + WRITE_ONCE(test_armed_key, 0); + spin_unlock(&test_lock); + + if (test->exception != exception) { + pr_err("pkey test FAILED: mode %s; write %s; exception %s != %s\n", + get_mode_str(test->mode), + test->write ? "TRUE" : "FALSE", + test->exception ? "TRUE" : "FALSE", + exception ? "TRUE" : "FALSE"); + ret = -EFAULT; + } + + return ret; +} + +static int run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + switch (test->mode) { + case PKS_TEST_NO_ACCESS: + pks_mk_noaccess(ctx->pkey); + break; + case PKS_TEST_RDWR: + pks_mk_readwrite(ctx->pkey); + break; + case PKS_TEST_RDONLY: + pks_mk_readonly(ctx->pkey); + break; + default: + pr_err("BUG in test invalid mode\n"); + break; + } + + return test_it(ctx, test, ptr); +} + +static void *alloc_test_page(int pkey) +{ + return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_END, + GFP_KERNEL, PAGE_KERNEL_PKEY(pkey), 0, + NUMA_NO_NODE, __builtin_return_address(0)); +} + +static void test_mem_access(struct pks_test_ctx *ctx) +{ + int i, rc; + u8 pkey; + void *ptr = NULL; + pte_t *ptep = NULL; + spinlock_t *ptl; + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + ctx->pass = false; + return; + } + + if (follow_pte(&init_mm, (unsigned long)ptr, &ptep, &ptl)) { + pr_err("Failed to walk table???\n"); + ctx->pass = false; + goto done; + } + + pkey = pte_flags_pkey(ptep->pte); + pr_info("ptep flags 0x%lx pkey %u\n", + (unsigned long)ptep->pte, pkey); + pte_unmap_unlock(ptep, ptl); + + if (pkey != ctx->pkey) { + pr_err("invalid pkey found: %u, test_pkey: %u\n", + pkey, ctx->pkey); + ctx->pass = false; + goto done; + } + + if (!ctx->pks_cpu_enabled) { + pr_err("not CPU enabled; skipping access tests...\n"); + ctx->pass = true; + goto done; + } + + for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + rc = run_access_test(ctx, &pkey_test_ary[i], ptr); + + /* only save last error is fine */ + if (rc) + ctx->pass = false; + } + +done: + vfree(ptr); +} + +static void pks_run_test(struct pks_test_ctx *ctx) +{ + ctx->pass = true; + + pr_info("\n"); + pr_info("\n"); + pr_info(" ***** BEGIN: Testing (CPU enabled : %s) *****\n", + ctx->pks_cpu_enabled ? "TRUE" : "FALSE"); + + if (ctx->pks_cpu_enabled) + on_each_cpu(report_pkey_settings, ctx, 1); + + pr_info(" BEGIN: pkey %d Testing\n", ctx->pkey); + test_mem_access(ctx); + pr_info(" END: PAGE_KERNEL_PKEY Testing : %s\n", + ctx->pass ? "PASS" : "FAIL"); + + pr_info(" ***** END: Testing *****\n"); + pr_info("\n"); + pr_info("\n"); +} + +static ssize_t pks_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct pks_test_ctx *ctx = file->private_data; + char buf[32]; + unsigned int len; + + if (!ctx) + len = sprintf(buf, "not run\n"); + else + len = sprintf(buf, "%s\n", ctx->pass ? "PASS" : "FAIL"); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static struct pks_test_ctx *alloc_ctx(const char *name) +{ + struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + + if (!ctx) { + pr_err("Failed to allocate memory for test context\n"); + return ERR_PTR(-ENOMEM); + } + + ctx->pkey = pks_key_alloc(name); + if (ctx->pkey <= 0) { + pr_err("Failed to allocate memory for test context\n"); + kfree(ctx); + return ERR_PTR(-ENOMEM); + } + + ctx->pks_cpu_enabled = cpu_feature_enabled(X86_FEATURE_PKS); + sprintf(ctx->data, "%s", "DEADBEEF"); + return ctx; +} + +static void free_ctx(struct pks_test_ctx *ctx) +{ + pks_key_free(ctx->pkey); + kfree(ctx); +} + +static void run_exception_test(void) +{ + void *ptr = NULL; + bool pass = true; + struct pks_test_ctx *ctx; + + pr_info(" ***** BEGIN: exception checking\n"); + + ctx = alloc_ctx("Exception test"); + if (IS_ERR(ctx)) { + pr_err(" FAIL: no context\n"); + pass = false; + goto result; + } + ctx->pass = true; + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err(" FAIL: no vmalloc page\n"); + pass = false; + goto free_context; + } + + pks_mk_readonly(ctx->pkey); + + spin_lock(&test_lock); + WRITE_ONCE(test_exception_ctx, ctx); + WRITE_ONCE(test_armed_key, ctx->pkey); + + memcpy(ptr, ctx->data, 8); + + if (!exception_caught()) { + pr_err(" FAIL: did not get an exception\n"); + pass = false; + } + + /* + * NOTE The exception code has to enable access (b00) to keep the + * fault from looping forever. So we don't see the write disabled + * restored but rather full access restored. Also note that as part + * of this test the exception callback attempted to disable access + * completely (b11) and so we ensure that we are seeing the proper + * thread value restored here. + */ + if (!check_pkrs(test_armed_key, 0)) { + pr_err(" FAIL: PKRS not restored\n"); + pass = false; + } + + if (!ctx->pass) + pass = false; + + WRITE_ONCE(test_armed_key, 0); + spin_unlock(&test_lock); + + vfree(ptr); +free_context: + free_ctx(ctx); +result: + pr_info(" ***** END: exception checking : %s\n", + pass ? "PASS" : "FAIL"); +} + +static void run_all(bool debug) +{ + struct pks_test_ctx *ctx[PKS_NUM_KEYS]; + static char name[PKS_NUM_KEYS][64]; + int i; + + for (i = 1; i < PKS_NUM_KEYS; i++) { + sprintf(name[i], "pks ctx %d", i); + ctx[i] = alloc_ctx((const char *)name[i]); + if (!IS_ERR(ctx[i])) + ctx[i]->debug = debug; + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + pks_run_test(ctx[i]); + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + free_ctx(ctx[i]); + } + + run_exception_test(); +} + +static void crash_it(void) +{ + struct pks_test_ctx *ctx; + void *ptr; + + pr_warn(" ***** BEGIN: Unhandled fault test *****\n"); + + ctx = alloc_ctx("crashing kernel\n"); + if (IS_ERR(ctx)) { + pr_err("Failed to allocate context???\n"); + return; + } + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + ctx->pass = false; + return; + } + + pks_mk_noaccess(ctx->pkey); + + spin_lock(&test_lock); + WRITE_ONCE(test_armed_key, 0); + /* This purposely faults */ + memcpy(ptr, ctx->data, 8); + spin_unlock(&test_lock); + + vfree(ptr); + free_ctx(ctx); +} + +static ssize_t pks_write_file(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + char buf[2]; + struct pks_test_ctx *ctx = file->private_data; + + if (copy_from_user(buf, user_buf, 1)) + return -EFAULT; + buf[1] = '\0'; + + /* + * WARNING: Test "9" will crash the kernel. + * + * So we arm the test and print a warning. A second "9" will run the + * test. + */ + if (!strcmp(buf, RUN_CRASH_TEST)) { + if (run_9) { + crash_it(); + run_9 = false; + } else { + pr_warn("CAUTION: Test 9 will crash in the kernel.\n"); + pr_warn(" Specify 9 a second time to run\n"); + pr_warn(" run any other test to clear\n"); + run_9 = true; + } + } else { + run_9 = false; + } + + /* + * Test "3" will test allocating all keys. Do it first without + * using "ctx". + */ + if (!strcmp(buf, RUN_ALLOCATE_ALL)) + run_all(false); + if (!strcmp(buf, RUN_ALLOCATE_ALL_DEBUG)) + run_all(true); + + /* + * This context is only required if the file is held open for the below + * tests. Otherwise the context just get's freed in pks_release_file. + */ + if (!ctx) { + ctx = alloc_ctx("pks test"); + if (IS_ERR(ctx)) + return -ENOMEM; + file->private_data = ctx; + } + + if (!strcmp(buf, RUN_ALLOCATE)) { + ctx->debug = false; + pks_run_test(ctx); + } + if (!strcmp(buf, RUN_ALLOCATE_DEBUG)) { + ctx->debug = true; + pks_run_test(ctx); + } + + /* start of context switch test */ + if (!strcmp(buf, ARM_CTX_SWITCH)) { + unsigned long reg_pkrs; + int access; + + /* Ensure a known state to test context switch */ + pks_mk_readwrite(ctx->pkey); + + rdmsrl(MSR_IA32_PKRS, reg_pkrs); + + access = (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) & + PKEY_ACCESS_MASK; + pr_info("Context switch armed : pkey %d: 0x%x reg: 0x%lx\n", + ctx->pkey, access, reg_pkrs); + } + + /* After context switch msr should be restored */ + if (!strcmp(buf, CHECK_CTX_SWITCH) && ctx->pks_cpu_enabled) { + unsigned long reg_pkrs; + int access; + + rdmsrl(MSR_IA32_PKRS, reg_pkrs); + + access = (reg_pkrs >> PKR_PKEY_SHIFT(ctx->pkey)) & + PKEY_ACCESS_MASK; + if (access != 0) { + ctx->pass = false; + pr_err("Context switch check failed: pkey %d: 0x%x reg: 0x%lx\n", + ctx->pkey, access, reg_pkrs); + } else { + pr_err("Context switch check passed: pkey %d: 0x%x reg: 0x%lx\n", + ctx->pkey, access, reg_pkrs); + } + } + + return count; +} + +static int pks_release_file(struct inode *inode, struct file *file) +{ + struct pks_test_ctx *ctx = file->private_data; + + if (!ctx) + return 0; + + free_ctx(ctx); + return 0; +} + +static const struct file_operations fops_init_pks = { + .read = pks_read_file, + .write = pks_write_file, + .llseek = default_llseek, + .release = pks_release_file, +}; + +static int __init parse_pks_test_options(char *str) +{ + run_on_boot = true; + + return 0; +} +early_param("pks-test-on-boot", parse_pks_test_options); + +static int __init pks_test_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_PKS)) { + if (run_on_boot) + run_all(true); + + pks_test_dentry = debugfs_create_file("run_pks", 0600, arch_debugfs_dir, + NULL, &fops_init_pks); + } + + return 0; +} +late_initcall(pks_test_init); + +static void __exit pks_test_exit(void) +{ + debugfs_remove(pks_test_dentry); + pr_info("test exit\n"); +} diff --git a/mm/Kconfig b/mm/Kconfig index c7d1fc780358..463e95ea0df1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -811,7 +811,8 @@ config ARCH_HAS_PKEYS config ARCH_HAS_SUPERVISOR_PKEYS bool config ARCH_ENABLE_SUPERVISOR_PKEYS - bool + def_bool y + depends on PKS_TEST config PERCPU_STATS bool "Collect percpu memory statistics" diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile index 333980375bc7..32fe0414c6af 100644 --- a/tools/testing/selftests/x86/Makefile +++ b/tools/testing/selftests/x86/Makefile @@ -13,7 +13,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie) TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \ check_initial_reg_state sigreturn iopl ioperm \ test_vsyscall mov_ss_trap \ - syscall_arg_fault fsgsbase_restore + syscall_arg_fault fsgsbase_restore test_pks + TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \ test_FCMOV test_FCOMI test_FISTTP \ vdso_restorer diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c new file mode 100644 index 000000000000..62146cd59eb5 --- /dev/null +++ b/tools/testing/selftests/x86/test_pks.c @@ -0,0 +1,150 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define PKS_TEST_FILE "/sys/kernel/debug/x86/run_pks" + +#define RUN_ALLOCATE "0" +#define SETUP_CTX_SWITCH "1" +#define CHECK_CTX_SWITCH "2" +#define RUN_ALLOCATE_ALL "3" +#define RUN_ALLOCATE_DEBUG "4" +#define RUN_ALLOCATE_ALL_DEBUG "5" +#define RUN_CRASH_TEST "9" + +int main(int argc, char *argv[]) +{ + cpu_set_t cpuset; + char result[32]; + pid_t pid; + int fd; + int setup_done[2]; + int switch_done[2]; + int cpu = 0; + int rc = 0; + int c; + bool debug = false; + + while (1) { + int option_index = 0; + static struct option long_options[] = { + {"debug", no_argument, 0, 0 }, + {0, 0, 0, 0 } + }; + + c = getopt_long(argc, argv, "", long_options, &option_index); + if (c == -1) + break; + + switch (c) { + case 0: + debug = true; + break; + } + } + + if (optind < argc) + cpu = strtoul(argv[optind], NULL, 0); + + if (cpu >= sysconf(_SC_NPROCESSORS_ONLN)) { + printf("CPU %d is invalid\n", cpu); + cpu = sysconf(_SC_NPROCESSORS_ONLN) - 1; + printf(" running on max CPU: %d\n", cpu); + } + + CPU_ZERO(&cpuset); + CPU_SET(cpu, &cpuset); + /* Two processes run on CPU 0 so that they go through context switch. */ + sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset); + + if (pipe(setup_done)) + printf("Failed to create pipe\n"); + if (pipe(switch_done)) + printf("Failed to create pipe\n"); + + pid = fork(); + if (pid == 0) { + char done = 'y'; + + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + printf("cannot open %s\n", PKS_TEST_FILE); + return -1; + } + + cpu = sched_getcpu(); + printf("Child running on cpu %d...\n", cpu); + + /* Allocate test_pkey1 and run test. */ + if (debug) + write(fd, RUN_ALLOCATE_DEBUG, 1); + else + write(fd, RUN_ALLOCATE, 1); + + /* Arm for context switch test */ + write(fd, SETUP_CTX_SWITCH, 1); + + printf(" tell parent to go\n"); + write(setup_done[1], &done, sizeof(done)); + + /* Context switch out... */ + printf(" Waiting for parent...\n"); + read(switch_done[0], &done, sizeof(done)); + + /* Check msr restored */ + printf("Checking result\n"); + write(fd, CHECK_CTX_SWITCH, 1); + + read(fd, result, 10); + printf(" #PF, context switch, pkey allocation and free tests: %s\n", result); + if (!strncmp(result, "PASS", 10)) { + rc = -1; + done = 'F'; + } + + /* Signal result */ + write(setup_done[1], &done, sizeof(done)); + } else { + char done = 'y'; + + read(setup_done[0], &done, sizeof(done)); + cpu = sched_getcpu(); + printf("Parent running on cpu %d\n", cpu); + + fd = open(PKS_TEST_FILE, O_RDWR); + if (fd < 0) { + printf("cannot open %s\n", PKS_TEST_FILE); + return -1; + } + + /* run test with alternate pkey */ + if (debug) + write(fd, RUN_ALLOCATE_DEBUG, 1); + else + write(fd, RUN_ALLOCATE, 1); + + /* Signal child we are done. */ + printf(" Telling child we are done.\n"); + write(switch_done[1], &done, sizeof(done)); + + /* Wait for result */ + read(setup_done[0], &done, sizeof(done)); + if (done == 'F') + rc = -1; + } + + close(fd); + + return rc; +}