From patchwork Fri Oct 9 19:42:51 2020
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 285942
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Fenghua Yu, Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams,
    Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V3 2/9] x86/fpu: Refactor arch_set_user_pkey_access() for
    PKS support
Date: Fri, 9 Oct 2020 12:42:51 -0700
Message-Id: <20201009194258.3207172-3-ira.weiny@intel.com>
In-Reply-To: <20201009194258.3207172-1-ira.weiny@intel.com>
References: <20201009194258.3207172-1-ira.weiny@intel.com>

From: Fenghua Yu

Define a helper, update_pkey_val(), which will be used to support both
Protection Key User (PKU) and the new Protection Key for Supervisor
(PKS) in subsequent patches.
Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Signed-off-by: Fenghua Yu
Signed-off-by: Peter Zijlstra
---
 arch/x86/include/asm/pkeys.h |  2 ++
 arch/x86/kernel/fpu/xstate.c | 22 ++++------------------
 arch/x86/mm/pkeys.c          | 21 +++++++++++++++++++++
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index f9feba80894b..4526245b03e5 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -136,4 +136,6 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 	return (vma->vm_flags & vma_pkey_mask) >> VM_PKEY_SHIFT;
 }
 
+u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags);
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index b55cf3acd82a..725f10670d0a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -990,9 +990,7 @@ const void *get_xsave_field_ptr(int xfeature_nr)
 int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 		unsigned long init_val)
 {
-	u32 old_pkru;
-	int pkey_shift = (pkey * PKR_BITS_PER_PKEY);
-	u32 new_pkru_bits = 0;
+	u32 pkru;
 
 	/*
 	 * This check implies XSAVE support.  OSPKE only gets
@@ -1008,21 +1006,9 @@ int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
 	 */
 	WARN_ON_ONCE(pkey >= arch_max_pkey());
 
-	/* Set the bits we need in PKRU:  */
-	if (init_val & PKEY_DISABLE_ACCESS)
-		new_pkru_bits |= PKR_AD_BIT;
-	if (init_val & PKEY_DISABLE_WRITE)
-		new_pkru_bits |= PKR_WD_BIT;
-
-	/* Shift the bits in to the correct place in PKRU for pkey: */
-	new_pkru_bits <<= pkey_shift;
-
-	/* Get old PKRU and mask off any old bits in place: */
-	old_pkru = read_pkru();
-	old_pkru &= ~((PKR_AD_BIT|PKR_WD_BIT) << pkey_shift);
-
-	/* Write old part along with new part: */
-	write_pkru(old_pkru | new_pkru_bits);
+	pkru = read_pkru();
+	pkru = update_pkey_val(pkru, pkey, init_val);
+	write_pkru(pkru);
 
 	return 0;
 }
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index f5efb4007e74..3cf8f775f36d 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -208,3 +208,24 @@ static __init int setup_init_pkru(char *opt)
 	return 1;
 }
 __setup("init_pkru=", setup_init_pkru);
+
+/*
+ * Update the pk_reg value and return it.
+ *
+ * Kernel users use the same flags as user space:
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ */
+u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
+{
+	int pkey_shift = pkey * PKR_BITS_PER_PKEY;
+
+	pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
+
+	if (flags & PKEY_DISABLE_ACCESS)
+		pk_reg |= PKR_AD_BIT << pkey_shift;
+	if (flags & PKEY_DISABLE_WRITE)
+		pk_reg |= PKR_WD_BIT << pkey_shift;
+
+	return pk_reg;
+}
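[For readers who want to see the bit manipulation above in isolation, here
is a minimal stand-alone sketch of the same computation. It assumes the
layout used by this series: two bits per key (PKR_BITS_PER_PKEY == 2),
Access Disable in the low bit and Write Disable in the high bit of each
field; the macro values and the sample register value are illustrative.]

    #include <stdio.h>

    #define PKR_BITS_PER_PKEY   2
    #define PKR_AD_BIT          0x1   /* Access Disable */
    #define PKR_WD_BIT          0x2   /* Write Disable */
    #define PKEY_DISABLE_ACCESS 0x1
    #define PKEY_DISABLE_WRITE  0x2

    /* Same logic as the kernel helper: clear the two-bit field for
     * 'pkey', then set AD/WD according to 'flags'. */
    static unsigned int update_pkey_val(unsigned int pk_reg, int pkey,
                                        unsigned int flags)
    {
            int pkey_shift = pkey * PKR_BITS_PER_PKEY;

            pk_reg &= ~(((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift);
            if (flags & PKEY_DISABLE_ACCESS)
                    pk_reg |= PKR_AD_BIT << pkey_shift;
            if (flags & PKEY_DISABLE_WRITE)
                    pk_reg |= PKR_WD_BIT << pkey_shift;
            return pk_reg;
    }

    int main(void)
    {
            /* All keys access-disabled (AD=1) except key 0. */
            unsigned int reg = 0x55555554;

            /* Make key 10 write-disabled instead: bits 21:20 become
             * 0b10, so 0x55555554 turns into 0x55655554. */
            reg = update_pkey_val(reg, 10, PKEY_DISABLE_WRITE);
            printf("register image: 0x%08x\n", reg);
            return 0;
    }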
From patchwork Fri Oct 9 19:42:54 2020
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 285943
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Fenghua Yu, Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams,
    Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V3 5/9] x86/pks: Add PKS kernel API
Date: Fri, 9 Oct 2020 12:42:54 -0700
Message-Id: <20201009194258.3207172-6-ira.weiny@intel.com>
In-Reply-To: <20201009194258.3207172-1-ira.weiny@intel.com>
References: <20201009194258.3207172-1-ira.weiny@intel.com>

From: Fenghua Yu

PKS allows kernel users to define domains of page mappings which have
additional protections beyond the paging protections. Add an API to
allocate, use, and free a protection key which identifies such a domain.

Export 5 new symbols: pks_key_alloc(), pks_mknoaccess(), pks_mkread(),
pks_mkrdwr(), and pks_key_free(). Add 2 new macros: PAGE_KERNEL_PKEY(key)
and _PAGE_PKEY(pkey).

Update the protection key documentation to cover pkeys on supervisor
pages.

Co-developed-by: Ira Weiny
Signed-off-by: Ira Weiny
Signed-off-by: Fenghua Yu
---
 Documentation/core-api/protection-keys.rst | 101 ++++++++++++++---
 arch/x86/include/asm/pgtable_types.h       |  12 ++
 arch/x86/include/asm/pkeys.h               |  11 ++
 arch/x86/include/asm/pkeys_common.h        |   4 +
 arch/x86/mm/pkeys.c                        | 125 +++++++++++++++++++++
 include/linux/pgtable.h                    |   4 +
 include/linux/pkeys.h                      |  22 ++++
 7 files changed, 261 insertions(+), 18 deletions(-)

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index ec575e72d0b2..00a046a913e4 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -4,25 +4,33 @@
 Memory Protection Keys
 ======================
 
-Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
-which is found on Intel's Skylake (and later) "Scalable Processor"
-Server CPUs.  It will be available in future non-server Intel parts
-and future AMD processors.
-
-For anyone wishing to test or use this feature, it is available in
-Amazon's EC2 C5 instances and is known to work there using an Ubuntu
-17.04 image.
-
 Memory Protection Keys provides a mechanism for enforcing page-based
 protections, but without requiring modification of the page tables
-when an application changes protection domains.  It works by
-dedicating 4 previously ignored bits in each page table entry to a
-"protection key", giving 16 possible keys.
+when an application changes protection domains.
+
+PKeys Userspace (PKU) is a feature which is found on Intel's Skylake
+"Scalable Processor" Server CPUs and later.  It will be available in
+future non-server Intel parts and future AMD processors.
+
+Future Intel processors will support Protection Keys for Supervisor
+pages (PKS).
+
+For anyone wishing to test or use user space pkeys, the feature is
+available in Amazon's EC2 C5 instances and is known to work there using
+an Ubuntu 17.04 image.
+
+pkeys work by dedicating 4 previously reserved bits in each page table
+entry to a "protection key", giving 16 possible keys.  User and
+Supervisor pages are treated separately.
+
+Protections for each page are controlled with per-CPU registers, one for
+each type of page (User and Supervisor).  Each of these 32-bit registers
+stores two separate bits (Access Disable and Write Disable) for each key.
 
-There is also a new user-accessible register (PKRU) with two separate
-bits (Access Disable and Write Disable) for each key.  Being a CPU
-register, PKRU is inherently thread-local, potentially giving each
-thread a different set of protections from every other thread.
+For Userspace the register is user-accessible (rdpkru/wrpkru).  For
+Supervisor, the register (MSR_IA32_PKRS) is accessible only to the
+kernel.
+
+Being a CPU register, pkeys are inherently thread-local, potentially
+giving each thread an independent set of protections from every other
+thread.
 
 There are two new instructions (RDPKRU/WRPKRU) for reading and writing
 to the new register.  The feature is only available in 64-bit mode,
@@ -30,8 +38,11 @@ even though there is theoretically space in the PAE PTEs.
 These permissions are enforced on data access only and have no effect on
 instruction fetches.
 
-Syscalls
-========
+For kernel space rdmsr/wrmsr are used to access the kernel MSRs.
+
+
+Syscalls for user space keys
+============================
 
 There are 3 system calls which directly interact with pkeys::
 
@@ -98,3 +109,57 @@ with a read()::
 
 The kernel will send a SIGSEGV in both cases, but si_code will be set to
 SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the
 plain mprotect() permissions are violated.
+
+
+Kernel API for PKS support
+==========================
+
+The following interface is used to allocate, use, and free a pkey which
+defines a 'protection domain' within the kernel.  Setting a pkey value in
+a supervisor mapping adds that mapping to the protection domain.
+
+        int pks_key_alloc(const char * const pkey_user);
+        #define PAGE_KERNEL_PKEY(pkey)
+        #define _PAGE_PKEY(pkey)
+        void pks_mknoaccess(int pkey);
+        void pks_mkread(int pkey);
+        void pks_mkrdwr(int pkey);
+        void pks_key_free(int pkey);
+
+pks_key_alloc() allocates keys dynamically to allow better use of the
+limited key space.
+
+Callers of pks_key_alloc() _must_ be prepared for it to fail and take
+appropriate action.  This is mainly because PKS may not be available on
+all architectures.  Using any of the rest of the API without checking
+the return of pks_key_alloc() is undefined.
+
+Kernel users must set the PTE permissions in the page table entries for
+the mappings they want to protect.  This can be done with
+PAGE_KERNEL_PKEY() or _PAGE_PKEY().
+
+The pks_mk*() family of calls allows kernel users to change the
+protections for the domain identified by the pkey specified.  Three
+states are available: pks_mknoaccess(), pks_mkread(), and pks_mkrdwr(),
+which set the access to none, read-only, and read/write respectively.
+
+Finally, pks_key_free() allows a user to return the key to the allocator
+for use by others.
+
+The interface maintains pks_mknoaccess() (Access Disabled (AD=1)) for
+all keys not currently allocated.  Therefore, the user can depend on
+access being disabled when pks_key_alloc() returns a key, and the user
+should remove mappings from the domain (remove the pkey from the PTE)
+prior to calling pks_key_free().
+
+It should be noted that the underlying WRMSR(MSR_IA32_PKRS) is not
+serializing but still maintains ordering properties similar to WRPKRU.
+Thus it is safe to immediately use a mapping when a pks_mk*() function
+returns.
+
+The current SDM section on PKRS needs updating but should be the same
+as that of WRPKRU.  So to quote from the WRPKRU text:
+
+        WRPKRU will never execute transiently. Memory accesses
+        affected by PKRU register will not execute (even transiently)
+        until all prior executions of WRPKRU have completed execution
+        and updated the PKRU register.
+
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 816b31c68550..c9fdfbdcbbfb 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -73,6 +73,12 @@
 			 _PAGE_PKEY_BIT2 | \
 			 _PAGE_PKEY_BIT3)
 
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, pkey) << _PAGE_BIT_PKEY_BIT0)
+#else
+#define _PAGE_PKEY(pkey)	(_AT(pteval_t, 0))
+#endif
+
 #if defined(CONFIG_X86_64) || defined(CONFIG_X86_PAE)
 #define _PAGE_KNL_ERRATUM_MASK (_PAGE_DIRTY | _PAGE_ACCESSED)
 #else
@@ -229,6 +235,12 @@ enum page_cache_mode {
 #define PAGE_KERNEL_IO		__pgprot_mask(__PAGE_KERNEL_IO)
 #define PAGE_KERNEL_IO_NOCACHE	__pgprot_mask(__PAGE_KERNEL_IO_NOCACHE)
 
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+#define PAGE_KERNEL_PKEY(pkey)	__pgprot_mask(__PAGE_KERNEL | _PAGE_PKEY(pkey))
+#else
+#define PAGE_KERNEL_PKEY(pkey)	PAGE_KERNEL
+#endif
+
 #endif	/* __ASSEMBLY__ */
 
 /*		   xwr */
diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h
index 4526245b03e5..79952216474e 100644
--- a/arch/x86/include/asm/pkeys.h
+++ b/arch/x86/include/asm/pkeys.h
@@ -3,6 +3,7 @@
 #define _ASM_X86_PKEYS_H
 
 #include <asm/pkeys_common.h>
+#include
 
 #define ARCH_DEFAULT_PKEY	0
 
@@ -138,4 +139,14 @@ static inline int vma_pkey(struct vm_area_struct *vma)
 
 u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags);
 
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+int pks_key_alloc(const char *const pkey_user);
+void pks_key_free(int pkey);
+
+void pks_mknoaccess(int pkey);
+void pks_mkread(int pkey);
+void pks_mkrdwr(int pkey);
+
+#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+
 #endif /*_ASM_X86_PKEYS_H */
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h
index 05781be33c14..40464c170522 100644
--- a/arch/x86/include/asm/pkeys_common.h
+++ b/arch/x86/include/asm/pkeys_common.h
@@ -22,6 +22,10 @@
 		      PKR_AD_KEY(10) | PKR_AD_KEY(11) | PKR_AD_KEY(12) | \
 		      PKR_AD_KEY(13) | PKR_AD_KEY(14) | PKR_AD_KEY(15))
 
+/* PKS supports 16 keys.  Key 0 is reserved for the kernel. */
+#define PKS_KERN_DEFAULT_KEY	0
+#define PKS_NUM_KEYS		16
+
 #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
 void write_pkrs(u32 new_pkrs);
 #else
diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c
index 30f65dd3d0c5..1d9f451b4b78 100644
--- a/arch/x86/mm/pkeys.c
+++ b/arch/x86/mm/pkeys.c
@@ -3,6 +3,9 @@
  * Intel Memory Protection Keys management
  * Copyright (c) 2015, Intel Corporation.
  */
+#undef pr_fmt
+#define pr_fmt(fmt)	"x86/pkeys: " fmt
+
 #include <linux/debugfs.h>		/* debugfs_create_u32()		*/
 #include <linux/mm_types.h>		/* mm_struct, vma, etc...	*/
 #include <linux/pkeys.h>		/* PKEY_*			*/
@@ -229,6 +232,7 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags)
 
 	return pk_reg;
 }
+EXPORT_SYMBOL_GPL(update_pkey_val);
 
 DEFINE_PER_CPU(u32, pkrs_cache);
 
@@ -257,3 +261,124 @@ void write_pkrs(u32 new_pkrs)
 	}
 	put_cpu_ptr(pkrs);
 }
+EXPORT_SYMBOL_GPL(write_pkrs);
+
+/**
+ * Do not call this directly, see pks_mk*() below.
+ *
+ * @pkey: Key for the domain to change
+ * @protection: protection bits to be used
+ *
+ * Protection utilizes the same protection bits specified for User pkeys:
+ *     PKEY_DISABLE_ACCESS
+ *     PKEY_DISABLE_WRITE
+ */
+static inline void pks_update_protection(int pkey, unsigned long protection)
+{
+	current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs,
+						     pkey, protection);
+	preempt_disable();
+	write_pkrs(current->thread.saved_pkrs);
+	preempt_enable();
+}
+
+/**
+ * PKS access control functions
+ *
+ * Change the access of the domain specified by the pkey.  These updates
+ * are not global; they affect only the current running thread.  It is
+ * undefined, and a bug, to call these without having allocated the pkey
+ * that is passed in.
+ *
+ * pks_mknoaccess()
+ *     Disable all access to the domain
+ * pks_mkread()
+ *     Make the domain Read only
+ * pks_mkrdwr()
+ *     Make the domain Read/Write
+ *
+ * @pkey: the pkey for which the access should change.
+ */
+void pks_mknoaccess(int pkey)
+{
+	pks_update_protection(pkey, PKEY_DISABLE_ACCESS);
+}
+EXPORT_SYMBOL_GPL(pks_mknoaccess);
+
+void pks_mkread(int pkey)
+{
+	pks_update_protection(pkey, PKEY_DISABLE_WRITE);
+}
+EXPORT_SYMBOL_GPL(pks_mkread);
+
+void pks_mkrdwr(int pkey)
+{
+	pks_update_protection(pkey, 0);
+}
+EXPORT_SYMBOL_GPL(pks_mkrdwr);
+
+static const char pks_key_user0[] = "kernel";
+
+/* Store names of allocated keys for debug.  Key 0 is reserved for the kernel. */
+static const char *pks_key_users[PKS_NUM_KEYS] = {
+	pks_key_user0
+};
+
+/*
+ * Each key is represented by a bit.  Bit 0 is set for key 0 and reserved
+ * for its use.  We use ulong for the bit operations but only 16 bits are
+ * used.
+ */
+static unsigned long pks_key_allocation_map = 1 << PKS_KERN_DEFAULT_KEY;
+
+/*
+ * pks_key_alloc - Allocate a PKS key
+ *
+ * @pkey_user: String stored for debugging of key exhaustion.  The caller
+ * is responsible for maintaining this memory until pks_key_free().
+ */
+int pks_key_alloc(const char * const pkey_user)
+{
+	int nr;
+
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return -EINVAL;
+
+	while (1) {
+		nr = find_first_zero_bit(&pks_key_allocation_map, PKS_NUM_KEYS);
+		if (nr >= PKS_NUM_KEYS) {
+			pr_info("Cannot allocate supervisor key for %s.\n",
+				pkey_user);
+			return -ENOSPC;
+		}
+		if (!test_and_set_bit_lock(nr, &pks_key_allocation_map))
+			break;
+	}
+
+	/* for debugging key exhaustion */
+	pks_key_users[nr] = pkey_user;
+
+	return nr;
+}
+EXPORT_SYMBOL_GPL(pks_key_alloc);
+
+/*
+ * pks_key_free - Free a previously allocated PKS key
+ *
+ * @pkey: Key to be freed
+ */
+void pks_key_free(int pkey)
+{
+	if (!cpu_feature_enabled(X86_FEATURE_PKS))
+		return;
+
+	if (pkey >= PKS_NUM_KEYS || pkey <= PKS_KERN_DEFAULT_KEY)
+		return;
+
+	/* Restore to default of no access */
+	pks_mknoaccess(pkey);
+	pks_key_users[pkey] = NULL;
+	__clear_bit(pkey, &pks_key_allocation_map);
+}
+EXPORT_SYMBOL_GPL(pks_key_free);
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 90654cb63e9e..6900182d53ee 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1374,6 +1374,10 @@ static inline bool arch_has_pfn_modify_check(void)
 # define PAGE_KERNEL_EXEC PAGE_KERNEL
 #endif
 
+#ifndef PAGE_KERNEL_PKEY
+#define PAGE_KERNEL_PKEY(pkey) PAGE_KERNEL
+#endif
+
 /*
  * Page Table Modification bits for pgtbl_mod_mask.
  *
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index 2955ba976048..cc3510cde64e 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -50,4 +50,26 @@ static inline void copy_init_pkru_to_fpregs(void)
 
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
+#ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+static inline int pks_key_alloc(const char * const pkey_user)
+{
+	return -EINVAL;
+}
+static inline void pks_key_free(int pkey)
+{
+}
+static inline void pks_mknoaccess(int pkey)
+{
+	WARN_ON_ONCE(1);
+}
+static inline void pks_mkread(int pkey)
+{
+	WARN_ON_ONCE(1);
+}
+static inline void pks_mkrdwr(int pkey)
+{
+	WARN_ON_ONCE(1);
+}
+#endif /* ! CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
+
 #endif /* _LINUX_PKEYS_H */
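[To make the calling convention above concrete, here is a minimal sketch
of how a kernel user might drive this API. It is modeled on the test code
later in this series; the vmalloc-based mapping and the "demo" user
string are illustrative choices, not requirements of the API.]

    #include <linux/pkeys.h>
    #include <linux/string.h>
    #include <linux/vmalloc.h>

    static int demo_pks_domain(void)
    {
            void *ptr;
            int pkey;

            /* May fail (e.g. no PKS support); the caller must check. */
            pkey = pks_key_alloc("demo");
            if (pkey < 0)
                    return pkey;

            /* Map a page into the new protection domain. */
            ptr = __vmalloc_node_range(PAGE_SIZE, 1, VMALLOC_START,
                                       VMALLOC_END, GFP_KERNEL,
                                       PAGE_KERNEL_PKEY(pkey), 0,
                                       NUMA_NO_NODE,
                                       __builtin_return_address(0));
            if (!ptr) {
                    pks_key_free(pkey);
                    return -ENOMEM;
            }

            pks_mkrdwr(pkey);               /* open the domain...       */
            memset(ptr, 0, PAGE_SIZE);      /* ...this write is allowed */
            pks_mknoaccess(pkey);           /* close it again           */

            /* Remove the mapping from the domain before freeing the key. */
            vfree(ptr);
            pks_key_free(pkey);
            return 0;
    }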
From patchwork Fri Oct 9 19:42:57 2020
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 285941
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, x86@kernel.org, Dave Hansen, Dan Williams, Andrew Morton,
    Fenghua Yu, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V3 8/9] x86/fault: Report the PKRS state on fault
Date: Fri, 9 Oct 2020 12:42:57 -0700
Message-Id: <20201009194258.3207172-9-ira.weiny@intel.com>
In-Reply-To: <20201009194258.3207172-1-ira.weiny@intel.com>
References: <20201009194258.3207172-1-ira.weiny@intel.com>

From: Ira Weiny

When only user space pkeys are enabled, faulting within the kernel was an
unexpected condition which should never happen, so a WARN_ON was added to
the kernel fault handler to detect if it ever did. Now that PKS can be
enabled, this is no longer the case.

Report a pkey fault with a normal splat and add the PKRS state to the
fault splat text. Note that the PKS register is reset during an
exception; therefore the saved PKRS value from before the beginning of
the exception is passed down.

If PKS is not enabled, or not active, maintain the WARN_ON_ONCE() from
before.
Because each fault has its own state, the PKRS information will be
correctly reported even if a fault 'faults'.

Suggested-by: Andy Lutomirski
Signed-off-by: Ira Weiny
---
 arch/x86/mm/fault.c | 59 ++++++++++++++++++++++++++-------------------
 1 file changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index e55bc4bff389..ee761c993f58 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -504,7 +504,8 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 }
 
 static void
-show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+		irqentry_state_t *irq_state)
 {
 	if (!oops_may_print())
 		return;
@@ -548,6 +549,11 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 		 (error_code & X86_PF_PK) ? "protection keys violation" :
 					    "permissions violation");
 
+#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
+	if (irq_state && (error_code & X86_PF_PK))
+		pr_alert("PKRS: 0x%x\n", irq_state->pkrs);
+#endif
+
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
@@ -626,7 +632,8 @@ static void set_signal_archinfo(unsigned long address,
 
 static noinline void
 no_context(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, int signal, int si_code)
+	   unsigned long address, int signal, int si_code,
+	   irqentry_state_t *irq_state)
 {
 	struct task_struct *tsk = current;
 	unsigned long flags;
@@ -732,7 +739,7 @@ no_context(struct pt_regs *regs, unsigned long error_code,
 	 */
 	flags = oops_begin();
 
-	show_fault_oops(regs, error_code, address);
+	show_fault_oops(regs, error_code, address, irq_state);
 
 	if (task_stack_end_corrupted(tsk))
 		printk(KERN_EMERG "Thread overran stack, or stack corrupted\n");
@@ -785,7 +792,8 @@ static bool is_vsyscall_vaddr(unsigned long vaddr)
 
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		       unsigned long address, u32 pkey, int si_code)
+		       unsigned long address, u32 pkey, int si_code,
+		       irqentry_state_t *state)
 {
 	struct task_struct *tsk = current;
 
@@ -832,14 +840,14 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 	if (is_f00f_bug(regs, address))
 		return;
 
-	no_context(regs, error_code, address, SIGSEGV, si_code);
+	no_context(regs, error_code, address, SIGSEGV, si_code, state);
 }
 
 static noinline void
 bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		     unsigned long address)
+		     unsigned long address, irqentry_state_t *state)
 {
-	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR);
+	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR, state);
 }
 
 static void
@@ -853,7 +861,7 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 	 */
 	mmap_read_unlock(mm);
 
-	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
+	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code, NULL);
 }
 
 static noinline void
@@ -923,7 +931,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address,
 {
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & X86_PF_USER)) {
-		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
+		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR, NULL);
 		return;
 	}
 
@@ -957,7 +965,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	       unsigned long address, vm_fault_t fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) {
-		no_context(regs, error_code, address, 0, 0);
+		no_context(regs, error_code, address, 0, 0, NULL);
 		return;
 	}
 
@@ -965,7 +973,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & X86_PF_USER)) {
 		no_context(regs, error_code, address,
-			   SIGSEGV, SEGV_MAPERR);
+			   SIGSEGV, SEGV_MAPERR, NULL);
 		return;
 	}
 
@@ -980,7 +988,7 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code,
 		     VM_FAULT_HWPOISON_LARGE))
 		do_sigbus(regs, error_code, address, fault);
 	else if (fault & VM_FAULT_SIGSEGV)
-		bad_area_nosemaphore(regs, error_code, address);
+		bad_area_nosemaphore(regs, error_code, address, NULL);
 	else
 		BUG();
 }
@@ -1148,14 +1156,15 @@ static int fault_in_kernel_space(unsigned long address)
  */
 static void
 do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
-		   unsigned long address)
+		   unsigned long address, irqentry_state_t *irq_state)
 {
 	/*
-	 * Protection keys exceptions only happen on user pages.  We
-	 * have no user pages in the kernel portion of the address
-	 * space, so do not expect them here.
+	 * If protection keys are not enabled for kernel space
+	 * do not expect Pkey errors here.
 	 */
-	WARN_ON_ONCE(hw_error_code & X86_PF_PK);
+	if (!IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) ||
+	    !cpu_feature_enabled(X86_FEATURE_PKS))
+		WARN_ON_ONCE(hw_error_code & X86_PF_PK);
 
 #ifdef CONFIG_X86_32
 	/*
@@ -1204,7 +1213,7 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	 * Don't take the mm semaphore here. If we fixup a prefetch
 	 * fault we could otherwise deadlock:
 	 */
-	bad_area_nosemaphore(regs, hw_error_code, address);
+	bad_area_nosemaphore(regs, hw_error_code, address, irq_state);
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
@@ -1245,7 +1254,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		     !(hw_error_code & X86_PF_USER) &&
 		     !(regs->flags & X86_EFLAGS_AC))) {
-		bad_area_nosemaphore(regs, hw_error_code, address);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1254,7 +1263,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, hw_error_code, address);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 
@@ -1316,7 +1325,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 		 * Fault from code in kernel from
 		 * which we do not expect faults.
 		 */
-		bad_area_nosemaphore(regs, hw_error_code, address);
+		bad_area_nosemaphore(regs, hw_error_code, address, NULL);
 		return;
 	}
 retry:
@@ -1375,7 +1384,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	if (fault_signal_pending(fault, regs)) {
 		if (!user_mode(regs))
 			no_context(regs, hw_error_code, address, SIGBUS,
-				   BUS_ADRERR);
+				   BUS_ADRERR, NULL);
 		return;
 	}
 
@@ -1415,7 +1424,7 @@ trace_page_fault_entries(struct pt_regs *regs, unsigned long error_code,
 
 static __always_inline void
 handle_page_fault(struct pt_regs *regs, unsigned long error_code,
-		  unsigned long address)
+		  unsigned long address, irqentry_state_t *irq_state)
 {
 	trace_page_fault_entries(regs, error_code, address);
 
@@ -1424,7 +1433,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
-		do_kern_addr_fault(regs, error_code, address);
+		do_kern_addr_fault(regs, error_code, address, irq_state);
 	} else {
 		do_user_addr_fault(regs, error_code, address);
 		/*
@@ -1479,7 +1488,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	irqentry_enter(regs, &state);
 
 	instrumentation_begin();
-	handle_page_fault(regs, error_code, address);
+	handle_page_fault(regs, error_code, address, &state);
 	instrumentation_end();
 
 	irqentry_exit(regs, &state);
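[A splat will now carry a line like "PKRS: 0x55555514". To decode it by
hand: each pkey owns a two-bit field, with Access Disable in the low bit
and Write Disable in the high bit, matching update_pkey_val() earlier in
this series. A minimal decoding sketch follows; the sample value is made
up for illustration.]

    #include <stdio.h>

    #define PKR_BITS_PER_PKEY 2

    int main(void)
    {
            unsigned int pkrs = 0x55555514; /* hypothetical splat value */
            int pkey;

            for (pkey = 0; pkey < 16; pkey++) {
                    unsigned int field =
                            (pkrs >> (pkey * PKR_BITS_PER_PKEY)) & 0x3;

                    printf("pkey %2d: AD=%u WD=%u\n", pkey,
                           field & 0x1, (field >> 1) & 0x1);
            }
            return 0;
    }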
From patchwork Fri Oct 9 19:42:58 2020
X-Patchwork-Submitter: Ira Weiny
X-Patchwork-Id: 285940
From: ira.weiny@intel.com
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Andy Lutomirski,
    Peter Zijlstra
Cc: Ira Weiny, Fenghua Yu, x86@kernel.org, Dave Hansen, Dan Williams,
    Andrew Morton, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: [PATCH RFC V3 9/9] x86/pks: Add PKS test code
Date: Fri, 9 Oct 2020 12:42:58 -0700
Message-Id: <20201009194258.3207172-10-ira.weiny@intel.com>
In-Reply-To: <20201009194258.3207172-1-ira.weiny@intel.com>
References: <20201009194258.3207172-1-ira.weiny@intel.com>

From: Ira Weiny

The core PKS functionality provides an interface for kernel users to
reserve keys for their domains, set up the page tables with those keys,
and control access to those domains when needed.

Define test code which exercises the core functionality of PKS via a
debugfs entry. Basic checks can be triggered on boot with a kernel
command line option, while both basic and preemption checks can be
triggered with separate debugfs values.

The debugfs controls are:

'0' -- Run access tests with a single pkey
'1' -- Set up the pkey register with no access for the pkey allocated to
       this fd
'2' -- Check that the pkey register updated in '1' is still the same.
       (To be used after a forced context switch.)
'3' -- Allocate all pkeys possible and run tests on each pkey allocated.
       DEFAULT when run at boot.

Closing the fd will cleanup and release the pkey; therefore, to exercise
context switch testing, a user space program is provided in:

	.../tools/testing/selftests/x86/test_pks.c

Co-developed-by: Fenghua Yu
Signed-off-by: Fenghua Yu
Signed-off-by: Ira Weiny
Reviewed-by: Dave Hansen
---
 Documentation/core-api/protection-keys.rst |   1 +
 arch/x86/mm/fault.c                        |  23 +
 include/linux/pkeys.h                      |   1 -
 lib/Kconfig.debug                          |  12 +
 lib/Makefile                               |   3 +
 lib/pks/Makefile                           |   3 +
 lib/pks/pks_test.c                         | 690 +++++++++++++++++++++
 tools/testing/selftests/x86/Makefile       |   3 +-
 tools/testing/selftests/x86/test_pks.c     |  65 ++
 9 files changed, 799 insertions(+), 2 deletions(-)
 create mode 100644 lib/pks/Makefile
 create mode 100644 lib/pks/pks_test.c
 create mode 100644 tools/testing/selftests/x86/test_pks.c

diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
index 00a046a913e4..c60366921d60 100644
--- a/Documentation/core-api/protection-keys.rst
+++ b/Documentation/core-api/protection-keys.rst
@@ -163,3 +163,4 @@ of WRPKRU. So to quote from the WRPKRU text:
 	until all prior executions of WRPKRU have completed execution
 	and updated the PKRU register.
 
+Example code can be found in lib/pks/pks_test.c
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ee761c993f58..dd5af9399131 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -18,6 +18,7 @@
 #include <linux/uaccess.h>		/* faulthandler_disabled()	*/
 #include <linux/efi.h>			/* efi_recover_from_page_fault()*/
 #include
+#include
 
 #include <asm/cpufeature.h>		/* boot_cpu_has, ...		*/
 #include <asm/traps.h>			/* dotraplinkage, ...		*/
@@ -1149,6 +1150,25 @@ static int fault_in_kernel_space(unsigned long address)
 	return address >= TASK_SIZE_MAX;
 }
 
+#ifdef CONFIG_PKS_TESTING
+bool pks_test_callback(irqentry_state_t *irq_state);
+static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
+{
+	/*
+	 * If we get a protection key exception it could be because we
+	 * are running the PKS test.  If so, pks_test_callback() will
+	 * clear the protection mechanism and return true to indicate
+	 * the fault was handled.
+	 */
+	return (hw_error_code & X86_PF_PK) && pks_test_callback(irq_state);
+}
+#else
+static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state)
+{
+	return false;
+}
+#endif
+
 /*
  * Called for all faults where 'address' is part of the kernel address
  * space.  Might get called for faults that originate from *code* that
@@ -1166,6 +1186,9 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	    !cpu_feature_enabled(X86_FEATURE_PKS))
 		WARN_ON_ONCE(hw_error_code & X86_PF_PK);
 
+	if (handle_pks_testing(hw_error_code, irq_state))
+		return;
+
 #ifdef CONFIG_X86_32
 	/*
 	 * We can fault-in kernel-space virtual memory on-demand. The
diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h
index cc3510cde64e..f9552bd9341f 100644
--- a/include/linux/pkeys.h
+++ b/include/linux/pkeys.h
@@ -47,7 +47,6 @@ static inline bool arch_pkeys_enabled(void)
 static inline void copy_init_pkru_to_fpregs(void)
 {
 }
-
 #endif /* ! CONFIG_ARCH_HAS_PKEYS */
 
 #ifndef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0c781f912f9f..f015c09ba5a1 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2400,6 +2400,18 @@ config HYPERV_TESTING
 	help
 	  Select this option to enable Hyper-V vmbus testing.
 
+config PKS_TESTING
+	bool "PKey(S)upervisor testing"
+	default n
+	depends on ARCH_HAS_SUPERVISOR_PKEYS
+	help
+	  Select this option to enable testing of PKS core software and
+	  hardware.  The PKS core provides a mechanism to allocate keys as
+	  well as maintain the protection settings across context switches.
+	  Answer N if you don't know what supervisor keys are.
+
+	  If unsure, say N.
+
 endmenu # "Kernel Testing and Coverage"
 
 endmenu # Kernel hacking
diff --git a/lib/Makefile b/lib/Makefile
index a4a4c6864f51..81fcba41f256 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -348,3 +348,6 @@ obj-$(CONFIG_PLDMFW) += pldmfw/
 obj-$(CONFIG_LIST_KUNIT_TEST) += list-test.o
 obj-$(CONFIG_LINEAR_RANGES_TEST) += test_linear_ranges.o
 obj-$(CONFIG_BITS_TEST) += test_bits.o
+
+# PKS test
+obj-y += pks/
diff --git a/lib/pks/Makefile b/lib/pks/Makefile
new file mode 100644
index 000000000000..7d1df7563db9
--- /dev/null
+++ b/lib/pks/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_PKS_TESTING) += pks_test.o
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c
new file mode 100644
index 000000000000..d7dbf92527bd
--- /dev/null
+++ b/lib/pks/pks_test.c
@@ -0,0 +1,690 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright(c) 2020 Intel Corporation. All rights reserved.
+ *
+ * Implement PKS testing
+ * Access to run this test can be with a command line parameter
+ * ("pks-test-on-boot") or more detailed tests can be triggered through:
+ *
+ *    /sys/kernel/debug/x86/run_pks
+ *
+ * debugfs controls are:
+ *
+ * '0' -- Run access tests with a single pkey
+ *
+ * '1' -- Set up the pkey register with no access for the pkey allocated
+ *        to this fd
+ * '2' -- Check that the pkey register updated in '1' is still the same.
+ *        (To be used after a forced context switch.)
+ *
+ * '3' -- Allocate all pkeys possible and run tests on each pkey
+ *        allocated.  DEFAULT when run at boot.
+ *
+ * Closing the fd will cleanup and release the pkey.
+ *
+ * A companion user space program is provided in:
+ *
+ *    .../tools/testing/selftests/x86/test_pks.c
+ *
+ * which will better test the context switching.
+ *
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/debugfs.h>
+#include <linux/fs.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/percpu-defs.h>
+#include <linux/pgtable.h>
+#include <linux/pkeys.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/spinlock.h>
+#include <linux/uaccess.h>
+#include <linux/vmalloc.h>
+
+#define PKS_TEST_MEM_SIZE (PAGE_SIZE)
+
+/*
+ * run_on_boot defaults to false; checkpatch complains about explicitly
+ * initializing it, so we don't.
+ */
+static bool run_on_boot;
+static struct dentry *pks_test_dentry;
+static bool run_9;
+
+/*
+ * We must lock the following globals for brief periods while the fault
+ * handler checks/updates them.
+ */
+static DEFINE_SPINLOCK(test_lock);
+static int test_armed_key;
+static unsigned long prev_cnt;
+static unsigned long fault_cnt;
+
+struct pks_test_ctx {
+	bool pass;
+	bool pks_cpu_enabled;
+	int pkey;
+	char data[64];
+};
+static struct pks_test_ctx *test_exception_ctx;
+
+static pte_t *walk_table(void *ptr)
+{
+	struct page *page = NULL;
+	pgd_t *pgdp;
+	p4d_t *p4dp;
+	pud_t *pudp;
+	pmd_t *pmdp;
+	pte_t *ret = NULL;
+
+	pgdp = pgd_offset_k((unsigned long)ptr);
+	if (pgd_none(*pgdp) || pgd_bad(*pgdp))
+		goto error;
+
+	p4dp = p4d_offset(pgdp, (unsigned long)ptr);
+	if (p4d_none(*p4dp) || p4d_bad(*p4dp))
+		goto error;
+
+	pudp = pud_offset(p4dp, (unsigned long)ptr);
+	if (pud_none(*pudp) || pud_bad(*pudp))
+		goto error;
+
+	pmdp = pmd_offset(pudp, (unsigned long)ptr);
+	if (pmd_none(*pmdp) || pmd_bad(*pmdp))
+		goto error;
+
+	ret = pte_offset_map(pmdp, (unsigned long)ptr);
+	if (pte_present(*ret)) {
+		page = pte_page(*ret);
+		if (!page) {
+			pte_unmap(ret);
+			ret = NULL;
+			goto error;
+		}
+		pr_info("page 0x%lx; flags 0x%lx\n",
+			(unsigned long)page, page->flags);
+	}
+
+error:
+	return ret;
+}
+
+static bool check_pkey_val(u32 pk_reg, int pkey, u32 expected)
+{
+	u32 pkey_shift = pkey * PKR_BITS_PER_PKEY;
+	u32 pkey_mask = ((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift;
+
+	pk_reg = (pk_reg & pkey_mask) >> pkey_shift;
+	return (pk_reg == expected);
+}
+
+/*
+ * Check if the register @pkey value matches @expected value
+ *
+ * Both the cached and actual MSR must match.
+ */
+static bool check_pkrs(int pkey, u32 expected)
+{
+	bool ret = true;
+	u64 pkrs;
+	u32 *tmp_cache;
+
+	tmp_cache = get_cpu_ptr(&pkrs_cache);
+	if (!check_pkey_val(*tmp_cache, pkey, expected))
+		ret = false;
+	put_cpu_ptr(tmp_cache);
+
+	rdmsrl(MSR_IA32_PKRS, pkrs);
+	if (!check_pkey_val(pkrs, pkey, expected))
+		ret = false;
+
+	return ret;
+}
+
+static void check_exception(irqentry_state_t *irq_state)
+{
+	/* Check the thread saved state */
+	if (!check_pkey_val(irq_state->pkrs, test_armed_key, PKEY_DISABLE_WRITE)) {
+		pr_err("     FAIL: checking irq_state->pkrs\n");
+		test_exception_ctx->pass = false;
+	}
+
+	/* Check the exception state */
+	if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) {
+		pr_err("     FAIL: PKRS cache and MSR\n");
+		test_exception_ctx->pass = false;
+	}
+
+	/*
+	 * Check that we can update the value during the exception without
+	 * affecting the calling thread.  The calling thread is checked
+	 * after the exception...
+	 */
+	pks_mkrdwr(test_armed_key);
+	if (!check_pkrs(test_armed_key, 0)) {
+		pr_err("     FAIL: exception did not change register to 0\n");
+		test_exception_ctx->pass = false;
+	}
+	pks_mknoaccess(test_armed_key);
+	if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) {
+		pr_err("     FAIL: exception did not change register to 0x3\n");
+		test_exception_ctx->pass = false;
+	}
+}
+
+/* Silence prototype warning */
+bool pks_test_callback(irqentry_state_t *irq_state);
+
+/**
+ * pks_test_callback() is exported so that the fault handler can detect
+ * and report back status of intentional faults.
+ *
+ * NOTE: It clears the protection restrictions for the armed key so that
+ * the fault handler will not re-trigger.
+ */
+bool pks_test_callback(irqentry_state_t *irq_state)
+{
+	bool armed = (test_armed_key != 0);
+
+	if (test_exception_ctx) {
+		check_exception(irq_state);
+		/*
+		 * We stop this check within the exception because the
+		 * fault handler clean up code will call us 2x while
+		 * checking the PMD entry and we don't need to check
+		 * this again.
+		 */
+		test_exception_ctx = NULL;
+	}
+
+	if (armed) {
+		/* Enable read and write to stop faults */
+		irq_state->pkrs = update_pkey_val(irq_state->pkrs,
+						  test_armed_key, 0);
+		fault_cnt++;
+	}
+
+	return armed;
+}
+EXPORT_SYMBOL(pks_test_callback);
+
+static bool exception_caught(void)
+{
+	bool ret = (fault_cnt != prev_cnt);
+
+	prev_cnt = fault_cnt;
+	return ret;
+}
+
+static void report_pkey_settings(void *unused)
+{
+	u8 pkey;
+	unsigned long long msr = 0;
+	unsigned int cpu = smp_processor_id();
+
+	rdmsrl(MSR_IA32_PKRS, msr);
+
+	pr_info("for CPU %d : 0x%llx\n", cpu, msr);
+	for (pkey = 0; pkey < PKS_NUM_KEYS; pkey++) {
+		int ad, wd;
+
+		ad = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_ACCESS;
+		wd = (msr >> (pkey * PKR_BITS_PER_PKEY)) & PKEY_DISABLE_WRITE;
+		pr_info("   %u: A:%d W:%d\n", pkey, ad, wd);
+	}
+}
+
+enum pks_access_mode {
+	PKS_TEST_NO_ACCESS,
+	PKS_TEST_RDWR,
+	PKS_TEST_RDONLY
+};
+
+static char *get_mode_str(enum pks_access_mode mode)
+{
+	switch (mode) {
+	case PKS_TEST_NO_ACCESS:
+		return "No Access";
+	case PKS_TEST_RDWR:
+		return "Read Write";
+	case PKS_TEST_RDONLY:
+		return "Read Only";
+	default:
+		pr_err("BUG in test; invalid mode\n");
+		break;
+	}
+
+	return "";
+}
+
+struct pks_access_test {
+	enum pks_access_mode mode;
+	bool write;
+	bool exception;
+};
+
+static struct pks_access_test pkey_test_ary[] = {
+	/* disable both */
+	{ PKS_TEST_NO_ACCESS, true,  true },
+	{ PKS_TEST_NO_ACCESS, false, true },
+
+	/* enable both */
+	{ PKS_TEST_RDWR, true,  false },
+	{ PKS_TEST_RDWR, false, false },
+
+	/* enable read only */
+	{ PKS_TEST_RDONLY, true,  true },
+	{ PKS_TEST_RDONLY, false, false },
+};
+
+static int test_it(struct pks_test_ctx *ctx, struct pks_access_test *test,
+		   void *ptr)
+{
+	bool exception;
+	int ret = 0;
+
+	spin_lock(&test_lock);
+	WRITE_ONCE(test_armed_key, ctx->pkey);
+
+	if (test->write)
+		memcpy(ptr, ctx->data, 8);
+	else
+		memcpy(ctx->data, ptr, 8);
+
+	exception = exception_caught();
+
+	WRITE_ONCE(test_armed_key, 0);
+	spin_unlock(&test_lock);
+
+	if (test->exception != exception) {
+		pr_err("pkey test FAILED: mode %s; write %s; exception %s != %s\n",
+		       get_mode_str(test->mode),
+		       test->write ? "TRUE" : "FALSE",
+		       test->exception ? "TRUE" : "FALSE",
"TRUE" : "FALSE"); + ret = -EFAULT; + } + + return ret; +} + +static int run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + switch (test->mode) { + case PKS_TEST_NO_ACCESS: + pks_mknoaccess(ctx->pkey); + break; + case PKS_TEST_RDWR: + pks_mkrdwr(ctx->pkey); + break; + case PKS_TEST_RDONLY: + pks_mkread(ctx->pkey); + break; + default: + pr_err("BUG in test invalid mode\n"); + break; + } + + return test_it(ctx, test, ptr); +} + +static void *alloc_test_page(int pkey) +{ + return __vmalloc_node_range(PKS_TEST_MEM_SIZE, 1, VMALLOC_START, VMALLOC_END, + GFP_KERNEL, PAGE_KERNEL_PKEY(pkey), 0, + NUMA_NO_NODE, __builtin_return_address(0)); +} + +static void test_mem_access(struct pks_test_ctx *ctx) +{ + int i, rc; + u8 pkey; + void *ptr = NULL; + pte_t *ptep; + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + ctx->pass = false; + return; + } + + ptep = walk_table(ptr); + if (!ptep) { + pr_err("Failed to walk table???\n"); + ctx->pass = false; + goto done; + } + + pkey = pte_flags_pkey(ptep->pte); + pr_info("ptep flags 0x%lx pkey %u\n", + (unsigned long)ptep->pte, pkey); + + if (pkey != ctx->pkey) { + pr_err("invalid pkey found: %u, test_pkey: %u\n", + pkey, ctx->pkey); + ctx->pass = false; + goto unmap; + } + + if (!ctx->pks_cpu_enabled) { + pr_err("not CPU enabled; skipping access tests...\n"); + ctx->pass = true; + goto unmap; + } + + for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + rc = run_access_test(ctx, &pkey_test_ary[i], ptr); + + /* only save last error is fine */ + if (rc) + ctx->pass = false; + } + +unmap: + pte_unmap(ptep); +done: + vfree(ptr); +} + +static void pks_run_test(struct pks_test_ctx *ctx) +{ + ctx->pass = true; + + pr_info("\n"); + pr_info("\n"); + pr_info(" ***** BEGIN: Testing (CPU enabled : %s) *****\n", + ctx->pks_cpu_enabled ? "TRUE" : "FALSE"); + + if (ctx->pks_cpu_enabled) + on_each_cpu(report_pkey_settings, NULL, 1); + + pr_info(" BEGIN: pkey %d Testing\n", ctx->pkey); + test_mem_access(ctx); + pr_info(" END: PAGE_KERNEL_PKEY Testing : %s\n", + ctx->pass ? "PASS" : "FAIL"); + + pr_info(" ***** END: Testing *****\n"); + pr_info("\n"); + pr_info("\n"); +} + +static ssize_t pks_read_file(struct file *file, char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct pks_test_ctx *ctx = file->private_data; + char buf[32]; + unsigned int len; + + if (!ctx) + len = sprintf(buf, "not run\n"); + else + len = sprintf(buf, "%s\n", ctx->pass ? 
"PASS" : "FAIL"); + + return simple_read_from_buffer(user_buf, count, ppos, buf, len); +} + +static struct pks_test_ctx *alloc_ctx(const char *name) +{ + struct pks_test_ctx *ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + + if (!ctx) { + pr_err("Failed to allocate memory for test context\n"); + return ERR_PTR(-ENOMEM); + } + + ctx->pkey = pks_key_alloc(name); + if (ctx->pkey <= 0) { + pr_err("Failed to allocate memory for test context\n"); + kfree(ctx); + return ERR_PTR(-ENOMEM); + } + + ctx->pks_cpu_enabled = cpu_feature_enabled(X86_FEATURE_PKS); + sprintf(ctx->data, "%s", "DEADBEEF"); + return ctx; +} + +static void free_ctx(struct pks_test_ctx *ctx) +{ + pks_key_free(ctx->pkey); + kfree(ctx); +} + +static void run_exception_test(void) +{ + void *ptr = NULL; + bool pass = true; + struct pks_test_ctx *ctx; + + pr_info(" ***** BEGIN: exception checking\n"); + + ctx = alloc_ctx("Exception test"); + if (IS_ERR(ctx)) { + pr_err(" FAIL: no context\n"); + pass = false; + goto result; + } + ctx->pass = true; + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err(" FAIL: no vmalloc page\n"); + pass = false; + goto free_context; + } + + pks_mkread(ctx->pkey); + + spin_lock(&test_lock); + WRITE_ONCE(test_exception_ctx, ctx); + WRITE_ONCE(test_armed_key, ctx->pkey); + + memcpy(ptr, ctx->data, 8); + + if (!exception_caught()) { + pr_err(" FAIL: did not get an exception\n"); + pass = false; + } + + /* + * NOTE The exception code has to enable access (b00) to keep the + * fault from looping forever. So we don't see the write disabled + * restored but rather full access restored. Also note that as part + * of this test the exception callback attempted to disable access + * completely (b11) and so we ensure that we are seeing the proper + * thread value restored here. + */ + if (!check_pkrs(test_armed_key, 0)) { + pr_err(" FAIL: PKRS not restored\n"); + pass = false; + } + + if (!ctx->pass) + pass = false; + + WRITE_ONCE(test_armed_key, 0); + spin_unlock(&test_lock); + + vfree(ptr); +free_context: + free_ctx(ctx); +result: + pr_info(" ***** END: exception checking : %s\n", + pass ? "PASS" : "FAIL"); +} + +static void run_all(void) +{ + struct pks_test_ctx *ctx[PKS_NUM_KEYS]; + static char name[PKS_NUM_KEYS][64]; + int i; + + for (i = 1; i < PKS_NUM_KEYS; i++) { + sprintf(name[i], "pks ctx %d", i); + ctx[i] = alloc_ctx((const char *)name[i]); + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + pks_run_test(ctx[i]); + } + + for (i = 1; i < PKS_NUM_KEYS; i++) { + if (!IS_ERR(ctx[i])) + free_ctx(ctx[i]); + } + + run_exception_test(); +} + +static void crash_it(void) +{ + struct pks_test_ctx *ctx; + void *ptr; + + pr_warn(" ***** BEGIN: Unhandled fault test *****\n"); + + ctx = alloc_ctx("crashing kernel\n"); + + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err("Failed to vmalloc page???\n"); + ctx->pass = false; + return; + } + + pks_mknoaccess(ctx->pkey); + + spin_lock(&test_lock); + WRITE_ONCE(test_armed_key, 0); + /* This purposely faults */ + memcpy(ptr, ctx->data, 8); + spin_unlock(&test_lock); + + vfree(ptr); + free_ctx(ctx); +} + +static ssize_t pks_write_file(struct file *file, const char __user *user_buf, + size_t count, loff_t *ppos) +{ + char buf[2]; + struct pks_test_ctx *ctx = file->private_data; + + if (copy_from_user(buf, user_buf, 1)) + return -EFAULT; + buf[1] = '\0'; + + /* + * WARNING: Test "9" will crash the kernel. + * + * So we arm the test and print a warning. A second "9" will run the + * test. 
+	 */
+	if (!strcmp(buf, "9")) {
+		if (run_9) {
+			crash_it();
+			run_9 = false;
+		} else {
+			pr_warn("CAUTION: Test 9 will crash the kernel.\n");
+			pr_warn("         Specify 9 a second time to run\n");
+			pr_warn("         run any other test to clear\n");
+			run_9 = true;
+		}
+	} else {
+		run_9 = false;
+	}
+
+	/*
+	 * Test "3" will test allocating all keys.  Do it first without
+	 * using "ctx".
+	 */
+	if (!strcmp(buf, "3"))
+		run_all();
+
+	if (!ctx) {
+		ctx = alloc_ctx("pks test");
+		if (IS_ERR(ctx))
+			return -ENOMEM;
+		file->private_data = ctx;
+	}
+
+	if (!strcmp(buf, "0"))
+		pks_run_test(ctx);
+
+	/* start of context switch test */
+	if (!strcmp(buf, "1")) {
+		/* Ensure a known state to test context switch */
+		pks_mknoaccess(ctx->pkey);
+	}
+
+	/* After the context switch the MSR should be restored */
+	if (!strcmp(buf, "2") && ctx->pks_cpu_enabled) {
+		unsigned long reg_pkrs;
+		int access;
+
+		rdmsrl(MSR_IA32_PKRS, reg_pkrs);
+
+		access = (reg_pkrs >> (ctx->pkey * PKR_BITS_PER_PKEY)) &
+			  PKEY_ACCESS_MASK;
+		if (access != (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) {
+			ctx->pass = false;
+			pr_err("Context switch check failed\n");
+		}
+	}
+
+	return count;
+}
+
+static int pks_release_file(struct inode *inode, struct file *file)
+{
+	struct pks_test_ctx *ctx = file->private_data;
+
+	if (!ctx)
+		return 0;
+
+	free_ctx(ctx);
+	return 0;
+}
+
+static const struct file_operations fops_init_pks = {
+	.read = pks_read_file,
+	.write = pks_write_file,
+	.llseek = default_llseek,
+	.release = pks_release_file,
+};
+
+static int __init parse_pks_test_options(char *str)
+{
+	run_on_boot = true;
+
+	return 0;
+}
+early_param("pks-test-on-boot", parse_pks_test_options);
+
+static int __init pks_test_init(void)
+{
+	if (cpu_feature_enabled(X86_FEATURE_PKS)) {
+		if (run_on_boot)
+			run_all();
+
+		pks_test_dentry = debugfs_create_file("run_pks", 0600,
+						      arch_debugfs_dir, NULL,
+						      &fops_init_pks);
+	}
+
+	return 0;
+}
+late_initcall(pks_test_init);
+
+static void __exit pks_test_exit(void)
+{
+	debugfs_remove(pks_test_dentry);
+	pr_info("test exit\n");
+}
+module_exit(pks_test_exit);
+
+MODULE_AUTHOR("Intel Corporation");
+MODULE_LICENSE("GPL v2");
diff --git a/tools/testing/selftests/x86/Makefile b/tools/testing/selftests/x86/Makefile
index 6703c7906b71..f5c80f952eab 100644
--- a/tools/testing/selftests/x86/Makefile
+++ b/tools/testing/selftests/x86/Makefile
@@ -13,7 +13,8 @@ CAN_BUILD_WITH_NOPIE := $(shell ./check_cc.sh $(CC) trivial_program.c -no-pie)
 TARGETS_C_BOTHBITS := single_step_syscall sysret_ss_attrs syscall_nt test_mremap_vdso \
 			check_initial_reg_state sigreturn iopl ioperm \
 			test_vdso test_vsyscall mov_ss_trap \
-			syscall_arg_fault fsgsbase_restore
+			syscall_arg_fault fsgsbase_restore test_pks
+
 TARGETS_C_32BIT_ONLY := entry_from_vm86 test_syscall_vdso unwind_vdso \
 			test_FCMOV test_FCOMI test_FISTTP \
 			vdso_restorer
diff --git a/tools/testing/selftests/x86/test_pks.c b/tools/testing/selftests/x86/test_pks.c
new file mode 100644
index 000000000000..8037a2a9ff5f
--- /dev/null
+++ b/tools/testing/selftests/x86/test_pks.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define _GNU_SOURCE
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <unistd.h>
+
+int main(void)
+{
+	cpu_set_t cpuset;
+	char result[32];
+	pid_t pid;
+	int fd;
+
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	/*
+	 * Both processes run on CPU 0 so that they go through a context
+	 * switch.
+	 */
+	sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
+
+	pid = fork();
+	if (pid == 0) {
+		fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR);
+		if (fd < 0) {
+			printf("cannot open file\n");
+			return -1;
+		}
+
+		/* Allocate a pkey and run the access tests. */
+		write(fd, "0", 1);
+
+		/* Arm for context switch test */
+		write(fd, "1", 1);
+
+		/* Context switch out... */
+		sleep(4);
+
+		/* Check that the MSR was restored */
+		write(fd, "2", 1);
+	} else {
+		sleep(2);
+
+		fd = open("/sys/kernel/debug/x86/run_pks", O_RDWR);
+		if (fd < 0) {
+			printf("cannot open file\n");
+			return -1;
+		}
+
+		/* Run the test with an alternate pkey */
+		write(fd, "0", 1);
+	}
+
+	read(fd, result, 10);
+	printf("#PF, context switch, pkey allocation and free tests: %s\n",
+	       result);
+
+	close(fd);
+
+	return 0;
+}