From patchwork Mon Apr 25 03:39:23 2022
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 565910
From: "Kirill A. Shutemov"
To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel
Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv5 01/12] x86/boot/: Centralize __pa()/__va() definitions
Date: Mon, 25 Apr 2022 06:39:23 +0300
Message-Id: <20220425033934.68551-2-kirill.shutemov@linux.intel.com>
In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com>

Replace multiple __pa()/__va() definitions with a single one in misc.h.

Signed-off-by: Kirill A. Shutemov
--- arch/x86/boot/compressed/ident_map_64.c | 8 -------- arch/x86/boot/compressed/misc.h | 9 +++++++++ arch/x86/boot/compressed/sev.c | 2 -- 3 files changed, 9 insertions(+), 10 deletions(-) diff --git a/arch/x86/boot/compressed/ident_map_64.c b/arch/x86/boot/compressed/ident_map_64.c index f7213d0943b8..fe523ee1a19f 100644 --- a/arch/x86/boot/compressed/ident_map_64.c +++ b/arch/x86/boot/compressed/ident_map_64.c @@ -8,14 +8,6 @@ * Copyright (C) 2016 Kees Cook */ -/* - * Since we're dealing with identity mappings, physical and virtual - * addresses are the same, so override these defines which are ultimately - * used by the headers in misc.h. - */ -#define __pa(x) ((unsigned long)(x)) -#define __va(x) ((void *)((unsigned long)(x))) - /* No PAGE_TABLE_ISOLATION support needed either: */ #undef CONFIG_PAGE_TABLE_ISOLATION diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h index ea71cf3d64e1..9f7154a30d37 100644 --- a/arch/x86/boot/compressed/misc.h +++ b/arch/x86/boot/compressed/misc.h @@ -19,6 +19,15 @@ /* cpu_feature_enabled() cannot be used this early */ #define USE_EARLY_PGTABLE_L5 +/* + * Boot stub deals with identity mappings, physical and virtual addresses are + * the same, so override these defines. + * + * <asm/page.h> will not define them if they are already defined. + */ +#define __pa(x) ((unsigned long)(x)) +#define __va(x) ((void *)((unsigned long)(x))) + #include #include #include diff --git a/arch/x86/boot/compressed/sev.c b/arch/x86/boot/compressed/sev.c index 28bcf04c022e..4dcea0bc4fe4 100644 --- a/arch/x86/boot/compressed/sev.c +++ b/arch/x86/boot/compressed/sev.c @@ -106,9 +106,7 @@ static enum es_result vc_read_mem(struct es_em_ctxt *ctxt, } #undef __init -#undef __pa #define __init -#define __pa(x) ((unsigned long)(x)) #define __BOOT_COMPRESSED
From patchwork Mon Apr 25 03:39:25 2022
X-Patchwork-Submitter: "Kirill A. Shutemov"
Shutemov" X-Patchwork-Id: 565912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0FAA6C433FE for ; Mon, 25 Apr 2022 03:39:47 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240691AbiDYDms (ORCPT ); Sun, 24 Apr 2022 23:42:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55412 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S237995AbiDYDmr (ORCPT ); Sun, 24 Apr 2022 23:42:47 -0400 Received: from mga03.intel.com (mga03.intel.com [134.134.136.65]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7405F2ED7F; Sun, 24 Apr 2022 20:39:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650857984; x=1682393984; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=uq7WbkY0GvTQhFT7oa2kwzHtvnsgmaIdyx/9kdIYCD0=; b=Ulsfxi1ekNtLj4dkY7y+V8QrMwHzN0Wt07eYSoWF0/guIQTs4R0ftY5M sDPQqqOM7lZjYXMebDCVSh4l3XTKIeKBSRLHPGOjMo3LV2oFQk9Tg/n04 N8EBf2Rk54DVfuzRWfNrsAQji6yVAcJVa4J38zUDXXs4NKCR24WG9n4ly WpGa79OTLjPHECSqNtDCcK6kX0oEUmzWvDgIyzFlw6sT6+V9HAqWzNVSm jLoGxW2bffd7YeKhLd1vJX6b6wXgz01xpho40UC2yJSOlM8eZB+8KCAUF i/iZXNbXjDprDhOuHkf/9I+gxCZtQGhULpXNqRB9g3EujXiQihSzWSbCU w==; X-IronPort-AV: E=McAfee;i="6400,9594,10327"; a="264925569" X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="264925569" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 20:39:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="729520378" Received: from black.fi.intel.com ([10.237.72.28]) by orsmga005.jf.intel.com with ESMTP; 24 Apr 2022 20:39:35 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id A608B3A8; Mon, 25 Apr 2022 06:39:35 +0300 (EEST) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 03/12] efi/x86: Get full memory map in allocate_e820() Date: Mon, 25 Apr 2022 06:39:25 +0300 Message-Id: <20220425033934.68551-4-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Currently allocate_e820() only interested in the size of map and size of memory descriptor to determine how many e820 entries the kernel needs. UEFI Specification version 2.9 introduces a new memory type -- unaccepted memory. To track unaccepted memory kernel needs to allocate a bitmap. The size of the bitmap is dependent on the maximum physical address present in the system. A full memory map is required to find the maximum address. Modify allocate_e820() to get a full memory map. 
This is preparation for the next patch that implements handling of unaccepted memory in the EFI stub.

Signed-off-by: Kirill A. Shutemov
--- drivers/firmware/efi/libstub/x86-stub.c | 30 ++++++++++++------------- 1 file changed, 14 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 01ddd4502e28..5401985901f5 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -569,31 +569,29 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, } static efi_status_t allocate_e820(struct boot_params *params, + struct efi_boot_memmap *map, struct setup_data **e820ext, u32 *e820ext_size) { - unsigned long map_size, desc_size, map_key; efi_status_t status; - __u32 nr_desc, desc_version; + __u32 nr_desc; - /* Only need the size of the mem map and size of each mem descriptor */ - map_size = 0; - status = efi_bs_call(get_memory_map, &map_size, NULL, &map_key, - &desc_size, &desc_version); - if (status != EFI_BUFFER_TOO_SMALL) - return (status != EFI_SUCCESS) ? status : EFI_UNSUPPORTED; - - nr_desc = map_size / desc_size + EFI_MMAP_NR_SLACK_SLOTS; + status = efi_get_memory_map(map); + if (status != EFI_SUCCESS) + return status; - if (nr_desc > ARRAY_SIZE(params->e820_table)) { - u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table); + nr_desc = *map->map_size / *map->desc_size; + if (nr_desc > ARRAY_SIZE(params->e820_table) - EFI_MMAP_NR_SLACK_SLOTS) { + u32 nr_e820ext = nr_desc - ARRAY_SIZE(params->e820_table) + + EFI_MMAP_NR_SLACK_SLOTS; status = alloc_e820ext(nr_e820ext, e820ext, e820ext_size); if (status != EFI_SUCCESS) - return status; + goto out; } - - return EFI_SUCCESS; +out: + efi_bs_call(free_pool, *map->map); + return status; } struct exit_boot_struct { @@ -642,7 +640,7 @@ static efi_status_t exit_boot(struct boot_params *boot_params, void *handle) priv.boot_params = boot_params; priv.efi = &boot_params->efi_info; - status = allocate_e820(boot_params, &e820ext, &e820ext_size); + status = allocate_e820(boot_params, &map, &e820ext, &e820ext_size); if (status != EFI_SUCCESS) return status;
From patchwork Mon Apr 25 03:39:26 2022
X-Patchwork-Submitter: "Kirill A. Shutemov"
Shutemov" X-Patchwork-Id: 565911 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CC96C43219 for ; Mon, 25 Apr 2022 03:39:48 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240697AbiDYDms (ORCPT ); Sun, 24 Apr 2022 23:42:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55420 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240675AbiDYDmr (ORCPT ); Sun, 24 Apr 2022 23:42:47 -0400 Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9AAB82F021; Sun, 24 Apr 2022 20:39:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650857984; x=1682393984; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q/YmYQjfOe8AznqHyjTAnCkWN8UWX+2u2MP+z2dU+nQ=; b=GdWW9wB+wIvAVfCk793CgFYTXH12BalEyzcIwSJB/A3AVMSaxFwqr7qX DjutmfZftEiGfBfJXn2tfTd54Pv9lDMcwLQneoJiM92JgWMk9ofh7nDUo Q8cxwuMHxNWPH2ILwyMIiTVdaX4TL2YnbARKm3V+g8vgjKiffNlV84l0z Ps4KTZpOFv6wG0O9U4ryLk35Cyn4FxQ9J/XcI4N0Xs0qK8aKOFp1qWXOQ W+xrW7gPk3mxURDYTumRub9E/V3sE9gpVcYJG/6r0gjaG+QQxZhzl+VrH XG4Gc3BDYMyvg4i9EtHF059hlC6IKXiGnjQj+FNlwkYcQ6me+2QkYkaNo w==; X-IronPort-AV: E=McAfee;i="6400,9594,10327"; a="245055673" X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="245055673" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 20:39:42 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="563911372" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga007.fm.intel.com with ESMTP; 24 Apr 2022 20:39:35 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id B3F7D4E1; Mon, 25 Apr 2022 06:39:35 +0300 (EEST) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 04/12] x86/boot: Add infrastructure required for unaccepted memory support Date: Mon, 25 Apr 2022 06:39:26 +0300 Message-Id: <20220425033934.68551-5-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Pull functionality from the main kernel headers and lib/ that is required for unaccepted memory support. This is preparatory patch. The users for the functionality will come in following patches. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/boot/bitops.h | 40 +++++++++++++++ arch/x86/boot/compressed/align.h | 14 +++++ arch/x86/boot/compressed/bitmap.c | 43 ++++++++++++++++ arch/x86/boot/compressed/bitmap.h | 49 ++++++++++++++++++ arch/x86/boot/compressed/bits.h | 36 +++++++++++++ arch/x86/boot/compressed/compiler.h | 9 ++++ arch/x86/boot/compressed/find.c | 54 +++++++++++++++++++ arch/x86/boot/compressed/find.h | 80 +++++++++++++++++++++++++++++ arch/x86/boot/compressed/math.h | 37 +++++++++++++ arch/x86/boot/compressed/minmax.h | 61 ++++++++++++++++++++++ 10 files changed, 423 insertions(+) create mode 100644 arch/x86/boot/compressed/align.h create mode 100644 arch/x86/boot/compressed/bitmap.c create mode 100644 arch/x86/boot/compressed/bitmap.h create mode 100644 arch/x86/boot/compressed/bits.h create mode 100644 arch/x86/boot/compressed/compiler.h create mode 100644 arch/x86/boot/compressed/find.c create mode 100644 arch/x86/boot/compressed/find.h create mode 100644 arch/x86/boot/compressed/math.h create mode 100644 arch/x86/boot/compressed/minmax.h diff --git a/arch/x86/boot/bitops.h b/arch/x86/boot/bitops.h index 02e1dea11d94..61eb820ee402 100644 --- a/arch/x86/boot/bitops.h +++ b/arch/x86/boot/bitops.h @@ -41,4 +41,44 @@ static inline void set_bit(int nr, void *addr) asm("btsl %1,%0" : "+m" (*(u32 *)addr) : "Ir" (nr)); } +static __always_inline void __set_bit(long nr, volatile unsigned long *addr) +{ + asm volatile(__ASM_SIZE(bts) " %1,%0" : : "m" (*(volatile long *) addr), + "Ir" (nr) : "memory"); +} + +static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) +{ + asm volatile(__ASM_SIZE(btr) " %1,%0" : : "m" (*(volatile long *) addr), + "Ir" (nr) : "memory"); +} + +/** + * __ffs - find first set bit in word + * @word: The word to search + * + * Undefined if no bit exists, so code should check against 0 first. + */ +static __always_inline unsigned long __ffs(unsigned long word) +{ + asm("rep; bsf %1,%0" + : "=r" (word) + : "rm" (word)); + return word; +} + +/** + * ffz - find first zero bit in word + * @word: The word to search + * + * Undefined if no zero exists, so code should check against ~0UL first. 
+ */ +static __always_inline unsigned long ffz(unsigned long word) +{ + asm("rep; bsf %1,%0" + : "=r" (word) + : "r" (~word)); + return word; +} + #endif /* BOOT_BITOPS_H */ diff --git a/arch/x86/boot/compressed/align.h b/arch/x86/boot/compressed/align.h new file mode 100644 index 000000000000..c72ff4e8dd63 --- /dev/null +++ b/arch/x86/boot/compressed/align.h @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_ALIGN_H +#define BOOT_ALIGN_H +#define _LINUX_ALIGN_H /* Inhibit inclusion of */ + +/* @a is a power of 2 value */ +#define ALIGN(x, a) __ALIGN_KERNEL((x), (a)) +#define ALIGN_DOWN(x, a) __ALIGN_KERNEL((x) - ((a) - 1), (a)) +#define __ALIGN_MASK(x, mask) __ALIGN_KERNEL_MASK((x), (mask)) +#define PTR_ALIGN(p, a) ((typeof(p))ALIGN((unsigned long)(p), (a))) +#define PTR_ALIGN_DOWN(p, a) ((typeof(p))ALIGN_DOWN((unsigned long)(p), (a))) +#define IS_ALIGNED(x, a) (((x) & ((typeof(x))(a) - 1)) == 0) + +#endif diff --git a/arch/x86/boot/compressed/bitmap.c b/arch/x86/boot/compressed/bitmap.c new file mode 100644 index 000000000000..789ecadeb521 --- /dev/null +++ b/arch/x86/boot/compressed/bitmap.c @@ -0,0 +1,43 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "bitmap.h" + +void __bitmap_set(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_set = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_set = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_set >= 0) { + *p |= mask_to_set; + len -= bits_to_set; + bits_to_set = BITS_PER_LONG; + mask_to_set = ~0UL; + p++; + } + if (len) { + mask_to_set &= BITMAP_LAST_WORD_MASK(size); + *p |= mask_to_set; + } +} + +void __bitmap_clear(unsigned long *map, unsigned int start, int len) +{ + unsigned long *p = map + BIT_WORD(start); + const unsigned int size = start + len; + int bits_to_clear = BITS_PER_LONG - (start % BITS_PER_LONG); + unsigned long mask_to_clear = BITMAP_FIRST_WORD_MASK(start); + + while (len - bits_to_clear >= 0) { + *p &= ~mask_to_clear; + len -= bits_to_clear; + bits_to_clear = BITS_PER_LONG; + mask_to_clear = ~0UL; + p++; + } + if (len) { + mask_to_clear &= BITMAP_LAST_WORD_MASK(size); + *p &= ~mask_to_clear; + } +} diff --git a/arch/x86/boot/compressed/bitmap.h b/arch/x86/boot/compressed/bitmap.h new file mode 100644 index 000000000000..34cce38d94e9 --- /dev/null +++ b/arch/x86/boot/compressed/bitmap.h @@ -0,0 +1,49 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_BITMAP_H +#define BOOT_BITMAP_H +#define __LINUX_BITMAP_H /* Inhibit inclusion of */ + +#include "../bitops.h" +#include "../string.h" +#include "align.h" + +#define BITMAP_MEM_ALIGNMENT 8 +#define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1) + +#define BITMAP_FIRST_WORD_MASK(start) (~0UL << ((start) & (BITS_PER_LONG - 1))) +#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1))) + +#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) + +void __bitmap_set(unsigned long *map, unsigned int start, int len); +void __bitmap_clear(unsigned long *map, unsigned int start, int len); + +static __always_inline void bitmap_set(unsigned long *map, unsigned int start, + unsigned int nbits) +{ + if (__builtin_constant_p(nbits) && nbits == 1) + __set_bit(start, map); + else if (__builtin_constant_p(start & BITMAP_MEM_MASK) && + IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) && + __builtin_constant_p(nbits & BITMAP_MEM_MASK) && + IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT)) + memset((char *)map + start / 8, 0xff, nbits / 8); + 
else + __bitmap_set(map, start, nbits); +} + +static __always_inline void bitmap_clear(unsigned long *map, unsigned int start, + unsigned int nbits) +{ + if (__builtin_constant_p(nbits) && nbits == 1) + __clear_bit(start, map); + else if (__builtin_constant_p(start & BITMAP_MEM_MASK) && + IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) && + __builtin_constant_p(nbits & BITMAP_MEM_MASK) && + IS_ALIGNED(nbits, BITMAP_MEM_ALIGNMENT)) + memset((char *)map + start / 8, 0, nbits / 8); + else + __bitmap_clear(map, start, nbits); +} + +#endif diff --git a/arch/x86/boot/compressed/bits.h b/arch/x86/boot/compressed/bits.h new file mode 100644 index 000000000000..b00cd13c63c8 --- /dev/null +++ b/arch/x86/boot/compressed/bits.h @@ -0,0 +1,36 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_BITS_H +#define BOOT_BITS_H +#define __LINUX_BITS_H /* Inhibit inclusion of */ + +#ifdef __ASSEMBLY__ +#define _AC(X,Y) X +#define _AT(T,X) X +#else +#define __AC(X,Y) (X##Y) +#define _AC(X,Y) __AC(X,Y) +#define _AT(T,X) ((T)(X)) +#endif + +#define _UL(x) (_AC(x, UL)) +#define _ULL(x) (_AC(x, ULL)) +#define UL(x) (_UL(x)) +#define ULL(x) (_ULL(x)) + +#define BIT(nr) (UL(1) << (nr)) +#define BIT_ULL(nr) (ULL(1) << (nr)) +#define BIT_MASK(nr) (UL(1) << ((nr) % BITS_PER_LONG)) +#define BIT_WORD(nr) ((nr) / BITS_PER_LONG) +#define BIT_ULL_MASK(nr) (ULL(1) << ((nr) % BITS_PER_LONG_LONG)) +#define BIT_ULL_WORD(nr) ((nr) / BITS_PER_LONG_LONG) +#define BITS_PER_BYTE 8 + +#define GENMASK(h, l) \ + (((~UL(0)) - (UL(1) << (l)) + 1) & \ + (~UL(0) >> (BITS_PER_LONG - 1 - (h)))) + +#define GENMASK_ULL(h, l) \ + (((~ULL(0)) - (ULL(1) << (l)) + 1) & \ + (~ULL(0) >> (BITS_PER_LONG_LONG - 1 - (h)))) + +#endif diff --git a/arch/x86/boot/compressed/compiler.h b/arch/x86/boot/compressed/compiler.h new file mode 100644 index 000000000000..72e20cf01465 --- /dev/null +++ b/arch/x86/boot/compressed/compiler.h @@ -0,0 +1,9 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_COMPILER_H +#define BOOT_COMPILER_H +#define __LINUX_COMPILER_H /* Inhibit inclusion of */ + +# define likely(x) __builtin_expect(!!(x), 1) +# define unlikely(x) __builtin_expect(!!(x), 0) + +#endif diff --git a/arch/x86/boot/compressed/find.c b/arch/x86/boot/compressed/find.c new file mode 100644 index 000000000000..839be91aae52 --- /dev/null +++ b/arch/x86/boot/compressed/find.c @@ -0,0 +1,54 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include "bitmap.h" +#include "find.h" +#include "math.h" +#include "minmax.h" + +static __always_inline unsigned long swab(const unsigned long y) +{ +#if __BITS_PER_LONG == 64 + return __builtin_bswap32(y); +#else /* __BITS_PER_LONG == 32 */ + return __builtin_bswap64(y); +#endif +} + +unsigned long _find_next_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long nbits, + unsigned long start, unsigned long invert, unsigned long le) +{ + unsigned long tmp, mask; + + if (unlikely(start >= nbits)) + return nbits; + + tmp = addr1[start / BITS_PER_LONG]; + if (addr2) + tmp &= addr2[start / BITS_PER_LONG]; + tmp ^= invert; + + /* Handle 1st word. 
*/ + mask = BITMAP_FIRST_WORD_MASK(start); + if (le) + mask = swab(mask); + + tmp &= mask; + + start = round_down(start, BITS_PER_LONG); + + while (!tmp) { + start += BITS_PER_LONG; + if (start >= nbits) + return nbits; + + tmp = addr1[start / BITS_PER_LONG]; + if (addr2) + tmp &= addr2[start / BITS_PER_LONG]; + tmp ^= invert; + } + + if (le) + tmp = swab(tmp); + + return min(start + __ffs(tmp), nbits); +} diff --git a/arch/x86/boot/compressed/find.h b/arch/x86/boot/compressed/find.h new file mode 100644 index 000000000000..910d007a7ec5 --- /dev/null +++ b/arch/x86/boot/compressed/find.h @@ -0,0 +1,80 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_FIND_H +#define BOOT_FIND_H +#define __LINUX_FIND_H /* Inhibit inclusion of */ + +#include "../bitops.h" +#include "align.h" +#include "bits.h" +#include "compiler.h" + +unsigned long _find_next_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long nbits, + unsigned long start, unsigned long invert, unsigned long le); + +/** + * find_next_bit - find the next set bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The bitmap size in bits + * + * Returns the bit number for the next set bit + * If no bits are set, returns @size. + */ +static inline +unsigned long find_next_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + if (small_const_nbits(size)) { + unsigned long val; + + if (unlikely(offset >= size)) + return size; + + val = *addr & GENMASK(size - 1, offset); + return val ? __ffs(val) : size; + } + + return _find_next_bit(addr, NULL, size, offset, 0UL, 0); +} + +/** + * find_next_zero_bit - find the next cleared bit in a memory region + * @addr: The address to base the search on + * @offset: The bitnumber to start searching at + * @size: The bitmap size in bits + * + * Returns the bit number of the next zero bit + * If no bits are zero, returns @size. + */ +static inline +unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, + unsigned long offset) +{ + if (small_const_nbits(size)) { + unsigned long val; + + if (unlikely(offset >= size)) + return size; + + val = *addr | ~GENMASK(size - 1, offset); + return val == ~0UL ? size : ffz(val); + } + + return _find_next_bit(addr, NULL, size, offset, ~0UL, 0); +} + +/** + * for_each_set_bitrange_from - iterate over all set bit ranges [b; e) + * @b: bit offset of start of current bitrange (first set bit); must be initialized + * @e: bit offset of end of current bitrange (first unset bit) + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_set_bitrange_from(b, e, addr, size) \ + for ((b) = find_next_bit((addr), (size), (b)), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1); \ + (b) < (size); \ + (b) = find_next_bit((addr), (size), (e) + 1), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1)) +#endif diff --git a/arch/x86/boot/compressed/math.h b/arch/x86/boot/compressed/math.h new file mode 100644 index 000000000000..b8b9fccb3c03 --- /dev/null +++ b/arch/x86/boot/compressed/math.h @@ -0,0 +1,37 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_MATH_H +#define BOOT_MATH_H +#define __LINUX_MATH_H /* Inhibit inclusion of */ + +/* + * + * This looks more complex than it should be. But we need to + * get the type for the ~ right in round_down (it needs to be + * as wide as the result!), and we want to evaluate the macro + * arguments just once each. 
+ */ +#define __round_mask(x, y) ((__typeof__(x))((y)-1)) + +/** + * round_up - round up to next specified power of 2 + * @x: the value to round + * @y: multiple to round up to (must be a power of 2) + * + * Rounds @x up to next multiple of @y (which must be a power of 2). + * To perform arbitrary rounding up, use roundup() below. + */ +#define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1) + +/** + * round_down - round down to next specified power of 2 + * @x: the value to round + * @y: multiple to round down to (must be a power of 2) + * + * Rounds @x down to next multiple of @y (which must be a power of 2). + * To perform arbitrary rounding down, use rounddown() below. + */ +#define round_down(x, y) ((x) & ~__round_mask(x, y)) + +#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) + +#endif diff --git a/arch/x86/boot/compressed/minmax.h b/arch/x86/boot/compressed/minmax.h new file mode 100644 index 000000000000..fbf640cfce32 --- /dev/null +++ b/arch/x86/boot/compressed/minmax.h @@ -0,0 +1,61 @@ +// SPDX-License-Identifier: GPL-2.0-only +#ifndef BOOT_MINMAX_H +#define BOOT_MINMAX_H +#define __LINUX_MINMAX_H /* Inhibit inclusion of */ + +/* + * This returns a constant expression while determining if an argument is + * a constant expression, most importantly without evaluating the argument. + * Glory to Martin Uecker + */ +#define __is_constexpr(x) \ + (sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8))) + +/* + * min()/max()/clamp() macros must accomplish three things: + * + * - avoid multiple evaluations of the arguments (so side-effects like + * "x++" happen only once) when non-constant. + * - perform strict type-checking (to generate warnings instead of + * nasty runtime surprises). See the "unnecessary" pointer comparison + * in __typecheck(). + * - retain result as a constant expressions when called with only + * constant expressions (to avoid tripping VLA warnings in stack + * allocation usage). + */ +#define __typecheck(x, y) \ + (!!(sizeof((typeof(x) *)1 == (typeof(y) *)1))) + +#define __no_side_effects(x, y) \ + (__is_constexpr(x) && __is_constexpr(y)) + +#define __safe_cmp(x, y) \ + (__typecheck(x, y) && __no_side_effects(x, y)) + +#define __cmp(x, y, op) ((x) op (y) ? (x) : (y)) + +#define __cmp_once(x, y, unique_x, unique_y, op) ({ \ + typeof(x) unique_x = (x); \ + typeof(y) unique_y = (y); \ + __cmp(unique_x, unique_y, op); }) + +#define __careful_cmp(x, y, op) \ + __builtin_choose_expr(__safe_cmp(x, y), \ + __cmp(x, y, op), \ + __cmp_once(x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y), op)) + +/** + * min - return minimum of two values of the same or compatible types + * @x: first value + * @y: second value + */ +#define min(x, y) __careful_cmp(x, y, <) + +/** + * max - return maximum of two values of the same or compatible types + * @x: first value + * @y: second value + */ +#define max(x, y) __careful_cmp(x, y, >) + +#endif From patchwork Mon Apr 25 03:39:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. 
Shutemov" X-Patchwork-Id: 565909 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1FDB3C43217 for ; Mon, 25 Apr 2022 03:40:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240702AbiDYDnE (ORCPT ); Sun, 24 Apr 2022 23:43:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55700 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240708AbiDYDmx (ORCPT ); Sun, 24 Apr 2022 23:42:53 -0400 Received: from mga06.intel.com (mga06b.intel.com [134.134.136.31]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 707A82F026; Sun, 24 Apr 2022 20:39:50 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650857990; x=1682393990; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=w5IxTrKVM9UmdAGSefOOJ0UE+57s2glhh61AAAfqqWs=; b=F/GWquMjXrWg5EEJbpq+VJv3WLxecUQTsx/vdwNVsqtFjm/HKFrcpBFV RM3MBgu03vzGi5JYjPRO/+39DLAQNeWWzbFD15z1H6s27tmjKC4r247pa lgPaROIBA6JXafIP0U6emVkl3T87Nbub+2OhAx47zhcQBPCgPPAL1wFJY Qg3BEhNfOKVgJ/6R8ZCxWDnDih+G6hIaBAO8B9N3VjbCx9E3XqFC6/JKG LlzL62vs4uu/zRTVGt7Z0gGCoY9nm+GKA5VsELQkF6C1uuzU1N9L0/2wf nqXClhNqUur5Xr+5y/d9bNf0++Q8qlFaFsIUgyCy3iNdkdnEmNe8PK3YY w==; X-IronPort-AV: E=McAfee;i="6400,9594,10327"; a="325612370" X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="325612370" Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 20:39:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="659959896" Received: from black.fi.intel.com ([10.237.72.28]) by fmsmga002.fm.intel.com with ESMTP; 24 Apr 2022 20:39:42 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id C1B54530; Mon, 25 Apr 2022 06:39:35 +0300 (EEST) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 05/12] efi/x86: Implement support for unaccepted memory Date: Mon, 25 Apr 2022 06:39:27 +0300 Message-Id: <20220425033934.68551-6-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org UEFI Specification version 2.9 introduces the concept of memory acceptance: Some Virtual Machine platforms, such as Intel TDX or AMD SEV-SNP, requiring memory to be accepted before it can be used by the guest. Accepting happens via a protocol specific for the Virtual Machine platform. Accepting memory is costly and it makes VMM allocate memory for the accepted guest physical address range. It's better to postpone memory acceptance until memory is needed. It lowers boot time and reduces memory overhead. 
The kernel needs to know what memory has been accepted. Firmware communicates this information via the memory map: a new memory type -- EFI_UNACCEPTED_MEMORY -- indicates such memory.

Range-based tracking works fine for firmware, but it gets bulky for the kernel: e820 would have to be modified on every page acceptance. That leads to table fragmentation, and there's a limited number of entries in the e820 table.

Another option is to mark such memory as usable in e820 and track whether a range has been accepted in a bitmap. One bit in the bitmap represents 2MiB in the address space: one 4k page is enough to track 64GiB of physical address space (4096 bytes * 8 bits per byte * 2MiB per bit = 64GiB). In the worst-case scenario -- a huge hole in the middle of the address space -- it needs 256MiB to handle 4PiB of address space.

Any unaccepted memory that is not aligned to 2M gets accepted upfront.

The bitmap is allocated and constructed in the EFI stub and passed down to the kernel via boot_params. allocate_e820() allocates the bitmap if unaccepted memory is present, according to the maximum address in the memory map.

The same boot_params.unaccepted_memory can be used to pass the bitmap between two kernels on kexec, but that use-case is not yet implemented. Make KEXEC and UNACCEPTED_MEMORY mutually exclusive for now.

The implementation requires some basic helpers in the boot stub. They are provided by linux/ includes in the main kernel image, but are not present in the boot stub. Create a copy of the required functionality in the boot stub.

Signed-off-by: Kirill A. Shutemov
--- Documentation/x86/zero-page.rst | 1 + arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/mem.c | 68 +++++++++++++++++++++++ arch/x86/include/asm/unaccepted_memory.h | 10 ++++ arch/x86/include/uapi/asm/bootparam.h | 2 +- drivers/firmware/efi/Kconfig | 15 ++++++ drivers/firmware/efi/efi.c | 1 + drivers/firmware/efi/libstub/x86-stub.c | 69 ++++++++++++++++++++++++ include/linux/efi.h | 3 +- 9 files changed, 168 insertions(+), 2 deletions(-) create mode 100644 arch/x86/boot/compressed/mem.c create mode 100644 arch/x86/include/asm/unaccepted_memory.h diff --git a/Documentation/x86/zero-page.rst b/Documentation/x86/zero-page.rst index f088f5881666..bb8e9cb093cc 100644 --- a/Documentation/x86/zero-page.rst +++ b/Documentation/x86/zero-page.rst @@ -19,6 +19,7 @@ Offset/Size Proto Name Meaning 058/008 ALL tboot_addr Physical address of tboot shared page 060/010 ALL ist_info Intel SpeedStep (IST) BIOS support information (struct ist_info) +078/008 ALL unaccepted_memory Bitmap of unaccepted memory (1bit == 2M) 080/010 ALL hd0_info hd0 disk parameter, OBSOLETE!! 090/010 ALL hd1_info hd1 disk parameter, OBSOLETE!!
0A0/010 ALL sys_desc_table System description table (struct sys_desc_table), diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 8fd0e6ae2e1f..7f672f7e2fea 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -102,6 +102,7 @@ endif vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o vmlinux-objs-$(CONFIG_INTEL_TDX_GUEST) += $(obj)/tdx.o $(obj)/tdcall.o +vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY) += $(obj)/bitmap.o $(obj)/mem.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c new file mode 100644 index 000000000000..415df0d3bc81 --- /dev/null +++ b/arch/x86/boot/compressed/mem.c @@ -0,0 +1,68 @@ +// SPDX-License-Identifier: GPL-2.0-only + +#include "../cpuflags.h" +#include "bitmap.h" +#include "error.h" +#include "math.h" + +#define PMD_SHIFT 21 +#define PMD_SIZE (_AC(1, UL) << PMD_SHIFT) +#define PMD_MASK (~(PMD_SIZE - 1)) + +static inline void __accept_memory(phys_addr_t start, phys_addr_t end) +{ + /* Platform-specific memory-acceptance call goes here */ + error("Cannot accept memory"); +} + +/* + * The accepted memory bitmap only works at PMD_SIZE granularity. If a request + * comes in to mark memory as unaccepted which is not PMD_SIZE-aligned, simply + * accept the memory now since it can not be *marked* as unaccepted. + */ +void process_unaccepted_memory(struct boot_params *params, u64 start, u64 end) +{ + /* + * Accept small regions that might not be able to be represented + * in the bitmap. This is a bit imprecise and may accept some + * areas that could have been represented in the bitmap instead. + * + * Consider a case like this: + * + * | 4k | 2044k | 2048k | + * ^ 0x0 ^ 2MB ^ 4MB + * + * all memory in the range is unaccepted, except for the first 4k. + * The second 2M can be represented in the bitmap, but the kernel accepts + * it right away. The imprecision makes the code simpler by ensuring that + * at least one bit will be set in the bitmap below. + */ + if (end - start < 2 * PMD_SIZE) { + __accept_memory(start, end); + return; + } + + /* + * No matter how the start and end are aligned, at least one unaccepted + * PMD_SIZE area will remain. + */ + + /* Immediately accept a piece at the start that is smaller than PMD_SIZE */ + if (start & ~PMD_MASK) { + __accept_memory(start, round_up(start, PMD_SIZE)); + start = round_up(start, PMD_SIZE); + } + + /* Immediately accept a piece at the end that is smaller than PMD_SIZE */ + if (end & ~PMD_MASK) { + __accept_memory(round_down(end, PMD_SIZE), end); + end = round_down(end, PMD_SIZE); + } + + /* + * 'start' and 'end' are now both PMD-aligned. Record the range as + * being unaccepted. + */ + bitmap_set((unsigned long *)params->unaccepted_memory, + start / PMD_SIZE, (end - start) / PMD_SIZE); +} diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h new file mode 100644 index 000000000000..df0736d32858 --- /dev/null +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (C) 2020 Intel Corporation */ +#ifndef _ASM_X86_UNACCEPTED_MEMORY_H +#define _ASM_X86_UNACCEPTED_MEMORY_H + +struct boot_params; + +void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num); + +#endif diff --git a/arch/x86/include/uapi/asm/bootparam.h b/arch/x86/include/uapi/asm/bootparam.h index b25d3f82c2f3..f7a32176f301 100644 --- a/arch/x86/include/uapi/asm/bootparam.h +++ b/arch/x86/include/uapi/asm/bootparam.h @@ -179,7 +179,7 @@ struct boot_params { __u64 tboot_addr; /* 0x058 */ struct ist_info ist_info; /* 0x060 */ __u64 acpi_rsdp_addr; /* 0x070 */ - __u8 _pad3[8]; /* 0x078 */ + __u64 unaccepted_memory; /* 0x078 */ __u8 hd0_info[16]; /* obsolete! */ /* 0x080 */ __u8 hd1_info[16]; /* obsolete! */ /* 0x090 */ struct sys_desc_table sys_desc_table; /* obsolete!
*/ /* 0x0a0 */ diff --git a/drivers/firmware/efi/Kconfig b/drivers/firmware/efi/Kconfig index 2c3dac5ecb36..e8048586aefa 100644 --- a/drivers/firmware/efi/Kconfig +++ b/drivers/firmware/efi/Kconfig @@ -243,6 +243,21 @@ config EFI_DISABLE_PCI_DMA options "efi=disable_early_pci_dma" or "efi=no_disable_early_pci_dma" may be used to override this option. +config UNACCEPTED_MEMORY + bool + depends on EFI_STUB + depends on !KEXEC_CORE + help + Some Virtual Machine platforms, such as Intel TDX, require + some memory to be "accepted" by the guest before it can be used. + This mechanism helps prevent malicious hosts from making changes + to guest memory. + + UEFI specification v2.9 introduced EFI_UNACCEPTED_MEMORY memory type. + + This option adds support for unaccepted memory and makes such memory + usable by the kernel. + endmenu config EFI_EMBEDDED_FIRMWARE diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index 5502e176d51b..2c055afb1b11 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -747,6 +747,7 @@ static __initdata char memory_type_name[][13] = { "MMIO Port", "PAL Code", "Persistent", + "Unaccepted", }; char * __init efi_md_typeattr_format(char *buf, size_t size, diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 5401985901f5..f9b88174209e 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -15,6 +15,7 @@ #include #include #include +#include #include "efistub.h" @@ -504,6 +505,17 @@ setup_e820(struct boot_params *params, struct setup_data *e820ext, u32 e820ext_s e820_type = E820_TYPE_PMEM; break; + case EFI_UNACCEPTED_MEMORY: + if (!IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) { + efi_warn_once("The system has unaccepted memory," + " but kernel does not support it\n"); + efi_warn_once("Consider enabling UNACCEPTED_MEMORY\n"); + continue; + } + e820_type = E820_TYPE_RAM; + process_unaccepted_memory(params, d->phys_addr, + d->phys_addr + PAGE_SIZE * d->num_pages); + break; default: continue; } @@ -568,6 +580,59 @@ static efi_status_t alloc_e820ext(u32 nr_desc, struct setup_data **e820ext, return status; } +static efi_status_t allocate_unaccepted_memory(struct boot_params *params, + __u32 nr_desc, + struct efi_boot_memmap *map) +{ + unsigned long *mem = NULL; + u64 size, max_addr = 0; + efi_status_t status; + bool found = false; + int i; + + /* Check if there's any unaccepted memory and find the max address */ + for (i = 0; i < nr_desc; i++) { + efi_memory_desc_t *d; + + d = efi_early_memdesc_ptr(*map->map, *map->desc_size, i); + if (d->type == EFI_UNACCEPTED_MEMORY) + found = true; + if (d->phys_addr + d->num_pages * PAGE_SIZE > max_addr) + max_addr = d->phys_addr + d->num_pages * PAGE_SIZE; + } + + if (!found) { + params->unaccepted_memory = 0; + return EFI_SUCCESS; + } + + /* + * If unaccepted memory is present allocate a bitmap to track what + * memory has to be accepted before access. + * + * One bit in the bitmap represents 2MiB in the address space: + * A 4k bitmap can track 64GiB of physical address space. + * + * In the worst case scenario -- a huge hole in the middle of the + * address space -- It needs 256MiB to handle 4PiB of the address + * space. + * + * TODO: handle situation if params->unaccepted_memory has already set. + * It's required to deal with kexec. + * + * The bitmap will be populated in setup_e820() according to the memory + * map after efi_exit_boot_services(). 
+ */ + size = DIV_ROUND_UP(max_addr, PMD_SIZE * BITS_PER_BYTE); + status = efi_allocate_pages(size, (unsigned long *)&mem, ULONG_MAX); + if (status == EFI_SUCCESS) { + memset(mem, 0, size); + params->unaccepted_memory = (unsigned long)mem; + } + + return status; +} + static efi_status_t allocate_e820(struct boot_params *params, struct efi_boot_memmap *map, struct setup_data **e820ext, @@ -589,6 +654,10 @@ static efi_status_t allocate_e820(struct boot_params *params, if (status != EFI_SUCCESS) goto out; } + + if (IS_ENABLED(CONFIG_UNACCEPTED_MEMORY)) + status = allocate_unaccepted_memory(params, nr_desc, map); + out: efi_bs_call(free_pool, *map->map); return status; diff --git a/include/linux/efi.h b/include/linux/efi.h index ccd4d3f91c98..b0240fdcaf5b 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -108,7 +108,8 @@ typedef struct { #define EFI_MEMORY_MAPPED_IO_PORT_SPACE 12 #define EFI_PAL_CODE 13 #define EFI_PERSISTENT_MEMORY 14 -#define EFI_MAX_MEMORY_TYPE 15 +#define EFI_UNACCEPTED_MEMORY 15 +#define EFI_MAX_MEMORY_TYPE 16 /* Attribute values: */ #define EFI_MEMORY_UC ((u64)0x0000000000000001ULL) /* uncached */
From patchwork Mon Apr 25 03:39:32 2022
X-Patchwork-Submitter: "Kirill A. Shutemov"
X-Patchwork-Id: 565906
From: "Kirill A. Shutemov"
Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 10/12] x86/tdx: Unaccepted memory support Date: Mon, 25 Apr 2022 06:39:32 +0300 Message-Id: <20220425033934.68551-11-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org All preparations are complete. Hookup TDX-specific code to accept memory. Accepting the memory is the same process as converting memory from shared to private: kernel notifies VMM with MAP_GPA hypercall and then accept pages with ACCEPT_PAGE module call. The implementation in core kernel uses tdx_enc_status_changed(). It already used for converting memory to shared and back for I/O transactions. Boot stub provides own implementation of tdx_accept_memory(). It is similar in structure to tdx_enc_status_changed(), but only cares about converting memory to private. Signed-off-by: Kirill A. Shutemov --- arch/x86/Kconfig | 1 + arch/x86/boot/compressed/mem.c | 24 ++++++++- arch/x86/boot/compressed/tdx.c | 85 +++++++++++++++++++++++++++++++ arch/x86/coco/tdx/tdx.c | 31 +++++++---- arch/x86/include/asm/shared/tdx.h | 2 + arch/x86/mm/unaccepted_memory.c | 9 +++- 6 files changed, 141 insertions(+), 11 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 7021ec725dd3..e4c31dbea6d7 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -885,6 +885,7 @@ config INTEL_TDX_GUEST select ARCH_HAS_CC_PLATFORM select X86_MEM_ENCRYPT select X86_MCE + select UNACCEPTED_MEMORY help Support running as a guest under Intel TDX. Without this support, the guest kernel can not boot or run under TDX. 
diff --git a/arch/x86/boot/compressed/mem.c b/arch/x86/boot/compressed/mem.c index b5058c975d26..539fff27de49 100644 --- a/arch/x86/boot/compressed/mem.c +++ b/arch/x86/boot/compressed/mem.c @@ -5,6 +5,8 @@ #include "error.h" #include "find.h" #include "math.h" +#include "tdx.h" +#include #define PMD_SHIFT 21 #define PMD_SIZE (_AC(1, UL) << PMD_SHIFT) @@ -12,10 +14,30 @@ extern struct boot_params *boot_params; +static bool is_tdx_guest(void) +{ + static bool once; + static bool is_tdx; + + if (!once) { + u32 eax, sig[3]; + + cpuid_count(TDX_CPUID_LEAF_ID, 0, &eax, + &sig[0], &sig[2], &sig[1]); + is_tdx = !memcmp(TDX_IDENT, sig, sizeof(sig)); + once = true; + } + + return is_tdx; +} + static inline void __accept_memory(phys_addr_t start, phys_addr_t end) { /* Platform-specific memory-acceptance call goes here */ - error("Cannot accept memory"); + if (is_tdx_guest()) + tdx_accept_memory(start, end); + else + error("Cannot accept memory"); } /* diff --git a/arch/x86/boot/compressed/tdx.c b/arch/x86/boot/compressed/tdx.c index 918a7606f53c..57fd2bf28484 100644 --- a/arch/x86/boot/compressed/tdx.c +++ b/arch/x86/boot/compressed/tdx.c @@ -3,12 +3,14 @@ #include "../cpuflags.h" #include "../string.h" #include "../io.h" +#include "align.h" #include "error.h" #include #include #include +#include /* Called from __tdx_hypercall() for unrecoverable failure */ void __tdx_hypercall_failed(void) @@ -75,3 +77,86 @@ void early_tdx_detect(void) pio_ops.f_outb = tdx_outb; pio_ops.f_outw = tdx_outw; } + +enum pg_level { + PG_LEVEL_4K, + PG_LEVEL_2M, + PG_LEVEL_1G, +}; + +#define PTE_SHIFT 9 + +static bool try_accept_one(phys_addr_t *start, unsigned long len, + enum pg_level pg_level) +{ + unsigned long accept_size = PAGE_SIZE << (pg_level * PTE_SHIFT); + u64 tdcall_rcx; + u8 page_size; + + if (!IS_ALIGNED(*start, accept_size)) + return false; + + if (len < accept_size) + return false; + + /* + * Pass the page physical address to the TDX module to accept the + * pending, private page. + * + * Bits 2:0 of RCX encode page size: 0 - 4K, 1 - 2M, 2 - 1G. + */ + switch (pg_level) { + case PG_LEVEL_4K: + page_size = 0; + break; + case PG_LEVEL_2M: + page_size = 1; + break; + case PG_LEVEL_1G: + page_size = 2; + break; + default: + return false; + } + + tdcall_rcx = *start | page_size; + if (__tdx_module_call(TDX_ACCEPT_PAGE, tdcall_rcx, 0, 0, 0, NULL)) + return false; + + *start += accept_size; + return true; +} + +void tdx_accept_memory(phys_addr_t start, phys_addr_t end) +{ + /* + * Notify the VMM about page mapping conversion. More info about ABI + * can be found in TDX Guest-Host-Communication Interface (GHCI), + * section "TDG.VP.VMCALL" + */ + if (_tdx_hypercall(TDVMCALL_MAP_GPA, start, end - start, 0, 0)) + error("Accepting memory failed\n"); + + /* + * For shared->private conversion, accept the page using + * TDX_ACCEPT_PAGE TDX module call. + */ + while (start < end) { + unsigned long len = end - start; + + /* + * Try larger accepts first. It gives chance to VMM to keep + * 1G/2M SEPT entries where possible and speeds up process by + * cutting number of hypercalls (if successful). 
+ */ + + if (try_accept_one(&start, len, PG_LEVEL_1G)) + continue; + + if (try_accept_one(&start, len, PG_LEVEL_2M)) + continue; + + if (!try_accept_one(&start, len, PG_LEVEL_4K)) + error("Accepting memory failed\n"); + } +} diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index ddb60a87b426..ab4deb897942 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -580,16 +580,9 @@ static bool try_accept_one(phys_addr_t *start, unsigned long len, return true; } -/* - * Inform the VMM of the guest's intent for this physical page: shared with - * the VMM or private to the guest. The VMM is expected to change its mapping - * of the page in response. - */ -static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc) +static bool tdx_enc_status_changed_phys(phys_addr_t start, phys_addr_t end, + bool enc) { - phys_addr_t start = __pa(vaddr); - phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE); - if (!enc) { /* Set the shared (decrypted) bits: */ start |= cc_mkdec(0); @@ -634,6 +627,25 @@ static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc) return true; } +void tdx_accept_memory(phys_addr_t start, phys_addr_t end) +{ + if (!tdx_enc_status_changed_phys(start, end, true)) + panic("Accepting memory failed\n"); +} + +/* + * Inform the VMM of the guest's intent for this physical page: shared with + * the VMM or private to the guest. The VMM is expected to change its mapping + * of the page in response. + */ +static bool tdx_enc_status_changed(unsigned long vaddr, int numpages, bool enc) +{ + phys_addr_t start = __pa(vaddr); + phys_addr_t end = __pa(vaddr + numpages * PAGE_SIZE); + + return tdx_enc_status_changed_phys(start, end, enc); +} + void __init tdx_early_init(void) { u64 cc_mask; @@ -645,6 +657,7 @@ void __init tdx_early_init(void) return; setup_force_cpu_cap(X86_FEATURE_TDX_GUEST); + setup_clear_cpu_cap(X86_FEATURE_MCE); cc_set_vendor(CC_VENDOR_INTEL); cc_mask = get_cc_mask(); diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h index 956ced04c3be..97534c334473 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -81,5 +81,7 @@ struct tdx_module_output { u64 __tdx_module_call(u64 fn, u64 rcx, u64 rdx, u64 r8, u64 r9, struct tdx_module_output *out); +void tdx_accept_memory(phys_addr_t start, phys_addr_t end); + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_SHARED_TDX_H */ diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c index 1327f64d5205..de0790af1824 100644 --- a/arch/x86/mm/unaccepted_memory.c +++ b/arch/x86/mm/unaccepted_memory.c @@ -6,6 +6,7 @@ #include #include +#include #include /* Protects unaccepted memory bitmap */ @@ -29,7 +30,13 @@ void accept_memory(phys_addr_t start, phys_addr_t end) unsigned long len = range_end - range_start; /* Platform-specific memory-acceptance call goes here */ - panic("Cannot accept memory"); + if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) { + tdx_accept_memory(range_start * PMD_SIZE, + range_end * PMD_SIZE); + } else { + panic("Cannot accept memory"); + } + bitmap_clear(unaccepted_memory, range_start, len); } spin_unlock_irqrestore(&unaccepted_memory_lock, flags);
From patchwork Mon Apr 25 03:39:33 2022
X-Patchwork-Submitter: "Kirill A. Shutemov"
Shutemov" X-Patchwork-Id: 565907 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3EEE1C433EF for ; Mon, 25 Apr 2022 03:40:15 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240721AbiDYDnO (ORCPT ); Sun, 24 Apr 2022 23:43:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56162 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240726AbiDYDnC (ORCPT ); Sun, 24 Apr 2022 23:43:02 -0400 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9C89D2F01D; Sun, 24 Apr 2022 20:39:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1650857994; x=1682393994; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5hPx0r3OWxS+lz/D+l0BiGQBbeJJHb52sFjAF/68dVQ=; b=Bpd6NcKA0XJRrq2U5Jl6i/oyVXFhvsM76P4Lwcljbhqf60OiH3TVVl+1 W1oiy3x1Cs8hS6wlE+Uk2/6fi+mBS25xH8wO3DnYRMmklRYQR3I+fV++4 7i1jbZnW1D+aqk0Y7yIgRi/pr1uXYU1OoVsXRqShiba//mztPlO8nFM1C X7kx0MIQoGyYklb+iRVVpkiOSznOd7Kk5gpKhAw8VGD44lOJj7A/vvmlz X7B/nSxJ/hbFQpZ/bQv0Taz9po55XLtQUUqybWiWotTO3LhRsmDnVttvA kJ2b03AqqIl0V4My9m7j7O3HxMGk3fIcGsVsBrSj47h99GdhykDvMkCbK w==; X-IronPort-AV: E=McAfee;i="6400,9594,10327"; a="247051030" X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="247051030" Received: from fmsmga003.fm.intel.com ([10.253.24.29]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Apr 2022 20:39:51 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.90,287,1643702400"; d="scan'208";a="649514099" Received: from black.fi.intel.com ([10.237.72.28]) by FMSMGA003.fm.intel.com with ESMTP; 24 Apr 2022 20:39:43 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 1D2DC739; Mon, 25 Apr 2022 06:39:36 +0300 (EEST) From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 11/12] mm/vmstat: Add counter for memory accepting Date: Mon, 25 Apr 2022 06:39:33 +0300 Message-Id: <20220425033934.68551-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org The counter increased every time kernel accepts a memory region. The counter allows to see if memory acceptation is still ongoing and contributes to memory allocation latency. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/mm/unaccepted_memory.c | 1 + include/linux/vm_event_item.h | 3 +++ mm/vmstat.c | 3 +++ 3 files changed, 7 insertions(+) diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c index de0790af1824..65cd49b93c50 100644 --- a/arch/x86/mm/unaccepted_memory.c +++ b/arch/x86/mm/unaccepted_memory.c @@ -38,6 +38,7 @@ void accept_memory(phys_addr_t start, phys_addr_t end) } bitmap_clear(unaccepted_memory, range_start, len); + count_vm_events(ACCEPT_MEMORY, len * PMD_SIZE / PAGE_SIZE); } spin_unlock_irqrestore(&unaccepted_memory_lock, flags); } diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 16a0a4fd000b..6a468164a2f9 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -136,6 +136,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, DIRECT_MAP_LEVEL3_SPLIT, +#endif +#ifdef CONFIG_UNACCEPTED_MEMORY + ACCEPT_MEMORY, #endif NR_VM_EVENT_ITEMS }; diff --git a/mm/vmstat.c b/mm/vmstat.c index b75b1a64b54c..4c9197f32406 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -1397,6 +1397,9 @@ const char * const vmstat_text[] = { "direct_map_level2_splits", "direct_map_level3_splits", #endif +#ifdef CONFIG_UNACCEPTED_MEMORY + "accept_memory", +#endif #endif /* CONFIG_VM_EVENT_COUNTERS || CONFIG_MEMCG */ }; #endif /* CONFIG_PROC_FS || CONFIG_SYSFS || CONFIG_NUMA || CONFIG_MEMCG */
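With the "accept_memory" event exported through /proc/vmstat by the patch above, acceptance progress can be watched from userspace. Below is a minimal sketch, not part of the series, that assumes the x86-64 base page size of 4 KiB:

/* accept_watch.c - report how much memory has been accepted so far.
 * Illustrative only; assumes 4 KiB base pages and the "accept_memory"
 * vmstat event introduced by this patch.
 */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/vmstat", "r");
	char line[128];
	unsigned long long pages;

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "accept_memory %llu", &pages) == 1) {
			/* The event counts base (4 KiB) pages; convert to MiB */
			printf("accepted: %llu MiB\n",
			       pages * 4096 / (1024 * 1024));
			break;
		}
	}

	fclose(f);
	return 0;
}

Running it repeatedly during boot or under memory pressure shows whether acceptance is still in progress, which is exactly what the counter is meant to expose.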
From patchwork Mon Apr 25 03:39:34 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Kirill A. Shutemov" X-Patchwork-Id: 565908 From: "Kirill A. Shutemov" To: Borislav Petkov , Andy Lutomirski , Sean Christopherson , Andrew Morton , Joerg Roedel , Ard Biesheuvel Cc: Andi Kleen , Kuppuswamy Sathyanarayanan , David Rientjes , Vlastimil Babka , Tom Lendacky , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , Varad Gautam , Dario Faggioli , Dave Hansen , Brijesh Singh , Mike Rapoport , David Hildenbrand , x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv5 12/12] x86/mm: Report unaccepted memory in /proc/meminfo Date: Mon, 25 Apr 2022 06:39:34 +0300 Message-Id: <20220425033934.68551-13-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> References: <20220425033934.68551-1-kirill.shutemov@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org Track the amount of unaccepted memory and report it in /proc/meminfo. Signed-off-by: Kirill A. Shutemov --- arch/x86/include/asm/set_memory.h | 2 ++ arch/x86/include/asm/unaccepted_memory.h | 9 ++++++ arch/x86/mm/init.c | 8 ++++++ arch/x86/mm/pat/set_memory.c | 2 +- arch/x86/mm/unaccepted_memory.c | 36 +++++++++++++++++++++++- 5 files changed, 55 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h index 78ca53512486..e467f3941d22 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -86,6 +86,8 @@ bool kernel_page_present(struct page *page); extern int kernel_set_to_readonly; +void direct_map_meminfo(struct seq_file *m); + #ifdef CONFIG_X86_64 /* * Prevent speculative access to the page by either unmapping diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h index a59264ee0ab3..7c93661152a9 100644 --- a/arch/x86/include/asm/unaccepted_memory.h +++ b/arch/x86/include/asm/unaccepted_memory.h @@ -3,7 +3,10 @@ #ifndef _ASM_X86_UNACCEPTED_MEMORY_H #define _ASM_X86_UNACCEPTED_MEMORY_H +#include + struct boot_params; +struct seq_file; void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num); @@ -12,5 +15,11 @@ void process_unaccepted_memory(struct boot_params *params, u64 start, u64 num); void accept_memory(phys_addr_t start, phys_addr_t end); bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end); +void unaccepted_meminfo(struct seq_file *m); + +#else + +static inline void unaccepted_meminfo(struct seq_file *m) {} + #endif #endif diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index d8cfce221275..7e92a9d93994 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1065,3 +1065,11 @@ unsigned long max_swapfile_size(void) return pages; } #endif + +#ifdef CONFIG_PROC_FS +void arch_report_meminfo(struct seq_file *m) +{ + direct_map_meminfo(m); + unaccepted_meminfo(m); +} +#endif diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index abf5ed76e4b7..2880ba01451c 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -105,7 +105,7 @@ static void split_page_count(int level) direct_pages_count[level - 1] += PTRS_PER_PTE; } -void arch_report_meminfo(struct seq_file *m) +void direct_map_meminfo(struct seq_file *m) { seq_printf(m, "DirectMap4k: %8lu kB\n", direct_pages_count[PG_LEVEL_4K] << 2); diff --git
a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c index 65cd49b93c50..66a6c529bf31 100644 --- a/arch/x86/mm/unaccepted_memory.c +++ b/arch/x86/mm/unaccepted_memory.c @@ -3,14 +3,17 @@ #include #include #include +#include +#include #include #include #include #include -/* Protects unaccepted memory bitmap */ +/* Protects unaccepted memory bitmap and nr_unaccepted */ static DEFINE_SPINLOCK(unaccepted_memory_lock); +static unsigned long nr_unaccepted; void accept_memory(phys_addr_t start, phys_addr_t end) { @@ -39,6 +42,12 @@ void accept_memory(phys_addr_t start, phys_addr_t end) bitmap_clear(unaccepted_memory, range_start, len); count_vm_events(ACCEPT_MEMORY, len * PMD_SIZE / PAGE_SIZE); + + /* In early boot nr_unaccepted is not yet initialized */ + if (nr_unaccepted) { + WARN_ON(nr_unaccepted < len); + nr_unaccepted -= len; + } } spin_unlock_irqrestore(&unaccepted_memory_lock, flags); } @@ -62,3 +71,28 @@ bool memory_is_unaccepted(phys_addr_t start, phys_addr_t end) return ret; } + +void unaccepted_meminfo(struct seq_file *m) +{ + seq_printf(m, "UnacceptedMem: %8lu kB\n", + (READ_ONCE(nr_unaccepted) * PMD_SIZE) >> 10); +} + +static int __init unaccepted_meminfo_init(void) +{ + unsigned long *unaccepted_memory; + unsigned long flags, bitmap_size; + + if (!boot_params.unaccepted_memory) + return 0; + + bitmap_size = e820__end_of_ram_pfn() * PAGE_SIZE / PMD_SIZE; + unaccepted_memory = __va(boot_params.unaccepted_memory); + + spin_lock_irqsave(&unaccepted_memory_lock, flags); + nr_unaccepted = bitmap_weight(unaccepted_memory, bitmap_size); + spin_unlock_irqrestore(&unaccepted_memory_lock, flags); + + return 0; +} +fs_initcall(unaccepted_meminfo_init);
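The arithmetic behind the new "UnacceptedMem:" line can be illustrated with a small standalone sketch (not kernel code): each set bit in the unaccepted_memory bitmap stands for one PMD_SIZE chunk (2 MiB with 4 KiB pages), so the reported value is simply the bit count times PMD_SIZE, printed in kB. The bitmap contents below are made up for the example:

/* Standalone illustration of the UnacceptedMem accounting.
 * Assumes PMD_SIZE of 2 MiB (x86-64 with 4 KiB pages); the bitmap
 * contents are invented for demonstration purposes.
 */
#include <stdio.h>

#define PMD_SIZE (2UL * 1024 * 1024)

/* Rough userspace stand-in for the kernel's bitmap_weight() */
static unsigned long weight(const unsigned long *map, int nlongs)
{
	unsigned long bits = 0;
	int i;

	for (i = 0; i < nlongs; i++)
		bits += __builtin_popcountl(map[i]);
	return bits;
}

int main(void)
{
	/* Two longs = 128 bits on a 64-bit build, up to 256 MiB tracked */
	unsigned long unaccepted[2] = { 0xff00ff00UL, 0x3UL };
	unsigned long nr_unaccepted = weight(unaccepted, 2);

	/* Same formula as unaccepted_meminfo(): bits * PMD_SIZE, in kB */
	printf("UnacceptedMem: %8lu kB\n", (nr_unaccepted * PMD_SIZE) >> 10);
	return 0;
}

With 18 bits set this prints "UnacceptedMem:    36864 kB", matching what /proc/meminfo would show for 36 MiB of still-unaccepted memory.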