From patchwork Mon Jul 15 18:53:08 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steve Wahl
X-Patchwork-Id: 812682
From: Steve Wahl
To: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H. Peter Anvin", linux-kernel@vger.kernel.org, Pavin Joseph,
 Eric Hagberg
Cc: Simon Horman, Eric Biederman, Dave Young, Sarah Brofeldt,
 Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
 Baoquan He, Yuntao Wang, Bjorn Helgaas, Joerg Roedel, Michael Roth,
 Tao Liu, kexec@lists.infradead.org, "Kalra, Ashish", Ard Biesheuvel,
 linux-efi@vger.kernel.org
Subject: [PATCH v2 1/2] x86/kexec: Add EFI config table identity mapping for kexec kernel
Date: Mon, 15 Jul 2024 13:53:08 -0500
Message-Id: <20240715185309.1637839-2-steve.wahl@hpe.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20240715185309.1637839-1-steve.wahl@hpe.com>
References: <20240715185309.1637839-1-steve.wahl@hpe.com>
Precedence: bulk
X-Mailing-List: linux-efi@vger.kernel.org

From: Tao Liu

A kexec kernel boot failure is sometimes observed on AMD CPUs due to
an unmapped EFI config table array. This can be seen when "nogbpages"
is on the kernel command line, and has been observed as a full BIOS
reboot rather than a successful kexec.

This was also the cause of reported regressions attributed to Commit
7143c5f4cf20 ("x86/mm/ident_map: Use gbpages only where full GB page
should be mapped.") which was subsequently reverted.

To avoid this page fault, explicitly include the EFI config table
array in the kexec identity map.

Further explanation:

The following 2 commits caused the EFI config table array to be
accessed when enabling SEV at kernel startup.

    commit ec1c66af3a30 ("x86/compressed/64: Detect/setup SEV/SME features
                          earlier during boot")
    commit c01fce9cef84 ("x86/compressed: Add SEV-SNP feature
                          detection/setup")

This is in the code that examines whether SEV should be enabled or
not, so it can even affect systems that are not SEV capable.

This may result in a page fault if the EFI config table array's
address is unmapped. Since the page fault occurs before the new kernel
establishes its own identity map and page fault routines, it is
unrecoverable and kexec fails.

Most often, this problem is not seen because the EFI config table
array gets included in the map by the luck of being placed at a memory
address close enough to other memory areas that *are* included in the
map created by kexec.

Both the "nogbpages" command line option and the "use gbpages only
where full GB page should be mapped" patch greatly reduce the chance
of being included in the map by luck, which is why the problem
appears.
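
As an illustration only (this is not part of the patch), the "mapped by
luck" effect can be sketched in userspace C. The helper name
mapped_by_luck(), the SZ_2M/SZ_1G macros and the addresses used with it
are hypothetical; the sketch assumes a mapping request is simply
rounded outward to the page size in use, which is a simplification of
what the identity-map code does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(1) << 21)	/* 2 MiB page, used when gbpages are off */
#define SZ_1G (UINT64_C(1) << 30)	/* 1 GiB page ("gbpage") */

/*
 * Hypothetical helper: report whether addr ends up mapped when the
 * request [start, end) is rounded outward to page_size boundaries.
 * page_size must be a power of two.
 */
static bool mapped_by_luck(uint64_t start, uint64_t end,
			   uint64_t page_size, uint64_t addr)
{
	uint64_t lo, hi;

	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	lo = start & ~(page_size - 1);				/* round down */
	hi = (end + page_size - 1) & ~(page_size - 1);		/* round up */

	return addr >= lo && addr < hi;
}
```

With a made-up 4K request at 0x7f000000 and a made-up table address of
0x7e500000, the table is covered when the request is rounded to a 1G
page but not when it is rounded to a 2M page, which matches the
behaviour described above for "nogbpages".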

Signed-off-by: Tao Liu
Signed-off-by: Steve Wahl
Tested-by: Pavin Joseph
Tested-by: Sarah Brofeldt
Tested-by: Eric Hagberg
---
 arch/x86/kernel/machine_kexec_64.c | 35 ++++++++++++++++++++++++++----
 1 file changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index cc0f7f70b17b..563d119f9f29 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
 #include
 #include
 #include
+#include
 
 #ifdef CONFIG_ACPI
 /*
@@ -83,10 +84,12 @@ const struct kexec_file_ops * const kexec_file_loaders[] = {
 #endif
 
 static int
-map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
+map_efi_tables(struct x86_mapping_info *info, pgd_t *level4p)
 {
 #ifdef CONFIG_EFI
 	unsigned long mstart, mend;
+	void *kaddr;
+	int ret;
 
 	if (!efi_enabled(EFI_BOOT))
 		return 0;
@@ -102,6 +105,30 @@ map_efi_systab(struct x86_mapping_info *info, pgd_t *level4p)
 	if (!mstart)
 		return 0;
 
+	ret = kernel_ident_mapping_init(info, level4p, mstart, mend);
+	if (ret)
+		return ret;
+
+	kaddr = memremap(mstart, mend - mstart, MEMREMAP_WB);
+	if (!kaddr) {
+		pr_err("Could not map UEFI system table\n");
+		return -ENOMEM;
+	}
+
+	mstart = efi_config_table;
+
+	if (efi_enabled(EFI_64BIT)) {
+		efi_system_table_64_t *stbl = (efi_system_table_64_t *)kaddr;
+
+		mend = mstart + sizeof(efi_config_table_64_t) * stbl->nr_tables;
+	} else {
+		efi_system_table_32_t *stbl = (efi_system_table_32_t *)kaddr;
+
+		mend = mstart + sizeof(efi_config_table_32_t) * stbl->nr_tables;
+	}
+
+	memunmap(kaddr);
+
 	return kernel_ident_mapping_init(info, level4p, mstart, mend);
 #endif
 	return 0;
@@ -241,10 +268,10 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 	}
 
 	/*
-	 * Prepare EFI systab and ACPI tables for kexec kernel since they are
-	 * not covered by pfn_mapped.
+	 * Prepare EFI systab, config table and ACPI tables for kexec kernel
+	 * since they are not covered by pfn_mapped.
 	 */
-	result = map_efi_systab(&info, level4p);
+	result = map_efi_tables(&info, level4p);
 	if (result)
 		return result;
 

From patchwork Mon Jul 15 18:53:09 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Steve Wahl
X-Patchwork-Id: 812900
From: Steve Wahl
To: Steve Wahl, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
 "H. Peter Anvin", linux-kernel@vger.kernel.org, Pavin Joseph,
 Eric Hagberg
Cc: Simon Horman, Eric Biederman, Dave Young, Sarah Brofeldt,
 Russ Anderson, Dimitri Sivanich, Hou Wenlong, Andrew Morton,
 Baoquan He, Yuntao Wang, Bjorn Helgaas, Joerg Roedel, Michael Roth,
 Tao Liu, kexec@lists.infradead.org, "Kalra, Ashish", Ard Biesheuvel,
 linux-efi@vger.kernel.org
Subject: [PATCH v2 2/2] x86/mm/ident_map: Use gbpages only where full GB page should be mapped.
Date: Mon, 15 Jul 2024 13:53:09 -0500
Message-Id: <20240715185309.1637839-3-steve.wahl@hpe.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20240715185309.1637839-1-steve.wahl@hpe.com>
References: <20240715185309.1637839-1-steve.wahl@hpe.com>
Precedence: bulk
X-Mailing-List: linux-efi@vger.kernel.org

When ident_pud_init() uses only gbpages to create identity maps, large
ranges of addresses not actually requested can be included in the
resulting table; a 4K request will map a full GB. This can include a
lot of extra address space past that requested, including areas marked
reserved by the BIOS. That allows processor speculation into reserved
regions, that on UV systems can cause system halts.

Only use gbpages when map creation requests include the full GB page
of space.
Fall back to using smaller 2M pages when only portions of a GB page
are included in the request.

No attempt is made to coalesce mapping requests. If a request requires
a map entry at the 2M (pmd) level, subsequent mapping requests within
the same 1G region will also be at the pmd level, even if adjacent or
overlapping such requests could have been combined to map a full
gbpage. Existing usage starts with larger regions and then adds
smaller regions, so this should not have any great consequence.

Signed-off-by: Steve Wahl
Tested-by: Pavin Joseph
Tested-by: Sarah Brofeldt
Tested-by: Eric Hagberg
---
 arch/x86/mm/ident_map.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index 968d7005f4a7..a204a332c71f 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -26,18 +26,31 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
 	for (; addr < end; addr = next) {
 		pud_t *pud = pud_page + pud_index(addr);
 		pmd_t *pmd;
+		bool use_gbpage;
 
 		next = (addr & PUD_MASK) + PUD_SIZE;
 		if (next > end)
 			next = end;
 
-		if (info->direct_gbpages) {
-			pud_t pudval;
+		/* if this is already a gbpage, this portion is already mapped */
+		if (pud_leaf(*pud))
+			continue;
+
+		/* Is using a gbpage allowed? */
+		use_gbpage = info->direct_gbpages;
 
-			if (pud_present(*pud))
-				continue;
+		/* Don't use gbpage if it maps more than the requested region. */
+		/* at the beginning: */
+		use_gbpage &= ((addr & ~PUD_MASK) == 0);
+		/* ... or at the end: */
+		use_gbpage &= ((next & ~PUD_MASK) == 0);
+
+		/* Never overwrite existing mappings */
+		use_gbpage &= !pud_present(*pud);
+
+		if (use_gbpage) {
+			pud_t pudval;
 
-			addr &= PUD_MASK;
 			pudval = __pud((addr - info->offset) | info->page_flag);
 			set_pud(pud, pudval);
 			continue;