From patchwork Wed Mar 14 08:29:42 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: AKASHI Takahiro X-Patchwork-Id: 131569 Delivered-To: patch@linaro.org Received: by 10.46.84.17 with SMTP id i17csp574314ljb; Wed, 14 Mar 2018 01:29:15 -0700 (PDT) X-Google-Smtp-Source: AG47ELvIHZHt93etGwT2Kc3onvZcUmi5RnIwFMgyVj75PvTgVSo3L4P6er2ZG8Mr9sCzReCCnEAB X-Received: by 10.98.58.129 with SMTP id v1mr3451664pfj.203.1521016155808; Wed, 14 Mar 2018 01:29:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521016155; cv=none; d=google.com; s=arc-20160816; b=W5iwUcHHpGNBznD2fvGWJj85oGfMNjIIBIFt8G4G0kzkAcewOSaELsZhAeQL4Ppdcv vQAHn54nwQM9olbxMCFzu0gE1qKKgbZtePCWHSH89z3scIo+d+Lai2YxC9+q28/CRwsl m+gJbnHKSKw9z7kXmVL5aaQ+G9EBiNw0sN1VtM9N89v9g/+TS/Aezh2JzLkSqfNKP6Pa /3HvPk3ComiJEU7+8nbdUm5xnGqKrvbSEkSbdauQ1tUlHnaigu78emRaPXjz1LxMFaOy i0Fbo8+OTZ0jFTWK3OWxVHk9UKD+O4jnPTLeHTcczZ/waOh099FdJbjjiQ8euoqHZYbm bX6A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:content-disposition :mime-version:mail-followup-to:message-id:subject:cc:to:from:date :dkim-signature:arc-authentication-results; bh=SluqxH73/PJ9XDgb/OWYAiav5VNZTiXBTf/Fs6kogNw=; b=EQrQN2mIuylLaD640Ln+hhG3FK5wJ8Id96jfPowTSuG+cq4EksZFDji6cpOg9httEC GLJVbrcjU9Zh6owGI8BKritOuwwbaoYMxI6Cs6iCY+IWOzJKf+fm2PzXUS3vsH9wQ81i jPv1K/B0T6Y9sEKs2jfz92dh0kBmVh2ZKPZKozJ4nmjZcz8vok0Gl5wTlC8VfLqZtVaC wrlGoDndaXAhGqv1zaNY+OOWx1NIn4QRLjO3PRBSZCjaJkG9mBKHwH3lH9Qy4kIIwXuw i6nuaj785z3/NNME7W7qsrJDmY3TwU4ZKpFDfC13sha1bmjZon16ZlkRd5z1MK04D0BL bdYw== ARC-Authentication-Results: i=1; mx.google.com; dkim=neutral (body hash did not verify) header.i=@linaro.org header.s=google header.b=jCThxqMH; spf=pass (google.com: best guess record for domain of linux-efi-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-efi-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id r6-v6si1567521pls.13.2018.03.14.01.29.15; Wed, 14 Mar 2018 01:29:15 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-efi-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=neutral (body hash did not verify) header.i=@linaro.org header.s=google header.b=jCThxqMH; spf=pass (google.com: best guess record for domain of linux-efi-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-efi-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751417AbeCNI3P (ORCPT + 2 others); Wed, 14 Mar 2018 04:29:15 -0400 Received: from mail-io0-f173.google.com ([209.85.223.173]:36522 "EHLO mail-io0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751346AbeCNI3M (ORCPT ); Wed, 14 Mar 2018 04:29:12 -0400 Received: by mail-io0-f173.google.com with SMTP id e30so3373175ioc.3 for ; Wed, 14 Mar 2018 01:29:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=date:from:to:cc:subject:message-id:mail-followup-to:mime-version :content-disposition:user-agent; bh=OGButhUb322UAlV5t9RM5D0UXCfeHAOm2nnHL9lt8M4=; b=jCThxqMHIkErWp1AQdGeuVkyiYyMfgM8ABv/Nrg1oYbNckeWXMb85QpV24gLXskvDS S3YjpqcZyDFdV1p1h/hVLBjcZGkQNof1Ub6UU5GsoCxMfKImc1ec5zLfzcnG2+j+6RfT hd3ek9yb8vY+Be60Ooc3r+S4DAdWP+Ijl9qLA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id :mail-followup-to:mime-version:content-disposition:user-agent; bh=OGButhUb322UAlV5t9RM5D0UXCfeHAOm2nnHL9lt8M4=; b=qX+3CcP/TyzwYaKDZGFcZaugHdZgvV/GwBiP0uPbnZC9VEWVS+BmwffZXGSk3OM1Lm 0KqC1EQYtP7RN5gSWuIhxcyQyWsJWLXDye9oWE0bqYJcfbDEoCCEXclamZU7LBim75Bx 9P7oQSeABcu6hk2w/XnUA25QXqx+ukJKbcwlTc1/LiaOn6rh4C3mwr5ORfThOK4mIT77 cTUN4ak2jLzakxHyOuY//SZa7PgAVE8HJ1Od3jJPnN0hKpK8Dolr9UWieD0vCS3qM+b8 rWKbnGhzd7eogoXMGtSKSrQ3Tnn6OJxFqLgWAuhDffdtkz5ugOkSKuB+kxf9U/cAZeyE Cp8Q== X-Gm-Message-State: AElRT7GsGrN1FrMeGckVSp9WbsW56IhWg9E9cWakEHf9mPg4VH8RtWV6 Mj1QExOhSHwvFuS+SvM9soT+og== X-Received: by 10.107.81.3 with SMTP id f3mr3816732iob.284.1521016151800; Wed, 14 Mar 2018 01:29:11 -0700 (PDT) Received: from linaro.org ([121.95.100.191]) by smtp.googlemail.com with ESMTPSA id l74-v6sm756859itb.25.2018.03.14.01.29.08 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 14 Mar 2018 01:29:11 -0700 (PDT) Date: Wed, 14 Mar 2018 17:29:42 +0900 From: AKASHI Takahiro To: linux-arm-kernel@lists.infradead.org, linux-efi@vger.kernel.org, kexec@lists.infradead.org Cc: ard.biesheuvel@linaro.org, james.morse@arm.com, bhsharma@redhat.com, tbaicar@codeaurora.org, jhugo@codeaurora.org Subject: [RFC] arm64: extra entries in /proc/iomem for kexec Message-ID: <20180314082941.GM25863@linaro.org> Mail-Followup-To: AKASHI Takahiro , linux-arm-kernel@lists.infradead.org, linux-efi@vger.kernel.org, kexec@lists.infradead.org, ard.biesheuvel@linaro.org, james.morse@arm.com, bhsharma@redhat.com, tbaicar@codeaurora.org, jhugo@codeaurora.org MIME-Version: 1.0 Content-Disposition: inline User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-efi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-efi@vger.kernel.org In the last couples of months, there were some problems reported [1],[2] around arm64 kexec/kdump. Where those phenomenon look different, the root cause would be that kexec/kdump doesn't take into account crucial "reserved" regions of system memory and unintentionally corrupts them. Given that kexec-tools looks for all the information by seeking the file, /proc/iomem, the first step to address said problems is to expand this file's format so that it will have enough information about system memory and its usage. Attached is my experimental code: With this patch applied, /proc/iomem sees something like the below: (format A) 40000000-5871ffff : System RAM 40080000-40f1ffff : Kernel code 41040000-411e8fff : Kernel data 54400000-583fffff : Crash kernel 58590000-585effff : EFI Resources 58700000-5871ffff : EFI Resources 58720000-58b5ffff : System RAM 58720000-58b5ffff : EFI Resources 58b60000-5be3ffff : System RAM 58b61018-58b61947 : EFI Memory Map 59a7b118-59a7b667 : EFI Configuration Tables 5be40000-5becffff : System RAM <== (A-1) 5be40000-5becffff : EFI Resources 5bed0000-5bedffff : System RAM 5bee0000-5bffffff : System RAM 5bee0000-5bffffff : EFI Resources 5c000000-5fffffff : System RAM 8000000000-ffffffffff : PCI Bus 0000:00 Meanwhile, the workaround I suggested in [3] gave us a simpler view: (format B) 40000000-5871ffff : System RAM 40080000-40f1ffff : Kernel code 41040000-411e9fff : Kernel data 54400000-583fffff : Crash kernel 58590000-585effff : reserved 58700000-5871ffff : reserved 58720000-58b5ffff : reserved 58b60000-5be3ffff : System RAM 58b61000-58b61fff : reserved 59a7b318-59a7b867 : reserved 5be40000-5becffff : reserved <== (B-1) 5bed0000-5bedffff : System RAM 5bee0000-5bffffff : reserved 5c000000-5fffffff : System RAM 5ec00000-5edfffff : reserved 8000000000-ffffffffff : PCI Bus 0000:00 Here all the regions to be protected are named just "reserved" whether they are NOMAP regions or simply-memblock_reserve'd. They are not very useful for anything but kexec/kdump which knows what they mean. Alternatively, we may want to give them more specific names, based on related efi memory map descriptors and else, that will characterize their contents: (format C) 40000000-5871ffff : System RAM 40080000-40f1ffff : Kernel code 41040000-411e9fff : Kernel data 54400000-583fffff : Crash kernel 58590000-585effff : ACPI Reclaim Memory 58700000-5871ffff : ACPI Reclaim Memory 58720000-58b5ffff : System RAM 58720000-5878ffff : Runtime Data 58790000-587dffff : Runtime Code 587e0000-5882ffff : Runtime Data 58830000-5887ffff : Runtime Code 58880000-588cffff : Runtime Data 588d0000-5891ffff : Runtime Code 58920000-5896ffff : Runtime Data 58970000-589bffff : Runtime Code 589c0000-58a5ffff : Runtime Data 58a60000-58abffff : Runtime Code 58ac0000-58b0ffff : Runtime Data 58b10000-58b5ffff : Runtime Code 58b60000-5be3ffff : System RAM 58b61000-58b61fff : EFI Memory Map 59a7b118-59a7b667 : EFI Memory Attributes Table 5be40000-5becffff : System RAM 5be40000-5becffff : Runtime Code 5bed0000-5bedffff : System RAM 5bee0000-5bffffff : System RAM 5bee0000-5bffffff : Runtime Data 5c000000-5fffffff : System RAM 8000000000-ffffffffff : PCI Bus 0000:00 I once created a patch for this format, but it looks quite noisy and names are a sort of mixture of memory attributes( ACPI Reclaim memory, Conventional Memory, Persistent Memory etc.) vs. function/usages ([Loader|Boot Service|Runtime] Code/Data). (As a matter of fact, (C-1) consists of various ACPI tables.) Anyhow, they seem not so useful for most of other applications. Those observations lead to format A, where some entries with the same attributes are squeezed into a single entry under a simple name if they are neighbouring. So my questions here are: 1. Which format, A, B, or C, is the most appropriate for the moment? or any other suggestions? Currently, there is a inconsistent view between (A) and the mainline's: see (A-1) and (B-1). If this is really a matter, I can fix it. Kexec-tools can be easily modified to accept both formats, though. 2. How should we determine which regions be exported in /proc/iomem? a. Trust all the memblock_reserve'd regions as my previous patch [3] does. As I said, it's a kind of "overkill." Some of regions, say fdt, are not required to be preserved across kexec. b. List regions separately and exhaustively later on at a single point as my patch attached does. I'm afraid of any possibility that some regions may be doubly counted, one from efi memory map search and another from other efi/acpi code (various type of "tables" in most cases). c. Expand efi_mem_reserve() with an argument of "resource descriptor" and replace memblock_reserve() in efi code wherever necessary so as to maintain an export list. efi_mem_reserve() was first introduced for specific needs at kexec on x86, but I believe that its coverage over efi code is far from perfect. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-November/541831.html [2] http://lkml.iu.edu//hypermail/linux/kernel/1802.2/06745.html [3] http://lkml.iu.edu//hypermail/linux/kernel/1803.0/04658.html http://lkml.iu.edu//hypermail/linux/kernel/1803.0/04659.html -Takahiro AKASHI -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ===8<=== diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index 30ad2f085d1f..feda5cbdc6bf 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -214,13 +214,8 @@ static void __init request_standard_resources(void) for_each_memblock(memory, region) { res = alloc_bootmem_low(sizeof(*res)); - if (memblock_is_nomap(region)) { - res->name = "reserved"; - res->flags = IORESOURCE_MEM; - } else { - res->name = "System RAM"; - res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; - } + res->name = "System RAM"; + res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY; res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region)); res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1; @@ -239,6 +234,9 @@ static void __init request_standard_resources(void) request_resource(res, &crashk_res); #endif } + + /* Add firmware-reserved memory */ + efi_arch_request_resources(); } u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID }; diff --git a/drivers/firmware/efi/arm-init.c b/drivers/firmware/efi/arm-init.c index 80d1a885def5..308143e69db4 100644 --- a/drivers/firmware/efi/arm-init.c +++ b/drivers/firmware/efi/arm-init.c @@ -13,8 +13,10 @@ #define pr_fmt(fmt) "efi: " fmt +#include #include #include +#include #include #include #include @@ -280,3 +282,97 @@ static int __init register_gop_device(void) return PTR_ERR_OR_ZERO(pd); } subsys_initcall(register_gop_device); + +static unsigned long __init efi_memattr_to_iores_type(u64 addr) +{ + if (efi_mem_attributes(addr) & + (EFI_MEMORY_WB|EFI_MEMORY_WT|EFI_MEMORY_WC)) + return IORESOURCE_SYSTEM_RAM; + else + return IORESOURCE_MEM; +} + +static unsigned long __init efi_type_to_iores_desc(u64 addr) +{ + /* TODO */ + return IORES_DESC_NONE; +} + +static struct resource * __init _efi_arch_request_resource(u64 start, u64 end, + char *name, struct resource *prev) +{ + struct resource *conflict, *res; + + res = alloc_bootmem_low(sizeof(*res)); + + res->start = start; + res->end = end; + res->flags = efi_memattr_to_iores_type(res->start); + res->desc = efi_type_to_iores_desc(res->start); + res->name = name; + + conflict = request_resource_conflict(&iomem_resource, res); + if (conflict) { + if (prev && (prev->parent == conflict) && + ((prev->end + 1) == start)) { + /* merge consecutive regions */ + adjust_resource(prev, prev->start, + end - prev->start + 1); + free_bootmem((unsigned long)res, sizeof(*res)); + res = prev; + } else + insert_resource(conflict, res); + } + + return res; +} + +/* Kexec expects those resources to be preserved */ +void __init efi_arch_request_resources(void) +{ + struct resource *res = NULL; + efi_memory_desc_t *md; + u64 paddr, npages, size; + + /* EFI Memory Map */ + /* FIXME */ + _efi_arch_request_resource(efi.memmap.phys_map, + efi.memmap.phys_map + + efi.memmap.desc_size * efi.memmap.nr_map - 1, + "EFI Memory Map", res); + + /* generic EFI Configuration Tables */ + efi_request_config_table_resources(_efi_arch_request_resource); + + /* architecture-specifc Configuration Tables */ + if (screen_info.lfb_size) + _efi_arch_request_resource(screen_info.lfb_base, + screen_info.lfb_base + screen_info.lfb_size - 1, + "EFI Screen Info Table", res); + + /* architecture-specific EFI resources */ + /* FIXME */ + efi_memmap_install(efi.memmap.phys_map, efi.memmap.nr_map); + + res = NULL; + for_each_efi_memory_desc(md) { + paddr = md->phys_addr; + npages = md->num_pages; + + memrange_efi_to_native(&paddr, &npages); + size = npages << PAGE_SHIFT; + + if (is_memory(md)) { + if (!is_usable_memory(md)) + res = _efi_arch_request_resource(paddr, + paddr + size - 1, + "EFI Resources", res); + + if (md->type == EFI_ACPI_RECLAIM_MEMORY) + res = _efi_arch_request_resource(paddr, + paddr + size - 1, + "EFI Resources", res); + } + } + efi_memmap_unmap(); +} diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index cd42f66a7c85..b13c9461278b 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -603,6 +603,33 @@ int __init efi_config_init(efi_config_table_type_t *arch_tables) return ret; } +void __init efi_request_config_table_resources(struct resource *(*fn)(u64 start, + u64 end, char *name, struct resource *prev)) +{ + struct resource *prev = NULL; + char *name = "EFI Configuration Tables"; + + if (efi.config_table_size) + prev = fn(efi.config_table, + efi.config_table + efi.config_table_size - 1, name, + prev); + + if (efi.mem_attr_table_size) + prev = fn(efi.mem_attr_table, + efi.mem_attr_table + efi.mem_attr_table_size - 1, name, + prev); + + if (efi.esrt_size) + prev = fn(efi.esrt, efi.esrt + efi.esrt_size - 1, name, prev); + + if (efi.tpm_log_size) + prev = fn(efi.tpm_log, efi.tpm_log + efi.tpm_log_size - 1, name, + prev); + + + /* TODO: BGRT */ +} + #ifdef CONFIG_EFI_VARS_MODULE static int __init efi_load_efivars(void) { diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index c47e0c6ec00f..61f66c139afb 100644 --- a/drivers/firmware/efi/esrt.c +++ b/drivers/firmware/efi/esrt.c @@ -330,6 +330,7 @@ void __init efi_esrt_init(void) esrt_data = (phys_addr_t)efi.esrt; esrt_data_size = size; + efi.esrt_size = size; end = esrt_data + size; pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end); diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c index 8986757eafaf..dc2c7608793a 100644 --- a/drivers/firmware/efi/memattr.c +++ b/drivers/firmware/efi/memattr.c @@ -42,6 +42,7 @@ int __init efi_memattr_init(void) } tbl_size = sizeof(*tbl) + tbl->num_entries * tbl->desc_size; + efi.mem_attr_table_size = tbl_size; memblock_reserve(efi.mem_attr_table, tbl_size); set_bit(EFI_MEM_ATTR, &efi.flags); diff --git a/drivers/firmware/efi/tpm.c b/drivers/firmware/efi/tpm.c index 0cbeb3d46b18..53cfb12513fa 100644 --- a/drivers/firmware/efi/tpm.c +++ b/drivers/firmware/efi/tpm.c @@ -33,6 +33,7 @@ int __init efi_tpm_eventlog_init(void) } tbl_size = sizeof(*log_tbl) + log_tbl->size; + efi.tpm_log_size = tbl_size; memblock_reserve(efi.tpm_log, tbl_size); early_memunmap(log_tbl, sizeof(*log_tbl)); return 0; diff --git a/include/linux/efi.h b/include/linux/efi.h index f5083aa72eae..9c3f8d284b36 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -942,11 +942,15 @@ extern struct efi { unsigned long fw_vendor; /* fw_vendor */ unsigned long runtime; /* runtime table */ unsigned long config_table; /* config tables */ + unsigned long config_table_size; unsigned long esrt; /* ESRT table */ + unsigned long esrt_size; unsigned long properties_table; /* properties table */ unsigned long mem_attr_table; /* memory attributes table */ + unsigned long mem_attr_table_size; unsigned long rng_seed; /* UEFI firmware random seed */ unsigned long tpm_log; /* TPM2 Event Log table */ + unsigned long tpm_log_size; efi_get_time_t *get_time; efi_set_time_t *set_time; efi_get_wakeup_time_t *get_wakeup_time; @@ -980,6 +984,9 @@ efi_guid_to_str(efi_guid_t *guid, char *out) } extern void efi_init (void); +extern void efi_arch_request_resources(void); +extern void efi_request_config_table_resources(struct resource * + (*fn)(u64 start, u64 end, char *name, struct resource *prev)); extern void *efi_get_pal_addr (void); extern void efi_map_pal_code (void); extern void efi_memmap_walk (efi_freemem_callback_t callback, void *arg);