Message ID | cover.1664298147.git.demi@invisiblethingslab.com |
---|---|
Headers | show |
Series | EFI improvements for Xen dom0 | expand |
On 30.09.2022 01:02, Demi Marie Obenour wrote: > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > Xen before Linux gets to start using it. Therefore, Linux under Xen > must not use EFI tables from such memory. Most of the remaining EFI > memory types are not suitable for EFI tables, leaving only > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > use tables that are located in one of these types of memory. > > This patch ensures this, and also adds a function > (xen_config_table_memory_region_max()) that will be used later to > replace the usage of the EFI memory map in esrt.c when running under > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > but I have not implemented this. > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed "-mapbs". Should we perhaps extend the interface such that Dom0 can then also use tables located in such regions, perhaps by faking EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? Jan
On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > Xen before Linux gets to start using it. Therefore, Linux under Xen > must not use EFI tables from such memory. Most of the remaining EFI > memory types are not suitable for EFI tables, leaving only > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > use tables that are located in one of these types of memory. > > This patch ensures this, and also adds a function > (xen_config_table_memory_region_max()) that will be used later to > replace the usage of the EFI memory map in esrt.c when running under > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > but I have not implemented this. > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > --- > drivers/firmware/efi/efi.c | 8 +++++--- > drivers/xen/efi.c | 35 +++++++++++++++++++++++++++++++++++ > include/linux/efi.h | 9 +++++++++ > 3 files changed, 49 insertions(+), 3 deletions(-) > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644 > --- a/drivers/firmware/efi/efi.c > +++ b/drivers/firmware/efi/efi.c > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > unsigned long table; > int i; > > - pr_info(""); Why are you removing these prints? > for (i = 0; i < count; i++) { > if (!IS_ENABLED(CONFIG_X86)) { > guid = &config_tables[i].guid; > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > if (IS_ENABLED(CONFIG_X86_32) && > tbl64[i].table > U32_MAX) { > - pr_cont("\n"); > pr_err("Table located above 4GB, disabling EFI.\n"); > return -EINVAL; > } > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > table = tbl32[i].table; > } > > +#ifdef CONFIG_XEN_EFI We tend to prefer IS_ENABLED() for cases such as this one. That way, the compiler always gets to see the code inside the conditional block, which gives better build test coverage (even if CONFIG_XEN_EFI is disabled). > + if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table)) So the question here is whether Xen thinks the table should be disregarded or not. So let's define a prototype that reflects that purpose, and let the implementation reason about how this should be achieved. So if (IS_ENABLED(CONFIG_XEN_EFI) && efi_enabled(EFI_PARAVIRT) && xen_efi_config_table_valid(guid, table) continue I should note here, though, that EFI_PARAViRT is only set on x86 not on other architectures that enable CONFIG_XEN_EFI so this will not work anywhere else. > + continue; > +#endif > + > if (!match_config_table(guid, table, common_tables) && arch_tables) > match_config_table(guid, table, arch_tables); > } > - pr_cont("\n"); > set_bit(EFI_CONFIG_TABLES, &efi.flags); > > if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) { > diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c > index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644 > --- a/drivers/xen/efi.c > +++ b/drivers/xen/efi.c > @@ -28,6 +28,7 @@ > #include <xen/interface/platform.h> > #include <xen/xen.h> > #include <xen/xen-ops.h> > +#include <xen/page.h> > > #include <asm/page.h> > > @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status, > } > } > > +__init u64 xen_config_table_memory_region_max(u64 addr) It is more idiomatic for Linux to put __init after the return type. And if we adopt my suggestion above, this becomes bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table) Alternatively, you could pass the string identifier of the table instead of the guid (or both) to print in the diagnostic message. > +{ > + static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT, > + "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT"); Is this the only place where this matters? And this never happens on x86, right? > + struct xen_platform_op op = { > + .cmd = XENPF_firmware_info, > + .u.firmware_info = { > + .type = XEN_FW_EFI_INFO, > + .index = XEN_FW_EFI_MEM_INFO, > + .u.efi_info.mem.addr = addr, > + .u.efi_info.mem.size = U64_MAX - addr, > + } > + }; > + union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info; > + int rc = HYPERVISOR_platform_op(&op); > + > + if (rc) { > + pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n", > + (unsigned long long)addr, rc); > + return 0; > + } > + > + switch (info->mem.type) { > + case EFI_RUNTIME_SERVICES_CODE: > + case EFI_RUNTIME_SERVICES_DATA: > + case EFI_ACPI_RECLAIM_MEMORY: If we are listing all memory types that Xen preserves, you might add EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that you need to permit explicitly. > + return info->mem.addr + info->mem.size; > + default: > + pr_warn("Table %llu is in memory of type %d, ignoring it\n", > + (unsigned long long)addr, info->mem.type); > + return 0; > + } > +} > + > /* > * Set XEN EFI runtime services function pointers. Other fields of struct efi, > * e.g. efi.systab, will be set like normal EFI. > diff --git a/include/linux/efi.h b/include/linux/efi.h > index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644 > --- a/include/linux/efi.h > +++ b/include/linux/efi.h > @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area { > /* Header of a populated EFI secret area */ > #define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b) > > +#ifdef CONFIG_XEN_EFI Please drop this #ifdef > +/* > + * Returns the end of the memory region containing the given config table, > + * or 0 if the given address does not reside in memory that can validly > + * contain EFI configuration tables. > + */ > +__init u64 xen_config_table_memory_region_max(u64 addr); You can drop the __init here > +#endif > + > #endif /* _LINUX_EFI_H */ > -- > Sincerely, > Demi Marie Obenour (she/her/hers) > Invisible Things Lab >
On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote: > > On 30.09.2022 01:02, Demi Marie Obenour wrote: > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > must not use EFI tables from such memory. Most of the remaining EFI > > memory types are not suitable for EFI tables, leaving only > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > use tables that are located in one of these types of memory. > > > > This patch ensures this, and also adds a function > > (xen_config_table_memory_region_max()) that will be used later to > > replace the usage of the EFI memory map in esrt.c when running under > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > but I have not implemented this. > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed > "-mapbs". Should we perhaps extend the interface such that Dom0 can then > also use tables located in such regions, perhaps by faking > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? > I know this ship has sailed for x86, but for the sake of other architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME bits alone, for the same reasons I gave earlier. (Runtime mappings for the firmware code itself, page table fragmentation etc etc) I know very little about Xen, but based on the context you provided in this thread, I'd say that the best approach from the Xen side is to convert all EfiBootServicesData regions that have configuration tables pointing into them into EfiAcpiReclaimMemory. I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you might do the same for the returned type - EfiBootServicesData -> EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME attribute.
On Fri, Sep 30, 2022 at 08:44:21AM +0200, Jan Beulich wrote: > On 30.09.2022 01:02, Demi Marie Obenour wrote: > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > must not use EFI tables from such memory. Most of the remaining EFI > > memory types are not suitable for EFI tables, leaving only > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > use tables that are located in one of these types of memory. > > > > This patch ensures this, and also adds a function > > (xen_config_table_memory_region_max()) that will be used later to > > replace the usage of the EFI memory map in esrt.c when running under > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > but I have not implemented this. > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed > "-mapbs". Should we perhaps extend the interface such that Dom0 can then > also use tables located in such regions, perhaps by faking > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? I can add a check for EFI_MEMORY_RUNTIME, but only if I can require a Xen version with https://lore.kernel.org/xen-devel/cc0fbcb4-5ea3-178c-e691-9acb7cc9a3a7@suse.com/t/#u. This is easy in Qubes OS via RPM dependencies, but I am not sure if it is suitable for upstream without a mechanism for dom0 to verify that the patch has been included.
On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: > On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote: > > > > On 30.09.2022 01:02, Demi Marie Obenour wrote: > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > > must not use EFI tables from such memory. Most of the remaining EFI > > > memory types are not suitable for EFI tables, leaving only > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > > use tables that are located in one of these types of memory. > > > > > > This patch ensures this, and also adds a function > > > (xen_config_table_memory_region_max()) that will be used later to > > > replace the usage of the EFI memory map in esrt.c when running under > > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > > but I have not implemented this. > > > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed > > "-mapbs". Should we perhaps extend the interface such that Dom0 can then > > also use tables located in such regions, perhaps by faking > > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? > > > > I know this ship has sailed for x86, but for the sake of other > architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME > bits alone, for the same reasons I gave earlier. (Runtime mappings for > the firmware code itself, page table fragmentation etc etc) Why do you say that it has sailed for x86? > I know very little about Xen, but based on the context you provided in > this thread, I'd say that the best approach from the Xen side is to > convert all EfiBootServicesData regions that have configuration tables > pointing into them into EfiAcpiReclaimMemory. Should Xen convert the entire region, or should it try to reserve only the memory it needs? The latter would require it to parse the configuration tables. Is there a list of configuration tables that can legitimately be in EfiBootServicesData regions? > I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you > might do the same for the returned type - EfiBootServicesData -> > EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME > attribute. It is indeed an existing interface, and this is a much better idea than what you proposed.
On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote: > On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > must not use EFI tables from such memory. Most of the remaining EFI > > memory types are not suitable for EFI tables, leaving only > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > use tables that are located in one of these types of memory. > > > > This patch ensures this, and also adds a function > > (xen_config_table_memory_region_max()) that will be used later to > > replace the usage of the EFI memory map in esrt.c when running under > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > but I have not implemented this. > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > --- > > drivers/firmware/efi/efi.c | 8 +++++--- > > drivers/xen/efi.c | 35 +++++++++++++++++++++++++++++++++++ > > include/linux/efi.h | 9 +++++++++ > > 3 files changed, 49 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644 > > --- a/drivers/firmware/efi/efi.c > > +++ b/drivers/firmware/efi/efi.c > > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > unsigned long table; > > int i; > > > > - pr_info(""); > > Why are you removing these prints? If I left them, I would need to include a pr_cont("\n") later. Checkpatch recommends against that. What is the purpose of this print? I assumed that since it prints an empty string it is superfluous. > > for (i = 0; i < count; i++) { > > if (!IS_ENABLED(CONFIG_X86)) { > > guid = &config_tables[i].guid; > > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > > if (IS_ENABLED(CONFIG_X86_32) && > > tbl64[i].table > U32_MAX) { > > - pr_cont("\n"); > > pr_err("Table located above 4GB, disabling EFI.\n"); > > return -EINVAL; > > } > > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > table = tbl32[i].table; > > } > > > > +#ifdef CONFIG_XEN_EFI > > We tend to prefer IS_ENABLED() for cases such as this one. That way, > the compiler always gets to see the code inside the conditional block, > which gives better build test coverage (even if CONFIG_XEN_EFI is > disabled). Can I count on the compiler eliminating the code as unreachable? With CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an undefined symbol. > > + if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table)) > > So the question here is whether Xen thinks the table should be > disregarded or not. So let's define a prototype that reflects that > purpose, and let the implementation reason about how this should be > achieved. xen_config_table_memory_region_max() doesn’t just return whether the table should be disregarded, but also (if the table should not be ignored) the end of the memory region containing it. I will make xen_efi_config_table_valid() a wrapper around xen_config_table_memory_region_max(), as the former also needs to print a warning if the table is in an invalid location. > So > > if (IS_ENABLED(CONFIG_XEN_EFI) && > efi_enabled(EFI_PARAVIRT) && > xen_efi_config_table_valid(guid, table) > continue > > I should note here, though, that EFI_PARAViRT is only set on x86 not > on other architectures that enable CONFIG_XEN_EFI so this will not > work anywhere else. What should I use instead? > > + continue; > > +#endif > > + > > if (!match_config_table(guid, table, common_tables) && arch_tables) > > match_config_table(guid, table, arch_tables); > > } > > - pr_cont("\n"); > > set_bit(EFI_CONFIG_TABLES, &efi.flags); > > > > if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) { > > diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c > > index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644 > > --- a/drivers/xen/efi.c > > +++ b/drivers/xen/efi.c > > @@ -28,6 +28,7 @@ > > #include <xen/interface/platform.h> > > #include <xen/xen.h> > > #include <xen/xen-ops.h> > > +#include <xen/page.h> > > > > #include <asm/page.h> > > > > @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status, > > } > > } > > > > +__init u64 xen_config_table_memory_region_max(u64 addr) > > It is more idiomatic for Linux to put __init after the return type. > And if we adopt my suggestion above, this becomes > > bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table) > > Alternatively, you could pass the string identifier of the table > instead of the guid (or both) to print in the diagnostic message. Will fix in v5. > > +{ > > + static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT, > > + "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT"); > > Is this the only place where this matters? And this never happens on x86, right? My understanding is that it should never happen on any architecture. That’s why I static_assert() it. I have no idea if this is the only place it matters, though. > > + struct xen_platform_op op = { > > + .cmd = XENPF_firmware_info, > > + .u.firmware_info = { > > + .type = XEN_FW_EFI_INFO, > > + .index = XEN_FW_EFI_MEM_INFO, > > + .u.efi_info.mem.addr = addr, > > + .u.efi_info.mem.size = U64_MAX - addr, > > + } > > + }; > > + union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info; > > + int rc = HYPERVISOR_platform_op(&op); > > + > > + if (rc) { > > + pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n", > > + (unsigned long long)addr, rc); > > + return 0; > > + } > > + > > + switch (info->mem.type) { > > + case EFI_RUNTIME_SERVICES_CODE: > > + case EFI_RUNTIME_SERVICES_DATA: > > + case EFI_ACPI_RECLAIM_MEMORY: > > If we are listing all memory types that Xen preserves, you might add > EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that > you need to permit explicitly. My understanding was that EFI_RESERVED_MEMORY should never be touched by the OS, so I left it out. Which types would you permit? > > + return info->mem.addr + info->mem.size; > > + default: > > + pr_warn("Table %llu is in memory of type %d, ignoring it\n", > > + (unsigned long long)addr, info->mem.type); > > + return 0; > > + } > > +} > > + > > /* > > * Set XEN EFI runtime services function pointers. Other fields of struct efi, > > * e.g. efi.systab, will be set like normal EFI. > > diff --git a/include/linux/efi.h b/include/linux/efi.h > > index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644 > > --- a/include/linux/efi.h > > +++ b/include/linux/efi.h > > @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area { > > /* Header of a populated EFI secret area */ > > #define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b) > > > > +#ifdef CONFIG_XEN_EFI > > Please drop this #ifdef Will fix in v5. > > +/* > > + * Returns the end of the memory region containing the given config table, > > + * or 0 if the given address does not reside in memory that can validly > > + * contain EFI configuration tables. > > + */ > > +__init u64 xen_config_table_memory_region_max(u64 addr); > > You can drop the __init here Will fix in v5. > > +#endif > > + > > #endif /* _LINUX_EFI_H */ > > -- > > Sincerely, > > Demi Marie Obenour (she/her/hers) > > Invisible Things Lab > >
On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: > > On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote: > > > > > > On 30.09.2022 01:02, Demi Marie Obenour wrote: > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > > > must not use EFI tables from such memory. Most of the remaining EFI > > > > memory types are not suitable for EFI tables, leaving only > > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > > > use tables that are located in one of these types of memory. > > > > > > > > This patch ensures this, and also adds a function > > > > (xen_config_table_memory_region_max()) that will be used later to > > > > replace the usage of the EFI memory map in esrt.c when running under > > > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > > > but I have not implemented this. > > > > > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > > > > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed > > > "-mapbs". Should we perhaps extend the interface such that Dom0 can then > > > also use tables located in such regions, perhaps by faking > > > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? > > > > > > > I know this ship has sailed for x86, but for the sake of other > > architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME > > bits alone, for the same reasons I gave earlier. (Runtime mappings for > > the firmware code itself, page table fragmentation etc etc) > > Why do you say that it has sailed for x86? > The x86 EFI code in Linux makes changes to the EFI memory map in many different places in the code. On other architectures, we have managed to avoid this, so that the EFI memory map is always identical to the one provided by the firmware at boot. > > I know very little about Xen, but based on the context you provided in > > this thread, I'd say that the best approach from the Xen side is to > > convert all EfiBootServicesData regions that have configuration tables > > pointing into them into EfiAcpiReclaimMemory. > > Should Xen convert the entire region, or should it try to reserve only > the memory it needs? The latter would require it to parse the > configuration tables. Is there a list of configuration tables that can > legitimately be in EfiBootServicesData regions? > Not really, no. So you would have to convert the entire region /unless/ Xen knows the GUID, and therefore knows how to derive the size of the table, allowing it to reserve memory more conservatively. However, I doubt whether this is worth it: splitting entries implies rewriting the memory map, which is a thing I'd rather avoid if I were in your shoes. > > I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you > > might do the same for the returned type - EfiBootServicesData -> > > EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME > > attribute. > > It is indeed an existing interface, and this is a much better idea than > what you proposed. Right.
On Fri, 30 Sept 2022 at 20:17, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote: > > On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour > > <demi@invisiblethingslab.com> wrote: > > > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > > must not use EFI tables from such memory. Most of the remaining EFI > > > memory types are not suitable for EFI tables, leaving only > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > > use tables that are located in one of these types of memory. > > > > > > This patch ensures this, and also adds a function > > > (xen_config_table_memory_region_max()) that will be used later to > > > replace the usage of the EFI memory map in esrt.c when running under > > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > > but I have not implemented this. > > > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > > --- > > > drivers/firmware/efi/efi.c | 8 +++++--- > > > drivers/xen/efi.c | 35 +++++++++++++++++++++++++++++++++++ > > > include/linux/efi.h | 9 +++++++++ > > > 3 files changed, 49 insertions(+), 3 deletions(-) > > > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > > > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644 > > > --- a/drivers/firmware/efi/efi.c > > > +++ b/drivers/firmware/efi/efi.c > > > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > unsigned long table; > > > int i; > > > > > > - pr_info(""); > > > > Why are you removing these prints? > > If I left them, I would need to include a pr_cont("\n") later. There should always be one at the end of the loop, no? Or is this related to the diagnostic that gets printed in your helper? > Checkpatch recommends against that. What is the purpose of this print? > I assumed that since it prints an empty string it is superfluous. > It prints the leading [invisible] loglevel marker, and the 'efi: ' prefix. > > > for (i = 0; i < count; i++) { > > > if (!IS_ENABLED(CONFIG_X86)) { > > > guid = &config_tables[i].guid; > > > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > > > > if (IS_ENABLED(CONFIG_X86_32) && > > > tbl64[i].table > U32_MAX) { > > > - pr_cont("\n"); > > > pr_err("Table located above 4GB, disabling EFI.\n"); > > > return -EINVAL; > > > } > > > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > table = tbl32[i].table; > > > } > > > > > > +#ifdef CONFIG_XEN_EFI > > > > We tend to prefer IS_ENABLED() for cases such as this one. That way, > > the compiler always gets to see the code inside the conditional block, > > which gives better build test coverage (even if CONFIG_XEN_EFI is > > disabled). > > Can I count on the compiler eliminating the code as unreachable? With > CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an > undefined symbol. > If you drop the #ifdef in the .h file (as I suggested below) the code will compile fine, and the symbol reference will not be emitted into the object, so it will link fine even if the Xen objects are not being built. We rely on this behavior all over the Linux kernel. > > > + if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table)) > > > > So the question here is whether Xen thinks the table should be > > disregarded or not. So let's define a prototype that reflects that > > purpose, and let the implementation reason about how this should be > > achieved. > > xen_config_table_memory_region_max() doesn’t just return whether the > table should be disregarded, but also (if the table should not be > ignored) the end of the memory region containing it. But the calling code never uses that value, right? > I will make > xen_efi_config_table_valid() a wrapper around > xen_config_table_memory_region_max(), as the former also needs to print > a warning if the table is in an invalid location. > > > So > > > > if (IS_ENABLED(CONFIG_XEN_EFI) && > > efi_enabled(EFI_PARAVIRT) && > > xen_efi_config_table_valid(guid, table) > > continue > > > > I should note here, though, that EFI_PARAViRT is only set on x86 not > > on other architectures that enable CONFIG_XEN_EFI so this will not > > work anywhere else. > > What should I use instead? > > > > + continue; > > > +#endif > > > + > > > if (!match_config_table(guid, table, common_tables) && arch_tables) > > > match_config_table(guid, table, arch_tables); > > > } > > > - pr_cont("\n"); > > > set_bit(EFI_CONFIG_TABLES, &efi.flags); > > > > > > if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) { > > > diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c > > > index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644 > > > --- a/drivers/xen/efi.c > > > +++ b/drivers/xen/efi.c > > > @@ -28,6 +28,7 @@ > > > #include <xen/interface/platform.h> > > > #include <xen/xen.h> > > > #include <xen/xen-ops.h> > > > +#include <xen/page.h> > > > > > > #include <asm/page.h> > > > > > > @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status, > > > } > > > } > > > > > > +__init u64 xen_config_table_memory_region_max(u64 addr) > > > > It is more idiomatic for Linux to put __init after the return type. > > And if we adopt my suggestion above, this becomes > > > > bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table) > > > > Alternatively, you could pass the string identifier of the table > > instead of the guid (or both) to print in the diagnostic message. > > Will fix in v5. > > > > +{ > > > + static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT, > > > + "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT"); > > > > Is this the only place where this matters? And this never happens on x86, right? > > My understanding is that it should never happen on any architecture. EFI_PAGE_SHIFT is always 12, on any architecture and regardless of the page size used by the OS itself. AFAICT, the same applies to XEN_PAGE_SHIFT. > That’s why I static_assert() it. I have no idea if this is the only > place it matters, though. > I don't mind adding this here, but it's kind of orthogonal to the rest of the patch so please make a mention in the commit log why you are adding it. > > > + struct xen_platform_op op = { > > > + .cmd = XENPF_firmware_info, > > > + .u.firmware_info = { > > > + .type = XEN_FW_EFI_INFO, > > > + .index = XEN_FW_EFI_MEM_INFO, > > > + .u.efi_info.mem.addr = addr, > > > + .u.efi_info.mem.size = U64_MAX - addr, > > > + } > > > + }; > > > + union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info; > > > + int rc = HYPERVISOR_platform_op(&op); > > > + > > > + if (rc) { > > > + pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n", > > > + (unsigned long long)addr, rc); > > > + return 0; > > > + } > > > + > > > + switch (info->mem.type) { > > > + case EFI_RUNTIME_SERVICES_CODE: > > > + case EFI_RUNTIME_SERVICES_DATA: > > > + case EFI_ACPI_RECLAIM_MEMORY: > > > > If we are listing all memory types that Xen preserves, you might add > > EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that > > you need to permit explicitly. > > My understanding was that EFI_RESERVED_MEMORY should never be touched by > the OS, so I left it out. Which types would you permit? > Well, given the purpose of the function (i.e, to reject EfiBootServicesData in spite of the spec), I'd only permit EFI_ACPI_RECLAIM_MEMORY and EFI_RUNTIME_SERVICES_DATA. However, the EFI spec does mention that prior versions permitted the reserved memory type as well for ACPI and SMBIOS tables (although that may be a very long time ago). Bottom line is that you want to permit only regions that Xen is guaranteed not to clobber, right? In any case, I'm not going to obsess over this so if you prefer to keep it this way, that's also fine. > > > + return info->mem.addr + info->mem.size; > > > + default: > > > + pr_warn("Table %llu is in memory of type %d, ignoring it\n", > > > + (unsigned long long)addr, info->mem.type); > > > + return 0; > > > + } > > > +} > > > + > > > /* > > > * Set XEN EFI runtime services function pointers. Other fields of struct efi, > > > * e.g. efi.systab, will be set like normal EFI. > > > diff --git a/include/linux/efi.h b/include/linux/efi.h > > > index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644 > > > --- a/include/linux/efi.h > > > +++ b/include/linux/efi.h > > > @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area { > > > /* Header of a populated EFI secret area */ > > > #define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b) > > > > > > +#ifdef CONFIG_XEN_EFI > > > > Please drop this #ifdef > > Will fix in v5. > > > > +/* > > > + * Returns the end of the memory region containing the given config table, > > > + * or 0 if the given address does not reside in memory that can validly > > > + * contain EFI configuration tables. > > > + */ > > > +__init u64 xen_config_table_memory_region_max(u64 addr); > > > > You can drop the __init here > > Will fix in v5. > > > > +#endif > > > + > > > #endif /* _LINUX_EFI_H */ > > > -- > > > Sincerely, > > > Demi Marie Obenour (she/her/hers) > > > Invisible Things Lab > > > > > -- > Sincerely, > Demi Marie Obenour (she/her/hers) > Invisible Things Lab
On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote: > On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: > > > On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote: > > > > > > > > On 30.09.2022 01:02, Demi Marie Obenour wrote: > > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > > > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > > > > must not use EFI tables from such memory. Most of the remaining EFI > > > > > memory types are not suitable for EFI tables, leaving only > > > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > > > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > > > > use tables that are located in one of these types of memory. > > > > > > > > > > This patch ensures this, and also adds a function > > > > > (xen_config_table_memory_region_max()) that will be used later to > > > > > replace the usage of the EFI memory map in esrt.c when running under > > > > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > > > > but I have not implemented this. > > > > > > > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > > > > > > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed > > > > "-mapbs". Should we perhaps extend the interface such that Dom0 can then > > > > also use tables located in such regions, perhaps by faking > > > > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO? > > > > > > > > > > I know this ship has sailed for x86, but for the sake of other > > > architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME > > > bits alone, for the same reasons I gave earlier. (Runtime mappings for > > > the firmware code itself, page table fragmentation etc etc) > > > > Why do you say that it has sailed for x86? > > > > The x86 EFI code in Linux makes changes to the EFI memory map in many > different places in the code. On other architectures, we have managed > to avoid this, so that the EFI memory map is always identical to the > one provided by the firmware at boot. > > > > I know very little about Xen, but based on the context you provided in > > > this thread, I'd say that the best approach from the Xen side is to > > > convert all EfiBootServicesData regions that have configuration tables > > > pointing into them into EfiAcpiReclaimMemory. > > > > Should Xen convert the entire region, or should it try to reserve only > > the memory it needs? The latter would require it to parse the > > configuration tables. Is there a list of configuration tables that can > > legitimately be in EfiBootServicesData regions? > > > > Not really, no. Is there a list of tables that can be in EfiBootServicesData and which Linux cares about? > So you would have to convert the entire region > /unless/ Xen knows the GUID, and therefore knows how to derive the > size of the table, allowing it to reserve memory more conservatively. My worry is that this will wind up being equivalent to mapping all (or most) of EfiBootServicesData memory. > However, I doubt whether this is worth it: splitting entries implies > rewriting the memory map, which is a thing I'd rather avoid if I were > in your shoes. Xen actually uses a different approach: instead of rewriting the memory map, it uses the EFI pool allocator to allocate memory of the desired type, copies the table to the newly allocated memory, and installs the new table in place of the old one. That only works for tables Xen understands, though.
On Fri, Sep 30, 2022 at 08:42:41PM +0200, Ard Biesheuvel wrote: > On Fri, 30 Sept 2022 at 20:17, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote: > > > On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA, > > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by > > > > Xen before Linux gets to start using it. Therefore, Linux under Xen > > > > must not use EFI tables from such memory. Most of the remaining EFI > > > > memory types are not suitable for EFI tables, leaving only > > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and > > > > EFI_RUNTIME_SERVICES_CODE. When running under Xen, Linux should only > > > > use tables that are located in one of these types of memory. > > > > > > > > This patch ensures this, and also adds a function > > > > (xen_config_table_memory_region_max()) that will be used later to > > > > replace the usage of the EFI memory map in esrt.c when running under > > > > Xen. This function can also be used in mokvar-table.c and efi-bgrt.c, > > > > but I have not implemented this. > > > > > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> > > > > --- > > > > drivers/firmware/efi/efi.c | 8 +++++--- > > > > drivers/xen/efi.c | 35 +++++++++++++++++++++++++++++++++++ > > > > include/linux/efi.h | 9 +++++++++ > > > > 3 files changed, 49 insertions(+), 3 deletions(-) > > > > > > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c > > > > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644 > > > > --- a/drivers/firmware/efi/efi.c > > > > +++ b/drivers/firmware/efi/efi.c > > > > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > > unsigned long table; > > > > int i; > > > > > > > > - pr_info(""); > > > > > > Why are you removing these prints? > > > > If I left them, I would need to include a pr_cont("\n") later. > > There should always be one at the end of the loop, no? Or is this > related to the diagnostic that gets printed in your helper? My helper emits a diagnostic (at severity KERN_WARNING) if the table is in memory that Xen has not reserved. > > Checkpatch recommends against that. What is the purpose of this print? > > I assumed that since it prints an empty string it is superfluous. > > > > It prints the leading [invisible] loglevel marker, and the 'efi: ' prefix. Okay, that makes sense. > > > > for (i = 0; i < count; i++) { > > > > if (!IS_ENABLED(CONFIG_X86)) { > > > > guid = &config_tables[i].guid; > > > > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > > > > > > if (IS_ENABLED(CONFIG_X86_32) && > > > > tbl64[i].table > U32_MAX) { > > > > - pr_cont("\n"); > > > > pr_err("Table located above 4GB, disabling EFI.\n"); > > > > return -EINVAL; > > > > } > > > > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables, > > > > table = tbl32[i].table; > > > > } > > > > > > > > +#ifdef CONFIG_XEN_EFI > > > > > > We tend to prefer IS_ENABLED() for cases such as this one. That way, > > > the compiler always gets to see the code inside the conditional block, > > > which gives better build test coverage (even if CONFIG_XEN_EFI is > > > disabled). > > > > Can I count on the compiler eliminating the code as unreachable? With > > CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an > > undefined symbol. > > > > If you drop the #ifdef in the .h file (as I suggested below) the code > will compile fine, and the symbol reference will not be emitted into > the object, so it will link fine even if the Xen objects are not being > built. > > We rely on this behavior all over the Linux kernel. Okay, thanks! > > > > + if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table)) > > > > > > So the question here is whether Xen thinks the table should be > > > disregarded or not. So let's define a prototype that reflects that > > > purpose, and let the implementation reason about how this should be > > > achieved. > > > > xen_config_table_memory_region_max() doesn’t just return whether the > > table should be disregarded, but also (if the table should not be > > ignored) the end of the memory region containing it. > > But the calling code never uses that value, right? The code in this patch does not use that value. Patch 2 of 2 does use it.
On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote: > On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote: > > On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: > > > I know very little about Xen, but based on the context you provided in > > > this thread, I'd say that the best approach from the Xen side is to > > > convert all EfiBootServicesData regions that have configuration tables > > > pointing into them into EfiAcpiReclaimMemory. > > > > Should Xen convert the entire region, or should it try to reserve only > > the memory it needs? The latter would require it to parse the > > configuration tables. Is there a list of configuration tables that can > > legitimately be in EfiBootServicesData regions? > > > > Not really, no. So you would have to convert the entire region > /unless/ Xen knows the GUID, and therefore knows how to derive the > size of the table, allowing it to reserve memory more conservatively. > However, I doubt whether this is worth it: splitting entries implies > rewriting the memory map, which is a thing I'd rather avoid if I were > in your shoes. I actually wonder if Xen needs to reserve *all* of EfiBootServicesData. The reason is that some (probably buggy) firmware may store ACPI tables there, and Xen does not have an ACPI implementation. From my perspective, a much safer approach would be to pass all of EfiBootServicesData memory directly to dom0, and have dom0 give Xen back what it doesn’t wind up using. That allows dom0’s memory reservation code to work properly, which it currently does not.
On 01.10.2022 02:30, Demi Marie Obenour wrote: > On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote: >> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote: >>> On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: >>>> I know very little about Xen, but based on the context you provided in >>>> this thread, I'd say that the best approach from the Xen side is to >>>> convert all EfiBootServicesData regions that have configuration tables >>>> pointing into them into EfiAcpiReclaimMemory. >>> >>> Should Xen convert the entire region, or should it try to reserve only >>> the memory it needs? The latter would require it to parse the >>> configuration tables. Is there a list of configuration tables that can >>> legitimately be in EfiBootServicesData regions? >>> >> >> Not really, no. So you would have to convert the entire region >> /unless/ Xen knows the GUID, and therefore knows how to derive the >> size of the table, allowing it to reserve memory more conservatively. >> However, I doubt whether this is worth it: splitting entries implies >> rewriting the memory map, which is a thing I'd rather avoid if I were >> in your shoes. > > I actually wonder if Xen needs to reserve *all* of EfiBootServicesData. > The reason is that some (probably buggy) firmware may store ACPI tables > there, and Xen does not have an ACPI implementation. We already have the -mapbs option as a workaround in such situations. > From my > perspective, a much safer approach would be to pass all of > EfiBootServicesData memory directly to dom0, and have dom0 give Xen back > what it doesn’t wind up using. That allows dom0’s memory reservation > code to work properly, which it currently does not. As said already on a different thread: Giving memory to domains (incl Dom0) isn't related to their original memory type (neither EFI's nor E820's); the needed memory is taken from the general page allocator (with one exception for initrd, to avoid unnecessary copying around of data). Hence what you propose would end up as an (imo) awful hack in Xen. I also don't see how this relates to "dom0’s memory reservation code", but I'm sure you can clarify that for me. Jan
On Tue, Oct 04, 2022 at 10:22:13AM +0200, Jan Beulich wrote: > On 01.10.2022 02:30, Demi Marie Obenour wrote: > > On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote: > >> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote: > >>> On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote: > >>>> I know very little about Xen, but based on the context you provided in > >>>> this thread, I'd say that the best approach from the Xen side is to > >>>> convert all EfiBootServicesData regions that have configuration tables > >>>> pointing into them into EfiAcpiReclaimMemory. > >>> > >>> Should Xen convert the entire region, or should it try to reserve only > >>> the memory it needs? The latter would require it to parse the > >>> configuration tables. Is there a list of configuration tables that can > >>> legitimately be in EfiBootServicesData regions? > >>> > >> > >> Not really, no. So you would have to convert the entire region > >> /unless/ Xen knows the GUID, and therefore knows how to derive the > >> size of the table, allowing it to reserve memory more conservatively. > >> However, I doubt whether this is worth it: splitting entries implies > >> rewriting the memory map, which is a thing I'd rather avoid if I were > >> in your shoes. > > > > I actually wonder if Xen needs to reserve *all* of EfiBootServicesData. > > The reason is that some (probably buggy) firmware may store ACPI tables > > there, and Xen does not have an ACPI implementation. > > We already have the -mapbs option as a workaround in such situations. The problem is that there is no way for the user to know when it is needed. One option would be for dom0 to print a warning in the kernel log if it needs to ignore a table due to it being in EfiBootServicesData. > > From my > > perspective, a much safer approach would be to pass all of > > EfiBootServicesData memory directly to dom0, and have dom0 give Xen back > > what it doesn’t wind up using. That allows dom0’s memory reservation > > code to work properly, which it currently does not. > > As said already on a different thread: Giving memory to domains (incl > Dom0) isn't related to their original memory type (neither EFI's nor > E820's); the needed memory is taken from the general page allocator > (with one exception for initrd, to avoid unnecessary copying around of > data). Hence what you propose would end up as an (imo) awful hack in > Xen. I also don't see how this relates to "dom0’s memory reservation > code", but I'm sure you can clarify that for me. Linux has a function called efi_mem_reserve() that is used to reserve EfiBootServicesData memory that contains e.g. EFI configuration tables. This function does not work under Xen because Xen could have already clobbered the memory. efi_mem_reserve() not working is the whole reason for this thread, as it prevents EFI tables that are in EfiBootServicesData from being used under Xen. A much nicer approach would be for Xen to reserve boot services memory unconditionally, but provide a hypercall that dom0 could used to free the parts of EfiBootServicesData memory that are no longer needed. This would allow efi_mem_reserve() to work normally.
On 04.10.2022 17:46, Demi Marie Obenour wrote: > Linux has a function called efi_mem_reserve() that is used to reserve > EfiBootServicesData memory that contains e.g. EFI configuration tables. > This function does not work under Xen because Xen could have already > clobbered the memory. efi_mem_reserve() not working is the whole reason > for this thread, as it prevents EFI tables that are in > EfiBootServicesData from being used under Xen. > > A much nicer approach would be for Xen to reserve boot services memory > unconditionally, but provide a hypercall that dom0 could used to free > the parts of EfiBootServicesData memory that are no longer needed. This > would allow efi_mem_reserve() to work normally. efi_mem_reserve() actually working would be a layering violation; controlling the EFI memory map is entirely Xen's job. As to the hypercall you suggest - I wouldn't mind its addition, but only for the case when -mapbs is used. As I've indicated before, I'm of the opinion that default behavior should be matching the intentions of the spec, and the intention of EfiBootServices* is for the space to be reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using that hypercall: It might use it for regions where data lives which it wouldn't care about itself, but which an eventual kexec-ed (or alike) entity would later want to consume. Code/data potentially usable by _anyone_ between two resets of the system cannot legitimately be freed (and hence imo is wrong to live in EfiBootServices* regions). In a way one could view the Dom0 kernel as an "or alike" entity ... Jan
On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > Linux has a function called efi_mem_reserve() that is used to reserve > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > This function does not work under Xen because Xen could have already > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > for this thread, as it prevents EFI tables that are in > > EfiBootServicesData from being used under Xen. > > > > A much nicer approach would be for Xen to reserve boot services memory > > unconditionally, but provide a hypercall that dom0 could used to free > > the parts of EfiBootServicesData memory that are no longer needed. This > > would allow efi_mem_reserve() to work normally. > > efi_mem_reserve() actually working would be a layering violation; > controlling the EFI memory map is entirely Xen's job. Doing this properly would require Xen to understand all of the EFI tables that could validly be in EfiBootServices* and which could be of interest to dom0. It might (at least on some very buggy firmware) require a partial ACPI and/or SMBIOS implementation too, if the firmware decided to put an ACPI or SMBIOS table in EfiBootServices*. > As to the hypercall you suggest - I wouldn't mind its addition, but only > for the case when -mapbs is used. As I've indicated before, I'm of the > opinion that default behavior should be matching the intentions of the > spec, and the intention of EfiBootServices* is for the space to be > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > that hypercall: It might use it for regions where data lives which it > wouldn't care about itself, but which an eventual kexec-ed (or alike) > entity would later want to consume. Code/data potentially usable by > _anyone_ between two resets of the system cannot legitimately be freed > (and hence imo is wrong to live in EfiBootServices* regions). I agree, but currently some such data *is* in EfiBootServices* regions, sadly. When -mapbs is *not* used, I recommend uninstalling all of the configuration tables that point to EfiBootServicesData memory before freeing that memory. > In a way one could view the Dom0 kernel as an "or alike" entity ... It is indeed such an entity.
On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > This function does not work under Xen because Xen could have already > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > for this thread, as it prevents EFI tables that are in > > > EfiBootServicesData from being used under Xen. > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > unconditionally, but provide a hypercall that dom0 could used to free > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > would allow efi_mem_reserve() to work normally. > > > > efi_mem_reserve() actually working would be a layering violation; > > controlling the EFI memory map is entirely Xen's job. > > Doing this properly would require Xen to understand all of the EFI > tables that could validly be in EfiBootServices* and which could be of > interest to dom0. It might (at least on some very buggy firmware) > require a partial ACPI and/or SMBIOS implementation too, if the firmware > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > for the case when -mapbs is used. As I've indicated before, I'm of the > > opinion that default behavior should be matching the intentions of the > > spec, and the intention of EfiBootServices* is for the space to be > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > that hypercall: It might use it for regions where data lives which it > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > entity would later want to consume. Code/data potentially usable by > > _anyone_ between two resets of the system cannot legitimately be freed > > (and hence imo is wrong to live in EfiBootServices* regions). > > I agree, but currently some such data *is* in EfiBootServices* regions, > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > configuration tables that point to EfiBootServicesData memory before > freeing that memory. > That seems like a reasonable approach to me. Tables like MEMATTR or RT_PROP are mostly relevant for bare metal where the host kernel maps the runtime services, and in general, passing on these tables without knowing what they do is kind of fishy anyway. You might even argue that only known table types should be forwarded in the first place, regardless of the memory type.
On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > This function does not work under Xen because Xen could have already > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > for this thread, as it prevents EFI tables that are in > > > > EfiBootServicesData from being used under Xen. > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > would allow efi_mem_reserve() to work normally. > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > controlling the EFI memory map is entirely Xen's job. > > > > Doing this properly would require Xen to understand all of the EFI > > tables that could validly be in EfiBootServices* and which could be of > > interest to dom0. It might (at least on some very buggy firmware) > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > opinion that default behavior should be matching the intentions of the > > > spec, and the intention of EfiBootServices* is for the space to be > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > that hypercall: It might use it for regions where data lives which it > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > entity would later want to consume. Code/data potentially usable by > > > _anyone_ between two resets of the system cannot legitimately be freed > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > configuration tables that point to EfiBootServicesData memory before > > freeing that memory. > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > RT_PROP are mostly relevant for bare metal where the host kernel maps > the runtime services, and in general, passing on these tables without > knowing what they do is kind of fishy anyway. You might even argue > that only known table types should be forwarded in the first place, > regardless of the memory type. Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and ESRT, but I am curious which others Xen should preserve. Currently, Xen does not know about RT_PROP or MEMATTR; could this be a cause of problems?
On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > > <demi@invisiblethingslab.com> wrote: > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > > This function does not work under Xen because Xen could have already > > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > > for this thread, as it prevents EFI tables that are in > > > > > EfiBootServicesData from being used under Xen. > > > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > > would allow efi_mem_reserve() to work normally. > > > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > > controlling the EFI memory map is entirely Xen's job. > > > > > > Doing this properly would require Xen to understand all of the EFI > > > tables that could validly be in EfiBootServices* and which could be of > > > interest to dom0. It might (at least on some very buggy firmware) > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > > opinion that default behavior should be matching the intentions of the > > > > spec, and the intention of EfiBootServices* is for the space to be > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > > that hypercall: It might use it for regions where data lives which it > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > > entity would later want to consume. Code/data potentially usable by > > > > _anyone_ between two resets of the system cannot legitimately be freed > > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > > configuration tables that point to EfiBootServicesData memory before > > > freeing that memory. > > > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > > RT_PROP are mostly relevant for bare metal where the host kernel maps > > the runtime services, and in general, passing on these tables without > > knowing what they do is kind of fishy anyway. You might even argue > > that only known table types should be forwarded in the first place, > > regardless of the memory type. > > Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and > ESRT, but I am curious which others Xen should preserve. Currently, Xen > does not know about RT_PROP or MEMATTR; could this be a cause of > problems? dom0 only has access to paravirtualized EFI runtime services, so consuming RT_PROP or MEMATTR should be up to Xen (they describe which runtime services remain available at runtime, and which permission attributes to use for the runtime services memory regions, respectively) Looking through the kernel code, I don't think there are any that dom0 should care about beyond ACPI, SMBIOS and ESRT. But as you suggest, that means Xen should just mask them in the view of the EFI system table it exposes so dom0. Otherwise, the kernel may still try to map and parse them.
On 05.10.2022 20:11, Demi Marie Obenour wrote: > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: >> On 04.10.2022 17:46, Demi Marie Obenour wrote: >>> Linux has a function called efi_mem_reserve() that is used to reserve >>> EfiBootServicesData memory that contains e.g. EFI configuration tables. >>> This function does not work under Xen because Xen could have already >>> clobbered the memory. efi_mem_reserve() not working is the whole reason >>> for this thread, as it prevents EFI tables that are in >>> EfiBootServicesData from being used under Xen. >>> >>> A much nicer approach would be for Xen to reserve boot services memory >>> unconditionally, but provide a hypercall that dom0 could used to free >>> the parts of EfiBootServicesData memory that are no longer needed. This >>> would allow efi_mem_reserve() to work normally. >> >> efi_mem_reserve() actually working would be a layering violation; >> controlling the EFI memory map is entirely Xen's job. > > Doing this properly would require Xen to understand all of the EFI > tables that could validly be in EfiBootServices* and which could be of > interest to dom0. We don't need to understand the tables as long as none crosses memory map descriptor boundaries, and as long as they don't contain further pointers. > It might (at least on some very buggy firmware) > require a partial ACPI and/or SMBIOS implementation too, if the firmware > decided to put an ACPI or SMBIOS table in EfiBootServices*. I hope we won't need to go that far; on such systems -mapbs will continue to be needed. >> As to the hypercall you suggest - I wouldn't mind its addition, but only >> for the case when -mapbs is used. As I've indicated before, I'm of the >> opinion that default behavior should be matching the intentions of the >> spec, and the intention of EfiBootServices* is for the space to be >> reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using >> that hypercall: It might use it for regions where data lives which it >> wouldn't care about itself, but which an eventual kexec-ed (or alike) >> entity would later want to consume. Code/data potentially usable by >> _anyone_ between two resets of the system cannot legitimately be freed >> (and hence imo is wrong to live in EfiBootServices* regions). > > I agree, but currently some such data *is* in EfiBootServices* regions, > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > configuration tables that point to EfiBootServicesData memory before > freeing that memory. Hmm, uninstalling isn't nice, as it may limit functionality. Instead we might go through all tables and fiddle with memap descriptors in case a pointer references an EfiBootServices* region (regardless of size, as per the first restriction mentioned above). (A more brute force approach might be to simply behave as if -mapbs was specified in such a case, provided we can reliably determine this early enough, i.e. before first checking the "map_bs" variable.) Tables actually known to us could also be relocated (like you've done for ESRT). Such checking could be extended to the runtime services function pointers. While that wouldn't cover cases where a function entry point is in runtime services space but the function then wrongly calls into or references boot services space, it would cover a few more (broken) systems. This, unlike behaving by default as if -mapbs was given, would be a workaround I'd accept to be enabled unconditionally, as it wouldn't affect well behaved systems (beyond the time it takes to carry out the checks, and provided the checking logic isn't buggy). There's one further caveat towards uninstalling (in a way also for your ESRT relocation code): The final memory map is known to us only when we can't call boot services functions anymore (i.e. in particular InstallConfigurationTable()). Jan
On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote: > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > > > This function does not work under Xen because Xen could have already > > > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > > > for this thread, as it prevents EFI tables that are in > > > > > > EfiBootServicesData from being used under Xen. > > > > > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > > > would allow efi_mem_reserve() to work normally. > > > > > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > > > controlling the EFI memory map is entirely Xen's job. > > > > > > > > Doing this properly would require Xen to understand all of the EFI > > > > tables that could validly be in EfiBootServices* and which could be of > > > > interest to dom0. It might (at least on some very buggy firmware) > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > > > opinion that default behavior should be matching the intentions of the > > > > > spec, and the intention of EfiBootServices* is for the space to be > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > > > that hypercall: It might use it for regions where data lives which it > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > > > entity would later want to consume. Code/data potentially usable by > > > > > _anyone_ between two resets of the system cannot legitimately be freed > > > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > > > configuration tables that point to EfiBootServicesData memory before > > > > freeing that memory. > > > > > > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > > > RT_PROP are mostly relevant for bare metal where the host kernel maps > > > the runtime services, and in general, passing on these tables without > > > knowing what they do is kind of fishy anyway. You might even argue > > > that only known table types should be forwarded in the first place, > > > regardless of the memory type. > > > > Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and > > ESRT, but I am curious which others Xen should preserve. Currently, Xen > > does not know about RT_PROP or MEMATTR; could this be a cause of > > problems? > > dom0 only has access to paravirtualized EFI runtime services, so > consuming RT_PROP or MEMATTR should be up to Xen (they describe which > runtime services remain available at runtime, and which permission > attributes to use for the runtime services memory regions, > respectively) Xen does not do this right now. I wonder if this could be the cause of compatibility issues with various firmware implementations. > Looking through the kernel code, I don't think there are any that dom0 > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest, > that means Xen should just mask them in the view of the EFI system > table it exposes so dom0. Otherwise, the kernel may still try to map > and parse them. What about the BGRT and MOKvar? I agree that Xen should not expose the others. Should it just hide the tables, or should it actually uninstall them? My intuition is that the second would be technically more correct, but also more likely to trigger bugs in various firmware implementations.
On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote: > > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour > > <demi@invisiblethingslab.com> wrote: > > > > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > > > > This function does not work under Xen because Xen could have already > > > > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > > > > for this thread, as it prevents EFI tables that are in > > > > > > > EfiBootServicesData from being used under Xen. > > > > > > > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > > > > would allow efi_mem_reserve() to work normally. > > > > > > > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > > > > controlling the EFI memory map is entirely Xen's job. > > > > > > > > > > Doing this properly would require Xen to understand all of the EFI > > > > > tables that could validly be in EfiBootServices* and which could be of > > > > > interest to dom0. It might (at least on some very buggy firmware) > > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > > > > opinion that default behavior should be matching the intentions of the > > > > > > spec, and the intention of EfiBootServices* is for the space to be > > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > > > > that hypercall: It might use it for regions where data lives which it > > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > > > > entity would later want to consume. Code/data potentially usable by > > > > > > _anyone_ between two resets of the system cannot legitimately be freed > > > > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > > > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > > > > configuration tables that point to EfiBootServicesData memory before > > > > > freeing that memory. > > > > > > > > > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > > > > RT_PROP are mostly relevant for bare metal where the host kernel maps > > > > the runtime services, and in general, passing on these tables without > > > > knowing what they do is kind of fishy anyway. You might even argue > > > > that only known table types should be forwarded in the first place, > > > > regardless of the memory type. > > > > > > Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and > > > ESRT, but I am curious which others Xen should preserve. Currently, Xen > > > does not know about RT_PROP or MEMATTR; could this be a cause of > > > problems? > > > > dom0 only has access to paravirtualized EFI runtime services, so > > consuming RT_PROP or MEMATTR should be up to Xen (they describe which > > runtime services remain available at runtime, and which permission > > attributes to use for the runtime services memory regions, > > respectively) > > Xen does not do this right now. I wonder if this could be the cause of > compatibility issues with various firmware implementations. > > > Looking through the kernel code, I don't think there are any that dom0 > > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest, > > that means Xen should just mask them in the view of the EFI system > > table it exposes so dom0. Otherwise, the kernel may still try to map > > and parse them. > > What about the BGRT and MOKvar? I agree that Xen should not expose the > others. Should it just hide the tables, or should it actually uninstall > them? My intuition is that the second would be technically more > correct, but also more likely to trigger bugs in various firmware > implementations. BGRT is a ACPI table not a EFI configuration table, so I'd assume it is treated the same way as other ACPI tables MOKvar is a fallback for systems where it doesn't fit into a volatile variable IIRC but I am not sure if it is actually implemented anywhere, and what type of memory region it is expected to use. However, I'm not sure if it even matters, given that Xen loads the dom0 kernel not the firmware, right?
On Thu, Oct 06, 2022 at 06:19:35PM +0200, Ard Biesheuvel wrote: > On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour > <demi@invisiblethingslab.com> wrote: > > > > On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote: > > > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > > > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > > > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > > > > > This function does not work under Xen because Xen could have already > > > > > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > > > > > for this thread, as it prevents EFI tables that are in > > > > > > > > EfiBootServicesData from being used under Xen. > > > > > > > > > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > > > > > would allow efi_mem_reserve() to work normally. > > > > > > > > > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > > > > > controlling the EFI memory map is entirely Xen's job. > > > > > > > > > > > > Doing this properly would require Xen to understand all of the EFI > > > > > > tables that could validly be in EfiBootServices* and which could be of > > > > > > interest to dom0. It might (at least on some very buggy firmware) > > > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > > > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > > > > > opinion that default behavior should be matching the intentions of the > > > > > > > spec, and the intention of EfiBootServices* is for the space to be > > > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > > > > > that hypercall: It might use it for regions where data lives which it > > > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > > > > > entity would later want to consume. Code/data potentially usable by > > > > > > > _anyone_ between two resets of the system cannot legitimately be freed > > > > > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > > > > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > > > > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > > > > > configuration tables that point to EfiBootServicesData memory before > > > > > > freeing that memory. > > > > > > > > > > > > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > > > > > RT_PROP are mostly relevant for bare metal where the host kernel maps > > > > > the runtime services, and in general, passing on these tables without > > > > > knowing what they do is kind of fishy anyway. You might even argue > > > > > that only known table types should be forwarded in the first place, > > > > > regardless of the memory type. > > > > > > > > Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and > > > > ESRT, but I am curious which others Xen should preserve. Currently, Xen > > > > does not know about RT_PROP or MEMATTR; could this be a cause of > > > > problems? > > > > > > dom0 only has access to paravirtualized EFI runtime services, so > > > consuming RT_PROP or MEMATTR should be up to Xen (they describe which > > > runtime services remain available at runtime, and which permission > > > attributes to use for the runtime services memory regions, > > > respectively) > > > > Xen does not do this right now. I wonder if this could be the cause of > > compatibility issues with various firmware implementations. > > > > > Looking through the kernel code, I don't think there are any that dom0 > > > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest, > > > that means Xen should just mask them in the view of the EFI system > > > table it exposes so dom0. Otherwise, the kernel may still try to map > > > and parse them. > > > > What about the BGRT and MOKvar? I agree that Xen should not expose the > > others. Should it just hide the tables, or should it actually uninstall > > them? My intuition is that the second would be technically more > > correct, but also more likely to trigger bugs in various firmware > > implementations. > > BGRT is a ACPI table not a EFI configuration table, so I'd assume it > is treated the same way as other ACPI tables That actually surprises me. Xen definitely needs to know about its GUID, then. Are there any other EFI configuration tables that Xen needs to know about, or ACPI tables that might be outside of EfiACPIReclaimMemory. > MOKvar is a fallback for systems where it doesn't fit into a volatile > variable IIRC but I am not sure if it is actually implemented > anywhere, and what type of memory region it is expected to use. > > However, I'm not sure if it even matters, given that Xen loads the > dom0 kernel not the firmware, right? No idea.
On Thu, 6 Oct 2022 at 19:22, Demi Marie Obenour <demi@invisiblethingslab.com> wrote: > > On Thu, Oct 06, 2022 at 06:19:35PM +0200, Ard Biesheuvel wrote: > > On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour > > <demi@invisiblethingslab.com> wrote: > > > > > > On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote: > > > > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour > > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote: > > > > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour > > > > > > <demi@invisiblethingslab.com> wrote: > > > > > > > > > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote: > > > > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote: > > > > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve > > > > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables. > > > > > > > > > This function does not work under Xen because Xen could have already > > > > > > > > > clobbered the memory. efi_mem_reserve() not working is the whole reason > > > > > > > > > for this thread, as it prevents EFI tables that are in > > > > > > > > > EfiBootServicesData from being used under Xen. > > > > > > > > > > > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory > > > > > > > > > unconditionally, but provide a hypercall that dom0 could used to free > > > > > > > > > the parts of EfiBootServicesData memory that are no longer needed. This > > > > > > > > > would allow efi_mem_reserve() to work normally. > > > > > > > > > > > > > > > > efi_mem_reserve() actually working would be a layering violation; > > > > > > > > controlling the EFI memory map is entirely Xen's job. > > > > > > > > > > > > > > Doing this properly would require Xen to understand all of the EFI > > > > > > > tables that could validly be in EfiBootServices* and which could be of > > > > > > > interest to dom0. It might (at least on some very buggy firmware) > > > > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware > > > > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*. > > > > > > > > > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only > > > > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the > > > > > > > > opinion that default behavior should be matching the intentions of the > > > > > > > > spec, and the intention of EfiBootServices* is for the space to be > > > > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using > > > > > > > > that hypercall: It might use it for regions where data lives which it > > > > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike) > > > > > > > > entity would later want to consume. Code/data potentially usable by > > > > > > > > _anyone_ between two resets of the system cannot legitimately be freed > > > > > > > > (and hence imo is wrong to live in EfiBootServices* regions). > > > > > > > > > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions, > > > > > > > sadly. When -mapbs is *not* used, I recommend uninstalling all of the > > > > > > > configuration tables that point to EfiBootServicesData memory before > > > > > > > freeing that memory. > > > > > > > > > > > > > > > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or > > > > > > RT_PROP are mostly relevant for bare metal where the host kernel maps > > > > > > the runtime services, and in general, passing on these tables without > > > > > > knowing what they do is kind of fishy anyway. You might even argue > > > > > > that only known table types should be forwarded in the first place, > > > > > > regardless of the memory type. > > > > > > > > > > Which tables are worth handling in Xen? I know about ACPI, SMBIOS, and > > > > > ESRT, but I am curious which others Xen should preserve. Currently, Xen > > > > > does not know about RT_PROP or MEMATTR; could this be a cause of > > > > > problems? > > > > > > > > dom0 only has access to paravirtualized EFI runtime services, so > > > > consuming RT_PROP or MEMATTR should be up to Xen (they describe which > > > > runtime services remain available at runtime, and which permission > > > > attributes to use for the runtime services memory regions, > > > > respectively) > > > > > > Xen does not do this right now. I wonder if this could be the cause of > > > compatibility issues with various firmware implementations. > > > > > > > Looking through the kernel code, I don't think there are any that dom0 > > > > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest, > > > > that means Xen should just mask them in the view of the EFI system > > > > table it exposes so dom0. Otherwise, the kernel may still try to map > > > > and parse them. > > > > > > What about the BGRT and MOKvar? I agree that Xen should not expose the > > > others. Should it just hide the tables, or should it actually uninstall > > > them? My intuition is that the second would be technically more > > > correct, but also more likely to trigger bugs in various firmware > > > implementations. > > > > BGRT is a ACPI table not a EFI configuration table, so I'd assume it > > is treated the same way as other ACPI tables > > That actually surprises me. Xen definitely needs to know about its > GUID, then. It doesn't have a GUID. The table itself is a ACPI table that is enumerated in the usual ACPI way. The problem is that the BGRT describes a region in memory (where the OEM boot logo is), and this is usually in EfiBootServicesData memory. This data is typically used during early boot, to keep the logo intact while the OS loader draws a nice animation below it. Not the end of the world if it gets corrupted or removed.