mbox series

[v4,0/2] EFI improvements for Xen dom0

Message ID cover.1664298147.git.demi@invisiblethingslab.com
Headers show
Series EFI improvements for Xen dom0 | expand

Message

Demi Marie Obenour Sept. 29, 2022, 11:02 p.m. UTC
This fixes EFI table processing in Xen dom0 and adds ESRT support in
that configuration.

Changes since v3:

- Check location of all configuration tables, not just the ESRT.
- Move most Xen-specific code to drivers/xen/.
- Allow configuration tables to be in EFI_ACPI_RECLAIM_MEMORY.

Demi Marie Obenour (2):
  Avoid using EFI tables Xen may have clobbered
  Support ESRT in Xen dom0

 drivers/firmware/efi/efi.c  |  8 ++++---
 drivers/firmware/efi/esrt.c | 43 ++++++++++++++++++++++++++-----------
 drivers/xen/efi.c           | 35 ++++++++++++++++++++++++++++++
 include/linux/efi.h         |  9 ++++++++
 4 files changed, 79 insertions(+), 16 deletions(-)

Comments

Jan Beulich Sept. 30, 2022, 6:44 a.m. UTC | #1
On 30.09.2022 01:02, Demi Marie Obenour wrote:
> Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> Xen before Linux gets to start using it.  Therefore, Linux under Xen
> must not use EFI tables from such memory.  Most of the remaining EFI
> memory types are not suitable for EFI tables, leaving only
> EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> use tables that are located in one of these types of memory.
> 
> This patch ensures this, and also adds a function
> (xen_config_table_memory_region_max()) that will be used later to
> replace the usage of the EFI memory map in esrt.c when running under
> Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> but I have not implemented this.
> 
> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>

In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
"-mapbs". Should we perhaps extend the interface such that Dom0 can then
also use tables located in such regions, perhaps by faking
EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?

Jan
Ard Biesheuvel Sept. 30, 2022, 4:25 p.m. UTC | #2
On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> Xen before Linux gets to start using it.  Therefore, Linux under Xen
> must not use EFI tables from such memory.  Most of the remaining EFI
> memory types are not suitable for EFI tables, leaving only
> EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> use tables that are located in one of these types of memory.
>
> This patch ensures this, and also adds a function
> (xen_config_table_memory_region_max()) that will be used later to
> replace the usage of the EFI memory map in esrt.c when running under
> Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> but I have not implemented this.
>
> Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> ---
>  drivers/firmware/efi/efi.c |  8 +++++---
>  drivers/xen/efi.c          | 35 +++++++++++++++++++++++++++++++++++
>  include/linux/efi.h        |  9 +++++++++
>  3 files changed, 49 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
>         unsigned long table;
>         int i;
>
> -       pr_info("");

Why are you removing these prints?

>         for (i = 0; i < count; i++) {
>                 if (!IS_ENABLED(CONFIG_X86)) {
>                         guid = &config_tables[i].guid;
> @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
>
>                         if (IS_ENABLED(CONFIG_X86_32) &&
>                             tbl64[i].table > U32_MAX) {
> -                               pr_cont("\n");
>                                 pr_err("Table located above 4GB, disabling EFI.\n");
>                                 return -EINVAL;
>                         }
> @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
>                         table = tbl32[i].table;
>                 }
>
> +#ifdef CONFIG_XEN_EFI

We tend to prefer IS_ENABLED() for cases such as this one. That way,
the compiler always gets to see the code inside the conditional block,
which gives better build test coverage (even if CONFIG_XEN_EFI is
disabled).

> +               if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table))

So the question here is whether Xen thinks the table should be
disregarded or not. So let's define a prototype that reflects that
purpose, and let the implementation reason about how this should be
achieved.

So

if (IS_ENABLED(CONFIG_XEN_EFI) &&
    efi_enabled(EFI_PARAVIRT) &&
    xen_efi_config_table_valid(guid, table)
        continue

I should note here, though, that EFI_PARAViRT is only set on x86 not
on other architectures that enable CONFIG_XEN_EFI so this will not
work anywhere else.


> +                       continue;
> +#endif
> +
>                 if (!match_config_table(guid, table, common_tables) && arch_tables)
>                         match_config_table(guid, table, arch_tables);
>         }
> -       pr_cont("\n");
>         set_bit(EFI_CONFIG_TABLES, &efi.flags);
>
>         if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) {
> diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c
> index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644
> --- a/drivers/xen/efi.c
> +++ b/drivers/xen/efi.c
> @@ -28,6 +28,7 @@
>  #include <xen/interface/platform.h>
>  #include <xen/xen.h>
>  #include <xen/xen-ops.h>
> +#include <xen/page.h>
>
>  #include <asm/page.h>
>
> @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status,
>         }
>  }
>
> +__init u64 xen_config_table_memory_region_max(u64 addr)

It is more idiomatic for Linux to put __init after the return type.
And if we adopt my suggestion above, this becomes

bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table)

Alternatively, you could pass the string identifier of the table
instead of the guid (or both) to print in the diagnostic message.


> +{
> +       static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT,
> +                     "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT");

Is this the only place where this matters? And this never happens on x86, right?

> +       struct xen_platform_op op = {
> +               .cmd = XENPF_firmware_info,
> +               .u.firmware_info = {
> +                       .type = XEN_FW_EFI_INFO,
> +                       .index = XEN_FW_EFI_MEM_INFO,
> +                       .u.efi_info.mem.addr = addr,
> +                       .u.efi_info.mem.size = U64_MAX - addr,
> +               }
> +       };
> +       union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info;
> +       int rc = HYPERVISOR_platform_op(&op);
> +
> +       if (rc) {
> +               pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n",
> +                       (unsigned long long)addr, rc);
> +               return 0;
> +       }
> +
> +       switch (info->mem.type) {
> +       case EFI_RUNTIME_SERVICES_CODE:
> +       case EFI_RUNTIME_SERVICES_DATA:
> +       case EFI_ACPI_RECLAIM_MEMORY:

If we are listing all memory types that Xen preserves, you might add
EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that
you need to permit explicitly.

> +               return info->mem.addr + info->mem.size;
> +       default:
> +               pr_warn("Table %llu is in memory of type %d, ignoring it\n",
> +                       (unsigned long long)addr, info->mem.type);
> +               return 0;
> +       }
> +}
> +
>  /*
>   * Set XEN EFI runtime services function pointers. Other fields of struct efi,
>   * e.g. efi.systab, will be set like normal EFI.
> diff --git a/include/linux/efi.h b/include/linux/efi.h
> index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644
> --- a/include/linux/efi.h
> +++ b/include/linux/efi.h
> @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area {
>  /* Header of a populated EFI secret area */
>  #define EFI_SECRET_TABLE_HEADER_GUID   EFI_GUID(0x1e74f542, 0x71dd, 0x4d66,  0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b)
>
> +#ifdef CONFIG_XEN_EFI

Please drop this #ifdef

> +/*
> + * Returns the end of the memory region containing the given config table,
> + * or 0 if the given address does not reside in memory that can validly
> + * contain EFI configuration tables.
> + */
> +__init u64 xen_config_table_memory_region_max(u64 addr);

You can drop the __init here

> +#endif
> +
>  #endif /* _LINUX_EFI_H */
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab
>
Ard Biesheuvel Sept. 30, 2022, 4:30 p.m. UTC | #3
On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote:
>
> On 30.09.2022 01:02, Demi Marie Obenour wrote:
> > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > must not use EFI tables from such memory.  Most of the remaining EFI
> > memory types are not suitable for EFI tables, leaving only
> > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > use tables that are located in one of these types of memory.
> >
> > This patch ensures this, and also adds a function
> > (xen_config_table_memory_region_max()) that will be used later to
> > replace the usage of the EFI memory map in esrt.c when running under
> > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > but I have not implemented this.
> >
> > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
>
> In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
> "-mapbs". Should we perhaps extend the interface such that Dom0 can then
> also use tables located in such regions, perhaps by faking
> EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?
>

I know this ship has sailed for x86, but for the sake of other
architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME
bits alone, for the same reasons I gave earlier. (Runtime mappings for
the firmware code itself, page table fragmentation etc etc)

I know very little about Xen, but based on the context you provided in
this thread, I'd say that the best approach from the Xen side is to
convert all EfiBootServicesData regions that have configuration tables
pointing into them into EfiAcpiReclaimMemory.

I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you
might do the same for the returned type - EfiBootServicesData ->
EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME
attribute.
Demi Marie Obenour Sept. 30, 2022, 4:38 p.m. UTC | #4
On Fri, Sep 30, 2022 at 08:44:21AM +0200, Jan Beulich wrote:
> On 30.09.2022 01:02, Demi Marie Obenour wrote:
> > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > must not use EFI tables from such memory.  Most of the remaining EFI
> > memory types are not suitable for EFI tables, leaving only
> > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > use tables that are located in one of these types of memory.
> > 
> > This patch ensures this, and also adds a function
> > (xen_config_table_memory_region_max()) that will be used later to
> > replace the usage of the EFI memory map in esrt.c when running under
> > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > but I have not implemented this.
> > 
> > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> 
> In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
> "-mapbs". Should we perhaps extend the interface such that Dom0 can then
> also use tables located in such regions, perhaps by faking
> EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?

I can add a check for EFI_MEMORY_RUNTIME, but only if I can require a Xen
version with https://lore.kernel.org/xen-devel/cc0fbcb4-5ea3-178c-e691-9acb7cc9a3a7@suse.com/t/#u.
This is easy in Qubes OS via RPM dependencies, but I am not sure if it
is suitable for upstream without a mechanism for dom0 to verify that the
patch has been included.
Demi Marie Obenour Sept. 30, 2022, 5:11 p.m. UTC | #5
On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
> On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote:
> >
> > On 30.09.2022 01:02, Demi Marie Obenour wrote:
> > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > > must not use EFI tables from such memory.  Most of the remaining EFI
> > > memory types are not suitable for EFI tables, leaving only
> > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > > use tables that are located in one of these types of memory.
> > >
> > > This patch ensures this, and also adds a function
> > > (xen_config_table_memory_region_max()) that will be used later to
> > > replace the usage of the EFI memory map in esrt.c when running under
> > > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > > but I have not implemented this.
> > >
> > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> >
> > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
> > "-mapbs". Should we perhaps extend the interface such that Dom0 can then
> > also use tables located in such regions, perhaps by faking
> > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?
> >
> 
> I know this ship has sailed for x86, but for the sake of other
> architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME
> bits alone, for the same reasons I gave earlier. (Runtime mappings for
> the firmware code itself, page table fragmentation etc etc)

Why do you say that it has sailed for x86?

> I know very little about Xen, but based on the context you provided in
> this thread, I'd say that the best approach from the Xen side is to
> convert all EfiBootServicesData regions that have configuration tables
> pointing into them into EfiAcpiReclaimMemory.

Should Xen convert the entire region, or should it try to reserve only
the memory it needs?  The latter would require it to parse the
configuration tables.  Is there a list of configuration tables that can
legitimately be in EfiBootServicesData regions?

> I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you
> might do the same for the returned type - EfiBootServicesData ->
> EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME
> attribute.

It is indeed an existing interface, and this is a much better idea than
what you proposed.
Demi Marie Obenour Sept. 30, 2022, 6:15 p.m. UTC | #6
On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote:
> On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > must not use EFI tables from such memory.  Most of the remaining EFI
> > memory types are not suitable for EFI tables, leaving only
> > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > use tables that are located in one of these types of memory.
> >
> > This patch ensures this, and also adds a function
> > (xen_config_table_memory_region_max()) that will be used later to
> > replace the usage of the EFI memory map in esrt.c when running under
> > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > but I have not implemented this.
> >
> > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> > ---
> >  drivers/firmware/efi/efi.c |  8 +++++---
> >  drivers/xen/efi.c          | 35 +++++++++++++++++++++++++++++++++++
> >  include/linux/efi.h        |  9 +++++++++
> >  3 files changed, 49 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644
> > --- a/drivers/firmware/efi/efi.c
> > +++ b/drivers/firmware/efi/efi.c
> > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> >         unsigned long table;
> >         int i;
> >
> > -       pr_info("");
> 
> Why are you removing these prints?

If I left them, I would need to include a pr_cont("\n") later.
Checkpatch recommends against that.  What is the purpose of this print?
I assumed that since it prints an empty string it is superfluous.

> >         for (i = 0; i < count; i++) {
> >                 if (!IS_ENABLED(CONFIG_X86)) {
> >                         guid = &config_tables[i].guid;
> > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> >
> >                         if (IS_ENABLED(CONFIG_X86_32) &&
> >                             tbl64[i].table > U32_MAX) {
> > -                               pr_cont("\n");
> >                                 pr_err("Table located above 4GB, disabling EFI.\n");
> >                                 return -EINVAL;
> >                         }
> > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> >                         table = tbl32[i].table;
> >                 }
> >
> > +#ifdef CONFIG_XEN_EFI
> 
> We tend to prefer IS_ENABLED() for cases such as this one. That way,
> the compiler always gets to see the code inside the conditional block,
> which gives better build test coverage (even if CONFIG_XEN_EFI is
> disabled).

Can I count on the compiler eliminating the code as unreachable?  With
CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an
undefined symbol.

> > +               if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table))
> 
> So the question here is whether Xen thinks the table should be
> disregarded or not. So let's define a prototype that reflects that
> purpose, and let the implementation reason about how this should be
> achieved.

xen_config_table_memory_region_max() doesn’t just return whether the
table should be disregarded, but also (if the table should not be
ignored) the end of the memory region containing it.  I will make
xen_efi_config_table_valid() a wrapper around
xen_config_table_memory_region_max(), as the former also needs to print
a warning if the table is in an invalid location.

> So
> 
> if (IS_ENABLED(CONFIG_XEN_EFI) &&
>     efi_enabled(EFI_PARAVIRT) &&
>     xen_efi_config_table_valid(guid, table)
>         continue
> 
> I should note here, though, that EFI_PARAViRT is only set on x86 not
> on other architectures that enable CONFIG_XEN_EFI so this will not
> work anywhere else.

What should I use instead?

> > +                       continue;
> > +#endif
> > +
> >                 if (!match_config_table(guid, table, common_tables) && arch_tables)
> >                         match_config_table(guid, table, arch_tables);
> >         }
> > -       pr_cont("\n");
> >         set_bit(EFI_CONFIG_TABLES, &efi.flags);
> >
> >         if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) {
> > diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c
> > index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644
> > --- a/drivers/xen/efi.c
> > +++ b/drivers/xen/efi.c
> > @@ -28,6 +28,7 @@
> >  #include <xen/interface/platform.h>
> >  #include <xen/xen.h>
> >  #include <xen/xen-ops.h>
> > +#include <xen/page.h>
> >
> >  #include <asm/page.h>
> >
> > @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status,
> >         }
> >  }
> >
> > +__init u64 xen_config_table_memory_region_max(u64 addr)
> 
> It is more idiomatic for Linux to put __init after the return type.
> And if we adopt my suggestion above, this becomes
> 
> bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table)
> 
> Alternatively, you could pass the string identifier of the table
> instead of the guid (or both) to print in the diagnostic message.

Will fix in v5.

> > +{
> > +       static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT,
> > +                     "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT");
> 
> Is this the only place where this matters? And this never happens on x86, right?

My understanding is that it should never happen on any architecture.
That’s why I static_assert() it.  I have no idea if this is the only
place it matters, though.

> > +       struct xen_platform_op op = {
> > +               .cmd = XENPF_firmware_info,
> > +               .u.firmware_info = {
> > +                       .type = XEN_FW_EFI_INFO,
> > +                       .index = XEN_FW_EFI_MEM_INFO,
> > +                       .u.efi_info.mem.addr = addr,
> > +                       .u.efi_info.mem.size = U64_MAX - addr,
> > +               }
> > +       };
> > +       union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info;
> > +       int rc = HYPERVISOR_platform_op(&op);
> > +
> > +       if (rc) {
> > +               pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n",
> > +                       (unsigned long long)addr, rc);
> > +               return 0;
> > +       }
> > +
> > +       switch (info->mem.type) {
> > +       case EFI_RUNTIME_SERVICES_CODE:
> > +       case EFI_RUNTIME_SERVICES_DATA:
> > +       case EFI_ACPI_RECLAIM_MEMORY:
> 
> If we are listing all memory types that Xen preserves, you might add
> EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that
> you need to permit explicitly.

My understanding was that EFI_RESERVED_MEMORY should never be touched by
the OS, so I left it out.  Which types would you permit?

> > +               return info->mem.addr + info->mem.size;
> > +       default:
> > +               pr_warn("Table %llu is in memory of type %d, ignoring it\n",
> > +                       (unsigned long long)addr, info->mem.type);
> > +               return 0;
> > +       }
> > +}
> > +
> >  /*
> >   * Set XEN EFI runtime services function pointers. Other fields of struct efi,
> >   * e.g. efi.systab, will be set like normal EFI.
> > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644
> > --- a/include/linux/efi.h
> > +++ b/include/linux/efi.h
> > @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area {
> >  /* Header of a populated EFI secret area */
> >  #define EFI_SECRET_TABLE_HEADER_GUID   EFI_GUID(0x1e74f542, 0x71dd, 0x4d66,  0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b)
> >
> > +#ifdef CONFIG_XEN_EFI
> 
> Please drop this #ifdef

Will fix in v5.

> > +/*
> > + * Returns the end of the memory region containing the given config table,
> > + * or 0 if the given address does not reside in memory that can validly
> > + * contain EFI configuration tables.
> > + */
> > +__init u64 xen_config_table_memory_region_max(u64 addr);
> 
> You can drop the __init here

Will fix in v5.

> > +#endif
> > +
> >  #endif /* _LINUX_EFI_H */
> > --
> > Sincerely,
> > Demi Marie Obenour (she/her/hers)
> > Invisible Things Lab
> >
Ard Biesheuvel Sept. 30, 2022, 6:27 p.m. UTC | #7
On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
> > On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote:
> > >
> > > On 30.09.2022 01:02, Demi Marie Obenour wrote:
> > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > > > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > > > must not use EFI tables from such memory.  Most of the remaining EFI
> > > > memory types are not suitable for EFI tables, leaving only
> > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > > > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > > > use tables that are located in one of these types of memory.
> > > >
> > > > This patch ensures this, and also adds a function
> > > > (xen_config_table_memory_region_max()) that will be used later to
> > > > replace the usage of the EFI memory map in esrt.c when running under
> > > > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > > > but I have not implemented this.
> > > >
> > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> > >
> > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
> > > "-mapbs". Should we perhaps extend the interface such that Dom0 can then
> > > also use tables located in such regions, perhaps by faking
> > > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?
> > >
> >
> > I know this ship has sailed for x86, but for the sake of other
> > architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME
> > bits alone, for the same reasons I gave earlier. (Runtime mappings for
> > the firmware code itself, page table fragmentation etc etc)
>
> Why do you say that it has sailed for x86?
>

The x86 EFI code in Linux makes changes to the EFI memory map in many
different places in the code. On other architectures, we have managed
to avoid this, so that the EFI memory map is always identical to the
one provided by the firmware at boot.

> > I know very little about Xen, but based on the context you provided in
> > this thread, I'd say that the best approach from the Xen side is to
> > convert all EfiBootServicesData regions that have configuration tables
> > pointing into them into EfiAcpiReclaimMemory.
>
> Should Xen convert the entire region, or should it try to reserve only
> the memory it needs?  The latter would require it to parse the
> configuration tables.  Is there a list of configuration tables that can
> legitimately be in EfiBootServicesData regions?
>

Not really, no. So you would have to convert the entire region
/unless/ Xen knows the GUID, and therefore knows how to derive the
size of the table, allowing it to reserve memory more conservatively.
However, I doubt whether this is worth it: splitting entries implies
rewriting the memory map, which is a thing I'd rather avoid if I were
in your shoes.

> > I take it XEN_FW_EFI_MEM_INFO is an existing interface? If so, you
> > might do the same for the returned type - EfiBootServicesData ->
> > EfiAcpiReclaimMemory, and not muck about with the EFI_MEMORY_RUNTIME
> > attribute.
>
> It is indeed an existing interface, and this is a much better idea than
> what you proposed.

Right.
Ard Biesheuvel Sept. 30, 2022, 6:42 p.m. UTC | #8
On Fri, 30 Sept 2022 at 20:17, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote:
> > On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour
> > <demi@invisiblethingslab.com> wrote:
> > >
> > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > > must not use EFI tables from such memory.  Most of the remaining EFI
> > > memory types are not suitable for EFI tables, leaving only
> > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > > use tables that are located in one of these types of memory.
> > >
> > > This patch ensures this, and also adds a function
> > > (xen_config_table_memory_region_max()) that will be used later to
> > > replace the usage of the EFI memory map in esrt.c when running under
> > > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > > but I have not implemented this.
> > >
> > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> > > ---
> > >  drivers/firmware/efi/efi.c |  8 +++++---
> > >  drivers/xen/efi.c          | 35 +++++++++++++++++++++++++++++++++++
> > >  include/linux/efi.h        |  9 +++++++++
> > >  3 files changed, 49 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644
> > > --- a/drivers/firmware/efi/efi.c
> > > +++ b/drivers/firmware/efi/efi.c
> > > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > >         unsigned long table;
> > >         int i;
> > >
> > > -       pr_info("");
> >
> > Why are you removing these prints?
>
> If I left them, I would need to include a pr_cont("\n") later.

There should always be one at the end of the loop, no? Or is this
related to the diagnostic that gets printed in your helper?

> Checkpatch recommends against that.  What is the purpose of this print?
> I assumed that since it prints an empty string it is superfluous.
>

It prints the leading [invisible] loglevel marker, and the 'efi: ' prefix.

> > >         for (i = 0; i < count; i++) {
> > >                 if (!IS_ENABLED(CONFIG_X86)) {
> > >                         guid = &config_tables[i].guid;
> > > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > >
> > >                         if (IS_ENABLED(CONFIG_X86_32) &&
> > >                             tbl64[i].table > U32_MAX) {
> > > -                               pr_cont("\n");
> > >                                 pr_err("Table located above 4GB, disabling EFI.\n");
> > >                                 return -EINVAL;
> > >                         }
> > > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > >                         table = tbl32[i].table;
> > >                 }
> > >
> > > +#ifdef CONFIG_XEN_EFI
> >
> > We tend to prefer IS_ENABLED() for cases such as this one. That way,
> > the compiler always gets to see the code inside the conditional block,
> > which gives better build test coverage (even if CONFIG_XEN_EFI is
> > disabled).
>
> Can I count on the compiler eliminating the code as unreachable?  With
> CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an
> undefined symbol.
>

If you drop the #ifdef in the .h file (as I suggested below) the code
will compile fine, and the symbol reference will not be emitted into
the object, so it will link fine even if the Xen objects are not being
built.

We rely on this behavior all over the Linux kernel.

> > > +               if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table))
> >
> > So the question here is whether Xen thinks the table should be
> > disregarded or not. So let's define a prototype that reflects that
> > purpose, and let the implementation reason about how this should be
> > achieved.
>
> xen_config_table_memory_region_max() doesn’t just return whether the
> table should be disregarded, but also (if the table should not be
> ignored) the end of the memory region containing it.

But the calling code never uses that value, right?

> I will make
> xen_efi_config_table_valid() a wrapper around
> xen_config_table_memory_region_max(), as the former also needs to print
> a warning if the table is in an invalid location.
>
> > So
> >
> > if (IS_ENABLED(CONFIG_XEN_EFI) &&
> >     efi_enabled(EFI_PARAVIRT) &&
> >     xen_efi_config_table_valid(guid, table)
> >         continue
> >
> > I should note here, though, that EFI_PARAViRT is only set on x86 not
> > on other architectures that enable CONFIG_XEN_EFI so this will not
> > work anywhere else.
>
> What should I use instead?
>
> > > +                       continue;
> > > +#endif
> > > +
> > >                 if (!match_config_table(guid, table, common_tables) && arch_tables)
> > >                         match_config_table(guid, table, arch_tables);
> > >         }
> > > -       pr_cont("\n");
> > >         set_bit(EFI_CONFIG_TABLES, &efi.flags);
> > >
> > >         if (efi_rng_seed != EFI_INVALID_TABLE_ADDR) {
> > > diff --git a/drivers/xen/efi.c b/drivers/xen/efi.c
> > > index d1ff2186ebb48a7c0981ecb6d4afcbbb25ffcea0..c2274ddfcc63304008ef0fd78fd9fa416f75d073 100644
> > > --- a/drivers/xen/efi.c
> > > +++ b/drivers/xen/efi.c
> > > @@ -28,6 +28,7 @@
> > >  #include <xen/interface/platform.h>
> > >  #include <xen/xen.h>
> > >  #include <xen/xen-ops.h>
> > > +#include <xen/page.h>
> > >
> > >  #include <asm/page.h>
> > >
> > > @@ -271,6 +272,40 @@ static void xen_efi_reset_system(int reset_type, efi_status_t status,
> > >         }
> > >  }
> > >
> > > +__init u64 xen_config_table_memory_region_max(u64 addr)
> >
> > It is more idiomatic for Linux to put __init after the return type.
> > And if we adopt my suggestion above, this becomes
> >
> > bool __init xen_efi_config_table_valid(const efi_guid_t *guid, u64 table)
> >
> > Alternatively, you could pass the string identifier of the table
> > instead of the guid (or both) to print in the diagnostic message.
>
> Will fix in v5.
>
> > > +{
> > > +       static_assert(XEN_PAGE_SHIFT == EFI_PAGE_SHIFT,
> > > +                     "Mismatch between EFI_PAGE_SHIFT and XEN_PAGE_SHIFT");
> >
> > Is this the only place where this matters? And this never happens on x86, right?
>
> My understanding is that it should never happen on any architecture.

EFI_PAGE_SHIFT is always 12, on any architecture and regardless of the
page size used by the OS itself. AFAICT, the same applies to
XEN_PAGE_SHIFT.

> That’s why I static_assert() it.  I have no idea if this is the only
> place it matters, though.
>

I don't mind adding this here, but it's kind of orthogonal to the rest
of the patch so please make a mention in the commit log why you are
adding it.

> > > +       struct xen_platform_op op = {
> > > +               .cmd = XENPF_firmware_info,
> > > +               .u.firmware_info = {
> > > +                       .type = XEN_FW_EFI_INFO,
> > > +                       .index = XEN_FW_EFI_MEM_INFO,
> > > +                       .u.efi_info.mem.addr = addr,
> > > +                       .u.efi_info.mem.size = U64_MAX - addr,
> > > +               }
> > > +       };
> > > +       union xenpf_efi_info *info = &op.u.firmware_info.u.efi_info;
> > > +       int rc = HYPERVISOR_platform_op(&op);
> > > +
> > > +       if (rc) {
> > > +               pr_warn("Failed to lookup header %llu in Xen memory map: error %d\n",
> > > +                       (unsigned long long)addr, rc);
> > > +               return 0;
> > > +       }
> > > +
> > > +       switch (info->mem.type) {
> > > +       case EFI_RUNTIME_SERVICES_CODE:
> > > +       case EFI_RUNTIME_SERVICES_DATA:
> > > +       case EFI_ACPI_RECLAIM_MEMORY:
> >
> > If we are listing all memory types that Xen preserves, you might add
> > EFI_RESERVED_MEMORY here. Otherwise, please only list the ones that
> > you need to permit explicitly.
>
> My understanding was that EFI_RESERVED_MEMORY should never be touched by
> the OS, so I left it out.  Which types would you permit?
>

Well, given the purpose of the function (i.e, to reject
EfiBootServicesData in spite of the spec), I'd only permit
EFI_ACPI_RECLAIM_MEMORY and EFI_RUNTIME_SERVICES_DATA. However, the
EFI spec does mention that prior versions permitted the reserved
memory type as well for ACPI and SMBIOS tables (although that may be a
very long time ago).

Bottom line is that you want to permit only regions that Xen is
guaranteed not to clobber, right? In any case, I'm not going to obsess
over this so if you prefer to keep it this way, that's also fine.


> > > +               return info->mem.addr + info->mem.size;
> > > +       default:
> > > +               pr_warn("Table %llu is in memory of type %d, ignoring it\n",
> > > +                       (unsigned long long)addr, info->mem.type);
> > > +               return 0;
> > > +       }
> > > +}
> > > +
> > >  /*
> > >   * Set XEN EFI runtime services function pointers. Other fields of struct efi,
> > >   * e.g. efi.systab, will be set like normal EFI.
> > > diff --git a/include/linux/efi.h b/include/linux/efi.h
> > > index d2b84c2fec39f0268324d1a38a73ed67786973c9..fc81e4b984398cdb399e7886b2cae7f33bf91613 100644
> > > --- a/include/linux/efi.h
> > > +++ b/include/linux/efi.h
> > > @@ -1324,4 +1324,13 @@ struct linux_efi_coco_secret_area {
> > >  /* Header of a populated EFI secret area */
> > >  #define EFI_SECRET_TABLE_HEADER_GUID   EFI_GUID(0x1e74f542, 0x71dd, 0x4d66,  0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b)
> > >
> > > +#ifdef CONFIG_XEN_EFI
> >
> > Please drop this #ifdef
>
> Will fix in v5.
>
> > > +/*
> > > + * Returns the end of the memory region containing the given config table,
> > > + * or 0 if the given address does not reside in memory that can validly
> > > + * contain EFI configuration tables.
> > > + */
> > > +__init u64 xen_config_table_memory_region_max(u64 addr);
> >
> > You can drop the __init here
>
> Will fix in v5.
>
> > > +#endif
> > > +
> > >  #endif /* _LINUX_EFI_H */
> > > --
> > > Sincerely,
> > > Demi Marie Obenour (she/her/hers)
> > > Invisible Things Lab
> > >
>
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
> Invisible Things Lab
Demi Marie Obenour Sept. 30, 2022, 6:50 p.m. UTC | #9
On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote:
> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 30 Sept 2022 at 08:44, Jan Beulich <jbeulich@suse.com> wrote:
> > > >
> > > > On 30.09.2022 01:02, Demi Marie Obenour wrote:
> > > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > > > > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > > > > must not use EFI tables from such memory.  Most of the remaining EFI
> > > > > memory types are not suitable for EFI tables, leaving only
> > > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > > > > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > > > > use tables that are located in one of these types of memory.
> > > > >
> > > > > This patch ensures this, and also adds a function
> > > > > (xen_config_table_memory_region_max()) that will be used later to
> > > > > replace the usage of the EFI memory map in esrt.c when running under
> > > > > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > > > > but I have not implemented this.
> > > > >
> > > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> > > >
> > > > In Xen we don't clobber EfiBootServices{Code,Data} when xen.efi was passed
> > > > "-mapbs". Should we perhaps extend the interface such that Dom0 can then
> > > > also use tables located in such regions, perhaps by faking
> > > > EFI_MEMORY_RUNTIME in the attributes returned by XEN_FW_EFI_MEM_INFO?
> > > >
> > >
> > > I know this ship has sailed for x86, but for the sake of other
> > > architectures, I'd strongly recommend leaving the EFI_MEMORY_RUNTIME
> > > bits alone, for the same reasons I gave earlier. (Runtime mappings for
> > > the firmware code itself, page table fragmentation etc etc)
> >
> > Why do you say that it has sailed for x86?
> >
> 
> The x86 EFI code in Linux makes changes to the EFI memory map in many
> different places in the code. On other architectures, we have managed
> to avoid this, so that the EFI memory map is always identical to the
> one provided by the firmware at boot.
> 
> > > I know very little about Xen, but based on the context you provided in
> > > this thread, I'd say that the best approach from the Xen side is to
> > > convert all EfiBootServicesData regions that have configuration tables
> > > pointing into them into EfiAcpiReclaimMemory.
> >
> > Should Xen convert the entire region, or should it try to reserve only
> > the memory it needs?  The latter would require it to parse the
> > configuration tables.  Is there a list of configuration tables that can
> > legitimately be in EfiBootServicesData regions?
> >
> 
> Not really, no.

Is there a list of tables that can be in EfiBootServicesData and which
Linux cares about?

> So you would have to convert the entire region
> /unless/ Xen knows the GUID, and therefore knows how to derive the
> size of the table, allowing it to reserve memory more conservatively.

My worry is that this will wind up being equivalent to mapping all (or
most) of EfiBootServicesData memory.

> However, I doubt whether this is worth it: splitting entries implies
> rewriting the memory map, which is a thing I'd rather avoid if I were
> in your shoes.

Xen actually uses a different approach: instead of rewriting the memory
map, it uses the EFI pool allocator to allocate memory of the desired
type, copies the table to the newly allocated memory, and installs the
new table in place of the old one.  That only works for tables Xen
understands, though.
Demi Marie Obenour Sept. 30, 2022, 7 p.m. UTC | #10
On Fri, Sep 30, 2022 at 08:42:41PM +0200, Ard Biesheuvel wrote:
> On Fri, 30 Sept 2022 at 20:17, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Fri, Sep 30, 2022 at 06:25:53PM +0200, Ard Biesheuvel wrote:
> > > On Fri, 30 Sept 2022 at 01:02, Demi Marie Obenour
> > > <demi@invisiblethingslab.com> wrote:
> > > >
> > > > Memory of type EFI_CONVENTIONAL_MEMORY, EFI_LOADER_CODE, EFI_LOADER_DATA,
> > > > EFI_BOOT_SERVICES_CODE, and EFI_BOOT_SERVICES_DATA may be clobbered by
> > > > Xen before Linux gets to start using it.  Therefore, Linux under Xen
> > > > must not use EFI tables from such memory.  Most of the remaining EFI
> > > > memory types are not suitable for EFI tables, leaving only
> > > > EFI_ACPI_RECLAIM_MEMORY, EFI_RUNTIME_SERVICES_DATA, and
> > > > EFI_RUNTIME_SERVICES_CODE.  When running under Xen, Linux should only
> > > > use tables that are located in one of these types of memory.
> > > >
> > > > This patch ensures this, and also adds a function
> > > > (xen_config_table_memory_region_max()) that will be used later to
> > > > replace the usage of the EFI memory map in esrt.c when running under
> > > > Xen.  This function can also be used in mokvar-table.c and efi-bgrt.c,
> > > > but I have not implemented this.
> > > >
> > > > Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
> > > > ---
> > > >  drivers/firmware/efi/efi.c |  8 +++++---
> > > >  drivers/xen/efi.c          | 35 +++++++++++++++++++++++++++++++++++
> > > >  include/linux/efi.h        |  9 +++++++++
> > > >  3 files changed, 49 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> > > > index e4080ad96089abd7f84745dd8461c548bcbb7685..d344f3ff73d1c5ed0c67e3251a9502e66719741d 100644
> > > > --- a/drivers/firmware/efi/efi.c
> > > > +++ b/drivers/firmware/efi/efi.c
> > > > @@ -574,7 +574,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > > >         unsigned long table;
> > > >         int i;
> > > >
> > > > -       pr_info("");
> > >
> > > Why are you removing these prints?
> >
> > If I left them, I would need to include a pr_cont("\n") later.
> 
> There should always be one at the end of the loop, no? Or is this
> related to the diagnostic that gets printed in your helper?

My helper emits a diagnostic (at severity KERN_WARNING) if the table is
in memory that Xen has not reserved.

> > Checkpatch recommends against that.  What is the purpose of this print?
> > I assumed that since it prints an empty string it is superfluous.
> >
> 
> It prints the leading [invisible] loglevel marker, and the 'efi: ' prefix.

Okay, that makes sense.

> > > >         for (i = 0; i < count; i++) {
> > > >                 if (!IS_ENABLED(CONFIG_X86)) {
> > > >                         guid = &config_tables[i].guid;
> > > > @@ -585,7 +584,6 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > > >
> > > >                         if (IS_ENABLED(CONFIG_X86_32) &&
> > > >                             tbl64[i].table > U32_MAX) {
> > > > -                               pr_cont("\n");
> > > >                                 pr_err("Table located above 4GB, disabling EFI.\n");
> > > >                                 return -EINVAL;
> > > >                         }
> > > > @@ -594,10 +592,14 @@ int __init efi_config_parse_tables(const efi_config_table_t *config_tables,
> > > >                         table = tbl32[i].table;
> > > >                 }
> > > >
> > > > +#ifdef CONFIG_XEN_EFI
> > >
> > > We tend to prefer IS_ENABLED() for cases such as this one. That way,
> > > the compiler always gets to see the code inside the conditional block,
> > > which gives better build test coverage (even if CONFIG_XEN_EFI is
> > > disabled).
> >
> > Can I count on the compiler eliminating the code as unreachable?  With
> > CONFIG_XEN_EFI disabled xen_config_table_memory_region_max() would be an
> > undefined symbol.
> >
> 
> If you drop the #ifdef in the .h file (as I suggested below) the code
> will compile fine, and the symbol reference will not be emitted into
> the object, so it will link fine even if the Xen objects are not being
> built.
> 
> We rely on this behavior all over the Linux kernel.

Okay, thanks!

> > > > +               if (efi_enabled(EFI_PARAVIRT) && !xen_config_table_memory_region_max(table))
> > >
> > > So the question here is whether Xen thinks the table should be
> > > disregarded or not. So let's define a prototype that reflects that
> > > purpose, and let the implementation reason about how this should be
> > > achieved.
> >
> > xen_config_table_memory_region_max() doesn’t just return whether the
> > table should be disregarded, but also (if the table should not be
> > ignored) the end of the memory region containing it.
> 
> But the calling code never uses that value, right?

The code in this patch does not use that value.  Patch 2 of 2 does use
it.
Demi Marie Obenour Oct. 1, 2022, 12:30 a.m. UTC | #11
On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote:
> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote:
> > On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
> > > I know very little about Xen, but based on the context you provided in
> > > this thread, I'd say that the best approach from the Xen side is to
> > > convert all EfiBootServicesData regions that have configuration tables
> > > pointing into them into EfiAcpiReclaimMemory.
> >
> > Should Xen convert the entire region, or should it try to reserve only
> > the memory it needs?  The latter would require it to parse the
> > configuration tables.  Is there a list of configuration tables that can
> > legitimately be in EfiBootServicesData regions?
> >
> 
> Not really, no. So you would have to convert the entire region
> /unless/ Xen knows the GUID, and therefore knows how to derive the
> size of the table, allowing it to reserve memory more conservatively.
> However, I doubt whether this is worth it: splitting entries implies
> rewriting the memory map, which is a thing I'd rather avoid if I were
> in your shoes.

I actually wonder if Xen needs to reserve *all* of EfiBootServicesData.
The reason is that some (probably buggy) firmware may store ACPI tables
there, and Xen does not have an ACPI implementation.  From my
perspective, a much safer approach would be to pass all of
EfiBootServicesData memory directly to dom0, and have dom0 give Xen back
what it doesn’t wind up using.  That allows dom0’s memory reservation
code to work properly, which it currently does not.
Jan Beulich Oct. 4, 2022, 8:22 a.m. UTC | #12
On 01.10.2022 02:30, Demi Marie Obenour wrote:
> On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote:
>> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote:
>>> On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
>>>> I know very little about Xen, but based on the context you provided in
>>>> this thread, I'd say that the best approach from the Xen side is to
>>>> convert all EfiBootServicesData regions that have configuration tables
>>>> pointing into them into EfiAcpiReclaimMemory.
>>>
>>> Should Xen convert the entire region, or should it try to reserve only
>>> the memory it needs?  The latter would require it to parse the
>>> configuration tables.  Is there a list of configuration tables that can
>>> legitimately be in EfiBootServicesData regions?
>>>
>>
>> Not really, no. So you would have to convert the entire region
>> /unless/ Xen knows the GUID, and therefore knows how to derive the
>> size of the table, allowing it to reserve memory more conservatively.
>> However, I doubt whether this is worth it: splitting entries implies
>> rewriting the memory map, which is a thing I'd rather avoid if I were
>> in your shoes.
> 
> I actually wonder if Xen needs to reserve *all* of EfiBootServicesData.
> The reason is that some (probably buggy) firmware may store ACPI tables
> there, and Xen does not have an ACPI implementation.

We already have the -mapbs option as a workaround in such situations.

>  From my
> perspective, a much safer approach would be to pass all of
> EfiBootServicesData memory directly to dom0, and have dom0 give Xen back
> what it doesn’t wind up using.  That allows dom0’s memory reservation
> code to work properly, which it currently does not.

As said already on a different thread: Giving memory to domains (incl
Dom0) isn't related to their original memory type (neither EFI's nor
E820's); the needed memory is taken from the general page allocator
(with one exception for initrd, to avoid unnecessary copying around of
data). Hence what you propose would end up as an (imo) awful hack in
Xen. I also don't see how this relates to "dom0’s memory reservation
code", but I'm sure you can clarify that for me.

Jan
Demi Marie Obenour Oct. 4, 2022, 3:46 p.m. UTC | #13
On Tue, Oct 04, 2022 at 10:22:13AM +0200, Jan Beulich wrote:
> On 01.10.2022 02:30, Demi Marie Obenour wrote:
> > On Fri, Sep 30, 2022 at 08:27:09PM +0200, Ard Biesheuvel wrote:
> >> On Fri, 30 Sept 2022 at 19:12, Demi Marie Obenour wrote:
> >>> On Fri, Sep 30, 2022 at 06:30:57PM +0200, Ard Biesheuvel wrote:
> >>>> I know very little about Xen, but based on the context you provided in
> >>>> this thread, I'd say that the best approach from the Xen side is to
> >>>> convert all EfiBootServicesData regions that have configuration tables
> >>>> pointing into them into EfiAcpiReclaimMemory.
> >>>
> >>> Should Xen convert the entire region, or should it try to reserve only
> >>> the memory it needs?  The latter would require it to parse the
> >>> configuration tables.  Is there a list of configuration tables that can
> >>> legitimately be in EfiBootServicesData regions?
> >>>
> >>
> >> Not really, no. So you would have to convert the entire region
> >> /unless/ Xen knows the GUID, and therefore knows how to derive the
> >> size of the table, allowing it to reserve memory more conservatively.
> >> However, I doubt whether this is worth it: splitting entries implies
> >> rewriting the memory map, which is a thing I'd rather avoid if I were
> >> in your shoes.
> > 
> > I actually wonder if Xen needs to reserve *all* of EfiBootServicesData.
> > The reason is that some (probably buggy) firmware may store ACPI tables
> > there, and Xen does not have an ACPI implementation.
> 
> We already have the -mapbs option as a workaround in such situations.

The problem is that there is no way for the user to know when it is
needed.  One option would be for dom0 to print a warning in the kernel
log if it needs to ignore a table due to it being in EfiBootServicesData.

> >  From my
> > perspective, a much safer approach would be to pass all of
> > EfiBootServicesData memory directly to dom0, and have dom0 give Xen back
> > what it doesn’t wind up using.  That allows dom0’s memory reservation
> > code to work properly, which it currently does not.
> 
> As said already on a different thread: Giving memory to domains (incl
> Dom0) isn't related to their original memory type (neither EFI's nor
> E820's); the needed memory is taken from the general page allocator
> (with one exception for initrd, to avoid unnecessary copying around of
> data). Hence what you propose would end up as an (imo) awful hack in
> Xen. I also don't see how this relates to "dom0’s memory reservation
> code", but I'm sure you can clarify that for me.

Linux has a function called efi_mem_reserve() that is used to reserve
EfiBootServicesData memory that contains e.g. EFI configuration tables.
This function does not work under Xen because Xen could have already
clobbered the memory.  efi_mem_reserve() not working is the whole reason
for this thread, as it prevents EFI tables that are in
EfiBootServicesData from being used under Xen.

A much nicer approach would be for Xen to reserve boot services memory
unconditionally, but provide a hypercall that dom0 could used to free
the parts of EfiBootServicesData memory that are no longer needed.  This
would allow efi_mem_reserve() to work normally.
Jan Beulich Oct. 5, 2022, 6:15 a.m. UTC | #14
On 04.10.2022 17:46, Demi Marie Obenour wrote:
> Linux has a function called efi_mem_reserve() that is used to reserve
> EfiBootServicesData memory that contains e.g. EFI configuration tables.
> This function does not work under Xen because Xen could have already
> clobbered the memory.  efi_mem_reserve() not working is the whole reason
> for this thread, as it prevents EFI tables that are in
> EfiBootServicesData from being used under Xen.
> 
> A much nicer approach would be for Xen to reserve boot services memory
> unconditionally, but provide a hypercall that dom0 could used to free
> the parts of EfiBootServicesData memory that are no longer needed.  This
> would allow efi_mem_reserve() to work normally.

efi_mem_reserve() actually working would be a layering violation;
controlling the EFI memory map is entirely Xen's job.

As to the hypercall you suggest - I wouldn't mind its addition, but only
for the case when -mapbs is used. As I've indicated before, I'm of the
opinion that default behavior should be matching the intentions of the
spec, and the intention of EfiBootServices* is for the space to be
reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
that hypercall: It might use it for regions where data lives which it
wouldn't care about itself, but which an eventual kexec-ed (or alike)
entity would later want to consume. Code/data potentially usable by
_anyone_ between two resets of the system cannot legitimately be freed
(and hence imo is wrong to live in EfiBootServices* regions). In a way
one could view the Dom0 kernel as an "or alike" entity ...

Jan
Demi Marie Obenour Oct. 5, 2022, 6:11 p.m. UTC | #15
On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > Linux has a function called efi_mem_reserve() that is used to reserve
> > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > This function does not work under Xen because Xen could have already
> > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > for this thread, as it prevents EFI tables that are in
> > EfiBootServicesData from being used under Xen.
> > 
> > A much nicer approach would be for Xen to reserve boot services memory
> > unconditionally, but provide a hypercall that dom0 could used to free
> > the parts of EfiBootServicesData memory that are no longer needed.  This
> > would allow efi_mem_reserve() to work normally.
> 
> efi_mem_reserve() actually working would be a layering violation;
> controlling the EFI memory map is entirely Xen's job.

Doing this properly would require Xen to understand all of the EFI
tables that could validly be in EfiBootServices* and which could be of
interest to dom0.  It might (at least on some very buggy firmware)
require a partial ACPI and/or SMBIOS implementation too, if the firmware
decided to put an ACPI or SMBIOS table in EfiBootServices*.

> As to the hypercall you suggest - I wouldn't mind its addition, but only
> for the case when -mapbs is used. As I've indicated before, I'm of the
> opinion that default behavior should be matching the intentions of the
> spec, and the intention of EfiBootServices* is for the space to be
> reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> that hypercall: It might use it for regions where data lives which it
> wouldn't care about itself, but which an eventual kexec-ed (or alike)
> entity would later want to consume. Code/data potentially usable by
> _anyone_ between two resets of the system cannot legitimately be freed
> (and hence imo is wrong to live in EfiBootServices* regions).

I agree, but currently some such data *is* in EfiBootServices* regions,
sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
configuration tables that point to EfiBootServicesData memory before
freeing that memory.

> In a way one could view the Dom0 kernel as an "or alike" entity ...

It is indeed such an entity.
Ard Biesheuvel Oct. 5, 2022, 9:28 p.m. UTC | #16
On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > This function does not work under Xen because Xen could have already
> > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > for this thread, as it prevents EFI tables that are in
> > > EfiBootServicesData from being used under Xen.
> > >
> > > A much nicer approach would be for Xen to reserve boot services memory
> > > unconditionally, but provide a hypercall that dom0 could used to free
> > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > would allow efi_mem_reserve() to work normally.
> >
> > efi_mem_reserve() actually working would be a layering violation;
> > controlling the EFI memory map is entirely Xen's job.
>
> Doing this properly would require Xen to understand all of the EFI
> tables that could validly be in EfiBootServices* and which could be of
> interest to dom0.  It might (at least on some very buggy firmware)
> require a partial ACPI and/or SMBIOS implementation too, if the firmware
> decided to put an ACPI or SMBIOS table in EfiBootServices*.
>
> > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > for the case when -mapbs is used. As I've indicated before, I'm of the
> > opinion that default behavior should be matching the intentions of the
> > spec, and the intention of EfiBootServices* is for the space to be
> > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > that hypercall: It might use it for regions where data lives which it
> > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > entity would later want to consume. Code/data potentially usable by
> > _anyone_ between two resets of the system cannot legitimately be freed
> > (and hence imo is wrong to live in EfiBootServices* regions).
>
> I agree, but currently some such data *is* in EfiBootServices* regions,
> sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> configuration tables that point to EfiBootServicesData memory before
> freeing that memory.
>

That seems like a reasonable approach to me. Tables like MEMATTR or
RT_PROP are mostly relevant for bare metal where the host kernel maps
the runtime services, and in general, passing on these tables without
knowing what they do is kind of fishy anyway. You might even argue
that only known table types should be forwarded in the first place,
regardless of the memory type.
Demi Marie Obenour Oct. 6, 2022, 1:40 a.m. UTC | #17
On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > This function does not work under Xen because Xen could have already
> > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > for this thread, as it prevents EFI tables that are in
> > > > EfiBootServicesData from being used under Xen.
> > > >
> > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > would allow efi_mem_reserve() to work normally.
> > >
> > > efi_mem_reserve() actually working would be a layering violation;
> > > controlling the EFI memory map is entirely Xen's job.
> >
> > Doing this properly would require Xen to understand all of the EFI
> > tables that could validly be in EfiBootServices* and which could be of
> > interest to dom0.  It might (at least on some very buggy firmware)
> > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> >
> > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > opinion that default behavior should be matching the intentions of the
> > > spec, and the intention of EfiBootServices* is for the space to be
> > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > that hypercall: It might use it for regions where data lives which it
> > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > entity would later want to consume. Code/data potentially usable by
> > > _anyone_ between two resets of the system cannot legitimately be freed
> > > (and hence imo is wrong to live in EfiBootServices* regions).
> >
> > I agree, but currently some such data *is* in EfiBootServices* regions,
> > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > configuration tables that point to EfiBootServicesData memory before
> > freeing that memory.
> >
> 
> That seems like a reasonable approach to me. Tables like MEMATTR or
> RT_PROP are mostly relevant for bare metal where the host kernel maps
> the runtime services, and in general, passing on these tables without
> knowing what they do is kind of fishy anyway. You might even argue
> that only known table types should be forwarded in the first place,
> regardless of the memory type.

Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
ESRT, but I am curious which others Xen should preserve.  Currently, Xen
does not know about RT_PROP or MEMATTR; could this be a cause of
problems?
Ard Biesheuvel Oct. 6, 2022, 7:31 a.m. UTC | #18
On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> > <demi@invisiblethingslab.com> wrote:
> > >
> > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > > This function does not work under Xen because Xen could have already
> > > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > > for this thread, as it prevents EFI tables that are in
> > > > > EfiBootServicesData from being used under Xen.
> > > > >
> > > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > > would allow efi_mem_reserve() to work normally.
> > > >
> > > > efi_mem_reserve() actually working would be a layering violation;
> > > > controlling the EFI memory map is entirely Xen's job.
> > >
> > > Doing this properly would require Xen to understand all of the EFI
> > > tables that could validly be in EfiBootServices* and which could be of
> > > interest to dom0.  It might (at least on some very buggy firmware)
> > > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> > >
> > > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > > opinion that default behavior should be matching the intentions of the
> > > > spec, and the intention of EfiBootServices* is for the space to be
> > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > > that hypercall: It might use it for regions where data lives which it
> > > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > > entity would later want to consume. Code/data potentially usable by
> > > > _anyone_ between two resets of the system cannot legitimately be freed
> > > > (and hence imo is wrong to live in EfiBootServices* regions).
> > >
> > > I agree, but currently some such data *is* in EfiBootServices* regions,
> > > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > > configuration tables that point to EfiBootServicesData memory before
> > > freeing that memory.
> > >
> >
> > That seems like a reasonable approach to me. Tables like MEMATTR or
> > RT_PROP are mostly relevant for bare metal where the host kernel maps
> > the runtime services, and in general, passing on these tables without
> > knowing what they do is kind of fishy anyway. You might even argue
> > that only known table types should be forwarded in the first place,
> > regardless of the memory type.
>
> Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
> ESRT, but I am curious which others Xen should preserve.  Currently, Xen
> does not know about RT_PROP or MEMATTR; could this be a cause of
> problems?

dom0 only has access to paravirtualized EFI runtime services, so
consuming RT_PROP or MEMATTR should be up to Xen (they describe which
runtime services remain available at runtime, and which permission
attributes to use for the runtime services memory regions,
respectively)

Looking through the kernel code, I don't think there are any that dom0
should care about beyond ACPI, SMBIOS and ESRT. But as you suggest,
that means Xen should just mask them in the view of the EFI system
table it exposes so dom0. Otherwise, the kernel may still try to map
and parse them.
Jan Beulich Oct. 6, 2022, 9:22 a.m. UTC | #19
On 05.10.2022 20:11, Demi Marie Obenour wrote:
> On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
>> On 04.10.2022 17:46, Demi Marie Obenour wrote:
>>> Linux has a function called efi_mem_reserve() that is used to reserve
>>> EfiBootServicesData memory that contains e.g. EFI configuration tables.
>>> This function does not work under Xen because Xen could have already
>>> clobbered the memory.  efi_mem_reserve() not working is the whole reason
>>> for this thread, as it prevents EFI tables that are in
>>> EfiBootServicesData from being used under Xen.
>>>
>>> A much nicer approach would be for Xen to reserve boot services memory
>>> unconditionally, but provide a hypercall that dom0 could used to free
>>> the parts of EfiBootServicesData memory that are no longer needed.  This
>>> would allow efi_mem_reserve() to work normally.
>>
>> efi_mem_reserve() actually working would be a layering violation;
>> controlling the EFI memory map is entirely Xen's job.
> 
> Doing this properly would require Xen to understand all of the EFI
> tables that could validly be in EfiBootServices* and which could be of
> interest to dom0.

We don't need to understand the tables as long as none crosses memory
map descriptor boundaries, and as long as they don't contain further
pointers.

>  It might (at least on some very buggy firmware)
> require a partial ACPI and/or SMBIOS implementation too, if the firmware
> decided to put an ACPI or SMBIOS table in EfiBootServices*.

I hope we won't need to go that far; on such systems -mapbs will continue
to be needed.

>> As to the hypercall you suggest - I wouldn't mind its addition, but only
>> for the case when -mapbs is used. As I've indicated before, I'm of the
>> opinion that default behavior should be matching the intentions of the
>> spec, and the intention of EfiBootServices* is for the space to be
>> reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
>> that hypercall: It might use it for regions where data lives which it
>> wouldn't care about itself, but which an eventual kexec-ed (or alike)
>> entity would later want to consume. Code/data potentially usable by
>> _anyone_ between two resets of the system cannot legitimately be freed
>> (and hence imo is wrong to live in EfiBootServices* regions).
> 
> I agree, but currently some such data *is* in EfiBootServices* regions,
> sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> configuration tables that point to EfiBootServicesData memory before
> freeing that memory.

Hmm, uninstalling isn't nice, as it may limit functionality. Instead we
might go through all tables and fiddle with memap descriptors in case
a pointer references an EfiBootServices* region (regardless of size, as
per the first restriction mentioned above). (A more brute force approach
might be to simply behave as if -mapbs was specified in such a case,
provided we can reliably determine this early enough, i.e. before first
checking the "map_bs" variable.) Tables actually known to us could also
be relocated (like you've done for ESRT).

Such checking could be extended to the runtime services function
pointers. While that wouldn't cover cases where a function entry point
is in runtime services space but the function then wrongly calls into
or references boot services space, it would cover a few more (broken)
systems.

This, unlike behaving by default as if -mapbs was given, would be a
workaround I'd accept to be enabled unconditionally, as it wouldn't
affect well behaved systems (beyond the time it takes to carry out the
checks, and provided the checking logic isn't buggy).

There's one further caveat towards uninstalling (in a way also for your
ESRT relocation code): The final memory map is known to us only when we
can't call boot services functions anymore (i.e. in particular
InstallConfigurationTable()).

Jan
Demi Marie Obenour Oct. 6, 2022, 2:43 p.m. UTC | #20
On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote:
> On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> > > <demi@invisiblethingslab.com> wrote:
> > > >
> > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > > > This function does not work under Xen because Xen could have already
> > > > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > > > for this thread, as it prevents EFI tables that are in
> > > > > > EfiBootServicesData from being used under Xen.
> > > > > >
> > > > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > > > would allow efi_mem_reserve() to work normally.
> > > > >
> > > > > efi_mem_reserve() actually working would be a layering violation;
> > > > > controlling the EFI memory map is entirely Xen's job.
> > > >
> > > > Doing this properly would require Xen to understand all of the EFI
> > > > tables that could validly be in EfiBootServices* and which could be of
> > > > interest to dom0.  It might (at least on some very buggy firmware)
> > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > > > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> > > >
> > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > > > opinion that default behavior should be matching the intentions of the
> > > > > spec, and the intention of EfiBootServices* is for the space to be
> > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > > > that hypercall: It might use it for regions where data lives which it
> > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > > > entity would later want to consume. Code/data potentially usable by
> > > > > _anyone_ between two resets of the system cannot legitimately be freed
> > > > > (and hence imo is wrong to live in EfiBootServices* regions).
> > > >
> > > > I agree, but currently some such data *is* in EfiBootServices* regions,
> > > > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > > > configuration tables that point to EfiBootServicesData memory before
> > > > freeing that memory.
> > > >
> > >
> > > That seems like a reasonable approach to me. Tables like MEMATTR or
> > > RT_PROP are mostly relevant for bare metal where the host kernel maps
> > > the runtime services, and in general, passing on these tables without
> > > knowing what they do is kind of fishy anyway. You might even argue
> > > that only known table types should be forwarded in the first place,
> > > regardless of the memory type.
> >
> > Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
> > ESRT, but I am curious which others Xen should preserve.  Currently, Xen
> > does not know about RT_PROP or MEMATTR; could this be a cause of
> > problems?
> 
> dom0 only has access to paravirtualized EFI runtime services, so
> consuming RT_PROP or MEMATTR should be up to Xen (they describe which
> runtime services remain available at runtime, and which permission
> attributes to use for the runtime services memory regions,
> respectively)

Xen does not do this right now.  I wonder if this could be the cause of
compatibility issues with various firmware implementations.

> Looking through the kernel code, I don't think there are any that dom0
> should care about beyond ACPI, SMBIOS and ESRT. But as you suggest,
> that means Xen should just mask them in the view of the EFI system
> table it exposes so dom0. Otherwise, the kernel may still try to map
> and parse them.

What about the BGRT and MOKvar?  I agree that Xen should not expose the
others.  Should it just hide the tables, or should it actually uninstall
them?  My intuition is that the second would be technically more
correct, but also more likely to trigger bugs in various firmware
implementations.
Ard Biesheuvel Oct. 6, 2022, 4:19 p.m. UTC | #21
On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote:
> > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour
> > <demi@invisiblethingslab.com> wrote:
> > >
> > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> > > > <demi@invisiblethingslab.com> wrote:
> > > > >
> > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > > > > This function does not work under Xen because Xen could have already
> > > > > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > > > > for this thread, as it prevents EFI tables that are in
> > > > > > > EfiBootServicesData from being used under Xen.
> > > > > > >
> > > > > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > > > > would allow efi_mem_reserve() to work normally.
> > > > > >
> > > > > > efi_mem_reserve() actually working would be a layering violation;
> > > > > > controlling the EFI memory map is entirely Xen's job.
> > > > >
> > > > > Doing this properly would require Xen to understand all of the EFI
> > > > > tables that could validly be in EfiBootServices* and which could be of
> > > > > interest to dom0.  It might (at least on some very buggy firmware)
> > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> > > > >
> > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > > > > opinion that default behavior should be matching the intentions of the
> > > > > > spec, and the intention of EfiBootServices* is for the space to be
> > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > > > > that hypercall: It might use it for regions where data lives which it
> > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > > > > entity would later want to consume. Code/data potentially usable by
> > > > > > _anyone_ between two resets of the system cannot legitimately be freed
> > > > > > (and hence imo is wrong to live in EfiBootServices* regions).
> > > > >
> > > > > I agree, but currently some such data *is* in EfiBootServices* regions,
> > > > > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > > > > configuration tables that point to EfiBootServicesData memory before
> > > > > freeing that memory.
> > > > >
> > > >
> > > > That seems like a reasonable approach to me. Tables like MEMATTR or
> > > > RT_PROP are mostly relevant for bare metal where the host kernel maps
> > > > the runtime services, and in general, passing on these tables without
> > > > knowing what they do is kind of fishy anyway. You might even argue
> > > > that only known table types should be forwarded in the first place,
> > > > regardless of the memory type.
> > >
> > > Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
> > > ESRT, but I am curious which others Xen should preserve.  Currently, Xen
> > > does not know about RT_PROP or MEMATTR; could this be a cause of
> > > problems?
> >
> > dom0 only has access to paravirtualized EFI runtime services, so
> > consuming RT_PROP or MEMATTR should be up to Xen (they describe which
> > runtime services remain available at runtime, and which permission
> > attributes to use for the runtime services memory regions,
> > respectively)
>
> Xen does not do this right now.  I wonder if this could be the cause of
> compatibility issues with various firmware implementations.
>
> > Looking through the kernel code, I don't think there are any that dom0
> > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest,
> > that means Xen should just mask them in the view of the EFI system
> > table it exposes so dom0. Otherwise, the kernel may still try to map
> > and parse them.
>
> What about the BGRT and MOKvar?  I agree that Xen should not expose the
> others.  Should it just hide the tables, or should it actually uninstall
> them?  My intuition is that the second would be technically more
> correct, but also more likely to trigger bugs in various firmware
> implementations.

BGRT is a ACPI table not a EFI configuration table, so I'd assume it
is treated the same way as other ACPI tables

MOKvar is a fallback for systems where it doesn't fit into a volatile
variable IIRC but I am not sure if it is actually implemented
anywhere, and what type of memory region it is expected to use.

However, I'm not sure if it even matters, given that Xen loads the
dom0 kernel not the firmware, right?
Demi Marie Obenour Oct. 6, 2022, 5:22 p.m. UTC | #22
On Thu, Oct 06, 2022 at 06:19:35PM +0200, Ard Biesheuvel wrote:
> On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour
> <demi@invisiblethingslab.com> wrote:
> >
> > On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote:
> > > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour
> > > <demi@invisiblethingslab.com> wrote:
> > > >
> > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> > > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> > > > > <demi@invisiblethingslab.com> wrote:
> > > > > >
> > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > > > > > This function does not work under Xen because Xen could have already
> > > > > > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > > > > > for this thread, as it prevents EFI tables that are in
> > > > > > > > EfiBootServicesData from being used under Xen.
> > > > > > > >
> > > > > > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > > > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > > > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > > > > > would allow efi_mem_reserve() to work normally.
> > > > > > >
> > > > > > > efi_mem_reserve() actually working would be a layering violation;
> > > > > > > controlling the EFI memory map is entirely Xen's job.
> > > > > >
> > > > > > Doing this properly would require Xen to understand all of the EFI
> > > > > > tables that could validly be in EfiBootServices* and which could be of
> > > > > > interest to dom0.  It might (at least on some very buggy firmware)
> > > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> > > > > >
> > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > > > > > opinion that default behavior should be matching the intentions of the
> > > > > > > spec, and the intention of EfiBootServices* is for the space to be
> > > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > > > > > that hypercall: It might use it for regions where data lives which it
> > > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > > > > > entity would later want to consume. Code/data potentially usable by
> > > > > > > _anyone_ between two resets of the system cannot legitimately be freed
> > > > > > > (and hence imo is wrong to live in EfiBootServices* regions).
> > > > > >
> > > > > > I agree, but currently some such data *is* in EfiBootServices* regions,
> > > > > > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > > > > > configuration tables that point to EfiBootServicesData memory before
> > > > > > freeing that memory.
> > > > > >
> > > > >
> > > > > That seems like a reasonable approach to me. Tables like MEMATTR or
> > > > > RT_PROP are mostly relevant for bare metal where the host kernel maps
> > > > > the runtime services, and in general, passing on these tables without
> > > > > knowing what they do is kind of fishy anyway. You might even argue
> > > > > that only known table types should be forwarded in the first place,
> > > > > regardless of the memory type.
> > > >
> > > > Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
> > > > ESRT, but I am curious which others Xen should preserve.  Currently, Xen
> > > > does not know about RT_PROP or MEMATTR; could this be a cause of
> > > > problems?
> > >
> > > dom0 only has access to paravirtualized EFI runtime services, so
> > > consuming RT_PROP or MEMATTR should be up to Xen (they describe which
> > > runtime services remain available at runtime, and which permission
> > > attributes to use for the runtime services memory regions,
> > > respectively)
> >
> > Xen does not do this right now.  I wonder if this could be the cause of
> > compatibility issues with various firmware implementations.
> >
> > > Looking through the kernel code, I don't think there are any that dom0
> > > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest,
> > > that means Xen should just mask them in the view of the EFI system
> > > table it exposes so dom0. Otherwise, the kernel may still try to map
> > > and parse them.
> >
> > What about the BGRT and MOKvar?  I agree that Xen should not expose the
> > others.  Should it just hide the tables, or should it actually uninstall
> > them?  My intuition is that the second would be technically more
> > correct, but also more likely to trigger bugs in various firmware
> > implementations.
> 
> BGRT is a ACPI table not a EFI configuration table, so I'd assume it
> is treated the same way as other ACPI tables

That actually surprises me.  Xen definitely needs to know about its
GUID, then.  Are there any other EFI configuration tables that Xen needs
to know about, or ACPI tables that might be outside of
EfiACPIReclaimMemory.

> MOKvar is a fallback for systems where it doesn't fit into a volatile
> variable IIRC but I am not sure if it is actually implemented
> anywhere, and what type of memory region it is expected to use.
> 
> However, I'm not sure if it even matters, given that Xen loads the
> dom0 kernel not the firmware, right?

No idea.
Ard Biesheuvel Oct. 6, 2022, 5:56 p.m. UTC | #23
On Thu, 6 Oct 2022 at 19:22, Demi Marie Obenour
<demi@invisiblethingslab.com> wrote:
>
> On Thu, Oct 06, 2022 at 06:19:35PM +0200, Ard Biesheuvel wrote:
> > On Thu, 6 Oct 2022 at 16:43, Demi Marie Obenour
> > <demi@invisiblethingslab.com> wrote:
> > >
> > > On Thu, Oct 06, 2022 at 09:31:47AM +0200, Ard Biesheuvel wrote:
> > > > On Thu, 6 Oct 2022 at 03:41, Demi Marie Obenour
> > > > <demi@invisiblethingslab.com> wrote:
> > > > >
> > > > > On Wed, Oct 05, 2022 at 11:28:29PM +0200, Ard Biesheuvel wrote:
> > > > > > On Wed, 5 Oct 2022 at 20:11, Demi Marie Obenour
> > > > > > <demi@invisiblethingslab.com> wrote:
> > > > > > >
> > > > > > > On Wed, Oct 05, 2022 at 08:15:07AM +0200, Jan Beulich wrote:
> > > > > > > > On 04.10.2022 17:46, Demi Marie Obenour wrote:
> > > > > > > > > Linux has a function called efi_mem_reserve() that is used to reserve
> > > > > > > > > EfiBootServicesData memory that contains e.g. EFI configuration tables.
> > > > > > > > > This function does not work under Xen because Xen could have already
> > > > > > > > > clobbered the memory.  efi_mem_reserve() not working is the whole reason
> > > > > > > > > for this thread, as it prevents EFI tables that are in
> > > > > > > > > EfiBootServicesData from being used under Xen.
> > > > > > > > >
> > > > > > > > > A much nicer approach would be for Xen to reserve boot services memory
> > > > > > > > > unconditionally, but provide a hypercall that dom0 could used to free
> > > > > > > > > the parts of EfiBootServicesData memory that are no longer needed.  This
> > > > > > > > > would allow efi_mem_reserve() to work normally.
> > > > > > > >
> > > > > > > > efi_mem_reserve() actually working would be a layering violation;
> > > > > > > > controlling the EFI memory map is entirely Xen's job.
> > > > > > >
> > > > > > > Doing this properly would require Xen to understand all of the EFI
> > > > > > > tables that could validly be in EfiBootServices* and which could be of
> > > > > > > interest to dom0.  It might (at least on some very buggy firmware)
> > > > > > > require a partial ACPI and/or SMBIOS implementation too, if the firmware
> > > > > > > decided to put an ACPI or SMBIOS table in EfiBootServices*.
> > > > > > >
> > > > > > > > As to the hypercall you suggest - I wouldn't mind its addition, but only
> > > > > > > > for the case when -mapbs is used. As I've indicated before, I'm of the
> > > > > > > > opinion that default behavior should be matching the intentions of the
> > > > > > > > spec, and the intention of EfiBootServices* is for the space to be
> > > > > > > > reclaimed. Plus I'm sure you realize there's a caveat with Dom0 using
> > > > > > > > that hypercall: It might use it for regions where data lives which it
> > > > > > > > wouldn't care about itself, but which an eventual kexec-ed (or alike)
> > > > > > > > entity would later want to consume. Code/data potentially usable by
> > > > > > > > _anyone_ between two resets of the system cannot legitimately be freed
> > > > > > > > (and hence imo is wrong to live in EfiBootServices* regions).
> > > > > > >
> > > > > > > I agree, but currently some such data *is* in EfiBootServices* regions,
> > > > > > > sadly.  When -mapbs is *not* used, I recommend uninstalling all of the
> > > > > > > configuration tables that point to EfiBootServicesData memory before
> > > > > > > freeing that memory.
> > > > > > >
> > > > > >
> > > > > > That seems like a reasonable approach to me. Tables like MEMATTR or
> > > > > > RT_PROP are mostly relevant for bare metal where the host kernel maps
> > > > > > the runtime services, and in general, passing on these tables without
> > > > > > knowing what they do is kind of fishy anyway. You might even argue
> > > > > > that only known table types should be forwarded in the first place,
> > > > > > regardless of the memory type.
> > > > >
> > > > > Which tables are worth handling in Xen?  I know about ACPI, SMBIOS, and
> > > > > ESRT, but I am curious which others Xen should preserve.  Currently, Xen
> > > > > does not know about RT_PROP or MEMATTR; could this be a cause of
> > > > > problems?
> > > >
> > > > dom0 only has access to paravirtualized EFI runtime services, so
> > > > consuming RT_PROP or MEMATTR should be up to Xen (they describe which
> > > > runtime services remain available at runtime, and which permission
> > > > attributes to use for the runtime services memory regions,
> > > > respectively)
> > >
> > > Xen does not do this right now.  I wonder if this could be the cause of
> > > compatibility issues with various firmware implementations.
> > >
> > > > Looking through the kernel code, I don't think there are any that dom0
> > > > should care about beyond ACPI, SMBIOS and ESRT. But as you suggest,
> > > > that means Xen should just mask them in the view of the EFI system
> > > > table it exposes so dom0. Otherwise, the kernel may still try to map
> > > > and parse them.
> > >
> > > What about the BGRT and MOKvar?  I agree that Xen should not expose the
> > > others.  Should it just hide the tables, or should it actually uninstall
> > > them?  My intuition is that the second would be technically more
> > > correct, but also more likely to trigger bugs in various firmware
> > > implementations.
> >
> > BGRT is a ACPI table not a EFI configuration table, so I'd assume it
> > is treated the same way as other ACPI tables
>
> That actually surprises me.  Xen definitely needs to know about its
> GUID, then.

It doesn't have a GUID. The table itself is a ACPI table that is
enumerated in the usual ACPI way. The problem is that the BGRT
describes a region in memory (where the OEM boot logo is), and this is
usually in EfiBootServicesData memory.

This data is typically used during early boot, to keep the logo intact
while the OS loader draws a nice animation below it. Not the end of
the world if it gets corrupted or removed.