diff mbox series

[3/3] efi/memattr: Include EFI memmap size in corruption warnings

Message ID 20250106-efi_fw_bug-v1-3-01a8eb40bfeb@debian.org
State New
Headers show
Series efi/memattr: Improve the efi_memattr_init function. | expand

Commit Message

Breno Leitao Jan. 6, 2025, 7:02 p.m. UTC
Add EFI memory map size to warning messages when a corrupted Memory
Attributes Table is detected, making it easier to diagnose firmware issues.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/firmware/efi/memattr.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

Ard Biesheuvel Jan. 7, 2025, 11:24 a.m. UTC | #1
On Mon, 6 Jan 2025 at 20:03, Breno Leitao <leitao@debian.org> wrote:
>
> Add EFI memory map size to warning messages when a corrupted Memory
> Attributes Table is detected, making it easier to diagnose firmware issues.
>
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  drivers/firmware/efi/memattr.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
> index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644
> --- a/drivers/firmware/efi/memattr.c
> +++ b/drivers/firmware/efi/memattr.c
> @@ -22,7 +22,7 @@ unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR;
>  void __init efi_memattr_init(void)
>  {
>         efi_memory_attributes_table_t *tbl;
> -       unsigned long size;
> +       unsigned long size, efi_map_size;
>
>         if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR)
>                 return;
> @@ -49,9 +49,10 @@ void __init efi_memattr_init(void)
>          * just be ignored altogether.
>          */
>         size = tbl->num_entries * tbl->desc_size;
> -       if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
> -               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
> -                       tbl->version, tbl->desc_size, tbl->num_entries);
> +       efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size;
> +       if (size > 3 * efi_map_size) {
> +               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n",
> +                       tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size);
>                 goto unmap;
>         }
>
>

Hello Breno,

I don't mind the patch per se, but I don't think it is terribly useful either.

Could you explain how this helps?
Breno Leitao Jan. 7, 2025, 12:05 p.m. UTC | #2
Hello Ard,

On Tue, Jan 07, 2025 at 12:24:03PM +0100, Ard Biesheuvel wrote:
> On Mon, 6 Jan 2025 at 20:03, Breno Leitao <leitao@debian.org> wrote:
> >
> > Add EFI memory map size to warning messages when a corrupted Memory
> > Attributes Table is detected, making it easier to diagnose firmware issues.
> >
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  drivers/firmware/efi/memattr.c | 9 +++++----
> >  1 file changed, 5 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
> > index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644
> > --- a/drivers/firmware/efi/memattr.c
> > +++ b/drivers/firmware/efi/memattr.c
> > @@ -22,7 +22,7 @@ unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR;
> >  void __init efi_memattr_init(void)
> >  {
> >         efi_memory_attributes_table_t *tbl;
> > -       unsigned long size;
> > +       unsigned long size, efi_map_size;
> >
> >         if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR)
> >                 return;
> > @@ -49,9 +49,10 @@ void __init efi_memattr_init(void)
> >          * just be ignored altogether.
> >          */
> >         size = tbl->num_entries * tbl->desc_size;
> > -       if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
> > -               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
> > -                       tbl->version, tbl->desc_size, tbl->num_entries);
> > +       efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size;
> > +       if (size > 3 * efi_map_size) {
> > +               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n",
> > +                       tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size);
> >                 goto unmap;
> >         }
> >
> >
> 
> Hello Breno,
> 
> I don't mind the patch per se, but I don't think it is terribly useful either.
> 
> Could you explain how this helps?

We are seeing a bunch of `Corrupted EFI Memory Attributes Table
detected!` in the Meta fleet, and this is something we are
investigating.

We highly think this is related to some kexec overwrites, and when we
get here, the EFI table is completely garbage. I haven't seen this
problem on cold boot.

Here are sof the instances I see:

	efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 18058, num_entries == 33554432)
	efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968)
	efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 83886080, num_entries == 304)
	efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 40)	

Anyway, back to you question, this patch helped us to narrow down and
find where the problem was, by printing all variables taken in
consideration to get the conclusion that the firmware is buggy.

Regarding the problem, Usama and I are suspecting that it might be
related to some 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for
event log to avoid corruption"), but at this time with memattr table, where it
might not preserved during kexec(?).

Thanks,
--breno
Ard Biesheuvel Jan. 9, 2025, 2:48 p.m. UTC | #3
On Tue, 7 Jan 2025 at 13:05, Breno Leitao <leitao@debian.org> wrote:
>
> Hello Ard,
>
> On Tue, Jan 07, 2025 at 12:24:03PM +0100, Ard Biesheuvel wrote:
> > On Mon, 6 Jan 2025 at 20:03, Breno Leitao <leitao@debian.org> wrote:
> > >
> > > Add EFI memory map size to warning messages when a corrupted Memory
> > > Attributes Table is detected, making it easier to diagnose firmware issues.
> > >
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > ---
> > >  drivers/firmware/efi/memattr.c | 9 +++++----
> > >  1 file changed, 5 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
> > > index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644
> > > --- a/drivers/firmware/efi/memattr.c
> > > +++ b/drivers/firmware/efi/memattr.c
> > > @@ -22,7 +22,7 @@ unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR;
> > >  void __init efi_memattr_init(void)
> > >  {
> > >         efi_memory_attributes_table_t *tbl;
> > > -       unsigned long size;
> > > +       unsigned long size, efi_map_size;
> > >
> > >         if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR)
> > >                 return;
> > > @@ -49,9 +49,10 @@ void __init efi_memattr_init(void)
> > >          * just be ignored altogether.
> > >          */
> > >         size = tbl->num_entries * tbl->desc_size;
> > > -       if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
> > > -               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
> > > -                       tbl->version, tbl->desc_size, tbl->num_entries);
> > > +       efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size;
> > > +       if (size > 3 * efi_map_size) {
> > > +               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n",
> > > +                       tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size);
> > >                 goto unmap;
> > >         }
> > >
> > >
> >
> > Hello Breno,
> >
> > I don't mind the patch per se, but I don't think it is terribly useful either.
> >
> > Could you explain how this helps?
>
> We are seeing a bunch of `Corrupted EFI Memory Attributes Table
> detected!` in the Meta fleet, and this is something we are
> investigating.
>
> We highly think this is related to some kexec overwrites, and when we
> get here, the EFI table is completely garbage. I haven't seen this
> problem on cold boot.
>

It likely means the memory is not reserved correctly.

Could you check whether this

--- a/drivers/firmware/efi/memattr.c
+++ b/drivers/firmware/efi/memattr.c
@@ -56,7 +56,7 @@ int __init efi_memattr_init(void)
        }

        tbl_size = sizeof(*tbl) + size;
-       memblock_reserve(efi_mem_attr_table, tbl_size);
+       efi_mem_reserve(efi_mem_attr_table, tbl_size);
        set_bit(EFI_MEM_ATTR, &efi.flags);

 unmap:


makes any difference?


> Here are sof the instances I see:
>
>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 18058, num_entries == 33554432)
>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968)
>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 83886080, num_entries == 304)
>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 40)
>

The last one looks like a false positive: each of those values seems
perfectly reasonable.

Any chance you could dump the memory map and this table (boot using
efi=debug) on this system?


> Anyway, back to you question, this patch helped us to narrow down and
> find where the problem was, by printing all variables taken in
> consideration to get the conclusion that the firmware is buggy.
>

Fair enough.

> Regarding the problem, Usama and I are suspecting that it might be
> related to some 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for
> event log to avoid corruption"), but at this time with memattr table, where it
> might not preserved during kexec(?).
>

Please see the suggestion above.
Usama Arif Jan. 9, 2025, 3:09 p.m. UTC | #4
On 09/01/2025 14:48, Ard Biesheuvel wrote:
> On Tue, 7 Jan 2025 at 13:05, Breno Leitao <leitao@debian.org> wrote:
>>
>> Hello Ard,
>>
>> On Tue, Jan 07, 2025 at 12:24:03PM +0100, Ard Biesheuvel wrote:
>>> On Mon, 6 Jan 2025 at 20:03, Breno Leitao <leitao@debian.org> wrote:
>>>>
>>>> Add EFI memory map size to warning messages when a corrupted Memory
>>>> Attributes Table is detected, making it easier to diagnose firmware issues.
>>>>
>>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>>> ---
>>>>  drivers/firmware/efi/memattr.c | 9 +++++----
>>>>  1 file changed, 5 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
>>>> index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644
>>>> --- a/drivers/firmware/efi/memattr.c
>>>> +++ b/drivers/firmware/efi/memattr.c
>>>> @@ -22,7 +22,7 @@ unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR;
>>>>  void __init efi_memattr_init(void)
>>>>  {
>>>>         efi_memory_attributes_table_t *tbl;
>>>> -       unsigned long size;
>>>> +       unsigned long size, efi_map_size;
>>>>
>>>>         if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR)
>>>>                 return;
>>>> @@ -49,9 +49,10 @@ void __init efi_memattr_init(void)
>>>>          * just be ignored altogether.
>>>>          */
>>>>         size = tbl->num_entries * tbl->desc_size;
>>>> -       if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
>>>> -               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
>>>> -                       tbl->version, tbl->desc_size, tbl->num_entries);
>>>> +       efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size;
>>>> +       if (size > 3 * efi_map_size) {
>>>> +               pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n",
>>>> +                       tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size);
>>>>                 goto unmap;
>>>>         }
>>>>
>>>>
>>>
>>> Hello Breno,
>>>
>>> I don't mind the patch per se, but I don't think it is terribly useful either.
>>>
>>> Could you explain how this helps?
>>
>> We are seeing a bunch of `Corrupted EFI Memory Attributes Table
>> detected!` in the Meta fleet, and this is something we are
>> investigating.
>>
>> We highly think this is related to some kexec overwrites, and when we
>> get here, the EFI table is completely garbage. I haven't seen this
>> problem on cold boot.
>>
> 
> It likely means the memory is not reserved correctly.
> 
> Could you check whether this
> 
> --- a/drivers/firmware/efi/memattr.c
> +++ b/drivers/firmware/efi/memattr.c
> @@ -56,7 +56,7 @@ int __init efi_memattr_init(void)
>         }
> 
>         tbl_size = sizeof(*tbl) + size;
> -       memblock_reserve(efi_mem_attr_table, tbl_size);
> +       efi_mem_reserve(efi_mem_attr_table, tbl_size);
>         set_bit(EFI_MEM_ATTR, &efi.flags);
> 
>  unmap:
> 
> 
> makes any difference?
> 
Hi Ard,

Thanks for the reply!

I have further explained the problems and possible solutions in
https://lore.kernel.org/all/20250108215957.3437660-1-usamaarif642@gmail.com/.

I am assuming the above diff is to solve problem 2 that I have described in the patches.
I haven't tested it, because its a bit difficult to reproduce problem 2, but I think
the above diff might not make a difference? efi_mem_reserve changes e820_table,
while /sys/firmware/memmap uses e820_table_firmware. An alternate solution might be
to change /sys/firmware/memmap to e820_table. I didnt go down that route because,
you would be changing what the kernel exposes to userspace, which might not be the
right thing.

Thanks,
Usama


> 
>> Here are sof the instances I see:
>>
>>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 18058, num_entries == 33554432)
>>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 1, desc_size == 2072184435, num_entries == 3248688968)
>>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 0, desc_size == 83886080, num_entries == 304)
>>         efi: memattr: [Firmware Bug]: Corrupted EFI Memory Attributes Table detected! (version == 2, desc_size == 48, num_entries == 40)
>>
> 
> The last one looks like a false positive: each of those values seems
> perfectly reasonable.
> 
> Any chance you could dump the memory map and this table (boot using
> efi=debug) on this system?
> 
> 
>> Anyway, back to you question, this patch helped us to narrow down and
>> find where the problem was, by printing all variables taken in
>> consideration to get the conclusion that the firmware is buggy.
>>
> 
> Fair enough.
> 
>> Regarding the problem, Usama and I are suspecting that it might be
>> related to some 77d48d39e99170 ("efistub/tpm: Use ACPI reclaim memory for
>> event log to avoid corruption"), but at this time with memattr table, where it
>> might not preserved during kexec(?).
>>
> 
> Please see the suggestion above.
Breno Leitao Jan. 9, 2025, 5:45 p.m. UTC | #5
On Thu, Jan 09, 2025 at 03:48:56PM +0100, Ard Biesheuvel wrote:
> On Tue, 7 Jan 2025 at 13:05, Breno Leitao <leitao@debian.org> wrote:
> > On Tue, Jan 07, 2025 at 12:24:03PM +0100, Ard Biesheuvel wrote:


> > We are seeing a bunch of `Corrupted EFI Memory Attributes Table
> > detected!` in the Meta fleet, and this is something we are
> > investigating.
> >
> > We highly think this is related to some kexec overwrites, and when we
> > get here, the EFI table is completely garbage. I haven't seen this
> > problem on cold boot.
> >
> 
> It likely means the memory is not reserved correctly.
> 
> Could you check whether this
> 
> --- a/drivers/firmware/efi/memattr.c
> +++ b/drivers/firmware/efi/memattr.c
> @@ -56,7 +56,7 @@ int __init efi_memattr_init(void)
>         }
> 
>         tbl_size = sizeof(*tbl) + size;
> -       memblock_reserve(efi_mem_attr_table, tbl_size);
> +       efi_mem_reserve(efi_mem_attr_table, tbl_size);
>         set_bit(EFI_MEM_ATTR, &efi.flags);
> 
>  unmap:
> 
> 
> makes any difference?

It doesn't seem so. This is how I tested it.

I've cherry-picked it on top of Linus' master eea6e4b4dfb8 ("Merge tag
'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi"), 
and booted a system with this kernel.
diff mbox series

Patch

diff --git a/drivers/firmware/efi/memattr.c b/drivers/firmware/efi/memattr.c
index 5f83cdea88b05cb325e9f90c14a0048131e53cfa..2c276bcc0df48352bec6cd96b69edf67a16f6069 100644
--- a/drivers/firmware/efi/memattr.c
+++ b/drivers/firmware/efi/memattr.c
@@ -22,7 +22,7 @@  unsigned long __ro_after_init efi_mem_attr_table = EFI_INVALID_TABLE_ADDR;
 void __init efi_memattr_init(void)
 {
 	efi_memory_attributes_table_t *tbl;
-	unsigned long size;
+	unsigned long size, efi_map_size;
 
 	if (efi_mem_attr_table == EFI_INVALID_TABLE_ADDR)
 		return;
@@ -49,9 +49,10 @@  void __init efi_memattr_init(void)
 	 * just be ignored altogether.
 	 */
 	size = tbl->num_entries * tbl->desc_size;
-	if (size > 3 * efi.memmap.nr_map * efi.memmap.desc_size) {
-		pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u)\n",
-			tbl->version, tbl->desc_size, tbl->num_entries);
+	efi_map_size = efi.memmap.nr_map * efi.memmap.desc_size;
+	if (size > 3 * efi_map_size) {
+		pr_warn(FW_BUG "Corrupted EFI Memory Attributes Table detected! (version == %u, desc_size == %u, num_entries == %u, efi_map_size == %lu)\n",
+			tbl->version, tbl->desc_size, tbl->num_entries, efi_map_size);
 		goto unmap;
 	}