mbox series

[0/2] Fix boot hang issue on Ampere Emag server

Message ID 20230131040355.3116-1-justin.he@arm.com
Headers show
Series Fix boot hang issue on Ampere Emag server | expand

Message

Justin He Jan. 31, 2023, 4:03 a.m. UTC
I met a hung task warning and then kernel was hung forever with latest
kernel on an Ampere Emag server.

The root cause is kernel was hung  when invoking an efi rts call to set
the RandomSeed variable during the booting stage. The arch_efi_call_virt
call (set_variable) was never returned and then caused the hung task error.

On the Emag server, efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE)
is returned with "true"

Fix it by introducing the efi_get_supported_rt_services() and then determine
to set or clear the runtime services bit of efi.flags.

Jia He (2):
  efi: libstub: Fix the retriving of supported rutime services
  efi: Introduce efi_get_supported_rt_services() to get the runtime
    services mask

 drivers/firmware/efi/arm-runtime.c      |  5 ++++-
 drivers/firmware/efi/efi.c              | 28 +++++++++++++++++--------
 drivers/firmware/efi/libstub/efi-stub.c |  2 ++
 include/linux/efi.h                     |  1 +
 4 files changed, 26 insertions(+), 10 deletions(-)

Comments

Ard Biesheuvel Jan. 31, 2023, 7:04 a.m. UTC | #1
On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
>
> If retrieving UEFI configuration table is failed, the supported runtime
> services mask should be regarded as 0 instead of EFI_RT_SUPPORTED_ALL.
> Otherwise efi_novamap might be incorrectly assigned to "false" on the
> Ampere Emag server.
>
> Signed-off-by: Jia He <justin.he@arm.com>
> ---
>  drivers/firmware/efi/libstub/efi-stub.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/firmware/efi/libstub/efi-stub.c b/drivers/firmware/efi/libstub/efi-stub.c
> index 2955c1ac6a36..f24b5436729c 100644
> --- a/drivers/firmware/efi/libstub/efi-stub.c
> +++ b/drivers/firmware/efi/libstub/efi-stub.c
> @@ -111,6 +111,8 @@ static u32 get_supported_rt_services(void)
>         rt_prop_table = get_efi_config_table(EFI_RT_PROPERTIES_TABLE_GUID);
>         if (rt_prop_table)
>                 supported &= rt_prop_table->runtime_services_supported;
> +       else
> +               supported = 0;
>
>         return supported;
>  }

Hello Justin,

This is not how things are supposed to work. On systems that do not
implement the RT properties table, all runtime services are assumed to
be implemented.

Note that this table is informational only - the runtime services
themselves must be callable and return EFI_UNSUPPORTED if they are
marked as unavailable in the RT properties table.
Ard Biesheuvel Jan. 31, 2023, 7:18 a.m. UTC | #2
(cc Jason for awareness)

On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
>
> I met a hung task warning and then kernel was hung forever with latest
> kernel on an Ampere Emag server.
>
> The root cause is kernel was hung  when invoking an efi rts call to set
> the RandomSeed variable during the booting stage. The arch_efi_call_virt
> call (set_variable) was never returned and then caused the hung task error.
>

Given that EFI variables work on this platform (as far as I know), the
problem may be that we are calling SetVariable() too early.

Could you double check whether setting variables works as expected?
You can use efibootmgr -t 10 as root (for example) to set the boot
timeout, and check whether the new value is retained after a reboot
(efibootmgr will print the current value for you)

Could you also please share the kernel log up until the point where it hangs?


> On the Emag server, efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE)
> is returned with "true"
>

This is as expected: if the firmware does not expose the RT properties
table, all runtime services are assumed to be available.

> Fix it by introducing the efi_get_supported_rt_services() and then determine
> to set or clear the runtime services bit of efi.flags.
>
> Jia He (2):
>   efi: libstub: Fix the retriving of supported rutime services
>   efi: Introduce efi_get_supported_rt_services() to get the runtime
>     services mask
>
>  drivers/firmware/efi/arm-runtime.c      |  5 ++++-
>  drivers/firmware/efi/efi.c              | 28 +++++++++++++++++--------
>  drivers/firmware/efi/libstub/efi-stub.c |  2 ++
>  include/linux/efi.h                     |  1 +
>  4 files changed, 26 insertions(+), 10 deletions(-)
>
> --
> 2.25.1
>
Justin He Jan. 31, 2023, 3:31 p.m. UTC | #3
> -----Original Message-----
> From: Justin He
> Sent: Tuesday, January 31, 2023 11:22 PM
> To: Ard Biesheuvel <ardb@kernel.org>; Jason A. Donenfeld <Jason@zx2c4.com>
> Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>
> Subject: RE: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
>
> Hi Ard,
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Tuesday, January 31, 2023 3:19 PM
> > To: Justin He <Justin.He@arm.com>; Jason A. Donenfeld <Jason@zx2c4.com>
> > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>
> > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> >
> > (cc Jason for awareness)
> >
> > On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
> > >
> > > I met a hung task warning and then kernel was hung forever with latest
> > > kernel on an Ampere Emag server.
> > >
> > > The root cause is kernel was hung  when invoking an efi rts call to
> > > set the RandomSeed variable during the booting stage. The
> > > arch_efi_call_virt call (set_variable) was never returned and then caused
> the
> > hung task error.
> > >
> >
> > Given that EFI variables work on this platform (as far as I know), the problem
> > may be that we are calling SetVariable() too early.
> >
> > Could you double check whether setting variables works as expected?
> > You can use efibootmgr -t 10 as root (for example) to set the boot timeout,
> and
> > check whether the new value is retained after a reboot (efibootmgr will print
> > the current value for you)
> >
> > Could you also please share the kernel log up until the point where it hangs?
> >
> The set_variable seems to be ok in 5.19+:
> root@:~# efibootmgr -t 10
> BootCurrent: 0000
> Timeout: 10 seconds
> BootOrder: 0000,0004,0003,0002,0001
> Boot0000* ubuntu
> Boot0001* grub
> Boot0002* UEFI: Built-in EFI Shell
> Boot0003* UEFI: PXE boot on MAC: 4C:38:D5:08:F6:02
> Boot0004* ubuntu
> root@:~# efibootmgr -t 9
> BootCurrent: 0000
> Timeout: 9 seconds
> BootOrder: 0000,0004,0003,0002,0001
> Boot0000* ubuntu
> Boot0001* grub
> Boot0002* UEFI: Built-in EFI Shell
> Boot0003* UEFI: PXE boot on MAC: 4C:38:D5:08:F6:02
> Boot0004* ubuntu
> root@:~# uname -a
> Linux $machine_name 5.19.0+ #228 SMP Wed Aug 17 13:22:29 UTC 2022
> aarch64 aarch64 aarch64 GNU/Linux
>
> The dmesg log until the hung(some information was removed, eg, hostname):
>
Sorry, I noticed my previous mail might be bounced back if the text is too long.
I pasted it to online Pastebin.com
https://pastebin.com/3zQyryp1

Hope it helps

Cheers,
Justin
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Jason A. Donenfeld Jan. 31, 2023, 6:03 p.m. UTC | #4
On Tue, Jan 31, 2023 at 08:18:49AM +0100, Ard Biesheuvel wrote:
> (cc Jason for awareness)
> 
> On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
> >
> > I met a hung task warning and then kernel was hung forever with latest
> > kernel on an Ampere Emag server.
> >
> > The root cause is kernel was hung  when invoking an efi rts call to set
> > the RandomSeed variable during the booting stage. The arch_efi_call_virt
> > call (set_variable) was never returned and then caused the hung task error.
> >
> 
> Given that EFI variables work on this platform (as far as I know), the
> problem may be that we are calling SetVariable() too early.
> 

On my phone and with very limited connectivity for another 10 days, but
I wonder if there's a later place we could move this block:

    if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
        execute_with_initialized_rng(&refresh_nv_rng_seed_nb);

Is there any additional initialization that happens after
efisubsys_init() that we're maybe missing out on there?

Jason
Jason A. Donenfeld Jan. 31, 2023, 6:13 p.m. UTC | #5
Actually...

On Tue, Jan 31, 2023 at 07:03:12PM +0100, Jason A. Donenfeld wrote:
> On Tue, Jan 31, 2023 at 08:18:49AM +0100, Ard Biesheuvel wrote:
> > (cc Jason for awareness)
> > 
> > On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
> > >
> > > I met a hung task warning and then kernel was hung forever with latest
> > > kernel on an Ampere Emag server.
> > >
> > > The root cause is kernel was hung  when invoking an efi rts call to set
> > > the RandomSeed variable during the booting stage. The arch_efi_call_virt
> > > call (set_variable) was never returned and then caused the hung task error.
> > >
> > 
> > Given that EFI variables work on this platform (as far as I know), the
> > problem may be that we are calling SetVariable() too early.
> > 
> 
> On my phone and with very limited connectivity for another 10 days, but
> I wonder if there's a later place we could move this block:
> 
>     if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
>         execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
> 
> Is there any additional initialization that happens after
> efisubsys_init() that we're maybe missing out on there?
Jason A. Donenfeld Jan. 31, 2023, 6:23 p.m. UTC | #6
On Tue, Jan 31, 2023 at 03:21:39PM +0000, Justin He wrote:
> Hi Ard,
> 
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Tuesday, January 31, 2023 3:19 PM
> > To: Justin He <Justin.He@arm.com>; Jason A. Donenfeld <Jason@zx2c4.com>
> > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>
> > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> >
> > (cc Jason for awareness)
> >
> > On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
> > >
> > > I met a hung task warning and then kernel was hung forever with latest
> > > kernel on an Ampere Emag server.
> > >
> > > The root cause is kernel was hung  when invoking an efi rts call to
> > > set the RandomSeed variable during the booting stage. The
> > > arch_efi_call_virt call (set_variable) was never returned and then caused the
> > hung task error.
> > >
> >
> > Given that EFI variables work on this platform (as far as I know), the problem
> > may be that we are calling SetVariable() too early.
> >
> > Could you double check whether setting variables works as expected?
> > You can use efibootmgr -t 10 as root (for example) to set the boot timeout, and
> > check whether the new value is retained after a reboot (efibootmgr will print
> > the current value for you)
> >
> > Could you also please share the kernel log up until the point where it hangs?
> >
> The set_variable seems to be ok in 5.19+:
> root@:~# efibootmgr -t 10
> BootCurrent: 0000
> Timeout: 10 seconds

I think what we want to learn is whether efibootmgr -t 10 works in the
latest RC. If not, it would suggest the issue isn't with the seed
setting, but with some other unrelated change.

Can you run efibootmgr -t 10 (or whatever) again on a kernel where
you've commented out these lines in efi.c inside of efisubsys_init():

    if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
        execute_with_initialized_rng(&refresh_nv_rng_seed_nb);

-->

    // if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
    //     execute_with_initialized_rng(&refresh_nv_rng_seed_nb);

Or something like that.

Jason
Justin He Feb. 2, 2023, 10:50 a.m. UTC | #7
Hi Jason

> -----Original Message-----
> From: Jason A. Donenfeld <Jason@zx2c4.com>
> Sent: Wednesday, February 1, 2023 2:23 AM
> To: Justin He <Justin.He@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>; Huacai Chen <chenhuacai@kernel.org>;
> linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org; Alexandru Elisei
> <Alexandru.Elisei@arm.com>
> Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
>
> On Tue, Jan 31, 2023 at 03:21:39PM +0000, Justin He wrote:
> > Hi Ard,
> >
> > > -----Original Message-----
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > > Sent: Tuesday, January 31, 2023 3:19 PM
> > > To: Justin He <Justin.He@arm.com>; Jason A. Donenfeld
> > > <Jason@zx2c4.com>
> > > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; Alexandru Elisei
> > > <Alexandru.Elisei@arm.com>
> > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> > >
> > > (cc Jason for awareness)
> > >
> > > On Tue, 31 Jan 2023 at 05:04, Jia He <justin.he@arm.com> wrote:
> > > >
> > > > I met a hung task warning and then kernel was hung forever with
> > > > latest kernel on an Ampere Emag server.
> > > >
> > > > The root cause is kernel was hung  when invoking an efi rts call
> > > > to set the RandomSeed variable during the booting stage. The
> > > > arch_efi_call_virt call (set_variable) was never returned and then
> > > > caused the
> > > hung task error.
> > > >
> > >
> > > Given that EFI variables work on this platform (as far as I know),
> > > the problem may be that we are calling SetVariable() too early.
> > >
> > > Could you double check whether setting variables works as expected?
> > > You can use efibootmgr -t 10 as root (for example) to set the boot
> > > timeout, and check whether the new value is retained after a reboot
> > > (efibootmgr will print the current value for you)
> > >
> > > Could you also please share the kernel log up until the point where it
> hangs?
> > >
> > The set_variable seems to be ok in 5.19+:
> > root@:~# efibootmgr -t 10
> > BootCurrent: 0000
> > Timeout: 10 seconds
>
> I think what we want to learn is whether efibootmgr -t 10 works in the latest
> RC. If not, it would suggest the issue isn't with the seed setting, but with some
> other unrelated change.
>
> Can you run efibootmgr -t 10 (or whatever) again on a kernel where you've
> commented out these lines in efi.c inside of efisubsys_init():
>
>     if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
>         execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
>
> -->
>
>     // if (efi_rt_services_supported(EFI_RT_SUPPORTED_SET_VARIABLE))
>     //     execute_with_initialized_rng(&refresh_nv_rng_seed_nb);
>
As your suggested (comment above execute_with_initialized_rng in latest kernel):
The efibootmgr -t X will be hung. Looks like one certain commit before your patch
broke the set_variable efi call. I will dig into the further debug and tell you the result.

---
Cheers,
Justin.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Justin He Feb. 7, 2023, 3:21 a.m. UTC | #8
Hi Ard

> -----Original Message-----
[...]
> As your suggested (comment above execute_with_initialized_rng in latest
> kernel):
> The efibootmgr -t X will be hung. Looks like one certain commit before your
> patch broke the set_variable efi call. I will dig into the further debug and tell
> you the result.

The root cause of the hung IMO might be similar to 
commit 550b33cfd445296868a478e8413ffb2e963eed32
Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Thu Nov 10 10:36:20 2022 +0100

    arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines

Do you agree with the idea if I add Ampere ”eMAG” machine into the list of
Using SetVirtualAddressMap() forcibly?

Please note that even in previous kernel patch, the efibootmgr -t 10 will make
kernel hung if I passed "efi=novamap" to the boot parameter.


--
Cheers,
Justin (Jia He)
Ard Biesheuvel Feb. 7, 2023, 8:44 a.m. UTC | #9
On Tue, 7 Feb 2023 at 04:21, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> [...]
> > As your suggested (comment above execute_with_initialized_rng in latest
> > kernel):
> > The efibootmgr -t X will be hung. Looks like one certain commit before your
> > patch broke the set_variable efi call. I will dig into the further debug and tell
> > you the result.
>
> The root cause of the hung IMO might be similar to
> commit 550b33cfd445296868a478e8413ffb2e963eed32
> Author: Ard Biesheuvel <ardb@kernel.org>
> Date:   Thu Nov 10 10:36:20 2022 +0100
>
>     arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines
>
> Do you agree with the idea if I add Ampere ”eMAG” machine into the list of
> Using SetVirtualAddressMap() forcibly?
>
> Please note that even in previous kernel patch, the efibootmgr -t 10 will make
> kernel hung if I passed "efi=novamap" to the boot parameter.
>

Interesting. What does dmidecode return for the family in the type 1 record?
Justin He Feb. 7, 2023, 8:49 a.m. UTC | #10
> -----Original Message-----
[..]
> > The root cause of the hung IMO might be similar to commit
> > 550b33cfd445296868a478e8413ffb2e963eed32
> > Author: Ard Biesheuvel <ardb@kernel.org>
> > Date:   Thu Nov 10 10:36:20 2022 +0100
> >
> >     arm64: efi: Force the use of SetVirtualAddressMap() on Altra
> > machines
> >
> > Do you agree with the idea if I add Ampere ”eMAG” machine into the
> > list of Using SetVirtualAddressMap() forcibly?
> >
> > Please note that even in previous kernel patch, the efibootmgr -t 10
> > will make kernel hung if I passed "efi=novamap" to the boot parameter.
> >
> 
> Interesting. What does dmidecode return for the family in the type 1 record?

# dmidecode |grep -i family
        Family: eMAG
        Family: ARMv8

The full dmidecode log is at https://pastebin.com/M3MAJtUG


--
Cheers,
Justin (Jia He)
Ard Biesheuvel Feb. 7, 2023, 8:54 a.m. UTC | #11
On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
>
>
>
> > -----Original Message-----
> [..]
> > > The root cause of the hung IMO might be similar to commit
> > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > >
> > >     arm64: efi: Force the use of SetVirtualAddressMap() on Altra
> > > machines
> > >
> > > Do you agree with the idea if I add Ampere ”eMAG” machine into the
> > > list of Using SetVirtualAddressMap() forcibly?
> > >
> > > Please note that even in previous kernel patch, the efibootmgr -t 10
> > > will make kernel hung if I passed "efi=novamap" to the boot parameter.
> > >
> >
> > Interesting. What does dmidecode return for the family in the type 1 record?
>
> # dmidecode |grep -i family
>         Family: eMAG
>         Family: ARMv8
>
> The full dmidecode log is at https://pastebin.com/M3MAJtUG
>

OK please try this:

diff --git a/drivers/firmware/efi/libstub/arm64.c
b/drivers/firmware/efi/libstub/arm64.c
index ff2d18c42ee74979..fae930dec82be7c6 100644
--- a/drivers/firmware/efi/libstub/arm64.c
+++ b/drivers/firmware/efi/libstub/arm64.c
@@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
         * Ampere Altra machines crash in SetTime() if SetVirtualAddressMap()
         * has not been called prior.
         */
-       if (!type1_family || strcmp(type1_family, "Altra"))
+       if (!type1_family ||
+           (strcmp(type1_family, "Altra") && strcmp(type1_family, "eMAG")))
                return false;

        efi_warn("Working around broken SetVirtualAddressMap()\n");
Justin He Feb. 7, 2023, 9:02 a.m. UTC | #12
Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Tuesday, February 7, 2023 4:54 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>;
> Jason A. Donenfeld <Jason@zx2c4.com>; nd <nd@arm.com>
> Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> 
> On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > [..]
> > > > The root cause of the hung IMO might be similar to commit
> > > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > > >
> > > >     arm64: efi: Force the use of SetVirtualAddressMap() on Altra
> > > > machines
> > > >
> > > > Do you agree with the idea if I add Ampere ”eMAG” machine into the
> > > > list of Using SetVirtualAddressMap() forcibly?
> > > >
> > > > Please note that even in previous kernel patch, the efibootmgr -t
> > > > 10 will make kernel hung if I passed "efi=novamap" to the boot
> parameter.
> > > >
> > >
> > > Interesting. What does dmidecode return for the family in the type 1
> record?
> >
> > # dmidecode |grep -i family
> >         Family: eMAG
> >         Family: ARMv8
> >
> > The full dmidecode log is at https://pastebin.com/M3MAJtUG
> >
> 
> OK please try this:
> 
> diff --git a/drivers/firmware/efi/libstub/arm64.c
> b/drivers/firmware/efi/libstub/arm64.c
> index ff2d18c42ee74979..fae930dec82be7c6 100644
> --- a/drivers/firmware/efi/libstub/arm64.c
> +++ b/drivers/firmware/efi/libstub/arm64.c
> @@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
>          * Ampere Altra machines crash in SetTime() if
> SetVirtualAddressMap()
>          * has not been called prior.
>          */
> -       if (!type1_family || strcmp(type1_family, "Altra"))
> +       if (!type1_family ||
> +           (strcmp(type1_family, "Altra") && strcmp(type1_family,
> + "eMAG")))
>                 return false;
> 
>         efi_warn("Working around broken SetVirtualAddressMap()\n");

Yes, it works on my eMAG server: the kernel boots.
Other than efibootmgr failure. But I noticed this efibootmgr failure even before
Commit d3549a938b7 ("avoid SetVirtualAddressMap() when possible ")

root@:~/linux# efibootmgr -t 9; efibootmgr -t 5;
Could not set Timeout: Input/output error
Could not set Timeout: Input/output error


--
Cheers,
Justin (Jia He)
Ard Biesheuvel Feb. 7, 2023, 9:03 a.m. UTC | #13
On Tue, 7 Feb 2023 at 10:03, Justin He <Justin.He@arm.com> wrote:
>
> Hi Ard
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Tuesday, February 7, 2023 4:54 PM
> > To: Justin He <Justin.He@arm.com>
> > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>;
> > Jason A. Donenfeld <Jason@zx2c4.com>; nd <nd@arm.com>
> > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> >
> > On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > [..]
> > > > > The root cause of the hung IMO might be similar to commit
> > > > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > > > >
> > > > >     arm64: efi: Force the use of SetVirtualAddressMap() on Altra
> > > > > machines
> > > > >
> > > > > Do you agree with the idea if I add Ampere ”eMAG” machine into the
> > > > > list of Using SetVirtualAddressMap() forcibly?
> > > > >
> > > > > Please note that even in previous kernel patch, the efibootmgr -t
> > > > > 10 will make kernel hung if I passed "efi=novamap" to the boot
> > parameter.
> > > > >
> > > >
> > > > Interesting. What does dmidecode return for the family in the type 1
> > record?
> > >
> > > # dmidecode |grep -i family
> > >         Family: eMAG
> > >         Family: ARMv8
> > >
> > > The full dmidecode log is at https://pastebin.com/M3MAJtUG
> > >
> >
> > OK please try this:
> >
> > diff --git a/drivers/firmware/efi/libstub/arm64.c
> > b/drivers/firmware/efi/libstub/arm64.c
> > index ff2d18c42ee74979..fae930dec82be7c6 100644
> > --- a/drivers/firmware/efi/libstub/arm64.c
> > +++ b/drivers/firmware/efi/libstub/arm64.c
> > @@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
> >          * Ampere Altra machines crash in SetTime() if
> > SetVirtualAddressMap()
> >          * has not been called prior.
> >          */
> > -       if (!type1_family || strcmp(type1_family, "Altra"))
> > +       if (!type1_family ||
> > +           (strcmp(type1_family, "Altra") && strcmp(type1_family,
> > + "eMAG")))
> >                 return false;
> >
> >         efi_warn("Working around broken SetVirtualAddressMap()\n");
>
> Yes, it works on my eMAG server: the kernel boots.
> Other than efibootmgr failure. But I noticed this efibootmgr failure even before
> Commit d3549a938b7 ("avoid SetVirtualAddressMap() when possible ")
>
> root@:~/linux# efibootmgr -t 9; efibootmgr -t 5;
> Could not set Timeout: Input/output error
> Could not set Timeout: Input/output error
>

Do you get any [Firmware Bug] lines in the kernel log?
Justin He Feb. 7, 2023, 9:08 a.m. UTC | #14
> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Tuesday, February 7, 2023 5:04 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>;
> Jason A. Donenfeld <Jason@zx2c4.com>; nd <nd@arm.com>
> Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> 
> On Tue, 7 Feb 2023 at 10:03, Justin He <Justin.He@arm.com> wrote:
> >
> > Hi Ard
> >
> > > -----Original Message-----
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > > Sent: Tuesday, February 7, 2023 4:54 PM
> > > To: Justin He <Justin.He@arm.com>
> > > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; Alexandru Elisei
> > > <Alexandru.Elisei@arm.com>; Jason A. Donenfeld <Jason@zx2c4.com>; nd
> > > <nd@arm.com>
> > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> > >
> > > On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
> > > >
> > > >
> > > >
> > > > > -----Original Message-----
> > > > [..]
> > > > > > The root cause of the hung IMO might be similar to commit
> > > > > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > > > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > > > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > > > > >
> > > > > >     arm64: efi: Force the use of SetVirtualAddressMap() on
> > > > > > Altra machines
> > > > > >
> > > > > > Do you agree with the idea if I add Ampere ”eMAG” machine into
> > > > > > the list of Using SetVirtualAddressMap() forcibly?
> > > > > >
> > > > > > Please note that even in previous kernel patch, the efibootmgr
> > > > > > -t
> > > > > > 10 will make kernel hung if I passed "efi=novamap" to the boot
> > > parameter.
> > > > > >
> > > > >
> > > > > Interesting. What does dmidecode return for the family in the
> > > > > type 1
> > > record?
> > > >
> > > > # dmidecode |grep -i family
> > > >         Family: eMAG
> > > >         Family: ARMv8
> > > >
> > > > The full dmidecode log is at https://pastebin.com/M3MAJtUG
> > > >
> > >
> > > OK please try this:
> > >
> > > diff --git a/drivers/firmware/efi/libstub/arm64.c
> > > b/drivers/firmware/efi/libstub/arm64.c
> > > index ff2d18c42ee74979..fae930dec82be7c6 100644
> > > --- a/drivers/firmware/efi/libstub/arm64.c
> > > +++ b/drivers/firmware/efi/libstub/arm64.c
> > > @@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
> > >          * Ampere Altra machines crash in SetTime() if
> > > SetVirtualAddressMap()
> > >          * has not been called prior.
> > >          */
> > > -       if (!type1_family || strcmp(type1_family, "Altra"))
> > > +       if (!type1_family ||
> > > +           (strcmp(type1_family, "Altra") && strcmp(type1_family,
> > > + "eMAG")))
> > >                 return false;
> > >
> > >         efi_warn("Working around broken SetVirtualAddressMap()\n");
> >
> > Yes, it works on my eMAG server: the kernel boots.
> > Other than efibootmgr failure. But I noticed this efibootmgr failure
> > even before Commit d3549a938b7 ("avoid SetVirtualAddressMap() when
> > possible ")
> >
> > root@:~/linux# efibootmgr -t 9; efibootmgr -t 5; Could not set
> > Timeout: Input/output error Could not set Timeout: Input/output error
> >
> 
> Do you get any [Firmware Bug] lines in the kernel log?

No, 
I built the kernel based on:
commit d2d11f342b179f1894a901f143ec7c008caba43e (HEAD -> master, origin/master, origin/HEAD)
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Feb 5 17:17:10 2023 -0800

    Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Are you worried about your sync exception fixup patch? I think it has been included.


--
Cheers,
Justin (Jia He)
Ard Biesheuvel Feb. 7, 2023, 9:09 a.m. UTC | #15
On Tue, 7 Feb 2023 at 10:08, Justin He <Justin.He@arm.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Ard Biesheuvel <ardb@kernel.org>
> > Sent: Tuesday, February 7, 2023 5:04 PM
> > To: Justin He <Justin.He@arm.com>
> > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>;
> > Jason A. Donenfeld <Jason@zx2c4.com>; nd <nd@arm.com>
> > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> >
> > On Tue, 7 Feb 2023 at 10:03, Justin He <Justin.He@arm.com> wrote:
> > >
> > > Hi Ard
> > >
> > > > -----Original Message-----
> > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > Sent: Tuesday, February 7, 2023 4:54 PM
> > > > To: Justin He <Justin.He@arm.com>
> > > > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > > > linux-kernel@vger.kernel.org; Alexandru Elisei
> > > > <Alexandru.Elisei@arm.com>; Jason A. Donenfeld <Jason@zx2c4.com>; nd
> > > > <nd@arm.com>
> > > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> > > >
> > > > On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > [..]
> > > > > > > The root cause of the hung IMO might be similar to commit
> > > > > > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > > > > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > > > > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > > > > > >
> > > > > > >     arm64: efi: Force the use of SetVirtualAddressMap() on
> > > > > > > Altra machines
> > > > > > >
> > > > > > > Do you agree with the idea if I add Ampere ”eMAG” machine into
> > > > > > > the list of Using SetVirtualAddressMap() forcibly?
> > > > > > >
> > > > > > > Please note that even in previous kernel patch, the efibootmgr
> > > > > > > -t
> > > > > > > 10 will make kernel hung if I passed "efi=novamap" to the boot
> > > > parameter.
> > > > > > >
> > > > > >
> > > > > > Interesting. What does dmidecode return for the family in the
> > > > > > type 1
> > > > record?
> > > > >
> > > > > # dmidecode |grep -i family
> > > > >         Family: eMAG
> > > > >         Family: ARMv8
> > > > >
> > > > > The full dmidecode log is at https://pastebin.com/M3MAJtUG
> > > > >
> > > >
> > > > OK please try this:
> > > >
> > > > diff --git a/drivers/firmware/efi/libstub/arm64.c
> > > > b/drivers/firmware/efi/libstub/arm64.c
> > > > index ff2d18c42ee74979..fae930dec82be7c6 100644
> > > > --- a/drivers/firmware/efi/libstub/arm64.c
> > > > +++ b/drivers/firmware/efi/libstub/arm64.c
> > > > @@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
> > > >          * Ampere Altra machines crash in SetTime() if
> > > > SetVirtualAddressMap()
> > > >          * has not been called prior.
> > > >          */
> > > > -       if (!type1_family || strcmp(type1_family, "Altra"))
> > > > +       if (!type1_family ||
> > > > +           (strcmp(type1_family, "Altra") && strcmp(type1_family,
> > > > + "eMAG")))
> > > >                 return false;
> > > >
> > > >         efi_warn("Working around broken SetVirtualAddressMap()\n");
> > >
> > > Yes, it works on my eMAG server: the kernel boots.
> > > Other than efibootmgr failure. But I noticed this efibootmgr failure
> > > even before Commit d3549a938b7 ("avoid SetVirtualAddressMap() when
> > > possible ")
> > >
> > > root@:~/linux# efibootmgr -t 9; efibootmgr -t 5; Could not set
> > > Timeout: Input/output error Could not set Timeout: Input/output error
> > >
> >
> > Do you get any [Firmware Bug] lines in the kernel log?
>
> No,
> I built the kernel based on:
> commit d2d11f342b179f1894a901f143ec7c008caba43e (HEAD -> master, origin/master, origin/HEAD)
> Author: Linus Torvalds <torvalds@linux-foundation.org>
> Date:   Sun Feb 5 17:17:10 2023 -0800
>
>     Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
>
> Are you worried about your sync exception fixup patch? I think it has been included.
>


I would just like to understand why setvariable is still broken for you.
Justin He Feb. 8, 2023, 3:10 a.m. UTC | #16
Hi Ard

> -----Original Message-----
> From: Ard Biesheuvel <ardb@kernel.org>
> Sent: Tuesday, February 7, 2023 5:09 PM
> To: Justin He <Justin.He@arm.com>
> Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> linux-kernel@vger.kernel.org; Alexandru Elisei <Alexandru.Elisei@arm.com>;
> Jason A. Donenfeld <Jason@zx2c4.com>; nd <nd@arm.com>
> Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> 
> On Tue, 7 Feb 2023 at 10:08, Justin He <Justin.He@arm.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > > Sent: Tuesday, February 7, 2023 5:04 PM
> > > To: Justin He <Justin.He@arm.com>
> > > Cc: Huacai Chen <chenhuacai@kernel.org>; linux-efi@vger.kernel.org;
> > > linux-kernel@vger.kernel.org; Alexandru Elisei
> > > <Alexandru.Elisei@arm.com>; Jason A. Donenfeld <Jason@zx2c4.com>; nd
> > > <nd@arm.com>
> > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag server
> > >
> > > On Tue, 7 Feb 2023 at 10:03, Justin He <Justin.He@arm.com> wrote:
> > > >
> > > > Hi Ard
> > > >
> > > > > -----Original Message-----
> > > > > From: Ard Biesheuvel <ardb@kernel.org>
> > > > > Sent: Tuesday, February 7, 2023 4:54 PM
> > > > > To: Justin He <Justin.He@arm.com>
> > > > > Cc: Huacai Chen <chenhuacai@kernel.org>;
> > > > > linux-efi@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > > > Alexandru Elisei <Alexandru.Elisei@arm.com>; Jason A. Donenfeld
> > > > > <Jason@zx2c4.com>; nd <nd@arm.com>
> > > > > Subject: Re: [PATCH 0/2] Fix boot hang issue on Ampere Emag
> > > > > server
> > > > >
> > > > > On Tue, 7 Feb 2023 at 09:49, Justin He <Justin.He@arm.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > [..]
> > > > > > > > The root cause of the hung IMO might be similar to commit
> > > > > > > > 550b33cfd445296868a478e8413ffb2e963eed32
> > > > > > > > Author: Ard Biesheuvel <ardb@kernel.org>
> > > > > > > > Date:   Thu Nov 10 10:36:20 2022 +0100
> > > > > > > >
> > > > > > > >     arm64: efi: Force the use of SetVirtualAddressMap() on
> > > > > > > > Altra machines
> > > > > > > >
> > > > > > > > Do you agree with the idea if I add Ampere ”eMAG” machine
> > > > > > > > into the list of Using SetVirtualAddressMap() forcibly?
> > > > > > > >
> > > > > > > > Please note that even in previous kernel patch, the
> > > > > > > > efibootmgr -t
> > > > > > > > 10 will make kernel hung if I passed "efi=novamap" to the
> > > > > > > > boot
> > > > > parameter.
> > > > > > > >
> > > > > > >
> > > > > > > Interesting. What does dmidecode return for the family in
> > > > > > > the type 1
> > > > > record?
> > > > > >
> > > > > > # dmidecode |grep -i family
> > > > > >         Family: eMAG
> > > > > >         Family: ARMv8
> > > > > >
> > > > > > The full dmidecode log is at https://pastebin.com/M3MAJtUG
> > > > > >
> > > > >
> > > > > OK please try this:
> > > > >
> > > > > diff --git a/drivers/firmware/efi/libstub/arm64.c
> > > > > b/drivers/firmware/efi/libstub/arm64.c
> > > > > index ff2d18c42ee74979..fae930dec82be7c6 100644
> > > > > --- a/drivers/firmware/efi/libstub/arm64.c
> > > > > +++ b/drivers/firmware/efi/libstub/arm64.c
> > > > > @@ -22,7 +22,8 @@ static bool system_needs_vamap(void)
> > > > >          * Ampere Altra machines crash in SetTime() if
> > > > > SetVirtualAddressMap()
> > > > >          * has not been called prior.
> > > > >          */
> > > > > -       if (!type1_family || strcmp(type1_family, "Altra"))
> > > > > +       if (!type1_family ||
> > > > > +           (strcmp(type1_family, "Altra") &&
> > > > > + strcmp(type1_family,
> > > > > + "eMAG")))
> > > > >                 return false;
> > > > >
> > > > >         efi_warn("Working around broken
> > > > > SetVirtualAddressMap()\n");
> > > >
> > > > Yes, it works on my eMAG server: the kernel boots.
> > > > Other than efibootmgr failure. But I noticed this efibootmgr
> > > > failure even before Commit d3549a938b7 ("avoid
> > > > SetVirtualAddressMap() when possible ")
> > > >
> > > > root@:~/linux# efibootmgr -t 9; efibootmgr -t 5; Could not set
> > > > Timeout: Input/output error Could not set Timeout: Input/output
> > > > error
> > > >
> > >
> > > Do you get any [Firmware Bug] lines in the kernel log?
> >
> > No,
> > I built the kernel based on:
> > commit d2d11f342b179f1894a901f143ec7c008caba43e (HEAD -> master,
> > origin/master, origin/HEAD)
> > Author: Linus Torvalds <torvalds@linux-foundation.org>
> > Date:   Sun Feb 5 17:17:10 2023 -0800
> >
> >     Merge branch 'fixes' of
> > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
> >
> > Are you worried about your sync exception fixup patch? I think it has been
> included.
> >
> 
> 
> I would just like to understand why setvariable is still broken for you.

On an Ampere *Altra* server, the family name seems to not follow your purpose of invoking efi_get_smbios_string(1, family).

dmidecode |grep -i family -C 10

Handle 0x0001, DMI type 1, 27 bytes
System Information
        Manufacturer: GIGABYTE
        Product Name: R272-P30-00
        Version: 0100
        Serial Number: TS2035812A0022
        UUID: 00000000-0000-4000-8000-B42E99AFEF62
        Wake-up Type: Power Switch
        SKU Number: 01234567890123456789AB
        Family: Server

The full dmidecode info of Altra is at
https://pastebin.com/HQLE1yYv

--
Cheers,
Justin (Jia He)