[1/2] efi: arm64: abort boot on pending SError

Message ID 1467385291-9880-1-git-send-email-ard.biesheuvel@linaro.org
State New

Commit Message

Ard Biesheuvel July 1, 2016, 3:01 p.m. UTC
It is the firmware's job to clear any pending SErrors before entering
the kernel. On UEFI, we can fail gracefully rather than panic during
early boot, so check for this condition in the stub.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 drivers/firmware/efi/libstub/arm64-stub.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

-- 
2.7.4


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Comments

Mark Rutland July 1, 2016, 3:22 p.m. UTC | #1
On Fri, Jul 01, 2016 at 05:01:30PM +0200, Ard Biesheuvel wrote:
> It is the firmware's job to clear any pending SErrors before entering
> the kernel. On UEFI, we can fail gracefully rather than panic during
> early boot, so check for this condition in the stub.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

An SError could be triggered either asynchronously by FW, or as a result
of our actions at any point after this, e.g. due to the filesystem
accesses made to load an initrd.

So in practice, is checking here useful? Have we seen FW with masked but
pending SError at the point we enter the stub rather than that SError
being triggered later?

I'm also not sure what this means for CPER, which may use SError to
signal to the OS. It's possible that the UEFI implementation polls
ISR_EL1 itself, and handles SError appropriately internally, or that the
OS can later deal with the SError based on CPER and friends.

> ---
>  drivers/firmware/efi/libstub/arm64-stub.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
> index eae693eb3e91..c7e7396de876 100644
> --- a/drivers/firmware/efi/libstub/arm64-stub.c
> +++ b/drivers/firmware/efi/libstub/arm64-stub.c
> @@ -20,7 +20,14 @@ extern bool __nokaslr;
>  
>  efi_status_t check_platform_features(efi_system_table_t *sys_table_arg)
>  {
> -	u64 tg;
> +	u64 tg, isr;
> +
> +	/* check for a pending SError */
> +	asm ("mrs %0, isr_el1" : "=r"(isr));

I think you can use read_sysreg(isr_el1) here, given that the
read_sysreg helper is a macro function, and we already include sysreg.h
here.

> +	if (isr & BIT(8)) {

Perhaps add:

#define ISR_EL1_A_BIT	BIT(8)

to sysreg.h?

> +		pr_efi_err(sys_table_arg, "Pending SError detected -- aborting\n");
> +		return EFI_LOAD_ERROR;
> +	}

Otherwise, code-wise this looks fine, but as above I'm not sure if this
is the right thing to do. :/

Thanks,
Mark.

>  
>  	/* UEFI mandates support for 4 KB granularity, no need to check */
>  	if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
> -- 
> 2.7.4
> 

Ard Biesheuvel July 1, 2016, 3:31 p.m. UTC | #2
On 1 July 2016 at 17:22, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jul 01, 2016 at 05:01:30PM +0200, Ard Biesheuvel wrote:
>> It is the firmware's job to clear any pending SErrors before entering
>> the kernel. On UEFI, we can fail gracefully rather than panic during
>> early boot, so check for this condition in the stub.
>>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> An SError could be triggered either asynchronously by FW, or as a result
> of our actions at any point after this, e.g. due to the filesystem
> accesses made to load an initrd.
>
> So in practice, is checking here useful? Have we seen FW with masked but
> pending SError at the point we enter the stub rather than that SError
> being triggered later?
>

Yes. EDK2 keeps SError masked throughout its execution by default, and
so any condition that triggered an SError up till this point is likely
to still be pending, and blow up the kernel as soon as it unmasks it.
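
To make the masked-but-pending case concrete, here is a toy model in plain C. It is only a sketch: struct cpu_model and the helper names are hypothetical, the two flags stand in for ISR_EL1.A (pending) and PSTATE.A (masked), and real hardware behaviour (routing, how the pending state gets cleared) is simplified away.

```c
#include <stdbool.h>

/* Toy CPU state; all names here are illustrative, not kernel or EDK2 code. */
struct cpu_model {
	bool serror_pending;	/* stands in for ISR_EL1.A */
	bool serror_masked;	/* stands in for PSTATE.A */
	int  serrors_taken;	/* how many SErrors were actually delivered */
};

/* Deliver a pending SError only if it is not masked (simplified). */
static void deliver_if_possible(struct cpu_model *cpu)
{
	if (cpu->serror_pending && !cpu->serror_masked) {
		cpu->serror_pending = false;
		cpu->serrors_taken++;
	}
}

/* Some earlier fault raises an SError; it stays pending while masked. */
static void raise_serror(struct cpu_model *cpu)
{
	cpu->serror_pending = true;
	deliver_if_possible(cpu);
}

/* Changing the mask re-evaluates delivery, as when the kernel unmasks. */
static void set_serror_mask(struct cpu_model *cpu, bool masked)
{
	cpu->serror_masked = masked;
	deliver_if_possible(cpu);
}
```

In this model, an SError raised while the firmware runs with the mask set is not delivered and simply stays pending. The first set_serror_mask(cpu, false) then delivers it, which corresponds to the kernel taking the SError as soon as it unmasks.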

> I'm also not sure what this means for CPER, which may use SError to
> signal to the OS. It's possible that the UEFI implementation polls
> ISR_EL1 itself, and handles SError appropriately internally, or that the
> OS can later deal with the SError based on CPER and friends.
>

Currently, the kernel panics on an SError, and so what the kernel
should do once we start dealing with them in a more sophisticated way
is hypothetical at the moment. Once that code arrives, it may revert
this change, but for now, being dropped back into the UEFI shell does
sound more appealing than panic early imo.

>> ---
>>  drivers/firmware/efi/libstub/arm64-stub.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
>> index eae693eb3e91..c7e7396de876 100644
>> --- a/drivers/firmware/efi/libstub/arm64-stub.c
>> +++ b/drivers/firmware/efi/libstub/arm64-stub.c
>> @@ -20,7 +20,14 @@ extern bool __nokaslr;
>>
>>  efi_status_t check_platform_features(efi_system_table_t *sys_table_arg)
>>  {
>> -     u64 tg;
>> +     u64 tg, isr;
>> +
>> +     /* check for a pending SError */
>> +     asm ("mrs %0, isr_el1" : "=r"(isr));
>
> I think you can use read_sysreg(isr_el1) here, given that the
> read_sysreg helper is a macro function, and we already include sysreg.h
> here.
>

OK

>> +     if (isr & BIT(8)) {
>
> Perhaps add:
>
> #define ISR_EL1_A_BIT   BIT(8)
>
> to sysreg.h?
>

OK

>> +             pr_efi_err(sys_table_arg, "Pending SError detected -- aborting\n");
>> +             return EFI_LOAD_ERROR;
>> +     }
>
> Otherwise, code-wise this looks fine, but as above I'm not sure if this
> is the right thing to do. :/
>

Yes, let's clear that up first.

-- 
Ard.

Mark Rutland July 1, 2016, 3:46 p.m. UTC | #3
On Fri, Jul 01, 2016 at 05:31:33PM +0200, Ard Biesheuvel wrote:
> On 1 July 2016 at 17:22, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Fri, Jul 01, 2016 at 05:01:30PM +0200, Ard Biesheuvel wrote:
> >> It is the firmware's job to clear any pending SErrors before entering
> >> the kernel. On UEFI, we can fail gracefully rather than panic during
> >> early boot, so check for this condition in the stub.
> >>
> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >
> > An SError could be triggered either asynchronously by FW, or as a result
> > of our actions at any point after this, e.g. due to the filesystem
> > accesses made to load an initrd.
> >
> > So in practice, is checking here useful? Have we seen FW with masked but
> > pending SError at the point we enter the stub rather than that SError
> > being triggered later?
> 
> Yes. EDK2 keeps SError masked throughout its execution by default, and
> so any condition that triggered an SError up till this point is likely
> to still be pending, and blow up the kernel as soon as it unmasks it.

Ok.

> > I'm also not sure what this means for CPER, which may use SError to
> > signal to the OS. It's possible that the UEFI implementation polls
> > ISR_EL1 itself, and handles SError appropriately internally, or that the
> > OS can later deal with the SError based on CPER and friends.
> 
> Currently, the kernel panics on an SError, and so what the kernel
> should do once we start dealing with them in a more sophisticated way
> is hypothetical at the moment. Once that code arrives, it may revert
> this change, but for now, being dropped back into the UEFI shell does
> sound more appealing than panic early imo.

Logging something while the UART is available is certainly appealing.

As you say, we can change this later if/when we have more advanced
SError handling. So modulo my prior comments, I guess this is fine for
now.

Thanks,
Mark.

Ard Biesheuvel July 2, 2016, 10:14 a.m. UTC | #4
On 1 July 2016 at 17:46, Mark Rutland <mark.rutland@arm.com> wrote:
> On Fri, Jul 01, 2016 at 05:31:33PM +0200, Ard Biesheuvel wrote:
>> On 1 July 2016 at 17:22, Mark Rutland <mark.rutland@arm.com> wrote:
>> > On Fri, Jul 01, 2016 at 05:01:30PM +0200, Ard Biesheuvel wrote:
>> >> It is the firmware's job to clear any pending SErrors before entering
>> >> the kernel. On UEFI, we can fail gracefully rather than panic during
>> >> early boot, so check for this condition in the stub.
>> >>
>> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> >
>> > An SError could be triggered either asynchronously by FW, or as a result
>> > of our actions at any point after this, e.g. due to the filesystem
>> > accesses made to load an initrd.
>> >
>> > So in practice, is checking here useful? Have we seen FW with masked but
>> > pending SError at the point we enter the stub rather than that SError
>> > being triggered later?
>>
>> Yes. EDK2 keeps SError masked throughout its execution by default, and
>> so any condition that triggered an SError up till this point is likely
>> to still be pending, and blow up the kernel as soon as it unmasks it.
>
> Ok.
>
>> > I'm also not sure what this means for CPER, which may use SError to
>> > signal to the OS. It's possible that the UEFI implementation polls
>> > ISR_EL1 itself, and handles SError appropriately internally, or that the
>> > OS can later deal with the SError based on CPER and friends.
>>
>> Currently, the kernel panics on an SError, and so what the kernel
>> should do once we start dealing with them in a more sophisticated way
>> is hypothetical at the moment. Once that code arrives, it may revert
>> this change, but for now, being dropped back into the UEFI shell does
>> sound more appealing than panic early imo.
>
> Logging something while the UART is available is certainly appealing.
>

Not just the UART, the graphical console as well, if the system has one.

> As you say, we can change this later if/when we have more advanced
> SError handling. So modulo my prior comments, I guess this is fine for
> now.
>

OK, thanks.

Patch

diff --git a/drivers/firmware/efi/libstub/arm64-stub.c b/drivers/firmware/efi/libstub/arm64-stub.c
index eae693eb3e91..c7e7396de876 100644
--- a/drivers/firmware/efi/libstub/arm64-stub.c
+++ b/drivers/firmware/efi/libstub/arm64-stub.c
@@ -20,7 +20,14 @@  extern bool __nokaslr;
 
 efi_status_t check_platform_features(efi_system_table_t *sys_table_arg)
 {
-	u64 tg;
+	u64 tg, isr;
+
+	/* check for a pending SError */
+	asm ("mrs %0, isr_el1" : "=r"(isr));
+	if (isr & BIT(8)) {
+		pr_efi_err(sys_table_arg, "Pending SError detected -- aborting\n");
+		return EFI_LOAD_ERROR;
+	}
 
 	/* UEFI mandates support for 4 KB granularity, no need to check */
 	if (IS_ENABLED(CONFIG_ARM64_4K_PAGES))
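
With the two review suggestions applied (read_sysreg() instead of the open-coded mrs, and a named ISR_EL1_A_BIT instead of a bare BIT(8)), the check could look roughly like the sketch below. It is not the follow-up patch. To stay buildable outside the kernel it takes the ISR_EL1 value as a parameter, where the stub would pass read_sysreg(isr_el1). BIT() is redefined locally, and stub_status_t, serror_pending() and check_pending_serror() are hypothetical stand-ins for efi_status_t and the real stub code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(n)		(UINT64_C(1) << (n))

/* Name proposed in the review for sysreg.h; ISR_EL1.A is bit 8. */
#define ISR_EL1_A_BIT	BIT(8)

/* Illustrative status codes only; these are NOT the UEFI EFI_STATUS values. */
typedef enum { STUB_SUCCESS = 0, STUB_LOAD_ERROR = 1 } stub_status_t;

/* True if the given ISR_EL1 value reports a pending SError. */
static bool serror_pending(uint64_t isr)
{
	return (isr & ISR_EL1_A_BIT) != 0;
}

/*
 * Mirrors the shape of the check added to check_platform_features():
 * refuse to continue when an SError is already pending on entry.
 * In the stub, isr would be read_sysreg(isr_el1) and the error path
 * would also print a message via pr_efi_err().
 */
static stub_status_t check_pending_serror(uint64_t isr)
{
	if (serror_pending(isr))
		return STUB_LOAD_ERROR;

	return STUB_SUCCESS;
}
```

Passing the register value in keeps the bit test separate from the privileged register read. ISR_EL1 cannot be read from EL0, so the mrs itself is left out of this example on purpose.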