[edk2,2/2] ArmPkg/CpuDxe: unmask SErrors in DEBUG builds

Message ID 1467370393-7703-2-git-send-email-ard.biesheuvel@linaro.org
State Accepted
Commit 3b3593b567761d2568ea70c117c7111fd85e79b9
Headers show

Commit Message

Ard Biesheuvel July 1, 2016, 10:53 a.m.
SErrors (formerly called asynchronous aborts) are a distinct class of
exceptions that are not closely tied to the currently executing
instruction. Since execution may be able to proceed in such a condition,
this class of exception is masked by default, and software needs to unmask
it explicitly if it is prepared to handle such exceptions.

On DEBUG builds, we are well equipped to report the CPU context to the user
and it makes sense to report an SError as soon as it occurs rather than to
wait for the OS to take it when it unmasks them, especially since the current
arm64/Linux implementation simply panics in that case. So unmask them when
ArmCpuDxe loads.

Contributed-under: TianoCore Contribution Agreement 1.0
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 ArmPkg/Drivers/CpuDxe/Exception.c | 9 +++++++++
 1 file changed, 9 insertions(+)

-- 
2.7.4

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Comments

Mark Rutland July 1, 2016, 11:01 a.m. | #1
On Fri, Jul 01, 2016 at 12:53:13PM +0200, Ard Biesheuvel wrote:
> SErrors (formerly called asynchronous aborts) are a distinct class of

> exceptions that are not closely tied to the currently executing

> instruction. Since execution may be able to proceed in such a condition,

> this class of exception is masked by default, and software needs to unmask

> it explicitly if it is prepared to handle such exceptions.

> 

> On DEBUG builds, we are well equipped to report the CPU context to the user

> and it makes sense to report an SError as soon as it occurs rather than to

> wait for the OS to take it when it unmasks them, especially since the current

> arm64/Linux implementation simply panics in that case. So unmask them when

> ArmCpuDxe loads.

> 

> Contributed-under: TianoCore Contribution Agreement 1.0

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  ArmPkg/Drivers/CpuDxe/Exception.c | 9 +++++++++

>  1 file changed, 9 insertions(+)


These look sensible to me. FWIW, for both patches:

Acked-by: Mark Rutland <mark.rutland@arm.com>


Mark.

> 

> diff --git a/ArmPkg/Drivers/CpuDxe/Exception.c b/ArmPkg/Drivers/CpuDxe/Exception.c

> index c3107cd4a6bc..d806a5fdf910 100644

> --- a/ArmPkg/Drivers/CpuDxe/Exception.c

> +++ b/ArmPkg/Drivers/CpuDxe/Exception.c

> @@ -62,6 +62,15 @@ InitializeExceptions (

>      Status = Cpu->EnableInterrupt (Cpu);

>    }

>  

> +  //

> +  // On a DEBUG build, unmask SErrors so they are delivered right away rather

> +  // than when the OS unmasks them. This gives us a better chance of figuring

> +  // out the cause.

> +  //

> +  DEBUG_CODE (

> +    ArmEnableAsynchronousAbort ();

> +  );

> +

>    return Status;

>  }

>  

> -- 

> 2.7.4

> 

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Leif Lindholm July 1, 2016, 11:03 a.m. | #2
On Fri, Jul 01, 2016 at 12:53:13PM +0200, Ard Biesheuvel wrote:
> SErrors (formerly called asynchronous aborts) are a distinct class of

> exceptions that are not closely tied to the currently executing

> instruction. Since execution may be able to proceed in such a condition,

> this class of exception is masked by default, and software needs to unmask

> it explicitly if it is prepared to handle such exceptions.

> 

> On DEBUG builds, we are well equipped to report the CPU context to the user

> and it makes sense to report an SError as soon as it occurs rather than to

> wait for the OS to take it when it unmasks them, especially since the current

> arm64/Linux implementation simply panics in that case. So unmask them when

> ArmCpuDxe loads.


So, I agree with this patch:
Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>


But if we're doing this, I wonder if we should not also do so for
non-DEBUG builds? Do we expect other operating systems to do anything
useful with the exception? If not, hanging when the error occurs
sounds more informative than hanging as soon as you start your OS.

> Contributed-under: TianoCore Contribution Agreement 1.0

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  ArmPkg/Drivers/CpuDxe/Exception.c | 9 +++++++++

>  1 file changed, 9 insertions(+)

> 

> diff --git a/ArmPkg/Drivers/CpuDxe/Exception.c b/ArmPkg/Drivers/CpuDxe/Exception.c

> index c3107cd4a6bc..d806a5fdf910 100644

> --- a/ArmPkg/Drivers/CpuDxe/Exception.c

> +++ b/ArmPkg/Drivers/CpuDxe/Exception.c

> @@ -62,6 +62,15 @@ InitializeExceptions (

>      Status = Cpu->EnableInterrupt (Cpu);

>    }

>  

> +  //

> +  // On a DEBUG build, unmask SErrors so they are delivered right away rather

> +  // than when the OS unmasks them. This gives us a better chance of figuring

> +  // out the cause.

> +  //

> +  DEBUG_CODE (

> +    ArmEnableAsynchronousAbort ();

> +  );

> +

>    return Status;

>  }

>  

> -- 

> 2.7.4

> 

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel July 1, 2016, 11:21 a.m. | #3
On 1 July 2016 at 13:03, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Fri, Jul 01, 2016 at 12:53:13PM +0200, Ard Biesheuvel wrote:

>> SErrors (formerly called asynchronous aborts) are a distinct class of

>> exceptions that are not closely tied to the currently executing

>> instruction. Since execution may be able to proceed in such a condition,

>> this class of exception is masked by default, and software needs to unmask

>> it explicitly if it is prepared to handle such exceptions.

>>

>> On DEBUG builds, we are well equipped to report the CPU context to the user

>> and it makes sense to report an SError as soon as it occurs rather than to

>> wait for the OS to take it when it unmasks them, especially since the current

>> arm64/Linux implementation simply panics in that case. So unmask them when

>> ArmCpuDxe loads.

>

> So, I agree with this patch:

> Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

>

> But if we're doing this, I wonder if we should not also do so for

> non-DEBUG builds? Do we expect other operating systems to do anything

> useful with the exception? If not, hanging when the error occurs

> sounds more informative than hanging as soon as you start your OS.

>


Well, the point is that we send debug output only to the serial port,
and you can reasonably expect someone running a DEBUG build to be
monitoring its output. On a RELEASE build, it makes more sense for the
OS to handle it (although panicking as early as arm64 linux does is
not very helpful either) since it will hang without any feedback to
the UEFI console (even the RELEASE code in the default exception
handler writes to the serial port directly rather than to UEFI's
console)
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Leif Lindholm July 1, 2016, 1:48 p.m. | #4
On Fri, Jul 01, 2016 at 01:21:36PM +0200, Ard Biesheuvel wrote:
> On 1 July 2016 at 13:03, Leif Lindholm <leif.lindholm@linaro.org> wrote:

> > On Fri, Jul 01, 2016 at 12:53:13PM +0200, Ard Biesheuvel wrote:

> >> SErrors (formerly called asynchronous aborts) are a distinct class of

> >> exceptions that are not closely tied to the currently executing

> >> instruction. Since execution may be able to proceed in such a condition,

> >> this class of exception is masked by default, and software needs to unmask

> >> it explicitly if it is prepared to handle such exceptions.

> >>

> >> On DEBUG builds, we are well equipped to report the CPU context to the user

> >> and it makes sense to report an SError as soon as it occurs rather than to

> >> wait for the OS to take it when it unmasks them, especially since the current

> >> arm64/Linux implementation simply panics in that case. So unmask them when

> >> ArmCpuDxe loads.

> >

> > So, I agree with this patch:

> > Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

> >

> > But if we're doing this, I wonder if we should not also do so for

> > non-DEBUG builds? Do we expect other operating systems to do anything

> > useful with the exception? If not, hanging when the error occurs

> > sounds more informative than hanging as soon as you start your OS.

> >

> 

> Well, the point is that we send debug output only to the serial port,

> and you can reasonably expect someone running a DEBUG build to be

> monitoring its output. On a RELEASE build, it makes more sense for the

> OS to handle it (although panicking as early as arm64 linux does is

> not very helpful either) since it will hang without any feedback to

> the UEFI console (even the RELEASE code in the default exception

> handler writes to the serial port directly rather than to UEFI's

> console)


No, I follow that logic completely.
I agree it is a lot less useful without the debug output.

However, I am asking if it is still not providing more information to
the user if the firmware hangs when an untraceable exception happens
during boot, than if the operating system hangs at a specific point
regardless where the SError was triggered.

If we say "other operating systems may handle this better, or at least
be able to successfully boot", then that is a decent argument to only
ASSERT on DEBUG.

/
    Leif
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel July 1, 2016, 1:57 p.m. | #5
On 1 July 2016 at 15:48, Leif Lindholm <leif.lindholm@linaro.org> wrote:
> On Fri, Jul 01, 2016 at 01:21:36PM +0200, Ard Biesheuvel wrote:

>> On 1 July 2016 at 13:03, Leif Lindholm <leif.lindholm@linaro.org> wrote:

>> > On Fri, Jul 01, 2016 at 12:53:13PM +0200, Ard Biesheuvel wrote:

>> >> SErrors (formerly called asynchronous aborts) are a distinct class of

>> >> exceptions that are not closely tied to the currently executing

>> >> instruction. Since execution may be able to proceed in such a condition,

>> >> this class of exception is masked by default, and software needs to unmask

>> >> it explicitly if it is prepared to handle such exceptions.

>> >>

>> >> On DEBUG builds, we are well equipped to report the CPU context to the user

>> >> and it makes sense to report an SError as soon as it occurs rather than to

>> >> wait for the OS to take it when it unmasks them, especially since the current

>> >> arm64/Linux implementation simply panics in that case. So unmask them when

>> >> ArmCpuDxe loads.

>> >

>> > So, I agree with this patch:

>> > Reviewed-by: Leif Lindholm <leif.lindholm@linaro.org>

>> >

>> > But if we're doing this, I wonder if we should not also do so for

>> > non-DEBUG builds? Do we expect other operating systems to do anything

>> > useful with the exception? If not, hanging when the error occurs

>> > sounds more informative than hanging as soon as you start your OS.

>> >

>>

>> Well, the point is that we send debug output only to the serial port,

>> and you can reasonably expect someone running a DEBUG build to be

>> monitoring its output. On a RELEASE build, it makes more sense for the

>> OS to handle it (although panicking as early as arm64 linux does is

>> not very helpful either) since it will hang without any feedback to

>> the UEFI console (even the RELEASE code in the default exception

>> handler writes to the serial port directly rather than to UEFI's

>> console)

>

> No, I follow that logic completely.

> I agree it is a lot less useful without the debug output.

>

> However, I am asking if it is still not providing more information to

> the user if the firmware hangs when an untraceable exception happens

> during boot, than if the operating system hangs at a specific point

> regardless where the SError was triggered.

>

> If we say "other operating systems may handle this better, or at least

> be able to successfully boot", then that is a decent argument to only

> ASSERT on DEBUG.

>


I think it is fundamentally the firmware's job to deal with these
conditions if it can, but hanging with a black screen is simply not
very helpful. On a RELEASE build, being able to make it to the UEFI
shell or to a setup screen that at least tells you which version of
the firmware you are running is probably more useful than knowing that
the SError occurred very early on.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Patch

diff --git a/ArmPkg/Drivers/CpuDxe/Exception.c b/ArmPkg/Drivers/CpuDxe/Exception.c
index c3107cd4a6bc..d806a5fdf910 100644
--- a/ArmPkg/Drivers/CpuDxe/Exception.c
+++ b/ArmPkg/Drivers/CpuDxe/Exception.c
@@ -62,6 +62,15 @@  InitializeExceptions (
     Status = Cpu->EnableInterrupt (Cpu);
   }
 
+  //
+  // On a DEBUG build, unmask SErrors so they are delivered right away rather
+  // than when the OS unmasks them. This gives us a better chance of figuring
+  // out the cause.
+  //
+  DEBUG_CODE (
+    ArmEnableAsynchronousAbort ();
+  );
+
   return Status;
 }