diff mbox series

[edk2] ArmPkg/TimerDxe: Always perform an EOI, even for spurious interrupts

Message ID 20180306132424.25961-1-ard.biesheuvel@linaro.org
State Accepted
Commit 5e3719aeaef198f36808a5e53a1f5bb23762e3a5
Headers show
Series [edk2] ArmPkg/TimerDxe: Always perform an EOI, even for spurious interrupts | expand

Commit Message

Ard Biesheuvel March 6, 2018, 1:24 p.m. UTC
From: Marc Zyngier <marc.zyngier@arm.com>


The generic timer driver only EOIs the timer interrupt if
the ISTATUS bit is set. This is completely fine if you pretend
that spurious interrupts do not exist. But as a matter of fact,
they do, and the first one will leave the interrupt activated
at the GIC level, making sure that no other interrupt can make
it anymore.

Making sure that each interrupt Ack is paired with an EOI is the
way to go. Oh, and enabling the interrupt each time it is taken
is completely pointless. We entered this function for a good
reason...

Contributed-under: TianoCore Contribution Agreement 1.1
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

---
 ArmPkg/Drivers/TimerDxe/TimerDxe.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

-- 
2.11.0

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Comments

Laszlo Ersek March 6, 2018, 2:25 p.m. UTC | #1
On 03/06/18 14:24, Ard Biesheuvel wrote:
> From: Marc Zyngier <marc.zyngier@arm.com>

>

> The generic timer driver only EOIs the timer interrupt if

> the ISTATUS bit is set. This is completely fine if you pretend

> that spurious interrupts do not exist. But as a matter of fact,

> they do, and the first one will leave the interrupt activated

> at the GIC level, making sure that no other interrupt can make

> it anymore.

>

> Making sure that each interrupt Ack is paired with an EOI is the

> way to go. Oh, and enabling the interrupt each time it is taken

> is completely pointless. We entered this function for a good

> reason...

>

> Contributed-under: TianoCore Contribution Agreement 1.1

> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---

>  ArmPkg/Drivers/TimerDxe/TimerDxe.c | 10 ++++------

>  1 file changed, 4 insertions(+), 6 deletions(-)

>

> diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> index 2416c90f5545..33d7c922221f 100644

> --- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> +++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> @@ -306,12 +306,13 @@ TimerInterruptHandler (

>    //

>    OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);

>

> +  // Signal end of interrupt early to help avoid losing subsequent ticks

> +  // from long duration handlers

> +  gInterrupt->EndOfInterrupt (gInterrupt, Source);

> +

>    // Check if the timer interrupt is active

>    if ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {

>

> -    // Signal end of interrupt early to help avoid losing subsequent ticks from long duration handlers

> -    gInterrupt->EndOfInterrupt (gInterrupt, Source);

> -

>      if (mTimerNotifyFunction) {

>        mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);

>      }

> @@ -339,9 +340,6 @@ TimerInterruptHandler (

>      ArmGenericTimerEnableTimer ();

>    }

>

> -  // Enable timer interrupts

> -  gInterrupt->EnableInterruptSource (gInterrupt, Source);

> -

>    gBS->RestoreTPL (OriginalTPL);

>  }

>

>


It is spooky to see this patch on edk2-devel :)

In RHEL-7 ArmVirtQemu (aka "AAVMF"), we had carried a somewhat similar
patch, for a long time.

I originally added it in Feb 2015, after we rebased our aarch64 host
kernel from 3.18.<whatever> to 3.19.<somethingelse>. That host kernel
(incl. KVM) rebase broke guest UEFI *reboot*. We figured it was a KVM
regression -- but, because we couldn't figure out the exact KVM problem,
we plastered it over in the guest firmware, a bit similarly to the
above. This was done for RHBZ#1188054.

We reported the issue to several KVM people (off-list -- maybe that
wasn't a great idea, sorry!); you might find the thread under msgid
<54D3796E.1040907@redhat.com>, subject "virtual timer interrupt stuck
across guest firmware reboot on KVM", in your 2015 archives :)

At the same time, we filed RHBZ#1189429 for tracking the KVM problem, so
that once the host kernel got fixed, we could backport those patches,
and revert the guest firmware kludge.

The host kernel (KVM) fix was
<https://marc.info/?i=1440942866-23802-1-git-send-email-christoffer.dall@linaro.org>:

  [PATCH 0/9] Rework architected timer and fix UEFI reset

And, indeed we dropped the "AAVMF" kludge (which revert was separately
tracked under RHBZ#1259395).

So it looks like spurious timer interrupts exists on phys hardware as
well -- I guess I should have attempted to upstream the guest fw patch
back then, after all. We might have ended up with an improved version of
it that could have now covered this recent symptom too, perhaps.

Quoting my downstream patch under my sig for reference.

For this patch:

Reviewed-by: Laszlo Ersek <lersek@redhat.com>


Thanks
Laszlo

PS: some of the RHBZs listed above are "private" -- sorry about that,
the rules are arcane, and for mere developers like yours truly, there's
a certain element of "better safe than sorry" to it.

> From 58351b83d3f8badb8f0e44f70b06cfbba9644d20 Mon Sep 17 00:00:00 2001

> From: Laszlo Ersek <lersek@redhat.com>

> Date: Thu, 5 Feb 2015 17:53:53 +0100

> Subject: [RHELSA AAVMF PATCH 1/1] ArmPkg: TimerDxe: smack down spurious timer

>  interrupt (RHELSA hack)

>

> Message-id: <1423155233-22668-2-git-send-email-lersek@redhat.com>

> Patchwork-id: 63732

> O-Subject:  [RHELSA AAVMF PATCH 1/1] ArmPkg: TimerDxe: smack down spurious timer

> 	interrupt (RHELSA hack)

> Bugzilla: 1188054

> Acked-by: Miroslav Rezanina <mrezanin@redhat.com>

> Acked-by: wei@redhat.com

> Acked-by: Andrew Jones <drjones@redhat.com>

>

> When an aarch64 KVM guest is rebooted with the UEFI shell command "reset

> -c", the instance of AAVMF that runs after the reboot hangs in the timer

> driver. We tracked this issue unto a virtual timer interrupt that remains

> stuck / asserted across the reboot for some reason, and is delivered to

> the handler function in TimerDxe *before* TimerDxe is done setting up data

> for the handler and unmasks the interrupt. This happens to cause an

> infinite loop in the handler.

>

> The above situation should never emerge (an interrupt, stuck or not,

> should never be delivered while it is masked). This could be a KVM bug --

> which we track with clone bug 1189429 and reported privately to ARM &

> Linaro people --, but until the root cause gets sorted out, we need to

> move forward with AAVMF. Attempts to clear interrupt lines up-front have

> borne no fruit. Hence we enable the handler to cope with a spurious

> interrupt.

>

> Downstream only. Should be reverted when 1189429 is fixed.

>

> Signed-off-by: Laszlo Ersek <lersek@redhat.com>

> ---

>  ArmPkg/Drivers/TimerDxe/TimerDxe.c | 7 +++++++

>  1 file changed, 7 insertions(+)

>

> diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> index d0a819fc2729..f57b7acf424f 100644

> --- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> +++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

> @@ -313,6 +313,12 @@ TimerInterruptHandler (

>      // Signal end of interrupt early to help avoid losing subsequent ticks from long duration handlers

>      gInterrupt->EndOfInterrupt (gInterrupt, Source);

>

> +    if (mTimerPeriod == 0) {

> +      DEBUG ((EFI_D_WARN, "%a: hack: this should never happen; dodged spurious"

> +        " interrupt 0x%Lx\n", __FUNCTION__, (UINT64)Source));

> +      goto Done;

> +    }

> +

>      if (mTimerNotifyFunction) {

>        mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);

>      }

> @@ -340,6 +346,7 @@ TimerInterruptHandler (

>      ArmGenericTimerEnableTimer ();

>    }

>

> +Done:

>    // Enable timer interrupts

>    gInterrupt->EnableInterruptSource (gInterrupt, Source);

>

> --

> 2.14.1.3.gb7cf6e02401b

>

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel March 6, 2018, 2:39 p.m. UTC | #2
On 6 March 2018 at 14:25, Laszlo Ersek <lersek@redhat.com> wrote:
> On 03/06/18 14:24, Ard Biesheuvel wrote:

>> From: Marc Zyngier <marc.zyngier@arm.com>

>>

>> The generic timer driver only EOIs the timer interrupt if

>> the ISTATUS bit is set. This is completely fine if you pretend

>> that spurious interrupts do not exist. But as a matter of fact,

>> they do, and the first one will leave the interrupt activated

>> at the GIC level, making sure that no other interrupt can make

>> it anymore.

>>

>> Making sure that each interrupt Ack is paired with an EOI is the

>> way to go. Oh, and enabling the interrupt each time it is taken

>> is completely pointless. We entered this function for a good

>> reason...

>>

>> Contributed-under: TianoCore Contribution Agreement 1.1

>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

>> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>> ---

>>  ArmPkg/Drivers/TimerDxe/TimerDxe.c | 10 ++++------

>>  1 file changed, 4 insertions(+), 6 deletions(-)

>>

>> diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> index 2416c90f5545..33d7c922221f 100644

>> --- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> +++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> @@ -306,12 +306,13 @@ TimerInterruptHandler (

>>    //

>>    OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);

>>

>> +  // Signal end of interrupt early to help avoid losing subsequent ticks

>> +  // from long duration handlers

>> +  gInterrupt->EndOfInterrupt (gInterrupt, Source);

>> +

>>    // Check if the timer interrupt is active

>>    if ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {

>>

>> -    // Signal end of interrupt early to help avoid losing subsequent ticks from long duration handlers

>> -    gInterrupt->EndOfInterrupt (gInterrupt, Source);

>> -

>>      if (mTimerNotifyFunction) {

>>        mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);

>>      }

>> @@ -339,9 +340,6 @@ TimerInterruptHandler (

>>      ArmGenericTimerEnableTimer ();

>>    }

>>

>> -  // Enable timer interrupts

>> -  gInterrupt->EnableInterruptSource (gInterrupt, Source);

>> -

>>    gBS->RestoreTPL (OriginalTPL);

>>  }

>>

>>

>

> It is spooky to see this patch on edk2-devel :)

>


:-)

> In RHEL-7 ArmVirtQemu (aka "AAVMF"), we had carried a somewhat similar

> patch, for a long time.

>

> I originally added it in Feb 2015, after we rebased our aarch64 host

> kernel from 3.18.<whatever> to 3.19.<somethingelse>. That host kernel

> (incl. KVM) rebase broke guest UEFI *reboot*. We figured it was a KVM

> regression -- but, because we couldn't figure out the exact KVM problem,

> we plastered it over in the guest firmware, a bit similarly to the

> above. This was done for RHBZ#1188054.

>

> We reported the issue to several KVM people (off-list -- maybe that

> wasn't a great idea, sorry!); you might find the thread under msgid

> <54D3796E.1040907@redhat.com>, subject "virtual timer interrupt stuck

> across guest firmware reboot on KVM", in your 2015 archives :)

>

> At the same time, we filed RHBZ#1189429 for tracking the KVM problem, so

> that once the host kernel got fixed, we could backport those patches,

> and revert the guest firmware kludge.

>

> The host kernel (KVM) fix was

> <https://marc.info/?i=1440942866-23802-1-git-send-email-christoffer.dall@linaro.org>:

>

>   [PATCH 0/9] Rework architected timer and fix UEFI reset

>

> And, indeed we dropped the "AAVMF" kludge (which revert was separately

> tracked under RHBZ#1259395).

>

> So it looks like spurious timer interrupts exists on phys hardware as

> well -- I guess I should have attempted to upstream the guest fw patch

> back then, after all. We might have ended up with an improved version of

> it that could have now covered this recent symptom too, perhaps.

>


I was never aware of any such issues at any point, so a head's up
would have been nice, yes :-)

> Quoting my downstream patch under my sig for reference.

>

> For this patch:

>

> Reviewed-by: Laszlo Ersek <lersek@redhat.com>

>


Thanks all

Pushed as 5e3719aeaef1
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Marc Zyngier March 6, 2018, 2:48 p.m. UTC | #3
On 06/03/18 14:25, Laszlo Ersek wrote:
> On 03/06/18 14:24, Ard Biesheuvel wrote:

>> From: Marc Zyngier <marc.zyngier@arm.com>

>>

>> The generic timer driver only EOIs the timer interrupt if

>> the ISTATUS bit is set. This is completely fine if you pretend

>> that spurious interrupts do not exist. But as a matter of fact,

>> they do, and the first one will leave the interrupt activated

>> at the GIC level, making sure that no other interrupt can make

>> it anymore.

>>

>> Making sure that each interrupt Ack is paired with an EOI is the

>> way to go. Oh, and enabling the interrupt each time it is taken

>> is completely pointless. We entered this function for a good

>> reason...

>>

>> Contributed-under: TianoCore Contribution Agreement 1.1

>> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

>> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

>> ---

>>  ArmPkg/Drivers/TimerDxe/TimerDxe.c | 10 ++++------

>>  1 file changed, 4 insertions(+), 6 deletions(-)

>>

>> diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> index 2416c90f5545..33d7c922221f 100644

>> --- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> +++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c

>> @@ -306,12 +306,13 @@ TimerInterruptHandler (

>>    //

>>    OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);

>>

>> +  // Signal end of interrupt early to help avoid losing subsequent ticks

>> +  // from long duration handlers

>> +  gInterrupt->EndOfInterrupt (gInterrupt, Source);

>> +

>>    // Check if the timer interrupt is active

>>    if ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {

>>

>> -    // Signal end of interrupt early to help avoid losing subsequent ticks from long duration handlers

>> -    gInterrupt->EndOfInterrupt (gInterrupt, Source);

>> -

>>      if (mTimerNotifyFunction) {

>>        mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);

>>      }

>> @@ -339,9 +340,6 @@ TimerInterruptHandler (

>>      ArmGenericTimerEnableTimer ();

>>    }

>>

>> -  // Enable timer interrupts

>> -  gInterrupt->EnableInterruptSource (gInterrupt, Source);

>> -

>>    gBS->RestoreTPL (OriginalTPL);

>>  }

>>

>>

> 

> It is spooky to see this patch on edk2-devel :)

> 

> In RHEL-7 ArmVirtQemu (aka "AAVMF"), we had carried a somewhat similar

> patch, for a long time.

> 

> I originally added it in Feb 2015, after we rebased our aarch64 host

> kernel from 3.18.<whatever> to 3.19.<somethingelse>. That host kernel

> (incl. KVM) rebase broke guest UEFI *reboot*. We figured it was a KVM

> regression -- but, because we couldn't figure out the exact KVM problem,

> we plastered it over in the guest firmware, a bit similarly to the

> above. This was done for RHBZ#1188054.

> 

> We reported the issue to several KVM people (off-list -- maybe that

> wasn't a great idea, sorry!); you might find the thread under msgid


Yup. In general, not reporting bugs on the ML ends up in missing fixes
and duplicated work. Not the best use of an already limited resource.

> <54D3796E.1040907@redhat.com>, subject "virtual timer interrupt stuck

> across guest firmware reboot on KVM", in your 2015 archives :)


I only have a single trace of this as Christoffer cc'd me on on reply. I
wasn't cc'd on the previous emails, nor on the follow-up if it ever existed.

> At the same time, we filed RHBZ#1189429 for tracking the KVM problem, so

> that once the host kernel got fixed, we could backport those patches,

> and revert the guest firmware kludge.

> 

> The host kernel (KVM) fix was

> <https://marc.info/?i=1440942866-23802-1-git-send-email-christoffer.dall@linaro.org>:

> 

>   [PATCH 0/9] Rework architected timer and fix UEFI reset

> 

> And, indeed we dropped the "AAVMF" kludge (which revert was separately

> tracked under RHBZ#1259395).

> 

> So it looks like spurious timer interrupts exists on phys hardware as

> well -- I guess I should have attempted to upstream the guest fw patch


Not only timer interrupts. Any interrupt can be spurious, depending on
how quickly your interrupt controller can retire an interrupt that
hasn't been acknowledged already. Propagation delays and all that, and
assuming a level interrupt. For an edge interrupt, you cannot retire it,
full stop (because you cannot "unwrite" something).

> back then, after all. We might have ended up with an improved version of

> it that could have now covered this recent symptom too, perhaps.

> 

> Quoting my downstream patch under my sig for reference.

> 

> For this patch:

> 

> Reviewed-by: Laszlo Ersek <lersek@redhat.com>


Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
diff mbox series

Patch

diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c b/ArmPkg/Drivers/TimerDxe/TimerDxe.c
index 2416c90f5545..33d7c922221f 100644
--- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c
+++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c
@@ -306,12 +306,13 @@  TimerInterruptHandler (
   //
   OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);
 
+  // Signal end of interrupt early to help avoid losing subsequent ticks
+  // from long duration handlers
+  gInterrupt->EndOfInterrupt (gInterrupt, Source);
+
   // Check if the timer interrupt is active
   if ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {
 
-    // Signal end of interrupt early to help avoid losing subsequent ticks from long duration handlers
-    gInterrupt->EndOfInterrupt (gInterrupt, Source);
-
     if (mTimerNotifyFunction) {
       mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);
     }
@@ -339,9 +340,6 @@  TimerInterruptHandler (
     ArmGenericTimerEnableTimer ();
   }
 
-  // Enable timer interrupts
-  gInterrupt->EnableInterruptSource (gInterrupt, Source);
-
   gBS->RestoreTPL (OriginalTPL);
 }