[edk2] ArmPkg: Fix double GIC EIOR write per interrupt

Message ID CAKv+Gu-bU_rU1FaZ=9_L5FKAP8S9jp8GXjVqxLoZyVX6p9tyEg@mail.gmail.com
State New
Headers show

Commit Message

Ard Biesheuvel Aug. 8, 2016, 11:51 a.m.
On 8 August 2016 at 13:08, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 8 August 2016 at 13:06, Alexei Fedorov <Alexei.Fedorov@arm.com> wrote:

>> Timer's pending interrupt is cleared  HARDWARE_INTERRUPT_HANDLER TimerInterruptHandler()  in when timer is re-programmed for the next shot.

>

> Yes, that is the timer side.

>

>> Does it mean that TimerDxe driver is part EFI_HARDWARE_INTERRUPT_PROTOCOL?

>>

>

> No. The peripheral and the GIC each have their own mask and enable

> registers for the interrupt. That is exactly why the comment describes

> in detail which part is the responsibility of the GIC, and which is

> the responsibility of the peripheral.

>


... and actually, looking at TimerInterruptHandler (), I don't see the
point of re-enabling the interrupt early, given that

308: OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);

disables interrupts globally, and only re-enables them on line 346, at
which point the mTimerNotifyFunction() has already returned.

So I propose we simply do

       mTimerNotifyFunction (mTimerPeriod * mElapsedPeriod);

so that if the condition exists that we know will trigger the
interrupt immediately as soon as we unmask it, we simply enter the
loop again just like we would when taking the [nested] interrupt.

@Heyi: any thoughts?

-- 
Ard.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Comments

Ard Biesheuvel Aug. 8, 2016, 1:42 p.m. | #1
On 8 August 2016 at 15:22, Cohen, Eugene <eugene@hp.com> wrote:
> Guys, sorry to join so late, something about timezones...  Let me try to provide some context and history.

>

>> > it does change the contract we have with registered interrupt handlers

>>

>> Looks like it does not:

>> From edk2\EmbeddedPkg\Include\Protocol\HardwareInterrupt.h:

>>

>> " Abstraction for hardware based interrupt routine

>>

>>   ...The driver implementing

>>   this protocol is responsible for clearing the pending interrupt in the

>>   interrupt routing hardware. The HARDWARE_INTERRUPT_HANDLER is

>> responsible

>>   for clearing interrupt sources from individual devices."

>

> I think you are reading more deeply into this verbiage than was intended.  From a separation-of-concerns perspective one driver is concerned with writing to the hardware that generates the interrupt (handler) and another is concerned with writing to the hardware for the interrupt controller to signal the end of interrupt.  So all this is saying is that "the code that touches the interrupt controller is implemented in the driver that publishes this protocol".  It does not say how this code is activated, only who is responsible for poking the register.

>

> The historical expectation is that the handler driver calls the EOI interface in the protocol.  (If it was the opposite then this interface wouldn't even exist since the interrupt controller driver could just do it implicitly.)  You're next question will probably be why it was designed this way - for that we'll have to ask Andrew Fish (added).

>

> I did a little digging and see that the PC-AT chipset implemented an 8259 interrupt protocol (IntelFrameworkPkg\Include\Protocol\Legacy8259.h) that is quite similar to HwInterrupt.  Notice the explicit EndOfInterrupt interface here and how it's used by the timer driver at PcAtChipsetPkg\8254TimerDxe\Timer.c(88).

>

> Given this I asked that you keep the EndOfInterrupt in the handler driver(s) and remove the auto-EOI in the interrupt controller driver, at least for cases where a driver handled the interrupt.

>

> Feel free to clarify the text in the protocol header to align with this - the current wording is not very clear.

>


Thanks for the context. I did some archaeology of my own, and it turns
out that this code was introduced by Andrew in git commit
1bfda055dfbc52678 (svn #11293) more than 5 years ago.

In any case, it appears we are in agreement that it is in fact
incorrect (and deviates from the other implementations) to signal EOI
in the GIC driver, and so I suppose Alexei's patch is good (and we
only need to clarify the comment that he quoted in this thread).

My primary concern was that we change the contract with existing
handlers, but if there was a contract to begin with, we were already
violating it, and so any out of tree breakage is not really our
problem :-)

Thanks all,
Ard.
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Ard Biesheuvel Aug. 8, 2016, 1:50 p.m. | #2
On 8 August 2016 at 15:42, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 8 August 2016 at 15:22, Cohen, Eugene <eugene@hp.com> wrote:

>> Guys, sorry to join so late, something about timezones...  Let me try to provide some context and history.

>>

>>> > it does change the contract we have with registered interrupt handlers

>>>

>>> Looks like it does not:

>>> From edk2\EmbeddedPkg\Include\Protocol\HardwareInterrupt.h:

>>>

>>> " Abstraction for hardware based interrupt routine

>>>

>>>   ...The driver implementing

>>>   this protocol is responsible for clearing the pending interrupt in the

>>>   interrupt routing hardware. The HARDWARE_INTERRUPT_HANDLER is

>>> responsible

>>>   for clearing interrupt sources from individual devices."

>>

>> I think you are reading more deeply into this verbiage than was intended.  From a separation-of-concerns perspective one driver is concerned with writing to the hardware that generates the interrupt (handler) and another is concerned with writing to the hardware for the interrupt controller to signal the end of interrupt.  So all this is saying is that "the code that touches the interrupt controller is implemented in the driver that publishes this protocol".  It does not say how this code is activated, only who is responsible for poking the register.

>>

>> The historical expectation is that the handler driver calls the EOI interface in the protocol.  (If it was the opposite then this interface wouldn't even exist since the interrupt controller driver could just do it implicitly.)  You're next question will probably be why it was designed this way - for that we'll have to ask Andrew Fish (added).

>>

>> I did a little digging and see that the PC-AT chipset implemented an 8259 interrupt protocol (IntelFrameworkPkg\Include\Protocol\Legacy8259.h) that is quite similar to HwInterrupt.  Notice the explicit EndOfInterrupt interface here and how it's used by the timer driver at PcAtChipsetPkg\8254TimerDxe\Timer.c(88).

>>

>> Given this I asked that you keep the EndOfInterrupt in the handler driver(s) and remove the auto-EOI in the interrupt controller driver, at least for cases where a driver handled the interrupt.

>>

>> Feel free to clarify the text in the protocol header to align with this - the current wording is not very clear.

>>

>

> Thanks for the context. I did some archaeology of my own, and it turns

> out that this code was introduced by Andrew in git commit

> 1bfda055dfbc52678 (svn #11293) more than 5 years ago.

>

> In any case, it appears we are in agreement that it is in fact

> incorrect (and deviates from the other implementations) to signal EOI

> in the GIC driver, and so I suppose Alexei's patch is good (and we

> only need to clarify the comment that he quoted in this thread).

>

> My primary concern was that we change the contract with existing

> handlers, but if there was a contract to begin with, we were already

> violating it, and so any out of tree breakage is not really our

> problem :-)

>


Pushed as 7989300df7
_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Evan Lloyd Aug. 10, 2016, 5:38 p.m. | #3
Hi.
(Note - this is not "top-posting", it is a discussion point, with references appended.)

I'd like to re-think our GIC EIOR changes in the light of comments from Heyi Guo. (inserted before Eugene's e-mail below)
Despite Eugene's cogent advocacy of the change, and the fact that Alexei's fix is now accepted, I have come to the conclusion that it is not the best thing to do.  As Heyi points out, it opens a race condition where the Timer interrupt line is still asserted after the GIC EIOR has been written.
The timer IRQ needs to be de-asserted before the EIOR write, or the GIC sees the IRQ and latches it ("active and pending").
This is clearly an error, and a minor mystery is that we do not see that on our Juno boards.

A more elegant solution is for the GIC register access to be done as part of the GIC handling (i.e. revert the GIC code, and remove the EIOR from the Timer handling.)  This also caters for any surreptitious use of other interrupts "under the covers".
As an additional benefit, it clearly partitions the peripheral specific handling from the GIC interface.

This is very much at odds with Eugene's position (which, BTW,  is not a stance I find comfortable).
Further, it implies removing any EIOR writes from extant interrupt handlers - however, since they already have the "double write" problem, that is not really a concern.  (BTW, the GIC spec is very clear that "A write to this register must correspond to the most recent valid read from an Interrupt Acknowledge Register... otherwise the system behavior is UNPREDICTABLE", so the double write is not good.)

Does anyone know of a concrete reason why we should not restore the EOI to the GIC handling and remove it from the timer ISR?
Eugene - you express a strongly reasoned case - can I talk you round here?

Regards,
Evan

> Hi Alexei,

>

> Thanks for your explanation. However, when I tested it on D02, it didn't act as expected, i.e. we did see a subsequent interrupt triggered >immediately when gBS->RestoreTPL was called in which IRQ is enabled, and I had set timer interrupt period to some fairly large value (e.g. 5 >seconds).

>

> So I'd like to confirm, is it declared in GIC specification that clearing interrupt source will also clear pending status in GICD?

>

> Thanks and regards.

>

> Heyi


...
> From: Linaro-uefi [mailto:linaro-uefi-bounces@lists.linaro.org] On Behalf Of Heyi Guo

> Sent: 02 August 2016 02:28

> To: Ard Biesheuvel

> Cc: Linaro UEFI Mailman List

> Subject: Re: [Linaro-uefi] [linaro-uefi] Two write to EOIR in a single interrupt

>

> Hi Leif and Ard,

>

> There might be another issue in timer interrupt handler. Timer interrupt seems to be level triggered, so is it OK to write EOIR before clearing the > > > interrupt source, i.e. updating compare value register?

>

> Heyi


>-----Original Message-----

>From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Ard

>Biesheuvel

>Sent: 08 August 2016 14:51

>To: Cohen, Eugene

>Cc: Alexei Fedorov; edk2-devel@lists.01.org; Leif Lindholm; Andrew Fish

>(afish@apple.com); Heyi Guo

>Subject: Re: [edk2] [PATCH] ArmPkg: Fix double GIC EIOR write per interrupt

>

>On 8 August 2016 at 15:42, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:

>> On 8 August 2016 at 15:22, Cohen, Eugene <eugene@hp.com> wrote:

>>> Guys, sorry to join so late, something about timezones...  Let me try to

>provide some context and history.

>>>

>>>> > it does change the contract we have with registered interrupt handlers

>>>>

>>>> Looks like it does not:

>>>> From edk2\EmbeddedPkg\Include\Protocol\HardwareInterrupt.h:

>>>>

>>>> " Abstraction for hardware based interrupt routine

>>>>

>>>>   ...The driver implementing

>>>>   this protocol is responsible for clearing the pending interrupt in the

>>>>   interrupt routing hardware. The HARDWARE_INTERRUPT_HANDLER is

>>>> responsible

>>>>   for clearing interrupt sources from individual devices."

>>>

>>> I think you are reading more deeply into this verbiage than was intended.

>From a separation-of-concerns perspective one driver is concerned with writing

>to the hardware that generates the interrupt (handler) and another is

>concerned with writing to the hardware for the interrupt controller to signal

>the end of interrupt.  So all this is saying is that "the code that touches the

>interrupt controller is implemented in the driver that publishes this protocol".  It

>does not say how this code is activated, only who is responsible for poking the

>register.

>>>

>>> The historical expectation is that the handler driver calls the EOI interface in

>the protocol.  (If it was the opposite then this interface wouldn't even exist

>since the interrupt controller driver could just do it implicitly.)  You're next

>question will probably be why it was designed this way - for that we'll have to

>ask Andrew Fish (added).

>>>

>>> I did a little digging and see that the PC-AT chipset implemented an 8259

>interrupt protocol (IntelFrameworkPkg\Include\Protocol\Legacy8259.h) that is

>quite similar to HwInterrupt.  Notice the explicit EndOfInterrupt interface here

>and how it's used by the timer driver at

>PcAtChipsetPkg\8254TimerDxe\Timer.c(88).

>>>

>>> Given this I asked that you keep the EndOfInterrupt in the handler driver(s)

>and remove the auto-EOI in the interrupt controller driver, at least for cases

>where a driver handled the interrupt.

>>>

>>> Feel free to clarify the text in the protocol header to align with this - the

>current wording is not very clear.

>>>

>>

>> Thanks for the context. I did some archaeology of my own, and it turns

>> out that this code was introduced by Andrew in git commit

>> 1bfda055dfbc52678 (svn #11293) more than 5 years ago.

>>

>> In any case, it appears we are in agreement that it is in fact

>> incorrect (and deviates from the other implementations) to signal EOI

>> in the GIC driver, and so I suppose Alexei's patch is good (and we

>> only need to clarify the comment that he quoted in this thread).

>>

>> My primary concern was that we change the contract with existing

>> handlers, but if there was a contract to begin with, we were already

>> violating it, and so any out of tree breakage is not really our

>> problem :-)

>>

>

>Pushed as 7989300df7

>_______________________________________________

>edk2-devel mailing list

>edk2-devel@lists.01.org

>https://lists.01.org/mailman/listinfo/edk2-devel


IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel
Evan Lloyd Aug. 10, 2016, 8:31 p.m. | #4
Hi Eugene.
Some responses inline below.

>-----Original Message-----

>From: Cohen, Eugene [mailto:eugene@hp.com]

>Sent: 10 August 2016 20:34

>To: Evan Lloyd; Ard Biesheuvel; Heyi Guo

>Cc: Alexei Fedorov; edk2-devel@lists.01.org; Andrew Fish (afish@apple.com);

>Leif Lindholm

>Subject: RE: [edk2] [PATCH] ArmPkg: Fix double GIC EIOR write per interrupt

>

>Evan,

>

>> I'd like to re-think our GIC EIOR changes in the light of comments from

>> Heyi Guo. (inserted before Eugene's e-mail below)

>> Despite Eugene's cogent advocacy of the change, and the fact that

>> Alexei's fix is now accepted, I have come to the conclusion that it is not

>> the best thing to do.  As Heyi points out, it opens a race condition

>> where the Timer interrupt line is still asserted after the GIC EIOR has

>> been written.

>> The timer IRQ needs to be de-asserted before the EIOR write, or the

>> GIC sees the IRQ and latches it ("active and pending").

>> This is clearly an error, and a minor mystery is that we do not see that

>> on our Juno boards.

>

>According to the GIC architecture spec for level-sensitive interrupts are pending

>only based on the immediate state of the signal, there is no latching, see

>"Control of the pending status of level-sensitive interrupts" in the GICv2

>Architecture Specification:

>

>"

>For an edge-triggered interrupt, the includes pending status is latched on either

>a write to the GICD_ISPENDRn or

>the assertion of the interrupt signal to the GIC. However, for a level-sensitive

>interrupt, the includes pending status

>either:

>  * is latched on a write to the GICD_ISPENDRn

>  * follows the state of the interrupt signal to the GIC, without any latching.

>"

>

>So there's no "memory" for level sensitive interrupts and an edge triggered

>interrupt will only latch on an edge, not an EOI so I'm not sure there's an issue.


My description was not very clear, and the point is academic if you are happy with the solution.  However:
The GIC spec has a state machine diagram (Figure 4.3), where:
Transition D, pending to active and pending
This transition occurs on acknowledgement of the interrupt by the PE for level-sensitive SPIs, SGIs,
and PPIs.
Transition B1 or B2, remove pending state
This transition occurs when the interrupt has been deasserted by the peripheral, if the interrupt is a
level-sensitive interrupt, or when software has changed the pending state.
Transition E1 or E2, remove active state
This transition occurs when software deactivates an interrupt for SPIs, SGIs, and PPIs.

I suspect that because we EOI ("deactivate" for E1/E2) without "deasserting" the peripheral interrupt then the GIC may restore the pending state (transition E2 instead of B2 then E1), which will look remarkably like a latch.


>

>From what I can tell the primary purpose of EOI is for interrupt priority

>management - when you issue the EOI a priority drop occurs allowing the next-

>highest priority interrupt to be serviced, if any.  But since we are not making

>use of nested interrupts (i.e. IRQ is masked the whole time during interrupt

>processing) I don't think the EOI sequencing matters.  This is based on about 10

>minutes of me reading this GIC spec so please correct any confusion I may

>have.  I agree the double-EOI has to go away no matter what.

>

>> This is very much at odds with Eugene's position (which, BTW,  is not a

>> stance I find comfortable).

>> Further, it implies removing any EIOR writes from extant interrupt

>> handlers - however, since they already have the "double write"

>> problem, that is not really a concern.  (BTW, the GIC spec is very clear

>> that "A write to this register must correspond to the most recent valid

>> read from an Interrupt Acknowledge Register... otherwise the system

>> behavior is UNPREDICTABLE", so the double write is not good.)

>

>I did some detective work and the double-write problem goes way back to the

>BeagleBoard/Omap35xx code from 2010 where the NEWIRQAGR register was

>written multiple times as well.  This code actually writes the EOI-like

>NEWIRQAGR register two times in addition to the additional EOI interface

>write:

>

>  // Needed to prevent infinite nesting when Time Driver lowers TPL

>  MmioWrite32 (INTCPS_CONTROL, INTCPS_CONTROL_NEWIRQAGR);

>  ArmDataSynchronizationBarrier ();

>

>  InterruptHandler = gRegisteredInterruptHandlers[Vector];

>  if (InterruptHandler != NULL) {

>    // Call the registered interrupt handler.

>    InterruptHandler (Vector, SystemContext);

>  }

>

>  // Needed to clear after running the handler

>  MmioWrite32 (INTCPS_CONTROL, INTCPS_CONTROL_NEWIRQAGR);

>  ArmDataSynchronizationBarrier ();

>

>Note the comment about infinite nesting - I don't know what this is referring to

>since I would expect TPL to remain at HIGH_LEVEL for the duration of timer

>interrupt processing.  I'd really like to get Mr. Fish to chime in on this one.

>

>> A more elegant solution is for the GIC register access to be done as

>> part of the GIC handling (i.e. revert the GIC code, and remove the EIOR

>> from the Timer handling.)  This also caters for any surreptitious use of

>> other interrupts "under the covers".

>> As an additional benefit, it clearly partitions the peripheral specific

>> handling from the GIC interface.

>

>I have no specific objection to this - let's just find a way to do this that

>maintains compatibility with the existing HardwareInterrupt protocol definition

>- maybe return EFI_UNSUPPORTED (or perhaps just EFI_SUCCESS) for the EOI

>protocol interface on systems that don't want to expose this functionality.

>Then an old driver could try to use the EOI interface and on old systems it will

>issue the EOI and on new systems it will do nothing, deferring the real EOI to

>when the ISR returns.


That looks like a pretty sensible way forward.  I'll vote for EFI_SUCCESS, white lies are sometimes needed.

>

>Eugene


IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

_______________________________________________
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel

Patch

diff --git a/ArmPkg/Drivers/TimerDxe/TimerDxe.c
b/ArmPkg/Drivers/TimerDxe/TimerDxe.c
index 1169d426b255..f0fcb05757ac 100644
--- a/ArmPkg/Drivers/TimerDxe/TimerDxe.c
+++ b/ArmPkg/Drivers/TimerDxe/TimerDxe.c
@@ -308,10 +308,7 @@  TimerInterruptHandler (
   OriginalTPL = gBS->RaiseTPL (TPL_HIGH_LEVEL);

   // Check if the timer interrupt is active
-  if ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {
-
-    // Signal end of interrupt early to help avoid losing subsequent
ticks from long duration handlers
-    gInterrupt->EndOfInterrupt (gInterrupt, Source);
+  while ((ArmGenericTimerGetTimerCtrlReg () ) & ARM_ARCH_TIMER_ISTATUS) {

     if (mTimerNotifyFunction) {