diff mbox

[Xen-devel] xen/arm: GIC: unable to inject hw irq warning

Message ID alpine.DEB.2.02.1409081938540.2334@kaball.uk.xensource.com
State New
Headers show

Commit Message

Stefano Stabellini Sept. 8, 2014, 9:34 p.m. UTC
On Sat, 6 Sep 2014, Vijay Kilari wrote:
> On Thu, Sep 4, 2014 at 4:19 AM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 3 Sep 2014, Vijay Kilari wrote:
> >> Hi Stefano,
> >>
> >>  I face following warning messages for LPI/ITS interrupt.
> >> Can you throw some light on this warning?
> >>
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR0
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR0
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR1
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR1
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR1
> >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> >> active in LR0
> >
> > Hello Vijay,
> > the warning is for a condition that should not happen. All the following
> > must be true for the warning to trigger:
> >
> > - the irq is an hardware irq (p->desc != NULL)
> >
> > - the irq is active in the LR register (GICH_LR_ACTIVE)
> >
> > - we need to inject a second irq while the first one is still active
> >
> > The GICv2 specification explicitely states that it is not possible to
> > set an hardware irq as pending and active in the LR registers (5-175).
> 
>    Here in the code we are checking only for LR active state(0x2) and not
> for ''pending and active' (0x3) state.

True. If the irq is GICH_LR_ACTIVE we do not set it GICH_LR_PENDING for
hardware interrupts.


> > Regardless from the GICv2 specification, it shouldn't be possible to
> > receive a second hardware interrupt in Xen while the guest is still
> > handling the first one. So the question is: how is it possible that xen
> > is even trying to inject a second irq while the first one is still
> > active?  I would check how GIC_IRQ_GUEST_QUEUED was set.
> 
> Where in GICv2 specification it is men>tioned?.

It is explained in the general handling of interrupts, 3.2.

"When a processor takes the interrupt exception, it reads the GICC_IAR of its CPU interface to acknowledge
the interrupt. This read returns an Interrupt ID, and for an SGI, the source processor ID, that the processor
uses to select the correct interrupt handler. When it recognizes this read, the GIC changes the state of the
interrupt as follows:
* if the pending state of the interrupt persists when the interrupt becomes active, or if the interrupt is
generated again, from pending to active and pending.
* otherwise, from pending to active"

Later in the same chapter:

"A CPU interface never signals to the connected processor any interrupt
that is active and pending. It only signals interrupts that are pending
and have sufficient priority"


> In case of GICv3 LPI's are edge triggered and there is no active state.
> Once the hypervisor has EOI'd the LPI, it will be delivered again if
> the device generates a new edge. See 4.3.2 and 4.1.5

Thanks for the pointer, very interesting!
The spec also says that LPIs are never ACTIVE, so how is it possible
that the irq is ACTIVE in the LR register in your case?
Maybe only physical LPIs are never active but virtual LPIs can be
active? That is plausible but very confusing. Could you please run a few
tests to confirm this theory?

If that is correct then I think we'll have to make a few changes to
accommodate LPIs:

- do not set the HW bit in GICH_LRs for LPIs, because the corresponding
physical irq doesn't need to be deactivated anyway

- once we remove the HW bit, LPIs become purely virtual irqs at the
GICH_LR level, therefore we can set them ACTIVE & PENDING in GICH_LR
from gic_update_one_lr

See the attached patch as reference, not even compile tested.
If the LPI is already PENDING in GICH_LRs, it is still possible to loose
an interrupt but I think that is acceptable (interrupt coalescing).


> In case of Xen if interrupt is already in LR (pending or active), Xen
> ignores the interrupt right?
> If so, are we not missing the interrupt?
 
Yes, we do loose an interrupt but such cases shouldn't happen in the
first place with hardware interrupts on GICv2. For LPIs we need to make
changes.

Comments

Stefano Stabellini Sept. 8, 2014, 9:48 p.m. UTC | #1
On Mon, 8 Sep 2014, Stefano Stabellini wrote:
> On Sat, 6 Sep 2014, Vijay Kilari wrote:
> > On Thu, Sep 4, 2014 at 4:19 AM, Stefano Stabellini
> > <stefano.stabellini@eu.citrix.com> wrote:
> > > On Wed, 3 Sep 2014, Vijay Kilari wrote:
> > >> Hi Stefano,
> > >>
> > >>  I face following warning messages for LPI/ITS interrupt.
> > >> Can you throw some light on this warning?
> > >>
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR0
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR0
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR1
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR1
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR1
> > >> (XEN) gic.c:394:d0v0 unable to inject hw irq=8736 into d0v0: already
> > >> active in LR0
> > >
> > > Hello Vijay,
> > > the warning is for a condition that should not happen. All the following
> > > must be true for the warning to trigger:
> > >
> > > - the irq is an hardware irq (p->desc != NULL)
> > >
> > > - the irq is active in the LR register (GICH_LR_ACTIVE)
> > >
> > > - we need to inject a second irq while the first one is still active
> > >
> > > The GICv2 specification explicitely states that it is not possible to
> > > set an hardware irq as pending and active in the LR registers (5-175).
> > 
> >    Here in the code we are checking only for LR active state(0x2) and not
> > for ''pending and active' (0x3) state.
> 
> True. If the irq is GICH_LR_ACTIVE we do not set it GICH_LR_PENDING for
> hardware interrupts.
> 
> 
> > > Regardless from the GICv2 specification, it shouldn't be possible to
> > > receive a second hardware interrupt in Xen while the guest is still
> > > handling the first one. So the question is: how is it possible that xen
> > > is even trying to inject a second irq while the first one is still
> > > active?  I would check how GIC_IRQ_GUEST_QUEUED was set.
> > 
> > Where in GICv2 specification it is men>tioned?.
> 
> It is explained in the general handling of interrupts, 3.2.
> 
> "When a processor takes the interrupt exception, it reads the GICC_IAR of its CPU interface to acknowledge
> the interrupt. This read returns an Interrupt ID, and for an SGI, the source processor ID, that the processor
> uses to select the correct interrupt handler. When it recognizes this read, the GIC changes the state of the
> interrupt as follows:
> * if the pending state of the interrupt persists when the interrupt becomes active, or if the interrupt is
> generated again, from pending to active and pending.
> * otherwise, from pending to active"
> 
> Later in the same chapter:
> 
> "A CPU interface never signals to the connected processor any interrupt
> that is active and pending. It only signals interrupts that are pending
> and have sufficient priority"
> 
> 
> > In case of GICv3 LPI's are edge triggered and there is no active state.
> > Once the hypervisor has EOI'd the LPI, it will be delivered again if
> > the device generates a new edge. See 4.3.2 and 4.1.5
> 
> Thanks for the pointer, very interesting!
> The spec also says that LPIs are never ACTIVE, so how is it possible
> that the irq is ACTIVE in the LR register in your case?
> Maybe only physical LPIs are never active but virtual LPIs can be
> active? That is plausible but very confusing. Could you please run a few
> tests to confirm this theory?
> 
> If that is correct then I think we'll have to make a few changes to
> accommodate LPIs:
> 
> - do not set the HW bit in GICH_LRs for LPIs, because the corresponding
> physical irq doesn't need to be deactivated anyway
> 
> - once we remove the HW bit, LPIs become purely virtual irqs at the
> GICH_LR level, therefore we can set them ACTIVE & PENDING in GICH_LR
> from gic_update_one_lr
> 
> See the attached patch as reference, not even compile tested.
> If the LPI is already PENDING in GICH_LRs, it is still possible to loose
> an interrupt but I think that is acceptable (interrupt coalescing).

One more thing: unfortunately a misconfigured device could cause an
interrupt storm and the hypervisor would be completely unprotected
against it.

It is not necessary to implement this in first instance, a comment in
the code would suffice, but I think that eventually we would need to:

- realize that Xen received an LPI that is already pending in an LR
register
- ask for a maintenance interrupt for the virtual LPI
- deactivate the physical LPI

Afterward upon receiving the maintenance interrupt:
- reactivate the physical LPI
diff mbox

Patch

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 78ad4de..576512e 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -395,7 +395,7 @@  static void gicv2_update_lr(int lr, const struct pending_irq *p,
                                              << GICH_V2_LR_PRIORITY_SHIFT) |
               ((p->irq & GICH_V2_LR_VIRTUAL_MASK) << GICH_V2_LR_VIRTUAL_SHIFT));
 
-    if ( p->desc != NULL )
+    if ( p->desc != NULL && !is_lpi(p->irq) )
     {
         if ( platform_has_quirk(PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI) )
             lr_reg |= GICH_V2_LR_MAINTENANCE_IRQ;
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 6611ba0..74e2ab4 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -343,7 +343,7 @@  static void gic_update_one_lr(struct vcpu *v, int i)
         if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
              test_and_clear_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
         {
-            if ( p->desc == NULL )
+            if ( p->desc == NULL || is_lpi(irq) )
             {
                  lr_val.state |= GICH_LR_PENDING;
                  gic_hw_ops->write_lr(i, &lr_val);
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index a0c07bf..1b3b0fc 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -154,6 +154,11 @@ 
 #define DT_MATCH_GIC_V2 DT_MATCH_COMPATIBLE(DT_COMPAT_GIC_CORTEX_A15), \
                         DT_MATCH_COMPATIBLE(DT_COMPAT_GIC_CORTEX_A7)
 
+static inline bool_t is_lpi(int irq)
+{
+    return irq >= 8192;
+}
+
 /*
  * GICv2 register that needs to be saved/restored
  * on VCPU context switch