diff mbox

[Xen-devel,v8,13/13] gic_remove_from_queues: take a lock on the right vcpu

Message ID 1400761950-25035-13-git-send-email-stefano.stabellini@eu.citrix.com
State New
Headers show

Commit Message

Stefano Stabellini May 22, 2014, 12:32 p.m. UTC
At the moment gic_remove_from_queues doesn't handle the case where the
guest kernel disables an irq on a different vcpu compared to the one
currently receiving the interrupt.
Make sure to take the right vcpu lock before removing the irq from
lr_queue.

Document that the function should remove irqs from LR registers too.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
---
 xen/arch/arm/gic.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Julien Grall May 22, 2014, 4:10 p.m. UTC | #1
Hi Stefano,

On 22/05/14 13:32, Stefano Stabellini wrote:
> At the moment gic_remove_from_queues doesn't handle the case where the
> guest kernel disables an irq on a different vcpu compared to the one
> currently receiving the interrupt.
> Make sure to take the right vcpu lock before removing the irq from
> lr_queue.

I see the same issue with vgic_enable_irqs. We may inject to the wrong 
VCPU (i.e other than 0).

I think we should have the same case in vgic_enable_irqs.

Cheers,
Stefano Stabellini May 22, 2014, 5:45 p.m. UTC | #2
On Thu, 22 May 2014, Julien Grall wrote:
> Hi Stefano,
> 
> On 22/05/14 13:32, Stefano Stabellini wrote:
> > At the moment gic_remove_from_queues doesn't handle the case where the
> > guest kernel disables an irq on a different vcpu compared to the one
> > currently receiving the interrupt.
> > Make sure to take the right vcpu lock before removing the irq from
> > lr_queue.
> 
> I see the same issue with vgic_enable_irqs. We may inject to the wrong VCPU
> (i.e other than 0).
> 
> I think we should have the same case in vgic_enable_irqs.

I think it would make more sense to print a warning in
vgic_distr_mmio_write GICD_ITARGETSR rather than vgic_enable_irqs.
Julien Grall May 22, 2014, 6:10 p.m. UTC | #3
On 22/05/14 18:45, Stefano Stabellini wrote:
> On Thu, 22 May 2014, Julien Grall wrote:
>> Hi Stefano,
>>
>> On 22/05/14 13:32, Stefano Stabellini wrote:
>>> At the moment gic_remove_from_queues doesn't handle the case where the
>>> guest kernel disables an irq on a different vcpu compared to the one
>>> currently receiving the interrupt.
>>> Make sure to take the right vcpu lock before removing the irq from
>>> lr_queue.
>>
>> I see the same issue with vgic_enable_irqs. We may inject to the wrong VCPU
>> (i.e other than 0).
>>
>> I think we should have the same case in vgic_enable_irqs.
>
> I think it would make more sense to print a warning in
> vgic_distr_mmio_write GICD_ITARGETSR rather than vgic_enable_irqs.

IHMO the warning is not enougth. We may screw your state machine.


BTW, for your todo:

 > +    /* TODO: evict the irq from LRs */

We should not evict the IRQ from LRs. The guest may disable the IRQ 
while he is in the IRQ context (and before the IRQ has been EOI). If you 
drop the IRQs from the LRs, this can result to a maintenance interrupt:

"If the specified Interrupt does not exist in the
List registers, the GICH_HCR.EOIcount field is incremented, potentially 
generating a maintenance interrupt."

Regards,
Stefano Stabellini May 23, 2014, 5:33 p.m. UTC | #4
On Thu, 22 May 2014, Julien Grall wrote:
> On 22/05/14 18:45, Stefano Stabellini wrote:
> > On Thu, 22 May 2014, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 22/05/14 13:32, Stefano Stabellini wrote:
> > > > At the moment gic_remove_from_queues doesn't handle the case where the
> > > > guest kernel disables an irq on a different vcpu compared to the one
> > > > currently receiving the interrupt.
> > > > Make sure to take the right vcpu lock before removing the irq from
> > > > lr_queue.
> > > 
> > > I see the same issue with vgic_enable_irqs. We may inject to the wrong
> > > VCPU
> > > (i.e other than 0).
> > > 
> > > I think we should have the same case in vgic_enable_irqs.
> > 
> > I think it would make more sense to print a warning in
> > vgic_distr_mmio_write GICD_ITARGETSR rather than vgic_enable_irqs.
> 
> IHMO the warning is not enougth. We may screw your state machine.

That cannot happen: rank->itargets is actually unused at the moment.


> BTW, for your todo:
> 
> > +    /* TODO: evict the irq from LRs */
> 
> We should not evict the IRQ from LRs. The guest may disable the IRQ while he
> is in the IRQ context (and before the IRQ has been EOI). If you drop the IRQs
> from the LRs, this can result to a maintenance interrupt:
> 
> "If the specified Interrupt does not exist in the
> List registers, the GICH_HCR.EOIcount field is incremented, potentially
> generating a maintenance interrupt."

It is still better than the alternative: having an LR busy for no reason.
A maintenance interrupt would be harmless.
Julien Grall May 23, 2014, 5:46 p.m. UTC | #5
On 05/23/2014 06:33 PM, Stefano Stabellini wrote:
> On Thu, 22 May 2014, Julien Grall wrote:
>> On 22/05/14 18:45, Stefano Stabellini wrote:
>>> On Thu, 22 May 2014, Julien Grall wrote:
>>>> Hi Stefano,
>>>>
>>>> On 22/05/14 13:32, Stefano Stabellini wrote:
>>>>> At the moment gic_remove_from_queues doesn't handle the case where the
>>>>> guest kernel disables an irq on a different vcpu compared to the one
>>>>> currently receiving the interrupt.
>>>>> Make sure to take the right vcpu lock before removing the irq from
>>>>> lr_queue.
>>>>
>>>> I see the same issue with vgic_enable_irqs. We may inject to the wrong
>>>> VCPU
>>>> (i.e other than 0).
>>>>
>>>> I think we should have the same case in vgic_enable_irqs.
>>>
>>> I think it would make more sense to print a warning in
>>> vgic_distr_mmio_write GICD_ITARGETSR rather than vgic_enable_irqs.
>>
>> IHMO the warning is not enougth. We may screw your state machine.
> 
> That cannot happen: rank->itargets is actually unused at the moment.

itargets is not used, but nothing prevent a guest to enabled an IRQ on
VCPU1. This can inject the IRQ in VCPU1 while it's already injected in
VCPU0 (assuming the IRQ what disable for a little time).

> 
>> BTW, for your todo:
>>
>>> +    /* TODO: evict the irq from LRs */
>>
>> We should not evict the IRQ from LRs. The guest may disable the IRQ while he
>> is in the IRQ context (and before the IRQ has been EOI). If you drop the IRQs
>> from the LRs, this can result to a maintenance interrupt:
>>
>> "If the specified Interrupt does not exist in the
>> List registers, the GICH_HCR.EOIcount field is incremented, potentially
>> generating a maintenance interrupt."
> 
> It is still better than the alternative: having an LR busy for no reason.
> A maintenance interrupt would be harmless.

Our internal representation (in the status field, still inflight) won't
be update-to-date for IRQ. We either inject a spurious IRQ (if it's a
virtual IRQ), other set active & pending is physical IRQ (which is
invalid from the GIC specification).

Regards,
Stefano Stabellini May 25, 2014, 3:39 p.m. UTC | #6
On Fri, 23 May 2014, Julien Grall wrote:
> On 05/23/2014 06:33 PM, Stefano Stabellini wrote:
> > On Thu, 22 May 2014, Julien Grall wrote:
> >> On 22/05/14 18:45, Stefano Stabellini wrote:
> >>> On Thu, 22 May 2014, Julien Grall wrote:
> >>>> Hi Stefano,
> >>>>
> >>>> On 22/05/14 13:32, Stefano Stabellini wrote:
> >>>>> At the moment gic_remove_from_queues doesn't handle the case where the
> >>>>> guest kernel disables an irq on a different vcpu compared to the one
> >>>>> currently receiving the interrupt.
> >>>>> Make sure to take the right vcpu lock before removing the irq from
> >>>>> lr_queue.
> >>>>
> >>>> I see the same issue with vgic_enable_irqs. We may inject to the wrong
> >>>> VCPU
> >>>> (i.e other than 0).
> >>>>
> >>>> I think we should have the same case in vgic_enable_irqs.
> >>>
> >>> I think it would make more sense to print a warning in
> >>> vgic_distr_mmio_write GICD_ITARGETSR rather than vgic_enable_irqs.
> >>
> >> IHMO the warning is not enougth. We may screw your state machine.
> > 
> > That cannot happen: rank->itargets is actually unused at the moment.
> 
> itargets is not used, but nothing prevent a guest to enabled an IRQ on
> VCPU1.

That is actually the problem: if vcpu1 is the one enabling an SPI, the
target vcpu should still be the one specified by itarget.


> This can inject the IRQ in VCPU1 while it's already injected in
> VCPU0 (assuming the IRQ what disable for a little time).
> 
> > 
> >> BTW, for your todo:
> >>
> >>> +    /* TODO: evict the irq from LRs */
> >>
> >> We should not evict the IRQ from LRs. The guest may disable the IRQ while he
> >> is in the IRQ context (and before the IRQ has been EOI). If you drop the IRQs
> >> from the LRs, this can result to a maintenance interrupt:
> >>
> >> "If the specified Interrupt does not exist in the
> >> List registers, the GICH_HCR.EOIcount field is incremented, potentially
> >> generating a maintenance interrupt."
> > 
> > It is still better than the alternative: having an LR busy for no reason.
> > A maintenance interrupt would be harmless.
> 
> Our internal representation (in the status field, still inflight) won't
> be update-to-date for IRQ. We either inject a spurious IRQ (if it's a
> virtual IRQ), other set active & pending is physical IRQ (which is
> invalid from the GIC specification).

I think that the best behaviour would be to evict the irq from LRs if
the irq is not active.
Julien Grall May 25, 2014, 5:37 p.m. UTC | #7
On 25/05/14 16:39, Stefano Stabellini wrote:
  > That is actually the problem: if vcpu1 is the one enabling an SPI, the
> target vcpu should still be the one specified by itarget.

Yes, but by enabling I mean writing in ISENABLER* on VCPU1. In this 
case, we may inject this IRQ on this CPUs (see vgic_enable_irqs).

>
>> This can inject the IRQ in VCPU1 while it's already injected in
>> VCPU0 (assuming the IRQ what disable for a little time).
>>
>>>
>>>> BTW, for your todo:
>>>>
>>>>> +    /* TODO: evict the irq from LRs */
>>>>
>>>> We should not evict the IRQ from LRs. The guest may disable the IRQ while he
>>>> is in the IRQ context (and before the IRQ has been EOI). If you drop the IRQs
>>>> from the LRs, this can result to a maintenance interrupt:
>>>>
>>>> "If the specified Interrupt does not exist in the
>>>> List registers, the GICH_HCR.EOIcount field is incremented, potentially
>>>> generating a maintenance interrupt."
>>>
>>> It is still better than the alternative: having an LR busy for no reason.
>>> A maintenance interrupt would be harmless.
>>
>> Our internal representation (in the status field, still inflight) won't
>> be update-to-date for IRQ. We either inject a spurious IRQ (if it's a
>> virtual IRQ), other set active & pending is physical IRQ (which is
>> invalid from the GIC specification).
>
> I think that the best behaviour would be to evict the irq from LRs if
> the irq is not active.

Right.

Regards,
Stefano Stabellini May 25, 2014, 5:44 p.m. UTC | #8
On Sun, 25 May 2014, Julien Grall wrote:
> On 25/05/14 16:39, Stefano Stabellini wrote:
>  > That is actually the problem: if vcpu1 is the one enabling an SPI, the
> > target vcpu should still be the one specified by itarget.
> 
> Yes, but by enabling I mean writing in ISENABLER* on VCPU1. In this case, we
> may inject this IRQ on this CPUs (see vgic_enable_irqs).

Yes, that is completely wrong, I have a patch for that. I'll send it out
separately from this series.


> > > This can inject the IRQ in VCPU1 while it's already injected in
> > > VCPU0 (assuming the IRQ what disable for a little time).
> > > 
> > > > 
> > > > > BTW, for your todo:
> > > > > 
> > > > > > +    /* TODO: evict the irq from LRs */
> > > > > 
> > > > > We should not evict the IRQ from LRs. The guest may disable the IRQ
> > > > > while he
> > > > > is in the IRQ context (and before the IRQ has been EOI). If you drop
> > > > > the IRQs
> > > > > from the LRs, this can result to a maintenance interrupt:
> > > > > 
> > > > > "If the specified Interrupt does not exist in the
> > > > > List registers, the GICH_HCR.EOIcount field is incremented,
> > > > > potentially
> > > > > generating a maintenance interrupt."
> > > > 
> > > > It is still better than the alternative: having an LR busy for no
> > > > reason.
> > > > A maintenance interrupt would be harmless.
> > > 
> > > Our internal representation (in the status field, still inflight) won't
> > > be update-to-date for IRQ. We either inject a spurious IRQ (if it's a
> > > virtual IRQ), other set active & pending is physical IRQ (which is
> > > invalid from the GIC specification).
> > 
> > I think that the best behaviour would be to evict the irq from LRs if
> > the irq is not active.
> 
> Right.
Stefano Stabellini May 25, 2014, 5:54 p.m. UTC | #9
On Sun, 25 May 2014, Stefano Stabellini wrote:
> On Sun, 25 May 2014, Julien Grall wrote:
> > On 25/05/14 16:39, Stefano Stabellini wrote:
> >  > That is actually the problem: if vcpu1 is the one enabling an SPI, the
> > > target vcpu should still be the one specified by itarget.
> > 
> > Yes, but by enabling I mean writing in ISENABLER* on VCPU1. In this case, we
> > may inject this IRQ on this CPUs (see vgic_enable_irqs).
> 
> Yes, that is completely wrong, I have a patch for that. I'll send it out
> separately from this series.

To be clear: I'll drop this patch from this series and send out two
separate patches.


> > > > This can inject the IRQ in VCPU1 while it's already injected in
> > > > VCPU0 (assuming the IRQ what disable for a little time).
> > > > 
> > > > > 
> > > > > > BTW, for your todo:
> > > > > > 
> > > > > > > +    /* TODO: evict the irq from LRs */
> > > > > > 
> > > > > > We should not evict the IRQ from LRs. The guest may disable the IRQ
> > > > > > while he
> > > > > > is in the IRQ context (and before the IRQ has been EOI). If you drop
> > > > > > the IRQs
> > > > > > from the LRs, this can result to a maintenance interrupt:
> > > > > > 
> > > > > > "If the specified Interrupt does not exist in the
> > > > > > List registers, the GICH_HCR.EOIcount field is incremented,
> > > > > > potentially
> > > > > > generating a maintenance interrupt."
> > > > > 
> > > > > It is still better than the alternative: having an LR busy for no
> > > > > reason.
> > > > > A maintenance interrupt would be harmless.
> > > > 
> > > > Our internal representation (in the status field, still inflight) won't
> > > > be update-to-date for IRQ. We either inject a spurious IRQ (if it's a
> > > > virtual IRQ), other set active & pending is physical IRQ (which is
> > > > invalid from the GIC specification).
> > > 
> > > I think that the best behaviour would be to evict the irq from LRs if
> > > the irq is not active.
> > 
> > Right.
>
diff mbox

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 2bfaba9..bb598eb 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -589,9 +589,16 @@  static inline void gic_add_to_lr_pending(struct vcpu *v, struct pending_irq *n)
 
 void gic_remove_from_queues(struct vcpu *v, unsigned int virtual_irq)
 {
-    struct pending_irq *p = irq_to_pending(v, virtual_irq);
+    struct pending_irq *p;
     unsigned long flags;
 
+    /* TODO: do not assume SPI delivery on vcpu0 */
+    if ( virtual_irq >= 32 && v->vcpu_id != 0 )
+        v = v->domain->vcpu[0];
+
+    p = irq_to_pending(v, virtual_irq);
+
+    /* TODO: evict the irq from LRs */
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
     if ( !list_empty(&p->lr_queue) )
         list_del_init(&p->lr_queue);