diff mbox

[Xen-devel,v8,03/10] xen/arm: inflight irqs during migration

Message ID 1405016003-19131-3-git-send-email-stefano.stabellini@eu.citrix.com
State New
Headers show

Commit Message

Stefano Stabellini July 10, 2014, 6:13 p.m. UTC
We need to take special care when migrating irqs that are already
inflight from one vcpu to another. See "The effect of changes to an
GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
Controller Architecture Specification.

The main issue from the Xen point of view is that the lr_pending and
inflight lists are per-vcpu. The lock we take to protect them is also
per-vcpu.

In order to avoid issues, if the irq is still lr_pending, we can
immediately move it to the new vcpu for injection.

Otherwise if it is in a GICH_LR register, set a new flag
GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
while the previous one is still inflight (given that we are only dealing
with hardware interrupts here, it just means that its LR hasn't been
cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
one is the old vcpu, we introduce a new field to pending_irq, called
vcpu_migrate_from.
When clearing the LR on the old vcpu, we take special care of injecting
the interrupt into the new vcpu. To do that we need to release the old
vcpu lock before taking the new vcpu lock.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---

Changes in v8:
- rebase on ab78724fc5628318b172b4344f7280621a151e1b;
- rename target to new_target to avoid conflicts.

Changes in v7:
- move the _VPF_down check before setting GIC_IRQ_GUEST_QUEUED;
- fix comments;
- rename trl to target;
- introduce vcpu_migrate_from;
- do not kick new vcpu on MIGRATING, kick the old vcpu instead;
- separate moving GIC_IRQ_GUEST_QUEUED earlier into a different patch.

Changes in v6:
- remove unnecessary casts to (const unsigned long *) to the arguments
of find_first_bit and find_next_bit;
- instroduce a corresponding smb_rmb call;
- deal with migrating an irq that is inflight and still lr_pending;
- replace the dsb with smb_wmb and smb_rmb, use them to ensure the order
of accesses to GIC_IRQ_GUEST_QUEUED and GIC_IRQ_GUEST_MIGRATING;

Changes in v5:
- pass unsigned long to find_next_bit for alignment on aarch64;
- call vgic_get_target_vcpu instead of gic_add_to_lr_pending to add the
irq in the right inflight queue;
- add barrier and bit tests to make sure that vgic_migrate_irq and
gic_update_one_lr can run simultaneously on different cpus without
issues;
- rework the loop to identify the new vcpu when ITARGETSR is written;
- use find_first_bit instead of find_next_bit.
---
 xen/arch/arm/gic.c         |   28 ++++++++++++++++++++--
 xen/arch/arm/vgic-v2.c     |   43 +++++++++++++++++++++++++++-------
 xen/arch/arm/vgic.c        |   56 ++++++++++++++++++++++++++++++++++++++++++++
 xen/include/asm-arm/vgic.h |    8 +++++++
 4 files changed, 124 insertions(+), 11 deletions(-)

Comments

Ian Campbell July 17, 2014, 12:44 p.m. UTC | #1
On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> We need to take special care when migrating irqs that are already
> inflight from one vcpu to another. See "The effect of changes to an
> GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> Controller Architecture Specification.
> 
> The main issue from the Xen point of view is that the lr_pending and
> inflight lists are per-vcpu. The lock we take to protect them is also
> per-vcpu.
> 
> In order to avoid issues, if the irq is still lr_pending, we can
> immediately move it to the new vcpu for injection.
> 
> Otherwise if it is in a GICH_LR register, set a new flag
> GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> while the previous one is still inflight (given that we are only dealing
> with hardware interrupts here, it just means that its LR hasn't been
> cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> one is the old vcpu, we introduce a new field to pending_irq, called
> vcpu_migrate_from.
> When clearing the LR on the old vcpu, we take special care of injecting
> the interrupt into the new vcpu. To do that we need to release the old
> vcpu lock before taking the new vcpu lock.

I still think this is an awful lot of complexity and scaffolding for
something which is rare on the scale of things and which could be almost
trivially handled by requesting a maintenance interrupt for one EOI and
completing the move at that point.

In order to avoid a simple maint interrupt you are adding code to the
normal interrupt path and a potential SGI back to another processor (and
I hope I'm misreading this but it looks like an SGI back again to finish
off?). That's got to be way more costly to the first interrupt on the
new VCPU than the cost of a maintenance IRQ on the old one.

I think avoiding maintenance interrupts in general is a worthy goal, but
there are times when they are the most appropriate mechanism.

> @@ -344,6 +385,21 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
>      }
>  
>      set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
> +    vcpu_migrate_from = n->vcpu_migrate_from;
> +    /* update QUEUED before MIGRATING */
> +    smp_wmb();
> +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> +    {
> +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> +
> +        /* The old vcpu must have EOIed the SGI but not cleared the LR.
> +         * Give it a kick. */

You mean SPI I think.

Ian.
Stefano Stabellini July 23, 2014, 2:45 p.m. UTC | #2
On Thu, 17 Jul 2014, Ian Campbell wrote:
> On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> > We need to take special care when migrating irqs that are already
> > inflight from one vcpu to another. See "The effect of changes to an
> > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > Controller Architecture Specification.
> > 
> > The main issue from the Xen point of view is that the lr_pending and
> > inflight lists are per-vcpu. The lock we take to protect them is also
> > per-vcpu.
> > 
> > In order to avoid issues, if the irq is still lr_pending, we can
> > immediately move it to the new vcpu for injection.
> > 
> > Otherwise if it is in a GICH_LR register, set a new flag
> > GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> > while the previous one is still inflight (given that we are only dealing
> > with hardware interrupts here, it just means that its LR hasn't been
> > cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> > only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> > one is the old vcpu, we introduce a new field to pending_irq, called
> > vcpu_migrate_from.
> > When clearing the LR on the old vcpu, we take special care of injecting
> > the interrupt into the new vcpu. To do that we need to release the old
> > vcpu lock before taking the new vcpu lock.
> 
> I still think this is an awful lot of complexity and scaffolding for
> something which is rare on the scale of things and which could be almost
> trivially handled by requesting a maintenance interrupt for one EOI and
> completing the move at that point.

Requesting a maintenance interrupt is not as simple as it looks:
- ATM we don't know how to edit a living GICH_LR register, we would have
to add a function for that;
- if we request a maintenance interrupt then we also need to EOI the
physical IRQ, that is something that we don't do anymore (unless
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI but that is another matter). We would
need to understand that some physical irqs need to be EOI'ed by Xen and
some don't.

Also requesting a maintenance interrupt would only guarantee that the
vcpu is interrupted as soon as possible, but it won't save us from
having to introduce GIC_IRQ_GUEST_MIGRATING. It would only let us skip
adding vcpu_migrate_from and the 5 lines of code in
vgic_vcpu_inject_irq.

Overall I thought that this approach would be easier.


> In order to avoid a simple maint interrupt you are adding code to the
> normal interrupt path and a potential SGI back to another processor (and
> I hope I'm misreading this but it looks like an SGI back again to finish
> off?). That's got to be way more costly to the first interrupt on the
> new VCPU than the cost of a maintenance IRQ on the old one.
> 
> I think avoiding maintenance interrupts in general is a worthy goal, but
> there are times when they are the most appropriate mechanism.

To be clear the case we are talking about is when the guest kernel wants
to migrate an interrupt that is currently inflight in a GICH_LR register.

Requesting a maintenance interrupt for it would only make sure that the
old vcpu is interrupted soon after the EOI. Without it, we need to
identify which one is the old vcpu (in case of 2 consequent migrations),
I introduced vcpu_migrate_from for that, and kick it when receiving the
second interrupt if the first is still inflight. Exactly and only the
few lines of code you quoted below.

It is one SGI more in the uncommon case when we receive a second
physical interrupt without the old vcpu being interrupted yet.  In the
vast majority of cases the old vcpu has already been interrupted by
something else or by the second irq itself (we haven't changed affinity
yet) and there is no need for the additional SGI.


> > @@ -344,6 +385,21 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
> >      }
> >  
> >      set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
> > +    vcpu_migrate_from = n->vcpu_migrate_from;
> > +    /* update QUEUED before MIGRATING */
> > +    smp_wmb();
> > +    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
> > +    {
> > +        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
> > +
> > +        /* The old vcpu must have EOIed the SGI but not cleared the LR.
> > +         * Give it a kick. */
> 
> You mean SPI I think.

Yes, you are right
Ian Campbell July 23, 2014, 3:38 p.m. UTC | #3
On Wed, 2014-07-23 at 15:45 +0100, Stefano Stabellini wrote:
> On Thu, 17 Jul 2014, Ian Campbell wrote:
> > On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> > > We need to take special care when migrating irqs that are already
> > > inflight from one vcpu to another. See "The effect of changes to an
> > > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > > Controller Architecture Specification.
> > > 
> > > The main issue from the Xen point of view is that the lr_pending and
> > > inflight lists are per-vcpu. The lock we take to protect them is also
> > > per-vcpu.
> > > 
> > > In order to avoid issues, if the irq is still lr_pending, we can
> > > immediately move it to the new vcpu for injection.
> > > 
> > > Otherwise if it is in a GICH_LR register, set a new flag
> > > GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> > > while the previous one is still inflight (given that we are only dealing
> > > with hardware interrupts here, it just means that its LR hasn't been
> > > cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> > > only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> > > one is the old vcpu, we introduce a new field to pending_irq, called
> > > vcpu_migrate_from.
> > > When clearing the LR on the old vcpu, we take special care of injecting
> > > the interrupt into the new vcpu. To do that we need to release the old
> > > vcpu lock before taking the new vcpu lock.
> > 
> > I still think this is an awful lot of complexity and scaffolding for
> > something which is rare on the scale of things and which could be almost
> > trivially handled by requesting a maintenance interrupt for one EOI and
> > completing the move at that point.
> 
> Requesting a maintenance interrupt is not as simple as it looks:
> - ATM we don't know how to edit a living GICH_LR register, we would have
> to add a function for that;

That doesn't sound like a great hardship. Perhaps you can reuse the
setter function anyhow.

> - if we request a maintenance interrupt then we also need to EOI the
> physical IRQ, that is something that we don't do anymore (unless
> PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI but that is another matter). We would
> need to understand that some physical irqs need to be EOI'ed by Xen and
> some don't.

I was thinking the maintenance interrupt handler would take care of
this.

> Also requesting a maintenance interrupt would only guarantee that the
> vcpu is interrupted as soon as possible, but it won't save us from
> having to introduce GIC_IRQ_GUEST_MIGRATING.

I didn't expect GIC_IRQ_GUEST_MIGRATING to go away. If nothing else you
would need it to flag to the maintenance IRQ that it needs to EOI
+complete the migration.

>  It would only let us skip
> adding vcpu_migrate_from and the 5 lines of code in
> vgic_vcpu_inject_irq.

And the code in gic_update_one_lr I think, and most of
vgic_vcpu_inject-cpu. And more than the raw lines of code the
*complexity* would be much lower.

I think it could work like this:

On write to ITARGETSR if the interrupt is active then you set MIGRATING
and update the LR to request a maintenance IRQ.

If another interrupt occurs while this one is active then you mark it
pending just like normal (you don't care if it is migrating or not).

In the maintenance irq handler you check the migrating bit, if it is
clear then nothing to do. If it is set then you know the old cpu (it's
current) clear the LR, EOI the interrupt and write the physical
ITARGETSR (in some order, perhaps not that one). Then SGI the new
processor if the interrupt is pending.

If there have been multiple migrations then you don't care, you only
care about the current target right now as you finish it off.

> Overall I thought that this approach would be easier.
> 
> 
> > In order to avoid a simple maint interrupt you are adding code to the
> > normal interrupt path and a potential SGI back to another processor (and
> > I hope I'm misreading this but it looks like an SGI back again to finish
> > off?). That's got to be way more costly to the first interrupt on the
> > new VCPU than the cost of a maintenance IRQ on the old one.
> > 
> > I think avoiding maintenance interrupts in general is a worthy goal, but
> > there are times when they are the most appropriate mechanism.
> 
> To be clear the case we are talking about is when the guest kernel wants
> to migrate an interrupt that is currently inflight in a GICH_LR register.
> 
> Requesting a maintenance interrupt for it would only make sure that the
> old vcpu is interrupted soon after the EOI.

Yes, that's the point though. When you get this notification then you
can finish off the migration pretty much trivially with no worrying
about other inflight interrupt, pending stuff due to lazy handling of LR
cleanup etc, you just update the h/w state and you are done.

>  Without it, we need to
> identify which one is the old vcpu (in case of 2 consequent migrations),
> I introduced vcpu_migrate_from for that, and kick it when receiving the
> second interrupt if the first is still inflight. Exactly and only the
> few lines of code you quoted below.
> 
> It is one SGI more in the uncommon case when we receive a second

Two more I think?

> physical interrupt without the old vcpu being interrupted yet.  In the
> vast majority of cases the old vcpu has already been interrupted by
> something else or by the second irq itself

> (we haven't changed affinity yet)

In patch #5 you will start doing so though, meaning that the extra
SGI(s) will become more frequent, if not the common case for high
frequency IRQs.

But it's not really so much about the SGIs but all the juggling of state
(in the code as well as in our heads) about all the corner cases which
arise due to the lazy clearing of LRs done in the normal case (which I'm
fine with BTW) and ordering of the various bit tests etc. I just think
it could be done in the simplest way possible with no real overhead
because these migration events are in reality rare.

>  and there is no need for the additional SGI.

Ian.
Stefano Stabellini July 24, 2014, 2:48 p.m. UTC | #4
On Wed, 23 Jul 2014, Ian Campbell wrote:
> On Wed, 2014-07-23 at 15:45 +0100, Stefano Stabellini wrote:
> > On Thu, 17 Jul 2014, Ian Campbell wrote:
> > > On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> > > > We need to take special care when migrating irqs that are already
> > > > inflight from one vcpu to another. See "The effect of changes to an
> > > > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > > > Controller Architecture Specification.
> > > > 
> > > > The main issue from the Xen point of view is that the lr_pending and
> > > > inflight lists are per-vcpu. The lock we take to protect them is also
> > > > per-vcpu.
> > > > 
> > > > In order to avoid issues, if the irq is still lr_pending, we can
> > > > immediately move it to the new vcpu for injection.
> > > > 
> > > > Otherwise if it is in a GICH_LR register, set a new flag
> > > > GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> > > > while the previous one is still inflight (given that we are only dealing
> > > > with hardware interrupts here, it just means that its LR hasn't been
> > > > cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> > > > only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> > > > one is the old vcpu, we introduce a new field to pending_irq, called
> > > > vcpu_migrate_from.
> > > > When clearing the LR on the old vcpu, we take special care of injecting
> > > > the interrupt into the new vcpu. To do that we need to release the old
> > > > vcpu lock before taking the new vcpu lock.
> > > 
> > > I still think this is an awful lot of complexity and scaffolding for
> > > something which is rare on the scale of things and which could be almost
> > > trivially handled by requesting a maintenance interrupt for one EOI and
> > > completing the move at that point.
> > 
> > Requesting a maintenance interrupt is not as simple as it looks:
> > - ATM we don't know how to edit a living GICH_LR register, we would have
> > to add a function for that;
> 
> That doesn't sound like a great hardship. Perhaps you can reuse the
> setter function anyhow.
> 
> > - if we request a maintenance interrupt then we also need to EOI the
> > physical IRQ, that is something that we don't do anymore (unless
> > PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI but that is another matter). We would
> > need to understand that some physical irqs need to be EOI'ed by Xen and
> > some don't.
> 
> I was thinking the maintenance interrupt handler would take care of
> this.

In that case we would have to resurrect the code to loop over the
GICH_EISR* registers from maintenance_interrupt.
Anything can be done, I am just pointing out that this alternative
approach is not as cheap as it might sound.


> > Also requesting a maintenance interrupt would only guarantee that the
> > vcpu is interrupted as soon as possible, but it won't save us from
> > having to introduce GIC_IRQ_GUEST_MIGRATING.
> 
> I didn't expect GIC_IRQ_GUEST_MIGRATING to go away. If nothing else you
> would need it to flag to the maintenance IRQ that it needs to EOI
> +complete the migration.
> 
> >  It would only let us skip
> > adding vcpu_migrate_from and the 5 lines of code in
> > vgic_vcpu_inject_irq.
> 
> And the code in gic_update_one_lr I think, and most of
> vgic_vcpu_inject-cpu.
> And more than the raw lines of code the
> *complexity* would be much lower.

I don't know about the complexity. One thing is to completely get rid of
maintenance interrupts. Another is to get rid of them in most cases but
not all. Having to deal both with not having them and with having them,
increases complexity, at least in my view. It simpler to think that you
have them all the times or never.


In any case replying to this email made me realize that there is indeed
a lot of unneeded code in this patch, especially given that writing to
the physical ITARGETSR is guaranteed to affect pending (non active)
irqs.  From the ARM ARM:

"Software can write to an GICD_ITARGETSR at any time. Any change to a CPU
targets field value:

[...]

Has an effect on any pending interrupts. This means:
 — adding a CPU interface to the target list of a pending interrupt makes
   that interrupt pending on that CPU interface
 — removing a CPU interface from the target list of a pending interrupt
   removes the pending state of that interrupt on that CPU interface."


I think we can rely on this behaviour. Thanks to patch #5 we know that
we'll be receiving the second physical irq on the old cpu and from then
on the next ones always on the new cpu. So we won't need
vcpu_migrate_from, the complex ordering of MIGRATING and QUEUED, or the
maintenance_interrupt.
Ian Campbell July 24, 2014, 4:41 p.m. UTC | #5
On Thu, 2014-07-24 at 15:48 +0100, Stefano Stabellini wrote:
> On Wed, 23 Jul 2014, Ian Campbell wrote:
> > On Wed, 2014-07-23 at 15:45 +0100, Stefano Stabellini wrote:
> > > On Thu, 17 Jul 2014, Ian Campbell wrote:
> > > > On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> > > > > We need to take special care when migrating irqs that are already
> > > > > inflight from one vcpu to another. See "The effect of changes to an
> > > > > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > > > > Controller Architecture Specification.
> > > > > 
> > > > > The main issue from the Xen point of view is that the lr_pending and
> > > > > inflight lists are per-vcpu. The lock we take to protect them is also
> > > > > per-vcpu.
> > > > > 
> > > > > In order to avoid issues, if the irq is still lr_pending, we can
> > > > > immediately move it to the new vcpu for injection.
> > > > > 
> > > > > Otherwise if it is in a GICH_LR register, set a new flag
> > > > > GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> > > > > while the previous one is still inflight (given that we are only dealing
> > > > > with hardware interrupts here, it just means that its LR hasn't been
> > > > > cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> > > > > only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> > > > > one is the old vcpu, we introduce a new field to pending_irq, called
> > > > > vcpu_migrate_from.
> > > > > When clearing the LR on the old vcpu, we take special care of injecting
> > > > > the interrupt into the new vcpu. To do that we need to release the old
> > > > > vcpu lock before taking the new vcpu lock.
> > > > 
> > > > I still think this is an awful lot of complexity and scaffolding for
> > > > something which is rare on the scale of things and which could be almost
> > > > trivially handled by requesting a maintenance interrupt for one EOI and
> > > > completing the move at that point.
> > > 
> > > Requesting a maintenance interrupt is not as simple as it looks:
> > > - ATM we don't know how to edit a living GICH_LR register, we would have
> > > to add a function for that;
> > 
> > That doesn't sound like a great hardship. Perhaps you can reuse the
> > setter function anyhow.
> > 
> > > - if we request a maintenance interrupt then we also need to EOI the
> > > physical IRQ, that is something that we don't do anymore (unless
> > > PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI but that is another matter). We would
> > > need to understand that some physical irqs need to be EOI'ed by Xen and
> > > some don't.
> > 
> > I was thinking the maintenance interrupt handler would take care of
> > this.
> 
> In that case we would have to resurrect the code to loop over the
> GICH_EISR* registers from maintenance_interrupt.
> Anything can be done, I am just pointing out that this alternative
> approach is not as cheap as it might sound.

It's simple though, that's the benefit.

> > > Also requesting a maintenance interrupt would only guarantee that the
> > > vcpu is interrupted as soon as possible, but it won't save us from
> > > having to introduce GIC_IRQ_GUEST_MIGRATING.
> > 
> > I didn't expect GIC_IRQ_GUEST_MIGRATING to go away. If nothing else you
> > would need it to flag to the maintenance IRQ that it needs to EOI
> > +complete the migration.
> > 
> > >  It would only let us skip
> > > adding vcpu_migrate_from and the 5 lines of code in
> > > vgic_vcpu_inject_irq.
> > 
> > And the code in gic_update_one_lr I think, and most of
> > vgic_vcpu_inject-cpu.
> > And more than the raw lines of code the
> > *complexity* would be much lower.
> 
> I don't know about the complexity. One thing is to completely get rid of
> maintenance interrupts. Another is to get rid of them in most cases but
> not all. Having to deal both with not having them and with having them,
> increases complexity, at least in my view. It simpler to think that you
> have them all the times or never.

The way I view it is that the maintenance interrupt path is the dumb and
obvious one which is always there and can always be used and doesn't
need thinking about. Then the other optimisations are finding ways to
avoid actually using it, but can always fall back to the dumb way if
something too complex to deal with occurs.

> In any case replying to this email made me realize that there is indeed
> a lot of unneeded code in this patch, especially given that writing to
> the physical ITARGETSR is guaranteed to affect pending (non active)
> irqs.  From the ARM ARM:
> 
> "Software can write to an GICD_ITARGETSR at any time. Any change to a CPU
> targets field value:
> 
> [...]
> 
> Has an effect on any pending interrupts. This means:
>  — adding a CPU interface to the target list of a pending interrupt makes
>    that interrupt pending on that CPU interface
>  — removing a CPU interface from the target list of a pending interrupt
>    removes the pending state of that interrupt on that CPU interface."
> 
> 
> I think we can rely on this behaviour. Thanks to patch #5 we know that
> we'll be receiving the second physical irq on the old cpu and from then
> on the next ones always on the new cpu. So we won't need
> vcpu_migrate_from, the complex ordering of MIGRATING and QUEUED, or the
> maintenance_interrupt.

That would be good to avoid all that for sure and would certainly impact
my opinion of the complexity cost of this stuff.

Are you sure about the second physical IRQ always hitting on the source
pCPU though? I'm unclear about where the physical ITARGETSR gets written
in the scheme you are proposing.

Ian.
Stefano Stabellini July 24, 2014, 4:45 p.m. UTC | #6
On Thu, 24 Jul 2014, Ian Campbell wrote:
> On Thu, 2014-07-24 at 15:48 +0100, Stefano Stabellini wrote:
> > On Wed, 23 Jul 2014, Ian Campbell wrote:
> > > On Wed, 2014-07-23 at 15:45 +0100, Stefano Stabellini wrote:
> > > > On Thu, 17 Jul 2014, Ian Campbell wrote:
> > > > > On Thu, 2014-07-10 at 19:13 +0100, Stefano Stabellini wrote:
> > > > > > We need to take special care when migrating irqs that are already
> > > > > > inflight from one vcpu to another. See "The effect of changes to an
> > > > > > GICD_ITARGETSR", part of chapter 4.3.12 of the ARM Generic Interrupt
> > > > > > Controller Architecture Specification.
> > > > > > 
> > > > > > The main issue from the Xen point of view is that the lr_pending and
> > > > > > inflight lists are per-vcpu. The lock we take to protect them is also
> > > > > > per-vcpu.
> > > > > > 
> > > > > > In order to avoid issues, if the irq is still lr_pending, we can
> > > > > > immediately move it to the new vcpu for injection.
> > > > > > 
> > > > > > Otherwise if it is in a GICH_LR register, set a new flag
> > > > > > GIC_IRQ_GUEST_MIGRATING, so that we can recognize when we receive an irq
> > > > > > while the previous one is still inflight (given that we are only dealing
> > > > > > with hardware interrupts here, it just means that its LR hasn't been
> > > > > > cleared yet on the old vcpu).  If GIC_IRQ_GUEST_MIGRATING is set, we
> > > > > > only set GIC_IRQ_GUEST_QUEUED and interrupt the old vcpu. To know which
> > > > > > one is the old vcpu, we introduce a new field to pending_irq, called
> > > > > > vcpu_migrate_from.
> > > > > > When clearing the LR on the old vcpu, we take special care of injecting
> > > > > > the interrupt into the new vcpu. To do that we need to release the old
> > > > > > vcpu lock before taking the new vcpu lock.
> > > > > 
> > > > > I still think this is an awful lot of complexity and scaffolding for
> > > > > something which is rare on the scale of things and which could be almost
> > > > > trivially handled by requesting a maintenance interrupt for one EOI and
> > > > > completing the move at that point.
> > > > 
> > > > Requesting a maintenance interrupt is not as simple as it looks:
> > > > - ATM we don't know how to edit a living GICH_LR register, we would have
> > > > to add a function for that;
> > > 
> > > That doesn't sound like a great hardship. Perhaps you can reuse the
> > > setter function anyhow.
> > > 
> > > > - if we request a maintenance interrupt then we also need to EOI the
> > > > physical IRQ, that is something that we don't do anymore (unless
> > > > PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI but that is another matter). We would
> > > > need to understand that some physical irqs need to be EOI'ed by Xen and
> > > > some don't.
> > > 
> > > I was thinking the maintenance interrupt handler would take care of
> > > this.
> > 
> > In that case we would have to resurrect the code to loop over the
> > GICH_EISR* registers from maintenance_interrupt.
> > Anything can be done, I am just pointing out that this alternative
> > approach is not as cheap as it might sound.
> 
> It's simple though, that's the benefit.
> 
> > > > Also requesting a maintenance interrupt would only guarantee that the
> > > > vcpu is interrupted as soon as possible, but it won't save us from
> > > > having to introduce GIC_IRQ_GUEST_MIGRATING.
> > > 
> > > I didn't expect GIC_IRQ_GUEST_MIGRATING to go away. If nothing else you
> > > would need it to flag to the maintenance IRQ that it needs to EOI
> > > +complete the migration.
> > > 
> > > >  It would only let us skip
> > > > adding vcpu_migrate_from and the 5 lines of code in
> > > > vgic_vcpu_inject_irq.
> > > 
> > > And the code in gic_update_one_lr I think, and most of
> > > vgic_vcpu_inject-cpu.
> > > And more than the raw lines of code the
> > > *complexity* would be much lower.
> > 
> > I don't know about the complexity. One thing is to completely get rid of
> > maintenance interrupts. Another is to get rid of them in most cases but
> > not all. Having to deal both with not having them and with having them,
> > increases complexity, at least in my view. It simpler to think that you
> > have them all the times or never.
> 
> The way I view it is that the maintenance interrupt path is the dumb and
> obvious one which is always there and can always be used and doesn't
> need thinking about. Then the other optimisations are finding ways to
> avoid actually using it, but can always fall back to the dumb way if
> something too complex to deal with occurs.
> 
> > In any case replying to this email made me realize that there is indeed
> > a lot of unneeded code in this patch, especially given that writing to
> > the physical ITARGETSR is guaranteed to affect pending (non active)
> > irqs.  From the ARM ARM:
> > 
> > "Software can write to an GICD_ITARGETSR at any time. Any change to a CPU
> > targets field value:
> > 
> > [...]
> > 
> > Has an effect on any pending interrupts. This means:
> >  — adding a CPU interface to the target list of a pending interrupt makes
> >    that interrupt pending on that CPU interface
> >  — removing a CPU interface from the target list of a pending interrupt
> >    removes the pending state of that interrupt on that CPU interface."
> > 
> > 
> > I think we can rely on this behaviour. Thanks to patch #5 we know that
> > we'll be receiving the second physical irq on the old cpu and from then
> > on the next ones always on the new cpu. So we won't need
> > vcpu_migrate_from, the complex ordering of MIGRATING and QUEUED, or the
> > maintenance_interrupt.
> 
> That would be good to avoid all that for sure and would certainly impact
> my opinion of the complexity cost of this stuff.
> 
> Are you sure about the second physical IRQ always hitting on the source
> pCPU though? I'm unclear about where the physical ITARGETSR gets written
> in the scheme you are proposing.

It gets written right away if there are no inflight irqs. Otherwise it
gets written when clearing the LRs. That's why we are sure it is going
to hit the old cpu. If the vcpu gets descheduled after EOIing the irq,
that is also fine because Xen is going to clear the LRs on hypervisor
entry.
Ian Campbell July 24, 2014, 4:48 p.m. UTC | #7
On Thu, 2014-07-24 at 17:45 +0100, Stefano Stabellini wrote:
> > Are you sure about the second physical IRQ always hitting on the source
> > pCPU though? I'm unclear about where the physical ITARGETSR gets written
> > in the scheme you are proposing.
> 
> It gets written right away if there are no inflight irqs.

There's no way that something can be pending in the physical GIC at this
point? i.e. because it happened since we took the trap?

>  Otherwise it
> gets written when clearing the LRs. That's why we are sure it is going
> to hit the old cpu. If the vcpu gets descheduled after EOIing the irq,
> that is also fine because Xen is going to clear the LRs on hypervisor
> entry.

Ian.
Stefano Stabellini July 24, 2014, 4:49 p.m. UTC | #8
On Thu, 24 Jul 2014, Ian Campbell wrote:
> On Thu, 2014-07-24 at 17:45 +0100, Stefano Stabellini wrote:
> > > Are you sure about the second physical IRQ always hitting on the source
> > > pCPU though? I'm unclear about where the physical ITARGETSR gets written
> > > in the scheme you are proposing.
> > 
> > It gets written right away if there are no inflight irqs.
> 
> There's no way that something can be pending in the physical GIC at this
> point? i.e. because it happened since we took the trap?

If it is pending is OK: ARM ARM states that writes to itarget affect
pending irqs too. Only active irqs are not affected. But we deal with
that case separately.


> >  Otherwise it
> > gets written when clearing the LRs. That's why we are sure it is going
> > to hit the old cpu. If the vcpu gets descheduled after EOIing the irq,
> > that is also fine because Xen is going to clear the LRs on hypervisor
> > entry.
> 
> Ian.
>
Ian Campbell July 25, 2014, 9:08 a.m. UTC | #9
On Thu, 2014-07-24 at 17:49 +0100, Stefano Stabellini wrote:
> On Thu, 24 Jul 2014, Ian Campbell wrote:
> > On Thu, 2014-07-24 at 17:45 +0100, Stefano Stabellini wrote:
> > > > Are you sure about the second physical IRQ always hitting on the source
> > > > pCPU though? I'm unclear about where the physical ITARGETSR gets written
> > > > in the scheme you are proposing.
> > > 
> > > It gets written right away if there are no inflight irqs.
> > 
> > There's no way that something can be pending in the physical GIC at this
> > point? i.e. because it happened since we took the trap?
> 
> If it is pending is OK: ARM ARM states that writes to itarget affect
> pending irqs too. Only active irqs are not affected. But we deal with
> that case separately.

So at the point where we write itarget the pending IRQ can potentially
be injected onto the new pCPU immediately, since it won't have it's IRQs
disabled etc.

Does our locking cope with that case?

Also the pending we are talking about here is in the physical
distributor, right? Not the various software state bits which we track.

So an IRQ which we have in lr_pending is actually active in the real
distributor (since we must have taken the interrupt to get it onto our
lists). I'm not sure how/if that changes your reasoning about these
things.

Ian.
diff mbox

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 256d9cf..a0252f9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -33,6 +33,7 @@ 
 #include <asm/device.h>
 #include <asm/io.h>
 #include <asm/gic.h>
+#include <asm/vgic.h>
 
 static void gic_restore_pending_irqs(struct vcpu *v);
 
@@ -375,10 +376,33 @@  static void gic_update_one_lr(struct vcpu *v, int i)
         clear_bit(GIC_IRQ_GUEST_ACTIVE, &p->status);
         p->lr = GIC_INVALID_LR;
         if ( test_bit(GIC_IRQ_GUEST_ENABLED, &p->status) &&
-             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) )
+             test_bit(GIC_IRQ_GUEST_QUEUED, &p->status) &&
+             !test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
             gic_raise_guest_irq(v, irq, p->priority);
-        else
+        else {
+            int m, q;
             list_del_init(&p->inflight);
+
+            m = test_and_clear_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
+            p->vcpu_migrate_from = NULL;
+            /* check MIGRATING before QUEUED */
+            smp_rmb();
+            q = test_bit(GIC_IRQ_GUEST_QUEUED, &p->status);
+            if ( m && q )
+            {
+                struct vcpu *v_target;
+
+                /* It is safe to temporarily drop the lock because we
+                 * are finished dealing with this LR. We'll take the
+                 * lock before reading the next. */
+                spin_unlock(&v->arch.vgic.lock);
+                /* vgic_get_target_vcpu takes the rank lock, ensuring
+                 * consistency with other itarget changes. */
+                v_target = vgic_get_target_vcpu(v, irq);
+                vgic_vcpu_inject_irq(v_target, irq);
+                spin_lock(&v->arch.vgic.lock);
+            }
+        }
     }
 }
 
diff --git a/xen/arch/arm/vgic-v2.c b/xen/arch/arm/vgic-v2.c
index 9629cbe..5116a99 100644
--- a/xen/arch/arm/vgic-v2.c
+++ b/xen/arch/arm/vgic-v2.c
@@ -356,34 +356,59 @@  static int vgic_v2_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
         goto write_ignore;
 
     case GICD_ITARGETSR + 8 ... GICD_ITARGETSRN:
+    {
+        /* unsigned long needed for find_next_bit */
+        unsigned long target;
+        int i;
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
         rank = vgic_rank_offset(v, 8, gicd_reg - GICD_ITARGETSR, DABT_WORD);
         if ( rank == NULL) goto write_ignore;
         /* 8-bit vcpu mask for this domain */
         BUG_ON(v->domain->max_vcpus > 8);
-        tr = (1 << v->domain->max_vcpus) - 1;
+        target = (1 << v->domain->max_vcpus) - 1;
         if ( dabt.size == 2 )
-            tr = tr | (tr << 8) | (tr << 16) | (tr << 24);
+            target = target | (target << 8) | (target << 16) | (target << 24);
         else
-            tr = (tr << (8 * (gicd_reg & 0x3)));
-        tr &= *r;
+            target = (target << (8 * (gicd_reg & 0x3)));
+        target &= *r;
         /* ignore zero writes */
-        if ( !tr )
+        if ( !target )
             goto write_ignore;
         /* For word reads ignore writes where any single byte is zero */
         if ( dabt.size == 2 &&
-            !((tr & 0xff) && (tr & (0xff << 8)) &&
-             (tr & (0xff << 16)) && (tr & (0xff << 24))))
+            !((target & 0xff) && (target & (0xff << 8)) &&
+             (target & (0xff << 16)) && (target & (0xff << 24))))
             goto write_ignore;
         vgic_lock_rank(v, rank);
+        i = 0;
+        while ( (i = find_next_bit(&target, 32, i)) < 32 )
+        {
+            unsigned int irq, new_target, old_target;
+            unsigned long old_target_mask;
+            struct vcpu *v_target, *v_old;
+
+            new_target = i % 8;
+            old_target_mask = vgic_byte_read(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR, DABT_WORD)], 0, i/8);
+            old_target = find_first_bit(&old_target_mask, 8);
+
+            if ( new_target != old_target )
+            {
+                irq = gicd_reg + (i / 8);
+                v_target = v->domain->vcpu[new_target];
+                v_old = v->domain->vcpu[old_target];
+                vgic_migrate_irq(v_old, v_target, irq);
+            }
+            i += 8 - new_target;
+        }
         if ( dabt.size == DABT_WORD )
             rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR,
-                                          DABT_WORD)] = tr;
+                                          DABT_WORD)] = target;
         else
             vgic_byte_write(&rank->itargets[REG_RANK_INDEX(8,
-                       gicd_reg - GICD_ITARGETSR, DABT_WORD)], tr, gicd_reg);
+                       gicd_reg - GICD_ITARGETSR, DABT_WORD)], target, gicd_reg);
         vgic_unlock_rank(v, rank);
         return 1;
+    }
 
     case GICD_IPRIORITYR ... GICD_IPRIORITYRN:
         if ( dabt.size != DABT_BYTE && dabt.size != DABT_WORD ) goto bad_width;
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 538e7a1..fbe3293 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -182,6 +182,46 @@  struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
     return v_target;
 }
 
+void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq)
+{
+    unsigned long flags;
+    struct pending_irq *p = irq_to_pending(old, irq);
+
+    /* nothing to do for virtual interrupts */
+    if ( p->desc == NULL )
+        return;
+
+    /* migration already in progress, no need to do anything */
+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &p->status) )
+        return;
+
+    spin_lock_irqsave(&old->arch.vgic.lock, flags);
+
+    if ( list_empty(&p->inflight) )
+    {
+        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
+        return;
+    }
+    /* If the IRQ is still lr_pending, re-inject it to the new vcpu */
+    if ( !list_empty(&p->lr_queue) )
+    {
+        list_del_init(&p->lr_queue);
+        list_del_init(&p->inflight);
+        spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
+        vgic_vcpu_inject_irq(new, irq);
+        return;
+    }
+    /* if the IRQ is in a GICH_LR register, set GIC_IRQ_GUEST_MIGRATING
+     * and wait for the EOI */
+    if ( !list_empty(&p->inflight) )
+    {
+        p->vcpu_migrate_from = old;
+        set_bit(GIC_IRQ_GUEST_MIGRATING, &p->status);
+    }
+
+    spin_unlock_irqrestore(&old->arch.vgic.lock, flags);
+}
+
 void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 {
     const unsigned long mask = r;
@@ -333,6 +373,7 @@  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
     struct pending_irq *iter, *n = irq_to_pending(v, irq);
     unsigned long flags;
     bool_t running;
+    struct vcpu *vcpu_migrate_from;
 
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
@@ -344,6 +385,21 @@  void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
     }
 
     set_bit(GIC_IRQ_GUEST_QUEUED, &n->status);
+    vcpu_migrate_from = n->vcpu_migrate_from;
+    /* update QUEUED before MIGRATING */
+    smp_wmb();
+    if ( test_bit(GIC_IRQ_GUEST_MIGRATING, &n->status) )
+    {
+        spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+
+        /* The old vcpu must have EOIed the SGI but not cleared the LR.
+         * Give it a kick. */
+        running = n->vcpu_migrate_from->is_running;
+        vcpu_unblock(n->vcpu_migrate_from);
+        if ( running )
+            smp_send_event_check_mask(cpumask_of(n->vcpu_migrate_from->processor));
+        return;
+    }
 
     if ( !list_empty(&n->inflight) )
     {
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 19eed7e..c66578b 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -55,17 +55,24 @@  struct pending_irq
      * GIC_IRQ_GUEST_ENABLED: the guest IRQ is enabled at the VGICD
      * level (GICD_ICENABLER/GICD_ISENABLER).
      *
+     * GIC_IRQ_GUEST_MIGRATING: the irq is being migrated to a different
+     * vcpu while it is still inflight and on an GICH_LR register on the
+     * old vcpu.
+     *
      */
 #define GIC_IRQ_GUEST_QUEUED   0
 #define GIC_IRQ_GUEST_ACTIVE   1
 #define GIC_IRQ_GUEST_VISIBLE  2
 #define GIC_IRQ_GUEST_ENABLED  3
+#define GIC_IRQ_GUEST_MIGRATING   4
     unsigned long status;
     struct irq_desc *desc; /* only set it the irq corresponds to a physical irq */
     int irq;
 #define GIC_INVALID_LR         ~(uint8_t)0
     uint8_t lr;
     uint8_t priority;
+    /* keeps track of the vcpu this irq is currently migrating from */
+    struct vcpu *vcpu_migrate_from;
     /* inflight is used to append instances of pending_irq to
      * vgic.inflight_irqs */
     struct list_head inflight;
@@ -164,6 +171,7 @@  extern int vcpu_vgic_free(struct vcpu *v);
 extern int vgic_to_sgi(struct vcpu *v, register_t sgir,
                        enum gic_sgi_mode irqmode, int virq,
                        unsigned long vcpu_mask);
+extern void vgic_migrate_irq(struct vcpu *old, struct vcpu *new, unsigned int irq);
 #endif /* __ASM_ARM_VGIC_H__ */
 
 /*