
[Xen-devel,v4,1/4] xen/arm: observe itargets setting in vgic_enable_irqs and vgic_disable_irqs

Message ID 1402076908-26740-1-git-send-email-stefano.stabellini@eu.citrix.com
State New

Commit Message

Stefano Stabellini June 6, 2014, 5:48 p.m. UTC
vgic_enable_irqs should enable irq delivery to the vcpu specified by
GICD_ITARGETSR, rather than the vcpu that wrote to GICD_ISENABLER.
Similarly, vgic_disable_irqs should use the target vcpu specified by
itargets to disable irqs.

itargets can be set to a mask, but vgic_get_target_vcpu always returns
the lowest vcpu in the mask.

Correctly initialize itargets for SPIs.

Validate writes to GICD_ITARGETSR.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>

---

Changes in v4:
- remove assert that could allow a guest to crash Xen;
- add itargets validation to vgic_distr_mmio_write;
- export vgic_get_target_vcpu.

Changes in v3:
- add assert in get_target_vcpu;
- rename get_target_vcpu to vgic_get_target_vcpu.

Changes in v2:
- refactor the common code in get_target_vcpu;
- unify PPI and SPI paths;
- correctly initialize itargets for SPI;
- use byte_read.
---
 xen/arch/arm/vgic.c       |   60 +++++++++++++++++++++++++++++++++++++++------
 xen/include/asm-arm/gic.h |    2 ++
 2 files changed, 54 insertions(+), 8 deletions(-)

Comments

Julien Grall June 7, 2014, 2:33 p.m. UTC | #1
Hi Stefano,

On 06/06/14 18:48, Stefano Stabellini wrote:
>       return 0;
>   }
>
> @@ -369,6 +377,22 @@ read_as_zero:
>       return 1;
>   }
>
> +struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
> +{
> +    int target;
> +    struct vgic_irq_rank *rank;
> +    struct vcpu *v_target;
> +
> +    rank = vgic_irq_rank(v, 1, irq/32);
> +    vgic_lock_rank(v, rank);
> +    target = byte_read(rank->itargets[(irq%32)/4], 0, irq % 4);
> +    /* just return the first vcpu in the mask */
> +    target = find_next_bit((const unsigned long *) &target, 8, 0);

int * and unsigned long * don't have the same alignment requirements on 
aarch64. This may end up as a data abort on the Xen side.

IIRC, Ian has fixed a similar issue in commit 5224a733.
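The point above can be sketched outside Xen: keep the target byte in a
properly-typed unsigned long before searching it, instead of casting an
int * to unsigned long *. This is a hypothetical illustration, not Xen
code; ffs() from <strings.h> stands in for Xen's find_next_bit, and the
helper name is made up.

```c
#include <assert.h>
#include <strings.h>

/* Hypothetical sketch: copy the ITARGETSR byte into an unsigned long
 * rather than casting an int * to unsigned long *, which on aarch64
 * can hand a misaligned pointer to the bit-search helper. */
static unsigned int first_target(unsigned char itarget_byte)
{
    unsigned long mask = itarget_byte;   /* right type, right alignment */
    /* ffs() returns the 1-based index of the first set bit, 0 if none */
    return mask ? (unsigned int)ffs((int)mask) - 1 : 8;  /* 8 = no target */
}
```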

[..]

>           }
>           if ( p->desc != NULL )
>           {
> @@ -502,6 +530,7 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>       int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
>       int gicd_reg = REG(offset);
>       uint32_t tr;
> +    int i;
>
>       switch ( gicd_reg )
>       {
> @@ -585,6 +614,21 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>           rank = vgic_irq_rank(v, 8, gicd_reg - GICD_ITARGETSR);
>           if ( rank == NULL) goto write_ignore;
>           vgic_lock_rank(v, rank);
> +        tr = *r & ~(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)]);

Writes to GICD_ITARGETSR can be either half-word or word. If I'm not 
mistaken, your sanity check only handles word accesses.

Regards,
Stefano Stabellini June 9, 2014, 10:47 a.m. UTC | #2
On Sat, 7 Jun 2014, Julien Grall wrote:
> Hi Stefano,
> 
> On 06/06/14 18:48, Stefano Stabellini wrote:
> >       return 0;
> >   }
> > 
> > @@ -369,6 +377,22 @@ read_as_zero:
> >       return 1;
> >   }
> > 
> > +struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
> > +{
> > +    int target;
> > +    struct vgic_irq_rank *rank;
> > +    struct vcpu *v_target;
> > +
> > +    rank = vgic_irq_rank(v, 1, irq/32);
> > +    vgic_lock_rank(v, rank);
> > +    target = byte_read(rank->itargets[(irq%32)/4], 0, irq % 4);
> > +    /* just return the first vcpu in the mask */
> > +    target = find_next_bit((const unsigned long *) &target, 8, 0);
> 
> int* and unsigned long* doesn't have the same alignment on aarch64. This may
> end up to a data abort for Xen side.
> 
> IIRC, Ian has fixed a similar issue in commit 5224a733.
> 

Well spotted! Thanks!


> 
> >           }
> >           if ( p->desc != NULL )
> >           {
> > @@ -502,6 +530,7 @@ static int vgic_distr_mmio_write(struct vcpu *v,
> > mmio_info_t *info)
> >       int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
> >       int gicd_reg = REG(offset);
> >       uint32_t tr;
> > +    int i;
> > 
> >       switch ( gicd_reg )
> >       {
> > @@ -585,6 +614,21 @@ static int vgic_distr_mmio_write(struct vcpu *v,
> > mmio_info_t *info)
> >           rank = vgic_irq_rank(v, 8, gicd_reg - GICD_ITARGETSR);
> >           if ( rank == NULL) goto write_ignore;
> >           vgic_lock_rank(v, rank);
> > +        tr = *r & ~(rank->itargets[REG_RANK_INDEX(8, gicd_reg -
> > GICD_ITARGETSR)]);
> 
> Write in GICD_ITARGETSR can be either half-word or word. If I'm not mistaken
> you sanity check only handle word access.

I realize that it is a bit tricky to read, but this works for both word
and half-word accesses.
Julien Grall June 9, 2014, 11:37 a.m. UTC | #3
On 06/09/2014 11:47 AM, Stefano Stabellini wrote:
>> Write in GICD_ITARGETSR can be either half-word or word. If I'm not mistaken
>> you sanity check only handle word access.
> 
> I realize that it is a bit tricky to read, but this works for both word
> and half-word accesses.

I think you have to mask the unused bits in the register. We can't
assume that they will be all-zeroed.

Regards,
Stefano Stabellini June 9, 2014, 12:04 p.m. UTC | #4
On Mon, 9 Jun 2014, Julien Grall wrote:
> On 06/09/2014 11:47 AM, Stefano Stabellini wrote:
> >> Write in GICD_ITARGETSR can be either half-word or word. If I'm not mistaken
> >> you sanity check only handle word access.
> > 
> > I realize that it is a bit tricky to read, but this works for both word
> > and half-word accesses.
> 
> I think you have to mask the unused bits in the register. We can't
> assume that they will be all-zeroed.

That's a good hint, but I already pass 32 to find_next_bit as the max bit
to search for. The rest is ignored.
Julien Grall June 9, 2014, 12:32 p.m. UTC | #5
On 06/09/2014 01:04 PM, Stefano Stabellini wrote:
> On Mon, 9 Jun 2014, Julien Grall wrote:
>> On 06/09/2014 11:47 AM, Stefano Stabellini wrote:
>>>> Write in GICD_ITARGETSR can be either half-word or word. If I'm not mistaken
>>>> you sanity check only handle word access.
>>>
>>> I realize that it is a bit tricky to read, but this works for both word
>>> and half-word accesses.

Hmmm... I meant byte access, not half-word. Sorry.

>> I think you have to mask the unused bits in the register. We can't
>> assume that they will be all-zeroed.
> 
> That's a good hint but I already pass 32 to find_next_bit as max bit to
> search for. The rest is ignored.
> 

So if the byte equals 0, then the search can reach a set bit in the unused
part...
Stefano Stabellini June 9, 2014, 5:21 p.m. UTC | #6
On Mon, 9 Jun 2014, Julien Grall wrote:
> On 06/09/2014 01:04 PM, Stefano Stabellini wrote:
> > On Mon, 9 Jun 2014, Julien Grall wrote:
> >> On 06/09/2014 11:47 AM, Stefano Stabellini wrote:
> >>>> Write in GICD_ITARGETSR can be either half-word or word. If I'm not mistaken
> >>>> you sanity check only handle word access.
> >>>
> >>> I realize that it is a bit tricky to read, but this works for both word
> >>> and half-word accesses.
> 
> Hmmm... I meant byte access not half-word. Sorry.
> 
> >> I think you have to mask the unused bits in the register. We can't
> >> assume that they will be all-zeroed.
> > 
> > That's a good hint but I already pass 32 to find_next_bit as max bit to
> > search for. The rest is ignored.
> > 
> 
> So if the byte is equal 0, then you can reach a one bit in the unused
> part...
 
I'll test for that even though it is going to make the code a bit uglier.
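A minimal sketch of the zero-byte hazard discussed above and one way to
guard it (hypothetical names, ffs() standing in for find_next_bit): mask
the field down to its 8 valid bits and handle an empty mask explicitly,
so the search can never return an index taken from bits beyond the byte.

```c
#include <assert.h>
#include <strings.h>

#define VCPU_NONE 8u  /* sentinel: no valid target in the byte */

/* Hypothetical sketch: extract one 8-bit targets field from a 32-bit
 * ITARGETSR word, mask off everything else, and treat an all-zero field
 * explicitly instead of letting a wider bit search wander past bit 7. */
static unsigned int first_target_checked(unsigned long reg, unsigned int byte)
{
    unsigned long field = (reg >> (byte * 8)) & 0xffUL;  /* 8 valid bits */
    if ( field == 0 )
        return VCPU_NONE;               /* caller must handle "no target" */
    return (unsigned int)ffs((int)field) - 1;
}
```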
Ian Campbell June 10, 2014, 11:44 a.m. UTC | #7
On Fri, 2014-06-06 at 18:48 +0100, Stefano Stabellini wrote:
> vgic_enable_irqs should enable irq delivery to the vcpu specified by
> GICD_ITARGETSR, rather than the vcpu that wrote to GICD_ISENABLER.
> Similarly vgic_disable_irqs should use the target vcpu specified by
> itarget to disable irqs.
> 
> itargets can be set to a mask but vgic_get_target_vcpu always returns
> the lower vcpu in the mask.
> 
> Correctly initialize itargets for SPIs.
> 
> Validate writes to GICD_ITARGETSR.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> 
> ---
> 
> Changes in v4:
> - remove assert that could allow a guest to crash Xen;
> - add itargets validation to vgic_distr_mmio_write;
> - export vgic_get_target_vcpu.
> 
> Changes in v3:
> - add assert in get_target_vcpu;
> - rename get_target_vcpu to vgic_get_target_vcpu.
> 
> Changes in v2:
> - refactor the common code in get_target_vcpu;
> - unify PPI and SPI paths;
> - correctly initialize itargets for SPI;
> - use byte_read.
> ---
>  xen/arch/arm/vgic.c       |   60 +++++++++++++++++++++++++++++++++++++++------
>  xen/include/asm-arm/gic.h |    2 ++
>  2 files changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> index cb8df3a..e527892 100644
> --- a/xen/arch/arm/vgic.c
> +++ b/xen/arch/arm/vgic.c
> @@ -106,7 +106,15 @@ int domain_vgic_init(struct domain *d)
>          INIT_LIST_HEAD(&d->arch.vgic.pending_irqs[i].lr_queue);
>      }
>      for (i=0; i<DOMAIN_NR_RANKS(d); i++)
> +    {
> +        int j;
> +
>          spin_lock_init(&d->arch.vgic.shared_irqs[i].lock);
> +        /* Only delivery to CPU0 */

s/delivery/deliver/.

And I think you should prefix it with "By default..."

> +        for ( j = 0 ; j < 8 ; j++ )
> +            d->arch.vgic.shared_irqs[i].itargets[j] =
> +                (1<<0) | (1<<8) | (1<<16) | (1<<24);

Since these are bytes I think you could do:
        memset(d->arch.vgic.shared_irqs[i].itargets, 0x1, sizeof(...))

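As a quick sanity check of the suggestion (plain C, not Xen code; the
helper name is made up): memset() writes per byte, so filling the
itargets words with 0x1 yields 0x01010101 in each 32-bit word on any
endianness, matching the explicit shift expression in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: memset() the rank's 8 itargets words to 0x01 per byte; each
 * word then equals (1<<0)|(1<<8)|(1<<16)|(1<<24), independent of
 * endianness, because memset operates byte-wise. */
static void init_itargets(uint32_t itargets[8])
{
    memset(itargets, 0x1, 8 * sizeof(uint32_t));
}
```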
> +    }
>      return 0;
>  }
>  
> @@ -369,6 +377,22 @@ read_as_zero:
>      return 1;
>  }
>  
> +struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
> +{
> +    int target;
> +    struct vgic_irq_rank *rank;
> +    struct vcpu *v_target;
> +
> +    rank = vgic_irq_rank(v, 1, irq/32);

Please can you do what vgic_vcpu_inject_irq does to look up the rank. Or
even better, add a helper function which goes from an actual irq number
to the rank instead of from a register offset to a rank (which is what
vgic_irq_rank actually is), i.e. vgic_irq_rank_from_gicd_offset or
something.

> +    vgic_lock_rank(v, rank);
> +    target = byte_read(rank->itargets[(irq%32)/4], 0, irq % 4);
> +    /* just return the first vcpu in the mask */

Is this a valid model for delivering an interrupt? I couldn't find
anything explicit in the GIC spec saying that it is valid for the
implementation to arbitrarily deliver to a single CPU.

The stuff in there about the 1-N case only deals with what happens when
one processor of the many does the IAR and what the others then see
(spurious IRQ).

To be clear: I don't object to this implementation but I think it should
either be backed up by reference to the spec or it should be explicitly
mentioned in a comment how/where/why we deviate from it.

> +    target = find_next_bit((const unsigned long *) &target, 8, 0);
> +    v_target = v->domain->vcpu[target];
> +    vgic_unlock_rank(v, rank);
> +    return v_target;
> +}
> +
>  static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
>  {
>      const unsigned long mask = r;
> @@ -376,12 +400,14 @@ static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
>      unsigned int irq;
>      unsigned long flags;
>      int i = 0;
> +    struct vcpu *v_target;
>  
>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>          irq = i + (32 * n);
> -        p = irq_to_pending(v, irq);
> +        v_target = vgic_get_target_vcpu(v, irq);
> +        p = irq_to_pending(v_target, irq);
>          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> -        gic_remove_from_queues(v, irq);
> +        gic_remove_from_queues(v_target, irq);

What locks are required to poke at another vcpu's state in this way? You
don't seem to take any locks relating to v_target, and I don't think any
of the callers take any global lock e.g. you have already dropped the
rank's lock in the caller (although I confess I'm looking at the current
tree not one with all your patches applied).

WRT dropping the rank lock -- shouldn't this be inside that anyway to
handle races between different vcpus enabling/disabling a single IRQ?

Which also needs care when v==v_target if you already hold any locks
relating to v.

>          if ( p->desc != NULL )
>          {
>              spin_lock_irqsave(&p->desc->lock, flags);
> @@ -399,24 +425,26 @@ static void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
>      unsigned int irq;
>      unsigned long flags;
>      int i = 0;
> +    struct vcpu *v_target;
>  
>      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
>          irq = i + (32 * n);
> -        p = irq_to_pending(v, irq);
> +        v_target = vgic_get_target_vcpu(v, irq);

Locking for some of this too (although I see in one case you do take the
v_target's gic lock).

> @@ -502,6 +530,7 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>      int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
>      int gicd_reg = REG(offset);
>      uint32_t tr;
> +    int i;
>  
>      switch ( gicd_reg )
>      {
> @@ -585,6 +614,21 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
>          rank = vgic_irq_rank(v, 8, gicd_reg - GICD_ITARGETSR);
>          if ( rank == NULL) goto write_ignore;
>          vgic_lock_rank(v, rank);
> +        tr = *r & ~(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)]);
> +        i = 0;
> +        /* validate writes */
> +        while ( (i = find_next_bit((const unsigned long *) &tr, 32, i)) < 32 )
> +        {
> +            unsigned int target = i % 8;
> +            if ( target > v->domain->max_vcpus )
> +            {
> +                gdprintk(XENLOG_WARNING, "vGICD: GICD_ITARGETSR write invalid target vcpu %u\n",
> +                         target);

The spec says:
        A CPU targets field bit that corresponds to an unimplemented CPU
        interface is RAZ/WI.

So I think you can just implement the write with the existing code and
then mask the result in rank->itargets[] to clear any invalid CPUs,
which will be a lot simpler than this I think.

Ian.
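The RAZ/WI approach suggested above can be sketched as a pure function
(hypothetical helper, not the eventual patch): build a per-byte mask with
one bit per implemented vcpu, replicate it across the word, and AND it
into the stored value, so bits for unimplemented CPU interfaces are
silently dropped rather than the whole write being rejected.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the RAZ/WI approach: accept the ITARGETSR write
 * as-is, then clear target bits for unimplemented vcpus (nr_vcpus <= 8). */
static uint32_t itargets_store(uint32_t val, unsigned int nr_vcpus)
{
    uint8_t cpu_mask = (uint8_t)((1u << nr_vcpus) - 1); /* bit per vcpu */
    uint32_t word_mask = cpu_mask * 0x01010101u;  /* replicate to 4 bytes */
    return val & word_mask;
}
```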
Stefano Stabellini June 11, 2014, 11:54 a.m. UTC | #8
On Tue, 10 Jun 2014, Ian Campbell wrote:
> On Fri, 2014-06-06 at 18:48 +0100, Stefano Stabellini wrote:
> > vgic_enable_irqs should enable irq delivery to the vcpu specified by
> > GICD_ITARGETSR, rather than the vcpu that wrote to GICD_ISENABLER.
> > Similarly vgic_disable_irqs should use the target vcpu specified by
> > itarget to disable irqs.
> > 
> > itargets can be set to a mask but vgic_get_target_vcpu always returns
> > the lower vcpu in the mask.
> > 
> > Correctly initialize itargets for SPIs.
> > 
> > Validate writes to GICD_ITARGETSR.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > 
> > ---
> > 
> > Changes in v4:
> > - remove assert that could allow a guest to crash Xen;
> > - add itargets validation to vgic_distr_mmio_write;
> > - export vgic_get_target_vcpu.
> > 
> > Changes in v3:
> > - add assert in get_target_vcpu;
> > - rename get_target_vcpu to vgic_get_target_vcpu.
> > 
> > Changes in v2:
> > - refactor the common code in get_target_vcpu;
> > - unify PPI and SPI paths;
> > - correctly initialize itargets for SPI;
> > - use byte_read.
> > ---
> >  xen/arch/arm/vgic.c       |   60 +++++++++++++++++++++++++++++++++++++++------
> >  xen/include/asm-arm/gic.h |    2 ++
> >  2 files changed, 54 insertions(+), 8 deletions(-)
> > 
> > diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
> > index cb8df3a..e527892 100644
> > --- a/xen/arch/arm/vgic.c
> > +++ b/xen/arch/arm/vgic.c
> > @@ -106,7 +106,15 @@ int domain_vgic_init(struct domain *d)
> >          INIT_LIST_HEAD(&d->arch.vgic.pending_irqs[i].lr_queue);
> >      }
> >      for (i=0; i<DOMAIN_NR_RANKS(d); i++)
> > +    {
> > +        int j;
> > +
> >          spin_lock_init(&d->arch.vgic.shared_irqs[i].lock);
> > +        /* Only delivery to CPU0 */
> 
> s/delivery/deliver/.
> 
> And I think you should prefix it with "By default..."

OK


> > +        for ( j = 0 ; j < 8 ; j++ )
> > +            d->arch.vgic.shared_irqs[i].itargets[j] =
> > +                (1<<0) | (1<<8) | (1<<16) | (1<<24);
> 
> Since these are bytes I think you could do:
>         memset(d->arch.vgic.shared_irqs[i].itargets, 0x1, sizeof(...))

OK


> > +    }
> >      return 0;
> >  }
> >  
> > @@ -369,6 +377,22 @@ read_as_zero:
> >      return 1;
> >  }
> >  
> > +struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
> > +{
> > +    int target;
> > +    struct vgic_irq_rank *rank;
> > +    struct vcpu *v_target;
> > +
> > +    rank = vgic_irq_rank(v, 1, irq/32);
> 
> Please can you do what vgic_vcpu_inject_irq does to look up the rank. Or
> even better add a helper function which goes from an actual irq number
> to the rank instead from a register offset to a bank (which is what
> vgic_irq_rank actually is, i.e. vigc_irq_rank_from_gicd_offset or
> something)

I'll add a couple of patches to rename vgic_irq_rank and add a new helper
function.


> > +    vgic_lock_rank(v, rank);
> > +    target = byte_read(rank->itargets[(irq%32)/4], 0, irq % 4);
> > +    /* just return the first vcpu in the mask */
> 
> Is this a valid model for delivering an interrupt? I couldn't find
> anything explicit in the GIC spec saying that it is valid for the
> implementation to arbitrarily deliver to a single CPU.
> 
> The stuff in there about the 1-N case only deals with what happens when
> one processor of the many does the IAR and what the others then see
> (spurious IRQ).
> 
> To be clear: I don't object to this implementation but I think it should
> either be backed up by reference to the spec or it should be explicitly
> mentioned in a comment how/where/why we deviate from it.

I evaluated the possibility of following the spec to the letter but it
would be far too slow in a virtual environment. I'll improve the
comment.


> > +    target = find_next_bit((const unsigned long *) &target, 8, 0);
> > +    v_target = v->domain->vcpu[target];
> > +    vgic_unlock_rank(v, rank);
> > +    return v_target;
> > +}
> > +
> >  static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
> >  {
> >      const unsigned long mask = r;
> > @@ -376,12 +400,14 @@ static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
> >      unsigned int irq;
> >      unsigned long flags;
> >      int i = 0;
> > +    struct vcpu *v_target;
> >  
> >      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
> >          irq = i + (32 * n);
> > -        p = irq_to_pending(v, irq);
> > +        v_target = vgic_get_target_vcpu(v, irq);
> > +        p = irq_to_pending(v_target, irq);
> >          clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
> > -        gic_remove_from_queues(v, irq);
> > +        gic_remove_from_queues(v_target, irq);
> 
> What locks are required to poke at another vcpu's state in this way? You
> don't seem to take any locks relating to v_target, and I don't think any
> of the callers take any global lock e.g. you have already dropped the
> rank's lock in the caller (although I confess I'm looking at the current
> tree not one with all your patches applied).
> 
> WRT dropping the rank lock -- shouldn't this be inside that anyway to
> handle races between different vcpus enabling/disabling a single IRQ?
> 
> Which also needs care when v==v_target if you already hold any locks
> relating to v.

gic_remove_from_queues takes the vgic lock and clearing
GIC_IRQ_GUEST_ENABLED is an atomic operation.
But you are right: they need to be executed atomically.
I'll keep the rank lock for the duration of the operation.


> >          if ( p->desc != NULL )
> >          {
> >              spin_lock_irqsave(&p->desc->lock, flags);
> > @@ -399,24 +425,26 @@ static void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
> >      unsigned int irq;
> >      unsigned long flags;
> >      int i = 0;
> > +    struct vcpu *v_target;
> >  
> >      while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
> >          irq = i + (32 * n);
> > -        p = irq_to_pending(v, irq);
> > +        v_target = vgic_get_target_vcpu(v, irq);
> 
> Locking for some of this too (although I see in one case you do take the
> v_target's gic lock).
> 
> > @@ -502,6 +530,7 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
> >      int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
> >      int gicd_reg = REG(offset);
> >      uint32_t tr;
> > +    int i;
> >  
> >      switch ( gicd_reg )
> >      {
> > @@ -585,6 +614,21 @@ static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
> >          rank = vgic_irq_rank(v, 8, gicd_reg - GICD_ITARGETSR);
> >          if ( rank == NULL) goto write_ignore;
> >          vgic_lock_rank(v, rank);
> > +        tr = *r & ~(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)]);
> > +        i = 0;
> > +        /* validate writes */
> > +        while ( (i = find_next_bit((const unsigned long *) &tr, 32, i)) < 32 )
> > +        {
> > +            unsigned int target = i % 8;
> > +            if ( target > v->domain->max_vcpus )
> > +            {
> > +                gdprintk(XENLOG_WARNING, "vGICD: GICD_ITARGETSR write invalid target vcpu %u\n",
> > +                         target);
> 
> The spec says:
>         A CPU targets field bit that corresponds to an unimplemented CPU
>         interface is RAZ/WI.
> 
> So I think you can just implement the write with the existing code and
> then mask the result in rank->itargets[] to clear any invalid CPUs,
> which will be a lot simpler than this I think.

OK

Patch

diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index cb8df3a..e527892 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -106,7 +106,15 @@  int domain_vgic_init(struct domain *d)
         INIT_LIST_HEAD(&d->arch.vgic.pending_irqs[i].lr_queue);
     }
     for (i=0; i<DOMAIN_NR_RANKS(d); i++)
+    {
+        int j;
+
         spin_lock_init(&d->arch.vgic.shared_irqs[i].lock);
+        /* Only delivery to CPU0 */
+        for ( j = 0 ; j < 8 ; j++ )
+            d->arch.vgic.shared_irqs[i].itargets[j] =
+                (1<<0) | (1<<8) | (1<<16) | (1<<24);
+    }
     return 0;
 }
 
@@ -369,6 +377,22 @@  read_as_zero:
     return 1;
 }
 
+struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq)
+{
+    int target;
+    struct vgic_irq_rank *rank;
+    struct vcpu *v_target;
+
+    rank = vgic_irq_rank(v, 1, irq/32);
+    vgic_lock_rank(v, rank);
+    target = byte_read(rank->itargets[(irq%32)/4], 0, irq % 4);
+    /* just return the first vcpu in the mask */
+    target = find_next_bit((const unsigned long *) &target, 8, 0);
+    v_target = v->domain->vcpu[target];
+    vgic_unlock_rank(v, rank);
+    return v_target;
+}
+
 static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
 {
     const unsigned long mask = r;
@@ -376,12 +400,14 @@  static void vgic_disable_irqs(struct vcpu *v, uint32_t r, int n)
     unsigned int irq;
     unsigned long flags;
     int i = 0;
+    struct vcpu *v_target;
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
-        p = irq_to_pending(v, irq);
+        v_target = vgic_get_target_vcpu(v, irq);
+        p = irq_to_pending(v_target, irq);
         clear_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
-        gic_remove_from_queues(v, irq);
+        gic_remove_from_queues(v_target, irq);
         if ( p->desc != NULL )
         {
             spin_lock_irqsave(&p->desc->lock, flags);
@@ -399,24 +425,26 @@  static void vgic_enable_irqs(struct vcpu *v, uint32_t r, int n)
     unsigned int irq;
     unsigned long flags;
     int i = 0;
+    struct vcpu *v_target;
 
     while ( (i = find_next_bit(&mask, 32, i)) < 32 ) {
         irq = i + (32 * n);
-        p = irq_to_pending(v, irq);
+        v_target = vgic_get_target_vcpu(v, irq);
+        p = irq_to_pending(v_target, irq);
         set_bit(GIC_IRQ_GUEST_ENABLED, &p->status);
         /* We need to force the first injection of evtchn_irq because
          * evtchn_upcall_pending is already set by common code on vcpu
          * creation. */
-        if ( irq == v->domain->arch.evtchn_irq &&
+        if ( irq == v_target->domain->arch.evtchn_irq &&
              vcpu_info(current, evtchn_upcall_pending) &&
              list_empty(&p->inflight) )
-            vgic_vcpu_inject_irq(v, irq);
+            vgic_vcpu_inject_irq(v_target, irq);
         else {
             unsigned long flags;
-            spin_lock_irqsave(&v->arch.vgic.lock, flags);
+            spin_lock_irqsave(&v_target->arch.vgic.lock, flags);
             if ( !list_empty(&p->inflight) && !test_bit(GIC_IRQ_GUEST_VISIBLE, &p->status) )
-                gic_raise_guest_irq(v, irq, p->priority);
-            spin_unlock_irqrestore(&v->arch.vgic.lock, flags);
+                gic_raise_guest_irq(v_target, irq, p->priority);
+            spin_unlock_irqrestore(&v_target->arch.vgic.lock, flags);
         }
         if ( p->desc != NULL )
         {
@@ -502,6 +530,7 @@  static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
     int offset = (int)(info->gpa - v->domain->arch.vgic.dbase);
     int gicd_reg = REG(offset);
     uint32_t tr;
+    int i;
 
     switch ( gicd_reg )
     {
@@ -585,6 +614,21 @@  static int vgic_distr_mmio_write(struct vcpu *v, mmio_info_t *info)
         rank = vgic_irq_rank(v, 8, gicd_reg - GICD_ITARGETSR);
         if ( rank == NULL) goto write_ignore;
         vgic_lock_rank(v, rank);
+        tr = *r & ~(rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)]);
+        i = 0;
+        /* validate writes */
+        while ( (i = find_next_bit((const unsigned long *) &tr, 32, i)) < 32 )
+        {
+            unsigned int target = i % 8;
+            if ( target > v->domain->max_vcpus )
+            {
+                gdprintk(XENLOG_WARNING, "vGICD: GICD_ITARGETSR write invalid target vcpu %u\n",
+                         target);
+                vgic_unlock_rank(v, rank);
+                return 1;
+            }
+            i++;
+        }
         if ( dabt.size == 2 )
             rank->itargets[REG_RANK_INDEX(8, gicd_reg - GICD_ITARGETSR)] = *r;
         else
diff --git a/xen/include/asm-arm/gic.h b/xen/include/asm-arm/gic.h
index bf6fb1e..bd40628 100644
--- a/xen/include/asm-arm/gic.h
+++ b/xen/include/asm-arm/gic.h
@@ -227,6 +227,8 @@  int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
                   unsigned int *out_hwirq, unsigned int *out_type);
 void gic_clear_lrs(struct vcpu *v);
 
+struct vcpu *vgic_get_target_vcpu(struct vcpu *v, unsigned int irq);
+
 #endif /* __ASSEMBLY__ */
 #endif