diff mbox

[1/3] arm/arm64: KVM: Fix arch timer behavior for disabled interrupts

Message ID 1445113822-7831-2-git-send-email-christoffer.dall@linaro.org
State New
Headers show

Commit Message

Christoffer Dall Oct. 17, 2015, 8:30 p.m. UTC
We have an interesting issue when the guest disables the timer interrupt
on the VGIC, which happens when turning VCPUs off using PSCI, for
example.

The problem is that because the guest disables the virtual interrupt at
the VGIC level, we never inject interrupts to the guest and therefore
never mark the interrupt as active on the physical distributor.  The
host also never takes the timer interrupt (we only use the timer device
to trigger a guest exit and everything else is done in software), so the
interrupt does not become active through normal means.

The result is that we keep entering the guest with a programmed timer
that will always fire as soon as we context switch the hardware timer
state and run the guest, preventing forward progress for the VCPU.

Since the active state on the physical distributor is really part of the
timer logic, it is the job of our virtual arch timer driver to manage
this state.

The timer->map->active boolean field indicates whether we have signalled
this interrupt to the vgic and if that interrupt is still pending or
active.  As long as that is the case, the hardware doesn't have to
generate physical interrupts and therefore we mark the interrupt as
active on the physical distributor.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++
 virt/kvm/arm/vgic.c       | 43 +++++++++++--------------------------------
 2 files changed, 30 insertions(+), 32 deletions(-)

Comments

Christoffer Dall Oct. 17, 2015, 9:50 p.m. UTC | #1
On Sat, Oct 17, 2015 at 10:30:20PM +0200, Christoffer Dall wrote:
> We have an interesting issue when the guest disables the timer interrupt
> on the VGIC, which happens when turning VCPUs off using PSCI, for
> example.
> 
> The problem is that because the guest disables the virtual interrupt at
> the VGIC level, we never inject interrupts to the guest and therefore
> never mark the interrupt as active on the physical distributor.  The
> host also never takes the timer interrupt (we only use the timer device
> to trigger a guest exit and everything else is done in software), so the
> interrupt does not become active through normal means.
> 
> The result is that we keep entering the guest with a programmed timer
> that will always fire as soon as we context switch the hardware timer
> state and run the guest, preventing forward progress for the VCPU.
> 
> Since the active state on the physical distributor is really part of the
> timer logic, it is the job of our virtual arch timer driver to manage
> this state.
> 
> The timer->map->active boolean field indicates whether we have signalled
> this interrupt to the vgic and if that interrupt is still pending or
> active.  As long as that is the case, the hardware doesn't have to
> generate physical interrupts and therefore we mark the interrupt as
> active on the physical distributor.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
Marc was worried about the performance implications of this fix on
Mustang given the potentially slow MMIO path to the GIC on that system,
so I ran some before and after applying this series:

BM	Hackbench	Kernbench	PbZip C		PbZip D
--	---------	---------	-------		-------
Before	17.94		51.66		17.69		10.59
After	18.14		51.62		17.82		10.62

The slight increase on hackbench is well within the variability (1.409
for the 8 runs behind these numbers) so I don't think this will be
noticable.  That said, there's room for optimizations here by only
touching the GIC on vcpu load/put and when the value changes, but I
think this is premature.

-Christoffer
Auger Eric Oct. 19, 2015, 1:07 p.m. UTC | #2
Hi Christoffer,
On 10/17/2015 10:30 PM, Christoffer Dall wrote:
> We have an interesting issue when the guest disables the timer interrupt
> on the VGIC, which happens when turning VCPUs off using PSCI, for
> example.
> 
> The problem is that because the guest disables the virtual interrupt at
> the VGIC level, we never inject interrupts to the guest and therefore
> never mark the interrupt as active on the physical distributor.  The
> host also never takes the timer interrupt (we only use the timer device
> to trigger a guest exit and everything else is done in software), so the
> interrupt does not become active through normal means.
> 
> The result is that we keep entering the guest with a programmed timer
> that will always fire as soon as we context switch the hardware timer
> state and run the guest, preventing forward progress for the VCPU.
> 
> Since the active state on the physical distributor is really part of the
> timer logic, it is the job of our virtual arch timer driver to manage
> this state.
> 
> The timer->map->active boolean field indicates whether we have signalled
> this interrupt to the vgic and if that interrupt is still pending or
> active.  As long as that is the case, the hardware doesn't have to
> generate physical interrupts and therefore we mark the interrupt as
> active on the physical distributor.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++
>  virt/kvm/arm/vgic.c       | 43 +++++++++++--------------------------------
>  2 files changed, 30 insertions(+), 32 deletions(-)
> 
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 48c6e1a..b9d3a32 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  {
>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> +	bool phys_active;
> +	int ret;
>  
>  	/*
>  	 * We're about to run this vcpu again, so there is no need to
> @@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>  	 */
>  	if (kvm_timer_should_fire(vcpu))
>  		kvm_timer_inject_irq(vcpu);
> +
> +	/*
> +	 * We keep track of whether the edge-triggered interrupt has been
> +	 * signalled to the vgic/guest, and if so, we mask the interrupt and
> +	 * the physical distributor to prevent the timer from raising a
> +	 * physical interrupt whenever we run a guest, preventing forward
> +	 * VCPU progress.
In practice don't you simply mark the IRQ as active at GIC physical
distributor level, hence preventing the same IRQ from hitting again
> +	 */
> +	if (kvm_vgic_get_phys_irq_active(timer->map))
> +		phys_active = true;
> +	else
> +		phys_active = false;
> +
> +	ret = irq_set_irqchip_state(timer->map->irq,
> +				    IRQCHIP_STATE_ACTIVE,
> +				    phys_active);

physical distributor state is set in arch timer flush. It relates to a
shared device behavior so I find it natural to do it there.

However the map->active is set in arch_timer IRQ injection and unset in
vgic sync. Why not doing the set in kvm_vgic_inject_mapped_irq?

> +	WARN_ON(ret);
>  }
>  
>  /**
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 596455a..ea21bc2 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1092,6 +1092,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
>  
> +	/*
> +	 * We must transfer the pending state back to the distributor before
> +	 * retiring the LR, otherwise we may loose edge-triggered interrupts.
> +	 */
> +	if (vlr.state & LR_STATE_PENDING) {
> +		vgic_dist_irq_set_pending(vcpu, irq);
> +		vlr.hwirq = 0;
> +	}
That fix applies to any edge-sensitive IRQ, ie. not especially the
timer's one? In the positive shouldn't you precise this in the commit
msg too?

Best Regards

Eric

> +
>  	vlr.state = 0;
>  	vgic_set_lr(vcpu, lr_nr, vlr);
>  	clear_bit(lr_nr, vgic_cpu->lr_used);
> @@ -1241,7 +1250,7 @@ static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>  	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
>  	unsigned long *pa_percpu, *pa_shared;
> -	int i, vcpu_id, lr, ret;
> +	int i, vcpu_id;
>  	int overflow = 0;
>  	int nr_shared = vgic_nr_shared_irqs(dist);
>  
> @@ -1296,31 +1305,6 @@ epilog:
>  		 */
>  		clear_bit(vcpu_id, dist->irq_pending_on_cpu);
>  	}
> -
> -	for (lr = 0; lr < vgic->nr_lr; lr++) {
> -		struct vgic_lr vlr;
> -
> -		if (!test_bit(lr, vgic_cpu->lr_used))
> -			continue;
> -
> -		vlr = vgic_get_lr(vcpu, lr);
> -
> -		/*
> -		 * If we have a mapping, and the virtual interrupt is
> -		 * presented to the guest (as pending or active), then we must
> -		 * set the state to active in the physical world. See
> -		 * Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt.
> -		 */
> -		if (vlr.state & LR_HW) {
> -			struct irq_phys_map *map;
> -			map = vgic_irq_map_search(vcpu, vlr.irq);
> -
> -			ret = irq_set_irqchip_state(map->irq,
> -						    IRQCHIP_STATE_ACTIVE,
> -						    true);
> -			WARN_ON(ret);
> -		}
> -	}
>  }
>  
>  static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
> @@ -1430,13 +1414,8 @@ static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
>  
>  	WARN_ON(ret);
>  
> -	if (map->active) {
> -		ret = irq_set_irqchip_state(map->irq,
> -					    IRQCHIP_STATE_ACTIVE,
> -					    false);
> -		WARN_ON(ret);
> +	if (map->active)
>  		return 0;
> -	}
>  
>  	return 1;
>  }
>
Christoffer Dall Oct. 19, 2015, 1:14 p.m. UTC | #3
On Mon, Oct 19, 2015 at 03:07:16PM +0200, Eric Auger wrote:
> Hi Christoffer,
> On 10/17/2015 10:30 PM, Christoffer Dall wrote:
> > We have an interesting issue when the guest disables the timer interrupt
> > on the VGIC, which happens when turning VCPUs off using PSCI, for
> > example.
> > 
> > The problem is that because the guest disables the virtual interrupt at
> > the VGIC level, we never inject interrupts to the guest and therefore
> > never mark the interrupt as active on the physical distributor.  The
> > host also never takes the timer interrupt (we only use the timer device
> > to trigger a guest exit and everything else is done in software), so the
> > interrupt does not become active through normal means.
> > 
> > The result is that we keep entering the guest with a programmed timer
> > that will always fire as soon as we context switch the hardware timer
> > state and run the guest, preventing forward progress for the VCPU.
> > 
> > Since the active state on the physical distributor is really part of the
> > timer logic, it is the job of our virtual arch timer driver to manage
> > this state.
> > 
> > The timer->map->active boolean field indicates whether we have signalled
> > this interrupt to the vgic and if that interrupt is still pending or
> > active.  As long as that is the case, the hardware doesn't have to
> > generate physical interrupts and therefore we mark the interrupt as
> > active on the physical distributor.
> > 
> > Cc: Marc Zyngier <marc.zyngier@arm.com>
> > Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++
> >  virt/kvm/arm/vgic.c       | 43 +++++++++++--------------------------------
> >  2 files changed, 30 insertions(+), 32 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 48c6e1a..b9d3a32 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  {
> >  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> > +	bool phys_active;
> > +	int ret;
> >  
> >  	/*
> >  	 * We're about to run this vcpu again, so there is no need to
> > @@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	 */
> >  	if (kvm_timer_should_fire(vcpu))
> >  		kvm_timer_inject_irq(vcpu);
> > +
> > +	/*
> > +	 * We keep track of whether the edge-triggered interrupt has been
> > +	 * signalled to the vgic/guest, and if so, we mask the interrupt and
> > +	 * the physical distributor to prevent the timer from raising a
> > +	 * physical interrupt whenever we run a guest, preventing forward
> > +	 * VCPU progress.
> In practice don't you simply mark the IRQ as active at GIC physical
> distributor level, hence preventing the same IRQ from hitting again

yes, that's what I meant with my comment, I should reword to "...we mark
the interrupt as active on the physical distributor..."

> > +	 */
> > +	if (kvm_vgic_get_phys_irq_active(timer->map))
> > +		phys_active = true;
> > +	else
> > +		phys_active = false;
> > +
> > +	ret = irq_set_irqchip_state(timer->map->irq,
> > +				    IRQCHIP_STATE_ACTIVE,
> > +				    phys_active);
> 
> physical distributor state is set in arch timer flush. It relates to a
> shared device behavior so I find it natural to do it there.
> 
> However the map->active is set in arch_timer IRQ injection and unset in
> vgic sync. Why not doing the set in kvm_vgic_inject_mapped_irq?

Because you have to set it at every entry to the guest if you run
multiple VCPUs/VMs on this CPU or migrate this VCPU to a different CPU.

> 
> > +	WARN_ON(ret);
> >  }
> >  
> >  /**
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index 596455a..ea21bc2 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1092,6 +1092,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
> >  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
> >  	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
> >  
> > +	/*
> > +	 * We must transfer the pending state back to the distributor before
> > +	 * retiring the LR, otherwise we may loose edge-triggered interrupts.
> > +	 */
> > +	if (vlr.state & LR_STATE_PENDING) {
> > +		vgic_dist_irq_set_pending(vcpu, irq);
> > +		vlr.hwirq = 0;
> > +	}
> That fix applies to any edge-sensitive IRQ, ie. not especially the
> timer's one? In the positive shouldn't you precise this in the commit
> msg too?
> 

Probably, it could also be a separate patch.  I'll rework this.

Thanks,
-Christoffer
Auger Eric Oct. 19, 2015, 1:27 p.m. UTC | #4
On 10/19/2015 03:14 PM, Christoffer Dall wrote:
> On Mon, Oct 19, 2015 at 03:07:16PM +0200, Eric Auger wrote:
>> Hi Christoffer,
>> On 10/17/2015 10:30 PM, Christoffer Dall wrote:
>>> We have an interesting issue when the guest disables the timer interrupt
>>> on the VGIC, which happens when turning VCPUs off using PSCI, for
>>> example.
>>>
>>> The problem is that because the guest disables the virtual interrupt at
>>> the VGIC level, we never inject interrupts to the guest and therefore
>>> never mark the interrupt as active on the physical distributor.  The
>>> host also never takes the timer interrupt (we only use the timer device
>>> to trigger a guest exit and everything else is done in software), so the
>>> interrupt does not become active through normal means.
>>>
>>> The result is that we keep entering the guest with a programmed timer
>>> that will always fire as soon as we context switch the hardware timer
>>> state and run the guest, preventing forward progress for the VCPU.
>>>
>>> Since the active state on the physical distributor is really part of the
>>> timer logic, it is the job of our virtual arch timer driver to manage
>>> this state.
>>>
>>> The timer->map->active boolean field indicates whether we have signalled
>>> this interrupt to the vgic and if that interrupt is still pending or
>>> active.  As long as that is the case, the hardware doesn't have to
>>> generate physical interrupts and therefore we mark the interrupt as
>>> active on the physical distributor.
>>>
>>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++
>>>  virt/kvm/arm/vgic.c       | 43 +++++++++++--------------------------------
>>>  2 files changed, 30 insertions(+), 32 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index 48c6e1a..b9d3a32 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
>>>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>>>  {
>>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
>>> +	bool phys_active;
>>> +	int ret;
>>>  
>>>  	/*
>>>  	 * We're about to run this vcpu again, so there is no need to
>>> @@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
>>>  	 */
>>>  	if (kvm_timer_should_fire(vcpu))
>>>  		kvm_timer_inject_irq(vcpu);
>>> +
>>> +	/*
>>> +	 * We keep track of whether the edge-triggered interrupt has been
>>> +	 * signalled to the vgic/guest, and if so, we mask the interrupt and
>>> +	 * the physical distributor to prevent the timer from raising a
>>> +	 * physical interrupt whenever we run a guest, preventing forward
>>> +	 * VCPU progress.
>> In practice don't you simply mark the IRQ as active at GIC physical
>> distributor level, hence preventing the same IRQ from hitting again
> 
> yes, that's what I meant with my comment, I should reword to "...we mark
> the interrupt as active on the physical distributor..."
> 
>>> +	 */
>>> +	if (kvm_vgic_get_phys_irq_active(timer->map))
>>> +		phys_active = true;
>>> +	else
>>> +		phys_active = false;
>>> +
>>> +	ret = irq_set_irqchip_state(timer->map->irq,
>>> +				    IRQCHIP_STATE_ACTIVE,
>>> +				    phys_active);
>>
>> physical distributor state is set in arch timer flush. It relates to a
>> shared device behavior so I find it natural to do it there.
>>
>> However the map->active is set in arch_timer IRQ injection and unset in
>> vgic sync. Why not doing the set in kvm_vgic_inject_mapped_irq?
> 
> Because you have to set it at every entry to the guest if you run
> multiple VCPUs/VMs on this CPU or migrate this VCPU to a different CPU.
I meant 	kvm_vgic_set_phys_irq_active(timer->map, true) call in
kvm_timer_inject_irq? Couldn' that been done in
kvm_vgic_inject_mapped_irq instead. Doesn't this apply to all mapped IRQs?

Eric
> 
>>
>>> +	WARN_ON(ret);
>>>  }
>>>  
>>>  /**
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index 596455a..ea21bc2 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1092,6 +1092,15 @@ static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
>>>  	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
>>>  	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
>>>  
>>> +	/*
>>> +	 * We must transfer the pending state back to the distributor before
>>> +	 * retiring the LR, otherwise we may loose edge-triggered interrupts.
>>> +	 */
>>> +	if (vlr.state & LR_STATE_PENDING) {
>>> +		vgic_dist_irq_set_pending(vcpu, irq);
>>> +		vlr.hwirq = 0;
>>> +	}
>> That fix applies to any edge-sensitive IRQ, ie. not especially the
>> timer's one? In the positive shouldn't you precise this in the commit
>> msg too?
>>
> 
> Probably, it could also be a separate patch.  I'll rework this.
> 
> Thanks,
> -Christoffer
>
Christoffer Dall Oct. 19, 2015, 1:38 p.m. UTC | #5
On Mon, Oct 19, 2015 at 03:27:42PM +0200, Eric Auger wrote:
> On 10/19/2015 03:14 PM, Christoffer Dall wrote:
> > On Mon, Oct 19, 2015 at 03:07:16PM +0200, Eric Auger wrote:
> >> Hi Christoffer,
> >> On 10/17/2015 10:30 PM, Christoffer Dall wrote:
> >>> We have an interesting issue when the guest disables the timer interrupt
> >>> on the VGIC, which happens when turning VCPUs off using PSCI, for
> >>> example.
> >>>
> >>> The problem is that because the guest disables the virtual interrupt at
> >>> the VGIC level, we never inject interrupts to the guest and therefore
> >>> never mark the interrupt as active on the physical distributor.  The
> >>> host also never takes the timer interrupt (we only use the timer device
> >>> to trigger a guest exit and everything else is done in software), so the
> >>> interrupt does not become active through normal means.
> >>>
> >>> The result is that we keep entering the guest with a programmed timer
> >>> that will always fire as soon as we context switch the hardware timer
> >>> state and run the guest, preventing forward progress for the VCPU.
> >>>
> >>> Since the active state on the physical distributor is really part of the
> >>> timer logic, it is the job of our virtual arch timer driver to manage
> >>> this state.
> >>>
> >>> The timer->map->active boolean field indicates whether we have signalled
> >>> this interrupt to the vgic and if that interrupt is still pending or
> >>> active.  As long as that is the case, the hardware doesn't have to
> >>> generate physical interrupts and therefore we mark the interrupt as
> >>> active on the physical distributor.
> >>>
> >>> Cc: Marc Zyngier <marc.zyngier@arm.com>
> >>> Reported-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  virt/kvm/arm/arch_timer.c | 19 +++++++++++++++++++
> >>>  virt/kvm/arm/vgic.c       | 43 +++++++++++--------------------------------
> >>>  2 files changed, 30 insertions(+), 32 deletions(-)
> >>>
> >>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> >>> index 48c6e1a..b9d3a32 100644
> >>> --- a/virt/kvm/arm/arch_timer.c
> >>> +++ b/virt/kvm/arm/arch_timer.c
> >>> @@ -137,6 +137,8 @@ bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
> >>>  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >>>  {
> >>>  	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
> >>> +	bool phys_active;
> >>> +	int ret;
> >>>  
> >>>  	/*
> >>>  	 * We're about to run this vcpu again, so there is no need to
> >>> @@ -151,6 +153,23 @@ void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
> >>>  	 */
> >>>  	if (kvm_timer_should_fire(vcpu))
> >>>  		kvm_timer_inject_irq(vcpu);
> >>> +
> >>> +	/*
> >>> +	 * We keep track of whether the edge-triggered interrupt has been
> >>> +	 * signalled to the vgic/guest, and if so, we mask the interrupt and
> >>> +	 * the physical distributor to prevent the timer from raising a
> >>> +	 * physical interrupt whenever we run a guest, preventing forward
> >>> +	 * VCPU progress.
> >> In practice don't you simply mark the IRQ as active at GIC physical
> >> distributor level, hence preventing the same IRQ from hitting again
> > 
> > yes, that's what I meant with my comment, I should reword to "...we mark
> > the interrupt as active on the physical distributor..."
> > 
> >>> +	 */
> >>> +	if (kvm_vgic_get_phys_irq_active(timer->map))
> >>> +		phys_active = true;
> >>> +	else
> >>> +		phys_active = false;
> >>> +
> >>> +	ret = irq_set_irqchip_state(timer->map->irq,
> >>> +				    IRQCHIP_STATE_ACTIVE,
> >>> +				    phys_active);
> >>
> >> physical distributor state is set in arch timer flush. It relates to a
> >> shared device behavior so I find it natural to do it there.
> >>
> >> However the map->active is set in arch_timer IRQ injection and unset in
> >> vgic sync. Why not doing the set in kvm_vgic_inject_mapped_irq?
> > 
> > Because you have to set it at every entry to the guest if you run
> > multiple VCPUs/VMs on this CPU or migrate this VCPU to a different CPU.
> I meant 	kvm_vgic_set_phys_irq_active(timer->map, true) call in
> kvm_timer_inject_irq? Couldn' that been done in
> kvm_vgic_inject_mapped_irq instead. Doesn't this apply to all mapped IRQs?
> 
I could, but that would be outside the scope of this patch, which aims
to fix the behavior in 4.3 simply.  Unless I'm missing some sort of
change in functionality there?

The map->active stuff all goes away in 4.4 when we change to
level-triggered semantics anyway.

-Christoffer
diff mbox

Patch

diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 48c6e1a..b9d3a32 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -137,6 +137,8 @@  bool kvm_timer_should_fire(struct kvm_vcpu *vcpu)
 void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = &vcpu->arch.timer_cpu;
+	bool phys_active;
+	int ret;
 
 	/*
 	 * We're about to run this vcpu again, so there is no need to
@@ -151,6 +153,23 @@  void kvm_timer_flush_hwstate(struct kvm_vcpu *vcpu)
 	 */
 	if (kvm_timer_should_fire(vcpu))
 		kvm_timer_inject_irq(vcpu);
+
+	/*
+	 * We keep track of whether the edge-triggered interrupt has been
+	 * signalled to the vgic/guest, and if so, we mask the interrupt and
+	 * the physical distributor to prevent the timer from raising a
+	 * physical interrupt whenever we run a guest, preventing forward
+	 * VCPU progress.
+	 */
+	if (kvm_vgic_get_phys_irq_active(timer->map))
+		phys_active = true;
+	else
+		phys_active = false;
+
+	ret = irq_set_irqchip_state(timer->map->irq,
+				    IRQCHIP_STATE_ACTIVE,
+				    phys_active);
+	WARN_ON(ret);
 }
 
 /**
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 596455a..ea21bc2 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1092,6 +1092,15 @@  static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu)
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_lr vlr = vgic_get_lr(vcpu, lr_nr);
 
+	/*
+	 * We must transfer the pending state back to the distributor before
+	 * retiring the LR, otherwise we may loose edge-triggered interrupts.
+	 */
+	if (vlr.state & LR_STATE_PENDING) {
+		vgic_dist_irq_set_pending(vcpu, irq);
+		vlr.hwirq = 0;
+	}
+
 	vlr.state = 0;
 	vgic_set_lr(vcpu, lr_nr, vlr);
 	clear_bit(lr_nr, vgic_cpu->lr_used);
@@ -1241,7 +1250,7 @@  static void __kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
 	unsigned long *pa_percpu, *pa_shared;
-	int i, vcpu_id, lr, ret;
+	int i, vcpu_id;
 	int overflow = 0;
 	int nr_shared = vgic_nr_shared_irqs(dist);
 
@@ -1296,31 +1305,6 @@  epilog:
 		 */
 		clear_bit(vcpu_id, dist->irq_pending_on_cpu);
 	}
-
-	for (lr = 0; lr < vgic->nr_lr; lr++) {
-		struct vgic_lr vlr;
-
-		if (!test_bit(lr, vgic_cpu->lr_used))
-			continue;
-
-		vlr = vgic_get_lr(vcpu, lr);
-
-		/*
-		 * If we have a mapping, and the virtual interrupt is
-		 * presented to the guest (as pending or active), then we must
-		 * set the state to active in the physical world. See
-		 * Documentation/virtual/kvm/arm/vgic-mapped-irqs.txt.
-		 */
-		if (vlr.state & LR_HW) {
-			struct irq_phys_map *map;
-			map = vgic_irq_map_search(vcpu, vlr.irq);
-
-			ret = irq_set_irqchip_state(map->irq,
-						    IRQCHIP_STATE_ACTIVE,
-						    true);
-			WARN_ON(ret);
-		}
-	}
 }
 
 static bool vgic_process_maintenance(struct kvm_vcpu *vcpu)
@@ -1430,13 +1414,8 @@  static int vgic_sync_hwirq(struct kvm_vcpu *vcpu, struct vgic_lr vlr)
 
 	WARN_ON(ret);
 
-	if (map->active) {
-		ret = irq_set_irqchip_state(map->irq,
-					    IRQCHIP_STATE_ACTIVE,
-					    false);
-		WARN_ON(ret);
+	if (map->active)
 		return 0;
-	}
 
 	return 1;
 }