
[5/5] arm/arm64: KVM: Initialize the vgic on-demand when injecting IRQs

Message ID 1418139844-27892-6-git-send-email-christoffer.dall@linaro.org
State New

Commit Message

Christoffer Dall Dec. 9, 2014, 3:44 p.m. UTC
Userspace assumes that it can wire up IRQ injections after having
created all VCPUs and after having created the VGIC, but potentially
before starting the first VCPU.  This can currently lead to lost IRQs
because the state of that IRQ injection is not stored anywhere and we
don't return an error to userspace.

We haven't seen this problem manifest itself yet, presumably because
guests reset the devices on boot, but this could cause issues with
migration and other non-standard startup configurations.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
---
 virt/kvm/arm/vgic.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
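To make the failure mode described above concrete, here is a minimal userspace model in C. It is not kernel code. All names in it (`struct vm_model`, `model_vgic_init()`, `model_inject_old()`, `model_inject_new()`) are hypothetical stand-ins for the per-VM vgic state, `vgic_init()`, and the old and new `kvm_vgic_inject_irq()`. The sketch only shows that the old path silently drops an early injection while still returning 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_NR_IRQS 64

/* Hypothetical stand-in for the per-VM distributor state. */
struct vm_model {
	bool vgic_ready;                /* distributor initialized */
	bool line_level[MODEL_NR_IRQS]; /* recorded input level per IRQ */
};

/* Hypothetical stand-in for vgic_init(): sets up the state once. */
static int model_vgic_init(struct vm_model *vm)
{
	if (vm->vgic_ready)
		return 0;
	memset(vm->line_level, 0, sizeof(vm->line_level));
	vm->vgic_ready = true;
	return 0;
}

/*
 * Old behaviour: when the vgic is not ready the injection is dropped,
 * but 0 is still returned, so userspace never learns it was lost.
 */
static int model_inject_old(struct vm_model *vm, unsigned int irq, bool level)
{
	if (vm->vgic_ready && irq < MODEL_NR_IRQS)
		vm->line_level[irq] = level;
	return 0;
}

/* New behaviour: initialize on demand, then record the injected level. */
static int model_inject_new(struct vm_model *vm, unsigned int irq, bool level)
{
	if (!vm->vgic_ready) {
		int ret = model_vgic_init(vm);

		if (ret)
			return ret;
	}
	if (irq >= MODEL_NR_IRQS)
		return -EINVAL;
	vm->line_level[irq] = level;
	return 0;
}
```

In the model, a level injected into a VM before initialization is simply gone once the distributor comes up, whereas the on-demand path keeps it.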

Comments

Auger Eric Dec. 10, 2014, 12:45 p.m. UTC | #1
On 12/09/2014 04:44 PM, Christoffer Dall wrote:
> Userspace assumes that it can wire up IRQ injections after having
> created all VCPUs and after having created the VGIC, but potentially
> before starting the first VCPU.  This can currently lead to lost IRQs
> because the state of that IRQ injection is not stored anywhere and we
> don't return an error to userspace.
> 
> We haven't seen this problem manifest itself yet, 
Actually we did with VFIO signaling setup before VGIC init!
> presumably because
> guests reset the devices on boot, but this could cause issues with
> migration and other non-standard startup configurations.
> 
> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> ---
>  virt/kvm/arm/vgic.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index c98cc6b..feef015 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -1693,8 +1693,13 @@ out:
>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>  			bool level)
>  {
> -	if (likely(vgic_ready(kvm)) &&
> -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
> +	if (unlikely(!vgic_initialized(kvm))) {
> +		mutex_lock(&kvm->lock);
> +		vgic_init(kvm);
> +		mutex_unlock(&kvm->lock);
> +	}
I was previously encouraged to test the virtual interrupt controller
readiness when setting up irqfd (proposal made in
https://lkml.org/lkml/2014/12/3/601). I guess this becomes useless now,
correct? Reviewed-by on the whole series.

Eric
> +
> +	if (vgic_update_irq_pending(kvm, cpuid, irq_num, level))
>  		vgic_kick_vcpus(kvm);
>  
>  	return 0;
>
Christoffer Dall Dec. 11, 2014, 12:01 p.m. UTC | #2
On Wed, Dec 10, 2014 at 01:45:50PM +0100, Eric Auger wrote:
> On 12/09/2014 04:44 PM, Christoffer Dall wrote:
> > Userspace assumes that it can wire up IRQ injections after having
> > created all VCPUs and after having created the VGIC, but potentially
> > before starting the first VCPU.  This can currently lead to lost IRQs
> > because the state of that IRQ injection is not stored anywhere and we
> > don't return an error to userspace.
> > 
> > We haven't seen this problem manifest itself yet, 
> Actually we did with VFIO signaling setup before VGIC init!
> > presumably because

well, not with code in mainline

> > guests reset the devices on boot, but this could cause issues with
> > migration and other non-standard startup configurations.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index c98cc6b..feef015 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1693,8 +1693,13 @@ out:
> >  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >  			bool level)
> >  {
> > -	if (likely(vgic_ready(kvm)) &&
> > -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
> > +	if (unlikely(!vgic_initialized(kvm))) {
> > +		mutex_lock(&kvm->lock);
> > +		vgic_init(kvm);
> > +		mutex_unlock(&kvm->lock);
> > +	}
> I was previously encouraged to test the virtual interrupt controller
> readiness when setting up irqfd (proposal made in
> https://lkml.org/lkml/2014/12/3/601). I guess this becomes useless now,
> correct? Reviewed-by on the whole series.
> 
I think we should move all non-legacy userspace to your explicit userspace
init, and only support GICv3 and VFIO/irqfd when userspace explicitly
initializes the vgic.

Thanks for the review!

-Christoffer
Auger Eric Dec. 11, 2014, 12:38 p.m. UTC | #3
On 12/11/2014 01:01 PM, Christoffer Dall wrote:
> On Wed, Dec 10, 2014 at 01:45:50PM +0100, Eric Auger wrote:
>> On 12/09/2014 04:44 PM, Christoffer Dall wrote:
>>> Userspace assumes that it can wire up IRQ injections after having
>>> created all VCPUs and after having created the VGIC, but potentially
>>> before starting the first VCPU.  This can currently lead to lost IRQs
>>> because the state of that IRQ injection is not stored anywhere and we
>>> don't return an error to userspace.
>>>
>>> We haven't seen this problem manifest itself yet, 
>> Actually we did with VFIO signaling setup before VGIC init!
>>> presumably because
> 
> well, not with code in mainline
> 
>>> guests reset the devices on boot, but this could cause issues with
>>> migration and other non-standard startup configurations.
>>>
>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>> ---
>>>  virt/kvm/arm/vgic.c | 9 +++++++--
>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>> index c98cc6b..feef015 100644
>>> --- a/virt/kvm/arm/vgic.c
>>> +++ b/virt/kvm/arm/vgic.c
>>> @@ -1693,8 +1693,13 @@ out:
>>>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>>  			bool level)
>>>  {
>>> -	if (likely(vgic_ready(kvm)) &&
>>> -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
>>> +	if (unlikely(!vgic_initialized(kvm))) {
>>> +		mutex_lock(&kvm->lock);
>>> +		vgic_init(kvm);
>>> +		mutex_unlock(&kvm->lock);
>>> +	}
>> I was previously encouraged to test the virtual interrupt controller
>> readiness when setting up irqfd (proposal made in
>> https://lkml.org/lkml/2014/12/3/601). I guess this becomes useless now,
>> correct? Reviewed-by on the whole series.
>>
> I think we should move all non-legacy userspace to your explicit userspace
> init, and only support GICv3 and VFIO/irqfd when userspace explicitly
> initializes the vgic.

Hi Christoffer,

The use case I still have in mind is VFIO+irqfd:
since we cannot prevent the user from ignoring the explicit userspace
init and setting up VFIO signaling+irqfd before vgic init, I see a risk
that injection starts even before the vgic is created. Either we test
irqchip_in_kernel in kvm_vgic_inject_irq or we must have a test when
setting up irqfd, as proposed in the above patch.

Actually, before being able to inject any virtual IRQ we need even more:
if the virtual IRQ settings have not yet been defined by the guest, we do
not know what to do with the IRQ. We must at least know whether it is
level- or edge-triggered. The current irq_cfg bitmap might be insufficient
to store this info since it only has 2 states; by chance I use a
level-sensitive IRQ and my QEMU pieces pay attention to that sequencing.
I guess the problem is the same for userspace injection, isn't it?

Eric

> 
> Thanks for the review!
> 
> -Christoffer
>
Christoffer Dall Dec. 12, 2014, 11:06 a.m. UTC | #4
On Thu, Dec 11, 2014 at 01:38:16PM +0100, Eric Auger wrote:
> On 12/11/2014 01:01 PM, Christoffer Dall wrote:
> > On Wed, Dec 10, 2014 at 01:45:50PM +0100, Eric Auger wrote:
> >> On 12/09/2014 04:44 PM, Christoffer Dall wrote:
> >>> Userspace assumes that it can wire up IRQ injections after having
> >>> created all VCPUs and after having created the VGIC, but potentially
> >>> before starting the first VCPU.  This can currently lead to lost IRQs
> >>> because the state of that IRQ injection is not stored anywhere and we
> >>> don't return an error to userspace.
> >>>
> >>> We haven't seen this problem manifest itself yet, 
> >> Actually we did with VFIO signaling setup before VGIC init!
> >>> presumably because
> > 
> > well, not with code in mainline
> > 
> >>> guests reset the devices on boot, but this could cause issues with
> >>> migration and other non-standard startup configurations.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  virt/kvm/arm/vgic.c | 9 +++++++--
> >>>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >>> index c98cc6b..feef015 100644
> >>> --- a/virt/kvm/arm/vgic.c
> >>> +++ b/virt/kvm/arm/vgic.c
> >>> @@ -1693,8 +1693,13 @@ out:
> >>>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>  			bool level)
> >>>  {
> >>> -	if (likely(vgic_ready(kvm)) &&
> >>> -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
> >>> +	if (unlikely(!vgic_initialized(kvm))) {
> >>> +		mutex_lock(&kvm->lock);
> >>> +		vgic_init(kvm);
> >>> +		mutex_unlock(&kvm->lock);
> >>> +	}
> >> I was previously encouraged to test the virtual interrupt controller
> >> readiness when setting up irqfd (proposal made in
> >> https://lkml.org/lkml/2014/12/3/601). I guess this becomes useless now,
> >> correct? Reviewed-by on the whole series.
> >>
> > I think we should move all non-legacy userspace to your explicit userspace
> > init, and only support GICv3 and VFIO/irqfd when userspace explicitly
> > initializes the vgic.
> 
> Hi Christoffer,
> 
> The use case I still have in mind is VFIO+irqfd:
> since we cannot prevent the user from ignoring the explicit userspace
> init and setting up VFIO signaling+irqfd before vgic init, I see a risk
> that injection starts even before the vgic is created. Either we test
> irqchip_in_kernel in kvm_vgic_inject_irq or we must have a test when
> setting up irqfd, as proposed in the above patch.

yes, test if the vgic has been initialized when setting up irqfd and
return an error if not.
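A minimal sketch of that check, assuming a hypothetical `irqfd_assign_model()` standing in for the irqfd setup path and hypothetical per-VM flags recording whether the vgic was created and initialized. The `-ENODEV` error code is an assumption for illustration, not necessarily what a real patch would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-VM state; only what the check needs. */
struct vm_model {
	bool vgic_created;      /* userspace created the vgic device */
	bool vgic_initialized;  /* vgic has been initialized */
	int nr_irqfds;          /* irqfds successfully wired up */
};

/*
 * Hypothetical irqfd setup path: refuse to wire up an irqfd until the
 * vgic has been initialized, so no injection can race with its creation.
 * The -ENODEV error code is an assumption for illustration.
 */
static int irqfd_assign_model(struct vm_model *vm)
{
	if (!vm->vgic_created || !vm->vgic_initialized)
		return -ENODEV;
	vm->nr_irqfds++;
	return 0;
}
```

With the ordering enforced at setup time, userspace that wires up VFIO signaling before initializing the vgic gets an error immediately instead of losing interrupts later.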

> 
> Actually, before being able to inject any virtual IRQ we need even more:
> if the virtual IRQ settings have not yet been defined by the guest, we do
> not know what to do with the IRQ. We must at least know whether it is
> level- or edge-triggered. The current irq_cfg bitmap might be insufficient
> to store this info since it only has 2 states; by chance I use a
> level-sensitive IRQ and my QEMU pieces pay attention to that sequencing.
> I guess the problem is the same for userspace injection, isn't it?
> 
That's no different from the hardware, is it? We assume the interrupt
is configured as per its default reset value in GICD_ICFGRn, and when
the guest boots up, it must reconfigure the interrupt.
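For reference, in the GICv2 architecture each GICD_ICFGRn register holds a 2-bit Int_config field for 16 interrupts, and the upper bit of each field selects level-sensitive (0) or edge-triggered (1). The helpers below are a hypothetical sketch of decoding and updating that bit for a given interrupt ID. They are not taken from vgic.c, and the reset values themselves are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each GICD_ICFGRn holds a 2-bit Int_config field for 16 interrupts. */
#define ICFGR_IRQS_PER_REG 16

/* Index n of the GICD_ICFGRn register that configures interrupt 'irq'. */
static unsigned int icfgr_index(unsigned int irq)
{
	return irq / ICFGR_IRQS_PER_REG;
}

/* Bit position of the edge/level bit (upper bit of the 2-bit field). */
static unsigned int icfgr_edge_shift(unsigned int irq)
{
	return (irq % ICFGR_IRQS_PER_REG) * 2 + 1;
}

/* 0 = level-sensitive, 1 = edge-triggered. */
static bool icfgr_is_edge(uint32_t icfgr, unsigned int irq)
{
	return (icfgr >> icfgr_edge_shift(irq)) & 1;
}

/* Guest-side reconfiguration: set or clear the edge bit for 'irq'. */
static uint32_t icfgr_set_edge(uint32_t icfgr, unsigned int irq, bool edge)
{
	uint32_t bit = UINT32_C(1) << icfgr_edge_shift(irq);

	return edge ? (icfgr | bit) : (icfgr & ~bit);
}
```

For example, a guest that wants interrupt 33 to be edge-triggered sets bit 3 of GICD_ICFGR2, which is what `icfgr_set_edge(reg, 33, true)` computes.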

The higher-level picture here is that we don't know when the guest is
done configuring things (anywhere from some time before we run any vcpu
to some time after we've run vcpus), and there is no error return path
for this sort of matter when injecting an IRQ into the VGIC, so we just
have to cope with things in the same way that hardware does.

My guess is that no sane guest will actually depend on interrupts
raised or lowered before it has configured the GIC having any real
effect, as every guest should configure the GIC, clear the interrupt,
unmask it, and only then care about it.
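The "cope with it the way hardware does" argument can be illustrated with a simplified model of one interrupt input. This is a sketch under stated assumptions: it ignores the active state, priorities and targeting, and all names are hypothetical. A level-sensitive interrupt stays pending while its line is high, so clearing the pending state during guest setup does not lose it; an edge latched before the guest cleared it is discarded, which is harmless if no sane guest relies on it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state of one interrupt input. */
struct irq_model {
	bool edge;     /* trigger mode from GICD_ICFGRn: false = level */
	bool line;     /* current level of the input line */
	bool latched;  /* pending latch, set by a rising edge */
	bool enabled;  /* set by the guest via GICD_ISENABLERn */
};

/* A level-sensitive interrupt is pending while its line is high. */
static bool irq_pending(const struct irq_model *i)
{
	if (!i->edge && i->line)
		return true;
	return i->latched;
}

/* Device (or irqfd) drives the line; only a rising edge latches
 * pending state for an edge-triggered interrupt. */
static void irq_set_line(struct irq_model *i, bool level)
{
	if (i->edge && level && !i->line)
		i->latched = true;
	i->line = level;
}

/* Guest write to GICD_ICPENDRn: clears the latch only. */
static void irq_clear_pending(struct irq_model *i)
{
	i->latched = false;
}

static bool irq_deliverable(const struct irq_model *i)
{
	return i->enabled && irq_pending(i);
}
```

A level interrupt injected before guest setup, followed by the guest clearing it and unmasking it, is still delivered because the line is still high; an early edge is not.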

-Christoffer
Christoffer Dall Dec. 12, 2014, 11:14 a.m. UTC | #5
On Thu, Dec 11, 2014 at 06:35:40PM +0000, Marc Zyngier wrote:
> On 09/12/14 15:44, Christoffer Dall wrote:
> > Userspace assumes that it can wire up IRQ injections after having
> > created all VCPUs and after having created the VGIC, but potentially
> > before starting the first VCPU.  This can currently lead to lost IRQs
> > because the state of that IRQ injection is not stored anywhere and we
> > don't return an error to userspace.
> > 
> > We haven't seen this problem manifest itself yet, presumably because
> > guests reset the devices on boot, but this could cause issues with
> > migration and other non-standard startup configurations.
> > 
> > Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> > ---
> >  virt/kvm/arm/vgic.c | 9 +++++++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> > index c98cc6b..feef015 100644
> > --- a/virt/kvm/arm/vgic.c
> > +++ b/virt/kvm/arm/vgic.c
> > @@ -1693,8 +1693,13 @@ out:
> >  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >  			bool level)
> >  {
> > -	if (likely(vgic_ready(kvm)) &&
> > -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
> > +	if (unlikely(!vgic_initialized(kvm))) {
> > +		mutex_lock(&kvm->lock);
> > +		vgic_init(kvm);
> 
> What if this fails?
> 
yeah, not good.  The thing is that we also don't check the return value
from kvm_vgic_inject_irq(), so we can do two things:

(1) change this function to return void, carry out the check for
vgic_initialized in kvm_vm_ioctl_irq_line() in arm.c, and export
vgic_init() outside of vgic.c.

(2) just error out if vgic_init() fails and print a kernel error (or
even a BUG_ON?) in kvm_timer_inject_irq() in arch_timer.c.

In both cases we need to make sure that we never configure the timer to
begin injecting IRQs before the vgic is initialized, as Eric pointed out
before.
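Option (2) could look roughly like the following userspace model: the injection path propagates the `vgic_init()` error, and the timer-side caller, which has no way to report it to userspace, only logs it. All names are hypothetical, locking is omitted, and `fprintf()` stands in for a kernel logging call such as `pr_err()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct vm_model {
	bool vgic_initialized;
	bool init_should_fail;  /* test knob: simulate an allocation failure */
	int pending_irq;        /* last injected interrupt, -1 if none */
};

/* Hypothetical stand-in for vgic_init(). */
static int model_vgic_init(struct vm_model *vm)
{
	if (vm->vgic_initialized)
		return 0;
	if (vm->init_should_fail)
		return -ENOMEM;
	vm->vgic_initialized = true;
	return 0;
}

/* Option (2): propagate the init error instead of ignoring it. */
static int model_inject_irq(struct vm_model *vm, int irq)
{
	if (!vm->vgic_initialized) {
		int ret = model_vgic_init(vm);

		if (ret)
			return ret;
	}
	vm->pending_irq = irq;
	return 0;
}

/* Timer-side caller: no path back to userspace, so it can only log. */
static int model_timer_inject(struct vm_model *vm, int irq)
{
	int ret = model_inject_irq(vm, irq);

	if (ret)
		fprintf(stderr, "timer: failed to inject irq %d: %d\n", irq, ret);
	return ret;
}
```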

What do you think?

-Christoffer
Christoffer Dall Dec. 12, 2014, 11:37 a.m. UTC | #6
On Fri, Dec 12, 2014 at 11:23:35AM +0000, Marc Zyngier wrote:
> On 12/12/14 11:14, Christoffer Dall wrote:
> > On Thu, Dec 11, 2014 at 06:35:40PM +0000, Marc Zyngier wrote:
> >> On 09/12/14 15:44, Christoffer Dall wrote:
> >>> Userspace assumes that it can wire up IRQ injections after having
> >>> created all VCPUs and after having created the VGIC, but potentially
> >>> before starting the first VCPU.  This can currently lead to lost IRQs
> >>> because the state of that IRQ injection is not stored anywhere and we
> >>> don't return an error to userspace.
> >>>
> >>> We haven't seen this problem manifest itself yet, presumably because
> >>> guests reset the devices on boot, but this could cause issues with
> >>> migration and other non-standard startup configurations.
> >>>
> >>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
> >>> ---
> >>>  virt/kvm/arm/vgic.c | 9 +++++++--
> >>>  1 file changed, 7 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> >>> index c98cc6b..feef015 100644
> >>> --- a/virt/kvm/arm/vgic.c
> >>> +++ b/virt/kvm/arm/vgic.c
> >>> @@ -1693,8 +1693,13 @@ out:
> >>>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
> >>>  			bool level)
> >>>  {
> >>> -	if (likely(vgic_ready(kvm)) &&
> >>> -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
> >>> +	if (unlikely(!vgic_initialized(kvm))) {
> >>> +		mutex_lock(&kvm->lock);
> >>> +		vgic_init(kvm);
> >>
> >> What if this fails?
> >>
> > yeah, not good.  The thing is that we also don't check the return value
> > from kvm_vgic_inject_irq(), so we can do two things:
> > 
> > (1) change this function to return void, carry out the check for
> > vgic_initialized in kvm_vm_ioctl_irq_line() in arm.c, and export
> > vgic_init() outside of vgic.c.
> > 
> > (2) just error out if vgic_init() fails and print a kernel error (or
> > even a BUG_ON?) in kvm_timer_inject_irq() in arch_timer.c.
> > 
> > In both cases we need to make sure that we never configure the timer to
> > begin injecting IRQs before the vgic is initialized, as Eric pointed out
> > before.
> > 
> > What do you think?
> 
> I'd favour option two.
> 
> My reasoning is that the timer interrupt is triggered by the HW. If it
> has fired, that's because we've programmed it to trigger, which means a
> vcpu has run. At that point, the vgic had better be initialized, or we
> have something much nastier on our hands.
> 
Sounds reasonable to me. I'll do a quick respin with the check for the
timer (to ensure the user even created a vgic).
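A minimal model of that check and of Marc's invariant, with hypothetical names throughout. It assumes the vgic is initialized no later than the first vcpu run, and that the first run refuses to proceed without a vgic, so the timer is never enabled, and therefore never fires, on a VM without one. The `-ENODEV` error code is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct vm_model {
	bool vgic_created;      /* userspace created the in-kernel vgic */
	bool vgic_initialized;  /* initialized by the first vcpu run at the latest */
	bool timer_enabled;     /* guest may now program the timer */
};

/* Hypothetical first-run path: no vgic, no timer. */
static int model_vcpu_first_run(struct vm_model *vm)
{
	if (!vm->vgic_created)
		return -ENODEV;  /* error code chosen for illustration */
	vm->vgic_initialized = true;
	vm->timer_enabled = true;
	return 0;
}

/* Returns the number of interrupts injected by a timer expiry. */
static int model_timer_fire(const struct vm_model *vm)
{
	if (!vm->timer_enabled)
		return 0;  /* nothing programmed, nothing to inject */
	/* Marc's invariant: a fired timer implies an initialized vgic. */
	assert(vm->vgic_initialized);
	return 1;
}
```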

-Christoffer
Auger Eric Dec. 15, 2014, 10:43 a.m. UTC | #7
On 12/12/2014 12:06 PM, Christoffer Dall wrote:
> On Thu, Dec 11, 2014 at 01:38:16PM +0100, Eric Auger wrote:
>> On 12/11/2014 01:01 PM, Christoffer Dall wrote:
>>> On Wed, Dec 10, 2014 at 01:45:50PM +0100, Eric Auger wrote:
>>>> On 12/09/2014 04:44 PM, Christoffer Dall wrote:
>>>>> Userspace assumes that it can wire up IRQ injections after having
>>>>> created all VCPUs and after having created the VGIC, but potentially
>>>>> before starting the first VCPU.  This can currently lead to lost IRQs
>>>>> because the state of that IRQ injection is not stored anywhere and we
>>>>> don't return an error to userspace.
>>>>>
>>>>> We haven't seen this problem manifest itself yet, 
>>>> Actually we did with VFIO signaling setup before VGIC init!
>>>>> presumably because
>>>
>>> well, not with code in mainline
>>>
>>>>> guests reset the devices on boot, but this could cause issues with
>>>>> migration and other non-standard startup configurations.
>>>>>
>>>>> Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
>>>>> ---
>>>>>  virt/kvm/arm/vgic.c | 9 +++++++--
>>>>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
>>>>> index c98cc6b..feef015 100644
>>>>> --- a/virt/kvm/arm/vgic.c
>>>>> +++ b/virt/kvm/arm/vgic.c
>>>>> @@ -1693,8 +1693,13 @@ out:
>>>>>  int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
>>>>>  			bool level)
>>>>>  {
>>>>> -	if (likely(vgic_ready(kvm)) &&
>>>>> -	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
>>>>> +	if (unlikely(!vgic_initialized(kvm))) {
>>>>> +		mutex_lock(&kvm->lock);
>>>>> +		vgic_init(kvm);
>>>>> +		mutex_unlock(&kvm->lock);
>>>>> +	}
>>>> I was previously encouraged to test the virtual interrupt controller
>>>> readiness when setting up irqfd (proposal made in
>>>> https://lkml.org/lkml/2014/12/3/601). I guess this becomes useless now,
>>>> correct? Reviewed-by on the whole series.
>>>>
>>> I think we should move all non-legacy userspace to your explicit userspace
>>> init, and only support GICv3 and VFIO/irqfd when userspace explicitly
>>> initializes the vgic.
>>
>> Hi Christoffer,
>>
>> The use case I still have in mind is VFIO+irqfd:
>> since we cannot prevent the user from ignoring the explicit userspace
>> init and setting up VFIO signaling+irqfd before vgic init, I see a risk
>> that injection starts even before the vgic is created. Either we test
>> irqchip_in_kernel in kvm_vgic_inject_irq or we must have a test when
>> setting up irqfd, as proposed in the above patch.
> 
> yes, test if the vgic has been initialized when setting up irqfd and
> return an error if not.
> 
>>
>> Actually, before being able to inject any virtual IRQ we need even more:
>> if the virtual IRQ settings have not yet been defined by the guest, we do
>> not know what to do with the IRQ. We must at least know whether it is
>> level- or edge-triggered. The current irq_cfg bitmap might be insufficient
>> to store this info since it only has 2 states; by chance I use a
>> level-sensitive IRQ and my QEMU pieces pay attention to that sequencing.
>> I guess the problem is the same for userspace injection, isn't it?
>>
> That's no different from the hardware, is it? We assume the interrupt
> is configured as per its default reset value in GICD_ICFGRn, and when
> the guest boots up, it must reconfigure the interrupt.
> 
> The higher-level picture here is that we don't know when the guest is
> done configuring things (anywhere from some time before we run any vcpu
> to some time after we've run vcpus), and there is no error return path
> for this sort of matter when injecting an IRQ into the VGIC, so we just
> have to cope with things in the same way that hardware does.
> 
> My guess is that no sane guest will actually depend on interrupts
> raised or lowered before it has configured the GIC having any real
> effect, as every guest should configure the GIC, clear the interrupt,
> unmask it, and only then care about it.

Hi Christoffer,

makes sense to me too, after reading the spec once more ;-)

Best Regards

Eric
> 
> -Christoffer
>

Patch

diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index c98cc6b..feef015 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1693,8 +1693,13 @@  out:
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
 			bool level)
 {
-	if (likely(vgic_ready(kvm)) &&
-	    vgic_update_irq_pending(kvm, cpuid, irq_num, level))
+	if (unlikely(!vgic_initialized(kvm))) {
+		mutex_lock(&kvm->lock);
+		vgic_init(kvm);
+		mutex_unlock(&kvm->lock);
+	}
+
+	if (vgic_update_irq_pending(kvm, cpuid, irq_num, level))
 		vgic_kick_vcpus(kvm);
 
 	return 0;
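The patch uses a common lazy-initialization shape: an unlocked fast-path check, then the lock, then the init. Below is a userspace sketch of the same shape using pthreads and C11 atomics; it is a model, not kernel code, and all names are hypothetical. It re-checks the flag under the lock (the patch presumably relies on `vgic_init()` doing the equivalent check), and unlike the patch it propagates the init error, which is what Marc's "What if this fails?" asks for.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* 'lock' plays the role of kvm->lock, 'initialized' of vgic_initialized(). */
struct vm_model {
	pthread_mutex_t lock;
	atomic_bool initialized;
	int init_calls;         /* how often the init body actually ran */
	bool init_should_fail;  /* test knob: make the init fail */
};

/* Caller holds vm->lock. Re-checking here means two injectors that both
 * saw !initialized on the fast path cannot both run the init body. */
static int model_vgic_init_locked(struct vm_model *vm)
{
	if (atomic_load_explicit(&vm->initialized, memory_order_relaxed))
		return 0;
	if (vm->init_should_fail)
		return -ENOMEM;
	vm->init_calls++;
	/* Release pairs with the acquire load on the fast path. */
	atomic_store_explicit(&vm->initialized, true, memory_order_release);
	return 0;
}

static int model_inject_irq(struct vm_model *vm)
{
	/* Unlocked fast path, like the patch's unlikely() check. */
	if (!atomic_load_explicit(&vm->initialized, memory_order_acquire)) {
		int ret;

		pthread_mutex_lock(&vm->lock);
		ret = model_vgic_init_locked(vm);
		pthread_mutex_unlock(&vm->lock);
		if (ret)
			return ret;  /* unlike the patch, report the failure */
	}
	/* ...record the pending state and kick vcpus here... */
	return 0;
}

/* Usage helper: n threads injecting concurrently, as irqfds might. */
static void *injector(void *arg)
{
	model_inject_irq(arg);
	return NULL;
}

static void run_injectors(struct vm_model *vm, int n)
{
	pthread_t tid[8];
	int i;

	assert(n > 0 && n <= 8);
	for (i = 0; i < n; i++) {
		int rc = pthread_create(&tid[i], NULL, injector, vm);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < n; i++)
		pthread_join(tid[i], NULL);
}
```

However many threads race through the slow path, the init body runs exactly once, and a failing init is reported to the caller instead of being ignored.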