diff mbox

[Xen-devel,v2,for-4.5] xen/arm: clear UIE on hypervisor entry

Message ID 1416480804-25651-1-git-send-email-stefano.stabellini@eu.citrix.com
State New
Headers show

Commit Message

Stefano Stabellini Nov. 20, 2014, 10:53 a.m. UTC
UIE being set can cause maintenance interrupts to occur when Xen writes
to one or more LR registers. The effect is a busy loop around the
interrupt handler in Xen
(http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
CC: konrad.wilk@oracle.com

---

Konrad, this fixes an actual bug, at least on OMAP5. It should have no
bad side effects on any other platforms as far as I can tell. It should
go in 4.5.

Changes in v2:

- add an in-code comment about maintenance_interrupt not being called.

Comments

Julien Grall Nov. 20, 2014, 11:02 a.m. UTC | #1
Hi Stefano,

On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
> UIE being set can cause maintenance interrupts to occur when Xen writes
> to one or more LR registers. The effect is a busy loop around the
> interrupt handler in Xen
> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> 
> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> CC: konrad.wilk@oracle.com
> ---
> 
> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> bad side effects on any other platforms as far as I can tell. It should
> go in 4.5.

From Andrii's mail, there is a bad side effect. We can receive spurious
interrupt.

On V1, you said that you tried this patch on midway. I would prefer if
you give a try on  platform that would really be used (such as xgene or
seattle). As we know Midway won't be used in prod.

Regards,

> Changes in v2:
> 
> - add an in-code comment about maintenance_interrupt not being called.
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..c6c11d3 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>  
> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>  
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>  
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> -    else
> -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>  }
>  
>  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>       * Receiving the interrupt is going to cause gic_inject to be called
>       * on return to guest that is going to clear the old LRs and inject
>       * new interrupts.
> +     *
> +     * Do not add code here: maintenance interrupts caused by setting
> +     * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
> +     * consequence sometimes this handler might not be called.
>       */
>  }
>  
>
Julien Grall Nov. 20, 2014, 11:06 a.m. UTC | #2
On 11/20/2014 11:02 AM, Julien Grall wrote:
> Hi Stefano,
> 
> On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
>> UIE being set can cause maintenance interrupts to occur when Xen writes
>> to one or more LR registers. The effect is a busy loop around the
>> interrupt handler in Xen
>> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
>>
>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>> CC: konrad.wilk@oracle.com
>> ---
>>
>> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
>> bad side effects on any other platforms as far as I can tell. It should
>> go in 4.5.
> 
> From Andrii's mail, there is a bad side effect. We can receive spurious
> interrupt.
> 
> On V1, you said that you tried this patch on midway. I would prefer if
> you give a try on  platform that would really be used (such as xgene or
> seattle). As we know Midway won't be used in prod.

And maybe give a try to gicv3 too as it's common code.

Regards,
Stefano Stabellini Nov. 20, 2014, 11:08 a.m. UTC | #3
On Thu, 20 Nov 2014, Julien Grall wrote:
> Hi Stefano,
> 
> On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
> > UIE being set can cause maintenance interrupts to occur when Xen writes
> > to one or more LR registers. The effect is a busy loop around the
> > interrupt handler in Xen
> > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> > 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > CC: konrad.wilk@oracle.com
> > ---
> > 
> > Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> > bad side effects on any other platforms as far as I can tell. It should
> > go in 4.5.
> 
> >From Andrii's mail, there is a bad side effect. We can receive spurious
> interrupt.

The spurious interrupt is the maintenance interrupt.
There is not one more spurious interrupt notification in addition to the
maintenance interrupt.


> On V1, you said that you tried this patch on midway. I would prefer if
> you give a try on  platform that would really be used (such as xgene or
> seattle). As we know Midway won't be used in prod.

OK


> Regards,
> 
> > Changes in v2:
> > 
> > - add an in-code comment about maintenance_interrupt not being called.
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 70d10d6..c6c11d3 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >  
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >  
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >  
> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> > -    else
> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >  }
> >  
> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >       * Receiving the interrupt is going to cause gic_inject to be called
> >       * on return to guest that is going to clear the old LRs and inject
> >       * new interrupts.
> > +     *
> > +     * Do not add code here: maintenance interrupts caused by setting
> > +     * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
> > +     * consequence sometimes this handler might not be called.
> >       */
> >  }
> >  
> > 
> 
> 
> -- 
> Julien Grall
>
Stefano Stabellini Nov. 20, 2014, 3:54 p.m. UTC | #4
On Thu, 20 Nov 2014, Julien Grall wrote:
> On 11/20/2014 11:02 AM, Julien Grall wrote:
> > Hi Stefano,
> > 
> > On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
> >> UIE being set can cause maintenance interrupts to occur when Xen writes
> >> to one or more LR registers. The effect is a busy loop around the
> >> interrupt handler in Xen
> >> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> >>
> >> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> >> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> >> CC: konrad.wilk@oracle.com
> >> ---
> >>
> >> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> >> bad side effects on any other platforms as far as I can tell. It should
> >> go in 4.5.
> > 
> > From Andrii's mail, there is a bad side effect. We can receive spurious
> > interrupt.
> > 
> > On V1, you said that you tried this patch on midway. I would prefer if
> > you give a try on  platform that would really be used (such as xgene or
> > seattle). As we know Midway won't be used in prod.
> 
> And maybe give a try to gicv3 too as it's common code.

I don't have a gicv3 test environment ready but it works on xgene too
Julien Grall Nov. 20, 2014, 4:46 p.m. UTC | #5
Hi Stefano,

On 11/20/2014 03:54 PM, Stefano Stabellini wrote:
> On Thu, 20 Nov 2014, Julien Grall wrote:
>> On 11/20/2014 11:02 AM, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
>>>> UIE being set can cause maintenance interrupts to occur when Xen writes
>>>> to one or more LR registers. The effect is a busy loop around the
>>>> interrupt handler in Xen
>>>> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
>>>>
>>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>>>> CC: konrad.wilk@oracle.com
>>>> ---
>>>>
>>>> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
>>>> bad side effects on any other platforms as far as I can tell. It should
>>>> go in 4.5.
>>>
>>> From Andrii's mail, there is a bad side effect. We can receive spurious
>>> interrupt.
>>>
>>> On V1, you said that you tried this patch on midway. I would prefer if
>>> you give a try on  platform that would really be used (such as xgene or
>>> seattle). As we know Midway won't be used in prod.
>>
>> And maybe give a try to gicv3 too as it's common code.
> 
> I don't have a gicv3 test environment ready but it works on xgene too

Ok. I will give a quick try on the model today or tomorrow.

Aside from that, and after reading the spec. This patch looks good to me:

Reviewed-by: Julien Grall <julien.grall@linaro.org>

Regards,
Ian Campbell Nov. 20, 2014, 4:47 p.m. UTC | #6
On Thu, 2014-11-20 at 10:53 +0000, Stefano Stabellini wrote:
> UIE being set can cause maintenance interrupts to occur when Xen writes
> to one or more LR registers. The effect is a busy loop around the
> interrupt handler in Xen
> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.

I think it would be useful to explain somewhere why/how a spurious
interrupt can occur, if not here then in the comment you've already
added or in another one in gic_clear_lrs where you clear UIE.

The bit about clearing the LRs on entry causing UIE to become deasserted
before we get as far as reading the IAR is a bit subtle.

> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> CC: konrad.wilk@oracle.com

With the expanded commentary:
        Acked-by: Ian Campbell <ian.campbell@citrix.com>

> ---
> 
> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> bad side effects on any other platforms as far as I can tell. It should
> go in 4.5.
> 
> Changes in v2:
> 
> - add an in-code comment about maintenance_interrupt not being called.
> 
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 70d10d6..c6c11d3 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
>      if ( is_idle_vcpu(v) )
>          return;
>  
> +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> +
>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
>  
>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> @@ -527,8 +529,6 @@ void gic_inject(void)
>  
>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
>          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> -    else
> -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
>  }
>  
>  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>       * Receiving the interrupt is going to cause gic_inject to be called
>       * on return to guest that is going to clear the old LRs and inject
>       * new interrupts.
> +     *
> +     * Do not add code here: maintenance interrupts caused by setting
> +     * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
> +     * consequence sometimes this handler might not be called.
>       */
>  }
>
Konrad Rzeszutek Wilk Nov. 20, 2014, 8:17 p.m. UTC | #7
On Thu, Nov 20, 2014 at 04:47:42PM +0000, Ian Campbell wrote:
> On Thu, 2014-11-20 at 10:53 +0000, Stefano Stabellini wrote:
> > UIE being set can cause maintenance interrupts to occur when Xen writes
> > to one or more LR registers. The effect is a busy loop around the
> > interrupt handler in Xen
> > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
> 
> I think it would be useful to explain somewhere why/how a spurious
> interrupt can occur, if not here then in the comment you've already
> added or in another one in gic_clear_lrs where you clear UIE.
> 
> The bit about clearing the LRs on entry causing UIE to become deasserted
> before we get as far as reading the IAR is a bit subtle.
> 
> > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
> > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
> > CC: konrad.wilk@oracle.com
> 
> With the expanded commentary:
>         Acked-by: Ian Campbell <ian.campbell@citrix.com>
> 
> > ---
> > 
> > Konrad, this fixes an actual bug, at least on OMAP5. It should have no
> > bad side effects on any other platforms as far as I can tell. It should
> > go in 4.5.

As long as the testing that Julian has asked for does not demonstrate
any issues with this patch I am OK with it going in Xen 4.5.


> > 
> > Changes in v2:
> > 
> > - add an in-code comment about maintenance_interrupt not being called.
> > 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 70d10d6..c6c11d3 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >  
> > +    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >  
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -527,8 +529,6 @@ void gic_inject(void)
> >  
> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >          gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
> > -    else
> > -        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
> >  }
> >  
> >  static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >       * Receiving the interrupt is going to cause gic_inject to be called
> >       * on return to guest that is going to clear the old LRs and inject
> >       * new interrupts.
> > +     *
> > +     * Do not add code here: maintenance interrupts caused by setting
> > +     * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
> > +     * consequence sometimes this handler might not be called.
> >       */
> >  }
> >  
> 
>
Julien Grall Nov. 21, 2014, 2:13 p.m. UTC | #8
Hi Stefano,

On 20/11/2014 15:54, Stefano Stabellini wrote:
> On Thu, 20 Nov 2014, Julien Grall wrote:
>> On 11/20/2014 11:02 AM, Julien Grall wrote:
>>> Hi Stefano,
>>>
>>> On 11/20/2014 10:53 AM, Stefano Stabellini wrote:
>>>> UIE being set can cause maintenance interrupts to occur when Xen writes
>>>> to one or more LR registers. The effect is a busy loop around the
>>>> interrupt handler in Xen
>>>> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck.
>>>>
>>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
>>>> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com>
>>>> CC: konrad.wilk@oracle.com
>>>> ---
>>>>
>>>> Konrad, this fixes an actual bug, at least on OMAP5. It should have no
>>>> bad side effects on any other platforms as far as I can tell. It should
>>>> go in 4.5.
>>>
>>>  From Andrii's mail, there is a bad side effect. We can receive spurious
>>> interrupt.
>>>
>>> On V1, you said that you tried this patch on midway. I would prefer if
>>> you give a try on  platform that would really be used (such as xgene or
>>> seattle). As we know Midway won't be used in prod.
>>
>> And maybe give a try to gicv3 too as it's common code.
>
> I don't have a gicv3 test environment ready but it works on xgene too

I gave a try on the Foundation Model and didn't see any specific regression:

Tested-by: Julien Grall <julien.grall@linaro.org>

Regards,
diff mbox

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..c6c11d3 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@  void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@  void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
 
 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
@@ -598,6 +598,10 @@  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * Receiving the interrupt is going to cause gic_inject to be called
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
+     *
+     * Do not add code here: maintenance interrupts caused by setting
+     * GICH_HCR_UIE, might read as spurious interrupts (1023). As a
+     * consequence sometimes this handler might not be called.
      */
 }