Message ID | 1416480804-25651-1-git-send-email-stefano.stabellini@eu.citrix.com |
---|---|
State | New |
Headers | show |
Hi Stefano, On 11/20/2014 10:53 AM, Stefano Stabellini wrote: > UIE being set can cause maintenance interrupts to occur when Xen writes > to one or more LR registers. The effect is a busy loop around the > interrupt handler in Xen > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > CC: konrad.wilk@oracle.com > --- > > Konrad, this fixes an actual bug, at least on OMAP5. It should have no > bad side effects on any other platforms as far as I can tell. It should > go in 4.5. From Andrii's mail, there is a bad side effect. We can receive spurious interrupt. On V1, you said that you tried this patch on midway. I would prefer if you give a try on platform that would really be used (such as xgene or seattle). As we know Midway won't be used in prod. Regards, > Changes in v2: > > - add an in-code comment about maintenance_interrupt not being called. > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index 70d10d6..c6c11d3 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) > if ( is_idle_vcpu(v) ) > return; > > + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > + > spin_lock_irqsave(&v->arch.vgic.lock, flags); > > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), > @@ -527,8 +529,6 @@ void gic_inject(void) > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); > - else > - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > } > > static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > * Receiving the interrupt is going to cause gic_inject to be called > * on return to guest that is going to clear the old LRs and inject > * new interrupts. > + * > + * Do not add code here: maintenance interrupts caused by setting > + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a > + * consequence sometimes this handler might not be called. > */ > } > >
On 11/20/2014 11:02 AM, Julien Grall wrote: > Hi Stefano, > > On 11/20/2014 10:53 AM, Stefano Stabellini wrote: >> UIE being set can cause maintenance interrupts to occur when Xen writes >> to one or more LR registers. The effect is a busy loop around the >> interrupt handler in Xen >> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. >> >> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> >> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> >> CC: konrad.wilk@oracle.com >> --- >> >> Konrad, this fixes an actual bug, at least on OMAP5. It should have no >> bad side effects on any other platforms as far as I can tell. It should >> go in 4.5. > > From Andrii's mail, there is a bad side effect. We can receive spurious > interrupt. > > On V1, you said that you tried this patch on midway. I would prefer if > you give a try on platform that would really be used (such as xgene or > seattle). As we know Midway won't be used in prod. And maybe give a try to gicv3 too as it's common code. Regards,
On Thu, 20 Nov 2014, Julien Grall wrote: > Hi Stefano, > > On 11/20/2014 10:53 AM, Stefano Stabellini wrote: > > UIE being set can cause maintenance interrupts to occur when Xen writes > > to one or more LR registers. The effect is a busy loop around the > > interrupt handler in Xen > > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. > > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > CC: konrad.wilk@oracle.com > > --- > > > > Konrad, this fixes an actual bug, at least on OMAP5. It should have no > > bad side effects on any other platforms as far as I can tell. It should > > go in 4.5. > > >From Andrii's mail, there is a bad side effect. We can receive spurious > interrupt. The spurious interrupt is the maintenance interrupt. There is not one more spurious interrupt notification in addition to the maintenance interrupt. > On V1, you said that you tried this patch on midway. I would prefer if > you give a try on platform that would really be used (such as xgene or > seattle). As we know Midway won't be used in prod. OK > Regards, > > > Changes in v2: > > > > - add an in-code comment about maintenance_interrupt not being called. > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index 70d10d6..c6c11d3 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) > > if ( is_idle_vcpu(v) ) > > return; > > > > + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > > + > > spin_lock_irqsave(&v->arch.vgic.lock, flags); > > > > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), > > @@ -527,8 +529,6 @@ void gic_inject(void) > > > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > > gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); > > - else > > - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > > } > > > > static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) > > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > > * Receiving the interrupt is going to cause gic_inject to be called > > * on return to guest that is going to clear the old LRs and inject > > * new interrupts. > > + * > > + * Do not add code here: maintenance interrupts caused by setting > > + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a > > + * consequence sometimes this handler might not be called. > > */ > > } > > > > > > > -- > Julien Grall >
On Thu, 20 Nov 2014, Julien Grall wrote: > On 11/20/2014 11:02 AM, Julien Grall wrote: > > Hi Stefano, > > > > On 11/20/2014 10:53 AM, Stefano Stabellini wrote: > >> UIE being set can cause maintenance interrupts to occur when Xen writes > >> to one or more LR registers. The effect is a busy loop around the > >> interrupt handler in Xen > >> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. > >> > >> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > >> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > >> CC: konrad.wilk@oracle.com > >> --- > >> > >> Konrad, this fixes an actual bug, at least on OMAP5. It should have no > >> bad side effects on any other platforms as far as I can tell. It should > >> go in 4.5. > > > > From Andrii's mail, there is a bad side effect. We can receive spurious > > interrupt. > > > > On V1, you said that you tried this patch on midway. I would prefer if > > you give a try on platform that would really be used (such as xgene or > > seattle). As we know Midway won't be used in prod. > > And maybe give a try to gicv3 too as it's common code. I don't have a gicv3 test environment ready but it works on xgene too
Hi Stefano, On 11/20/2014 03:54 PM, Stefano Stabellini wrote: > On Thu, 20 Nov 2014, Julien Grall wrote: >> On 11/20/2014 11:02 AM, Julien Grall wrote: >>> Hi Stefano, >>> >>> On 11/20/2014 10:53 AM, Stefano Stabellini wrote: >>>> UIE being set can cause maintenance interrupts to occur when Xen writes >>>> to one or more LR registers. The effect is a busy loop around the >>>> interrupt handler in Xen >>>> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. >>>> >>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> >>>> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> >>>> CC: konrad.wilk@oracle.com >>>> --- >>>> >>>> Konrad, this fixes an actual bug, at least on OMAP5. It should have no >>>> bad side effects on any other platforms as far as I can tell. It should >>>> go in 4.5. >>> >>> From Andrii's mail, there is a bad side effect. We can receive spurious >>> interrupt. >>> >>> On V1, you said that you tried this patch on midway. I would prefer if >>> you give a try on platform that would really be used (such as xgene or >>> seattle). As we know Midway won't be used in prod. >> >> And maybe give a try to gicv3 too as it's common code. > > I don't have a gicv3 test environment ready but it works on xgene too Ok. I will give a quick try on the model today or tomorrow. Aside from that, and after reading the spec. This patch looks good to me: Reviewed-by: Julien Grall <julien.grall@linaro.org> Regards,
On Thu, 2014-11-20 at 10:53 +0000, Stefano Stabellini wrote: > UIE being set can cause maintenance interrupts to occur when Xen writes > to one or more LR registers. The effect is a busy loop around the > interrupt handler in Xen > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. I think it would be useful to explain somewhere why/how a spurious interrupt can occur, if not here then in the comment you've already added or in another one in gic_clear_lrs where you clear UIE. The bit about clearing the LRs on entry causing UIE to become deasserted before we get as far as reading the IAR is a bit subtle. > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > CC: konrad.wilk@oracle.com With the expanded commentary: Acked-by: Ian Campbell <ian.campbell@citrix.com> > --- > > Konrad, this fixes an actual bug, at least on OMAP5. It should have no > bad side effects on any other platforms as far as I can tell. It should > go in 4.5. > > Changes in v2: > > - add an in-code comment about maintenance_interrupt not being called. > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > index 70d10d6..c6c11d3 100644 > --- a/xen/arch/arm/gic.c > +++ b/xen/arch/arm/gic.c > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) > if ( is_idle_vcpu(v) ) > return; > > + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > + > spin_lock_irqsave(&v->arch.vgic.lock, flags); > > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), > @@ -527,8 +529,6 @@ void gic_inject(void) > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); > - else > - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > } > > static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > * Receiving the interrupt is going to cause gic_inject to be called > * on return to guest that is going to clear the old LRs and inject > * new interrupts. > + * > + * Do not add code here: maintenance interrupts caused by setting > + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a > + * consequence sometimes this handler might not be called. > */ > } >
On Thu, Nov 20, 2014 at 04:47:42PM +0000, Ian Campbell wrote: > On Thu, 2014-11-20 at 10:53 +0000, Stefano Stabellini wrote: > > UIE being set can cause maintenance interrupts to occur when Xen writes > > to one or more LR registers. The effect is a busy loop around the > > interrupt handler in Xen > > (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. > > I think it would be useful to explain somewhere why/how a spurious > interrupt can occur, if not here then in the comment you've already > added or in another one in gic_clear_lrs where you clear UIE. > > The bit about clearing the LRs on entry causing UIE to become deasserted > before we get as far as reading the IAR is a bit subtle. > > > Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> > > Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> > > CC: konrad.wilk@oracle.com > > With the expanded commentary: > Acked-by: Ian Campbell <ian.campbell@citrix.com> > > > --- > > > > Konrad, this fixes an actual bug, at least on OMAP5. It should have no > > bad side effects on any other platforms as far as I can tell. It should > > go in 4.5. As long as the testing that Julian has asked for does not demonstrate any issues with this patch I am OK with it going in Xen 4.5. > > > > Changes in v2: > > > > - add an in-code comment about maintenance_interrupt not being called. > > > > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c > > index 70d10d6..c6c11d3 100644 > > --- a/xen/arch/arm/gic.c > > +++ b/xen/arch/arm/gic.c > > @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) > > if ( is_idle_vcpu(v) ) > > return; > > > > + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > > + > > spin_lock_irqsave(&v->arch.vgic.lock, flags); > > > > while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), > > @@ -527,8 +529,6 @@ void gic_inject(void) > > > > if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) > > gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); > > - else > > - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); > > } > > > > static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) > > @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r > > * Receiving the interrupt is going to cause gic_inject to be called > > * on return to guest that is going to clear the old LRs and inject > > * new interrupts. > > + * > > + * Do not add code here: maintenance interrupts caused by setting > > + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a > > + * consequence sometimes this handler might not be called. > > */ > > } > > > >
Hi Stefano, On 20/11/2014 15:54, Stefano Stabellini wrote: > On Thu, 20 Nov 2014, Julien Grall wrote: >> On 11/20/2014 11:02 AM, Julien Grall wrote: >>> Hi Stefano, >>> >>> On 11/20/2014 10:53 AM, Stefano Stabellini wrote: >>>> UIE being set can cause maintenance interrupts to occur when Xen writes >>>> to one or more LR registers. The effect is a busy loop around the >>>> interrupt handler in Xen >>>> (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. >>>> >>>> Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> >>>> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> >>>> CC: konrad.wilk@oracle.com >>>> --- >>>> >>>> Konrad, this fixes an actual bug, at least on OMAP5. It should have no >>>> bad side effects on any other platforms as far as I can tell. It should >>>> go in 4.5. >>> >>> From Andrii's mail, there is a bad side effect. We can receive spurious >>> interrupt. >>> >>> On V1, you said that you tried this patch on midway. I would prefer if >>> you give a try on platform that would really be used (such as xgene or >>> seattle). As we know Midway won't be used in prod. >> >> And maybe give a try to gicv3 too as it's common code. > > I don't have a gicv3 test environment ready but it works on xgene too I gave a try on the Foundation Model and didn't see any specific regression: Tested-by: Julien Grall <julien.grall@linaro.org> Regards,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..c6c11d3 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; + gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(&v->arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(¤t->arch.vgic.lr_pending) && lr_all_full() ) gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1); - else - gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r * Receiving the interrupt is going to cause gic_inject to be called * on return to guest that is going to clear the old LRs and inject * new interrupts. + * + * Do not add code here: maintenance interrupts caused by setting + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a + * consequence sometimes this handler might not be called. */ }
UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-devel&m=141597517132682): everything gets stuck. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Reported-and-Tested-by: Andrii Tseglytskyi <andrii.tseglytskyi@globallogic.com> CC: konrad.wilk@oracle.com --- Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. Changes in v2: - add an in-code comment about maintenance_interrupt not being called.