[Xen-devel] Xen 4.5 random freeze question

I think that's OK: it looks like that on your board for some reasons
when UIE is set you get irq 1023 (spurious interrupt) instead of your
normal maintenance interrupt.

But everything should work anyway without issues.

This is the same patch as before but on top of the lastest xen-unstable
tree. Please confirm if it works.

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> I got this strange log:
> 
> (XEN) received maintenance interrupt irq=1023
> 
> And platform does not hang due to this:
> +    hcr = GICH[GICH_HCR];
> +    if ( hcr & GICH_HCR_UIE )
> +    {
> +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> +        uie_on = 1;
> +    }
> 
> On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini
> >> <stefano.stabellini@eu.citrix.com> wrote:
> >> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi
> >> >> <andrii.tseglytskyi@globallogic.com> wrote:
> >> >> > On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini
> >> >> > <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >> On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> Hi Stefano,
> >> >> >>>
> >> >> >>> On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini
> >> >> >>> <stefano.stabellini@eu.citrix.com> wrote:
> >> >> >>> > On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote:
> >> >> >>> >> Hi Stefano,
> >> >> >>> >>
> >> >> >>> >> > >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>> >> > > -        GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] |= GICH_HCR_NPIE;
> >> >> >>> >> > >      else
> >> >> >>> >> > > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >>> >> > > +        GICH[GICH_HCR] &= ~GICH_HCR_NPIE;
> >> >> >>> >> > >
> >> >> >>> >> > >  }
> >> >> >>> >> >
> >> >> >>> >> > Yes, exactly
> >> >> >>> >>
> >> >> >>> >> I tried, hang still occurs with this change
> >> >> >>> >
> >> >> >>> > We need to figure out why during the hang you still have all the LRs
> >> >> >>> > busy even if you are getting maintenance interrupts that should cause
> >> >> >>> > them to be cleared.
> >> >> >>> >
> >> >> >>>
> >> >> >>> I see that I have free LRs during maintenance interrupt
> >> >> >>>
> >> >> >>> (XEN) gic.c:871:d0v0 maintenance interrupt
> >> >> >>> (XEN) GICH_LRs (vcpu 0) mask=0
> >> >> >>> (XEN)    HW_LR[0]=9a015856
> >> >> >>> (XEN)    HW_LR[1]=0
> >> >> >>> (XEN)    HW_LR[2]=0
> >> >> >>> (XEN)    HW_LR[3]=0
> >> >> >>> (XEN) Inflight irq=86 lr=0
> >> >> >>> (XEN) Inflight irq=2 lr=255
> >> >> >>> (XEN) Pending irq=2
> >> >> >>>
> >> >> >>> But I see that after I got hang - maintenance interrupts are generated
> >> >> >>> continuously. Platform continues printing the same log till reboot.
> >> >> >>
> >> >> >> Exactly the same log? As in the one above you just pasted?
> >> >> >> That is very very suspicious.
> >> >> >
> >> >> > Yes exactly the same log. And looks like it means that LRs are flushed
> >> >> > correctly.
> >> >> >
> >> >> >>
> >> >> >> I am thinking that we are not handling GICH_HCR_UIE correctly and
> >> >> >> something we do in Xen, maybe writing to an LR register, might trigger a
> >> >> >> new maintenance interrupt immediately causing an infinite loop.
> >> >> >>
> >> >> >
> >> >> > Yes, this is what I'm thinking about. Taking in account all collected
> >> >> > debug info it looks like once LRs are overloaded with SGIs -
> >> >> > maintenance interrupt occurs.
> >> >> > And then it is not handled properly, and occurs again and again - so
> >> >> > platform hangs inside its handler.
> >> >> >
> >> >> >> Could you please try this patch? It disable GICH_HCR_UIE immediately on
> >> >> >> hypervisor entry.
> >> >> >>
> >> >> >
> >> >> > Now trying.
> >> >> >
> >> >> >>
> >> >> >> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> >> >> >> index 4d2a92d..6ae8dc4 100644
> >> >> >> --- a/xen/arch/arm/gic.c
> >> >> >> +++ b/xen/arch/arm/gic.c
> >> >> >> @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v)
> >> >> >>      if ( is_idle_vcpu(v) )
> >> >> >>          return;
> >> >> >>
> >> >> >> +    GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> +
> >> >> >>      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >> >> >>
> >> >> >>      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> >> >> >> @@ -821,12 +823,8 @@ void gic_inject(void)
> >> >> >>
> >> >> >>      gic_restore_pending_irqs(current);
> >> >> >>
> >> >> >> -
> >> >> >>      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> >> >>          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> >> >> -    else
> >> >> >> -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> >> >> -
> >> >> >>  }
> >> >> >>
> >> >> >>  static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi)
> >> >> >
> >> >>
> >> >> Heh - I don't see hangs with this patch :) But also I see that
> >> >> maintenance interrupt doesn't occur (and no hang as result)
> >> >> Stefano - is this expected?
> >> >
> >> > No maintenance interrupts at all? That's strange. You should be
> >> > receiving them when LRs are full and you still have interrupts pending
> >> > to be added to them.
> >> >
> >> > You could add another printk here to see if you should be receiving
> >> > them:
> >> >
> >> >      if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
> >> > +    {
> >> > +        gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n");
> >> >          GICH[GICH_HCR] |= GICH_HCR_UIE;
> >> > -    else
> >> > -        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> >> > -
> >> > +    }
> >> >  }
> >> >
> >>
> >> Requested properly:
> >>
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >> (XEN) gic.c:756:d0v0 requesting maintenance interrupt
> >>
> >> But does not occur
> >
> > OK, let's see what's going on then by printing the irq number of the
> > maintenance interrupt:
> >
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 4d2a92d..fed3167 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -55,6 +55,7 @@ static struct {
> >  static DEFINE_PER_CPU(uint64_t, lr_mask);
> >
> >  static uint8_t nr_lrs;
> > +static bool uie_on;
> >  #define lr_all_full() (this_cpu(lr_mask) == ((1 << nr_lrs) - 1))
> >
> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> > @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v)
> >  {
> >      int i = 0;
> >      unsigned long flags;
> > +    unsigned long hcr;
> >
> >      /* The idle domain has no LRs to be cleared. Since gic_restore_state
> >       * doesn't write any LR registers for the idle domain they could be
> > @@ -701,6 +703,13 @@ void gic_clear_lrs(struct vcpu *v)
> >      if ( is_idle_vcpu(v) )
> >          return;
> >
> > +    hcr = GICH[GICH_HCR];
> > +    if ( hcr & GICH_HCR_UIE )
> > +    {
> > +        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
> > +        uie_on = 1;
> > +    }
> > +
> >      spin_lock_irqsave(&v->arch.vgic.lock, flags);
> >
> >      while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
> > @@ -865,6 +873,11 @@ void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >          intack = GICC[GICC_IAR];
> >          irq = intack & GICC_IA_IRQ;
> >
> > +        if ( uie_on )
> > +        {
> > +            uie_on = 0;
> > +            printk("received maintenance interrupt irq=%d\n", irq);
> > +        }
> >          if ( likely(irq >= 16 && irq < 1021) )
> >          {
> >              local_irq_enable();
> 
> 
> 
> -- 
> 
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
>

[Xen-devel] Xen 4.5 random freeze question

Commit Message

Comments

Patch