diff mbox

[Xen-devel] Xen 4.5 random freeze question

Message ID alpine.DEB.2.02.1411171742360.26318@kaball.uk.xensource.com
State New
Headers show

Commit Message

Stefano Stabellini Nov. 17, 2014, 6:02 p.m. UTC
On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
> Hi Stefano,
> 
> Thank you for your answer.
> 
> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
> > Although it is possible that that patch is the cause of your problem,
> > unfortunately it is part of a significant rework of the GIC driver in
> > Xen and I am afraid that testing with only a portion of that patch
> > series might introduce other subtle bugs.  For your reference the series
> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
> >
> 
> Yes, I tested with and without the whole series.

And the result is that the series causes the problem?


> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
> > might not work correctly on your platform. It wouldn't be the first time
> > that we see hardware behaving that way, especially if you are using the
> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
> > can only deactivate non-secure interrupts. This is usually due to a
> > configuration error in u-boot.
> >
> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
> > platform?
> >
> 
> I tried this. Unfortunately it doesn't help.

Could you try the following patch on top of
5495a512b63bad868c147198f7f049c2617d468c ?

Comments

Andrii Tseglytskyi Nov. 18, 2014, 10:41 a.m. UTC | #1
Hi Stefano,

On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
<stefano.stabellini@eu.citrix.com> wrote:
> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>> Hi Stefano,
>>
>> Thank you for your answer.
>>
>> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
>> <stefano.stabellini@eu.citrix.com> wrote:
>> > Although it is possible that that patch is the cause of your problem,
>> > unfortunately it is part of a significant rework of the GIC driver in
>> > Xen and I am afraid that testing with only a portion of that patch
>> > series might introduce other subtle bugs.  For your reference the series
>> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
>> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>> >
>>
>> Yes, I tested with and without the whole series.
>
> And the result is that the series causes the problem?
>

Yes.

>
>> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
>> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
>> > might not work correctly on your platform. It wouldn't be the first time
>> > that we see hardware behaving that way, especially if you are using the
>> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
>> > can only deactivate non-secure interrupts. This is usually due to a
>> > configuration error in u-boot.
>> >
>> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
>> > platform?
>> >
>>
>> I tried this. Unfortunately it doesn't help.
>
> Could you try the following patch on top of
> 5495a512b63bad868c147198f7f049c2617d468c ?
>
> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index 302c031..a286376 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>      BUG_ON(lr < 0);
>      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
>
> -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
> +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
> -    if ( p->desc != NULL )
> -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>
>      GICH[GICH_LR + lr] = lr_val;
>
> @@ -622,6 +620,12 @@ out:
>      return;
>  }
>
> +static void gic_irq_eoi(void *info)
> +{
> +    int virq = (uintptr_t) info;
> +    GICC[GICC_DIR] = virq;
> +}
> +
>  static void gic_update_one_lr(struct vcpu *v, int i)
>  {
>      struct pending_irq *p;
> @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
>          p = irq_to_pending(v, irq);
>          if ( p->desc != NULL )
> +        {
> +            gic_irq_eoi((void*)(uintptr_t)irq);
>              p->desc->status &= ~IRQ_INPROGRESS;
> +        }
>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
>                  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))


It helps! Thank you a lot!
I did about ~30 reboots and got no hangs. The only what is needed - is
to rebase these changes on top of xen/master branch.
Changes in patch can be applied only on top of
5495a512b63bad868c147198f7f049c2617d468c
Will you do this change? Is it acceptable for baseline?

Regards,
Andrii
Andrii Tseglytskyi Nov. 18, 2014, 11:31 a.m. UTC | #2
Strange - looks like baseline code already does the same, that you
sent me yesterday. The only what is needed - is to set
PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI.
But baseline contains an issue. And in the same time changes on top of
5495a512b63bad868c147198f7f049c2617d468c work fine.

Regards,
Andrii

On Tue, Nov 18, 2014 at 12:41 PM, Andrii Tseglytskyi
<andrii.tseglytskyi@globallogic.com> wrote:
> Hi Stefano,
>
> On Mon, Nov 17, 2014 at 8:02 PM, Stefano Stabellini
> <stefano.stabellini@eu.citrix.com> wrote:
>> On Mon, 17 Nov 2014, Andrii Tseglytskyi wrote:
>>> Hi Stefano,
>>>
>>> Thank you for your answer.
>>>
>>> On Mon, Nov 17, 2014 at 6:39 PM, Stefano Stabellini
>>> <stefano.stabellini@eu.citrix.com> wrote:
>>> > Although it is possible that that patch is the cause of your problem,
>>> > unfortunately it is part of a significant rework of the GIC driver in
>>> > Xen and I am afraid that testing with only a portion of that patch
>>> > series might introduce other subtle bugs.  For your reference the series
>>> > starts at commit 6f91502be64a05d0635454d629118b96ae38b50f and ends at
>>> > commit 72eaf29e8d70784aaf066ead79df1295a25ecfd0.
>>> >
>>>
>>> Yes, I tested with and without the whole series.
>>
>> And the result is that the series causes the problem?
>>
>
> Yes.
>
>>
>>> > If 5495a512b63bad868c147198f7f049c2617d468c is really the cause of your
>>> > problem, one idea that comes to mind is that GICH_LR_MAINTENANCE_IRQ
>>> > might not work correctly on your platform. It wouldn't be the first time
>>> > that we see hardware behaving that way, especially if you are using the
>>> > GIC secure registers instead of the non-secure register as GICH_LRn.HW
>>> > can only deactivate non-secure interrupts. This is usually due to a
>>> > configuration error in u-boot.
>>> >
>>> > Could you please try to set PLATFORM_QUIRK_GUEST_PIRQ_NEED_EOI for your
>>> > platform?
>>> >
>>>
>>> I tried this. Unfortunately it doesn't help.
>>
>> Could you try the following patch on top of
>> 5495a512b63bad868c147198f7f049c2617d468c ?
>>
>> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
>> index 302c031..a286376 100644
>> --- a/xen/arch/arm/gic.c
>> +++ b/xen/arch/arm/gic.c
>> @@ -557,10 +557,8 @@ static inline void gic_set_lr(int lr, struct pending_irq *p,
>>      BUG_ON(lr < 0);
>>      BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
>>
>> -    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>> +    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
>>          ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
>> -    if ( p->desc != NULL )
>> -        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
>>
>>      GICH[GICH_LR + lr] = lr_val;
>>
>> @@ -622,6 +620,12 @@ out:
>>      return;
>>  }
>>
>> +static void gic_irq_eoi(void *info)
>> +{
>> +    int virq = (uintptr_t) info;
>> +    GICC[GICC_DIR] = virq;
>> +}
>> +
>>  static void gic_update_one_lr(struct vcpu *v, int i)
>>  {
>>      struct pending_irq *p;
>> @@ -639,7 +643,10 @@ static void gic_update_one_lr(struct vcpu *v, int i)
>>          irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
>>          p = irq_to_pending(v, irq);
>>          if ( p->desc != NULL )
>> +        {
>> +            gic_irq_eoi((void*)(uintptr_t)irq);
>>              p->desc->status &= ~IRQ_INPROGRESS;
>> +        }
>>          clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
>>          if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
>>                  test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))
>
>
> It helps! Thank you a lot!
> I did about ~30 reboots and got no hangs. The only what is needed - is
> to rebase these changes on top of xen/master branch.
> Changes in patch can be applied only on top of
> 5495a512b63bad868c147198f7f049c2617d468c
> Will you do this change? Is it acceptable for baseline?
>
> Regards,
> Andrii
>
> --
>
> Andrii Tseglytskyi | Embedded Dev
> GlobalLogic
> www.globallogic.com
diff mbox

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 302c031..a286376 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -557,10 +557,8 @@  static inline void gic_set_lr(int lr, struct pending_irq *p,
     BUG_ON(lr < 0);
     BUG_ON(state & ~(GICH_LR_STATE_MASK<<GICH_LR_STATE_SHIFT));
 
-    lr_val = state | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
+    lr_val = state | GICH_LR_MAINTENANCE_IRQ | ((p->priority >> 3) << GICH_LR_PRIORITY_SHIFT) |
         ((p->irq & GICH_LR_VIRTUAL_MASK) << GICH_LR_VIRTUAL_SHIFT);
-    if ( p->desc != NULL )
-        lr_val |= GICH_LR_HW | (p->desc->irq << GICH_LR_PHYSICAL_SHIFT);
 
     GICH[GICH_LR + lr] = lr_val;
 
@@ -622,6 +620,12 @@  out:
     return;
 }
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
 static void gic_update_one_lr(struct vcpu *v, int i)
 {
     struct pending_irq *p;
@@ -639,7 +643,10 @@  static void gic_update_one_lr(struct vcpu *v, int i)
         irq = (lr >> GICH_LR_VIRTUAL_SHIFT) & GICH_LR_VIRTUAL_MASK;
         p = irq_to_pending(v, irq);
         if ( p->desc != NULL )
+        {
+            gic_irq_eoi((void*)(uintptr_t)irq);
             p->desc->status &= ~IRQ_INPROGRESS;
+        }
         clear_bit(GIC_IRQ_GUEST_VISIBLE, &p->status);
         if ( test_bit(GIC_IRQ_GUEST_PENDING, &p->status) &&
                 test_bit(GIC_IRQ_GUEST_ENABLED, &p->status))