diff mbox

[Xen-devel,v1,2/2] xen/arm: Fix deadlock in on_selected_cpus function

Message ID alpine.DEB.2.02.1401281526510.4373@kaball.uk.xensource.com
State New
Headers show

Commit Message

Stefano Stabellini Jan. 28, 2014, 4:02 p.m. UTC
On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > Alternatively, as Ian suggested, we could increase the priotiry of SGIs
> > > > but I am a bit wary of making that change at RC2.
> > > 
> > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > primitives to work around this issue on a case by case basis. It's just
> > > too subtle IMHO.
> > > 
> > > The IPI and cross CPU calling primitives are basically predicated on
> > > those IPIs interrupting normal interrupt handlers.
> > 
> > The problem is that we don't know if we can context switch properly
> > nested interrupts.
> 
> What do you mean? We don't have to context switch an IPI.

Sorry, I meant save/restore registers, stack pointer, processor mode,
etc, for nested interrupts.


> > Also I would need to think harder whether everything
> > would work correctly without hitches with multiple SGIs happening
> > simultaneously (with more than 2 cpus involved).
> 
> Since all IPIs would be at the same higher priority only one will be
> active on each CPU at a time. If you are worried about multiple CPUs
> then that is already an issue today, just at a lower priority.

That is correct, but if we moved the on_selected_cpus call out of the
interrupt handler I don't think the problem could occur.


> I have hacked the IPI priority to be higher in the past and it worked
> fine, I just never got round to cleaning it up for submission (I hadn't
> thought of the locking thing and my use case was low priority).
> 
> The interrupt entry and exit paths were written with nested interrupt in
> mind and they have to be so in order to handle interrupts which occur
> from both guest and hypervisor context.
>
> > On the other hand we know that both Oleksandr's and my solution should
> > work OK with no surprises if implemented correctly.
> 
> That's a big if in my mind, any use of trylock is very subtle IMOH.

I wouldn't accept that patch for 4.4 either (or for 4.5 given that we
can certainly come up with something better by that time).


> AIUI this issue only occurs with "proto device assignment" patches added
> to 4.4, n which case I think the solution can wait until 4.5 and can be
> done properly via the IPI priority fix.
 
I think this is a pretty significant problem, even if we don't commit a
fix, we should post a proper patch that we deem acceptable and link it to
the wiki so that anybody that needs it can find it and be sure that it
works correctly.
In my opinion if we go to this length then we might as well commit it
(if it doesn't touch common code of course), but I'll leave the decision
up to you and George.

Given the constraints, the solution I would feel more comfortable with at
this time is something like the following patch (lightly tested):

Comments

Ian Campbell Jan. 28, 2014, 4:12 p.m. UTC | #1
On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> On Tue, 28 Jan 2014, Ian Campbell wrote:
> > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > > Alternatively, as Ian suggested, we could increase the priotiry of SGIs
> > > > > but I am a bit wary of making that change at RC2.
> > > > 
> > > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > > primitives to work around this issue on a case by case basis. It's just
> > > > too subtle IMHO.
> > > > 
> > > > The IPI and cross CPU calling primitives are basically predicated on
> > > > those IPIs interrupting normal interrupt handlers.
> > > 
> > > The problem is that we don't know if we can context switch properly
> > > nested interrupts.
> > 
> > What do you mean? We don't have to context switch an IPI.
> 
> Sorry, I meant save/restore registers, stack pointer, processor mode,
> etc, for nested interrupts.

Right, we do handle that actually since it is the same code as handles
an IRQ during a hypercall (since a hypercall is just another type of
trap).

> > > Also I would need to think harder whether everything
> > > would work correctly without hitches with multiple SGIs happening
> > > simultaneously (with more than 2 cpus involved).
> > 
> > Since all IPIs would be at the same higher priority only one will be
> > active on each CPU at a time. If you are worried about multiple CPUs
> > then that is already an issue today, just at a lower priority.
> 
> That is correct, but if we moved the on_selected_cpus call out of the
> interrupt handler I don't think the problem could occur.

Sure, I'm not opposed to fixing this issue by making an architectural
improvement to the code which happens to not use on_selected_cpus.

> > AIUI this issue only occurs with "proto device assignment" patches added
> > to 4.4, n which case I think the solution can wait until 4.5 and can be
> > done properly via the IPI priority fix.
>  
> I think this is a pretty significant problem, even if we don't commit a
> fix, we should post a proper patch that we deem acceptable and link it to
> the wiki so that anybody that needs it can find it and be sure that it
> works correctly.

FWIW I have an almost fix to the IPI priority thing which I would intend
to fix in 4.5 regardless of whether this issue needs it or not.

> In my opinion if we go to this length then we might as well commit it
> (if it doesn't touch common code of course), but I'll leave the decision
> up to you and George.
> 
> Given the constraints, the solution I would feel more comfortable with at
> this time is something like the following patch (lightly tested):

I don't think this buys you anything over just fixing on_selected_cpus
to work as it should, unless there is some reason why deferring this
work to a tasklet is the logically correct thing to do and/or a better
designh?

(IOW if you are only doing this to avoid calling on_selected_cpus in
interrupt context then I think this is the wrong fix).

Ian.

> diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> index e6257a7..b00ca73 100644
> --- a/xen/arch/arm/gic.c
> +++ b/xen/arch/arm/gic.c
> @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
>  
>  static unsigned nr_lrs;
>  
> +static void gic_irq_eoi(void *info)
> +{
> +    int virq = (uintptr_t) info;
> +    GICC[GICC_DIR] = virq;
> +}
> +
> +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
> +static void eoi_action(unsigned long unused)
> +{
> +    struct pending_irq *p = this_cpu(eoi_irq);
> +    ASSERT(p != NULL);
> +
> +    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
> +            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
> +
> +    this_cpu(eoi_irq) = NULL;
> +}
> +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
> +
>  /* The GIC mapping of CPU interfaces does not necessarily match the
>   * logical CPU numbering. Let's use mapping as returned by the GIC
>   * itself
> @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
>  
>  }
>  
> -static void gic_irq_eoi(void *info)
> -{
> -    int virq = (uintptr_t) info;
> -    GICC[GICC_DIR] = virq;
> -}
> -
>  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
>  {
>      int i = 0, virq, pirq = -1;
> @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
>              if ( cpu == smp_processor_id() )
>                  gic_irq_eoi((void*)(uintptr_t)pirq);
>              else
> -                on_selected_cpus(cpumask_of(cpu),
> -                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> +            {
> +                ASSERT(this_cpu(eoi_irq) == NULL);
> +                this_cpu(eoi_irq) = p;
> +                tasklet_schedule(&eoi_tasklet);
> +            }
>          }
>  
>          i++;
Stefano Stabellini Jan. 28, 2014, 4:23 p.m. UTC | #2
On Tue, 28 Jan 2014, Ian Campbell wrote:
> On Tue, 2014-01-28 at 16:02 +0000, Stefano Stabellini wrote:
> > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > On Tue, 2014-01-28 at 14:00 +0000, Stefano Stabellini wrote:
> > > > On Tue, 28 Jan 2014, Ian Campbell wrote:
> > > > > On Mon, 2014-01-27 at 19:00 +0000, Stefano Stabellini wrote:
> > > > > > Alternatively, as Ian suggested, we could increase the priotiry of SGIs
> > > > > > but I am a bit wary of making that change at RC2.
> > > > > 
> > > > > I'm leaning the other way -- I'm wary of open coding magic locking
> > > > > primitives to work around this issue on a case by case basis. It's just
> > > > > too subtle IMHO.
> > > > > 
> > > > > The IPI and cross CPU calling primitives are basically predicated on
> > > > > those IPIs interrupting normal interrupt handlers.
> > > > 
> > > > The problem is that we don't know if we can context switch properly
> > > > nested interrupts.
> > > 
> > > What do you mean? We don't have to context switch an IPI.
> > 
> > Sorry, I meant save/restore registers, stack pointer, processor mode,
> > etc, for nested interrupts.
> 
> Right, we do handle that actually since it is the same code as handles
> an IRQ during a hypercall (since a hypercall is just another type of
> trap).

Ah right, good.


> > > > Also I would need to think harder whether everything
> > > > would work correctly without hitches with multiple SGIs happening
> > > > simultaneously (with more than 2 cpus involved).
> > > 
> > > Since all IPIs would be at the same higher priority only one will be
> > > active on each CPU at a time. If you are worried about multiple CPUs
> > > then that is already an issue today, just at a lower priority.
> > 
> > That is correct, but if we moved the on_selected_cpus call out of the
> > interrupt handler I don't think the problem could occur.
> 
> Sure, I'm not opposed to fixing this issue by making an architectural
> improvement to the code which happens to not use on_selected_cpus.
> 
> > > AIUI this issue only occurs with "proto device assignment" patches added
> > > to 4.4, n which case I think the solution can wait until 4.5 and can be
> > > done properly via the IPI priority fix.
> >  
> > I think this is a pretty significant problem, even if we don't commit a
> > fix, we should post a proper patch that we deem acceptable and link it to
> > the wiki so that anybody that needs it can find it and be sure that it
> > works correctly.
> 
> FWIW I have an almost fix to the IPI priority thing which I would intend
> to fix in 4.5 regardless of whether this issue needs it or not.

Well, in that case if you can test the patch in the specific case where
an SGI interrupts a lower priority interrupt, it might be worth sending
out the patch now.


> > In my opinion if we go to this length then we might as well commit it
> > (if it doesn't touch common code of course), but I'll leave the decision
> > up to you and George.
> > 
> > Given the constraints, the solution I would feel more comfortable with at
> > this time is something like the following patch (lightly tested):
> 
> I don't think this buys you anything over just fixing on_selected_cpus
> to work as it should, unless there is some reason why deferring this
> work to a tasklet is the logically correct thing to do and/or a better
> designh?
> 
> (IOW if you are only doing this to avoid calling on_selected_cpus in
> interrupt context then I think this is the wrong fix).

I feel that on_selected_cpus is the kind of work that belongs to the
"bottom half". I was never very happy to have the call where it is now
in the first place. This approach would have the benefit of allowing us
to receive regular interrupts while waiting for other cpus to handle the
SGI. Also I am sure that it would work even in cases where you have
more than one SGIs targeting the same cpu simultaneously.


> Ian.
> 
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index e6257a7..b00ca73 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -58,6 +58,25 @@ static DEFINE_PER_CPU(uint64_t, lr_mask);
> >  
> >  static unsigned nr_lrs;
> >  
> > +static void gic_irq_eoi(void *info)
> > +{
> > +    int virq = (uintptr_t) info;
> > +    GICC[GICC_DIR] = virq;
> > +}
> > +
> > +static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
> > +static void eoi_action(unsigned long unused)
> > +{
> > +    struct pending_irq *p = this_cpu(eoi_irq);
> > +    ASSERT(p != NULL);
> > +
> > +    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
> > +            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
> > +
> > +    this_cpu(eoi_irq) = NULL;
> > +}
> > +static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
> > +
> >  /* The GIC mapping of CPU interfaces does not necessarily match the
> >   * logical CPU numbering. Let's use mapping as returned by the GIC
> >   * itself
> > @@ -897,12 +916,6 @@ int gicv_setup(struct domain *d)
> >  
> >  }
> >  
> > -static void gic_irq_eoi(void *info)
> > -{
> > -    int virq = (uintptr_t) info;
> > -    GICC[GICC_DIR] = virq;
> > -}
> > -
> >  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
> >  {
> >      int i = 0, virq, pirq = -1;
> > @@ -962,8 +975,11 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
> >              if ( cpu == smp_processor_id() )
> >                  gic_irq_eoi((void*)(uintptr_t)pirq);
> >              else
> > -                on_selected_cpus(cpumask_of(cpu),
> > -                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
> > +            {
> > +                ASSERT(this_cpu(eoi_irq) == NULL);
> > +                this_cpu(eoi_irq) = p;
> > +                tasklet_schedule(&eoi_tasklet);
> > +            }
> >          }
> >  
> >          i++;
> 
>
diff mbox

Patch

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index e6257a7..b00ca73 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -58,6 +58,25 @@  static DEFINE_PER_CPU(uint64_t, lr_mask);
 
 static unsigned nr_lrs;
 
+static void gic_irq_eoi(void *info)
+{
+    int virq = (uintptr_t) info;
+    GICC[GICC_DIR] = virq;
+}
+
+static DEFINE_PER_CPU(struct pending_irq*, eoi_irq);
+static void eoi_action(unsigned long unused)
+{
+    struct pending_irq *p = this_cpu(eoi_irq);
+    ASSERT(p != NULL);
+
+    on_selected_cpus(cpumask_of(p->desc->arch.eoi_cpu),
+            gic_irq_eoi, (void*)(uintptr_t)p->desc->irq, 0);
+
+    this_cpu(eoi_irq) = NULL;
+}
+static DECLARE_TASKLET(eoi_tasklet, eoi_action, 0);
+
 /* The GIC mapping of CPU interfaces does not necessarily match the
  * logical CPU numbering. Let's use mapping as returned by the GIC
  * itself
@@ -897,12 +916,6 @@  int gicv_setup(struct domain *d)
 
 }
 
-static void gic_irq_eoi(void *info)
-{
-    int virq = (uintptr_t) info;
-    GICC[GICC_DIR] = virq;
-}
-
 static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs)
 {
     int i = 0, virq, pirq = -1;
@@ -962,8 +975,11 @@  static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
             if ( cpu == smp_processor_id() )
                 gic_irq_eoi((void*)(uintptr_t)pirq);
             else
-                on_selected_cpus(cpumask_of(cpu),
-                                 gic_irq_eoi, (void*)(uintptr_t)pirq, 0);
+            {
+                ASSERT(this_cpu(eoi_irq) == NULL);
+                this_cpu(eoi_irq) = p;
+                tasklet_schedule(&eoi_tasklet);
+            }
         }
 
         i++;