diff mbox

arm: use irq_set_affinity with force=false when migrating irqs

Message ID 1409571966-18343-1-git-send-email-sudeep.holla@arm.com
State Superseded
Headers show

Commit Message

Sudeep Holla Sept. 1, 2014, 11:46 a.m. UTC
From: Sudeep Holla <sudeep.holla@arm.com>

Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
enabled the forced irq_set_affinity which previously refused to route an
interrupt to an offline cpu.

Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
implements this force logic and disables the cpu online check for GIC
interrupt controller.

When __cpu_disable calls migrate_irqs, it disables the current cpu in
cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
away from the cpu but passes affinity mask with the cpu being offlined
also included in it.

If irq_set_affinity is called with force=true in a cpu hotplug path,
the caller must ensure that the cpu being offlined is not present in the
affinity mask or it may be selected as the target CPU, leading to the
interrupt not being migrated.

This patch fixes the issue by calling irq_set_affinity with force=false
so that cpu_online_mask is checked while setting the affinity in the
cpu hotplug path.

Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
locks up as the IRQs are not migrated away from CPU0.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: <stable@vger.kernel.org> # 3.10.x
---
 arch/arm/kernel/irq.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Hi Russell,

If you or tglx has no objections to this patch, I will put it
in your patch tracker.

Regards,
Sudeep

Comments

Russell King - ARM Linux Sept. 1, 2014, 11:50 a.m. UTC | #1
On Mon, Sep 01, 2014 at 12:46:06PM +0100, Sudeep Holla wrote:
> From: Sudeep Holla <sudeep.holla@arm.com>
> 
> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
> enabled the forced irq_set_affinity which previously refused to route an
> interrupt to an offline cpu.
> 
> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
> implements this force logic and disables the cpu online check for GIC
> interrupt controller.
> 
> When __cpu_disable calls migrate_irqs, it disables the current cpu in
> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
> away from the cpu but passes affinity mask with the cpu being offlined
> also included in it.
> 
> If irq_set_affinity is called with force=true in a cpu hotplug path,
> the caller must ensure that the cpu being offlined is not present in the
> affinity mask or it may be selected as the target CPU, leading to the
> interrupt not being migrated.
> 
> This patch fixes the issue by calling irq_set_affinity with force=false
> so that cpu_online_mask is checked while setting the affinity in the
> cpu hotplug path.
> 
> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
> locks up as the IRQs are not migrated away from CPU0.
> 
> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> Cc: Russell King <linux@arm.linux.org.uk>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: <stable@vger.kernel.org> # 3.10.x
> ---
>  arch/arm/kernel/irq.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Hi Russell,
> 
> If you or tglx has no objections to this patch, I will put it
> in your patch tracker.

Post discussion, I have no objections - except to the above comment.  Let
me rewrite it in a programming language, and maybe you can spot what's
wrong:

	if (russell_has_no_objection(patch) || tglx_has_no_objection(patch))
		submit_patch_to_tracker(patch);

Personally, I'd like to see tglx's ack on this first.
Sudeep Holla Sept. 1, 2014, 12:18 p.m. UTC | #2
On 01/09/14 12:50, Russell King - ARM Linux wrote:
> On Mon, Sep 01, 2014 at 12:46:06PM +0100, Sudeep Holla wrote:
>> From: Sudeep Holla <sudeep.holla@arm.com>
>>
>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>> enabled the forced irq_set_affinity which previously refused to route an
>> interrupt to an offline cpu.
>>
>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>> implements this force logic and disables the cpu online check for GIC
>> interrupt controller.
>>
>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>> away from the cpu but passes affinity mask with the cpu being offlined
>> also included in it.
>>
>> If irq_set_affinity is called with force=true in a cpu hotplug path,
>> the caller must ensure that the cpu being offlined is not present in the
>> affinity mask or it may be selected as the target CPU, leading to the
>> interrupt not being migrated.
>>
>> This patch fixes the issue by calling irq_set_affinity with force=false
>> so that cpu_online_mask is checked while setting the affinity in the
>> cpu hotplug path.
>>
>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>> locks up as the IRQs are not migrated away from CPU0.
>>
>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: <stable@vger.kernel.org> # 3.10.x
>> ---
>>   arch/arm/kernel/irq.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Hi Russell,
>>
>> If you or tglx has no objections to this patch, I will put it
>> in your patch tracker.
>
> Post discussion, I have no objections - except to the above comment.  Let
> me rewrite it in a programming language, and maybe you can spot what's
> wrong:
>
> 	if (russell_has_no_objection(patch) || tglx_has_no_objection(patch))
> 		submit_patch_to_tracker(patch);
>
> Personally, I'd like to see tglx's ack on this first.

Understood, I will wait for ack from tglx.

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Thomas Gleixner Sept. 1, 2014, 12:56 p.m. UTC | #3
On Mon, 1 Sep 2014, Russell King - ARM Linux wrote:
> On Mon, Sep 01, 2014 at 12:46:06PM +0100, Sudeep Holla wrote:
> > From: Sudeep Holla <sudeep.holla@arm.com>
> > 
> > Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
> > enabled the forced irq_set_affinity which previously refused to route an
> > interrupt to an offline cpu.
> > 
> > Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
> > implements this force logic and disables the cpu online check for GIC
> > interrupt controller.
> > 
> > When __cpu_disable calls migrate_irqs, it disables the current cpu in
> > cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
> > away from the cpu but passes affinity mask with the cpu being offlined
> > also included in it.
> > 
> > If irq_set_affinity is called with force=true in a cpu hotplug path,
> > the caller must ensure that the cpu being offlined is not present in the
> > affinity mask or it may be selected as the target CPU, leading to the
> > interrupt not being migrated.
> > 
> > This patch fixes the issue by calling irq_set_affinity with force=false
> > so that cpu_online_mask is checked while setting the affinity in the
> > cpu hotplug path.
> > 
> > Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
> > locks up as the IRQs are not migrated away from CPU0.
> > 
> > Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
> > Cc: Russell King <linux@arm.linux.org.uk>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: <stable@vger.kernel.org> # 3.10.x
> > ---
> >  arch/arm/kernel/irq.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > Hi Russell,
> > 
> > If you or tglx has no objections to this patch, I will put it
> > in your patch tracker.
> 
> Post discussion, I have no objections - except to the above comment.  Let
> me rewrite it in a programming language, and maybe you can spot what's
> wrong:
> 
> 	if (russell_has_no_objection(patch) || tglx_has_no_objection(patch))
> 		submit_patch_to_tracker(patch);
> 
> Personally, I'd like to see tglx's ack on this first.

Acked-by: Thomas Gleixner <tglx@linutronix.de>

I'm just not too happy about the changelog. It's confusing at
best:

   Call irq_set_affinity() with argument force = false

   Reason: IRQ Core and GIC was changed to force set affinity.
   	   
   	   If called from hotplug code, then the caller must ensure
   	   that the cpu being offlined is not present in the mask

   Therefor call it with force = false	   

That does not make sense. You change something which preceded the
forced affinity mechanism by 3+ years and completely miss to explain
why it got there in the first place and never should have been.

What about somthing like the following:

Since commit 1dbfa187dad ("ARM: irq migration: force migration off CPU
going down") the ARM interrupt migration code on cpu offline calls
irqchip.irq_set_affinity() with the argument force=true. At the point
of this change the argument had no effect because it was not used by
any interrupt chip driver and there was no semantics defined.

This changed with commit 01f8fa4f01d8 ("genirq: Allow forcing cpu
affinity of interrupts") which made the force argument useful to route
interrupts to not yet online cpus without checking the target cpu
against the cpu online mask. The following commit ffde1de64012
("irqchip: gic: Support forced affinity setting") implemented this for
the GIC interrupt chip.

As a consequence the ARM cpu offline irq migration fails if CPU0 is
offlined, because CPU0 is still set in the affinity mask and the
validataion against cpu online mask is skipped to the force argument
being true. The following first_cpu(mask) selection always selects
CPU0 as the target.

Solve the issue by calling irq_set_affinity() with force=false from
the CPU offline irq migration code so the GIC driver validates the
affinity mask against CPU online mask and therefor removes CPU0 from
the possible target candidates.

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla Sept. 1, 2014, 1:03 p.m. UTC | #4
On 01/09/14 13:56, Thomas Gleixner wrote:
> On Mon, 1 Sep 2014, Russell King - ARM Linux wrote:
>> On Mon, Sep 01, 2014 at 12:46:06PM +0100, Sudeep Holla wrote:
>>> From: Sudeep Holla <sudeep.holla@arm.com>
>>>
>>> Commit 01f8fa4f01d8("genirq: Allow forcing cpu affinity of interrupts")
>>> enabled the forced irq_set_affinity which previously refused to route an
>>> interrupt to an offline cpu.
>>>
>>> Commit ffde1de64012("irqchip: Gic: Support forced affinity setting")
>>> implements this force logic and disables the cpu online check for GIC
>>> interrupt controller.
>>>
>>> When __cpu_disable calls migrate_irqs, it disables the current cpu in
>>> cpu_online_mask and uses forced irq_set_affinity to migrate the IRQs
>>> away from the cpu but passes affinity mask with the cpu being offlined
>>> also included in it.
>>>
>>> If irq_set_affinity is called with force=true in a cpu hotplug path,
>>> the caller must ensure that the cpu being offlined is not present in the
>>> affinity mask or it may be selected as the target CPU, leading to the
>>> interrupt not being migrated.
>>>
>>> This patch fixes the issue by calling irq_set_affinity with force=false
>>> so that cpu_online_mask is checked while setting the affinity in the
>>> cpu hotplug path.
>>>
>>> Tested on TC2 hotpluging CPU0 in and out. Without this patch the system
>>> locks up as the IRQs are not migrated away from CPU0.
>>>
>>> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
>>> Cc: Russell King <linux@arm.linux.org.uk>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Cc: <stable@vger.kernel.org> # 3.10.x
>>> ---
>>>   arch/arm/kernel/irq.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> Hi Russell,
>>>
>>> If you or tglx has no objections to this patch, I will put it
>>> in your patch tracker.
>>
>> Post discussion, I have no objections - except to the above comment.  Let
>> me rewrite it in a programming language, and maybe you can spot what's
>> wrong:
>>
>> 	if (russell_has_no_objection(patch) || tglx_has_no_objection(patch))
>> 		submit_patch_to_tracker(patch);
>>
>> Personally, I'd like to see tglx's ack on this first.
>
> Acked-by: Thomas Gleixner <tglx@linutronix.de>
>

Thanks Thomas.

> I'm just not too happy about the changelog. It's confusing at
> best:
>
>     Call irq_set_affinity() with argument force = false
>
>     Reason: IRQ Core and GIC was changed to force set affinity.
>     	
>     	   If called from hotplug code, then the caller must ensure
>     	   that the cpu being offlined is not present in the mask
>
>     Therefor call it with force = false	
>
> That does not make sense. You change something which preceded the
> forced affinity mechanism by 3+ years and completely miss to explain
> why it got there in the first place and never should have been.
>

Agreed, I missed that. I will change the commit log as specified by
you below.

Regards,
Sudeep

> What about somthing like the following:
>
> Since commit 1dbfa187dad ("ARM: irq migration: force migration off CPU
> going down") the ARM interrupt migration code on cpu offline calls
> irqchip.irq_set_affinity() with the argument force=true. At the point
> of this change the argument had no effect because it was not used by
> any interrupt chip driver and there was no semantics defined.
>
> This changed with commit 01f8fa4f01d8 ("genirq: Allow forcing cpu
> affinity of interrupts") which made the force argument useful to route
> interrupts to not yet online cpus without checking the target cpu
> against the cpu online mask. The following commit ffde1de64012
> ("irqchip: gic: Support forced affinity setting") implemented this for
> the GIC interrupt chip.
>
> As a consequence the ARM cpu offline irq migration fails if CPU0 is
> offlined, because CPU0 is still set in the affinity mask and the
> validataion against cpu online mask is skipped to the force argument
> being true. The following first_cpu(mask) selection always selects
> CPU0 as the target.
>
> Solve the issue by calling irq_set_affinity() with force=false from
> the CPU offline irq migration code so the GIC driver validates the
> affinity mask against CPU online mask and therefor removes CPU0 from
> the possible target candidates.
>
>

--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/arm/kernel/irq.c b/arch/arm/kernel/irq.c
index 2c4257604513..5c4d38e32a51 100644
--- a/arch/arm/kernel/irq.c
+++ b/arch/arm/kernel/irq.c
@@ -175,7 +175,7 @@  static bool migrate_one_irq(struct irq_desc *desc)
 	c = irq_data_get_irq_chip(d);
 	if (!c->irq_set_affinity)
 		pr_debug("IRQ%u: unable to set affinity\n", d->irq);
-	else if (c->irq_set_affinity(d, affinity, true) == IRQ_SET_MASK_OK && ret)
+	else if (c->irq_set_affinity(d, affinity, false) == IRQ_SET_MASK_OK && ret)
 		cpumask_copy(d->affinity, affinity);
 
 	return ret;