diff mbox series

powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

Message ID 20210629131542.743888-1-clg@kaod.org
State New
Headers show
Series powerpc/xive: Do not skip CPU-less nodes when creating the IPIs | expand

Commit Message

Cédric Le Goater June 29, 2021, 1:15 p.m. UTC
On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
runtime. Today, the IPI is not created for such nodes, and hot-plugged
CPUs use a bogus IPI, which leads to soft lockups.

We could create the node IPI on demand but it is a bit complex because
this code would be called under bringup_up() and some IRQ locking is
being done. The simplest solution is to create the IPIs for all nodes
at startup.

Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
Cc: stable@vger.kernel.org # v5.13
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---

This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
are collected from /sys/devices/system/node/ but CPU-less nodes are
not listed there. When interrupts are scanned, the link representing
the node structure is NULL and segfault occurs.

Version 1.7 seems immune. 

---
 arch/powerpc/sysdev/xive/common.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Srikar Dronamraju June 30, 2021, 5:42 a.m. UTC | #1
* C?dric Le Goater <clg@kaod.org> [2021-06-29 15:15:42]:

> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

> runtime. Today, the IPI is not created for such nodes, and hot-plugged

> CPUs use a bogus IPI, which leads to soft lockups.

> 

> We could create the node IPI on demand but it is a bit complex because

> this code would be called under bringup_up() and some IRQ locking is

> being done. The simplest solution is to create the IPIs for all nodes

> at startup.


Thanks for quickly coming up with the fix.

> 

> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

> Cc: stable@vger.kernel.org # v5.13

> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Cédric Le Goater <clg@kaod.org>


Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> ---

> 

> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

> are collected from /sys/devices/system/node/ but CPU-less nodes are

> not listed there. When interrupts are scanned, the link representing

> the node structure is NULL and segfault occurs.

> 

> Version 1.7 seems immune. 

> 

> ---

>  arch/powerpc/sysdev/xive/common.c | 4 ----

>  1 file changed, 4 deletions(-)

> 

> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c

> index f3b16ed48b05..5d2c58dba57e 100644

> --- a/arch/powerpc/sysdev/xive/common.c

> +++ b/arch/powerpc/sysdev/xive/common.c

> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)

>  		struct xive_ipi_desc *xid = &xive_ipis[node];

>  		struct xive_ipi_alloc_info info = { node };

>  

> -		/* Skip nodes without CPUs */

> -		if (cpumask_empty(cpumask_of_node(node)))

> -			continue;

> -

>  		/*

>  		 * Map one IPI interrupt per node for all cpus of that node.

>  		 * Since the HW interrupt number doesn't have any meaning,

> -- 

> 2.31.1

> 


-- 
Thanks and Regards
Srikar Dronamraju
Laurent Vivier July 7, 2021, 5:37 p.m. UTC | #2
On 29/06/2021 15:15, Cédric Le Goater wrote:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

> runtime. Today, the IPI is not created for such nodes, and hot-plugged

> CPUs use a bogus IPI, which leads to soft lockups.

>

> We could create the node IPI on demand but it is a bit complex because

> this code would be called under bringup_up() and some IRQ locking is

> being done. The simplest solution is to create the IPIs for all nodes

> at startup.

>

> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

> Cc: stable@vger.kernel.org # v5.13

> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Cédric Le Goater <clg@kaod.org>

> ---

>

> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

> are collected from /sys/devices/system/node/ but CPU-less nodes are

> not listed there. When interrupts are scanned, the link representing

> the node structure is NULL and segfault occurs.

>

> Version 1.7 seems immune. 

>

> ---

>  arch/powerpc/sysdev/xive/common.c | 4 ----

>  1 file changed, 4 deletions(-)

>

> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c

> index f3b16ed48b05..5d2c58dba57e 100644

> --- a/arch/powerpc/sysdev/xive/common.c

> +++ b/arch/powerpc/sysdev/xive/common.c

> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)

>  		struct xive_ipi_desc *xid = &xive_ipis[node];

>  		struct xive_ipi_alloc_info info = { node };

>  

> -		/* Skip nodes without CPUs */

> -		if (cpumask_empty(cpumask_of_node(node)))

> -			continue;

> -

>  		/*

>  		 * Map one IPI interrupt per node for all cpus of that node.

>  		 * Since the HW interrupt number doesn't have any meaning,


Tested-by: Laurent Vivier <lvivier@redhat.com>
Laurent Vivier July 19, 2021, 7:34 a.m. UTC | #3
On 29/06/2021 15:15, Cédric Le Goater wrote:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

> runtime. Today, the IPI is not created for such nodes, and hot-plugged

> CPUs use a bogus IPI, which leads to soft lockups.

> 

> We could create the node IPI on demand but it is a bit complex because

> this code would be called under bringup_up() and some IRQ locking is

> being done. The simplest solution is to create the IPIs for all nodes

> at startup.

> 

> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

> Cc: stable@vger.kernel.org # v5.13

> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Cédric Le Goater <clg@kaod.org>

> ---

> 

> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

> are collected from /sys/devices/system/node/ but CPU-less nodes are

> not listed there. When interrupts are scanned, the link representing

> the node structure is NULL and segfault occurs.

> 

> Version 1.7 seems immune. 

> 

> ---

>  arch/powerpc/sysdev/xive/common.c | 4 ----

>  1 file changed, 4 deletions(-)

> 

> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c

> index f3b16ed48b05..5d2c58dba57e 100644

> --- a/arch/powerpc/sysdev/xive/common.c

> +++ b/arch/powerpc/sysdev/xive/common.c

> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)

>  		struct xive_ipi_desc *xid = &xive_ipis[node];

>  		struct xive_ipi_alloc_info info = { node };

>  

> -		/* Skip nodes without CPUs */

> -		if (cpumask_empty(cpumask_of_node(node)))

> -			continue;

> -

>  		/*

>  		 * Map one IPI interrupt per node for all cpus of that node.

>  		 * Since the HW interrupt number doesn't have any meaning,

> 


What happened to this fix? Will it be merged?

Thanks,
Laurent
Michael Ellerman Aug. 2, 2021, 6:37 a.m. UTC | #4
Cédric Le Goater <clg@kaod.org> writes:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

> runtime. Today, the IPI is not created for such nodes, and hot-plugged

> CPUs use a bogus IPI, which leads to soft lockups.

>

> We could create the node IPI on demand but it is a bit complex because

> this code would be called under bringup_up() and some IRQ locking is

> being done. The simplest solution is to create the IPIs for all nodes

> at startup.

>

> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

> Cc: stable@vger.kernel.org # v5.13

> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Cédric Le Goater <clg@kaod.org>

> ---

>

> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

> are collected from /sys/devices/system/node/ but CPU-less nodes are

> not listed there. When interrupts are scanned, the link representing

> the node structure is NULL and segfault occurs.


Breaking userspace is usually frowned upon, even if it is irqbalance.

If CPU-less nodes appeared in /sys/devices/system/node would that fix
it? Could we do that or is that not possible for other reasons?

> Version 1.7 seems immune. 


Which was released in August 2020.

Looks like some distros still ship 1.6, I take it you're not sure if
that is broken or not.

cheers
Cédric Le Goater Aug. 2, 2021, 12:59 p.m. UTC | #5
On 8/2/21 8:37 AM, Michael Ellerman wrote:
> Cédric Le Goater <clg@kaod.org> writes:

>> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

>> runtime. Today, the IPI is not created for such nodes, and hot-plugged

>> CPUs use a bogus IPI, which leads to soft lockups.

>>

>> We could create the node IPI on demand but it is a bit complex because

>> this code would be called under bringup_up() and some IRQ locking is

>> being done. The simplest solution is to create the IPIs for all nodes

>> at startup.

>>

>> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

>> Cc: stable@vger.kernel.org # v5.13

>> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

>> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

>> Signed-off-by: Cédric Le Goater <clg@kaod.org>

>> ---

>>

>> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

>> are collected from /sys/devices/system/node/ but CPU-less nodes are

>> not listed there. When interrupts are scanned, the link representing

>> the node structure is NULL and segfault occurs.

> 

> Breaking userspace is usually frowned upon, even if it is irqbalance.

>

> If CPU-less nodes appeared in /sys/devices/system/node would that fix

> it? Could we do that or is that not possible for other reasons?

> 

>> Version 1.7 seems immune. 

> 

> Which was released in August 2020.

> 

> Looks like some distros still ship 1.6, I take it you're not sure if

> that is broken or not.


I did a bisect on irqbalance and the "bad" commit was introduced between 
version 1.7 and version 1.8 :

  commit 31dea01f3a47 ("Also fetch node info for non-PCI devices") 
  https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94

which was backported on RHEL 8 in RPM irqbalance-1.4.0-6.el8.

Any distro using irqbalance <= 1.7 without the patch above is fine. 

Since irqbalance handled cleanly irqs referencing offline nodes before 
this patch, I am inclined to think that the irqbalance fix is incomplete. 
Unfortunately, the commit log lacks some context on the non-PCI devices. 

Thanks,

C.
Cédric Le Goater Aug. 6, 2021, 11:50 a.m. UTC | #6
On 6/29/21 3:15 PM, Cédric Le Goater wrote:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at

> runtime. Today, the IPI is not created for such nodes, and hot-plugged

> CPUs use a bogus IPI, which leads to soft lockups.

> 

> We could create the node IPI on demand but it is a bit complex because

> this code would be called under bringup_up() and some IRQ locking is

> being done. The simplest solution is to create the IPIs for all nodes

> at startup.

> 

> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")

> Cc: stable@vger.kernel.org # v5.13

> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>

> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

> Signed-off-by: Cédric Le Goater <clg@kaod.org>

> ---

> 

> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes

> are collected from /sys/devices/system/node/ but CPU-less nodes are

> not listed there. When interrupts are scanned, the link representing

> the node structure is NULL and segfault occurs.


This is an irqbalance regression due to :

  https://github.com/Irqbalance/irqbalance/pull/172

I will report through an issue. 

Anyhow, there is a better approach which is to allocate IPIs for all 
nodes at boot time and do the mapping on demand. Removing the mapping 
on last use seems more complex though. 

I will send a v2 after some tests.  

Thanks,

C.

> Version 1.7 seems immune. 

> 

> ---

>  arch/powerpc/sysdev/xive/common.c | 4 ----

>  1 file changed, 4 deletions(-)

> 

> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c

> index f3b16ed48b05..5d2c58dba57e 100644

> --- a/arch/powerpc/sysdev/xive/common.c

> +++ b/arch/powerpc/sysdev/xive/common.c

> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)

>  		struct xive_ipi_desc *xid = &xive_ipis[node];

>  		struct xive_ipi_alloc_info info = { node };

>  

> -		/* Skip nodes without CPUs */

> -		if (cpumask_empty(cpumask_of_node(node)))

> -			continue;

> -

>  		/*

>  		 * Map one IPI interrupt per node for all cpus of that node.

>  		 * Since the HW interrupt number doesn't have any meaning,

>
diff mbox series

Patch

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index f3b16ed48b05..5d2c58dba57e 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -1143,10 +1143,6 @@  static int __init xive_request_ipi(void)
 		struct xive_ipi_desc *xid = &xive_ipis[node];
 		struct xive_ipi_alloc_info info = { node };
 
-		/* Skip nodes without CPUs */
-		if (cpumask_empty(cpumask_of_node(node)))
-			continue;
-
 		/*
 		 * Map one IPI interrupt per node for all cpus of that node.
 		 * Since the HW interrupt number doesn't have any meaning,