sched/isolcpus: Show isolated cpu map

Message ID 1486979039-17874-1-git-send-email-wangkefeng.wang@huawei.com
State New

Commit Message

Kefeng Wang Feb. 13, 2017, 9:43 a.m.
The commit a6e4491c682a ("sched/isolcpus: Output warning when the
'isolcpus=' kernel parameter is invalid") adds an error message
when specified cpu bigger than nr_cpu_ids, but nr_cpumask_bits in
cpulist_parse() could be nr_cpu_ids or NR_CPUS.

eg, NR_CPUS=64, nr_cpu_ids=8 in ARM64, cpulist_parse() won't return
-ERANGE if isolcpus=1-10;

Let's show the isolated cpu map and drop the improper error message.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

---
 kernel/sched/core.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

-- 
1.7.12.4
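
The mismatch described in the commit message can be modelled in a few lines of userspace C. This is only a sketch: the function names are hypothetical, and the assumption (not stated in this thread) is that nr_cpumask_bits resolves to nr_cpu_ids when CONFIG_CPUMASK_OFFSTACK is set and to NR_CPUS otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of how the parser limit is chosen.
 * Assumption: with off-stack cpumasks the limit is the runtime
 * nr_cpu_ids, otherwise it is the compile-time NR_CPUS.
 */
static unsigned int model_nr_cpumask_bits(bool cpumask_offstack,
					  unsigned int nr_cpus,
					  unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids <= nr_cpus);
	return cpumask_offstack ? nr_cpu_ids : nr_cpus;
}

/* Hypothetical model of the range check applied to the highest cpu given. */
static int model_range_check(unsigned int last_cpu, unsigned int limit)
{
	return last_cpu >= limit ? -ERANGE : 0;
}
```

With NR_CPUS=64 and nr_cpu_ids=8, the check on cpu 10 passes when the limit is NR_CPUS and fails only when the limit is nr_cpu_ids, which is why isolcpus=1-10 can go unreported.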

Comments

Peter Zijlstra Feb. 13, 2017, 12:06 p.m. | #1
On Mon, Feb 13, 2017 at 05:43:59PM +0800, Kefeng Wang wrote:
> The commit a6e4491c682a ("sched/isolcpus: Output warning when the
> 'isolcpus=' kernel parameter is invalid") adds an error message
> when specified cpu bigger than nr_cpu_ids, but nr_cpumask_bits in
> cpulist_parse() could be nr_cpu_ids or NR_CPUS.
>
> eg, NR_CPUS=64, nr_cpu_ids=8 in ARM64, cpulist_parse() won't return
> -ERANGE if isolcpus=1-10;
>

But why does cpulist_parse() use nr_cpumask_bits, that seems to be the
problem, so why not look there?
Kefeng Wang Feb. 13, 2017, 1:07 p.m. | #2
Hi Peter

+Tejun

On 2017/2/13 20:06, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 05:43:59PM +0800, Kefeng Wang wrote:
>> The commit a6e4491c682a ("sched/isolcpus: Output warning when the
>> 'isolcpus=' kernel parameter is invalid") adds an error message
>> when specified cpu bigger than nr_cpu_ids, but nr_cpumask_bits in
>> cpulist_parse() could be nr_cpu_ids or NR_CPUS.
>>
>> eg, NR_CPUS=64, nr_cpu_ids=8 in ARM64, cpulist_parse() won't return
>> -ERANGE if isolcpus=1-10;
>>
>
> But why does cpulist_parse() use nr_cpumask_bits, that seems to be the
> problem, so why not look there?
>

Paste the Tejun's patch,

commit 4d59b6ccf000862beed6fc0765d3209f98a8d8a2
Author: Tejun Heo <tj@kernel.org>
Date:   Wed Feb 8 14:30:56 2017 -0800

    cpumask: use nr_cpumask_bits for parsing functions

    Commit 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and
    parsing functions") converted both cpumask printing and parsing
    functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
    okay for the printing functions as it just picked one of the two output
    formats that we were alternating between depending on a kernel config,
    doing the same for parsing wasn't okay.

    nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
    nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
    be more efficient to use NR_CPUS when we can get away with it.
    Converting the printing functions to nr_cpu_ids makes sense because it
    affects how the masks get presented to userspace and doesn't break
    anything; however, using nr_cpu_ids for parsing functions can
    incorrectly leave the higher bits uninitialized while reading in these
    masks from userland.  As all testing and comparison functions use
    nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
    can erroneously yield false negative results.

    This made the taskstats interface incorrectly return -EINVAL even when
    the inputs were correct.

    Fix it by restoring the parse functions to use nr_cpumask_bits instead
    of nr_cpu_ids.

    Link: http://lkml.kernel.org/r/20170206182442.GB31078@htj.duckdns.org
    Fixes: 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and parsing functions")
    Signed-off-by: Tejun Heo <tj@kernel.org>

    Reported-by: Martin Steigerwald <martin.steigerwald@teamix.de>
    Debugged-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
    Cc: <stable@vger.kernel.org>        [4.0+]
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
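
The false negative described in the commit message above can be reproduced with a small userspace model. This is a sketch under stated assumptions, not kernel code: model_mask, model_parse_single() and model_subset() are hypothetical names, and MODEL_NR_CPUS is set to 128 so that the mask spans two 64-bit words and the second word can stay uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NR_CPUS	128	/* assumed compile-time maximum */
#define WORDS(nbits)	(((nbits) + 63) / 64)

typedef struct {
	uint64_t bits[WORDS(MODEL_NR_CPUS)];
} model_mask;

/*
 * Hypothetical parser model: clears only the words that cover nbits,
 * then sets a single cpu. Words beyond nbits keep their old contents.
 */
static void model_parse_single(model_mask *m, unsigned int cpu,
			       unsigned int nbits)
{
	assert(cpu < nbits && nbits <= MODEL_NR_CPUS);
	memset(m->bits, 0, WORDS(nbits) * sizeof(uint64_t));
	m->bits[cpu / 64] |= (uint64_t)1 << (cpu % 64);
}

/* Subset test over nbits, in the spirit of cpumask_subset(). */
static bool model_subset(const model_mask *a, const model_mask *b,
			 unsigned int nbits)
{
	for (unsigned int i = 0; i < nbits; i++) {
		bool in_a = (a->bits[i / 64] >> (i % 64)) & 1;
		bool in_b = (b->bits[i / 64] >> (i % 64)) & 1;

		if (in_a && !in_b)
			return false;
	}
	return true;
}
```

If the destination mask starts out holding stale data, parsing with a limit of 8 leaves the second word untouched, so a subset test over all 128 bits against a possible mask of cpus 0-7 fails even though the parsed input was valid. Parsing with the full 128-bit limit clears both words and the test passes.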
Peter Zijlstra Feb. 13, 2017, 1:32 p.m. | #3
On Mon, Feb 13, 2017 at 09:07:02PM +0800, Kefeng Wang wrote:
> Hi Peter
>
> +Tejun
>
> On 2017/2/13 20:06, Peter Zijlstra wrote:
> > On Mon, Feb 13, 2017 at 05:43:59PM +0800, Kefeng Wang wrote:
> >> The commit a6e4491c682a ("sched/isolcpus: Output warning when the
> >> 'isolcpus=' kernel parameter is invalid") adds an error message
> >> when specified cpu bigger than nr_cpu_ids, but nr_cpumask_bits in
> >> cpulist_parse() could be nr_cpu_ids or NR_CPUS.
> >>
> >> eg, NR_CPUS=64, nr_cpu_ids=8 in ARM64, cpulist_parse() won't return
> >> -ERANGE if isolcpus=1-10;
> >>
> >
> > But why does cpulist_parse() use nr_cpumask_bits, that seems to be the
> > problem, so why not look there?
> >
>
> Paste the Tejun's patch,
>
> commit 4d59b6ccf000862beed6fc0765d3209f98a8d8a2
> Author: Tejun Heo <tj@kernel.org>
> Date:   Wed Feb 8 14:30:56 2017 -0800
>
>     cpumask: use nr_cpumask_bits for parsing functions
>
>     Commit 513e3d2d11c9 ("cpumask: always use nr_cpu_ids in formatting and
>     parsing functions") converted both cpumask printing and parsing
>     functions to use nr_cpu_ids instead of nr_cpumask_bits.  While this was
>     okay for the printing functions as it just picked one of the two output
>     formats that we were alternating between depending on a kernel config,
>     doing the same for parsing wasn't okay.
>
>     nr_cpumask_bits can be either nr_cpu_ids or NR_CPUS.  We can always use
>     nr_cpu_ids but that is a variable while NR_CPUS is a constant, so it can
>     be more efficient to use NR_CPUS when we can get away with it.
>     Converting the printing functions to nr_cpu_ids makes sense because it
>     affects how the masks get presented to userspace and doesn't break
>     anything; however, using nr_cpu_ids for parsing functions can
>     incorrectly leave the higher bits uninitialized while reading in these
>     masks from userland.  As all testing and comparison functions use
>     nr_cpumask_bits which can be larger than nr_cpu_ids, the parsed cpumasks
>     can erroneously yield false negative results.
>
>     This made the taskstats interface incorrectly return -EINVAL even when
>     the inputs were correct.
>
>     Fix it by restoring the parse functions to use nr_cpumask_bits instead
>     of nr_cpu_ids.

OK, so its wrong both ways.

Problem seems to be that cpumask is internally inconsistent with the
number of bits because a small constant NR_CPUS is more efficient for
things like cpumask_subset().

If everything were consistent and used nr_cpu_ids it would all be fine,
but using a mixture is giving pain.

Does something like the below work? It parses up to nr_cpu_ids and then
and's with cpu_possible_mask (which has all bits set). In case
nr_cpumask_bits is larger than nr_cpu_ids this should result in clearing
the top bits (and therefore not leave them uninitialized). And using
nr_cpu_ids for parsing now makes the range check work again.

Since parsing in general is a really slow thing anyway, the extra
cpumask operation doesn't matter.

---
 include/linux/cpumask.h | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index 96f1e88b767c..6cf8945b999d 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -560,7 +560,13 @@ static inline void cpumask_copy(struct cpumask *dstp,
 static inline int cpumask_parse_user(const char __user *buf, int len,
 				     struct cpumask *dstp)
 {
-	return bitmap_parse_user(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
+	int ret;
+
+	ret = bitmap_parse_user(buf, len, cpumask_bits(dstp), nr_cpu_ids);
+	if (!ret)
+		cpumask_and(dstp, dstp, cpu_possible_mask);
+
+	return ret;
 }
 
 /**
@@ -574,8 +580,13 @@ static inline int cpumask_parse_user(const char __user *buf, int len,
 static inline int cpumask_parselist_user(const char __user *buf, int len,
 				     struct cpumask *dstp)
 {
-	return bitmap_parselist_user(buf, len, cpumask_bits(dstp),
-				     nr_cpumask_bits);
+	int ret;
+
+	ret = bitmap_parselist_user(buf, len, cpumask_bits(dstp), nr_cpu_ids);
+	if (!ret)
+		cpumask_and(dstp, dstp, cpu_possible_mask);
+
+	return ret;
 }
 
 /**
@@ -589,8 +600,13 @@ static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
 {
 	char *nl = strchr(buf, '\n');
 	unsigned int len = nl ? (unsigned int)(nl - buf) : strlen(buf);
+	int ret;
 
-	return bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpumask_bits);
+	ret = bitmap_parse(buf, len, cpumask_bits(dstp), nr_cpu_ids);
+	if (!ret)
+		cpumask_and(dstp, dstp, cpu_possible_mask);
+
+	return ret;
 }
 
 /**
@@ -602,7 +618,13 @@ static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
  */
 static inline int cpulist_parse(const char *buf, struct cpumask *dstp)
 {
-	return bitmap_parselist(buf, cpumask_bits(dstp), nr_cpumask_bits);
+	int ret;
+
+	ret = bitmap_parselist(buf, cpumask_bits(dstp), nr_cpu_ids);
+	if (!ret)
+		cpumask_and(dstp, dstp, cpu_possible_mask);
+
+	return ret;
 }
 
 /**

Kefeng Wang Feb. 14, 2017, 1:53 a.m. | #4
On 2017/2/13 21:32, Peter Zijlstra wrote:
> On Mon, Feb 13, 2017 at 09:07:02PM +0800, Kefeng Wang wrote:
>> Hi Peter
>>
>> +Tejun
>>
>> On 2017/2/13 20:06, Peter Zijlstra wrote:
>>> On Mon, Feb 13, 2017 at 05:43:59PM +0800, Kefeng Wang wrote:
>>>> The commit a6e4491c682a ("sched/isolcpus: Output warning when the
>>>> 'isolcpus=' kernel parameter is invalid") adds an error message
>>>> when specified cpu bigger than nr_cpu_ids, but nr_cpumask_bits in
>>>> cpulist_parse() could be nr_cpu_ids or NR_CPUS.
>>>>
>>>> eg, NR_CPUS=64, nr_cpu_ids=8 in ARM64, cpulist_parse() won't return
>>>> -ERANGE if isolcpus=1-10;
>>>>
>>>

> OK, so its wrong both ways.
>
> Problem seems to be that cpumask is internally inconsistent with the
> number of bits because a small constant NR_CPUS is more efficient for
> things like cpumask_subset().
>
> If everything were consistent and used nr_cpu_ids it would all be fine,
> but using a mixture is giving pain.
>
> Does something like the below work? It parses up to nr_cpu_ids and then
> and's with cpu_possible_mask (which has all bits set). In case
> nr_cpumask_bits is larger than nr_cpu_ids this should result in clearing
> the top bits (and therefore not leave them uninitialized). And using
> nr_cpu_ids for parsing now makes the range check work again.


It works again for the above example, but there is another case:
NR_CPUS=64, nr_cpu_ids=8 and isolcpus=1,2,3,4,5,6,7,8,9,10 on arm64.

It prints the error message, but it does isolate cpus successfully.

With isolcpus=1-10, it only shows the error message and fails to isolate
any cpus.

Our users may use either form of configuration, but they sometimes make
mistakes and fail to isolate cpus. That's the primary reason why I want
to show the isolated cpu map in the boot messages.
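
The difference between the two spellings can be illustrated with a simplified userspace model of a cpu-list parser. It is a sketch, not the kernel's bitmap_parselist(): model_parselist() is a hypothetical name, the mask is limited to 64 bits, and the assumption is that the parser clears the mask up front, sets bits region by region, and rejects a region with -ERANGE before setting any of its bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical model: parse "a,b-c,..." into a 64-bit mask.
 * Returns 0 on success, -ERANGE if a region exceeds nbits,
 * -EINVAL on malformed input. Bits set by earlier regions are
 * kept when a later region fails.
 */
static int model_parselist(const char *s, uint64_t *mask, unsigned int nbits)
{
	assert(nbits >= 1 && nbits <= 64);
	*mask = 0;	/* cleared up front */

	while (*s) {
		char *end;
		unsigned long a, b;

		if (*s < '0' || *s > '9')
			return -EINVAL;
		a = b = strtoul(s, &end, 10);
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			b = strtoul(s, &end, 10);
		}
		if (a > b)
			return -EINVAL;
		if (b >= nbits)
			return -ERANGE;	/* region rejected before any bit is set */
		for (; a <= b; a++)
			*mask |= (uint64_t)1 << a;
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -EINVAL;
		s = end;
	}
	return 0;
}
```

With a limit of 8, the comma-separated form sets cpus 1-7 before failing on cpu 8, so the caller sees an error but still ends up with a non-empty mask. The range form 1-10 fails on its only region and leaves the mask empty.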

Patch

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c56fb57..13a122d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6076,13 +6076,16 @@  static void update_top_cache_domain(int cpu)
 /* Setup the mask of cpus configured for isolated domains */
 static int __init isolated_cpu_setup(char *str)
 {
-	int ret;
+	int cpu;
 
 	alloc_bootmem_cpumask_var(&cpu_isolated_map);
-	ret = cpulist_parse(str, cpu_isolated_map);
-	if (ret) {
-		pr_err("sched: Error, all isolcpus= values must be between 0 and %d\n", nr_cpu_ids);
-		return 0;
+	cpulist_parse(str, cpu_isolated_map);
+
+	if (!cpumask_empty(cpu_isolated_map)) {
+		pr_cont("sched: isolated cpus [ ");
+		for_each_cpu(cpu, cpu_isolated_map)
+			pr_cont("%d ", cpu);
+		pr_cont("]\n");
 	}
 	return 1;
 }