diff mbox series

oslat: Fix --cpu-list won't allow to schedule on all possible cores

Message ID 20210218192729.116427-1-peterx@redhat.com
State New
Headers show
Series oslat: Fix --cpu-list won't allow to schedule on all possible cores | expand

Commit Message

Peter Xu Feb. 18, 2021, 7:27 p.m. UTC
parse_cpumask() is too strict for oslat, in that use_current_cpuset() will
filter out all the cores that are not allowed for current process to run.  This
seems to be unnecessary at least for oslat.  For example, the bash process that
runs the oslat program may have a sched affinity of 0-2, however it's still
legal to have it start a oslat thread running on the cores outside 0-2 as long
as the follow up sched_setaffinity() will succeed.

numa_parse_cpustring_all() suites exactly for this case, which should already
have considered sysconf(_SC_NPROCESSORS_ONLN) limit.  Use that instead.

Since at it, also remove initialization of cpu_set variable otherwise it's
leaked in previous parse_cpumask too: numa_parse_cpustring_all() will return a
newly allocated buffer already.  Quotting from manual:

    numa_parse_nodestring() parses a character string list of nodes into a bit
    mask.  The bit mask is allocated by numa_allocate_nodemask().

    numa_parse_nodestring_all() is similar to numa_parse_nodestring, but can
    parse all possible nodes, not only current nodeset.

Cc: John Kacur <jkacur@redhat.com>
Cc: Daniel Wagner <dwagner@suse.de>
Cc: Clark Williams <williams@redhat.com>
Reported-by: Pradipta Kumar Sahoo <psahoo@redhat.com>
Reported-by: Mike Stowell <mstowell@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 src/oslat/oslat.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Comments

Daniel Wagner Feb. 19, 2021, 8:16 a.m. UTC | #1
Hi Peter,

On Thu, Feb 18, 2021 at 02:27:29PM -0500, Peter Xu wrote:
> parse_cpumask() is too strict for oslat, in that use_current_cpuset() will
> filter out all the cores that are not allowed for current process to run.  This
> seems to be unnecessary at least for oslat.  For example, the bash process that
> runs the oslat program may have a sched affinity of 0-2, however it's still
> legal to have it start a oslat thread running on the cores outside 0-2 as long
> as the follow up sched_setaffinity() will succeed.
> 
> numa_parse_cpustring_all() suites exactly for this case, which should already
> have considered sysconf(_SC_NPROCESSORS_ONLN) limit.  Use that instead.
> 
> Since at it, also remove initialization of cpu_set variable otherwise it's
> leaked in previous parse_cpumask too: numa_parse_cpustring_all() will return a
> newly allocated buffer already.  Quotting from manual:
> 
>     numa_parse_nodestring() parses a character string list of nodes into a bit
>     mask.  The bit mask is allocated by numa_allocate_nodemask().
> 
>     numa_parse_nodestring_all() is similar to numa_parse_nodestring, but can
>     parse all possible nodes, not only current nodeset.

My 2 cents: If parse_cpumask() is to strict fix parse_cpumask() so all
tools do the same thing. Note parse_cpumask() contains a couple
of bugs:

"rt-numa: Use mask size for iterator limit"
"rt-numa: Remove max_cpus argument from parse_cpusmask"

Thanks,
Daniel
Daniel Wagner Feb. 19, 2021, 4:20 p.m. UTC | #2
On Fri, Feb 19, 2021 at 10:24:17AM -0500, Peter Xu wrote:
> Yes I explicitly avoided touching parse_cpumask because I don't want to change
> behavior of other tools if I'm not confident with that.  Would above two
> patches fix oslat too (which I didn't check)?  If so, I'll be fine to have this
> patch dropped.  Otherwise I tend to prefer fixing oslat first.  We can further
> rework the common code, but if existing tools are fine, then I don't think it's
> a bugfix, so no need to rush.  While I'll count this patch as a real bugfix, so
> I'd hope we could consider merging it earlier.

As I said, I would really appreciated if all tools behave the same
way. Having oslat be special is going to be pain in the long run. Just
update parse_cpumask, it's not that difficult :)
Daniel Wagner Feb. 19, 2021, 4:43 p.m. UTC | #3
On 19.02.21 17:35, John Kacur wrote:
>> "rt-numa: Use mask size for iterator limit"
>> "rt-numa: Remove max_cpus argument from parse_cpusmask"
> 
> Are those actually bug fixes or just a different way of doing things?

max_cpus is just giving you the number of available CPUs. The loop 
assumes that all are in the range of [0..max_cpus] which is not 
necessarily true. You need to iterate over all possible CPUs. That is 
why passing in max_cpus is pretty useless.
Daniel Wagner Feb. 19, 2021, 4:49 p.m. UTC | #4
Hi Peter,

On 19.02.21 17:39, Peter Xu wrote:
> Now I tried to fix it but it seems you don't really like my fix.

Sorry I didn't want to say I don't like your fix. I just said, I would 
appreciate if things would stay consistent. But then it seems there is
not much interest in this.

> Would you propose yours instead?  I would be more than glad if your version is
> better then I'm happy to drop mine.  Otherwise I'll still prefer to have this
> patch merged to unbreak it first.

I don't care too much. If this patch fixes your problem don't let me
stop you.

Thanks,
Daniel
John Kacur Feb. 19, 2021, 4:55 p.m. UTC | #5
On Fri, 19 Feb 2021, Peter Xu wrote:

> Daniel,
> 
> On Fri, Feb 19, 2021 at 05:20:19PM +0100, Daniel Wagner wrote:
> > On Fri, Feb 19, 2021 at 10:24:17AM -0500, Peter Xu wrote:
> > > Yes I explicitly avoided touching parse_cpumask because I don't want to change
> > > behavior of other tools if I'm not confident with that.  Would above two
> > > patches fix oslat too (which I didn't check)?  If so, I'll be fine to have this
> > > patch dropped.  Otherwise I tend to prefer fixing oslat first.  We can further
> > > rework the common code, but if existing tools are fine, then I don't think it's
> > > a bugfix, so no need to rush.  While I'll count this patch as a real bugfix, so
> > > I'd hope we could consider merging it earlier.
> > 
> > As I said, I would really appreciated if all tools behave the same
> > way. Having oslat be special is going to be pain in the long run. Just
> > update parse_cpumask, it's not that difficult :)
> 
> Oslat is broken now after the numa rework.  Frankly I'm surprised it's broken
> so quickly with just a few commits and actually with no new feature at all but
> pure cleanups.  Again I still appreciate all your work on this but I don't
> really appreciate a lot for having it broken..
> 
> Now I tried to fix it but it seems you don't really like my fix.
> 
> Would you propose yours instead?  I would be more than glad if your version is
> better then I'm happy to drop mine.  Otherwise I'll still prefer to have this
> patch merged to unbreak it first.
> 
> Thanks,
> 
> -- 
> Peter Xu
> 
> 

Hey, I accepted your patch, we're just discussing how to make things work
in the best way for the whole suite in the future.

Thanks for your patches!

John
Peter Xu Feb. 19, 2021, 5:15 p.m. UTC | #6
On Fri, Feb 19, 2021 at 05:49:47PM +0100, Daniel Wagner wrote:
> Hi Peter,
> 
> On 19.02.21 17:39, Peter Xu wrote:
> > Now I tried to fix it but it seems you don't really like my fix.
> 
> Sorry I didn't want to say I don't like your fix. I just said, I would
> appreciate if things would stay consistent. But then it seems there is
> not much interest in this.
> 
> > Would you propose yours instead?  I would be more than glad if your version is
> > better then I'm happy to drop mine.  Otherwise I'll still prefer to have this
> > patch merged to unbreak it first.
> 
> I don't care too much. If this patch fixes your problem don't let me
> stop you.

It's not really my problem, I can compile my own binary.  It's just that I want
to have it working as usual for anyone using it.  Also it'll be great to have
it fixed sooner for our downstream so our QE won't be affect.  I don't want the
unbreak to be delayed by any further cleanup idea anymore - I am totally not
against that, but again I think they can be done on top of a working tree.

Sorry if my wording was too harsh - I apologize for that.

Thanks,
diff mbox series

Patch

diff --git a/src/oslat/oslat.c b/src/oslat/oslat.c
index b2c5373..465a694 100644
--- a/src/oslat/oslat.c
+++ b/src/oslat/oslat.c
@@ -785,7 +785,6 @@  int main(int argc, char *argv[])
 	struct thread *threads;
 	int i, n_cores;
 	struct bitmask *cpu_set = NULL;
-	int max_cpus = sysconf(_SC_NPROCESSORS_ONLN);
 
 #ifdef FRC_MISSING
 	printf("This architecture is not yet supported. "
@@ -797,10 +796,6 @@  int main(int argc, char *argv[])
 		exit(1);
 	}
 
-	cpu_set = numa_allocate_cpumask();
-	if (!cpu_set)
-		fatal("oslat: Could not allocate cpumask\n");
-
 	g.app_name = argv[0];
 	g.rtprio = 0;
 	g.bucket_size = BUCKET_SIZE;
@@ -817,7 +812,8 @@  int main(int argc, char *argv[])
 	if (!g.cpu_list)
 		g.cpu_list = strdup("all");
 
-	if (parse_cpumask(g.cpu_list, max_cpus, &cpu_set) != 0)
+	cpu_set = numa_parse_cpustring_all(g.cpu_list);
+	if (!cpu_set)
 		fatal("oslat: parse_cpumask failed.\n");
 	n_cores = numa_bitmask_weight(cpu_set);