diff mbox series

rt-tests: Drop use_current_cpuset() check

Message ID 20210315193707.359702-1-peterx@redhat.com
State New
Series rt-tests: Drop use_current_cpuset() check

Commit Message

Peter Xu March 15, 2021, 7:37 p.m. UTC
The CPU list should be allowed to be any list, rather than only the cores on
which the current process is allowed to be scheduled.

Before this patch, cyclictest fails under the following condition:

$ taskset -pc $$
pid 2316's current affinity list: 0,2,4,6,8
$ sudo cyclictest -m -N -p 1 -a 1,3,5,7 -t 4
WARN: Couldn't setaffinity in main thread: Invalid argument

After this patch, it'll be allowed to run.

Cc: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 src/lib/rt-numa.c | 27 ---------------------------
 1 file changed, 27 deletions(-)
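
One plausible reading of the failure above, offered as a sketch rather than as
the rt-tests code: the removed helper intersects the -a list with the inherited
affinity, and 1,3,5,7 shares no CPU with 0,2,4,6,8, so the mask later handed to
sched_setaffinity() would be empty, which the kernel rejects with EINVAL
("Invalid argument"). The helper names below are hypothetical, and glibc's
cpu_set_t macros stand in for the libnuma bitmask API used by rt-numa.c.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: set every CPU listed in cpus[0..n-1] in *set. */
static void fill_cpu_set(cpu_set_t *set, const int *cpus, int n)
{
	CPU_ZERO(set);
	for (int i = 0; i < n; i++)
		CPU_SET(cpus[i], set);
}

/*
 * Hypothetical helper: number of CPUs left after intersecting the
 * user-requested list with the inherited affinity list.
 */
static int count_after_intersection(const int *requested, int nreq,
				    const int *inherited, int ninh)
{
	cpu_set_t req, inh, both;

	fill_cpu_set(&req, requested, nreq);
	fill_cpu_set(&inh, inherited, ninh);
	CPU_AND(&both, &req, &inh);	/* both = req AND inh */
	return CPU_COUNT(&both);
}
```

With the two lists from the commit message, count_after_intersection() reports
that no CPU is left in the mask.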

Comments

Daniel Wagner March 16, 2021, 8:18 a.m. UTC | #1
Hi Peter,

On Mon, Mar 15, 2021 at 03:37:07PM -0400, Peter Xu wrote:
> The CPU list should be allowed to be any list, rather than only the cores on
> which the current process is allowed to be scheduled.

See commit aaa57168dfd3 ("rt-tests: cyclictest: Only run on runtime
affinity and user supplied affinity")

> Before this patch, cyclictest fails under the following condition:
>
> $ taskset -pc $$
> pid 2316's current affinity list: 0,2,4,6,8
> $ sudo cyclictest -m -N -p 1 -a 1,3,5,7 -t 4
> WARN: Couldn't setaffinity in main thread: Invalid argument
>
> After this patch, it'll be allowed to run.

As John writes in the commit message above, I think cyclictest should
honor the runtime settings. The warning above could be extended to
tell you what the problem is.

Your commit message should contain the argument why it's better not to
check the environment settings. So the question is what is the least
surprise for the user?

Thanks,
Daniel
Peter Xu March 16, 2021, 8:07 p.m. UTC | #2
On Tue, Mar 16, 2021 at 09:18:26AM +0100, Daniel Wagner wrote:
> Hi Peter,

Daniel,

> On Mon, Mar 15, 2021 at 03:37:07PM -0400, Peter Xu wrote:
> > The CPU list should be allowed to be any list, rather than only the cores on
> > which the current process is allowed to be scheduled.
>
> See commit aaa57168dfd3 ("rt-tests: cyclictest: Only run on runtime
> affinity and user supplied affinity")

(Thanks for the commit ID)

> > Before this patch, cyclictest fails under the following condition:
> >
> > $ taskset -pc $$
> > pid 2316's current affinity list: 0,2,4,6,8
> > $ sudo cyclictest -m -N -p 1 -a 1,3,5,7 -t 4
> > WARN: Couldn't setaffinity in main thread: Invalid argument
> >
> > After this patch, it'll be allowed to run.
>
> As John writes in the commit message above, I think cyclictest should
> honor the runtime settings. The warning above could be extended to
> tell you what the problem is.
>
> Your commit message should contain the argument why it's better not to
> check the environment settings. So the question is what is the least
> surprise for the user?

I think what I'm missing is why we had such a restriction.  Quoting from the
commit:

    Currently if the user passes the affinity to cyclictest, threads are run
    there even if they should be excluded according to the affinity of the
    runtime environment.

So I'm not sure I understand the term "runtime environment".

If it's defined as "the set of cores that this process is allowed to run on", I
don't understand why it's not allowed to schedule things outside of this set of
cores, especially if the sched_setaffinity() syscall would succeed.  IOW, I'm
afraid we have slightly mixed up "cores allowed to schedule this process" with
"cores allowed to be set as schedule target by this process", and IMHO we
should simply rely on the sched_setaffinity() syscall, as a single checkpoint,
to decide whether the affinity setting is legal or not.
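
A minimal sketch of the "single checkpoint" idea, assuming glibc on Linux;
try_set_affinity() is a hypothetical name, not an rt-tests function. The tool
does no filtering of its own and simply reports whatever sched_setaffinity()
decides:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/*
 * Hypothetical helper: ask the kernel to apply "mask" to the calling
 * thread. Returns 0 on success or the errno value on failure, so the
 * kernel's verdict is the only legality check.
 */
static int try_set_affinity(const cpu_set_t *mask)
{
	if (sched_setaffinity(0, sizeof(*mask), mask) == 0)
		return 0;
	return errno;
}
```

A caller could print strerror() of the returned value; a mask that contains no
usable CPU comes back as EINVAL, which matches the "Invalid argument" text in
the warning quoted earlier.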

John and I had some discussion off-list about this last time for oslat; it
should be the same thing here, I think. But I'd like John to help confirm too,
since I could be missing something.

Thanks,

-- 
Peter Xu
Daniel Wagner March 17, 2021, 7:49 a.m. UTC | #3
On Tue, Mar 16, 2021 at 04:07:05PM -0400, Peter Xu wrote:
> I think what I'm missing is why we had such a restriction.  Quoting from the
> commit:

IIRC, the current behavior allows the process to be placed into a cgroup
with a subset of CPUs and you can just do 'cyclictest -a -t'. A process
should not ignore external configuration. That's my whole point here.

> So I'm not sure I understand the term "runtime environment".
>
> If it's defined as "the set of cores that this process is allowed to
> run on",

What I am trying to say is that the tests should not assume they have full
control of the placement, as this is not what I would expect. If I as
'admin' limit the sched mask, then cyclictest should not overwrite it.

If you insist on removing this code, please add a section in the
documentation explaining why the tools ignore it.

> John and I had some discussion off-list about this last time for oslat,

As I said, I think this behavior is wrong. Anyway, I stopped caring.
Peter Xu March 17, 2021, 12:51 p.m. UTC | #4
Hi, Daniel,

On Wed, Mar 17, 2021 at 08:49:03AM +0100, Daniel Wagner wrote:
> On Tue, Mar 16, 2021 at 04:07:05PM -0400, Peter Xu wrote:
> > I think what I'm missing is why we had such a restriction.  Quoting from the
> > commit:
>
> IIRC, the current behavior allows the process to be placed into a cgroup
> with a subset of CPUs and you can just do 'cyclictest -a -t'. A process
> should not ignore external configuration. That's my whole point here.

In that case, again, I think a sane solution is not to check the CPU list in
every single tool we use, because even if we do that for all tools in the
rt-tests repo, we can't guarantee that other tools will have this check and
will not ignore this restriction.

A simple example is: what if the user specified "taskset -c $CPU cyclictest -a
$CPU -t 1 ..." where $CPU is not in the allowed list of current bash?  As long
as the taskset would work the so-called "environment" will be changed before
even loading cyclictest.
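
To illustrate why the inherited mask is a weak notion of "environment": taskset
itself just calls sched_setaffinity() and then execs the target, so the mask a
tool later reads back with sched_getaffinity() is whatever the last such call
left behind. The sketch below (hypothetical helper names, glibc on Linux
assumed) narrows the calling thread to one CPU and then reads the mask back. It
only shows the narrowing direction; whether a move to CPUs outside the current
mask succeeds is up to the kernel.

```c
#define _GNU_SOURCE
#include <sched.h>

/*
 * Hypothetical helper: narrow the calling thread to the lowest-numbered
 * CPU it may currently use, roughly what "taskset -c $CPU" does before
 * it execs the target. Returns that CPU number, or -1 on error.
 */
static int pin_to_first_allowed_cpu(void)
{
	cpu_set_t cur, one;

	if (sched_getaffinity(0, sizeof(cur), &cur) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &cur))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one) != 0)
			return -1;
		return cpu;
	}
	return -1;
}

/* Hypothetical helper: number of CPUs in the current affinity mask. */
static int allowed_cpu_count(void)
{
	cpu_set_t cur;

	if (sched_getaffinity(0, sizeof(cur), &cur) != 0)
		return -1;
	return CPU_COUNT(&cur);
}
```

After pin_to_first_allowed_cpu() succeeds, allowed_cpu_count() sees a one-CPU
"environment", and so would any program started from this process afterwards.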

That's why I said we should fail at the single checkpoint of
sched_setaffinity() rather than checking it explicitly in the tool, because
if we want a real-world restriction, that's the only place I think it's possible.

But I'm not a cgroup/container guy, please correct me if I misunderstood.

-- 
Peter Xu
John Kacur March 17, 2021, 3:08 p.m. UTC | #5
On Wed, 17 Mar 2021, Peter Xu wrote:

> Hi, Daniel,
>
> On Wed, Mar 17, 2021 at 08:49:03AM +0100, Daniel Wagner wrote:
> > On Tue, Mar 16, 2021 at 04:07:05PM -0400, Peter Xu wrote:
> > > I think what I'm missing is why we had such a restriction.  Quoting from the
> > > commit:
> >
> > IIRC, the current behavior allows the process to be placed into a cgroup
> > with a subset of CPUs and you can just do 'cyclictest -a -t'. A process
> > should not ignore external configuration. That's my whole point here.
>
> In that case, again, I think a sane solution is not to check the CPU list in
> every single tool we use, because even if we do that for all tools in the
> rt-tests repo, we can't guarantee that other tools will have this check and
> will not ignore this restriction.
>
> A simple example is: what if the user specified "taskset -c $CPU cyclictest -a
> $CPU -t 1 ..." where $CPU is not in the allowed list of current bash?  As long
> as the taskset would work the so-called "environment" will be changed before
> even loading cyclictest.
>
> That's why I said we should fail at the single checkpoint of
> sched_setaffinity() rather than checking it explicitly in the tool, because
> if we want a real-world restriction, that's the only place I think it's possible.
>
> But I'm not a cgroup/container guy, please correct me if I misunderstood.
>
> -- 
> Peter Xu

When cyclictest and friends were originally written, we had this viewpoint
that we "owned" the whole machine, and didn't have any restrictions
on where to schedule. As machines grew in size, and we added numa
awareness, and cgroups became more prominent, we added this code that tried
to schedule according to the ill-defined environment that we found
ourselves in.

As Peter points out, we may have restricted ourselves more than is
necessary, and can rely a bit more on the operating system to restrict us.
On the other hand, using taskset is an easy workaround if the current code
is too restrictive.

Because we can use taskset and things are working well otherwise, I don't
see this as super urgent, but I am willing to revisit this code and make
it less restrictive if that makes sense.

I also am not a cgroup / container person, and would like to play around
with this a bit more before we make some decisions on which direction to
go in.

Does that make sense to everyone?

John
Peter Xu March 17, 2021, 3:21 p.m. UTC | #6
On Wed, Mar 17, 2021 at 11:08:31AM -0400, John Kacur wrote:
> On Wed, 17 Mar 2021, Peter Xu wrote:
>
> > Hi, Daniel,
> >
> > On Wed, Mar 17, 2021 at 08:49:03AM +0100, Daniel Wagner wrote:
> > > On Tue, Mar 16, 2021 at 04:07:05PM -0400, Peter Xu wrote:
> > > > I think what I'm missing is why we had such a restriction.  Quoting from the
> > > > commit:
> > >
> > > IIRC, the current behavior allows the process to be placed into a cgroup
> > > with a subset of CPUs and you can just do 'cyclictest -a -t'. A process
> > > should not ignore external configuration. That's my whole point here.
> >
> > In that case, again, I think a sane solution is not to check the CPU list in
> > every single tool we use, because even if we do that for all tools in the
> > rt-tests repo, we can't guarantee that other tools will have this check and
> > will not ignore this restriction.
> >
> > A simple example is: what if the user specified "taskset -c $CPU cyclictest -a
> > $CPU -t 1 ..." where $CPU is not in the allowed list of current bash?  As long
> > as the taskset would work the so-called "environment" will be changed before
> > even loading cyclictest.
> >
> > That's why I said we should fail at the single checkpoint of
> > sched_setaffinity() rather than checking it explicitly in the tool, because
> > if we want a real-world restriction, that's the only place I think it's possible.
> >
> > But I'm not a cgroup/container guy, please correct me if I misunderstood.
> >
> > -- 
> > Peter Xu
>
> When cyclictest and friends were originally written, we had this viewpoint
> that we "owned" the whole machine, and didn't have any restrictions
> on where to schedule. As machines grew in size, and we added numa
> awareness, and cgroups became more prominent, we added this code that tried
> to schedule according to the ill-defined environment that we found
> ourselves in.
>
> As Peter points out, we may have restricted ourselves more than is
> necessary, and can rely a bit more on the operating system to restrict us.
> On the other hand, using taskset is an easy workaround if the current code
> is too restrictive.
>
> Because we can use taskset and things are working well otherwise, I don't
> see this as super urgent, but I am willing to revisit this code and make
> it less restrictive if that makes sense.
>
> I also am not a cgroup / container person, and would like to play around
> with this a bit more before we make some decisions on which direction to
> go in.
>
> Does that make sense to everyone?

Sure thing on my side.  No bug reported so far this time, so I'll wait at least
until then :) I just don't know why it's not hit just like oslat since I don't
see a difference.  When I fixed the oslat thing, I thought cyclictest didn't
have such an issue for some reason, so I didn't consider touching it at all.  But
when I reran some tests yesterday I saw this issue on rhel8, hence this patch.

Thanks,

-- 
Peter Xu
John Kacur March 17, 2021, 3:47 p.m. UTC | #7
On Wed, 17 Mar 2021, Peter Xu wrote:

> On Wed, Mar 17, 2021 at 11:08:31AM -0400, John Kacur wrote:
> > On Wed, 17 Mar 2021, Peter Xu wrote:
> >
> > > Hi, Daniel,
> > >
> > > On Wed, Mar 17, 2021 at 08:49:03AM +0100, Daniel Wagner wrote:
> > > > On Tue, Mar 16, 2021 at 04:07:05PM -0400, Peter Xu wrote:
> > > > > I think what I'm missing is why we had such a restriction.  Quoting from the
> > > > > commit:
> > > >
> > > > IIRC, the current behavior allows the process to be placed into a cgroup
> > > > with a subset of CPUs and you can just do 'cyclictest -a -t'. A process
> > > > should not ignore external configuration. That's my whole point here.
> > >
> > > In that case, again, I think a sane solution is not to check the CPU list in
> > > every single tool we use, because even if we do that for all tools in the
> > > rt-tests repo, we can't guarantee that other tools will have this check and
> > > will not ignore this restriction.
> > >
> > > A simple example is: what if the user specified "taskset -c $CPU cyclictest -a
> > > $CPU -t 1 ..." where $CPU is not in the allowed list of current bash?  As long
> > > as the taskset would work the so-called "environment" will be changed before
> > > even loading cyclictest.
> > >
> > > That's why I said we should fail at the single checkpoint of
> > > sched_setaffinity() rather than checking it explicitly in the tool, because
> > > if we want a real-world restriction, that's the only place I think it's possible.
> > >
> > > But I'm not a cgroup/container guy, please correct me if I misunderstood.
> > >
> > > -- 
> > > Peter Xu
> >
> > When cyclictest and friends were originally written, we had this viewpoint
> > that we "owned" the whole machine, and didn't have any restrictions
> > on where to schedule. As machines grew in size, and we added numa
> > awareness, and cgroups became more prominent, we added this code that tried
> > to schedule according to the ill-defined environment that we found
> > ourselves in.
> >
> > As Peter points out, we may have restricted ourselves more than is
> > necessary, and can rely a bit more on the operating system to restrict us.
> > On the other hand, using taskset is an easy workaround if the current code
> > is too restrictive.
> >
> > Because we can use taskset and things are working well otherwise, I don't
> > see this as super urgent, but I am willing to revisit this code and make
> > it less restrictive if that makes sense.
> >
> > I also am not a cgroup / container person, and would like to play around
> > with this a bit more before we make some decisions on which direction to
> > go in.
> >
> > Does that make sense to everyone?
>
> Sure thing on my side.  No bug reported so far this time, so I'll wait at least
> until then :) I just don't know why it's not hit just like oslat since I don't
> see a difference.  When I fixed the oslat thing, I thought cyclictest didn't
> have such an issue for some reason, so I didn't consider touching it at all.  But
> when I reran some tests yesterday I saw this issue on rhel8, hence this patch.
>
> Thanks,
>
> -- 
> Peter Xu


Thanks Peter - I appreciate it!

John
Ahmed S. Darwish March 17, 2021, 4:40 p.m. UTC | #8
On Wed, Mar 17, 2021 at 11:08:31AM -0400, John Kacur wrote:
> I also am not a cgroup / container person, and would like to play around
> with this a bit more before we make some decisions on which direction to
> go in.


Pardon my ignorance, but (scheduling wise) what does an RT task inside a
container/cgroup mean?

AFAIK, the RT task will still command the CPU as it pleases, no matter
the cgroup's cpu.shares configuration.

Thanks,

--
Ahmed S. Darwish
Daniel Wagner March 17, 2021, 5:15 p.m. UTC | #9
On Wed, Mar 17, 2021 at 05:40:56PM +0100, Ahmed S. Darwish wrote:
> Pardon my ignorance, but (scheduling wise) what does an RT task inside a
> container/cgroup mean?
>
> AFAIK, the RT task will still command the CPU as it pleases, no matter
> the cgroup's cpu.shares configuration.

It's not about scheduling; it's about affinity only.
Daniel Wagner March 17, 2021, 5:20 p.m. UTC | #10
On Wed, Mar 17, 2021 at 11:47:51AM -0400, John Kacur wrote:
> > > Does that make sense to everyone?
> >
> > Sure thing on my side.  No bug reported so far this time, so I'll wait at least
> > until then :) I just don't know why it's not hit just like oslat since I don't
> > see a difference.  When I fixed the oslat thing, I thought cyclictest didn't
> > have such an issue for some reason, so I didn't consider touching it at all.  But
> > when I reran some tests yesterday I saw this issue on rhel8, hence this patch.

Don't let me stop you ripping out the code. I don't agree with the
reasoning for why it should go, but it's a detail I don't care about.

Patch

diff --git a/src/lib/rt-numa.c b/src/lib/rt-numa.c
index babcc63..f581020 100644
--- a/src/lib/rt-numa.c
+++ b/src/lib/rt-numa.c
@@ -93,32 +93,6 @@  int cpu_for_thread_ua(int thread_num, int max_cpus)
 	return 0;
 }
 
-/*
- * After this function is called, affinity_mask is the intersection of
- * the user supplied affinity mask and the affinity mask from the run
- * time environment
- */
-static void use_current_cpuset(int max_cpus, struct bitmask *cpumask)
-{
-	struct bitmask *curmask;
-	int i;
-
-	curmask = numa_allocate_cpumask();
-	numa_sched_getaffinity(getpid(), curmask);
-
-	/*
-	 * Clear bits that are not set in both the cpuset from the
-	 * environment, and in the user specified affinity.
-	 */
-	for (i = 0; i < max_cpus; i++) {
-		if ((!numa_bitmask_isbitset(cpumask, i)) ||
-		    (!numa_bitmask_isbitset(curmask, i)))
-			numa_bitmask_clearbit(cpumask, i);
-	}
-
-	numa_bitmask_free(curmask);
-}
-
 int parse_cpumask(char *str, int max_cpus, struct bitmask **cpumask)
 {
 	struct bitmask *mask;
@@ -133,7 +107,6 @@  int parse_cpumask(char *str, int max_cpus, struct bitmask **cpumask)
 		return 0;
 	}
 
-	use_current_cpuset(max_cpus, mask);
 	*cpumask = mask;
 
 	return 0;