diff mbox

set_schedattr + cpuset issue

Message ID CAE-h8tG6VCG0oX=AsPA-jKGqC=FheJD=4awByMukkUgD4uOUDg@mail.gmail.com
State New
Headers show

Commit Message

Juri Lelli Sept. 2, 2014, 10:36 a.m. UTC
Hi Vincent and Kevin,

thanks a lot for your report. I was also actually experiencing something that I
think is related to your issue, but then I didn't find any time to send out
a proper patch :/.

Could you please test what I've attached and see if it fixes your problem?

Thanks a lot,

- Juri

On 28 August 2014 22:07, Vincent Legout <vincent@legout.info> wrote:
> Hello,
>
> Juri Lelli <juri.lelli@gmail.com> writes:
>
>> On Wed, 2 Jul 2014 17:08:47 -0400
>> Kevin Burns <kevinpb@vt.edu> wrote:
>>
>>> Here's the issue:
>>>
>>> I am able to allocate a bandwidth with a ratio of .1 to two processes using
>>> the sched_setattr() system call.
>>>
>>> I then am able to add said tasks to a cpuset (with one physical processor)
>>> using cset.
>>>
>>> However, when I then try to update the runtime or period of either task,
>>> sched_setattr returns a -EBUSY error.
>>>
>>> Now, if I repeat the above experiment with just one task, I am able to
>>> update the runtime or period without issue. I ran trace-cmd and kernelshark
>>> to verify that the bandwidths were indeed being updated correctly. That and
>>> htop was reporting a higher percentage of CPUusage, which correlated to the
>>> ratios of my task's bandwidth.
>>>
>>> Any ideas as to why cpuset would cause this behaviour?
>>>
>>
>> Could you create a script that I can use to run your setup and reproduce
>> the problem?
>
> Sorry for the delayed answer. I'm working with Kevin and the problem can
> be reproduced using the attached files, also available here:
>
>  http://legout.info/~vincent/sd/
>
> On a Ubuntu 14.04 system running Linux 3.16, when running run.sh for the
> 2nd time, the 2nd call to sched_setattr() returns EBUSY. Uncommenting
> line 41 of run.sh fixes this by returning to SCHED_OTHER before moving
> the task to the cpuset.
>
> The problem arises when using both cpusets and SCHED_DEADLINE. The
> process described in section 5.1 of the SCHED_DEADLINE documentation
> works fine if the process stays on the same cpuset, but I think their
> are some issues when moving a process already in the SCHED_DEADLINE
> policy from one cpuset to another.
>
> According to our experiments, it seems that some fields are not updated
> during this process, and it thus fails. When a task moves from one
> cpuset to another, the total_bw fields of both cpusets doesn't seem to
> be updated. Thus, in the next sched_setattr() call, __dl_overflow()
> returns 1 because it thinks total_bw is 0 in the new cpuset. Then,
> dl_overflow() returns -1 and we have a EBUSY error.
>
> The total_bw field may also overflow because __dl_clear and __dl_add are
> called while the task whose bandwidth is tsk_bw is not in the cpu
> represented by dl_b.
>
> We can get around this by moving the process back to another scheduling
> policy before moving it to another cpuset. But we also had to apply the
> following patch in order to be sure that the bandwith is always updated
> (on top of v3.16). I'd think this condition has been added to skip all
> the tests if the bandwith doesn't change. But there is an issue because
> then, the total_bw field is not going to be updated for the new cpu. I'd
> think the problem comes from the fact that p->dl.dl_bw is not updated
> when a task leaves or returns the SCHED_DEADLINE policy.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index bc1638b..0df3008 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2031,9 +2031,6 @@ static int dl_overflow(struct task_struct *p, int policy,
>         u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
>         int cpus, err = -1;
>
> -       if (new_bw == p->dl.dl_bw)
> -               return 0;
> -
>         /*
>          * Either if a task, enters, leave, or stays -deadline but changes
>          * its parameters, we may need to update accordingly the total
>
> I hope the above explanations make sense and I didn't miss anything
> trivial. I'd be happy to provide more information or test anything if
> needed.
>
> Thanks,
> Vincent
>

Comments

Vincent Legout Sept. 2, 2014, 2:16 p.m. UTC | #1
Hi,

Juri Lelli <juri.lelli@gmail.com> writes:

> thanks a lot for your report. I was also actually experiencing something that I
> think is related to your issue, but then I didn't find any time to send out
> a proper patch :/.
>
> Could you please test what I've attached and see if it fixes your problem?

Thanks for the patch, it fixes the second issue I mentioned in my
previous email, i.e. the one for which I posted a patch. For this issue,
FWIW:

 Tested-by: Vincent Legout <vincent@legout.info>
 Tested-by: Kevin Burns <kevinpb@vt.edu>

But it doesn't seem to fix the main issue related to cpusets and
SCHED_DEADLINE. It still fails if we don't come back to SCHED_OTHER
before moving the task to another cpuset. I think it's due to the fact
that SCHED_DEADLINE's data structures don't seem to be aware that a task
migrated, they are not updated during this process. Any idea where I
could have a look? Or if this is not supported, would it be possible to
add some checks such that total_bw doesn't overflow when calling
__dl_clear? If yes, I can try to provide a patch.

Thanks!
Vincent
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Wagner Sept. 3, 2014, 10 a.m. UTC | #2
On 09/02/2014 04:16 PM, Vincent Legout wrote:
> Hi,
> 
> Juri Lelli <juri.lelli@gmail.com> writes:
> 
>> thanks a lot for your report. I was also actually experiencing something that I
>> think is related to your issue, but then I didn't find any time to send out
>> a proper patch :/.
>>
>> Could you please test what I've attached and see if it fixes your problem?
> 
> Thanks for the patch, it fixes the second issue I mentioned in my
> previous email, i.e. the one for which I posted a patch. For this issue,
> FWIW:
> 
>  Tested-by: Vincent Legout <vincent@legout.info>
>  Tested-by: Kevin Burns <kevinpb@vt.edu>
> 
> But it doesn't seem to fix the main issue related to cpusets and
> SCHED_DEADLINE. It still fails if we don't come back to SCHED_OTHER
> before moving the task to another cpuset. I think it's due to the fact
> that SCHED_DEADLINE's data structures don't seem to be aware that a task
> migrated, they are not updated during this process. Any idea where I
> could have a look? Or if this is not supported, would it be possible to
> add some checks such that total_bw doesn't overflow when calling
> __dl_clear? If yes, I can try to provide a patch.

I just saw this thread after sending a patch for this problem. It's not
yet in the archive, though it is called:

"[PATCH] sched: Reset bandwith on task when switching from
SCHED_DEADLINE away"

And obviously, I forgot to cc Juri. Sorry about that.

cheers,
daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Wagner Sept. 3, 2014, 1:06 p.m. UTC | #3
On 09/03/2014 12:00 PM, Daniel Wagner wrote:
> 
> On 09/02/2014 04:16 PM, Vincent Legout wrote:
>> Hi,
>>
>> Juri Lelli <juri.lelli@gmail.com> writes:
>>
>>> thanks a lot for your report. I was also actually experiencing something that I
>>> think is related to your issue, but then I didn't find any time to send out
>>> a proper patch :/.
>>>
>>> Could you please test what I've attached and see if it fixes your problem?
>>
>> Thanks for the patch, it fixes the second issue I mentioned in my
>> previous email, i.e. the one for which I posted a patch. For this issue,
>> FWIW:
>>
>>  Tested-by: Vincent Legout <vincent@legout.info>
>>  Tested-by: Kevin Burns <kevinpb@vt.edu>
>>
>> But it doesn't seem to fix the main issue related to cpusets and
>> SCHED_DEADLINE. It still fails if we don't come back to SCHED_OTHER
>> before moving the task to another cpuset. I think it's due to the fact
>> that SCHED_DEADLINE's data structures don't seem to be aware that a task
>> migrated, they are not updated during this process. Any idea where I
>> could have a look? Or if this is not supported, would it be possible to
>> add some checks such that total_bw doesn't overflow when calling
>> __dl_clear? If yes, I can try to provide a patch.
> 
> I just saw this thread after sending a patch for this problem. It's not
> yet in the archive, though it is called:
> 
> "[PATCH] sched: Reset bandwith on task when switching from
> SCHED_DEADLINE away"
> 
> And obviously, I forgot to cc Juri. Sorry about that.

After fixing my sender address it even hit the archive:

https://lkml.org/lkml/2014/9/3/354

cheers,
daniel
--
To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

From 2243d25fbf0df112a83c12ca8a05e4384ea17058 Mon Sep 17 00:00:00 2001
From: Juri Lelli <juri.lelli@arm.com>
Date: Wed, 11 Jun 2014 16:37:31 +0200
Subject: [PATCH] sched/deadline: clear dl_entity params when setscheduling to
 different class

When a task is using SCHED_DEADLINE and the user setschedules it to a different
class its sched_dl_entity static parameters are not cleaned up. This causes a
bug if the user sets it back to SCHED_DEADLINE with the same parameters again.
The problem resides in the check we perform at the very beginning of
dl_overflow():

	if (new_bw == p->dl.dl_bw)
		return 0;

This condition is met in the case depicted above, so the function returns and
dl_b->total_bw is not updated (the p->dl.dl_bw is not added to it). After this,
admission control is broken.

This patch fixes the thing, properly clearing static parameters for a task
that ceases to use SCHED_DEADLINE.

Reported-by: Daniele Alessandrelli <daniele.alessandrelli@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..01738b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1776,6 +1776,21 @@  int wake_up_state(struct task_struct *p, unsigned int state)
 }
 
 /*
+ * This function clears the sched_dl_entity static params.
+ */
+static void
+__dl_clear_params(struct task_struct *p)
+{
+	struct sched_dl_entity *dl_se = &p->dl;
+
+	dl_se->dl_runtime = 0;
+	dl_se->dl_deadline = 0;
+	dl_se->dl_period = 0;
+	dl_se->flags = 0;
+	dl_se->dl_bw = 0;
+}
+
+/*
  * Perform scheduler related setup for a newly forked process p.
  * p is forked by current.
  *
@@ -1799,10 +1814,7 @@  static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	RB_CLEAR_NODE(&p->dl.rb_node);
 	hrtimer_init(&p->dl.dl_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	p->dl.dl_runtime = p->dl.runtime = 0;
-	p->dl.dl_deadline = p->dl.deadline = 0;
-	p->dl.dl_period = 0;
-	p->dl.flags = 0;
+	__dl_clear_params(p);
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 
@@ -2060,6 +2072,7 @@  static int dl_overflow(struct task_struct *p, int policy,
 		err = 0;
 	} else if (!dl_policy(policy) && task_has_dl_policy(p)) {
 		__dl_clear(dl_b, p->dl.dl_bw);
+		__dl_clear_params(p);
 		err = 0;
 	}
 	raw_spin_unlock(&dl_b->lock);
-- 
2.0.4