[v3,2/2] sched/rt: Add check_preempt_equal_prio() logic in pick_next_task_rt()

Message ID 1423410686-1928-2-git-send-email-pang.xunlei@linaro.org
State New

Commit Message

pang.xunlei Feb. 8, 2015, 3:51 p.m.
check_preempt_curr() doesn't call sched_class::check_preempt_curr
when the class of current is a higher level. So if there is a DL
task running when doing this for RT, check_preempt_equal_prio()
will definitely miss, which may result in some response latency
for this RT task if it is pinned and there're some same-priority
migratable rt tasks already queued.

We should do a similar thing in select_task_rq_rt() when first
picking RT tasks after running out of DL tasks.

This patch tackles the issue by peeking at the next RT task (RT1) and,
if RT1 is migratable, requeueing it to the tail of the rq using
requeue_task_rt(rq, p, 0). In this way:
- If there is another RT task (RT2) with the same priority as
  RT1, RT2 will be picked as the running task, while RT1
  will be pushed onto another cpu via RT1's post_schedule(), as
  RT1 is migratable. The difference from check_preempt_equal_prio()
  here is that we don't care whether RT2 is migratable.

- Otherwise, if there's no RT task with the same priority as RT1,
  RT1 will still be picked as the running task after the requeueing.

Signed-off-by: Xunlei Pang <pang.xunlei@linaro.org>
---
 kernel/sched/rt.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
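
The two cases described in the commit message can be sketched in user space. This is only a toy model under stated assumptions, not kernel code: `toy_task`, `toy_queue` and the `toy_*` helpers are hypothetical names, the queue holds a single priority level, and the `migratable` flag stands in for the patch's `nr_cpus_allowed != 1` test combined with a successful cpupri_find().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8

/* Hypothetical stand-in for an RT task; not the kernel's task_struct. */
struct toy_task {
	const char *name;
	bool migratable; /* models nr_cpus_allowed != 1 plus a cpupri_find() hit */
};

/* One FIFO list for a single priority level. Index 0 is the head. */
struct toy_queue {
	struct toy_task *tasks[MAX_TASKS];
	size_t len;
};

static void toy_enqueue_tail(struct toy_queue *q, struct toy_task *t)
{
	assert(q->len < MAX_TASKS);
	q->tasks[q->len++] = t;
}

/* Move the head to the tail, as the patch does with requeue_task_rt(rq, p, 0). */
static void toy_requeue_head_to_tail(struct toy_queue *q)
{
	struct toy_task *head;
	size_t i;

	if (q->len < 2)
		return;
	head = q->tasks[0];
	for (i = 1; i < q->len; i++)
		q->tasks[i - 1] = q->tasks[i];
	q->tasks[q->len - 1] = head;
}

/* Models the added hunk followed by the normal pick; returns the task that runs. */
static struct toy_task *toy_pick_after_higher_class(struct toy_queue *q)
{
	if (q->len == 0)
		return NULL;
	if (q->len > 1 && q->tasks[0]->migratable)
		toy_requeue_head_to_tail(q);
	return q->tasks[0];
}
```

With a migratable RT1 queued ahead of RT2, the model picks RT2 and leaves RT1 at the tail, where a push could move it elsewhere. With RT1 alone on the queue, RT1 is still picked.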

Comments

pang.xunlei Feb. 13, 2015, 3:55 a.m. | #1
Hi Steve,

On 13 February 2015 at 08:04, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Sun,  8 Feb 2015 23:51:26 +0800
> Xunlei Pang <pang.xunlei@linaro.org> wrote:
>
>> check_preempt_curr() doesn't call sched_class::check_preempt_curr
>> when the class of current is a higher level.
>
> The above sentence does not make sense.
>
>> So if there is a DL
>> task running when doing this for RT, check_preempt_equal_prio()
>
> Doing what for RT?
>
>> will definitely miss, which may result in some response latency
>
> Miss what?

Sorry, the description lacks some information; I need to explain it in more detail.

>
>> for this RT task if it is pinned and there're some same-priority
>> migratable rt tasks already queued.
>>
>> We should do the similar thing in select_task_rq_rt() when first
>> picking rt tasks after running out of DL tasks.
>>
>> This patch tackles the issue by peeking the next rt task(RT1), and
>> if find RT1 migratable, just requeue it to the tail of the rq using
>> requeue_task_rt(rq, p, 0). In this way:
>> - If there do have another rt task(RT2) with the same priority as
>>   RT1, RT2 will finally be picked as the running task. While RT1
>>   will be pushed onto another cpu via RT1's post_schedule(), as
>>   RT1 is migratable. The difference from check_preempt_equal_prio()
>>   here is that we just don't care whether RT2 is migratable.
>>
>> - Otherwise, if there's no rt task with the same priority as RT1,
>>   RT1 will still be picked as the running task after the requeuing.
>
> What happens if there's three RT tasks of the same prio, RT1 is ready
> to run and is migratable, RT2 is pinned, RT3 is migratable
>
> RT1 just got pushed behind RT3 and it is now not the next one to run.
> RT2 will get this rq, RT3 will be pushed off, but say there's no more
> rq's available to run RT1.
>
> You just broke FIFO.

Yes, I had also thought of this point before.

If this is a problem, the same thing can already happen in the
current check_preempt_equal_prio() code:
a pinned waking task successfully preempts current because
cpupri_find() reports that current is migratable.

But by the time the resched happens, things may have changed, i.e. current
may have become non-migratable, so the waking task gets to run while
the previously running task gets stuck. So that also breaks FIFO.

Thanks,
Xunlei

>
> I'm sorry, I'm thinking this is trying too hard to fix the users poor
> management of RT tasks.
>
> If you have 2 or more RT tasks of the same prio, you had better be damn
> aware that if one is pinned, it will block the others, even from
> migrating. You should not have pinned tasks of the same prio as those
> that can migrate.
>
> And if your system depends on DL tasks working nicely with RT tasks on
> the same CPU, it's even more broken by design.
>
> -- Steve
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
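
The three-task scenario raised in the review above (RT1 migratable, RT2 pinned, RT3 migratable, all at the same priority) can be replayed with a small user-space model. It is a sketch under assumptions, not the kernel's push logic: the `toy_*` names are hypothetical, waiting migratable tasks are pushed in queue order, and `free_cpus` is simply the number of other runqueues able to take a task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NTASKS 3

/* Hypothetical stand-in for an RT task; not the kernel's task_struct. */
struct toy_task {
	const char *name;
	bool migratable;
	bool runs; /* set when the task gets a CPU in this round */
};

/* Rotate the head to the tail if it is migratable (the proposed requeue). */
static void toy_requeue(struct toy_task *q[NTASKS])
{
	struct toy_task *head = q[0];
	size_t i;

	if (!head->migratable)
		return;
	for (i = 1; i < NTASKS; i++)
		q[i - 1] = q[i];
	q[NTASKS - 1] = head;
}

/*
 * The head runs locally; waiting migratable tasks are pushed in queue
 * order while free CPUs remain. Returns the number of tasks left waiting.
 */
static int toy_schedule(struct toy_task *q[NTASKS], int free_cpus)
{
	int waiting = 0;
	size_t i;

	assert(free_cpus >= 0);
	q[0]->runs = true;
	for (i = 1; i < NTASKS; i++) {
		if (q[i]->migratable && free_cpus > 0) {
			q[i]->runs = true;
			free_cpus--;
		} else {
			q[i]->runs = false;
			waiting++;
		}
	}
	return waiting;
}
```

Starting from the order RT1, RT2, RT3 with one free CPU, calling `toy_requeue()` and then `toy_schedule()` leaves RT2 running locally, RT3 pushed away, and RT1 waiting, although RT1 was first in line. Without the requeue, RT1 runs and the pinned RT2 is the one that waits.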
pang.xunlei Feb. 13, 2015, 4:40 a.m. | #2
Hi Steve,

On 13 February 2015 at 11:55, Xunlei Pang <pang.xunlei@linaro.org> wrote:
> Hi steve,
>
> On 13 February 2015 at 08:04, Steven Rostedt <rostedt@goodmis.org> wrote:
>> On Sun,  8 Feb 2015 23:51:26 +0800
>> Xunlei Pang <pang.xunlei@linaro.org> wrote:
>>
>>> check_preempt_curr() doesn't call sched_class::check_preempt_curr
>>> when the class of current is a higher level.
>>
>> The above sentence does not make sense.
>>
>>> So if there is a DL
>>> task running when doing this for RT, check_preempt_equal_prio()
>>
>> Doing what for RT?
>>
>>> will definitely miss, which may result in some response latency
>>
>> Miss what?
>
> Sorry, this may lack some information I need to further explain in detail.
>
>>
>>> for this RT task if it is pinned and there're some same-priority
>>> migratable rt tasks already queued.
>>>
>>> We should do the similar thing in select_task_rq_rt() when first
>>> picking rt tasks after running out of DL tasks.
>>>
>>> This patch tackles the issue by peeking the next rt task(RT1), and
>>> if find RT1 migratable, just requeue it to the tail of the rq using
>>> requeue_task_rt(rq, p, 0). In this way:
>>> - If there do have another rt task(RT2) with the same priority as
>>>   RT1, RT2 will finally be picked as the running task. While RT1
>>>   will be pushed onto another cpu via RT1's post_schedule(), as
>>>   RT1 is migratable. The difference from check_preempt_equal_prio()
>>>   here is that we just don't care whether RT2 is migratable.
>>>
>>> - Otherwise, if there's no rt task with the same priority as RT1,
>>>   RT1 will still be picked as the running task after the requeuing.
>>
>> What happens if there's three RT tasks of the same prio, RT1 is ready
>> to run and is migratable, RT2 is pinned, RT3 is migratable
>>
>> RT1 just got pushed behind RT3 and it is now not the next one to run.
>> RT2 will get this rq, RT3 will be pushed off, but say there's no more
>> rq's available to run RT1.
>>
>> You just broke FIFO.
>
> Yes, I've also thought of this point before.
>
> If this is a problem, we may have the same thing happening in
> current check_preempt_equal_prio() code:
> When a pinned waking task preempts the current successfully,
> because it thinks current is migratable via cpupri_find().
>
> But when resched happens, things may change, i.e. current
> becomes non-migratable, so the waking task gets running, while
> the previous running task gets stuck. See, it also broke FIFO.

Aside from this, please ignore this patch: waking RT tasks will also be
pushed via task_woken_rt() when current is DL, which I missed before.

Thanks,
Xunlei

Patch

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 04c58b7..26114f5 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1482,6 +1482,22 @@  pick_next_task_rt(struct rq *rq, struct task_struct *prev)
 
 	put_prev_task(rq, prev);
 
+#ifdef CONFIG_SMP
+	/*
+	 * If there's a running higher class task, check_preempt_curr()
+	 * doesn't invoke check_preempt_equal_prio() for rt tasks, so
+	 * we can do the similar thing here.
+	 */
+	if (rq->rt.rt_nr_total > 1 &&
+	    (prev->sched_class == &dl_sched_class ||
+	     prev->sched_class == &stop_sched_class)) {
+		p = peek_next_task_rt(rq);
+		if (p->nr_cpus_allowed != 1 &&
+		    cpupri_find(&rq->rd->cpupri, p, NULL))
+			requeue_task_rt(rq, p, 0);
+	}
+#endif
+
 	p = _pick_next_task_rt(rq);
 
 	/* The running task is never eligible for pushing */
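
The comment in the hunk above depends on how check_preempt_curr() dispatches by scheduling class. The following user-space sketch models that dispatch and the hunk's outer condition. It rests on assumptions: the `toy_*` names are hypothetical, the class order (stop, DL, RT, fair, idle) is the usual precedence and is encoded here as an enum, and the real function works on struct rq and struct task_struct rather than on bare class values.

```c
#include <assert.h>
#include <stdbool.h>

/* Scheduling classes in descending order of precedence (lower value wins). */
enum toy_class { TOY_STOP, TOY_DL, TOY_RT, TOY_FAIR, TOY_IDLE };

enum toy_action {
	TOY_NOTHING,    /* current is in a higher class: no hook, no resched */
	TOY_CLASS_HOOK, /* same class: the class's own check_preempt_curr hook runs */
	TOY_RESCHED     /* waking task is in a higher class: resched current */
};

/* Models the class comparison made when a task wakes up on a busy rq. */
static enum toy_action toy_check_preempt_curr(enum toy_class curr,
					      enum toy_class waking)
{
	assert(curr <= TOY_IDLE && waking <= TOY_IDLE);
	if (waking == curr)
		return TOY_CLASS_HOOK;
	if (waking < curr)
		return TOY_RESCHED;
	return TOY_NOTHING;
}

/* Models the outer condition the hunk adds before picking the next RT task. */
static bool toy_should_peek(unsigned int rt_nr_total, enum toy_class prev)
{
	return rt_nr_total > 1 && (prev == TOY_DL || prev == TOY_STOP);
}
```

In this model an RT wakeup while a DL task is current yields `TOY_NOTHING`, which is the case where the RT class hook, and with it check_preempt_equal_prio(), is never reached.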