From patchwork Mon Apr 27 06:48:37 2015
X-Patchwork-Submitter: Xunlei Pang
X-Patchwork-Id: 47599
From: Xunlei Pang
To: linux-kernel@vger.kernel.org
Cc: Peter Zijlstra, Steven Rostedt, Juri Lelli, Ingo Molnar, Xunlei Pang
Subject: [RFC PATCH RESEND 3/4] sched/rt: Fix wrong SMP scheduler behavior for equal prio cases
Date: Mon, 27 Apr 2015 14:48:37 +0800
Message-Id: <1430117318-2080-4-git-send-email-xlpang@126.com>
X-Mailer: git-send-email 1.9.1
In-Reply-To: <1430117318-2080-1-git-send-email-xlpang@126.com>
References: <1430117318-2080-1-git-send-email-xlpang@126.com>

From: Xunlei Pang

The RT scheduler maintains two main queues per CPU; let's call them the
"run queue" and the "pushable queue". The pushable queue is managed with
a "plist", so when multiple tasks are queued at the same priority they
are kept in strict FIFO order. Currently, when an RT task is queued, it
is put at the head or the tail of its priority level in the "run queue"
depending on the scenario; if it is migratable, it is also, and always,
put at the tail of its priority level in the "pushable queue".

Consider one CPU that initially has some migratable tasks queued at the
same priority as current (an RT task), in the same order in both the
"run queue" and the "pushable queue". When current gets preempted, it is
put behind those tasks in the "pushable queue", although it still stays
ahead of them in the "run queue". If a pull from another CPU or a push
from the local CPU happens afterwards, a task sitting behind current in
the "run queue" is removed from the "pushable queue" and starts running,
because the global RT scheduler fetches tasks from the head of the
"pushable queue" when pushing or pulling.
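To make the ordering problem concrete, here is a minimal user-space
sketch (not kernel code; the tiny array-backed queue and the task names
are invented purely for illustration). All tasks share one RT priority,
so the FIFO order alone decides which task a push or pull picks:

/*
 * Minimal user-space sketch of the ordering problem described above.
 * NOT kernel code: the fixed-size queue type and task names are made
 * up for illustration only.
 */
#include <stdio.h>

struct queue {
        const char *task[8];
        int nr;
};

static void add_tail(struct queue *q, const char *t)
{
        q->task[q->nr++] = t;
}

static void add_head(struct queue *q, const char *t)
{
        for (int i = q->nr; i > 0; i--)
                q->task[i] = q->task[i - 1];
        q->task[0] = t;
        q->nr++;
}

static void show(const char *name, const struct queue *q)
{
        printf("%-26s:", name);
        for (int i = 0; i < q->nr; i++)
                printf(" %s", q->task[i]);
        printf("\n");
}

int main(void)
{
        /* "current" runs; A, B and C wait behind it at the same priority. */
        struct queue run      = { { "current", "A", "B", "C" }, 4 };
        struct queue push_old = { { "A", "B", "C" }, 3 };   /* today */
        struct queue push_new = { { "A", "B", "C" }, 3 };   /* patched */

        /*
         * current is preempted.  Today it is tail-inserted into the
         * pushable queue, so a push/pull picks "A" first even though
         * current is still ahead of "A" in the run queue.  Head
         * insertion keeps the two queues in the same FIFO order.
         */
        add_tail(&push_old, "current");
        add_head(&push_new, "current");

        show("run queue", &run);
        show("pushable (tail, today)", &push_old);
        show("pushable (head, patched)", &push_new);
        return 0;
}

With tail insertion, the head of the pushable queue ("A") no longer
matches the head of the run queue ("current"); that disagreement is
exactly what this patch removes by queueing the preempted task at the
head of its priority level.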
Obviously, to keep the two queues in the same order, when current is
preempted (not merely re-queued within the "run queue") we want to put
it ahead of all the tasks queued at the same priority in the "pushable
queue". So, if a running task is preempted by a higher-priority task, or
even by an equal-priority task for migration purposes, this patch
ensures that it is placed ahead of any existing task with the same
priority in the "pushable queue".

The handling logic used here:
- Add a new field named "rt_preempt" to task_struct (with a new flag
  named RT_PREEMPT_QUEUEAHEAD defined for it), used by RT.
- When resched_curr() is done because the current RT task is being
  preempted, set the flag. Create a new resched_curr_preempted_rt() for
  this purpose, and replace every resched_curr() used for RT preemption
  with resched_curr_preempted_rt().
- In put_prev_task_rt(), if RT_PREEMPT_QUEUEAHEAD is set, enqueue the
  task at the head of its priority level in the "pushable queue" and
  clear the flag.

Signed-off-by: Xunlei Pang
---
 include/linux/sched.h    |  5 +++
 include/linux/sched/rt.h | 16 ++++++++
 kernel/sched/core.c      |  6 ++-
 kernel/sched/rt.c        | 96 ++++++++++++++++++++++++++++++++++++++++++------
 4 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f74d4cc..24e0f72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1321,6 +1321,11 @@ struct task_struct {
 	const struct sched_class *sched_class;
 	struct sched_entity se;
 	struct sched_rt_entity rt;
+
+#ifdef CONFIG_SMP
+	unsigned long rt_preempt; /* Used by rt */
+#endif
+
 #ifdef CONFIG_CGROUP_SCHED
 	struct task_group *sched_task_group;
 #endif
diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h
index 6341f5b..69e3c82 100644
--- a/include/linux/sched/rt.h
+++ b/include/linux/sched/rt.h
@@ -15,6 +15,22 @@ static inline int rt_task(struct task_struct *p)
 	return rt_prio(p->prio);
 }
 
+struct rq;
+
+#ifdef CONFIG_SMP
+extern void resched_curr_preempted_rt(struct rq *rq);
+
+static inline void resched_curr_preempted(struct rq *rq)
+{
+	resched_curr_preempted_rt(rq);
+}
+#else
+static inline void resched_curr_preempted(struct rq *rq)
+{
+	resched_curr(rq);
+}
+#endif
+
 #ifdef CONFIG_RT_MUTEXES
 extern int rt_mutex_getprio(struct task_struct *p);
 extern void rt_mutex_setprio(struct task_struct *p, int prio);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9123a8..d13fc13 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1002,7 +1002,7 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
 		if (class == rq->curr->sched_class)
 			break;
 		if (class == p->sched_class) {
-			resched_curr(rq);
+			resched_curr_preempted(rq);
 			break;
 		}
 	}
@@ -1833,6 +1833,10 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 
 	INIT_LIST_HEAD(&p->rt.run_list);
 
+#ifdef CONFIG_SMP
+	p->rt_preempt = 0;
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	INIT_HLIST_HEAD(&p->preempt_notifiers);
 #endif
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 0c0f4df..7439121 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -254,8 +254,33 @@ int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
 }
 #endif /* CONFIG_RT_GROUP_SCHED */
 
+
 #ifdef CONFIG_SMP
 
+#define RT_PREEMPT_QUEUEAHEAD	1UL
+
+/*
+ * p(current) was preempted, and is to be put ahead of
+ * any task with the same priority in the pushable queue.
+ */
+static inline bool rt_preempted(struct task_struct *p)
+{
+	return !!(p->rt_preempt & RT_PREEMPT_QUEUEAHEAD);
+}
+
+static inline void clear_rt_preempted(struct task_struct *p)
+{
+	p->rt_preempt = 0;
+}
+
+void resched_curr_preempted_rt(struct rq *rq)
+{
+	if (rt_task(rq->curr))
+		rq->curr->rt_preempt |= RT_PREEMPT_QUEUEAHEAD;
+
+	resched_curr(rq);
+}
+
 static int pull_rt_task(struct rq *this_rq);
 
 static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
@@ -359,17 +384,32 @@ static inline void set_post_schedule(struct rq *rq)
 	rq->post_schedule = has_pushable_tasks(rq);
 }
 
-static void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+static void
+__enqueue_pushable_task(struct rq *rq, struct task_struct *p, bool head)
 {
 	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
 	plist_node_init(&p->pushable_tasks, p->prio);
-	plist_add(&p->pushable_tasks, &rq->rt.pushable_tasks);
+	if (head)
+		plist_add_head(&p->pushable_tasks, &rq->rt.pushable_tasks);
+	else
+		plist_add_tail(&p->pushable_tasks, &rq->rt.pushable_tasks);
 
 	/* Update the highest prio pushable task */
 	if (p->prio < rq->rt.highest_prio.next)
 		rq->rt.highest_prio.next = p->prio;
 }
 
+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *curr)
+{
+	__enqueue_pushable_task(rq, curr, true);
+}
+
+static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
+{
+	__enqueue_pushable_task(rq, p, false);
+}
+
 static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
 {
 	plist_del(&p->pushable_tasks, &rq->rt.pushable_tasks);
@@ -385,6 +425,25 @@ static void dequeue_pushable_task(struct rq *rq, struct task_struct *p)
 
 #else
 
+static inline bool rt_preempted(struct task_struct *p)
+{
+	return false;
+}
+
+static inline void clear_rt_preempted(struct task_struct *p)
+{
+}
+
+static inline void resched_curr_preempted_rt(struct rq *rq)
+{
+	resched_curr(rq);
+}
+
+static inline
+void enqueue_pushable_task_preempted(struct rq *rq, struct task_struct *p)
+{
+}
+
 static inline void enqueue_pushable_task(struct rq *rq, struct task_struct *p)
 {
 }
@@ -489,7 +548,7 @@ static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
 			enqueue_rt_entity(rt_se, false);
 
 		if (rt_rq->highest_prio.curr < curr->prio)
-			resched_curr(rq);
+			resched_curr_preempted_rt(rq);
 	}
 }
 
@@ -967,7 +1026,7 @@ static void update_curr_rt(struct rq *rq)
 			raw_spin_lock(&rt_rq->rt_runtime_lock);
 			rt_rq->rt_time += delta_exec;
 			if (sched_rt_runtime_exceeded(rt_rq))
-				resched_curr(rq);
+				resched_curr_preempted_rt(rq);
 			raw_spin_unlock(&rt_rq->rt_runtime_lock);
 		}
 	}
@@ -1409,7 +1468,7 @@ static void check_preempt_equal_prio_common(struct rq *rq)
 	 * to try and push current away.
	 */
 	requeue_task_rt(rq, next, 1);
-	resched_curr(rq);
+	resched_curr_preempted_rt(rq);
 }
 
 static inline
@@ -1434,7 +1493,7 @@ void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
 static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flags)
 {
 	if (p->prio < rq->curr->prio) {
-		resched_curr(rq);
+		resched_curr_preempted_rt(rq);
 		return;
 	}
 
@@ -1544,8 +1603,21 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 	 * The previous task needs to be made eligible for pushing
 	 * if it is still active
 	 */
-	if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1)
-		enqueue_pushable_task(rq, p);
+	if (on_rt_rq(&p->rt) && p->nr_cpus_allowed > 1) {
+		/*
+		 * When put_prev_task_rt() is called by
+		 * pick_next_task_rt(), if the current rt task
+		 * is being preempted, to maintain FIFO, it must
+		 * stay ahead of any other task that is queued
+		 * at the same priority.
+		 */
+		if (rt_preempted(p))
+			enqueue_pushable_task_preempted(rq, p);
+		else
+			enqueue_pushable_task(rq, p);
+	}
+
+	clear_rt_preempted(p);
 }
 
 #ifdef CONFIG_SMP
@@ -1764,7 +1836,7 @@ retry:
 	 * just reschedule current.
 	 */
 	if (unlikely(next_task->prio < rq->curr->prio)) {
-		resched_curr(rq);
+		resched_curr_preempted_rt(rq);
 		return 0;
 	}
 
@@ -1811,7 +1883,7 @@ retry:
 	activate_task(lowest_rq, next_task, 0);
 	ret = 1;
 
-	resched_curr(lowest_rq);
+	resched_curr_preempted_rt(lowest_rq);
 
 	double_unlock_balance(rq, lowest_rq);
 
@@ -2213,7 +2285,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
 			check_resched = 0;
 #endif /* CONFIG_SMP */
 		if (check_resched && p->prio < rq->curr->prio)
-			resched_curr(rq);
+			resched_curr_preempted_rt(rq);
 	}
 }
 
@@ -2255,7 +2327,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
 		 * then reschedule.
 		 */
 		if (p->prio < rq->curr->prio)
-			resched_curr(rq);
+			resched_curr_preempted_rt(rq);
 	}
 }