From patchwork Thu Apr 18 16:34:26 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 16250
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org,
 peterz@infradead.org, mingo@kernel.org, pjt@google.com,
 rostedt@goodmis.org, fweisbec@gmail.com, efault@gmx.de
Cc: Vincent Guittot
Subject: [PATCH Resend v6] sched: fix wrong rq's runnable_avg update with rt tasks
Date: Thu, 18 Apr 2013 18:34:26 +0200
Message-Id: <1366302867-5055-1-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.7.9.5
X-Original-Sender: vincent.guittot@linaro.org

The current update of the rq's load can be erroneous when RT tasks are
involved.

The update of the load of a rq that becomes idle is done in idle_balance,
and only if avg_idle is at least sysctl_sched_migration_cost (idle_balance
returns early otherwise). If RT tasks and short idle durations alternate,
the runnable_avg will not be updated correctly and the time will be
accounted as idle time when a CFS task wakes up. A new idle_enter function
is called when the next task is the idle task, so the elapsed time will be
accounted as run time in the load of the rq, whatever the average idle time
is. The call to update_rq_runnable_avg is removed from idle_balance.

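As a rough illustration of the accounting problem described above, here is
a toy model in C. It is only a sketch: the names toy_avg, toy_update,
toy_ratio and toy_simulate and the constant TOY_DECAY are hypothetical and
do not exist in the kernel. The model works in floating point on whole
periods, whereas the kernel uses fixed-point arithmetic. It compares a rq
whose busy time is recorded when it goes idle with one where the whole
interval is later lumped together as idle time.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-period decay, roughly 0.5^(1/32); toy value, not kernel code. */
#define TOY_DECAY	0.97857206
/* Assumed length of one accounting period in microseconds. */
#define TOY_PERIOD_US	1024.0

/* Hypothetical stand-in for the rq's runnable statistics. */
struct toy_avg {
	double runnable_sum;	/* decayed time accounted as running */
	double period_sum;	/* decayed total elapsed time */
};

/* Account `periods` elapsed periods as running (runnable != 0) or idle. */
void toy_update(struct toy_avg *avg, unsigned int periods, int runnable)
{
	unsigned int i;

	assert(avg != NULL);
	for (i = 0; i < periods; i++) {
		avg->runnable_sum = avg->runnable_sum * TOY_DECAY +
				    (runnable ? TOY_PERIOD_US : 0.0);
		avg->period_sum = avg->period_sum * TOY_DECAY + TOY_PERIOD_US;
	}
}

/* Fraction of the decayed time that was accounted as running. */
double toy_ratio(const struct toy_avg *avg)
{
	assert(avg != NULL);
	return avg->period_sum > 0.0 ?
	       avg->runnable_sum / avg->period_sum : 0.0;
}

/*
 * RT bursts of `busy` periods alternate with `idle` periods.
 * If update_on_idle_enter is set, the busy time is recorded as running
 * when the CPU goes idle; otherwise the whole cycle is lumped as idle.
 */
double toy_simulate(unsigned int cycles, unsigned int busy,
		    unsigned int idle, int update_on_idle_enter)
{
	struct toy_avg avg = { 0.0, 0.0 };
	unsigned int c;

	for (c = 0; c < cycles; c++) {
		if (update_on_idle_enter) {
			toy_update(&avg, busy, 1);
			toy_update(&avg, idle, 0);
		} else {
			toy_update(&avg, busy + idle, 0);
		}
	}
	return toy_ratio(&avg);
}
```

With, say, toy_simulate(50, 10, 2, 1) the ratio settles well above zero,
while toy_simulate(50, 10, 2, 0) stays at zero because no busy time is ever
recorded. Under the assumptions of this model, that is the kind of
discrepancy the idle_enter hook is meant to remove.
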
When a RT task is scheduled on an idle CPU, the update of the rq's load is
not done when the rq exits the idle state, because CFS's functions are not
called. Then idle_balance, which is called just before entering the idle
function, updates the rq's load and assumes that the elapsed time since the
last update was only running time. As a consequence, the rq's load of a CPU
that only runs a periodic RT task is close to LOAD_AVG_MAX, whatever the
running duration of the RT task is. A new idle_exit function is called when
the prev task is the idle task, so the elapsed time will be accounted as
idle time in the rq's load.

Changes since V5:
- Rename idle_enter/exit function to idle_enter/exit_fair

Changes since V4:
- Rebase on v3.9-rc6 instead of Steven Rostedt's patches
- Create the post_schedule_idle function that was previously created by
  Steven's patches

Changes since V3:
- Remove dependency with CONFIG_FAIR_GROUP_SCHED
- Add a new idle_enter function and create a post_schedule callback for
  idle class
- Remove the update_runnable_avg from idle_balance

Changes since V2:
- remove useless definition for UP platform
- rebased on top of Steven Rostedt's patches :
  https://lkml.org/lkml/2013/2/12/558

Changes since V1:
- move code out of schedule function and create a pre_schedule callback for
  idle class instead.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c      |   23 +++++++++++++++++++++--
 kernel/sched/idle_task.c |   16 ++++++++++++++++
 kernel/sched/sched.h     |   12 ++++++++++++
 3 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7a33e59..1de3df0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq,
 		se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter);
 	} /* migrations, e.g. sleep=0 leave decay_count == 0 */
 }
+
+/*
+ * Update the rq's load with the elapsed running time before entering
+ * idle. if the last scheduled task is not a CFS task, idle_enter will
+ * be the only way to update the runnable statistic.
+ */
+void idle_enter_fair(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 1);
+}
+
+/*
+ * Update the rq's load with the elapsed idle time before a task is
+ * scheduled. if the newly scheduled task is not a CFS task, idle_exit will
+ * be the only way to update the runnable statistic.
+ */
+void idle_exit_fair(struct rq *this_rq)
+{
+	update_rq_runnable_avg(this_rq, 0);
+}
+
 #else
 static inline void update_entity_load_avg(struct sched_entity *se,
 					  int update_cfs_rq) {}
@@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq)
 	if (this_rq->avg_idle < sysctl_sched_migration_cost)
 		return;
 
-	update_rq_runnable_avg(this_rq, 1);
-
 	/*
 	 * Drop the rq->lock, but keep IRQ/preempt disabled.
 	 */
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index b6baf37..b8ce773 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags)
 {
 	return task_cpu(p); /* IDLE tasks as never migrated */
 }
+
+static void pre_schedule_idle(struct rq *rq, struct task_struct *prev)
+{
+	idle_exit_fair(rq);
+}
+
+static void post_schedule_idle(struct rq *rq)
+{
+	idle_enter_fair(rq);
+}
 #endif /* CONFIG_SMP */
 /*
  * Idle tasks are unconditionally rescheduled:
@@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
 static struct task_struct *pick_next_task_idle(struct rq *rq)
 {
 	schedstat_inc(rq, sched_goidle);
+#ifdef CONFIG_SMP
+	/* Trigger the post schedule to do an idle_enter for CFS */
+	rq->post_schedule = 1;
+#endif
 	return rq->idle;
 }
 
@@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = {
 
 #ifdef CONFIG_SMP
 	.select_task_rq		= select_task_rq_idle,
+	.pre_schedule		= pre_schedule_idle,
+	.post_schedule		= post_schedule_idle,
 #endif
 
 	.set_curr_task		= set_curr_task_idle,
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index cc03cfd..8f1d80e 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class;
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
 
+/*
+ * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg
+ * becomes useful in lb
+ */
+#if defined(CONFIG_FAIR_GROUP_SCHED)
+extern void idle_enter_fair(struct rq *this_rq);
+extern void idle_exit_fair(struct rq *this_rq);
+#else
+static inline void idle_enter_fair(struct rq *this_rq) {}
+static inline void idle_exit_fair(struct rq *this_rq) {}
+#endif
+
 #else	/* CONFIG_SMP */
 
 static inline void idle_balance(int cpu, struct rq *rq)