From patchwork Wed Apr 10 14:55:41 2013 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 16036 Return-Path: X-Original-To: linaro@staging.patches.linaro.org Delivered-To: linaro@staging.patches.linaro.org Received: from mail-qe0-f70.google.com (mail-qe0-f70.google.com [209.85.128.70]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 8156C23912 for ; Wed, 10 Apr 2013 14:56:18 +0000 (UTC) Received: by mail-qe0-f70.google.com with SMTP id 1sf918844qec.9 for ; Wed, 10 Apr 2013 07:55:59 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=x-received:mime-version:x-beenthere:x-received:received-spf :x-received:x-forwarded-to:x-forwarded-for:delivered-to:x-received :received-spf:x-received:from:to:cc:subject:date:message-id:x-mailer :x-gm-message-state:x-original-sender :x-original-authentication-results:precedence:mailing-list:list-id :x-google-group-id:list-post:list-help:list-archive:list-unsubscribe; bh=F2k50nZgWwRT75xKX8nMKkYAGwhcaqF0cFLjkxWHPH0=; b=gTYr9TONPmGP74w2cYcaYDtl5j+10Ypc/iNd0t3htDTCgN5hM7ZSYMarVfECPZmV5T IymCbr432QkQX1bBDdcgfo2fIVqietHMh8B3Oe1O7wHhPuRVWyW76fpMWmrs9aU83DPG ctYCiwcOpae38rh+iZIHObVPqtxZFOHPn0e1oKWZtn2i30QrlUqaiI3qUg3KghqHA4o7 4NboQ/Uc5lGFRDZkCYtJ8BGh1JrP/luqbq1nyzx91OSSbGwsMMbtqh3K7ug07qF2Hhzv XGbu97mBV8MwiO1JCieqy36W8B+Ii/pzMhFQ+cPPYfciyPQn21LU9xTZMA5Izfw3OJaV Nyag== X-Received: by 10.224.160.65 with SMTP id m1mr1639776qax.2.1365605759234; Wed, 10 Apr 2013 07:55:59 -0700 (PDT) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.49.96.229 with SMTP id dv5ls357167qeb.39.gmail; Wed, 10 Apr 2013 07:55:59 -0700 (PDT) X-Received: by 10.58.143.116 with SMTP id sd20mr1740341veb.39.1365605759034; Wed, 10 Apr 2013 07:55:59 -0700 (PDT) Received: from mail-vb0-x22c.google.com (mail-vb0-x22c.google.com [2607:f8b0:400c:c02::22c]) by mx.google.com with ESMTPS id b3si184588vea.15.2013.04.10.07.55.58 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 10 Apr 2013 07:55:59 -0700 (PDT) Received-SPF: neutral (google.com: 2607:f8b0:400c:c02::22c is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) client-ip=2607:f8b0:400c:c02::22c; Received: by mail-vb0-f44.google.com with SMTP id e12so430222vbg.31 for ; Wed, 10 Apr 2013 07:55:58 -0700 (PDT) X-Received: by 10.58.168.208 with SMTP id zy16mr1817343veb.3.1365605758827; Wed, 10 Apr 2013 07:55:58 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patches@linaro.org Received: by 10.58.85.136 with SMTP id h8csp108687vez; Wed, 10 Apr 2013 07:55:58 -0700 (PDT) X-Received: by 10.195.13.232 with SMTP id fb8mr4119397wjd.51.1365605757419; Wed, 10 Apr 2013 07:55:57 -0700 (PDT) Received: from mail-wi0-x22e.google.com (mail-wi0-x22e.google.com [2a00:1450:400c:c05::22e]) by mx.google.com with ESMTPS id h2si92756wje.65.2013.04.10.07.55.57 (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 10 Apr 2013 07:55:57 -0700 (PDT) Received-SPF: neutral (google.com: 2a00:1450:400c:c05::22e is neither permitted nor denied by best guess record for domain of vincent.guittot@linaro.org) client-ip=2a00:1450:400c:c05::22e; Received: by mail-wi0-f174.google.com with SMTP id hj8so4904297wib.13 for ; Wed, 10 Apr 2013 07:55:57 -0700 (PDT) X-Received: by 10.180.211.50 with SMTP id mz18mr27313535wic.24.1365605756981; Wed, 10 Apr 2013 07:55:56 -0700 (PDT) Received: from localhost.localdomain (LPuteaux-156-14-44-212.w82-127.abo.wanadoo.fr. [82.127.83.212]) by mx.google.com with ESMTPS id bj9sm32263877wib.4.2013.04.10.07.55.55 (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Wed, 10 Apr 2013 07:55:56 -0700 (PDT) From: Vincent Guittot To: linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org, peterz@infradead.org, mingo@kernel.org, pjt@google.com, rostedt@goodmis.org, fweisbec@gmail.com, efault@gmx.de Cc: Vincent Guittot Subject: [PATCH v6] sched: fix wrong rq's runnable_avg update with rt tasks Date: Wed, 10 Apr 2013 16:55:41 +0200 Message-Id: <1365605741-1195-1-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 1.7.9.5 X-Gm-Message-State: ALoCoQm3TnjnypaPxJTDOCkpNR9usnHIgoIJNIQUCVIp49HBScySYSqmlPo1r6kNrUx1veouhxm0 X-Original-Sender: vincent.guittot@linaro.org X-Original-Authentication-Results: mx.google.com; spf=neutral (google.com: 2607:f8b0:400c:c02::22c is neither permitted nor denied by best guess record for domain of patch+caf_=patchwork-forward=linaro.org@linaro.org) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Precedence: list Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org List-ID: X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , The current update of the rq's load can be erroneous when RT tasks are involved The update of the load of a rq that becomes idle, is done only if the avg_idle is less than sysctl_sched_migration_cost. If RT tasks and short idle duration alternate, the runnable_avg will not be updated correctly and the time will be accounted as idle time when a CFS task wakes up. A new idle_enter function is called when the next task is the idle function so the elapsed time will be accounted as run time in the load of the rq, whatever the average idle time is. The function update_rq_runnable_avg is removed from idle_balance. When a RT task is scheduled on an idle CPU, the update of the rq's load is not done when the rq exit idle state because CFS's functions are not called. Then, the idle_balance, which is called just before entering the idle function, updates the rq's load and makes the assumption that the elapsed time since the last update, was only running time. As a consequence, the rq's load of a CPU that only runs a periodic RT task, is close to LOAD_AVG_MAX whatever the running duration of the RT task is. A new idle_exit function is called when the prev task is the idle function so the elapsed time will be accounted as idle time in the rq's load. Changes since V5: - Rename idle_enter/exit function to idle_enter/exit_fair Changes since V4: - Rebase on v3.9-rc6 instead of Steven Rostedt's patches - Create the post_schedule_idle function that was previously created by Steven's patches Changes since V3: - Remove dependancy with CONFIG_FAIR_GROUP_SCHED - Add a new idle_enter function and create a post_schedule callback for idle class - Remove the update_runnable_avg from idle_balance Changes since V2: - remove useless definition for UP platform - rebased on top of Steven Rostedt's patches : https://lkml.org/lkml/2013/2/12/558 Changes since V1: - move code out of schedule function and create a pre_schedule callback for idle class instead. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 23 +++++++++++++++++++++-- kernel/sched/idle_task.c | 16 ++++++++++++++++ kernel/sched/sched.h | 12 ++++++++++++ 3 files changed, 49 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7a33e59..1de3df0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1562,6 +1562,27 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); } /* migrations, e.g. sleep=0 leave decay_count == 0 */ } + +/* + * Update the rq's load with the elapsed running time before entering + * idle. if the last scheduled task is not a CFS task, idle_enter will + * be the only way to update the runnable statistic. + */ +void idle_enter_fair(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 1); +} + +/* + * Update the rq's load with the elapsed idle time before a task is + * scheduled. if the newly scheduled task is not a CFS task, idle_exit will + * be the only way to update the runnable statistic. + */ +void idle_exit_fair(struct rq *this_rq) +{ + update_rq_runnable_avg(this_rq, 0); +} + #else static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) {} @@ -5219,8 +5240,6 @@ void idle_balance(int this_cpu, struct rq *this_rq) if (this_rq->avg_idle < sysctl_sched_migration_cost) return; - update_rq_runnable_avg(this_rq, 1); - /* * Drop the rq->lock, but keep IRQ/preempt disabled. */ diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c index b6baf37..b8ce773 100644 --- a/kernel/sched/idle_task.c +++ b/kernel/sched/idle_task.c @@ -13,6 +13,16 @@ select_task_rq_idle(struct task_struct *p, int sd_flag, int flags) { return task_cpu(p); /* IDLE tasks as never migrated */ } + +static void pre_schedule_idle(struct rq *rq, struct task_struct *prev) +{ + idle_exit_fair(rq); +} + +static void post_schedule_idle(struct rq *rq) +{ + idle_enter_fair(rq); +} #endif /* CONFIG_SMP */ /* * Idle tasks are unconditionally rescheduled: @@ -25,6 +35,10 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl static struct task_struct *pick_next_task_idle(struct rq *rq) { schedstat_inc(rq, sched_goidle); +#ifdef CONFIG_SMP + /* Trigger the post schedule to do an idle_enter for CFS */ + rq->post_schedule = 1; +#endif return rq->idle; } @@ -86,6 +100,8 @@ const struct sched_class idle_sched_class = { #ifdef CONFIG_SMP .select_task_rq = select_task_rq_idle, + .pre_schedule = pre_schedule_idle, + .post_schedule = post_schedule_idle, #endif .set_curr_task = set_curr_task_idle, diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index cc03cfd..8f1d80e 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -880,6 +880,18 @@ extern const struct sched_class idle_sched_class; extern void trigger_load_balance(struct rq *rq, int cpu); extern void idle_balance(int this_cpu, struct rq *this_rq); +/* + * Only depends on SMP, FAIR_GROUP_SCHED may be removed when runnable_avg + * becomes useful in lb + */ +#if defined(CONFIG_FAIR_GROUP_SCHED) +extern void idle_enter_fair(struct rq *this_rq); +extern void idle_exit_fair(struct rq *this_rq); +#else +static inline void idle_enter_fair(struct rq *this_rq) {} +static inline void idle_exit_fair(struct rq *this_rq) {} +#endif + #else /* CONFIG_SMP */ static inline void idle_balance(int cpu, struct rq *rq)