From patchwork Tue Sep 23 16:08:02 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Vincent Guittot X-Patchwork-Id: 37741 Return-Path: X-Original-To: linaro@patches.linaro.org Delivered-To: linaro@patches.linaro.org Received: from mail-wi0-f198.google.com (mail-wi0-f198.google.com [209.85.212.198]) by ip-10-151-82-157.ec2.internal (Postfix) with ESMTPS id 88C8720792 for ; Tue, 23 Sep 2014 16:10:08 +0000 (UTC) Received: by mail-wi0-f198.google.com with SMTP id n3sf2703822wiv.1 for ; Tue, 23 Sep 2014 09:10:07 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:delivered-to:from:to:cc:subject :date:message-id:in-reply-to:references:sender:precedence:list-id :x-original-sender:x-original-authentication-results:mailing-list :list-post:list-help:list-archive:list-unsubscribe; bh=0ikeG2PvpHUSfIzlDad8zzuOSW2Bp9HJSm5wUBZxmQI=; b=CrXf9SstYCmOqIDQCc2lDAOqgSkFDlTgeyA9Dq4IqWV98iRL38Gh7m6qFRubKrYPds KZGgYD6qdycBk2LvIFIO8JwhJPhRJ339qzoTQVl3LYdl69qAnwJGHUqQ5QRry0Qq/+EO 1KQCGnPM4h4WGv5YuSMPwPzIg2Md0EjrmHDhtvt1tgUighWZsbGwChj/Dlv0km479EPU 9Q550kTp7u/U7XgcQe+HDDcBAqeygBWQ/ZVrCHCRPfx111bYbbLVY1jqrNO5/F4OeWzO UOFdTtY+SabOZAdibLQNtym/vZNKNOxkctglo9o+dap+aV5wBSY0Cr7Akh3xE1SEmOoh CSMg== X-Gm-Message-State: ALoCoQnO38mhRngW+7OPemFSgBr3n75XhwvnXPMhMpgcfie4+lHSHB8E4DqLrat6kElr9/XooxSZ X-Received: by 10.112.184.197 with SMTP id ew5mr119310lbc.0.1411488607491; Tue, 23 Sep 2014 09:10:07 -0700 (PDT) MIME-Version: 1.0 X-BeenThere: patchwork-forward@linaro.org Received: by 10.152.87.175 with SMTP id az15ls67863lab.96.gmail; Tue, 23 Sep 2014 09:10:07 -0700 (PDT) X-Received: by 10.112.130.168 with SMTP id of8mr550926lbb.5.1411488607322; Tue, 23 Sep 2014 09:10:07 -0700 (PDT) Received: from mail-la0-f42.google.com (mail-la0-f42.google.com [209.85.215.42]) by mx.google.com with ESMTPS id x7si19245941lbi.116.2014.09.23.09.10.07 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 23 Sep 2014 09:10:07 -0700 (PDT) Received-SPF: pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.215.42 as permitted sender) client-ip=209.85.215.42; Received: by mail-la0-f42.google.com with SMTP id hz20so8997593lab.1 for ; Tue, 23 Sep 2014 09:10:06 -0700 (PDT) X-Received: by 10.112.62.200 with SMTP id a8mr659609lbs.34.1411488606602; Tue, 23 Sep 2014 09:10:06 -0700 (PDT) X-Forwarded-To: patchwork-forward@linaro.org X-Forwarded-For: patch@linaro.org patchwork-forward@linaro.org Delivered-To: patch@linaro.org Received: by 10.112.130.169 with SMTP id of9csp442037lbb; Tue, 23 Sep 2014 09:10:05 -0700 (PDT) X-Received: by 10.66.234.1 with SMTP id ua1mr530070pac.133.1411488604487; Tue, 23 Sep 2014 09:10:04 -0700 (PDT) Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id pd4si21565535pdb.173.2014.09.23.09.10.03 for ; Tue, 23 Sep 2014 09:10:04 -0700 (PDT) Received-SPF: none (google.com: linux-kernel-owner@vger.kernel.org does not designate permitted sender hosts) client-ip=209.132.180.67; Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1756507AbaIWQJi (ORCPT + 27 others); Tue, 23 Sep 2014 12:09:38 -0400 Received: from mail-we0-f171.google.com ([74.125.82.171]:49267 "EHLO mail-we0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756396AbaIWQJf (ORCPT ); Tue, 23 Sep 2014 12:09:35 -0400 Received: by mail-we0-f171.google.com with SMTP id k48so4874285wev.2 for ; Tue, 23 Sep 2014 09:09:33 -0700 (PDT) X-Received: by 10.194.184.166 with SMTP id ev6mr587413wjc.61.1411488573779; Tue, 23 Sep 2014 09:09:33 -0700 (PDT) Received: from lmenx30s.lme.st.com (LPuteaux-656-01-48-212.w82-127.abo.wanadoo.fr. [82.127.83.212]) by mx.google.com with ESMTPSA id hm5sm16427901wjb.2.2014.09.23.09.09.31 for (version=TLSv1.1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 23 Sep 2014 09:09:33 -0700 (PDT) From: Vincent Guittot To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org, preeti@linux.vnet.ibm.com, linux@arm.linux.org.uk, linux-arm-kernel@lists.infradead.org Cc: riel@redhat.com, Morten.Rasmussen@arm.com, efault@gmx.de, nicolas.pitre@linaro.org, linaro-kernel@lists.linaro.org, daniel.lezcano@linaro.org, dietmar.eggemann@arm.com, pjt@google.com, bsegall@google.com, Vincent Guittot Subject: [PATCH v6 3/6] sched: add utilization_avg_contrib Date: Tue, 23 Sep 2014 18:08:02 +0200 Message-Id: <1411488485-10025-4-git-send-email-vincent.guittot@linaro.org> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1411488485-10025-1-git-send-email-vincent.guittot@linaro.org> References: <1411488485-10025-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: list List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Removed-Original-Auth: Dkim didn't pass. X-Original-Sender: vincent.guittot@linaro.org X-Original-Authentication-Results: mx.google.com; spf=pass (google.com: domain of patch+caf_=patchwork-forward=linaro.org@linaro.org designates 209.85.215.42 as permitted sender) smtp.mail=patch+caf_=patchwork-forward=linaro.org@linaro.org Mailing-list: list patchwork-forward@linaro.org; contact patchwork-forward+owners@linaro.org X-Google-Group-Id: 836684582541 List-Post: , List-Help: , List-Archive: List-Unsubscribe: , Add new statistics which reflect the average time a task is running on the CPU and the sum of these running time of the tasks on a runqueue. The latter is named utilization_avg_contrib. This patch is based on the usage metric that was proposed in the 1st versions of the per-entity load tracking patchset by Paul Turner but that has be removed afterward. This version differs from the original one in the sense that it's not linked to task_group. The rq's utilization_avg_contrib will be used to check if a rq is overloaded or not instead of trying to compute how many task a group of CPUs can handle Rename runnable_avg_period into avg_period as it is now used with both runnable_avg_sum and running_avg_sum Add some descriptions of the variables to explain their differences Signed-off-by: Vincent Guittot --- include/linux/sched.h | 19 +++++++++++--- kernel/sched/debug.c | 9 ++++--- kernel/sched/fair.c | 68 +++++++++++++++++++++++++++++++++++++++------------ kernel/sched/sched.h | 8 +++++- 4 files changed, 81 insertions(+), 23 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 48ae6c4..51df220 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1071,15 +1071,26 @@ struct load_weight { }; struct sched_avg { + u64 last_runnable_update; + s64 decay_count; + /* + * utilization_avg_contrib describes the amount of time that a + * sched_entity is running on a CPU. It is based on running_avg_sum + * and is scaled in the range [0..SCHED_LOAD_SCALE]. + * load_avg_contrib described the the amount of time that a + * sched_entity is runnable on a rq. It is based on both + * runnable_avg_sum and the weight of the task. + */ + unsigned long load_avg_contrib, utilization_avg_contrib; /* * These sums represent an infinite geometric series and so are bound * above by 1024/(1-y). Thus we only need a u32 to store them for all * choices of y < 1-2^(-32)*1024. + * runnable_avg_sum represents the amount of time a sched_entity is on + * the runqueue whereas running_avg_sum reflects the time the + * sched_entity is effectively running on the runqueue. */ - u32 runnable_avg_sum, runnable_avg_period; - u64 last_runnable_update; - s64 decay_count; - unsigned long load_avg_contrib; + u32 runnable_avg_sum, avg_period, running_avg_sum; }; #ifdef CONFIG_SCHEDSTATS diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c index c7fe1ea0..1761db9 100644 --- a/kernel/sched/debug.c +++ b/kernel/sched/debug.c @@ -71,7 +71,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group if (!se) { struct sched_avg *avg = &cpu_rq(cpu)->avg; P(avg->runnable_avg_sum); - P(avg->runnable_avg_period); + P(avg->avg_period); return; } @@ -94,7 +94,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group P(se->load.weight); #ifdef CONFIG_SMP P(se->avg.runnable_avg_sum); - P(se->avg.runnable_avg_period); + P(se->avg.avg_period); P(se->avg.load_avg_contrib); P(se->avg.decay_count); #endif @@ -215,6 +215,8 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq) cfs_rq->runnable_load_avg); SEQ_printf(m, " .%-30s: %ld\n", "blocked_load_avg", cfs_rq->blocked_load_avg); + SEQ_printf(m, " .%-30s: %ld\n", "utilization_load_avg", + cfs_rq->utilization_load_avg); #ifdef CONFIG_FAIR_GROUP_SCHED SEQ_printf(m, " .%-30s: %ld\n", "tg_load_contrib", cfs_rq->tg_load_contrib); @@ -631,7 +633,8 @@ void proc_sched_show_task(struct task_struct *p, struct seq_file *m) P(se.load.weight); #ifdef CONFIG_SMP P(se.avg.runnable_avg_sum); - P(se.avg.runnable_avg_period); + P(se.avg.running_avg_sum); + P(se.avg.avg_period); P(se.avg.load_avg_contrib); P(se.avg.decay_count); #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7422044..2cf153d 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -677,8 +677,8 @@ void init_task_runnable_average(struct task_struct *p) p->se.avg.decay_count = 0; slice = sched_slice(task_cfs_rq(p), &p->se) >> 10; - p->se.avg.runnable_avg_sum = slice; - p->se.avg.runnable_avg_period = slice; + p->se.avg.runnable_avg_sum = p->se.avg.running_avg_sum = slice; + p->se.avg.avg_period = slice; __update_task_entity_contrib(&p->se); } #else @@ -1547,7 +1547,7 @@ static u64 numa_get_avg_runtime(struct task_struct *p, u64 *period) *period = now - p->last_task_numa_placement; } else { delta = p->se.avg.runnable_avg_sum; - *period = p->se.avg.runnable_avg_period; + *period = p->se.avg.avg_period; } p->last_sum_exec_runtime = runtime; @@ -2297,7 +2297,8 @@ static u32 __compute_runnable_contrib(u64 n) */ static __always_inline int __update_entity_runnable_avg(u64 now, struct sched_avg *sa, - int runnable) + int runnable, + int running) { u64 delta, periods; u32 runnable_contrib; @@ -2323,7 +2324,7 @@ static __always_inline int __update_entity_runnable_avg(u64 now, sa->last_runnable_update = now; /* delta_w is the amount already accumulated against our next period */ - delta_w = sa->runnable_avg_period % 1024; + delta_w = sa->avg_period % 1024; if (delta + delta_w >= 1024) { /* period roll-over */ decayed = 1; @@ -2336,7 +2337,9 @@ static __always_inline int __update_entity_runnable_avg(u64 now, delta_w = 1024 - delta_w; if (runnable) sa->runnable_avg_sum += delta_w; - sa->runnable_avg_period += delta_w; + if (running) + sa->running_avg_sum += delta_w; + sa->avg_period += delta_w; delta -= delta_w; @@ -2346,20 +2349,26 @@ static __always_inline int __update_entity_runnable_avg(u64 now, sa->runnable_avg_sum = decay_load(sa->runnable_avg_sum, periods + 1); - sa->runnable_avg_period = decay_load(sa->runnable_avg_period, + sa->running_avg_sum = decay_load(sa->running_avg_sum, + periods + 1); + sa->avg_period = decay_load(sa->avg_period, periods + 1); /* Efficiently calculate \sum (1..n_period) 1024*y^i */ runnable_contrib = __compute_runnable_contrib(periods); if (runnable) sa->runnable_avg_sum += runnable_contrib; - sa->runnable_avg_period += runnable_contrib; + if (running) + sa->running_avg_sum += runnable_contrib; + sa->avg_period += runnable_contrib; } /* Remainder of delta accrued against u_0` */ if (runnable) sa->runnable_avg_sum += delta; - sa->runnable_avg_period += delta; + if (running) + sa->running_avg_sum += delta; + sa->avg_period += delta; return decayed; } @@ -2411,7 +2420,7 @@ static inline void __update_tg_runnable_avg(struct sched_avg *sa, /* The fraction of a cpu used by this cfs_rq */ contrib = div_u64((u64)sa->runnable_avg_sum << NICE_0_SHIFT, - sa->runnable_avg_period + 1); + sa->avg_period + 1); contrib -= cfs_rq->tg_runnable_contrib; if (abs(contrib) > cfs_rq->tg_runnable_contrib / 64) { @@ -2464,7 +2473,8 @@ static inline void __update_group_entity_contrib(struct sched_entity *se) static inline void update_rq_runnable_avg(struct rq *rq, int runnable) { - __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable); + __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable, + runnable); __update_tg_runnable_avg(&rq->avg, &rq->cfs); } #else /* CONFIG_FAIR_GROUP_SCHED */ @@ -2482,7 +2492,7 @@ static inline void __update_task_entity_contrib(struct sched_entity *se) /* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */ contrib = se->avg.runnable_avg_sum * scale_load_down(se->load.weight); - contrib /= (se->avg.runnable_avg_period + 1); + contrib /= (se->avg.avg_period + 1); se->avg.load_avg_contrib = scale_load(contrib); } @@ -2501,6 +2511,27 @@ static long __update_entity_load_avg_contrib(struct sched_entity *se) return se->avg.load_avg_contrib - old_contrib; } + +static inline void __update_task_entity_utilization(struct sched_entity *se) +{ + u32 contrib; + + /* avoid overflowing a 32-bit type w/ SCHED_LOAD_SCALE */ + contrib = se->avg.running_avg_sum * scale_load_down(SCHED_LOAD_SCALE); + contrib /= (se->avg.avg_period + 1); + se->avg.utilization_avg_contrib = scale_load(contrib); +} + +static long __update_entity_utilization_avg_contrib(struct sched_entity *se) +{ + long old_contrib = se->avg.utilization_avg_contrib; + + if (entity_is_task(se)) + __update_task_entity_utilization(se); + + return se->avg.utilization_avg_contrib - old_contrib; +} + static inline void subtract_blocked_load_contrib(struct cfs_rq *cfs_rq, long load_contrib) { @@ -2517,7 +2548,7 @@ static inline void update_entity_load_avg(struct sched_entity *se, int update_cfs_rq) { struct cfs_rq *cfs_rq = cfs_rq_of(se); - long contrib_delta; + long contrib_delta, utilization_delta; u64 now; /* @@ -2529,16 +2560,20 @@ static inline void update_entity_load_avg(struct sched_entity *se, else now = cfs_rq_clock_task(group_cfs_rq(se)); - if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq)) + if (!__update_entity_runnable_avg(now, &se->avg, se->on_rq, + cfs_rq->curr == se)) return; contrib_delta = __update_entity_load_avg_contrib(se); + utilization_delta = __update_entity_utilization_avg_contrib(se); if (!update_cfs_rq) return; - if (se->on_rq) + if (se->on_rq) { cfs_rq->runnable_load_avg += contrib_delta; + cfs_rq->utilization_load_avg += utilization_delta; + } else subtract_blocked_load_contrib(cfs_rq, -contrib_delta); } @@ -2615,6 +2650,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq, } cfs_rq->runnable_load_avg += se->avg.load_avg_contrib; + cfs_rq->utilization_load_avg += se->avg.utilization_avg_contrib; /* we force update consideration on load-balancer moves */ update_cfs_rq_blocked_load(cfs_rq, !wakeup); } @@ -2633,6 +2669,7 @@ static inline void dequeue_entity_load_avg(struct cfs_rq *cfs_rq, update_cfs_rq_blocked_load(cfs_rq, !sleep); cfs_rq->runnable_load_avg -= se->avg.load_avg_contrib; + cfs_rq->utilization_load_avg -= se->avg.utilization_avg_contrib; if (sleep) { cfs_rq->blocked_load_avg += se->avg.load_avg_contrib; se->avg.decay_count = atomic64_read(&cfs_rq->decay_counter); @@ -2970,6 +3007,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se) */ update_stats_wait_end(cfs_rq, se); __dequeue_entity(cfs_rq, se); + update_entity_load_avg(se, 1); } update_stats_curr_start(cfs_rq, se); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index f332e45..3ccb136 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -339,8 +339,14 @@ struct cfs_rq { * Under CFS, load is tracked on a per-entity basis and aggregated up. * This allows for the description of both thread and group usage (in * the FAIR_GROUP_SCHED case). + * runnable_load_avg is the sum of the load_avg_contrib of the + * sched_entities on the rq. + * blocked_load_avg is similar to runnable_load_avg except that its + * the blocked sched_entities on the rq. + * utilization_load_avg is the sum of the average running time of the + * sched_entities on the rq. */ - unsigned long runnable_load_avg, blocked_load_avg; + unsigned long runnable_load_avg, blocked_load_avg, utilization_load_avg; atomic64_t decay_counter; u64 last_decay; atomic_long_t removed_load;