From patchwork Tue Nov 8 08:26:10 2016
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 81250
From: Vincent Guittot
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 dietmar.eggemann@arm.com
Cc: yuyang.du@intel.com, Morten.Rasmussen@arm.com, pjt@google.com,
 bsegall@google.com, kernellwp@gmail.com, Vincent Guittot
Subject: [PATCH 4/6 v6] sched: propagate load during synchronous attach/detach
Date: Tue, 8 Nov 2016 09:26:10 +0100
Message-Id: <1478593572-26671-5-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1478593572-26671-1-git-send-email-vincent.guittot@linaro.org>
References: <1478593572-26671-1-git-send-email-vincent.guittot@linaro.org>

When a task moves from/to a cfs_rq, we set a flag which is then used to
propagate the change at parent level (sched_entity and cfs_rq) during
next update. If the cfs_rq is throttled, the flag will stay pending until
the cfs_rq is unthrottled.

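As an illustration only, the flag handling described above can be modelled
outside the kernel. This is a minimal sketch, not the scheduler code: every
toy_* name is hypothetical, one structure stands in for both a cfs_rq and the
group sched_entity that represents it, only utilization is tracked, and the
"next update" is an explicit call.

```c
#include <stddef.h>

/* Hypothetical stand-in for a cfs_rq plus the group entity representing it. */
struct toy_rq {
	struct toy_rq *parent;	/* NULL for the root queue */
	int throttled;		/* non-zero while the queue is throttled */
	int propagate;		/* a change is pending for the parent level */
	long util;		/* utilization accounted on this queue */
	long se_util;		/* utilization of the entity that represents
				 * this queue in its parent */
};

/* Attach a task: account it locally and flag the queue. */
static void toy_attach(struct toy_rq *rq, long task_util)
{
	rq->util += task_util;
	rq->propagate = 1;
}

/* Push one pending change from rq into its parent; returns 1 if it did. */
static int toy_propagate_one(struct toy_rq *rq)
{
	long delta;

	if (!rq->propagate)
		return 0;

	rq->propagate = 0;		/* consume the flag ... */
	rq->parent->propagate = 1;	/* ... and hand it to the parent */

	delta = rq->util - rq->se_util;
	rq->se_util = rq->util;		/* the entity copies the queue's util */
	rq->parent->util += delta;
	return 1;
}

/* Walk towards the root, stopping below the first throttled queue. */
static void toy_propagate(struct toy_rq *rq)
{
	for (; rq->parent; rq = rq->parent) {
		if (rq->parent->throttled)
			break;
		toy_propagate_one(rq);
	}
}

/*
 * Usage: root <- mid <- leaf, one task of utilization 100 attached to leaf.
 * Returns the utilization visible at the root.
 */
static long toy_demo(int throttle_mid, int retry_after_unthrottle)
{
	struct toy_rq root = { NULL, 0, 0, 0, 0 };
	struct toy_rq mid  = { &root, throttle_mid, 0, 0, 0 };
	struct toy_rq leaf = { &mid, 0, 0, 0, 0 };

	toy_attach(&leaf, 100);
	toy_propagate(&leaf);

	if (retry_after_unthrottle) {
		mid.throttled = 0;
		toy_propagate(&leaf);	/* the pending flag is picked up now */
	}
	return root.util;
}
```

In this model toy_demo(0, 0) makes the attach visible at the root right away,
toy_demo(1, 0) leaves the flag pending below the throttled queue, and
toy_demo(1, 1) shows the same change reaching the root once the queue is
unthrottled and the walk is repeated.
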
For propagating the utilization, we copy the utilization of group cfs_rq to
the sched_entity.

For propagating the load, we have to take into account the load of the
whole task group in order to evaluate the load of the sched_entity.
Similarly to what was done before the rewrite of PELT, we add a correction
factor in case the task group's load is greater than its share so it will
contribute the same load as a task of equal weight.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c  | 205 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/sched.h |   1 +
 2 files changed, 205 insertions(+), 1 deletion(-)

-- 
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1d0034c..184b544 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3032,6 +3032,165 @@ static inline void cfs_rq_util_change(struct cfs_rq *cfs_rq)
 }
 
 /*
+ * Signed add and clamp on underflow.
+ *
+ * Explicitly do a load-store to ensure the intermediate value never hits
+ * memory. This allows lockless observations without ever seeing the negative
+ * values.
+ */
+#define add_positive(_ptr, _val) do {				\
+	typeof(_ptr) ptr = (_ptr);				\
+	typeof(_val) val = (_val);				\
+	typeof(*ptr) res, var = READ_ONCE(*ptr);		\
+								\
+	res = var + val;					\
+								\
+	if (val < 0 && res > var)				\
+		res = 0;					\
+								\
+	WRITE_ONCE(*ptr, res);					\
+} while (0)
+
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/* Take into account change of utilization of a child task group */
+static inline void
+update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	struct cfs_rq *gcfs_rq = group_cfs_rq(se);
+	long delta = gcfs_rq->avg.util_avg - se->avg.util_avg;
+
+	/* Nothing to update */
+	if (!delta)
+		return;
+
+	/* Set new sched_entity's utilization */
+	se->avg.util_avg = gcfs_rq->avg.util_avg;
+	se->avg.util_sum = se->avg.util_avg * LOAD_AVG_MAX;
+
+	/* Update parent cfs_rq utilization */
+	add_positive(&cfs_rq->avg.util_avg, delta);
+	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * LOAD_AVG_MAX;
+}
+
+/* Take into account change of load of a child task group */
+static inline void
+update_tg_cfs_load(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	struct cfs_rq *gcfs_rq = group_cfs_rq(se);
+	long delta, load = gcfs_rq->avg.load_avg;
+
+	/*
+	 * If the load of group cfs_rq is null, the load of the
+	 * sched_entity will also be null so we can skip the formula
+	 */
+	if (load) {
+		long tg_load;
+
+		/* Get tg's load and ensure tg_load > 0 */
+		tg_load = atomic_long_read(&gcfs_rq->tg->load_avg) + 1;
+
+		/* Ensure tg_load >= load and updated with current load*/
+		tg_load -= gcfs_rq->tg_load_avg_contrib;
+		tg_load += load;
+
+		/*
+		 * We need to compute a correction term in the case that the
+		 * task group is consuming more cpu than a task of equal
+		 * weight. A task with a weight equals to tg->shares will have
+		 * a load less or equal to scale_load_down(tg->shares).
+		 * Similarly, the sched_entities that represent the task group
+		 * at parent level, can't have a load higher than
+		 * scale_load_down(tg->shares). And the Sum of sched_entities'
+		 * load must be <= scale_load_down(tg->shares).
+		 */
+		if (tg_load > scale_load_down(gcfs_rq->tg->shares)) {
+			/* scale gcfs_rq's load into tg's shares*/
+			load *= scale_load_down(gcfs_rq->tg->shares);
+			load /= tg_load;
+		}
+	}
+
+	delta = load - se->avg.load_avg;
+
+	/* Nothing to update */
+	if (!delta)
+		return;
+
+	/* Set new sched_entity's load */
+	se->avg.load_avg = load;
+	se->avg.load_sum = se->avg.load_avg * LOAD_AVG_MAX;
+
+	/* Update parent cfs_rq load */
+	add_positive(&cfs_rq->avg.load_avg, delta);
+	cfs_rq->avg.load_sum = cfs_rq->avg.load_avg * LOAD_AVG_MAX;
+
+	/*
+	 * If the sched_entity is already enqueued, we also have to update the
+	 * runnable load avg.
+	 */
+	if (se->on_rq) {
+		/* Update parent cfs_rq runnable_load_avg */
+		add_positive(&cfs_rq->runnable_load_avg, delta);
+		cfs_rq->runnable_load_sum = cfs_rq->runnable_load_avg * LOAD_AVG_MAX;
+	}
+}
+
+static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq)
+{
+	/* set cfs_rq's flag */
+	cfs_rq->propagate_avg = 1;
+}
+
+static inline int test_and_clear_tg_cfs_propagate(struct sched_entity *se)
+{
+	/* Get my cfs_rq */
+	struct cfs_rq *cfs_rq = group_cfs_rq(se);
+
+	/* Nothing to propagate */
+	if (!cfs_rq->propagate_avg)
+		return 0;
+
+	/* Clear my cfs_rq's flag */
+	cfs_rq->propagate_avg = 0;
+
+	return 1;
+}
+
+/* Update task and its cfs_rq load average */
+static inline int propagate_entity_load_avg(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+
+	if (entity_is_task(se))
+		return 0;
+
+	if (!test_and_clear_tg_cfs_propagate(se))
+		return 0;
+
+	/* Get parent cfs_rq */
+	cfs_rq = cfs_rq_of(se);
+
+	/* Propagate to parent */
+	set_tg_cfs_propagate(cfs_rq);
+
+	/* Update utilization */
+	update_tg_cfs_util(cfs_rq, se);
+
+	/* Update load */
+	update_tg_cfs_load(cfs_rq, se);
+
+	return 1;
+}
+#else
+static inline int propagate_entity_load_avg(struct sched_entity *se)
+{
+	return 0;
+}
+
+static inline void set_tg_cfs_propagate(struct cfs_rq *cfs_rq) {}
+#endif
+
+/*
  * Unsigned subtract and clamp on underflow.
  *
  * Explicitly do a load-store to ensure the intermediate value never hits
@@ -3112,6 +3271,7 @@ static inline void update_load_avg(struct sched_entity *se, int flags)
 	u64 now = cfs_rq_clock_task(cfs_rq);
 	struct rq *rq = rq_of(cfs_rq);
 	int cpu = cpu_of(rq);
+	int decayed;
 
 	/*
 	 * Track task load average for carrying it to new CPU after migrated, and
@@ -3123,7 +3283,11 @@ static inline void update_load_avg(struct sched_entity *se, int flags)
 			  cfs_rq->curr == se, NULL);
 	}
 
-	if (update_cfs_rq_load_avg(now, cfs_rq, true) && (flags & UPDATE_TG))
+	decayed = update_cfs_rq_load_avg(now, cfs_rq, true);
+
+	decayed |= propagate_entity_load_avg(se);
+
+	if (decayed && (flags & UPDATE_TG))
 		update_tg_load_avg(cfs_rq, 0);
 }
 
@@ -3142,6 +3306,7 @@ static void attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	cfs_rq->avg.load_sum += se->avg.load_sum;
 	cfs_rq->avg.util_avg += se->avg.util_avg;
 	cfs_rq->avg.util_sum += se->avg.util_sum;
+	set_tg_cfs_propagate(cfs_rq);
 
 	cfs_rq_util_change(cfs_rq);
 }
@@ -3161,6 +3326,7 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
 	sub_positive(&cfs_rq->avg.load_sum, se->avg.load_sum);
 	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
 	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
+	set_tg_cfs_propagate(cfs_rq);
 
 	cfs_rq_util_change(cfs_rq);
 }
@@ -8698,6 +8864,28 @@ static inline bool vruntime_normalized(struct task_struct *p)
 	return false;
 }
 
+#ifdef CONFIG_FAIR_GROUP_SCHED
+/*
+ * Propagate the changes of the sched_entity across the tg tree to make it
+ * visible to the root
+ */
+static void propagate_entity_cfs_rq(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq;
+
+	for_each_sched_entity(se) {
+		cfs_rq = cfs_rq_of(se);
+
+		if (cfs_rq_throttled(cfs_rq))
+			break;
+
+		update_load_avg(se, UPDATE_TG);
+	}
+}
+#else
+static void propagate_entity_cfs_rq(struct sched_entity *se) { }
+#endif
+
 static void detach_entity_cfs_rq(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
@@ -8706,6 +8894,12 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
 	update_load_avg(se, 0);
 	detach_entity_load_avg(cfs_rq, se);
 	update_tg_load_avg(cfs_rq, false);
+
+	/*
+	 * Propagate the detach across the tg tree to make it visible to the
+	 * root
+	 */
+	propagate_entity_cfs_rq(se->parent);
 }
 
 static void attach_entity_cfs_rq(struct sched_entity *se)
@@ -8724,6 +8918,12 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
 	update_load_avg(se, sched_feat(ATTACH_AGE_LOAD) ? 0 : SKIP_AGE_LOAD);
 	attach_entity_load_avg(cfs_rq, se);
 	update_tg_load_avg(cfs_rq, false);
+
+	/*
+	 * Propagate the attach across the tg tree to make it visible to the
+	 * root
+	 */
+	propagate_entity_cfs_rq(se->parent);
 }
 
 static void detach_task_cfs_rq(struct task_struct *p)
@@ -8803,6 +9003,9 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
 #endif
 #ifdef CONFIG_SMP
+#ifdef CONFIG_FAIR_GROUP_SCHED
+	cfs_rq->propagate_avg = 0;
+#endif
 	atomic_long_set(&cfs_rq->removed_load_avg, 0);
 	atomic_long_set(&cfs_rq->removed_util_avg, 0);
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2646244..a9c7527 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -404,6 +404,7 @@ struct cfs_rq {
 	unsigned long runnable_load_avg;
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	unsigned long tg_load_avg_contrib;
+	unsigned long propagate_avg;
 #endif
 	atomic_long_t removed_load_avg, removed_util_avg;
 #ifndef CONFIG_64BIT
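
For readers who want to check the arithmetic of the load correction in
update_tg_cfs_load() and the clamping done by add_positive(), here is a
standalone sketch in plain C. It is an approximation under stated
assumptions: the toy_* functions and their parameters are hypothetical,
the values are passed in directly instead of being read from a cfs_rq or
task group, and READ_ONCE()/WRITE_ONCE() are left out because nothing here
is shared between CPUs.

```c
/*
 * Load that the group entity would carry at the parent level.
 *   load        - current load_avg of the group's queue on this CPU
 *   tg_load_avg - sum of the contributions of all of the group's queues
 *   contrib     - what this queue last contributed to tg_load_avg
 *   shares      - the group's weight, already scaled down
 * All four names are assumptions made for this sketch.
 */
static long toy_group_se_load(long load, long tg_load_avg, long contrib,
			      long shares)
{
	long tg_load;

	/* No load on the queue means no load on the entity. */
	if (!load)
		return 0;

	tg_load = tg_load_avg + 1;	/* keep the divisor above zero */
	tg_load -= contrib;		/* drop the stale contribution ... */
	tg_load += load;		/* ... and add the current load */

	/* Scale down only when the whole group exceeds its shares. */
	if (tg_load > shares) {
		load *= shares;
		load /= tg_load;
	}
	return load;
}

/* Signed add to an unsigned value, clamped to 0 on underflow. */
static unsigned long toy_add_positive(unsigned long var, long val)
{
	unsigned long res = var + val;	/* wraps if val is too negative */

	if (val < 0 && res > var)
		res = 0;
	return res;
}
```

With shares of 1024 and two queues that each carry a load of 1024,
toy_group_se_load(1024, 2048, 1024, 1024) scales the entity load to
1024 * 1024 / 2049 = 511, so the two entities together stay below the
group's shares. A group whose total load is under its shares, for example
toy_group_se_load(300, 300, 300, 1024), is passed through unchanged.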