From patchwork Wed Dec 21 15:50:26 2016
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 88756
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org, pjt@google.com,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v2] sched: fix group_entity's share update
Date: Wed, 21 Dec 2016 16:50:26 +0100
Message-Id: <1482335426-7664-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: stable@vger.kernel.org

The share of a cfs_rq is updated when its load_avg is updated, but
before the group_entity's load_avg has been updated for the past time
slot. This generates wrong load_avg accounting, which can be
significant when small tasks are involved in the scheduling.

Take the example of a task a that is dequeued from its task group A:

      root
     (cfs_rq)
       \
       (se)
        A
      (cfs_rq)
        \
        (se)
         a

Task "a" was the only task in task group A, which becomes idle when a
is dequeued.

We have the sequence:

- dequeue_entity a->se
    - update_load_avg(a->se)
    - dequeue_entity_load_avg(A->cfs_rq, a->se)
    - update_cfs_shares(A->cfs_rq)
      A->cfs_rq->load.weight == 0
      A->se->load.weight is updated with the new share (0 in this case)
- dequeue_entity A->se
    - update_load_avg(A->se)
      but its weight is now null, so the last time slot (up to a tick)
      is accounted with a weight of 0 instead of the real weight the
      entity had during that slot. The last time slot is accounted as
      an idle one whereas it was a running one.

If the running time of task a is short enough that no tick happens
while it runs, all of the running time of group entity A->se will be
accounted as idle time.
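The effect can be made concrete with a toy model. This is only a
sketch: the geometric decay factor, the 1ms slot granularity and the
toy_se type are made up for illustration and stand in for the kernel's
actual PELT arithmetic, which works differently in detail:

#include <stdio.h>

/*
 * Toy load tracking: each 1ms slot contributes the entity's current
 * weight, and the tracked average decays geometrically (real PELT
 * uses y ~= 0.9785 per 1024us period, so that y^32 == 0.5).
 */
struct toy_se {
	unsigned long weight;	/* instantaneous weight of the entity */
	double load_avg;	/* tracked load average */
	unsigned long last;	/* last slot already accounted */
};

static void toy_update_load_avg(struct toy_se *se, unsigned long now)
{
	while (se->last < now) {
		se->load_avg = se->load_avg * 0.9785
			     + (double)se->weight * (1.0 - 0.9785);
		se->last++;
	}
}

int main(void)
{
	/* a group entity that ran with weight 1024 during slots 0..2 */
	struct toy_se buggy = { .weight = 1024 };
	struct toy_se fixed = { .weight = 1024 };

	toy_update_load_avg(&buggy, 2);
	toy_update_load_avg(&fixed, 2);

	/*
	 * Buggy ordering: the weight is zeroed (as update_cfs_shares()
	 * does) before the load average is synced, so the last running
	 * slot 2..3 is accounted as idle.
	 */
	buggy.weight = 0;
	toy_update_load_avg(&buggy, 3);

	/*
	 * Fixed ordering: sync the load average up to now first, then
	 * apply the new (zero) weight.
	 */
	toy_update_load_avg(&fixed, 3);
	fixed.weight = 0;

	printf("buggy load_avg: %.1f\n", buggy.load_avg);	/* lower */
	printf("fixed load_avg: %.1f\n", fixed.load_avg);
	return 0;
}

The shorter the running time relative to a slot, the larger the share
of it that the buggy ordering throws away.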
Instead, we should update the share of a cfs_rq (in fact the weight of
its group entity) only after having updated the load_avg of the group
entity. update_cfs_shares() now takes the sched_entity as its parameter
instead of the cfs_rq, and the weight of the group entity is updated
only once its load_avg has been synced with the current time.

Cc: <stable@vger.kernel.org> #v4.4+
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c | 53 +++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 39 insertions(+), 14 deletions(-)
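For readers unfamiliar with the helper: group_cfs_rq(se) returns
se->my_q, the cfs_rq owned by a group entity, and NULL for a task
entity. The sketch below (simplified standalone types, not the kernel
code) shows why the new NULL check in the hunk that follows makes
update_cfs_shares() a safe no-op when the entity is not a group entity:

/* Simplified sketch of the new convention; kernel types elided. */
struct cfs_rq;				/* opaque here */

struct sched_entity {
	struct cfs_rq *my_q;		/* owned queue: NULL for a task */
};

/* mirrors the kernel helper: the cfs_rq this group entity owns */
static struct cfs_rq *group_cfs_rq(struct sched_entity *se)
{
	return se->my_q;
}

static void update_cfs_shares(struct sched_entity *se)
{
	struct cfs_rq *cfs_rq = group_cfs_rq(se);

	if (!cfs_rq)
		return;	/* task entity: no group share to recompute */

	/* ... compute shares from cfs_rq's task group, reweight se ... */
}

int main(void)
{
	struct sched_entity task_se = { .my_q = 0 };	/* a task */

	update_cfs_shares(&task_se);	/* harmless no-op for a task */
	return 0;
}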
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6559d19..b998287 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2689,16 +2689,20 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 
 static inline int throttled_hierarchy(struct cfs_rq *cfs_rq);
 
-static void update_cfs_shares(struct cfs_rq *cfs_rq)
+static void update_cfs_shares(struct sched_entity *se)
 {
-	struct task_group *tg;
-	struct sched_entity *se;
 	long shares;
+	struct task_group *tg;
+	struct cfs_rq *cfs_rq = group_cfs_rq(se);
 
-	tg = cfs_rq->tg;
-	se = tg->se[cpu_of(rq_of(cfs_rq))];
-	if (!se || throttled_hierarchy(cfs_rq))
+	if (!cfs_rq)
 		return;
+
+	if (throttled_hierarchy(cfs_rq))
+		return;
+
+	tg = cfs_rq->tg;
+
 #ifndef CONFIG_SMP
 	if (likely(se->load.weight == tg->shares))
 		return;
@@ -2707,8 +2711,10 @@ static void update_cfs_shares(struct cfs_rq *cfs_rq)
 
 	reweight_entity(cfs_rq_of(se), se, shares);
 }
+
+
 #else /* CONFIG_FAIR_GROUP_SCHED */
-static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
+static inline void update_cfs_shares(struct sched_entity *se)
 {
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */
@@ -3582,10 +3588,18 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	if (renorm && !curr)
 		se->vruntime += cfs_rq->min_vruntime;
 
+	/*
+	 * When enqueuing a sched_entity, we must:
+	 *   - Update loads to have both entity and cfs_rq synced with now.
+	 *   - Add its load to cfs_rq->runnable_avg
+	 *   - For group_entity, update its weight to reflect the new share of
+	 *     its group cfs_rq
+	 *   - Add its new weight to cfs_rq->load.weight
+	 */
 	update_load_avg(se, UPDATE_TG);
 	enqueue_entity_load_avg(cfs_rq, se);
+	update_cfs_shares(se);
 	account_entity_enqueue(cfs_rq, se);
-	update_cfs_shares(cfs_rq);
 
 	if (flags & ENQUEUE_WAKEUP)
 		place_entity(cfs_rq, se, 0);
@@ -3657,6 +3671,15 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/*
 	 * Update run-time statistics of the 'current'.
 	 */
 	update_curr(cfs_rq);
+
+	/*
+	 * When dequeuing a sched_entity, we must:
+	 *   - Update loads to have both entity and cfs_rq synced with now.
+	 *   - Subtract its load from the cfs_rq->runnable_avg.
+	 *   - Subtract its previous weight from cfs_rq->load.weight.
+	 *   - For group entity, update its weight to reflect the new share
+	 *     of its group cfs_rq.
+	 */
 	update_load_avg(se, UPDATE_TG);
 	dequeue_entity_load_avg(cfs_rq, se);
@@ -3681,7 +3704,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	/* return excess runtime on last dequeue */
 	return_cfs_rq_runtime(cfs_rq);
 
-	update_cfs_shares(cfs_rq);
+	update_cfs_shares(se);
 
 	/*
 	 * Now advance min_vruntime if @se was the entity holding it back,
@@ -3864,7 +3887,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
 	 * Ensure that runnable average is periodically updated.
 	 */
 	update_load_avg(curr, UPDATE_TG);
-	update_cfs_shares(cfs_rq);
+	update_cfs_shares(curr);
 
 #ifdef CONFIG_SCHED_HRTICK
 	/*
@@ -4761,7 +4784,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 			break;
 
 		update_load_avg(se, UPDATE_TG);
-		update_cfs_shares(cfs_rq);
+		update_cfs_shares(se);
 	}
 
 	if (!se)
@@ -4820,7 +4843,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 			break;
 
 		update_load_avg(se, UPDATE_TG);
-		update_cfs_shares(cfs_rq);
+		update_cfs_shares(se);
 	}
 
 	if (!se)
@@ -9356,8 +9379,10 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
 
 		/* Possible calls to update_curr() need rq clock */
 		update_rq_clock(rq);
-		for_each_sched_entity(se)
-			update_cfs_shares(group_cfs_rq(se));
+		for_each_sched_entity(se) {
+			update_load_avg(se, UPDATE_TG);
+			update_cfs_shares(se);
+		}
 		raw_spin_unlock_irqrestore(&rq->lock, flags);
 	}
 
-- 
2.7.4