From patchwork Thu Aug 18 08:40:55 2016
Date: Thu, 18 Aug 2016 09:40:55 +0100
From: Morten Rasmussen
To: Peter Zijlstra
Cc: mingo@redhat.com, dietmar.eggemann@arm.com, yuyang.du@intel.com,
    vincent.guittot@linaro.org, mgalbraith@suse.de, sgurrappadi@nvidia.com,
    freedom.tan@mediatek.com, keita.kobayashi.ym@renesas.com,
    linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 10/13] sched/fair: Compute task/cpu utilization at wake-up more correctly
Message-ID: <20160818084053.GG3391@e105550-lin.cambridge.arm.com>
In-Reply-To: <20160815154237.GE3391@e105550-lin.cambridge.arm.com>

On Mon, Aug 15, 2016 at 04:42:37PM +0100, Morten Rasmussen wrote:
> On Mon, Aug 15, 2016 at 04:23:42PM +0200, Peter Zijlstra wrote:
> > But unlike that function, it doesn't actually use __update_load_avg().
> > Why not?
>
> Fair question :)
>
> We currently exploit the fact that the task utilization is _not_ updated
> in wake-up balancing to make sure we don't under-estimate the capacity
> requirements for tasks that have slept for a while. If we update it, we
> lose the non-decayed 'peak' utilization, but I guess we could just
> store it somewhere when we do the wake-up decay.
>
> I thought there was a better reason when I wrote the patch, but I don't
> recall right now. I will look into it again and see if we can use
> __update_load_avg() to do a proper update instead of doing things twice.
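For readers less familiar with PELT, a rough, standalone sketch of the decay
being discussed above. It is illustrative only: the helper and numbers are
made up, and the kernel does this through __update_load_avg() over 1024us
periods with a lookup table rather than a closed-form power.

/*
 * Illustrative only: a user-space approximation of how a sleeping task's
 * util_avg decays. In PELT the signal roughly halves every 32ms of sleep
 * (y^32 = 0.5), so syncing a task that slept for a while loses the
 * non-decayed 'peak' value mentioned above.
 */
#include <math.h>
#include <stdio.h>

static unsigned long decayed_util(unsigned long util_avg, unsigned long sleep_ms)
{
	return (unsigned long)(util_avg * pow(0.5, sleep_ms / 32.0));
}

int main(void)
{
	/* A task that ran at util ~400, then slept 64ms: the peak of 400 becomes ~100 */
	printf("%lu\n", decayed_util(400, 64));
	return 0;
}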
AFAICT, we should be able to synchronize the task utilization to the
previous rq utilization using __update_load_avg() as you suggest. The
patch below should work as a replacement without any changes to
subsequent patches. It doesn't solve the under-estimation issue, but I
have another patch for that.

---8<---

>From 43226a896fad077c3ab4932f797df17159779d6e Mon Sep 17 00:00:00 2001
From: Morten Rasmussen
Date: Thu, 28 Apr 2016 09:52:35 +0100
Subject: [PATCH] sched/fair: Compute task/cpu utilization at wake-up more correctly

At task wake-up, load-tracking isn't updated until the task is enqueued.
The task's own view of its utilization contribution may therefore not be
aligned with its contribution to the cfs_rq load-tracking, which may have
been updated in the meantime. Basically, the task's own utilization hasn't
yet accounted for the sleep decay, while the cfs_rq may have (partially).
Estimating the cfs_rq utilization in case the task is migrated at wake-up
as task_rq(p)->cfs.avg.util_avg - p->se.avg.util_avg is therefore
incorrect, as the two load-tracking signals aren't time synchronized
(different last update).

To solve this problem, this patch synchronizes the task utilization with
its previous rq before the task utilization is used in the wake-up path.
Currently the update/synchronization is done _after_ the task has been
placed by select_task_rq_fair(). The synchronization is done without
having to take the rq lock, using the existing mechanism used in
remove_entity_load_avg().

cc: Ingo Molnar
cc: Peter Zijlstra

Signed-off-by: Morten Rasmussen
---
 kernel/sched/fair.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

--
1.9.1

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9e217eff3daf..8b6b8f9da28d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3119,13 +3119,25 @@ static inline u64 cfs_rq_last_update_time(struct cfs_rq *cfs_rq)
 #endif
 
 /*
+ * Synchronize entity load avg of dequeued entity without locking
+ * the previous rq.
+ */
+void sync_entity_load_avg(struct sched_entity *se)
+{
+	struct cfs_rq *cfs_rq = cfs_rq_of(se);
+	u64 last_update_time;
+
+	last_update_time = cfs_rq_last_update_time(cfs_rq);
+	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+}
+
+/*
  * Task first catches up with cfs_rq, and then subtract
  * itself from the cfs_rq (task must be off the queue now).
  */
 void remove_entity_load_avg(struct sched_entity *se)
 {
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-	u64 last_update_time;
 
 	/*
 	 * tasks cannot exit without having gone through wake_up_new_task() ->
@@ -3137,9 +3149,7 @@ void remove_entity_load_avg(struct sched_entity *se)
 	 * calls this.
 	 */
 
-	last_update_time = cfs_rq_last_update_time(cfs_rq);
-
-	__update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)), &se->avg, 0, 0, NULL);
+	sync_entity_load_avg(se);
 	atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
 	atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
 }
@@ -5377,6 +5387,24 @@ static inline int task_util(struct task_struct *p)
 	return p->se.avg.util_avg;
 }
 
+/*
+ * cpu_util_wake: Compute cpu utilization with any contributions from
+ * the waking task p removed.
+ */
+static int cpu_util_wake(int cpu, struct task_struct *p)
+{
+	unsigned long util, capacity;
+
+	/* Task has no contribution or is new */
+	if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
+		return cpu_util(cpu);
+
+	capacity = capacity_orig_of(cpu);
+	util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+
+	return (util >= capacity) ? capacity : util;
+}
+
 static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 {
 	long min_cap, max_cap;
@@ -5388,6 +5416,9 @@ static int wake_cap(struct task_struct *p, int cpu, int prev_cpu)
 	if (max_cap - min_cap < max_cap >> 3)
 		return 0;
 
+	/* Bring task utilization in sync with prev_cpu */
+	sync_entity_load_avg(&p->se);
+
 	return min_cap * 1024 < task_util(p) * capacity_margin;
 }
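To make the clamping in the new cpu_util_wake() a bit more concrete, here is
a small user-space mock (illustrative only; the function name, signature and
numbers are made up and are not part of the patch). It shows how the waking
task's own, now-synced, contribution is removed from its previous CPU's
tracked utilization, why the subtraction is clamped at zero, and why the
result is capped at the CPU's original capacity.

#include <stdio.h>

/*
 * User-space mock of the clamping done in cpu_util_wake() above.
 * cpu_util_avg:  tracked utilization of the task's previous CPU
 * task_util:     the waking task's (synced) utilization contribution
 * capacity_orig: the CPU's original capacity
 */
static long cpu_util_wake_mock(long cpu_util_avg, long task_util, long capacity_orig)
{
	long util = cpu_util_avg - task_util;

	if (util < 0)		/* the two signals can still disagree slightly */
		util = 0;
	return util >= capacity_orig ? capacity_orig : util;
}

int main(void)
{
	/* prev CPU tracks util 600, including the waking task's 250 */
	printf("%ld\n", cpu_util_wake_mock(600, 250, 1024));	/* 350 */
	/* task contribution exceeds the tracked value: clamp to 0 */
	printf("%ld\n", cpu_util_wake_mock(200, 250, 1024));	/* 0 */
	return 0;
}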