From patchwork Tue Apr 12 11:56:45 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 65611
Delivered-To: patch@linaro.org
Received: by 10.140.93.198 with SMTP id d64csp1873815qge;
 Tue, 12 Apr 2016 04:56:54 -0700 (PDT)
X-Received: by 10.66.123.105 with SMTP id lz9mr4009505pab.37.1460462214327; 
 Tue, 12 Apr 2016 04:56:54 -0700 (PDT)
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67])
 by mx.google.com with ESMTP id
 fv10si9987040pac.231.2016.04.12.04.56.54; 
 Tue, 12 Apr 2016 04:56:54 -0700 (PDT)
Received-SPF: pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender) client-ip=209.132.180.67; 
Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org;
 spf=pass (google.com: best guess record for domain of
 linux-kernel-owner@vger.kernel.org designates 209.132.180.67
 as permitted sender)
 smtp.mailfrom=linux-kernel-owner@vger.kernel.org; 
 dmarc=pass (p=NONE dis=NONE) header.from=linaro.org
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S933540AbcDLL4w (ORCPT <rfc822;pramod.gurav@linaro.org>
 + 29 others); Tue, 12 Apr 2016 07:56:52 -0400
Received: from mail-wm0-f42.google.com ([74.125.82.42]:37627 "EHLO
 mail-wm0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1755868AbcDLL4u (ORCPT
 <rfc822;linux-kernel@vger.kernel.org>);
 Tue, 12 Apr 2016 07:56:50 -0400
Received: by mail-wm0-f42.google.com with SMTP id n3so24841440wmn.0
 for <linux-kernel@vger.kernel.org>;
 Tue, 12 Apr 2016 04:56:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; 
 h=date:from:to:cc:subject:message-id:references:mime-version
 :content-disposition:content-transfer-encoding:in-reply-to
 :user-agent; bh=wl1M9F/b0WujSd3MmP+677uNNwNyDY/LWGaAtxcRHDw=;
 b=MvRXxY3sp7LCKBdTWrSURUOP5yniT3gbbpfSagrZ8rMBnADCumgxNr9J4fL0kadZwb
 GDwNZJpqD6T05JgRtBJa8QC4gU5DQtVCzc50gWqJzyCo9Z9fVM0C8OGHktQsgi/KLx0d
 9y1jtkyrPKAhBcYgSLr6QvThkNbDgwjd0BJHU=
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20130820;
 h=x-gm-message-state:date:from:to:cc:subject:message-id:references
 :mime-version:content-disposition:content-transfer-encoding
 :in-reply-to:user-agent;
 bh=wl1M9F/b0WujSd3MmP+677uNNwNyDY/LWGaAtxcRHDw=;
 b=enQUWzABLlM9XHOW6lBRLK/TEvdNZv0nLflOcnYVUENAeO+lA7+D92/BbjfGostxRQ
 JU/f3nJhAM9pKTunaraDpWx5II6xYbM1w3vZdQGDzEWq4+xbrWhr8NOeudNL2dM0yTKY
 vgHAqFl5qtk9/01brLP4RGMyV4jYjkjkAJQdt6IvqXasPTcUybegDheqW1d/xsamFWVD
 3MLhm1vKIakzUwsebkuD4NZ1FUyto6BUsogyFaXAS39Phg6IzUjFADDzHeMMj8FTpihi
 1rKmGuAVoOxNablF86d/JCUc/XSVFHPDWoQzQprPmSWuvOm2EzFf8TNacBlD85iFvYzt
 88vg==
X-Gm-Message-State: AOPr4FUeHvOS1Ktu5zOBqJ1x2+dxSzXdpcRh/K1s8iiPgbSTuy/tg7sl97LOUDHIbz7MDbl6
X-Received: by 10.194.3.20 with SMTP id 20mr3634323wjy.59.1460462208898;
 Tue, 12 Apr 2016 04:56:48 -0700 (PDT)
Received: from vingu-laptop ([2a01:e35:8bd4:7750:6c62:746c:8110:849f])
 by smtp.gmail.com with ESMTPSA id
 lh1sm33020191wjb.20.2016.04.12.04.56.47
 (version=TLS1_2 cipher=AES128-SHA bits=128/128);
 Tue, 12 Apr 2016 04:56:48 -0700 (PDT)
Date: Tue, 12 Apr 2016 13:56:45 +0200
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Yuyang Du <yuyang.du@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@kernel.org>,
 linux-kernel <linux-kernel@vger.kernel.org>,
 Benjamin Segall <bsegall@google.com>, Paul Turner <pjt@google.com>,
 Morten Rasmussen <morten.rasmussen@arm.com>,
 Dietmar Eggemann <dietmar.eggemann@arm.com>,
 Juri Lelli <juri.lelli@arm.com>
Subject: Re: [PATCH 2/4] sched/fair: Drop out incomplete current period when
 sched averages accrue
Message-ID: <20160412115645.GA3730@vingu-laptop>
References: <1460327765-18024-1-git-send-email-yuyang.du@intel.com>
 <1460327765-18024-3-git-send-email-yuyang.du@intel.com>
 <CAKfTPtDP4uUWevKFjjYiqyAyVPaFrLmtn11wSRMpYD60O2Jc=w@mail.gmail.com>
 <20160411194141.GH8697@intel.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <20160411194141.GH8697@intel.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Le Tuesday 12 Apr 2016 à 03:41:41 (+0800), Yuyang Du a écrit :
> Hi Vincent,
> 
> On Mon, Apr 11, 2016 at 11:08:04AM +0200, Vincent Guittot wrote:
> > > @@ -2704,11 +2694,14 @@ static __always_inline int
> > >  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > >                   unsigned long weight, int running, struct cfs_rq *cfs_rq)
> > >  {
> > > -       u64 delta, scaled_delta, periods;
> > > -       u32 contrib;
> > > -       unsigned int delta_w, scaled_delta_w, decayed = 0;
> > > +       u64 delta;
> > > +       u32 contrib, periods;
> > >         unsigned long scale_freq, scale_cpu;
> > >
> > > +       /*
> > > +        * now rolls down to a period boundary
> > > +        */
> > > +       now = now && (u64)(~0xFFFFF);
> > >         delta = now - sa->last_update_time;
> > >         /*
> > >          * This should only happen when time goes backwards, which it
> > > @@ -2720,89 +2713,56 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
> > >         }
> > >
> > >         /*
> > > -        * Use 1024ns as the unit of measurement since it's a reasonable
> > > -        * approximation of 1us and fast to compute.
> > > +        * Use 1024*1024ns as an approximation of 1ms period, pretty close.
> > >          */
> > > -       delta >>= 10;
> > > -       if (!delta)
> > > +       periods = delta >> 20;
> > > +       if (!periods)
> > >                 return 0;
> > >         sa->last_update_time = now;
> > 
> > The optimization looks quite interesting but I see one potential issue
> > with migration as we will lose the part of the ongoing period that is
> > now not saved anymore. This lost part can be quite significant for a
> > short task that ping pongs between CPUs.
> 
> Yes, basically, it is we lose precision (~1ms scale in contrast with ~1us scale).

But with a HZ set to 1000 and a sched-slice in the same range, having a precision
of 1ms instead of 1us makes the precision of load tracking of short task quite
random IMHO.

you can keep recording this partial period without using it in the load tracking.
Something like below keep precision without sacrifying the optimization.

---
 kernel/sched/fair.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

> But as I wrote, we may either lose a sub-1ms, or gain a sub-1ms, statistically,
> they should even out, given the load/util updates are quite a large number of
> samples, and we do want a lot of samples for the metrics, this is the point of
> the entire average thing. Plus, as you also said, the incomplete current period
> also plays a (somewhat) negative role here.
> 
> In addition, excluding the flat hierarchical util patch, we gain quite some
> efficiency.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 68273e8..b234169 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -674,6 +674,12 @@ void init_entity_runnable_average(struct sched_entity *se)
 	struct sched_avg *sa = &se->avg;
 
 	sa->last_update_time = 0;
+	/*
+	 * sched_avg's period_contrib should be strictly less then 1024 * 1024, so
+	 * we give it 1023 * 1024 to make sure it is almost a period (1024us), and
+	 * will definitely be updated (after enqueue).
+	 */
+	sa->period_contrib = 1023*1024;
 	sa->load_avg = scale_load_down(se->load.weight);
 	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 	/*
@@ -2698,10 +2704,6 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 	u32 contrib, periods;
 	unsigned long scale_freq, scale_cpu;
 
-	/*
-	 * now rolls down to a period boundary
-	 */
-	now = now && (u64)(~0xFFFFF);
 	delta = now - sa->last_update_time;
 	/*
 	 * This should only happen when time goes backwards, which it
@@ -2712,6 +2714,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	}
 
+	/* Add how much left for the current period */
+	delta += sa->period_contrib;
+
 	/*
 	 * Use 1024*1024ns as an approximation of 1ms period, pretty close.
 	 */
@@ -2720,6 +2725,9 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
 		return 0;
 	sa->last_update_time = now;
 
+	/* Get how much left for the next period */
+	sa->period_contrib = delta & (u64)(0xFFFFF);
+
 	scale_freq = arch_scale_freq_capacity(NULL, cpu);
 	scale_cpu = arch_scale_cpu_capacity(NULL, cpu);