From patchwork Wed Oct  1 13:38:55 2014
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 38235
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org, riel@redhat.com,
	linux-kernel@vger.kernel.org
Cc: linaro-kernel@lists.linaro.org, Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v2] sched: fix spurious active migration
Date: Wed, 1 Oct 2014 15:38:55 +0200
Message-Id: <1412170735-5356-1-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1412066468-4340-1-git-send-email-vincent.guittot@linaro.org>
References: <1412066468-4340-1-git-send-email-vincent.guittot@linaro.org>

Since commit caeb178c60f4 ("sched/fair: Make update_sd_pick_busiest() ..."),
sd_pick_busiest returns a group that can be neither imbalanced nor
overloaded but is only more loaded than others. This change was introduced
to ensure a better load balance in systems that are not overloaded, but as
a side effect it can also generate useless active migrations between
groups.

Let's take the example of 3 tasks on a quad-core system.
We will always have an idle core, so the load balancer will find a busiest
group (core) whenever an ILB is triggered, and it will force an active
migration (once above the nr_balance_failed threshold) so that the idle
core becomes busy but another core becomes idle. With the next ILB, the
freshly idle core will try to pull the task off a busy CPU.

The number of spurious active migrations is not so large on a quad-core
system because the ILB is not triggered that often. But it becomes
significant as soon as you have more than one sched_domain level, like on
a dual cluster of quad cores, where the ILB is triggered every tick
whenever you have more than 1 busy_cpu.

We need to ensure that the migration generates a real improvement and does
not merely move the avg_load imbalance onto another CPU. Before
caeb178c60f4f93f1b45c0bc056b5cf6d217b67f, the filtering of such use cases
was ensured by the following test in f_b_g:

	if ((local->idle_cpus < busiest->idle_cpus) &&
	    busiest->sum_nr_running <= busiest->group_weight)

This patch modifies the condition to take into account the situation where
the busiest group is not overloaded: if the difference between the number
of idle cpus in the 2 groups is less than or equal to 1 and the busiest
group is not overloaded, moving a task will not improve the load balance
but will just move it.

A test with sysbench on a dual cluster of quad cores gives the following
results:

command: sysbench --test=cpu --num-threads=5 --max-time=5 run

The HZ is 200, which means that 1000 ticks fired during the 5-second test.

- With mainline, perf gives the following figures:

Samples: 727  of event 'sched:sched_migrate_task'
Event count (approx.): 727
Overhead  Command          Shared Object  Symbol
........  ...............  .............  ..............
  12.52%  migration/1      [unknown]      [.] 00000000
  12.52%  migration/5      [unknown]      [.] 00000000
  12.52%  migration/7      [unknown]      [.] 00000000
  12.10%  migration/6      [unknown]      [.] 00000000
  11.83%  migration/0      [unknown]      [.] 00000000
  11.83%  migration/3      [unknown]      [.] 00000000
  11.14%  migration/4      [unknown]      [.] 00000000
  10.87%  migration/2      [unknown]      [.] 00000000
   2.75%  sysbench         [unknown]      [.] 00000000
   0.83%  swapper          [unknown]      [.] 00000000
   0.55%  ktps65090charge  [unknown]      [.] 00000000
   0.41%  mmcqd/1          [unknown]      [.] 00000000
   0.14%  perf             [unknown]      [.] 00000000

- With this patch, perf gives the following figures:

Samples: 20  of event 'sched:sched_migrate_task'
Event count (approx.): 20
Overhead  Command          Shared Object  Symbol
........  ...............  .............  ..............
  80.00%  sysbench         [unknown]      [.] 00000000
  10.00%  swapper          [unknown]      [.] 00000000
   5.00%  ktps65090charge  [unknown]      [.] 00000000
   5.00%  migration/1      [unknown]      [.] 00000000

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
Change since v1:
- reorder and rework the conditions

 kernel/sched/fair.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 2a1e6ac..3bc67ba 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6425,13 +6425,14 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 
 	if (env->idle == CPU_IDLE) {
 		/*
-		 * This cpu is idle. If the busiest group load doesn't
-		 * have more tasks than the number of available cpu's and
-		 * there is no imbalance between this and busiest group
-		 * wrt to idle cpu's, it is balanced.
+		 * This cpu is idle. If the busiest group is not overloaded
+		 * and there is no imbalance between this and busiest group
+		 * wrt idle cpus, it is balanced. The imbalance becomes
+		 * significant if the diff is greater than 1 otherwise we
+		 * might end up to just move the imbalance on another group
 		 */
-		if ((local->idle_cpus < busiest->idle_cpus) &&
-		    busiest->sum_nr_running <= busiest->group_weight)
+		if ((busiest->group_type != group_overloaded) &&
+		    (local->idle_cpus <= (busiest->idle_cpus + 1)))
 			goto out_balanced;
 	} else {
 		/*