From patchwork Thu Aug 1 14:40:17 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170379
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 1/8] sched/fair: clean up asym packing
Date: Thu, 1 Aug 2019 16:40:17 +0200
Message-Id: <1564670424-26023-2-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

Clean up asym packing to follow the default load balance behavior: - classify the group by creating a group_asym_capacity
field. - calculate the imbalance in calculate_imbalance() instead of bypassing it. We don't need to test twice same conditions anymore to detect asym packing and we consolidate the calculation of imbalance in calculate_imbalance(). There is no functional changes. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 63 ++++++++++++++--------------------------------------- 1 file changed, 16 insertions(+), 47 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index fb75c0b..b432349 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7743,6 +7743,7 @@ struct sg_lb_stats { unsigned int group_weight; enum group_type group_type; int group_no_capacity; + int group_asym_capacity; unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -8197,9 +8198,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, * ASYM_PACKING needs to move all the work to the highest * prority CPUs in the group, therefore mark all groups * of lower priority than ourself as busy. + * + * This is primarily intended to used at the sibling level. Some + * cores like POWER7 prefer to use lower numbered SMT threads. In the + * case of POWER7, it can move to lower SMT modes only when higher + * threads are idle. When in lower SMT modes, the threads will + * perform better since they share less core resources. Hence when we + * have idle threads, we want them to be the higher ones. */ if (sgs->sum_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { + sgs->group_asym_capacity = 1; if (!sds->busiest) return true; @@ -8341,51 +8350,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * check_asym_packing - Check to see if the group is packed into the - * sched domain. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - * - * This packing function is run on idle threads. It checks to see if - * the busiest CPU in this domain (core in the P7 case) has a higher - * CPU number than the packing function is being run on. Here we are - * assuming lower CPU number will be equivalent to lower a SMT thread - * number. - * - * Return: 1 when packing is required and a task should be moved to - * this CPU. The amount of the imbalance is returned in env->imbalance. - * - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain which is to be packed - */ -static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) -{ - int busiest_cpu; - - if (!(env->sd->flags & SD_ASYM_PACKING)) - return 0; - - if (env->idle == CPU_NOT_IDLE) - return 0; - - if (!sds->busiest) - return 0; - - busiest_cpu = sds->busiest->asym_prefer_cpu; - if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) - return 0; - - env->imbalance = sds->busiest_stat.group_load; - - return 1; -} - -/** * fix_small_imbalance - Calculate the minor imbalance that exists * amongst the groups of a sched_domain, during * load balancing. 
@@ -8469,6 +8433,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s local = &sds->local_stat; busiest = &sds->busiest_stat; + if (busiest->group_asym_capacity) { + env->imbalance = busiest->group_load; + return; + } + if (busiest->group_type == group_imbalanced) { /* * In the group_imb case we cannot rely on group-wide averages @@ -8573,8 +8542,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest = &sds.busiest_stat; /* ASYM feature bypasses nice load balance check */ - if (check_asym_packing(env, &sds)) - return sds.busiest; + if (busiest->group_asym_capacity) + goto force_balance; /* There is no busy sibling group to pull tasks from */ if (!sds.busiest || busiest->sum_nr_running == 0)

From patchwork Thu Aug 1 14:40:18 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170386
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 2/8] sched/fair: rename sum_nr_running to sum_h_nr_running
Date: Thu, 1 Aug 2019 16:40:18 +0200
Message-Id: <1564670424-26023-3-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

Rename sum_nr_running to sum_h_nr_running because it effectively tracks cfs->h_nr_running so we can
use sum_nr_running to track rq->nr_running when needed. There is no functional changes. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index b432349..d7f76b0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7738,7 +7738,7 @@ struct sg_lb_stats { unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ - unsigned int sum_nr_running; /* Nr tasks running in the group */ + unsigned int sum_h_nr_running; /* Nr tasks running in the group */ unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; @@ -7783,7 +7783,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) .total_capacity = 0UL, .busiest_stat = { .avg_load = 0UL, - .sum_nr_running = 0, + .sum_h_nr_running = 0, .group_type = group_other, }, }; @@ -7974,7 +7974,7 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running < sgs->group_weight) + if (sgs->sum_h_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > @@ -7995,7 +7995,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running <= sgs->group_weight) + if (sgs->sum_h_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < @@ -8087,7 +8087,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_load += cpu_runnable_load(rq); sgs->group_util += cpu_util(i); - sgs->sum_nr_running += rq->cfs.h_nr_running; + sgs->sum_h_nr_running += rq->cfs.h_nr_running; nr_running = rq->nr_running; if (nr_running > 1) @@ -8117,8 +8117,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_nr_running; + if (sgs->sum_h_nr_running) + sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; sgs->group_weight = group->group_weight; @@ -8175,7 +8175,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * capable CPUs may harm throughput. Maximize throughput, * power/energy consequences are not considered. */ - if (sgs->sum_nr_running <= sgs->group_weight && + if (sgs->sum_h_nr_running <= sgs->group_weight && group_smaller_min_cpu_capacity(sds->local, sg)) return false; @@ -8206,7 +8206,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * perform better since they share less core resources. Hence when we * have idle threads, we want them to be the higher ones. 
*/ - if (sgs->sum_nr_running && + if (sgs->sum_h_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { sgs->group_asym_capacity = 1; if (!sds->busiest) @@ -8224,9 +8224,9 @@ static bool update_sd_pick_busiest(struct lb_env *env, #ifdef CONFIG_NUMA_BALANCING static inline enum fbq_type fbq_classify_group(struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running > sgs->nr_numa_running) + if (sgs->sum_h_nr_running > sgs->nr_numa_running) return regular; - if (sgs->sum_nr_running > sgs->nr_preferred_running) + if (sgs->sum_h_nr_running > sgs->nr_preferred_running) return remote; return all; } @@ -8301,7 +8301,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ if (prefer_sibling && sds->local && group_has_capacity(env, local) && - (sgs->sum_nr_running > local->sum_nr_running + 1)) { + (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { sgs->group_no_capacity = 1; sgs->group_type = group_classify(sg, sgs); } @@ -8313,7 +8313,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_nr_running; + sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; @@ -8367,7 +8367,7 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) local = &sds->local_stat; busiest = &sds->busiest_stat; - if (!local->sum_nr_running) + if (!local->sum_h_nr_running) local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); else if (busiest->load_per_task > local->load_per_task) imbn = 1; @@ -8465,7 +8465,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s */ if (busiest->group_type == group_overloaded && local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE; + load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; if (load_above_capacity > busiest->group_capacity) { load_above_capacity -= busiest->group_capacity; load_above_capacity *= scale_load_down(NICE_0_LOAD); @@ -8546,7 +8546,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* There is no busy sibling group to pull tasks from */ - if (!sds.busiest || busiest->sum_nr_running == 0) + if (!sds.busiest || busiest->sum_h_nr_running == 0) goto out_balanced; /* XXX broken for overlapping NUMA groups */ @@ -8868,7 +8868,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, env.src_rq = busiest; ld_moved = 0; - if (busiest->nr_running > 1) { + if (busiest->cfs.h_nr_running > 1) { /* * Attempt to move tasks. 
If find_busiest_group has found * an imbalance but busiest->nr_running <= 1, the group is

From patchwork Thu Aug 1 14:40:19 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170381
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 3/8] sched/fair: remove meaningless imbalance calculation
Date: Thu, 1 Aug 2019 16:40:19 +0200
Message-Id: <1564670424-26023-4-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

Clean up load_balance() and remove the meaningless calculations and fields (load_per_task and the fix_small_imbalance() heuristic built on it, removed in the diff below) before adding the new algorithm.
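[Editorial note, not part of the patch] For context, the calculation removed here derived a "load per task" figure by dividing a runqueue's runnable load by its number of runnable tasks (see cpu_avg_load_per_task() in the diff below). A minimal, self-contained sketch of that derivation, using an illustrative toy structure rather than the kernel's types, shows why such an averaged value can match none of the actual tasks:

/*
 * Editorial illustration only -- not kernel code.  A toy version of the
 * removed cpu_avg_load_per_task() calculation; the struct and the numbers
 * are made up for the example.
 */
#include <stdio.h>

struct toy_rq {
	unsigned long runnable_load;	/* summed runnable load of the queue */
	unsigned int h_nr_running;	/* number of runnable CFS tasks */
};

static unsigned long toy_avg_load_per_task(const struct toy_rq *rq)
{
	if (rq->h_nr_running)
		return rq->runnable_load / rq->h_nr_running;
	return 0;
}

int main(void)
{
	/* One heavy task (load ~900) sharing a queue with one light task (~50). */
	struct toy_rq rq = { .runnable_load = 950, .h_nr_running = 2 };

	/* Prints 475, a value that matches neither of the two tasks. */
	printf("avg load per task: %lu\n", toy_avg_load_per_task(&rq));
	return 0;
}

With PELT, per-entity load already captures each task's contribution, which is why the series drops this derived value rather than keep patching around it (see the rationale in patch 4/8 below).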
Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 105 +--------------------------------------------------- 1 file changed, 1 insertion(+), 104 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d7f76b0..d7f4a7e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5450,18 +5450,6 @@ static unsigned long capacity_of(int cpu) return cpu_rq(cpu)->cpu_capacity; } -static unsigned long cpu_avg_load_per_task(int cpu) -{ - struct rq *rq = cpu_rq(cpu); - unsigned long nr_running = READ_ONCE(rq->cfs.h_nr_running); - unsigned long load_avg = cpu_runnable_load(rq); - - if (nr_running) - return load_avg / nr_running; - - return 0; -} - static void record_wakee(struct task_struct *p) { /* @@ -7735,7 +7723,6 @@ static unsigned long task_h_load(struct task_struct *p) struct sg_lb_stats { unsigned long avg_load; /*Avg load across the CPUs of the group */ unsigned long group_load; /* Total load over the CPUs of the group */ - unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ unsigned int sum_h_nr_running; /* Nr tasks running in the group */ @@ -8117,9 +8104,6 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_h_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; - sgs->group_weight = group->group_weight; sgs->group_no_capacity = group_is_overloaded(env, sgs); @@ -8350,76 +8334,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * fix_small_imbalance - Calculate the minor imbalance that exists - * amongst the groups of a sched_domain, during - * load balancing. - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain whose imbalance is to be calculated. - */ -static inline -void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) -{ - unsigned long tmp, capa_now = 0, capa_move = 0; - unsigned int imbn = 2; - unsigned long scaled_busy_load_per_task; - struct sg_lb_stats *local, *busiest; - - local = &sds->local_stat; - busiest = &sds->busiest_stat; - - if (!local->sum_h_nr_running) - local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); - else if (busiest->load_per_task > local->load_per_task) - imbn = 1; - - scaled_busy_load_per_task = - (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - busiest->group_capacity; - - if (busiest->avg_load + scaled_busy_load_per_task >= - local->avg_load + (scaled_busy_load_per_task * imbn)) { - env->imbalance = busiest->load_per_task; - return; - } - - /* - * OK, we don't have enough imbalance to justify moving tasks, - * however we may be able to increase total CPU capacity used by - * moving them. 
- */ - - capa_now += busiest->group_capacity * - min(busiest->load_per_task, busiest->avg_load); - capa_now += local->group_capacity * - min(local->load_per_task, local->avg_load); - capa_now /= SCHED_CAPACITY_SCALE; - - /* Amount of load we'd subtract */ - if (busiest->avg_load > scaled_busy_load_per_task) { - capa_move += busiest->group_capacity * - min(busiest->load_per_task, - busiest->avg_load - scaled_busy_load_per_task); - } - - /* Amount of load we'd add */ - if (busiest->avg_load * busiest->group_capacity < - busiest->load_per_task * SCHED_CAPACITY_SCALE) { - tmp = (busiest->avg_load * busiest->group_capacity) / - local->group_capacity; - } else { - tmp = (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - local->group_capacity; - } - capa_move += local->group_capacity * - min(local->load_per_task, local->avg_load + tmp); - capa_move /= SCHED_CAPACITY_SCALE; - - /* Move if we gain throughput */ - if (capa_move > capa_now) - env->imbalance = busiest->load_per_task; -} - -/** * calculate_imbalance - Calculate the amount of imbalance present within the * groups of a given sched_domain during load balance. * @env: load balance environment @@ -8438,15 +8352,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s return; } - if (busiest->group_type == group_imbalanced) { - /* - * In the group_imb case we cannot rely on group-wide averages - * to ensure CPU-load equilibrium, look at wider averages. XXX - */ - busiest->load_per_task = - min(busiest->load_per_task, sds->avg_load); - } - /* * Avg load of busiest sg can be less and avg load of local sg can * be greater than avg load across all sgs of sd because avg load @@ -8457,7 +8362,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s (busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load)) { env->imbalance = 0; - return fix_small_imbalance(env, sds); + return; } /* @@ -8495,14 +8400,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s busiest->group_misfit_task_load); } - /* - * if *imbalance is less than the average load per runnable task - * there is no guarantee that any tasks will be moved so we'll have - * a think about bumping its value to force at least one task to be - * moved - */ - if (env->imbalance < busiest->load_per_task) - return fix_small_imbalance(env, sds); } /******* find_busiest_group() helpers end here *********************/

From patchwork Thu Aug 1 14:40:20 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170384
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 4/8] sched/fair: rework load_balance
Date: Thu, 1 Aug 2019 16:40:20 +0200
Message-Id: <1564670424-26023-5-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

The load_balance() algorithm contains some heuristics which have become meaningless since the rework of the metrics and the introduction of PELT. Furthermore, it is sometimes difficult to fix wrong scheduling decisions because everything is based on load, whereas some imbalances are not related to load. The current algorithm ends up creating virtual and meaningless values like avg_load_per_task, or tweaking the state of a group to make it look overloaded when it is not, in order to try to migrate tasks.

load_balance() should better qualify the imbalance of the group and define clearly what has to be moved to fix this imbalance.

The type of sched_group has been extended to better reflect the type of imbalance. We now have:
  group_has_spare
  group_fully_busy
  group_misfit_task
  group_asym_capacity
  group_imbalanced
  group_overloaded

Based on the type of sched_group, load_balance() now sets what it wants to move in order to fix the imbalance. It can be some load as before, but also some utilization, a number of tasks or a type of task:
  migrate_task
  migrate_util
  migrate_load
  migrate_misfit

This new load_balance() algorithm fixes several pending cases of wrong task placement:
- the one-task-per-CPU case on asymmetric systems
- the case of CFS tasks preempted by tasks of other classes
- the case of tasks not evenly spread over groups with spare capacity

The load balance decisions have been gathered in 3 functions:
- update_sd_pick_busiest() selects the busiest sched_group.
- find_busiest_group() checks whether there is an imbalance between the local and the busiest group.
- calculate_imbalance() decides what has to be moved.

Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 581 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 379 insertions(+), 202 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d7f4a7e..a8681c3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7136,13 +7136,28 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10; enum fbq_type { regular, remote, all }; +/* + * group_type describes the group of CPUs at the moment of the load balance. + * The enum is ordered by pulling priority, with the group with lowest priority + * first so the groupe_type can be simply compared when selecting the busiest + * group. see update_sd_pick_busiest().
+ */ enum group_type { - group_other = 0, + group_has_spare = 0, + group_fully_busy, group_misfit_task, + group_asym_capacity, group_imbalanced, group_overloaded, }; +enum migration_type { + migrate_task = 0, + migrate_util, + migrate_load, + migrate_misfit, +}; + #define LBF_ALL_PINNED 0x01 #define LBF_NEED_BREAK 0x02 #define LBF_DST_PINNED 0x04 @@ -7173,7 +7188,7 @@ struct lb_env { unsigned int loop_max; enum fbq_type fbq_type; - enum group_type src_grp_type; + enum migration_type balance_type; struct list_head tasks; }; @@ -7405,7 +7420,7 @@ static int detach_tasks(struct lb_env *env) { struct list_head *tasks = &env->src_rq->cfs_tasks; struct task_struct *p; - unsigned long load; + unsigned long util, load; int detached = 0; lockdep_assert_held(&env->src_rq->lock); @@ -7438,19 +7453,53 @@ static int detach_tasks(struct lb_env *env) if (!can_migrate_task(p, env)) goto next; - load = task_h_load(p); + switch (env->balance_type) { + case migrate_task: + /* Migrate task */ + env->imbalance--; + break; - if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed) - goto next; + case migrate_util: + util = task_util_est(p); - if ((load / 2) > env->imbalance) - goto next; + if (util > env->imbalance) + goto next; + + env->imbalance -= util; + break; + + case migrate_load: + load = task_h_load(p); + + if (sched_feat(LB_MIN) && + load < 16 && !env->sd->nr_balance_failed) + goto next; + + if ((load / 2) > env->imbalance) + goto next; + + env->imbalance -= load; + break; + + case migrate_misfit: + load = task_h_load(p); + + /* + * utilization of misfit task might decrease a bit + * since it has been recorded. Be conservative in the + * condition. + */ + if (load < env->imbalance) + goto next; + + env->imbalance = 0; + break; + } detach_task(p, env); list_add(&p->se.group_node, &env->tasks); detached++; - env->imbalance -= load; #ifdef CONFIG_PREEMPT /* @@ -7729,8 +7778,7 @@ struct sg_lb_stats { unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; - int group_no_capacity; - int group_asym_capacity; + unsigned int group_asym_capacity; /* tasks should be move to preferred cpu */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -7745,10 +7793,10 @@ struct sg_lb_stats { struct sd_lb_stats { struct sched_group *busiest; /* Busiest group in this sd */ struct sched_group *local; /* Local group in this sd */ - unsigned long total_running; unsigned long total_load; /* Total load of all groups in sd */ unsigned long total_capacity; /* Total capacity of all groups in sd */ unsigned long avg_load; /* Average load across all groups in sd */ + unsigned int prefer_sibling; /* tasks should go to sibling first */ struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */ struct sg_lb_stats local_stat; /* Statistics of the local group */ @@ -7765,13 +7813,11 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) *sds = (struct sd_lb_stats){ .busiest = NULL, .local = NULL, - .total_running = 0UL, .total_load = 0UL, .total_capacity = 0UL, .busiest_stat = { - .avg_load = 0UL, - .sum_h_nr_running = 0, - .group_type = group_other, + .idle_cpus = UINT_MAX, + .group_type = group_has_spare, }, }; } @@ -8013,19 +8059,26 @@ group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref) } static inline enum -group_type group_classify(struct sched_group *group, +group_type group_classify(struct lb_env *env, + struct sched_group *group, struct sg_lb_stats *sgs) 
{ - if (sgs->group_no_capacity) + if (group_is_overloaded(env, sgs)) return group_overloaded; if (sg_imbalanced(group)) return group_imbalanced; + if (sgs->group_asym_capacity) + return group_asym_capacity; + if (sgs->group_misfit_task_load) return group_misfit_task; - return group_other; + if (!group_has_capacity(env, sgs)) + return group_fully_busy; + + return group_has_spare; } static bool update_nohz_stats(struct rq *rq, bool force) @@ -8062,10 +8115,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, struct sg_lb_stats *sgs, int *sg_status) { - int i, nr_running; + int i, nr_running, local_group; memset(sgs, 0, sizeof(*sgs)); + local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group)); + for_each_cpu_and(i, sched_group_span(group), env->cpus) { struct rq *rq = cpu_rq(i); @@ -8090,9 +8145,16 @@ static inline void update_sg_lb_stats(struct lb_env *env, /* * No need to call idle_cpu() if nr_running is not 0 */ - if (!nr_running && idle_cpu(i)) + if (!nr_running && idle_cpu(i)) { sgs->idle_cpus++; + /* Idle cpu can't have misfit task */ + continue; + } + + if (local_group) + continue; + /* Check for a misfit task on the cpu */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && sgs->group_misfit_task_load < rq->misfit_task_load) { sgs->group_misfit_task_load = rq->misfit_task_load; @@ -8100,14 +8162,24 @@ static inline void update_sg_lb_stats(struct lb_env *env, } } - /* Adjust by relative CPU capacity of the group */ + /* Check if dst cpu is idle and preferred to this group */ + if (env->sd->flags & SD_ASYM_PACKING && + env->idle != CPU_NOT_IDLE && + sgs->sum_h_nr_running && + sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) { + sgs->group_asym_capacity = 1; + } + sgs->group_capacity = group->sgc->capacity; - sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; sgs->group_weight = group->group_weight; - sgs->group_no_capacity = group_is_overloaded(env, sgs); - sgs->group_type = group_classify(group, sgs); + sgs->group_type = group_classify(env, group, sgs); + + /* Computing avg_load makes sense only when group is overloaded */ + if (sgs->group_type == group_overloaded) + sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) / + sgs->group_capacity; } /** @@ -8130,6 +8202,10 @@ static bool update_sd_pick_busiest(struct lb_env *env, { struct sg_lb_stats *busiest = &sds->busiest_stat; + + /* Make sure that there is at least one task to pull */ + if (!sgs->sum_h_nr_running) + return false; /* * Don't try to pull misfit tasks we can't help. * We can use max_capacity here as reduction in capacity on some @@ -8138,7 +8214,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, */ if (sgs->group_type == group_misfit_task && (!group_smaller_max_cpu_capacity(sg, sds->local) || - !group_has_capacity(env, &sds->local_stat))) + sds->local_stat.group_type != group_has_spare)) return false; if (sgs->group_type > busiest->group_type) @@ -8147,11 +8223,67 @@ static bool update_sd_pick_busiest(struct lb_env *env, if (sgs->group_type < busiest->group_type) return false; - if (sgs->avg_load <= busiest->avg_load) + /* + * The candidate and the current busiest group are the same type of + * group. Let check which one is the busiest according to the type. + */ + + switch (sgs->group_type) { + case group_overloaded: + /* Select the overloaded group with highest avg_load. 
*/ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_imbalanced: + /* Select the 1st imbalanced group as we don't have + * any way to choose one more than another + */ return false; + break; - if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) - goto asym_packing; + case group_asym_capacity: + /* Prefer to move from lowest priority CPU's work */ + if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu)) + return false; + break; + + case group_misfit_task: + /* + * If we have more than one misfit sg go with the + * biggest misfit. + */ + if (sgs->group_misfit_task_load < busiest->group_misfit_task_load) + return false; + break; + + case group_fully_busy: + /* + * Select the fully busy group with highest avg_load. + * In theory, there is no need to pull task from such + * kind of group because tasks have all compute + * capacity that they need but we can still improve the + * overall throughput by reducing contention when + * accessing shared HW resources. + * XXX for now avg_load is not computed and always 0 so + * we select the 1st one. + */ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_has_spare: + /* + * Select not overloaded group with lowest number of + * idle cpus. We could also compare the spare capacity + * which is more stable but it can end up that the + * group has less spare capacity but finally more idle + * cpus which means less opportunity to pull tasks. + */ + if (sgs->idle_cpus >= busiest->idle_cpus) + return false; + break; + } /* * Candidate sg has no more than one task per CPU and @@ -8159,50 +8291,12 @@ static bool update_sd_pick_busiest(struct lb_env *env, * capable CPUs may harm throughput. Maximize throughput, * power/energy consequences are not considered. */ - if (sgs->sum_h_nr_running <= sgs->group_weight && - group_smaller_min_cpu_capacity(sds->local, sg)) - return false; - - /* - * If we have more than one misfit sg go with the biggest misfit. - */ - if (sgs->group_type == group_misfit_task && - sgs->group_misfit_task_load < busiest->group_misfit_task_load) + if ((env->sd->flags & SD_ASYM_CPUCAPACITY) && + (sgs->group_type <= group_fully_busy) && + (group_smaller_min_cpu_capacity(sds->local, sg))) return false; -asym_packing: - /* This is the busiest node in its class. */ - if (!(env->sd->flags & SD_ASYM_PACKING)) - return true; - - /* No ASYM_PACKING if target CPU is already busy */ - if (env->idle == CPU_NOT_IDLE) - return true; - /* - * ASYM_PACKING needs to move all the work to the highest - * prority CPUs in the group, therefore mark all groups - * of lower priority than ourself as busy. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - */ - if (sgs->sum_h_nr_running && - sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { - sgs->group_asym_capacity = 1; - if (!sds->busiest) - return true; - - /* Prefer to move from lowest priority CPU's work */ - if (sched_asym_prefer(sds->busiest->asym_prefer_cpu, - sg->asym_prefer_cpu)) - return true; - } - - return false; + return true; } #ifdef CONFIG_NUMA_BALANCING @@ -8240,13 +8334,13 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq) * @env: The load balancing environment. 
* @sds: variable to hold the statistics for this sched_domain. */ + static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds) { struct sched_domain *child = env->sd->child; struct sched_group *sg = env->sd->groups; struct sg_lb_stats *local = &sds->local_stat; struct sg_lb_stats tmp_sgs; - bool prefer_sibling = child && child->flags & SD_PREFER_SIBLING; int sg_status = 0; #ifdef CONFIG_NO_HZ_COMMON @@ -8273,22 +8367,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd if (local_group) goto next_group; - /* - * In case the child domain prefers tasks go to siblings - * first, lower the sg capacity so that we'll try - * and move all the excess tasks away. We lower the capacity - * of a group only if the local group has the capacity to fit - * these excess tasks. The extra check prevents the case where - * you always pull from the heaviest group when it is already - * under-utilized (possible with a large weight task outweighs - * the tasks on the system). - */ - if (prefer_sibling && sds->local && - group_has_capacity(env, local) && - (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { - sgs->group_no_capacity = 1; - sgs->group_type = group_classify(sg, sgs); - } if (update_sd_pick_busiest(env, sds, sg, sgs)) { sds->busiest = sg; @@ -8297,13 +8375,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; sg = sg->next; } while (sg != env->sd->groups); + /* Tag domain that child domain prefers tasks go to siblings first */ + sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING; + #ifdef CONFIG_NO_HZ_COMMON if ((env->flags & LBF_NOHZ_AGAIN) && cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { @@ -8341,69 +8421,132 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds) { - unsigned long max_pull, load_above_capacity = ~0UL; struct sg_lb_stats *local, *busiest; local = &sds->local_stat; busiest = &sds->busiest_stat; + if (busiest->group_type == group_imbalanced) { + /* + * In the group_imb case we cannot rely on group-wide averages + * to ensure CPU-load equilibrium, try to move any task to fix + * the imbalance. The next load balance will take care of + * balancing back the system. + */ + env->balance_type = migrate_task; + env->imbalance = 1; + return; + } - if (busiest->group_asym_capacity) { + if (busiest->group_type == group_asym_capacity) { + /* + * In case of asym capacity, we will try to migrate all load + * to the preferred CPU + */ + env->balance_type = migrate_load; env->imbalance = busiest->group_load; return; } + if (busiest->group_type == group_misfit_task) { + /* Set imbalance to allow misfit task to be balanced. 
*/ + env->balance_type = migrate_misfit; + env->imbalance = busiest->group_misfit_task_load; + return; + } + /* - * Avg load of busiest sg can be less and avg load of local sg can - * be greater than avg load across all sgs of sd because avg load - * factors in sg capacity and sgs with smaller group_type are - * skipped when updating the busiest sg: + * Try to use spare capacity of local group without overloading it or + * emptying busiest */ - if (busiest->group_type != group_misfit_task && - (busiest->avg_load <= sds->avg_load || - local->avg_load >= sds->avg_load)) { - env->imbalance = 0; + if (local->group_type == group_has_spare) { + if (busiest->group_type > group_fully_busy) { + /* + * If busiest is overloaded, try to fill spare + * capacity. This might end up creating spare capacity + * in busiest or busiest still being overloaded but + * there is no simple way to directly compute the + * amount of load to migrate in order to balance the + * system. + */ + env->balance_type = migrate_util; + env->imbalance = max(local->group_capacity, local->group_util) - + local->group_util; + return; + } + + if (busiest->group_weight == 1 || sds->prefer_sibling) { + /* + * When prefer sibling, evenly spread running tasks on + * groups. + */ + env->balance_type = migrate_task; + env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1; + return; + } + + /* + * If there is no overload, we just want to even the number of + * idle cpus. + */ + env->balance_type = migrate_task; + env->imbalance = max_t(long, 0, (local->idle_cpus - busiest->idle_cpus) >> 1); return; } /* - * If there aren't any idle CPUs, avoid creating some. + * Local is fully busy but have to take more load to relieve the + * busiest group */ - if (busiest->group_type == group_overloaded && - local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; - if (load_above_capacity > busiest->group_capacity) { - load_above_capacity -= busiest->group_capacity; - load_above_capacity *= scale_load_down(NICE_0_LOAD); - load_above_capacity /= busiest->group_capacity; - } else - load_above_capacity = ~0UL; + if (local->group_type < group_overloaded) { + /* + * Local will become overvloaded so the avg_load metrics are + * finally needed + */ + + local->avg_load = (local->group_load * SCHED_CAPACITY_SCALE) / + local->group_capacity; + + sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / + sds->total_capacity; } /* - * We're trying to get all the CPUs to the average_load, so we don't - * want to push ourselves above the average load, nor do we wish to - * reduce the max loaded CPU below the average load. At the same time, - * we also don't want to reduce the group load below the group - * capacity. Thus we look for the minimum possible imbalance. + * Both group are or will become overloaded and we're trying to get + * all the CPUs to the average_load, so we don't want to push + * ourselves above the average load, nor do we wish to reduce the + * max loaded CPU below the average load. At the same time, we also + * don't want to reduce the group load below the group capacity. + * Thus we look for the minimum possible imbalance. 
*/ - max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity); - - /* How much load to actually move to equalise the imbalance */ + env->balance_type = migrate_load; env->imbalance = min( - max_pull * busiest->group_capacity, + (busiest->avg_load - sds->avg_load) * busiest->group_capacity, (sds->avg_load - local->avg_load) * local->group_capacity ) / SCHED_CAPACITY_SCALE; - - /* Boost imbalance to allow misfit task to be balanced. */ - if (busiest->group_type == group_misfit_task) { - env->imbalance = max_t(long, env->imbalance, - busiest->group_misfit_task_load); - } - } /******* find_busiest_group() helpers end here *********************/ +/* + * Decision matrix according to the local and busiest group state + * + * busiest \ local has_spare fully_busy misfit asym imbalanced overloaded + * has_spare nr_idle balanced N/A N/A balanced balanced + * fully_busy nr_idle nr_idle N/A N/A balanced balanced + * misfit_task force N/A N/A N/A force force + * asym_capacity force force N/A N/A force force + * imbalanced force force N/A N/A force force + * overloaded force force N/A N/A force avg_load + * + * N/A : Not Applicable because already filtered while updating + * statistics. + * balanced : The system is balanced for these 2 groups. + * force : Calculate the imbalance as load migration is probably needed. + * avg_load : Only if imbalance is significant enough. + * nr_idle : dst_cpu is not busy and the number of idle cpus is quite + * different in groups. + */ + /** * find_busiest_group - Returns the busiest group within the sched_domain * if there is an imbalance. @@ -8438,17 +8581,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env) local = &sds.local_stat; busiest = &sds.busiest_stat; - /* ASYM feature bypasses nice load balance check */ - if (busiest->group_asym_capacity) - goto force_balance; - /* There is no busy sibling group to pull tasks from */ if (!sds.busiest || busiest->sum_h_nr_running == 0) goto out_balanced; - /* XXX broken for overlapping NUMA groups */ - sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load) - / sds.total_capacity; + /* Misfit tasks should be dealt with regardless of the avg load */ + if (busiest->group_type == group_misfit_task) + goto force_balance; + + /* ASYM feature bypasses nice load balance check */ + if (busiest->group_type == group_asym_capacity) + goto force_balance; /* * If the busiest group is imbalanced the below checks don't @@ -8459,59 +8602,71 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* - * When dst_cpu is idle, prevent SMP nice and/or asymmetric group - * capacities from resulting in underutilization due to avg_load. - */ - if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) && - busiest->group_no_capacity) - goto force_balance; - - /* Misfit tasks should be dealt with regardless of the avg load */ - if (busiest->group_type == group_misfit_task) - goto force_balance; - - /* * If the local group is busier than the selected busiest group * don't try and pull any tasks. */ - if (local->avg_load >= busiest->avg_load) + if (local->group_type > busiest->group_type) goto out_balanced; /* - * Don't pull any tasks if this group is already above the domain - * average load. + * When groups are overloaded, use the avg_load to ensure fairness + * between tasks. 
*/ - if (local->avg_load >= sds.avg_load) - goto out_balanced; + if (local->group_type == group_overloaded) { + /* + * If the local group is more loaded than the selected + * busiest group don't try and pull any tasks. + */ + if (local->avg_load >= busiest->avg_load) + goto out_balanced; + + /* XXX broken for overlapping NUMA groups */ + sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) / + sds.total_capacity; - if (env->idle == CPU_IDLE) { /* - * This CPU is idle. If the busiest group is not overloaded - * and there is no imbalance between this and busiest group - * wrt idle CPUs, it is balanced. The imbalance becomes - * significant if the diff is greater than 1 otherwise we - * might end up to just move the imbalance on another group + * Don't pull any tasks if this group is already above the + * domain average load. */ - if ((busiest->group_type != group_overloaded) && - (local->idle_cpus <= (busiest->idle_cpus + 1))) + if (local->avg_load >= sds.avg_load) goto out_balanced; - } else { + /* - * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use - * imbalance_pct to be conservative. + * If the busiest group is more loaded, use imbalance_pct to be + * conservative. */ if (100 * busiest->avg_load <= env->sd->imbalance_pct * local->avg_load) goto out_balanced; + } + /* Try to move all excess tasks to child's sibling domain */ + if (sds.prefer_sibling && local->group_type == group_has_spare && + busiest->sum_h_nr_running > local->sum_h_nr_running + 1) + goto force_balance; + + if (busiest->group_type != group_overloaded && + (env->idle == CPU_NOT_IDLE || + local->idle_cpus <= (busiest->idle_cpus + 1))) + /* + * If the busiest group is not overloaded + * and there is no imbalance between this and busiest group + * wrt idle CPUs, it is balanced. The imbalance + * becomes significant if the diff is greater than 1 otherwise + * we might end up to just move the imbalance on another + * group. + */ + goto out_balanced; + force_balance: /* Looks like there is an imbalance. Compute it */ - env->src_grp_type = busiest->group_type; calculate_imbalance(env, &sds); + return env->imbalance ? sds.busiest : NULL; out_balanced: + env->imbalance = 0; return NULL; } @@ -8523,11 +8678,13 @@ static struct rq *find_busiest_queue(struct lb_env *env, struct sched_group *group) { struct rq *busiest = NULL, *rq; - unsigned long busiest_load = 0, busiest_capacity = 1; + unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1; + unsigned int busiest_nr = 0; int i; for_each_cpu_and(i, sched_group_span(group), env->cpus) { - unsigned long capacity, load; + unsigned long capacity, load, util; + unsigned int nr_running; enum fbq_type rt; rq = cpu_rq(i); @@ -8555,20 +8712,8 @@ static struct rq *find_busiest_queue(struct lb_env *env, if (rt > env->fbq_type) continue; - /* - * For ASYM_CPUCAPACITY domains with misfit tasks we simply - * seek the "biggest" misfit task. 
- */ - if (env->src_grp_type == group_misfit_task) { - if (rq->misfit_task_load > busiest_load) { - busiest_load = rq->misfit_task_load; - busiest = rq; - } - - continue; - } - capacity = capacity_of(i); + nr_running = rq->cfs.h_nr_running; /* * For ASYM_CPUCAPACITY domains, don't pick a CPU that could @@ -8578,35 +8723,67 @@ static struct rq *find_busiest_queue(struct lb_env *env, */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && capacity_of(env->dst_cpu) < capacity && - rq->nr_running == 1) + nr_running == 1) continue; - load = cpu_runnable_load(rq); + switch (env->balance_type) { + case migrate_task: + if (busiest_nr < nr_running) { + busiest_nr = nr_running; + busiest = rq; + } + break; - /* - * When comparing with imbalance, use cpu_runnable_load() - * which is not scaled with the CPU capacity. - */ + case migrate_util: + util = cpu_util(cpu_of(rq)); - if (rq->nr_running == 1 && load > env->imbalance && - !check_cpu_capacity(rq, env->sd)) - continue; + if (busiest_util < util) { + busiest_util = util; + busiest = rq; + } + break; + + case migrate_load: + /* + * When comparing with load imbalance, use cpu_runnable_load() + * which is not scaled with the CPU capacity. + */ + load = cpu_runnable_load(rq); + + if (nr_running == 1 && load > env->imbalance && + !check_cpu_capacity(rq, env->sd)) + break; + + /* + * For the load comparisons with the other CPU's, consider + * the cpu_runnable_load() scaled with the CPU capacity, so + * that the load can be moved away from the CPU that is + * potentially running at a lower capacity. + * + * Thus we're looking for max(load_i / capacity_i), crosswise + * multiplication to rid ourselves of the division works out + * to: load_i * capacity_j > load_j * capacity_i; where j is + * our previous maximum. + */ + if (load * busiest_capacity > busiest_load * capacity) { + busiest_load = load; + busiest_capacity = capacity; + busiest = rq; + } + break; + + case migrate_misfit: + /* + * For ASYM_CPUCAPACITY domains with misfit tasks we simply + * seek the "biggest" misfit task. + */ + if (rq->misfit_task_load > busiest_load) { + busiest_load = rq->misfit_task_load; + busiest = rq; + } + + break; - /* - * For the load comparisons with the other CPU's, consider - * the cpu_runnable_load() scaled with the CPU capacity, so - * that the load can be moved away from the CPU that is - * potentially running at a lower capacity. - * - * Thus we're looking for max(load_i / capacity_i), crosswise - * multiplication to rid ourselves of the division works out - * to: load_i * capacity_j > load_j * capacity_i; where j is - * our previous maximum. - */ - if (load * busiest_capacity > busiest_load * capacity) { - busiest_load = load; - busiest_capacity = capacity; - busiest = rq; } } @@ -8652,7 +8829,7 @@ voluntary_active_balance(struct lb_env *env) return 1; } - if (env->src_grp_type == group_misfit_task) + if (env->balance_type == migrate_misfit) return 1; return 0; @@ -8765,7 +8942,7 @@ static int load_balance(int this_cpu, struct rq *this_rq, env.src_rq = busiest; ld_moved = 0; - if (busiest->cfs.h_nr_running > 1) { + if (busiest->nr_running > 1) { /* * Attempt to move tasks. 
If find_busiest_group has found
		 * an imbalance but busiest->nr_running <= 1, the group is

From patchwork Thu Aug 1 14:40:21 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170380
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 5/8] sched/fair: use rq->nr_running when balancing load
Date: Thu, 1 Aug 2019 16:40:21 +0200
Message-Id: <1564670424-26023-6-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

cfs load_balance only takes care of CFS tasks, whereas CPUs can also be used by other scheduling classes. Typically, a CFS task preempted by an RT or deadline task will not get a chance to be pulled to another CPU because load_balance doesn't take tasks from other classes into account. Add the sum of nr_running to the statistics and use it to detect such a situation.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

--
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a8681c3..f05f1ad 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7774,6 +7774,7 @@ struct sg_lb_stats {
 	unsigned long group_load; /* Total load over the CPUs of the group */
 	unsigned long group_capacity;
 	unsigned long group_util; /* Total utilization of the group */
+	unsigned int sum_nr_running; /* Nr tasks running in the group */
 	unsigned int sum_h_nr_running; /* Nr tasks running in the group */
 	unsigned int idle_cpus;
 	unsigned int group_weight;
@@ -8007,7 +8008,7 @@ static inline int sg_imbalanced(struct sched_group *group)
 static inline bool
 group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs)
 {
-	if (sgs->sum_h_nr_running < sgs->group_weight)
+	if (sgs->sum_nr_running < sgs->group_weight)
 		return true;
 
 	if ((sgs->group_capacity * 100) >
@@ -8028,7 +8029,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs)
 static inline bool
 group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs)
 {
-	if (sgs->sum_h_nr_running <= sgs->group_weight)
+	if (sgs->sum_nr_running <= sgs->group_weight)
 		return false;
 
 	if ((sgs->group_capacity * 100) <
@@ -8132,6 +8133,8 @@ static inline void update_sg_lb_stats(struct lb_env *env,
 		sgs->sum_h_nr_running += rq->cfs.h_nr_running;
 
 		nr_running = rq->nr_running;
+		sgs->sum_nr_running += nr_running;
+
 		if (nr_running > 1)
 			*sg_status |= SG_OVERLOAD;
 
@@ -8480,7 +8483,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
 			 * groups.
 			 */
 			env->balance_type = migrate_task;
-			env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1;
+			env->imbalance = (busiest->sum_nr_running - local->sum_nr_running) >> 1;
 			return;
 		}
 
@@ -8643,7 +8646,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 
 	/* Try to move all excess tasks to child's sibling domain */
 	if (sds.prefer_sibling && local->group_type == group_has_spare &&
-	    busiest->sum_h_nr_running > local->sum_h_nr_running + 1)
+	    busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;
 
 	if (busiest->group_type != group_overloaded &&
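The effect of counting every class of runnable task can be illustrated with a small stand-alone sketch (plain C, not kernel code: the toy_group_stats structure, its fields and the numbers below are invented for the example). With 4 CPUs running 3 CFS tasks and 1 RT task, a CFS-only count still reports spare capacity, while counting all classes does not:

#include <stdbool.h>
#include <stdio.h>

/* Toy model of a scheduling group; this is not the kernel's sg_lb_stats. */
struct toy_group_stats {
	unsigned int group_weight;	/* number of CPUs in the group */
	unsigned int sum_h_nr_running;	/* CFS tasks only */
	unsigned int sum_nr_running;	/* all classes: CFS + RT + DL */
};

/* Spare-capacity check based on CFS tasks only (previous behaviour). */
static bool has_spare_cfs_only(const struct toy_group_stats *s)
{
	return s->sum_h_nr_running < s->group_weight;
}

/* Spare-capacity check based on every runnable task (this patch). */
static bool has_spare_all_classes(const struct toy_group_stats *s)
{
	return s->sum_nr_running < s->group_weight;
}

int main(void)
{
	/* 4 CPUs running 3 CFS tasks and 1 RT task: no CPU is actually free. */
	struct toy_group_stats s = {
		.group_weight		= 4,
		.sum_h_nr_running	= 3,
		.sum_nr_running		= 4,
	};

	printf("CFS-only view:  spare capacity = %d\n", has_spare_cfs_only(&s));
	printf("all-class view: spare capacity = %d\n", has_spare_all_classes(&s));
	return 0;
}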
From patchwork Thu Aug 1 14:40:22 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170385
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 6/8] sched/fair: use load instead of runnable load
Date: Thu, 1 Aug 2019 16:40:22 +0200
Message-Id: <1564670424-26023-7-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

runnable load has been introduced to take into account the case where blocked load biases the load balance
decision which was selecting underutilized group with huge blocked load whereas other groups were overloaded. The load is now only used when groups are overloaded. In this case, it's worth being conservative and taking into account the sleeping tasks that might wakeup on the cpu. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index f05f1ad..dfaf0b8 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5445,6 +5445,11 @@ static unsigned long cpu_runnable_load(struct rq *rq) return cfs_rq_runnable_load_avg(&rq->cfs); } +static unsigned long cpu_load(struct rq *rq) +{ + return cfs_rq_load_avg(&rq->cfs); +} + static unsigned long capacity_of(int cpu) { return cpu_rq(cpu)->cpu_capacity; @@ -5540,7 +5545,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, s64 this_eff_load, prev_eff_load; unsigned long task_load; - this_eff_load = cpu_runnable_load(cpu_rq(this_cpu)); + this_eff_load = cpu_load(cpu_rq(this_cpu)); if (sync) { unsigned long current_load = task_h_load(current); @@ -5558,7 +5563,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, this_eff_load *= 100; this_eff_load *= capacity_of(prev_cpu); - prev_eff_load = cpu_runnable_load(cpu_rq(prev_cpu)); + prev_eff_load = cpu_load(cpu_rq(prev_cpu)); prev_eff_load -= task_load; if (sched_feat(WA_BIAS)) prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2; @@ -5646,7 +5651,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, max_spare_cap = 0; for_each_cpu(i, sched_group_span(group)) { - load = cpu_runnable_load(cpu_rq(i)); + load = cpu_load(cpu_rq(i)); runnable_load += load; avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs); @@ -5787,7 +5792,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this continue; } - load = cpu_runnable_load(cpu_rq(i)); + load = cpu_load(cpu_rq(i)); if (load < min_load) { min_load = load; least_loaded_cpu = i; @@ -8128,7 +8133,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false)) env->flags |= LBF_NOHZ_AGAIN; - sgs->group_load += cpu_runnable_load(rq); + sgs->group_load += cpu_load(rq); sgs->group_util += cpu_util(i); sgs->sum_h_nr_running += rq->cfs.h_nr_running; @@ -8569,7 +8574,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) init_sd_lb_stats(&sds); /* - * Compute the various statistics relavent for load balancing at + * Compute the various statistics relevant for load balancing at * this level. */ update_sd_lb_stats(env, &sds); @@ -8748,10 +8753,10 @@ static struct rq *find_busiest_queue(struct lb_env *env, case migrate_load: /* - * When comparing with load imbalance, use cpu_runnable_load() + * When comparing with load imbalance, use cpu_load() * which is not scaled with the CPU capacity. */ - load = cpu_runnable_load(rq); + load = cpu_load(rq); if (nr_running == 1 && load > env->imbalance && !check_cpu_capacity(rq, env->sd)) @@ -8759,7 +8764,7 @@ static struct rq *find_busiest_queue(struct lb_env *env, /* * For the load comparisons with the other CPU's, consider - * the cpu_runnable_load() scaled with the CPU capacity, so + * the cpu_load() scaled with the CPU capacity, so * that the load can be moved away from the CPU that is * potentially running at a lower capacity. 
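The load/runnable-load distinction this patch relies on can be sketched outside the kernel as follows (illustrative C only; toy_task and the plain sums stand in for the decayed PELT averages, so this is a sketch under that simplification, not the real signal computation). A task that is briefly sleeping still contributes to load but not to runnable load, which is exactly the contribution the patch wants to keep once groups are overloaded:

#include <stdio.h>

/* Toy task: a load contribution plus whether it is currently runnable. */
struct toy_task {
	unsigned long load_contrib;	/* decayed load of the task */
	int runnable;			/* 1 = on the runqueue, 0 = sleeping */
};

/*
 * load counts every task attached to the CPU, runnable load only the
 * tasks that are runnable right now. (Grossly simplified: the real PELT
 * signals are geometrically decayed averages, not plain sums.)
 */
static void toy_cpu_loads(const struct toy_task *t, int n,
			  unsigned long *load, unsigned long *runnable_load)
{
	*load = 0;
	*runnable_load = 0;
	for (int i = 0; i < n; i++) {
		*load += t[i].load_contrib;
		if (t[i].runnable)
			*runnable_load += t[i].load_contrib;
	}
}

int main(void)
{
	/* Two runnable tasks plus one heavy task that is briefly sleeping. */
	struct toy_task tasks[] = {
		{ .load_contrib = 300, .runnable = 1 },
		{ .load_contrib = 300, .runnable = 1 },
		{ .load_contrib = 700, .runnable = 0 },
	};
	unsigned long load, runnable;

	toy_cpu_loads(tasks, 3, &load, &runnable);
	printf("load=%lu runnable_load=%lu\n", load, runnable);
	return 0;
}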
From patchwork Thu Aug 1 14:40:23 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170382
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 7/8] sched/fair: evenly spread tasks when not overloaded
Date: Thu, 1 Aug 2019 16:40:23 +0200
Message-Id: <1564670424-26023-8-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

When there is only 1 CPU per group, using the idle CPUs to evenly spread tasks doesn't make sense and nr_running is a better metric.

Signed-off-by: Vincent Guittot
---
 kernel/sched/fair.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

--
2.7.4

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dfaf0b8..53e64a7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8654,18 +8654,34 @@ static struct sched_group *find_busiest_group(struct lb_env *env)
 		busiest->sum_nr_running > local->sum_nr_running + 1)
 		goto force_balance;
 
-	if (busiest->group_type != group_overloaded &&
-	    (env->idle == CPU_NOT_IDLE ||
-	     local->idle_cpus <= (busiest->idle_cpus + 1)))
-		/*
-		 * If the busiest group is not overloaded
-		 * and there is no imbalance between this and busiest group
-		 * wrt idle CPUs, it is balanced. The imbalance
-		 * becomes significant if the diff is greater than 1 otherwise
-		 * we might end up to just move the imbalance on another
-		 * group.
-		 */
-		goto out_balanced;
+	if (busiest->group_type != group_overloaded) {
+		if (env->idle == CPU_NOT_IDLE)
+			/*
+			 * If the busiest group is not overloaded (and as a
+			 * result the local one too) but this cpu is already
+			 * busy, let another idle cpu try to pull task.
+			 */
+			goto out_balanced;
+
+		if (busiest->group_weight > 1 &&
+		    local->idle_cpus <= (busiest->idle_cpus + 1))
+			/*
+			 * If the busiest group is not overloaded
+			 * and there is no imbalance between this and busiest
+			 * group wrt idle CPUs, it is balanced. The imbalance
+			 * becomes significant if the diff is greater than 1
+			 * otherwise we might end up to just move the imbalance
+			 * on another group. Of course this applies only if
+			 * there is more than 1 CPU per group.
+			 */
+			goto out_balanced;
+
+		if (busiest->sum_h_nr_running == 1)
+			/*
+			 * busiest doesn't have any tasks waiting to run
+			 */
+			goto out_balanced;
+	}
 
 force_balance:
 	/* Looks like there is an imbalance. Compute it */
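For reference, the three early exits added above can be restated as a self-contained helper (illustrative C; worth_balancing(), toy_sg and its fields are invented for this sketch and are not kernel APIs). With one CPU per group the idle-CPU comparison is skipped entirely and only the number of waiting tasks matters, which is the point of the patch:

#include <stdbool.h>
#include <stdio.h>

/* Simplified group statistics; the real code reads sg_lb_stats and lb_env. */
struct toy_sg {
	unsigned int group_weight;
	unsigned int idle_cpus;
	unsigned int sum_h_nr_running;
};

/*
 * Rough restatement of the early exits for a busiest group that is not
 * overloaded:
 * - a busy local CPU leaves the pull to an idle CPU,
 * - with several CPUs per group, balance only when the idle-CPU gap is
 *   larger than one,
 * - never pull when busiest has no task waiting to run.
 */
static bool worth_balancing(bool this_cpu_idle,
			    const struct toy_sg *local,
			    const struct toy_sg *busiest)
{
	if (!this_cpu_idle)
		return false;
	if (busiest->group_weight > 1 &&
	    local->idle_cpus <= busiest->idle_cpus + 1)
		return false;
	if (busiest->sum_h_nr_running == 1)
		return false;
	return true;
}

int main(void)
{
	/* One CPU per group: the idle-CPU comparison does not apply. */
	struct toy_sg local   = { .group_weight = 1, .idle_cpus = 0, .sum_h_nr_running = 0 };
	struct toy_sg busiest = { .group_weight = 1, .idle_cpus = 0, .sum_h_nr_running = 2 };

	printf("balance: %d\n", worth_balancing(true, &local, &busiest));
	return 0;
}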
From patchwork Thu Aug 1 14:40:24 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 170383
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, Vincent Guittot
Subject: [PATCH v2 8/8] sched/fair: use utilization to select misfit task
Date: Thu, 1 Aug 2019 16:40:24 +0200
Message-Id: <1564670424-26023-9-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>
References: <1564670424-26023-1-git-send-email-vincent.guittot@linaro.org>

utilization is used to detect a misfit task but the load is then used to select the task on the CPU which
can lead to select a small task with high weight instead of the task that triggered the misfit migration. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 28 ++++++++++++++-------------- kernel/sched/sched.h | 2 +- 2 files changed, 15 insertions(+), 15 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 53e64a7..d08cc12 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -3817,16 +3817,16 @@ static inline void update_misfit_status(struct task_struct *p, struct rq *rq) return; if (!p) { - rq->misfit_task_load = 0; + rq->misfit_task_util = 0; return; } if (task_fits_capacity(p, capacity_of(cpu_of(rq)))) { - rq->misfit_task_load = 0; + rq->misfit_task_util = 0; return; } - rq->misfit_task_load = task_h_load(p); + rq->misfit_task_util = task_util_est(p); } #else /* CONFIG_SMP */ @@ -7487,14 +7487,14 @@ static int detach_tasks(struct lb_env *env) break; case migrate_misfit: - load = task_h_load(p); + util = task_util_est(p); /* * utilization of misfit task might decrease a bit * since it has been recorded. Be conservative in the * condition. */ - if (load < env->imbalance) + if (2*util < env->imbalance) goto next; env->imbalance = 0; @@ -7785,7 +7785,7 @@ struct sg_lb_stats { unsigned int group_weight; enum group_type group_type; unsigned int group_asym_capacity; /* tasks should be move to preferred cpu */ - unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ + unsigned long group_misfit_task_util; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; unsigned int nr_preferred_running; @@ -7959,7 +7959,7 @@ check_cpu_capacity(struct rq *rq, struct sched_domain *sd) */ static inline int check_misfit_status(struct rq *rq, struct sched_domain *sd) { - return rq->misfit_task_load && + return rq->misfit_task_util && (rq->cpu_capacity_orig < rq->rd->max_cpu_capacity || check_cpu_capacity(rq, sd)); } @@ -8078,7 +8078,7 @@ group_type group_classify(struct lb_env *env, if (sgs->group_asym_capacity) return group_asym_capacity; - if (sgs->group_misfit_task_load) + if (sgs->group_misfit_task_util) return group_misfit_task; if (!group_has_capacity(env, sgs)) @@ -8164,8 +8164,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, /* Check for a misfit task on the cpu */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && - sgs->group_misfit_task_load < rq->misfit_task_load) { - sgs->group_misfit_task_load = rq->misfit_task_load; + sgs->group_misfit_task_util < rq->misfit_task_util) { + sgs->group_misfit_task_util = rq->misfit_task_util; *sg_status |= SG_OVERLOAD; } } @@ -8261,7 +8261,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * If we have more than one misfit sg go with the * biggest misfit. */ - if (sgs->group_misfit_task_load < busiest->group_misfit_task_load) + if (sgs->group_misfit_task_util < busiest->group_misfit_task_util) return false; break; @@ -8458,7 +8458,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s if (busiest->group_type == group_misfit_task) { /* Set imbalance to allow misfit task to be balanced. */ env->balance_type = migrate_misfit; - env->imbalance = busiest->group_misfit_task_load; + env->imbalance = busiest->group_misfit_task_util; return; } @@ -8801,8 +8801,8 @@ static struct rq *find_busiest_queue(struct lb_env *env, * For ASYM_CPUCAPACITY domains with misfit tasks we simply * seek the "biggest" misfit task. 
*/ - if (rq->misfit_task_load > busiest_load) { - busiest_load = rq->misfit_task_load; + if (rq->misfit_task_util > busiest_util) { + busiest_util = rq->misfit_task_util; busiest = rq; } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 7583fad..ef6e1b2 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -916,7 +916,7 @@ struct rq { unsigned char idle_balance; - unsigned long misfit_task_load; + unsigned long misfit_task_util; /* For active balancing */ int active_balance;
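To see why selecting by utilization matters, here is a stand-alone sketch (illustrative C; the task names, the numbers and the imbalance value are invented, and the factor of two simply mirrors the margin used in detach_tasks() above). A niced small task can carry a large weight-based load yet a tiny utilization, so a load-based pick would migrate the wrong task:

#include <stdbool.h>
#include <stdio.h>

/* Toy task with both a weight-scaled load and a utilization figure. */
struct toy_task {
	const char *name;
	unsigned long load;	/* a high nice weight can inflate this */
	unsigned long util;	/* capacity the task actually uses */
};

/*
 * Misfit migration wants the task whose utilization no longer fits the
 * small CPU, not the one with the biggest weight. The factor of two
 * mirrors the "utilization may have decayed since it was recorded"
 * margin; the imbalance value below is only an example.
 */
static bool satisfies_misfit_imbalance(const struct toy_task *t,
				       unsigned long imbalance)
{
	return 2 * t->util >= imbalance;
}

int main(void)
{
	struct toy_task small_but_heavy = { "small-but-heavy", 2048, 120 };
	struct toy_task actually_misfit = { "actually-misfit",  512, 700 };
	unsigned long imbalance = 600;	/* utilization to move to a big CPU */

	printf("%s: %s\n", small_but_heavy.name,
	       satisfies_misfit_imbalance(&small_but_heavy, imbalance) ? "pick" : "skip");
	printf("%s: %s\n", actually_misfit.name,
	       satisfies_misfit_imbalance(&actually_misfit, imbalance) ? "pick" : "skip");
	return 0;
}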