From patchwork Thu Sep 19 07:33:32 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174038
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 01/10] sched/fair: clean up asym packing
Date: Thu, 19 Sep 2019 09:33:32 +0200
Message-Id: <1568878421-12301-2-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Clean up asym packing to follow the default load balance behavior:
- classify the group by creating a
group_asym_packing field.
- calculate the imbalance in calculate_imbalance() instead of bypassing it.

We no longer need to test the same conditions twice to detect asym packing, and the imbalance calculation is consolidated in calculate_imbalance(). There are no functional changes.

Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 63 ++++++++++++++--------------------------------------- 1 file changed, 16 insertions(+), 47 deletions(-) -- 2.7.4 Acked-by: Rik van Riel diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 1054d2c..3175fea 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7685,6 +7685,7 @@ struct sg_lb_stats { unsigned int group_weight; enum group_type group_type; int group_no_capacity; + unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING unsigned int nr_numa_running; @@ -8139,9 +8140,17 @@ static bool update_sd_pick_busiest(struct lb_env *env, * ASYM_PACKING needs to move all the work to the highest * prority CPUs in the group, therefore mark all groups * of lower priority than ourself as busy. + * + * This is primarily intended to used at the sibling level. Some + * cores like POWER7 prefer to use lower numbered SMT threads. In the + * case of POWER7, it can move to lower SMT modes only when higher + * threads are idle. When in lower SMT modes, the threads will + * perform better since they share less core resources. Hence when we + * have idle threads, we want them to be the higher ones. */ if (sgs->sum_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { + sgs->group_asym_packing = 1; if (!sds->busiest) return true; @@ -8283,51 +8292,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * check_asym_packing - Check to see if the group is packed into the - * sched domain. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - * - * This packing function is run on idle threads. It checks to see if - * the busiest CPU in this domain (core in the P7 case) has a higher - * CPU number than the packing function is being run on. Here we are - * assuming lower CPU number will be equivalent to lower a SMT thread - * number. - * - * Return: 1 when packing is required and a task should be moved to - * this CPU. The amount of the imbalance is returned in env->imbalance. - * - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain which is to be packed - */ -static int check_asym_packing(struct lb_env *env, struct sd_lb_stats *sds) -{ - int busiest_cpu; - - if (!(env->sd->flags & SD_ASYM_PACKING)) - return 0; - - if (env->idle == CPU_NOT_IDLE) - return 0; - - if (!sds->busiest) - return 0; - - busiest_cpu = sds->busiest->asym_prefer_cpu; - if (sched_asym_prefer(busiest_cpu, env->dst_cpu)) - return 0; - - env->imbalance = sds->busiest_stat.group_load; - - return 1; -} - -/** * fix_small_imbalance - Calculate the minor imbalance that exists * amongst the groups of a sched_domain, during * load balancing.
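For readers not familiar with SD_ASYM_PACKING, the decision that ends up consolidated here boils down to a CPU priority comparison: a candidate group is tagged with group_asym_packing when it has running tasks and the destination CPU is preferred over the group's asym_prefer_cpu. A minimal stand-alone sketch of that comparison (the priority table and CPU numbers below are made up for illustration; this is not the kernel implementation):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU priorities: on POWER7-style SMT, lower-numbered
 * threads are preferred, so idle work should be packed onto them. */
static const int cpu_priority[] = { 3, 2, 1, 0 };      /* CPU0 is most preferred */

/* Same meaning as the kernel helper: "a" is preferred over "b" when its
 * architectural priority is higher. */
static bool sched_asym_prefer(int a, int b)
{
        return cpu_priority[a] > cpu_priority[b];
}

int main(void)
{
        int dst_cpu = 0;                        /* idle CPU running the load balance */
        int asym_prefer_cpu = 2;                /* preferred CPU of the candidate group */
        unsigned int sum_nr_running = 1;        /* the candidate group has work to pull */

        /* The condition update_sd_pick_busiest() now uses to tag the group. */
        bool group_asym_packing = sum_nr_running &&
                                  sched_asym_prefer(dst_cpu, asym_prefer_cpu);

        printf("group_asym_packing = %d\n", group_asym_packing);       /* prints 1 */
        return 0;
}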
@@ -8411,6 +8375,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s local = &sds->local_stat; busiest = &sds->busiest_stat; + if (busiest->group_asym_packing) { + env->imbalance = busiest->group_load; + return; + } + if (busiest->group_type == group_imbalanced) { /* * In the group_imb case we cannot rely on group-wide averages @@ -8515,8 +8484,8 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest = &sds.busiest_stat; /* ASYM feature bypasses nice load balance check */ - if (check_asym_packing(env, &sds)) - return sds.busiest; + if (busiest->group_asym_packing) + goto force_balance; /* There is no busy sibling group to pull tasks from */ if (!sds.busiest || busiest->sum_nr_running == 0)

From patchwork Thu Sep 19 07:33:33 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174047
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 02/10] sched/fair: rename sum_nr_running to sum_h_nr_running
Date: Thu, 19 Sep 2019 09:33:33 +0200
Message-Id: <1568878421-12301-3-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Rename sum_nr_running to sum_h_nr_running because it effectively tracks
cfs->h_nr_running, so that we can use sum_nr_running to track rq->nr_running when needed. There are no functional changes. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) -- 2.7.4 Acked-by: Rik van Riel Reviewed-by: Valentin Schneider diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 3175fea..02ab6b5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7680,7 +7680,7 @@ struct sg_lb_stats { unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ - unsigned int sum_nr_running; /* Nr tasks running in the group */ + unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; @@ -7725,7 +7725,7 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) .total_capacity = 0UL, .busiest_stat = { .avg_load = 0UL, - .sum_nr_running = 0, + .sum_h_nr_running = 0, .group_type = group_other, }, }; @@ -7916,7 +7916,7 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running < sgs->group_weight) + if (sgs->sum_h_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > @@ -7937,7 +7937,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running <= sgs->group_weight) + if (sgs->sum_h_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < @@ -8029,7 +8029,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_load += cpu_runnable_load(rq); sgs->group_util += cpu_util(i); - sgs->sum_nr_running += rq->cfs.h_nr_running; + sgs->sum_h_nr_running += rq->cfs.h_nr_running; nr_running = rq->nr_running; if (nr_running > 1) @@ -8059,8 +8059,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_nr_running; + if (sgs->sum_h_nr_running) + sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; sgs->group_weight = group->group_weight; @@ -8117,7 +8117,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * capable CPUs may harm throughput. Maximize throughput, * power/energy consequences are not considered. */ - if (sgs->sum_nr_running <= sgs->group_weight && + if (sgs->sum_h_nr_running <= sgs->group_weight && group_smaller_min_cpu_capacity(sds->local, sg)) return false; @@ -8148,7 +8148,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, * perform better since they share less core resources. Hence when we * have idle threads, we want them to be the higher ones.
*/ - if (sgs->sum_nr_running && + if (sgs->sum_h_nr_running && sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { sgs->group_asym_packing = 1; if (!sds->busiest) @@ -8166,9 +8166,9 @@ static bool update_sd_pick_busiest(struct lb_env *env, #ifdef CONFIG_NUMA_BALANCING static inline enum fbq_type fbq_classify_group(struct sg_lb_stats *sgs) { - if (sgs->sum_nr_running > sgs->nr_numa_running) + if (sgs->sum_h_nr_running > sgs->nr_numa_running) return regular; - if (sgs->sum_nr_running > sgs->nr_preferred_running) + if (sgs->sum_h_nr_running > sgs->nr_preferred_running) return remote; return all; } @@ -8243,7 +8243,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ if (prefer_sibling && sds->local && group_has_capacity(env, local) && - (sgs->sum_nr_running > local->sum_nr_running + 1)) { + (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { sgs->group_no_capacity = 1; sgs->group_type = group_classify(sg, sgs); } @@ -8255,7 +8255,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_nr_running; + sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; @@ -8309,7 +8309,7 @@ void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) local = &sds->local_stat; busiest = &sds->busiest_stat; - if (!local->sum_nr_running) + if (!local->sum_h_nr_running) local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); else if (busiest->load_per_task > local->load_per_task) imbn = 1; @@ -8407,7 +8407,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s */ if (busiest->group_type == group_overloaded && local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_nr_running * SCHED_CAPACITY_SCALE; + load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; if (load_above_capacity > busiest->group_capacity) { load_above_capacity -= busiest->group_capacity; load_above_capacity *= scale_load_down(NICE_0_LOAD); load_above_capacity /= busiest->group_capacity; @@ -8488,7 +8488,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* There is no busy sibling group to pull tasks from */ - if (!sds.busiest || busiest->sum_nr_running == 0) + if (!sds.busiest || busiest->sum_h_nr_running == 0) goto out_balanced; /* XXX broken for overlapping NUMA groups */

From patchwork Thu Sep 19 07:33:34 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174039
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 03/10] sched/fair: remove meaningless imbalance calculation
Date: Thu, 19 Sep 2019 09:33:34 +0200
Message-Id: <1568878421-12301-4-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Clean up load_balance() and remove meaningless calculations and fields before adding the new algorithm.

Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 105 +--------------------------------------------------- 1 file changed, 1 insertion(+), 104 deletions(-) -- 2.7.4 Acked-by: Rik van Riel Reviewed-by: Valentin Schneider diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 02ab6b5..017aad0 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5390,18 +5390,6 @@ static unsigned long capacity_of(int cpu) return cpu_rq(cpu)->cpu_capacity; } -static unsigned long cpu_avg_load_per_task(int cpu) -{ - struct rq *rq = cpu_rq(cpu); - unsigned long nr_running = READ_ONCE(rq->cfs.h_nr_running); - unsigned long load_avg = cpu_runnable_load(rq); - - if (nr_running) - return load_avg / nr_running; - - return 0; -} - static void record_wakee(struct task_struct *p) { /* @@ -7677,7 +7665,6 @@ static unsigned long task_h_load(struct task_struct *p) struct sg_lb_stats { unsigned long avg_load; /*Avg load across the CPUs of the group */ unsigned long group_load; /* Total load over the CPUs of the group */ - unsigned long load_per_task; unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ @@ -8059,9 +8046,6 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->group_capacity = group->sgc->capacity; sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; - if (sgs->sum_h_nr_running) - sgs->load_per_task = sgs->group_load / sgs->sum_h_nr_running; - sgs->group_weight = group->group_weight; sgs->group_no_capacity = group_is_overloaded(env, sgs); @@ -8292,76 +8276,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd } /** - * fix_small_imbalance - Calculate the minor imbalance that exists - * amongst the groups of a sched_domain, during - * load balancing. - * @env: The load balancing environment. - * @sds: Statistics of the sched_domain whose imbalance is to be calculated.
- */ -static inline -void fix_small_imbalance(struct lb_env *env, struct sd_lb_stats *sds) -{ - unsigned long tmp, capa_now = 0, capa_move = 0; - unsigned int imbn = 2; - unsigned long scaled_busy_load_per_task; - struct sg_lb_stats *local, *busiest; - - local = &sds->local_stat; - busiest = &sds->busiest_stat; - - if (!local->sum_h_nr_running) - local->load_per_task = cpu_avg_load_per_task(env->dst_cpu); - else if (busiest->load_per_task > local->load_per_task) - imbn = 1; - - scaled_busy_load_per_task = - (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - busiest->group_capacity; - - if (busiest->avg_load + scaled_busy_load_per_task >= - local->avg_load + (scaled_busy_load_per_task * imbn)) { - env->imbalance = busiest->load_per_task; - return; - } - - /* - * OK, we don't have enough imbalance to justify moving tasks, - * however we may be able to increase total CPU capacity used by - * moving them. - */ - - capa_now += busiest->group_capacity * - min(busiest->load_per_task, busiest->avg_load); - capa_now += local->group_capacity * - min(local->load_per_task, local->avg_load); - capa_now /= SCHED_CAPACITY_SCALE; - - /* Amount of load we'd subtract */ - if (busiest->avg_load > scaled_busy_load_per_task) { - capa_move += busiest->group_capacity * - min(busiest->load_per_task, - busiest->avg_load - scaled_busy_load_per_task); - } - - /* Amount of load we'd add */ - if (busiest->avg_load * busiest->group_capacity < - busiest->load_per_task * SCHED_CAPACITY_SCALE) { - tmp = (busiest->avg_load * busiest->group_capacity) / - local->group_capacity; - } else { - tmp = (busiest->load_per_task * SCHED_CAPACITY_SCALE) / - local->group_capacity; - } - capa_move += local->group_capacity * - min(local->load_per_task, local->avg_load + tmp); - capa_move /= SCHED_CAPACITY_SCALE; - - /* Move if we gain throughput */ - if (capa_move > capa_now) - env->imbalance = busiest->load_per_task; -} - -/** * calculate_imbalance - Calculate the amount of imbalance present within the * groups of a given sched_domain during load balance. * @env: load balance environment @@ -8380,15 +8294,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s return; } - if (busiest->group_type == group_imbalanced) { - /* - * In the group_imb case we cannot rely on group-wide averages - * to ensure CPU-load equilibrium, look at wider averages. 
XXX - */ - busiest->load_per_task = - min(busiest->load_per_task, sds->avg_load); - } - /* * Avg load of busiest sg can be less and avg load of local sg can * be greater than avg load across all sgs of sd because avg load @@ -8399,7 +8304,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s (busiest->avg_load <= sds->avg_load || local->avg_load >= sds->avg_load)) { env->imbalance = 0; - return fix_small_imbalance(env, sds); + return; } /* @@ -8437,14 +8342,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s busiest->group_misfit_task_load); } - /* - * if *imbalance is less than the average load per runnable task - * there is no guarantee that any tasks will be moved so we'll have - * a think about bumping its value to force at least one task to be - * moved - */ - if (env->imbalance < busiest->load_per_task) - return fix_small_imbalance(env, sds); } /******* find_busiest_group() helpers end here *********************/

From patchwork Thu Sep 19 07:33:35 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174041
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 04/10] sched/fair: rework load_balance
Date: Thu, 19 Sep 2019 09:33:35 +0200
Message-Id: <1568878421-12301-5-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

The load_balance algorithm contains some heuristics which have become meaningless since the rework of
the scheduler's metrics, such as the introduction of PELT.

Furthermore, load is an ill-suited metric for solving certain task placement imbalance scenarios. For instance, in the presence of idle CPUs, we should simply try to get at least one task per CPU, whereas the current load-based algorithm can actually leave idle CPUs alone simply because the load is somewhat balanced.

The current algorithm ends up creating virtual and meaningless values like avg_load_per_task, or tweaking the state of a group to make it look overloaded when it is not, in order to try to migrate tasks.

load_balance should better qualify the imbalance of the group and clearly define what has to be moved to fix this imbalance.

The type of sched_group has been extended to better reflect the type of imbalance. We now have:
group_has_spare
group_fully_busy
group_misfit_task
group_asym_packing
group_imbalanced
group_overloaded

Based on the type of sched_group, load_balance now sets what it wants to move in order to fix the imbalance. It can be some load as before, but also some utilization, a number of tasks or a type of task:
migrate_load
migrate_util
migrate_task
migrate_misfit

This new load_balance algorithm fixes several pending wrong task placements:
- the 1 task per CPU case with asymmetric systems
- the case of cfs tasks preempted by tasks of other classes
- the case of tasks not evenly spread on groups with spare capacity

Also the load balance decisions have been consolidated in the 3 functions below after removing the few bypasses and hacks of the current code:
- update_sd_pick_busiest() selects the busiest sched_group.
- find_busiest_group() checks if there is an imbalance between local and busiest group.
- calculate_imbalance() decides what has to be moved.

Finally, the now unused field total_running of struct sd_lb_stats has been removed.

Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 585 ++++++++++++++++++++++++++++++++++------------------ 1 file changed, 380 insertions(+), 205 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 017aad0..d33379c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7078,11 +7078,26 @@ static unsigned long __read_mostly max_load_balance_interval = HZ/10; enum fbq_type { regular, remote, all }; +/* + * group_type describes the group of CPUs at the moment of the load balance. + * The enum is ordered by pulling priority, with the group with lowest priority + * first so the groupe_type can be simply compared when selecting the busiest + * group. see update_sd_pick_busiest().
+ */ enum group_type { - group_other = 0, + group_has_spare = 0, + group_fully_busy, group_misfit_task, + group_asym_packing, group_imbalanced, - group_overloaded, + group_overloaded +}; + +enum migration_type { + migrate_load = 0, + migrate_util, + migrate_task, + migrate_misfit }; #define LBF_ALL_PINNED 0x01 @@ -7115,7 +7130,7 @@ struct lb_env { unsigned int loop_max; enum fbq_type fbq_type; - enum group_type src_grp_type; + enum migration_type balance_type; struct list_head tasks; }; @@ -7347,7 +7362,7 @@ static int detach_tasks(struct lb_env *env) { struct list_head *tasks = &env->src_rq->cfs_tasks; struct task_struct *p; - unsigned long load; + unsigned long util, load; int detached = 0; lockdep_assert_held(&env->src_rq->lock); @@ -7380,19 +7395,53 @@ static int detach_tasks(struct lb_env *env) if (!can_migrate_task(p, env)) goto next; - load = task_h_load(p); + switch (env->balance_type) { + case migrate_load: + load = task_h_load(p); - if (sched_feat(LB_MIN) && load < 16 && !env->sd->nr_balance_failed) - goto next; + if (sched_feat(LB_MIN) && + load < 16 && !env->sd->nr_balance_failed) + goto next; - if ((load / 2) > env->imbalance) - goto next; + if ((load / 2) > env->imbalance) + goto next; + + env->imbalance -= load; + break; + + case migrate_util: + util = task_util_est(p); + + if (util > env->imbalance) + goto next; + + env->imbalance -= util; + break; + + case migrate_task: + /* Migrate task */ + env->imbalance--; + break; + + case migrate_misfit: + load = task_h_load(p); + + /* + * utilization of misfit task might decrease a bit + * since it has been recorded. Be conservative in the + * condition. + */ + if (load < env->imbalance) + goto next; + + env->imbalance = 0; + break; + } detach_task(p, env); list_add(&p->se.group_node, &env->tasks); detached++; - env->imbalance -= load; #ifdef CONFIG_PREEMPT /* @@ -7406,7 +7455,7 @@ static int detach_tasks(struct lb_env *env) /* * We only want to steal up to the prescribed amount of - * runnable load. + * load/util/tasks. 
*/ if (env->imbalance <= 0) break; @@ -7671,7 +7720,6 @@ struct sg_lb_stats { unsigned int idle_cpus; unsigned int group_weight; enum group_type group_type; - int group_no_capacity; unsigned int group_asym_packing; /* Tasks should be moved to preferred CPU */ unsigned long group_misfit_task_load; /* A CPU has a task too big for its capacity */ #ifdef CONFIG_NUMA_BALANCING @@ -7687,10 +7735,10 @@ struct sg_lb_stats { struct sd_lb_stats { struct sched_group *busiest; /* Busiest group in this sd */ struct sched_group *local; /* Local group in this sd */ - unsigned long total_running; unsigned long total_load; /* Total load of all groups in sd */ unsigned long total_capacity; /* Total capacity of all groups in sd */ unsigned long avg_load; /* Average load across all groups in sd */ + unsigned int prefer_sibling; /* tasks should go to sibling first */ struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */ struct sg_lb_stats local_stat; /* Statistics of the local group */ @@ -7707,13 +7755,11 @@ static inline void init_sd_lb_stats(struct sd_lb_stats *sds) *sds = (struct sd_lb_stats){ .busiest = NULL, .local = NULL, - .total_running = 0UL, .total_load = 0UL, .total_capacity = 0UL, .busiest_stat = { - .avg_load = 0UL, - .sum_h_nr_running = 0, - .group_type = group_other, + .idle_cpus = UINT_MAX, + .group_type = group_has_spare, }, }; } @@ -7955,19 +8001,26 @@ group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref) } static inline enum -group_type group_classify(struct sched_group *group, +group_type group_classify(struct lb_env *env, + struct sched_group *group, struct sg_lb_stats *sgs) { - if (sgs->group_no_capacity) + if (group_is_overloaded(env, sgs)) return group_overloaded; if (sg_imbalanced(group)) return group_imbalanced; + if (sgs->group_asym_packing) + return group_asym_packing; + if (sgs->group_misfit_task_load) return group_misfit_task; - return group_other; + if (!group_has_capacity(env, sgs)) + return group_fully_busy; + + return group_has_spare; } static bool update_nohz_stats(struct rq *rq, bool force) @@ -8004,10 +8057,12 @@ static inline void update_sg_lb_stats(struct lb_env *env, struct sg_lb_stats *sgs, int *sg_status) { - int i, nr_running; + int i, nr_running, local_group; memset(sgs, 0, sizeof(*sgs)); + local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group)); + for_each_cpu_and(i, sched_group_span(group), env->cpus) { struct rq *rq = cpu_rq(i); @@ -8032,9 +8087,16 @@ static inline void update_sg_lb_stats(struct lb_env *env, /* * No need to call idle_cpu() if nr_running is not 0 */ - if (!nr_running && idle_cpu(i)) + if (!nr_running && idle_cpu(i)) { sgs->idle_cpus++; + /* Idle cpu can't have misfit task */ + continue; + } + + if (local_group) + continue; + /* Check for a misfit task on the cpu */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && sgs->group_misfit_task_load < rq->misfit_task_load) { sgs->group_misfit_task_load = rq->misfit_task_load; @@ -8042,14 +8104,24 @@ static inline void update_sg_lb_stats(struct lb_env *env, } } - /* Adjust by relative CPU capacity of the group */ + /* Check if dst cpu is idle and preferred to this group */ + if (env->sd->flags & SD_ASYM_PACKING && + env->idle != CPU_NOT_IDLE && + sgs->sum_h_nr_running && + sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) { + sgs->group_asym_packing = 1; + } + sgs->group_capacity = group->sgc->capacity; - sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) / sgs->group_capacity; sgs->group_weight = group->group_weight; - sgs->group_no_capacity = 
group_is_overloaded(env, sgs); - sgs->group_type = group_classify(group, sgs); + sgs->group_type = group_classify(env, group, sgs); + + /* Computing avg_load makes sense only when group is overloaded */ + if (sgs->group_type == group_overloaded) + sgs->avg_load = (sgs->group_load * SCHED_CAPACITY_SCALE) / + sgs->group_capacity; } /** @@ -8072,6 +8144,10 @@ static bool update_sd_pick_busiest(struct lb_env *env, { struct sg_lb_stats *busiest = &sds->busiest_stat; + /* Make sure that there is at least one task to pull */ + if (!sgs->sum_h_nr_running) + return false; + /* * Don't try to pull misfit tasks we can't help. * We can use max_capacity here as reduction in capacity on some @@ -8080,7 +8156,7 @@ static bool update_sd_pick_busiest(struct lb_env *env, */ if (sgs->group_type == group_misfit_task && (!group_smaller_max_cpu_capacity(sg, sds->local) || - !group_has_capacity(env, &sds->local_stat))) + sds->local_stat.group_type != group_has_spare)) return false; if (sgs->group_type > busiest->group_type) @@ -8089,62 +8165,80 @@ static bool update_sd_pick_busiest(struct lb_env *env, if (sgs->group_type < busiest->group_type) return false; - if (sgs->avg_load <= busiest->avg_load) - return false; - - if (!(env->sd->flags & SD_ASYM_CPUCAPACITY)) - goto asym_packing; - /* - * Candidate sg has no more than one task per CPU and - * has higher per-CPU capacity. Migrating tasks to less - * capable CPUs may harm throughput. Maximize throughput, - * power/energy consequences are not considered. + * The candidate and the current busiest group are the same type of + * group. Let check which one is the busiest according to the type. */ - if (sgs->sum_h_nr_running <= sgs->group_weight && - group_smaller_min_cpu_capacity(sds->local, sg)) - return false; - /* - * If we have more than one misfit sg go with the biggest misfit. - */ - if (sgs->group_type == group_misfit_task && - sgs->group_misfit_task_load < busiest->group_misfit_task_load) + switch (sgs->group_type) { + case group_overloaded: + /* Select the overloaded group with highest avg_load. */ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_imbalanced: + /* + * Select the 1st imbalanced group as we don't have any way to + * choose one more than another. + */ return false; -asym_packing: - /* This is the busiest node in its class. */ - if (!(env->sd->flags & SD_ASYM_PACKING)) - return true; + case group_asym_packing: + /* Prefer to move from lowest priority CPU's work */ + if (sched_asym_prefer(sg->asym_prefer_cpu, sds->busiest->asym_prefer_cpu)) + return false; + break; - /* No ASYM_PACKING if target CPU is already busy */ - if (env->idle == CPU_NOT_IDLE) - return true; - /* - * ASYM_PACKING needs to move all the work to the highest - * prority CPUs in the group, therefore mark all groups - * of lower priority than ourself as busy. - * - * This is primarily intended to used at the sibling level. Some - * cores like POWER7 prefer to use lower numbered SMT threads. In the - * case of POWER7, it can move to lower SMT modes only when higher - * threads are idle. When in lower SMT modes, the threads will - * perform better since they share less core resources. Hence when we - * have idle threads, we want them to be the higher ones. - */ - if (sgs->sum_h_nr_running && - sched_asym_prefer(env->dst_cpu, sg->asym_prefer_cpu)) { - sgs->group_asym_packing = 1; - if (!sds->busiest) - return true; + case group_misfit_task: + /* + * If we have more than one misfit sg go with the biggest + * misfit. 
+ */ + if (sgs->group_misfit_task_load < busiest->group_misfit_task_load) + return false; + break; - /* Prefer to move from lowest priority CPU's work */ - if (sched_asym_prefer(sds->busiest->asym_prefer_cpu, - sg->asym_prefer_cpu)) - return true; + case group_fully_busy: + /* + * Select the fully busy group with highest avg_load. In + * theory, there is no need to pull task from such kind of + * group because tasks have all compute capacity that they need + * but we can still improve the overall throughput by reducing + * contention when accessing shared HW resources. + * + * XXX for now avg_load is not computed and always 0 so we + * select the 1st one. + */ + if (sgs->avg_load <= busiest->avg_load) + return false; + break; + + case group_has_spare: + /* + * Select not overloaded group with lowest number of + * idle cpus. We could also compare the spare capacity + * which is more stable but it can end up that the + * group has less spare capacity but finally more idle + * cpus which means less opportunity to pull tasks. + */ + if (sgs->idle_cpus >= busiest->idle_cpus) + return false; + break; } - return false; + /* + * Candidate sg has no more than one task per CPU and has higher + * per-CPU capacity. Migrating tasks to less capable CPUs may harm + * throughput. Maximize throughput, power/energy consequences are not + * considered. + */ + if ((env->sd->flags & SD_ASYM_CPUCAPACITY) && + (sgs->group_type <= group_fully_busy) && + (group_smaller_min_cpu_capacity(sds->local, sg))) + return false; + + return true; } #ifdef CONFIG_NUMA_BALANCING @@ -8182,13 +8276,13 @@ static inline enum fbq_type fbq_classify_rq(struct rq *rq) * @env: The load balancing environment. * @sds: variable to hold the statistics for this sched_domain. */ + static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sds) { struct sched_domain *child = env->sd->child; struct sched_group *sg = env->sd->groups; struct sg_lb_stats *local = &sds->local_stat; struct sg_lb_stats tmp_sgs; - bool prefer_sibling = child && child->flags & SD_PREFER_SIBLING; int sg_status = 0; #ifdef CONFIG_NO_HZ_COMMON @@ -8215,22 +8309,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd if (local_group) goto next_group; - /* - * In case the child domain prefers tasks go to siblings - * first, lower the sg capacity so that we'll try - * and move all the excess tasks away. We lower the capacity - * of a group only if the local group has the capacity to fit - * these excess tasks. The extra check prevents the case where - * you always pull from the heaviest group when it is already - * under-utilized (possible with a large weight task outweighs - * the tasks on the system). 
- */ - if (prefer_sibling && sds->local && - group_has_capacity(env, local) && - (sgs->sum_h_nr_running > local->sum_h_nr_running + 1)) { - sgs->group_no_capacity = 1; - sgs->group_type = group_classify(sg, sgs); - } if (update_sd_pick_busiest(env, sds, sg, sgs)) { sds->busiest = sg; @@ -8239,13 +8317,15 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd next_group: /* Now, start updating sd_lb_stats */ - sds->total_running += sgs->sum_h_nr_running; sds->total_load += sgs->group_load; sds->total_capacity += sgs->group_capacity; sg = sg->next; } while (sg != env->sd->groups); + /* Tag domain that child domain prefers tasks go to siblings first */ + sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING; + #ifdef CONFIG_NO_HZ_COMMON if ((env->flags & LBF_NOHZ_AGAIN) && cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) { @@ -8283,69 +8363,133 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd */ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *sds) { - unsigned long max_pull, load_above_capacity = ~0UL; struct sg_lb_stats *local, *busiest; local = &sds->local_stat; busiest = &sds->busiest_stat; - if (busiest->group_asym_packing) { + if (busiest->group_type == group_misfit_task) { + /* Set imbalance to allow misfit task to be balanced. */ + env->balance_type = migrate_misfit; + env->imbalance = busiest->group_misfit_task_load; + return; + } + + if (busiest->group_type == group_asym_packing) { + /* + * In case of asym capacity, we will try to migrate all load to + * the preferred CPU. + */ + env->balance_type = migrate_load; env->imbalance = busiest->group_load; return; } + if (busiest->group_type == group_imbalanced) { + /* + * In the group_imb case we cannot rely on group-wide averages + * to ensure CPU-load equilibrium, try to move any task to fix + * the imbalance. The next load balance will take care of + * balancing back the system. + */ + env->balance_type = migrate_task; + env->imbalance = 1; + return; + } + /* - * Avg load of busiest sg can be less and avg load of local sg can - * be greater than avg load across all sgs of sd because avg load - * factors in sg capacity and sgs with smaller group_type are - * skipped when updating the busiest sg: + * Try to use spare capacity of local group without overloading it or + * emptying busiest */ - if (busiest->group_type != group_misfit_task && - (busiest->avg_load <= sds->avg_load || - local->avg_load >= sds->avg_load)) { - env->imbalance = 0; + if (local->group_type == group_has_spare) { + if (busiest->group_type > group_fully_busy) { + /* + * If busiest is overloaded, try to fill spare + * capacity. This might end up creating spare capacity + * in busiest or busiest still being overloaded but + * there is no simple way to directly compute the + * amount of load to migrate in order to balance the + * system. + */ + env->balance_type = migrate_util; + env->imbalance = max(local->group_capacity, local->group_util) - + local->group_util; + return; + } + + if (busiest->group_weight == 1 || sds->prefer_sibling) { + /* + * When prefer sibling, evenly spread running tasks on + * groups. + */ + env->balance_type = migrate_task; + env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1; + return; + } + + /* + * If there is no overload, we just want to even the number of + * idle cpus. 
+ */ + env->balance_type = migrate_task; + env->imbalance = max_t(long, 0, (local->idle_cpus - busiest->idle_cpus) >> 1); return; } /* - * If there aren't any idle CPUs, avoid creating some. + * Local is fully busy but have to take more load to relieve the + * busiest group */ - if (busiest->group_type == group_overloaded && - local->group_type == group_overloaded) { - load_above_capacity = busiest->sum_h_nr_running * SCHED_CAPACITY_SCALE; - if (load_above_capacity > busiest->group_capacity) { - load_above_capacity -= busiest->group_capacity; - load_above_capacity *= scale_load_down(NICE_0_LOAD); - load_above_capacity /= busiest->group_capacity; - } else - load_above_capacity = ~0UL; + if (local->group_type < group_overloaded) { + /* + * Local will become overloaded so the avg_load metrics are + * finally needed. + */ + + local->avg_load = (local->group_load * SCHED_CAPACITY_SCALE) / + local->group_capacity; + + sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / + sds->total_capacity; } /* - * We're trying to get all the CPUs to the average_load, so we don't - * want to push ourselves above the average load, nor do we wish to - * reduce the max loaded CPU below the average load. At the same time, - * we also don't want to reduce the group load below the group - * capacity. Thus we look for the minimum possible imbalance. + * Both group are or will become overloaded and we're trying to get all + * the CPUs to the average_load, so we don't want to push ourselves + * above the average load, nor do we wish to reduce the max loaded CPU + * below the average load. At the same time, we also don't want to + * reduce the group load below the group capacity. Thus we look for + * the minimum possible imbalance. */ - max_pull = min(busiest->avg_load - sds->avg_load, load_above_capacity); - - /* How much load to actually move to equalise the imbalance */ + env->balance_type = migrate_load; env->imbalance = min( - max_pull * busiest->group_capacity, + (busiest->avg_load - sds->avg_load) * busiest->group_capacity, (sds->avg_load - local->avg_load) * local->group_capacity ) / SCHED_CAPACITY_SCALE; - - /* Boost imbalance to allow misfit task to be balanced. */ - if (busiest->group_type == group_misfit_task) { - env->imbalance = max_t(long, env->imbalance, - busiest->group_misfit_task_load); - } - } /******* find_busiest_group() helpers end here *********************/ +/* + * Decision matrix according to the local and busiest group state + * + * busiest \ local has_spare fully_busy misfit asym imbalanced overloaded + * has_spare nr_idle balanced N/A N/A balanced balanced + * fully_busy nr_idle nr_idle N/A N/A balanced balanced + * misfit_task force N/A N/A N/A force force + * asym_capacity force force N/A N/A force force + * imbalanced force force N/A N/A force force + * overloaded force force N/A N/A force avg_load + * + * N/A : Not Applicable because already filtered while updating + * statistics. + * balanced : The system is balanced for these 2 groups. + * force : Calculate the imbalance as load migration is probably needed. + * avg_load : Only if imbalance is significant enough. + * nr_idle : dst_cpu is not busy and the number of idle cpus is quite + * different in groups. + */ + /** * find_busiest_group - Returns the busiest group within the sched_domain * if there is an imbalance. 
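To make the new migration_type accounting concrete, here is a small stand-alone sketch (simplified structures and made-up numbers, not the kernel code) of how detach_tasks() now consumes env->imbalance depending on what calculate_imbalance() asked it to move:

#include <stdio.h>

enum migration_type { migrate_load, migrate_util, migrate_task, migrate_misfit };

struct lb_env_sketch {
        enum migration_type balance_type;
        long imbalance;         /* load, utilization or nr of tasks, depending on type */
};

/* Account one detached task against the remaining imbalance; task_load and
 * task_util are made-up per-task numbers for illustration. */
static void account_task(struct lb_env_sketch *env, long task_load, long task_util)
{
        switch (env->balance_type) {
        case migrate_load:
                env->imbalance -= task_load;    /* imbalance expressed in load */
                break;
        case migrate_util:
                env->imbalance -= task_util;    /* imbalance expressed in utilization */
                break;
        case migrate_task:
                env->imbalance--;               /* imbalance is a number of tasks */
                break;
        case migrate_misfit:
                env->imbalance = 0;             /* moving one misfit task fixes it */
                break;
        }
}

int main(void)
{
        struct lb_env_sketch env = { .balance_type = migrate_task, .imbalance = 2 };

        account_task(&env, 512, 300);
        account_task(&env, 256, 100);
        printf("remaining imbalance = %ld\n", env.imbalance);  /* 0: stop pulling */
        return 0;
}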
@@ -8380,17 +8524,17 @@ static struct sched_group *find_busiest_group(struct lb_env *env) local = &sds.local_stat; busiest = &sds.busiest_stat; - /* ASYM feature bypasses nice load balance check */ - if (busiest->group_asym_packing) - goto force_balance; - /* There is no busy sibling group to pull tasks from */ - if (!sds.busiest || busiest->sum_h_nr_running == 0) + if (!sds.busiest) goto out_balanced; - /* XXX broken for overlapping NUMA groups */ - sds.avg_load = (SCHED_CAPACITY_SCALE * sds.total_load) - / sds.total_capacity; + /* Misfit tasks should be dealt with regardless of the avg load */ + if (busiest->group_type == group_misfit_task) + goto force_balance; + + /* ASYM feature bypasses nice load balance check */ + if (busiest->group_type == group_asym_packing) + goto force_balance; /* * If the busiest group is imbalanced the below checks don't @@ -8401,55 +8545,64 @@ static struct sched_group *find_busiest_group(struct lb_env *env) goto force_balance; /* - * When dst_cpu is idle, prevent SMP nice and/or asymmetric group - * capacities from resulting in underutilization due to avg_load. - */ - if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) && - busiest->group_no_capacity) - goto force_balance; - - /* Misfit tasks should be dealt with regardless of the avg load */ - if (busiest->group_type == group_misfit_task) - goto force_balance; - - /* * If the local group is busier than the selected busiest group * don't try and pull any tasks. */ - if (local->avg_load >= busiest->avg_load) + if (local->group_type > busiest->group_type) goto out_balanced; /* - * Don't pull any tasks if this group is already above the domain - * average load. + * When groups are overloaded, use the avg_load to ensure fairness + * between tasks. */ - if (local->avg_load >= sds.avg_load) - goto out_balanced; + if (local->group_type == group_overloaded) { + /* + * If the local group is more loaded than the selected + * busiest group don't try and pull any tasks. + */ + if (local->avg_load >= busiest->avg_load) + goto out_balanced; + + /* XXX broken for overlapping NUMA groups */ + sds.avg_load = (sds.total_load * SCHED_CAPACITY_SCALE) / + sds.total_capacity; - if (env->idle == CPU_IDLE) { /* - * This CPU is idle. If the busiest group is not overloaded - * and there is no imbalance between this and busiest group - * wrt idle CPUs, it is balanced. The imbalance becomes - * significant if the diff is greater than 1 otherwise we - * might end up to just move the imbalance on another group + * Don't pull any tasks if this group is already above the + * domain average load. */ - if ((busiest->group_type != group_overloaded) && - (local->idle_cpus <= (busiest->idle_cpus + 1))) + if (local->avg_load >= sds.avg_load) goto out_balanced; - } else { + /* - * In the CPU_NEWLY_IDLE, CPU_NOT_IDLE cases, use - * imbalance_pct to be conservative. + * If the busiest group is more loaded, use imbalance_pct to be + * conservative. */ if (100 * busiest->avg_load <= env->sd->imbalance_pct * local->avg_load) goto out_balanced; } + /* Try to move all excess tasks to child's sibling domain */ + if (sds.prefer_sibling && local->group_type == group_has_spare && + busiest->sum_h_nr_running > local->sum_h_nr_running + 1) + goto force_balance; + + if (busiest->group_type != group_overloaded && + (env->idle == CPU_NOT_IDLE || + local->idle_cpus <= (busiest->idle_cpus + 1))) + /* + * If the busiest group is not overloaded + * and there is no imbalance between this and busiest group + * wrt idle CPUs, it is balanced. 
The imbalance + * becomes significant if the diff is greater than 1 otherwise + * we might end up to just move the imbalance on another + * group. + */ + goto out_balanced; + force_balance: /* Looks like there is an imbalance. Compute it */ - env->src_grp_type = busiest->group_type; calculate_imbalance(env, &sds); return env->imbalance ? sds.busiest : NULL; @@ -8465,11 +8618,13 @@ static struct rq *find_busiest_queue(struct lb_env *env, struct sched_group *group) { struct rq *busiest = NULL, *rq; - unsigned long busiest_load = 0, busiest_capacity = 1; + unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1; + unsigned int busiest_nr = 0; int i; for_each_cpu_and(i, sched_group_span(group), env->cpus) { - unsigned long capacity, load; + unsigned long capacity, load, util; + unsigned int nr_running; enum fbq_type rt; rq = cpu_rq(i); @@ -8497,20 +8652,8 @@ static struct rq *find_busiest_queue(struct lb_env *env, if (rt > env->fbq_type) continue; - /* - * For ASYM_CPUCAPACITY domains with misfit tasks we simply - * seek the "biggest" misfit task. - */ - if (env->src_grp_type == group_misfit_task) { - if (rq->misfit_task_load > busiest_load) { - busiest_load = rq->misfit_task_load; - busiest = rq; - } - - continue; - } - capacity = capacity_of(i); + nr_running = rq->cfs.h_nr_running; /* * For ASYM_CPUCAPACITY domains, don't pick a CPU that could @@ -8520,35 +8663,67 @@ static struct rq *find_busiest_queue(struct lb_env *env, */ if (env->sd->flags & SD_ASYM_CPUCAPACITY && capacity_of(env->dst_cpu) < capacity && - rq->nr_running == 1) + nr_running == 1) continue; - load = cpu_runnable_load(rq); + switch (env->balance_type) { + case migrate_load: + /* + * When comparing with load imbalance, use cpu_runnable_load() + * which is not scaled with the CPU capacity. + */ + load = cpu_runnable_load(rq); - /* - * When comparing with imbalance, use cpu_runnable_load() - * which is not scaled with the CPU capacity. - */ + if (nr_running == 1 && load > env->imbalance && + !check_cpu_capacity(rq, env->sd)) + break; - if (rq->nr_running == 1 && load > env->imbalance && - !check_cpu_capacity(rq, env->sd)) - continue; + /* + * For the load comparisons with the other CPU's, consider + * the cpu_runnable_load() scaled with the CPU capacity, so + * that the load can be moved away from the CPU that is + * potentially running at a lower capacity. + * + * Thus we're looking for max(load_i / capacity_i), crosswise + * multiplication to rid ourselves of the division works out + * to: load_i * capacity_j > load_j * capacity_i; where j is + * our previous maximum. + */ + if (load * busiest_capacity > busiest_load * capacity) { + busiest_load = load; + busiest_capacity = capacity; + busiest = rq; + } + break; + + case migrate_util: + util = cpu_util(cpu_of(rq)); + + if (busiest_util < util) { + busiest_util = util; + busiest = rq; + } + break; + + case migrate_task: + if (busiest_nr < nr_running) { + busiest_nr = nr_running; + busiest = rq; + } + break; + + case migrate_misfit: + /* + * For ASYM_CPUCAPACITY domains with misfit tasks we simply + * seek the "biggest" misfit task. + */ + if (rq->misfit_task_load > busiest_load) { + busiest_load = rq->misfit_task_load; + busiest = rq; + } + + break; - /* - * For the load comparisons with the other CPU's, consider - * the cpu_runnable_load() scaled with the CPU capacity, so - * that the load can be moved away from the CPU that is - * potentially running at a lower capacity. 
- * - * Thus we're looking for max(load_i / capacity_i), crosswise - * multiplication to rid ourselves of the division works out - * to: load_i * capacity_j > load_j * capacity_i; where j is - * our previous maximum. - */ - if (load * busiest_capacity > busiest_load * capacity) { - busiest_load = load; - busiest_capacity = capacity; - busiest = rq; - } } @@ -8594,7 +8769,7 @@ voluntary_active_balance(struct lb_env *env) return 1; } - if (env->src_grp_type == group_misfit_task) + if (env->balance_type == migrate_misfit) return 1; return 0;

From patchwork Thu Sep 19 07:33:36 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174040
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 05/10] sched/fair: use rq->nr_running when balancing load
Date: Thu, 19 Sep 2019 09:33:36 +0200
Message-Id: <1568878421-12301-6-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>

cfs load_balance only takes care of CFS tasks whereas CPUs can be used by other
scheduling class. Typically, a CFS task preempted by a RT or deadline task will not get a chance to be pulled on another CPU because the load_balance doesn't take into account tasks from other classes. Add sum of nr_running in the statistics and use it to detect such situation. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d33379c..7e74836 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7716,6 +7716,7 @@ struct sg_lb_stats { unsigned long group_load; /* Total load over the CPUs of the group */ unsigned long group_capacity; unsigned long group_util; /* Total utilization of the group */ + unsigned int sum_nr_running; /* Nr of tasks running in the group */ unsigned int sum_h_nr_running; /* Nr of CFS tasks running in the group */ unsigned int idle_cpus; unsigned int group_weight; @@ -7949,7 +7950,7 @@ static inline int sg_imbalanced(struct sched_group *group) static inline bool group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_h_nr_running < sgs->group_weight) + if (sgs->sum_nr_running < sgs->group_weight) return true; if ((sgs->group_capacity * 100) > @@ -7970,7 +7971,7 @@ group_has_capacity(struct lb_env *env, struct sg_lb_stats *sgs) static inline bool group_is_overloaded(struct lb_env *env, struct sg_lb_stats *sgs) { - if (sgs->sum_h_nr_running <= sgs->group_weight) + if (sgs->sum_nr_running <= sgs->group_weight) return false; if ((sgs->group_capacity * 100) < @@ -8074,6 +8075,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, sgs->sum_h_nr_running += rq->cfs.h_nr_running; nr_running = rq->nr_running; + sgs->sum_nr_running += nr_running; + if (nr_running > 1) *sg_status |= SG_OVERLOAD; @@ -8423,7 +8426,7 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s * groups. 
*/ env->balance_type = migrate_task; - env->imbalance = (busiest->sum_h_nr_running - local->sum_h_nr_running) >> 1; + env->imbalance = (busiest->sum_nr_running - local->sum_nr_running) >> 1; return; } @@ -8585,7 +8588,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) /* Try to move all excess tasks to child's sibling domain */ if (sds.prefer_sibling && local->group_type == group_has_spare && - busiest->sum_h_nr_running > local->sum_h_nr_running + 1) + busiest->sum_nr_running > local->sum_nr_running + 1) goto force_balance; if (busiest->group_type != group_overloaded &&

From patchwork Thu Sep 19 07:33:37 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174046
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 06/10] sched/fair: use load instead of runnable load in load_balance
Date: Thu, 19 Sep 2019 09:33:37 +0200
Message-Id: <1568878421-12301-7-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>

runnable load has been introduced to take into account the case where
blocked load biases the load balance decision which was selecting underutilized group with huge blocked load whereas other groups were overloaded. The load is now only used when groups are overloaded. In this case, it's worth being conservative and taking into account the sleeping tasks that might wakeup on the cpu. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7e74836..15ec38c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5385,6 +5385,11 @@ static unsigned long cpu_runnable_load(struct rq *rq) return cfs_rq_runnable_load_avg(&rq->cfs); } +static unsigned long cpu_load(struct rq *rq) +{ + return cfs_rq_load_avg(&rq->cfs); +} + static unsigned long capacity_of(int cpu) { return cpu_rq(cpu)->cpu_capacity; @@ -8070,7 +8075,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false)) env->flags |= LBF_NOHZ_AGAIN; - sgs->group_load += cpu_runnable_load(rq); + sgs->group_load += cpu_load(rq); sgs->group_util += cpu_util(i); sgs->sum_h_nr_running += rq->cfs.h_nr_running; @@ -8512,7 +8517,7 @@ static struct sched_group *find_busiest_group(struct lb_env *env) init_sd_lb_stats(&sds); /* - * Compute the various statistics relavent for load balancing at + * Compute the various statistics relevant for load balancing at * this level. */ update_sd_lb_stats(env, &sds); @@ -8672,10 +8677,10 @@ static struct rq *find_busiest_queue(struct lb_env *env, switch (env->balance_type) { case migrate_load: /* - * When comparing with load imbalance, use cpu_runnable_load() + * When comparing with load imbalance, use cpu_load() * which is not scaled with the CPU capacity. */ - load = cpu_runnable_load(rq); + load = cpu_load(rq); if (nr_running == 1 && load > env->imbalance && !check_cpu_capacity(rq, env->sd)) @@ -8683,7 +8688,7 @@ static struct rq *find_busiest_queue(struct lb_env *env, /* * For the load comparisons with the other CPU's, consider - * the cpu_runnable_load() scaled with the CPU capacity, so + * the cpu_load() scaled with the CPU capacity, so * that the load can be moved away from the CPU that is * potentially running at a lower capacity. 
*

From patchwork Thu Sep 19 07:33:38 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174042
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 07/10] sched/fair: evenly spread tasks when not overloaded
Date: Thu, 19 Sep 2019 09:33:38 +0200
Message-Id: <1568878421-12301-8-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>

When there is only 1 cpu per group, using the idle cpus to evenly spread tasks
doesn't make sense and nr_running is a better metrics. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 40 ++++++++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 12 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 15ec38c..a7c8ee6 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8596,18 +8596,34 @@ static struct sched_group *find_busiest_group(struct lb_env *env) busiest->sum_nr_running > local->sum_nr_running + 1) goto force_balance; - if (busiest->group_type != group_overloaded && - (env->idle == CPU_NOT_IDLE || - local->idle_cpus <= (busiest->idle_cpus + 1))) - /* - * If the busiest group is not overloaded - * and there is no imbalance between this and busiest group - * wrt idle CPUs, it is balanced. The imbalance - * becomes significant if the diff is greater than 1 otherwise - * we might end up to just move the imbalance on another - * group. - */ - goto out_balanced; + if (busiest->group_type != group_overloaded) { + if (env->idle == CPU_NOT_IDLE) + /* + * If the busiest group is not overloaded (and as a + * result the local one too) but this cpu is already + * busy, let another idle cpu try to pull task. + */ + goto out_balanced; + + if (busiest->group_weight > 1 && + local->idle_cpus <= (busiest->idle_cpus + 1)) + /* + * If the busiest group is not overloaded + * and there is no imbalance between this and busiest + * group wrt idle CPUs, it is balanced. The imbalance + * becomes significant if the diff is greater than 1 + * otherwise we might end up to just move the imbalance + * on another group. Of course this applies only if + * there is more than 1 CPU per group. + */ + goto out_balanced; + + if (busiest->sum_h_nr_running == 1) + /* + * busiest doesn't have any tasks waiting to run + */ + goto out_balanced; + } force_balance: /* Looks like there is an imbalance. 
Compute it */

From patchwork Thu Sep 19 07:33:39 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174043
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 08/10] sched/fair: use utilization to select misfit task
Date: Thu, 19 Sep 2019 09:33:39 +0200
Message-Id: <1568878421-12301-9-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>

utilization is used to detect a misfit task but the load is then used to select
the task on the CPU, which can lead to selecting a small task with a high weight instead of the task that triggered the misfit migration. Signed-off-by: Vincent Guittot Acked-by: Valentin Schneider --- kernel/sched/fair.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index a7c8ee6..acca869 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7429,14 +7429,8 @@ static int detach_tasks(struct lb_env *env) break; case migrate_misfit: - load = task_h_load(p); - - /* - * utilization of misfit task might decrease a bit - * since it has been recorded. Be conservative in the - * condition. - */ - if (load < env->imbalance) + /* This is not a misfit task */ + if (task_fits_capacity(p, capacity_of(env->src_cpu))) goto next; env->imbalance = 0;

From patchwork Thu Sep 19 07:33:40 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174045
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 09/10] sched/fair: use load instead of runnable load in wakeup path
Date: Thu, 19 Sep 2019 09:33:40 +0200
Message-Id: <1568878421-12301-10-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>

runnable load has been introduced to take into account the case where
blocked load biases the wake up path, which may end up selecting an overloaded CPU with a large number of runnable tasks instead of an underutilized CPU with a huge blocked load. The wake up path now starts by looking for idle CPUs before comparing runnable load, so it is worth aligning the wake up path with load_balance. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index acca869..39a37ae 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5485,7 +5485,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, s64 this_eff_load, prev_eff_load; unsigned long task_load; - this_eff_load = cpu_runnable_load(cpu_rq(this_cpu)); + this_eff_load = cpu_load(cpu_rq(this_cpu)); if (sync) { unsigned long current_load = task_h_load(current); @@ -5503,7 +5503,7 @@ wake_affine_weight(struct sched_domain *sd, struct task_struct *p, this_eff_load *= 100; this_eff_load *= capacity_of(prev_cpu); - prev_eff_load = cpu_runnable_load(cpu_rq(prev_cpu)); + prev_eff_load = cpu_load(cpu_rq(prev_cpu)); prev_eff_load -= task_load; if (sched_feat(WA_BIAS)) prev_eff_load *= 100 + (sd->imbalance_pct - 100) / 2; @@ -5591,7 +5591,7 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, max_spare_cap = 0; for_each_cpu(i, sched_group_span(group)) { - load = cpu_runnable_load(cpu_rq(i)); + load = cpu_load(cpu_rq(i)); runnable_load += load; avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs); @@ -5732,7 +5732,7 @@ find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this continue; } - load = cpu_runnable_load(cpu_rq(i)); + load = cpu_load(cpu_rq(i)); if (load < min_load) { min_load = load; least_loaded_cpu = i;

From patchwork Thu Sep 19 07:33:41 2019
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 174044
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, mingo@redhat.com, peterz@infradead.org
Cc: pauld@redhat.com, valentin.schneider@arm.com, srikar@linux.vnet.ibm.com, quentin.perret@arm.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com, hdanton@sina.com, Vincent Guittot
Subject: [PATCH v3 10/10] sched/fair: optimize find_idlest_group
Date: Thu, 19 Sep 2019 09:33:41 +0200
Message-Id: <1568878421-12301-11-git-send-email-vincent.guittot@linaro.org>
In-Reply-To: <1568878421-12301-1-git-send-email-vincent.guittot@linaro.org>
References:
<1568878421-12301-1-git-send-email-vincent.guittot@linaro.org> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org find_idlest_group() now loads CPU's load_avg in 2 different ways. Consolidate the function to read and use load_avg only once and simplify the algorithm to only look for the group with lowest load_avg. Signed-off-by: Vincent Guittot --- kernel/sched/fair.c | 52 +++++++++++----------------------------------------- 1 file changed, 11 insertions(+), 41 deletions(-) -- 2.7.4 diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 39a37ae..1fac444 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -5560,16 +5560,14 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, { struct sched_group *idlest = NULL, *group = sd->groups; struct sched_group *most_spare_sg = NULL; - unsigned long min_runnable_load = ULONG_MAX; - unsigned long this_runnable_load = ULONG_MAX; - unsigned long min_avg_load = ULONG_MAX, this_avg_load = ULONG_MAX; + unsigned long min_load = ULONG_MAX, this_load = ULONG_MAX; unsigned long most_spare = 0, this_spare = 0; int imbalance_scale = 100 + (sd->imbalance_pct-100)/2; unsigned long imbalance = scale_load_down(NICE_0_LOAD) * (sd->imbalance_pct-100) / 100; do { - unsigned long load, avg_load, runnable_load; + unsigned long load; unsigned long spare_cap, max_spare_cap; int local_group; int i; @@ -5586,15 +5584,11 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, * Tally up the load of all CPUs in the group and find * the group containing the CPU with most spare capacity. */ - avg_load = 0; - runnable_load = 0; + load = 0; max_spare_cap = 0; for_each_cpu(i, sched_group_span(group)) { - load = cpu_load(cpu_rq(i)); - runnable_load += load; - - avg_load += cfs_rq_load_avg(&cpu_rq(i)->cfs); + load += cpu_load(cpu_rq(i)); spare_cap = capacity_spare_without(i, p); @@ -5603,31 +5597,15 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, } /* Adjust by relative CPU capacity of the group */ - avg_load = (avg_load * SCHED_CAPACITY_SCALE) / - group->sgc->capacity; - runnable_load = (runnable_load * SCHED_CAPACITY_SCALE) / + load = (load * SCHED_CAPACITY_SCALE) / group->sgc->capacity; if (local_group) { - this_runnable_load = runnable_load; - this_avg_load = avg_load; + this_load = load; this_spare = max_spare_cap; } else { - if (min_runnable_load > (runnable_load + imbalance)) { - /* - * The runnable load is significantly smaller - * so we can pick this new CPU: - */ - min_runnable_load = runnable_load; - min_avg_load = avg_load; - idlest = group; - } else if ((runnable_load < (min_runnable_load + imbalance)) && - (100*min_avg_load > imbalance_scale*avg_load)) { - /* - * The runnable loads are close so take the - * blocked load into account through avg_load: - */ - min_avg_load = avg_load; + if (load < min_load) { + min_load = load; idlest = group; } @@ -5668,18 +5646,10 @@ find_idlest_group(struct sched_domain *sd, struct task_struct *p, * local domain to be very lightly loaded relative to the remote * domains but "imbalance" skews the comparison making remote CPUs * look much more favourable. When considering cross-domain, add - * imbalance to the runnable load on the remote node and consider - * staying local. + * imbalance to the load on the remote node and consider staying + * local. 
*/ - if ((sd->flags & SD_NUMA) && - min_runnable_load + imbalance >= this_runnable_load) - return NULL; - - if (min_runnable_load > (this_runnable_load + imbalance)) - return NULL; - - if ((this_runnable_load < (min_runnable_load + imbalance)) && - (100*this_avg_load < imbalance_scale*min_avg_load)) + if (min_load + imbalance >= this_load) return NULL; return idlest;
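To make the simplification concrete, here is a small, self-contained sketch of the selection rule that remains after this patch: one capacity-scaled load per group, and the task stays local unless a remote group wins by more than the imbalance margin. This is an illustrative rewrite with invented names (group_stat, pick_idlest_group), not the kernel code, and it ignores the spare-capacity path that find_idlest_group() still keeps.

#define SCHED_CAPACITY_SCALE 1024UL

struct group_stat {
	unsigned long load;	/* sum of cpu_load() over the group's CPUs */
	unsigned long capacity;	/* group capacity */
	int local;		/* group containing the waking CPU */
};

/* Return the index of the idlest remote group, or -1 to keep the task local. */
int pick_idlest_group(const struct group_stat *g, int nr, unsigned long imbalance)
{
	unsigned long min_load = ~0UL, this_load = ~0UL;
	int idlest = -1, i;

	for (i = 0; i < nr; i++) {
		/* Adjust by relative CPU capacity of the group */
		unsigned long load = g[i].load * SCHED_CAPACITY_SCALE / g[i].capacity;

		if (g[i].local) {
			this_load = load;
		} else if (load < min_load) {
			min_load = load;
			idlest = i;
		}
	}

	/* Mirrors "if (min_load + imbalance >= this_load) return NULL;" */
	if (idlest < 0 || min_load + imbalance >= this_load)
		return -1;

	return idlest;
}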