From patchwork Fri May 25 13:12:28 2018
X-Patchwork-Submitter: Vincent Guittot <vincent.guittot@linaro.org>
X-Patchwork-Id: 136866
From: Vincent Guittot <vincent.guittot@linaro.org>
To: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
 rjw@rjwysocki.net
Cc: juri.lelli@redhat.com, dietmar.eggemann@arm.com, Morten.Rasmussen@arm.com,
 viresh.kumar@linaro.org, valentin.schneider@arm.com, quentin.perret@arm.com,
 Vincent Guittot <vincent.guittot@linaro.org>
Subject: [PATCH v5 07/10] sched/irq: add irq utilization tracking
Date: Fri, 25 May 2018 15:12:28 +0200
Message-Id: <1527253951-22709-8-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
References: <1527253951-22709-1-git-send-email-vincent.guittot@linaro.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Interrupt and steal time are the only remaining activities tracked by
rt_avg. Like for the sched classes, we can use PELT to track their
average utilization of the CPU.
But unlike the sched classes, we don't track when we enter/leave
interrupt context; instead, we take into account the time spent in
interrupt context when we update the rq's clock (rq_clock_task). This
also means that we have to decay the normal context time and account
for the interrupt time during the update.

It's also important to note that, because
rq_clock == rq_clock_task + interrupt time, and rq_clock_task is what a
sched class uses to compute its utilization, the util_avg of a sched
class only reflects the utilization of the time spent in normal context
and not of the whole time of the CPU. Tracking the utilization of
interrupts gives a more accurate picture of the utilization of the CPU:

  CPU utilization = avg_irq + (1 - avg_irq / max capacity) * \Sum avg_rq

Most of the time avg_irq is small and negligible, so using the
approximation CPU utilization = \Sum avg_rq was sufficient.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/core.c  |  4 +++-
 kernel/sched/fair.c  | 26 +++++++-------------------
 kernel/sched/pelt.c  | 38 ++++++++++++++++++++++++++++++++++++++
 kernel/sched/pelt.h  |  7 +++++++
 kernel/sched/sched.h |  1 +
 5 files changed, 56 insertions(+), 20 deletions(-)

-- 
2.7.4

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d155518..ab58288 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -16,6 +16,8 @@
 #include "../workqueue_internal.h"
 #include "../smpboot.h"
 
+#include "pelt.h"
+
 #define CREATE_TRACE_POINTS
 #include <trace/events/sched.h>
 
@@ -184,7 +186,7 @@ static void update_rq_clock_task(struct rq *rq, s64 delta)
 
 #if defined(CONFIG_IRQ_TIME_ACCOUNTING) || defined(CONFIG_PARAVIRT_TIME_ACCOUNTING)
 	if ((irq_delta + steal) && sched_feat(NONTASK_CAPACITY))
-		sched_rt_avg_update(rq, irq_delta + steal);
+		update_irq_load_avg(rq, irq_delta + steal);
 #endif
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index da75eda..1bb3379 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5323,8 +5323,6 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 		this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
 	}
-
-	sched_avg_update(this_rq);
 }
 
 /* Used instead of source_load when we know the type == 0 */
@@ -7298,6 +7296,9 @@ static inline bool others_rqs_have_blocked(struct rq *rq)
 	if (rq->avg_dl.util_avg)
 		return true;
 
+	if (rq->avg_irq.util_avg)
+		return true;
+
 	return false;
 }
 
@@ -7362,6 +7363,7 @@ static void update_blocked_averages(int cpu)
 	}
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
 	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_irq_load_avg(rq, 0);
 	/* Don't need periodic decay once load/util_avg are null */
 	if (others_rqs_have_blocked(rq))
 		done = false;
@@ -7432,6 +7434,7 @@ static inline void update_blocked_averages(int cpu)
 	update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
 	update_rt_rq_load_avg(rq_clock_task(rq), rq, 0);
 	update_dl_rq_load_avg(rq_clock_task(rq), rq, 0);
+	update_irq_load_avg(rq, 0);
 #ifdef CONFIG_NO_HZ_COMMON
 	rq->last_blocked_load_update_tick = jiffies;
 	if (!cfs_rq_has_blocked(cfs_rq) && !others_rqs_have_blocked(rq))
@@ -7544,24 +7547,9 @@ static inline int get_sd_load_idx(struct sched_domain *sd,
 static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	u64 total, used, age_stamp, avg;
-	s64 delta;
-
-	/*
-	 * Since we're reading these variables without serialization make sure
-	 * we read them once before doing sanity checks on them.
-	 */
-	age_stamp = READ_ONCE(rq->age_stamp);
-	avg = READ_ONCE(rq->rt_avg);
-	delta = __rq_clock_broken(rq) - age_stamp;
-
-	if (unlikely(delta < 0))
-		delta = 0;
-
-	total = sched_avg_period() + delta;
-
-	used = div_u64(avg, total);
+	unsigned long used;
 
+	used = READ_ONCE(rq->avg_irq.util_avg);
 	used += READ_ONCE(rq->avg_rt.util_avg);
 	used += READ_ONCE(rq->avg_dl.util_avg);
 
 	if (likely(used < SCHED_CAPACITY_SCALE))
diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c
index 3d5bd3a..d2e4f21 100644
--- a/kernel/sched/pelt.c
+++ b/kernel/sched/pelt.c
@@ -355,3 +355,41 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 
 	return 0;
 }
+
+/*
+ * irq:
+ *
+ *   util_sum = \Sum se->avg.util_sum but se->avg.util_sum is not tracked
+ *   util_sum = cpu_scale * load_sum
+ *   runnable_load_sum = load_sum
+ *
+ */
+
+int update_irq_load_avg(struct rq *rq, u64 running)
+{
+	int ret = 0;
+	/*
+	 * We know the time that has been used by interrupt since last update
+	 * but we don't know when. Let's be pessimistic and assume that the
+	 * interrupt happened just before the update. This is not so far from
+	 * reality because the interrupt will most probably wake up a task and
+	 * trigger an update of the rq clock, during which the metric is updated.
+	 * We start to decay with the normal context time and then we add the
+	 * interrupt context time.
+	 * We can safely remove running from rq->clock because
+	 * rq->clock += delta with delta >= running
+	 */
+	ret = ___update_load_sum(rq->clock - running, rq->cpu, &rq->avg_irq,
+				0,
+				0,
+				0);
+	ret += ___update_load_sum(rq->clock, rq->cpu, &rq->avg_irq,
+				1,
+				1,
+				1);
+
+	if (ret)
+		___update_load_avg(&rq->avg_irq, 1, 1);
+
+	return ret;
+}
diff --git a/kernel/sched/pelt.h b/kernel/sched/pelt.h
index 0e4f912..0ce9a5a 100644
--- a/kernel/sched/pelt.h
+++ b/kernel/sched/pelt.h
@@ -5,6 +5,7 @@ int __update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_e
 int __update_load_avg_cfs_rq(u64 now, int cpu, struct cfs_rq *cfs_rq);
 int update_rt_rq_load_avg(u64 now, struct rq *rq, int running);
 int update_dl_rq_load_avg(u64 now, struct rq *rq, int running);
+int update_irq_load_avg(struct rq *rq, u64 running);
 
 /*
  * When a task is dequeued, its estimated utilization should not be update if
@@ -51,6 +52,12 @@ update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
 {
 	return 0;
 }
+
+static inline int
+update_irq_load_avg(struct rq *rq, u64 running)
+{
+	return 0;
+}
 #endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 0eb07a8..f7e8d5b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -850,6 +850,7 @@ struct rq {
 	u64			age_stamp;
 	struct sched_avg	avg_rt;
 	struct sched_avg	avg_dl;
+	struct sched_avg	avg_irq;
 	u64			idle_stamp;
 	u64			avg_idle;
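
To make the changelog formula concrete, here is a minimal user-space
sketch (illustration only, not kernel code: the function name, the
capacity constant and the sample values are assumptions made for this
example) of how avg_irq scales the per-class utilization sum:

/* sketch.c: toy version of the changelog formula, not part of the patch */
#include <stdio.h>

#define MAX_CAPACITY	1024UL	/* plays the role of SCHED_CAPACITY_SCALE */

static unsigned long cpu_utilization(unsigned long avg_irq,
				     unsigned long sum_avg_rq)
{
	/*
	 * The class signals run on rq_clock_task, i.e. without irq time,
	 * so they are scaled by the fraction of time left once irq time
	 * is taken out; the irq utilization itself is then added:
	 *   avg_irq + (1 - avg_irq / max capacity) * \Sum avg_rq
	 */
	return avg_irq + ((MAX_CAPACITY - avg_irq) * sum_avg_rq) / MAX_CAPACITY;
}

int main(void)
{
	/* irq uses ~5% of the CPU; tasks use 50% of the remaining time */
	printf("util = %lu / %lu\n", cpu_utilization(51, 512), MAX_CAPACITY);
	return 0;
}

With avg_irq = 51 (~5% of 1024) and \Sum avg_rq = 512, this prints
util = 537 / 1024, i.e. roughly the expected 5% + 95% * 50% of the CPU.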
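
The two back-to-back ___update_load_sum() calls in update_irq_load_avg()
implement the "decay first, then accrue" trick described in the comment:
only the amount of irq time since the last update is known, not its
position in the window, so it is pessimistically placed at the very end.
The following toy user-space model shows the same two-step shape
(illustration only: it uses a naive per-period geometric decay rather
than the kernel's segmented PELT sums, and every name and constant here
is made up):

/* toy_pelt.c: not the kernel's PELT implementation */
#include <stdio.h>

#define PERIOD_US	1000ULL
#define DECAY		0.97857206	/* ~0.5^(1/32), like PELT's y */

struct toy_avg {
	double sum;			/* decayed sum of busy time, in us */
	unsigned long long last;	/* time of the last update, in us */
};

static void toy_update(struct toy_avg *a, unsigned long long now, int busy)
{
	unsigned long long delta = now - a->last;

	/* age the past contribution, then accrue this segment if busy */
	for (unsigned long long p = 0; p < delta / PERIOD_US; p++)
		a->sum *= DECAY;
	if (busy)
		a->sum += (double)delta;
	a->last = now;
}

static void toy_update_irq(struct toy_avg *a, unsigned long long clock,
			   unsigned long long running)
{
	/* same shape as the two ___update_load_sum() calls in the patch */
	toy_update(a, clock - running, 0);	/* decay normal context time */
	toy_update(a, clock, 1);		/* accrue the irq time */
}

int main(void)
{
	struct toy_avg irq = { 0.0, 0ULL };

	/* 10 ms in normal context, then 2 ms accounted to irq context */
	toy_update_irq(&irq, 12000ULL, 2000ULL);
	printf("toy irq sum = %.0f us\n", irq.sum);
	return 0;
}

Decaying on clock - running first ages the signal over the normal-context
span before the irq span is added with full weight, which is exactly the
pessimistic "the interrupt happened just before the update" assumption.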