From patchwork Tue Aug 24 11:29:40 2021
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 502648
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org,
    linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 1/7] sched, fair: use __schedstat_set() in set_next_entity()
Date: Tue, 24 Aug 2021 11:29:40 +0000
Message-Id: <20210824112946.9324-2-laoar.shao@gmail.com>
In-Reply-To: <20210824112946.9324-1-laoar.shao@gmail.com>
References: <20210824112946.9324-1-laoar.shao@gmail.com>

schedstat_enabled() has already been checked, so we can use
__schedstat_set() directly.

Signed-off-by: Yafang Shao
Acked-by: Mel Gorman
Cc: Alison Chaiken
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5aa3cfd15a2e..422426768b84 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4502,7 +4502,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	 */
 	if (schedstat_enabled() &&
 	    rq_of(cfs_rq)->cfs.load.weight >= 2*se->load.weight) {
-		schedstat_set(se->statistics.slice_max,
+		__schedstat_set(se->statistics.slice_max,
 			max((u64)schedstat_val(se->statistics.slice_max),
 			se->sum_exec_runtime - se->prev_sum_exec_runtime));
 	}
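
The difference between the two macros is only the static-branch check. A
minimal sketch of their definitions, paraphrased from kernel/sched/stats.h
around this series' baseline (exact formatting in the tree differs):

    /* Guarded form: the store happens only when schedstats is enabled. */
    #define schedstat_set(var, val)                 \
            do {                                    \
                    if (schedstat_enabled()) {      \
                            var = (val);            \
                    }                               \
            } while (0)

    /*
     * Unguarded form: callers are expected to have checked
     * schedstat_enabled() themselves, as set_next_entity() does here.
     */
    #define __schedstat_set(var, val) do { var = (val); } while (0)

Since set_next_entity() only reaches this statement after testing
schedstat_enabled() in the enclosing if-condition, switching to the
unguarded form simply removes a redundant static-branch test.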
From patchwork Tue Aug 24 11:29:41 2021
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 502063
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org,
    linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 2/7] sched: make struct sched_statistics independent of fair sched class
Date: Tue, 24 Aug 2021 11:29:41 +0000
Message-Id: <20210824112946.9324-3-laoar.shao@gmail.com>
In-Reply-To: <20210824112946.9324-1-laoar.shao@gmail.com>
References: <20210824112946.9324-1-laoar.shao@gmail.com>

If we want to use the schedstats facility to trace other sched classes,
we have to make it independent of the fair sched class. struct
sched_statistics holds the scheduler statistics of a task_struct or a
task_group, so we can move it into struct task_struct and struct
task_group to achieve that goal.

After this patch, schedstats are organized as follows:

    struct task_struct {
        ...
        struct sched_statistics stats;
        ...
        struct sched_entity *se;
        struct sched_rt_entity *rt;
        ...
    };

    struct task_group {                          |---> stats[0] : of CPU0
        ...                                      |
        struct sched_statistics **stats; --------|---> stats[1] : of CPU1
        ...                                      |
                                                 |---> stats[n] : of CPUn
    #ifdef CONFIG_FAIR_GROUP_SCHED
        struct sched_entity **se;
    #endif
    #ifdef CONFIG_RT_GROUP_SCHED
        struct sched_rt_entity **rt_se;
    #endif
        ...
    };

The sched_statistics members may be modified frequently when schedstats
is enabled. To avoid disturbing unrelated data that might otherwise
share a cacheline with them, struct sched_statistics is defined as
cacheline aligned.

Because this patch changes a core scheduler struct, I verified its
performance impact with 'perf bench sched pipe', as suggested by Mel.
The results below are in usecs/op:

                                  Before    After
    kernel.sched_schedstats=0     ~5.6      ~5.6
    kernel.sched_schedstats=1     ~5.7      ~5.7

[The numbers differ slightly from the previous version because my old
test machine was destroyed and I had to use a different one.]

Almost no impact on scheduler performance.

No functional change.

[lkp@intel.com: reported build failure in earlier version]
Signed-off-by: Yafang Shao
Acked-by: Mel Gorman
Cc: kernel test robot
Cc: Alison Chaiken
---
 include/linux/sched.h    |   5 +-
 kernel/sched/core.c      |  24 ++++----
 kernel/sched/deadline.c  |   4 +-
 kernel/sched/debug.c     |  90 +++++++++++++++--------------
 kernel/sched/fair.c      | 121 ++++++++++++++++++++++++++++-----------
 kernel/sched/rt.c        |   4 +-
 kernel/sched/sched.h     |   3 +
 kernel/sched/stats.h     |  55 ++++++++++++++++++
 kernel/sched/stop_task.c |   4 +-
 9 files changed, 212 insertions(+), 98 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index f43fb7a32a9c..39c29eae1af9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -521,7 +521,7 @@ struct sched_statistics {
 	u64	nr_wakeups_passive;
 	u64	nr_wakeups_idle;
 #endif
-};
+} ____cacheline_aligned;

 struct sched_entity {
 	/* For load-balancing: */
@@ -537,8 +537,6 @@ struct sched_entity {

 	u64	nr_migrations;

-	struct sched_statistics	statistics;
-
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	int	depth;
 	struct sched_entity	*parent;
@@ -775,6 +773,7 @@ struct task_struct {
 	unsigned int	rt_priority;

 	const struct sched_class	*sched_class;
+	struct sched_statistics		stats;
 	struct sched_entity		se;
 	struct sched_rt_entity		rt;
 	struct sched_dl_entity		dl;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 21d633971fcf..38bb7afb396c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3489,11 +3489,11 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 #ifdef CONFIG_SMP
 	if (cpu == rq->cpu) {
 		__schedstat_inc(rq->ttwu_local);
-		__schedstat_inc(p->se.statistics.nr_wakeups_local);
+		__schedstat_inc(p->stats.nr_wakeups_local);
 	} else {
 		struct sched_domain *sd;

-		__schedstat_inc(p->se.statistics.nr_wakeups_remote);
+		__schedstat_inc(p->stats.nr_wakeups_remote);
 		rcu_read_lock();
 		for_each_domain(rq->cpu, sd) {
 			if (cpumask_test_cpu(cpu, sched_domain_span(sd))) {
@@ -3505,14 +3505,14 @@ ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
 	}

 	if (wake_flags & WF_MIGRATED)
-		__schedstat_inc(p->se.statistics.nr_wakeups_migrate);
+		__schedstat_inc(p->stats.nr_wakeups_migrate);
 #endif /* CONFIG_SMP */

 	__schedstat_inc(rq->ttwu_count);
-	__schedstat_inc(p->se.statistics.nr_wakeups);
+	__schedstat_inc(p->stats.nr_wakeups);

 	if (wake_flags & WF_SYNC)
-		__schedstat_inc(p->se.statistics.nr_wakeups_sync);
+		__schedstat_inc(p->stats.nr_wakeups_sync);
 }

 /*
@@ -4196,7 +4196,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)

 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
-	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
+	memset(&p->stats, 0, sizeof(p->stats));
 #endif

 	RB_CLEAR_NODE(&p->dl.rb_node);
@@ -9608,9 +9608,9 @@ void normalize_rt_tasks(void)
 			continue;

 		p->se.exec_start = 0;
-		schedstat_set(p->se.statistics.wait_start,  0);
-		schedstat_set(p->se.statistics.sleep_start, 0);
-		schedstat_set(p->se.statistics.block_start, 0);
+		schedstat_set(p->stats.wait_start,  0);
+		schedstat_set(p->stats.sleep_start, 0);
+		schedstat_set(p->stats.block_start, 0);

 		if (!dl_task(p) && !rt_task(p)) {
 			/*
@@ -9700,6 +9700,7 @@ static void sched_free_group(struct task_group *tg)
 {
 	free_fair_sched_group(tg);
 	free_rt_sched_group(tg);
+	free_tg_schedstats(tg);
 	autogroup_free(tg);
 	kmem_cache_free(task_group_cache, tg);
 }
@@ -9719,6 +9720,9 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;

+	if (!alloc_tg_schedstats(tg))
+		goto err;
+
 	alloc_uclamp_sched_group(tg, parent);

 	return tg;
@@ -10456,7 +10460,7 @@ static int cpu_cfs_stat_show(struct seq_file *sf, void *v)
 		int i;

 		for_each_possible_cpu(i)
-			ws += schedstat_val(tg->se[i]->statistics.wait_sum);
+			ws += schedstat_val(tg->stats[i]->wait_sum);

 		seq_printf(sf, "wait_sum %llu\n", ws);
 	}
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e94314633b39..51dd30990042 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1265,8 +1265,8 @@ static void update_curr_dl(struct rq *rq)
 		return;
 	}

-	schedstat_set(curr->se.statistics.exec_max,
-		      max(curr->se.statistics.exec_max, delta_exec));
+	schedstat_set(curr->stats.exec_max,
+		      max(curr->stats.exec_max, delta_exec));

 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 49716228efb4..4cfee2aa1a2d 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -442,9 +442,11 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	struct sched_entity *se = tg->se[cpu];

 #define P(F)		SEQ_printf(m, "  .%-30s: %lld\n", #F, (long long)F)
-#define P_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld\n", #F, (long long)schedstat_val(F))
+#define P_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld\n", \
+		"se->statistics."#F, (long long)schedstat_val(tg->stats[cpu]->F))
 #define PN(F)		SEQ_printf(m, "  .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)F))
-#define PN_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)schedstat_val(F)))
+#define PN_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", \
+		"se->statistics."#F, SPLIT_NS((long long)schedstat_val(tg->stats[cpu]->F)))

 	if (!se)
 		return;
@@ -454,16 +456,16 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	PN(se->sum_exec_runtime);

 	if (schedstat_enabled()) {
-		PN_SCHEDSTAT(se->statistics.wait_start);
-		PN_SCHEDSTAT(se->statistics.sleep_start);
-		PN_SCHEDSTAT(se->statistics.block_start);
-		PN_SCHEDSTAT(se->statistics.sleep_max);
-		PN_SCHEDSTAT(se->statistics.block_max);
-		PN_SCHEDSTAT(se->statistics.exec_max);
-		PN_SCHEDSTAT(se->statistics.slice_max);
-		PN_SCHEDSTAT(se->statistics.wait_max);
-		PN_SCHEDSTAT(se->statistics.wait_sum);
-		P_SCHEDSTAT(se->statistics.wait_count);
+		PN_SCHEDSTAT(wait_start);
+		PN_SCHEDSTAT(sleep_start);
+		PN_SCHEDSTAT(block_start);
+		PN_SCHEDSTAT(sleep_max);
+		PN_SCHEDSTAT(block_max);
+		PN_SCHEDSTAT(exec_max);
+		PN_SCHEDSTAT(slice_max);
+		PN_SCHEDSTAT(wait_max);
+		PN_SCHEDSTAT(wait_sum);
+		P_SCHEDSTAT(wait_count);
 	}

 	P(se->load.weight);
@@ -530,9 +532,9 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		p->prio);

 	SEQ_printf(m, "%9Ld.%06ld %9Ld.%06ld %9Ld.%06ld",
-		SPLIT_NS(schedstat_val_or_zero(p->se.statistics.wait_sum)),
+		SPLIT_NS(schedstat_val_or_zero(p->stats.wait_sum)),
 		SPLIT_NS(p->se.sum_exec_runtime),
-		SPLIT_NS(schedstat_val_or_zero(p->se.statistics.sum_sleep_runtime)));
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)));

 #ifdef CONFIG_NUMA_BALANCING
 	SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
@@ -948,8 +950,8 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 		"---------------------------------------------------------"
 		"----------\n");

-#define P_SCHEDSTAT(F)  __PS(#F, schedstat_val(p->F))
-#define PN_SCHEDSTAT(F) __PSN(#F, schedstat_val(p->F))
+#define P_SCHEDSTAT(F)  __PS("se.statistics."#F, schedstat_val(p->stats.F))
+#define PN_SCHEDSTAT(F) __PSN("se.statistics."#F, schedstat_val(p->stats.F))

 	PN(se.exec_start);
 	PN(se.vruntime);
@@ -962,33 +964,33 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	if (schedstat_enabled()) {
 		u64 avg_atom, avg_per_cpu;

-		PN_SCHEDSTAT(se.statistics.sum_sleep_runtime);
-		PN_SCHEDSTAT(se.statistics.wait_start);
-		PN_SCHEDSTAT(se.statistics.sleep_start);
-		PN_SCHEDSTAT(se.statistics.block_start);
-		PN_SCHEDSTAT(se.statistics.sleep_max);
-		PN_SCHEDSTAT(se.statistics.block_max);
-		PN_SCHEDSTAT(se.statistics.exec_max);
-		PN_SCHEDSTAT(se.statistics.slice_max);
-		PN_SCHEDSTAT(se.statistics.wait_max);
-		PN_SCHEDSTAT(se.statistics.wait_sum);
-		P_SCHEDSTAT(se.statistics.wait_count);
-		PN_SCHEDSTAT(se.statistics.iowait_sum);
-		P_SCHEDSTAT(se.statistics.iowait_count);
-		P_SCHEDSTAT(se.statistics.nr_migrations_cold);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_affine);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_running);
-		P_SCHEDSTAT(se.statistics.nr_failed_migrations_hot);
-		P_SCHEDSTAT(se.statistics.nr_forced_migrations);
-		P_SCHEDSTAT(se.statistics.nr_wakeups);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_sync);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_migrate);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_local);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_remote);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_affine);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_affine_attempts);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_passive);
-		P_SCHEDSTAT(se.statistics.nr_wakeups_idle);
+		PN_SCHEDSTAT(sum_sleep_runtime);
+		PN_SCHEDSTAT(wait_start);
+		PN_SCHEDSTAT(sleep_start);
+		PN_SCHEDSTAT(block_start);
+		PN_SCHEDSTAT(sleep_max);
+		PN_SCHEDSTAT(block_max);
+		PN_SCHEDSTAT(exec_max);
+		PN_SCHEDSTAT(slice_max);
+		PN_SCHEDSTAT(wait_max);
+		PN_SCHEDSTAT(wait_sum);
+		P_SCHEDSTAT(wait_count);
+		PN_SCHEDSTAT(iowait_sum);
+		P_SCHEDSTAT(iowait_count);
+		P_SCHEDSTAT(nr_migrations_cold);
+		P_SCHEDSTAT(nr_failed_migrations_affine);
+		P_SCHEDSTAT(nr_failed_migrations_running);
+		P_SCHEDSTAT(nr_failed_migrations_hot);
+		P_SCHEDSTAT(nr_forced_migrations);
+		P_SCHEDSTAT(nr_wakeups);
+		P_SCHEDSTAT(nr_wakeups_sync);
+		P_SCHEDSTAT(nr_wakeups_migrate);
+		P_SCHEDSTAT(nr_wakeups_local);
+		P_SCHEDSTAT(nr_wakeups_remote);
+		P_SCHEDSTAT(nr_wakeups_affine);
+		P_SCHEDSTAT(nr_wakeups_affine_attempts);
+		P_SCHEDSTAT(nr_wakeups_passive);
+		P_SCHEDSTAT(nr_wakeups_idle);

 		avg_atom = p->se.sum_exec_runtime;
 		if (nr_switches)
@@ -1054,7 +1056,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 void proc_sched_set_task(struct task_struct *p)
 {
 #ifdef CONFIG_SCHEDSTATS
-	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
+	memset(&p->stats, 0, sizeof(p->stats));
 #endif
 }
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 422426768b84..7cb802431cfe 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -819,6 +819,41 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_SMP */

+#ifdef CONFIG_FAIR_GROUP_SCHED
+static inline void
+__schedstats_from_sched_entity(struct sched_entity *se,
+			       struct sched_statistics **stats)
+{
+	struct task_group *tg;
+	struct task_struct *p;
+	struct cfs_rq *cfs;
+	int cpu;
+
+	if (entity_is_task(se)) {
+		p = task_of(se);
+		*stats = &p->stats;
+	} else {
+		cfs = group_cfs_rq(se);
+		tg = cfs->tg;
+		cpu = cpu_of(rq_of(cfs));
+		*stats = tg->stats[cpu];
+	}
+}
+
+#else
+
+static inline void
+__schedstats_from_sched_entity(struct sched_entity *se,
+			       struct sched_statistics **stats)
+{
+	struct task_struct *p;
+
+	p = task_of(se);
+	*stats = &p->stats;
+}
+
+#endif
+
 /*
  * Update the current task's runtime statistics.
  */
@@ -826,6 +861,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
 	u64 now = rq_clock_task(rq_of(cfs_rq));
+	struct sched_statistics *stats = NULL;
 	u64 delta_exec;

 	if (unlikely(!curr))
@@ -837,8 +873,11 @@ static void update_curr(struct cfs_rq *cfs_rq)

 	curr->exec_start = now;

-	schedstat_set(curr->statistics.exec_max,
-		      max(delta_exec, curr->statistics.exec_max));
+	if (schedstat_enabled()) {
+		__schedstats_from_sched_entity(curr, &stats);
+		__schedstat_set(stats->exec_max,
+				max(delta_exec, stats->exec_max));
+	}

 	curr->sum_exec_runtime += delta_exec;
 	schedstat_add(cfs_rq->exec_clock, delta_exec);
@@ -865,40 +904,46 @@ static void update_curr_fair(struct rq *rq)
 static inline void
 update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct sched_statistics *stats = NULL;
 	u64 wait_start, prev_wait_start;

 	if (!schedstat_enabled())
 		return;

+	__schedstats_from_sched_entity(se, &stats);
+
 	wait_start = rq_clock(rq_of(cfs_rq));
-	prev_wait_start = schedstat_val(se->statistics.wait_start);
+	prev_wait_start = schedstat_val(stats->wait_start);

 	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
 	    likely(wait_start > prev_wait_start))
 		wait_start -= prev_wait_start;

-	__schedstat_set(se->statistics.wait_start, wait_start);
+	__schedstat_set(stats->wait_start, wait_start);
 }

 static inline void
 update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	struct task_struct *p;
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
 	u64 delta;

 	if (!schedstat_enabled())
 		return;

+	__schedstats_from_sched_entity(se, &stats);
+
 	/*
 	 * When the sched_schedstat changes from 0 to 1, some sched se
 	 * maybe already in the runqueue, the se->statistics.wait_start
 	 * will be 0.So it will let the delta wrong. We need to avoid this
 	 * scenario.
 	 */
-	if (unlikely(!schedstat_val(se->statistics.wait_start)))
+	if (unlikely(!schedstat_val(stats->wait_start)))
 		return;

-	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(se->statistics.wait_start);
+	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(stats->wait_start);

 	if (entity_is_task(se)) {
 		p = task_of(se);
@@ -908,30 +953,33 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 			 * time stamp can be adjusted to accumulate wait time
 			 * prior to migration.
 			 */
-			__schedstat_set(se->statistics.wait_start, delta);
+			__schedstat_set(stats->wait_start, delta);
 			return;
 		}
 		trace_sched_stat_wait(p, delta);
 	}

-	__schedstat_set(se->statistics.wait_max,
-			max(schedstat_val(se->statistics.wait_max), delta));
-	__schedstat_inc(se->statistics.wait_count);
-	__schedstat_add(se->statistics.wait_sum, delta);
-	__schedstat_set(se->statistics.wait_start, 0);
+	__schedstat_set(stats->wait_max,
+			max(schedstat_val(stats->wait_max), delta));
+	__schedstat_inc(stats->wait_count);
+	__schedstat_add(stats->wait_sum, delta);
+	__schedstat_set(stats->wait_start, 0);
 }

 static inline void
 update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct sched_statistics *stats = NULL;
 	struct task_struct *tsk = NULL;
 	u64 sleep_start, block_start;

 	if (!schedstat_enabled())
 		return;

-	sleep_start = schedstat_val(se->statistics.sleep_start);
-	block_start = schedstat_val(se->statistics.block_start);
+	__schedstats_from_sched_entity(se, &stats);
+
+	sleep_start = schedstat_val(stats->sleep_start);
+	block_start = schedstat_val(stats->block_start);

 	if (entity_is_task(se))
 		tsk = task_of(se);
@@ -942,11 +990,11 @@ update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		if ((s64)delta < 0)
 			delta = 0;

-		if (unlikely(delta > schedstat_val(se->statistics.sleep_max)))
-			__schedstat_set(se->statistics.sleep_max, delta);
+		if (unlikely(delta > schedstat_val(stats->sleep_max)))
+			__schedstat_set(stats->sleep_max, delta);

-		__schedstat_set(se->statistics.sleep_start, 0);
-		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
+		__schedstat_set(stats->sleep_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);

 		if (tsk) {
 			account_scheduler_latency(tsk, delta >> 10, 1);
@@ -959,16 +1007,16 @@ update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		if ((s64)delta < 0)
 			delta = 0;

-		if (unlikely(delta > schedstat_val(se->statistics.block_max)))
-			__schedstat_set(se->statistics.block_max, delta);
+		if (unlikely(delta > schedstat_val(stats->block_max)))
+			__schedstat_set(stats->block_max, delta);

-		__schedstat_set(se->statistics.block_start, 0);
-		__schedstat_add(se->statistics.sum_sleep_runtime, delta);
+		__schedstat_set(stats->block_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);

 		if (tsk) {
 			if (tsk->in_iowait) {
-				__schedstat_add(se->statistics.iowait_sum, delta);
-				__schedstat_inc(se->statistics.iowait_count);
+				__schedstat_add(stats->iowait_sum, delta);
+				__schedstat_inc(stats->iowait_count);
 				trace_sched_stat_iowait(tsk, delta);
 			}

@@ -1030,10 +1078,10 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		/* XXX racy against TTWU */
 		state = READ_ONCE(tsk->__state);
 		if (state & TASK_INTERRUPTIBLE)
-			__schedstat_set(se->statistics.sleep_start,
+			__schedstat_set(tsk->stats.sleep_start,
 					rq_clock(rq_of(cfs_rq)));
 		if (state & TASK_UNINTERRUPTIBLE)
-			__schedstat_set(se->statistics.block_start,
+			__schedstat_set(tsk->stats.block_start,
 					rq_clock(rq_of(cfs_rq)));
 	}
 }
@@ -4478,6 +4526,8 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
 static void
 set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct sched_statistics *stats = NULL;
+
 	clear_buddies(cfs_rq, se);

 	/* 'current' is not kept within the tree. */
@@ -4502,8 +4552,9 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	 */
 	if (schedstat_enabled() &&
 	    rq_of(cfs_rq)->cfs.load.weight >= 2*se->load.weight) {
-		__schedstat_set(se->statistics.slice_max,
-			max((u64)schedstat_val(se->statistics.slice_max),
+		__schedstats_from_sched_entity(se, &stats);
+		__schedstat_set(stats->slice_max,
+			max((u64)schedstat_val(stats->slice_max),
 			se->sum_exec_runtime - se->prev_sum_exec_runtime));
 	}

@@ -5993,12 +6044,12 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
 	if (sched_feat(WA_WEIGHT) && target == nr_cpumask_bits)
 		target = wake_affine_weight(sd, p, this_cpu, prev_cpu, sync);

-	schedstat_inc(p->se.statistics.nr_wakeups_affine_attempts);
+	schedstat_inc(p->stats.nr_wakeups_affine_attempts);
 	if (target == nr_cpumask_bits)
 		return prev_cpu;

 	schedstat_inc(sd->ttwu_move_affine);
-	schedstat_inc(p->se.statistics.nr_wakeups_affine);
+	schedstat_inc(p->stats.nr_wakeups_affine);
 	return target;
 }

@@ -7802,7 +7853,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	if (!cpumask_test_cpu(env->dst_cpu, p->cpus_ptr)) {
 		int cpu;

-		schedstat_inc(p->se.statistics.nr_failed_migrations_affine);
+		schedstat_inc(p->stats.nr_failed_migrations_affine);

 		env->flags |= LBF_SOME_PINNED;

@@ -7836,7 +7887,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	env->flags &= ~LBF_ALL_PINNED;

 	if (task_running(env->src_rq, p)) {
-		schedstat_inc(p->se.statistics.nr_failed_migrations_running);
+		schedstat_inc(p->stats.nr_failed_migrations_running);
 		return 0;
 	}

@@ -7858,12 +7909,12 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
 	    env->sd->nr_balance_failed > env->sd->cache_nice_tries) {
 		if (tsk_cache_hot == 1) {
 			schedstat_inc(env->sd->lb_hot_gained[env->idle]);
-			schedstat_inc(p->se.statistics.nr_forced_migrations);
+			schedstat_inc(p->stats.nr_forced_migrations);
 		}
 		return 1;
 	}

-	schedstat_inc(p->se.statistics.nr_failed_migrations_hot);
+	schedstat_inc(p->stats.nr_failed_migrations_hot);
 	return 0;
 }
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 3daf42a0f462..95a7c3ad2dc3 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1009,8 +1009,8 @@ static void update_curr_rt(struct rq *rq)
 	if (unlikely((s64)delta_exec <= 0))
 		return;

-	schedstat_set(curr->se.statistics.exec_max,
-		      max(curr->se.statistics.exec_max, delta_exec));
+	schedstat_set(curr->stats.exec_max,
+		      max(curr->stats.exec_max, delta_exec));

 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e6347c88c467..6a4541d7d659 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -389,6 +389,9 @@ struct cfs_bandwidth {
 struct task_group {
 	struct cgroup_subsys_state css;

+	/* schedstats of this group on each CPU */
+	struct sched_statistics **stats;
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	/* schedulable entities of this group on each CPU */
 	struct sched_entity **se;
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index d8f8eb0c655b..e6905e369c5d 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -41,6 +41,7 @@ rq_sched_info_dequeue(struct rq *rq, unsigned long long delta)
 #define   schedstat_val_or_zero(var)	((schedstat_enabled()) ? (var) : 0)

 #else /* !CONFIG_SCHEDSTATS: */
+
 static inline void rq_sched_info_arrive  (struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_dequeue(struct rq *rq, unsigned long long delta) { }
 static inline void rq_sched_info_depart  (struct rq *rq, unsigned long long delta) { }
@@ -53,8 +54,62 @@ static inline void rq_sched_info_depart (struct rq *rq, unsigned long long delt
 # define   schedstat_set(var, val)	do { } while (0)
 # define   schedstat_val(var)		0
 # define   schedstat_val_or_zero(var)	0
+
 #endif /* CONFIG_SCHEDSTATS */

+#if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SCHEDSTATS)
+static inline void free_tg_schedstats(struct task_group *tg)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		if (tg->stats)
+			kfree(tg->stats[i]);
+	}
+
+	kfree(tg->stats);
+}
+
+static inline int alloc_tg_schedstats(struct task_group *tg)
+{
+	struct sched_statistics *stats;
+	int i;
+
+	/*
+	 * This memory should be allocated whatever schedstat_enabled() or
+	 * not.
+	 */
+	tg->stats = kcalloc(nr_cpu_ids, sizeof(stats), GFP_KERNEL);
+	if (!tg->stats)
+		return 0;
+
+	for_each_possible_cpu(i) {
+		stats = kzalloc_node(sizeof(struct sched_statistics),
+				     GFP_KERNEL, cpu_to_node(i));
+		if (!stats)
+			return 0;
+
+		tg->stats[i] = stats;
+	}
+
+	return 1;
+}
+
+#else
+
+static inline void free_tg_schedstats(struct task_group *tg)
+{
+
+}
+
+static inline int alloc_tg_schedstats(struct task_group *tg)
+{
+	return 1;
+}
+
+#endif
+
 #ifdef CONFIG_PSI
 /*
  * PSI tracks state that persists across sleeps, such as iowaits and
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index f988ebe3febb..0b165a25f22f 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -78,8 +78,8 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
 	if (unlikely((s64)delta_exec < 0))
 		delta_exec = 0;

-	schedstat_set(curr->se.statistics.exec_max,
-		      max(curr->se.statistics.exec_max, delta_exec));
+	schedstat_set(curr->stats.exec_max,
+		      max(curr->stats.exec_max, delta_exec));

 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
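
A note on the ____cacheline_aligned annotation this patch adds: it comes
from include/linux/cache.h and (on SMP) expands to an alignment attribute
of SMP_CACHE_BYTES, so the statistics block starts on its own cache line.
A minimal sketch of the mechanism, illustration only:

    /* Roughly what include/linux/cache.h provides on SMP builds: */
    #define ____cacheline_aligned __attribute__((__aligned__(SMP_CACHE_BYTES)))

    /*
     * With the type aligned like this, frequently written counters no
     * longer share a cache line with whatever field happens to follow
     * them in task_struct, so a hot writer does not invalidate
     * unrelated data on other CPUs (false sharing).
     */
    struct hot_stats {
            u64 hits;
            u64 misses;
            /* ... */
    } ____cacheline_aligned;

Whether the extra padding is worth the footprint is exactly what the
'perf bench sched pipe' numbers in the commit message are meant to check.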
From patchwork Tue Aug 24 11:29:42 2021
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 502647
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org,
    linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 3/7] sched: make schedstats helpers independent of fair sched class
Date: Tue, 24 Aug 2021 11:29:42 +0000
Message-Id: <20210824112946.9324-4-laoar.shao@gmail.com>
In-Reply-To: <20210824112946.9324-1-laoar.shao@gmail.com>
References: <20210824112946.9324-1-laoar.shao@gmail.com>

The original prototype of the schedstats helpers is

    update_stats_wait_*(struct cfs_rq *cfs_rq, struct sched_entity *se)

The cfs_rq in these helpers is only used to get the rq_clock, and the se
is used to get both the struct sched_statistics and the struct
task_struct. To make these helpers usable by all sched classes, we can
pass the rq, sched_statistics and task_struct directly. The new helpers
are

    update_stats_wait_*(struct rq *rq, struct task_struct *p,
                        struct sched_statistics *stats)

which are independent of the fair sched class.

To avoid growing vmlinux too much and to avoid introducing overhead when
!schedstat_enabled(), out-of-line helpers that are only called after
schedstat_enabled() has been checked are also introduced, as suggested by
Mel. These helpers live in sched/stats.c:

    __update_stats_wait_*(struct rq *rq, struct task_struct *p,
                          struct sched_statistics *stats)

The size of vmlinux is as follows:

                        Before       After
    Size of vmlinux     826308552    826304640

The size is slightly smaller because some functions are no longer inlined
after the change.

I also compared scheduler performance with 'perf bench sched pipe', as
suggested by Mel. The results (usecs/op):

                                  Before    After
    kernel.sched_schedstats=0     ~5.6      ~5.6
    kernel.sched_schedstats=1     ~5.7      ~5.7

[The numbers differ slightly from the previous version because my old
test machine was destroyed and I had to use a different one.]

Almost no difference.

No functional change.

[lkp@intel.com: reported build failure in prev version]
Signed-off-by: Yafang Shao
Acked-by: Mel Gorman
Cc: kernel test robot
Cc: Alison Chaiken
---
 kernel/sched/fair.c  | 133 +++++++------------------------------------
 kernel/sched/stats.c | 103 +++++++++++++++++++++++++++++++++
 kernel/sched/stats.h |  34 +++++++++++
 3 files changed, 156 insertions(+), 114 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7cb802431cfe..1324000c78bb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -902,32 +902,28 @@ static void update_curr_fair(struct rq *rq)
 }

 static inline void
-update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
+update_stats_wait_start_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	struct sched_statistics *stats = NULL;
-	u64 wait_start, prev_wait_start;
+	struct task_struct *p = NULL;

 	if (!schedstat_enabled())
 		return;

 	__schedstats_from_sched_entity(se, &stats);

-	wait_start = rq_clock(rq_of(cfs_rq));
-	prev_wait_start = schedstat_val(stats->wait_start);
+	if (entity_is_task(se))
+		p = task_of(se);

-	if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
-	    likely(wait_start > prev_wait_start))
-		wait_start -= prev_wait_start;
+	__update_stats_wait_start(rq_of(cfs_rq), p, stats);

-	__schedstat_set(stats->wait_start, wait_start);
 }

 static inline void
-update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
+update_stats_wait_end_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	struct sched_statistics *stats = NULL;
 	struct task_struct *p = NULL;
-	u64 delta;

 	if (!schedstat_enabled())
 		return;
@@ -943,105 +939,34 @@ update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
 	if (unlikely(!schedstat_val(stats->wait_start)))
 		return;

-	delta = rq_clock(rq_of(cfs_rq)) - schedstat_val(stats->wait_start);
-
-	if (entity_is_task(se)) {
+	if (entity_is_task(se))
 		p = task_of(se);
-		if (task_on_rq_migrating(p)) {
-			/*
-			 * Preserve migrating task's wait time so wait_start
-			 * time stamp can be adjusted to accumulate wait time
-			 * prior to migration.
-			 */
-			__schedstat_set(stats->wait_start, delta);
-			return;
-		}
-		trace_sched_stat_wait(p, delta);
-	}

-	__schedstat_set(stats->wait_max,
-			max(schedstat_val(stats->wait_max), delta));
-	__schedstat_inc(stats->wait_count);
-	__schedstat_add(stats->wait_sum, delta);
-	__schedstat_set(stats->wait_start, 0);
+	__update_stats_wait_end(rq_of(cfs_rq), p, stats);
 }

 static inline void
-update_stats_enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
+update_stats_enqueue_sleeper_fair(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
 	struct sched_statistics *stats = NULL;
 	struct task_struct *tsk = NULL;
-	u64 sleep_start, block_start;

 	if (!schedstat_enabled())
 		return;

 	__schedstats_from_sched_entity(se, &stats);

-	sleep_start = schedstat_val(stats->sleep_start);
-	block_start = schedstat_val(stats->block_start);
-
 	if (entity_is_task(se))
 		tsk = task_of(se);

-	if (sleep_start) {
-		u64 delta = rq_clock(rq_of(cfs_rq)) - sleep_start;
-
-		if ((s64)delta < 0)
-			delta = 0;
-
-		if (unlikely(delta > schedstat_val(stats->sleep_max)))
-			__schedstat_set(stats->sleep_max, delta);
-
-		__schedstat_set(stats->sleep_start, 0);
-		__schedstat_add(stats->sum_sleep_runtime, delta);
-
-		if (tsk) {
-			account_scheduler_latency(tsk, delta >> 10, 1);
-			trace_sched_stat_sleep(tsk, delta);
-		}
-	}
-	if (block_start) {
-		u64 delta = rq_clock(rq_of(cfs_rq)) - block_start;
-
-		if ((s64)delta < 0)
-			delta = 0;
-
-		if (unlikely(delta > schedstat_val(stats->block_max)))
-			__schedstat_set(stats->block_max, delta);
-
-		__schedstat_set(stats->block_start, 0);
-		__schedstat_add(stats->sum_sleep_runtime, delta);
-
-		if (tsk) {
-			if (tsk->in_iowait) {
-				__schedstat_add(stats->iowait_sum, delta);
-				__schedstat_inc(stats->iowait_count);
-				trace_sched_stat_iowait(tsk, delta);
-			}
-
-			trace_sched_stat_blocked(tsk, delta);
-
-			/*
-			 * Blocking time is in units of nanosecs, so shift by
-			 * 20 to get a milliseconds-range estimation of the
-			 * amount of time that the task spent sleeping:
-			 */
-			if (unlikely(prof_on == SLEEP_PROFILING)) {
-				profile_hits(SLEEP_PROFILING,
-					     (void *)get_wchan(tsk),
-					     delta >> 20);
-			}
-			account_scheduler_latency(tsk, delta >> 10, 0);
-		}
-	}
+	__update_stats_enqueue_sleeper(rq_of(cfs_rq), tsk, stats);
 }

 /*
  * Task is being enqueued - update stats:
  */
 static inline void
-update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+update_stats_enqueue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	if (!schedstat_enabled())
 		return;
@@ -1051,14 +976,14 @@ update_stats_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * a dequeue/enqueue event is a NOP)
 	 */
 	if (se != cfs_rq->curr)
-		update_stats_wait_start(cfs_rq, se);
+		update_stats_wait_start_fair(cfs_rq, se);

 	if (flags & ENQUEUE_WAKEUP)
-		update_stats_enqueue_sleeper(cfs_rq, se);
+		update_stats_enqueue_sleeper_fair(cfs_rq, se);
 }

 static inline void
-update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
+update_stats_dequeue_fair(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 {
 	if (!schedstat_enabled())
@@ -1069,7 +994,7 @@ update_stats_dequeue(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	 * waiting task:
 	 */
 	if (se != cfs_rq->curr)
-		update_stats_wait_end(cfs_rq, se);
+		update_stats_wait_end_fair(cfs_rq, se);

 	if ((flags & DEQUEUE_SLEEP) && entity_is_task(se)) {
 		struct task_struct *tsk = task_of(se);
@@ -4273,26 +4198,6 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)

 static void check_enqueue_throttle(struct cfs_rq *cfs_rq);

-static inline void check_schedstat_required(void)
-{
-#ifdef CONFIG_SCHEDSTATS
-	if (schedstat_enabled())
-		return;
-
-	/* Force schedstat enabled if a dependent tracepoint is active */
-	if (trace_sched_stat_wait_enabled()    ||
-	    trace_sched_stat_sleep_enabled()   ||
-	    trace_sched_stat_iowait_enabled()  ||
-	    trace_sched_stat_blocked_enabled() ||
-	    trace_sched_stat_runtime_enabled()) {
-		printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, "
-			     "stat_blocked and stat_runtime require the "
-			     "kernel parameter schedstats=enable or "
-			     "kernel.sched_schedstats=1\n");
-	}
-#endif
-}
-
 static inline bool cfs_bandwidth_used(void);

 /*
@@ -4366,7 +4271,7 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 		place_entity(cfs_rq, se, 0);

 	check_schedstat_required();
-	update_stats_enqueue(cfs_rq, se, flags);
+	update_stats_enqueue_fair(cfs_rq, se, flags);
 	check_spread(cfs_rq, se);
 	if (!curr)
 		__enqueue_entity(cfs_rq, se);
@@ -4450,7 +4355,7 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
 	update_load_avg(cfs_rq, se, UPDATE_TG);
 	se_update_runnable(se);

-	update_stats_dequeue(cfs_rq, se, flags);
+	update_stats_dequeue_fair(cfs_rq, se, flags);

 	clear_buddies(cfs_rq, se);

@@ -4537,7 +4442,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 		 * a CPU. So account for the time it spent waiting on the
 		 * runqueue.
 		 */
-		update_stats_wait_end(cfs_rq, se);
+		update_stats_wait_end_fair(cfs_rq, se);
 		__dequeue_entity(cfs_rq, se);
 		update_load_avg(cfs_rq, se, UPDATE_TG);
 	}
@@ -4637,7 +4542,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
 	check_spread(cfs_rq, prev);

 	if (prev->on_rq) {
-		update_stats_wait_start(cfs_rq, prev);
+		update_stats_wait_start_fair(cfs_rq, prev);
 		/* Put 'current' back into the tree. */
 		__enqueue_entity(cfs_rq, prev);
 		/* in !on_rq case, update occurred at dequeue */
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 3f93fc3b5648..b2542f4d3192 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -4,6 +4,109 @@
  */
 #include "sched.h"

+void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
+			       struct sched_statistics *stats)
+{
+	u64 wait_start, prev_wait_start;
+
+	wait_start = rq_clock(rq);
+	prev_wait_start = schedstat_val(stats->wait_start);
+
+	if (p && likely(wait_start > prev_wait_start))
+		wait_start -= prev_wait_start;
+
+	__schedstat_set(stats->wait_start, wait_start);
+}
+
+void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
+			     struct sched_statistics *stats)
+{
+	u64 delta = rq_clock(rq) - schedstat_val(stats->wait_start);
+
+	if (p) {
+		if (task_on_rq_migrating(p)) {
+			/*
+			 * Preserve migrating task's wait time so wait_start
+			 * time stamp can be adjusted to accumulate wait time
+			 * prior to migration.
+			 */
+			__schedstat_set(stats->wait_start, delta);
+
+			return;
+		}
+
+		trace_sched_stat_wait(p, delta);
+	}
+
+	__schedstat_set(stats->wait_max,
+			max(schedstat_val(stats->wait_max), delta));
+	__schedstat_inc(stats->wait_count);
+	__schedstat_add(stats->wait_sum, delta);
+	__schedstat_set(stats->wait_start, 0);
+}
+
+void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
+				    struct sched_statistics *stats)
+{
+	u64 sleep_start, block_start;
+
+	sleep_start = schedstat_val(stats->sleep_start);
+	block_start = schedstat_val(stats->block_start);
+
+	if (sleep_start) {
+		u64 delta = rq_clock(rq) - sleep_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(stats->sleep_max)))
+			__schedstat_set(stats->sleep_max, delta);
+
+		__schedstat_set(stats->sleep_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);
+
+		if (p) {
+			account_scheduler_latency(p, delta >> 10, 1);
+			trace_sched_stat_sleep(p, delta);
+		}
+	}
+
+	if (block_start) {
+		u64 delta = rq_clock(rq) - block_start;
+
+		if ((s64)delta < 0)
+			delta = 0;
+
+		if (unlikely(delta > schedstat_val(stats->block_max)))
+			__schedstat_set(stats->block_max, delta);
+
+		__schedstat_set(stats->block_start, 0);
+		__schedstat_add(stats->sum_sleep_runtime, delta);
+
+		if (p) {
+			if (p->in_iowait) {
+				__schedstat_add(stats->iowait_sum, delta);
+				__schedstat_inc(stats->iowait_count);
+				trace_sched_stat_iowait(p, delta);
+			}
+
+			trace_sched_stat_blocked(p, delta);
+
+			/*
+			 * Blocking time is in units of nanosecs, so shift by
+			 * 20 to get a milliseconds-range estimation of the
+			 * amount of time that the task spent sleeping:
+			 */
+			if (unlikely(prof_on == SLEEP_PROFILING)) {
+				profile_hits(SLEEP_PROFILING,
+					     (void *)get_wchan(p),
+					     delta >> 20);
+			}
+			account_scheduler_latency(p, delta >> 10, 0);
+		}
+	}
+}
+
 /*
  * Current schedstat API version.
  *
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index e6905e369c5d..9ecd81b91f26 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -2,6 +2,8 @@

 #ifdef CONFIG_SCHEDSTATS

+extern struct static_key_false sched_schedstats;
+
 /*
  * Expects runqueue lock to be held for atomicity of update
  */
@@ -40,6 +42,33 @@ rq_sched_info_dequeue(struct rq *rq, unsigned long long delta)
 #define   schedstat_val(var)		(var)
 #define   schedstat_val_or_zero(var)	((schedstat_enabled()) ? (var) : 0)

+void __update_stats_wait_start(struct rq *rq, struct task_struct *p,
+			       struct sched_statistics *stats);
+
+void __update_stats_wait_end(struct rq *rq, struct task_struct *p,
+			     struct sched_statistics *stats);
+void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
+				    struct sched_statistics *stats);
+
+static inline void
+check_schedstat_required(void)
+{
+	if (schedstat_enabled())
+		return;
+
+	/* Force schedstat enabled if a dependent tracepoint is active */
+	if (trace_sched_stat_wait_enabled()    ||
+	    trace_sched_stat_sleep_enabled()   ||
+	    trace_sched_stat_iowait_enabled()  ||
+	    trace_sched_stat_blocked_enabled() ||
+	    trace_sched_stat_runtime_enabled()) {
+		printk_deferred_once("Scheduler tracepoints stat_sleep, stat_iowait, "
+				     "stat_blocked and stat_runtime require the "
+				     "kernel parameter schedstats=enable or "
+				     "kernel.sched_schedstats=1\n");
+	}
+}
+
 #else /* !CONFIG_SCHEDSTATS: */

 static inline void rq_sched_info_arrive  (struct rq *rq, unsigned long long delta) { }
@@ -55,6 +84,11 @@ static inline void rq_sched_info_depart (struct rq *rq, unsigned long long delt
 # define   schedstat_val(var)		0
 # define   schedstat_val_or_zero(var)	0

+# define __update_stats_wait_start(rq, p, stats)       do { } while (0)
+# define __update_stats_wait_end(rq, p, stats)         do { } while (0)
+# define __update_stats_enqueue_sleeper(rq, p, stats)  do { } while (0)
+# define check_schedstat_required()                    do { } while (0)
+
 #endif /* CONFIG_SCHEDSTATS */

 #if defined(CONFIG_FAIR_GROUP_SCHED) && defined(CONFIG_SCHEDSTATS)
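
With the helpers factored out this way, a non-fair class only needs an
rq, a task_struct, and the task's sched_statistics to record wait times;
no cfs_rq or sched_entity is involved. An illustrative sketch of how a
caller in another sched class could look (hypothetical wrapper name, not
taken from this series; the remaining patches of the series add the real
callers for other classes):

    /* Hypothetical wrapper: an RT task starts waiting on a runqueue. */
    static inline void
    update_stats_wait_start_rt(struct rq *rq, struct task_struct *p)
    {
            if (!schedstat_enabled())
                    return;

            /*
             * p->stats lives in task_struct since patch 2/7, so the
             * generic out-of-line helper works for any sched class.
             */
            __update_stats_wait_start(rq, p, &p->stats);
    }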
From patchwork Tue Aug 24 11:29:43 2021
X-Patchwork-Submitter: Yafang Shao
X-Patchwork-Id: 502061
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de,
    juri.lelli@redhat.com, vincent.guittot@linaro.org,
    dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
    bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org,
    linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 4/7] sched: make the output of schedstats independent of fair sched class
Date: Tue, 24 Aug 2021 11:29:43 +0000
Message-Id: <20210824112946.9324-5-laoar.shao@gmail.com>
In-Reply-To: <20210824112946.9324-1-laoar.shao@gmail.com>
References: <20210824112946.9324-1-laoar.shao@gmail.com>

The per-CPU stats can be shown via /proc/sched_debug, which includes the
per-CPU schedstats of each task group. Currently these per-CPU schedstats
are only shown for the fair sched class. If we want to support other
sched classes, we have to make this output independent of the fair sched
class.

Signed-off-by: Yafang Shao
Cc: Mel Gorman
Cc: Alison Chaiken
---
 kernel/sched/debug.c | 70 +++++++++++++++++++++++++++++++-------------
 1 file changed, 50 insertions(+), 20 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4cfee2aa1a2d..705987aed658 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -442,11 +442,7 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	struct sched_entity *se = tg->se[cpu];

 #define P(F)		SEQ_printf(m, "  .%-30s: %lld\n", #F, (long long)F)
-#define P_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld\n", \
-		"se->statistics."#F, (long long)schedstat_val(tg->stats[cpu]->F))
 #define PN(F)		SEQ_printf(m, "  .%-30s: %lld.%06ld\n", #F, SPLIT_NS((long long)F))
-#define PN_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", \
-		"se->statistics."#F, SPLIT_NS((long long)schedstat_val(tg->stats[cpu]->F)))

 	if (!se)
 		return;
@@ -454,20 +450,6 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	PN(se->exec_start);
 	PN(se->vruntime);
 	PN(se->sum_exec_runtime);
-
-	if (schedstat_enabled()) {
-		PN_SCHEDSTAT(wait_start);
-		PN_SCHEDSTAT(sleep_start);
-		PN_SCHEDSTAT(block_start);
-		PN_SCHEDSTAT(sleep_max);
-		PN_SCHEDSTAT(block_max);
-		PN_SCHEDSTAT(exec_max);
-		PN_SCHEDSTAT(slice_max);
-		PN_SCHEDSTAT(wait_max);
-		PN_SCHEDSTAT(wait_sum);
-		P_SCHEDSTAT(wait_count);
-	}
-
 	P(se->load.weight);
 #ifdef CONFIG_SMP
 	P(se->avg.load_avg);
@@ -475,13 +457,60 @@ static void print_cfs_group_stats(struct seq_file *m, int cpu, struct task_group
 	P(se->avg.runnable_avg);
 #endif

-#undef PN_SCHEDSTAT
 #undef PN
-#undef P_SCHEDSTAT
 #undef P
 }
 #endif

+#if defined(CONFIG_FAIR_GROUP_SCHED) || defined(CONFIG_RT_GROUP_SCHED)
+struct tg_schedstats {
+	struct seq_file *m;
+	int cpu;
+};
+
+static int tg_show_schedstats(struct task_group *tg, void *data)
+{
+	struct tg_schedstats *p = data;
+	struct seq_file *m = p->m;
+	int cpu = p->cpu;
+
+#define P_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld\n", \
+		"se->statistics."#F, (long long)schedstat_val(tg->stats[cpu]->F))
+#define PN_SCHEDSTAT(F)	SEQ_printf(m, "  .%-30s: %lld.%06ld\n", \
+		"se->statistics."#F, SPLIT_NS((long long)schedstat_val(tg->stats[cpu]->F)))
+
+	PN_SCHEDSTAT(wait_start);
+	PN_SCHEDSTAT(sleep_start);
+	PN_SCHEDSTAT(block_start);
+	PN_SCHEDSTAT(sleep_max);
+	PN_SCHEDSTAT(block_max);
+	PN_SCHEDSTAT(exec_max);
+	PN_SCHEDSTAT(slice_max);
+	PN_SCHEDSTAT(wait_max);
+	PN_SCHEDSTAT(wait_sum);
+	P_SCHEDSTAT(wait_count);
+
+#undef P_SCHEDSTAT
+#undef PN_SCHEDSTAT
+
+return 0;
+}
+
+static void print_task_group_stats(struct seq_file *m, int cpu)
+{
+	struct tg_schedstats data = {
+		.m = m,
+		.cpu = cpu,
+	};
+
+	if (!schedstat_enabled())
+		return;
+
+	walk_tg_tree(tg_show_schedstats, tg_nop, &data);
+}
+#endif
+
 #ifdef CONFIG_CGROUP_SCHED
 static DEFINE_SPINLOCK(sched_debug_lock);
 static char group_path[PATH_MAX];
@@ -756,6 +785,7 @@ do { \
 	print_cfs_stats(m, cpu);
 	print_rt_stats(m, cpu);
 	print_dl_stats(m, cpu);
+	print_task_group_stats(m, cpu);

 	print_rq(m, rq, cpu);
 	SEQ_printf(m, "\n");
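
For context on the walk_tg_tree() call used above: it is an existing
scheduler helper that walks the whole task_group hierarchy starting at
root_task_group, invoking one visitor on the way down and one on the way
back up; tg_nop is the stock no-op visitor. Paraphrasing the contract
from kernel/sched/core.c (a sketch, not a verbatim copy):

    typedef int (*tg_visitor)(struct task_group *, void *);

    /*
     * walk_tg_tree(down, up, data) is shorthand for
     * walk_tg_tree_from(&root_task_group, down, up, data).
     * Every descendant group is visited; a visitor returning
     * nonzero aborts the walk and propagates that value.
     */

So tg_show_schedstats() is invoked once per task group, printing that
group's counters for the CPU whose runqueue is being dumped.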
From patchwork Tue Aug 24 11:29:44 2021
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 5/7] sched: introduce task block time in schedstats
Date: Tue, 24 Aug 2021 11:29:44 +0000
Message-Id: <20210824112946.9324-6-laoar.shao@gmail.com>
Currently schedstats has sum_sleep_runtime and iowait_sum, but there
is no metric to show how long a task stays in D state. Once a task is
in D state, it is blocked in the kernel, for example waiting for a
mutex. The D state is more frequent than iowait and more critical than
S state, so it is worth adding a metric to measure it.

Signed-off-by: Yafang Shao
Cc: Mel Gorman
Cc: Alison Chaiken
---
 include/linux/sched.h | 2 ++
 kernel/sched/debug.c  | 6 ++++--
 kernel/sched/stats.c  | 1 +
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 39c29eae1af9..7888ad8384ba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -502,6 +502,8 @@ struct sched_statistics {
 	u64			block_start;
 	u64			block_max;
+	s64			sum_block_runtime;
+
 	u64			exec_max;
 	u64			slice_max;

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 705987aed658..5c6bc3f373f0 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -560,10 +560,11 @@ print_task(struct seq_file *m, struct rq *rq, struct task_struct *p)
 		(long long)(p->nvcsw + p->nivcsw),
 		p->prio);

-	SEQ_printf(m, "%9Ld.%06ld %9Ld.%06ld %9Ld.%06ld",
+	SEQ_printf(m, "%9lld.%06ld %9lld.%06ld %9lld.%06ld %9lld.%06ld",
 		SPLIT_NS(schedstat_val_or_zero(p->stats.wait_sum)),
 		SPLIT_NS(p->se.sum_exec_runtime),
-		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)));
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_sleep_runtime)),
+		SPLIT_NS(schedstat_val_or_zero(p->stats.sum_block_runtime)));

 #ifdef CONFIG_NUMA_BALANCING
 	SEQ_printf(m, " %d %d", task_node(p), task_numa_group_id(p));
@@ -995,6 +996,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	u64 avg_atom, avg_per_cpu;

 	PN_SCHEDSTAT(sum_sleep_runtime);
+	PN_SCHEDSTAT(sum_block_runtime);
 	PN_SCHEDSTAT(wait_start);
 	PN_SCHEDSTAT(sleep_start);
 	PN_SCHEDSTAT(block_start);

diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index b2542f4d3192..21fae41c06f5 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -82,6 +82,7 @@ void __update_stats_enqueue_sleeper(struct rq *rq, struct task_struct *p,
 		__schedstat_set(stats->block_start, 0);
 		__schedstat_add(stats->sum_sleep_runtime, delta);
+		__schedstat_add(stats->sum_block_runtime, delta);

 		if (p) {
 			if (p->in_iowait) {
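As a feel for how the new counter might be read back, here is a
minimal userspace sketch. The "se.statistics.sum_block_runtime" field
name and millisecond formatting are assumed from the /proc/[pid]/sched
output shown later in this series; adjust if your kernel prints a
different layout.

#include <stdio.h>

/*
 * Sketch: read se.statistics.sum_block_runtime (total time the task
 * spent in D state, in milliseconds) from /proc/<pid>/sched.
 * Field name and units are assumptions based on this patch series.
 */
int main(int argc, char **argv)
{
        char path[64], line[256];
        double ms;
        FILE *fp;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }

        snprintf(path, sizeof(path), "/proc/%s/sched", argv[1]);
        fp = fopen(path, "r");
        if (!fp) {
                perror(path);
                return 1;
        }

        while (fgets(line, sizeof(line), fp)) {
                if (sscanf(line, " se.statistics.sum_block_runtime : %lf", &ms) == 1)
                        printf("pid %s spent %.6f ms total in D state\n",
                               argv[1], ms);
        }

        fclose(fp);
        return 0;
}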
From patchwork Tue Aug 24 11:29:45 2021
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 6/7] sched, rt: support sched_stat_runtime tracepoint for RT sched class
Date: Tue, 24 Aug 2021 11:29:45 +0000
Message-Id: <20210824112946.9324-7-laoar.shao@gmail.com>

The runtime of an RT task is already accounted, so we only need to add
a tracepoint. One difference between a fair task and an RT task is
that an RT task has no vruntime, so to reuse the sched_stat_runtime
tracepoint, '0' is passed as the vruntime of an RT task. The output of
this tracepoint for an RT task is as follows:

  stress-9748 [039] d.h. 113.519352: sched_stat_runtime: comm=stress pid=9748 runtime=997573 [ns] vruntime=0 [ns]
  stress-9748 [039] d.h. 113.520352: sched_stat_runtime: comm=stress pid=9748 runtime=997627 [ns] vruntime=0 [ns]
  stress-9748 [039] d.h. 113.521352: sched_stat_runtime: comm=stress pid=9748 runtime=998203 [ns] vruntime=0 [ns]

Signed-off-by: Yafang Shao
Cc: Mel Gorman
Cc: Alison Chaiken
---
 kernel/sched/rt.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 95a7c3ad2dc3..5d251112e51c 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1012,6 +1012,8 @@ static void update_curr_rt(struct rq *rq)
 	schedstat_set(curr->stats.exec_max,
 		      max(curr->stats.exec_max, delta_exec));

+	trace_sched_stat_runtime(curr, delta_exec, 0);
+
 	curr->se.sum_exec_runtime += delta_exec;
 	account_group_exec_runtime(curr, delta_exec);
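To watch the new events, the tracepoint can be enabled through
tracefs. A minimal C sketch follows, assuming tracefs is mounted at
/sys/kernel/tracing (older setups use /sys/kernel/debug/tracing) and
root privileges; write_str() is a small hypothetical helper, not a
kernel or library API.

#include <stdio.h>
#include <string.h>

/* Helper: write a short string to a tracefs control file. */
static int write_str(const char *path, const char *val)
{
        FILE *fp = fopen(path, "w");

        if (!fp)
                return -1;
        fputs(val, fp);
        fclose(fp);
        return 0;
}

/*
 * Sketch: enable sched_stat_runtime and dump matching events from
 * trace_pipe.  With this patch applied, RT tasks appear too, with
 * vruntime=0 as described above.  Runs until interrupted.
 */
int main(void)
{
        char line[1024];
        FILE *pipe;

        if (write_str("/sys/kernel/tracing/events/sched/sched_stat_runtime/enable", "1")) {
                perror("enable sched_stat_runtime");
                return 1;
        }

        pipe = fopen("/sys/kernel/tracing/trace_pipe", "r");
        if (!pipe) {
                perror("open trace_pipe");
                return 1;
        }

        while (fgets(line, sizeof(line), pipe)) {
                if (strstr(line, "sched_stat_runtime"))
                        fputs(line, stdout);
        }

        fclose(pipe);
        return 0;
}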
From patchwork Tue Aug 24 11:29:46 2021
From: Yafang Shao
To: mingo@redhat.com, peterz@infradead.org, mgorman@suse.de, juri.lelli@redhat.com, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, bristot@redhat.com, achaiken@aurora.tech
Cc: lkp@intel.com, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org, Yafang Shao
Subject: [PATCH v3 7/7] sched, rt: support schedstats for RT sched class
Date: Tue, 24 Aug 2021 11:29:46 +0000
Message-Id: <20210824112946.9324-8-laoar.shao@gmail.com>

We want to measure the latency of RT tasks in our production
environment with the schedstats facility, but currently schedstats is
only supported for the fair sched class. This patch enables it for the
RT sched class as well.

After making struct sched_statistics and its helpers independent of
the fair sched class, we can easily use the schedstats facility for
the RT sched class too. The schedstat usage in the RT sched class is
similar to that in the fair sched class, for example:

                  fair                        RT
  enqueue         update_stats_enqueue_fair   update_stats_enqueue_rt
  dequeue         update_stats_dequeue_fair   update_stats_dequeue_rt
  put_prev_task   update_stats_wait_start     update_stats_wait_start_rt
  set_next_task   update_stats_wait_end       update_stats_wait_end_rt

The user can get the schedstats information in the same way as for the
fair sched class, i.e. from /proc/[pid]/sched. The output of an RT
task's schedstats is as follows:

$ cat /proc/227408/sched
...
se.statistics.sum_sleep_runtime              : 402284.476088
se.statistics.sum_block_runtime              : 402272.475254
se.statistics.wait_start                     : 0.000000
se.statistics.sleep_start                    : 0.000000
se.statistics.block_start                    : 46903176.965093
se.statistics.sleep_max                      : 12.000834
se.statistics.block_max                      : 1446.963040
se.statistics.exec_max                       : 0.463806
se.statistics.slice_max                      : 0.000000
se.statistics.wait_max                       : 146.656326
se.statistics.wait_sum                       : 81741.944704
se.statistics.wait_count                     : 1004
se.statistics.iowait_sum                     : 77875.399958
se.statistics.iowait_count                   : 142
se.statistics.nr_migrations_cold             : 0
se.statistics.nr_failed_migrations_affine    : 0
se.statistics.nr_failed_migrations_running   : 0
se.statistics.nr_failed_migrations_hot       : 0
se.statistics.nr_forced_migrations           : 0
se.statistics.nr_wakeups                     : 1003
se.statistics.nr_wakeups_sync                : 0
se.statistics.nr_wakeups_migrate             : 0
se.statistics.nr_wakeups_local               : 351
se.statistics.nr_wakeups_remote              : 652
se.statistics.nr_wakeups_affine              : 0
se.statistics.nr_wakeups_affine_attempts     : 0
se.statistics.nr_wakeups_passive             : 0
se.statistics.nr_wakeups_idle                : 0
...
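One way to turn these fields into a single latency figure is to derive
the mean scheduling wait per delay event. A small sketch, assuming the
se.statistics.wait_sum (milliseconds) and se.statistics.wait_count
lines shown above:

#include <stdio.h>

/*
 * Sketch: compute the mean scheduler wait latency of a task from the
 * se.statistics.wait_sum (ms) and se.statistics.wait_count fields of
 * /proc/<pid>/sched, as printed by this series.
 */
int main(int argc, char **argv)
{
        char path[64], line[256];
        double wait_sum = 0.0;
        long wait_count = 0;
        FILE *fp;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }

        snprintf(path, sizeof(path), "/proc/%s/sched", argv[1]);
        fp = fopen(path, "r");
        if (!fp) {
                perror(path);
                return 1;
        }

        while (fgets(line, sizeof(line), fp)) {
                sscanf(line, " se.statistics.wait_sum : %lf", &wait_sum);
                sscanf(line, " se.statistics.wait_count : %ld", &wait_count);
        }
        fclose(fp);

        if (wait_count)
                printf("mean wait latency: %.6f ms over %ld waits\n",
                       wait_sum / wait_count, wait_count);
        else
                printf("no waits recorded\n");
        return 0;
}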
The sched:sched_stat_{wait, sleep, iowait, blocked} tracepoints can be
used to trace RT tasks as well. The output of these tracepoints for an
RT task is as follows:

- blocked:
  kworker/u113:0-230817 [000] d... 47197.452940: sched_stat_blocked: comm=stress pid=227408 delay=4096 [ns]

- iowait:
  kworker/3:1-222921 [003] d... 47492.211521: sched_stat_iowait: comm=stress pid=227408 delay=905187613 [ns]

- wait:
  stress-227400 [003] d... 47202.283021: sched_stat_wait: comm=stress pid=227408 delay=67958890 [ns]

- runtime:
  stress-227408 [003] d... 47202.283027: sched_stat_runtime: comm=stress pid=227408 runtime=7815 [ns] vruntime=0 [ns]

- sleep:
  sleep-244868 [022] dN.. 50070.614833: sched_stat_sleep: comm=sleep.sh pid=244300 delay=1001131165 [ns]
  sleep-244869 [022] dN.. 50071.616222: sched_stat_sleep: comm=sleep.sh pid=244300 delay=1001100486 [ns]
  sleep-244879 [022] dN.. 50072.617628: sched_stat_sleep: comm=sleep.sh pid=244300 delay=1001137198 [ns]
  [ In sleep.sh, it sleeps for 1 second each time. ]

[lkp@intel.com: reported build failure in earlier version]
Signed-off-by: Yafang Shao
Cc: kernel test robot
Cc: Mel Gorman
Cc: Alison Chaiken
---
 kernel/sched/rt.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 141 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 5d251112e51c..446164597232 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1273,6 +1273,129 @@ static void __delist_rt_entity(struct sched_rt_entity *rt_se, struct rt_prio_arr
 	rt_se->on_list = 0;
 }

+#ifdef CONFIG_RT_GROUP_SCHED
+static inline void
+__schedstats_from_sched_rt_entity(struct sched_rt_entity *rt_se,
+				  struct sched_statistics **stats)
+{
+	struct task_struct *p;
+	struct task_group *tg;
+	struct rt_rq *rt_rq;
+	int cpu;
+
+	if (rt_entity_is_task(rt_se)) {
+		p = rt_task_of(rt_se);
+		*stats = &p->stats;
+	} else {
+		rt_rq = group_rt_rq(rt_se);
+		tg = rt_rq->tg;
+		cpu = cpu_of(rq_of_rt_rq(rt_rq));
+		*stats = tg->stats[cpu];
+	}
+}
+
+#else
+
+static inline void
+__schedstats_from_sched_rt_entity(struct sched_rt_entity *rt_se,
+				  struct sched_statistics **stats)
+{
+	struct task_struct *p;
+
+	p = rt_task_of(rt_se);
+	*stats = &p->stats;
+}
+
+#endif
+
+static inline void
+update_stats_wait_start_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstats_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_wait_start(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_sleeper_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstats_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_enqueue_sleeper(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_enqueue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	if (!schedstat_enabled())
+		return;
+
+	if (flags & ENQUEUE_WAKEUP)
+		update_stats_enqueue_sleeper_rt(rt_rq, rt_se);
+}
+
+static inline void
+update_stats_wait_end_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se)
+{
+	struct sched_statistics *stats = NULL;
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	__schedstats_from_sched_rt_entity(rt_se, &stats);
+
+	__update_stats_wait_end(rq_of_rt_rq(rt_rq), p, stats);
+}
+
+static inline void
+update_stats_dequeue_rt(struct rt_rq *rt_rq, struct sched_rt_entity *rt_se,
+			int flags)
+{
+	struct task_struct *p = NULL;
+
+	if (!schedstat_enabled())
+		return;
+
+	if (rt_entity_is_task(rt_se))
+		p = rt_task_of(rt_se);
+
+	if ((flags & DEQUEUE_SLEEP) && p) {
+		unsigned int state;
+
+		state = READ_ONCE(p->__state);
+		if (state & TASK_INTERRUPTIBLE)
+			__schedstat_set(p->stats.sleep_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+
+		if (state & TASK_UNINTERRUPTIBLE)
+			__schedstat_set(p->stats.block_start,
+					rq_clock(rq_of_rt_rq(rt_rq)));
+	}
+}
+
 static void __enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rt_rq *rt_rq = rt_rq_of_se(rt_se);
@@ -1346,6 +1469,8 @@ static void enqueue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);

+	update_stats_enqueue_rt(rt_rq_of_se(rt_se), rt_se, flags);
+
 	dequeue_rt_stack(rt_se, flags);
 	for_each_sched_rt_entity(rt_se)
 		__enqueue_rt_entity(rt_se, flags);
@@ -1356,6 +1481,8 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)
 {
 	struct rq *rq = rq_of_rt_se(rt_se);

+	update_stats_dequeue_rt(rt_rq_of_se(rt_se), rt_se, flags);
+
 	dequeue_rt_stack(rt_se, flags);

 	for_each_sched_rt_entity(rt_se) {
@@ -1378,6 +1505,9 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
 	if (flags & ENQUEUE_WAKEUP)
 		rt_se->timeout = 0;

+	check_schedstat_required();
+	update_stats_wait_start_rt(rt_rq_of_se(rt_se), rt_se);
+
 	enqueue_rt_entity(rt_se, flags);

 	if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
@@ -1578,7 +1708,12 @@ static void check_preempt_curr_rt(struct rq *rq, struct task_struct *p, int flag

 static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
 	p->se.exec_start = rq_clock_task(rq);
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_end_rt(rt_rq, rt_se);

 	/* The running task is never eligible for pushing */
 	dequeue_pushable_task(rq, p);
@@ -1652,6 +1787,12 @@ static struct task_struct *pick_next_task_rt(struct rq *rq)

 static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+	struct rt_rq *rt_rq = &rq->rt;
+
+	if (on_rt_rq(&p->rt))
+		update_stats_wait_start_rt(rt_rq, rt_se);
+
 	update_curr_rt(rq);

 	update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 1);