From patchwork Tue Feb 7 05:10:43 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651630 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1886CC636D4 for ; Tue, 7 Feb 2023 05:02:04 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230172AbjBGFCB (ORCPT ); Tue, 7 Feb 2023 00:02:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60568 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230132AbjBGFBv (ORCPT ); Tue, 7 Feb 2023 00:01:51 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 795FC2B084; Mon, 6 Feb 2023 21:01:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746103; x=1707282103; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=Ks6Wx9Pqszz2YvypOx73VVPt8pwQu0X0WqROcgHGPFg=; b=WZdoyckUVTPTLZ2SvcnaawYxkYhotga/TCp5v4N7a2+bLqWFEwZmIjlk WoWar9WG52JEq7WxCX/2bNAesLGgDxBYbocBFWpoy4y/xMiu/MEq/PjWe cCO0WmtY9GowklBREZ399+wQ2MVjzrTqQLKmN4XWXX9PU2CLnV5OREUN0 VhPmsYet5dPMJFBVEslZ1wfq4kFG+XPwEXk3JqvukrC+rDHvUQC7PWE2+ rF/PVCsHvI4lRJUXluceKGroS4y0UpdI3OIGzuEvyggnP+Z/DxSRFoVw/ kNEmureFao+KWih2O8kBXEhAeQICNolkr+1CWB/9lGWPUC+zJeNXsVZ/x w==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625749" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625749" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657693" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657693" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:40 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 02/24] sched: Add interfaces for IPC classes Date: Mon, 6 Feb 2023 21:10:43 -0800 Message-Id: <20230207051105.11575-3-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Add the interfaces that architectures shall implement to convey the data to support IPC classes. arch_update_ipcc() updates the IPC classification of the current task as given by hardware. arch_get_ipcc_score() provides a performance score for a given IPC class when placed on a specific CPU. Higher scores indicate higher performance. When a driver or equivalent enablement code has configured the necessary hardware to support IPC classes, it should call sched_enable_ipc_classes() to notify the scheduler that it can start using IPC classes data. The number of classes and the score of each class of task are determined by hardware. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * Clarified the properties of the IPC score: abstract, and linear. It can normalized when needed. (Ionela) * Selected a better default IPC score. (Ionela) * Removed arch_has_ipc_classes(). It is not suitable for hardware that is not ready to support IPC classes after boot. (Lukasz) * Added a new sched_enable_ipc_classes() interface that drivers or enablement code can call when ready to support IPC classes. (Lukasz) Changes since v1: * Shortened the names of the IPCC interfaces (PeterZ): sched_task_classes_enabled >> sched_ipcc_enabled arch_has_task_classes >> arch_has_ipc_classes arch_update_task_class >> arch_update_ipcc arch_get_task_class_score >> arch_get_ipcc_score * Removed smt_siblings_idle argument from arch_update_ipcc(). (PeterZ) --- include/linux/sched/topology.h | 6 ++++ kernel/sched/sched.h | 66 ++++++++++++++++++++++++++++++++++ kernel/sched/topology.c | 9 +++++ 3 files changed, 81 insertions(+) diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index 816df6cc444e..5b084d3c9ad1 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -280,4 +280,10 @@ static inline int task_node(const struct task_struct *p) return cpu_to_node(task_cpu(p)); } +#ifdef CONFIG_IPC_CLASSES +extern void sched_enable_ipc_classes(void); +#else +static inline void sched_enable_ipc_classes(void) { } +#endif + #endif /* _LINUX_SCHED_TOPOLOGY_H */ diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 1072502976df..0a9c3024326d 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2532,6 +2532,72 @@ void arch_scale_freq_tick(void) } #endif +#ifdef CONFIG_IPC_CLASSES +DECLARE_STATIC_KEY_FALSE(sched_ipcc); + +static inline bool sched_ipcc_enabled(void) +{ + return static_branch_unlikely(&sched_ipcc); +} + +#ifndef arch_update_ipcc +/** + * arch_update_ipcc() - Update the IPC class of the current task + * @curr: The current task + * + * Request that the IPC classification of @curr is updated. + * + * Returns: none + */ +static __always_inline +void arch_update_ipcc(struct task_struct *curr) +{ +} +#endif + +#ifndef arch_get_ipcc_score + +#define SCHED_IPCC_SCORE_SCALE (1L << SCHED_FIXEDPOINT_SHIFT) +/** + * arch_get_ipcc_score() - Get the IPC score of a class of task + * @ipcc: The IPC class + * @cpu: A CPU number + * + * The IPC performance scores reflects (but it is not identical to) the number + * of instructions retired per cycle for a given IPC class. It is a linear and + * abstract metric. Higher scores reflect better performance. + * + * The IPC score can be normalized with respect to the class, i, with the + * highest IPC score on the CPU, c, with highest performance: + * + * IPC(i, c) + * ------------------------------------ * SCHED_IPCC_SCORE_SCALE + * max(IPC(i, c) : (i, c)) + * + * Scheduling schemes that want to use the IPC score along with other + * normalized metrics for scheduling (e.g., CPU capacity) may need to normalize + * it. + * + * Other scheduling schemes (e.g., asym_packing) do not need normalization. + * + * Returns the performance score of an IPC class, @ipcc, when running on @cpu. + * Error when either @ipcc or @cpu are invalid. + */ +static __always_inline +unsigned long arch_get_ipcc_score(unsigned short ipcc, int cpu) +{ + return SCHED_IPCC_SCORE_SCALE; +} +#endif +#else /* CONFIG_IPC_CLASSES */ + +#define arch_get_ipcc_score(ipcc, cpu) (-EINVAL) +#define arch_update_ipcc(curr) + +static inline bool sched_ipcc_enabled(void) { return false; } + +#endif /* CONFIG_IPC_CLASSES */ + #ifndef arch_scale_freq_capacity /** * arch_scale_freq_capacity - get the frequency scale factor of a given CPU. diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index d93c3379e901..8380bb7f0cd9 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -670,6 +670,15 @@ DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); +#ifdef CONFIG_IPC_CLASSES +DEFINE_STATIC_KEY_FALSE(sched_ipcc); + +void sched_enable_ipc_classes(void) +{ + static_branch_enable_cpuslocked(&sched_ipcc); +} +#endif + static void update_top_cache_domain(int cpu) { struct sched_domain_shared *sds = NULL; From patchwork Tue Feb 7 05:10:45 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651629 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3CBDDC636CD for ; Tue, 7 Feb 2023 05:02:21 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230223AbjBGFCS (ORCPT ); Tue, 7 Feb 2023 00:02:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230152AbjBGFBx (ORCPT ); Tue, 7 Feb 2023 00:01:53 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CAF374C0A; Mon, 6 Feb 2023 21:01:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746109; x=1707282109; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=iDkGE/mIiTsNfeFOnUGlfr0oeLSL2C0UHOku/Uu39Qo=; b=ep/FwozEgPY3+GMxNFBQI7DFYoIlSGA70lZmUH/CqwfRw4+z9+c6iLY0 qw3YJw+vjvQqbe8YZ3NucEpnWmJDcS3Q1osfUPoU4+75i7tKjzhgNcnxo y9VJWF+4v/zKyJOqHcO4PQnyEHZigA5nMwYUuE1pbuEx7l9fM4PMzGuBc ipo0hQ8/lbMeugQxE5DAoepBYKL2GCo+1Pjs6OaOaOUGgMLeS7wzaNPmo HiMbNsbxb9FtzqTP2T0MUfoEdnAMo0v8AhzCurAn7bhez6Cf0KAsoaZT6 05BFJy61P30OuxiJgsxtNWYgw38oyKW1EwLDPz7Ae9fh0RoL4ej2TXL2K A==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625771" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625771" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657699" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657699" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:40 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 04/24] sched/core: Add user_tick as argument to scheduler_tick() Date: Mon, 6 Feb 2023 21:10:45 -0800 Message-Id: <20230207051105.11575-5-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Differentiate between user and kernel ticks so that the scheduler updates the IPC class of the current task during the former. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * Corrected error in the changeset description: the IPC class of the current task is updated at user tick. (Dietmar) Changes since v1: * None --- include/linux/sched.h | 2 +- kernel/sched/core.c | 2 +- kernel/time/timer.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 98f84f90a01d..10c6abdc3465 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -293,7 +293,7 @@ enum { TASK_COMM_LEN = 16, }; -extern void scheduler_tick(void); +extern void scheduler_tick(bool user_tick); #define MAX_SCHEDULE_TIMEOUT LONG_MAX diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ed1549d28090..39d218a2f243 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5555,7 +5555,7 @@ static inline u64 cpu_resched_latency(struct rq *rq) { return 0; } * This function gets called by the timer code, with HZ frequency. * We call it with interrupts disabled. */ -void scheduler_tick(void) +void scheduler_tick(bool user_tick) { int cpu = smp_processor_id(); struct rq *rq = cpu_rq(cpu); diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 63a8ce7177dd..e15e24105891 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -2073,7 +2073,7 @@ void update_process_times(int user_tick) if (in_irq()) irq_work_tick(); #endif - scheduler_tick(); + scheduler_tick(user_tick); if (IS_ENABLED(CONFIG_POSIX_TIMERS)) run_posix_cpu_timers(); } From patchwork Tue Feb 7 05:10:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651628 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B839DC636CD for ; Tue, 7 Feb 2023 05:02:24 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230246AbjBGFCY (ORCPT ); Tue, 7 Feb 2023 00:02:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60000 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229730AbjBGFBy (ORCPT ); Tue, 7 Feb 2023 00:01:54 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9462E55A2; Mon, 6 Feb 2023 21:01:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746111; x=1707282111; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=9U3HT2fPkvVUOBW50CYIJ3B+L4hO5S4Nc1uYuLdQPpY=; b=aC0QiTkNKtnMvNl23FvuS1tVrjynTfR9uEtl8VzdUW+9Y/2E5f2tORgs 414kjOfSowGsHPg/CX3gTgBzYlJk+5r4hOGMZeaWBMhmlisHG7SNRVK+K cNZUs0BhJC/oZM422gUvFhP9W9M7uQaZRw07TRlxEIE1evtZeCmlFSKim gSDNOYgWQxz2Sn1BCfbmAwQOZAXsY/58ONzAtIxFeQ6RS2fFK6O1un5Xv kojvIHxSnaZa+JTTsHXe7mCgbA16TVdyYKInn75Ub4XG7WIGfSehJaWrF U8xInG+XqJTd3lazXpDkwsaY/3H9uTXdU2ojsY2RiCkt7p6wZO6HCxnvH g==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625795" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625795" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:41 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657713" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657713" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:41 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 06/24] sched/fair: Collect load-balancing stats for IPC classes Date: Mon, 6 Feb 2023 21:10:47 -0800 Message-Id: <20230207051105.11575-7-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org When selecting a busiest scheduling group, the IPC class of the current task can be used to select between two scheduling groups of types asym_ packing or fully_busy that are otherwise identical. Compute the IPC class performance score for a scheduling group. It is the sum of the scores of the current tasks of all the runqueues. Also, keep track of the class of the task with the lowest IPC class score in the scheduling group. These two metrics will be used during idle load balancing to compute the current and the prospective IPC class score of a scheduling group. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * Also excluded deadline and realtime tasks from IPCC stats. (Dietmar) * Also excluded tasks that cannot run on the destination CPU from the IPCC stats. * Folded struct sg_lb_ipcc_stats into struct sg_lb_stats. (Dietmar) * Reworded description sg_lb_stats::min_ipcc. (Ionela) * Handle errors of arch_get_ipcc_score(). (Ionela) Changes since v1: * Implemented cleanups and reworks from PeterZ. Thanks! * Used the new interface names. --- kernel/sched/fair.c | 61 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 0ada2d18b934..d773380a95b3 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -8897,6 +8897,11 @@ struct sg_lb_stats { unsigned int nr_numa_running; unsigned int nr_preferred_running; #endif +#ifdef CONFIG_IPC_CLASSES + unsigned long min_score; /* Min(score(rq->curr->ipcc)) */ + unsigned short min_ipcc; /* Class of the task with the minimum IPCC score in the rq */ + unsigned long sum_score; /* Sum(score(rq->curr->ipcc)) */ +#endif }; /* @@ -9240,6 +9245,59 @@ group_type group_classify(unsigned int imbalance_pct, return group_has_spare; } +#ifdef CONFIG_IPC_CLASSES +static void init_rq_ipcc_stats(struct sg_lb_stats *sgs) +{ + /* All IPCC stats have been set to zero in update_sg_lb_stats(). */ + sgs->min_score = ULONG_MAX; +} + +/* Called only if cpu_of(@rq) is not idle and has tasks running. */ +static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs, + struct rq *rq) +{ + struct task_struct *curr; + unsigned short ipcc; + unsigned long score; + + if (!sched_ipcc_enabled()) + return; + + curr = rcu_dereference(rq->curr); + if (!curr || (curr->flags & PF_EXITING) || is_idle_task(curr) || + task_is_realtime(curr) || + !cpumask_test_cpu(dst_cpu, curr->cpus_ptr)) + return; + + ipcc = curr->ipcc; + score = arch_get_ipcc_score(ipcc, cpu_of(rq)); + + /* + * Ignore tasks with invalid scores. When finding the busiest group, we + * prefer those with higher sum_score. This group will not be selected. + */ + if (IS_ERR_VALUE(score)) + return; + + sgs->sum_score += score; + + if (score < sgs->min_score) { + sgs->min_score = score; + sgs->min_ipcc = ipcc; + } +} + +#else /* CONFIG_IPC_CLASSES */ +static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs, + struct rq *rq) +{ +} + +static void init_rq_ipcc_stats(struct sg_lb_stats *sgs) +{ +} +#endif /* CONFIG_IPC_CLASSES */ + /** * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks * @dst_cpu: Destination CPU of the load balancing @@ -9332,6 +9390,7 @@ static inline void update_sg_lb_stats(struct lb_env *env, int i, nr_running, local_group; memset(sgs, 0, sizeof(*sgs)); + init_rq_ipcc_stats(sgs); local_group = group == sds->local; @@ -9381,6 +9440,8 @@ static inline void update_sg_lb_stats(struct lb_env *env, if (sgs->group_misfit_task_load < load) sgs->group_misfit_task_load = load; } + + update_sg_lb_ipcc_stats(env->dst_cpu, sgs, rq); } sgs->group_capacity = group->sgc->capacity; From patchwork Tue Feb 7 05:10:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651627 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 81E53C64EC4 for ; Tue, 7 Feb 2023 05:02:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230261AbjBGFCY (ORCPT ); Tue, 7 Feb 2023 00:02:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60068 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230161AbjBGFBz (ORCPT ); Tue, 7 Feb 2023 00:01:55 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D43E4901F; Mon, 6 Feb 2023 21:01:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746113; x=1707282113; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=XBeyrfeapZSrbuAay+AhGt8TNoIKKt2H5iHcCn0WtSs=; b=dhD9I/7mLVj2Pggklg/grywvghBoSKaHwulbTuRAAHYuSm415n4FAEfV tYcSr+xDP+sso3ijlWudcK+qfujjfWET/RZHe5mWc7Rcarq7kwPwyaejx 47TbMWIgSkkkWrwRxyQj36YO2mQVrMudO8wQ6VkRhb3nBeHdigjm46e/P UcpJL10fI5HT6suN3NwaKLGJj+pvhcYvFoR/acBhYiGK6LiYaVum9sCkr 0eA0Q3jPQDP1zUSOpT3mVTJ6y95Mj7Ri7Ecd6mFaDJPs4JiLYb/8bPE2h Gacexu2m7mQUSjj3I1fzWPF+F+YLmpc4k+EivlWM2dlB1gDdZCQDiMkc1 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625837" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625837" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:42 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657723" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657723" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:42 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 09/24] sched/fair: Use IPCC stats to break ties between fully_busy SMT groups Date: Mon, 6 Feb 2023 21:10:50 -0800 Message-Id: <20230207051105.11575-10-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org IPCC statistics are used during idle load balancing. After balancing one of the siblings of an SMT core will become idle. The rest of the busy siblings will enjoy increased throughput. The IPCC statistics provide a measure of the increased throughput. Use them to pick a busiest group from otherwise identical fully_busy scheduling groups (of which the avg_load is equal - and zero). Using IPCC scores to break ties with non-SMT fully_busy sched groups is not necessary. SMT sched groups always need more help. Add a stub sched_asym_ipcc_prefer() for !CONFIG_IPC_CLASSES. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * Introduced this patch. Changes since v1: * N/A --- kernel/sched/fair.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 841927b9b192..72d88270b320 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9415,6 +9415,12 @@ static void update_sg_lb_stats_scores(struct sg_lb_stats *sgs, { } +static bool sched_asym_ipcc_prefer(struct sg_lb_stats *a, + struct sg_lb_stats *b) +{ + return false; +} + static bool sched_asym_ipcc_pick(struct sched_group *a, struct sched_group *b, struct sg_lb_stats *a_stats, @@ -9698,10 +9704,21 @@ static bool update_sd_pick_busiest(struct lb_env *env, if (sgs->avg_load == busiest->avg_load) { /* * SMT sched groups need more help than non-SMT groups. - * If @sg happens to also be SMT, either choice is good. */ - if (sds->busiest->flags & SD_SHARE_CPUCAPACITY) - return false; + if (sds->busiest->flags & SD_SHARE_CPUCAPACITY) { + if (!(sg->flags & SD_SHARE_CPUCAPACITY)) + return false; + + /* + * Between two SMT groups, use IPCC scores to pick the + * one that would improve throughput the most (only + * asym_packing uses IPCC scores for now). + */ + if (sched_ipcc_enabled() && + env->sd->flags & SD_ASYM_PACKING && + sched_asym_ipcc_prefer(busiest, sgs)) + return false; + } } break; From patchwork Tue Feb 7 05:10:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651626 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5C660C6379F for ; Tue, 7 Feb 2023 05:02:28 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230282AbjBGFC1 (ORCPT ); Tue, 7 Feb 2023 00:02:27 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60084 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229843AbjBGFB4 (ORCPT ); Tue, 7 Feb 2023 00:01:56 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E45BC643; Mon, 6 Feb 2023 21:01:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746114; x=1707282114; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=BrjwTRAQcpfuOeAV7Rg7Zm/X0IgwzgEOiAKjPiNKh1M=; b=ZGnjxVpa8iBCrxYykm4azWnUE82ILNVXXrmb3XL1r1GVuqN/eJz9Fv6+ BG5KtJsW2kvHDhW8u15yamKhtbsHFWIPtI5bZy6b62TNAOHUIfg8BTl6q U+sxFHy32zkOO96x2ls6L01fxLkJfrLFYAAqUmWnvEcC3BtRfXWzNNNnD wP+BBoG13YwvVkI85fwTRqaZp3LSWQKqYSjGYX+zBXadW27ShXC/Q0wv5 6eEhN8XbB0PTIvYMag6ip7fv7IhxmdWx61s09inzXdO9nfMYk/YOUB8lI R8WIEstc3TnFW/izgAh1XJg4cMWsQLMitCr/4HASdp/tYqyQKsmegaFA6 A==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625849" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625849" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657730" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657730" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:42 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 10/24] sched/fair: Use IPCC scores to select a busiest runqueue Date: Mon, 6 Feb 2023 21:10:51 -0800 Message-Id: <20230207051105.11575-11-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org For two runqueues of equal priority and equal number of running of tasks, select the one whose current task would have the highest IPC class score if placed on the destination CPU. For now, use IPCC scores only for scheduling domains with the SD_ASYM_PACKING flag. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * Only use IPCC scores to break ties if the sched domain uses asym_packing. (Ionela) * Handle errors of arch_get_ipcc_score(). (Ionela) Changes since v1: * Fixed a bug when selecting a busiest runqueue: when comparing two runqueues with equal nr_running, we must compute the IPCC score delta of both. * Renamed local variables to improve the layout of the code block. (PeterZ) * Used the new interface names. --- kernel/sched/fair.c | 64 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 72d88270b320..d3c22dc145f7 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9399,6 +9399,37 @@ static bool sched_asym_ipcc_pick(struct sched_group *a, return sched_asym_ipcc_prefer(a_stats, b_stats); } +/** + * ipcc_score_delta - Get the IPCC score delta wrt the load balance's dst_cpu + * @p: A task + * @env: Load balancing environment + * + * Returns: The IPCC score delta that @p would get if placed in the destination + * CPU of @env. LONG_MIN to indicate that the delta should not be used. + */ +static long ipcc_score_delta(struct task_struct *p, struct lb_env *env) +{ + unsigned long score_src, score_dst; + unsigned short ipcc = p->ipcc; + + if (!sched_ipcc_enabled()) + return LONG_MIN; + + /* Only asym_packing uses IPCC scores at the moment. */ + if (!(env->sd->flags & SD_ASYM_PACKING)) + return LONG_MIN; + + score_dst = arch_get_ipcc_score(ipcc, env->dst_cpu); + if (IS_ERR_VALUE(score_dst)) + return LONG_MIN; + + score_src = arch_get_ipcc_score(ipcc, task_cpu(p)); + if (IS_ERR_VALUE(score_src)) + return LONG_MIN; + + return score_dst - score_src; +} + #else /* CONFIG_IPC_CLASSES */ static void update_sg_lb_ipcc_stats(int dst_cpu, struct sg_lb_stats *sgs, struct rq *rq) @@ -9429,6 +9460,11 @@ static bool sched_asym_ipcc_pick(struct sched_group *a, return false; } +static long ipcc_score_delta(struct task_struct *p, struct lb_env *env) +{ + return LONG_MIN; +} + #endif /* CONFIG_IPC_CLASSES */ /** @@ -10589,6 +10625,7 @@ static struct rq *find_busiest_queue(struct lb_env *env, { struct rq *busiest = NULL, *rq; unsigned long busiest_util = 0, busiest_load = 0, busiest_capacity = 1; + long busiest_ipcc_delta = LONG_MIN; unsigned int busiest_nr = 0; int i; @@ -10705,8 +10742,35 @@ static struct rq *find_busiest_queue(struct lb_env *env, case migrate_task: if (busiest_nr < nr_running) { + struct task_struct *curr; + busiest_nr = nr_running; busiest = rq; + + /* + * Remember the IPCC score delta of busiest::curr. + * We may need it to break a tie with other queues + * with equal nr_running. + */ + curr = rcu_dereference(busiest->curr); + busiest_ipcc_delta = ipcc_score_delta(curr, env); + /* + * If rq and busiest have the same number of running + * tasks and IPC classes are supported, pick rq if doing + * so would give rq::curr a bigger IPC boost on dst_cpu. + */ + } else if (busiest_nr == nr_running) { + struct task_struct *curr; + long delta; + + curr = rcu_dereference(rq->curr); + delta = ipcc_score_delta(curr, env); + + if (busiest_ipcc_delta < delta) { + busiest_ipcc_delta = delta; + busiest_nr = nr_running; + busiest = rq; + } } break; From patchwork Tue Feb 7 05:10:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651625 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id BF6ABC636CC for ; Tue, 7 Feb 2023 05:02:30 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230307AbjBGFC3 (ORCPT ); Tue, 7 Feb 2023 00:02:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60104 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229878AbjBGFB5 (ORCPT ); Tue, 7 Feb 2023 00:01:57 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 901C7C678; Mon, 6 Feb 2023 21:01:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746115; x=1707282115; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=d7YOpMlf1P1ORwPsFp8Wf2z9+WUpcXCD+VCa+NWGzJ0=; b=KgwOpYS9wwI5sG+q542pWz5u/vqrl00gdbipfL2/6e4zuRlSjEa2a1FL EByuNyYUqIIm1c1vhNPO/wgqf+J91afi2g6JOobXDlY4H+cSFYwbpilRE k2W035GmwVnSMmxflsN0H6BOW+8rw4OyOtxAf4PArdcu1YtuXJQjLuDxU UdfT1ID5O/f8u7Lo1tBAdnH3XwyjQoLIOC1CwkU7FvRHl+99J5lfaDCKx KinJoe/DUXiPwN55nUT5Qw8qoJhb7j5HVmHuNplFkU+0dkhqn8Rc+jMg8 msjsuc6nY5h3tVppu8rYD5ZtiImDWUCq9i95V9D/HTtapQu0W+SMGgVrg Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625861" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625861" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:43 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657733" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657733" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:43 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 11/24] thermal: intel: hfi: Introduce Intel Thread Director classes Date: Mon, 6 Feb 2023 21:10:52 -0800 Message-Id: <20230207051105.11575-12-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org On Intel hybrid parts, each type of CPU has specific performance and energy efficiency capabilities. The Intel Thread Director technology extends the Hardware Feedback Interface (HFI) to provide performance and energy efficiency data for advanced classes of instructions. Add support to parse per-class capabilities. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * None Changes since v1: * Removed a now obsolete comment. --- drivers/thermal/intel/intel_hfi.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 6e604bda2b93..2527ae3836c7 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -77,7 +77,7 @@ union cpuid6_edx { * @ee_cap: Energy efficiency capability * * Capabilities of a logical processor in the HFI table. These capabilities are - * unitless. + * unitless and specific to each HFI class. */ struct hfi_cpu_data { u8 perf_cap; @@ -89,7 +89,8 @@ struct hfi_cpu_data { * @perf_updated: Hardware updated performance capabilities * @ee_updated: Hardware updated energy efficiency capabilities * - * Properties of the data in an HFI table. + * Properties of the data in an HFI table. There exists one header per each + * HFI class. */ struct hfi_hdr { u8 perf_updated; @@ -127,16 +128,21 @@ struct hfi_instance { /** * struct hfi_features - Supported HFI features + * @nr_classes: Number of classes supported * @nr_table_pages: Size of the HFI table in 4KB pages * @cpu_stride: Stride size to locate the capability data of a logical * processor within the table (i.e., row stride) + * @class_stride: Stride size to locate a class within the capability + * data of a logical processor or the HFI table header * @hdr_size: Size of the table header * * Parameters and supported features that are common to all HFI instances */ struct hfi_features { + unsigned int nr_classes; size_t nr_table_pages; unsigned int cpu_stride; + unsigned int class_stride; unsigned int hdr_size; }; @@ -333,8 +339,8 @@ static void init_hfi_cpu_index(struct hfi_cpu_info *info) } /* - * The format of the HFI table depends on the number of capabilities that the - * hardware supports. Keep a data structure to navigate the table. + * The format of the HFI table depends on the number of capabilities and classes + * that the hardware supports. Keep a data structure to navigate the table. */ static void init_hfi_instance(struct hfi_instance *hfi_instance) { @@ -515,18 +521,30 @@ static __init int hfi_parse_features(void) /* The number of 4KB pages required by the table */ hfi_features.nr_table_pages = edx.split.table_pages + 1; + /* + * Capability fields of an HFI class are grouped together. Classes are + * contiguous in memory. Hence, use the number of supported features to + * locate a specific class. + */ + hfi_features.class_stride = nr_capabilities; + + /* For now, use only one class of the HFI table */ + hfi_features.nr_classes = 1; + /* * The header contains change indications for each supported feature. * The size of the table header is rounded up to be a multiple of 8 * bytes. */ - hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities, 8) * 8; + hfi_features.hdr_size = DIV_ROUND_UP(nr_capabilities * + hfi_features.nr_classes, 8) * 8; /* * Data of each logical processor is also rounded up to be a multiple * of 8 bytes. */ - hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities, 8) * 8; + hfi_features.cpu_stride = DIV_ROUND_UP(nr_capabilities * + hfi_features.nr_classes, 8) * 8; return 0; } From patchwork Tue Feb 7 05:10:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651624 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAA3BC6379F for ; Tue, 7 Feb 2023 05:02:32 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230318AbjBGFCa (ORCPT ); Tue, 7 Feb 2023 00:02:30 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60126 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229949AbjBGFB5 (ORCPT ); Tue, 7 Feb 2023 00:01:57 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 05AC7CA05; Mon, 6 Feb 2023 21:01:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746115; x=1707282115; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=vDnh1C/cq9wjWDStRuJO1Z3DAdsnd8sqeD4zFb0EIc0=; b=KRH57/e/ImFV0sKKTP80CpIj4dO4n+IZ5L2AQHQEqx+QQjlj1auDvaS6 ImagWXbEZaydTT+vz3o4cHHtECTMXZ9lEnCS6V6Hcj3VwbKk1Npu5lEO2 d8Z8HO7RpvcId73r9XM16gwpA0G7BF3Pq4hXwy12hYctZs1kDnhIEHDZN Hlu2iTzt/oW87/2icGmfrUTU9Frt7SXNYyOCH+qAtLEycVSfbMqeq+aWx ABGC5shFVOvjV+lmwlJO+8IPVkNvYlIalOTBsQSTx7cntDLEXMA4CrNWe bz6J+pEXY6e+tiAvUzLFLXcr4nN/oH75swM+fJrYPAthgRzDAJSSEVxoi A==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625884" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625884" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:44 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657744" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657744" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:43 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 13/24] thermal: intel: hfi: Store per-CPU IPCC scores Date: Mon, 6 Feb 2023 21:10:54 -0800 Message-Id: <20230207051105.11575-14-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The scheduler reads the IPCC scores when balancing load. These reads can be quite frequent. Hardware can also update the HFI table frequently. Concurrent access may cause a lot of lock contention. It gets worse as the number of CPUs increases. Instead, create separate per-CPU IPCC scores that the scheduler can read without the HFI table lock. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Ricardo Neri --- Changes since v2: * Only create these per-CPU variables when Intel Thread Director is supported. Changes since v1: * Added this patch. --- drivers/thermal/intel/intel_hfi.c | 46 +++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 2527ae3836c7..b06021828892 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -170,6 +171,43 @@ static struct workqueue_struct *hfi_updates_wq; #define HFI_UPDATE_INTERVAL HZ #define HFI_MAX_THERM_NOTIFY_COUNT 16 +#ifdef CONFIG_IPC_CLASSES +static int __percpu *hfi_ipcc_scores; + +static int alloc_hfi_ipcc_scores(void) +{ + if (!cpu_feature_enabled(X86_FEATURE_ITD)) + return 0; + + hfi_ipcc_scores = __alloc_percpu(sizeof(*hfi_ipcc_scores) * + hfi_features.nr_classes, + sizeof(*hfi_ipcc_scores)); + + return !hfi_ipcc_scores; +} + +static void set_hfi_ipcc_score(void *caps, int cpu) +{ + int i, *hfi_class; + + if (!cpu_feature_enabled(X86_FEATURE_ITD)) + return; + + hfi_class = per_cpu_ptr(hfi_ipcc_scores, cpu); + + for (i = 0; i < hfi_features.nr_classes; i++) { + struct hfi_cpu_data *class_caps; + + class_caps = caps + i * hfi_features.class_stride; + WRITE_ONCE(hfi_class[i], class_caps->perf_cap); + } +} + +#else +static int alloc_hfi_ipcc_scores(void) { return 0; } +static void set_hfi_ipcc_score(void *caps, int cpu) { } +#endif /* CONFIG_IPC_CLASSES */ + static void get_hfi_caps(struct hfi_instance *hfi_instance, struct thermal_genl_cpu_caps *cpu_caps) { @@ -192,6 +230,8 @@ static void get_hfi_caps(struct hfi_instance *hfi_instance, cpu_caps[i].efficiency = caps->ee_cap << 2; ++i; + + set_hfi_ipcc_score(caps, cpu); } raw_spin_unlock_irq(&hfi_instance->table_lock); } @@ -580,8 +620,14 @@ void __init intel_hfi_init(void) if (!hfi_updates_wq) goto err_nomem; + if (alloc_hfi_ipcc_scores()) + goto err_ipcc; + return; +err_ipcc: + destroy_workqueue(hfi_updates_wq); + err_nomem: for (j = 0; j < i; ++j) { hfi_instance = &hfi_instances[j]; From patchwork Tue Feb 7 05:10:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651623 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 11CD5C636D4 for ; Tue, 7 Feb 2023 05:02:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230345AbjBGFCj (ORCPT ); Tue, 7 Feb 2023 00:02:39 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59990 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230092AbjBGFB7 (ORCPT ); Tue, 7 Feb 2023 00:01:59 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C74CCDF1; Mon, 6 Feb 2023 21:01:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746117; x=1707282117; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=A+SMOZzgEsVxAgLjF6sCJ/1gpwfA0tMbTNw8B4DLAIM=; b=fCIWpLnWvVOZblBGvUfz1mfcwQWd0Wj6VTVW0bEwvGTCk23O+ZGxeuld jjtcNSGAX6l4iZ/W9PZlReroJ2vApzJ3USF1g8Lm5qUnqlG7JT1qMWDI0 H6FNfi+ZqU1aCCQh8BNlveiPN+RIkkMHbCJ10uUqyWtocDETvtzI67hYH hj9Z0//AAgnDnozgPFdehOz5Miy7HNxQ9O9luBHuopvI18PATDNOP04hv 0kmhNvELgkcUgxUUwU9wzBR/rmZt6fH1Go/E3a8qZeLjprt4QZbQHVhjZ uunENef9sTQ0wGONgCJh2cT7ncG+dxubU8Opat1VDUcw4tNLBdVseObBe g==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625902" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625902" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:44 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657752" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657752" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:44 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 15/24] thermal: intel: hfi: Report the IPC class score of a CPU Date: Mon, 6 Feb 2023 21:10:56 -0800 Message-Id: <20230207051105.11575-16-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Implement the arch_get_ipcc_score() interface of the scheduler. Use the performance capabilities of the extended Hardware Feedback Interface table as the IPC score. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * None Changes since v1: * Adjusted the returned HFI class (which starts at 0) to match the scheduler IPCC class (which starts at 1). (PeterZ) * Used the new interface names. --- arch/x86/include/asm/topology.h | 2 ++ drivers/thermal/intel/intel_hfi.c | 27 +++++++++++++++++++++++++++ 2 files changed, 29 insertions(+) diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index ffcdac3f398f..c4fcd9c3c634 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -229,8 +229,10 @@ void init_freq_invariance_cppc(void); #if defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL) void intel_hfi_update_ipcc(struct task_struct *curr); +unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu); #define arch_update_ipcc intel_hfi_update_ipcc +#define arch_get_ipcc_score intel_hfi_get_ipcc_score #endif /* defined(CONFIG_IPC_CLASSES) && defined(CONFIG_INTEL_HFI_THERMAL) */ #endif /* _ASM_X86_TOPOLOGY_H */ diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 530dcf57e06e..fa9b4a678d92 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -206,6 +206,33 @@ void intel_hfi_update_ipcc(struct task_struct *curr) curr->ipcc = msr.split.classid + 1; } +unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu) +{ + unsigned short hfi_class; + int *scores; + + if (cpu < 0 || cpu >= nr_cpu_ids) + return -EINVAL; + + if (ipcc == IPC_CLASS_UNCLASSIFIED) + return -EINVAL; + + /* + * Scheduler IPC classes start at 1. HFI classes start at 0. + * See note intel_hfi_update_ipcc(). + */ + hfi_class = ipcc - 1; + + if (hfi_class >= hfi_features.nr_classes) + return -EINVAL; + + scores = per_cpu_ptr(hfi_ipcc_scores, cpu); + if (!scores) + return -ENODEV; + + return READ_ONCE(scores[hfi_class]); +} + static int alloc_hfi_ipcc_scores(void) { if (!cpu_feature_enabled(X86_FEATURE_ITD)) From patchwork Tue Feb 7 05:10:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651622 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E175AC6379F for ; Tue, 7 Feb 2023 05:02:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230382AbjBGFCx (ORCPT ); Tue, 7 Feb 2023 00:02:53 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60718 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230096AbjBGFB7 (ORCPT ); Tue, 7 Feb 2023 00:01:59 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C78A26EBF; Mon, 6 Feb 2023 21:01:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746117; x=1707282117; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=3ZodASqqSSM2NIj1LSnSMKhTz+ez7n5TchQdgqVyKMk=; b=cf3RcTOkwDxK/NR3BWE33SHlTo7EtZxQEsk/ll545VQT14y2JO6ZaclE FP+2ti8Bzb61Sst5OGfjNqF23wahIOHdpU6++RAAsvcssP0J4pn6d4/Cb D7O7oYuFeRapk0b96g2ehWZj/YxfsCQU+avvB65ooJTzxmLLVYc67Xpc8 DGmU+eOTBhyAweTDJHMgr1bqc5lErPud88NxWOd93bTZO0c8rdObktfsf fAJQD/AH7uEOd6scEKRIjli/Ksi6ntnHo4WOTcsGWJrHVVOtu0ZVUqMZw A1mtrQKDfki2gCri0mN2zGkTRDc8y86fWRwgkI+8kWO2QWht0csLSO9Pu g==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625917" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625917" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:45 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657758" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657758" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:44 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 17/24] thermal: intel: hfi: Enable the Intel Thread Director Date: Mon, 6 Feb 2023 21:10:58 -0800 Message-Id: <20230207051105.11575-18-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Enable Intel Thread Director from the CPU hotplug callback: globally from CPU0 and then enable the thread-classification hardware in each logical processor individually. Also, initialize the number of classes supported. Let the scheduler know that it can start using IPC classes. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri Acked-by: Rafael J. Wysocki --- Changes since v2: * Use the new sched_enable_ipc_classes() interface to enable the use of IPC classes in the scheduler. Changes since v1: * None --- arch/x86/include/asm/msr-index.h | 2 ++ drivers/thermal/intel/intel_hfi.c | 40 +++++++++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index ad35355ee43e..0ea25cc9c621 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -1106,6 +1106,8 @@ /* Hardware Feedback Interface */ #define MSR_IA32_HW_FEEDBACK_PTR 0x17d0 #define MSR_IA32_HW_FEEDBACK_CONFIG 0x17d1 +#define MSR_IA32_HW_FEEDBACK_THREAD_CONFIG 0x17d4 +#define MSR_IA32_HW_FEEDBACK_CHAR 0x17d2 /* x2APIC locked status */ #define MSR_IA32_XAPIC_DISABLE_STATUS 0xBD diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 7ea6acce7107..35d947f47550 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -48,6 +48,8 @@ /* Hardware Feedback Interface MSR configuration bits */ #define HW_FEEDBACK_PTR_VALID_BIT BIT(0) #define HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT BIT(0) +#define HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT BIT(1) +#define HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT BIT(0) /* CPUID detection and enumeration definitions for HFI */ @@ -72,6 +74,15 @@ union cpuid6_edx { u32 full; }; +union cpuid6_ecx { + struct { + u32 dont_care0:8; + u32 nr_classes:8; + u32 dont_care1:16; + } split; + u32 full; +}; + #ifdef CONFIG_IPC_CLASSES union hfi_thread_feedback_char_msr { struct { @@ -506,6 +517,11 @@ void intel_hfi_online(unsigned int cpu) init_hfi_cpu_index(info); + if (cpu_feature_enabled(X86_FEATURE_ITD)) { + msr_val = HW_FEEDBACK_THREAD_CONFIG_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_THREAD_CONFIG, msr_val); + } + /* * Now check if the HFI instance of the package/die of @cpu has been * initialized (by checking its header). In such case, all we have to @@ -561,8 +577,22 @@ void intel_hfi_online(unsigned int cpu) */ rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT; + + if (cpu_feature_enabled(X86_FEATURE_ITD)) + msr_val |= HW_FEEDBACK_CONFIG_ITD_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + /* + * We have all we need to support IPC classes. Task classification is + * now working. + * + * All class scores are zero until after the first HFI update. That is + * OK. The scheduler queries these scores at every load balance. + */ + if (cpu_feature_enabled(X86_FEATURE_ITD)) + sched_enable_ipc_classes(); + unlock: mutex_unlock(&hfi_instance_lock); return; @@ -640,8 +670,14 @@ static __init int hfi_parse_features(void) */ hfi_features.class_stride = nr_capabilities; - /* For now, use only one class of the HFI table */ - hfi_features.nr_classes = 1; + if (cpu_feature_enabled(X86_FEATURE_ITD)) { + union cpuid6_ecx ecx; + + ecx.full = cpuid_ecx(CPUID_HFI_LEAF); + hfi_features.nr_classes = ecx.split.nr_classes; + } else { + hfi_features.nr_classes = 1; + } /* * The header contains change indications for each supported feature. From patchwork Tue Feb 7 05:11:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651620 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B1361C636CD for ; Tue, 7 Feb 2023 05:03:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230425AbjBGFDE (ORCPT ); Tue, 7 Feb 2023 00:03:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60552 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230124AbjBGFCB (ORCPT ); Tue, 7 Feb 2023 00:02:01 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 22F9F1165B; Mon, 6 Feb 2023 21:01:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746119; x=1707282119; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=ptfoaHFvByfojwrO5m9wFLhJI66qhikYpEJ1HsiQosE=; b=T5FDN758/KNrKvZ6v/rCdtf0U8IQ81E3whmmtmtiqcv5uBjFKiXQHKEy NXYezTjbhG3XLAEbsQyQs+/AXTbO3ol8+IzvbbOVfRD3yk9D6yVoA5o0W Yu3cE5GqLgXw+ZI+9QNkPnbLkPbjZlh3oKFKuYItC1xwzOaK/LdSHfAph BBXOQHWmePCUCzL2t6nUfvd3z1REkB+VroHcQN9EzRFj3VJ/TPAzdmbHK nqqwQbY7lqjtj1LjXIsyEKpRTaVkro54a2xVOONVd0BrJV5R3nKIZGxKN jYpUPlhg/XtEOUEP/q6MNBfM5V0FQeXQAuNo7YJzuuu1iVFFYUXbe1COk A==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625950" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625950" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657773" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657773" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:45 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 20/24] sched/fair: Introduce sched_smt_siblings_idle() Date: Mon, 6 Feb 2023 21:11:01 -0800 Message-Id: <20230207051105.11575-21-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org X86 needs to know the idle state of the SMT siblings of a CPU to improve the accuracy of IPCC classification. X86 implements support for IPC classes in the thermal HFI driver. Rename is_core_idle() as sched_smt_siblings_idle() and make it available outside the scheduler code. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Len Brown Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- is_core_idle() is no longer an inline function after this patch. To rule out performance degradation, I compared the execution time of the inline and non-inline versions on a 4-socket Cascade Lake system using the NUMA stressor of stress-ng: $ stress-ng --numa 1500 -t 10m is_core_idle() was called ~200,000 times. I measured the value of the TSC counter before and after calling is_core_idle() and computed the delta value. I arbitrarily removed outliers (defined as any delta larger than 5000 counts). This required removing ~40 samples. The table below summarizes the difference in execution time. All quantities are expressed in TSC counts, except the standard deviation, expressed as a percentage of the average. Average Median Std(%) Mode TSCdelta inline 668.76 626 67.24 42 TSCdelta non-inline 677.64 624 67.67 46 All metrics are similar for the inline and non-inline cases. --- Changes since v2: * Brought back this previously dropped patch. * Profiled inline vs non-inline is_core_idle(). I found not major penalty. * Merged is_core_idle() and sched_smt_siblings_idle() into a single function. (Dietmar) Changes since v1: * Dropped this patch. --- include/linux/sched.h | 2 ++ kernel/sched/fair.c | 21 +++++++++++++++------ 2 files changed, 17 insertions(+), 6 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 45f28a601b3d..7ef9fd84e7ad 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2449,4 +2449,6 @@ static inline void sched_core_fork(struct task_struct *p) { } extern void sched_set_stop_task(int cpu, struct task_struct *stop); +extern bool sched_smt_siblings_idle(int cpu); + #endif diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d3c22dc145f7..a66d86c5cb5c 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -1064,7 +1064,14 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se) * Scheduling class queueing methods: */ -static inline bool is_core_idle(int cpu) +/** + * sched_smt_siblings_idle - Check whether SMT siblings of a CPU are idle + * @cpu: The CPU to check + * + * Returns true if all the SMT siblings of @cpu are idle or @cpu does not have + * SMT siblings. The idle state of @cpu is not considered. + */ +bool sched_smt_siblings_idle(int cpu) { #ifdef CONFIG_SCHED_SMT int sibling; @@ -1767,7 +1774,7 @@ static inline int numa_idle_core(int idle_core, int cpu) * Prefer cores instead of packing HT siblings * and triggering future load balancing. */ - if (is_core_idle(cpu)) + if (sched_smt_siblings_idle(cpu)) idle_core = cpu; return idle_core; @@ -9518,7 +9525,8 @@ sched_asym(struct lb_env *env, struct sd_lb_stats *sds, struct sg_lb_stats *sgs * If the destination CPU has SMT siblings, env->idle != CPU_NOT_IDLE * is not sufficient. We need to make sure the whole core is idle. */ - if (sds->local->flags & SD_SHARE_CPUCAPACITY && !is_core_idle(env->dst_cpu)) + if (sds->local->flags & SD_SHARE_CPUCAPACITY && + !sched_smt_siblings_idle(env->dst_cpu)) return false; /* Only do SMT checks if either local or candidate have SMT siblings. */ @@ -10687,7 +10695,8 @@ static struct rq *find_busiest_queue(struct lb_env *env, sched_asym_prefer(i, env->dst_cpu) && nr_running == 1) { if (env->sd->flags & SD_SHARE_CPUCAPACITY || - (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && is_core_idle(i))) + (!(env->sd->flags & SD_SHARE_CPUCAPACITY) && + sched_smt_siblings_idle(i))) continue; } @@ -10816,7 +10825,7 @@ asym_active_balance(struct lb_env *env) * busy sibling. */ return sched_asym_prefer(env->dst_cpu, env->src_cpu) || - !is_core_idle(env->src_cpu); + !sched_smt_siblings_idle(env->src_cpu); } return false; @@ -11563,7 +11572,7 @@ static void nohz_balancer_kick(struct rq *rq) */ if (sd->flags & SD_SHARE_CPUCAPACITY || (!(sd->flags & SD_SHARE_CPUCAPACITY) && - is_core_idle(i))) { + sched_smt_siblings_idle(i))) { flags = NOHZ_STATS_KICK | NOHZ_BALANCE_KICK; goto unlock; } From patchwork Tue Feb 7 05:11:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651621 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12236C636CC for ; Tue, 7 Feb 2023 05:02:58 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230391AbjBGFCz (ORCPT ); Tue, 7 Feb 2023 00:02:55 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60488 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230039AbjBGFCA (ORCPT ); Tue, 7 Feb 2023 00:02:00 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 26BF511EBA; Mon, 6 Feb 2023 21:01:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746119; x=1707282119; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=LRruVzFjzKpVUadVEuZKwelOSSQHqDO6Cu2BUXGn/i0=; b=lIT7rH82m7j2ydTUSyTFAOE1oQoG9CuZrVnEL911ibaoIZCKxF2LVrU7 ELMtviurQo4F/kRMQBuVTfeD2+O6BexIzlNousae+dIaMBft3Yx57FoFi DcOjS3KpHPTj/QALr8DzEJ6SG90f8o+w1fCj+A1Anq3LCSMNXPVAqNpPe sJCuL4sfzXsD3izaZRm0Yyh0nyrX7XNcf7Z+piTmw76JpC+rxSi1Ejn05 aSpt6gKezZDx2D1tCtISx0KgudexZSC4H35g0a3JTBEtwYv4nqeXx+CCg ULxy6wm4D+9SjrXXaeXehv07wq32/SNFuNyy3HC2lNpFQc07g/27rUbOo w==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625961" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625961" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:46 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657779" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657779" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:46 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 21/24] thermal: intel: hfi: Implement model-specific checks for task classification Date: Mon, 6 Feb 2023 21:11:02 -0800 Message-Id: <20230207051105.11575-22-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org In Alder Lake and Raptor Lake, the result of thread classification is more accurate when only one SMT sibling is busy. Classification results for class 2 and 3 are always reliable. To avoid unnecessary migrations, only update the class of a task if it has been the same during 4 consecutive user ticks. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * None Changes since v1: * Adjusted the result the classification of Intel Thread Director to start at class 1. Class 0 for the scheduler means that the task is unclassified. * Used the new names of the IPC classes members in task_struct. * Reworked helper functions to use sched_smt_siblings_idle() to query the idle state of the SMT siblings of a CPU. --- drivers/thermal/intel/intel_hfi.c | 60 ++++++++++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 35d947f47550..fdb53e4cabc1 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -40,6 +40,7 @@ #include #include +#include #include "../thermal_core.h" #include "intel_hfi.h" @@ -209,9 +210,64 @@ static int __percpu *hfi_ipcc_scores; */ #define HFI_UNCLASSIFIED_DEFAULT 1 +#define CLASS_DEBOUNCER_SKIPS 4 + +/** + * debounce_and_update_class() - Process and update a task's classification + * + * @p: The task of which the classification will be updated + * @new_ipcc: The new IPC classification + * + * Update the classification of @p with the new value that hardware provides. + * Only update the classification of @p if it has been the same during + * CLASS_DEBOUNCER_SKIPS consecutive ticks. + */ +static void debounce_and_update_class(struct task_struct *p, u8 new_ipcc) +{ + u16 debounce_skip; + + /* The class of @p changed. Only restart the debounce counter. */ + if (p->ipcc_tmp != new_ipcc) { + p->ipcc_cntr = 1; + goto out; + } + + /* + * The class of @p did not change. Update it if it has been the same + * for CLASS_DEBOUNCER_SKIPS user ticks. + */ + debounce_skip = p->ipcc_cntr + 1; + if (debounce_skip < CLASS_DEBOUNCER_SKIPS) + p->ipcc_cntr++; + else + p->ipcc = new_ipcc; + +out: + p->ipcc_tmp = new_ipcc; +} + +static bool classification_is_accurate(u8 hfi_class, bool smt_siblings_idle) +{ + switch (boot_cpu_data.x86_model) { + case INTEL_FAM6_ALDERLAKE: + case INTEL_FAM6_ALDERLAKE_L: + case INTEL_FAM6_RAPTORLAKE: + case INTEL_FAM6_RAPTORLAKE_P: + case INTEL_FAM6_RAPTORLAKE_S: + if (hfi_class == 3 || hfi_class == 2 || smt_siblings_idle) + return true; + + return false; + + default: + return true; + } +} + void intel_hfi_update_ipcc(struct task_struct *curr) { union hfi_thread_feedback_char_msr msr; + bool idle; /* We should not be here if ITD is not supported. */ if (!cpu_feature_enabled(X86_FEATURE_ITD)) { @@ -227,7 +283,9 @@ void intel_hfi_update_ipcc(struct task_struct *curr) * 0 is a valid classification for Intel Thread Director. A scheduler * IPCC class of 0 means that the task is unclassified. Adjust. */ - curr->ipcc = msr.split.classid + 1; + idle = sched_smt_siblings_idle(task_cpu(curr)); + if (classification_is_accurate(msr.split.classid, idle)) + debounce_and_update_class(curr, msr.split.classid + 1); } unsigned long intel_hfi_get_ipcc_score(unsigned short ipcc, int cpu) From patchwork Tue Feb 7 05:11:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Ricardo Neri X-Patchwork-Id: 651619 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0DE7EC636D4 for ; Tue, 7 Feb 2023 05:03:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230034AbjBGFD0 (ORCPT ); Tue, 7 Feb 2023 00:03:26 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60582 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230137AbjBGFCC (ORCPT ); Tue, 7 Feb 2023 00:02:02 -0500 Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 64BAD196BA; Mon, 6 Feb 2023 21:02:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1675746120; x=1707282120; h=from:to:cc:subject:date:message-id:in-reply-to: references; bh=un8n4Xz9yXThF7EWOBFRrgSk/o9VXPxzUfeccAO2+YA=; b=A1K+TFOmKRVUM96AxuUVd4h3fifqUoc5xeYFhTBA5F+eu+psGs+Z+nHv KTqb2ir9HdMKJ8pZOw75N+KgLixp8pfAxpjXDGqcb6OSRjnLIVUvPhkdt Z2obMioz5pNcPZ/icAu87MBHUTCZuU0jRVrJMapqDpKzSsLNjmy+H44nQ OgRLhcOYgOVewXrILdyMssmxxF3ku+hwmS4ZXbepNjk3c+ae8kahRFg9c 3xNrHq3qDFSk5AvF5DhiizIhlqsJszoxtwFS+ho0WBH2gbNT2vobQp34W gHcNUaqVj5ROf2WOsNByAgKvYwYkS8xtQiMPt8Qn7G1ZPcV8iQDpOOQuH g==; X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="415625995" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="415625995" Received: from fmsmga007.fm.intel.com ([10.253.24.52]) by fmsmga105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Feb 2023 21:01:47 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10613"; a="668657794" X-IronPort-AV: E=Sophos;i="5.97,278,1669104000"; d="scan'208";a="668657794" Received: from ranerica-svr.sc.intel.com ([172.25.110.23]) by fmsmga007.fm.intel.com with ESMTP; 06 Feb 2023 21:01:47 -0800 From: Ricardo Neri To: "Peter Zijlstra (Intel)" , Juri Lelli , Vincent Guittot Cc: Ricardo Neri , "Ravi V. Shankar" , Ben Segall , Daniel Bristot de Oliveira , Dietmar Eggemann , Len Brown , Mel Gorman , "Rafael J. Wysocki" , Srinivas Pandruvada , Steven Rostedt , Tim Chen , Valentin Schneider , Lukasz Luba , Ionela Voinescu , x86@kernel.org, "Joel Fernandes (Google)" , linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org, Ricardo Neri , "Tim C . Chen" Subject: [PATCH v3 24/24] x86/process: Reset hardware history in context switch Date: Mon, 6 Feb 2023 21:11:05 -0800 Message-Id: <20230207051105.11575-25-ricardo.neri-calderon@linux.intel.com> X-Mailer: git-send-email 2.17.1 In-Reply-To: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> References: <20230207051105.11575-1-ricardo.neri-calderon@linux.intel.com> Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Reset the classification history of the current task when switching to the next task. Hardware will start the classification of the next task from scratch. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Ionela Voinescu Cc: Joel Fernandes (Google) Cc: Len Brown Cc: Lukasz Luba Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Ricardo Neri --- Changes since v2: * None Changes since v1: * Measurements of the cost of the HRESET instruction Methodology: I created a tight loop with interrupts and preemption disabled. I recorded the value of the TSC counter before and after executing HRESET or RDTSC. I repeated the measurement 100,000 times. I performed the experiment using an Alder Lake S system. I set the frequency of the CPUs at a fixed value. The table below compares the cost of HRESET with RDTSC (expressed in the elapsed TSC count). The cost of the two instructions is comparable. PCore ECore Frequency (GHz) 5.0 3.8 HRESET (avg) 28.5 44.7 HRESET (stdev %) 3.6 2.3 RDTSC (avg) 25.2 35.7 RDTSC (stdev %) 3.9 2.6 * Used an ALTERNATIVE macro instead of static_cpu_has() to execute HRESET when supported. (PeterZ) --- arch/x86/include/asm/hreset.h | 30 ++++++++++++++++++++++++++++++ arch/x86/kernel/cpu/common.c | 7 +++++++ arch/x86/kernel/process_32.c | 3 +++ arch/x86/kernel/process_64.c | 3 +++ 4 files changed, 43 insertions(+) create mode 100644 arch/x86/include/asm/hreset.h diff --git a/arch/x86/include/asm/hreset.h b/arch/x86/include/asm/hreset.h new file mode 100644 index 000000000000..d68ca2fb8642 --- /dev/null +++ b/arch/x86/include/asm/hreset.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_HRESET_H + +/** + * HRESET - History reset. Available since binutils v2.36. + * + * Request the processor to reset the history of task classification on the + * current logical processor. The history components to be + * reset are specified in %eax. Only bits specified in CPUID(0x20).EBX + * and enabled in the IA32_HRESET_ENABLE MSR can be selected. + * + * The assembly code looks like: + * + * hreset %eax + * + * The corresponding machine code looks like: + * + * F3 0F 3A F0 ModRM Imm + * + * The value of ModRM is 0xc0 to specify %eax register addressing. + * The ignored immediate operand is set to 0. + * + * The instruction is documented in the Intel SDM. + */ + +#define __ASM_HRESET ".byte 0xf3, 0xf, 0x3a, 0xf0, 0xc0, 0x0" + +void reset_hardware_history(void); + +#endif /* _ASM_X86_HRESET_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f3f936f7de5f..17e2068530b0 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #include @@ -414,6 +415,12 @@ static __always_inline void setup_umip(struct cpuinfo_x86 *c) static u32 hardware_history_features __ro_after_init; +void reset_hardware_history(void) +{ + asm_inline volatile (ALTERNATIVE("", __ASM_HRESET, X86_FEATURE_HRESET) + : : "a" (hardware_history_features) : "memory"); +} + static __always_inline void setup_hreset(struct cpuinfo_x86 *c) { if (!cpu_feature_enabled(X86_FEATURE_HRESET)) diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 470c128759ea..397a6e6f4e61 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -52,6 +52,7 @@ #include #include #include +#include #include #include "process.h" @@ -214,6 +215,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) /* Load the Intel cache allocation PQR MSR. */ resctrl_sched_in(); + reset_hardware_history(); + return prev_p; } diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 4e34b3b68ebd..6176044ecc16 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -53,6 +53,7 @@ #include #include #include +#include #include #include #ifdef CONFIG_IA32_EMULATION @@ -658,6 +659,8 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) /* Load the Intel cache allocation PQR MSR. */ resctrl_sched_in(); + reset_hardware_history(); + return prev_p; }