From patchwork Fri Jan 27 20:20:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Pandruvada X-Patchwork-Id: 647913 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id D848FC61DA7 for ; Fri, 27 Jan 2023 20:21:03 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232517AbjA0UVC (ORCPT ); Fri, 27 Jan 2023 15:21:02 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35512 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232622AbjA0UVB (ORCPT ); Fri, 27 Jan 2023 15:21:01 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 850F077537; Fri, 27 Jan 2023 12:20:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1674850859; x=1706386859; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=/O70WEfgi8r2v8zGP89oVB7lVpD2sIvrsqQ6kxAoIH4=; b=YQniGedVZT7Sk5l16EPcQNgNVMgLqwVuOCRFSVqCcSDIlwXrRY1zoUtF RjycdSBa2Zd2sxZidWXURLnssoG7NHACaD4srRp9M05mV2jlIZAoTV/QM UjgYdD9vttJA+Ot/SA2YF26pUd6vNhHey4MH9Obv+T0xzt9QS9kteAW5S bjsuLeuqZg7VZPAsIEA0SowW2iSYPdgmz6WJ1W07VriSGFgbMsejOuvpR 6ONPdbv6ZEVjn34SRA7CuMitHYtJpldhO2rwTrixtZK78xqME95KPTSzf yvw2Ssz8/RlnlfgHbrBwh6bKPlYUSy8PEXS8bR5pfrYqvYfJeVKDVPpsi A==; X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="310804597" X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="310804597" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2023 12:20:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="693840013" 
X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="693840013" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by orsmga008.jf.intel.com with ESMTP; 27 Jan 2023 12:20:56 -0800 From: Srinivas Pandruvada To: rafael@kernel.org, rui.zhang@intel.com, daniel.lezcano@linaro.org Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Srinivas Pandruvada Subject: [PATCH v4 1/3] powercap: idle_inject: Export symbols Date: Fri, 27 Jan 2023 12:20:46 -0800 Message-Id: <20230127202048.992504-2-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com> References: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org Export symbols for external interfaces, so that they can be used in other loadable modules. Export is done under name space IDLE_INJECT. Signed-off-by: Srinivas Pandruvada --- v2/v3/v4: No change drivers/powercap/idle_inject.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c index 21f7cd7d159b..a7aed680b54f 100644 --- a/drivers/powercap/idle_inject.c +++ b/drivers/powercap/idle_inject.c @@ -163,6 +163,7 @@ void idle_inject_set_duration(struct idle_inject_device *ii_dev, if (!run_duration_us) pr_debug("CPU is forced to 100 percent idle\n"); } +EXPORT_SYMBOL_NS_GPL(idle_inject_set_duration, IDLE_INJECT); /** * idle_inject_get_duration - idle and run duration retrieval helper @@ -177,6 +178,7 @@ void idle_inject_get_duration(struct idle_inject_device *ii_dev, *run_duration_us = READ_ONCE(ii_dev->run_duration_us); *idle_duration_us = READ_ONCE(ii_dev->idle_duration_us); } +EXPORT_SYMBOL_NS_GPL(idle_inject_get_duration, IDLE_INJECT); /** * idle_inject_set_latency - set the maximum latency allowed @@ -188,6 +190,7 @@ void idle_inject_set_latency(struct idle_inject_device *ii_dev, { 
WRITE_ONCE(ii_dev->latency_us, latency_us); } +EXPORT_SYMBOL_NS_GPL(idle_inject_set_latency, IDLE_INJECT); /** * idle_inject_start - start idle injections @@ -219,6 +222,7 @@ int idle_inject_start(struct idle_inject_device *ii_dev) return 0; } +EXPORT_SYMBOL_NS_GPL(idle_inject_start, IDLE_INJECT); /** * idle_inject_stop - stops idle injections @@ -265,6 +269,7 @@ void idle_inject_stop(struct idle_inject_device *ii_dev) cpu_hotplug_enable(); } +EXPORT_SYMBOL_NS_GPL(idle_inject_stop, IDLE_INJECT); /** * idle_inject_setup - prepare the current task for idle injection @@ -340,6 +345,7 @@ struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) return NULL; } +EXPORT_SYMBOL_NS_GPL(idle_inject_register, IDLE_INJECT); /** * idle_inject_unregister - unregister idle injection control device @@ -360,6 +366,7 @@ void idle_inject_unregister(struct idle_inject_device *ii_dev) kfree(ii_dev); } +EXPORT_SYMBOL_NS_GPL(idle_inject_unregister, IDLE_INJECT); static struct smp_hotplug_thread idle_inject_threads = { .store = &idle_inject_thread.tsk, From patchwork Fri Jan 27 20:20:47 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Pandruvada X-Patchwork-Id: 647912 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F328C38142 for ; Fri, 27 Jan 2023 20:21:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232167AbjA0UVM (ORCPT ); Fri, 27 Jan 2023 15:21:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35598 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232622AbjA0UVG (ORCPT ); Fri, 27 Jan 2023 15:21:06 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with 
ESMTPS id A0C3A790B3; Fri, 27 Jan 2023 12:21:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1674850861; x=1706386861; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=EATllbCD8tCrkyOe+A2As7KubdJWPVBiPDGExUYSfcY=; b=NiQQ0cjHSDaGSzJnwAgL6gXTmF2otTz3qeJWZ2sihfZJ8esuwixYdGF9 bwSag+H+fVmxkiafXXflWWEVdhZiw+81BHElc+Z3vNcZENPRaKAkzUJkb LViMFgekc+QhCHhz1UQDv/5k6z6EsiFzS0FqOScqcxoEfPtPkSIbc/4q1 P/jIebKdNRlNO0P4DEHXKJq4WxJtd3wSHZuDx/bN8poHdl6w0gBaryUmM RyPqLpk/ekocFdaTWU8q7ekcTbqsexqwGsCgA/RPhLjUZSQKl6Ljmk7K1 MUGGfjGBDM1FUTIYkBxqmjU0WZ8XWiZzDyYtOgfbm4oXynuI+6BWoR5hP Q==; X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="310804599" X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="310804599" Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2023 12:20:57 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="693840016" X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="693840016" Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by orsmga008.jf.intel.com with ESMTP; 27 Jan 2023 12:20:57 -0800 From: Srinivas Pandruvada To: rafael@kernel.org, rui.zhang@intel.com, daniel.lezcano@linaro.org Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Srinivas Pandruvada Subject: [PATCH v4 2/3] powercap: idle_inject: Add update callback Date: Fri, 27 Jan 2023 12:20:47 -0800 Message-Id: <20230127202048.992504-3-srinivas.pandruvada@linux.intel.com> X-Mailer: git-send-email 2.39.1 In-Reply-To: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com> References: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org The powercap/idle_inject core uses play_idle_precise() to inject idle time. 
However, play_idle_precise() cannot guarantee that the CPU stays fully idle for the specified duration, because interrupts can wake it up. To compensate for the idle time lost to these wakeups, the caller can adjust the requested idle time for the next cycle. The goal of idle injection is to keep the system at some idle percentage on average, so it is fine to overshoot or undershoot the instantaneous idle time.

The idle inject core provides the idle_inject_set_duration() interface to set the idle and run durations.

Some architectures provide an interface to get the actual idle time observed by the hardware, so the effective idle percentage can be adjusted using hardware feedback. For example, Intel CPUs provide package idle counters, which the Intel powerclamp driver currently uses to readjust the run duration.

When the caller's desired idle time over a period is less or greater than the actual CPU idle time observed by the hardware, the caller can readjust the idle and run durations for the next cycle. Currently, the only way to do this is to monitor the hardware idle time from a different software thread and readjust the idle and run durations using idle_inject_set_duration().

This can be avoided by adding a callback that callers can register and from which they can readjust the durations.

Add a capability to register an optional update() callback, which the idle inject core calls before waking up CPUs for idle injection. This callback can be registered via a new interface: idle_inject_register_full().

While the idle and run durations are being constantly adjusted, the actual idle time can sometimes exceed the desired idle time. In this case idle injection can be skipped for a cycle: if the update() callback returns false, the idle inject core skips waking up CPUs for idle injection in that cycle.
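The skip decision described above can be modeled outside the kernel. The following is a minimal userspace sketch, not the kernel code: struct model_ii_dev, model_timer_fn(), model_update() and the two percentage variables are hypothetical names invented for illustration. Only the "!update || update()" check and the ordering (callback first, then reading the durations) follow this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct idle_inject_device. */
struct model_ii_dev {
	unsigned int idle_duration_us;
	unsigned int run_duration_us;
	bool (*update)(void);	/* optional, may be NULL */
	unsigned int wakeups;	/* cycles in which idle was injected */
};

/* Hypothetical feedback values; a real caller would read hardware counters. */
unsigned int model_target_pct;
unsigned int model_observed_pct;

/* Assumed policy for illustration: inject only while below the target. */
bool model_update(void)
{
	return model_observed_pct < model_target_pct;
}

/*
 * Models one timer expiry: call update() first (if registered), count a
 * wakeup unless it returned false, then return the next period in
 * microseconds computed from the (possibly readjusted) durations.
 */
unsigned int model_timer_fn(struct model_ii_dev *dev)
{
	assert(dev != NULL);

	if (!dev->update || dev->update())
		dev->wakeups++;

	return dev->run_duration_us + dev->idle_duration_us;
}
```

With no callback registered, every cycle injects idle, which matches the behavior of plain idle_inject_register(). With a callback registered, a cycle is skipped whenever the callback returns false, while the timer period is still computed and the timer keeps running.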
Signed-off-by: Srinivas Pandruvada
---
v4:
- No change

v3:
- Replace prepare/complete callback with update callback

v2:
- Replace begin/end with prepare/complete
- Add new interface idle_inject_register_full with callbacks
- Update kernel doc
- Update commit description

 drivers/powercap/idle_inject.c | 50 ++++++++++++++++++++++++++++++----
 include/linux/idle_inject.h    |  3 ++
 2 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/drivers/powercap/idle_inject.c b/drivers/powercap/idle_inject.c
index a7aed680b54f..36d3b96b9739 100644
--- a/drivers/powercap/idle_inject.c
+++ b/drivers/powercap/idle_inject.c
@@ -63,13 +63,27 @@ struct idle_inject_thread {
  * @idle_duration_us: duration of CPU idle time to inject
  * @run_duration_us: duration of CPU run time to allow
  * @latency_us: max allowed latency
+ * @update: Optional callback deciding whether or not to skip idle
+ *		injection in the given cycle.
  * @cpumask: mask of CPUs affected by idle injection
+ *
+ * This structure is used to define per instance idle inject device data. Each
+ * instance has an idle duration, a run duration and mask of CPUs to inject
+ * idle.
+ * Actual idle is injected by calling kernel scheduler interface
+ * play_idle_precise(). There is one optional callback which the caller can
+ * register by calling idle_inject_register_full():
+ * update() - This callback is called just before waking up CPUs to inject
+ * idle. If this callback returns false, CPUs are not woken up to inject idle
+ * for this cycle. It also gives the caller an opportunity to readjust idle
+ * and run duration by calling idle_inject_set_duration() for the next cycle.
  */
 struct idle_inject_device {
 	struct hrtimer timer;
 	unsigned int idle_duration_us;
 	unsigned int run_duration_us;
 	unsigned int latency_us;
+	bool (*update)(void);
 	unsigned long cpumask[];
 };
 
@@ -111,11 +125,12 @@ static enum hrtimer_restart idle_inject_timer_fn(struct hrtimer *timer)
 	struct idle_inject_device *ii_dev =
 		container_of(timer, struct idle_inject_device, timer);
 
+	if (!ii_dev->update || ii_dev->update())
+		idle_inject_wakeup(ii_dev);
+
 	duration_us = READ_ONCE(ii_dev->run_duration_us);
 	duration_us += READ_ONCE(ii_dev->idle_duration_us);
 
-	idle_inject_wakeup(ii_dev);
-
 	hrtimer_forward_now(timer, ns_to_ktime(duration_us * NSEC_PER_USEC));
 
 	return HRTIMER_RESTART;
@@ -298,17 +313,22 @@ static int idle_inject_should_run(unsigned int cpu)
 }
 
 /**
- * idle_inject_register - initialize idle injection on a set of CPUs
+ * idle_inject_register_full - initialize idle injection on a set of CPUs
  * @cpumask: CPUs to be affected by idle injection
+ * @update: This callback is called just before waking up CPUs to inject
+ * idle
  *
  * This function creates an idle injection control device structure for the
- * given set of CPUs and initializes the timer associated with it. It does not
- * start any injection cycles.
+ * given set of CPUs and initializes the timer associated with it. This
+ * function also allows registering an update() callback.
+ * It does not start any injection cycles.
  *
  * Return: NULL if memory allocation fails, idle injection control device
  * pointer on success.
*/ -struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) + +struct idle_inject_device *idle_inject_register_full(struct cpumask *cpumask, + bool (*update)(void)) { struct idle_inject_device *ii_dev; int cpu, cpu_rb; @@ -321,6 +341,7 @@ struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); ii_dev->timer.function = idle_inject_timer_fn; ii_dev->latency_us = UINT_MAX; + ii_dev->update = update; for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) { @@ -345,6 +366,23 @@ struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) return NULL; } +EXPORT_SYMBOL_NS_GPL(idle_inject_register_full, IDLE_INJECT); + +/** + * idle_inject_register - initialize idle injection on a set of CPUs + * @cpumask: CPUs to be affected by idle injection + * + * This function creates an idle injection control device structure for the + * given set of CPUs and initializes the timer associated with it. It does not + * start any injection cycles. + * + * Return: NULL if memory allocation fails, idle injection control device + * pointer on success. 
+ */ +struct idle_inject_device *idle_inject_register(struct cpumask *cpumask) +{ + return idle_inject_register_full(cpumask, NULL); +} EXPORT_SYMBOL_NS_GPL(idle_inject_register, IDLE_INJECT); /** diff --git a/include/linux/idle_inject.h b/include/linux/idle_inject.h index fb88e23a99d3..a85d5dd40f72 100644 --- a/include/linux/idle_inject.h +++ b/include/linux/idle_inject.h @@ -13,6 +13,9 @@ struct idle_inject_device; struct idle_inject_device *idle_inject_register(struct cpumask *cpumask); +struct idle_inject_device *idle_inject_register_full(struct cpumask *cpumask, + bool (*update)(void)); + void idle_inject_unregister(struct idle_inject_device *ii_dev); int idle_inject_start(struct idle_inject_device *ii_dev); From patchwork Fri Jan 27 20:20:48 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Srinivas Pandruvada X-Patchwork-Id: 648371 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B50B4C61DB3 for ; Fri, 27 Jan 2023 20:21:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231837AbjA0UVE (ORCPT ); Fri, 27 Jan 2023 15:21:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35550 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233188AbjA0UVC (ORCPT ); Fri, 27 Jan 2023 15:21:02 -0500 Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 69C1540EC; Fri, 27 Jan 2023 12:20:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1674850859; x=1706386859; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=nBCABfM+AIeqB9SirR9XWtd7lABam0dXeHCpJ6loGPU=; 
b=HGDyIQsLS3tkV9R3MOcOWeKBjVHXGY3IXIZ+v6TYZ8WQYU2zunBG6Pm3
 WN+16lL/kH4XOlWReXpCn7ulDr216rL9e5FFbhOhKB6poAw5EYcgnnkJs
 OWzskJqEMZGYTS9nhm03XanUWs9kn6OssGur//1PJQWHx1FVLViW9u43J
 Syi0vDM7G8fXPYLDxCLkNPLt+oLn7CrVGSdBx8soBUZNAXMqcukOupgxd
 UhFoOO2ZDl2qrZ+V7JYTqkzvfI8id/Cu3aKl81iEg+93x/+42gka2Y4cY
 h8GhywVjOyRO+TwTVEDwzetrvh4kyj+oTkfngxIBN7iBJzZpYht7YNW1r
 w==;
X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="310804604"
X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="310804604"
Received: from orsmga008.jf.intel.com ([10.7.209.65]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Jan 2023 12:20:57 -0800
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6500,9779,10603"; a="693840019"
X-IronPort-AV: E=Sophos;i="5.97,252,1669104000"; d="scan'208";a="693840019"
Received: from spandruv-desk.jf.intel.com ([10.54.75.8]) by orsmga008.jf.intel.com with ESMTP; 27 Jan 2023 12:20:57 -0800
From: Srinivas Pandruvada
To: rafael@kernel.org, rui.zhang@intel.com, daniel.lezcano@linaro.org
Cc: linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, Srinivas Pandruvada , kernel test robot
Subject: [PATCH v4 3/3] thermal/drivers/intel_powerclamp: Use powercap idle-inject framework
Date: Fri, 27 Jan 2023 12:20:48 -0800
Message-Id: <20230127202048.992504-4-srinivas.pandruvada@linux.intel.com>
X-Mailer: git-send-email 2.39.1
In-Reply-To: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com>
References: <20230127202048.992504-1-srinivas.pandruvada@linux.intel.com>
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

There are two idle injection implementations in the Linux kernel: one via intel_powerclamp and the other using powercap/idle_inject. Both implementations end up calling a play_idle* function from a FIFO priority thread. The two cannot be used at the same time.

It is better to use one idle injection framework for better maintainability. This way, there is only one caller of play_idle.
Here powercap/idle_inject can be used for both per-core and system-wide idle injection. The framework has a well-defined interface that allows registration per core or for all CPUs (system wide).

This reduces code complexity in the Intel powerclamp driver, as all the per-CPU kthreads, delayed work and calls to play_idle can be removed.

The changes include:
- Remove unneeded include files
- Remove per-CPU kthread workers: balancing_work and idle_injection_work
- Reuse the compensation-related code by moving it from the previous worker thread to the idle injection callback
- Adjust the idle duration and runtime by using the powercap/idle_inject interface
- Remove all variables that are not required once powercap/idle_inject is used
- Add a mutex to avoid a race between removal of idle injection during module unload and a user action changing the idle inject percent. The mutex also protects the dynamic adjustment of run and idle time from the update() callback.
- Remove online/offline callbacks to designate control CPU
- Use cpu_present_mask global variable for CPU mask
- Remove hot plug locks

Signed-off-by: Srinivas Pandruvada
---
v4:
- Remove local cpumask for present CPUs, as there is a global one; there is also no need for hot plug locks, as it can't change
- Add some comments on functions

v3:
- Use update callback, which is per device
- Remove use of control_cpu and online/offline callback to set this

v2:
- Use idle_inject_register_full instead of idle_inject_register
- Also fix dependency issue with POWERCAP config
Reported-by: kernel test robot

 drivers/thermal/intel/Kconfig            |   2 +
 drivers/thermal/intel/intel_powerclamp.c | 374 ++++++++++-------------
 2 files changed, 155 insertions(+), 221 deletions(-)

diff --git a/drivers/thermal/intel/Kconfig b/drivers/thermal/intel/Kconfig
index f0c845679250..6c2a95f41c81 100644
--- a/drivers/thermal/intel/Kconfig
+++ b/drivers/thermal/intel/Kconfig
@@ -3,6 +3,8 @@ config INTEL_POWERCLAMP
 	tristate "Intel PowerClamp idle injection driver"
 	depends on X86
depends on CPU_SUP_INTEL + select POWERCAP + select IDLE_INJECT help Enable this to enable Intel PowerClamp idle injection driver. This enforce idle time which results in more package C-state residency. The diff --git a/drivers/thermal/intel/intel_powerclamp.c b/drivers/thermal/intel/intel_powerclamp.c index b80e25ec1261..320525c3c530 100644 --- a/drivers/thermal/intel/intel_powerclamp.c +++ b/drivers/thermal/intel/intel_powerclamp.c @@ -2,7 +2,7 @@ /* * intel_powerclamp.c - package c-state idle injection * - * Copyright (c) 2012, Intel Corporation. + * Copyright (c) 2012-2023, Intel Corporation. * * Authors: * Arjan van de Ven @@ -27,21 +27,15 @@ #include #include #include -#include #include #include -#include -#include #include #include -#include -#include +#include -#include #include #include #include -#include #define MAX_TARGET_RATIO (50U) /* For each undisturbed clamping period (no extra wake ups during idle time), @@ -58,35 +52,26 @@ static unsigned int target_mwait; static struct dentry *debug_dir; -/* user selected target */ -static unsigned int set_target_ratio; +/* Idle ratio observed using package C-state counters */ static unsigned int current_ratio; -static bool should_skip; -static unsigned int control_cpu; /* The cpu assigned to collect stat and update - * control parameters. default to BSP but BSP - * can be offlined. 
- */ -static bool clamping; +/* Skip the idle injection till set to true */ +static bool should_skip; -struct powerclamp_worker_data { - struct kthread_worker *worker; - struct kthread_work balancing_work; - struct kthread_delayed_work idle_injection_work; +struct powerclamp_data { unsigned int cpu; unsigned int count; unsigned int guard; unsigned int window_size_now; unsigned int target_ratio; - unsigned int duration_jiffies; bool clamping; }; -static struct powerclamp_worker_data __percpu *worker_data; +static struct powerclamp_data powerclamp_data; + static struct thermal_cooling_device *cooling_dev; -static unsigned long *cpu_clamping_mask; /* bit map for tracking per cpu - * clamping kthread worker - */ + +static DEFINE_MUTEX(powerclamp_lock); static unsigned int duration; static unsigned int pkg_cstate_ratio_cur; @@ -302,7 +287,7 @@ static void adjust_compensation(int target_ratio, unsigned int win) if (d->confidence >= CONFIDENCE_OK) return; - delta = set_target_ratio - current_ratio; + delta = powerclamp_data.target_ratio - current_ratio; /* filter out bad data */ if (delta >= 0 && delta <= (1+target_ratio/10)) { if (d->steady_comp) @@ -341,82 +326,39 @@ static bool powerclamp_adjust_controls(unsigned int target_ratio, adjust_compensation(target_ratio, win); /* if we are above target+guard, skip */ - return set_target_ratio + guard <= current_ratio; + return powerclamp_data.target_ratio + guard <= current_ratio; } -static void clamp_balancing_func(struct kthread_work *work) +/* + * This function calculates runtime from the current target ratio. + * This function gets called under powerclamp_lock. 
+ */ +static unsigned int get_run_time(void) { - struct powerclamp_worker_data *w_data; - int sleeptime; - unsigned long target_jiffies; unsigned int compensated_ratio; - int interval; /* jiffies to sleep for each attempt */ - - w_data = container_of(work, struct powerclamp_worker_data, - balancing_work); + unsigned int runtime; /* * make sure user selected ratio does not take effect until * the next round. adjust target_ratio if user has changed * target such that we can converge quickly. */ - w_data->target_ratio = READ_ONCE(set_target_ratio); - w_data->guard = 1 + w_data->target_ratio / 20; - w_data->window_size_now = window_size; - w_data->duration_jiffies = msecs_to_jiffies(duration); - w_data->count++; + powerclamp_data.guard = 1 + powerclamp_data.target_ratio / 20; + powerclamp_data.window_size_now = window_size; /* * systems may have different ability to enter package level * c-states, thus we need to compensate the injected idle ratio * to achieve the actual target reported by the HW. */ - compensated_ratio = w_data->target_ratio + - get_compensation(w_data->target_ratio); + compensated_ratio = powerclamp_data.target_ratio + + get_compensation(powerclamp_data.target_ratio); if (compensated_ratio <= 0) compensated_ratio = 1; - interval = w_data->duration_jiffies * 100 / compensated_ratio; - - /* align idle time */ - target_jiffies = roundup(jiffies, interval); - sleeptime = target_jiffies - jiffies; - if (sleeptime <= 0) - sleeptime = 1; - - if (clamping && w_data->clamping && cpu_online(w_data->cpu)) - kthread_queue_delayed_work(w_data->worker, - &w_data->idle_injection_work, - sleeptime); -} -static void clamp_idle_injection_func(struct kthread_work *work) -{ - struct powerclamp_worker_data *w_data; + runtime = duration * 100 / compensated_ratio - duration; - w_data = container_of(work, struct powerclamp_worker_data, - idle_injection_work.work); - - /* - * only elected controlling cpu can collect stats and update - * control parameters. 
- */ - if (w_data->cpu == control_cpu && - !(w_data->count % w_data->window_size_now)) { - should_skip = - powerclamp_adjust_controls(w_data->target_ratio, - w_data->guard, - w_data->window_size_now); - smp_mb(); - } - - if (should_skip) - goto balance; - - play_idle(jiffies_to_usecs(w_data->duration_jiffies)); - -balance: - if (clamping && w_data->clamping && cpu_online(w_data->cpu)) - kthread_queue_work(w_data->worker, &w_data->balancing_work); + return runtime; } /* @@ -452,126 +394,132 @@ static void poll_pkg_cstate(struct work_struct *dummy) msr_last = msr_now; tsc_last = tsc_now; - if (true == clamping) + mutex_lock(&powerclamp_lock); + if (powerclamp_data.clamping) schedule_delayed_work(&poll_pkg_cstate_work, HZ); + mutex_unlock(&powerclamp_lock); } -static void start_power_clamp_worker(unsigned long cpu) -{ - struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu); - struct kthread_worker *worker; +static struct idle_inject_device *ii_dev; - worker = kthread_create_worker_on_cpu(cpu, 0, "kidle_inj/%ld", cpu); - if (IS_ERR(worker)) - return; - - w_data->worker = worker; - w_data->count = 0; - w_data->cpu = cpu; - w_data->clamping = true; - set_bit(cpu, cpu_clamping_mask); - sched_set_fifo(worker->task); - kthread_init_work(&w_data->balancing_work, clamp_balancing_func); - kthread_init_delayed_work(&w_data->idle_injection_work, - clamp_idle_injection_func); - kthread_queue_work(w_data->worker, &w_data->balancing_work); -} - -static void stop_power_clamp_worker(unsigned long cpu) +/* + * This function is called from idle injection core on timer expiry + * for the run duration. This allows powerclamp to readjust or skip + * injecting idle for this cycle. 
+ */
+static bool idle_inject_update(void)
 {
-	struct powerclamp_worker_data *w_data = per_cpu_ptr(worker_data, cpu);
+	bool update = false;
 
-	if (!w_data->worker)
-		return;
+	mutex_lock(&powerclamp_lock);
 
-	w_data->clamping = false;
-	/*
-	 * Make sure that all works that get queued after this point see
-	 * the clamping disabled. The counter part is not needed because
-	 * there is an implicit memory barrier when the queued work
-	 * is proceed.
-	 */
-	smp_wmb();
-	kthread_cancel_work_sync(&w_data->balancing_work);
-	kthread_cancel_delayed_work_sync(&w_data->idle_injection_work);
-	/*
-	 * The balancing work still might be queued here because
-	 * the handling of the "clapming" variable, cancel, and queue
-	 * operations are not synchronized via a lock. But it is not
-	 * a big deal. The balancing work is fast and destroy kthread
-	 * will wait for it.
-	 */
-	clear_bit(w_data->cpu, cpu_clamping_mask);
-	kthread_destroy_worker(w_data->worker);
+	if (!(powerclamp_data.count % powerclamp_data.window_size_now)) {
 
-	w_data->worker = NULL;
-}
+		should_skip = powerclamp_adjust_controls(powerclamp_data.target_ratio,
+							 powerclamp_data.guard,
+							 powerclamp_data.window_size_now);
+		update = true;
+	}
 
-static int start_power_clamp(void)
-{
-	unsigned long cpu;
+	if (update) {
+		unsigned int runtime = get_run_time();
 
-	set_target_ratio = clamp(set_target_ratio, 0U, MAX_TARGET_RATIO - 1);
-	/* prevent cpu hotplug */
-	cpus_read_lock();
+		idle_inject_set_duration(ii_dev, runtime, duration);
+	}
 
-	/* prefer BSP */
-	control_cpu = cpumask_first(cpu_online_mask);
+	powerclamp_data.count++;
 
-	clamping = true;
-	schedule_delayed_work(&poll_pkg_cstate_work, 0);
+	mutex_unlock(&powerclamp_lock);
 
-	/* start one kthread worker per online cpu */
-	for_each_online_cpu(cpu) {
-		start_power_clamp_worker(cpu);
-	}
-	cpus_read_unlock();
+	if (should_skip)
+		return false;
 
-	return 0;
+	return true;
 }
 
-static void end_power_clamp(void)
+/* This function starts idle injection by calling idle_inject_start() */
+static void trigger_idle_injection(void)
 {
-	int i;
+	unsigned int runtime = get_run_time();
+
+	idle_inject_set_duration(ii_dev, runtime, duration);
+	idle_inject_start(ii_dev);
+	powerclamp_data.clamping = true;
+}
 
+/*
+ * This function is called from start_power_clamp() to register
+ * CPUs with the powercap idle injection core and set the default
+ * idle duration and latency.
+ */
+static int powerclamp_idle_injection_register(void)
+{
 	/*
-	 * Block requeuing in all the kthread workers. They will flush and
-	 * stop faster.
+	 * The idle inject core will only inject for online CPUs,
+	 * so we can register for all present CPUs. In this way
+	 * if some CPU goes online/offline while idle inject
+	 * is registered, no additional calls are required.
+	 * The same runtime and idle time is applicable for
+	 * newly onlined CPUs if any.
+	 *
+	 * Here cpu_present_mask can be used as is.
+	 * A cast to (struct cpumask *) is required as the
+	 * cpu_present_mask is const struct cpumask *, otherwise
+	 * there will be compiler warnings.
 	 */
-	clamping = false;
-	for_each_set_bit(i, cpu_clamping_mask, num_possible_cpus()) {
-		pr_debug("clamping worker for cpu %d alive, destroy\n", i);
-		stop_power_clamp_worker(i);
+	ii_dev = idle_inject_register_full((struct cpumask *)cpu_present_mask,
+					   idle_inject_update);
+	if (!ii_dev) {
+		pr_err("powerclamp: idle_inject_register failed\n");
+		return -EAGAIN;
 	}
+
+	idle_inject_set_duration(ii_dev, TICK_USEC, duration);
+	idle_inject_set_latency(ii_dev, UINT_MAX);
+
+	return 0;
 }
 
-static int powerclamp_cpu_online(unsigned int cpu)
+/*
+ * This function is called from end_power_clamp() to stop idle injection
+ * and unregister CPUs from the powercap idle injection core.
+ */
+static void remove_idle_injection(void)
 {
-	if (clamping == false)
-		return 0;
-	start_power_clamp_worker(cpu);
-	/* prefer BSP as controlling CPU */
-	if (cpu == 0) {
-		control_cpu = 0;
-		smp_mb();
-	}
-	return 0;
+	if (!powerclamp_data.clamping)
+		return;
+
+	powerclamp_data.clamping = false;
+	idle_inject_stop(ii_dev);
 }
 
-static int powerclamp_cpu_predown(unsigned int cpu)
+/*
+ * This function is called when the user changes the cooling device
+ * state from zero to some other value.
+ */
+static int start_power_clamp(void)
 {
-	if (clamping == false)
-		return 0;
+	int ret;
 
-	stop_power_clamp_worker(cpu);
-	if (cpu != control_cpu)
-		return 0;
+	ret = powerclamp_idle_injection_register();
+	if (!ret) {
+		trigger_idle_injection();
+		schedule_delayed_work(&poll_pkg_cstate_work, 0);
+	}
 
-	control_cpu = cpumask_first(cpu_online_mask);
-	if (control_cpu == cpu)
-		control_cpu = cpumask_next(cpu, cpu_online_mask);
-	smp_mb();
-	return 0;
+	return ret;
+}
+
+/*
+ * This function is called when the user changes the cooling device
+ * state from a non-zero value to zero.
+ */ +static void end_power_clamp(void) +{ + if (powerclamp_data.clamping) { + remove_idle_injection(); + idle_inject_unregister(ii_dev); + } } static int powerclamp_get_max_state(struct thermal_cooling_device *cdev, @@ -585,12 +533,17 @@ static int powerclamp_get_max_state(struct thermal_cooling_device *cdev, static int powerclamp_get_cur_state(struct thermal_cooling_device *cdev, unsigned long *state) { - if (true == clamping) + + mutex_lock(&powerclamp_lock); + + if (powerclamp_data.clamping) *state = pkg_cstate_ratio_cur; else /* to save power, do not poll idle ratio while not clamping */ *state = -1; /* indicates invalid state */ + mutex_unlock(&powerclamp_lock); + return 0; } @@ -599,24 +552,32 @@ static int powerclamp_set_cur_state(struct thermal_cooling_device *cdev, { int ret = 0; + mutex_lock(&powerclamp_lock); + new_target_ratio = clamp(new_target_ratio, 0UL, - (unsigned long) (MAX_TARGET_RATIO-1)); - if (set_target_ratio == 0 && new_target_ratio > 0) { + (unsigned long) (MAX_TARGET_RATIO - 1)); + if (!powerclamp_data.target_ratio && new_target_ratio > 0) { pr_info("Start idle injection to reduce power\n"); - set_target_ratio = new_target_ratio; + powerclamp_data.target_ratio = new_target_ratio; ret = start_power_clamp(); + if (ret) + powerclamp_data.target_ratio = 0; goto exit_set; - } else if (set_target_ratio > 0 && new_target_ratio == 0) { + } else if (powerclamp_data.target_ratio > 0 && new_target_ratio == 0) { pr_info("Stop forced idle injection\n"); end_power_clamp(); - set_target_ratio = 0; + powerclamp_data.target_ratio = 0; } else /* adjust currently running */ { - set_target_ratio = new_target_ratio; - /* make new set_target_ratio visible to other cpus */ - smp_mb(); + unsigned int runtime; + + powerclamp_data.target_ratio = new_target_ratio; + runtime = get_run_time(); + idle_inject_set_duration(ii_dev, runtime, duration); } exit_set: + mutex_unlock(&powerclamp_lock); + return ret; } @@ -657,7 +618,6 @@ static int powerclamp_debug_show(struct 
seq_file *m, void *unused) { int i = 0; - seq_printf(m, "controlling cpu: %d\n", control_cpu); seq_printf(m, "pct confidence steady dynamic (compensation)\n"); for (i = 0; i < MAX_TARGET_RATIO; i++) { seq_printf(m, "%d\t%lu\t%lu\t%lu\n", @@ -680,75 +640,47 @@ static inline void powerclamp_create_debug_files(void) &powerclamp_debug_fops); } -static enum cpuhp_state hp_state; - static int __init powerclamp_init(void) { int retval; - cpu_clamping_mask = bitmap_zalloc(num_possible_cpus(), GFP_KERNEL); - if (!cpu_clamping_mask) - return -ENOMEM; - /* probe cpu features and ids here */ retval = powerclamp_probe(); if (retval) - goto exit_free; + return retval; /* set default limit, maybe adjusted during runtime based on feedback */ window_size = 2; - retval = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, - "thermal/intel_powerclamp:online", - powerclamp_cpu_online, - powerclamp_cpu_predown); - if (retval < 0) - goto exit_free; - - hp_state = retval; - - worker_data = alloc_percpu(struct powerclamp_worker_data); - if (!worker_data) { - retval = -ENOMEM; - goto exit_unregister; - } cooling_dev = thermal_cooling_device_register("intel_powerclamp", NULL, - &powerclamp_cooling_ops); - if (IS_ERR(cooling_dev)) { - retval = -ENODEV; - goto exit_free_thread; - } + &powerclamp_cooling_ops); + if (IS_ERR(cooling_dev)) + return -ENODEV; if (!duration) - duration = jiffies_to_msecs(DEFAULT_DURATION_JIFFIES); + duration = jiffies_to_usecs(DEFAULT_DURATION_JIFFIES); powerclamp_create_debug_files(); return 0; - -exit_free_thread: - free_percpu(worker_data); -exit_unregister: - cpuhp_remove_state_nocalls(hp_state); -exit_free: - bitmap_free(cpu_clamping_mask); - return retval; } module_init(powerclamp_init); static void __exit powerclamp_exit(void) { + mutex_lock(&powerclamp_lock); end_power_clamp(); - cpuhp_remove_state_nocalls(hp_state); - free_percpu(worker_data); + mutex_unlock(&powerclamp_lock); + thermal_cooling_device_unregister(cooling_dev); - bitmap_free(cpu_clamping_mask); 
cancel_delayed_work_sync(&poll_pkg_cstate_work); debugfs_remove_recursive(debug_dir); } module_exit(powerclamp_exit); +MODULE_IMPORT_NS(IDLE_INJECT); + MODULE_LICENSE("GPL"); MODULE_AUTHOR("Arjan van de Ven "); MODULE_AUTHOR("Jacob Pan ");