From patchwork Tue Mar 10 21:41:58 2020
X-Patchwork-Submitter: Francisco Jerez
X-Patchwork-Id: 212618
From: Francisco Jerez
To: linux-pm@vger.kernel.org, intel-gfx@lists.freedesktop.org
Cc: "Rafael J. Wysocki", "Pandruvada, Srinivas", "Vivi, Rodrigo", Peter Zijlstra
Subject: [PATCH 05/10] cpufreq: intel_pstate: Implement VLP controller statistics and status calculation.
Date: Tue, 10 Mar 2020 14:41:58 -0700
Message-Id: <20200310214203.26459-6-currojerez@riseup.net>
In-Reply-To: <20200310214203.26459-1-currojerez@riseup.net>
References: <20200310214203.26459-1-currojerez@riseup.net>
X-Mailing-List: linux-pm@vger.kernel.org

The goal of the helper code introduced here is to compute two informational data structures: struct vlp_input_stats, aggregating various scheduling and PM statistics gathered on every call of the update_util() hook, and struct vlp_status_sample, which contains status information derived from the former indicating whether the system is likely to have an IO or CPU bottleneck. This will be used as the main heuristic input by the new variably low-pass filtering controller (AKA VLP) that will assist the HWP in finding a reasonably energy-efficient P-state, given the additional information available to the kernel about I/O utilization and scheduling behavior.
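For reference, the fixed-point machinery this patch relies on can be modeled in userspace as follows. This is an illustrative sketch, not the kernel code itself: FRAC_BITS is assumed to be 8 as in intel_pstate.c, and ema_step() is a hypothetical name for the averaging update written inline in get_vlp_status_sample().

```c
#include <stdint.h>

#define FRAC_BITS 8                 /* assumed, as in intel_pstate.c */
#define DFRAC_BITS 16
#define DFRAC_ONE (1 << DFRAC_BITS)

static int32_t int_tofp(int x) { return (int32_t)x << FRAC_BITS; }

static int32_t mul_fp(int32_t x, int32_t y)
{
	return (int32_t)(((int64_t)x * (int64_t)y) >> FRAC_BITS);
}

/*
 * Piecewise-linear approximation of int_tofp(1) * 2^(-p / DFRAC_ONE),
 * as in the patch: interpolate between 2^-floor(p) and 2^-ceil(p).
 */
static int32_t exp2n(uint32_t p)
{
	if (p < 32u * DFRAC_ONE) {
		const uint32_t floor_p = p >> DFRAC_BITS;
		const uint32_t ceil_p = (p + DFRAC_ONE - 1) >> DFRAC_BITS;
		const uint64_t frac_p = p - (floor_p << DFRAC_BITS);

		return (int32_t)(((uint64_t)(int_tofp(1) >> floor_p) *
				  (DFRAC_ONE - frac_p) +
				  (ceil_p >= 32 ? 0 :
				   (uint64_t)(int_tofp(1) >> ceil_p)) * frac_p) >>
				 DFRAC_BITS);
	}

	/* 2^-32 and below rounds to zero in this fixed-point format. */
	return 0;
}

/*
 * One step of the exponential average used for the realtime statistic:
 * avg' = sample + alpha * (avg - sample), where alpha = exp2n(...)
 * decays toward zero as the time since the last sample grows.
 */
static int32_t ema_step(int32_t avg, int32_t sample, int32_t alpha)
{
	return sample + mul_fp(alpha, avg - sample);
}
```

At integer arguments exp2n() is exact (e.g. exp2n(DFRAC_ONE) is exactly half of int_tofp(1)); between them the linear interpolation overshoots the true exponential by at most roughly 0.044, matching the error bound stated in the patch.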
Signed-off-by: Francisco Jerez --- drivers/cpufreq/intel_pstate.c | 230 +++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 8cb5bf419b40..12ee350db2a9 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -33,6 +34,8 @@ #include #include +#include "../../kernel/sched/sched.h" + #define INTEL_PSTATE_SAMPLING_INTERVAL (10 * NSEC_PER_MSEC) #define INTEL_CPUFREQ_TRANSITION_LATENCY 20000 @@ -59,6 +62,11 @@ static inline int32_t mul_fp(int32_t x, int32_t y) return ((int64_t)x * (int64_t)y) >> FRAC_BITS; } +static inline int rnd_fp(int32_t x) +{ + return (x + (1 << (FRAC_BITS - 1))) >> FRAC_BITS; +} + static inline int32_t div_fp(s64 x, s64 y) { return div64_s64((int64_t)x << FRAC_BITS, y); @@ -169,6 +177,49 @@ struct vid_data { int32_t ratio; }; +/** + * Scheduling and PM statistics gathered by update_vlp_sample() at + * every call of the VLP update_state() hook, used as heuristic + * inputs. + */ +struct vlp_input_stats { + int32_t realtime_count; + int32_t io_wait_count; + uint32_t max_response_frequency_hz; + uint32_t last_response_frequency_hz; +}; + +enum vlp_status { + VLP_BOTTLENECK_IO = 1 << 0, + /* + * XXX - Add other status bits here indicating a CPU or TDP + * bottleneck. + */ +}; + +/** + * Heuristic status information calculated by get_vlp_status_sample() + * from struct vlp_input_stats above, indicating whether the system + * has a potential IO or latency bottleneck. + */ +struct vlp_status_sample { + enum vlp_status value; + int32_t realtime_avg; +}; + +/** + * struct vlp_data - VLP controller parameters and state. + * @sample_interval_ns: Update interval in ns. + * @sample_frequency_hz: Reciprocal of the update interval in Hz. 
+ */ +struct vlp_data { + s64 sample_interval_ns; + int32_t sample_frequency_hz; + + struct vlp_input_stats stats; + struct vlp_status_sample status; +}; + /** * struct global_params - Global parameters, mostly tunable via sysfs. * @no_turbo: Whether or not to use turbo P-states. @@ -239,6 +290,7 @@ struct cpudata { struct pstate_data pstate; struct vid_data vid; + struct vlp_data vlp; u64 last_update; u64 last_sample_time; @@ -268,6 +320,18 @@ struct cpudata { static struct cpudata **all_cpu_data; +/** + * struct vlp_params - VLP controller static configuration + * @sample_interval_ms: Update interval in ms. + * @avg*_hz: Exponential averaging frequencies of the various + * low-pass filters as an integer in Hz. + */ +struct vlp_params { + int sample_interval_ms; + int avg_hz; + int debug; +}; + /** * struct pstate_funcs - Per CPU model specific callbacks * @get_max: Callback to get maximum non turbo effective P state @@ -296,6 +360,11 @@ struct pstate_funcs { }; static struct pstate_funcs pstate_funcs __read_mostly; +static struct vlp_params vlp_params __read_mostly = { + .sample_interval_ms = 10, + .avg_hz = 2, + .debug = 0, +}; static int hwp_active __read_mostly; static int hwp_mode_bdw __read_mostly; @@ -1793,6 +1862,167 @@ static inline int32_t get_target_pstate(struct cpudata *cpu) return target; } +/** + * Initialize the struct vlp_data of the specified CPU to the defaults + * calculated from @vlp_params. + */ +static void intel_pstate_reset_vlp(struct cpudata *cpu) +{ + struct vlp_data *vlp = &cpu->vlp; + + vlp->sample_interval_ns = vlp_params.sample_interval_ms * NSEC_PER_MSEC; + vlp->sample_frequency_hz = max(1u, (uint32_t)MSEC_PER_SEC / + vlp_params.sample_interval_ms); + vlp->stats.last_response_frequency_hz = vlp_params.avg_hz; +} + +/** + * Fixed point representation with twice the usual number of + * fractional bits. 
+ */ +#define DFRAC_BITS 16 +#define DFRAC_ONE (1 << DFRAC_BITS) +#define DFRAC_MAX_INT (0u - (uint32_t)DFRAC_ONE) + +/** + * Fast but rather inaccurate piecewise-linear approximation of a + * fixed-point inverse exponential: + * + * exp2n(p) = int_tofp(1) * 2 ^ (-p / DFRAC_ONE) + O(1) + * + * The error term should be lower in magnitude than 0.044. + */ +static int32_t exp2n(uint32_t p) +{ + if (p < 32 * DFRAC_ONE) { + /* Interpolate between 2^-floor(p) and 2^-ceil(p). */ + const uint32_t floor_p = p >> DFRAC_BITS; + const uint32_t ceil_p = (p + DFRAC_ONE - 1) >> DFRAC_BITS; + const uint64_t frac_p = p - (floor_p << DFRAC_BITS); + + return ((int_tofp(1) >> floor_p) * (DFRAC_ONE - frac_p) + + (ceil_p >= 32 ? 0 : int_tofp(1) >> ceil_p) * frac_p) >> + DFRAC_BITS; + } + + /* Short-circuit to avoid overflow. */ + return 0; +} + +/** + * Calculate the exponential averaging weight for a new sample based + * on the requested averaging frequency @hz and the delay since the + * last update. + */ +static int32_t get_last_sample_avg_weight(struct cpudata *cpu, unsigned int hz) +{ + /* + * Approximate, but saves several 64-bit integer divisions + * below and should be fully evaluated at compile-time. + * Causes the exponential averaging to have an effective base + * of 1.90702343749, which has little functional implications + * as long as the hz parameter is scaled accordingly. + */ + const uint32_t ns_per_s_shift = order_base_2(NSEC_PER_SEC); + const uint64_t delta_ns = cpu->sample.time - cpu->last_sample_time; + + return exp2n(min((uint64_t)DFRAC_MAX_INT, + (hz * delta_ns) >> (ns_per_s_shift - DFRAC_BITS))); +} + +/** + * Calculate some status information heuristically based on the struct + * vlp_input_stats statistics gathered by the update_state() hook. 
+ */ +static const struct vlp_status_sample *get_vlp_status_sample( + struct cpudata *cpu, const int32_t po) +{ + struct vlp_data *vlp = &cpu->vlp; + struct vlp_input_stats *stats = &vlp->stats; + struct vlp_status_sample *last_status = &vlp->status; + + /* + * Calculate the VLP_BOTTLENECK_IO state bit, which indicates + * whether some IO device driver has requested a PM response + * frequency bound, typically due to the device being under + * close to full utilization, which should cause the + * controller to make a more conservative trade-off between + * latency and energy usage, since performance isn't + * guaranteed to scale further with increasing CPU frequency + * whenever the system is close to IO-bound. + * + * Note that the maximum achievable response frequency is + * limited by the sampling frequency of the controller, + * response frequency requests greater than that will be + * promoted to infinity (i.e. no low-pass filtering) in order + * to avoid violating the response frequency constraint + * provided via PM QoS. + */ + const bool bottleneck_io = stats->max_response_frequency_hz < + vlp->sample_frequency_hz; + + /* + * Calculate the realtime statistic that tracks the + * exponentially-averaged rate of occurrence of + * latency-sensitive events (like wake-ups from IO wait). + */ + const uint64_t delta_ns = cpu->sample.time - cpu->last_sample_time; + const int32_t realtime_sample = + div_fp((uint64_t)(stats->realtime_count + + (bottleneck_io ? 0 : stats->io_wait_count)) * + NSEC_PER_SEC, + 100 * delta_ns); + const int32_t alpha = get_last_sample_avg_weight(cpu, + vlp_params.avg_hz); + const int32_t realtime_avg = realtime_sample + + mul_fp(alpha, last_status->realtime_avg - realtime_sample); + + /* Consume the input statistics. 
*/ + stats->io_wait_count = 0; + stats->realtime_count = 0; + if (bottleneck_io) + stats->last_response_frequency_hz = + stats->max_response_frequency_hz; + stats->max_response_frequency_hz = 0; + + /* Update the state of the controller. */ + last_status->realtime_avg = realtime_avg; + last_status->value = (bottleneck_io ? VLP_BOTTLENECK_IO : 0); + + /* Update state used for tracing. */ + cpu->sample.busy_scaled = int_tofp(stats->max_response_frequency_hz); + cpu->iowait_boost = realtime_avg; + + return last_status; +} + +/** + * Collect some scheduling and PM statistics in response to an + * update_state() call. + */ +static bool update_vlp_sample(struct cpudata *cpu, u64 time, unsigned int flags) +{ + struct vlp_input_stats *stats = &cpu->vlp.stats; + + /* Update PM QoS request. */ + const uint32_t resp_hz = cpu_response_frequency_qos_limit(); + + stats->max_response_frequency_hz = !resp_hz ? UINT_MAX : + max(stats->max_response_frequency_hz, resp_hz); + + /* Update scheduling statistics. */ + if ((flags & SCHED_CPUFREQ_IOWAIT)) + stats->io_wait_count++; + + if (cpu_rq(cpu->cpu)->rt.rt_nr_running) + stats->realtime_count++; + + /* Return whether a P-state update is due. 
*/ + return smp_processor_id() == cpu->cpu && + time - cpu->sample.time >= cpu->vlp.sample_interval_ns && + intel_pstate_sample(cpu, time); +} + static int intel_pstate_prepare_request(struct cpudata *cpu, int pstate) { int min_pstate = max(cpu->pstate.min_pstate, cpu->min_perf_ratio);

From patchwork Tue Mar 10 21:41:59 2020
X-Patchwork-Submitter: Francisco Jerez
X-Patchwork-Id: 212615
From: Francisco Jerez
To: linux-pm@vger.kernel.org, intel-gfx@lists.freedesktop.org
Cc: "Rafael J. Wysocki", "Pandruvada, Srinivas", "Vivi, Rodrigo", Peter Zijlstra
Subject: [PATCH 06/10] cpufreq: intel_pstate: Implement VLP controller target P-state range estimation.
Date: Tue, 10 Mar 2020 14:41:59 -0700
Message-Id: <20200310214203.26459-7-currojerez@riseup.net>
In-Reply-To: <20200310214203.26459-1-currojerez@riseup.net>
References: <20200310214203.26459-1-currojerez@riseup.net>
X-Mailing-List: linux-pm@vger.kernel.org

The function introduced here calculates a P-state range derived from the statistics computed in the previous patch, which will be used to drive the HWP P-state range or (if HWP is not available) as the basis for an additional kernel-side frequency-selection mechanism that will choose a single P-state from the range. This is meant to provide a variably low-pass filtering effect that will damp oscillations below a frequency threshold that can be specified by device drivers via PM QoS, in order to achieve energy-efficient behavior in cases where the system has an IO bottleneck.
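The core of the range computation — exponential smoothing of the target P-state and the bottleneck-dependent choice of bounds — can be sketched in userspace as follows. This is an illustrative model with hypothetical helper names (lp_step, vlp_range), using plain doubles instead of the driver's fixed-point types:

```c
#include <stdint.h>

struct vlp_target_range { unsigned int value[2]; };

/*
 * One low-pass filter step: p_base' = p_tgt + alpha * (p_base - p_tgt).
 * alpha near 1 keeps the response smooth across updates; alpha near 0
 * lets it follow the instantaneous target p_tgt.
 */
static double lp_step(double p_base, double p_tgt, double alpha)
{
	return p_tgt + alpha * (p_base - p_tgt);
}

/* Round to the nearest integer P-state, like rnd_fp() in the patch. */
static unsigned int rnd(double x) { return (unsigned int)(x + 0.5); }

/*
 * Choose the [min, max] P-state range handed to HWP.  Under an IO
 * bottleneck the filtered p_base caps the range from above, favoring
 * energy efficiency; otherwise the realtime lower bound p_tgt_rt keeps
 * latency-sensitive work fast and the policy maximum p1 stays available.
 */
static struct vlp_target_range vlp_range(int bottleneck_io,
					 double p0, double p1,
					 double p_tgt_rt, double p_base)
{
	struct vlp_target_range r;

	if (bottleneck_io) {
		r.value[0] = rnd(p0);
		r.value[1] = rnd(p_base);
	} else {
		r.value[0] = rnd(p_tgt_rt);
		r.value[1] = rnd(p1);
	}
	return r;
}
```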
Signed-off-by: Francisco Jerez --- drivers/cpufreq/intel_pstate.c | 157 +++++++++++++++++++++++++++++++++ 1 file changed, 157 insertions(+) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 12ee350db2a9..cecadfec8bc1 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -207,17 +207,34 @@ struct vlp_status_sample { int32_t realtime_avg; }; +/** + * VLP controller state used for the estimation of the target P-state + * range, computed by get_vlp_target_range() from the heuristic status + * information defined above in struct vlp_status_sample. + */ +struct vlp_target_range { + unsigned int value[2]; + int32_t p_base; +}; + /** * struct vlp_data - VLP controller parameters and state. * @sample_interval_ns: Update interval in ns. * @sample_frequency_hz: Reciprocal of the update interval in Hz. + * @gain*: Response factor of the controller relative to each + * one of its linear input variables as fixed-point + * fraction. */ struct vlp_data { s64 sample_interval_ns; int32_t sample_frequency_hz; + int32_t gain_aggr; + int32_t gain_rt; + int32_t gain; struct vlp_input_stats stats; struct vlp_status_sample status; + struct vlp_target_range target; }; /** @@ -323,12 +340,18 @@ static struct cpudata **all_cpu_data; /** * struct vlp_params - VLP controller static configuration * @sample_interval_ms: Update interval in ms. + * @setpoint_*_pml: Target CPU utilization at which the controller is + * expected to leave the current P-state untouched, + * as an integer per mille. * @avg*_hz: Exponential averaging frequencies of the various * low-pass filters as an integer in Hz. 
*/ struct vlp_params { int sample_interval_ms; + int setpoint_0_pml; + int setpoint_aggr_pml; int avg_hz; + int realtime_gain_pml; int debug; }; @@ -362,7 +385,10 @@ struct pstate_funcs { static struct pstate_funcs pstate_funcs __read_mostly; static struct vlp_params vlp_params __read_mostly = { .sample_interval_ms = 10, + .setpoint_0_pml = 900, + .setpoint_aggr_pml = 1500, .avg_hz = 2, + .realtime_gain_pml = 12000, .debug = 0, }; @@ -1873,6 +1899,11 @@ static void intel_pstate_reset_vlp(struct cpudata *cpu) vlp->sample_interval_ns = vlp_params.sample_interval_ms * NSEC_PER_MSEC; vlp->sample_frequency_hz = max(1u, (uint32_t)MSEC_PER_SEC / vlp_params.sample_interval_ms); + vlp->gain_rt = div_fp(cpu->pstate.max_pstate * + vlp_params.realtime_gain_pml, 1000); + vlp->gain_aggr = max(1, div_fp(1000, vlp_params.setpoint_aggr_pml)); + vlp->gain = max(1, div_fp(1000, vlp_params.setpoint_0_pml)); + vlp->target.p_base = 0; vlp->stats.last_response_frequency_hz = vlp_params.avg_hz; } @@ -1996,6 +2027,132 @@ static const struct vlp_status_sample *get_vlp_status_sample( return last_status; } +/** + * Calculate the target P-state range for the next update period. + * Uses a variably low-pass-filtering controller intended to improve + * energy efficiency when a CPU response frequency target is specified + * via PM QoS (e.g. under IO-bound conditions). + */ +static const struct vlp_target_range *get_vlp_target_range(struct cpudata *cpu) +{ + struct vlp_data *vlp = &cpu->vlp; + struct vlp_target_range *last_target = &vlp->target; + + /* + * P-state limits in fixed-point as allowed by the policy. + */ + const int32_t p0 = int_tofp(max(cpu->pstate.min_pstate, + cpu->min_perf_ratio)); + const int32_t p1 = int_tofp(cpu->max_perf_ratio); + + /* + * Observed average P-state during the sampling period. 
The + * conservative path (po_cons) uses the TSC increment as + * denominator which will give the minimum (arguably most + * energy-efficient) P-state able to accomplish the observed + * amount of work during the sampling period. + * + * The downside of that somewhat optimistic estimate is that + * it can give a biased result for intermittent + * latency-sensitive workloads, which may have to be completed + * in a short window of time for the system to achieve maximum + * performance, even if the average CPU utilization is low. + * For that reason the aggressive path (po_aggr) uses the + * MPERF increment as denominator, which is approximately + * optimal under the pessimistic assumption that the CPU work + * cannot be parallelized with any other dependent IO work + * that subsequently keeps the CPU idle (partly in C1+ + * states). + */ + const int32_t po_cons = + div_fp((cpu->sample.aperf << cpu->aperf_mperf_shift) + * cpu->pstate.max_pstate_physical, + cpu->sample.tsc); + const int32_t po_aggr = + div_fp((cpu->sample.aperf << cpu->aperf_mperf_shift) + * cpu->pstate.max_pstate_physical, + (cpu->sample.mperf << cpu->aperf_mperf_shift)); + + const struct vlp_status_sample *status = + get_vlp_status_sample(cpu, po_cons); + + /* Calculate the target P-state. */ + const int32_t p_tgt_cons = mul_fp(vlp->gain, po_cons); + const int32_t p_tgt_aggr = mul_fp(vlp->gain_aggr, po_aggr); + const int32_t p_tgt = max(p0, min(p1, max(p_tgt_cons, p_tgt_aggr))); + + /* Calculate the realtime P-state target lower bound. */ + const int32_t pm = int_tofp(cpu->pstate.max_pstate); + const int32_t p_tgt_rt = min(pm, mul_fp(vlp->gain_rt, + status->realtime_avg)); + + /* + * Low-pass filter the P-state estimate above by exponential + * averaging. For an oscillating workload (e.g. 
submitting + * work repeatedly to a device like a soundcard or GPU) this + * will approximate the minimum P-state that would be able to + * accomplish the observed amount of work during the averaging + * period, which is also the optimally energy-efficient one, + * under the assumptions that: + * + * - The power curve of the system is convex throughout the + * range of P-states allowed by the policy. I.e. energy + * efficiency is steadily decreasing with frequency past p0 + * (which is typically close to the maximum-efficiency + * ratio). In practice for the lower range of P-states + * this may only be approximately true due to the + * interaction between different components of the system. + * + * - Parallelism constraints of the workload don't prevent it + * from achieving the same throughput at the lower P-state. + * This will happen in cases where the application is + * designed in a way that doesn't allow for dependent CPU + * and IO jobs to be pipelined, leading to alternating full + * and zero utilization of the CPU and IO device. This + * will give an average IO device utilization lower than + * 100% regardless of the CPU frequency, which should + * prevent the device driver from requesting a response + * frequency bound, so the filtered P-state calculated + * below won't have an influence on the controller + * response. + * + * - The period of the oscillating workload is significantly + * shorter than the time constant of the exponential + * average (1s / last_response_frequency_hz). Otherwise for + * more slowly oscillating workloads the controller + * response will roughly follow the oscillation, leading to + * decreased energy efficiency. + * + * - The behavior of the workload doesn't change + * qualitatively during the next update interval. 
This is + * only true in the steady state, and could possibly lead + * to a transitory period in which the controller response + * deviates from the most energy-efficient ratio until the + * workload reaches a steady state again. + */ + const int32_t alpha = get_last_sample_avg_weight( + cpu, vlp->stats.last_response_frequency_hz); + + last_target->p_base = p_tgt + mul_fp(alpha, + last_target->p_base - p_tgt); + + /* + * Use the low-pass-filtered controller response for better + * energy efficiency unless we have reasons to believe that + * some of the optimality assumptions discussed above may not + * hold. + */ + if ((status->value & VLP_BOTTLENECK_IO)) { + last_target->value[0] = rnd_fp(p0); + last_target->value[1] = rnd_fp(last_target->p_base); + } else { + last_target->value[0] = rnd_fp(p_tgt_rt); + last_target->value[1] = rnd_fp(p1); + } + + return last_target; +} + /** * Collect some scheduling and PM statistics in response to an * update_state() call. From patchwork Tue Mar 10 21:42:00 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Francisco Jerez X-Patchwork-Id: 212617 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU, HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1107BC3F2D2 for ; Tue, 10 Mar 2020 21:46:26 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id BD75A21655 for ; Tue, 10 Mar 2020 21:46:25 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=riseup.net header.i=@riseup.net 
header.b="BEQDPbgw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727506AbgCJVqY (ORCPT ); Tue, 10 Mar 2020 17:46:24 -0400 Received: from mx1.riseup.net ([198.252.153.129]:49240 "EHLO mx1.riseup.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727551AbgCJVqY (ORCPT ); Tue, 10 Mar 2020 17:46:24 -0400 Received: from bell.riseup.net (unknown [10.0.1.178]) (using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits)) (Client CN "*.riseup.net", Issuer "Sectigo RSA Domain Validation Secure Server CA" (not verified)) by mx1.riseup.net (Postfix) with ESMTPS id 48cTDw0378zFf59; Tue, 10 Mar 2020 14:46:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=riseup.net; s=squak; t=1583876784; bh=ltwuhymiRp0VE2eGHyBbW5dhX5HoTXYQ8tBDbkFknOM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BEQDPbgw4AFgInKuxEXiyG6FyPYZvsQhtCWhvbRFbNd1QwRwlwiaAucd5RGHSlcvb +nwClGR6I18+Jef9P791Em/n0bngfm31Vd/R9TnNzvW8T/65UHLhtjtIAp30kxszpe 6xw6eKaGNg5QyLmzoVaZm/E2rOBI3pWsPvGibZFg= X-Riseup-User-ID: 27D01099CFA45474A2CF5CDB74DB5D1B94D423BF3BF6EE795CB3C483527536A2 Received: from [127.0.0.1] (localhost [127.0.0.1]) by bell.riseup.net (Postfix) with ESMTPSA id 48cTDv5PBzzJs07; Tue, 10 Mar 2020 14:46:23 -0700 (PDT) From: Francisco Jerez To: linux-pm@vger.kernel.org, intel-gfx@lists.freedesktop.org Cc: "Rafael J. Wysocki" , "Pandruvada, Srinivas" , "Vivi, Rodrigo" , Peter Zijlstra Subject: [PATCH 07/10] cpufreq: intel_pstate: Implement VLP controller for HWP parts. 
Date: Tue, 10 Mar 2020 14:42:00 -0700 Message-Id: <20200310214203.26459-8-currojerez@riseup.net> In-Reply-To: <20200310214203.26459-1-currojerez@riseup.net> References: <20200310214203.26459-1-currojerez@riseup.net> MIME-Version: 1.0 Sender: linux-pm-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org This implements a simple variably low-pass-filtering governor in control of the HWP MIN/MAX PERF range based on the previously introduced get_vlp_target_range(). See "cpufreq: intel_pstate: Implement VLP controller target P-state range estimation." for the rationale. Signed-off-by: Francisco Jerez --- drivers/cpufreq/intel_pstate.c | 79 +++++++++++++++++++++++++++++++++- 1 file changed, 77 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index cecadfec8bc1..a01eed40d897 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -1905,6 +1905,20 @@ static void intel_pstate_reset_vlp(struct cpudata *cpu) vlp->gain = max(1, div_fp(1000, vlp_params.setpoint_0_pml)); vlp->target.p_base = 0; vlp->stats.last_response_frequency_hz = vlp_params.avg_hz; + + if (hwp_active) { + const uint32_t p0 = max(cpu->pstate.min_pstate, + cpu->min_perf_ratio); + const uint32_t p1 = max_t(uint32_t, p0, cpu->max_perf_ratio); + const uint64_t hwp_req = (READ_ONCE(cpu->hwp_req_cached) & + ~(HWP_MAX_PERF(~0L) | + HWP_MIN_PERF(~0L) | + HWP_DESIRED_PERF(~0L))) | + HWP_MIN_PERF(p0) | HWP_MAX_PERF(p1); + + wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, hwp_req); + cpu->hwp_req_cached = hwp_req; + } } /** @@ -2222,6 +2236,46 @@ static void intel_pstate_adjust_pstate(struct cpudata *cpu) fp_toint(cpu->iowait_boost * 100)); } +static void intel_pstate_adjust_pstate_range(struct cpudata *cpu, + const unsigned int range[]) +{ + const int from = cpu->hwp_req_cached; + unsigned int p0, p1, p_min, p_max; + struct sample *sample; + uint64_t hwp_req; + + update_turbo_state(); + + p0 = 
max(cpu->pstate.min_pstate, cpu->min_perf_ratio); + p1 = max_t(unsigned int, p0, cpu->max_perf_ratio); + p_min = clamp_t(unsigned int, range[0], p0, p1); + p_max = clamp_t(unsigned int, range[1], p0, p1); + + trace_cpu_frequency(p_max * cpu->pstate.scaling, cpu->cpu); + + hwp_req = (READ_ONCE(cpu->hwp_req_cached) & + ~(HWP_MAX_PERF(~0L) | HWP_MIN_PERF(~0L) | + HWP_DESIRED_PERF(~0L))) | + HWP_MIN_PERF(vlp_params.debug & 2 ? p0 : p_min) | + HWP_MAX_PERF(vlp_params.debug & 4 ? p1 : p_max); + + if (hwp_req != cpu->hwp_req_cached) { + wrmsrl(MSR_HWP_REQUEST, hwp_req); + cpu->hwp_req_cached = hwp_req; + } + + sample = &cpu->sample; + trace_pstate_sample(mul_ext_fp(100, sample->core_avg_perf), + fp_toint(sample->busy_scaled), + from, + hwp_req, + sample->mperf, + sample->aperf, + sample->tsc, + get_avg_frequency(cpu), + fp_toint(cpu->iowait_boost * 100)); +} + static void intel_pstate_update_util(struct update_util_data *data, u64 time, unsigned int flags) { @@ -2260,6 +2314,22 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time, intel_pstate_adjust_pstate(cpu); } +/** + * Implementation of the cpufreq update_util hook based on the VLP + * controller (see get_vlp_target_range()). 
+ */ +static void intel_pstate_update_util_hwp_vlp(struct update_util_data *data, + u64 time, unsigned int flags) +{ + struct cpudata *cpu = container_of(data, struct cpudata, update_util); + + if (update_vlp_sample(cpu, time, flags)) { + const struct vlp_target_range *target = + get_vlp_target_range(cpu); + intel_pstate_adjust_pstate_range(cpu, target->value); + } +} + static struct pstate_funcs core_funcs = { .get_max = core_get_max_pstate, .get_max_physical = core_get_max_pstate_physical, @@ -2389,6 +2459,9 @@ static int intel_pstate_init_cpu(unsigned int cpunum) intel_pstate_get_cpu_pstates(cpu); + if (pstate_funcs.update_util == intel_pstate_update_util_hwp_vlp) + intel_pstate_reset_vlp(cpu); + pr_debug("controlling: cpu %d\n", cpunum); return 0; @@ -2398,7 +2471,8 @@ static void intel_pstate_set_update_util_hook(unsigned int cpu_num) { struct cpudata *cpu = all_cpu_data[cpu_num]; - if (hwp_active && !hwp_boost) + if (hwp_active && !hwp_boost && + pstate_funcs.update_util != intel_pstate_update_util_hwp_vlp) return; if (cpu->update_util_set) @@ -2526,7 +2600,8 @@ static int intel_pstate_set_policy(struct cpufreq_policy *policy) * was turned off, in that case we need to clear the * update util hook. 
*/ - if (!hwp_boost) + if (!hwp_boost && pstate_funcs.update_util != + intel_pstate_update_util_hwp_vlp) intel_pstate_clear_update_util_hook(policy->cpu); intel_pstate_hwp_set(policy->cpu); }

From patchwork Tue Mar 10 21:42:02 2020
X-Patchwork-Submitter: Francisco Jerez
X-Patchwork-Id: 212616
From: Francisco Jerez
To: linux-pm@vger.kernel.org, intel-gfx@lists.freedesktop.org
Cc: "Rafael J. Wysocki", "Pandruvada, Srinivas", "Vivi, Rodrigo", Peter Zijlstra
Subject: [PATCH 09/10] OPTIONAL: cpufreq: intel_pstate: Add tracing of VLP controller status.
Date: Tue, 10 Mar 2020 14:42:02 -0700
Message-Id: <20200310214203.26459-10-currojerez@riseup.net>
In-Reply-To: <20200310214203.26459-1-currojerez@riseup.net>
References: <20200310214203.26459-1-currojerez@riseup.net>
X-Mailing-List: linux-pm@vger.kernel.org

Signed-off-by: Francisco Jerez --- drivers/cpufreq/intel_pstate.c | 9 ++++++--- include/trace/events/power.h | 13 +++++++++---- 2 files changed, 15 insertions(+), 7 deletions(-) diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 050cc8f03c26..c4558a131660 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -2233,7 +2233,8 @@ static void intel_pstate_adjust_pstate(struct cpudata *cpu) sample->aperf, sample->tsc, get_avg_frequency(cpu), - fp_toint(cpu->iowait_boost * 100)); + fp_toint(cpu->iowait_boost * 100), + cpu->vlp.status.value); } static void intel_pstate_adjust_pstate_range(struct cpudata *cpu, @@ -2273,7 +2274,8 @@ static void intel_pstate_adjust_pstate_range(struct cpudata *cpu, sample->aperf, sample->tsc, get_avg_frequency(cpu), - fp_toint(cpu->iowait_boost * 100)); + fp_toint(cpu->iowait_boost * 
100), + cpu->vlp.status.value); } static void intel_pstate_update_util(struct update_util_data *data, u64 time, @@ -2782,7 +2784,8 @@ static void intel_cpufreq_trace(struct cpudata *cpu, unsigned int trace_type, in sample->aperf, sample->tsc, get_avg_frequency(cpu), - fp_toint(cpu->iowait_boost * 100)); + fp_toint(cpu->iowait_boost * 100), + 0); } static int intel_cpufreq_target(struct cpufreq_policy *policy, diff --git a/include/trace/events/power.h b/include/trace/events/power.h index 7e4b52e8ca3a..e94d5e618175 100644 --- a/include/trace/events/power.h +++ b/include/trace/events/power.h @@ -72,7 +72,8 @@ TRACE_EVENT(pstate_sample, u64 aperf, u64 tsc, u32 freq, - u32 io_boost + u32 io_boost, + u32 vlp_status ), TP_ARGS(core_busy, @@ -83,7 +84,8 @@ TRACE_EVENT(pstate_sample, aperf, tsc, freq, - io_boost + io_boost, + vlp_status ), TP_STRUCT__entry( @@ -96,6 +98,7 @@ TRACE_EVENT(pstate_sample, __field(u64, tsc) __field(u32, freq) __field(u32, io_boost) + __field(u32, vlp_status) ), TP_fast_assign( @@ -108,9 +111,10 @@ TRACE_EVENT(pstate_sample, __entry->tsc = tsc; __entry->freq = freq; __entry->io_boost = io_boost; + __entry->vlp_status = vlp_status; ), - TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu io_boost=%lu", + TP_printk("core_busy=%lu scaled=%lu from=%lu to=%lu mperf=%llu aperf=%llu tsc=%llu freq=%lu io_boost=%lu vlp=%lu", (unsigned long)__entry->core_busy, (unsigned long)__entry->scaled_busy, (unsigned long)__entry->from, @@ -119,7 +123,8 @@ TRACE_EVENT(pstate_sample, (unsigned long long)__entry->aperf, (unsigned long long)__entry->tsc, (unsigned long)__entry->freq, - (unsigned long)__entry->io_boost + (unsigned long)__entry->io_boost, + (unsigned long)__entry->vlp_status ) );