diff mbox series

[V3,2/2] cpufreq: intel_pstate: Implement ->resolve_freq()

Message ID 23e3dee8688f5a9767635b686bb7a9c0e09a4438.1564724511.git.viresh.kumar@linaro.org
State New

Commit Message

Viresh Kumar Aug. 2, 2019, 5:44 a.m. UTC
The intel_pstate driver exposes the min_perf_pct and max_perf_pct sysfs
files, which can be used to force a limit on the min/max P-state of the
driver. Though these files eventually control the min/max frequencies
that the CPUs will run at, they don't change the policy->min/max values.

When the values of these files are changed (in the passive mode of the
driver), the ->limits() callback of the cpufreq governor in use, such as
schedutil, gets called. On that call the governor is expected to
forcefully update the frequency so that it falls within the limits. To
get a value within the limits, the schedutil governor calls
cpufreq_driver_resolve_freq(), which eventually tries to call the
->resolve_freq() callback of this driver. Since the callback isn't
present, the schedutil governor fails to get a target frequency within
the limits and sometimes aborts the update, believing that the frequency
is already set to the target value.
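
As a rough illustration of the failure described above, here is a small
userspace model in C. It is only a sketch, not kernel code: struct
model_policy, its fields, and the model_*() helpers are hypothetical
stand-ins, and the frequencies used with them are made-up values. It
assumes the resolve step clamps the target to policy->min/max and then
consults the driver callback only if one exists, and that the governor
skips an update whose resolved value equals the one it requested last.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the policy state; not a kernel type. */
struct model_policy {
	unsigned int min;        /* models policy->min, in kHz */
	unsigned int max;        /* models policy->max, in kHz */
	unsigned int driver_min; /* assumed limit implied by min_perf_pct, in kHz */
	unsigned int driver_max; /* assumed limit implied by max_perf_pct, in kHz */
	bool has_resolve_freq;   /* does the driver provide ->resolve_freq()? */
};

static unsigned int clamp_uint(unsigned int v, unsigned int lo, unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/*
 * Models the resolve step: clamp to the policy limits first, then let the
 * driver apply its own limits, but only if it has a callback to do so.
 */
static unsigned int model_resolve_freq(const struct model_policy *p,
				       unsigned int target)
{
	target = clamp_uint(target, p->min, p->max);
	if (p->has_resolve_freq)
		target = clamp_uint(target, p->driver_min, p->driver_max);
	return target;
}

/*
 * Models the governor side: if the resolved frequency equals the one
 * requested last time, the update is skipped and false is returned.
 */
static bool model_update_next_freq(unsigned int *next_freq,
				   unsigned int resolved)
{
	if (*next_freq == resolved)
		return false;
	*next_freq = resolved;
	return true;
}
```

With policy limits of 800000..3800000 kHz, driver limits of
800000..1900000 kHz and a previous request of 3800000 kHz, resolving
3800000 without the callback yields 3800000 again, so the modelled
update is skipped. With the callback present the same request resolves
to 1900000 and the update goes through.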

This patch implements the resolve_freq() callback, so the correct target
frequency can be returned by the driver and the schedutil governor gets
the frequency within limits immediately.
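
The arithmetic of the new callback can be tried out in userspace with a
sketch like the following. It mirrors the three steps of
intel_cpufreq_resolve_freq() from the patch (round the target up to a
P-state, clamp it, convert back to a frequency), but struct model_cpu and
model_prepare_request() are hypothetical stand-ins: the real
intel_pstate_prepare_request() is assumed here to do nothing more than
clamp the P-state to the currently allowed range, and a scaling factor
of 100000 kHz per P-state is an assumed example value.

```c
/* Same rounding-up integer division as the kernel macro of this name. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the per-CPU data; not the kernel's struct cpudata. */
struct model_cpu {
	unsigned int scaling; /* kHz per P-state step (assumed example value) */
	int min_pstate;       /* lowest P-state currently allowed (assumed) */
	int max_pstate;       /* highest P-state currently allowed (assumed) */
};

/* Assumed behaviour of the prepare step: clamp to the allowed range. */
static int model_prepare_request(const struct model_cpu *cpu, int pstate)
{
	if (pstate < cpu->min_pstate)
		return cpu->min_pstate;
	if (pstate > cpu->max_pstate)
		return cpu->max_pstate;
	return pstate;
}

/* Mirrors the patch: frequency -> P-state (rounded up) -> clamp -> frequency. */
static unsigned int model_intel_resolve_freq(const struct model_cpu *cpu,
					     unsigned int target_freq)
{
	int target_pstate;

	target_pstate = (int)DIV_ROUND_UP(target_freq, cpu->scaling);
	target_pstate = model_prepare_request(cpu, target_pstate);

	return (unsigned int)target_pstate * cpu->scaling;
}
```

For a CPU modelled as { .scaling = 100000, .min_pstate = 8,
.max_pstate = 19 }, a target of 3800000 kHz resolves to 1900000 kHz, a
target of 1234567 kHz rounds up to P-state 13 and resolves to
1300000 kHz, and a target of 500000 kHz is raised to 800000 kHz.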

Fixes: ecd288429126 ("cpufreq: schedutil: Don't set next_freq to UINT_MAX")
Cc: v4.18+ <stable@vger.kernel.org> # v4.18+
Reported-by: Doug Smythies <doug.smythies@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
V3:
- This was earlier posted as a diff in an email reply and is now being
  sent as a proper patch for the first time.

 drivers/cpufreq/intel_pstate.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

-- 
2.21.0.rc0.269.g1a574e7a288b

Comments

Doug Smythies Aug. 3, 2019, 3 p.m. UTC | #1
On 2019.08.02 02:28 Rafael J. Wysocki wrote:
> On Friday, August 2, 2019 11:17:55 AM CEST Rafael J. Wysocki wrote:
>> On Fri, Aug 2, 2019 at 7:44 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>>>
>>> Intel pstate driver exposes min_perf_pct and max_perf_pct sysfs files,
>>> which can be used to force a limit on the min/max P state of the driver.
>>> Though these files eventually control the min/max frequencies that the
>>> CPUs will run at, they don't make a change to policy->min/max values.
>>
>> That's correct.
>>
>>> When the values of these files are changed (in passive mode of the
>>> driver), it leads to calling ->limits() callback of the cpufreq
>>> governors, like schedutil. On a call to it the governors shall
>>> forcefully update the frequency to come within the limits.
>>
>> OK, so the problem is that it is a bug to invoke the governor's ->limits()
>> callback without updating policy->min/max, because that's what
>> "limits" mean to the governors.
>>
>> Fair enough.
>
> AFAICS this can be addressed by adding PM QoS freq limits requests of each CPU to
> intel_pstate in the passive mode such that changing min_perf_pct or max_perf_pct
> will cause these requests to be updated.


All governors for the intel_cpufreq (intel_pstate in passive mode) CPU frequency
scaling driver are broken with respect to this issue, not just the schedutil
governor. My initial escalation had been focused on acpi-cpufreq/schedutil
and intel_cpufreq/schedutil, as they were both broken, and both fixed by my initially
submitted reversion. What can I say, I missed that other intel_cpufreq governors
were also involved.

I tested all of them: conservative, ondemand, userspace, powersave, performance,
and schedutil.
Note that no other governor uses resolve_freq().

... Doug
Patch

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index cc27d4c59dca..2d84361fbebc 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -2314,6 +2314,18 @@  static int intel_cpufreq_target(struct cpufreq_policy *policy,
 	return 0;
 }
 
+static unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
+					       unsigned int target_freq)
+{
+	struct cpudata *cpu = all_cpu_data[policy->cpu];
+	int target_pstate;
+
+	target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
+	target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
+
+	return target_pstate * cpu->pstate.scaling;
+}
+
 static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
 					      unsigned int target_freq)
 {
@@ -2350,6 +2362,7 @@  static struct cpufreq_driver intel_cpufreq = {
 	.verify		= intel_cpufreq_verify_policy,
 	.target		= intel_cpufreq_target,
 	.fast_switch	= intel_cpufreq_fast_switch,
+	.resolve_freq	= intel_cpufreq_resolve_freq,
 	.init		= intel_cpufreq_cpu_init,
 	.exit		= intel_pstate_cpu_exit,
 	.stop_cpu	= intel_cpufreq_stop_cpu,