[v2,2/2] cpufreq: CPPC: Don't read counters for idle CPUs

Message ID 20250619000925.415528-3-pmalani@google.com
State New
Series cpufreq: CPPC: idle cpu perf handling

Commit Message

Prashant Malani June 19, 2025, 12:09 a.m. UTC
AMU performance counters tend to be inaccurate when measured on idle CPUs.
On an idle CPU which is programmed to 3.4 GHz (verified through firmware),
here is a measurement and calculation of operating frequency:

t0: ref=899127636, del=3012458473
t1: ref=899129626, del=3012466509
perf=40

For reference, when we measure the same CPU with stress-ng running, we have
a more accurate result:
t0: ref=30751756418, del=104490567689
t1: ref=30751760628, del=104490582296
perf=34

(t0 and t1 are 2 microseconds apart)

In the above, the prescribed method[1] of calculating frequency from CPPC
counters was used.
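
The arithmetic behind the two perf values above can be sketched as follows.
This is a minimal illustration of the documented formula
(delivered_perf = reference_perf * delta_delivered / delta_reference), not
the driver code. The reference_perf value of 10 is an assumption: it is not
stated in this message, and was chosen only because it reproduces the
perf=40 and perf=34 figures. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* One snapshot of the two feedback counters, as printed in the samples. */
struct ctr_sample {
	uint64_t ref; /* reference counter */
	uint64_t del; /* delivered counter */
};

/*
 * delivered_perf = reference_perf * delta(delivered) / delta(reference)
 *
 * Integer division truncates, as in the figures quoted above.
 * Returns 0 if the reference counter did not advance between samples.
 */
static uint64_t delivered_perf(uint64_t reference_perf,
			       struct ctr_sample t0, struct ctr_sample t1)
{
	uint64_t delta_ref = t1.ref - t0.ref;
	uint64_t delta_del = t1.del - t0.del;

	if (delta_ref == 0)
		return 0;

	return reference_perf * delta_del / delta_ref;
}
```

With the assumed reference_perf of 10, the idle sample gives
10 * 8036 / 1990 = 40 and the stress-ng sample gives 10 * 14607 / 4210 = 34.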

The follow-on effect is that the inaccurate frequency is stashed in the
cpufreq policy struct when the CPU is brought online. Since CPUs are mostly
idle when they are brought online, this means cpufreq has an inaccurate
view of the programmed clock rate.

Consequently, if userspace then tries to set the frequency to that
erroneous rate (4 GHz in the above example), cpufreq returns early without
calling into the CPPC driver to send the relevant PCC command; it thinks
the CPU is already at that frequency.
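
The early return described above can be illustrated with a toy model. This
is a simplified sketch under stated assumptions, not the cpufreq core: the
struct, its fields, and toy_set_target() are hypothetical names invented for
illustration. The only behavior taken from this message is that a request
equal to the cached frequency never reaches the driver.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cached state in the cpufreq policy. */
struct toy_policy {
	unsigned int cur_khz;      /* frequency the core believes is programmed */
	unsigned int driver_calls; /* number of requests forwarded to the driver */
};

/* Returns true if the request was forwarded to the (hypothetical) driver. */
static bool toy_set_target(struct toy_policy *p, unsigned int target_khz)
{
	/* The core sees no change to make, so the driver is never called. */
	if (target_khz == p->cur_khz)
		return false;

	p->driver_calls++;
	p->cur_khz = target_khz;
	return true;
}
```

If cur_khz was seeded with the stale 4 GHz reading while the hardware is
really at 3.4 GHz, a request for 4 GHz returns false and the hardware stays
at 3.4 GHz.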

Update the CPPC get_rate() code to skip sampling the counters if we know a
CPU is idle, and go directly to the fallback response of returning the
“desired” frequency. The code already falls back to that value when the
counters happen to return an “idle” reading.

[1] https://docs.kernel.org/admin-guide/acpi/cppc_sysfs.html#computing-average-delivered-performance

Signed-off-by: Prashant Malani <pmalani@google.com>
---

Changes in v2:
- Add sched.h header for usage when compiled as module.

 drivers/cpufreq/cppc_cpufreq.c | 5 +++++
 1 file changed, 5 insertions(+)

Patch

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index b7c688a5659c..5ed04774e569 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -18,6 +18,7 @@ 
 #include <linux/cpufreq.h>
 #include <linux/irq_work.h>
 #include <linux/kthread.h>
+#include <linux/sched.h>
 #include <linux/time.h>
 #include <linux/vmalloc.h>
 #include <uapi/linux/sched/types.h>
@@ -753,6 +754,10 @@  static unsigned int cppc_cpufreq_get_rate(unsigned int cpu)
 
 	cpufreq_cpu_put(policy);
 
+	/* Idle CPUs have unreliable counters, so skip to the end. */
+	if (idle_cpu(cpu))
+		goto out_invalid_counters;
+
 	ret = cppc_get_perf_ctrs_sample(cpu, &fb_ctrs_t0, &fb_ctrs_t1);
 	if (ret) {
 		if (ret == -EFAULT)