Message ID | 20250522-userspace-governor-doc-v1-1-c8a038e39084@sony.com |
---|---|
State | New |
Series | cpufreq, docs: (userspace governor) add that actual freq is >= scaling_setspeed |
Hi Russell,

On Thu, May 22, 2025 at 06:15:24AM -0500, Russell Haley wrote:
> > The userspace governor requests a frequency between policy->min and
> > policy->max on behalf of user space. In intel_pstate this translates
> > to setting DESIRED_PERF to the requested value which is also the case
> > for the other governors.
>
> Huh. On this Skylake box with kernel 6.14.6, it seems to be setting
> Minimum_Performance, and leaving desired at 0.
>
> > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> userspace
> > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> 1400000
> > sudo x86_energy_perf_policy &| grep REQ
> cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0

Oh cool, I didn't know about x86_energy_perf_policy. Consider the following on a Raptor Lake machine:

1. HWP_REQUEST MSR set by intel_pstate in active mode:

    # echo active > intel_pstate/status
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
    # echo 2000000 > cpufreq/policy0/scaling_min_freq
    # echo 3000000 > cpufreq/policy0/scaling_max_freq
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 26 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

scaling_{min,max}_freq just affect the min and max frequencies set in HWP_REQUEST. desired_freq is left at 0.

2. HWP_REQUEST MSR set by intel_pstate in passive mode with the userspace governor:

    # echo passive > intel_pstate/status
    # echo userspace > cpufreq/policy0/scaling_governor
    # cat cpufreq/policy0/scaling_setspeed
    866151
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 11 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
    # echo 2000000 > cpufreq/policy0/scaling_setspeed
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 26 max 68 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

scaling_setspeed only changes the min frequency in HWP_REQUEST. In other words, software is explicitly allowing the hardware to choose higher frequencies.

3. Same as above, except with the strictuserspace governor, a custom kernel module that is identical to the userspace governor except that it has the CPUFREQ_GOV_STRICT_TARGET flag set:

    # echo strictuserspace > cpufreq/policy0/scaling_governor
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 26 max 26 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)
    # echo 3000000 > cpufreq/policy0/scaling_setspeed
    # x86_energy_perf_policy -c 0 2>&1 | grep REQ
    cpu0: HWP_REQ: min 39 max 39 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
    pkg0: HWP_REQ_PKG: min 1 max 255 des 0 epp 128 window 0x0 (0*10^0us)

With the strict flag set, intel_pstate honours it by setting the min and max frequencies to the same value. desired_perf is always 0 in the above cases. The strict flag check is done in intel_cpufreq_update_pstate(), which sets max_pstate to target_pstate if the policy has a strict target, and to cpu->max_perf_ratio otherwise.

As Russell and Rafael have noted, CPU frequency is subject to hardware coordination and optimizations. While I get that, shouldn't software try its best with whatever interface it has available? If a user sets the userspace governor, that's because they want manual control over CPU frequency, for whatever reason. The kernel should honor this by setting the min and max freq in HWP_REQUEST equal. The current behaviour explicitly lets the hardware choose higher frequencies.

Since Russell pointed out that "actual freq >= target freq" can be achieved by leaving intel_pstate active and setting scaling_{min,max}_freq instead (for some reason this slipped my mind), I now think the strict target flag should be added to the userspace governor, leaving the documentation as is. Maybe a warning like "you may want to set this exact frequency, but it's subject to hardware coordination, so beware" can be added.

Thanks

Regards,
Shashank
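Shashank's three Raptor Lake experiments can be condensed into a small model of which HWP_REQUEST limits each configuration produces. The sketch below is a simplified Python illustration, not intel_pstate's code: the `HwpLimits` type and both function names are hypothetical, it works directly in perf-level units (the kHz-to-perf conversion and any clamping to the policy limits are left out), and every number is taken from the transcripts above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HwpLimits:
    """Perf levels as x86_energy_perf_policy prints them for HWP_REQ."""
    min_perf: int
    max_perf: int
    desired_perf: int = 0  # 0 in every transcript above


def active_mode_limits(scaling_min_perf: int, scaling_max_perf: int) -> HwpLimits:
    # Active mode (hypothetical helper): scaling_{min,max}_freq map
    # directly to the HWP minimum and maximum.
    return HwpLimits(min_perf=scaling_min_perf, max_perf=scaling_max_perf)


def passive_mode_limits(target_perf: int, max_perf_ratio: int,
                        strict_target: bool) -> HwpLimits:
    # Passive mode (hypothetical helper), following the description of
    # intel_cpufreq_update_pstate(): the target becomes the HWP minimum,
    # and the maximum equals the target only when the governor sets
    # CPUFREQ_GOV_STRICT_TARGET; otherwise it is the CPU's max perf ratio.
    max_pstate = target_perf if strict_target else max_perf_ratio
    return HwpLimits(min_perf=target_perf, max_perf=max_pstate)


# Case 1: active mode, scaling_min_freq=2000000, scaling_max_freq=3000000.
print(active_mode_limits(26, 39))
# HwpLimits(min_perf=26, max_perf=39, desired_perf=0)

# Case 2: userspace governor, scaling_setspeed=2000000.
print(passive_mode_limits(26, 68, strict_target=False))
# HwpLimits(min_perf=26, max_perf=68, desired_perf=0)

# Case 3: strictuserspace governor, scaling_setspeed=3000000.
print(passive_mode_limits(39, 68, strict_target=True))
# HwpLimits(min_perf=39, max_perf=39, desired_perf=0)
```

Only case 3 collapses the window to a single perf level, which is the behaviour the message argues the plain userspace governor should have.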
On Thu, May 22, 2025 at 1:54 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, May 22, 2025 at 1:15 PM Russell Haley <yumpusamongus@gmail.com> wrote:
> >
> > On 5/22/25 4:47 AM, Rafael J. Wysocki wrote:
> > > On Thu, May 22, 2025 at 10:51 AM Russell Haley <yumpusamongus@gmail.com> wrote:
> > >>
> > >> On 5/22/25 3:05 AM, Shashank Balaji wrote:
> > >>> The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
> > >>> means the requested frequency may not strictly be followed. This is true in the
> > >>> case of the intel_pstate driver with HWP enabled. When programming the
> > >>> HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
> > >>> is set to the policy's max. So, the hardware is free to increase the frequency
> > >>> beyond the requested frequency.
> > >>>
> > >>> This behaviour can be slightly surprising, given the current wording "allows
> > >>> userspace to set the CPU frequency". Hence, document this.
> > >>>
> > >>
> > >> In my opinion, the documentation is correct, and it is the
> > >> implementation in intel_pstate that is wrong. If the user wanted two
> > >> separate knobs that control the minimum and maximum frequencies, they
> > >> could leave intel_pstate in "active" mode and change scaling_min_freq
> > >> and scaling_max_freq.
> > >>
> > >> If the user asks for the frequency to be set from userspace, the
> > >> frequency had damn well better be set from userspace.
> > >
> > > The userspace governor requests a frequency between policy->min and
> > > policy->max on behalf of user space. In intel_pstate this translates
> > > to setting DESIRED_PERF to the requested value which is also the case
> > > for the other governors.
> >
> > Huh. On this Skylake box with kernel 6.14.6, it seems to be setting
> > Minimum_Performance, and leaving desired at 0.
> >
> > > echo userspace | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
> > userspace
> > > echo 1400000 | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_setspeed
> > 1400000
> > > sudo x86_energy_perf_policy &| grep REQ
> > cpu0: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu1: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu2: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu3: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu4: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu5: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu6: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
> > cpu7: HWP_REQ: min 14 max 40 des 0 epp 128 window 0x0 (0*10^0us) use_pkg 0
>
> OK, let me double check the code.

I stand corrected, HWP_MIN_PERF is indeed set in accordance with the target frequency, not HWP_DESIRED_PERF.

The reason is that running at a frequency below the target might cause insufficient performance to be delivered, which would break the assumptions of the schedutil governor.

However, setting HWP_DESIRED_PERF to 0 may be a mistake because it may cause the CPU to always run above the target frequency, which is not desirable from the power perspective. What can be done is to set HWP_MIN_PERF and HWP_DESIRED_PERF to the same value.

[Note that intel_cpufreq_adjust_perf() used by the schedutil governor actually sets HWP_DESIRED_PERF in accordance with the target frequency, but it also sets HWP_MIN_PERF to the minimum sufficient perf value supplied by schedutil. Since intel_cpufreq_fast_switch() and intel_cpufreq_target() only get one target frequency, they cannot really say if any frequency below the target will be sufficient.]
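Rafael's suggestion amounts to changing one byte of the HWP_REQUEST value. The Python sketch below packs and unpacks the low 32 bits of the request using the IA32_HWP_REQUEST field layout from the Intel SDM (minimum performance in bits 7:0, maximum in 15:8, desired in 23:16, EPP in 31:24); the activity-window and package-control bits are ignored, and the helper names are hypothetical. It only compares the request observed today with one where HWP_DESIRED_PERF equals HWP_MIN_PERF; it does not read or write any real MSR.

```python
def encode_hwp_request(min_perf: int, max_perf: int,
                       desired_perf: int = 0, epp: int = 128) -> int:
    """Pack the four 8-bit HWP_REQUEST fields (bits 31:0) into one integer."""
    fields = (min_perf, max_perf, desired_perf, epp)
    if any(not 0 <= f <= 0xFF for f in fields):
        raise ValueError("each HWP_REQUEST field is 8 bits wide")
    return min_perf | (max_perf << 8) | (desired_perf << 16) | (epp << 24)


def decode_hwp_request(value: int) -> dict:
    """Unpack the fields in the order x86_energy_perf_policy prints them."""
    return {
        "min": value & 0xFF,
        "max": (value >> 8) & 0xFF,
        "des": (value >> 16) & 0xFF,
        "epp": (value >> 24) & 0xFF,
    }


# Observed today (passive mode, userspace governor, target perf 26): des is 0.
current = encode_hwp_request(min_perf=26, max_perf=68)
# Suggested in the reply: HWP_DESIRED_PERF equal to HWP_MIN_PERF.
proposed = encode_hwp_request(min_perf=26, max_perf=68, desired_perf=26)

print(f"{current:#010x}", decode_hwp_request(current))
# 0x8000441a {'min': 26, 'max': 68, 'des': 0, 'epp': 128}
print(f"{proposed:#010x}", decode_hwp_request(proposed))
# 0x801a441a {'min': 26, 'max': 68, 'des': 26, 'epp': 128}
```

The maximum is left untouched in both requests, so the hardware can still go above the target; the suggestion only gives it a hint to settle at the target instead of treating desired as "no preference".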
diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
index 3950583f2b1549b27f568632547e22e9ef8bc167..066fe74f856699c8dd6aaf5e135162ce70686333 100644
--- a/Documentation/admin-guide/pm/cpufreq.rst
+++ b/Documentation/admin-guide/pm/cpufreq.rst
@@ -397,8 +397,15 @@ policy limits change after that.
 -------------
 
 This governor does not do anything by itself. Instead, it allows user space
-to set the CPU frequency for the policy it is attached to by writing to the
-``scaling_setspeed`` attribute of that policy.
+to set a target CPU frequency for the policy it is attached to by writing to the
+``scaling_setspeed`` attribute of that policy. The actual frequency will be
+greater than or equal to ``scaling_setspeed``, depending on the cpufreq driver.
+For example, if hardware-managed P-states are enabled, then the ``intel_pstate``
+driver will set the minimum frequency to the value of ``scaling_setspeed`` and
+the maximum frequency to the value of ``scaling_max_freq``. The hardware is
+free to select any frequency between those two values. If this behavior is not
+desired, then ``scaling_max_freq`` should be set to the same value as
+``scaling_setspeed``.
 
 ``schedutil``
 -------------
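The last sentence of the new text is a how-to: cap the upper limit at the requested speed. A minimal Python sketch of that advice follows, assuming a policy directory such as /sys/devices/system/cpu/cpufreq/policy0, root privileges, and a requested frequency that is not below scaling_min_freq; the function name `pin_userspace_frequency` is hypothetical. It writes scaling_max_freq before scaling_setspeed, so the value written to scaling_setspeed never exceeds the maximum in force at that moment. As noted in the thread, the hardware may still coordinate frequencies, so this narrows the window rather than guaranteeing an exact frequency.

```python
from pathlib import Path


def pin_userspace_frequency(policy_dir: Path, freq_khz: int) -> None:
    """Request freq_khz via the userspace governor and cap scaling_max_freq at it.

    policy_dir is a cpufreq policy directory, for example
    Path("/sys/devices/system/cpu/cpufreq/policy0"); writing to it on a
    real system requires root.
    """
    governor = (policy_dir / "scaling_governor").read_text().strip()
    if governor != "userspace":
        raise RuntimeError(f"{policy_dir}: governor is {governor!r}, not 'userspace'")

    value = f"{freq_khz}\n"
    # Lower (or raise) the upper limit first, then request the same value,
    # so the requested speed is never above the current maximum.
    (policy_dir / "scaling_max_freq").write_text(value)
    (policy_dir / "scaling_setspeed").write_text(value)
```

For example, `pin_userspace_frequency(Path("/sys/devices/system/cpu/cpufreq/policy0"), 2000000)` would correspond to the `echo 2000000 > cpufreq/policy0/scaling_setspeed` step in the Raptor Lake transcript, plus a matching scaling_max_freq write.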
The userspace governor does not have the CPUFREQ_GOV_STRICT_TARGET flag, which
means the requested frequency may not strictly be followed. This is true in the
case of the intel_pstate driver with HWP enabled. When programming the
HWP_REQUEST MSR, the min_perf is set to `scaling_setspeed`, and the max_perf
is set to the policy's max. So, the hardware is free to increase the frequency
beyond the requested frequency.

This behaviour can be slightly surprising, given the current wording "allows
userspace to set the CPU frequency". Hence, document this.

Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com>
---
 Documentation/admin-guide/pm/cpufreq.rst | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

---
base-commit: d608703fcdd9e9538f6c7a0fcf98bf79b1375b60
change-id: 20250522-userspace-governor-doc-86380dbab3d5

Best regards,