
[v5,0/5] Rework system pressure interface to the scheduler

Message ID 20240220145947.1107937-1-vincent.guittot@linaro.org

Message

Vincent Guittot Feb. 20, 2024, 2:59 p.m. UTC
Following the consolidation and cleanup of CPU capacity in [1], this series
reworks how the scheduler gets the pressures on CPUs. We need to take into
account all pressures applied by cpufreq on the compute capacity of a CPU
for dozens of ms or more, and not only those from the cpufreq cooling device
or HW mitigations. We split the pressure applied on a CPU's capacity into
two parts:
- one from cpufreq and freq_qos
- one from HW high-frequency mitigation.

The next step will be to add a dedicated interface for long-standing
capping of the CPU capacity (i.e. for seconds or more), like the
scaling_max_freq attribute in cpufreq sysfs. The latter is already taken
into account by this series, but as a temporary pressure, which is not
always the best choice when we know that it will last for seconds or more.

[1] https://lore.kernel.org/lkml/20231211104855.558096-1-vincent.guittot@linaro.org/

Changes since v4:
- Add READ_ONCE() in cpufreq_get_pressure()
- Add ack and reviewed tags

Changes since v3:
- Fix uninitialized variables in cpufreq_update_pressure()

Changes since v2:
- Rework cpufreq_update_pressure()

Changes since v1:
- Use struct cpufreq_policy as parameter of cpufreq_update_pressure()
- Fix typos and comments
- Mark the sched_thermal_decay_shift boot param as deprecated

Vincent Guittot (5):
  cpufreq: Add a cpufreq pressure feedback for the scheduler
  sched: Take cpufreq feedback into account
  thermal/cpufreq: Remove arch_update_thermal_pressure()
  sched: Rename arch_update_thermal_pressure into
    arch_update_hw_pressure
  sched/pelt: Remove shift of thermal clock

 .../admin-guide/kernel-parameters.txt         |  1 +
 arch/arm/include/asm/topology.h               |  6 +-
 arch/arm64/include/asm/topology.h             |  6 +-
 drivers/base/arch_topology.c                  | 26 ++++----
 drivers/cpufreq/cpufreq.c                     | 36 +++++++++++
 drivers/cpufreq/qcom-cpufreq-hw.c             |  4 +-
 drivers/thermal/cpufreq_cooling.c             |  3 -
 include/linux/arch_topology.h                 |  8 +--
 include/linux/cpufreq.h                       | 10 +++
 include/linux/sched/topology.h                |  8 +--
 .../{thermal_pressure.h => hw_pressure.h}     | 14 ++---
 include/trace/events/sched.h                  |  2 +-
 init/Kconfig                                  | 12 ++--
 kernel/sched/core.c                           |  8 +--
 kernel/sched/fair.c                           | 63 +++++++++----------
 kernel/sched/pelt.c                           | 18 +++---
 kernel/sched/pelt.h                           | 16 ++---
 kernel/sched/sched.h                          | 22 +------
 18 files changed, 144 insertions(+), 119 deletions(-)
 rename include/trace/events/{thermal_pressure.h => hw_pressure.h} (55%)

Comments

Lukasz Luba Feb. 21, 2024, 1:28 p.m. UTC
Hi Vincent,

On 2/20/24 14:59, Vincent Guittot wrote:
> [...]


The code looks good and works as expected. The time delays in the old
mechanisms, which were important to me, are good now. Boost is handled,
and so is cpufreq capping from sysfs: all good. The last patch, which
removes the shift and makes it obsolete, is fine too. Thanks!

Feel free to add to all patches:

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz