diff mbox series

cpufreq: Rebuild sched-domains when removing cpufreq driver

Message ID 20230918112937.493352-1-pierre.gondois@arm.com
State New
Headers show
Series cpufreq: Rebuild sched-domains when removing cpufreq driver | expand

Commit Message

Pierre Gondois Sept. 18, 2023, 11:29 a.m. UTC
The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().

Having a cpufreq governor assumes having a cpufreq driver running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.

Add a sched_cpufreq_governor_change() call in cpufreq driver removal
path.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
---
 drivers/cpufreq/cpufreq.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Shrikanth Hegde Sept. 20, 2023, 5:29 a.m. UTC | #1
On 9/19/23 1:19 PM, Pierre Gondois wrote:
> 
> 
> On 9/19/23 01:03, Dietmar Eggemann wrote:
>> On 18/09/2023 13:29, Pierre Gondois wrote:
>>> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
>>> When moving to/from the schedutil governor, sched domains must be
>>> rebuilt to allow re-evaluating the enablement conditions of EAS.
>>> This is done through sched_cpufreq_governor_change().
>>>

Hi Pierre. It looks correct to update when removing the 
cpufreq governer. 


>>> Having a cpufreq governor assumes having a cpufreq driver running.
>>> Inserting/removing a cpufreq driver should trigger a re-evaluation
>>> of EAS enablement conditions, avoiding to see EAS enabled when
>>> removing a running cpufreq driver.
>>>
>>> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
>>> path.
>>
>> Rebuilding SDs when inserting the driver is already covered by
>>
>>    cpufreq_online()
>>      cpufreq_set_policy()
>>        sched_cpufreq_governor_change()
>>          if (old or new gov eq. schedutil)
>>            schedule_work(&rebuild_sd_work)
>>
>> So what's missing is only a sched_cpufreq_governor_change() call when
>> removing the driver, right?
> 
> Yes exact, removing a cpufreq driver (e.g. `rmmod cppc_cpufreq.ko`) goes
> through:
> cpufreq_remove_dev()
> \-__cpufreq_offline()
> 
> so the path you mentioned is not used in this case.
> 

One Doubt, while looking through code. Not well versed with this area. 

cpuhp_cpufreq_offline is being registered with CPU hotplug. That ends up 
calling cpufreq_offline. This may cause non desired issues.
1. rebuild of sched domains twice instead, once by CPU hotplug and once by this.
2. offline/online of CPU (non-SMT) may not disabling EAS. 




> Regards,
> Pierre
Dietmar Eggemann Sept. 21, 2023, 10:04 p.m. UTC | #2
On 20/09/2023 07:29, Shrikanth Hegde wrote:
> 
> On 9/19/23 1:19 PM, Pierre Gondois wrote:
>>
>> On 9/19/23 01:03, Dietmar Eggemann wrote:
>>> On 18/09/2023 13:29, Pierre Gondois wrote:

[...]

>>> Rebuilding SDs when inserting the driver is already covered by
>>>
>>>    cpufreq_online()
>>>      cpufreq_set_policy()
>>>        sched_cpufreq_governor_change()
>>>          if (old or new gov eq. schedutil)
>>>            schedule_work(&rebuild_sd_work)
>>>
>>> So what's missing is only a sched_cpufreq_governor_change() call when
>>> removing the driver, right?
>>
>> Yes exact, removing a cpufreq driver (e.g. `rmmod cppc_cpufreq.ko`) goes
>> through:
>> cpufreq_remove_dev()
>> \-__cpufreq_offline()
>>
>> so the path you mentioned is not used in this case.

But IMHO the name of sched_cpufreq_governor_change() is now misleading
when called from the driver removal path. It's not the governor which
has changed. We want to disable EAS when unloading the driver in (3):

 ...
 if (!arch_scale_freq_invariant()) {
  if (sched_debug()) {
   pr_warn("rd %*pbl: Disabling EAS: frequency-invariant load tracking
            not yet supported",
   cpumask_pr_args(cpu_map));
  }
  goto free;
 }
 ...

> One Doubt, while looking through code. Not well versed with this area. 
> 
> cpuhp_cpufreq_offline is being registered with CPU hotplug. That ends up 
> calling cpufreq_offline. This may cause non desired issues.
> 1. rebuild of sched domains twice instead, once by CPU hotplug and once by this.

That's possible when you CPU hp out the last CPU of a policy (or Perf
Domain (PD).

Otherwise `cpuhp_cpufreq_offline -> cpufreq_offline() ->
__cpufreq_offline()` returns early in `if (!policy_is_inactive(policy))`
condition.

partition_sched_domains_locked() does:

 (1) detach_destroy_domains()
 (2) build_sched_domains()
 (3) build_perf_domains()

But only the first workqueue event (from cpuset_hotplug_workfn or
rebuild_sd_workfn) will rebuild Sched Domains and PDs (1)-(3), the
second one will only build PDs (3) again.
That's not nice but AFAICS it is not functional incorrect.

> 2. offline/online of CPU (non-SMT) may not disabling EAS.

Can't see this issue right now. When we offline the last CPU of a PD on
a 2 PD system, EAS should be stopped since the root domain does not have
any asymmetric CPU capacities left and when we online it again, EAS
should be started (sched_energy_set()).
Pierre Gondois Sept. 29, 2023, 7:22 a.m. UTC | #3
On 9/29/23 05:27, Viresh Kumar wrote:
> On 28-09-23, 14:49, Pierre Gondois wrote:
>> Another solution would be to call sched_cpufreq_governor_change()
>> from cpufreq_schedutil's init()/exit() callbacks. This would make
>> more sense as EAS/schedutil cpufreq are tightly bound, and it would
>> allow to cover all the possible paths.
>>
>> When tried locally, it seems to cover all scenarios:
>> - insmod/rmmod a cpufreq driver
>> - changing the governor policy
>> - offlining all the CPUs of a pd
> 
> Right now it is done for all governors. We don't need that ? We just care about
> schedutil here ?
> 

Yes exact, EAS requires the schedutil governor to be used by all policies. The purpose
of this function is to re-evaluate if EAS enablement conditions are met. I.e., if one
policy changed from/to schedutil, if EAS must be [de|re]activated.
diff mbox series

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 60ed89000e82..0a4979c34fd1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1673,6 +1673,8 @@  static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
 		cpufreq_driver->exit(policy);
 		policy->freq_table = NULL;
 	}
+
+	sched_cpufreq_governor_change(policy, policy->governor);
 }
 
 static int cpufreq_offline(unsigned int cpu)