[V2,4/5] cpufreq: Register notifiers with the PM QoS framework

Message ID e6969d79643c87af24895d508e5c6f7462b1c758.1550748118.git.viresh.kumar@linaro.org
State New
Series
  • Untitled series #18782

Commit Message

Viresh Kumar Feb. 21, 2019, 11:29 a.m.
This registers the notifiers for min/max frequency constraints with the
PM QoS framework. The constraints are also taken into consideration in
cpufreq_set_policy().

This also relocates cpufreq_policy_put_kobj() as it is required to be
called from cpufreq_policy_alloc() now.

No constraints are added yet, though.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 drivers/cpufreq/cpufreq.c | 135 +++++++++++++++++++++++++++++++-------
 include/linux/cpufreq.h   |   4 ++
 2 files changed, 116 insertions(+), 23 deletions(-)

-- 
2.21.0.rc0.269.g1a574e7a288b
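
For readers who want to see the callback wiring in isolation, here is a minimal userspace sketch of the pattern the commit message describes: two notifier blocks embedded in the policy, each recovering its owning policy with container_of() and then requesting a deferred re-evaluation. The struct layouts, policy_queue_update() and the pending_updates counter are hypothetical stand-ins (the patch uses schedule_work() on policy->req_work); this is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified, hypothetical notifier block: only the callback pointer. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb, unsigned long freq,
			     void *data);
};

/* Trimmed-down, hypothetical policy with two embedded notifier blocks. */
struct policy {
	unsigned int cpu;
	int pending_updates;	/* stands in for schedule_work() */
	struct notifier_block nb_min;
	struct notifier_block nb_max;
};

/* Both callbacks funnel into one "re-evaluate the policy" request. */
static int policy_queue_update(struct policy *p)
{
	p->pending_updates++;
	return 0;
}

static int notifier_min(struct notifier_block *nb, unsigned long freq,
			void *data)
{
	/* Recover the policy that embeds this particular block. */
	struct policy *p = container_of(nb, struct policy, nb_min);

	(void)freq;
	(void)data;
	return policy_queue_update(p);
}

static int notifier_max(struct notifier_block *nb, unsigned long freq,
			void *data)
{
	struct policy *p = container_of(nb, struct policy, nb_max);

	(void)freq;
	(void)data;
	return policy_queue_update(p);
}

static void policy_init(struct policy *p, unsigned int cpu)
{
	assert(p != NULL);
	p->cpu = cpu;
	p->pending_updates = 0;
	p->nb_min.notifier_call = notifier_min;
	p->nb_max.notifier_call = notifier_max;
}
```

Note that the callbacks ignore the frequency they are handed: the new limits are read back from the QoS layer later, when the deferred update runs.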

Comments

Qais Yousef Feb. 22, 2019, 11:44 a.m. | #1
Hi Viresh

On 02/21/19 16:59, Viresh Kumar wrote:

[...]

> @@ -2239,6 +2314,8 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
>  			      struct cpufreq_policy *new_policy)
>  {
>  	struct cpufreq_governor *old_gov;
> +	struct device *cpu_dev = get_cpu_device(policy->cpu);
> +	unsigned long min, max;
>  	int ret;
>  
>  	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
> @@ -2253,11 +2330,23 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
>  	if (new_policy->min > new_policy->max)
>  		return -EINVAL;
>  
> +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> +
> +	if (min > new_policy->min)
> +		new_policy->min = min;
> +	if (max < new_policy->max)
> +		new_policy->max = max;
> +

Assuming for example min and max range from 1-10, and thermal throttles max to
5 using pm_qos to deal with temporary thermal pressure. But shortly after
a driver thinks that max shouldn't be greater than 7 for one reason or another.

What will happen after thermal pressure removes its constraint? Will we still
remember the driver's request and apply it so max is set to 7 instead of 10?

Thanks

--
Qais Yousef
Viresh Kumar Feb. 25, 2019, 4:31 a.m. | #2
On 22-02-19, 11:44, Qais Yousef wrote:
> Hi Viresh
> 
> On 02/21/19 16:59, Viresh Kumar wrote:
> 
> [...]
> 
> > @@ -2239,6 +2314,8 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
> >  			      struct cpufreq_policy *new_policy)
> >  {
> >  	struct cpufreq_governor *old_gov;
> > +	struct device *cpu_dev = get_cpu_device(policy->cpu);
> > +	unsigned long min, max;
> >  	int ret;
> >  
> >  	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
> > @@ -2253,11 +2330,23 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
> >  	if (new_policy->min > new_policy->max)
> >  		return -EINVAL;
> >  
> > +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > +
> > +	if (min > new_policy->min)
> > +		new_policy->min = min;
> > +	if (max < new_policy->max)
> > +		new_policy->max = max;
> > +
> 
> Assuming for example min and max range from 1-10, and thermal throttles max to
> 5 using pm_qos to deal with temporary thermal pressure. But shortly after
> a driver thinks that max shouldn't be greater than 7 for one reason or another.
> 
> What will happen after thermal pressure removes its constraint? Will we still
> remember the driver's request and apply it so max is set to 7 instead of 10?

Once everything comes via PM QoS, it will remember all the presently available
requests and choose a target min/max frequency based on that.

But even with this patchset, with half of the stuff done with PM QoS and half done with
cpufreq notifiers, it should still work that way.
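
To make the "remembers all requests" behaviour concrete, here is a small userspace model of the scenario from the question above (range 1-10, thermal asks for max 5, a driver asks for max 7). It is only a sketch under the assumption that the effective max is the smallest of all currently active max requests; the struct and function names are hypothetical and this is not the code in kernel/power/qos.c.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REQUESTS 8

/* Hypothetical per-device list of max-frequency requests. */
struct max_freq_constraints {
	unsigned int value[MAX_REQUESTS];
	int in_use[MAX_REQUESTS];
	unsigned int no_limit;	/* reported when no request is active */
};

static void constraints_init(struct max_freq_constraints *c,
			     unsigned int no_limit)
{
	assert(c != NULL);
	for (int i = 0; i < MAX_REQUESTS; i++)
		c->in_use[i] = 0;
	c->no_limit = no_limit;
}

/* Store a request; returns a slot handle, or -1 if the list is full. */
static int add_max_request(struct max_freq_constraints *c, unsigned int max)
{
	for (int i = 0; i < MAX_REQUESTS; i++) {
		if (!c->in_use[i]) {
			c->in_use[i] = 1;
			c->value[i] = max;
			return i;
		}
	}
	return -1;
}

/* Drop one request; every other request stays remembered. */
static void remove_max_request(struct max_freq_constraints *c, int slot)
{
	if (slot >= 0 && slot < MAX_REQUESTS)
		c->in_use[slot] = 0;
}

/* Effective max: the smallest active request, else the default. */
static unsigned int effective_max(const struct max_freq_constraints *c)
{
	unsigned int result = c->no_limit;

	for (int i = 0; i < MAX_REQUESTS; i++) {
		if (c->in_use[i] && c->value[i] < result)
			result = c->value[i];
	}
	return result;
}
```

In this model, adding 5 (thermal) and then 7 (driver) gives an effective max of 5; once the thermal request is removed the result becomes 7 rather than 10, because the driver's request is still on the list.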

-- 
viresh
Qais Yousef Feb. 25, 2019, 8:58 a.m. | #3
On 02/25/19 10:01, Viresh Kumar wrote:
> On 22-02-19, 11:44, Qais Yousef wrote:
> > Hi Viresh
> > 
> > On 02/21/19 16:59, Viresh Kumar wrote:
> > 
> > [...]
> > 
> > > @@ -2239,6 +2314,8 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
> > >  			      struct cpufreq_policy *new_policy)
> > >  {
> > >  	struct cpufreq_governor *old_gov;
> > > +	struct device *cpu_dev = get_cpu_device(policy->cpu);
> > > +	unsigned long min, max;
> > >  	int ret;
> > >  
> > >  	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
> > > @@ -2253,11 +2330,23 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy,
> > >  	if (new_policy->min > new_policy->max)
> > >  		return -EINVAL;
> > >  
> > > +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > > +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > > +
> > > +	if (min > new_policy->min)
> > > +		new_policy->min = min;
> > > +	if (max < new_policy->max)
> > > +		new_policy->max = max;
> > > +
> > 
> > Assuming for example min and max range from 1-10, and thermal throttles max to
> > 5 using pm_qos to deal with temporary thermal pressure. But shortly after
> > a driver thinks that max shouldn't be greater than 7 for one reason or another.
> > 
> > What will happen after thermal pressure removes its constraint? Will we still
> > remember the driver's request and apply it so max is set to 7 instead of 10?
> 
> Once everything comes via PM QoS, it will remember all the presently available
> requests and choose a target min/max frequency based on that.

OK, I can see the logic in kernel/power/qos.c now. Sorry I missed it and it
was easier to ask :-)

> 
> But even with this patchset, with half of the stuff done with PM QoS and half done with
> cpufreq notifiers, it should still work that way.

And this is why we need to check here if the PM QoS value doesn't conflict with
the current min/max, right? Until the current notifier code is removed they
could trip over each other.

It would be nice to add a comment here about PM QoS managing and remembering
values and that we need to be careful that both mechanisms don't trip over
each other until this transient period is over.

I have a nit too. It would be nice to explicitly state this is
CPU_{MIN,MAX}_FREQUENCY. I can see someone else adding {MIN,MAX}_FREQUENCY for
something else (memory maybe?)

I looked at the previous series only briefly, but this one looks more
compact and easier to follow, so +1 for that.

--
Qais Yousef
Viresh Kumar Feb. 25, 2019, 9:09 a.m. | #4
On 25-02-19, 08:58, Qais Yousef wrote:
> On 02/25/19 10:01, Viresh Kumar wrote:
> > > > +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > > > +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > > > +
> > > > +	if (min > new_policy->min)
> > > > +		new_policy->min = min;
> > > > +	if (max < new_policy->max)
> > > > +		new_policy->max = max;

> And this is why we need to check here if the PM QoS value doesn't conflict with
> the current min/max, right? Until the current notifier code is removed they
> could trip over each other.

No. The above if/else block is already removed as part of patch 5/5. It was
required because of conflict between userspace specific min/max and qos min/max,
which are migrated to use qos by patch 5/5.

The cpufreq notifier mechanism already lets users play with min/max and that is
already safe from conflicts.
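
For reference, the effect of the block under discussion (as it stands in this patch, before 5/5 removes it) can be sketched in plain C as below. The names freq_range, apply_qos_limits() and unconstrained_request() are made up for this illustration, and the sketch assumes the QoS values fit in an unsigned int; only the ordering of the checks follows the hunk quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the min/max fields of a policy. */
struct freq_range {
	unsigned int min;
	unsigned int max;
};

/*
 * Narrow a requested range by the aggregated QoS limits, mirroring the
 * order used in the quoted hunk: reject an inverted request first, then
 * raise min and lower max as needed.
 * Assumption: qos_min and qos_max fit in an unsigned int.
 */
static int apply_qos_limits(struct freq_range *req, unsigned long qos_min,
			    unsigned long qos_max)
{
	assert(req != NULL);

	if (req->min > req->max)
		return -EINVAL;

	if (qos_min > req->min)
		req->min = (unsigned int)qos_min;
	if (qos_max < req->max)
		req->max = (unsigned int)qos_max;

	return 0;
}

/* The request the work handler in the patch starts from: no own limits. */
static struct freq_range unconstrained_request(void)
{
	struct freq_range r = { 0, UINT_MAX };

	return r;
}
```

Starting from the unconstrained request, QoS limits of 2 and 5 yield exactly 2-5, while a narrower request such as 3-4 passes through unchanged.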


> It would be nice to add a comment here about PM QoS managing and remembering
> values

I am not sure if that would add any value. Some documentation update may be
useful for people looking for details though, that I shall do after all the
changes get in and things become a bit stable.

> and that we need to be careful that both mechanisms don't trip over
> each other until this transient period is over.

The second mechanism will die very very soon once this is merged, migrating them
shouldn't be a big challenge AFAICT. I didn't attempt that because I didn't
want to waste time updating things in case this version also doesn't make
sense to others.

> I have a nit too. It would be nice to explicitly state this is
> CPU_{MIN,MAX}_FREQUENCY. I can see someone else adding {MIN,MAX}_FREQUENCY for
> something else (memory maybe?)

This is not CPU specific, but any device. The same interface shall be used by
devfreq as well, who wanted to use freq-constraints initially.

> I looked at the previous series only briefly, but this one looks more
> compact and easier to follow, so +1 for that.

Thanks for looking into this Qais.

-- 
viresh
Qais Yousef Feb. 25, 2019, 12:14 p.m. | #5
On 02/25/19 14:39, Viresh Kumar wrote:
> On 25-02-19, 08:58, Qais Yousef wrote:
> > On 02/25/19 10:01, Viresh Kumar wrote:
> > > > > +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > > > > +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > > > > +
> > > > > +	if (min > new_policy->min)
> > > > > +		new_policy->min = min;
> > > > > +	if (max < new_policy->max)
> > > > > +		new_policy->max = max;
> 
> > And this is why we need to check here if the PM QoS value doesn't conflict with
> > the current min/max, right? Until the current notifier code is removed they
> > could trip over each other.
> 
> No. The above if/else block is already removed as part of patch 5/5. It was
> required because of conflict between userspace specific min/max and qos min/max,
> which are migrated to use qos by patch 5/5.
> 
> The cpufreq notifier mechanism already lets users play with min/max and that is
> already safe from conflicts.
> 
> 
> > It would be nice to add a comment here about PM QoS managing and remembering
> > values
> 
> I am not sure if that would add any value. Some documentation update may be
> useful for people looking for details though, that I shall do after all the
> changes get in and things become a bit stable.
> 

Up to you. But not everyone is familiar with the code and a one line comment
that points to where aggregation is happening would be helpful for someone
scanning this code IMHO.

> > and that we need to be careful that both mechanisms don't trip over
> > each other until this transient period is over.
> 
> The second mechanism will die very very soon once this is merged, migrating them
> shouldn't be a big challenge AFAICT. I didn't attempt that because I didn't
> want to waste time updating things in case this version also doesn't make
> sense to others.
> 
> > I have a nit too. It would be nice to explicitly state this is
> > CPU_{MIN,MAX}_FREQUENCY. I can see someone else adding {MIN,MAX}_FREQUENCY for
> > something else (memory maybe?)
> 
> This is not CPU specific, but any device. The same interface shall be used by
> devfreq as well, who wanted to use freq-constraints initially.
> 

I don't get that to be honest. I probably have to read more.

Is what you're saying that when applying a MIN_FREQUENCY constraint the same
value will be applied to both cpufreq and devfreq? Isn't this too coarse?

> > I looked at the previous series only briefly, but this one looks more
> > compact and easier to follow, so +1 for that.
> 
> Thanks for looking into this Qais.
> 
> -- 
> viresh

Thanks

--
Qais Yousef
Qais Yousef Feb. 26, 2019, 10 a.m. | #6
On 02/26/19 08:00, Viresh Kumar wrote:
> On 25-02-19, 12:14, Qais Yousef wrote:
> > On 02/25/19 14:39, Viresh Kumar wrote:
> > > On 25-02-19, 08:58, Qais Yousef wrote:
> > > > On 02/25/19 10:01, Viresh Kumar wrote:
> > > > > > > +	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
> > > > > > > +	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
> > > > > > > +
> > > > > > > +	if (min > new_policy->min)
> > > > > > > +		new_policy->min = min;
> > > > > > > +	if (max < new_policy->max)
> > > > > > > +		new_policy->max = max;
> > > 
> > > > And this is why we need to check here if the PM QoS value doesn't conflict with
> > > > the current min/max, right? Until the current notifier code is removed they
> > > > could trip over each other.
> > > 
> > > No. The above if/else block is already removed as part of patch 5/5. It was
> > > required because of conflict between userspace specific min/max and qos min/max,
> > > which are migrated to use qos by patch 5/5.
> > > 
> > > The cpufreq notifier mechanism already lets users play with min/max and that is
> > > already safe from conflicts.
> > > 
> > > 
> > > > It would be nice to add a comment here about PM QoS managing and remembering
> > > > values
> > > 
> > > I am not sure if that would add any value. Some documentation update may be
> > > useful for people looking for details though, that I shall do after all the
> > > changes get in and things become a bit stable.
> > > 
> > 
> > Up to you. But not everyone is familiar with the code and a one line comment
> > that points to where aggregation is happening would be helpful for someone
> > scanning this code IMHO.
> 
> Okay, will add something then.
> 
> > > > and that we need to be careful that both mechanisms don't trip over
> > > > each other until this transient period is over.
> > > 
> > > The second mechanism will die very very soon once this is merged, migrating them
> > > shouldn't be a big challenge AFAICT. I didn't attempt that because I didn't
> > > want to waste time updating things in case this version also doesn't make
> > > sense to others.
> > > 
> > > > I have a nit too. It would be nice to explicitly state this is
> > > > CPU_{MIN,MAX}_FREQUENCY. I can see someone else adding {MIN,MAX}_FREQUENCY for
> > > > something else (memory maybe?)
> > > 
> > > This is not CPU specific, but any device. The same interface shall be used by
> > > devfreq as well, who wanted to use freq-constraints initially.
> > > 
> > 
> > I don't get that to be honest. I probably have to read more.
> > 
> > Is what you're saying that when applying a MIN_FREQUENCY constraint the same
> > value will be applied to both cpufreq and devfreq? Isn't this too coarse?
> 
> Oh no. A constraint with QoS is added like this:
> 
>         dev_pm_qos_add_request(dev, req, DEV_PM_QOS_MIN_FREQUENCY, min);
> 
> Now dev here can be any device struct, CPU's or GPU's or anything else. All the
> MIN freq requests are stored/processed per device and for a CPU in cpufreq all
> we will see is MIN requests for the CPUs. And so the macro is required to be a
> bit generic and shouldn't have CPU word within it.
> 
> Hope I was able to clarify your doubt a bit. Thanks.
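
For anyone else who stumbled over the same point, the per-device nature of the requests can be pictured with a small userspace model. It is only a sketch: device_model, add_min_request() and effective_min() are hypothetical names, and it assumes the effective min is the largest of a device's own min requests. It does not use the real dev_pm_qos_add_request() API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MIN_REQUESTS 4

/* Hypothetical device: each one carries its own list of min requests. */
struct device_model {
	const char *name;
	unsigned int min_req[MAX_MIN_REQUESTS];
	int nr_req;
};

/* Attach a min-frequency request to one specific device only. */
static int add_min_request(struct device_model *dev, unsigned int min)
{
	assert(dev != NULL);

	if (dev->nr_req >= MAX_MIN_REQUESTS)
		return -1;

	dev->min_req[dev->nr_req++] = min;
	return 0;
}

/* Effective min for a device: the largest of its own requests, else 0. */
static unsigned int effective_min(const struct device_model *dev)
{
	unsigned int result = 0;

	for (int i = 0; i < dev->nr_req; i++) {
		if (dev->min_req[i] > result)
			result = dev->min_req[i];
	}
	return result;
}
```

Requests added to a CPU device never show up in the result for a GPU device, which is why one generic constraint type can serve both cpufreq and devfreq.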


Ah, I see. Yes, it all makes sense now.

Thanks!

--
Qais Yousef

Patch

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 0e626b00053b..d615bf35ac00 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -26,6 +26,7 @@ 
 #include <linux/kernel_stat.h>
 #include <linux/module.h>
 #include <linux/mutex.h>
+#include <linux/pm_qos.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
 #include <linux/syscore_ops.h>
@@ -1082,11 +1083,77 @@  static void handle_update(struct work_struct *work)
 	cpufreq_update_policy(cpu);
 }
 
+static void cpufreq_update_freq_work(struct work_struct *work)
+{
+	struct cpufreq_policy *policy =
+		container_of(work, struct cpufreq_policy, req_work);
+	struct cpufreq_policy new_policy = *policy;
+
+	/* We should read constraint values from QoS layer */
+	new_policy.min = 0;
+	new_policy.max = UINT_MAX;
+
+	down_write(&policy->rwsem);
+
+	if (!policy_is_inactive(policy))
+		cpufreq_set_policy(policy, &new_policy);
+
+	up_write(&policy->rwsem);
+}
+
+static int cpufreq_update_freq(struct cpufreq_policy *policy)
+{
+	schedule_work(&policy->req_work);
+	return 0;
+}
+
+static int cpufreq_notifier_min(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_min);
+
+	return cpufreq_update_freq(policy);
+}
+
+static int cpufreq_notifier_max(struct notifier_block *nb, unsigned long freq,
+				void *data)
+{
+	struct cpufreq_policy *policy = container_of(nb, struct cpufreq_policy, nb_max);
+
+	return cpufreq_update_freq(policy);
+}
+
+static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
+{
+	struct kobject *kobj;
+	struct completion *cmp;
+
+	down_write(&policy->rwsem);
+	cpufreq_stats_free_table(policy);
+	kobj = &policy->kobj;
+	cmp = &policy->kobj_unregister;
+	up_write(&policy->rwsem);
+	kobject_put(kobj);
+
+	/*
+	 * We need to make sure that the underlying kobj is
+	 * actually not referenced anymore by anybody before we
+	 * proceed with unloading.
+	 */
+	pr_debug("waiting for dropping of refcount\n");
+	wait_for_completion(cmp);
+	pr_debug("wait complete\n");
+}
+
 static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 {
 	struct cpufreq_policy *policy;
+	struct device *dev = get_cpu_device(cpu);
 	int ret;
 
+	if (!dev)
+		return NULL;
+
 	policy = kzalloc(sizeof(*policy), GFP_KERNEL);
 	if (!policy)
 		return NULL;
@@ -1103,20 +1170,45 @@  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
 				   cpufreq_global_kobject, "policy%u", cpu);
 	if (ret) {
-		pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
+		dev_err(dev, "%s: failed to init policy->kobj: %d\n", __func__, ret);
 		goto err_free_real_cpus;
 	}
 
+	policy->nb_min.notifier_call = cpufreq_notifier_min;
+	policy->nb_max.notifier_call = cpufreq_notifier_max;
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_min,
+				      DEV_PM_QOS_MIN_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MIN QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_kobj_remove;
+	}
+
+	ret = dev_pm_qos_add_notifier(dev, &policy->nb_max,
+				      DEV_PM_QOS_MAX_FREQUENCY);
+	if (ret) {
+		dev_err(dev, "Failed to register MAX QoS notifier: %d (%*pbl)\n",
+			ret, cpumask_pr_args(policy->cpus));
+		goto err_min_qos_notifier;
+	}
+
 	INIT_LIST_HEAD(&policy->policy_list);
 	init_rwsem(&policy->rwsem);
 	spin_lock_init(&policy->transition_lock);
 	init_waitqueue_head(&policy->transition_wait);
 	init_completion(&policy->kobj_unregister);
 	INIT_WORK(&policy->update, handle_update);
+	INIT_WORK(&policy->req_work, cpufreq_update_freq_work);
 
 	policy->cpu = cpu;
 	return policy;
 
+err_min_qos_notifier:
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
+err_kobj_remove:
+	cpufreq_policy_put_kobj(policy);
 err_free_real_cpus:
 	free_cpumask_var(policy->real_cpus);
 err_free_rcpumask:
@@ -1129,30 +1221,9 @@  static struct cpufreq_policy *cpufreq_policy_alloc(unsigned int cpu)
 	return NULL;
 }
 
-static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy)
-{
-	struct kobject *kobj;
-	struct completion *cmp;
-
-	down_write(&policy->rwsem);
-	cpufreq_stats_free_table(policy);
-	kobj = &policy->kobj;
-	cmp = &policy->kobj_unregister;
-	up_write(&policy->rwsem);
-	kobject_put(kobj);
-
-	/*
-	 * We need to make sure that the underlying kobj is
-	 * actually not referenced anymore by anybody before we
-	 * proceed with unloading.
-	 */
-	pr_debug("waiting for dropping of refcount\n");
-	wait_for_completion(cmp);
-	pr_debug("wait complete\n");
-}
-
 static void cpufreq_policy_free(struct cpufreq_policy *policy)
 {
+	struct device *dev = get_cpu_device(policy->cpu);
 	unsigned long flags;
 	int cpu;
 
@@ -1164,6 +1235,10 @@  static void cpufreq_policy_free(struct cpufreq_policy *policy)
 		per_cpu(cpufreq_cpu_data, cpu) = NULL;
 	write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
+	dev_pm_qos_remove_notifier(dev, &policy->nb_max,
+				   DEV_PM_QOS_MAX_FREQUENCY);
+	dev_pm_qos_remove_notifier(dev, &policy->nb_min,
+				   DEV_PM_QOS_MIN_FREQUENCY);
 	cpufreq_policy_put_kobj(policy);
 	free_cpumask_var(policy->real_cpus);
 	free_cpumask_var(policy->related_cpus);
@@ -2239,6 +2314,8 @@  static int cpufreq_set_policy(struct cpufreq_policy *policy,
 			      struct cpufreq_policy *new_policy)
 {
 	struct cpufreq_governor *old_gov;
+	struct device *cpu_dev = get_cpu_device(policy->cpu);
+	unsigned long min, max;
 	int ret;
 
 	pr_debug("setting new policy for CPU %u: %u - %u kHz\n",
@@ -2253,11 +2330,23 @@  static int cpufreq_set_policy(struct cpufreq_policy *policy,
 	if (new_policy->min > new_policy->max)
 		return -EINVAL;
 
+	min = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MIN_FREQUENCY);
+	max = dev_pm_qos_read_value(cpu_dev, DEV_PM_QOS_MAX_FREQUENCY);
+
+	if (min > new_policy->min)
+		new_policy->min = min;
+	if (max < new_policy->max)
+		new_policy->max = max;
+
 	/* verify the cpu speed can be set within this limit */
 	ret = cpufreq_driver->verify(new_policy);
 	if (ret)
 		return ret;
 
+	/*
+	 * The notifier-chain shall be removed once all the users of
+	 * CPUFREQ_ADJUST are moved to use the QoS framework.
+	 */
 	/* adjust if necessary - all reasons */
 	blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
 			CPUFREQ_ADJUST, new_policy);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index b160e98076e3..f8f48d8a9b52 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -90,6 +90,7 @@  struct cpufreq_policy {
 
 	struct work_struct	update; /* if update_policy() needs to be
 					 * called, but you're in IRQ context */
+	struct work_struct	req_work;
 
 	struct cpufreq_user_policy user_policy;
 	struct cpufreq_frequency_table	*freq_table;
@@ -154,6 +155,9 @@  struct cpufreq_policy {
 
 	/* Pointer to the cooling device if used for thermal mitigation */
 	struct thermal_cooling_device *cdev;
+
+	struct notifier_block nb_min;
+	struct notifier_block nb_max;
 };
 
 /* Only for ACPI */