[RFC,1/4] cpufreq: governor: support scheduler cpufreq callbacks on remote CPUs

Message ID 1461119969-10371-1-git-send-email-smuckle@linaro.org
State New
Headers show

Commit Message

Steve Muckle April 20, 2016, 2:39 a.m.
In preparation for the scheduler cpufreq callback happening
on remote CPUs, add support for this in the dbs governors.
The dbs governors make assumptions about the callback occurring
on the CPU being updated.

Signed-off-by: Steve Muckle <smuckle@linaro.org>

---
 drivers/cpufreq/cpufreq_governor.c | 21 +++++++++++++++++++--
 drivers/cpufreq/cpufreq_governor.h |  1 +
 2 files changed, 20 insertions(+), 2 deletions(-)

-- 
2.4.10

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Steve Muckle April 21, 2016, 2:20 a.m. | #1
On Wed, Apr 20, 2016 at 02:37:18PM +0200, Rafael J. Wysocki wrote:
...
> > @@ -1173,20 +1179,88 @@ static inline void intel_pstate_adjust_busy_pstate(struct cpudata *cpu)

> >  		get_avg_frequency(cpu));

> >  }

> >  

> > +static void _intel_pstate_update_util(struct cpudata *cpu, u64 time)

> 

> What about calling this intel_pstate_update_cpu()?


Sure will change.

...
> >  static void intel_pstate_update_util(struct update_util_data *data, u64 time,

> >  				     unsigned long util, unsigned long max)

> >  {

> >  	struct cpudata *cpu = container_of(data, struct cpudata, update_util);

> > -	u64 delta_ns = time - cpu->sample.time;

> > +	s64 delta_ns = time - cpu->sample.time;

> >  

> > -	if ((s64)delta_ns >= pid_params.sample_rate_ns) {

> > -		bool sample_taken = intel_pstate_sample(cpu, time);

> > +	if (delta_ns < pid_params.sample_rate_ns)

> 

> Why don't you check cpu->ipi_in_progress here too and bail out if it is set?

> 

> That would allow you to avoid checking the time again below, woulnd't it?


Yeah I think that should work. I can't recall why I thought I needed
to check the time first, then ipi_in_progress, then the time. As long
as ipi_in_progress is checked prior to the time, it should be fine.

> 

> > +		return;

> >  

> > -		if (sample_taken && !hwp_active)

> > -			intel_pstate_adjust_busy_pstate(cpu);

> > +	if (cpu->cpu == smp_processor_id()) {

> > +		_intel_pstate_update_util(cpu, time);

> > +	} else {

> > +		/* The target CPU's rq lock is held. */

> > +		if (cpu->ipi_in_progress)

> > +			return;

> > +

> > +		/* Re-check sample_time which may have advanced. */

> > +		smp_rmb();

> > +		delta_ns = time - READ_ONCE(cpu->sample.time);

> > +		if (delta_ns < pid_params.sample_rate_ns)

> > +			return;

> > +

> > +		cpu->ipi_in_progress = true;

> > +		cpu->time = time;

> > +		irq_work_queue_on(&cpu->irq_work, cpu->cpu);

> >  	}

> >  }

> >  

> > +static inline void intel_pstate_irq_work_sync(unsigned int cpu)

> > +{

> > +	irq_work_sync(&all_cpu_data[cpu]->irq_work);

> > +}

> > +

> > +static inline void intel_pstate_init_irq_work(struct cpudata *cpu)

> > +{

> > +	init_irq_work(&cpu->irq_work, intel_pstate_update_util_remote);

> > +}

> > +#else /* !CONFIG_SMP */

> > +static inline void intel_pstate_irq_work_sync(unsigned int cpu) {}

> > +static inline void intel_pstate_init_irq_work(struct cpudata *cpu) {}

> > +

> > +static void intel_pstate_update_util(struct update_util_data *data, u64 time,

> > +				     unsigned long util, unsigned long max)

> > +{

> > +	struct cpudata *cpu = container_of(data, struct cpudata, update_util);

> > +	s64 delta_ns = time - cpu->sample.time;

> > +

> > +	if (delta_ns < pid_params.sample_rate_ns)

> > +		return;

> > +

> > +	_intel_pstate_update_util(cpu, time);

> > +}

> > +#endif

> > +

> > +

> > +

> 

> The additional two empty lines are not necessary.

> 


Sorry yeah these were unintentional, will remove these and the ones below.

Thanks for the review.

thanks,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Muckle April 25, 2016, 7:17 p.m. | #2
On Wed, Apr 20, 2016 at 02:26:06PM +0200, Rafael J. Wysocki wrote:
> You could have added a cover [0/4] message which would have made it easier

> to reply to the entire series in general.  Let me do it here.


Will add that next time.

> Doing it the way it is done in this series would be fine by me in general

> (up to a few more or less minor comments), but it is still unclear to me

> how much of a difference these changes would make in terms of improved

> response times etc.


I spent some time last week constructing a test case where the
benefits could be seen. A task which was previously low utilization
wakes on CPU0 and becomes CPU bound. Just after that, a new task is
spawned on CPU0. The initial task utilization is high so ideally we
would like to see the frequency immediately rise, but in my test it
does not occur until the next tick. There is 7ms of delay in the trace
I've saved.

Unfortunately these patches alone will not address it. There are a
couple other issues which get in the way (which is why I didn't
respond here right away). Let me spend some more time on those and see
how it goes.

> > In preparation for the scheduler cpufreq callback happening

> > on remote CPUs, add support for this in the dbs governors.

> > The dbs governors make assumptions about the callback occurring

> > on the CPU being updated.

> 

> While the above is generally correct, it would be nice to say more about what

> happens in the patch.  Like:

> 

> "To that end, add a CPU number field to struct cpu_dbs_info and modify

> dbs_update_util_handler() to schedule IRQ works on target CPUs rather than on

> the local one only."


I'm happy to do that if it is what you'd like to see, but just
curious, isn't it really just restating the patch contents?

...
> > diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h

> > index 3e0eb7c54903..1d5f4857ff80 100644

> > --- a/drivers/cpufreq/cpufreq_governor.h

> > +++ b/drivers/cpufreq/cpufreq_governor.h

> > @@ -122,6 +122,7 @@ struct cpu_dbs_info {

> >  	unsigned int prev_load;

> >  	struct update_util_data update_util;

> >  	struct policy_dbs_info *policy_dbs;

> > +	int cpu;

> >  };

> 

> Wouldn't it be better to add the cpu field to struct update_util_data and set

> it from cpufreq_add_update_util_hook()?

> 

> That would allow you to avoid adding the cpu field to struct sugov_cpu in the

> second patch at least.


Sure, will do.

thanks,
Steve
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index 20f0a4e114d1..429d3a5b9866 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -248,6 +248,20 @@  static void dbs_irq_work(struct irq_work *irq_work)
 	schedule_work_on(smp_processor_id(), &policy_dbs->work);
 }
 
+#ifdef CONFIG_SMP
+static inline void dbs_irq_work_queue(struct policy_dbs_info *policy_dbs,
+				      int cpu)
+{
+	irq_work_queue_on(&policy_dbs->irq_work, cpu);
+}
+#else
+static inline void dbs_irq_work_queue(struct policy_dbs_info *policy_dbs,
+				      int cpu)
+{
+	irq_work_queue(&policy_dbs->irq_work);
+}
+#endif
+
 static void dbs_update_util_handler(struct update_util_data *data, u64 time,
 				    unsigned long util, unsigned long max)
 {
@@ -295,7 +309,7 @@  static void dbs_update_util_handler(struct update_util_data *data, u64 time,
 
 	policy_dbs->last_sample_time = time;
 	policy_dbs->work_in_progress = true;
-	irq_work_queue(&policy_dbs->irq_work);
+	dbs_irq_work_queue(policy_dbs, cdbs->cpu);
 }
 
 static void gov_set_update_util(struct policy_dbs_info *policy_dbs,
@@ -384,7 +398,7 @@  static int cpufreq_governor_init(struct cpufreq_policy *policy)
 	struct dbs_data *dbs_data;
 	struct policy_dbs_info *policy_dbs;
 	unsigned int latency;
-	int ret = 0;
+	int j, ret = 0;
 
 	/* State should be equivalent to EXIT */
 	if (policy->governor_data)
@@ -394,6 +408,9 @@  static int cpufreq_governor_init(struct cpufreq_policy *policy)
 	if (!policy_dbs)
 		return -ENOMEM;
 
+	for_each_cpu(j, policy->cpus)
+		per_cpu(cpu_dbs, j).cpu = j;
+
 	/* Protect gov->gdbs_data against concurrent updates. */
 	mutex_lock(&gov_dbs_data_mutex);
 
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index 3e0eb7c54903..1d5f4857ff80 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -122,6 +122,7 @@  struct cpu_dbs_info {
 	unsigned int prev_load;
 	struct update_util_data update_util;
 	struct policy_dbs_info *policy_dbs;
+	int cpu;
 };
 
 /* Common Governor data across policies */