[V4,0/3] sched: cpufreq: Allow remote callbacks

Message ID cover.1501060871.git.viresh.kumar@linaro.org
State New
Headers show

Commit Message

Viresh Kumar July 26, 2017, 9:22 a.m.
Hi,

I had some IRC discussions with Peter and V4 is based on his feedback.
Here is the diff between V3 and V4:


-------------------------8<-------------------------

With Android UI and benchmarks the latency of cpufreq response to
certain scheduling events can become very critical. Currently, callbacks
into schedutil are only made from the scheduler if the target CPU of the
event is the same as the current CPU. This means there are certain
situations where a target CPU may not run schedutil for some time.

One testcase to show this behavior is where a task starts running on
CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the
system is configured such that new tasks should receive maximum demand
initially, this should result in CPU0 increasing frequency immediately.
Because of the above mentioned limitation though this does not occur.
This is verified using ftrace with the sample [1] application.

Maybe the ideal solution is to always allow remote callbacks but that
has its own challenges:

o There is no protection required for single CPU per policy case today,
  and adding any kind of locking there, to supply remote callbacks,
  isn't really a good idea.

o If is local CPU isn't part of the same cpufreq policy as the target
  CPU, then we wouldn't be able to do fast switching at all and have to
  use some kind of bottom half to schedule work on the target CPU to do
  real switching. That may be overkill as well.


And so this series only allows remote callbacks for target CPUs that
share the cpufreq policy with the local CPU.

This series is tested with couple of usecases (Android: hackbench,
recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey
board (64 bit octa-core, single policy). Only galleryfling showed minor
improvements, while others didn't had much deviation.

The reason being that this patchset only targets a corner case, where
following are required to be true to improve performance and that
doesn't happen too often with these tests:

- Task is migrated to another CPU.
- The task has maximum demand initially, and should take the CPU to
  higher OPPs.
- And the target CPU doesn't call into schedutil until the next tick.

V3->V4:
- Respect iowait boost flag and util updates for the all remote
  callbacks.
- Minor updates in commit log of 2/3.

V2->V3:
- Rearranged/merged patches as suggested by Rafael (looks much better
  now)
- Also handle new hook added to intel-pstate driver.
- The final code remains the same as V2, except for the above hook.

V1->V2:
- Don't support remote callbacks for unshared cpufreq policies.
- Don't support remote callbacks where local CPU isn't part of the
  target CPU's cpufreq policy.
- Dropped dvfs_possible_from_any_cpu flag.

--
viresh

Viresh Kumar (3):
  sched: cpufreq: Allow remote cpufreq callbacks
  cpufreq: schedutil: Process remote callback for shared policies
  cpufreq: governor: Process remote callback for shared policies

 drivers/cpufreq/cpufreq_governor.c |  4 ++++
 drivers/cpufreq/intel_pstate.c     |  8 ++++++++
 include/linux/sched/cpufreq.h      |  1 +
 kernel/sched/cpufreq.c             |  1 +
 kernel/sched/cpufreq_schedutil.c   | 14 +++++++++-----
 kernel/sched/deadline.c            |  2 +-
 kernel/sched/fair.c                |  8 +++++---
 kernel/sched/rt.c                  |  2 +-
 kernel/sched/sched.h               | 10 ++--------
 9 files changed, 32 insertions(+), 18 deletions(-)

-- 
2.13.0.71.gd7076ec9c9cb

Comments

Rafael J. Wysocki July 26, 2017, 5:42 p.m. | #1
On Wednesday, July 26, 2017 02:52:32 PM Viresh Kumar wrote:
> We do not call cpufreq callbacks from scheduler core for remote

> (non-local) CPUs currently. But there are cases where such remote

> callbacks are useful, specially in the case of shared cpufreq policies.

> 

> This patch updates the scheduler core to call the cpufreq callbacks for

> remote CPUs as well.

> 

> For now, all the registered utilization update callbacks are updated to

> return early if remote callback is detected. That is, this patch just

> moves the decision making down in the hierarchy.

> 

> Later patches would enable remote callbacks for shared policies.

> 

> Based on initial work from Steve Muckle.

> 

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> ---

>  drivers/cpufreq/cpufreq_governor.c |  4 ++++

>  drivers/cpufreq/intel_pstate.c     |  8 ++++++++

>  include/linux/sched/cpufreq.h      |  1 +

>  kernel/sched/cpufreq.c             |  1 +

>  kernel/sched/cpufreq_schedutil.c   | 11 ++++++++---

>  kernel/sched/deadline.c            |  2 +-

>  kernel/sched/fair.c                |  8 +++++---

>  kernel/sched/rt.c                  |  2 +-

>  kernel/sched/sched.h               | 10 ++--------

>  9 files changed, 31 insertions(+), 16 deletions(-)

> 

> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c

> index eed069ecfd5e..5499796cf9a8 100644

> --- a/drivers/cpufreq/cpufreq_governor.c

> +++ b/drivers/cpufreq/cpufreq_governor.c

> @@ -272,6 +272,10 @@ static void dbs_update_util_handler(struct update_util_data *data, u64 time,

>  	struct policy_dbs_info *policy_dbs = cdbs->policy_dbs;

>  	u64 delta_ns, lst;

>  

> +	/* Don't allow remote callbacks */

> +	if (smp_processor_id() != data->cpu)

> +		return;

> +

>  	/*

>  	 * The work may not be allowed to be queued up right now.

>  	 * Possible reasons:

> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c

> index 89bbc0c11b22..0dd14c8edd2d 100644

> --- a/drivers/cpufreq/intel_pstate.c

> +++ b/drivers/cpufreq/intel_pstate.c

> @@ -1747,6 +1747,10 @@ static void intel_pstate_update_util_pid(struct update_util_data *data,

>  	struct cpudata *cpu = container_of(data, struct cpudata, update_util);

>  	u64 delta_ns = time - cpu->sample.time;

>  

> +	/* Don't allow remote callbacks */

> +	if (smp_processor_id() != data->cpu)

> +		return;


You can do this check against cpu->cpu, however.

> +

>  	if ((s64)delta_ns < pid_params.sample_rate_ns)

>  		return;

>  

> @@ -1764,6 +1768,10 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,

>  	struct cpudata *cpu = container_of(data, struct cpudata, update_util);

>  	u64 delta_ns;

>  

> +	/* Don't allow remote callbacks */

> +	if (smp_processor_id() != data->cpu)

> +		return;


And same here.

> +

>  	if (flags & SCHED_CPUFREQ_IOWAIT) {

>  		cpu->iowait_boost = int_tofp(1);

>  	} else if (cpu->iowait_boost) {

> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h

> index d2be2ccbb372..8256a8f35f22 100644

> --- a/include/linux/sched/cpufreq.h

> +++ b/include/linux/sched/cpufreq.h

> @@ -16,6 +16,7 @@

>  #ifdef CONFIG_CPU_FREQ

>  struct update_util_data {

>         void (*func)(struct update_util_data *data, u64 time, unsigned int flags);

> +       unsigned int cpu;


So it looks like you don't need this.

schedutil doesn't need it as per patch [2/3].

ondemand/conservative don't need it as per patch [3/3]

intel_pstate doesn't need this too, because of the above.

Thanks,
Rafael
Viresh Kumar July 27, 2017, 3:23 a.m. | #2
On 26-07-17, 19:42, Rafael J. Wysocki wrote:
> On Wednesday, July 26, 2017 02:52:32 PM Viresh Kumar wrote:


> > +	/* Don't allow remote callbacks */

> > +	if (smp_processor_id() != data->cpu)

> > +		return;

> 

> You can do this check against cpu->cpu, however.

> 

> > +	/* Don't allow remote callbacks */

> > +	if (smp_processor_id() != data->cpu)

> > +		return;

> 

> And same here.

> 

> > +

> >  	if (flags & SCHED_CPUFREQ_IOWAIT) {

> >  		cpu->iowait_boost = int_tofp(1);

> >  	} else if (cpu->iowait_boost) {

> > diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h

> > index d2be2ccbb372..8256a8f35f22 100644

> > --- a/include/linux/sched/cpufreq.h

> > +++ b/include/linux/sched/cpufreq.h

> > @@ -16,6 +16,7 @@

> >  #ifdef CONFIG_CPU_FREQ

> >  struct update_util_data {

> >         void (*func)(struct update_util_data *data, u64 time, unsigned int flags);

> > +       unsigned int cpu;

> 

> So it looks like you don't need this.

> 

> schedutil doesn't need it as per patch [2/3].


Hmm, so your comments are exactly same as what Peter suggested few
days back.

sugov_get_util() uses it in 2/3, as we need to know the target CPU
anyway.

But I think it would be better to add the cpu variable in sugov_cpu
structure instead as only schedutil needs it. I will do that change
and send V5.

-- 
viresh
Joel Fernandes (Google) July 27, 2017, 5:14 a.m. | #3
Hi Viresh,

On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
<snip>
>

> With Android UI and benchmarks the latency of cpufreq response to

> certain scheduling events can become very critical. Currently, callbacks

> into schedutil are only made from the scheduler if the target CPU of the

> event is the same as the current CPU. This means there are certain

> situations where a target CPU may not run schedutil for some time.

>

> One testcase to show this behavior is where a task starts running on

> CPU0, then a new task is also spawned on CPU0 by a task on CPU1. If the

> system is configured such that new tasks should receive maximum demand

> initially, this should result in CPU0 increasing frequency immediately.

> Because of the above mentioned limitation though this does not occur.

> This is verified using ftrace with the sample [1] application.


I think you dropped [1] in your cover-letter. May be you meant to add
it at the end of the cover letter?

I noticed from your v2 that its:
https://pastebin.com/7LkMSRxE

Also one more comment about this usecase:

You mentioned in our discussion at [2] sometime back, about the
question of initial utilization,

"We don't have any such configurable way possible right
now, but there were discussions on how much utilization should a new
task be assigned when it first comes up."

But, then in your cover letter above, you mentioned "This is verified
using ftrace". So my question is how has this been verified with
ftrace if the new initial utilization as you said in [2] is currently
still under discussion? Basically how could you verify with ftrace
that the target CPU frequency isn't increasing immediately on spawning
of a new task remotely, if the initial utilization of a new task isn't
something we set/configure with current code? Am I missing something?

[2] https://lists.linaro.org/pipermail/eas-dev/2017-January/000785.html

>

> Maybe the ideal solution is to always allow remote callbacks but that

> has its own challenges:

>

> o There is no protection required for single CPU per policy case today,

>   and adding any kind of locking there, to supply remote callbacks,

>   isn't really a good idea.

>

> o If is local CPU isn't part of the same cpufreq policy as the target

>   CPU, then we wouldn't be able to do fast switching at all and have to

>   use some kind of bottom half to schedule work on the target CPU to do

>   real switching. That may be overkill as well.

>

>

> And so this series only allows remote callbacks for target CPUs that

> share the cpufreq policy with the local CPU.

>

> This series is tested with couple of usecases (Android: hackbench,

> recentfling, galleryfling, vellamo, Ubuntu: hackbench) on ARM hikey

> board (64 bit octa-core, single policy). Only galleryfling showed minor

> improvements, while others didn't had much deviation.

>

> The reason being that this patchset only targets a corner case, where

> following are required to be true to improve performance and that

> doesn't happen too often with these tests:

>

> - Task is migrated to another CPU.

> - The task has maximum demand initially, and should take the CPU to


Just to make the cover-letter more clear and also confirming with you
I understand the above usecase, maybe in the future this can reworded
from "initially" to "before the migration" and "take the CPU" to "take
the target CPU of the migration" ?

>   higher OPPs.

> - And the target CPU doesn't call into schedutil until the next tick.


I found this usecase to be more plausible and can see this patch
series being useful there.

Could you also keep me in CC on these patches (at joelaf@google.com)?
I'm interested in this series.

thanks!

-Joel
Joel Fernandes (Google) July 27, 2017, 5:34 a.m. | #4
Hi Viresh,

On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> We do not call cpufreq callbacks from scheduler core for remote

> (non-local) CPUs currently. But there are cases where such remote

> callbacks are useful, specially in the case of shared cpufreq policies.

>

> This patch updates the scheduler core to call the cpufreq callbacks for

> remote CPUs as well.

>

> For now, all the registered utilization update callbacks are updated to

> return early if remote callback is detected. That is, this patch just

> moves the decision making down in the hierarchy.

>

> Later patches would enable remote callbacks for shared policies.

>

> Based on initial work from Steve Muckle.

>

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

<snip>
> --- a/kernel/sched/cpufreq_schedutil.c

> +++ b/kernel/sched/cpufreq_schedutil.c

> @@ -72,10 +72,15 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

>

>  /************************ Governor internals ***********************/

>

> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,

> +                                    int target_cpu)

>  {

>         s64 delta_ns;

>

> +       /* Don't allow remote callbacks */

> +       if (smp_processor_id() != target_cpu)

> +               return false;

> +

>         if (sg_policy->work_in_progress)

>                 return false;

>

> @@ -221,7 +226,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

>         sugov_set_iowait_boost(sg_cpu, time, flags);

>         sg_cpu->last_update = time;

>

> -       if (!sugov_should_update_freq(sg_policy, time))

> +       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

>                 return;


Since with the remote callbacks now possible, isn't it unsafe to
modify sg_cpu and sg_policy structures without a lock in
sugov_update_single?

Unlike sugov_update_shared, we don't acquire any lock in
sugov_update_single before updating these structures. Did I miss
something?


thanks,

-Joel
Joel Fernandes (Google) July 27, 2017, 5:49 a.m. | #5
On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> This patch updates the schedutil governor to process cpufreq utilization

> update hooks called for remote CPUs where the remote CPU is managed by

> the cpufreq policy of the local CPU.

>

> Based on initial work from Steve Muckle.

>

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>


Reviewed-by: Joel Fernandes <joelaf@google.com>



thanks,

-Joel


> ---

>  kernel/sched/cpufreq_schedutil.c | 21 ++++++++++-----------

>  1 file changed, 10 insertions(+), 11 deletions(-)

>

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c

> index bb834747e49b..c3baf70d360c 100644

> --- a/kernel/sched/cpufreq_schedutil.c

> +++ b/kernel/sched/cpufreq_schedutil.c

> @@ -72,13 +72,12 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

>

>  /************************ Governor internals ***********************/

>

> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,

> -                                    int target_cpu)

> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

>  {

>         s64 delta_ns;

>

> -       /* Don't allow remote callbacks */

> -       if (smp_processor_id() != target_cpu)

> +       /* Allow remote callbacks only on the CPUs sharing cpufreq policy */

> +       if (!cpumask_test_cpu(smp_processor_id(), sg_policy->policy->cpus))

>                 return false;

>

>         if (sg_policy->work_in_progress)

> @@ -159,12 +158,12 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,

>         return cpufreq_driver_resolve_freq(policy, freq);

>  }

>

> -static void sugov_get_util(unsigned long *util, unsigned long *max)

> +static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)

>  {

> -       struct rq *rq = this_rq();

> +       struct rq *rq = cpu_rq(cpu);

>         unsigned long cfs_max;

>

> -       cfs_max = arch_scale_cpu_capacity(NULL, smp_processor_id());

> +       cfs_max = arch_scale_cpu_capacity(NULL, cpu);

>

>         *util = min(rq->cfs.avg.util_avg, cfs_max);

>         *max = cfs_max;

> @@ -226,7 +225,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

>         sugov_set_iowait_boost(sg_cpu, time, flags);

>         sg_cpu->last_update = time;

>

> -       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

> +       if (!sugov_should_update_freq(sg_policy, time))

>                 return;

>

>         busy = sugov_cpu_is_busy(sg_cpu);

> @@ -234,7 +233,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

>         if (flags & SCHED_CPUFREQ_RT_DL) {

>                 next_f = policy->cpuinfo.max_freq;

>         } else {

> -               sugov_get_util(&util, &max);

> +               sugov_get_util(&util, &max, hook->cpu);

>                 sugov_iowait_boost(sg_cpu, &util, &max);

>                 next_f = get_next_freq(sg_policy, util, max);

>                 /*

> @@ -295,7 +294,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,

>         unsigned long util, max;

>         unsigned int next_f;

>

> -       sugov_get_util(&util, &max);

> +       sugov_get_util(&util, &max, hook->cpu);

>

>         raw_spin_lock(&sg_policy->update_lock);

>

> @@ -306,7 +305,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,

>         sugov_set_iowait_boost(sg_cpu, time, flags);

>         sg_cpu->last_update = time;

>

> -       if (sugov_should_update_freq(sg_policy, time, hook->cpu)) {

> +       if (sugov_should_update_freq(sg_policy, time)) {

>                 if (flags & SCHED_CPUFREQ_RT_DL)

>                         next_f = sg_policy->policy->cpuinfo.max_freq;
Viresh Kumar July 27, 2017, 5:50 a.m. | #6
On 26-07-17, 22:34, Joel Fernandes (Google) wrote:
> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

> >         sugov_set_iowait_boost(sg_cpu, time, flags);

> >         sg_cpu->last_update = time;

> >

> > -       if (!sugov_should_update_freq(sg_policy, time))

> > +       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

> >                 return;

> 

> Since with the remote callbacks now possible, isn't it unsafe to

> modify sg_cpu and sg_policy structures without a lock in

> sugov_update_single?

> 

> Unlike sugov_update_shared, we don't acquire any lock in

> sugov_update_single before updating these structures. Did I miss

> something?


As Peter already mentioned it earlier, the callbacks are called with
rq locks held and so sugov_update_single() wouldn't get called in
parallel for a target CPU.

That's the only race you were worried about ?

-- 
viresh
Joel Fernandes (Google) July 27, 2017, 6:13 a.m. | #7
On Wed, Jul 26, 2017 at 10:50 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26-07-17, 22:34, Joel Fernandes (Google) wrote:

>> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

>> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

>> >         sugov_set_iowait_boost(sg_cpu, time, flags);

>> >         sg_cpu->last_update = time;

>> >

>> > -       if (!sugov_should_update_freq(sg_policy, time))

>> > +       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

>> >                 return;

>>

>> Since with the remote callbacks now possible, isn't it unsafe to

>> modify sg_cpu and sg_policy structures without a lock in

>> sugov_update_single?

>>

>> Unlike sugov_update_shared, we don't acquire any lock in

>> sugov_update_single before updating these structures. Did I miss

>> something?

>

> As Peter already mentioned it earlier, the callbacks are called with

> rq locks held and so sugov_update_single() wouldn't get called in

> parallel for a target CPU.


Ah ok, I have to catch up with that discussion since I missed the
whole thing. Now that you will have me on CC, that shouldn't happen,
thanks and sorry about the noise.

> That's the only race you were worried about ?


Yes. So then in that case, makes sense to move raw_spin_lock in
sugov_update_shared further down? (Just discussing, this point is
independent of your patch), Something like:

        sg_cpu->flags = flags;
@@ -304,6 +302,8 @@ static void sugov_update_shared(struct
update_util_data *hook, u64 time,
        sugov_set_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;

+       raw_spin_lock(&sg_policy->update_lock);
+
        if (sugov_should_update_freq(sg_policy, time)) {
                if (flags & SCHED_CPUFREQ_RT_DL)
                        next_f = sg_policy->policy->cpuinfo.max_freq;



thanks,

-Joeldiff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 622eed1b7658..9a6c12fb2c16 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -295,8 +295,6 @@ static void sugov_update_shared(struct
update_util_data *hook, u64 time,

        sugov_get_util(&util, &max);

-       raw_spin_lock(&sg_policy->update_lock);
-
        sg_cpu->util = util;
        sg_cpu->max = max;

Joel Fernandes (Google) July 27, 2017, 6:23 a.m. | #8
Hi Viresh,

On Wed, Jul 26, 2017 at 10:46 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26-07-17, 22:14, Joel Fernandes (Google) wrote:

<snip>
>> Also one more comment about this usecase:

>>

>> You mentioned in our discussion at [2] sometime back, about the

>> question of initial utilization,

>>

>> "We don't have any such configurable way possible right

>> now, but there were discussions on how much utilization should a new

>> task be assigned when it first comes up."

>

> We still initialize it to a value, just that it isn't configurable.

> See below..

>

>> But, then in your cover letter above, you mentioned "This is verified

>> using ftrace". So my question is how has this been verified with

>> ftrace if the new initial utilization as you said in [2] is currently

>> still under discussion? Basically how could you verify with ftrace

>> that the target CPU frequency isn't increasing immediately on spawning

>> of a new task remotely, if the initial utilization of a new task isn't

>> something we set/configure with current code? Am I missing something?

>>

>> [2] https://lists.linaro.org/pipermail/eas-dev/2017-January/000785.html

>

> The statement "new tasks should receive maximum demand initially" is

> used to represent tasks which have high demand every time they run.

> For example scrolling of a web page or gallery on our phones. Yes,

> maybe I can use the work "migrated" (as you suggested later) as the

> history of its utilization will move with it then to the new CPU.

>

> But even without that, if you see the routine

> init_entity_runnable_average() in fair.c, the new tasks are

> initialized in a way that they are seen as heavy tasks. And so even

> for the first time they run, freq should normally increase on the

> target CPU (at least with above application).i


Ok, but the "heavy" in init_entity_runnable_average means for load,
not the util_avg. The util_avg is what's used for frequency scaling
IIUC and is set to 0 in that function no?

>

> The application was written by Steve (all credit goes to him) before

> he left Linaro, but I did test it with ftrace. What I saw with ftrace

> was that the freq isn't reevaluated for almost an entire tick many

> times because we enqueued the task remotely. And that changes with

> this series.

>

>> > The reason being that this patchset only targets a corner case, where

>> > following are required to be true to improve performance and that

>> > doesn't happen too often with these tests:

>> >

>> > - Task is migrated to another CPU.

>> > - The task has maximum demand initially, and should take the CPU to

>>

>> Just to make the cover-letter more clear and also confirming with you

>> I understand the above usecase, maybe in the future this can reworded

>> from "initially" to "before the migration" and "take the CPU" to "take

>> the target CPU of the migration" ?

>

> I can reword it a bit, but the test case wasn't really migrating

> anything and was looking only at the initial loads.


Ok, I wasn't talking about the synthetic test in the second part of my
email above but about the explanation you gave about Galleryfling
improvement (that the migration of a task with high utilization
doesn't update the target frequency) which makes sense to me so we are
on the same page about that.

thanks,

-Joel
Viresh Kumar July 27, 2017, 7:14 a.m. | #9
On 26-07-17, 23:13, Joel Fernandes (Google) wrote:
> On Wed, Jul 26, 2017 at 10:50 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> > On 26-07-17, 22:34, Joel Fernandes (Google) wrote:

> >> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> >> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

> >> >         sugov_set_iowait_boost(sg_cpu, time, flags);

> >> >         sg_cpu->last_update = time;

> >> >

> >> > -       if (!sugov_should_update_freq(sg_policy, time))

> >> > +       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

> >> >                 return;

> >>

> >> Since with the remote callbacks now possible, isn't it unsafe to

> >> modify sg_cpu and sg_policy structures without a lock in

> >> sugov_update_single?

> >>

> >> Unlike sugov_update_shared, we don't acquire any lock in

> >> sugov_update_single before updating these structures. Did I miss

> >> something?

> >

> > As Peter already mentioned it earlier, the callbacks are called with

> > rq locks held and so sugov_update_single() wouldn't get called in

> > parallel for a target CPU.

> 

> Ah ok, I have to catch up with that discussion since I missed the

> whole thing. Now that you will have me on CC, that shouldn't happen,

> thanks and sorry about the noise.

> 

> > That's the only race you were worried about ?

> 

> Yes. So then in that case, makes sense to move raw_spin_lock in

> sugov_update_shared further down? (Just discussing, this point is

> independent of your patch), Something like:


Even that was discussed tomorrow with Peter :)

No it wouldn't work because sg_cpu->util we are updating here may be
getting read from some other cpu that shares policy with sg_cpu.

-- 
viresh
Juri Lelli July 27, 2017, 7:21 a.m. | #10
Hi,

On 26/07/17 23:23, Joel Fernandes (Google) wrote:
> Hi Viresh,

> 

> On Wed, Jul 26, 2017 at 10:46 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

> > On 26-07-17, 22:14, Joel Fernandes (Google) wrote:


[...]

> >

> > But even without that, if you see the routine

> > init_entity_runnable_average() in fair.c, the new tasks are

> > initialized in a way that they are seen as heavy tasks. And so even

> > for the first time they run, freq should normally increase on the

> > target CPU (at least with above application).i

> 

> Ok, but the "heavy" in init_entity_runnable_average means for load,

> not the util_avg. The util_avg is what's used for frequency scaling

> IIUC and is set to 0 in that function no?

> 


True for init_entity_runnable_average(), but for new task post_init_
entity_util_avg() is then also called (from wake_up_new_task()), which
modifies the initial util_avg value (depending on current rq {util,
load}_avg.
Peter Zijlstra July 27, 2017, 9:10 a.m. | #11
On Thu, Jul 27, 2017 at 12:44:41PM +0530, Viresh Kumar wrote:
> Even that was discussed tomorrow with Peter :)


Just to clarify I don't have a time machine. That discussion was
_yesterday_,... I think :-)
Peter Zijlstra July 27, 2017, 9:56 a.m. | #12
On Wed, Jul 26, 2017 at 02:52:32PM +0530, Viresh Kumar wrote:
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c

> index 45fcf21ad685..bb834747e49b 100644

> --- a/kernel/sched/cpufreq_schedutil.c

> +++ b/kernel/sched/cpufreq_schedutil.c

> @@ -72,10 +72,15 @@ static DEFINE_PER_CPU(struct sugov_cpu, sugov_cpu);

>  

>  /************************ Governor internals ***********************/

>  

> -static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)

> +static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time,

> +				     int target_cpu)

>  {

>  	s64 delta_ns;

>  

> +	/* Don't allow remote callbacks */


So I think you can reduce confusion in general if we extend this comment
somewhat.

	/*
	 * Since cpufreq_update_util() is called with rq->lock held for
	 * the @target_cpu, our per-cpu data is fully serialized.
	 *
	 * However, drivers cannot in general deal with cross-cpu
	 * requests, so while get_next_freq() will work, our
	 * sugov_update_commit() -> cpufreq_driver_fast_switch()
	 * call will not.
	 *
	 * Hence stop here for remote requests, as calculating the
	 * frequency is pointless if we cannot in fact act on it.
	 */

> +	if (smp_processor_id() != target_cpu)

> +		return false;

> +

>  	if (sg_policy->work_in_progress)

>  		return false;

>
Joel Fernandes (Google) July 28, 2017, 3:34 a.m. | #13
On Thu, Jul 27, 2017 at 12:14 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 26-07-17, 23:13, Joel Fernandes (Google) wrote:

>> On Wed, Jul 26, 2017 at 10:50 PM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

>> > On 26-07-17, 22:34, Joel Fernandes (Google) wrote:

>> >> On Wed, Jul 26, 2017 at 2:22 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:

>> >> > @@ -221,7 +226,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,

>> >> >         sugov_set_iowait_boost(sg_cpu, time, flags);

>> >> >         sg_cpu->last_update = time;

>> >> >

>> >> > -       if (!sugov_should_update_freq(sg_policy, time))

>> >> > +       if (!sugov_should_update_freq(sg_policy, time, hook->cpu))

>> >> >                 return;

>> >>

>> >> Since with the remote callbacks now possible, isn't it unsafe to

>> >> modify sg_cpu and sg_policy structures without a lock in

>> >> sugov_update_single?

>> >>

>> >> Unlike sugov_update_shared, we don't acquire any lock in

>> >> sugov_update_single before updating these structures. Did I miss

>> >> something?

>> >

>> > As Peter already mentioned it earlier, the callbacks are called with

>> > rq locks held and so sugov_update_single() wouldn't get called in

>> > parallel for a target CPU.

>>

>> Ah ok, I have to catch up with that discussion since I missed the

>> whole thing. Now that you will have me on CC, that shouldn't happen,

>> thanks and sorry about the noise.

>>

>> > That's the only race you were worried about ?

>>

>> Yes. So then in that case, makes sense to move raw_spin_lock in

>> sugov_update_shared further down? (Just discussing, this point is

>> independent of your patch), Something like:

>

> Even that was discussed tomorrow with Peter :)

>

> No it wouldn't work because sg_cpu->util we are updating here may be

> getting read from some other cpu that shares policy with sg_cpu.

>


Ok. yes you are right :) thank you Viresh and Peter for the clarification.

thanks,

-Joel
Joel Fernandes (Google) July 28, 2017, 3:44 a.m. | #14
On Thu, Jul 27, 2017 at 12:21 AM, Juri Lelli <juri.lelli@arm.com> wrote:
[..]
>> >

>> > But even without that, if you see the routine

>> > init_entity_runnable_average() in fair.c, the new tasks are

>> > initialized in a way that they are seen as heavy tasks. And so even

>> > for the first time they run, freq should normally increase on the

>> > target CPU (at least with above application).i

>>

>> Ok, but the "heavy" in init_entity_runnable_average means for load,

>> not the util_avg. The util_avg is what's used for frequency scaling

>> IIUC and is set to 0 in that function no?

>>

>

> True for init_entity_runnable_average(), but for new task post_init_

> entity_util_avg() is then also called (from wake_up_new_task()), which

> modifies the initial util_avg value (depending on current rq {util,

> load}_avg.


Got it. That makes sense, thank you!

-Joel

Patch

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index d64754fb912e..df9aa1ee53ff 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -79,6 +79,10 @@  static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
        s64 delta_ns;
        bool update;
 
+       /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
+       if (!cpumask_test_cpu(smp_processor_id(), sg_policy->policy->cpus))
+               return false;
+
        if (sg_policy->work_in_progress)
                return false;
 
@@ -225,10 +229,6 @@  static void sugov_update_single(struct update_util_data *hook, u64 time,
        unsigned int next_f;
        bool busy;
 
-       /* Remote callbacks aren't allowed for policies which aren't shared */
-       if (smp_processor_id() != hook->cpu)
-               return;
-
        sugov_set_iowait_boost(sg_cpu, time, flags);
        sg_cpu->last_update = time;
 
@@ -298,14 +298,9 @@  static void sugov_update_shared(struct update_util_data *hook, u64 time,
 {
        struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);
        struct sugov_policy *sg_policy = sg_cpu->sg_policy;
-       struct cpufreq_policy *policy = sg_policy->policy;
        unsigned long util, max;
        unsigned int next_f;
 
-       /* Allow remote callbacks only on the CPUs sharing cpufreq policy */
-       if (!cpumask_test_cpu(smp_processor_id(), policy->cpus))
-               return;
-
        sugov_get_util(&util, &max, hook->cpu);
 
        raw_spin_lock(&sg_policy->update_lock);