[RFD,4/5] sched/cpufreq_schedutil: always consider all CPUs when deciding next freq

Message ID 20170324140900.7334-5-juri.lelli@arm.com
State New
Series
  • SCHED_DEADLINE freq/cpu invariance and OPP selection

Commit Message

Juri Lelli March 24, 2017, 2:08 p.m.
No assumption can be made upon the rate at which frequency updates get
triggered, as there are scheduling policies (like SCHED_DEADLINE) which
don't trigger them so frequently.

Remove such assumption from the code.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 kernel/sched/cpufreq_schedutil.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

-- 
2.10.0

Comments

Rafael J. Wysocki March 29, 2017, 10:41 p.m. | #1
On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> No assumption can be made upon the rate at which frequency updates get
> triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> don't trigger them so frequently.
>
> Remove such assumption from the code.

But the util/max values for idle CPUs may be stale, no?

Thanks,
Rafael

Juri Lelli March 31, 2017, 7:31 a.m. | #2
On 30/03/17 22:13, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> > Hi,
>
> Hi,
>
> > On 30/03/17 00:41, Rafael J. Wysocki wrote:
> >> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> >> > No assumption can be made upon the rate at which frequency updates get
> >> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> >> > don't trigger them so frequently.
> >> >
> >> > Remove such assumption from the code.
> >>
> >> But the util/max values for idle CPUs may be stale, no?
> >>
> >
> > Right, that might be a problem. A proper solution I think would be to
> > remotely update such values for idle CPUs, and I believe Vincent is
> > working on a patch for that.
> >
> > As mid-term workarounds, changing a bit the current one, come to my
> > mind:
> >
> >  - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> >  - remove CFS contribution (without triggering a freq update) when a CPU
> >    enters IDLE; this might not work well, though, as we probably want
> >    to keep in blocked util contribution for a bit
> >
> > What you think is the way to go?
>
> Well, do we want SCHED_DEADLINE util contribution to be there even for
> idle CPUs?
>

DEADLINE util contribution is removed, even if the CPU is idle, by the
reclaiming mechanism when we know (applying GRUB algorithm rules [1])
that it can't be used anymore by a task (roughly speaking). So, we
shouldn't have this problem in the DEADLINE case.

[1] https://marc.info/?l=linux-kernel&m=149029880524038

Juri Lelli March 31, 2017, 9:16 a.m. | #3
On 31/03/17 11:03, Rafael J. Wysocki wrote:
> On Fri, Mar 31, 2017 at 9:31 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> > On 30/03/17 22:13, Rafael J. Wysocki wrote:
> >> On Thu, Mar 30, 2017 at 10:58 AM, Juri Lelli <juri.lelli@arm.com> wrote:
> >> > Hi,
> >>
> >> Hi,
> >>
> >> > On 30/03/17 00:41, Rafael J. Wysocki wrote:
> >> >> On Friday, March 24, 2017 02:08:59 PM Juri Lelli wrote:
> >> >> > No assumption can be made upon the rate at which frequency updates get
> >> >> > triggered, as there are scheduling policies (like SCHED_DEADLINE) which
> >> >> > don't trigger them so frequently.
> >> >> >
> >> >> > Remove such assumption from the code.
> >> >>
> >> >> But the util/max values for idle CPUs may be stale, no?
> >> >>
> >> >
> >> > Right, that might be a problem. A proper solution I think would be to
> >> > remotely update such values for idle CPUs, and I believe Vincent is
> >> > working on a patch for that.
> >> >
> >> > As mid-term workarounds, changing a bit the current one, come to my
> >> > mind:
> >> >
> >> >  - consider TICK_NSEC (continue) only when SCHED_CPUFREQ_DL is not set
> >> >  - remove CFS contribution (without triggering a freq update) when a CPU
> >> >    enters IDLE; this might not work well, though, as we probably want
> >> >    to keep in blocked util contribution for a bit
> >> >
> >> > What you think is the way to go?
> >>
> >> Well, do we want SCHED_DEADLINE util contribution to be there even for
> >> idle CPUs?
> >>
> >
> > DEADLINE util contribution is removed, even if the CPU is idle, by the
> > reclaiming mechanism when we know (applying GRUB algorithm rules [1])
> > that it can't be used anymore by a task (roughly speaking). So, we
> > shouldn't have this problem in the DEADLINE case.
> >
> > [1] https://marc.info/?l=linux-kernel&m=149029880524038
>
> OK
>
> Why don't you store the contributions from DL and CFS separately, then
> (say, as util_dl, util_cfs, respectively) and only discard the CFS one
> if delta_ns > TICK_NSEC?

Sure, this should work as well. I'll try this approach for next version.

Thanks,

- Juri
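
For illustration only, the approach agreed on above (keep the DL and CFS
contributions separate and discard only a stale CFS one) might look roughly
like the userspace sketch below. This is not the code from any later version
of the series. The struct and function names (sketch_cpu, sketch_cpu_util,
sketch_policy_util) and the 4 ms SKETCH_TICK_NSEC value are assumptions made
up for the example; only util_dl, util_cfs, delta_ns and TICK_NSEC come from
the thread.

```c
#include <stdint.h>

/* Assumption: a 4 ms tick (HZ=250); the kernel derives TICK_NSEC from HZ. */
#define SKETCH_TICK_NSEC 4000000LL

/* Hypothetical per-CPU snapshot; not the kernel's struct sugov_cpu. */
struct sketch_cpu {
	unsigned long util_cfs;	/* CFS contribution, may be stale when idle */
	unsigned long util_dl;	/* DL contribution, removed by reclaiming */
	unsigned long max;	/* CPU capacity */
	uint64_t last_update;	/* time of the last utilization update, ns */
};

/*
 * Utilization of one CPU as seen at last_freq_update_time: the DL part is
 * always kept, the CFS part is dropped if it is older than one tick.
 */
static unsigned long sketch_cpu_util(const struct sketch_cpu *c,
				     uint64_t last_freq_update_time)
{
	int64_t delta_ns = (int64_t)(last_freq_update_time - c->last_update);
	unsigned long util = c->util_dl;

	if (delta_ns <= SKETCH_TICK_NSEC)
		util += c->util_cfs;

	return util < c->max ? util : c->max;
}

/*
 * Pick the CPU with the highest util/max ratio in the policy, comparing
 * ratios by cross-multiplication to avoid a division.
 */
static void sketch_policy_util(const struct sketch_cpu *cpus, int n,
			       uint64_t last_freq_update_time,
			       unsigned long *util, unsigned long *max)
{
	*util = 0;
	*max = 1;

	for (int i = 0; i < n; i++) {
		unsigned long j_util = sketch_cpu_util(&cpus[i],
						       last_freq_update_time);
		unsigned long j_max = cpus[i].max;

		if (j_util * *max > j_max * *util) {
			*util = j_util;
			*max = j_max;
		}
	}
}
```

With this split no CPU is skipped outright: a CPU whose last update is more
than a tick old still contributes its DL utilization, which relies on the
point made above that the reclaiming mechanism removes the DL contribution
once a task can no longer use it.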

Patch

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index da67a1cf91e7..40f30373b709 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -233,14 +233,13 @@  static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
 		 * of the CPU utilization and the last frequency update is long
-		 * enough, don't take the CPU into account as it probably is
-		 * idle now (and clear iowait_boost for it).
+		 * enough, reset iowait_boost, as it probably is not boosted
+		 * anymore now.
 		 */
 		delta_ns = last_freq_update_time - j_sg_cpu->last_update;
-		if (delta_ns > TICK_NSEC) {
+		if (delta_ns > TICK_NSEC)
 			j_sg_cpu->iowait_boost = 0;
-			continue;
-		}
+
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT)
 			return policy->cpuinfo.max_freq;