[V2,0/2] cpufreq_cooling: Get effective CPU utilization from scheduler

Message ID	cover.1603448113.git.viresh.kumar@linaro.org
Headers	show Return-Path: <SRS0=XRtZ=D6=vger.kernel.org=linux-pm-owner@kernel.org> From: Viresh Kumar <viresh.kumar@linaro.org> To: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Vincent Guittot <vincent.guittot@linaro.org>, Amit Daniel Kachhap <amit.kachhap@gmail.com>, Amit Kucheria <amitk@kernel.org>, Ben Segall <bsegall@google.com>, Daniel Bristot de Oliveira <bristot@redhat.com>, Daniel Lezcano <daniel.lezcano@linaro.org>, Dietmar Eggemann <dietmar.eggemann@arm.com>, Javi Merino <javi.merino@kernel.org>, Juri Lelli <juri.lelli@redhat.com>, Mel Gorman <mgorman@suse.de>, "Rafael J. Wysocki" <rjw@rjwysocki.net>, Steven Rostedt <rostedt@goodmis.org>, Viresh Kumar <viresh.kumar@linaro.org>, Zhang Rui <rui.zhang@intel.com> Cc: linux-kernel@vger.kernel.org, Quentin Perret <qperret@google.com>, Lukasz Luba <lukasz.luba@arm.com>, linux-pm@vger.kernel.org Subject: [PATCH V2 0/2] cpufreq_cooling: Get effective CPU utilization from scheduler Date: Fri, 23 Oct 2020 15:50:19 +0530 Message-Id: <cover.1603448113.git.viresh.kumar@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	cpufreq_cooling: Get effective CPU utilization from scheduler \| expand [V2,0/2] cpufreq_cooling: Get effective CPU utilization from scheduler [V2,1/2] sched/core: Rename and move schedutil_cpu_util() to core.c [V2,2/2] thermal: cpufreq_cooling: Reuse sched_cpu_util()

Message ID

cover.1603448113.git.viresh.kumar@linaro.org

Headers

From: Viresh Kumar <viresh.kumar@linaro.org>
To: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Amit Daniel Kachhap <amit.kachhap@gmail.com>,
	Amit Kucheria <amitk@kernel.org>, Ben Segall <bsegall@google.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Javi Merino <javi.merino@kernel.org>, Juri Lelli <juri.lelli@redhat.com>,
	Mel Gorman <mgorman@suse.de>, "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Steven Rostedt <rostedt@goodmis.org>,
	Viresh Kumar <viresh.kumar@linaro.org>, Zhang Rui <rui.zhang@intel.com>
Cc: linux-kernel@vger.kernel.org, Quentin Perret <qperret@google.com>,
	Lukasz Luba <lukasz.luba@arm.com>, linux-pm@vger.kernel.org
Subject: [PATCH V2 0/2] cpufreq_cooling: Get effective CPU utilization from
	scheduler
Date: Fri, 23 Oct 2020 15:50:19 +0530
Message-Id: <cover.1603448113.git.viresh.kumar@linaro.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

cpufreq_cooling: Get effective CPU utilization from scheduler | expand

Message

Viresh Kumar Oct. 23, 2020, 10:20 a.m. UTC

Hi Peter/Rafael,

I thought about the fallback thing getting registered by scheduler with
cpufreq core, and after Peter's comment about two interactions with
cpufreq I didn't like it much. Either way we are exposing the cpu
utilization finding algorithm to rest of the kernel, through cpufreq or
otherwise.

And so kept it simple for now. Scheduler exposes a single routine,
sched_cpu_util(), along with the related enum and that's all schedutil
and cpufreq_cooling stuff want.

V1->V2:
- Name the routine as sched_cpu_util().
- Make it more self sufficient and remove few parameters that aren't
  required to be exposed anymore to rest of the kernel.
- Better cleanups in schedutil and cpufreq_cooling.

---

Schedutil and fair.c use schedutil_cpu_util() to get an idea of how busy
a CPU is. Do the same for cpufreq_cooling which uses CPU's idle time
currently to get load, which is used to calculate the current power
consumption of the CPUs, which isn't that accurate.

Tested with hackbench and sysbench on Hikey (octa-core SMP) and no
regression was observed.

--
Viresh

Viresh Kumar (2):
  sched/core: Rename and move schedutil_cpu_util() to core.c
  thermal: cpufreq_cooling: Reuse sched_cpu_util()

 drivers/thermal/cpufreq_cooling.c |  70 +++++-------------
 include/linux/sched.h             |  19 +++++
 kernel/sched/core.c               | 113 +++++++++++++++++++++++++++++
 kernel/sched/cpufreq_schedutil.c  | 116 +-----------------------------
 kernel/sched/fair.c               |   6 +-
 kernel/sched/sched.h              |  29 +-------
 6 files changed, 156 insertions(+), 197 deletions(-)

Comments

Viresh Kumar Oct. 23, 2020, 10:54 a.m. UTC | #1

On 23-10-20, 12:34, Peter Zijlstra wrote:
> On Fri, Oct 23, 2020 at 03:50:20PM +0530, Viresh Kumar wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index d2003a7d5ab5..369ff54d11d4 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5117,6 +5117,119 @@ struct task_struct *idle_task(int cpu)
> >  	return cpu_rq(cpu)->idle;
> >  }
> >  
> > +/*
> > + * This function computes an effective utilization for the given CPU, to be
> > + * used for frequency selection given the linear relation: f = u * f_max.
> > + *
> > + * The scheduler tracks the following metrics:
> > + *
> > + *   cpu_util_{cfs,rt,dl,irq}()
> > + *   cpu_bw_dl()
> > + *
> > + * Where the cfs,rt and dl util numbers are tracked with the same metric and
> > + * synchronized windows and are thus directly comparable.
> > + *
> > + * The cfs,rt,dl utilization are the running times measured with rq->clock_task
> > + * which excludes things like IRQ and steal-time. These latter are then accrued
> > + * in the irq utilization.
> > + *
> > + * The DL bandwidth number otoh is not a measured metric but a value computed
> > + * based on the task model parameters and gives the minimal utilization
> > + * required to meet deadlines.
> > + */
> > +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > +				 unsigned long max, enum cpu_util_type type,
> > +				 struct task_struct *p)
> > +{
> 	...
> > +}
> > +
> > +unsigned long sched_cpu_util(int cpu, enum cpu_util_type type,
> > +			     unsigned long max)
> > +{
> > +	return effective_cpu_util(cpu, cpu_util_cfs(cpu_rq(cpu)), max, type,
> > +				  NULL);
> > +}
> 
> Shouldn't all that be: #ifdef CONFIG_SMP ?

I didn't realize that these matrices are only available in case of SMP
and that's why schedutil isn't available for !SMP. I wonder what we
should be doing in cpufreq_cooling now ? Make it depend on SMP ? Or
calculate load the traditional way (the stuff I just removed) for !SMP
case ?

:)

Lukasz Luba Oct. 23, 2020, 11:08 a.m. UTC | #2

On 10/23/20 11:54 AM, Viresh Kumar wrote:
> On 23-10-20, 12:34, Peter Zijlstra wrote:
>> On Fri, Oct 23, 2020 at 03:50:20PM +0530, Viresh Kumar wrote:
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index d2003a7d5ab5..369ff54d11d4 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5117,6 +5117,119 @@ struct task_struct *idle_task(int cpu)
>>>   	return cpu_rq(cpu)->idle;
>>>   }
>>>   
>>> +/*
>>> + * This function computes an effective utilization for the given CPU, to be
>>> + * used for frequency selection given the linear relation: f = u * f_max.
>>> + *
>>> + * The scheduler tracks the following metrics:
>>> + *
>>> + *   cpu_util_{cfs,rt,dl,irq}()
>>> + *   cpu_bw_dl()
>>> + *
>>> + * Where the cfs,rt and dl util numbers are tracked with the same metric and
>>> + * synchronized windows and are thus directly comparable.
>>> + *
>>> + * The cfs,rt,dl utilization are the running times measured with rq->clock_task
>>> + * which excludes things like IRQ and steal-time. These latter are then accrued
>>> + * in the irq utilization.
>>> + *
>>> + * The DL bandwidth number otoh is not a measured metric but a value computed
>>> + * based on the task model parameters and gives the minimal utilization
>>> + * required to meet deadlines.
>>> + */
>>> +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>> +				 unsigned long max, enum cpu_util_type type,
>>> +				 struct task_struct *p)
>>> +{
>> 	...
>>> +}
>>> +
>>> +unsigned long sched_cpu_util(int cpu, enum cpu_util_type type,
>>> +			     unsigned long max)
>>> +{
>>> +	return effective_cpu_util(cpu, cpu_util_cfs(cpu_rq(cpu)), max, type,
>>> +				  NULL);
>>> +}
>>
>> Shouldn't all that be: #ifdef CONFIG_SMP ?
> 
> I didn't realize that these matrices are only available in case of SMP
> and that's why schedutil isn't available for !SMP. I wonder what we
> should be doing in cpufreq_cooling now ? Make it depend on SMP ? Or
> calculate load the traditional way (the stuff I just removed) for !SMP
> case ?

IMO the !SMP can leave with the old design, so keeping two
implementations under #ifdef CONFIG_SMP is fair I would say in this
case.

There are popular platforms !SMP (BeagleBone, RPi1, RPiZero) but I
haven't heard anyone was using IPA on them.

Regards,
Lukasz

> 
> :)
>

Vincent Guittot Oct. 23, 2020, 12:34 p.m. UTC | #3

On Fri, 23 Oct 2020 at 12:54, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 23-10-20, 12:34, Peter Zijlstra wrote:
> > On Fri, Oct 23, 2020 at 03:50:20PM +0530, Viresh Kumar wrote:
> > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > > index d2003a7d5ab5..369ff54d11d4 100644
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -5117,6 +5117,119 @@ struct task_struct *idle_task(int cpu)
> > >     return cpu_rq(cpu)->idle;
> > >  }
> > >
> > > +/*
> > > + * This function computes an effective utilization for the given CPU, to be
> > > + * used for frequency selection given the linear relation: f = u * f_max.
> > > + *
> > > + * The scheduler tracks the following metrics:
> > > + *
> > > + *   cpu_util_{cfs,rt,dl,irq}()
> > > + *   cpu_bw_dl()
> > > + *
> > > + * Where the cfs,rt and dl util numbers are tracked with the same metric and
> > > + * synchronized windows and are thus directly comparable.
> > > + *
> > > + * The cfs,rt,dl utilization are the running times measured with rq->clock_task
> > > + * which excludes things like IRQ and steal-time. These latter are then accrued
> > > + * in the irq utilization.
> > > + *
> > > + * The DL bandwidth number otoh is not a measured metric but a value computed
> > > + * based on the task model parameters and gives the minimal utilization
> > > + * required to meet deadlines.
> > > + */
> > > +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
> > > +                            unsigned long max, enum cpu_util_type type,
> > > +                            struct task_struct *p)
> > > +{
> >       ...
> > > +}
> > > +
> > > +unsigned long sched_cpu_util(int cpu, enum cpu_util_type type,
> > > +                        unsigned long max)
> > > +{
> > > +   return effective_cpu_util(cpu, cpu_util_cfs(cpu_rq(cpu)), max, type,
> > > +                             NULL);
> > > +}
> >
> > Shouldn't all that be: #ifdef CONFIG_SMP ?
>
> I didn't realize that these matrices are only available in case of SMP
> and that's why schedutil isn't available for !SMP. I wonder what we

Maybe it's time to make sched_util and pelt available for !SMP too.

With util_est and uclamp, I can see some benefits for !SMP compare to ondemand

> should be doing in cpufreq_cooling now ? Make it depend on SMP ? Or
> calculate load the traditional way (the stuff I just removed) for !SMP
> case ?
>
> :)
>
> --
> viresh

Viresh Kumar Oct. 28, 2020, 5:49 a.m. UTC | #4

On 23-10-20, 14:34, Vincent Guittot wrote:
> Maybe it's time to make sched_util and pelt available for !SMP too.

> 

> With util_est and uclamp, I can see some benefits for !SMP compare to ondemand


That's a decision you guys (sched maintainers) need to make :)

-- 
viresh