[RFC,5/7] sched: cfs: cpu frequency scaling arch functions

Message ID 1413958051-7103-6-git-send-email-mturquette@linaro.org
State New

Commit Message

Mike Turquette Oct. 22, 2014, 6:07 a.m.
arch_eval_cpu_freq and arch_scale_cpu_freq are added to allow the
scheduler to evaluate if cpu frequency should change and to invoke that
change from a safe context.

They are weakly defined arch functions that do nothing by default. A
CPUfreq governor could use these functions to implement a frequency
scaling policy based on updates to per-task statistics or updates to
per-cpu utilization.
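
The split between evaluating and scaling could look roughly like the sketch below. It is only an illustration, not what this series implements: the demo_* names, the per-cpu bookkeeping, the bitmask standing in for struct cpumask and the "frequency proportional to utilization" policy are all assumptions made for the example.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS	4
#define DEMO_UTIL_SCALE	1024u
#define DEMO_MAX_KHZ	2000000u

/* Hypothetical per-cpu bookkeeping; nothing like this exists in the patch. */
struct demo_cpu {
	unsigned int util;	/* utilization, 0..DEMO_UTIL_SCALE */
	unsigned int cur_khz;	/* frequency currently programmed */
	unsigned int want_khz;	/* frequency chosen by the eval step */
	bool pending;		/* want_khz still has to be applied */
};

struct demo_cpu demo_cpus[DEMO_NR_CPUS];

/*
 * Eval step: cheap enough for a scheduler hot path. It only records a
 * request. A plain bitmask stands in for struct cpumask here.
 */
void demo_eval_cpu_freq(unsigned long cpus)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_cpu *c = &demo_cpus[cpu];
		unsigned int util = c->util;
		unsigned long long want;

		if (!(cpus & (1UL << cpu)))
			continue;
		if (util > DEMO_UTIL_SCALE)
			util = DEMO_UTIL_SCALE;

		/* Assumed policy: frequency proportional to utilization. */
		want = (unsigned long long)DEMO_MAX_KHZ * util / DEMO_UTIL_SCALE;
		if (want != c->cur_khz) {
			c->want_khz = (unsigned int)want;
			c->pending = true;
		}
	}
}

/* Scale step: called later, from a context where a change is safe. */
void demo_scale_cpu_freq(void)
{
	for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		struct demo_cpu *c = &demo_cpus[cpu];

		if (!c->pending)
			continue;
		c->cur_khz = c->want_khz;	/* stand-in for the hardware request */
		c->pending = false;
	}
}
```

The point of the split is that the eval step never touches hardware, so it can run wherever the statistics are updated, while the scale step runs only where a frequency transition is allowed.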

As discussed at Linux Plumbers Conference 2014, the goal will be to
focus on a single cpu frequency scaling policy that works for everyone.
That may mean that the weak arch function definitions can be removed
entirely, with a single policy implementing that logic for all
architectures.

Not-signed-off-by: Mike Turquette <mturquette@linaro.org>
---
 kernel/sched/fair.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Rik van Riel Oct. 22, 2014, 8:06 p.m. | #1
On 10/22/2014 02:07 AM, Mike Turquette wrote:
> arch_eval_cpu_freq and arch_scale_cpu_freq are added to allow the 
> scheduler to evaluate if cpu frequency should change and to invoke
> that change from a safe context.
> 
> They are weakly defined arch functions that do nothing by default.
> A CPUfreq governor could use these functions to implement a
> frequency scaling policy based on updates to per-task statistics or
> updates to per-cpu utilization.
> 
> As discussed at Linux Plumbers Conference 2014, the goal will be
> to focus on a single cpu frequency scaling policy that works for
> everyone. That may mean that the weak arch functions definitions
> can be removed entirely and a single policy implements that logic
> for all architectures.

On virtual machines, we probably want to use both frequency and
steal time to calculate the factor.

-- 
All rights reversed
Mike Turquette Oct. 22, 2014, 11:20 p.m. | #2
On Wed, Oct 22, 2014 at 1:06 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/22/2014 02:07 AM, Mike Turquette wrote:
>> arch_eval_cpu_freq and arch_scale_cpu_freq are added to allow the
>> scheduler to evaluate if cpu frequency should change and to invoke
>> that change from a safe context.
>>
>> They are weakly defined arch functions that do nothing by default.
>> A CPUfreq governor could use these functions to implement a
>> frequency scaling policy based on updates to per-task statistics or
>> updates to per-cpu utilization.
>>
>> As discussed at Linux Plumbers Conference 2014, the goal will be
>> to focus on a single cpu frequency scaling policy that works for
>> everyone. That may mean that the weak arch functions definitions
>> can be removed entirely and a single policy implements that logic
>> for all architectures.
>
> On virtual machines, we probably want to use both frequency and
> steal time to calculate the factor.

You mean for calculating desired cpu frequency on a virtual guest? Is
that something we want to do?

Thanks,
Mike

Rik van Riel Oct. 23, 2014, 1:42 a.m. | #3
On 10/22/2014 07:20 PM, Mike Turquette wrote:
> On Wed, Oct 22, 2014 at 1:06 PM, Rik van Riel <riel@redhat.com> wrote:
>> On 10/22/2014 02:07 AM, Mike Turquette wrote:
>>> arch_eval_cpu_freq and arch_scale_cpu_freq are added to allow
>>> the scheduler to evaluate if cpu frequency should change and
>>> to invoke that change from a safe context.
>>>
>>> They are weakly defined arch functions that do nothing by
>>> default. A CPUfreq governor could use these functions to
>>> implement a frequency scaling policy based on updates to
>>> per-task statistics or updates to per-cpu utilization.
>>>
>>> As discussed at Linux Plumbers Conference 2014, the goal will
>>> be to focus on a single cpu frequency scaling policy that
>>> works for everyone. That may mean that the weak arch
>>> functions definitions can be removed entirely and a single
>>> policy implements that logic for all architectures.
>>
>> On virtual machines, we probably want to use both frequency and
>> steal time to calculate the factor.
>
> You mean for calculating desired cpu frequency on a virtual
> guest? Is that something we want to do?

A guest will be unable to set the cpu frequency, but it should
know what the frequency is, so it can take the capacity of each
CPU into account when doing things like load balancing.

This has little impact on this patch series; the impact is more
in the load balancer, which can see how much compute capacity is
available on each CPU and adjust the load accordingly.

I have seen some code come by that adjusts each cpu's compute_capacity,
but do not remember whether it looks at cpu frequency, and am pretty
sure it does not look at steal time currently :)
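
To make the idea concrete, here is a minimal sketch of a capacity factor that folds in both frequency and steal time. Everything in it is an assumption for illustration: the demo_* names, the 1024 scale, and the choice to simply multiply the two ratios. It is not taken from any existing scheduler code.

```c
#include <stdint.h>

#define DEMO_CAPACITY_SCALE 1024u

/*
 * Hypothetical helper: scale a cpu's capacity by the fraction of its
 * maximum frequency it is running at, then by the fraction of the
 * sampling window that was not stolen by the hypervisor.
 */
unsigned long demo_vcpu_capacity(uint32_t cur_khz, uint32_t max_khz,
				 uint64_t steal_ns, uint64_t window_ns)
{
	uint64_t cap;

	/* Without usable inputs, fall back to full capacity. */
	if (max_khz == 0 || window_ns == 0)
		return DEMO_CAPACITY_SCALE;

	/* Clamp inputs so the result never exceeds the scale. */
	if (cur_khz > max_khz)
		cur_khz = max_khz;
	if (steal_ns > window_ns)
		steal_ns = window_ns;

	/* Frequency factor: cur/max, scaled to 0..1024. */
	cap = (uint64_t)DEMO_CAPACITY_SCALE * cur_khz / max_khz;

	/* Steal factor: share of the window the VCPU could actually run. */
	cap = cap * (window_ns - steal_ns) / window_ns;

	return (unsigned long)cap;
}
```

For example, a VCPU at half its maximum frequency that lost a quarter of the window to steal time ends up at 384 out of 1024 with this formula.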

-- 
All rights reversed
Mike Galbraith Oct. 23, 2014, 2:12 a.m. | #4
On Wed, 2014-10-22 at 21:42 -0400, Rik van Riel wrote: 
> On 10/22/2014 07:20 PM, Mike Turquette wrote:
> > On Wed, Oct 22, 2014 at 1:06 PM, Rik van Riel <riel@redhat.com> wrote:
> > > On 10/22/2014 02:07 AM, Mike Turquette wrote:
> > > > arch_eval_cpu_freq and arch_scale_cpu_freq are added to allow
> > > > the scheduler to evaluate if cpu frequency should change and
> > > > to invoke that change from a safe context.
> > > >
> > > > They are weakly defined arch functions that do nothing by
> > > > default. A CPUfreq governor could use these functions to
> > > > implement a frequency scaling policy based on updates to
> > > > per-task statistics or updates to per-cpu utilization.
> > > >
> > > > As discussed at Linux Plumbers Conference 2014, the goal will
> > > > be to focus on a single cpu frequency scaling policy that
> > > > works for everyone. That may mean that the weak arch
> > > > functions definitions can be removed entirely and a single
> > > > policy implements that logic for all architectures.
> > >
> > > On virtual machines, we probably want to use both frequency and
> > > steal time to calculate the factor.
> >
> > You mean for calculating desired cpu frequency on a virtual
> > guest? Is that something we want to do?
>
> A guest will be unable to set the cpu frequency, but it should
> know what the frequency is, so it can take the capacity of each
> CPU into account when doing things like load balancing.

Hm.  Why does using vaporite freq/capacity/whatever make any sense?  The
silicon under the V(aporite)PU can/does change at the drop of a hat, no?

-Mike
Rik van Riel Oct. 23, 2014, 2:42 a.m. | #5
On 10/22/2014 10:12 PM, Mike Galbraith wrote:
> On Wed, 2014-10-22 at 21:42 -0400, Rik van Riel wrote:
>> On 10/22/2014 07:20 PM, Mike Turquette wrote:
>>> On Wed, Oct 22, 2014 at 1:06 PM, Rik van Riel <riel@redhat.com> wrote:
>>>> On 10/22/2014 02:07 AM, Mike Turquette wrote:
>>>>> arch_eval_cpu_freq and arch_scale_cpu_freq are added to
>>>>> allow the scheduler to evaluate if cpu frequency should
>>>>> change and to invoke that change from a safe context.
>>>>>
>>>>> They are weakly defined arch functions that do nothing
>>>>> by default. A CPUfreq governor could use these functions
>>>>> to implement a frequency scaling policy based on updates
>>>>> to per-task statistics or updates to per-cpu
>>>>> utilization.
>>>>>
>>>>> As discussed at Linux Plumbers Conference 2014, the goal
>>>>> will be to focus on a single cpu frequency scaling policy
>>>>> that works for everyone. That may mean that the weak
>>>>> arch functions definitions can be removed entirely and a
>>>>> single policy implements that logic for all
>>>>> architectures.
>>>>
>>>> On virtual machines, we probably want to use both frequency and
>>>> steal time to calculate the factor.
>>>
>>> You mean for calculating desired cpu frequency on a virtual
>>> guest? Is that something we want to do?
>>
>> A guest will be unable to set the cpu frequency, but it should
>> know what the frequency is, so it can take the capacity of each
>> CPU into account when doing things like load balancing.
>
> Hm.  Why does using vaporite freq/capacity/whatever make any sense,
> the silicon under the V(aporite)PU can/does change at the drop of a
> hat, no?

It can, but IIRC that should cause the kvmclock data for that VCPU
to be regenerated, and the VCPU should be able to use that to figure
out that the frequency changed the next time it runs the scheduler
code on that VCPU.
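
A rough sketch of that detection step follows. It assumes the guest can read a per-VCPU record carrying a version counter (odd while the host is mid-update) and a TSC-to-nanoseconds multiplier and shift, loosely modeled on the pvclock time info. The demo_* struct, cache and function names are hypothetical, and the memory barriers and version re-check a real reader needs are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-VCPU time data. */
struct demo_vcpu_time {
	uint32_t version;		/* odd while the host is updating */
	uint32_t tsc_to_system_mul;	/* TSC ticks -> ns multiplier */
	int8_t tsc_shift;		/* pre-shift applied to the TSC delta */
};

/* What the guest remembered from the last time it looked. */
struct demo_freq_cache {
	uint32_t mul;
	int8_t shift;
	bool valid;
};

/* Derive a frequency in kHz from the multiplier/shift pair. */
uint64_t demo_tsc_khz(uint32_t mul, int8_t shift)
{
	uint64_t khz = 1000000ULL << 32;

	if (mul == 0)
		return 0;
	khz /= mul;
	if (shift < 0)
		khz <<= -shift;
	else
		khz >>= shift;
	return khz;
}

/*
 * Return true if the scaling parameters differ from the cached ones,
 * i.e. the frequency under this VCPU appears to have changed. The cache
 * is refreshed on every successful read.
 */
bool demo_freq_changed(const struct demo_vcpu_time *t,
		       struct demo_freq_cache *cache)
{
	bool changed;

	if (t->version & 1)
		return false;	/* update in progress, look again later */

	changed = cache->valid &&
		  (cache->mul != t->tsc_to_system_mul ||
		   cache->shift != t->tsc_shift);

	cache->mul = t->tsc_to_system_mul;
	cache->shift = t->tsc_shift;
	cache->valid = true;
	return changed;
}
```

A scheduler-side caller would run the check when it next executes on that VCPU and, if it reports a change, recompute whatever capacity value depends on the frequency.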

-- 
All rights reversed

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0930ad8..1af6f6d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2265,6 +2265,8 @@  static u32 __compute_runnable_contrib(u64 n)
 }
 
 unsigned long arch_scale_load_capacity(int cpu);
+void arch_eval_cpu_freq(struct cpumask *cpus);
+void arch_scale_cpu_freq(void);
 
 /*
  * We can represent the historical contribution to runnable average as the
@@ -5805,6 +5807,16 @@  unsigned long __weak arch_scale_load_capacity(int cpu)
 	return default_scale_load_capacity(cpu);
 }
 
+void __weak arch_eval_cpu_freq(struct cpumask *cpus)
+{
+	return;
+}
+
+void __weak arch_scale_cpu_freq(void)
+{
+	return;
+}
+
 static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);