diff mbox series

[1/3] sched/fair: Prepare variables for increased precision of EAS estimated energy

Message ID 20210625152603.25960-2-lukasz.luba@arm.com
State New
Headers show
Series Improve EAS energy estimation and increase precision | expand

Commit Message

Lukasz Luba June 25, 2021, 3:26 p.m. UTC
The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up
task. It probes many possibilities and compares the estimated energy values
for different scenarios. For calculating those energy values it relies on
Energy Model (EM) data and em_cpu_energy(). The precision which is used in
EM data is in milli-Watts (or abstract scale), which sometimes is not
sufficient. In some cases it might happen that two CPUs from different
Performance Domains (PDs) get the same calculated value for a given task
placement, but in more precised scale, they might differ. This rounding
error has to be addressed. This patch prepares EAS code for better
precision in the coming EM improvements.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 kernel/sched/fair.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

Comments

Rafael J. Wysocki June 30, 2021, 5:01 p.m. UTC | #1
On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>

> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> task. It probes many possibilities and compares the estimated energy values

> for different scenarios. For calculating those energy values it relies on

> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> EM data is in milli-Watts (or abstract scale), which sometimes is not

> sufficient. In some cases it might happen that two CPUs from different

> Performance Domains (PDs) get the same calculated value for a given task

> placement, but in more precised scale, they might differ. This rounding

> error has to be addressed. This patch prepares EAS code for better

> precision in the coming EM improvements.

>

> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>


If you want me to pick up this series, this patch requires an ACK from
the scheduler maintainers.

> ---

>  kernel/sched/fair.c | 13 +++++++------

>  1 file changed, 7 insertions(+), 6 deletions(-)

>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> index 7b8990fd4896..b517c9e79768 100644

> --- a/kernel/sched/fair.c

> +++ b/kernel/sched/fair.c

> @@ -6582,7 +6582,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)

>   * to compute what would be the energy if we decided to actually migrate that

>   * task.

>   */

> -static long

> +static u64

>  compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)

>  {

>         struct cpumask *pd_mask = perf_domain_span(pd);

> @@ -6689,12 +6689,13 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)

>   */

>  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>  {

> -       unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;

>         struct root_domain *rd = cpu_rq(smp_processor_id())->rd;

> +       u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;

>         int cpu, best_energy_cpu = prev_cpu, target = -1;

> -       unsigned long cpu_cap, util, base_energy = 0;

> +       unsigned long cpu_cap, util;

>         struct sched_domain *sd;

>         struct perf_domain *pd;

> +       u64 base_energy = 0;

>

>         rcu_read_lock();

>         pd = rcu_dereference(rd->pd);

> @@ -6718,9 +6719,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>                 goto unlock;

>

>         for (; pd; pd = pd->next) {

> -               unsigned long cur_delta, spare_cap, max_spare_cap = 0;

> +               unsigned long spare_cap, max_spare_cap = 0;

>                 bool compute_prev_delta = false;

> -               unsigned long base_energy_pd;

> +               u64 base_energy_pd, cur_delta;

>                 int max_spare_cap_cpu = -1;

>

>                 for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {

> @@ -6790,7 +6791,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>          * Pick the best CPU if prev_cpu cannot be used, or if it saves at

>          * least 6% of the energy used by prev_cpu.

>          */

> -       if ((prev_delta == ULONG_MAX) ||

> +       if ((prev_delta == ULLONG_MAX) ||

>             (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))

>                 target = best_energy_cpu;

>

> --

> 2.17.1

>
Lukasz Luba June 30, 2021, 5:28 p.m. UTC | #2
On 6/30/21 6:01 PM, Rafael J. Wysocki wrote:
> On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>> task. It probes many possibilities and compares the estimated energy values

>> for different scenarios. For calculating those energy values it relies on

>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>> sufficient. In some cases it might happen that two CPUs from different

>> Performance Domains (PDs) get the same calculated value for a given task

>> placement, but in more precised scale, they might differ. This rounding

>> error has to be addressed. This patch prepares EAS code for better

>> precision in the coming EM improvements.

>>

>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

> 

> If you want me to pick up this series, this patch requires an ACK from

> the scheduler maintainers.

> 


It would be great, if you could take it after e.g. Peter ACK it.

Peter could you have a look at it, please?
In this patch 1/3 we have only variables upgrade.

Regards,
Lukasz
Lukasz Luba July 2, 2021, 7:07 p.m. UTC | #3
Hi Peter,

Gentle ping.
You might missed my previous email.

On 6/30/21 6:28 PM, Lukasz Luba wrote:
> 

> 

> On 6/30/21 6:01 PM, Rafael J. Wysocki wrote:

>> On Fri, Jun 25, 2021 at 5:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>

>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>>> task. It probes many possibilities and compares the estimated energy 

>>> values

>>> for different scenarios. For calculating those energy values it 

>>> relies on

>>> Energy Model (EM) data and em_cpu_energy(). The precision which is 

>>> used in

>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>>> sufficient. In some cases it might happen that two CPUs from different

>>> Performance Domains (PDs) get the same calculated value for a given task

>>> placement, but in more precised scale, they might differ. This rounding

>>> error has to be addressed. This patch prepares EAS code for better

>>> precision in the coming EM improvements.

>>>

>>> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

>>

>> If you want me to pick up this series, this patch requires an ACK from

>> the scheduler maintainers.

>>

> 

> It would be great, if you could take it after e.g. Peter ACK it.

> 

> Peter could you have a look at it, please?

> In this patch 1/3 we have only variables upgrade.

> 

> Regards,

> Lukasz


Could you have a look and ACK the patch 1/3, please?
Vincent Guittot July 7, 2021, 7:07 a.m. UTC | #4
On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> task. It probes many possibilities and compares the estimated energy values

> for different scenarios. For calculating those energy values it relies on

> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> EM data is in milli-Watts (or abstract scale), which sometimes is not

> sufficient. In some cases it might happen that two CPUs from different

> Performance Domains (PDs) get the same calculated value for a given task

> placement, but in more precised scale, they might differ. This rounding

> error has to be addressed. This patch prepares EAS code for better

> precision in the coming EM improvements.


Could you explain why 32bits results are not enough and you need to
move to 64bits ?

Right now the result is in the range [0..2^32[ mW. If you need more
precision and you want to return uW instead, you will have a result in
the range  [0..4kW[ which seems to be still enough

>

> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>

> ---

>  kernel/sched/fair.c | 13 +++++++------

>  1 file changed, 7 insertions(+), 6 deletions(-)

>

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c

> index 7b8990fd4896..b517c9e79768 100644

> --- a/kernel/sched/fair.c

> +++ b/kernel/sched/fair.c

> @@ -6582,7 +6582,7 @@ static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)

>   * to compute what would be the energy if we decided to actually migrate that

>   * task.

>   */

> -static long

> +static u64

>  compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)

>  {

>         struct cpumask *pd_mask = perf_domain_span(pd);

> @@ -6689,12 +6689,13 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)

>   */

>  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>  {

> -       unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;

>         struct root_domain *rd = cpu_rq(smp_processor_id())->rd;

> +       u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;

>         int cpu, best_energy_cpu = prev_cpu, target = -1;

> -       unsigned long cpu_cap, util, base_energy = 0;

> +       unsigned long cpu_cap, util;

>         struct sched_domain *sd;

>         struct perf_domain *pd;

> +       u64 base_energy = 0;

>

>         rcu_read_lock();

>         pd = rcu_dereference(rd->pd);

> @@ -6718,9 +6719,9 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>                 goto unlock;

>

>         for (; pd; pd = pd->next) {

> -               unsigned long cur_delta, spare_cap, max_spare_cap = 0;

> +               unsigned long spare_cap, max_spare_cap = 0;

>                 bool compute_prev_delta = false;

> -               unsigned long base_energy_pd;

> +               u64 base_energy_pd, cur_delta;

>                 int max_spare_cap_cpu = -1;

>

>                 for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {

> @@ -6790,7 +6791,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)

>          * Pick the best CPU if prev_cpu cannot be used, or if it saves at

>          * least 6% of the energy used by prev_cpu.

>          */

> -       if ((prev_delta == ULONG_MAX) ||

> +       if ((prev_delta == ULLONG_MAX) ||

>             (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))

>                 target = best_energy_cpu;

>

> --

> 2.17.1

>
Lukasz Luba July 7, 2021, 7:49 a.m. UTC | #5
On 7/7/21 8:07 AM, Vincent Guittot wrote:
> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>> task. It probes many possibilities and compares the estimated energy values

>> for different scenarios. For calculating those energy values it relies on

>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>> sufficient. In some cases it might happen that two CPUs from different

>> Performance Domains (PDs) get the same calculated value for a given task

>> placement, but in more precised scale, they might differ. This rounding

>> error has to be addressed. This patch prepares EAS code for better

>> precision in the coming EM improvements.

> 

> Could you explain why 32bits results are not enough and you need to

> move to 64bits ?

> 

> Right now the result is in the range [0..2^32[ mW. If you need more

> precision and you want to return uW instead, you will have a result in

> the range  [0..4kW[ which seems to be still enough

> 


Currently we have the max value limit for 'power' in EM which is
EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power
values ~64k mW (~64Watts) for an OPP. Then based on 'power' we
pre-calculate 'cost' fields:
cost[i] = power[i] * freq_max / freq[i]
So, for max freq the cost == power. Let's use that in the example.

Then the em_cpu_energy() calculates as follow:
cost * sum_util / scale_cpu
We are interested in the first part - the value of multiplication.

The sum_util values that we can see for x CPUs which have scale_cap=1024
can be close to 800, let's use it in the example:
cost * sum_util = 64k * (x * 800), where
x=4: ~200mln
x=8: ~400mln
x=16: ~800mln
x=64: ~3200mln (last one which would fit in u32)

When we increase the precision by even 100, then the above values won't
fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has
issues:
cost * sum_util = (10k *100) * (x * 800), where
x=4: ~3200mln
x=8: ~6400mln

For *1000 precision even a power of 1Watt becomes an issue:
cost * sum_util = (1k *1000) * (x * 800), where
x=4: ~3200mln
x=8: ~6400mln

That's why to make the code safe for bigger power values, I had to use
the u64 on 32bit machines.
Vincent Guittot July 7, 2021, 8 a.m. UTC | #6
On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 8:07 AM, Vincent Guittot wrote:

> > On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> >> task. It probes many possibilities and compares the estimated energy values

> >> for different scenarios. For calculating those energy values it relies on

> >> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> >> EM data is in milli-Watts (or abstract scale), which sometimes is not

> >> sufficient. In some cases it might happen that two CPUs from different

> >> Performance Domains (PDs) get the same calculated value for a given task

> >> placement, but in more precised scale, they might differ. This rounding

> >> error has to be addressed. This patch prepares EAS code for better

> >> precision in the coming EM improvements.

> >

> > Could you explain why 32bits results are not enough and you need to

> > move to 64bits ?

> >

> > Right now the result is in the range [0..2^32[ mW. If you need more

> > precision and you want to return uW instead, you will have a result in

> > the range  [0..4kW[ which seems to be still enough

> >

>

> Currently we have the max value limit for 'power' in EM which is

> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

> pre-calculate 'cost' fields:

> cost[i] = power[i] * freq_max / freq[i]

> So, for max freq the cost == power. Let's use that in the example.

>

> Then the em_cpu_energy() calculates as follow:

> cost * sum_util / scale_cpu

> We are interested in the first part - the value of multiplication.


But all these are internal computations of the energy model. At the
end, the computed energy that is returned by compute_energy() and
em_cpu_energy(), fits in a long

>

> The sum_util values that we can see for x CPUs which have scale_cap=1024

> can be close to 800, let's use it in the example:

> cost * sum_util = 64k * (x * 800), where

> x=4: ~200mln

> x=8: ~400mln

> x=16: ~800mln

> x=64: ~3200mln (last one which would fit in u32)

>

> When we increase the precision by even 100, then the above values won't

> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has

> issues:

> cost * sum_util = (10k *100) * (x * 800), where

> x=4: ~3200mln

> x=8: ~6400mln

>

> For *1000 precision even a power of 1Watt becomes an issue:

> cost * sum_util = (1k *1000) * (x * 800), where

> x=4: ~3200mln

> x=8: ~6400mln

>

> That's why to make the code safe for bigger power values, I had to use

> the u64 on 32bit machines.
Lukasz Luba July 7, 2021, 8:23 a.m. UTC | #7
On 7/7/21 9:00 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>>>> task. It probes many possibilities and compares the estimated energy values

>>>> for different scenarios. For calculating those energy values it relies on

>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>>>> sufficient. In some cases it might happen that two CPUs from different

>>>> Performance Domains (PDs) get the same calculated value for a given task

>>>> placement, but in more precised scale, they might differ. This rounding

>>>> error has to be addressed. This patch prepares EAS code for better

>>>> precision in the coming EM improvements.

>>>

>>> Could you explain why 32bits results are not enough and you need to

>>> move to 64bits ?

>>>

>>> Right now the result is in the range [0..2^32[ mW. If you need more

>>> precision and you want to return uW instead, you will have a result in

>>> the range  [0..4kW[ which seems to be still enough

>>>

>>

>> Currently we have the max value limit for 'power' in EM which is

>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

>> pre-calculate 'cost' fields:

>> cost[i] = power[i] * freq_max / freq[i]

>> So, for max freq the cost == power. Let's use that in the example.

>>

>> Then the em_cpu_energy() calculates as follow:

>> cost * sum_util / scale_cpu

>> We are interested in the first part - the value of multiplication.

> 

> But all these are internal computations of the energy model. At the

> end, the computed energy that is returned by compute_energy() and

> em_cpu_energy(), fits in a long


Let's take a look at existing *10000 precision for x CPUs:
cost * sum_util / scale_cpu =
(64k *10000) * (x * 800) / 1024
which is:
x * ~500mln

So to be close to overflowing u32 the 'x' has to be > (?=) 8
(depends on sum_util).

> 

>>

>> The sum_util values that we can see for x CPUs which have scale_cap=1024

>> can be close to 800, let's use it in the example:

>> cost * sum_util = 64k * (x * 800), where

>> x=4: ~200mln

>> x=8: ~400mln

>> x=16: ~800mln

>> x=64: ~3200mln (last one which would fit in u32)

>>

>> When we increase the precision by even 100, then the above values won't

>> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has

>> issues:

>> cost * sum_util = (10k *100) * (x * 800), where

>> x=4: ~3200mln

>> x=8: ~6400mln

>>

>> For *1000 precision even a power of 1Watt becomes an issue:

>> cost * sum_util = (1k *1000) * (x * 800), where

>> x=4: ~3200mln

>> x=8: ~6400mln

>>

>> That's why to make the code safe for bigger power values, I had to use

>> the u64 on 32bit machines.
Vincent Guittot July 7, 2021, 9:37 a.m. UTC | #8
On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 9:00 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >>

> >>

> >> On 7/7/21 8:07 AM, Vincent Guittot wrote:

> >>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>

> >>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> >>>> task. It probes many possibilities and compares the estimated energy values

> >>>> for different scenarios. For calculating those energy values it relies on

> >>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> >>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

> >>>> sufficient. In some cases it might happen that two CPUs from different

> >>>> Performance Domains (PDs) get the same calculated value for a given task

> >>>> placement, but in more precised scale, they might differ. This rounding

> >>>> error has to be addressed. This patch prepares EAS code for better

> >>>> precision in the coming EM improvements.

> >>>

> >>> Could you explain why 32bits results are not enough and you need to

> >>> move to 64bits ?

> >>>

> >>> Right now the result is in the range [0..2^32[ mW. If you need more

> >>> precision and you want to return uW instead, you will have a result in

> >>> the range  [0..4kW[ which seems to be still enough

> >>>

> >>

> >> Currently we have the max value limit for 'power' in EM which is

> >> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

> >> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

> >> pre-calculate 'cost' fields:

> >> cost[i] = power[i] * freq_max / freq[i]

> >> So, for max freq the cost == power. Let's use that in the example.

> >>

> >> Then the em_cpu_energy() calculates as follow:

> >> cost * sum_util / scale_cpu

> >> We are interested in the first part - the value of multiplication.

> >

> > But all these are internal computations of the energy model. At the

> > end, the computed energy that is returned by compute_energy() and

> > em_cpu_energy(), fits in a long

>

> Let's take a look at existing *10000 precision for x CPUs:

> cost * sum_util / scale_cpu =

> (64k *10000) * (x * 800) / 1024

> which is:

> x * ~500mln

>

> So to be close to overflowing u32 the 'x' has to be > (?=) 8

> (depends on sum_util).


Sorry but I don't get your point.
This patch is about the return type of compute_energy() and
em_cpu_energy(). And even if we decide to return uW instead of mW,
there is still a lot of margin.

It's not because you need u64 for computing intermediate value that
you must returns u64

>

> >

> >>

> >> The sum_util values that we can see for x CPUs which have scale_cap=1024

> >> can be close to 800, let's use it in the example:

> >> cost * sum_util = 64k * (x * 800), where

> >> x=4: ~200mln

> >> x=8: ~400mln

> >> x=16: ~800mln

> >> x=64: ~3200mln (last one which would fit in u32)

> >>

> >> When we increase the precision by even 100, then the above values won't

> >> fit in the u32. Even a max cost of e.g. 10k mW and 100 precision has

> >> issues:

> >> cost * sum_util = (10k *100) * (x * 800), where

> >> x=4: ~3200mln

> >> x=8: ~6400mln

> >>

> >> For *1000 precision even a power of 1Watt becomes an issue:

> >> cost * sum_util = (1k *1000) * (x * 800), where

> >> x=4: ~3200mln

> >> x=8: ~6400mln

> >>

> >> That's why to make the code safe for bigger power values, I had to use

> >> the u64 on 32bit machines.
Dietmar Eggemann July 7, 2021, 9:45 a.m. UTC | #9
On 07/07/2021 10:23, Lukasz Luba wrote:
>  

> On 7/7/21 9:00 AM, Vincent Guittot wrote:

>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>

>>>

>>>

>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:


[...]

>>>> Could you explain why 32bits results are not enough and you need to

>>>> move to 64bits ?

>>>>

>>>> Right now the result is in the range [0..2^32[ mW. If you need more

>>>> precision and you want to return uW instead, you will have a result in

>>>> the rangeĀ  [0..4kW[ which seems to be still enough

>>>>

>>>

>>> Currently we have the max value limit for 'power' in EM which is

>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

>>> pre-calculate 'cost' fields:

>>> cost[i] = power[i] * freq_max / freq[i]

>>> So, for max freq the cost == power. Let's use that in the example.

>>>

>>> Then the em_cpu_energy() calculates as follow:

>>> cost * sum_util / scale_cpu

>>> We are interested in the first part - the value of multiplication.

>>

>> But all these are internal computations of the energy model. At the

>> end, the computed energy that is returned by compute_energy() and

>> em_cpu_energy(), fits in a long

> 

> Let's take a look at existing *10000 precision for x CPUs:

> cost * sum_util / scale_cpu =

> (64k *10000) * (x * 800) / 1024

> which is:

> x * ~500mln

> 

> So to be close to overflowing u32 the 'x' has to be > (?=) 8

> (depends on sum_util).


I assume the worst case is `x * 1024` (max return value of
effective_cpu_util = effective_cpu_util()) so x ~ 6.7.

I'm not aware of any arm32 b.L. systems with > 4 CPUs in a PD.
Lukasz Luba July 7, 2021, 9:48 a.m. UTC | #10
On 7/7/21 10:37 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 9:00 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>>

>>>>

>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>

>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>>>>>> task. It probes many possibilities and compares the estimated energy values

>>>>>> for different scenarios. For calculating those energy values it relies on

>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>>>>>> sufficient. In some cases it might happen that two CPUs from different

>>>>>> Performance Domains (PDs) get the same calculated value for a given task

>>>>>> placement, but in more precised scale, they might differ. This rounding

>>>>>> error has to be addressed. This patch prepares EAS code for better

>>>>>> precision in the coming EM improvements.

>>>>>

>>>>> Could you explain why 32bits results are not enough and you need to

>>>>> move to 64bits ?

>>>>>

>>>>> Right now the result is in the range [0..2^32[ mW. If you need more

>>>>> precision and you want to return uW instead, you will have a result in

>>>>> the range  [0..4kW[ which seems to be still enough

>>>>>

>>>>

>>>> Currently we have the max value limit for 'power' in EM which is

>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

>>>> pre-calculate 'cost' fields:

>>>> cost[i] = power[i] * freq_max / freq[i]

>>>> So, for max freq the cost == power. Let's use that in the example.

>>>>

>>>> Then the em_cpu_energy() calculates as follow:

>>>> cost * sum_util / scale_cpu

>>>> We are interested in the first part - the value of multiplication.

>>>

>>> But all these are internal computations of the energy model. At the

>>> end, the computed energy that is returned by compute_energy() and

>>> em_cpu_energy(), fits in a long

>>

>> Let's take a look at existing *10000 precision for x CPUs:

>> cost * sum_util / scale_cpu =

>> (64k *10000) * (x * 800) / 1024

>> which is:

>> x * ~500mln

>>

>> So to be close to overflowing u32 the 'x' has to be > (?=) 8

>> (depends on sum_util).

> 

> Sorry but I don't get your point.

> This patch is about the return type of compute_energy() and

> em_cpu_energy(). And even if we decide to return uW instead of mW,

> there is still a lot of margin.

> 

> It's not because you need u64 for computing intermediate value that

> you must returns u64


The example above shows the need of u64 return value for platforms
which are:
- 32bit
- have e.g. 16 CPUs
- has big power value e.g. ~64k mW
Then let's to the calc:
(64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln

The returned value after applying the whole patch set
won't fit in u32 for such cluster.

We might make *assumption* that the 32bit platforms will not
have bigger number of CPUs in the cluster or won't report
big power values. But I didn't wanted to make such assumption.
Lukasz Luba July 7, 2021, 9:54 a.m. UTC | #11
On 7/7/21 10:45 AM, Dietmar Eggemann wrote:
> On 07/07/2021 10:23, Lukasz Luba wrote:

>>   

>> On 7/7/21 9:00 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>>

>>>>

>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

> 

> [...]

> 

>>>>> Could you explain why 32bits results are not enough and you need to

>>>>> move to 64bits ?

>>>>>

>>>>> Right now the result is in the range [0..2^32[ mW. If you need more

>>>>> precision and you want to return uW instead, you will have a result in

>>>>> the rangeĀ  [0..4kW[ which seems to be still enough

>>>>>

>>>>

>>>> Currently we have the max value limit for 'power' in EM which is

>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

>>>> pre-calculate 'cost' fields:

>>>> cost[i] = power[i] * freq_max / freq[i]

>>>> So, for max freq the cost == power. Let's use that in the example.

>>>>

>>>> Then the em_cpu_energy() calculates as follow:

>>>> cost * sum_util / scale_cpu

>>>> We are interested in the first part - the value of multiplication.

>>>

>>> But all these are internal computations of the energy model. At the

>>> end, the computed energy that is returned by compute_energy() and

>>> em_cpu_energy(), fits in a long

>>

>> Let's take a look at existing *10000 precision for x CPUs:

>> cost * sum_util / scale_cpu =

>> (64k *10000) * (x * 800) / 1024

>> which is:

>> x * ~500mln

>>

>> So to be close to overflowing u32 the 'x' has to be > (?=) 8

>> (depends on sum_util).

> 

> I assume the worst case is `x * 1024` (max return value of

> effective_cpu_util = effective_cpu_util()) so x ~ 6.7.

> 

> I'm not aware of any arm32 b.L. systems with > 4 CPUs in a PD.

> 


True, arm32 didn't support bigger number than 4 CPUs in the cluster.
We would be safe for them, but I don't want to break with this
assumption any other 32bit platform from competitors, which might
create such 32bit 16cores clusters.

If Peter, Vincent and you are OK to put this assumption about
max safe CPUs number, then we can get rid of patch 1/3.

But the temporary division of u64 must stay, because there is
arm32 platform which need it. So returning also u64 is not a big
harm and looks more consistent.
Vincent Guittot July 7, 2021, 9:56 a.m. UTC | #12
On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 10:37 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >>

> >>

> >> On 7/7/21 9:00 AM, Vincent Guittot wrote:

> >>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>

> >>>>

> >>>>

> >>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

> >>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>>>

> >>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> >>>>>> task. It probes many possibilities and compares the estimated energy values

> >>>>>> for different scenarios. For calculating those energy values it relies on

> >>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> >>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

> >>>>>> sufficient. In some cases it might happen that two CPUs from different

> >>>>>> Performance Domains (PDs) get the same calculated value for a given task

> >>>>>> placement, but in more precised scale, they might differ. This rounding

> >>>>>> error has to be addressed. This patch prepares EAS code for better

> >>>>>> precision in the coming EM improvements.

> >>>>>

> >>>>> Could you explain why 32bits results are not enough and you need to

> >>>>> move to 64bits ?

> >>>>>

> >>>>> Right now the result is in the range [0..2^32[ mW. If you need more

> >>>>> precision and you want to return uW instead, you will have a result in

> >>>>> the range  [0..4kW[ which seems to be still enough

> >>>>>

> >>>>

> >>>> Currently we have the max value limit for 'power' in EM which is

> >>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

> >>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

> >>>> pre-calculate 'cost' fields:

> >>>> cost[i] = power[i] * freq_max / freq[i]

> >>>> So, for max freq the cost == power. Let's use that in the example.

> >>>>

> >>>> Then the em_cpu_energy() calculates as follow:

> >>>> cost * sum_util / scale_cpu

> >>>> We are interested in the first part - the value of multiplication.

> >>>

> >>> But all these are internal computations of the energy model. At the

> >>> end, the computed energy that is returned by compute_energy() and

> >>> em_cpu_energy(), fits in a long

> >>

> >> Let's take a look at existing *10000 precision for x CPUs:

> >> cost * sum_util / scale_cpu =

> >> (64k *10000) * (x * 800) / 1024

> >> which is:

> >> x * ~500mln

> >>

> >> So to be close to overflowing u32 the 'x' has to be > (?=) 8

> >> (depends on sum_util).

> >

> > Sorry but I don't get your point.

> > This patch is about the return type of compute_energy() and

> > em_cpu_energy(). And even if we decide to return uW instead of mW,

> > there is still a lot of margin.

> >

> > It's not because you need u64 for computing intermediate value that

> > you must returns u64

>

> The example above shows the need of u64 return value for platforms

> which are:

> - 32bit

> - have e.g. 16 CPUs

> - has big power value e.g. ~64k mW

> Then let's to the calc:

> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln


so you return a power consumption of 8kW !!!

>

> The returned value after applying the whole patch set

> won't fit in u32 for such cluster.

>

> We might make *assumption* that the 32bit platforms will not

> have bigger number of CPUs in the cluster or won't report

> big power values. But I didn't wanted to make such assumption.

>

>
Lukasz Luba July 7, 2021, 10:06 a.m. UTC | #13
On 7/7/21 10:56 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 10:37 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>>

>>>>

>>>> On 7/7/21 9:00 AM, Vincent Guittot wrote:

>>>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>

>>>>>>

>>>>>>

>>>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

>>>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>>>

>>>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

>>>>>>>> task. It probes many possibilities and compares the estimated energy values

>>>>>>>> for different scenarios. For calculating those energy values it relies on

>>>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

>>>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

>>>>>>>> sufficient. In some cases it might happen that two CPUs from different

>>>>>>>> Performance Domains (PDs) get the same calculated value for a given task

>>>>>>>> placement, but in more precised scale, they might differ. This rounding

>>>>>>>> error has to be addressed. This patch prepares EAS code for better

>>>>>>>> precision in the coming EM improvements.

>>>>>>>

>>>>>>> Could you explain why 32bits results are not enough and you need to

>>>>>>> move to 64bits ?

>>>>>>>

>>>>>>> Right now the result is in the range [0..2^32[ mW. If you need more

>>>>>>> precision and you want to return uW instead, you will have a result in

>>>>>>> the range  [0..4kW[ which seems to be still enough

>>>>>>>

>>>>>>

>>>>>> Currently we have the max value limit for 'power' in EM which is

>>>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

>>>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

>>>>>> pre-calculate 'cost' fields:

>>>>>> cost[i] = power[i] * freq_max / freq[i]

>>>>>> So, for max freq the cost == power. Let's use that in the example.

>>>>>>

>>>>>> Then the em_cpu_energy() calculates as follow:

>>>>>> cost * sum_util / scale_cpu

>>>>>> We are interested in the first part - the value of multiplication.

>>>>>

>>>>> But all these are internal computations of the energy model. At the

>>>>> end, the computed energy that is returned by compute_energy() and

>>>>> em_cpu_energy(), fits in a long

>>>>

>>>> Let's take a look at existing *10000 precision for x CPUs:

>>>> cost * sum_util / scale_cpu =

>>>> (64k *10000) * (x * 800) / 1024

>>>> which is:

>>>> x * ~500mln

>>>>

>>>> So to be close to overflowing u32 the 'x' has to be > (?=) 8

>>>> (depends on sum_util).

>>>

>>> Sorry but I don't get your point.

>>> This patch is about the return type of compute_energy() and

>>> em_cpu_energy(). And even if we decide to return uW instead of mW,

>>> there is still a lot of margin.

>>>

>>> It's not because you need u64 for computing intermediate value that

>>> you must returns u64

>>

>> The example above shows the need of u64 return value for platforms

>> which are:

>> - 32bit

>> - have e.g. 16 CPUs

>> - has big power value e.g. ~64k mW

>> Then let's to the calc:

>> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln

> 

> so you return a power consumption of 8kW !!!

> 


No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts
each at max freq and 80% load.

Max power can be < 64Watts, which is 64k milli-Watts (< EM_MAX_POWER)
64k mW * 10000 --> is the 0.1uW precision
Vincent Guittot July 7, 2021, 10:11 a.m. UTC | #14
On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 10:56 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 11:48, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >>

> >>

> >> On 7/7/21 10:37 AM, Vincent Guittot wrote:

> >>> On Wed, 7 Jul 2021 at 10:23, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>

> >>>>

> >>>>

> >>>> On 7/7/21 9:00 AM, Vincent Guittot wrote:

> >>>>> On Wed, 7 Jul 2021 at 09:49, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>>>

> >>>>>>

> >>>>>>

> >>>>>> On 7/7/21 8:07 AM, Vincent Guittot wrote:

> >>>>>>> On Fri, 25 Jun 2021 at 17:26, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>>>>>

> >>>>>>>> The Energy Aware Scheduler (EAS) tries to find best CPU for a waking up

> >>>>>>>> task. It probes many possibilities and compares the estimated energy values

> >>>>>>>> for different scenarios. For calculating those energy values it relies on

> >>>>>>>> Energy Model (EM) data and em_cpu_energy(). The precision which is used in

> >>>>>>>> EM data is in milli-Watts (or abstract scale), which sometimes is not

> >>>>>>>> sufficient. In some cases it might happen that two CPUs from different

> >>>>>>>> Performance Domains (PDs) get the same calculated value for a given task

> >>>>>>>> placement, but in more precised scale, they might differ. This rounding

> >>>>>>>> error has to be addressed. This patch prepares EAS code for better

> >>>>>>>> precision in the coming EM improvements.

> >>>>>>>

> >>>>>>> Could you explain why 32bits results are not enough and you need to

> >>>>>>> move to 64bits ?

> >>>>>>>

> >>>>>>> Right now the result is in the range [0..2^32[ mW. If you need more

> >>>>>>> precision and you want to return uW instead, you will have a result in

> >>>>>>> the range  [0..4kW[ which seems to be still enough

> >>>>>>>

> >>>>>>

> >>>>>> Currently we have the max value limit for 'power' in EM which is

> >>>>>> EM_MAX_POWER 0xffff (64k - 1). We allow to register such big power

> >>>>>> values ~64k mW (~64Watts) for an OPP. Then based on 'power' we

> >>>>>> pre-calculate 'cost' fields:

> >>>>>> cost[i] = power[i] * freq_max / freq[i]

> >>>>>> So, for max freq the cost == power. Let's use that in the example.

> >>>>>>

> >>>>>> Then the em_cpu_energy() calculates as follow:

> >>>>>> cost * sum_util / scale_cpu

> >>>>>> We are interested in the first part - the value of multiplication.

> >>>>>

> >>>>> But all these are internal computations of the energy model. At the

> >>>>> end, the computed energy that is returned by compute_energy() and

> >>>>> em_cpu_energy(), fits in a long

> >>>>

> >>>> Let's take a look at existing *10000 precision for x CPUs:

> >>>> cost * sum_util / scale_cpu =

> >>>> (64k *10000) * (x * 800) / 1024

> >>>> which is:

> >>>> x * ~500mln

> >>>>

> >>>> So to be close to overflowing u32 the 'x' has to be > (?=) 8

> >>>> (depends on sum_util).

> >>>

> >>> Sorry but I don't get your point.

> >>> This patch is about the return type of compute_energy() and

> >>> em_cpu_energy(). And even if we decide to return uW instead of mW,

> >>> there is still a lot of margin.

> >>>

> >>> It's not because you need u64 for computing intermediate value that

> >>> you must returns u64

> >>

> >> The example above shows the need of u64 return value for platforms

> >> which are:

> >> - 32bit

> >> - have e.g. 16 CPUs

> >> - has big power value e.g. ~64k mW

> >> Then let's to the calc:

> >> (64k * 10000) * (16 * 800) / 1024 = ~8000mln = ~8bln

> >

> > so you return a power consumption of 8kW !!!

> >

>

> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts


Oh! you want 0.1uW precision .... This doesn't seem realistic at all.
I'm not even sure that the power model can even reach an accuracy of
1mW

> each at max freq and 80% load.

>

> Max power can be < 64Watts, which is 64k milli-Watts (< EM_MAX_POWER)

> 64k mW * 10000 --> is the 0.1uW precision

>
Lukasz Luba July 7, 2021, 10:29 a.m. UTC | #15
On 7/7/21 11:11 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>


[snip]

>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

> 

> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

> I'm not even sure that the power model can even reach an accuracy of

> 1mW

> 


True, the EM is registering platform with 1mW precision, but 1uW
precision makes more sense for internal EAS calculation. I don't
force platforms to report 1uW power, I just want to operate on
it internally. PowerCap and DTPM also operate internally on 1uW,
so it's not that unrealistic that some kernel components want
better resolution.

But as Peter suggested, we might skip 32bit platforms for this issue.
I have to discussed this internally.
Vincent Guittot July 7, 2021, 10:32 a.m. UTC | #16
On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 11:11 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

>

> [snip]

>

> >> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

> >

> > Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

> > I'm not even sure that the power model can even reach an accuracy of

> > 1mW

> >

>

> True, the EM is registering platform with 1mW precision, but 1uW


Do you mean 1uW or 0.1uW ?

> precision makes more sense for internal EAS calculation. I don't

> force platforms to report 1uW power, I just want to operate on

> it internally. PowerCap and DTPM also operate internally on 1uW,

> so it's not that unrealistic that some kernel components want

> better resolution.

>

> But as Peter suggested, we might skip 32bit platforms for this issue.

> I have to discussed this internally.
Lukasz Luba July 7, 2021, 10:41 a.m. UTC | #17
On 7/7/21 11:32 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 11:11 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>

>> [snip]

>>

>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

>>>

>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

>>> I'm not even sure that the power model can even reach an accuracy of

>>> 1mW

>>>

>>

>> True, the EM is registering platform with 1mW precision, but 1uW

> 

> Do you mean 1uW or 0.1uW ?


In this patch set I've proposed 0.1uW, but I'm open to drop one
order of magnitude. The 1uW still be good.
Vincent Guittot July 7, 2021, 10:50 a.m. UTC | #18
On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 11:32 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >>

> >>

> >> On 7/7/21 11:11 AM, Vincent Guittot wrote:

> >>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>

> >>

> >> [snip]

> >>

> >>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

> >>>

> >>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

> >>> I'm not even sure that the power model can even reach an accuracy of

> >>> 1mW

> >>>

> >>

> >> True, the EM is registering platform with 1mW precision, but 1uW

> >

> > Do you mean 1uW or 0.1uW ?

>

> In this patch set I've proposed 0.1uW, but I'm open to drop one

> order of magnitude. The 1uW still be good.


I don't want to underestimate the capabilities of the power model but
I don't see which benefit you will get with 0.1uW precision
With a 1uW precision the long type currently used for the returned
value is fine for 32bits machine AFAICT
Lukasz Luba July 7, 2021, 11:02 a.m. UTC | #19
On 7/7/21 11:50 AM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 11:32 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>>

>>>>

>>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:

>>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>

>>>>

>>>> [snip]

>>>>

>>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

>>>>>

>>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

>>>>> I'm not even sure that the power model can even reach an accuracy of

>>>>> 1mW

>>>>>

>>>>

>>>> True, the EM is registering platform with 1mW precision, but 1uW

>>>

>>> Do you mean 1uW or 0.1uW ?

>>

>> In this patch set I've proposed 0.1uW, but I'm open to drop one

>> order of magnitude. The 1uW still be good.

> 

> I don't want to underestimate the capabilities of the power model but

> I don't see which benefit you will get with 0.1uW precision

> With a 1uW precision the long type currently used for the returned

> value is fine for 32bits machine AFAICT

> 


For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:
(1200 * 1000) * (4 * 1024) = ~4.9bln
so it would need div 64 version
Vincent Guittot July 7, 2021, 1:53 p.m. UTC | #20
On Wed, 7 Jul 2021 at 13:02, Lukasz Luba <lukasz.luba@arm.com> wrote:
>

>

>

> On 7/7/21 11:50 AM, Vincent Guittot wrote:

> > On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>

> >>

> >>

> >> On 7/7/21 11:32 AM, Vincent Guittot wrote:

> >>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>

> >>>>

> >>>>

> >>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:

> >>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

> >>>>>>

> >>>>

> >>>> [snip]

> >>>>

> >>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

> >>>>>

> >>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

> >>>>> I'm not even sure that the power model can even reach an accuracy of

> >>>>> 1mW

> >>>>>

> >>>>

> >>>> True, the EM is registering platform with 1mW precision, but 1uW

> >>>

> >>> Do you mean 1uW or 0.1uW ?

> >>

> >> In this patch set I've proposed 0.1uW, but I'm open to drop one

> >> order of magnitude. The 1uW still be good.

> >

> > I don't want to underestimate the capabilities of the power model but

> > I don't see which benefit you will get with 0.1uW precision

> > With a 1uW precision the long type currently used for the returned

> > value is fine for 32bits machine AFAICT

> >

>

> For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:

> (1200 * 1000) * (4 * 1024) = ~4.9bln

> so it would need div 64 version


But as stated before, this is an internal computation step and doesn't
have to be reflected in the returned value which can stay a long
Lukasz Luba July 7, 2021, 2:25 p.m. UTC | #21
On 7/7/21 2:53 PM, Vincent Guittot wrote:
> On Wed, 7 Jul 2021 at 13:02, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>

>>

>>

>> On 7/7/21 11:50 AM, Vincent Guittot wrote:

>>> On Wed, 7 Jul 2021 at 12:41, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>

>>>>

>>>>

>>>> On 7/7/21 11:32 AM, Vincent Guittot wrote:

>>>>> On Wed, 7 Jul 2021 at 12:29, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>

>>>>>>

>>>>>>

>>>>>> On 7/7/21 11:11 AM, Vincent Guittot wrote:

>>>>>>> On Wed, 7 Jul 2021 at 12:06, Lukasz Luba <lukasz.luba@arm.com> wrote:

>>>>>>>>

>>>>>>

>>>>>> [snip]

>>>>>>

>>>>>>>> No. It's in 0.1uW scale, so 800Watts. Which is 16 CPUs * 64Watts

>>>>>>>

>>>>>>> Oh! you want 0.1uW precision .... This doesn't seem realistic at all.

>>>>>>> I'm not even sure that the power model can even reach an accuracy of

>>>>>>> 1mW

>>>>>>>

>>>>>>

>>>>>> True, the EM is registering platform with 1mW precision, but 1uW

>>>>>

>>>>> Do you mean 1uW or 0.1uW ?

>>>>

>>>> In this patch set I've proposed 0.1uW, but I'm open to drop one

>>>> order of magnitude. The 1uW still be good.

>>>

>>> I don't want to underestimate the capabilities of the power model but

>>> I don't see which benefit you will get with 0.1uW precision

>>> With a 1uW precision the long type currently used for the returned

>>> value is fine for 32bits machine AFAICT

>>>

>>

>> For 1uW and 1.2Watts for one core, 4 CPUs in cluster we get:

>> (1200 * 1000) * (4 * 1024) = ~4.9bln

>> so it would need div 64 version

> 

> But as stated before, this is an internal computation step and doesn't

> have to be reflected in the returned value which can stay a long

> 


I agree, we might scale down the result if it's too big, before the
return. We could figure this out at the EM registration point, so a
proper shift might be applied for such platform. It might enable both
32bit and 64bit platforms to avoid the rounding error. Let me experiment
with some code to check all the cases.
Thank you for the comments!
diff mbox series

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7b8990fd4896..b517c9e79768 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6582,7 +6582,7 @@  static unsigned long cpu_util_next(int cpu, struct task_struct *p, int dst_cpu)
  * to compute what would be the energy if we decided to actually migrate that
  * task.
  */
-static long
+static u64
 compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
 {
 	struct cpumask *pd_mask = perf_domain_span(pd);
@@ -6689,12 +6689,13 @@  compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd)
  */
 static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 {
-	unsigned long prev_delta = ULONG_MAX, best_delta = ULONG_MAX;
 	struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
+	u64 prev_delta = ULLONG_MAX, best_delta = ULLONG_MAX;
 	int cpu, best_energy_cpu = prev_cpu, target = -1;
-	unsigned long cpu_cap, util, base_energy = 0;
+	unsigned long cpu_cap, util;
 	struct sched_domain *sd;
 	struct perf_domain *pd;
+	u64 base_energy = 0;
 
 	rcu_read_lock();
 	pd = rcu_dereference(rd->pd);
@@ -6718,9 +6719,9 @@  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 		goto unlock;
 
 	for (; pd; pd = pd->next) {
-		unsigned long cur_delta, spare_cap, max_spare_cap = 0;
+		unsigned long spare_cap, max_spare_cap = 0;
 		bool compute_prev_delta = false;
-		unsigned long base_energy_pd;
+		u64 base_energy_pd, cur_delta;
 		int max_spare_cap_cpu = -1;
 
 		for_each_cpu_and(cpu, perf_domain_span(pd), sched_domain_span(sd)) {
@@ -6790,7 +6791,7 @@  static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
 	 * Pick the best CPU if prev_cpu cannot be used, or if it saves at
 	 * least 6% of the energy used by prev_cpu.
 	 */
-	if ((prev_delta == ULONG_MAX) ||
+	if ((prev_delta == ULLONG_MAX) ||
 	    (prev_delta - best_delta) > ((prev_delta + base_energy) >> 4))
 		target = best_energy_cpu;