diff mbox

[5/6] sched/fair: Get rid of scaling utilization by capacity_orig

Message ID 55EDDD5A.70904@arm.com
State New
Headers show

Commit Message

Dietmar Eggemann Sept. 7, 2015, 6:54 p.m. UTC
On 07/09/15 17:21, Vincent Guittot wrote:
> On 7 September 2015 at 17:37, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 04/09/15 00:51, Steve Muckle wrote:
>>> Hi Morten, Dietmar,
>>>
>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>> ...
>>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>
>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>> used directly as a capacity value, should it be changed to
>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>> always be or if they can be combined.
>>
>> You're referring to the code line
>>
>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>
>> in __update_load_avg()?
>>
>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>> load related.
> 
> I agree with Steve that there is an issue from a unit point of view
> 
> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
> load because of << SCHED_LOAD_SHIFT)
> 
> Before this patch , the translation from load to capacity unit was
> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
> 
> So you still have to change the unit from load to capacity with a "/
> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
> 
> sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) /SCHED_LOAD_SCALE *
> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX = (sa->util_sum <<
> SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
stuff gets re-enabled again.

It's not really about utilization or capacity units but rather about using the same
SCALE/SHIFT values for both sides, right?

I always thought that scale_load_down() takes care of that.

So shouldn't:


fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?

I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so that we can
assume that load/util and capacity are always using 1024/10.

Cheers,

-- Dietmar

> 
> 
> Regards,
> Vincent
> 
> 
>>
>> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
>> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
>> scale_load_down() are also NOPs so this area is probably
>> worth a separate clean-up.
>> Beyond that, I'm not sure if the current functionality is
>> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>>

[...]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Comments

Peter Zijlstra Sept. 7, 2015, 7:47 p.m. UTC | #1
On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so that we can
> assume that load/util and capacity are always using 1024/10.

Ha!, I just requested Google look into moving it to 20 again ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 7:22 a.m. UTC | #2
On 7 September 2015 at 20:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 07/09/15 17:21, Vincent Guittot wrote:
>> On 7 September 2015 at 17:37, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 04/09/15 00:51, Steve Muckle wrote:
>>>> Hi Morten, Dietmar,
>>>>
>>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>>> ...
>>>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>>
>>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>>> used directly as a capacity value, should it be changed to
>>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>>> always be or if they can be combined.
>>>
>>> You're referring to the code line
>>>
>>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>>
>>> in __update_load_avg()?
>>>
>>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>>> load related.
>>
>> I agree with Steve that there is an issue from a unit point of view
>>
>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>> load because of << SCHED_LOAD_SHIFT)
>>
>> Before this patch , the translation from load to capacity unit was
>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>
>> So you still have to change the unit from load to capacity with a "/
>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>
>> sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) /SCHED_LOAD_SCALE *
>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX = (sa->util_sum <<
>> SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>
> I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
> stuff gets re-enabled again.
>
> It's not really about utilization or capacity units but rather about using the same
> SCALE/SHIFT values for both sides, right?

It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
scale the value in the right range. In the case of cpu_usage which
returns sa->util_avg , it's the capacity range not the load range.

>
> I always thought that scale_load_down() takes care of that.

AFAIU, scale_load_down is a way to increase the resolution  of the
load not to move from load to capacity

>
> So shouldn't:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 3445d2fb38f4..b80f799aface 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>                         cfs_rq->runnable_load_avg =
>                                 div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>                 }
> -               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +               sa->util_avg = (sa->util_sum * scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
>         }
>
>         return decayed;
>
> fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?


No, but
sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
will fix the unit issue.
I agree that i don't change the result because both SCHED_LOAD_SHIFT
and SCHED_CAPACITY_SHIFT are set to 10 but as mentioned above, they
are set separately so it can make the difference if someone change one
SHIFT value.

Regards,
Vincent

>
> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so that we can
> assume that load/util and capacity are always using 1024/10.
>
> Cheers,
>
> -- Dietmar
>
>>
>>
>> Regards,
>> Vincent
>>
>>
>>>
>>> LOAD (UTIL) and CAPACITY have the same SCALE and SHIFT values because
>>> SCHED_LOAD_RESOLUTION is always defined to 0. scale_load() and
>>> scale_load_down() are also NOPs so this area is probably
>>> worth a separate clean-up.
>>> Beyond that, I'm not sure if the current functionality is
>>> broken if we use different SCALE and SHIFT values for LOAD and CAPACITY?
>>>
>
> [...]
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 8, 2015, 11:44 a.m. UTC | #3
On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
> I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
> stuff gets re-enabled again.

Paul, Ben, gentle reminder to look at re-enabling this.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 8, 2015, 12:26 p.m. UTC | #4
On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> No, but
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> will fix the unit issue.

Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.

And where load_sum already gets a factor 1024 from the weight
multiplication, util_sum does not get such a factor, and all the scaling
we do on it loose bits.

So at the moment we go compute the util_avg value, we need to inflate
util_sum with an extra factor 1024 in order to make it work.

And seeing that we do the shift up on sa->util_sum without consideration
of overflow, would it not make sense to add that factor before the
scaling and into the addition?

Now, given all that, units are a complete mess here, and I'd not mind
something like:

#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
#error "something usefull"
#endif

somewhere near here.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Dietmar Eggemann Sept. 8, 2015, 12:47 p.m. UTC | #5
On 07/09/15 20:47, Peter Zijlstra wrote:
> On Mon, Sep 07, 2015 at 07:54:18PM +0100, Dietmar Eggemann wrote:
>> I would vote for removing this SCHED_LOAD_RESOLUTION thing completely so that we can
>> assume that load/util and capacity are always using 1024/10.
> 
> Ha!, I just requested Google look into moving it to 20 again ;-)
> 

In this case Steve and Vincent have a point here.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Dietmar Eggemann Sept. 8, 2015, 12:50 p.m. UTC | #6
On 08/09/15 08:22, Vincent Guittot wrote:
> On 7 September 2015 at 20:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 07/09/15 17:21, Vincent Guittot wrote:
>>> On 7 September 2015 at 17:37, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>> On 04/09/15 00:51, Steve Muckle wrote:
>>>>> Hi Morten, Dietmar,
>>>>>
>>>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>>>> ...
>>>>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>>>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>>>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>>>
>>>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>>>> used directly as a capacity value, should it be changed to
>>>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>>>> always be or if they can be combined.
>>>>
>>>> You're referring to the code line
>>>>
>>>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>>>
>>>> in __update_load_avg()?
>>>>
>>>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>>>> load related.
>>>
>>> I agree with Steve that there is an issue from a unit point of view
>>>
>>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>>> load because of << SCHED_LOAD_SHIFT)
>>>
>>> Before this patch , the translation from load to capacity unit was
>>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>>
>>> So you still have to change the unit from load to capacity with a "/
>>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>>
>>> sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) /SCHED_LOAD_SCALE *
>>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX = (sa->util_sum <<
>>> SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>>
>> I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
>> stuff gets re-enabled again.
>>
>> It's not really about utilization or capacity units but rather about using the same
>> SCALE/SHIFT values for both sides, right?
> 
> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> scale the value in the right range. In the case of cpu_usage which
> returns sa->util_avg , it's the capacity range not the load range.

Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
CAPACITY have no unit.

I agree that with the current patch-set we have a SHIFT/SCALE problem
once SCHED_LOAD_RESOLUTION is set to != 0.

> 
>>
>> I always thought that scale_load_down() takes care of that.
> 
> AFAIU, scale_load_down is a way to increase the resolution  of the
> load not to move from load to capacity

IMHO, increasing the resolution of the load is done by re-enabling this
define SCHED_LOAD_RESOLUTION  10 thing (or by setting
SCHED_LOAD_RESOLUTION to something else than 0).

I tried to figure out why we have this issue when comparing UTIL w/
CAPACITY and not LOAD w/ CAPACITY:

Both are initialized like that:

 sa->load_avg = scale_load_down(se->load.weight);
 sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
 sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
 sa->util_sum = LOAD_AVG_MAX;

and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
long weight' argument to call __update_load_avg() making sure the
scaling differences between LOAD and CAPACITY are respected while
updating sa->load_sum (and sa->load_avg).

OTAH, we don't apply a scale_load_down for sa->util_[sum/avg] only a '<<
SCHED_LOAD_SHIFT) / LOAD_AVG_MAX' on sa->util_avg.
So changing '<< SCHED_LOAD_SHIFT' to '*
scale_load_down(SCHED_LOAD_SCALE)' would be the logical thing to do.

I agree that '<< SCHED_CAPACITY_SHIFT' would have the same effect but
why using a CAPACITY related thing on the LOAD/UTIL side? The only
reason would be the unit problem which I don't understand.

> 
>>
>> So shouldn't:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 3445d2fb38f4..b80f799aface 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>                         cfs_rq->runnable_load_avg =
>>                                 div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>                 }
>> -               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>> +               sa->util_avg = (sa->util_sum * scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
>>         }
>>
>>         return decayed;
>>
>> fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?
> 
> 
> No, but
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> will fix the unit issue.
> I agree that i don't change the result because both SCHED_LOAD_SHIFT
> and SCHED_CAPACITY_SHIFT are set to 10 but as mentioned above, they
> are set separately so it can make the difference if someone change one
> SHIFT value.

SCHED_LOAD_SHIFT and SCHED_CAPACITY_SHIFT can be set separately but the
way to change SCHED_LOAD_SHIFT is by re-enabling the define
SCHED_LOAD_RESOLUTION 10 in kernel/sched/sched.h. I guess nobody wants
to change SCHED_CAPACITY_[SHIFT/SCALE].

Cheers,

-- Dietmar

[...]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 1:39 p.m. UTC | #7
On 8 September 2015 at 14:26, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> No, but
>> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> will fix the unit issue.
>
> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>
> And where load_sum already gets a factor 1024 from the weight
> multiplication, util_sum does not get such a factor, and all the scaling
> we do on it loose bits.

fair point

>
> So at the moment we go compute the util_avg value, we need to inflate
> util_sum with an extra factor 1024 in order to make it work.
>
> And seeing that we do the shift up on sa->util_sum without consideration
> of overflow, would it not make sense to add that factor before the
> scaling and into the addition?

Yes this should save 1 left shift and 1 right shift

 >>

>
> Now, given all that, units are a complete mess here, and I'd not mind
> something like:
>
> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> #error "something usefull"
> #endif

In this case why not simply doing
#define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT
or the opposite ?

>
> somewhere near here.
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 2:01 p.m. UTC | #8
On 8 September 2015 at 14:50, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
> On 08/09/15 08:22, Vincent Guittot wrote:
>> On 7 September 2015 at 20:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>> On 07/09/15 17:21, Vincent Guittot wrote:
>>>> On 7 September 2015 at 17:37, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>>> On 04/09/15 00:51, Steve Muckle wrote:
>>>>>> Hi Morten, Dietmar,
>>>>>>
>>>>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>>>>> ...
>>>>>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>>>>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>>>>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>>>>
>>>>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>>>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>>>>> used directly as a capacity value, should it be changed to
>>>>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>>>>> always be or if they can be combined.
>>>>>
>>>>> You're referring to the code line
>>>>>
>>>>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>>>>
>>>>> in __update_load_avg()?
>>>>>
>>>>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>>>>> load related.
>>>>
>>>> I agree with Steve that there is an issue from a unit point of view
>>>>
>>>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>>>> load because of << SCHED_LOAD_SHIFT)
>>>>
>>>> Before this patch , the translation from load to capacity unit was
>>>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>>>
>>>> So you still have to change the unit from load to capacity with a "/
>>>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>>>
>>>> sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) /SCHED_LOAD_SCALE *
>>>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX = (sa->util_sum <<
>>>> SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>>>
>>> I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
>>> stuff gets re-enabled again.
>>>
>>> It's not really about utilization or capacity units but rather about using the same
>>> SCALE/SHIFT values for both sides, right?
>>
>> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
>> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
>> scale the value in the right range. In the case of cpu_usage which
>> returns sa->util_avg , it's the capacity range not the load range.
>
> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> CAPACITY have no unit.

If you set 2 different values to SCHED_LOAD_SHIFT and
SCHED_CAPACITY_SHIFT for test purpose, you will see that util_avg will
not use the right range of value

If we don't take into account freq and cpu invariance in a 1st step

sa->util_sum is a load in the range [0..LOAD_AVG_MAX]. I say load
because of the max value

the current implementation of util_avg is
sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX

so sa->util_avg is a load in the range [0..SCHED_LOAD_SCALE]

the current implementation of get_cpu_usage is
return (sa->util_avg * capacity_orig_of(cpu)) >> SCHED_LOAD_SHIFT

so the usage has the same unit and range as capacity of the cpu and
can be compared with another capacity value

Your patchset returns directly sa->util_avg which is a load to compare
it with capacity value

So you have to convert sa->util_avg from load to capacity so if you have
sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX

sa->util_avg is now a capacity with the same range as you cpu thanks
to the cpu invariance factor that the patch 3 has added.

the << SCHED_CAPACITY_SHIFT above can be optimized with the >>
SCHED_CAPACITY_SHIFT included in
sa->util_sum += scale(contrib, scale_cpu);
as mentioned by Peter

At now, SCHED_CAPACITY_SHIFT is set to 10 as well as SCHED_LOAD_SHIFT
so using one instead of the other doesn't change the result but if
it's no more the case, we need to take care of the range/unit that we
use

Regards,
Vincent


>
> I agree that with the current patch-set we have a SHIFT/SCALE problem
> once SCHED_LOAD_RESOLUTION is set to != 0.
>
>>
>>>
>>> I always thought that scale_load_down() takes care of that.
>>
>> AFAIU, scale_load_down is a way to increase the resolution  of the
>> load not to move from load to capacity
>
> IMHO, increasing the resolution of the load is done by re-enabling this
> define SCHED_LOAD_RESOLUTION  10 thing (or by setting
> SCHED_LOAD_RESOLUTION to something else than 0).
>
> I tried to figure out why we have this issue when comparing UTIL w/
> CAPACITY and not LOAD w/ CAPACITY:
>
> Both are initialized like that:
>
>  sa->load_avg = scale_load_down(se->load.weight);
>  sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>  sa->util_sum = LOAD_AVG_MAX;
>
> and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
> long weight' argument to call __update_load_avg() making sure the
> scaling differences between LOAD and CAPACITY are respected while
> updating sa->load_sum (and sa->load_avg).
>
> OTAH, we don't apply a scale_load_down for sa->util_[sum/avg] only a '<<
> SCHED_LOAD_SHIFT) / LOAD_AVG_MAX' on sa->util_avg.
> So changing '<< SCHED_LOAD_SHIFT' to '*
> scale_load_down(SCHED_LOAD_SCALE)' would be the logical thing to do.
>
> I agree that '<< SCHED_CAPACITY_SHIFT' would have the same effect but
> why using a CAPACITY related thing on the LOAD/UTIL side? The only
> reason would be the unit problem which I don't understand.
>
>>
>>>
>>> So shouldn't:
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 3445d2fb38f4..b80f799aface 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -2644,7 +2644,7 @@ __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>                         cfs_rq->runnable_load_avg =
>>>                                 div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>>>                 }
>>> -               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>> +               sa->util_avg = (sa->util_sum * scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
>>>         }
>>>
>>>         return decayed;
>>>
>>> fix that issue in case SCHED_LOAD_RESOLUTION != 0 ?
>>
>>
>> No, but
>> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> will fix the unit issue.
>> I agree that i don't change the result because both SCHED_LOAD_SHIFT
>> and SCHED_CAPACITY_SHIFT are set to 10 but as mentioned above, they
>> are set separately so it can make the difference if someone change one
>> SHIFT value.
>
> SCHED_LOAD_SHIFT and SCHED_CAPACITY_SHIFT can be set separately but the
> way to change SCHED_LOAD_SHIFT is by re-enabling the define
> SCHED_LOAD_RESOLUTION 10 in kernel/sched/sched.h. I guess nobody wants
> to change SCHED_CAPACITY_[SHIFT/SCALE].
>
> Cheers,
>
> -- Dietmar
>
> [...]
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 2:06 p.m. UTC | #9
On 8 September 2015 at 14:52, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> > No, but
>> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> > will fix the unit issue.
>>
>> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>>
>> And where load_sum already gets a factor 1024 from the weight
>> multiplication, util_sum does not get such a factor, and all the scaling
>> we do on it loose bits.
>>
>> So at the moment we go compute the util_avg value, we need to inflate
>> util_sum with an extra factor 1024 in order to make it work.
>>
>> And seeing that we do the shift up on sa->util_sum without consideration
>> of overflow, would it not make sense to add that factor before the
>> scaling and into the addition?
>>
>> Now, given all that, units are a complete mess here, and I'd not mind
>> something like:
>>
>> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> #error "something usefull"
>> #endif
>>
>> somewhere near here.
>
> Something like teh below..
>
> Another thing to ponder; the downside of scaled_delta_w is that its
> fairly likely delta is small and you loose all bits, whereas the weight
> is likely to be large can could loose a fwe bits without issue.
>
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you loose too much.
>
> The flip side is of course that now you can share a multiplcation.
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>         sa->load_avg = scale_load_down(se->load.weight);
>         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> -       sa->util_sum = LOAD_AVG_MAX;
> +       sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }
>
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>         return contrib + runnable_avg_yN_sum[n];
>  }
>
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?

> +
>  #define cap_scale(v, s) ((v)*(s) >> SCHED_CAPACITY_SHIFT)
>
>  /*
> @@ -2599,7 +2603,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         }
>                 }
>                 if (running)
> -                       sa->util_sum += cap_scale(scaled_delta_w, scale_cpu);
> +                       sa->util_sum += scaled_delta_w * scale_cpu;
>
>                 delta -= delta_w;
>
> @@ -2623,7 +2627,7 @@ __update_load_avg(u64 now, int cpu, stru
>                                 cfs_rq->runnable_load_sum += weight * contrib;
>                 }
>                 if (running)
> -                       sa->util_sum += cap_scale(contrib, scale_cpu);
> +                       sa->util_sum += contrib * scale_cpu;
>         }
>
>         /* Remainder of delta accrued against u_0` */
> @@ -2634,7 +2638,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         cfs_rq->runnable_load_sum += weight * scaled_delta;
>         }
>         if (running)
> -               sa->util_sum += cap_scale(scaled_delta, scale_cpu);
> +               sa->util_sum += scaled_delta * scale_cpu;
>
>         sa->period_contrib += delta;
>
> @@ -2644,7 +2648,7 @@ __update_load_avg(u64 now, int cpu, stru
>                         cfs_rq->runnable_load_avg =
>                                 div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
>                 }
> -               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> +               sa->util_avg = sa->util_sum / LOAD_AVG_MAX;
>         }
>
>         return decayed;
> @@ -2686,8 +2690,7 @@ static inline int update_cfs_rq_load_avg
>         if (atomic_long_read(&cfs_rq->removed_util_avg)) {
>                 long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
>                 sa->util_avg = max_t(long, sa->util_avg - r, 0);
> -               sa->util_sum = max_t(s32, sa->util_sum -
> -                       ((r * LOAD_AVG_MAX) >> SCHED_LOAD_SHIFT), 0);
> +               sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);

looks good to me

>         }
>
>         decayed = __update_load_avg(now, cpu_of(rq_of(cfs_rq)), sa,
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 8, 2015, 2:10 p.m. UTC | #10
On Tue, Sep 08, 2015 at 03:39:37PM +0200, Vincent Guittot wrote:
> > Now, given all that, units are a complete mess here, and I'd not mind
> > something like:
> >
> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> > #error "something usefull"
> > #endif
> 
> In this case why not simply doing
> #define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT
> or the opposite ?

Sadly not enough; aside from the fact that we really should do !0
LOAD_RESOLUTION on 64bit, the whole magic tables (runnable_avg_yN_*[])
and LOAD_AVG_MAX* values rely on the unit being 1<<10.

So regardless of defining one in terms of the other, we should check
both are in fact 10 and error out otherwise.

Changing them must involve recomputing these numbers or otherwise
mucking about with shifts to ensure its back to 10 when we do this load
muck.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Dietmar Eggemann Sept. 8, 2015, 2:27 p.m. UTC | #11
On 08/09/15 15:01, Vincent Guittot wrote:
> On 8 September 2015 at 14:50, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>> On 08/09/15 08:22, Vincent Guittot wrote:
>>> On 7 September 2015 at 20:54, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>> On 07/09/15 17:21, Vincent Guittot wrote:
>>>>> On 7 September 2015 at 17:37, Dietmar Eggemann <dietmar.eggemann@arm.com> wrote:
>>>>>> On 04/09/15 00:51, Steve Muckle wrote:
>>>>>>> Hi Morten, Dietmar,
>>>>>>>
>>>>>>> On 08/14/2015 09:23 AM, Morten Rasmussen wrote:
>>>>>>> ...
>>>>>>>> + * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
>>>>>>>> + * recent utilization of currently non-runnable tasks on a CPU. It represents
>>>>>>>> + * the amount of utilization of a CPU in the range [0..capacity_orig] where
>>>>>>>
>>>>>>> I see util_sum is scaled by SCHED_LOAD_SHIFT at the end of
>>>>>>> __update_load_avg(). If there is now an assumption that util_avg may be
>>>>>>> used directly as a capacity value, should it be changed to
>>>>>>> SCHED_CAPACITY_SHIFT? These are equal right now, not sure if they will
>>>>>>> always be or if they can be combined.
>>>>>>
>>>>>> You're referring to the code line
>>>>>>
>>>>>> 2647   sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>>>>>>
>>>>>> in __update_load_avg()?
>>>>>>
>>>>>> Here we actually scale by 'SCHED_LOAD_SCALE/LOAD_AVG_MAX' so both values are
>>>>>> load related.
>>>>>
>>>>> I agree with Steve that there is an issue from a unit point of view
>>>>>
>>>>> sa->util_sum and LOAD_AVG_MAX have the same unit so sa->util_avg is a
>>>>> load because of << SCHED_LOAD_SHIFT)
>>>>>
>>>>> Before this patch , the translation from load to capacity unit was
>>>>> done in get_cpu_usage with "* capacity) >> SCHED_LOAD_SHIFT"
>>>>>
>>>>> So you still have to change the unit from load to capacity with a "/
>>>>> SCHED_LOAD_SCALE * SCHED_CAPACITY_SCALE" somewhere.
>>>>>
>>>>> sa->util_avg = ((sa->util_sum << SCHED_LOAD_SHIFT) /SCHED_LOAD_SCALE *
>>>>> SCHED_CAPACITY_SCALE / LOAD_AVG_MAX = (sa->util_sum <<
>>>>> SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>>>>
>>>> I see the point but IMHO this will only be necessary if the SCHED_LOAD_RESOLUTION
>>>> stuff gets re-enabled again.
>>>>
>>>> It's not really about utilization or capacity units but rather about using the same
>>>> SCALE/SHIFT values for both sides, right?
>>>
>>> It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
>>> SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
>>> scale the value in the right range. In the case of cpu_usage which
>>> returns sa->util_avg , it's the capacity range not the load range.
>>
>> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> CAPACITY have no unit.
> 
> If you set 2 different values to SCHED_LOAD_SHIFT and
> SCHED_CAPACITY_SHIFT for test purpose, you will see that util_avg will
> not use the right range of value
> 
> If we don't take into account freq and cpu invariance in a 1st step
> 
> sa->util_sum is a load in the range [0..LOAD_AVG_MAX]. I say load
> because of the max value
> 
> the current implementation of util_avg is
> sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX
> 
> so sa->util_avg is a load in the range [0..SCHED_LOAD_SCALE]
> 
> the current implementation of get_cpu_usage is
> return (sa->util_avg * capacity_orig_of(cpu)) >> SCHED_LOAD_SHIFT
> 
> so the usage has the same unit and range as capacity of the cpu and
> can be compared with another capacity value
> 
> Your patchset returns directly sa->util_avg which is a load to compare
> it with capacity value
> 
> So you have to convert sa->util_avg from load to capacity so if you have
> sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX
> 
> sa->util_avg is now a capacity with the same range as you cpu thanks
> to the cpu invariance factor that the patch 3 has added.
> 
> the << SCHED_CAPACITY_SHIFT above can be optimized with the >>
> SCHED_CAPACITY_SHIFT included in
> sa->util_sum += scale(contrib, scale_cpu);
> as mentioned by Peter
> 
> At now, SCHED_CAPACITY_SHIFT is set to 10 as well as SCHED_LOAD_SHIFT
> so using one instead of the other doesn't change the result but if
> it's no more the case, we need to take care of the range/unit that we
> use

No arguing here, I just called this a SHIFT/SCALE problem.

[...]

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 8, 2015, 2:31 p.m. UTC | #12
On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
> > On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> > > No, but
> > > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> > > will fix the unit issue.
> > 
> > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.

I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
1<<10, it is just the sum of the geometric series and the upper bound of
util_sum?

> > And where load_sum already gets a factor 1024 from the weight
> > multiplication, util_sum does not get such a factor, and all the scaling
> > we do on it loose bits.
> > 
> > So at the moment we go compute the util_avg value, we need to inflate
> > util_sum with an extra factor 1024 in order to make it work.

Agreed. Inflating the util_sum instead of util_avg like you do below
makes more sense. The load_sum/util_sum assymmetry is somewhat confusing.

> > And seeing that we do the shift up on sa->util_sum without consideration
> > of overflow, would it not make sense to add that factor before the
> > scaling and into the addition?

I don't think util_sum can overflow as it is bounded by LOAD_AVG_MAX
unless you shift it a lot, like << 20. The << SCHED_LOAD_SHIFT in the
existing code is wrong I think. Looking at the initialization of
util_avg = scale_load_down(SCHED_LOAD_SCALE) it is not using using high
resolution load.

> > Now, given all that, units are a complete mess here, and I'd not mind
> > something like:
> > 
> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> > #error "something usefull"
> > #endif
> > 
> > somewhere near here.

Yes. As I see it, it all falls completely if that isn't true. 

> 
> Something like teh below..
> 
> Another thing to ponder; the downside of scaled_delta_w is that its
> fairly likely delta is small and you loose all bits, whereas the weight
> is likely to be large can could loose a fwe bits without issue.

That issue applies both to load and util.

> 
> That is, in fixed point scaling like this, you want to start with the
> biggest numbers, not the smallest, otherwise you loose too much.
> 
> The flip side is of course that now you can share a multiplcation.

But if we apply the scaling to the weight instead of time, we would only
have to apply it once and not three times like it is now? So maybe we
can end up with almost the same number of multiplications.

We might be loosing bits for low priority task running on cpus at a low
frequency though.

> 
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>  	sa->load_avg = scale_load_down(se->load.weight);
>  	sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  	sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> -	sa->util_sum = LOAD_AVG_MAX;
> +	sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>  	/* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>  }
>  
> @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>  	return contrib + runnable_avg_yN_sum[n];
>  }
>  
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif

As mentioned above. Does it have to be 10?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 8, 2015, 2:35 p.m. UTC | #13
On Tue, Sep 08, 2015 at 04:06:36PM +0200, Vincent Guittot wrote:
> On 8 September 2015 at 14:52, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
> >> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
> >> > No, but
> >> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> >> > will fix the unit issue.
> >>
> >> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >>
> >> And where load_sum already gets a factor 1024 from the weight
> >> multiplication, util_sum does not get such a factor, and all the scaling
> >> we do on it loose bits.
> >>
> >> So at the moment we go compute the util_avg value, we need to inflate
> >> util_sum with an extra factor 1024 in order to make it work.
> >>
> >> And seeing that we do the shift up on sa->util_sum without consideration
> >> of overflow, would it not make sense to add that factor before the
> >> scaling and into the addition?
> >>
> >> Now, given all that, units are a complete mess here, and I'd not mind
> >> something like:
> >>
> >> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
> >> #error "something usefull"
> >> #endif
> >>
> >> somewhere near here.
> >
> > Something like teh below..
> >
> > Another thing to ponder; the downside of scaled_delta_w is that its
> > fairly likely delta is small and you loose all bits, whereas the weight
> > is likely to be large can could loose a fwe bits without issue.
> >
> > That is, in fixed point scaling like this, you want to start with the
> > biggest numbers, not the smallest, otherwise you loose too much.
> >
> > The flip side is of course that now you can share a multiplcation.
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
> >         sa->load_avg = scale_load_down(se->load.weight);
> >         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
> >         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> > -       sa->util_sum = LOAD_AVG_MAX;
> > +       sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
> >         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
> >  }
> >
> > @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
> >         return contrib + runnable_avg_yN_sum[n];
> >  }
> >
> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> > +#error "load tracking assumes 2^10 as unit"
> > +#endif
> 
> so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?

Don't you mean:

#define SCHED_LOAD_SHIFT (SCHED_CAPACITY_SHIFT + SCHED_LOAD_RESOLUTION)

?

Or do you want to increase the capacity resolution as well if you
increase the load resolution?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 2:40 p.m. UTC | #14
On 8 September 2015 at 16:35, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Tue, Sep 08, 2015 at 04:06:36PM +0200, Vincent Guittot wrote:
>> On 8 September 2015 at 14:52, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, Sep 08, 2015 at 02:26:06PM +0200, Peter Zijlstra wrote:
>> >> On Tue, Sep 08, 2015 at 09:22:05AM +0200, Vincent Guittot wrote:
>> >> > No, but
>> >> > sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
>> >> > will fix the unit issue.
>> >>
>> >> Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >>
>> >> And where load_sum already gets a factor 1024 from the weight
>> >> multiplication, util_sum does not get such a factor, and all the scaling
>> >> we do on it loose bits.
>> >>
>> >> So at the moment we go compute the util_avg value, we need to inflate
>> >> util_sum with an extra factor 1024 in order to make it work.
>> >>
>> >> And seeing that we do the shift up on sa->util_sum without consideration
>> >> of overflow, would it not make sense to add that factor before the
>> >> scaling and into the addition?
>> >>
>> >> Now, given all that, units are a complete mess here, and I'd not mind
>> >> something like:
>> >>
>> >> #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> >> #error "something usefull"
>> >> #endif
>> >>
>> >> somewhere near here.
>> >
>> > Something like teh below..
>> >
>> > Another thing to ponder; the downside of scaled_delta_w is that its
>> > fairly likely delta is small and you loose all bits, whereas the weight
>> > is likely to be large can could loose a fwe bits without issue.
>> >
>> > That is, in fixed point scaling like this, you want to start with the
>> > biggest numbers, not the smallest, otherwise you loose too much.
>> >
>> > The flip side is of course that now you can share a multiplcation.
>> >
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -682,7 +682,7 @@ void init_entity_runnable_average(struct
>> >         sa->load_avg = scale_load_down(se->load.weight);
>> >         sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>> >         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>> > -       sa->util_sum = LOAD_AVG_MAX;
>> > +       sa->util_sum = sa->util_avg * LOAD_AVG_MAX;
>> >         /* when this task enqueue'ed, it will contribute to its cfs_rq's load_avg */
>> >  }
>> >
>> > @@ -2515,6 +2515,10 @@ static u32 __compute_runnable_contrib(u6
>> >         return contrib + runnable_avg_yN_sum[n];
>> >  }
>> >
>> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
>> > +#error "load tracking assumes 2^10 as unit"
>> > +#endif
>>
>> so why don't we set SCHED_CAPACITY_SHIFT to SCHED_LOAD_SHIFT ?
>
> Don't you mean:
>
> #define SCHED_LOAD_SHIFT (SCHED_CAPACITY_SHIFT + SCHED_LOAD_RESOLUTION)

yes you're right

>
> ?
>
> Or do you want to increase the capacity resolution as well if you
> increase the load resolution?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 8, 2015, 3:17 p.m. UTC | #15
On 8 September 2015 at 16:10, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 08, 2015 at 03:39:37PM +0200, Vincent Guittot wrote:
>> > Now, given all that, units are a complete mess here, and I'd not mind
>> > something like:
>> >
>> > #if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != SCHED_CAPACITY_SHIFT
>> > #error "something usefull"
>> > #endif
>>
>> In this case why not simply doing
>> #define SCHED_CAPACITY_SHIFT SCHED_LOAD_SHIFT
>> or the opposite ?
>
> Sadly not enough; aside from the fact that we really should do !0
> LOAD_RESOLUTION on 64bit, the whole magic tables (runnable_avg_yN_*[])
> and LOAD_AVG_MAX* values rely on the unit being 1<<10.

ah yes, i forgot to take into account the LOAD_RESOLUTION.

So after some more thinking,  i finally don't see where in the code,
we will have a issue if SCHED_CAPACITY_SHIFT is not equal to
(SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) or not equal to 10 with the
respect of using a value that doesn't overflow the variables

Regards,
Vincent

>
> So regardless of defining one in terms of the other, we should check
> both are in fact 10 and error out otherwise.
>
> Changing them must involve recomputing these numbers or otherwise
> mucking about with shifts to ensure its back to 10 when we do this load
> muck.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 8, 2015, 3:33 p.m. UTC | #16
On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> 
> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> 1<<10, it is just the sum of the geometric series and the upper bound of
> util_sum?

It needs a 1024, it might just have been the 1024 ns we use a period
instead of the scale unit though.

The LOAD_AVG_MAX is the number where adding a next element to the series
doesn't change the result anymore, so scaling it up will allow more
significant elements to the series before we bottom out, which is the _N
thing.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 9, 2015, 7:07 p.m. UTC | #17
On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>  
> +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> +#error "load tracking assumes 2^10 as unit"
> +#endif
> +

Sorry for late response. I might already missed somthing.

But I got a bit lost here, with:

#define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_CAPACITY_SHIFT    10

the #if is certainly false.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 9, 2015, 8:15 p.m. UTC | #18
On Tue, Sep 08, 2015 at 01:50:38PM +0100, Dietmar Eggemann wrote:
> > It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> > SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> > scale the value in the right range. In the case of cpu_usage which
> > returns sa->util_avg , it's the capacity range not the load range.
> 
> Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> CAPACITY have no unit.

To be more accurate, probably, LOAD can be thought of as having unit,
but UTIL has no unit.
 
Anyway, those are my definitions:

1) unit, only for LOAD, and SCHED_LOAD_X is the unit (but
   SCHED_LOAD_RESOLUTION make it also some 2, see below)
2) range, aka, resolution or fix-point percentage (as Ben said)
3) timing ratio, LOAD_AVG_MAX etc, unralated with SCHED_LOAD_X

> >> I always thought that scale_load_down() takes care of that.
> > 
> > AFAIU, scale_load_down is a way to increase the resolution  of the
> > load not to move from load to capacity
> 
> I tried to figure out why we have this issue when comparing UTIL w/
> CAPACITY and not LOAD w/ CAPACITY:
> 
> Both are initialized like that:
> 
>  sa->load_avg = scale_load_down(se->load.weight);
>  sa->load_sum = sa->load_avg * LOAD_AVG_MAX;
>  sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>  sa->util_sum = LOAD_AVG_MAX;
> 
> and we use 'se->on_rq * scale_load_down(se->load.weight)' as 'unsigned
> long weight' argument to call __update_load_avg() making sure the
> scaling differences between LOAD and CAPACITY are respected while
> updating sa->load_sum (and sa->load_avg).

Yes, because we used SCHED_LOAD_X as both unit and range for LOAD.
 
> OTAH, we don't apply a scale_load_down for sa->util_[sum/avg] only a '<<
> SCHED_LOAD_SHIFT) / LOAD_AVG_MAX' on sa->util_avg.
> So changing '<< SCHED_LOAD_SHIFT' to '*
> scale_load_down(SCHED_LOAD_SCALE)' would be the logical thing to do.

Actually, for UTIL, we only need range, so don't conflate with LOAD,
what about we get all these clarified by redefining SCHED_LOAD_RESOLUTION
as the resolution/range generic macro for LOAD, UTIL, and CAPACITY:

#define SCHED_RESOLUTION_SHIFT  10
#define SCHED_RESOLUTION_SCALE  (1L << SCHED_RESOLUTION_SHIFT)

#if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
# define scale_load(w)          ((w) << SCHED_RESOLUTION_SHIFT)
# define scale_load_down(w)     ((w) >> SCHED_RESOLUTION_SHIFT)
# define SCHED_LOAD_SHIFT       (10 + SCHED_RESOLUTION_SHIFT)
#else
# define scale_load(w)          (w)
# define scale_load_down(w)     (w)
# define SCHED_LOAD_SHIFT       (10)
#endif

#define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)

For UTIL, e.g., it will be initiated as:
sa->util_avg = SCHED_RESOLUTION_SCALE;

And for capacity, we just use SCHED_RESOLUTION_SHIFT
(so SCHED_CAPACITY_SHIFT is not needed).

Thanks,
Yuyang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 9, 2015, 10:23 p.m. UTC | #19
Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> 
>> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> 1<<10, it is just the sum of the geometric series and the upper bound of
>> util_sum?
>
> It needs a 1024, it might just have been the 1024 ns we use a period
> instead of the scale unit though.
>
> The LOAD_AVG_MAX is the number where adding a next element to the series
> doesn't change the result anymore, so scaling it up will allow more
> significant elements to the series before we bottom out, which is the _N
> thing.
>

Yes, as the comments say, the 1024ns unit is arbitrary (and is an
average of not-quite-microseconds instead of just nanoseconds to allow
more bits to load.weight when we multiply load.weight by this number).
In fact there are two arbitrary 1024 units here, which are technically
unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
operate on units of almost-microseconds and we also do decays every
almost-millisecond.

There appears to be a bunch of confusion in the current code around
util_sum/util_avg which appears to using SCHED_LOAD_SCALE
for a fixed-point percentage or something, which is at least reasonable,
but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
results in either initializing as 100% or .1% depending on RESOLUTION.
This'll get clobbered on first update, but if it needs to be
initialized, it should either get initialized to something sane or at
least consistent.

load_sum/load_avg appear to be scale_load_down()ed properly, and appear
to be used as such at a quick glance.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 10, 2015, 10:06 a.m. UTC | #20
On Thu, Sep 10, 2015 at 03:07:48AM +0800, Yuyang Du wrote:
> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >  
> > +#if (SCHED_LOAD_SHIFT - SCHED_LOAD_RESOLUTION) != 10 || SCHED_CAPACITY_SHIFT != 10
> > +#error "load tracking assumes 2^10 as unit"
> > +#endif
> > +
> 
> Sorry for late response. I might already missed somthing.
> 
> But I got a bit lost here, with:
> 
> #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
> #define SCHED_CAPACITY_SHIFT    10
> 
> the #if is certainly false.

That is intended, triggering #error would be 'bad'.

The reason for this bit is to raise a stink if someone 'accidentally'
changes one of these values and expects things to just work.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 10, 2015, 10:07 a.m. UTC | #21
On Thu, Sep 10, 2015 at 04:15:20AM +0800, Yuyang Du wrote:
> On Tue, Sep 08, 2015 at 01:50:38PM +0100, Dietmar Eggemann wrote:
> > > It's both a unit and a SCALE/SHIFT problem, SCHED_LOAD_SHIFT and
> > > SCHED_CAPACITY_SHIFT are defined separately so we must be sure to
> > > scale the value in the right range. In the case of cpu_usage which
> > > returns sa->util_avg , it's the capacity range not the load range.
> > 
> > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> > CAPACITY have no unit.
> 
> To be more accurate, probably, LOAD can be thought of as having unit,
> but UTIL has no unit.

But I'm thinking that is wrong; it should have one, esp. if we go scale
the thing. Giving it the same fixed point unit as load simplifies the
code.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 10, 2015, 11:06 a.m. UTC | #22
On Wed, Sep 09, 2015 at 03:23:43PM -0700, bsegall@google.com wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >> 
> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> >> 1<<10, it is just the sum of the geometric series and the upper bound of
> >> util_sum?
> >
> > It needs a 1024, it might just have been the 1024 ns we use a period
> > instead of the scale unit though.
> >
> > The LOAD_AVG_MAX is the number where adding a next element to the series
> > doesn't change the result anymore, so scaling it up will allow more
> > significant elements to the series before we bottom out, which is the _N
> > thing.
> >
> 
> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
> average of not-quite-microseconds instead of just nanoseconds to allow
> more bits to load.weight when we multiply load.weight by this number).
> In fact there are two arbitrary 1024 units here, which are technically
> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
> operate on units of almost-microseconds and we also do decays every
> almost-millisecond.
> 
> There appears to be a bunch of confusion in the current code around
> util_sum/util_avg which appears to using SCHED_LOAD_SCALE
> for a fixed-point percentage or something, which is at least reasonable,
> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
> results in either initializing as 100% or .1% depending on RESOLUTION.
> This'll get clobbered on first update, but if it needs to be
> initialized, it should either get initialized to something sane or at
> least consistent.

This is what I thought too. The whole geometric series math is completely
independent of the scale used for priority in load_avg and the fixed
point shifting used for util_avg.

> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
> to be used as such at a quick glance.

I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
right:

	sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;

util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):

        sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);

so it appear to be intended to be using low resolution like load_avg
(weight is scaled down before it is passed into __update_load_avg()),
but util_avg is shifted up to high resolution. It should be:

	sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
					SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;

to be consistent.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Vincent Guittot Sept. 10, 2015, 11:11 a.m. UTC | #23
On 10 September 2015 at 13:06, Morten Rasmussen
<morten.rasmussen@arm.com> wrote:
> On Wed, Sep 09, 2015 at 03:23:43PM -0700, bsegall@google.com wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >>
>> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> >> 1<<10, it is just the sum of the geometric series and the upper bound of
>> >> util_sum?
>> >
>> > It needs a 1024, it might just have been the 1024 ns we use a period
>> > instead of the scale unit though.
>> >
>> > The LOAD_AVG_MAX is the number where adding a next element to the series
>> > doesn't change the result anymore, so scaling it up will allow more
>> > significant elements to the series before we bottom out, which is the _N
>> > thing.
>> >
>>
>> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
>> average of not-quite-microseconds instead of just nanoseconds to allow
>> more bits to load.weight when we multiply load.weight by this number).
>> In fact there are two arbitrary 1024 units here, which are technically
>> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
>> operate on units of almost-microseconds and we also do decays every
>> almost-millisecond.
>>
>> There appears to be a bunch of confusion in the current code around
>> util_sum/util_avg which appears to using SCHED_LOAD_SCALE
>> for a fixed-point percentage or something, which is at least reasonable,
>> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
>> results in either initializing as 100% or .1% depending on RESOLUTION.
>> This'll get clobbered on first update, but if it needs to be
>> initialized, it should either get initialized to something sane or at
>> least consistent.
>
> This is what I thought too. The whole geometric series math is completely
> independent of the scale used for priority in load_avg and the fixed
> point shifting used for util_avg.
>
>> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
>> to be used as such at a quick glance.
>
> I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> right:
>
>         sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
>
>         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>
> so it appear to be intended to be using low resolution like load_avg
> (weight is scaled down before it is passed into __update_load_avg()),
> but util_avg is shifted up to high resolution. It should be:
>
>         sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
>                                         SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;

you probably mean (SCHED_LOAD_SHIFT -  SCHED_LOAD_RESOLUTION)

The goal of this patchset is to be able to scale util_avg in the range
of cpu capacity so why don't we directly initialize it with
sa->util_avg = SCHED_CAPACITY_SCALE;

and then use

 sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;

so we don't have to take care of high and low load resolution
Regards,

>
> to be consistent.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 10, 2015, 12:10 p.m. UTC | #24
On Thu, Sep 10, 2015 at 01:11:01PM +0200, Vincent Guittot wrote:
> On 10 September 2015 at 13:06, Morten Rasmussen
> <morten.rasmussen@arm.com> wrote:
> > On Wed, Sep 09, 2015 at 03:23:43PM -0700, bsegall@google.com wrote:
> >> Peter Zijlstra <peterz@infradead.org> writes:
> >>
> >> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
> >> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
> >> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
> >> >>
> >> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
> >> >> 1<<10, it is just the sum of the geometric series and the upper bound of
> >> >> util_sum?
> >> >
> >> > It needs a 1024, it might just have been the 1024 ns we use a period
> >> > instead of the scale unit though.
> >> >
> >> > The LOAD_AVG_MAX is the number where adding a next element to the series
> >> > doesn't change the result anymore, so scaling it up will allow more
> >> > significant elements to the series before we bottom out, which is the _N
> >> > thing.
> >> >
> >>
> >> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
> >> average of not-quite-microseconds instead of just nanoseconds to allow
> >> more bits to load.weight when we multiply load.weight by this number).
> >> In fact there are two arbitrary 1024 units here, which are technically
> >> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
> >> operate on units of almost-microseconds and we also do decays every
> >> almost-millisecond.
> >>
> >> There appears to be a bunch of confusion in the current code around
> >> util_sum/util_avg which appears to using SCHED_LOAD_SCALE
> >> for a fixed-point percentage or something, which is at least reasonable,
> >> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
> >> results in either initializing as 100% or .1% depending on RESOLUTION.
> >> This'll get clobbered on first update, but if it needs to be
> >> initialized, it should either get initialized to something sane or at
> >> least consistent.
> >
> > This is what I thought too. The whole geometric series math is completely
> > independent of the scale used for priority in load_avg and the fixed
> > point shifting used for util_avg.
> >
> >> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
> >> to be used as such at a quick glance.
> >
> > I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> > right:
> >
> >         sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
> >
> > util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
> >
> >         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
> >
> > so it appear to be intended to be using low resolution like load_avg
> > (weight is scaled down before it is passed into __update_load_avg()),
> > but util_avg is shifted up to high resolution. It should be:
> >
> >         sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> >                                         SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
> 
> you probably mean (SCHED_LOAD_SHIFT -  SCHED_LOAD_RESOLUTION)

Yes. Thanks for providing the right expression. There seems to be enough
confusion in this thread already :)

> The goal of this patchset is to be able to scale util_avg in the range
> of cpu capacity so why don't we directly initialize it with
> sa->util_avg = SCHED_CAPACITY_SCALE;
> 
> and then use
> 
>  sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> 
> so we don't have to take care of high and low load resolution

That works for me, except that the left-shift has gone be PeterZ's
optimization patch posted earlier in this thread. It is changing
util_sum to scaled by capacity instead of being the pure geometric
series which requires the left shift at the end when we divide by
LOAD_AVG_MAX. So it should be equivalent to what you are proposing if we
change the initialization to your proposal too.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 10, 2015, 5:23 p.m. UTC | #25
Morten Rasmussen <morten.rasmussen@arm.com> writes:

> On Wed, Sep 09, 2015 at 03:23:43PM -0700, bsegall@google.com wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Tue, Sep 08, 2015 at 03:31:58PM +0100, Morten Rasmussen wrote:
>> >> On Tue, Sep 08, 2015 at 02:52:05PM +0200, Peter Zijlstra wrote:
>> >> > > Tricky that, LOAD_AVG_MAX very much relies on the unit being 1<<10.
>> >> 
>> >> I don't get why LOAD_AVG_MAX relies on the util_avg shifting being
>> >> 1<<10, it is just the sum of the geometric series and the upper bound of
>> >> util_sum?
>> >
>> > It needs a 1024, it might just have been the 1024 ns we use a period
>> > instead of the scale unit though.
>> >
>> > The LOAD_AVG_MAX is the number where adding a next element to the series
>> > doesn't change the result anymore, so scaling it up will allow more
>> > significant elements to the series before we bottom out, which is the _N
>> > thing.
>> >
>> 
>> Yes, as the comments say, the 1024ns unit is arbitrary (and is an
>> average of not-quite-microseconds instead of just nanoseconds to allow
>> more bits to load.weight when we multiply load.weight by this number).
>> In fact there are two arbitrary 1024 units here, which are technically
>> unrelated and are both unrelated to SCHED_LOAD_RESOLUTION/etc - we
>> operate on units of almost-microseconds and we also do decays every
>> almost-millisecond.
>> 
>> There appears to be a bunch of confusion in the current code around
>> util_sum/util_avg which appears to using SCHED_LOAD_SCALE
>> for a fixed-point percentage or something, which is at least reasonable,
>> but is initializing it as scale_load_down(SCHED_LOAD_SCALE), which
>> results in either initializing as 100% or .1% depending on RESOLUTION.
>> This'll get clobbered on first update, but if it needs to be
>> initialized, it should either get initialized to something sane or at
>> least consistent.
>
> This is what I thought too. The whole geometric series math is completely
> independent of the scale used for priority in load_avg and the fixed
> point shifting used for util_avg.
>
>> load_sum/load_avg appear to be scale_load_down()ed properly, and appear
>> to be used as such at a quick glance.
>
> I don't think shifting by SCHED_LOAD_SHIFT in __update_load_avg() is
> right:
>
> 	sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
>
> util_avg is initialized to low resolution (>> SCHED_LOAD_RESOLUTION):
>
>         sa->util_avg = scale_load_down(SCHED_LOAD_SCALE);
>
> so it appear to be intended to be using low resolution like load_avg
> (weight is scaled down before it is passed into __update_load_avg()),
> but util_avg is shifted up to high resolution. It should be:
>
> 	sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> 					SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
>
> to be consistent.

Yeah, util_avg was/is screwed up in terms of either the initialization
or which shift to use there. The load ones however appear to be fine.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 11, 2015, 12:50 a.m. UTC | #26
On Thu, Sep 10, 2015 at 01:10:19PM +0100, Morten Rasmussen wrote:
> > > so it appear to be intended to be using low resolution like load_avg
> > > (weight is scaled down before it is passed into __update_load_avg()),
> > > but util_avg is shifted up to high resolution. It should be:
> > >
> > >         sa->util_avg = (sa->util_sum << (SCHED_LOAD_SHIFT -
> > >                                         SCHED_LOAD_SHIFT)) / LOAD_AVG_MAX;
> > 
> > you probably mean (SCHED_LOAD_SHIFT -  SCHED_LOAD_RESOLUTION)
> 
> Yes. Thanks for providing the right expression. There seems to be enough
> confusion in this thread already :)
 
And yes, it is my bad in the first place, sorry, I did not think it though :)

> > The goal of this patchset is to be able to scale util_avg in the range
> > of cpu capacity so why don't we directly initialize it with
> > sa->util_avg = SCHED_CAPACITY_SCALE;

Yes, we should, and specifically, it is bacause we can combine the
resolution thing for util% * capacity%, so we only need to use the
resolution once.

> > and then use
> > 
> >  sa->util_avg = (sa->util_sum << SCHED_CAPACITY_SHIFT) / LOAD_AVG_MAX;
> > 
> > so we don't have to take care of high and low load resolution
> 
> That works for me, except that the left-shift has gone be PeterZ's
> optimization patch posted earlier in this thread. It is changing
> util_sum to scaled by capacity instead of being the pure geometric
> series which requires the left shift at the end when we divide by
> LOAD_AVG_MAX. So it should be equivalent to what you are proposing if we
> change the initialization to your proposal too.

I previously initialized the util_sum as:

sa->util_sum = LOAD_AVG_MAX;

it is because wihout capacity adjustment, this can save some multiplications
in __update_load_avg(), but actually if we do capacity adjustment, we must
multiply anyway, so it is better we initialize it as:

sa->util_sum = sa->util_avg * LOAD_AVG_MAX;

Anyway, with the patch I posted in the other email in this thread, we
can fix all this very clearly, I hope so. I did not post a fix patch,
it is because the solutions are already there, it is just how we make it 
look better, and you can provide it in your new version.

Thanks,
Yuyang
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 11, 2015, 10:31 a.m. UTC | #27
On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
> > > > CAPACITY have no unit.
> > > 
> > > To be more accurate, probably, LOAD can be thought of as having unit,
> > > but UTIL has no unit.
> > 
> > But I'm thinking that is wrong; it should have one, esp. if we go scale
> > the thing. Giving it the same fixed point unit as load simplifies the
> > code.
> 
> I think we probably are saying the same thing with different terms. Anyway,
> let me reiterate what I said and make it a little more formalized.
> 
> UTIL has no unit because it is pure ratio, the cpu_running%, which is in the
> range of [0, 100%], and we increase the resolution, because we don't want
> to lose many (due to integer rounding) by multiplying a number (say 1024), then
> the range becomes [0, 1024].

Fully agree, and with frequency invariance we basically scale running
time to take into account that the cpu might be running slower that it
is capable of at the highest frequency. With cpu invariance also scale
by any difference their might be in max frequency and/or cpu
micro-archiecture so utilization becomes comparable between cpus. One
can also see it as we slow down or speed up time depending the current
compute capacity of the cpu relative to the max capacity.

> CAPACITY is also a ratio of ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
> as it only has relativity meaning (i.e., when comparing to each other).

Fully agree. Though 'LOAD' is a somewhat overloaded term in the
scheduler. Just to be clear, you refer to load.weight, load_avg is the
multiplication of load.weight and the task runnable time ratio.

> I said it has unit, it is in the sense that it looks like currency (for instance,
> Yuan), you can use to buy CPU fair share. But it is just how you look at it and
> there are certainly many other ways.
> 
> So, I still propose to generalize all these with the following patch, in the
> belief that this makes it simple and clear, and error-reducing. 
> 
> --
> 
> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>  definition
> 
> A integer metric needs certain resolution to allow how much detail we
> can look into (not losing detail by integer rounding), which also
> determines the range of the metrics.
> 
> For instance, to increase the resolution of [0, 1] (two levels), one
> can multiply 1024 and get [0, 1024] (1025 levels).
> 
> In sched/fair, a few metrics depend on the resolution: load/load_avg,
> util_avg, and capacity (frequency adjustment). In order to reduce the
> risks of making mistakes relating to resolution/range, we therefore
> generalize the resolution by defining a basic resolution constant
> number, and then formalize all metrics to depend on the basic
> resolution. The basic resolution is 1024 or (1 << 10). Further, one
> can recursively apply another basic resolution to increase the final
> resolution (e.g., 1048676=1<<20).
> 
> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
> ---
>  include/linux/sched.h |  2 +-
>  kernel/sched/sched.h  | 12 +++++++-----
>  2 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823d..55a7b93 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>  /*
>   * Increase resolution of cpu_capacity calculations
>   */
> -#define SCHED_CAPACITY_SHIFT	10
> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>  
>  /*
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 68cda11..d27cdd8 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>   */
>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>  
> +# define SCHED_RESOLUTION_SHIFT	10
> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> +
>  /*
>   * Increase resolution of nice-level calculations for 64-bit architectures.
>   * The extra resolution improves shares distribution and load balancing of
> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>   * increased costs.
>   */
>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> -# define SCHED_LOAD_RESOLUTION	10
> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
>  #else
> -# define SCHED_LOAD_RESOLUTION	0
> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>  # define scale_load(w)		(w)
>  # define scale_load_down(w)	(w)
>  #endif
>  
> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>  
>  #define NICE_0_LOAD		SCHED_LOAD_SCALE

I think this is pretty much the required relationship between all the
SHIFTs and SCALEs that Peter checked for in his #if-#error thing
earlier, so no disagreements from my side :-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 11, 2015, 5:05 p.m. UTC | #28
Morten Rasmussen <morten.rasmussen@arm.com> writes:

> On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
>> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> > > > CAPACITY have no unit.
>> > > 
>> > > To be more accurate, probably, LOAD can be thought of as having unit,
>> > > but UTIL has no unit.
>> > 
>> > But I'm thinking that is wrong; it should have one, esp. if we go scale
>> > the thing. Giving it the same fixed point unit as load simplifies the
>> > code.
>> 
>> I think we probably are saying the same thing with different terms. Anyway,
>> let me reiterate what I said and make it a little more formalized.
>> 
>> UTIL has no unit because it is pure ratio, the cpu_running%, which is in the
>> range of [0, 100%], and we increase the resolution, because we don't want
>> to lose many (due to integer rounding) by multiplying a number (say 1024), then
>> the range becomes [0, 1024].
>
> Fully agree, and with frequency invariance we basically scale running
> time to take into account that the cpu might be running slower that it
> is capable of at the highest frequency. With cpu invariance also scale
> by any difference their might be in max frequency and/or cpu
> micro-archiecture so utilization becomes comparable between cpus. One
> can also see it as we slow down or speed up time depending the current
> compute capacity of the cpu relative to the max capacity.
>
>> CAPACITY is also a ratio of ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
>> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 88761/1024=86.68],
>> as it only has relativity meaning (i.e., when comparing to each other).
>
> Fully agree. Though 'LOAD' is a somewhat overloaded term in the
> scheduler. Just to be clear, you refer to load.weight, load_avg is the
> multiplication of load.weight and the task runnable time ratio.
>
>> I said it has unit, it is in the sense that it looks like currency (for instance,
>> Yuan), you can use to buy CPU fair share. But it is just how you look at it and
>> there are certainly many other ways.
>> 
>> So, I still propose to generalize all these with the following patch, in the
>> belief that this makes it simple and clear, and error-reducing. 
>> 
>> --
>> 
>> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>>  definition
>> 
>> A integer metric needs certain resolution to allow how much detail we
>> can look into (not losing detail by integer rounding), which also
>> determines the range of the metrics.
>> 
>> For instance, to increase the resolution of [0, 1] (two levels), one
>> can multiply 1024 and get [0, 1024] (1025 levels).
>> 
>> In sched/fair, a few metrics depend on the resolution: load/load_avg,
>> util_avg, and capacity (frequency adjustment). In order to reduce the
>> risks of making mistakes relating to resolution/range, we therefore
>> generalize the resolution by defining a basic resolution constant
>> number, and then formalize all metrics to depend on the basic
>> resolution. The basic resolution is 1024 or (1 << 10). Further, one
>> can recursively apply another basic resolution to increase the final
>> resolution (e.g., 1048676=1<<20).
>> 
>> Signed-off-by: Yuyang Du <yuyang.du@intel.com>
>> ---
>>  include/linux/sched.h |  2 +-
>>  kernel/sched/sched.h  | 12 +++++++-----
>>  2 files changed, 8 insertions(+), 6 deletions(-)
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 119823d..55a7b93 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>>  /*
>>   * Increase resolution of cpu_capacity calculations
>>   */
>> -#define SCHED_CAPACITY_SHIFT	10
>> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
>>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>>  
>>  /*
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 68cda11..d27cdd8 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>>   */
>>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>>  
>> +# define SCHED_RESOLUTION_SHIFT	10
>> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
>> +
>>  /*
>>   * Increase resolution of nice-level calculations for 64-bit architectures.
>>   * The extra resolution improves shares distribution and load balancing of
>> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>>   * increased costs.
>>   */
>>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
>> -# define SCHED_LOAD_RESOLUTION	10
>> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
>> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
>> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
>> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
>> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
>>  #else
>> -# define SCHED_LOAD_RESOLUTION	0
>> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>>  # define scale_load(w)		(w)
>>  # define scale_load_down(w)	(w)
>>  #endif
>>  
>> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>>  
>>  #define NICE_0_LOAD		SCHED_LOAD_SCALE
>
> I think this is pretty much the required relationship between all the
> SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> earlier, so no disagreements from my side :-)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
required to be the same value and should not be conflated.

In particular, since cgroups are on the same timeline as tasks and their
shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
that SCHED_LOAD_RESOLUTION is invisible), changing that part of
SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
= 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
an internal value to the kernel.

In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 11, 2015, 6:24 p.m. UTC | #29
On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@google.com wrote:
> 
> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> required to be the same value and should not be conflated.
 
> In particular, since cgroups are on the same timeline as tasks and their
> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> an internal value to the kernel.
>
> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.

Not fully looked into the concerns, but the new SCHED_RESOLUTION_SHIFT
is intended to formalize all the integer metrics that need better resolution.
It is not special to any metric, so actually it is to de-conflate whoever is
conflated.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 14, 2015, 12:56 p.m. UTC | #30
On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@google.com wrote:
> Morten Rasmussen <morten.rasmussen@arm.com> writes:
> 
> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> index 119823d..55a7b93 100644
> >> --- a/include/linux/sched.h
> >> +++ b/include/linux/sched.h
> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
> >>  /*
> >>   * Increase resolution of cpu_capacity calculations
> >>   */
> >> -#define SCHED_CAPACITY_SHIFT	10
> >> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
> >>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
> >>  
> >>  /*
> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> index 68cda11..d27cdd8 100644
> >> --- a/kernel/sched/sched.h
> >> +++ b/kernel/sched/sched.h
> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >>   */
> >>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
> >>  
> >> +# define SCHED_RESOLUTION_SHIFT	10
> >> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> >> +
> >>  /*
> >>   * Increase resolution of nice-level calculations for 64-bit architectures.
> >>   * The extra resolution improves shares distribution and load balancing of
> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >>   * increased costs.
> >>   */
> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> >> -# define SCHED_LOAD_RESOLUTION	10
> >> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
> >> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
> >> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
> >>  #else
> >> -# define SCHED_LOAD_RESOLUTION	0
> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
> >>  # define scale_load(w)		(w)
> >>  # define scale_load_down(w)	(w)
> >>  #endif
> >>  
> >> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> >>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
> >>  
> >>  #define NICE_0_LOAD		SCHED_LOAD_SCALE
> >
> > I think this is pretty much the required relationship between all the
> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> > earlier, so no disagreements from my side :-)
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> 
> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> required to be the same value and should not be conflated.
> 
> In particular, since cgroups are on the same timeline as tasks and their
> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> an internal value to the kernel.
> 
> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.

I think I follow, but doesn't that mean that the current code is broken
too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:

#define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
#define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)

#define NICE_0_LOAD             SCHED_LOAD_SCALE
#define NICE_0_SHIFT            SCHED_LOAD_SHIFT

To me it sounds like we need to define it the other way around:

#define NICE_0_SHIFT            10
#define NICE_0_LOAD             (1L << NICE_0_SHIFT)

and then add any additional resolution bits from there to ensure that
NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 14, 2015, 5:34 p.m. UTC | #31
Morten Rasmussen <morten.rasmussen@arm.com> writes:

> On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@google.com wrote:
>> Morten Rasmussen <morten.rasmussen@arm.com> writes:
>> 
>> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> >> index 119823d..55a7b93 100644
>> >> --- a/include/linux/sched.h
>> >> +++ b/include/linux/sched.h
>> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>> >>  /*
>> >>   * Increase resolution of cpu_capacity calculations
>> >>   */
>> >> -#define SCHED_CAPACITY_SHIFT	10
>> >> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
>> >>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
>> >>  
>> >>  /*
>> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> >> index 68cda11..d27cdd8 100644
>> >> --- a/kernel/sched/sched.h
>> >> +++ b/kernel/sched/sched.h
>> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> >>   */
>> >>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>> >>  
>> >> +# define SCHED_RESOLUTION_SHIFT	10
>> >> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
>> >> +
>> >>  /*
>> >>   * Increase resolution of nice-level calculations for 64-bit architectures.
>> >>   * The extra resolution improves shares distribution and load balancing of
>> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
>> >>   * increased costs.
>> >>   */
>> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
>> >> -# define SCHED_LOAD_RESOLUTION	10
>> >> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
>> >> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
>> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
>> >> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
>> >> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
>> >>  #else
>> >> -# define SCHED_LOAD_RESOLUTION	0
>> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
>> >>  # define scale_load(w)		(w)
>> >>  # define scale_load_down(w)	(w)
>> >>  #endif
>> >>  
>> >> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
>> >>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
>> >>  
>> >>  #define NICE_0_LOAD		SCHED_LOAD_SCALE
>> >
>> > I think this is pretty much the required relationship between all the
>> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
>> > earlier, so no disagreements from my side :-)
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> > Please read the FAQ at  http://www.tux.org/lkml/
>> 
>> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> required to be the same value and should not be conflated.
>> 
>> In particular, since cgroups are on the same timeline as tasks and their
>> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> an internal value to the kernel.
>> 
>> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>
> I think I follow, but doesn't that mean that the current code is broken
> too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
>
> #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
> #define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)
>
> #define NICE_0_LOAD             SCHED_LOAD_SCALE
> #define NICE_0_SHIFT            SCHED_LOAD_SHIFT
>
> To me it sounds like we need to define it the other way around:
>
> #define NICE_0_SHIFT            10
> #define NICE_0_LOAD             (1L << NICE_0_SHIFT)
>
> and then add any additional resolution bits from there to ensure that
> NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.

No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
ie including SLR. It has never been clear to me what
SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
and the new utilization uses of it are entirely unlinked to 1024 == NICE_0
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 14, 2015, 5:36 p.m. UTC | #32
Yuyang Du <yuyang.du@intel.com> writes:

> On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@google.com wrote:
>> 
>> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> required to be the same value and should not be conflated.
>  
>> In particular, since cgroups are on the same timeline as tasks and their
>> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> an internal value to the kernel.
>>
>> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>
> Not fully looked into the concerns, but the new SCHED_RESOLUTION_SHIFT
> is intended to formalize all the integer metrics that need better resolution.
> It is not special to any metric, so actually it is to de-conflate whoever is
> conflated.

It conflates the userspace-invisible SCHED_LOAD_RESOLUTION with the
userspace-visible value of scale_load_down(NICE_0_LOAD). Increasing
SCHED_LOAD_RESOLUTION must not change scale_load_down(NICE_0_LOAD).
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 14, 2015, 10:56 p.m. UTC | #33
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bsegall@google.com wrote:
> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> >> required to be the same value and should not be conflated.
> >> 
> >> In particular, since cgroups are on the same timeline as tasks and their
> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> >> an internal value to the kernel.
> >> 
> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
> >
> > I think I follow, but doesn't that mean that the current code is broken
> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
> >
> > #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
> > #define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)
> >
> > #define NICE_0_LOAD             SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT            SCHED_LOAD_SHIFT
> >
> > To me it sounds like we need to define it the other way around:
> >
> > #define NICE_0_SHIFT            10
> > #define NICE_0_LOAD             (1L << NICE_0_SHIFT)
> >
> > and then add any additional resolution bits from there to ensure that
> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
> 
> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
> ie including SLR. It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
> and the new utilization uses of it are entirely unlinked to 1024 == NICE_0

Presume your SLR means SCHED_LOAD_RESOLUTION:

1) The introduction of (not redefinition of) SCHED_RESOLUTION_SHIFT does not
change anything after macro expansion.

2) The constants in prio_to_weight[] and prio_to_wmult[] are tied to a
resolution of 10bits NICE_0, i.e., 1024, I guest it is the user visible
part you mentioned, so is the cgroup share.

To me, it is all ok. With the SCHED_RESOLUTION_SHIFT, the basic resolution
unit, it is just for us to state clearly, the NICE_0's weight has a fixed
resolution of SCHED_RESOLUTION_SHIFT, or even add this:

#if prio_to_weight[20] != 1 << SCHED_RESOLUTION_SHIFT
error "NICE_0 weight not calibrated"
#endif
/* I can learn, Peter */

I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to me,
they are just integer metrics, needing a resolution respectively. That is it.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Morten Rasmussen Sept. 15, 2015, 8:43 a.m. UTC | #34
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bsegall@google.com wrote:
> Morten Rasmussen <morten.rasmussen@arm.com> writes:
> 
> > On Fri, Sep 11, 2015 at 10:05:53AM -0700, bsegall@google.com wrote:
> >> Morten Rasmussen <morten.rasmussen@arm.com> writes:
> >> 
> >> > On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
> >> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> >> >> index 119823d..55a7b93 100644
> >> >> --- a/include/linux/sched.h
> >> >> +++ b/include/linux/sched.h
> >> >> @@ -912,7 +912,7 @@ enum cpu_idle_type {
> >> >>  /*
> >> >>   * Increase resolution of cpu_capacity calculations
> >> >>   */
> >> >> -#define SCHED_CAPACITY_SHIFT	10
> >> >> +#define SCHED_CAPACITY_SHIFT	SCHED_RESOLUTION_SHIFT
> >> >>  #define SCHED_CAPACITY_SCALE	(1L << SCHED_CAPACITY_SHIFT)
> >> >>  
> >> >>  /*
> >> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> >> index 68cda11..d27cdd8 100644
> >> >> --- a/kernel/sched/sched.h
> >> >> +++ b/kernel/sched/sched.h
> >> >> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> >>   */
> >> >>  #define NS_TO_JIFFIES(TIME)	((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
> >> >>  
> >> >> +# define SCHED_RESOLUTION_SHIFT	10
> >> >> +# define SCHED_RESOLUTION_SCALE	(1L << SCHED_RESOLUTION_SHIFT)
> >> >> +
> >> >>  /*
> >> >>   * Increase resolution of nice-level calculations for 64-bit architectures.
> >> >>   * The extra resolution improves shares distribution and load balancing of
> >> >> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq *this_rq) { }
> >> >>   * increased costs.
> >> >>   */
> >> >>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage under light load  */
> >> >> -# define SCHED_LOAD_RESOLUTION	10
> >> >> -# define scale_load(w)		((w) << SCHED_LOAD_RESOLUTION)
> >> >> -# define scale_load_down(w)	((w) >> SCHED_LOAD_RESOLUTION)
> >> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT + SCHED_RESOLUTION_SHIFT)
> >> >> +# define scale_load(w)		((w) << SCHED_RESOLUTION_SHIFT)
> >> >> +# define scale_load_down(w)	((w) >> SCHED_RESOLUTION_SHIFT)
> >> >>  #else
> >> >> -# define SCHED_LOAD_RESOLUTION	0
> >> >> +# define SCHED_LOAD_SHIFT	(SCHED_RESOLUTION_SHIFT)
> >> >>  # define scale_load(w)		(w)
> >> >>  # define scale_load_down(w)	(w)
> >> >>  #endif
> >> >>  
> >> >> -#define SCHED_LOAD_SHIFT	(10 + SCHED_LOAD_RESOLUTION)
> >> >>  #define SCHED_LOAD_SCALE	(1L << SCHED_LOAD_SHIFT)
> >> >>  
> >> >>  #define NICE_0_LOAD		SCHED_LOAD_SCALE
> >> >
> >> > I think this is pretty much the required relationship between all the
> >> > SHIFTs and SCALEs that Peter checked for in his #if-#error thing
> >> > earlier, so no disagreements from my side :-)
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> > Please read the FAQ at  http://www.tux.org/lkml/
> >> 
> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
> >> required to be the same value and should not be conflated.
> >> 
> >> In particular, since cgroups are on the same timeline as tasks and their
> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
> >> an internal value to the kernel.
> >> 
> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
> >
> > I think I follow, but doesn't that mean that the current code is broken
> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
> >
> > #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
> > #define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)
> >
> > #define NICE_0_LOAD             SCHED_LOAD_SCALE
> > #define NICE_0_SHIFT            SCHED_LOAD_SHIFT
> >
> > To me it sounds like we need to define it the other way around:
> >
> > #define NICE_0_SHIFT            10
> > #define NICE_0_LOAD             (1L << NICE_0_SHIFT)
> >
> > and then add any additional resolution bits from there to ensure that
> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
> 
> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
> ie including SLR. It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,

I see, I wasn't sure if NICE_0_LOAD is being used in the code somewhere
with the assumption that NICE_0_LOAD = load.weight = 1024. The
scale_(down_)_load() conversion between base load (nice_0 = 1024) and
hi-res load makes makes sense.

> and the new utilization uses of it are entirely unlinked to 1024 == NICE_0

Yes, agreed. For utilization we just need to define some fixed point
resolution (as Yuyang said). That resolution is independent of the hi-res
load additional bits and should remain so. The same fixed point
resolution has to be used for capacity as well unless we want to
introduce scale_(down_)_capacity() functions to allow utilization to be
compared to capacity.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 15, 2015, 5:11 p.m. UTC | #35
Yuyang Du <yuyang.du@intel.com> writes:

> On Mon, Sep 14, 2015 at 10:34:00AM -0700, bsegall@google.com wrote:
>> >> SCHED_LOAD_RESOLUTION and the non-SLR part of SCHED_LOAD_SHIFT are not
>> >> required to be the same value and should not be conflated.
>> >> 
>> >> In particular, since cgroups are on the same timeline as tasks and their
>> >> shares are not scaled by SCHED_LOAD_SHIFT in any way (but are scaled so
>> >> that SCHED_LOAD_RESOLUTION is invisible), changing that part of
>> >> SCHED_LOAD_SHIFT would cause issues, since things can assume that nice-0
>> >> = 1024. However changing SCHED_LOAD_RESOLUTION would be fine, as that is
>> >> an internal value to the kernel.
>> >> 
>> >> In addition, changing the non-SLR part of SCHED_LOAD_SHIFT would require
>> >> recomputing all of prio_to_weight/wmult for the new NICE_0_LOAD.
>> >
>> > I think I follow, but doesn't that mean that the current code is broken
>> > too? NICE_0_LOAD changes if you change SCHED_LOAD_RESOLUTION:
>> >
>> > #define SCHED_LOAD_SHIFT        (10 + SCHED_LOAD_RESOLUTION)
>> > #define SCHED_LOAD_SCALE        (1L << SCHED_LOAD_SHIFT)
>> >
>> > #define NICE_0_LOAD             SCHED_LOAD_SCALE
>> > #define NICE_0_SHIFT            SCHED_LOAD_SHIFT
>> >
>> > To me it sounds like we need to define it the other way around:
>> >
>> > #define NICE_0_SHIFT            10
>> > #define NICE_0_LOAD             (1L << NICE_0_SHIFT)
>> >
>> > and then add any additional resolution bits from there to ensure that
>> > NICE_0_LOAD and the prio_to_weight/wmult tables are unchanged.
>> 
>> No, NICE_0_LOAD is supposed to be scale_load(prio_to_weight[nice_0]),
>> ie including SLR. It has never been clear to me what
>> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,
>> and the new utilization uses of it are entirely unlinked to 1024 == NICE_0
>
> Presume your SLR means SCHED_LOAD_RESOLUTION:
>
> 1) The introduction of (not redefinition of) SCHED_RESOLUTION_SHIFT does not
> change anything after macro expansion.
>
> 2) The constants in prio_to_weight[] and prio_to_wmult[] are tied to a
> resolution of 10bits NICE_0, i.e., 1024, I guest it is the user visible
> part you mentioned, so is the cgroup share.
>
> To me, it is all ok. With the SCHED_RESOLUTION_SHIFT, the basic resolution
> unit, it is just for us to state clearly, the NICE_0's weight has a fixed
> resolution of SCHED_RESOLUTION_SHIFT, or even add this:
>
> #if prio_to_weight[20] != 1 << SCHED_RESOLUTION_SHIFT
> error "NICE_0 weight not calibrated"
> #endif
> /* I can learn, Peter */
>
> I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to me,
> they are just integer metrics, needing a resolution respectively. That is it.

Yes this would change nothing at the moment post-expansion, that's not
the point. SLR being 10 bits and the nice-0 being 1024 are completely
and utterly unrelated and the headers should not pretend they need to be
the same value, any more than there should be a #define that is shared
with every other use of 1024 in the kernel.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 15, 2015, 6:39 p.m. UTC | #36
On Tue, Sep 15, 2015 at 10:11:41AM -0700, bsegall@google.com wrote:
> >
> > I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to me,
> > they are just integer metrics, needing a resolution respectively. That is it.
> 
> Yes this would change nothing at the moment post-expansion, that's not
> the point. SLR being 10 bits and the nice-0 being 1024 are completely
> and utterly unrelated and the headers should not pretend they need to be
> the same value,

I never said they are related, why should they be related. And they need or
need not to be the same value, fine.

However, the SLR has to be a value. It is because it mighe be 10 or 20 (LOAD),
therefore I make SCHED_RESOLUTION_SHIFT 10 (kind of a denominator). Not the
other way around.

We can define SCHED_RESOLUTION_SHIFT 1, and then define SLR = x * SCHED_RESOLUTION_SHIFT
with x being a random number, if you must.

And by the way, with SCHED_RESOLUTION_SHIFT, there will not be SLR anymore, we only
need SCHED_LOAD_SHIFT, which has a low resolution 1*SCHED_RESOLUTION_SHIFT or a high
one 2*SCHED_RESOLUTION_SHIFT. The scale_load*() is the conversion between the 
resolutions of NICE_0 and NICE_0_LOAD.

> any more than there should be a #define that is shared
> with every other use of 1024 in the kernel.

The point really is, metrics (if not many ) need resolution, not just NICE_0_LOAD does. 
You can choose to either hardcode a number, like SCHED_CAPACITY_SHIFT now,
or you can use SCHED_RESOLUTION_SHIFT, which is even as simple as a sign to say what
the defined is (the scaled one with a better resolution vs. the original one).
I guess this is to say we now have a (no-big-deal) resolution system.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Peter Zijlstra Sept. 16, 2015, 3:36 p.m. UTC | #37
On Mon, Sep 14, 2015 at 10:34:00AM -0700, bsegall@google.com wrote:

> It has never been clear to me what
> SCHED_LOAD_SCALE/SCHED_LOAD_SHIFT were for as opposed to NICE_0_LOAD,

SCHED_LOAD_SCALE/SHIFT are the fixed point mult/shift, and NICE_0_LOAD
is the load of a nice-0 task. They happen to be the same by the choice
that nice-0 has a load of 1 (the only natural choice given proportional
weight and hyperboles etc..).

But for the fixed point math we use SCHED_LOAD_* and only when we want
to explicitly use the unit load we use NICE_0_LOAD (its only used 4-5
times or so).


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Benjamin Segall Sept. 16, 2015, 5:06 p.m. UTC | #38
Yuyang Du <yuyang.du@intel.com> writes:

> On Tue, Sep 15, 2015 at 10:11:41AM -0700, bsegall@google.com wrote:
>> >
>> > I guess you are saying we are conflating NICE_0 with NICE_0_LOAD. But to me,
>> > they are just integer metrics, needing a resolution respectively. That is it.
>> 
>> Yes this would change nothing at the moment post-expansion, that's not
>> the point. SLR being 10 bits and the nice-0 being 1024 are completely
>> and utterly unrelated and the headers should not pretend they need to be
>> the same value,
>
> I never said they are related, why should they be related. And they need or
> need not to be the same value, fine.
>
> However, the SLR has to be a value. It is because it mighe be 10 or 20 (LOAD),
> therefore I make SCHED_RESOLUTION_SHIFT 10 (kind of a denominator). Not the
> other way around.
>
> We can define SCHED_RESOLUTION_SHIFT 1, and then define SLR = x * SCHED_RESOLUTION_SHIFT
> with x being a random number, if you must.

That's sorta the point - you could do this and it would be just as (non-)sensical.

>
>> any more than there should be a #define that is shared
>> with every other use of 1024 in the kernel.
>
> The point really is, metrics (if not many ) need resolution, not just NICE_0_LOAD does. 
> You can choose to either hardcode a number, like SCHED_CAPACITY_SHIFT now,
> or you can use SCHED_RESOLUTION_SHIFT, which is even as simple as a sign to say what
> the defined is (the scaled one with a better resolution vs. the original one).
> I guess this is to say we now have a (no-big-deal) resolution system.

Yes they were chosen for similar reasons, but they are not conceptually
related, and you couldn't decide to just bump up all the resolutions by
changing SCHED_RESOLUTION_SHIFT, so doing this would just be misleading.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Yuyang Du Sept. 17, 2015, 2:31 a.m. UTC | #39
On Wed, Sep 16, 2015 at 10:06:24AM -0700, bsegall@google.com wrote:
> > The point really is, metrics (if not many ) need resolution, not just NICE_0_LOAD does. 
> > You can choose to either hardcode a number, like SCHED_CAPACITY_SHIFT now,
> > or you can use SCHED_RESOLUTION_SHIFT, which is even as simple as a sign to say what
> > the defined is (the scaled one with a better resolution vs. the original one).
> > I guess this is to say we now have a (no-big-deal) resolution system.
> 
> Yes they were chosen for similar reasons, but they are not conceptually
> related, and you couldn't decide to just bump up all the resolutions by
> changing SCHED_RESOLUTION_SHIFT, so doing this would just be misleading.

Yes, it appears they are made seemingly conceptually related. But probably 
it isn't worth a concern, if one knows it is just a scaled integer metric.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
diff mbox

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3445d2fb38f4..b80f799aface 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2644,7 +2644,7 @@  __update_load_avg(u64 now, int cpu, struct sched_avg *sa,
                        cfs_rq->runnable_load_avg =
                                div_u64(cfs_rq->runnable_load_sum, LOAD_AVG_MAX);
                }
-               sa->util_avg = (sa->util_sum << SCHED_LOAD_SHIFT) / LOAD_AVG_MAX;
+               sa->util_avg = (sa->util_sum * scale_load_down(SCHED_LOAD_SCALE)) / LOAD_AVG_MAX;
        }
 
        return decayed;