[RESEND,v9,05/10] sched: make scale_rt invariant with frequency

Message ID 1421316570-23097-6-git-send-email-vincent.guittot@linaro.org
State New

Commit Message

Vincent Guittot Jan. 15, 2015, 10:09 a.m. UTC
The average running time of RT tasks is used to estimate the remaining compute
capacity for CFS tasks. This remaining capacity is the original capacity scaled
down by a factor (aka scale_rt_capacity). This estimation of available capacity
must also be invariant with frequency scaling.

A frequency scaling factor is applied on the running time of the RT tasks for
computing scale_rt_capacity.

In sched_rt_avg_update, we scale the RT execution time like below:
rq->rt_avg += rt_delta * arch_scale_freq_capacity() >> SCHED_CAPACITY_SHIFT

Then, scale_rt_capacity can be summarized by:
scale_rt_capacity = SCHED_CAPACITY_SCALE -
		((rq->rt_avg << SCHED_CAPACITY_SHIFT) / period)

We can optimize by removing the right and left shifts from the computation of
rq->rt_avg and scale_rt_capacity, since the two shifts cancel each other out.
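
To make the cancellation concrete, here is a small user-space sketch (not
kernel code) that compares the shifted and unshifted forms. The helper names
are made up for illustration, SCHED_CAPACITY_SHIFT is assumed to be 10, and
the saturation to 1 mirrors what the patch does when the RT share reaches
full scale. The two forms can differ by rounding, because the shifted form
truncates on every update.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the kernel defines its own constants. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)

/* Shifted form: scale the RT delta down when accumulating... */
static uint64_t rt_avg_add_shifted(uint64_t rt_avg, uint64_t rt_delta,
				   unsigned long freq_cap)
{
	return rt_avg + ((rt_delta * freq_cap) >> SCHED_CAPACITY_SHIFT);
}

/* ...and scale the average back up when computing the capacity. */
static unsigned long rt_capacity_shifted(uint64_t rt_avg, uint64_t period)
{
	uint64_t used;

	assert(period != 0);
	used = (rt_avg << SCHED_CAPACITY_SHIFT) / period;
	if (used < SCHED_CAPACITY_SCALE)
		return (unsigned long)(SCHED_CAPACITY_SCALE - used);
	return 1;
}

/* Unshifted form: accumulate the scaled delta as is... */
static uint64_t rt_avg_add_plain(uint64_t rt_avg, uint64_t rt_delta,
				 unsigned long freq_cap)
{
	return rt_avg + rt_delta * freq_cap;
}

/* ...so a single division already yields a value in capacity units. */
static unsigned long rt_capacity_plain(uint64_t rt_avg, uint64_t period)
{
	uint64_t used;

	assert(period != 0);
	used = rt_avg / period;
	if (used < SCHED_CAPACITY_SCALE)
		return (unsigned long)(SCHED_CAPACITY_SCALE - used);
	return 1;
}
```

With an RT delta of 250000 at half frequency (factor 512) and a period of
1000000, both forms report a remaining capacity of 896.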

The call to arch_scale_freq_capacity in the RT scheduling path might be
a concern for RT folks, because I'm not sure whether we can rely on
arch_scale_freq_capacity to be short and efficient.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
---
 kernel/sched/fair.c  | 17 +++++------------
 kernel/sched/sched.h |  4 +++-
 2 files changed, 8 insertions(+), 13 deletions(-)

Comments

Vincent Guittot Feb. 24, 2015, 10:21 a.m. UTC | #1
On 19 February 2015 at 18:18, Morten Rasmussen <morten.rasmussen@arm.com> wrote:
> On Thu, Feb 19, 2015 at 04:52:41PM +0000, Peter Zijlstra wrote:
>> On Thu, Jan 15, 2015 at 11:09:25AM +0100, Vincent Guittot wrote:
>> > The average running time of RT tasks is used to estimate the remaining compute
>> > capacity for CFS tasks. This remaining capacity is the original capacity scaled
>> > down by a factor (aka scale_rt_capacity). This estimation of available capacity
>> > must also be invariant with frequency scaling.
>> >
>> > A frequency scaling factor is applied on the running time of the RT tasks for
>> > computing scale_rt_capacity.
>> >
>> > In sched_rt_avg_update, we scale the RT execution time like below:
>> > rq->rt_avg += rt_delta * arch_scale_freq_capacity() >> SCHED_CAPACITY_SHIFT
>> >
>> > Then, scale_rt_capacity can be summarized by:
>> > scale_rt_capacity = SCHED_CAPACITY_SCALE -
>> >             ((rq->rt_avg << SCHED_CAPACITY_SHIFT) / period)
>> >
>> > We can optimize by removing right and left shift in the computation of rq->rt_avg
>> > and scale_rt_capacity
>>
>> So far so good..
>>
>> > The call to arch_scale_freq_capacity in the rt scheduling path might be
>> > a concern for RT folks because I'm not sure whether we can rely on
>> > arch_scale_freq_capacity to be short and efficient ?
>>
>> No, that is, arch_scale_freq_capacity() _must_ be short and
>> efficient, even for the fair class; it's called in very hot paths.
>
> ... and very frequently too.
>
>> I think we've talked about this before; this function should basically
>> only return a cached value, which is periodically updated through some
>> means.
>
> Agreed. I think it is reasonable to assume that the arch code
> implementing arch_scale_freq_capacity() does its best to make it fast
> for the particular architecture. Since the scaling factor to be returned
> by the function may be obtained in different ways for different
> architectures, the caching should be done on the arch side.
>
>> But let's see, I've yet to see an actual implementation of it; and it's
>> got that sd argument, curious what you're going to do with that.
>
> So we do have an RFC implementation for ARM already which I posted in
> December and is also included in the rather large RFC posting I did some
> weeks ago. That one basically reads two atomic variables and returns the
> ratio between the two. I have yet to benchmark how horribly expensive it
> is though. The sd argument is ignored. We might actually not need it at
> all?

For consistency across all arch_scale_xx_capacity functions, I would prefer
to keep the same prototype interface (struct sched_domain *sd, int cpu),
even if sd is not used for now.

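
To illustrate the "only return a cached value" idea discussed above, here is
a rough user-space sketch using C11 atomics. It is not the ARM RFC
implementation: sketch_set_freq_scale(), the per-CPU array and
NR_CPUS_SKETCH are hypothetical names, and SCHED_CAPACITY_SHIFT is assumed
to be 10. The point is only that the hot path does a single relaxed load
while the division happens on the slow path, and that the sd argument can
stay in the prototype without being used.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the kernel defines its own constants. */
#define SCHED_CAPACITY_SHIFT 10
#define SCHED_CAPACITY_SCALE (1UL << SCHED_CAPACITY_SHIFT)
#define NR_CPUS_SKETCH 4	/* hypothetical, for the sketch only */

struct sched_domain;		/* opaque here, the sketch never looks inside */

/* Hypothetical per-CPU cache; 0 means "never updated". */
static _Atomic unsigned long cached_freq_scale[NR_CPUS_SKETCH];

/*
 * Slow path (hypothetical): would be called whenever the frequency of
 * @cpu changes, so the division is kept out of the scheduler paths.
 */
static void sketch_set_freq_scale(int cpu, unsigned long cur_khz,
				  unsigned long max_khz)
{
	unsigned long scale = SCHED_CAPACITY_SCALE;

	if (max_khz && cur_khz < max_khz)
		scale = (unsigned long)(((uint64_t)cur_khz <<
					 SCHED_CAPACITY_SHIFT) / max_khz);

	atomic_store_explicit(&cached_freq_scale[cpu], scale,
			      memory_order_relaxed);
}

/* Hot path: one relaxed load, sd kept only for prototype consistency. */
static unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu)
{
	unsigned long scale;

	(void)sd;
	scale = atomic_load_explicit(&cached_freq_scale[cpu],
				     memory_order_relaxed);

	/* Fall back to full scale until the first update has happened. */
	return scale ? scale : SCHED_CAPACITY_SCALE;
}
```

For instance, after sketch_set_freq_scale(0, 600000, 1200000) the call
arch_scale_freq_capacity(NULL, 0) returns 512, i.e. half of
SCHED_CAPACITY_SCALE.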

> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Patch

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index a5039da..b37c27b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5785,7 +5785,7 @@  unsigned long __weak arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
 static unsigned long scale_rt_capacity(int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-	u64 total, available, age_stamp, avg;
+	u64 total, used, age_stamp, avg;
 	s64 delta;
 
 	/*
@@ -5801,19 +5801,12 @@  static unsigned long scale_rt_capacity(int cpu)
 
 	total = sched_avg_period() + delta;
 
-	if (unlikely(total < avg)) {
-		/* Ensures that capacity won't end up being negative */
-		available = 0;
-	} else {
-		available = total - avg;
-	}
+	used = div_u64(avg, total);
 
-	if (unlikely((s64)total < SCHED_CAPACITY_SCALE))
-		total = SCHED_CAPACITY_SCALE;
+	if (likely(used < SCHED_CAPACITY_SCALE))
+		return SCHED_CAPACITY_SCALE - used;
 
-	total >>= SCHED_CAPACITY_SHIFT;
-
-	return div_u64(available, total);
+	return 1;
 }
 
 static void update_cpu_capacity(struct sched_domain *sd, int cpu)
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c34bd11..fc5b152 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1312,9 +1312,11 @@  static inline int hrtick_enabled(struct rq *rq)
 
 #ifdef CONFIG_SMP
 extern void sched_avg_update(struct rq *rq);
+extern unsigned long arch_scale_freq_capacity(struct sched_domain *sd, int cpu);
+
 static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
 {
-	rq->rt_avg += rt_delta;
+	rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
 	sched_avg_update(rq);
 }
 #else