Message ID | 639aad743bac7f3292146738f44dbd1480169c8e.1488437503.git.viresh.kumar@linaro.org
---|---
State | New
Series | [1/3] cpufreq: schedutil: move cached_raw_freq to struct sugov_policy
On Thursday, March 02, 2017 02:03:22 PM Viresh Kumar wrote:
> The same code is present both within and outside the loop and it doesn't
> look like it provides any additional benefit.

Well, not quite. This is on purpose.

Note the "if (j == smp_processor_id())" condition within the loop and think
about how the current CPU is taken into account. :-)

Thanks,
Rafael
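For context, this is roughly the shape of sugov_next_freq_shared() being
discussed — an abridged reconstruction from the "-" lines of the diffs later
in this thread, trimmed for illustration rather than verbatim kernel source:

	/*
	 * Abridged reconstruction of the v4.10-era sugov_next_freq_shared(),
	 * pieced together from the removed lines of the diffs below.
	 * The current CPU is handled before the loop and skipped inside it.
	 */
	static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
						   unsigned long util, unsigned long max,
						   unsigned int flags)
	{
		struct sugov_policy *sg_policy = sg_cpu->sg_policy;
		struct cpufreq_policy *policy = sg_policy->policy;
		unsigned int max_f = policy->cpuinfo.max_freq;
		unsigned int j;

		/* Current CPU handled here, before the loop ever starts... */
		if (flags & SCHED_CPUFREQ_RT_DL)
			return max_f;

		sugov_iowait_boost(sg_cpu, &util, &max);

		for_each_cpu(j, policy->cpus) {
			struct sugov_cpu *j_sg_cpu;

			/* ...which is why the loop skips it. */
			if (j == smp_processor_id())
				continue;

			j_sg_cpu = &per_cpu(sugov_cpu, j);
			/* The same RT/DL check, repeated for the remote CPUs: */
			if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
				return max_f;

			/* ... aggregate j_sg_cpu->util / j_sg_cpu->max into util/max ... */
		}

		return get_next_freq(sg_policy, util, max);
	}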
On Saturday, March 04, 2017 01:03:17 AM Rafael J. Wysocki wrote:
> On Thursday, March 02, 2017 02:03:22 PM Viresh Kumar wrote:
> > The same code is present both within and outside the loop and it doesn't
> > look like it provides any additional benefit.
>
> Well, not quite. This is on purpose.
>
> Note the "if (j == smp_processor_id())" condition within the loop and think
> about how the current CPU is taken into account. :-)

Ah OK, you did that, sorry.

So one idea is that if SCHED_CPUFREQ_RT_DL is set in flags, we don't even
need to start the loop which is quite a cost to simply notice that there's
nothing to do.

Also I don't quite agree with adding an extra pair of integer multiplications
to that loop just to get rid of the extra args. That aside from chasing extra
pointers, of course.

Thanks,
Rafael
On Mon, Mar 6, 2017 at 5:45 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 04-03-17, 01:11, Rafael J. Wysocki wrote:
>> So one idea is that if SCHED_CPUFREQ_RT_DL is set in flags, we don't even
>> need to start the loop which is quite a cost to simply notice that there's
>> nothing to do.
>
> Hmm. Isn't the probability of this flag being set the same for all CPUs in
> the policy?

No, I don't think so.

> If yes, then why do we need to handle the current CPU specially?

We don't need to chase a pointer to get to the flags for the current CPU
(and the same goes for util and max), and what if it is the last one in the
policy->cpus mask?

>> Also I don't quite agree with adding an extra pair of integer multiplications
>> to that loop just to get rid of the extra args.
>
> But that should be cheap enough as we would be multiplying with 1 in one of
> them and with 0 in the other.

I'm not sure it will be really that cheap.

> Isn't that better than keeping the same code in two places?

Well, it isn't IMO, unless you have numbers to support your point.

> Also, as I mentioned in the commit log, the number of extra comparisons for
> the current CPU will be balanced if we have three CPUs in the policy, and
> with every other CPU in the policy we will end up doing one comparison less.
> With quad-core policies, we reduce the number of comparisons by 1 and for
> octa-core ones, we reduce it by 5.

So to the point, the code was written this way on purpose and not just by
accident as your changelog suggests, and if you want to change it, you need
numbers.

Thanks,
Rafael
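The "integer multiplications" here refer to the cross-multiplied utilization
comparison inside the loop, which the patch makes execute for the current CPU
as well. A sketch of that line, assuming the surrounding schedutil code of
this era (it is unchanged context, so it is not visible in the diffs below):

	/*
	 * Compares the ratios j_util/j_max and util/max by cross-multiplication.
	 * With the patch, util = 0 and max = 1 initially, so the first pass
	 * evaluates j_util * 1 > j_max * 0 -- Viresh's "multiplying with 1 in
	 * one of them and with 0 in the other".
	 */
	if (j_util * max > j_max * util) {
		util = j_util;
		max = j_max;
	}

The comparison counting works out as in the changelog at the end of the
thread: letting the loop handle the current CPU adds two extra comparisons
for it but drops one j == smp_processor_id() check for every other CPU, so a
three-CPU policy breaks even, a quad-core policy saves one comparison, and an
octa-core policy saves five.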
On 06-03-17, 13:24, Rafael J. Wysocki wrote:
> On Mon, Mar 6, 2017 at 5:45 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > On 04-03-17, 01:11, Rafael J. Wysocki wrote:
> >> So one idea is that if SCHED_CPUFREQ_RT_DL is set in flags, we don't even
> >> need to start the loop which is quite a cost to simply notice that there's
> >> nothing to do.
> >
> > Hmm. Isn't the probability of this flag being set the same for all CPUs in
> > the policy?
>
> No, I don't think so.

Why do you think so? I thought all CPUs in the policy can have the RT/DL flag
set and the probability is just the same for all of them.

> So to the point, the code was written this way on purpose and not just
> by accident as your changelog suggests and

I didn't want to convey that really, and I knew that it was written on purpose.

> if you want to change it, you need numbers.

What kind of numbers can we get for such a change? I tried to take the running
average of the time it takes to execute this routine over 10000 samples, but
it varies a lot even with the same build. Tests like hackbench, etc. wouldn't
be of any help either.

--
viresh
On Tue, Mar 7, 2017 at 11:31 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 06-03-17, 13:24, Rafael J. Wysocki wrote:
>> On Mon, Mar 6, 2017 at 5:45 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> > On 04-03-17, 01:11, Rafael J. Wysocki wrote:
>> >> So one idea is that if SCHED_CPUFREQ_RT_DL is set in flags, we don't even
>> >> need to start the loop which is quite a cost to simply notice that there's
>> >> nothing to do.
>> >
>> > Hmm. Isn't the probability of this flag being set the same for all CPUs in
>> > the policy?
>>
>> No, I don't think so.
>
> Why do you think so? I thought all CPUs in the policy can have the RT/DL flag
> set and the probability is just the same for all of them.

Well, yes, but if the current CPU has that flag set already, we surely don't
need to check the other ones in the policy?

>> So to the point, the code was written this way on purpose and not just
>> by accident as your changelog suggests and
>
> I didn't want to convey that really, and I knew that it was written on purpose.
>
>> if you want to change it, you need numbers.
>
> What kind of numbers can we get for such a change? I tried to take the running
> average of the time it takes to execute this routine over 10000 samples, but
> it varies a lot even with the same build. Tests like hackbench, etc. wouldn't
> be of any help either.

So why do you think it needs to be changed, but really?

Is that because it is particularly hard to follow or similar?

Thanks,
Rafael
On 07-03-17, 14:19, Rafael J. Wysocki wrote:
> On Tue, Mar 7, 2017 at 11:31 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > Why do you think so? I thought all CPUs in the policy can have the RT/DL
> > flag set and the probability is just the same for all of them.
>
> Well, yes, but if the current CPU has that flag set already, we surely
> don't need to check the other ones in the policy?

That's true for every other CPU in the policy too..

> >> So to the point, the code was written this way on purpose and not just
> >> by accident as your changelog suggests and
> >
> > I didn't want to convey that really, and I knew that it was written on
> > purpose.
> >
> >> if you want to change it, you need numbers.
> >
> > What kind of numbers can we get for such a change? I tried to take the
> > running average of the time it takes to execute this routine over 10000
> > samples, but it varies a lot even with the same build. Tests like
> > hackbench, etc. wouldn't be of any help either.
>
> So why do you think it needs to be changed, but really?
>
> Is that because it is particularly hard to follow or similar?

Just that I didn't like keeping the same code in two places (outside and
inside the loop) given the benefit it has.

Anyway, it's not straightforward to get any numbers supporting my argument. I
can claim improvement only theoretically, by comparing the number of
comparisons that we may end up doing for quad or octa core policies. Let's
abandon this patch as I failed to convince you :)

Thanks for applying the other two patches though.

Cheers.

--
viresh
On Wed, Mar 8, 2017 at 5:18 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 07-03-17, 14:19, Rafael J. Wysocki wrote:
>> On Tue, Mar 7, 2017 at 11:31 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> > Why do you think so? I thought all CPUs in the policy can have the RT/DL
>> > flag set and the probability is just the same for all of them.
>>
>> Well, yes, but if the current CPU has that flag set already, we surely
>> don't need to check the other ones in the policy?
>
> That's true for every other CPU in the policy too..

Not exactly.

The flags value for the current CPU is in a hot cache line already (if not in
a register) and it is not necessary to chase a pointer (and possibly fetch a
new cache line) to get to it. That also applies to util and max for the
current CPU, but the benefit here is debatable.

>> >> So to the point, the code was written this way on purpose and not just
>> >> by accident as your changelog suggests and
>> >
>> > I didn't want to convey that really, and I knew that it was written on
>> > purpose.
>> >
>> >> if you want to change it, you need numbers.
>> >
>> > What kind of numbers can we get for such a change? I tried to take the
>> > running average of the time it takes to execute this routine over 10000
>> > samples, but it varies a lot even with the same build. Tests like
>> > hackbench, etc. wouldn't be of any help either.
>>
>> So why do you think it needs to be changed, but really?
>>
>> Is that because it is particularly hard to follow or similar?
>
> Just that I didn't like keeping the same code in two places (outside and
> inside the loop) given the benefit it has.

So there are two things here, the flags check and the invocation of
sugov_iowait_boost() for the current CPU.

I claim that the flags check is a clear benefit due to what I said above.

The other thing is a way to initialize util and max to sensible values. It
also can be done the way you did it and that change should not affect the
execution time.

So overall, maybe you can move the flags check to sugov_update_shared(), so
that you don't need to pass flags to sugov_next_freq_shared(), and then do
what you did to util and max.

But that would be a 4.12 change anyway.

> Anyway, it's not straightforward to get any numbers supporting my argument.
> I can claim improvement only theoretically, by comparing the number of
> comparisons that we may end up doing for quad or octa core policies. Let's
> abandon this patch as I failed to convince you :)
>
> Thanks for applying the other two patches though.

No problem.

Thanks,
Rafael
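Distilled, the suggestion is a call-site check like the following — a minimal
sketch only; the confirming patch below is the authoritative version:

	/* In sugov_update_shared(): handle the RT/DL case before calling the
	 * helper, so the loop is never entered for it and the flags argument
	 * to sugov_next_freq_shared() can be dropped. */
	if (sugov_should_update_freq(sg_policy, time)) {
		if (flags & SCHED_CPUFREQ_RT_DL)
			next_f = sg_policy->policy->cpuinfo.max_freq;
		else
			next_f = sugov_next_freq_shared(sg_cpu);

		sugov_update_commit(sg_policy, time, next_f);
	}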
On 08-03-17, 11:50, Rafael J. Wysocki wrote:
> So overall, maybe you can move the flags check to
> sugov_update_shared(), so that you don't need to pass flags to
> sugov_next_freq_shared(), and then do what you did to util and max.

Just to confirm, below is what you are suggesting?

> But that would be a 4.12 change anyway.

Sure.

-------------------------8<-------------------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 78468aa051ab..f5ffe241812e 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -217,30 +217,19 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
-static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
-					   unsigned long util, unsigned long max,
-					   unsigned int flags)
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
-	unsigned int max_f = policy->cpuinfo.max_freq;
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
+	unsigned long util = 0, max = 1;
 	unsigned int j;
 
-	if (flags & SCHED_CPUFREQ_RT_DL)
-		return max_f;
-
-	sugov_iowait_boost(sg_cpu, &util, &max);
-
 	for_each_cpu(j, policy->cpus) {
-		struct sugov_cpu *j_sg_cpu;
+		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j == smp_processor_id())
-			continue;
-
-		j_sg_cpu = &per_cpu(sugov_cpu, j);
 		/*
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
@@ -254,7 +243,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
 			continue;
 		}
 		if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
-			return max_f;
+			return policy->cpuinfo.max_freq;
 
 		j_util = j_sg_cpu->util;
 		j_max = j_sg_cpu->max;
@@ -289,7 +278,11 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq_shared(sg_cpu, util, max, flags);
+		if (flags & SCHED_CPUFREQ_RT_DL)
+			next_f = sg_policy->policy->cpuinfo.max_freq;
+		else
+			next_f = sugov_next_freq_shared(sg_cpu);
+
 		sugov_update_commit(sg_policy, time, next_f);
 	}
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 570925ea7253..ec56537429a8 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -216,30 +216,20 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	sugov_update_commit(sg_policy, time, next_f);
 }
 
-static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu,
-					   unsigned long util, unsigned long max,
-					   unsigned int flags)
+static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu)
 {
 	struct sugov_policy *sg_policy = sg_cpu->sg_policy;
 	struct cpufreq_policy *policy = sg_policy->policy;
 	unsigned int max_f = policy->cpuinfo.max_freq;
 	u64 last_freq_update_time = sg_policy->last_freq_update_time;
+	unsigned long util = 0, max = 1;
 	unsigned int j;
 
-	if (flags & SCHED_CPUFREQ_RT_DL)
-		return max_f;
-
-	sugov_iowait_boost(sg_cpu, &util, &max);
-
 	for_each_cpu(j, policy->cpus) {
-		struct sugov_cpu *j_sg_cpu;
+		struct sugov_cpu *j_sg_cpu = &per_cpu(sugov_cpu, j);
 		unsigned long j_util, j_max;
 		s64 delta_ns;
 
-		if (j == smp_processor_id())
-			continue;
-
-		j_sg_cpu = &per_cpu(sugov_cpu, j);
 		/*
 		 * If the CPU utilization was last updated before the previous
 		 * frequency update and the time elapsed between the last update
@@ -288,7 +278,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 	sg_cpu->last_update = time;
 
 	if (sugov_should_update_freq(sg_policy, time)) {
-		next_f = sugov_next_freq_shared(sg_cpu, util, max, flags);
+		next_f = sugov_next_freq_shared(sg_cpu);
 		sugov_update_commit(sg_policy, time, next_f);
 	}
The same code is present both within and outside the loop and it doesn't
look like it provides any additional benefit. Remove the special handling
of sg_cpu and let it happen within the loop.

With this change we will do two extra comparisons for the sg_cpu in the
loop, but the loop will do one less comparison for every other CPU in the
policy.

While at it, also remove the excess parameters of sugov_next_freq_shared().

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 kernel/sched/cpufreq_schedutil.c | 18 ++++--------------
 1 file changed, 4 insertions(+), 14 deletions(-)

--
2.7.1.410.g6faf27b