Message ID | 20171130114723.29210-2-patrick.bellasi@arm.com |
---|---|
State | New |
Series | [v3,1/6] cpufreq: schedutil: reset sg_cpus's flags at IDLE enter |
Hi,

On 30/11/17 11:47, Patrick Bellasi wrote:

[...]

> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>
> sg_cpu->util = util;
> sg_cpu->max = max;
> +
> + /* CPU is entering IDLE, reset flags without triggering an update */
> + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> + sg_cpu->flags = 0;
> + goto done;
> + }

Looks good for now. I'm just thinking about what will happen for DL, as a
CPU that still "has" a sleeping task is not going to be really idle
until the 0-lag time. I guess we could move this to that point in time?

> sg_cpu->flags = flags;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> sugov_update_commit(sg_policy, time, next_f);
> }
>
> +done:
> raw_spin_unlock(&sg_policy->update_lock);
> }
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> put_prev_task(rq, prev);
> update_idle_core(rq);
> schedstat_inc(rq->sched_goidle);
> +
> + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

Don't know if it makes things any cleaner, but you could add to the
comment that we don't actually trigger a frequency update with this
call.

Best,

Juri
On 30-Nov 14:12, Juri Lelli wrote:
> Hi,
>
> On 30/11/17 11:47, Patrick Bellasi wrote:
>
> [...]
>
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..67339ccb5595 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >
> > sg_cpu->util = util;
> > sg_cpu->max = max;
> > +
> > + /* CPU is entering IDLE, reset flags without triggering an update */
> > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > + sg_cpu->flags = 0;
> > + goto done;
> > + }
>
> Looks good for now. I'm just thinking about what will happen for DL, as a
> CPU that still "has" a sleeping task is not going to be really idle
> until the 0-lag time.

AFAIU, for the time being, DL already cannot really rely on this flag
for its behaviors to be correct. Indeed, flags are reset as soon as
a FAIR task wakes up and is enqueued.

Only once your DL integration patches are in, we do not depend on
flags anymore, since DL will report a certain utilization up to the
0-lag time, isn't it?

If that's the case, I would say that the flags will be used only to
jump to the max OPP for RT tasks. Thus, this patch should still be valid.

> I guess we could move this to that point in time?

Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
is notified only by idle tasks. That's the only code path where we are
sure the CPU is entering IDLE.

> > sg_cpu->flags = flags;
> >
> > sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > sugov_update_commit(sg_policy, time, next_f);
> > }
> >
> > +done:
> > raw_spin_unlock(&sg_policy->update_lock);
> > }
> >
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> > put_prev_task(rq, prev);
> > update_idle_core(rq);
> > schedstat_inc(rq->sched_goidle);
> > +
> > + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
>
> Don't know if it makes things any cleaner, but you could add to the
> comment that we don't actually trigger a frequency update with this
> call.

Right, will add on next posting.

> Best,
>
> Juri

Cheers Patrick

--
#include <best/regards.h>
Patrick Bellasi
On 30/11/17 15:41, Patrick Bellasi wrote:
> On 30-Nov 14:12, Juri Lelli wrote:
> > Hi,
> >
> > On 30/11/17 11:47, Patrick Bellasi wrote:
> >
> > [...]
> >
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 2f52ec0f1539..67339ccb5595 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > >
> > > sg_cpu->util = util;
> > > sg_cpu->max = max;
> > > +
> > > + /* CPU is entering IDLE, reset flags without triggering an update */
> > > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > + sg_cpu->flags = 0;
> > > + goto done;
> > > + }
> >
> > Looks good for now. I'm just thinking about what will happen for DL, as a
> > CPU that still "has" a sleeping task is not going to be really idle
> > until the 0-lag time.
>
> AFAIU, for the time being, DL already cannot really rely on this flag
> for its behaviors to be correct. Indeed, flags are reset as soon as
> a FAIR task wakes up and is enqueued.

Right, and your flags ORing patch should help with this.

>
> Only once your DL integration patches are in, we do not depend on
> flags anymore, since DL will report a certain utilization up to the
> 0-lag time, isn't it?

Utilization won't decrease until 0-lag time, correct.

I was just wondering if resetting flags before that time (when a CPU
enters idle) might be an issue.

> If that's the case, I would say that the flags will be used only to
> jump to the max OPP for RT tasks. Thus, this patch should still be valid.
>
> > I guess we could move this to that point in time?
>
> Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> is notified only by idle tasks. That's the only code path where we are
> sure the CPU is entering IDLE.
>

W.r.t. the possible issue above, I was thinking that we might want to
reset flags at 0-lag time for DL (if the CPU is still idle). Anyway, two
distinct sets of patches. Who gets in last will have to ponder the thing
a little bit more. :)

Best,

Juri
On 30-Nov 17:02, Juri Lelli wrote:
> On 30/11/17 15:41, Patrick Bellasi wrote:
> > On 30-Nov 14:12, Juri Lelli wrote:
> > > Hi,
> > >
> > > On 30/11/17 11:47, Patrick Bellasi wrote:
> > >
> > > [...]
> > >
> > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > index 2f52ec0f1539..67339ccb5595 100644
> > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > > >
> > > > sg_cpu->util = util;
> > > > sg_cpu->max = max;
> > > > +
> > > > + /* CPU is entering IDLE, reset flags without triggering an update */
> > > > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > > + sg_cpu->flags = 0;
> > > > + goto done;
> > > > + }
> > >
> > > Looks good for now. I'm just thinking about what will happen for DL, as a
> > > CPU that still "has" a sleeping task is not going to be really idle
> > > until the 0-lag time.
> >
> > AFAIU, for the time being, DL already cannot really rely on this flag
> > for its behaviors to be correct. Indeed, flags are reset as soon as
> > a FAIR task wakes up and is enqueued.
>
> Right, and your flags ORing patch should help with this.
>
> >
> > Only once your DL integration patches are in, we do not depend on
> > flags anymore, since DL will report a certain utilization up to the
> > 0-lag time, isn't it?
>
> Utilization won't decrease until 0-lag time, correct.

Then IMO with your DL patches the DL class doesn't need the flags
anymore, since schedutil will know (and account for) the utilization
required by the DL tasks. Isn't it?

> I was just wondering if resetting flags before that time (when a CPU
> enters idle) might be an issue.

If the above is correct, then flags will be used only for the RT class (and
IO boosting)... and thus this patch will still be useful as it is now:
meaning that once the idle task is selected we do not care anymore
about RT and IO boosting (only).

> > If that's the case, I would say that the flags will be used only to
> > jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> >
> > > I guess we could move this to that point in time?
> >
> > Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> > is notified only by idle tasks. That's the only code path where we are
> > sure the CPU is entering IDLE.
> >
>
> W.r.t. the possible issue above, I was thinking that we might want to
> reset flags at 0-lag time for DL (if the CPU is still idle). Anyway, two
> distinct sets of patches. Who gets in last will have to ponder the thing
> a little bit more. :)

Perhaps I'm still a bit confused but, to me, it seems that with your
patches we completely fix DL but we can still use this exact same
patch just for RT tasks.

> Best,
>
> Juri

--
#include <best/regards.h>
Patrick Bellasi
On 30/11/17 16:19, Patrick Bellasi wrote:
> On 30-Nov 17:02, Juri Lelli wrote:
> > On 30/11/17 15:41, Patrick Bellasi wrote:
> > > On 30-Nov 14:12, Juri Lelli wrote:
> > > > Hi,
> > > >
> > > > On 30/11/17 11:47, Patrick Bellasi wrote:
> > > >
> > > > [...]
> > > >
> > > > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > > > index 2f52ec0f1539..67339ccb5595 100644
> > > > > --- a/kernel/sched/cpufreq_schedutil.c
> > > > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > > > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > > > >
> > > > > sg_cpu->util = util;
> > > > > sg_cpu->max = max;
> > > > > +
> > > > > + /* CPU is entering IDLE, reset flags without triggering an update */
> > > > > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > > > > + sg_cpu->flags = 0;
> > > > > + goto done;
> > > > > + }
> > > >
> > > > Looks good for now. I'm just thinking about what will happen for DL, as a
> > > > CPU that still "has" a sleeping task is not going to be really idle
> > > > until the 0-lag time.
> > >
> > > AFAIU, for the time being, DL already cannot really rely on this flag
> > > for its behaviors to be correct. Indeed, flags are reset as soon as
> > > a FAIR task wakes up and is enqueued.
> >
> > Right, and your flags ORing patch should help with this.
> >
> > >
> > > Only once your DL integration patches are in, we do not depend on
> > > flags anymore, since DL will report a certain utilization up to the
> > > 0-lag time, isn't it?
> >
> > Utilization won't decrease until 0-lag time, correct.
>
> Then IMO with your DL patches the DL class doesn't need the flags
> anymore, since schedutil will know (and account for) the utilization
> required by the DL tasks. Isn't it?
>
> > I was just wondering if resetting flags before that time (when a CPU
> > enters idle) might be an issue.
>
> If the above is correct, then flags will be used only for the RT class (and
> IO boosting)... and thus this patch will still be useful as it is now:
> meaning that once the idle task is selected we do not care anymore
> about RT and IO boosting (only).
>
> > > If that's the case, I would say that the flags will be used only to
> > > jump to the max OPP for RT tasks. Thus, this patch should still be valid.
> > >
> > > > I guess we could move this to that point in time?
> > >
> > > Not sure what you mean here. Right now the new SCHED_CPUFREQ_IDLE flag
> > > is notified only by idle tasks. That's the only code path where we are
> > > sure the CPU is entering IDLE.
> > >
> >
> > W.r.t. the possible issue above, I was thinking that we might want to
> > reset flags at 0-lag time for DL (if the CPU is still idle). Anyway, two
> > distinct sets of patches. Who gets in last will have to ponder the thing
> > a little bit more. :)
>
> Perhaps I'm still a bit confused but, to me, it seems that with your
> patches we completely fix DL but we can still use this exact same
> patch just for RT tasks.

We don't use the flags for bailing out during aggregation, so it should
be ok for DL, yes.

Thanks,

Juri
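As a concrete illustration of the "reset at 0-lag time" idea mentioned above, a minimal sketch is shown below. The placement in the DL 0-lag (inactive) timer and the exact condition are assumptions made only for illustration; this is not part of any posted patch:

    /* Hypothetical: from the DL 0-lag (inactive) timer, if the CPU is
     * still idle, notify schedutil so it can drop the now stale flags.
     */
    if (idle_cpu(cpu_of(rq)))
            cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);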
On 30-11-17, 11:47, Patrick Bellasi wrote:
> diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> index d1ad3d825561..bb5f778db023 100644
> --- a/include/linux/sched/cpufreq.h
> +++ b/include/linux/sched/cpufreq.h
> @@ -11,6 +11,7 @@
> #define SCHED_CPUFREQ_RT (1U << 0)
> #define SCHED_CPUFREQ_DL (1U << 1)
> #define SCHED_CPUFREQ_IOWAIT (1U << 2)
> +#define SCHED_CPUFREQ_IDLE (1U << 3)
>
> #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 2f52ec0f1539..67339ccb5595 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
>
> sg_cpu->util = util;
> sg_cpu->max = max;
> +
> + /* CPU is entering IDLE, reset flags without triggering an update */
> + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> + sg_cpu->flags = 0;
> + goto done;
> + }
> sg_cpu->flags = flags;
>
> sugov_set_iowait_boost(sg_cpu, time, flags);
> @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> sugov_update_commit(sg_policy, time, next_f);
> }
>
> +done:
> raw_spin_unlock(&sg_policy->update_lock);
> }
>
> diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> index d518664cce4f..6e8ae2aa7a13 100644
> --- a/kernel/sched/idle_task.c
> +++ b/kernel/sched/idle_task.c
> @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> put_prev_task(rq, prev);
> update_idle_core(rq);
> schedstat_inc(rq->sched_goidle);
> +
> + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);

We posted some comments on V2 for this particular patch suggesting
some improvements. The patch hasn't changed at all, and you haven't
replied to a few of those suggestions either. Any particular reason for
that?

For example:

- I suggested getting rid of the conditional expression you added in
  cpufreq_schedutil.c.

- And Joel suggested clearing the RT/DL flags in the dequeue path to
  avoid adding the SCHED_CPUFREQ_IDLE flag.

--
viresh
Hi Viresh,

On 07-Dec 10:31, Viresh Kumar wrote:
> On 30-11-17, 11:47, Patrick Bellasi wrote:
> > diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
> > index d1ad3d825561..bb5f778db023 100644
> > --- a/include/linux/sched/cpufreq.h
> > +++ b/include/linux/sched/cpufreq.h
> > @@ -11,6 +11,7 @@
> > #define SCHED_CPUFREQ_RT (1U << 0)
> > #define SCHED_CPUFREQ_DL (1U << 1)
> > #define SCHED_CPUFREQ_IOWAIT (1U << 2)
> > +#define SCHED_CPUFREQ_IDLE (1U << 3)
> >
> > #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 2f52ec0f1539..67339ccb5595 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> >
> > sg_cpu->util = util;
> > sg_cpu->max = max;
> > +
> > + /* CPU is entering IDLE, reset flags without triggering an update */
> > + if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
> > + sg_cpu->flags = 0;
> > + goto done;
> > + }
> > sg_cpu->flags = flags;
> >
> > sugov_set_iowait_boost(sg_cpu, time, flags);
> > @@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
> > sugov_update_commit(sg_policy, time, next_f);
> > }
> >
> > +done:
> > raw_spin_unlock(&sg_policy->update_lock);
> > }
> >
> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> > put_prev_task(rq, prev);
> > update_idle_core(rq);
> > schedstat_inc(rq->sched_goidle);
> > +
> > + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
>
> We posted some comments on V2 for this particular patch suggesting
> some improvements. The patch hasn't changed at all, and you haven't
> replied to a few of those suggestions either. Any particular reason for
> that?

You're right; since the previous posting was a long time ago, with
this one I mainly wanted to refresh the discussion. Thanks for
highlighting hereafter which were the main discussion points.

> For example:
>
> - I suggested getting rid of the conditional expression you added in
>   cpufreq_schedutil.c.

We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
them); however, I think we still need an if condition somewhere.

Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
an OPP change (reasons described in the changelog).

If that's still a goal, then we will need to check this flag and bail
out from sugov_update_shared straight away. That's why I've added a
check at the beginning and also marked it as unlikely, to have no
impact on all the cases where we call a schedutil update with runnable
tasks.

Does this make sense?

> - And Joel suggested clearing the RT/DL flags in the dequeue path to
>   avoid adding the SCHED_CPUFREQ_IDLE flag.

I had a thought about Joel's proposal:

>> wouldn't another way be to just clear the flag from the RT scheduling
>> class with an extra call to cpufreq_update_util with flags = 0 during
>> dequeue_rt_entity?

The main concern for me was that the current API is completely
transparent about which scheduling class is calling schedutil for
updates. Thus, at dequeue time of an RT task we cannot really clear
all the flags (e.g. IOWAIT of a fair task); we should clear only the
RT related flags.

This means that we would likely need to implement Joel's idea by either:

1. adding a new set of flags like SCHED_CPUFREQ_RT_IDLE,
   SCHED_CPUFREQ_DL_IDLE, etc.

2. adding an operation flag, e.g. SCHED_CPUFREQ_SET / SCHED_CPUFREQ_RESET,
   to be ORed with the class flag, e.g.:
   cpufreq_update_util(rq, SCHED_CPUFREQ_SET|SCHED_CPUFREQ_RT);

3. changing the API to carry the operation required for a flag, e.g.:
   cpufreq_update_util(rq, flag, set={true, false});

To be honest I don't like any of those, especially compared to the
simplicity of the one proposed by this patch. :)

IMO, the only pitfall of this patch is that (as Juri pointed out in v2)
for DL it can happen that we do not want to reset the flag right when a
CPU enters IDLE. We need instead a specific call to reset the DL flag at
the 0-lag time.

However, AFAIU, this special case for DL will disappear once we have
Juri's latest set [1] in. Indeed, at that point, schedutil will always
and only need to know the utilization required by DL.

[1] https://lkml.org/lkml/2017/12/4/173

Cheers Patrick

--
#include <best/regards.h>
Patrick Bellasi
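For illustration, a rough sketch of what option 2 above might look like; the bit values, the call sites and the handler logic below are assumptions sketched from the discussion, not taken from any posted patch:

    /* Hypothetical operation bits, ORed with the existing class flags. */
    #define SCHED_CPUFREQ_SET       (1U << 6)
    #define SCHED_CPUFREQ_RESET     (1U << 7)

    /* RT enqueue path: an RT task becomes runnable on this rq. */
    cpufreq_update_util(rq, SCHED_CPUFREQ_SET | SCHED_CPUFREQ_RT);

    /* RT dequeue path: clear only the RT bit, leaving e.g. IOWAIT alone. */
    cpufreq_update_util(rq, SCHED_CPUFREQ_RESET | SCHED_CPUFREQ_RT);

    /* In the schedutil update handler: */
    if (flags & SCHED_CPUFREQ_RESET) {
            sg_cpu->flags &= ~(flags & ~SCHED_CPUFREQ_RESET);
            return;         /* bookkeeping only, no OPP change */
    }
    sg_cpu->flags |= flags & ~SCHED_CPUFREQ_SET;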
On 12/07/2017 01:45 PM, Patrick Bellasi wrote:
> Hi Viresh,
>
> On 07-Dec 10:31, Viresh Kumar wrote:
>> On 30-11-17, 11:47, Patrick Bellasi wrote:

[...]

>> We posted some comments on V2 for this particular patch suggesting
>> some improvements. The patch hasn't changed at all, and you haven't
>> replied to a few of those suggestions either. Any particular reason for
>> that?
>
> You're right; since the previous posting was a long time ago, with
> this one I mainly wanted to refresh the discussion. Thanks for
> highlighting hereafter which were the main discussion points.
>
>> For example:
>>
>> - I suggested getting rid of the conditional expression you added in
>>   cpufreq_schedutil.c.
>
> We can probably set flags to SCHED_CPUFREQ_IDLE (instead of resetting
> them); however, I think we still need an if condition somewhere.
>
> Indeed, when SCHED_CPUFREQ_IDLE is asserted we don't want to trigger
> an OPP change (reasons described in the changelog).
>
> If that's still a goal, then we will need to check this flag and bail
> out from sugov_update_shared straight away. That's why I've added a
> check at the beginning and also marked it as unlikely, to have no
> impact on all the cases where we call a schedutil update with runnable
> tasks.
>
> Does this make sense?

IIRC, there was also the question of doing this not only in the shared
but also in the single case ...

[...]
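For reference, a sketch of what the analogous bail-out might look like in the single-CPU path; this hunk is illustrative only (the surrounding function body is elided) and is not part of the posted series:

    static void sugov_update_single(struct update_util_data *hook, u64 time,
                                    unsigned int flags)
    {
            struct sugov_cpu *sg_cpu = container_of(hook, struct sugov_cpu, update_util);

            /* CPU is entering IDLE: note it, but do not trigger an OPP change. */
            if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
                    sg_cpu->iowait_boost = 0;
                    return;
            }

            /* ... existing single-CPU update path ... */
    }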
Hi Viresh,

On 12/12/17 17:07, Viresh Kumar wrote:

[...]

> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Tue, 12 Dec 2017 15:43:26 +0530
> Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
>
> Currently the schedutil governor overwrites the sg_cpu->flags field on
> every call to the utilization handler. That was fine for the initial
> implementation of the utilization handlers, but it has several
> drawbacks.
>
> The biggest drawback is that the sg_cpu->flags field doesn't always
> represent the correct type of tasks that are enqueued on a CPU's rq. For
> example, if a fair task is enqueued while an RT or DL task is running, we
> will overwrite the flags with value 0 and that may take the CPU to lower
> OPPs unintentionally. There can be other corner cases as well which we
> aren't aware of currently.
>
> This patch changes the current implementation to keep track of all the
> task types that are currently enqueued on the CPU's rq. There are two
> flags for every scheduling class now, one to set the flag and another
> to clear it. The flag is set by the scheduling classes from the existing
> set of calls to cpufreq_update_util(), and the flag is cleared when the
> last task of the scheduling class is dequeued. For now, the util update
> handlers return immediately if they were called to clear the flag.
>
> We can add more optimizations over this patch separately.
>
> The last parameter of sugov_set_iowait_boost() is also dropped as the
> function can get it from sg_cpu anyway.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

[...]

> @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> memset(sg_cpu, 0, sizeof(*sg_cpu));
> sg_cpu->cpu = cpu;
> sg_cpu->sg_policy = sg_policy;
> - sg_cpu->flags = SCHED_CPUFREQ_RT;
> + sg_cpu->flags = 0;
> sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> }

Why this change during initialization?

Thanks,

- Juri
On 12-12-17, 14:38, Juri Lelli wrote:
> Hi Viresh,
>
> On 12/12/17 17:07, Viresh Kumar wrote:
>
> [...]
>
> > From: Viresh Kumar <viresh.kumar@linaro.org>
> > Date: Tue, 12 Dec 2017 15:43:26 +0530
> > Subject: [PATCH] sched: Keep track of cpufreq utilization update flags
> >
> > Currently the schedutil governor overwrites the sg_cpu->flags field on
> > every call to the utilization handler. That was fine for the initial
> > implementation of the utilization handlers, but it has several
> > drawbacks.
> >
> > The biggest drawback is that the sg_cpu->flags field doesn't always
> > represent the correct type of tasks that are enqueued on a CPU's rq. For
> > example, if a fair task is enqueued while an RT or DL task is running, we
> > will overwrite the flags with value 0 and that may take the CPU to lower
> > OPPs unintentionally. There can be other corner cases as well which we
> > aren't aware of currently.
> >
> > This patch changes the current implementation to keep track of all the
> > task types that are currently enqueued on the CPU's rq. There are two
> > flags for every scheduling class now, one to set the flag and another
> > to clear it. The flag is set by the scheduling classes from the existing
> > set of calls to cpufreq_update_util(), and the flag is cleared when the
> > last task of the scheduling class is dequeued. For now, the util update
> > handlers return immediately if they were called to clear the flag.
> >
> > We can add more optimizations over this patch separately.
> >
> > The last parameter of sugov_set_iowait_boost() is also dropped as the
> > function can get it from sg_cpu anyway.
> >
> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> [...]
>
> > @@ -655,7 +669,7 @@ static int sugov_start(struct cpufreq_policy *policy)
> > memset(sg_cpu, 0, sizeof(*sg_cpu));
> > sg_cpu->cpu = cpu;
> > sg_cpu->sg_policy = sg_policy;
> > - sg_cpu->flags = SCHED_CPUFREQ_RT;
> > + sg_cpu->flags = 0;
> > sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
> > }
>
> Why this change during initialization?

Firstly, I am not sure why it was set to SCHED_CPUFREQ_RT, as schedutil
wouldn't change the frequency until the first time the util handler is
called. And once that is called we were updating the flag anyway. So,
unless I misunderstood its purpose, it wasn't doing anything helpful.

I need to remove it, otherwise the RT flag may remain set for a very
long time unnecessarily. That would be until the last RT task is
dequeued. Consider this for example: we are at max freq when
sugov_start() is called and it sets the RT flag, but there is no RT
task to run. Now, we have tons of CFS tasks but we always keep running
at max because of the flag. Even the schedutil RT thread doesn't get a
chance to run/get dequeued, because we never want a freq change with
the RT flag set and so we stay at max.

Makes sense?

--
viresh
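To make the scenario above concrete, the shared-path aggregation of that era looked roughly like the sketch below (paraphrased, not verbatim): any CPU in the policy whose cached flags still carry RT/DL forces the whole frequency domain to max, so an RT bit left set at sugov_start() time would pin the domain at max until it is cleared.

    for_each_cpu(j, policy->cpus) {
            struct sugov_cpu *j_sg_cpu = &sg_policy->sg_cpu[j];

            delta_ns = time - j_sg_cpu->last_update;
            if (delta_ns > TICK_NSEC) {     /* CPU presumed idle, skip it */
                    j_sg_cpu->iowait_boost = 0;
                    continue;
            }

            if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
                    return policy->cpuinfo.max_freq;

            /* ... otherwise aggregate j_sg_cpu->util / j_sg_cpu->max ... */
    }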
On 20-Dec 15:33, Peter Zijlstra wrote:
> On Thu, Nov 30, 2017 at 11:47:18AM +0000, Patrick Bellasi wrote:
> > Currently, sg_cpu's flags are set to the value defined by the last call
> > of the cpufreq_update_util(); for RT/DL classes this corresponds to the
> > SCHED_CPUFREQ_{RT/DL} flags always being set.
> >
> > When multiple CPUs share the same frequency domain it might happen that
> > a CPU which executed an RT task, right before entering IDLE, has one of
> > the SCHED_CPUFREQ_RT_DL flags set, permanently, until it exits IDLE.
> >
> > Although such an idle CPU is _going to be_ ignored by
> > sugov_next_freq_shared():
> > 1. this kind of "useless RT request" is ignored only if more than
> >    TICK_NSEC has elapsed since the last update
> > 2. we can still potentially trigger an already too late switch to
> >    MAX, which also starts a new throttling interval
> > 3. the internal state machine is not consistent with what the
> >    scheduler knows, i.e. the CPU is now actually idle
>
> So I _really_ hate having to clutter the idle path for this shared case
> :/

:) We would like to have per-CPU frequency domains... but the HW guys
always complain that's too costly from an HW/power standpoint... and
they are likely right :-/

So, here we are, just trying hard to have a SW status matching the HW
status... which is just another pain :-/

> 1. can obviously be fixed by short-circuiting the timeout when idle.

Mmm.. right... it should be possible for schedutil to detect that a
certain CPU is currently idle. Can we use core.c::idle_cpu() from
cpufreq_schedutil?

> 2. not sure how, if you do 1; anybody doing a switch will go through
>    sugov_next_freq_shared() which will poll all relevant CPUs and per 1
>    will see it's idle, no?

Right, that should work...

> Not sure what that leaves for 3.

When a CPU is detected idle, perhaps we can still clear the RT flags...
... just for "consistency" of the current status representation.

> > diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
> > index d518664cce4f..6e8ae2aa7a13 100644
> > --- a/kernel/sched/idle_task.c
> > +++ b/kernel/sched/idle_task.c
> > @@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
> > put_prev_task(rq, prev);
> > update_idle_core(rq);
> > schedstat_inc(rq->sched_goidle);
> > +
> > + /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> > + cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
> > +
> > return rq->idle;
> > }
> >
> > --
> > 2.14.1
> >

--
#include <best/regards.h>
Patrick Bellasi
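A minimal sketch of Peter's point 1, assuming idle_cpu() (kernel/sched/core.c) is acceptable to call from the schedutil aggregation loop; this is illustrative only, not a posted patch:

    delta_ns = time - j_sg_cpu->last_update;
    /*
     * Short-circuit the TICK_NSEC timeout: if the remote CPU is idle right
     * now, ignore its possibly stale RT/DL flags instead of waiting for
     * the tick-based timeout to expire.
     */
    if (idle_cpu(j) || delta_ns > TICK_NSEC) {
            j_sg_cpu->iowait_boost = 0;
            continue;
    }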
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d825561..bb5f778db023 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -11,6 +11,7 @@
 #define SCHED_CPUFREQ_RT (1U << 0)
 #define SCHED_CPUFREQ_DL (1U << 1)
 #define SCHED_CPUFREQ_IOWAIT (1U << 2)
+#define SCHED_CPUFREQ_IDLE (1U << 3)

 #define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 2f52ec0f1539..67339ccb5595 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -347,6 +347,12 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,

 sg_cpu->util = util;
 sg_cpu->max = max;
+
+ /* CPU is entering IDLE, reset flags without triggering an update */
+ if (unlikely(flags & SCHED_CPUFREQ_IDLE)) {
+ sg_cpu->flags = 0;
+ goto done;
+ }
 sg_cpu->flags = flags;

 sugov_set_iowait_boost(sg_cpu, time, flags);
@@ -361,6 +367,7 @@ static void sugov_update_shared(struct update_util_data *hook, u64 time,
 sugov_update_commit(sg_policy, time, next_f);
 }

+done:
 raw_spin_unlock(&sg_policy->update_lock);
 }

diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664cce4f..6e8ae2aa7a13 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -30,6 +30,10 @@ pick_next_task_idle(struct rq *rq, struct task_struct *prev, struct rq_flags *rf
 put_prev_task(rq, prev);
 update_idle_core(rq);
 schedstat_inc(rq->sched_goidle);
+
+ /* kick cpufreq (see the comment in kernel/sched/sched.h). */
+ cpufreq_update_util(rq, SCHED_CPUFREQ_IDLE);
+
 return rq->idle;
 }