diff mbox series

[RFD,3/5] sched/cpufreq_schedutil: make worker kthread be SCHED_DEADLINE

Message ID 20170324140900.7334-4-juri.lelli@arm.com
State New
Headers show
Series SCHED_DEADLINE freq/cpu invariance and OPP selection | expand

Commit Message

Juri Lelli March 24, 2017, 2:08 p.m. UTC
Worker kthread needs to be able to change frequency for all other
threads.

Make it special, just under STOP class.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Claudio Scordino <claudio@evidence.eu.com>
---
 include/linux/sched.h            |  1 +
 include/uapi/linux/sched.h       |  1 +
 kernel/sched/core.c              | 19 +++++++++++++++++--
 kernel/sched/cpufreq_schedutil.c | 15 ++++++++++++---
 kernel/sched/deadline.c          |  6 ++++++
 kernel/sched/sched.h             |  8 +++++++-
 6 files changed, 44 insertions(+), 6 deletions(-)

-- 
2.10.0

Comments

Peter Zijlstra March 27, 2017, 4:50 p.m. UTC | #1
On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:
> Worker kthread needs to be able to change frequency for all other

> threads.

> 

> Make it special, just under STOP class.


*yuck* ;-)

So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this
'soecial' task will need to boost it. Now add BWI to your thinking and
shudder.


On IRC broonie mentioned that:

 - most PMIC operations are fire and forget (no need to wait for a
   response).
 - PMIC 'packets' are 'small'.
 - SPI has the possibility to push stuff on the queue.

Taken together this seems to suggest we can rework cpufreq drivers to
function in-context, either directly push the packet on the bus if
available, or queue it and let whoever owns it sort it without blocking.

It might be possible to rework/augment I2C to also support pushing stuff
on a queue.


So if we can make all that work, we can do away with this horrible
horrible kthread. Which is, IMO, a much better solution.

Thoughts?
Rafael J. Wysocki March 27, 2017, 5:05 p.m. UTC | #2
On Monday, March 27, 2017 06:01:34 PM Juri Lelli wrote:
> On 27/03/17 18:50, Peter Zijlstra wrote:

> > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:

> > > Worker kthread needs to be able to change frequency for all other

> > > threads.

> > > 

> > > Make it special, just under STOP class.

> > 

> > *yuck* ;-)

> > 

> 

> Eh, I know. :/

> 

> > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this

> > 'soecial' task will need to boost it. Now add BWI to your thinking and

> > shudder.

> > 

> 

> Currently that kthread is FIFO already, so boosting still applies. Not as

> bad as in the BWI case though. More thinking required.

> 

> > 

> > On IRC broonie mentioned that:

> > 

> >  - most PMIC operations are fire and forget (no need to wait for a

> >    response).

> >  - PMIC 'packets' are 'small'.

> >  - SPI has the possibility to push stuff on the queue.

> > 

> > Taken together this seems to suggest we can rework cpufreq drivers to

> > function in-context, either directly push the packet on the bus if

> > available, or queue it and let whoever owns it sort it without blocking.

> > 

> > It might be possible to rework/augment I2C to also support pushing stuff

> > on a queue.

> > 

> > 

> > So if we can make all that work, we can do away with this horrible

> > horrible kthread. Which is, IMO, a much better solution.

> > 

> > Thoughts?

> 

> Right. This is more a schedutil (cpufreq) problem though, IMHO. Even if

> I agree that what you are proposing is way more clean (and here I

> actually assume it's feasible at all), I fear it will take quite some

> time to get reworked.


Why do you think so?

Thanks,
Rafael
Rafael J. Wysocki March 27, 2017, 5:37 p.m. UTC | #3
On Monday, March 27, 2017 06:13:36 PM Juri Lelli wrote:
> On 27/03/17 19:05, Rafael J. Wysocki wrote:

> > On Monday, March 27, 2017 06:01:34 PM Juri Lelli wrote:

> > > On 27/03/17 18:50, Peter Zijlstra wrote:

> > > > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:

> > > > > Worker kthread needs to be able to change frequency for all other

> > > > > threads.

> > > > > 

> > > > > Make it special, just under STOP class.

> > > > 

> > > > *yuck* ;-)

> > > > 

> > > 

> > > Eh, I know. :/

> > > 

> > > > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this

> > > > 'soecial' task will need to boost it. Now add BWI to your thinking and

> > > > shudder.

> > > > 

> > > 

> > > Currently that kthread is FIFO already, so boosting still applies. Not as

> > > bad as in the BWI case though. More thinking required.

> > > 

> > > > 

> > > > On IRC broonie mentioned that:

> > > > 

> > > >  - most PMIC operations are fire and forget (no need to wait for a

> > > >    response).

> > > >  - PMIC 'packets' are 'small'.

> > > >  - SPI has the possibility to push stuff on the queue.

> > > > 

> > > > Taken together this seems to suggest we can rework cpufreq drivers to

> > > > function in-context, either directly push the packet on the bus if

> > > > available, or queue it and let whoever owns it sort it without blocking.

> > > > 

> > > > It might be possible to rework/augment I2C to also support pushing stuff

> > > > on a queue.

> > > > 

> > > > 

> > > > So if we can make all that work, we can do away with this horrible

> > > > horrible kthread. Which is, IMO, a much better solution.

> > > > 

> > > > Thoughts?

> > > 

> > > Right. This is more a schedutil (cpufreq) problem though, IMHO. Even if

> > > I agree that what you are proposing is way more clean (and here I

> > > actually assume it's feasible at all), I fear it will take quite some

> > > time to get reworked.

> > 

> > Why do you think so?

> > 

> 

> It simply seemed a major rework to me. :)


OK

So there are two pieces here.

One is that if we want *all* drivers to work with schedutil, we need to keep
the kthread for the ones that will never be reworked (because nobody cares
etc).  But then perhaps the kthread implementation may be left alone (because
nobody cares etc).

The second one is that there are drivers operating in-context that work with
schedutil already, so I don't see major obstacles to making more drivers work
that way.  That would be only a matter of reworking the drivers in question.

Thanks,
Rafael
Mark Brown March 27, 2017, 6:09 p.m. UTC | #4
On Mon, Mar 27, 2017 at 07:37:39PM +0200, Rafael J. Wysocki wrote:

> One is that if we want *all* drivers to work with schedutil, we need to keep

> the kthread for the ones that will never be reworked (because nobody cares

> etc).  But then perhaps the kthread implementation may be left alone (because

> nobody cares etc).


A very large proportion of the affected code (at least once you get into
the realm of talking to the PMICs) is shared in core code in various
subsystems rather than in individual drivers so the problem might be
tractable.
Peter Zijlstra March 28, 2017, 10:20 a.m. UTC | #5
On Tue, Mar 28, 2017 at 11:29:31AM +0200, Vincent Guittot wrote:
> On 27 March 2017 at 18:50, Peter Zijlstra <peterz@infradead.org> wrote:

> > On Fri, Mar 24, 2017 at 02:08:58PM +0000, Juri Lelli wrote:

> >> Worker kthread needs to be able to change frequency for all other

> >> threads.

> >>

> >> Make it special, just under STOP class.

> >

> > *yuck* ;-)

> >

> > So imagine our I2C/SPI bus is 'busy' and its mutex taken, then this

> > 'soecial' task will need to boost it. Now add BWI to your thinking and

> > shudder.

> >

> >

> > On IRC broonie mentioned that:

> >

> >  - most PMIC operations are fire and forget (no need to wait for a

> >    response).

> >  - PMIC 'packets' are 'small'.

> >  - SPI has the possibility to push stuff on the queue.

> >

> > Taken together this seems to suggest we can rework cpufreq drivers to

> > function in-context, either directly push the packet on the bus if

> > available, or queue it and let whoever owns it sort it without blocking.

> >

> > It might be possible to rework/augment I2C to also support pushing stuff

> > on a queue.

> 

> But sending new voltage value to PMIC is only part of the sequence.

> When cpufreq set a new opp, it does

> 

> -set new voltage

> -wait for the voltage to settle down.

> -set the new clock frequency

> 

> you can even have to switch to an intermediate clock source.

> 

> When such sequence is managed by the kernel, we can't easily git ride

> of a kthread


That stinks :-(

The whole blocking and bandwidth inheritance stuff gives me nightmares.
Makes all this stuff almost impossible to analyse.

Esp. if the bus (SPI/I2C) is shared with some other 'expensive' device
like a DSP or MMC flash crud.
Rafael J. Wysocki March 30, 2017, 8:22 p.m. UTC | #6
On Thursday, March 30, 2017 08:50:11 AM Vikram Mulukutla wrote:
> 

> > OK

> > 

> > So there are two pieces here.

> > 

> > One is that if we want *all* drivers to work with schedutil, we need to 

> > keep

> > the kthread for the ones that will never be reworked (because nobody 

> > cares

> > etc).  But then perhaps the kthread implementation may be left alone 

> > (because

> > nobody cares etc).

> > 

> > The second one is that there are drivers operating in-context that work 

> > with

> > schedutil already, so I don't see major obstacles to making more 

> > drivers work

> > that way.  That would be only a matter of reworking the drivers in 

> > question.

> > 

> > Thanks,

> > Rafael

> 

> There are some MSM platforms that do need a kthread and would love to 

> use

> schedutil. This is all mainly due to the point that Vincent raised; 

> having

> to actually wait for voltage transitions before clock switches. I can't

> speak about the future, but that's the situation right now. Leaving the

> kthread alone for now would be appreciated!


I was not arguing for removing the kthread (quite opposite rather).

My point was that *if* it is viable to rework drivers to operate in-context,
that would be the way to go IMO instead of messing up with the kthread thing.

Thanks,
Rafael
Juri Lelli March 31, 2017, 7:26 a.m. UTC | #7
On 30/03/17 22:22, Rafael J. Wysocki wrote:
> On Thursday, March 30, 2017 08:50:11 AM Vikram Mulukutla wrote:

> > 

> > > OK

> > > 

> > > So there are two pieces here.

> > > 

> > > One is that if we want *all* drivers to work with schedutil, we need to 

> > > keep

> > > the kthread for the ones that will never be reworked (because nobody 

> > > cares

> > > etc).  But then perhaps the kthread implementation may be left alone 

> > > (because

> > > nobody cares etc).

> > > 

> > > The second one is that there are drivers operating in-context that work 

> > > with

> > > schedutil already, so I don't see major obstacles to making more 

> > > drivers work

> > > that way.  That would be only a matter of reworking the drivers in 

> > > question.

> > > 

> > > Thanks,

> > > Rafael

> > 

> > There are some MSM platforms that do need a kthread and would love to 

> > use

> > schedutil. This is all mainly due to the point that Vincent raised; 

> > having

> > to actually wait for voltage transitions before clock switches. I can't

> > speak about the future, but that's the situation right now. Leaving the

> > kthread alone for now would be appreciated!

> 

> I was not arguing for removing the kthread (quite opposite rather).

> 

> My point was that *if* it is viable to rework drivers to operate in-context,

> that would be the way to go IMO instead of messing up with the kthread thing.

> 


Right, I agree. Problem is that in principle we might still want to use
DEADLINE with the other platforms (MSM being a perfect example), so IMHO
we should still try to find a solution for the kthread anyway.
diff mbox series

Patch

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 952cac87e433..6f508980f320 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1351,6 +1351,7 @@  extern int idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
 extern int sched_setattr(struct task_struct *, const struct sched_attr *);
+extern int sched_setattr_nocheck(struct task_struct *, const struct sched_attr *);
 extern struct task_struct *idle_task(int cpu);
 
 /**
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index e2a6c7b3510b..72723859ef74 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -48,5 +48,6 @@ 
  */
 #define SCHED_FLAG_RESET_ON_FORK	0x01
 #define SCHED_FLAG_RECLAIM		0x02
+#define SCHED_FLAG_SPECIAL		0x04
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 378d402ee7a6..9b211c77cb54 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2495,6 +2495,9 @@  static int dl_overflow(struct task_struct *p, int policy,
 	u64 new_bw = dl_policy(policy) ? to_ratio(period, runtime) : 0;
 	int cpus, err = -1;
 
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return 0;
+
 	/* !deadline task may carry old deadline bandwidth */
 	if (new_bw == p->dl.dl_bw && task_has_dl_policy(p))
 		return 0;
@@ -4052,6 +4055,10 @@  __getparam_dl(struct task_struct *p, struct sched_attr *attr)
 static bool
 __checkparam_dl(const struct sched_attr *attr)
 {
+	/* special dl tasks don't actually use any parameter */
+	if (attr->sched_flags & SCHED_FLAG_SPECIAL)
+		return true;
+
 	/* deadline != 0 */
 	if (attr->sched_deadline == 0)
 		return false;
@@ -4138,7 +4145,9 @@  static int __sched_setscheduler(struct task_struct *p,
 	}
 
 	if (attr->sched_flags &
-		~(SCHED_FLAG_RESET_ON_FORK | SCHED_FLAG_RECLAIM))
+		~(SCHED_FLAG_RESET_ON_FORK |
+		  SCHED_FLAG_RECLAIM |
+		  SCHED_FLAG_SPECIAL))
 		return -EINVAL;
 
 	/*
@@ -4260,7 +4269,8 @@  static int __sched_setscheduler(struct task_struct *p,
 		}
 #endif
 #ifdef CONFIG_SMP
-		if (dl_bandwidth_enabled() && dl_policy(policy)) {
+		if (dl_bandwidth_enabled() && dl_policy(policy) &&
+				!(attr->sched_flags & SCHED_FLAG_SPECIAL)) {
 			cpumask_t *span = rq->rd->span;
 
 			/*
@@ -4390,6 +4400,11 @@  int sched_setattr(struct task_struct *p, const struct sched_attr *attr)
 }
 EXPORT_SYMBOL_GPL(sched_setattr);
 
+int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr)
+{
+	return __sched_setscheduler(p, attr, false, true);
+}
+
 /**
  * sched_setscheduler_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
  * @p: the task in question.
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 05f5625ea005..da67a1cf91e7 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -394,7 +394,16 @@  static void sugov_policy_free(struct sugov_policy *sg_policy)
 static int sugov_kthread_create(struct sugov_policy *sg_policy)
 {
 	struct task_struct *thread;
-	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+	struct sched_attr attr = {
+		.size = sizeof(struct sched_attr),
+		.sched_policy = SCHED_DEADLINE,
+		.sched_flags = SCHED_FLAG_SPECIAL,
+		.sched_nice = 0,
+		.sched_priority = 0,
+		.sched_runtime = 0,
+		.sched_deadline = 0,
+		.sched_period = 0,
+	};
 	struct cpufreq_policy *policy = sg_policy->policy;
 	int ret;
 
@@ -412,10 +421,10 @@  static int sugov_kthread_create(struct sugov_policy *sg_policy)
 		return PTR_ERR(thread);
 	}
 
-	ret = sched_setscheduler_nocheck(thread, SCHED_FIFO, &param);
+	ret = sched_setattr_nocheck(thread, &attr);
 	if (ret) {
 		kthread_stop(thread);
-		pr_warn("%s: failed to set SCHED_FIFO\n", __func__);
+		pr_warn("%s: failed to set SCHED_DEADLINE\n", __func__);
 		return ret;
 	}
 
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 5c1a205e830f..853de524c6c6 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -131,6 +131,9 @@  static void task_non_contending(struct task_struct *p)
 	if (dl_se->dl_runtime == 0)
 		return;
 
+	if (dl_entity_is_special(dl_se))
+		return;
+
 	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
 	WARN_ON(dl_se->dl_non_contending);
 
@@ -968,6 +971,9 @@  static void update_curr_dl(struct rq *rq)
 
 	sched_rt_avg_update(rq, delta_exec);
 
+	if (unlikely(dl_entity_is_special(dl_se)))
+		return;
+
 	if (unlikely(dl_se->flags & SCHED_FLAG_RECLAIM))
 		delta_exec = grub_reclaim(delta_exec, rq, curr->dl.dl_bw);
 	dl_se->runtime -= delta_exec;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 93c24528ceb6..7b5e81120813 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -155,13 +155,19 @@  static inline int task_has_dl_policy(struct task_struct *p)
 	return dl_policy(p->policy);
 }
 
+static inline int dl_entity_is_special(struct sched_dl_entity *dl_se)
+{
+	return dl_se->flags & SCHED_FLAG_SPECIAL;
+}
+
 /*
  * Tells if entity @a should preempt entity @b.
  */
 static inline bool
 dl_entity_preempt(struct sched_dl_entity *a, struct sched_dl_entity *b)
 {
-	return dl_time_before(a->deadline, b->deadline);
+	return dl_entity_is_special(a) ||
+	       dl_time_before(a->deadline, b->deadline);
 }
 
 /*