diff mbox series

[V3] powercap/drivers/idle_injection: Add an idle injection framework

Message ID 1526655056-24592-1-git-send-email-daniel.lezcano@linaro.org
State New
Headers show
Series [V3] powercap/drivers/idle_injection: Add an idle injection framework | expand

Commit Message

Daniel Lezcano May 18, 2018, 2:50 p.m. UTC
Initially, the cpu_cooling device for ARM was changed by adding a new
policy inserting idle cycles. The intel_powerclamp driver does a
similar action.

Instead of implementing idle injections privately in the cpu_cooling
device, move the idle injection code in a dedicated framework and give
the opportunity to other frameworks to make use of it.

The framework relies on the smpboot kthreads which handles via its
mainloop the common code for hotplugging and [un]parking.

This code was previously tested with the cpu cooling device and went
through several iterations. It results now in split code and API
exported in the header file. It was tested with the cpu cooling device
with success.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

---
 V3:
   - Fixed typos (Viresh Kumar)
   - Removed extra blank line (Viresh Kumar)
   - Added full stop (Viresh Kumar)
   - Fixed Return kerneldoc format (Viresh Kumar)
   - Fixed multiple kthreads initialization (Viresh Kumar)
   - Fixed rollbacking the actions in the unregister function (Viresh Kumar)
   - Make idle injection kthreads name hardcoded
   - Kthreads are created in the initcall process

 V2: Fixed checkpatch warnings
---
 drivers/powercap/Kconfig          |  10 ++
 drivers/powercap/Makefile         |   1 +
 drivers/powercap/idle_injection.c | 326 ++++++++++++++++++++++++++++++++++++++
 include/linux/idle_injection.h    |  29 ++++
 4 files changed, 366 insertions(+)
 create mode 100644 drivers/powercap/idle_injection.c
 create mode 100644 include/linux/idle_injection.h

-- 
2.7.4

Comments

Viresh Kumar May 21, 2018, 10:32 a.m. UTC | #1
On 18-05-18, 16:50, Daniel Lezcano wrote:
> Initially, the cpu_cooling device for ARM was changed by adding a new

> policy inserting idle cycles. The intel_powerclamp driver does a

> similar action.

> 

> Instead of implementing idle injections privately in the cpu_cooling

> device, move the idle injection code in a dedicated framework and give

> the opportunity to other frameworks to make use of it.


I thought you agreed to move above in the comments section ?

> The framework relies on the smpboot kthreads which handles via its

> mainloop the common code for hotplugging and [un]parking.

> 

> This code was previously tested with the cpu cooling device and went

> through several iterations. It results now in split code and API

> exported in the header file. It was tested with the cpu cooling device

> with success.

> 

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---

>  V3:

>    - Fixed typos (Viresh Kumar)

>    - Removed extra blank line (Viresh Kumar)

>    - Added full stop (Viresh Kumar)

>    - Fixed Return kerneldoc format (Viresh Kumar)

>    - Fixed multiple kthreads initialization (Viresh Kumar)

>    - Fixed rollbacking the actions in the unregister function (Viresh Kumar)

>    - Make idle injection kthreads name hardcoded

>    - Kthreads are created in the initcall process

> 

>  V2: Fixed checkpatch warnings

> ---

>  drivers/powercap/Kconfig          |  10 ++

>  drivers/powercap/Makefile         |   1 +

>  drivers/powercap/idle_injection.c | 326 ++++++++++++++++++++++++++++++++++++++

>  include/linux/idle_injection.h    |  29 ++++

>  4 files changed, 366 insertions(+)

>  create mode 100644 drivers/powercap/idle_injection.c

>  create mode 100644 include/linux/idle_injection.h

> 

> diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig

> index 85727ef..a767ef2 100644

> --- a/drivers/powercap/Kconfig

> +++ b/drivers/powercap/Kconfig

> @@ -29,4 +29,14 @@ config INTEL_RAPL

>  	  controller, CPU core (Power Plance 0), graphics uncore (Power Plane

>  	  1), etc.

>  

> +config IDLE_INJECTION

> +	bool "Idle injection framework"

> +	depends on CPU_IDLE

> +	default n

> +	help

> +	  This enables support for the idle injection framework. It

> +	  provides a way to force idle periods on a set of specified

> +	  CPUs for power capping. Idle period can be injected

> +	  synchronously on a set of specified CPUs or alternatively

> +	  on a per CPU basis.

>  endif

> diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile

> index 0a21ef3..c3bbfee 100644

> --- a/drivers/powercap/Makefile

> +++ b/drivers/powercap/Makefile

> @@ -1,2 +1,3 @@

>  obj-$(CONFIG_POWERCAP)	+= powercap_sys.o

>  obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o

> +obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o

> diff --git a/drivers/powercap/idle_injection.c b/drivers/powercap/idle_injection.c

> new file mode 100644

> index 0000000..a5fe205

> --- /dev/null

> +++ b/drivers/powercap/idle_injection.c

> @@ -0,0 +1,326 @@

> +// SPDX-License-Identifier: GPL-2.0

> +/*

> + * drivers/powercap/idle_injection.c

> + *

> + * Copyright 2018 Linaro Limited

> + *

> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>

> + *

> + * The idle injection framework proposes a way to force a cpu to enter

> + * an idle state during a specified amount of time for a specified

> + * period.

> + *

> + * It relies on the smpboot kthreads which handles, via its main loop,

> + * the common code for hotplugging and [un]parking.

> + *

> + * At init time, all the kthreads are created and parked.

> + *

> + * A cpumask is specified as parameter for the idle injection

> + * registering function. The kthreads will be synchronized regarding

> + * this cpumask.

> + *

> + * The idle + run duration is specified via the helpers and then the

> + * idle injection can be started at this point.

> + *

> + * A kthread will call play_idle() with the specified idle duration

> + * from above and then will schedule itself. The latest CPU belonging

> + * to the group is in charge of setting the timer for the next idle

> + * injection deadline.

> + *

> + * The task handling the timer interrupt will wakeup all the kthreads

> + * belonging to the cpumask.

> + */

> +#include <linux/cpu.h>

> +#include <linux/freezer.h>

> +#include <linux/hrtimer.h>

> +#include <linux/sched.h>

> +#include <linux/slab.h>

> +#include <linux/smpboot.h>

> +

> +#include <uapi/linux/sched/types.h>

> +

> +/**

> + * struct idle_injection_thread - task on/off switch structure

> + * @tsk: a pointer to a task_struct injecting the idle cycles

> + * @should_run: a integer used as a boolean by the smpboot kthread API

> + */

> +struct idle_injection_thread {

> +	struct task_struct *tsk;

> +	int should_run;

> +};

> +

> +/**

> + * struct idle_injection_device - data for the idle injection

> + * @cpumask: a cpumask containing the list of CPUs managed by the device

> + * @timer: a hrtimer giving the tempo for the idle injection

> + * @count: an atomic to keep track of the last task exiting the idle cycle

> + * @idle_duration_ms: an atomic specifying the idle duration

> + * @run_duration_ms: an atomic specifying the running duration

> + */

> +struct idle_injection_device {

> +	cpumask_var_t cpumask;

> +	struct hrtimer timer;

> +	atomic_t count;

> +	atomic_t idle_duration_ms;

> +	atomic_t run_duration_ms;

> +};

> +

> +static DEFINE_PER_CPU(struct idle_injection_thread, idle_injection_thread);

> +static DEFINE_PER_CPU(struct idle_injection_device *, idle_injection_device);

> +

> +/**

> + * idle_injection_wakeup - Wake up all idle injection threads

> + * @ii_dev: the idle injection device

> + *

> + * Every idle injection task belonging to the idle injection device

> + * and running on an online CPU will be wake up by this call.

> + */

> +static void idle_injection_wakeup(struct idle_injection_device *ii_dev)

> +{

> +	struct idle_injection_thread *iit;

> +	int cpu;

> +

> +	for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {

> +		iit = per_cpu_ptr(&idle_injection_thread, cpu);

> +		iit->should_run = 1;

> +		wake_up_process(iit->tsk);

> +	}

> +}

> +

> +/**

> + * idle_injection_wakeup_fn - idle injection timer callback

> + * @timer: a hrtimer structure

> + *

> + * This function is called when the idle injection timer expires which

> + * will wake up the idle injection tasks and these ones, in turn, play

> + * idle a specified amount of time.

> + *

> + * Return: HRTIMER_NORESTART.

> + */

> +static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)

> +{

> +	struct idle_injection_device *ii_dev =

> +		container_of(timer, struct idle_injection_device, timer);

> +

> +	idle_injection_wakeup(ii_dev);

> +

> +	return HRTIMER_NORESTART;

> +}

> +

> +/**

> + * idle_injection_fn - idle injection routine

> + * @cpu: the CPU number the tasks belongs to

> + *

> + * The idle injection routine will stay idle the specified amount of

> + * time

> + */

> +static void idle_injection_fn(unsigned int cpu)

> +{

> +	struct idle_injection_device *ii_dev;

> +	struct idle_injection_thread *iit;

> +	int run_duration_ms, idle_duration_ms;

> +

> +	ii_dev = per_cpu(idle_injection_device, cpu);

> +

> +	iit = per_cpu_ptr(&idle_injection_thread, cpu);

> +

> +	/*

> +	 * Boolean used by the smpboot mainloop and used as a flip-flop


You never replied to my comment in previous posting where I suggested
s/mainloop/main loop/ . Maybe my comment is wrong, but it needs to be
Nak'd.

> +	 * in this function

> +	 */

> +	iit->should_run = 0;

> +

> +	atomic_inc(&ii_dev->count);

> +

> +	idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

> +

> +	play_idle(idle_duration_ms);

> +

> +	/*

> +	 * The last CPU waking up is in charge of setting the timer. If

> +	 * the CPU is hotplugged, the timer will move to another CPU

> +	 * (which may not belong to the same cluster) but that is not a

> +	 * problem as the timer will be set again by another CPU

> +	 * belonging to the cluster. This mechanism is self adaptive.

> +	 */

> +	if (!atomic_dec_and_test(&ii_dev->count))

> +		return;

> +

> +	run_duration_ms = atomic_read(&ii_dev->run_duration_ms);


This reads as if it is okay to have run_duration_ms set as 0, so we
run idle loop only once. Which is fine, but why do you mandate this to
be non-zero in idle_injection_start() ?

> +	if (!run_duration_ms)

> +		return;

> +

> +	hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),

> +		      HRTIMER_MODE_REL_PINNED);

> +}

> +

> +/**

> + * idle_injection_set_duration - idle and run duration helper

> + * @run_duration_ms: an unsigned int giving the running time in milliseconds

> + * @idle_duration_ms: an unsigned int giving the idle time in milliseconds

> + */

> +void idle_injection_set_duration(struct idle_injection_device *ii_dev,

> +				 unsigned int run_duration_ms,

> +				 unsigned int idle_duration_ms)

> +{

> +	atomic_set(&ii_dev->run_duration_ms, run_duration_ms);

> +	atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);


You check for valid values of these in idle_injection_start() but not
here, why ?

> +}

> +

> +/**

> + * idle_injection_get_duration - idle and run duration helper

> + * @run_duration_ms: a pointer to an unsigned int to store the running time

> + * @idle_duration_ms: a pointer to an unsigned int to store the idle time

> + */

> +void idle_injection_get_duration(struct idle_injection_device *ii_dev,

> +				 unsigned int *run_duration_ms,

> +				 unsigned int *idle_duration_ms)

> +{

> +	*run_duration_ms = atomic_read(&ii_dev->run_duration_ms);

> +	*idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

> +}

> +

> +/**

> + * idle_injection_start - starts the idle injections

> + * @ii_dev: a pointer to an idle_injection_device structure

> + *

> + * The function starts the idle injection cycles by first waking up

> + * all the tasks the ii_dev is attached to and let them handle the

> + * idle-run periods.

> + *

> + * Return: -EINVAL if the idle or the running durations are not set.

> + */

> +int idle_injection_start(struct idle_injection_device *ii_dev)

> +{

> +	if (!atomic_read(&ii_dev->idle_duration_ms))

> +		return -EINVAL;

> +

> +	if (!atomic_read(&ii_dev->run_duration_ms))

> +		return -EINVAL;

> +

> +	pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n",

> +		 cpumask_pr_args(ii_dev->cpumask));

> +

> +	idle_injection_wakeup(ii_dev);

> +

> +	return 0;

> +}

> +

> +/**

> + * idle_injection_stop - stops the idle injections

> + * @ii_dev: a pointer to an idle injection_device structure

> + *

> + * The function stops the idle injection by canceling the timer in

> + * charge of waking up the tasks to inject idle and unset the idle and

> + * running durations.

> + */

> +void idle_injection_stop(struct idle_injection_device *ii_dev)

> +{

> +	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",

> +		 cpumask_pr_args(ii_dev->cpumask));

> +

> +	hrtimer_cancel(&ii_dev->timer);


How are we sure that idle_injection_fn() isn't running at this point
and it would start the timer cancelled here again ?

> +

> +	idle_injection_set_duration(ii_dev, 0, 0);


And why exactly this this required ? Why shouldn't we allow this
sequence to work:

idle_injection_set_duration()
idle_injection_start()
idle_injection_stop()
idle_injection_start()
idle_injection_stop()
idle_injection_start()
idle_injection_stop()

> +}

> +

> +/**

> + * idle_injection_setup - initialize the current task as a RT task

> + * @cpu: the CPU number where the kthread is running on (not used)

> + *

> + * Called one time, this function is in charge of setting the task

> + * scheduler parameters.

> + */

> +static void idle_injection_setup(unsigned int cpu)

> +{

> +	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };

> +

> +	set_freezable();

> +

> +	sched_setscheduler(current, SCHED_FIFO, &param);

> +}

> +

> +/**

> + * idle_injection_should_run - function helper for the smpboot API

> + * @cpu: the CPU number where the kthread is running on

> + *

> + * Return: a boolean telling if the thread can run.

> + */

> +static int idle_injection_should_run(unsigned int cpu)

> +{

> +	struct idle_injection_thread *iit =

> +		per_cpu_ptr(&idle_injection_thread, cpu);

> +

> +	return iit->should_run;

> +}

> +

> +/**

> + * idle_injection_register - idle injection init routine

> + * @cpumask: the list of CPUs managed by the idle injection device

> + *

> + * This is the initialization function in charge of creating the

> + * initializing the timer and allocate the structures. It does not

> + * starts the idle injection cycles.

> + *

> + * Return: NULL if an allocation fails.

> + */

> +struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)

> +{

> +	struct idle_injection_device *ii_dev;

> +	int cpu;

> +

> +	ii_dev = kzalloc(sizeof(*ii_dev), GFP_KERNEL);

> +	if (!ii_dev)

> +		return NULL;

> +

> +	if (!alloc_cpumask_var(&ii_dev->cpumask, GFP_KERNEL))

> +		goto out_free_ii_dev;

> +	cpumask_copy(ii_dev->cpumask, cpumask);

> +

> +	hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);

> +

> +	ii_dev->timer.function = idle_injection_wakeup_fn;

> +

> +	for_each_cpu(cpu, cpumask)

> +		per_cpu(idle_injection_device, cpu) = ii_dev;


Not sure but do we need protection against registration done twice for
a CPU ?

> +

> +	return ii_dev;

> +

> +out_free_ii_dev:

> +	kfree(ii_dev);

> +	return NULL;

> +}

> +

> +/**

> + * idle_injection_unregister - Unregister the idle injection device

> + * @ii_dev: a pointer to an idle injection device

> + *

> + * The function is in charge of stopping the idle injections,

> + * unregister the kthreads and free the allocated memory in the

> + * register function.

> + */

> +void idle_injection_unregister(struct idle_injection_device *ii_dev)

> +{

> +	int cpu;

> +

> +	idle_injection_stop(ii_dev);

> +

> +	for_each_cpu(cpu, ii_dev->cpumask)

> +		per_cpu(idle_injection_device, cpu) = NULL;

> +

> +	kfree(ii_dev);

> +}

> +

> +static struct smp_hotplug_thread idle_injection_threads = {

> +	.store = &idle_injection_thread.tsk,

> +	.setup = idle_injection_setup,

> +	.thread_fn = idle_injection_fn,

> +	.thread_comm = "idle_inject/%u",

> +	.thread_should_run = idle_injection_should_run,

> +};

> +

> +static int __init idle_injection_init(void)

> +{

> +	return smpboot_register_percpu_thread(&idle_injection_threads);

> +}

> +early_initcall(idle_injection_init);


-- 
viresh
Viresh Kumar May 23, 2018, 5:41 a.m. UTC | #2
On 22-05-18, 15:42, Daniel Lezcano wrote:
> On 21/05/2018 12:32, Viresh Kumar wrote:

> > On 18-05-18, 16:50, Daniel Lezcano wrote:

> >> Initially, the cpu_cooling device for ARM was changed by adding a new

> >> policy inserting idle cycles. The intel_powerclamp driver does a

> >> similar action.

> >>

> >> Instead of implementing idle injections privately in the cpu_cooling

> >> device, move the idle injection code in a dedicated framework and give

> >> the opportunity to other frameworks to make use of it.

> > 

> > I thought you agreed to move above in the comments section ?

> 

> This is what I did. I just kept the relevant log here.


The fact that you are stating that you tried to update the cooling
device earlier looked like a bit of version history to me, not what
this patch is doing.

But its okay if you really want that to be preserved in git history :)

> >> +static void idle_injection_fn(unsigned int cpu)

> >> +{

> >> +	struct idle_injection_device *ii_dev;

> >> +	struct idle_injection_thread *iit;

> >> +	int run_duration_ms, idle_duration_ms;

> >> +

> >> +	ii_dev = per_cpu(idle_injection_device, cpu);

> >> +

> >> +	iit = per_cpu_ptr(&idle_injection_thread, cpu);

> >> +

> >> +	/*

> >> +	 * Boolean used by the smpboot mainloop and used as a flip-flop

> >> +	 * in this function

> >> +	 */

> >> +	iit->should_run = 0;

> >> +

> >> +	atomic_inc(&ii_dev->count);

> >> +

> >> +	idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

> >> +

> >> +	play_idle(idle_duration_ms);

> >> +

> >> +	/*

> >> +	 * The last CPU waking up is in charge of setting the timer. If

> >> +	 * the CPU is hotplugged, the timer will move to another CPU

> >> +	 * (which may not belong to the same cluster) but that is not a

> >> +	 * problem as the timer will be set again by another CPU

> >> +	 * belonging to the cluster. This mechanism is self adaptive.

> >> +	 */

> >> +	if (!atomic_dec_and_test(&ii_dev->count))

> >> +		return;

> >> +

> >> +	run_duration_ms = atomic_read(&ii_dev->run_duration_ms);

> > 

> > This reads as if it is okay to have run_duration_ms set as 0, so we

> > run idle loop only once. Which is fine, but why do you mandate this to

> > be non-zero in idle_injection_start() ?

> 

> It does not make sense to run this function with a run duration set to

> zero because we will immediately go to idle again after exiting idle. So

> the action is exiting. In this context we can't accept to start

> injecting idle cycles.


Right and that's why I said "Which is fine" in my comment above. My
question was more on why we error out in idle_injection_start() if
run_duration_ms is 0.

Just for my understanding, is it a valid usecase where we want to run
the idle loop only once ? i.e. set idle_duration_ms to a non-zero
value but run_duration_ms to 0 ? In that case we shouldn't check for
zero run_duration_ms in idle_injection_start().

> >> +	if (!run_duration_ms)

> >> +		return;

> >> +

> >> +	hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),

> >> +		      HRTIMER_MODE_REL_PINNED);

> >> +}

> >> +

> >> +/**

> >> + * idle_injection_set_duration - idle and run duration helper

> >> + * @run_duration_ms: an unsigned int giving the running time in milliseconds

> >> + * @idle_duration_ms: an unsigned int giving the idle time in milliseconds

> >> + */

> >> +void idle_injection_set_duration(struct idle_injection_device *ii_dev,

> >> +				 unsigned int run_duration_ms,

> >> +				 unsigned int idle_duration_ms)

> >> +{

> >> +	atomic_set(&ii_dev->run_duration_ms, run_duration_ms);

> >> +	atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);

> > 

> > You check for valid values of these in idle_injection_start() but not

> > here, why ?

> 

> By checking against a zero values in the start function is a way to make

> sure we are not starting the idle injection with uninitialized values

> and by setting the duration to zero is a way to stop the idle injection.


Why do we need two ways of stopping the idle injection thread ? Why
isn't just calling idle_injection_stop() the right thing to do in that
case ?

> >> +}

> >> +

> >> +/**

> >> + * idle_injection_get_duration - idle and run duration helper

> >> + * @run_duration_ms: a pointer to an unsigned int to store the running time

> >> + * @idle_duration_ms: a pointer to an unsigned int to store the idle time

> >> + */

> >> +void idle_injection_get_duration(struct idle_injection_device *ii_dev,

> >> +				 unsigned int *run_duration_ms,

> >> +				 unsigned int *idle_duration_ms)

> >> +{

> >> +	*run_duration_ms = atomic_read(&ii_dev->run_duration_ms);

> >> +	*idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

> >> +}

> >> +

> >> +/**

> >> + * idle_injection_start - starts the idle injections

> >> + * @ii_dev: a pointer to an idle_injection_device structure

> >> + *

> >> + * The function starts the idle injection cycles by first waking up

> >> + * all the tasks the ii_dev is attached to and let them handle the

> >> + * idle-run periods.

> >> + *

> >> + * Return: -EINVAL if the idle or the running durations are not set.

> >> + */

> >> +int idle_injection_start(struct idle_injection_device *ii_dev)

> >> +{

> >> +	if (!atomic_read(&ii_dev->idle_duration_ms))

> >> +		return -EINVAL;

> >> +

> >> +	if (!atomic_read(&ii_dev->run_duration_ms))

> >> +		return -EINVAL;

> >> +


So according to above comments from me, I am saying that this
particular test isn't really required as we may want to run idle loop
only once.

> >> +	pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n",

> >> +		 cpumask_pr_args(ii_dev->cpumask));

> >> +

> >> +	idle_injection_wakeup(ii_dev);

> >> +

> >> +	return 0;

> >> +}

> >> +

> >> +/**

> >> + * idle_injection_stop - stops the idle injections

> >> + * @ii_dev: a pointer to an idle injection_device structure

> >> + *

> >> + * The function stops the idle injection by canceling the timer in

> >> + * charge of waking up the tasks to inject idle and unset the idle and

> >> + * running durations.

> >> + */

> >> +void idle_injection_stop(struct idle_injection_device *ii_dev)

> >> +{

> >> +	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",

> >> +		 cpumask_pr_args(ii_dev->cpumask));

> >> +

> >> +	hrtimer_cancel(&ii_dev->timer);

> > 

> > How are we sure that idle_injection_fn() isn't running at this point

> > and it would start the timer cancelled here again ?

> 

> Nothing will ensure that. We will have an extra idle injection in this

> case. We can invert the set_duration(0,0) and the timer cancellation to

> reduce to reduce the window.


That's what I thought and so its racy. If someone calls
idle_injection_unregister(), then we call this routine and then free
the data structures while they are still getting used by the thread :(

> >> +

> >> +	idle_injection_set_duration(ii_dev, 0, 0);

> > 

> > And why exactly this this required ? Why shouldn't we allow this

> > sequence to work:

> > 

> > idle_injection_set_duration()

> > idle_injection_start()

> > idle_injection_stop()

> > idle_injection_start()

> > idle_injection_stop()

> > idle_injection_start()

> > idle_injection_stop()

> 

> Sorry, I don't get it.

> 

> Who will decide to start() and stop() ?


Some confusion here about the usecase then. How do you see this stuff
getting used and how users (cooling-driver ?) will use it ?

-- 
viresh
Daniel Lezcano May 23, 2018, 8 a.m. UTC | #3
On 23/05/2018 07:41, Viresh Kumar wrote:
> On 22-05-18, 15:42, Daniel Lezcano wrote:

>> On 21/05/2018 12:32, Viresh Kumar wrote:

>>> On 18-05-18, 16:50, Daniel Lezcano wrote:

>>>> Initially, the cpu_cooling device for ARM was changed by adding a new

>>>> policy inserting idle cycles. The intel_powerclamp driver does a

>>>> similar action.

>>>>

>>>> Instead of implementing idle injections privately in the cpu_cooling

>>>> device, move the idle injection code in a dedicated framework and give

>>>> the opportunity to other frameworks to make use of it.

>>>

>>> I thought you agreed to move above in the comments section ?

>>

>> This is what I did. I just kept the relevant log here.

> 

> The fact that you are stating that you tried to update the cooling

> device earlier looked like a bit of version history to me, not what

> this patch is doing.

> 

> But its okay if you really want that to be preserved in git history :)

> 

>>>> +static void idle_injection_fn(unsigned int cpu)

>>>> +{

>>>> +	struct idle_injection_device *ii_dev;

>>>> +	struct idle_injection_thread *iit;

>>>> +	int run_duration_ms, idle_duration_ms;

>>>> +

>>>> +	ii_dev = per_cpu(idle_injection_device, cpu);

>>>> +

>>>> +	iit = per_cpu_ptr(&idle_injection_thread, cpu);

>>>> +

>>>> +	/*

>>>> +	 * Boolean used by the smpboot mainloop and used as a flip-flop

>>>> +	 * in this function

>>>> +	 */

>>>> +	iit->should_run = 0;

>>>> +

>>>> +	atomic_inc(&ii_dev->count);

>>>> +

>>>> +	idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

>>>> +

>>>> +	play_idle(idle_duration_ms);

>>>> +

>>>> +	/*

>>>> +	 * The last CPU waking up is in charge of setting the timer. If

>>>> +	 * the CPU is hotplugged, the timer will move to another CPU

>>>> +	 * (which may not belong to the same cluster) but that is not a

>>>> +	 * problem as the timer will be set again by another CPU

>>>> +	 * belonging to the cluster. This mechanism is self adaptive.

>>>> +	 */

>>>> +	if (!atomic_dec_and_test(&ii_dev->count))

>>>> +		return;

>>>> +

>>>> +	run_duration_ms = atomic_read(&ii_dev->run_duration_ms);

>>>

>>> This reads as if it is okay to have run_duration_ms set as 0, so we

>>> run idle loop only once. Which is fine, but why do you mandate this to

>>> be non-zero in idle_injection_start() ?

>>

>> It does not make sense to run this function with a run duration set to

>> zero because we will immediately go to idle again after exiting idle. So

>> the action is exiting. In this context we can't accept to start

>> injecting idle cycles.

> 

> Right and that's why I said "Which is fine" in my comment above. My

> question was more on why we error out in idle_injection_start() if

> run_duration_ms is 0.

> 

> Just for my understanding, is it a valid usecase where we want to run

> the idle loop only once ? i.e. set idle_duration_ms to a non-zero

> value but run_duration_ms to 0 ? In that case we shouldn't check for

> zero run_duration_ms in idle_injection_start().


Yes, that could be a valid use case if we want to synchronously inject
idle cycles without period.

IOW, call play_idle() on a set of cpus at the same time. And the caller
of start is the one with the control of the period.

If you want this usecase, we need to implement more things:
 - single user of the framework: as soon as we register, no-one else can
use the idle injection
 - blocking stop, we wait for all the kthreads to join a barrier before
returning to the caller
 - blocking start, we wait for all the kthreads to end injecting the
idle cycle

>>>> +	if (!run_duration_ms)

>>>> +		return;

>>>> +

>>>> +	hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),

>>>> +		      HRTIMER_MODE_REL_PINNED);

>>>> +}

>>>> +

>>>> +/**

>>>> + * idle_injection_set_duration - idle and run duration helper

>>>> + * @run_duration_ms: an unsigned int giving the running time in milliseconds

>>>> + * @idle_duration_ms: an unsigned int giving the idle time in milliseconds

>>>> + */

>>>> +void idle_injection_set_duration(struct idle_injection_device *ii_dev,

>>>> +				 unsigned int run_duration_ms,

>>>> +				 unsigned int idle_duration_ms)

>>>> +{

>>>> +	atomic_set(&ii_dev->run_duration_ms, run_duration_ms);

>>>> +	atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);

>>>

>>> You check for valid values of these in idle_injection_start() but not

>>> here, why ?

>>

>> By checking against a zero values in the start function is a way to make

>> sure we are not starting the idle injection with uninitialized values

>> and by setting the duration to zero is a way to stop the idle injection.

> 

> Why do we need two ways of stopping the idle injection thread ? Why

> isn't just calling idle_injection_stop() the right thing to do in that

> case ?


How do we prevent the last kthread in the idle_injection_fn to set the
timer ?

>>>> +}

>>>> +

>>>> +/**

>>>> + * idle_injection_get_duration - idle and run duration helper

>>>> + * @run_duration_ms: a pointer to an unsigned int to store the running time

>>>> + * @idle_duration_ms: a pointer to an unsigned int to store the idle time

>>>> + */

>>>> +void idle_injection_get_duration(struct idle_injection_device *ii_dev,

>>>> +				 unsigned int *run_duration_ms,

>>>> +				 unsigned int *idle_duration_ms)

>>>> +{

>>>> +	*run_duration_ms = atomic_read(&ii_dev->run_duration_ms);

>>>> +	*idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);

>>>> +}

>>>> +

>>>> +/**

>>>> + * idle_injection_start - starts the idle injections

>>>> + * @ii_dev: a pointer to an idle_injection_device structure

>>>> + *

>>>> + * The function starts the idle injection cycles by first waking up

>>>> + * all the tasks the ii_dev is attached to and let them handle the

>>>> + * idle-run periods.

>>>> + *

>>>> + * Return: -EINVAL if the idle or the running durations are not set.

>>>> + */

>>>> +int idle_injection_start(struct idle_injection_device *ii_dev)

>>>> +{

>>>> +	if (!atomic_read(&ii_dev->idle_duration_ms))

>>>> +		return -EINVAL;

>>>> +

>>>> +	if (!atomic_read(&ii_dev->run_duration_ms))

>>>> +		return -EINVAL;

>>>> +

> 

> So according to above comments from me, I am saying that this

> particular test isn't really required as we may want to run idle loop

> only once.


Note that will be the same than calling play_idle() synchronously.

>>>> +	pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n",

>>>> +		 cpumask_pr_args(ii_dev->cpumask));

>>>> +

>>>> +	idle_injection_wakeup(ii_dev);

>>>> +

>>>> +	return 0;

>>>> +}

>>>> +

>>>> +/**

>>>> + * idle_injection_stop - stops the idle injections

>>>> + * @ii_dev: a pointer to an idle injection_device structure

>>>> + *

>>>> + * The function stops the idle injection by canceling the timer in

>>>> + * charge of waking up the tasks to inject idle and unset the idle and

>>>> + * running durations.

>>>> + */

>>>> +void idle_injection_stop(struct idle_injection_device *ii_dev)

>>>> +{

>>>> +	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",

>>>> +		 cpumask_pr_args(ii_dev->cpumask));

>>>> +

>>>> +	hrtimer_cancel(&ii_dev->timer);

>>>

>>> How are we sure that idle_injection_fn() isn't running at this point

>>> and it would start the timer cancelled here again ?

>>

>> Nothing will ensure that. We will have an extra idle injection in this

>> case. We can invert the set_duration(0,0) and the timer cancellation to

>> reduce to reduce the window.

> 

> That's what I thought and so its racy. If someone calls

> idle_injection_unregister(), then we call this routine and then free

> the data structures while they are still getting used by the thread :(


Yes, we need to make the framework single-user, a refcount should be
enough. However, register() returns a pointer and the caller of
unregister must have this pointer. If it is the case, then register and
unregister code collaborate, if the one calling unregister cuts the
branch of the user of the idle_injection then we have braindead code.

We can handle this case by adding locks or we can have a single-user of
the framework without lock. We don't expect a lot of idle injection
users (I see only two right now and they are mutually exclusive), so
having lockless code is ok for me.


>>>> +

>>>> +	idle_injection_set_duration(ii_dev, 0, 0);

>>>

>>> And why exactly this this required ? Why shouldn't we allow this

>>> sequence to work:

>>>

>>> idle_injection_set_duration()

>>> idle_injection_start()

>>> idle_injection_stop()

>>> idle_injection_start()

>>> idle_injection_stop()

>>> idle_injection_start()

>>> idle_injection_stop()

>>

>> Sorry, I don't get it.

>>

>> Who will decide to start() and stop() ?

> 

> Some confusion here about the usecase then. How do you see this stuff

> getting used and how users (cooling-driver ?) will use it ?


The cooling device computes the duration and sets it every time
set_cur_state is called.

when the mitigation begins (prev state = 0 and cur state > 0), it starts
the threads.

When the mitigation ends (prev_state > 0 and cur_state = 0), it calls stop()

In between, set_duration is called via set_cur_state to vary the
mitigation effect.

-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Viresh Kumar May 23, 2018, 9:55 a.m. UTC | #4
On 23-05-18, 10:00, Daniel Lezcano wrote:
> On 23/05/2018 07:41, Viresh Kumar wrote:

> > Right and that's why I said "Which is fine" in my comment above. My

> > question was more on why we error out in idle_injection_start() if

> > run_duration_ms is 0.

> > 

> > Just for my understanding, is it a valid usecase where we want to run

> > the idle loop only once ? i.e. set idle_duration_ms to a non-zero

> > value but run_duration_ms to 0 ? In that case we shouldn't check for

> > zero run_duration_ms in idle_injection_start().

> 

> Yes, that could be a valid use case if we want to synchronously inject

> idle cycles without period.

> 

> IOW, call play_idle() on a set of cpus at the same time. And the caller

> of start is the one with the control of the period.

> 

> If you want this usecase, we need to implement more things:


Well I was trying to understand the use case you have in mind. Nothing
else. So this stuff isn't required anymore.

> >>>> +void idle_injection_set_duration(struct idle_injection_device *ii_dev,

> >>>> +				 unsigned int run_duration_ms,

> >>>> +				 unsigned int idle_duration_ms)

> >>>> +{

> >>>> +	atomic_set(&ii_dev->run_duration_ms, run_duration_ms);

> >>>> +	atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);

> >>>

> >>> You check for valid values of these in idle_injection_start() but not

> >>> here, why ?

> >>

> >> By checking against a zero values in the start function is a way to make

> >> sure we are not starting the idle injection with uninitialized values

> >> and by setting the duration to zero is a way to stop the idle injection.

> > 

> > Why do we need two ways of stopping the idle injection thread ? Why

> > isn't just calling idle_injection_stop() the right thing to do in that

> > case ?

> 

> How do we prevent the last kthread in the idle_injection_fn to set the

> timer ?


Okay, I get the problem now. But this doesn't stop the kthread in a
guaranteed way and we probably need a different solution. More later..

> >>>> +void idle_injection_stop(struct idle_injection_device *ii_dev)

> >>>> +{

> >>>> +	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",

> >>>> +		 cpumask_pr_args(ii_dev->cpumask));

> >>>> +

> >>>> +	hrtimer_cancel(&ii_dev->timer);

> >>>

> >>> How are we sure that idle_injection_fn() isn't running at this point

> >>> and it would start the timer cancelled here again ?

> >>

> >> Nothing will ensure that. We will have an extra idle injection in this

> >> case. We can invert the set_duration(0,0) and the timer cancellation to

> >> reduce to reduce the window.

> > 

> > That's what I thought and so its racy. If someone calls

> > idle_injection_unregister(), then we call this routine and then free

> > the data structures while they are still getting used by the thread :(

> 

> Yes, we need to make the framework single-user, a refcount should be

> enough. However, register() returns a pointer and the caller of

> unregister must have this pointer. If it is the case, then register and

> unregister code collaborate, if the one calling unregister cuts the

> branch of the user of the idle_injection then we have braindead code.

> 

> We can handle this case by adding locks or we can have a single-user of

> the framework without lock. We don't expect a lot of idle injection

> users (I see only two right now and they are mutually exclusive), so

> having lockless code is ok for me.


Maybe I wasn't able to explain the problem I see, but lemme retry
that. Assume that there is only one use and that id cpu-idle-cooling.
We are currently running the idle loop with idle duration X and run
duration Y.

Now lets say the cooling device gets unregistered itself (maybe module
removal, etc). And it calls idle_injection_unregister() with a valid
pointer. Not sure if the thermal framework will call set_cur_state
anymore. But the problem will remain even if it does that.

We call idle_injection_stop() from unregister, which will cancel
hrtimer, set durations as 0 and return. Then we free the iidev. It is
certainly possible at this point of time that the kthread is still
running the idle loop which it may have started before unregister was
called. And so after the idle loop is finished it will try to access
ii_dev, which is already freed.

So, idle_injection_stop() needs to guarantee that the kthread and the
hrtimer are all stopped now and no one is using the ii_dev structure
anymore.

Perhaps you need some completion stuff here to give confirmation here,
etc.

-- 
viresh
Daniel Lezcano May 23, 2018, 1:19 p.m. UTC | #5
On 23/05/2018 11:55, Viresh Kumar wrote:
> On 23-05-18, 10:00, Daniel Lezcano wrote:


[ ... ]

> Maybe I wasn't able to explain the problem I see, but lemme retry

> that. Assume that there is only one use and that id cpu-idle-cooling.

> We are currently running the idle loop with idle duration X and run

> duration Y.

> 

> Now lets say the cooling device gets unregistered itself (maybe module

> removal, etc). And it calls idle_injection_unregister() with a valid

> pointer. Not sure if the thermal framework will call set_cur_state

> anymore. But the problem will remain even if it does that.

> 

> We call idle_injection_stop() from unregister, which will cancel

> hrtimer, set durations as 0 and return. Then we free the iidev. It is

> certainly possible at this point of time that the kthread is still

> running the idle loop which it may have started before unregister was

> called. And so after the idle loop is finished it will try to access

> ii_dev, which is already freed.

> 

> So, idle_injection_stop() needs to guarantee that the kthread and the

> hrtimer are all stopped now and no one is using the ii_dev structure

> anymore.

> 

> Perhaps you need some completion stuff here to give confirmation here,

> etc.


Ok, let me come back with something.

Thanks for reviewing again.

  -- Daniel



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
diff mbox series

Patch

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 85727ef..a767ef2 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -29,4 +29,14 @@  config INTEL_RAPL
 	  controller, CPU core (Power Plance 0), graphics uncore (Power Plane
 	  1), etc.
 
+config IDLE_INJECTION
+	bool "Idle injection framework"
+	depends on CPU_IDLE
+	default n
+	help
+	  This enables support for the idle injection framework. It
+	  provides a way to force idle periods on a set of specified
+	  CPUs for power capping. Idle period can be injected
+	  synchronously on a set of specified CPUs or alternatively
+	  on a per CPU basis.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 0a21ef3..c3bbfee 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,2 +1,3 @@ 
 obj-$(CONFIG_POWERCAP)	+= powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o
+obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o
diff --git a/drivers/powercap/idle_injection.c b/drivers/powercap/idle_injection.c
new file mode 100644
index 0000000..a5fe205
--- /dev/null
+++ b/drivers/powercap/idle_injection.c
@@ -0,0 +1,326 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * drivers/powercap/idle_injection.c
+ *
+ * Copyright 2018 Linaro Limited
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ * The idle injection framework proposes a way to force a cpu to enter
+ * an idle state during a specified amount of time for a specified
+ * period.
+ *
+ * It relies on the smpboot kthreads which handles, via its main loop,
+ * the common code for hotplugging and [un]parking.
+ *
+ * At init time, all the kthreads are created and parked.
+ *
+ * A cpumask is specified as parameter for the idle injection
+ * registering function. The kthreads will be synchronized regarding
+ * this cpumask.
+ *
+ * The idle + run duration is specified via the helpers and then the
+ * idle injection can be started at this point.
+ *
+ * A kthread will call play_idle() with the specified idle duration
+ * from above and then will schedule itself. The latest CPU belonging
+ * to the group is in charge of setting the timer for the next idle
+ * injection deadline.
+ *
+ * The task handling the timer interrupt will wakeup all the kthreads
+ * belonging to the cpumask.
+ */
+#include <linux/cpu.h>
+#include <linux/freezer.h>
+#include <linux/hrtimer.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smpboot.h>
+
+#include <uapi/linux/sched/types.h>
+
+/**
+ * struct idle_injection_thread - task on/off switch structure
+ * @tsk: a pointer to a task_struct injecting the idle cycles
+ * @should_run: a integer used as a boolean by the smpboot kthread API
+ */
+struct idle_injection_thread {
+	struct task_struct *tsk;
+	int should_run;
+};
+
+/**
+ * struct idle_injection_device - data for the idle injection
+ * @cpumask: a cpumask containing the list of CPUs managed by the device
+ * @timer: a hrtimer giving the tempo for the idle injection
+ * @count: an atomic to keep track of the last task exiting the idle cycle
+ * @idle_duration_ms: an atomic specifying the idle duration
+ * @run_duration_ms: an atomic specifying the running duration
+ */
+struct idle_injection_device {
+	cpumask_var_t cpumask;
+	struct hrtimer timer;
+	atomic_t count;
+	atomic_t idle_duration_ms;
+	atomic_t run_duration_ms;
+};
+
+static DEFINE_PER_CPU(struct idle_injection_thread, idle_injection_thread);
+static DEFINE_PER_CPU(struct idle_injection_device *, idle_injection_device);
+
+/**
+ * idle_injection_wakeup - Wake up all idle injection threads
+ * @ii_dev: the idle injection device
+ *
+ * Every idle injection task belonging to the idle injection device
+ * and running on an online CPU will be wake up by this call.
+ */
+static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
+{
+	struct idle_injection_thread *iit;
+	int cpu;
+
+	for_each_cpu_and(cpu, ii_dev->cpumask, cpu_online_mask) {
+		iit = per_cpu_ptr(&idle_injection_thread, cpu);
+		iit->should_run = 1;
+		wake_up_process(iit->tsk);
+	}
+}
+
+/**
+ * idle_injection_wakeup_fn - idle injection timer callback
+ * @timer: a hrtimer structure
+ *
+ * This function is called when the idle injection timer expires which
+ * will wake up the idle injection tasks and these ones, in turn, play
+ * idle a specified amount of time.
+ *
+ * Return: HRTIMER_NORESTART.
+ */
+static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
+{
+	struct idle_injection_device *ii_dev =
+		container_of(timer, struct idle_injection_device, timer);
+
+	idle_injection_wakeup(ii_dev);
+
+	return HRTIMER_NORESTART;
+}
+
+/**
+ * idle_injection_fn - idle injection routine
+ * @cpu: the CPU number the tasks belongs to
+ *
+ * The idle injection routine will stay idle the specified amount of
+ * time
+ */
+static void idle_injection_fn(unsigned int cpu)
+{
+	struct idle_injection_device *ii_dev;
+	struct idle_injection_thread *iit;
+	int run_duration_ms, idle_duration_ms;
+
+	ii_dev = per_cpu(idle_injection_device, cpu);
+
+	iit = per_cpu_ptr(&idle_injection_thread, cpu);
+
+	/*
+	 * Boolean used by the smpboot mainloop and used as a flip-flop
+	 * in this function
+	 */
+	iit->should_run = 0;
+
+	atomic_inc(&ii_dev->count);
+
+	idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);
+
+	play_idle(idle_duration_ms);
+
+	/*
+	 * The last CPU waking up is in charge of setting the timer. If
+	 * the CPU is hotplugged, the timer will move to another CPU
+	 * (which may not belong to the same cluster) but that is not a
+	 * problem as the timer will be set again by another CPU
+	 * belonging to the cluster. This mechanism is self adaptive.
+	 */
+	if (!atomic_dec_and_test(&ii_dev->count))
+		return;
+
+	run_duration_ms = atomic_read(&ii_dev->run_duration_ms);
+	if (!run_duration_ms)
+		return;
+
+	hrtimer_start(&ii_dev->timer, ms_to_ktime(run_duration_ms),
+		      HRTIMER_MODE_REL_PINNED);
+}
+
+/**
+ * idle_injection_set_duration - idle and run duration helper
+ * @run_duration_ms: an unsigned int giving the running time in milliseconds
+ * @idle_duration_ms: an unsigned int giving the idle time in milliseconds
+ */
+void idle_injection_set_duration(struct idle_injection_device *ii_dev,
+				 unsigned int run_duration_ms,
+				 unsigned int idle_duration_ms)
+{
+	atomic_set(&ii_dev->run_duration_ms, run_duration_ms);
+	atomic_set(&ii_dev->idle_duration_ms, idle_duration_ms);
+}
+
+/**
+ * idle_injection_get_duration - idle and run duration helper
+ * @run_duration_ms: a pointer to an unsigned int to store the running time
+ * @idle_duration_ms: a pointer to an unsigned int to store the idle time
+ */
+void idle_injection_get_duration(struct idle_injection_device *ii_dev,
+				 unsigned int *run_duration_ms,
+				 unsigned int *idle_duration_ms)
+{
+	*run_duration_ms = atomic_read(&ii_dev->run_duration_ms);
+	*idle_duration_ms = atomic_read(&ii_dev->idle_duration_ms);
+}
+
+/**
+ * idle_injection_start - starts the idle injections
+ * @ii_dev: a pointer to an idle_injection_device structure
+ *
+ * The function starts the idle injection cycles by first waking up
+ * all the tasks the ii_dev is attached to and let them handle the
+ * idle-run periods.
+ *
+ * Return: -EINVAL if the idle or the running durations are not set.
+ */
+int idle_injection_start(struct idle_injection_device *ii_dev)
+{
+	if (!atomic_read(&ii_dev->idle_duration_ms))
+		return -EINVAL;
+
+	if (!atomic_read(&ii_dev->run_duration_ms))
+		return -EINVAL;
+
+	pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n",
+		 cpumask_pr_args(ii_dev->cpumask));
+
+	idle_injection_wakeup(ii_dev);
+
+	return 0;
+}
+
+/**
+ * idle_injection_stop - stops the idle injections
+ * @ii_dev: a pointer to an idle injection_device structure
+ *
+ * The function stops the idle injection by canceling the timer in
+ * charge of waking up the tasks to inject idle and unset the idle and
+ * running durations.
+ */
+void idle_injection_stop(struct idle_injection_device *ii_dev)
+{
+	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
+		 cpumask_pr_args(ii_dev->cpumask));
+
+	hrtimer_cancel(&ii_dev->timer);
+
+	idle_injection_set_duration(ii_dev, 0, 0);
+}
+
+/**
+ * idle_injection_setup - initialize the current task as a RT task
+ * @cpu: the CPU number where the kthread is running on (not used)
+ *
+ * Called one time, this function is in charge of setting the task
+ * scheduler parameters.
+ */
+static void idle_injection_setup(unsigned int cpu)
+{
+	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+
+	set_freezable();
+
+	sched_setscheduler(current, SCHED_FIFO, &param);
+}
+
+/**
+ * idle_injection_should_run - function helper for the smpboot API
+ * @cpu: the CPU number where the kthread is running on
+ *
+ * Return: a boolean telling if the thread can run.
+ */
+static int idle_injection_should_run(unsigned int cpu)
+{
+	struct idle_injection_thread *iit =
+		per_cpu_ptr(&idle_injection_thread, cpu);
+
+	return iit->should_run;
+}
+
+/**
+ * idle_injection_register - idle injection init routine
+ * @cpumask: the list of CPUs managed by the idle injection device
+ *
+ * This is the initialization function in charge of creating the
+ * initializing the timer and allocate the structures. It does not
+ * starts the idle injection cycles.
+ *
+ * Return: NULL if an allocation fails.
+ */
+struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)
+{
+	struct idle_injection_device *ii_dev;
+	int cpu;
+
+	ii_dev = kzalloc(sizeof(*ii_dev), GFP_KERNEL);
+	if (!ii_dev)
+		return NULL;
+
+	if (!alloc_cpumask_var(&ii_dev->cpumask, GFP_KERNEL))
+		goto out_free_ii_dev;
+	cpumask_copy(ii_dev->cpumask, cpumask);
+
+	hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+
+	ii_dev->timer.function = idle_injection_wakeup_fn;
+
+	for_each_cpu(cpu, cpumask)
+		per_cpu(idle_injection_device, cpu) = ii_dev;
+
+	return ii_dev;
+
+out_free_ii_dev:
+	kfree(ii_dev);
+	return NULL;
+}
+
+/**
+ * idle_injection_unregister - Unregister the idle injection device
+ * @ii_dev: a pointer to an idle injection device
+ *
+ * The function is in charge of stopping the idle injections,
+ * unregister the kthreads and free the allocated memory in the
+ * register function.
+ */
+void idle_injection_unregister(struct idle_injection_device *ii_dev)
+{
+	int cpu;
+
+	idle_injection_stop(ii_dev);
+
+	for_each_cpu(cpu, ii_dev->cpumask)
+		per_cpu(idle_injection_device, cpu) = NULL;
+
+	kfree(ii_dev);
+}
+
+static struct smp_hotplug_thread idle_injection_threads = {
+	.store = &idle_injection_thread.tsk,
+	.setup = idle_injection_setup,
+	.thread_fn = idle_injection_fn,
+	.thread_comm = "idle_inject/%u",
+	.thread_should_run = idle_injection_should_run,
+};
+
+static int __init idle_injection_init(void)
+{
+	return smpboot_register_percpu_thread(&idle_injection_threads);
+}
+early_initcall(idle_injection_init);
diff --git a/include/linux/idle_injection.h b/include/linux/idle_injection.h
new file mode 100644
index 0000000..ffd20d7
--- /dev/null
+++ b/include/linux/idle_injection.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Linaro Ltd
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ */
+#ifndef __IDLE_INJECTION_H__
+#define __IDLE_INJECTION_H__
+
+/* private idle injection device structure */
+struct idle_injection_device;
+
+struct idle_injection_device *idle_injection_register(struct cpumask *cpumask);
+
+void idle_injection_unregister(struct idle_injection_device *ii_dev);
+
+int idle_injection_start(struct idle_injection_device *ii_dev);
+
+void idle_injection_stop(struct idle_injection_device *ii_dev);
+
+void idle_injection_set_duration(struct idle_injection_device *ii_dev,
+				 unsigned int run_duration_ms,
+				 unsigned int idle_duration_ms);
+
+void idle_injection_get_duration(struct idle_injection_device *ii_dev,
+				 unsigned int *run_duration_ms,
+				 unsigned int *idle_duration_ms);
+#endif /* __IDLE_INJECTION_H__ */