[V9] powercap/drivers/idle_injection: Add an idle injection framework

Message ID 1529414627-5638-1-git-send-email-daniel.lezcano@linaro.org
State New
Headers show
Series
  • [V9] powercap/drivers/idle_injection: Add an idle injection framework
Related show

Commit Message

Daniel Lezcano June 19, 2018, 1:23 p.m.
Initially, the cpu_cooling device for ARM was changed by adding a new
policy inserting idle cycles. The intel_powerclamp driver does a
similar action.

Instead of implementing idle injections privately in the cpu_cooling
device, move the idle injection code in a dedicated framework and give
the opportunity to other frameworks to make use of it.

The framework relies on the smpboot kthreads which handles via its
main loop the common code for hotplugging and [un]parking.

This code was previously tested with the cpu cooling device and went
through several iterations. It results now in split code and API
exported in the header file. It was tested with the cpu cooling device
with success.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Javi Merino <javi.merino@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kevin Wangtao <kevin.wangtao@linaro.org>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Rui Zhang <rui.zhang@intel.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Parri <andrea.parri@amarulasolutions.com>
---

V9:
   - Unconditionnally reset the should_run flag for all kthreads
     belonging to the cpumask and remove the park() callback (Viresh Kumar)
   - Fix up the typos in the comments (Viresh Kumar)

V8:
   - Rollback only what was modified
   - Add the park() callback to reset the should_run flag

V7:
   - Replace count approach by htimer_forward and restart (Peter Zijlstra)
   - Wait for task inactive when stopping the idle injections
   - Changed the comments and description

V6:
   - Move count to wake up function (Viresh Kumar)
   - Split atomic/non-atomic context to wake up tasks
   - Add the park callback to handle unplug/inject race
   - Replace atomic by READ_ONCE and WRITE_ONCE (Peter Zijlstra)

V5:
   - Move init_completion in the init function (Viresh Kumar)
   - Increment task count in the wakeup function (Viresh Kumar)
   - Remove rollback at init time (Viresh Kumar)

V4:
   - Wait for completion when stopping (Viresh Kumar)
   - Check init already done and rollback (Viresh Kumar)

V3:
   - Fixed typos (Viresh Kumar)
   - Removed extra blank line (Viresh Kumar)
   - Added full stop (Viresh Kumar)
   - Fixed Return kerneldoc format (Viresh Kumar)
   - Fixed multiple kthreads initialization (Viresh Kumar)
   - Fixed rollbacking the actions in the unregister function (Viresh Kumar)
   - Make idle injection kthreads name hardcoded
   - Kthreads are created in the initcall process

 V2: Fixed checkpatch warnings

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

---
 drivers/powercap/Kconfig          |  10 ++
 drivers/powercap/Makefile         |   1 +
 drivers/powercap/idle_injection.c | 362 ++++++++++++++++++++++++++++++++++++++
 include/linux/idle_injection.h    |  29 +++
 4 files changed, 402 insertions(+)
 create mode 100644 drivers/powercap/idle_injection.c
 create mode 100644 include/linux/idle_injection.h

-- 
2.7.4

Comments

Viresh Kumar June 20, 2018, 4:34 a.m. | #1
On 19-06-18, 15:23, Daniel Lezcano wrote:
> Initially, the cpu_cooling device for ARM was changed by adding a new

> policy inserting idle cycles. The intel_powerclamp driver does a

> similar action.

> 

> Instead of implementing idle injections privately in the cpu_cooling

> device, move the idle injection code in a dedicated framework and give

> the opportunity to other frameworks to make use of it.

> 

> The framework relies on the smpboot kthreads which handles via its

> main loop the common code for hotplugging and [un]parking.

> 

> This code was previously tested with the cpu cooling device and went

> through several iterations. It results now in split code and API

> exported in the header file. It was tested with the cpu cooling device

> with success.

> 

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> Cc: Viresh Kumar <viresh.kumar@linaro.org>

> Cc: Eduardo Valentin <edubezval@gmail.com>

> Cc: Javi Merino <javi.merino@kernel.org>

> Cc: Leo Yan <leo.yan@linaro.org>

> Cc: Kevin Wangtao <kevin.wangtao@linaro.org>

> Cc: Vincent Guittot <vincent.guittot@linaro.org>

> Cc: Rui Zhang <rui.zhang@intel.com>

> Cc: Daniel Thompson <daniel.thompson@linaro.org>

> Cc: Peter Zijlstra <peterz@infradead.org>

> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>

> ---

> 

> V9:

>    - Unconditionnally reset the should_run flag for all kthreads

>      belonging to the cpumask and remove the park() callback (Viresh Kumar)

>    - Fix up the typos in the comments (Viresh Kumar)


Looks great now. Can't think of any more races :)

Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>


-- 
viresh
Daniel Lezcano June 24, 2018, 10:40 p.m. | #2
Hi Rafael,

do you think this patch is acceptable for merging ?



On 20/06/2018 06:34, Viresh Kumar wrote:
> On 19-06-18, 15:23, Daniel Lezcano wrote:

>> Initially, the cpu_cooling device for ARM was changed by adding a new

>> policy inserting idle cycles. The intel_powerclamp driver does a

>> similar action.

>>

>> Instead of implementing idle injections privately in the cpu_cooling

>> device, move the idle injection code in a dedicated framework and give

>> the opportunity to other frameworks to make use of it.

>>

>> The framework relies on the smpboot kthreads which handles via its

>> main loop the common code for hotplugging and [un]parking.

>>

>> This code was previously tested with the cpu cooling device and went

>> through several iterations. It results now in split code and API

>> exported in the header file. It was tested with the cpu cooling device

>> with success.

>>

>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

>> Cc: Viresh Kumar <viresh.kumar@linaro.org>

>> Cc: Eduardo Valentin <edubezval@gmail.com>

>> Cc: Javi Merino <javi.merino@kernel.org>

>> Cc: Leo Yan <leo.yan@linaro.org>

>> Cc: Kevin Wangtao <kevin.wangtao@linaro.org>

>> Cc: Vincent Guittot <vincent.guittot@linaro.org>

>> Cc: Rui Zhang <rui.zhang@intel.com>

>> Cc: Daniel Thompson <daniel.thompson@linaro.org>

>> Cc: Peter Zijlstra <peterz@infradead.org>

>> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>

>> ---

>>

>> V9:

>>    - Unconditionnally reset the should_run flag for all kthreads

>>      belonging to the cpumask and remove the park() callback (Viresh Kumar)

>>    - Fix up the typos in the comments (Viresh Kumar)

> 

> Looks great now. Can't think of any more races :)

> 

> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>

> 



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Rafael J. Wysocki June 25, 2018, 8:20 a.m. | #3
Hi Daniel,

On Mon, Jun 25, 2018 at 12:40 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>

> Hi Rafael,

>

> do you think this patch is acceptable for merging ?


Well, it needs a bit of work still IMO, but let me comment on the patch itself.

Also, can you point me to the code using it that you have, please?
I'd like to try to switch powerclamp over to it.

Thanks,
Rafael
Rafael J. Wysocki June 25, 2018, 8:28 a.m. | #4
On Mon, Jun 25, 2018 at 10:23 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Tue, Jun 19, 2018 at 3:23 PM, Daniel Lezcano

> <daniel.lezcano@linaro.org> wrote:

>> Initially, the cpu_cooling device for ARM was changed by adding a new

>> policy inserting idle cycles. The intel_powerclamp driver does a

>> similar action.

>>

>> Instead of implementing idle injections privately in the cpu_cooling

>> device, move the idle injection code in a dedicated framework and give

>> the opportunity to other frameworks to make use of it.

>>

>> The framework relies on the smpboot kthreads which handles via its

>> main loop the common code for hotplugging and [un]parking.

>>

>> This code was previously tested with the cpu cooling device and went

>> through several iterations. It results now in split code and API

>> exported in the header file. It was tested with the cpu cooling device

>> with success.

>>

>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

>> Cc: Viresh Kumar <viresh.kumar@linaro.org>

>> Cc: Eduardo Valentin <edubezval@gmail.com>

>> Cc: Javi Merino <javi.merino@kernel.org>

>> Cc: Leo Yan <leo.yan@linaro.org>

>> Cc: Kevin Wangtao <kevin.wangtao@linaro.org>

>> Cc: Vincent Guittot <vincent.guittot@linaro.org>

>> Cc: Rui Zhang <rui.zhang@intel.com>

>> Cc: Daniel Thompson <daniel.thompson@linaro.org>

>> Cc: Peter Zijlstra <peterz@infradead.org>

>> Cc: Andrea Parri <andrea.parri@amarulasolutions.com>

>> ---

>>

>> V9:

>>    - Unconditionnally reset the should_run flag for all kthreads

>>      belonging to the cpumask and remove the park() callback (Viresh Kumar)

>>    - Fix up the typos in the comments (Viresh Kumar)

>>

>> V8:

>>    - Rollback only what was modified

>>    - Add the park() callback to reset the should_run flag

>>

>> V7:

>>    - Replace count approach by htimer_forward and restart (Peter Zijlstra)

>>    - Wait for task inactive when stopping the idle injections

>>    - Changed the comments and description

>>

>> V6:

>>    - Move count to wake up function (Viresh Kumar)

>>    - Split atomic/non-atomic context to wake up tasks

>>    - Add the park callback to handle unplug/inject race

>>    - Replace atomic by READ_ONCE and WRITE_ONCE (Peter Zijlstra)

>>

>> V5:

>>    - Move init_completion in the init function (Viresh Kumar)

>>    - Increment task count in the wakeup function (Viresh Kumar)

>>    - Remove rollback at init time (Viresh Kumar)

>>

>> V4:

>>    - Wait for completion when stopping (Viresh Kumar)

>>    - Check init already done and rollback (Viresh Kumar)

>>

>> V3:

>>    - Fixed typos (Viresh Kumar)

>>    - Removed extra blank line (Viresh Kumar)

>>    - Added full stop (Viresh Kumar)

>>    - Fixed Return kerneldoc format (Viresh Kumar)

>>    - Fixed multiple kthreads initialization (Viresh Kumar)

>>    - Fixed rollbacking the actions in the unregister function (Viresh Kumar)

>>    - Make idle injection kthreads name hardcoded

>>    - Kthreads are created in the initcall process

>>

>>  V2: Fixed checkpatch warnings

>>

>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

>> ---

>>  drivers/powercap/Kconfig          |  10 ++

>>  drivers/powercap/Makefile         |   1 +

>>  drivers/powercap/idle_injection.c | 362 ++++++++++++++++++++++++++++++++++++++

>>  include/linux/idle_injection.h    |  29 +++

>>  4 files changed, 402 insertions(+)

>>  create mode 100644 drivers/powercap/idle_injection.c

>>  create mode 100644 include/linux/idle_injection.h

>>

>> diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig

>> index 85727ef..a767ef2 100644

>> --- a/drivers/powercap/Kconfig

>> +++ b/drivers/powercap/Kconfig

>> @@ -29,4 +29,14 @@ config INTEL_RAPL

>>           controller, CPU core (Power Plance 0), graphics uncore (Power Plane

>>           1), etc.

>>

>> +config IDLE_INJECTION

>> +       bool "Idle injection framework"

>> +       depends on CPU_IDLE

>> +       default n

>> +       help

>> +         This enables support for the idle injection framework. It

>> +         provides a way to force idle periods on a set of specified

>> +         CPUs for power capping. Idle period can be injected

>> +         synchronously on a set of specified CPUs or alternatively

>> +         on a per CPU basis.

>>  endif

>> diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile

>> index 0a21ef3..c3bbfee 100644

>> --- a/drivers/powercap/Makefile

>> +++ b/drivers/powercap/Makefile

>> @@ -1,2 +1,3 @@

>>  obj-$(CONFIG_POWERCAP) += powercap_sys.o

>>  obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o

>> +obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o

>> diff --git a/drivers/powercap/idle_injection.c b/drivers/powercap/idle_injection.c

>> new file mode 100644

>> index 0000000..50ce348

>> --- /dev/null

>> +++ b/drivers/powercap/idle_injection.c

>> @@ -0,0 +1,362 @@

>> +// SPDX-License-Identifier: GPL-2.0

>> +/*

>> + * Copyright 2018 Linaro Limited

>> + *

>> + * Author: Daniel Lezcano <daniel.lezcano@linaro.org>

>> + *

>> + * The idle injection framework proposes a way to force a cpu to enter

>> + * an idle state during a specified amount of time for a specified

>> + * period.

>> + *

>> + * It relies on the smpboot kthreads which handles, via its main loop,

>> + * the common code for hotplugging and [un]parking.

>> + *

>> + * At init time, all the kthreads are created.

>> + *

>> + * A cpumask is specified as parameter for the idle injection

>> + * registering function. The kthreads will be synchronized regarding

>> + * this cpumask.

>> + *

>> + * The idle + run duration is specified via the helpers and then the

>> + * idle injection can be started at this point.

>> + *

>> + * A kthread will call play_idle() with the specified idle duration

>> + * from above.

>> + *

>> + * A timer is set after waking up all the tasks, to the next idle

>> + * injection cycle.

>> + *

>> + * The task handling the timer interrupt will wakeup all the kthreads

>> + * belonging to the cpumask.

>> + *

>> + * Stopping the idle injection is synchronous, when the function

>> + * returns, there is the guarantee there is no more idle injection

>> + * kthread in activity.

>> + *

>> + * It is up to the user of this framework to provide a lock at an

>> + * upper level to prevent stupid things to happen, like starting while

>> + * we are unregistering.

>> + */

>

> The English above and elsewhere needs some polishing IMO, but I can

> take care of that. :-)

>

>> +#define pr_fmt(fmt) "ii_dev: " fmt

>> +

>> +#include <linux/cpu.h>

>> +#include <linux/freezer.h>

>> +#include <linux/hrtimer.h>

>> +#include <linux/kthread.h>

>> +#include <linux/sched.h>

>> +#include <linux/slab.h>

>> +#include <linux/smpboot.h>

>> +

>> +#include <uapi/linux/sched/types.h>

>> +

>> +/**

>> + * struct idle_injection_thread - task on/off switch structure

>> + * @tsk: a pointer to a task_struct injecting the idle cycles

>> + * @should_run: a integer used as a boolean by the smpboot kthread API

>> + */

>> +struct idle_injection_thread {

>> +       struct task_struct *tsk;

>> +       int should_run;

>> +};


Besides, if you don't mind, I would shorten idle_injection_ prefix
everywhere to idle_inject_ which is shorter and carries the same
information IMO.
Rafael J. Wysocki June 25, 2018, 8:41 a.m. | #5
On Tue, Jun 19, 2018 at 3:23 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:

[cut]

One more thing.

> +/**

> + * idle_injection_register - idle injection init routine

> + * @cpumask: the list of CPUs managed by the idle injection device

> + *

> + * This is the initialization function in charge of creating the

> + * initializing of the timer and allocate the structures. It does not

> + * starts the idle injection cycles.

> + *

> + * Return: NULL if an allocation fails.

> + */

> +struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)

> +{

> +       struct idle_injection_device *ii_dev;

> +       int cpu, cpu_rb;

> +

> +       ii_dev = kzalloc(sizeof(*ii_dev) + cpumask_size(), GFP_KERNEL);

> +       if (!ii_dev)

> +               return NULL;

> +

> +       cpumask_copy(to_cpumask(ii_dev->cpumask), cpumask);

> +       hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);

> +       ii_dev->timer.function = idle_injection_wakeup_fn;

> +

> +       for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) {

> +

> +               if (per_cpu(idle_injection_device, cpu)) {

> +                       pr_err("cpu%d is already registered\n", cpu);


If you print something like this, it should be clear what it is about
and what piece of code it comes from as there will be no context
around it it the log.

> +                       goto out_rollback_per_cpu;


Shorten the label perhaps?

> +               }

> +

> +               per_cpu(idle_injection_device, cpu) = ii_dev;

> +       }

> +

> +       return ii_dev;

> +

> +out_rollback_per_cpu:

> +       for_each_cpu(cpu_rb, to_cpumask(ii_dev->cpumask)) {

> +               if (cpu == cpu_rb)

> +                       break;

> +               per_cpu(idle_injection_device, cpu_rb) = NULL;

> +       }

> +

> +       kfree(ii_dev);

> +

> +       return NULL;

> +}

> +
Daniel Lezcano June 25, 2018, 9:44 a.m. | #6
On 25/06/2018 10:20, Rafael J. Wysocki wrote:
> Hi Daniel,

> 

> On Mon, Jun 25, 2018 at 12:40 AM, Daniel Lezcano

> <daniel.lezcano@linaro.org> wrote:

>>

>> Hi Rafael,

>>

>> do you think this patch is acceptable for merging ?

> 

> Well, it needs a bit of work still IMO, but let me comment on the patch itself.

> 

> Also, can you point me to the code using it that you have, please?

> I'd like to try to switch powerclamp over to it.



Sure, here it is:

https://git.linaro.org/people/daniel.lezcano/linux.git/log/?h=thermal/idle-inject-v4




-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Viresh Kumar June 25, 2018, 9:45 a.m. | #7
On 25-06-18, 10:41, Rafael J. Wysocki wrote:
> On Tue, Jun 19, 2018 at 3:23 PM, Daniel Lezcano

> > +               if (per_cpu(idle_injection_device, cpu)) {

> > +                       pr_err("cpu%d is already registered\n", cpu);

> 

> If you print something like this, it should be clear what it is about

> and what piece of code it comes from as there will be no context

> around it it the log.


It should be good as we have this at the top.

#define pr_fmt(fmt) "ii_dev: " fmt


-- 
viresh
Rafael J. Wysocki June 25, 2018, 10 a.m. | #8
On Mon, Jun 25, 2018 at 11:44 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 25-06-18, 10:23, Rafael J. Wysocki wrote:

>> On Tue, Jun 19, 2018 at 3:23 PM, Daniel Lezcano

>> > +void idle_injection_get_duration(struct idle_injection_device *ii_dev,

>> > +                                unsigned int *run_duration_ms,

>> > +                                unsigned int *idle_duration_ms)

>> > +{

>> > +       *run_duration_ms = READ_ONCE(ii_dev->run_duration_ms);

>> > +       *idle_duration_ms = READ_ONCE(ii_dev->idle_duration_ms);

>>

>> Should you check the arguments against NULL before attempting to

>> dereference them?  If not, then why not?

>

> I would vote with a NO. This is a mandatory parameter and not filling it in with

> a valid pointer is a BUG really. I don't see a reason why we should prevent the

> kernel from crashing if such a mistake happens :)


Well, fair enough.
Daniel Lezcano June 25, 2018, 11:48 a.m. | #9
On 25/06/2018 10:41, Rafael J. Wysocki wrote:
> On Tue, Jun 19, 2018 at 3:23 PM, Daniel Lezcano

> <daniel.lezcano@linaro.org> wrote:

> 

> [cut]

> 

> One more thing.

> 

>> +/**

>> + * idle_injection_register - idle injection init routine

>> + * @cpumask: the list of CPUs managed by the idle injection device

>> + *

>> + * This is the initialization function in charge of creating the

>> + * initializing of the timer and allocate the structures. It does not

>> + * starts the idle injection cycles.

>> + *

>> + * Return: NULL if an allocation fails.

>> + */

>> +struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)

>> +{

>> +       struct idle_injection_device *ii_dev;

>> +       int cpu, cpu_rb;

>> +

>> +       ii_dev = kzalloc(sizeof(*ii_dev) + cpumask_size(), GFP_KERNEL);

>> +       if (!ii_dev)

>> +               return NULL;

>> +

>> +       cpumask_copy(to_cpumask(ii_dev->cpumask), cpumask);

>> +       hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);

>> +       ii_dev->timer.function = idle_injection_wakeup_fn;

>> +

>> +       for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) {

>> +

>> +               if (per_cpu(idle_injection_device, cpu)) {

>> +                       pr_err("cpu%d is already registered\n", cpu);

> 

> If you print something like this, it should be clear what it is about

> and what piece of code it comes from as there will be no context

> around it it the log.


Is the prefix as pointed by Viresh enough ? Or do want me to add a trace
like:

pr_err("Failed to register 'cpu%d', it is already registered", cpu);
(where "ii_dev:" gives the context)

?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Daniel Lezcano June 25, 2018, 11:52 a.m. | #10
On 25/06/2018 10:23, Rafael J. Wysocki wrote:

[ ... ]

>> + * It is up to the user of this framework to provide a lock at an

>> + * upper level to prevent stupid things to happen, like starting while

>> + * we are unregistering.

>> + */

> 

> The English above and elsewhere needs some polishing IMO, but I can

> take care of that. :-)


I can give a try and if you are still unhappy, you change them in a
better way.

[ ... ]

>> +static void idle_injection_setup(unsigned int cpu)

>> +{

>> +       struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };

>> +

>> +       set_freezable();

> 

> Why do you need set_freezable() here?


We don't want to continue injecting idle cycles when the system is
suspended.




-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Rafael J. Wysocki June 25, 2018, 9:55 p.m. | #11
On Mon, Jun 25, 2018 at 1:52 PM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 25/06/2018 10:23, Rafael J. Wysocki wrote:

>

> [ ... ]

>

>>> + * It is up to the user of this framework to provide a lock at an

>>> + * upper level to prevent stupid things to happen, like starting while

>>> + * we are unregistering.

>>> + */

>>

>> The English above and elsewhere needs some polishing IMO, but I can

>> take care of that. :-)

>

> I can give a try and if you are still unhappy, you change them in a

> better way.


Well, I could tell you what I would write, but then it'll take less
time and effort if I just write it myself. :-)

> [ ... ]

>

>>> +static void idle_injection_setup(unsigned int cpu)

>>> +{

>>> +       struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };

>>> +

>>> +       set_freezable();

>>

>> Why do you need set_freezable() here?

>

> We don't want to continue injecting idle cycles when the system is

> suspended.


And where's the corresponding try_to_freeze() called?
Rafael J. Wysocki June 26, 2018, 9:27 a.m. | #12
On Tue, Jun 26, 2018 at 12:15 AM, Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
> On 25/06/2018 23:55, Rafael J. Wysocki wrote:

>> On Mon, Jun 25, 2018 at 1:52 PM, Daniel Lezcano

>> <daniel.lezcano@linaro.org> wrote:

>>> On 25/06/2018 10:23, Rafael J. Wysocki wrote:

>>>

>>> [ ... ]

>>>

>>>>> + * It is up to the user of this framework to provide a lock at an

>>>>> + * upper level to prevent stupid things to happen, like starting while

>>>>> + * we are unregistering.

>>>>> + */

>>>>

>>>> The English above and elsewhere needs some polishing IMO, but I can

>>>> take care of that. :-)

>>>

>>> I can give a try and if you are still unhappy, you change them in a

>>> better way.

>>

>> Well, I could tell you what I would write, but then it'll take less

>> time and effort if I just write it myself. :-)

>

> Ok.

>

>>> [ ... ]

>>>

>>>>> +static void idle_injection_setup(unsigned int cpu)

>>>>> +{

>>>>> +       struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };

>>>>> +

>>>>> +       set_freezable();

>>>>

>>>> Why do you need set_freezable() here?

>>>

>>> We don't want to continue injecting idle cycles when the system is

>>> suspended.

>>

>> And where's the corresponding try_to_freeze() called?

>

> Yes, it is missing. I suppose try_to_freeze() should be put at the

> beginning of the idle_inject_fn() function, right ?


Well, given that people want to get rid of kthread freezing entirely,
adding it may not be a good idea at all.

Also I'm not sure if we should avoid injecting idle on system suspend.
It may slow down the suspend process, but that's OK and after that it
shouldn't matter.  And if it turns out to matter, we'll address the
problem instead of trying to avoid it by freezing the kthreads.

So, I would just drop the set_freezable() and leave it like that.

Patch

diff --git a/drivers/powercap/Kconfig b/drivers/powercap/Kconfig
index 85727ef..a767ef2 100644
--- a/drivers/powercap/Kconfig
+++ b/drivers/powercap/Kconfig
@@ -29,4 +29,14 @@  config INTEL_RAPL
 	  controller, CPU core (Power Plance 0), graphics uncore (Power Plane
 	  1), etc.
 
+config IDLE_INJECTION
+	bool "Idle injection framework"
+	depends on CPU_IDLE
+	default n
+	help
+	  This enables support for the idle injection framework. It
+	  provides a way to force idle periods on a set of specified
+	  CPUs for power capping. Idle period can be injected
+	  synchronously on a set of specified CPUs or alternatively
+	  on a per CPU basis.
 endif
diff --git a/drivers/powercap/Makefile b/drivers/powercap/Makefile
index 0a21ef3..c3bbfee 100644
--- a/drivers/powercap/Makefile
+++ b/drivers/powercap/Makefile
@@ -1,2 +1,3 @@ 
 obj-$(CONFIG_POWERCAP)	+= powercap_sys.o
 obj-$(CONFIG_INTEL_RAPL) += intel_rapl.o
+obj-$(CONFIG_IDLE_INJECTION) += idle_injection.o
diff --git a/drivers/powercap/idle_injection.c b/drivers/powercap/idle_injection.c
new file mode 100644
index 0000000..50ce348
--- /dev/null
+++ b/drivers/powercap/idle_injection.c
@@ -0,0 +1,362 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2018 Linaro Limited
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ * The idle injection framework proposes a way to force a cpu to enter
+ * an idle state during a specified amount of time for a specified
+ * period.
+ *
+ * It relies on the smpboot kthreads which handles, via its main loop,
+ * the common code for hotplugging and [un]parking.
+ *
+ * At init time, all the kthreads are created.
+ *
+ * A cpumask is specified as parameter for the idle injection
+ * registering function. The kthreads will be synchronized regarding
+ * this cpumask.
+ *
+ * The idle + run duration is specified via the helpers and then the
+ * idle injection can be started at this point.
+ *
+ * A kthread will call play_idle() with the specified idle duration
+ * from above.
+ *
+ * A timer is set after waking up all the tasks, to the next idle
+ * injection cycle.
+ *
+ * The task handling the timer interrupt will wakeup all the kthreads
+ * belonging to the cpumask.
+ *
+ * Stopping the idle injection is synchronous, when the function
+ * returns, there is the guarantee there is no more idle injection
+ * kthread in activity.
+ *
+ * It is up to the user of this framework to provide a lock at an
+ * upper level to prevent stupid things to happen, like starting while
+ * we are unregistering.
+ */
+#define pr_fmt(fmt) "ii_dev: " fmt
+
+#include <linux/cpu.h>
+#include <linux/freezer.h>
+#include <linux/hrtimer.h>
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/smpboot.h>
+
+#include <uapi/linux/sched/types.h>
+
+/**
+ * struct idle_injection_thread - task on/off switch structure
+ * @tsk: a pointer to a task_struct injecting the idle cycles
+ * @should_run: a integer used as a boolean by the smpboot kthread API
+ */
+struct idle_injection_thread {
+	struct task_struct *tsk;
+	int should_run;
+};
+
+/**
+ * struct idle_injection_device - data for the idle injection
+ * @timer: a hrtimer giving the tempo for the idle injection
+ * @idle_duration_ms: an unsigned int specifying the idle duration
+ * @run_duration_ms: an unsigned int specifying the running duration
+ * @cpumask: a cpumask containing the list of CPUs managed by the device
+ */
+struct idle_injection_device {
+	struct hrtimer timer;
+	unsigned int idle_duration_ms;
+	unsigned int run_duration_ms;
+	unsigned long int cpumask[0];
+};
+
+static DEFINE_PER_CPU(struct idle_injection_thread, idle_injection_thread);
+static DEFINE_PER_CPU(struct idle_injection_device *, idle_injection_device);
+
+/**
+ * idle_injection_wakeup - Wake up all idle injection threads
+ * @ii_dev: the idle injection device
+ *
+ * Every idle injection task belonging to the idle injection device
+ * and running on an online CPU will be wake up by this call.
+ */
+static void idle_injection_wakeup(struct idle_injection_device *ii_dev)
+{
+	struct idle_injection_thread *iit;
+	unsigned int cpu;
+
+	for_each_cpu_and(cpu, to_cpumask(ii_dev->cpumask), cpu_online_mask) {
+		iit = per_cpu_ptr(&idle_injection_thread, cpu);
+		iit->should_run = 1;
+		wake_up_process(iit->tsk);
+	}
+}
+
+/**
+ * idle_injection_wakeup_fn - idle injection timer callback
+ * @timer: a hrtimer structure
+ *
+ * This function is called when the idle injection timer expires which
+ * will wake up the idle injection tasks and these ones, in turn, play
+ * idle a specified amount of time.
+ *
+ * Return: HRTIMER_RESTART.
+ */
+static enum hrtimer_restart idle_injection_wakeup_fn(struct hrtimer *timer)
+{
+	unsigned int duration_ms;
+	struct idle_injection_device *ii_dev =
+		container_of(timer, struct idle_injection_device, timer);
+
+	duration_ms = READ_ONCE(ii_dev->run_duration_ms);
+	duration_ms += READ_ONCE(ii_dev->idle_duration_ms);
+
+	idle_injection_wakeup(ii_dev);
+
+	hrtimer_forward_now(timer, ms_to_ktime(duration_ms));
+
+	return HRTIMER_RESTART;
+}
+
+/**
+ * idle_injection_fn - idle injection routine
+ * @cpu: the CPU number the task belongs to
+ *
+ * The idle injection routine will stay idle the specified amount of
+ * time
+ */
+static void idle_injection_fn(unsigned int cpu)
+{
+	struct idle_injection_device *ii_dev;
+	struct idle_injection_thread *iit;
+
+	ii_dev = per_cpu(idle_injection_device, cpu);
+	iit = per_cpu_ptr(&idle_injection_thread, cpu);
+
+	/*
+	 * Boolean used by the smpboot main loop and used as a
+	 * flip-flop in this function
+	 */
+	iit->should_run = 0;
+
+	play_idle(READ_ONCE(ii_dev->idle_duration_ms));
+}
+
+/**
+ * idle_injection_set_duration - idle and run duration helper
+ * @run_duration_ms: an unsigned int giving the running time in milliseconds
+ * @idle_duration_ms: an unsigned int giving the idle time in milliseconds
+ */
+void idle_injection_set_duration(struct idle_injection_device *ii_dev,
+				 unsigned int run_duration_ms,
+				 unsigned int idle_duration_ms)
+{
+	if (run_duration_ms && idle_duration_ms) {
+		WRITE_ONCE(ii_dev->run_duration_ms, run_duration_ms);
+		WRITE_ONCE(ii_dev->idle_duration_ms, idle_duration_ms);
+	}
+}
+
+/**
+ * idle_injection_get_duration - idle and run duration helper
+ * @run_duration_ms: a pointer to an unsigned int to store the running time
+ * @idle_duration_ms: a pointer to an unsigned int to store the idle time
+ */
+void idle_injection_get_duration(struct idle_injection_device *ii_dev,
+				 unsigned int *run_duration_ms,
+				 unsigned int *idle_duration_ms)
+{
+	*run_duration_ms = READ_ONCE(ii_dev->run_duration_ms);
+	*idle_duration_ms = READ_ONCE(ii_dev->idle_duration_ms);
+}
+
+/**
+ * idle_injection_start - starts the idle injections
+ * @ii_dev: a pointer to an idle_injection_device structure
+ *
+ * The function starts the idle injection cycles by first waking up
+ * all the tasks the ii_dev is attached to and let them handle the
+ * idle-run periods.
+ *
+ * Return: -EINVAL if the idle or the running durations are not set.
+ */
+int idle_injection_start(struct idle_injection_device *ii_dev)
+{
+	unsigned int idle_duration_ms = READ_ONCE(ii_dev->idle_duration_ms);
+	unsigned int run_duration_ms = READ_ONCE(ii_dev->run_duration_ms);
+
+	if (!idle_duration_ms || !run_duration_ms)
+		return -EINVAL;
+
+	pr_debug("Starting injecting idle cycles on CPUs '%*pbl'\n",
+		 cpumask_pr_args(to_cpumask(ii_dev->cpumask)));
+
+	idle_injection_wakeup(ii_dev);
+
+	hrtimer_start(&ii_dev->timer,
+		      ms_to_ktime(idle_duration_ms + run_duration_ms),
+		      HRTIMER_MODE_REL);
+
+	return 0;
+}
+
+/**
+ * idle_injection_stop - stops the idle injections
+ * @ii_dev: a pointer to an idle injection_device structure
+ *
+ * The function stops the idle injection and waits for the threads to
+ * complete. If we are in the process of injecting an idle cycle, then
+ * this will wait the end of the cycle.
+ *
+ * When the function returns there is no more idle injection
+ * activity. The kthreads are scheduled out and the periodic timer is
+ * off.
+ */
+void idle_injection_stop(struct idle_injection_device *ii_dev)
+{
+	struct idle_injection_thread *iit;
+	unsigned int cpu;
+
+	pr_debug("Stopping injecting idle cycles on CPUs '%*pbl'\n",
+		 cpumask_pr_args(to_cpumask(ii_dev->cpumask)));
+
+	hrtimer_cancel(&ii_dev->timer);
+
+	/*
+	 * We want the guarantee we have a quiescent point where
+	 * parked threads stay in there state while we are stopping
+	 * the idle injection. After exiting the loop, if any CPU is
+	 * plugged in, the 'should_run' boolean being false, the
+	 * smpboot main loop schedules the task out.
+	 */
+	cpu_hotplug_disable();
+
+	/*
+	 * Iterate over all (online + offline) CPUs here in case a CPU
+	 * got hotplugged out with "should_run" set to "1" and the
+	 * thread may start running again after the ii_dev is freed
+	 * and the CPU gets hotplugged in.
+	 */
+	for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) {
+		iit = per_cpu_ptr(&idle_injection_thread, cpu);
+		iit->should_run = 0;
+
+		wait_task_inactive(iit->tsk, 0);
+	}
+
+	cpu_hotplug_enable();
+}
+
+/**
+ * idle_injection_setup - initialize the current task as a RT task
+ * @cpu: the CPU number where the kthread is running on (not used)
+ *
+ * Called one time, this function is in charge of setting the task
+ * scheduler parameters.
+ */
+static void idle_injection_setup(unsigned int cpu)
+{
+	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO / 2 };
+
+	set_freezable();
+
+	sched_setscheduler(current, SCHED_FIFO, &param);
+}
+
+/**
+ * idle_injection_should_run - function helper for the smpboot API
+ * @cpu: the CPU number where the kthread is running on
+ *
+ * Return: a boolean telling if the thread can run.
+ */
+static int idle_injection_should_run(unsigned int cpu)
+{
+	struct idle_injection_thread *iit =
+		per_cpu_ptr(&idle_injection_thread, cpu);
+
+	return iit->should_run;
+}
+
+/**
+ * idle_injection_register - idle injection init routine
+ * @cpumask: the list of CPUs managed by the idle injection device
+ *
+ * This is the initialization function in charge of creating the
+ * initializing of the timer and allocate the structures. It does not
+ * starts the idle injection cycles.
+ *
+ * Return: NULL if an allocation fails.
+ */
+struct idle_injection_device *idle_injection_register(struct cpumask *cpumask)
+{
+	struct idle_injection_device *ii_dev;
+	int cpu, cpu_rb;
+
+	ii_dev = kzalloc(sizeof(*ii_dev) + cpumask_size(), GFP_KERNEL);
+	if (!ii_dev)
+		return NULL;
+
+	cpumask_copy(to_cpumask(ii_dev->cpumask), cpumask);
+	hrtimer_init(&ii_dev->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	ii_dev->timer.function = idle_injection_wakeup_fn;
+
+	for_each_cpu(cpu, to_cpumask(ii_dev->cpumask)) {
+
+		if (per_cpu(idle_injection_device, cpu)) {
+			pr_err("cpu%d is already registered\n", cpu);
+			goto out_rollback_per_cpu;
+		}
+
+		per_cpu(idle_injection_device, cpu) = ii_dev;
+	}
+
+	return ii_dev;
+
+out_rollback_per_cpu:
+	for_each_cpu(cpu_rb, to_cpumask(ii_dev->cpumask)) {
+		if (cpu == cpu_rb)
+			break;
+		per_cpu(idle_injection_device, cpu_rb) = NULL;
+	}
+
+	kfree(ii_dev);
+
+	return NULL;
+}
+
+/**
+ * idle_injection_unregister - Unregister the idle injection device
+ * @ii_dev: a pointer to an idle injection device
+ *
+ * The function is in charge of stopping the idle injections,
+ * unregister the kthreads and free the allocated memory in the
+ * register function.
+ */
+void idle_injection_unregister(struct idle_injection_device *ii_dev)
+{
+	unsigned int cpu;
+
+	idle_injection_stop(ii_dev);
+
+	for_each_cpu(cpu, to_cpumask(ii_dev->cpumask))
+		per_cpu(idle_injection_device, cpu) = NULL;
+
+	kfree(ii_dev);
+}
+
+static struct smp_hotplug_thread idle_injection_threads = {
+	.store = &idle_injection_thread.tsk,
+	.setup = idle_injection_setup,
+	.thread_fn = idle_injection_fn,
+	.thread_comm = "idle_inject/%u",
+	.thread_should_run = idle_injection_should_run,
+};
+
+static int __init idle_injection_init(void)
+{
+	return smpboot_register_percpu_thread(&idle_injection_threads);
+}
+early_initcall(idle_injection_init);
diff --git a/include/linux/idle_injection.h b/include/linux/idle_injection.h
new file mode 100644
index 0000000..ffd20d7
--- /dev/null
+++ b/include/linux/idle_injection.h
@@ -0,0 +1,29 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2018 Linaro Ltd
+ *
+ * Author: Daniel Lezcano <daniel.lezcano@linaro.org>
+ *
+ */
+#ifndef __IDLE_INJECTION_H__
+#define __IDLE_INJECTION_H__
+
+/* private idle injection device structure */
+struct idle_injection_device;
+
+struct idle_injection_device *idle_injection_register(struct cpumask *cpumask);
+
+void idle_injection_unregister(struct idle_injection_device *ii_dev);
+
+int idle_injection_start(struct idle_injection_device *ii_dev);
+
+void idle_injection_stop(struct idle_injection_device *ii_dev);
+
+void idle_injection_set_duration(struct idle_injection_device *ii_dev,
+				 unsigned int run_duration_ms,
+				 unsigned int idle_duration_ms);
+
+void idle_injection_get_duration(struct idle_injection_device *ii_dev,
+				 unsigned int *run_duration_ms,
+				 unsigned int *idle_duration_ms);
+#endif /* __IDLE_INJECTION_H__ */