[V5,1/5] workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues

Message ID CAKohpo=U5OqN7SM7XyOWv-WoUHOoJvGc6wcGH61+uikPK3novg@mail.gmail.com
State Accepted
Headers show

Commit Message

Viresh Kumar April 25, 2013, 3:43 a.m.
On 25 April 2013 09:00, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> Yes. That was my intention - preventing a prompt on existing defconfigs and
> there by maintaining current behavior.

Hmm... Following is the version after fixing all problems you reported.
@Tejun: I have attached it too as gmail's copy-paste may break it. Please
consider applying this series if it looks fine to you.


---------------x----------------x---------------------

From: Viresh Kumar <viresh.kumar@linaro.org>
Date: Mon, 8 Apr 2013 16:45:40 +0530
Subject: [PATCH V5 1/5] workqueues: Introduce new flag WQ_POWER_EFFICIENT for
 power oriented workqueues

Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to cache
effects) at the cost of potentially waking up otherwise idle cores just to
process some work. To save power, we can allow the work to be rescheduled on a
core that is already awake.

Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system.  To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 Documentation/kernel-parameters.txt | 17 +++++++++++++++++
 include/linux/workqueue.h           |  3 +++
 kernel/power/Kconfig                | 19 +++++++++++++++++++
 kernel/workqueue.c                  | 11 +++++++++++
 4 files changed, 50 insertions(+)

exclusion */
@@ -4085,6 +4093,9 @@ struct workqueue_struct
*__alloc_workqueue_key(const char *fmt,
 	struct workqueue_struct *wq;
 	struct pool_workqueue *pwq;

+	if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
+		flags |= WQ_UNBOUND;
+
 	/* allocate wq and format name */
 	if (flags & WQ_UNBOUND)
 		tbl_size = wq_numa_tbl_len * sizeof(wq->numa_pwq_tbl[0]);

Comments

Amit Kucheria April 25, 2013, 11:13 a.m. | #1
On Thu, Apr 25, 2013 at 9:13 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 25 April 2013 09:00, Amit Kucheria <amit.kucheria@linaro.org> wrote:
>> Yes. That was my intention - preventing a prompt on existing defconfigs and
>> there by maintaining current behavior.
>
> Hmm... Following is the version after fixing all problems you reported.
> @Tejun: I have attached it too as gmail's copy-paste may break it. Please
> consider applying this series if it looks fine to you.
>
>
> ---------------x----------------x---------------------
>
> From: Viresh Kumar <viresh.kumar@linaro.org>
> Date: Mon, 8 Apr 2013 16:45:40 +0530
> Subject: [PATCH V5 1/5] workqueues: Introduce new flag WQ_POWER_EFFICIENT for
>  power oriented workqueues
>
> Workqueues can be performance or power-oriented. Currently, most workqueues are
> bound to the CPU they were created on. This gives good performance (due to cache
> effects) at the cost of potentially waking up otherwise idle cores just to
> process some work. To save power, we can allow the work to be rescheduled on a
> core that is already awake.
>
> Workqueues created with the WQ_UNBOUND flag will allow some power savings.
> However, we don't change the default behaviour of the system.  To enable
> power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
> be turned on. This option can also be overridden by the
> workqueue.power_efficient boot parameter.
>
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

For the series, you can add my:

Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>

> ---
>  Documentation/kernel-parameters.txt | 17 +++++++++++++++++
>  include/linux/workqueue.h           |  3 +++
>  kernel/power/Kconfig                | 19 +++++++++++++++++++
>  kernel/workqueue.c                  | 11 +++++++++++
>  4 files changed, 50 insertions(+)
>
> diff --git a/Documentation/kernel-parameters.txt
> b/Documentation/kernel-parameters.txt
> index c4fa000..22edc83 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3312,6 +3312,23 @@ bytes respectively. Such letter suffixes can
> also be entirely omitted.
>                         that this also can be controlled per-workqueue for
>                         workqueues visible under /sys/bus/workqueue/.
>
> +       workqueue.power_efficient
> +                       Workqueues can be performance or power-oriented.
> +                       Currently, most workqueues are bound to the CPU they
> +                       were created on. This gives good performance (due to
> +                       cache effects) at the cost of potentially waking up
> +                       otherwise idle cores just to process some work. To save
> +                       power, we can allow the work to be rescheduled on a core
> +                       that is already awake.
> +
> +                       Workqueues created with the WQ_UNBOUND flag will allow
> +                       some power savings.  However, we don't change the
> +                       default behaviour of the system.  To enable power-saving
> +                       behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT
> +                       needs to be turned on. This option can also be
> +                       overridden by the workqueue.power_efficient boot
> +                       parameter.
> +
>         x2apic_phys     [X86-64,APIC] Use x2apic physical mode instead of
>                         default x2apic cluster mode on platforms
>                         supporting x2apic.
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 623488f..83fa570 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -302,6 +302,9 @@ enum {
>         WQ_HIGHPRI              = 1 << 4, /* high priority */
>         WQ_CPU_INTENSIVE        = 1 << 5, /* cpu instensive workqueue */
>         WQ_SYSFS                = 1 << 6, /* visible in sysfs, see wq_sysfs_register() */
> +       WQ_POWER_EFFICIENT      = 1 << 7, /* WQ_UNBOUND, for power
> +                                          * saving, if wq_power_efficient is
> +                                          * enabled. Unused otherwise. */
>
>         __WQ_DRAINING           = 1 << 16, /* internal: workqueue is draining */
>         __WQ_ORDERED            = 1 << 17, /* internal: workqueue is ordered */
> diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
> index 5dfdc9e..018f039 100644
> --- a/kernel/power/Kconfig
> +++ b/kernel/power/Kconfig
> @@ -263,6 +263,25 @@ config PM_GENERIC_DOMAINS
>         bool
>         depends on PM
>
> +config WQ_POWER_EFFICIENT
> +       bool "Workqueue allocated as UNBOUND (by default) for power efficiency"
> +       depends on PM
> +       default n
> +       help
> +         Workqueues can be performance or power-oriented. Currently, most
> +         workqueues are bound to the CPU they were created on. This gives good
> +         performance (due to cache effects) at the cost of potentially waking
> +         up otherwise idle cores just to process some work. To save power, we
> +         can allow the work to be rescheduled on a core that is already awake.
> +
> +         Workqueues created with the WQ_UNBOUND flag will allow some power
> +         savings.  However, we don't change the default behaviour of the
> +         system.  To enable power-saving behaviour, a new config option
> +         CONFIG_WQ_POWER_EFFICIENT needs to be turned on. This option can also
> +         be overridden by the workqueue.power_efficient boot parameter.
> +
> +         If in doubt, say N.
> +
>  config PM_GENERIC_DOMAINS_SLEEP
>         def_bool y
>         depends on PM_SLEEP && PM_GENERIC_DOMAINS
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 4aa9f5b..a327027 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -272,6 +272,14 @@ static cpumask_var_t *wq_numa_possible_cpumask;
>  static bool wq_disable_numa;
>  module_param_named(disable_numa, wq_disable_numa, bool, 0444);
>
> +#ifdef CONFIG_WQ_POWER_EFFICIENT
> +static bool wq_power_efficient = true;
> +#else
> +static bool wq_power_efficient;
> +#endif
> +
> +module_param_named(power_efficient, wq_power_efficient, bool, 0444);
> +
>  static bool wq_numa_enabled;           /* unbound NUMA affinity enabled */
>
>  /* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug
> exclusion */
> @@ -4085,6 +4093,9 @@ struct workqueue_struct
> *__alloc_workqueue_key(const char *fmt,
>         struct workqueue_struct *wq;
>         struct pool_workqueue *pwq;
>
> +       if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
> +               flags |= WQ_UNBOUND;
> +
>         /* allocate wq and format name */
>         if (flags & WQ_UNBOUND)
>                 tbl_size = wq_numa_tbl_len * sizeof(wq->numa_pwq_tbl[0]);
Viresh Kumar April 25, 2013, 11:15 a.m. | #2
On 25 April 2013 16:43, Amit Kucheria <amit.kucheria@linaro.org> wrote:
> On Thu, Apr 25, 2013 at 9:13 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> On 25 April 2013 09:00, Amit Kucheria <amit.kucheria@linaro.org> wrote:
>>> Yes. That was my intention - preventing a prompt on existing defconfigs and
>>> there by maintaining current behavior.
>>
>> Hmm... Following is the version after fixing all problems you reported.
>> @Tejun: I have attached it too as gmail's copy-paste may break it. Please
>> consider applying this series if it looks fine to you.
>>
>>
>> ---------------x----------------x---------------------
>>
>> From: Viresh Kumar <viresh.kumar@linaro.org>
>> Date: Mon, 8 Apr 2013 16:45:40 +0530
>> Subject: [PATCH V5 1/5] workqueues: Introduce new flag WQ_POWER_EFFICIENT for
>>  power oriented workqueues
>>
>> Workqueues can be performance or power-oriented. Currently, most workqueues are
>> bound to the CPU they were created on. This gives good performance (due to cache
>> effects) at the cost of potentially waking up otherwise idle cores just to
>> process some work. To save power, we can allow the work to be rescheduled on a
>> core that is already awake.
>>
>> Workqueues created with the WQ_UNBOUND flag will allow some power savings.
>> However, we don't change the default behaviour of the system.  To enable
>> power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
>> be turned on. This option can also be overridden by the
>> workqueue.power_efficient boot parameter.
>>
>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
>
> For the series, you can add my:
>
> Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>

Thanks.
Tejun Heo April 26, 2013, 7:11 p.m. | #3
Hey, Viresh.

It's already too late for the upcoming merge window, but things
generally look good to me and I'll apply the patchset once wq/for-3.11
opens.  One nitpick tho.

On Thu, Apr 25, 2013 at 09:13:44AM +0530, Viresh Kumar wrote:
> +	workqueue.power_efficient
> +			Workqueues can be performance or power-oriented.
> +			Currently, most workqueues are bound to the CPU they
                                   ^^^^
                                   per-cpu would be better

> +			were created on. This gives good performance (due to
> +			cache effects) at the cost of potentially waking up
> +			otherwise idle cores just to process some work. To save
> +			power, we can allow the work to be rescheduled on a core
> +			that is already awake.

The above description is confusing to me.  As have been discussed
multiple times before, per-cpu workqueue in itself doesn't wake up the
CPU physically.  The timer may but per-cpu workqueue doesn't.  It was
confusing when this patchset was first posted and the above phrasing
is still confusing.  What the patchset tries to do is preventing the
scheduler from perceiving the CPU as active due to the activated
worker thread pinned to that CPU, right?  The knob doesn't really do
anything about waking up the processor in itself.  It just avoids
feeding the scheduler with noisy activation events and allows it to
allocate work item execution according to the scheduler's view of CPU
active/idleness.  As the scheduler has longer / larger scope of
overall CPU activities and means to regulate them, this leads to more
power-efficient allocation of work item executions, right?  It'd be
really great if the descriptions and the comment above the flag makes
this abundantly clear because it's not something too apparent.

Thanks.
Tejun Heo April 29, 2013, 4:19 p.m. | #4
Hello,

On Mon, Apr 29, 2013 at 12:06:28PM +0530, Viresh Kumar wrote:
> Whatever you wrote above confused me even more :)

Heh heh, confumageddon!!!!

> This is what i had in my mind until now. Its not about per-cpu workqueue.
> 
> Lets take example of system_wq. It doesn't have WQ_UNBOUND flag set.
> Now if we call queue_work_on() with cpu x and sytem_wq, then work
> will execute on cpu x. If we call queue_work() then it will queue the work
> on local cpu.

Yeap, !WQ_UNBOUND workqueues == per-cpu workqueues.

> At this time local cpu may be busy or idle (Atleast according to scheduler).
> We don't want a idle cpu (From schedulers perspective) to be used for
> running this work's handler due to two reasons.
> - idle cpu may be in WFI or deeper idle states and so we can avoid waking
>   it up.

I have no idea what WFI is but the physical CPU is already awake at
that time.  It can't be idle - it's running queue_work().  It could be
running in lower freq tho, which each code piece doesn't really have
much control over.

> - We will make idle cpu look busy and so other kernel stuff may be scheduled
>   on it now. But we could have kept it idle for a long time.

Hmmm... yeah, about the same thing I wrote, it's not really about not
waking up the CPU right now physically but avoiding forcing the
scheduler scheduling a pinned task on an otherwise quiescent CPU.
This effectively allows the scheduler to migrate such work items
towards a CPU which the scheduler considers to be better (in power or
whatever) leading to noticeable powersave.

> And what timer are you talking about? I am not talking about deffered work only,
> but normal work too.

Deferred work item == timer + work item.

> I might have wrongly phrased some part of my patch (maybe used workqueue
> instead of work), will fix that up.

I think it'd be necessary to distinguish the physical CPU being idle
and the scheduler considers it to be idle (no task to schedule on it)
and explain how increasing the latter can lead to powersave.  As it's
currently written, it seemingly, to me anyway, suggests that the
proposed change somehow avoids waking up actually idle CPU, which
isn't the case as queue_work() *always* schedules on the local CPU.
The local CPU can't be idle by definition.

Thanks.
Viresh Kumar April 29, 2013, 4:42 p.m. | #5
On 29 April 2013 21:49, Tejun Heo <tj@kernel.org> wrote:
> On Mon, Apr 29, 2013 at 12:06:28PM +0530, Viresh Kumar wrote:
> Yeap, !WQ_UNBOUND workqueues == per-cpu workqueues.

Sigh!! You were talking about thread per cpu here... Sorry for missing
it earlier :(

>> At this time local cpu may be busy or idle (Atleast according to scheduler).
>> We don't want a idle cpu (From schedulers perspective) to be used for
>> running this work's handler due to two reasons.
>> - idle cpu may be in WFI or deeper idle states and so we can avoid waking
>>   it up.
>
> I have no idea what WFI is but the physical CPU is already awake at
> that time.  It can't be idle - it's running queue_work().  It could be
> running in lower freq tho, which each code piece doesn't really have
> much control over.

Stupid point. WFI: Wait for interrupt (low power mode of cpu).

>> - We will make idle cpu look busy and so other kernel stuff may be scheduled
>>   on it now. But we could have kept it idle for a long time.
>
> Hmmm... yeah, about the same thing I wrote, it's not really about not
> waking up the CPU right now physically but avoiding forcing the
> scheduler scheduling a pinned task on an otherwise quiescent CPU.
> This effectively allows the scheduler to migrate such work items
> towards a CPU which the scheduler considers to be better (in power or
> whatever) leading to noticeable powersave.

Correct.

>> And what timer are you talking about? I am not talking about deffered work only,
>> but normal work too.
>
> Deferred work item == timer + work item.

Ya, i knew that :)

>> I might have wrongly phrased some part of my patch (maybe used workqueue
>> instead of work), will fix that up.
>
> I think it'd be necessary to distinguish the physical CPU being idle
> and the scheduler considers it to be idle (no task to schedule on it)
> and explain how increasing the latter can lead to powersave.  As it's
> currently written, it seemingly, to me anyway, suggests that the
> proposed change somehow avoids waking up actually idle CPU, which
> isn't the case as queue_work() *always* schedules on the local CPU.
> The local CPU can't be idle by definition.

Yes you are correct. I will fix it.

Patch

diff --git a/Documentation/kernel-parameters.txt
b/Documentation/kernel-parameters.txt
index c4fa000..22edc83 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3312,6 +3312,23 @@  bytes respectively. Such letter suffixes can
also be entirely omitted.
 			that this also can be controlled per-workqueue for
 			workqueues visible under /sys/bus/workqueue/.

+	workqueue.power_efficient
+			Workqueues can be performance or power-oriented.
+			Currently, most workqueues are bound to the CPU they
+			were created on. This gives good performance (due to
+			cache effects) at the cost of potentially waking up
+			otherwise idle cores just to process some work. To save
+			power, we can allow the work to be rescheduled on a core
+			that is already awake.
+
+			Workqueues created with the WQ_UNBOUND flag will allow
+			some power savings.  However, we don't change the
+			default behaviour of the system.  To enable power-saving
+			behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT
+			needs to be turned on. This option can also be
+			overridden by the workqueue.power_efficient boot
+			parameter.
+
 	x2apic_phys	[X86-64,APIC] Use x2apic physical mode instead of
 			default x2apic cluster mode on platforms
 			supporting x2apic.
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 623488f..83fa570 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -302,6 +302,9 @@  enum {
 	WQ_HIGHPRI		= 1 << 4, /* high priority */
 	WQ_CPU_INTENSIVE	= 1 << 5, /* cpu instensive workqueue */
 	WQ_SYSFS		= 1 << 6, /* visible in sysfs, see wq_sysfs_register() */
+	WQ_POWER_EFFICIENT	= 1 << 7, /* WQ_UNBOUND, for power
+					   * saving, if wq_power_efficient is
+					   * enabled. Unused otherwise. */

 	__WQ_DRAINING		= 1 << 16, /* internal: workqueue is draining */
 	__WQ_ORDERED		= 1 << 17, /* internal: workqueue is ordered */
diff --git a/kernel/power/Kconfig b/kernel/power/Kconfig
index 5dfdc9e..018f039 100644
--- a/kernel/power/Kconfig
+++ b/kernel/power/Kconfig
@@ -263,6 +263,25 @@  config PM_GENERIC_DOMAINS
 	bool
 	depends on PM

+config WQ_POWER_EFFICIENT
+	bool "Workqueue allocated as UNBOUND (by default) for power efficiency"
+	depends on PM
+	default n
+	help
+	  Workqueues can be performance or power-oriented. Currently, most
+	  workqueues are bound to the CPU they were created on. This gives good
+	  performance (due to cache effects) at the cost of potentially waking
+	  up otherwise idle cores just to process some work. To save power, we
+	  can allow the work to be rescheduled on a core that is already awake.
+
+	  Workqueues created with the WQ_UNBOUND flag will allow some power
+	  savings.  However, we don't change the default behaviour of the
+	  system.  To enable power-saving behaviour, a new config option
+	  CONFIG_WQ_POWER_EFFICIENT needs to be turned on. This option can also
+	  be overridden by the workqueue.power_efficient boot parameter.
+
+	  If in doubt, say N.
+
 config PM_GENERIC_DOMAINS_SLEEP
 	def_bool y
 	depends on PM_SLEEP && PM_GENERIC_DOMAINS
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4aa9f5b..a327027 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -272,6 +272,14 @@  static cpumask_var_t *wq_numa_possible_cpumask;
 static bool wq_disable_numa;
 module_param_named(disable_numa, wq_disable_numa, bool, 0444);

+#ifdef CONFIG_WQ_POWER_EFFICIENT
+static bool wq_power_efficient = true;
+#else
+static bool wq_power_efficient;
+#endif
+
+module_param_named(power_efficient, wq_power_efficient, bool, 0444);
+
 static bool wq_numa_enabled;		/* unbound NUMA affinity enabled */

 /* buf for wq_update_unbound_numa_attrs(), protected by CPU hotplug