
[3/3] workqueue: Schedule work on non-idle cpu instead of current one

Message ID 3232b3192e2e373cc4aaf43359d454c5ad53cddb.1348568074.git.viresh.kumar@linaro.org
State Accepted

Commit Message

Viresh Kumar Sept. 25, 2012, 10:36 a.m. UTC
Workqueues queue work on the current CPU if the caller hasn't passed a preferred
CPU. This may wake up an idle CPU, which is not actually required.

Such work can be processed by any CPU, so we should select a non-idle CPU instead.
This patch adds support in the workqueue framework to get a preferred CPU from
the scheduler instead of using the current CPU.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 arch/arm/Kconfig   | 11 +++++++++++
 kernel/workqueue.c | 25 ++++++++++++++++++-------
 2 files changed, 29 insertions(+), 7 deletions(-)

Comments

Peter Zijlstra Sept. 25, 2012, 11:22 a.m. UTC | #1
On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq,
> struct work_struct *work)
>  {
>         int ret;
>  
> -       ret = queue_work_on(get_cpu(), wq, work);
> -       put_cpu();
> +       preempt_disable();
> +       ret = queue_work_on(wq_select_cpu(), wq, work);
> +       preempt_enable();
>  
>         return ret;
>  }

Right, so the problem I see here is that wq_select_cpu() is horridly
expensive..

> @@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long
> __data)
>         struct delayed_work *dwork = (struct delayed_work *)__data;
>         struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
>  
> -       __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
> +       __queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
>  } 

Shouldn't timer migration have sorted this one?
Viresh Kumar Sept. 25, 2012, 11:30 a.m. UTC | #2
On 25 September 2012 16:52, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
>> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq,
>> struct work_struct *work)
>>  {
>>         int ret;
>>
>> -       ret = queue_work_on(get_cpu(), wq, work);
>> -       put_cpu();
>> +       preempt_disable();
>> +       ret = queue_work_on(wq_select_cpu(), wq, work);
>> +       preempt_enable();
>>
>>         return ret;
>>  }
>
> Right, so the problem I see here is that wq_select_cpu() is horridly
> expensive..

But this is what we had as the initial idea during LPC. Can you suggest any
improvements here?

>> @@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long
>> __data)
>>         struct delayed_work *dwork = (struct delayed_work *)__data;
>>         struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
>>
>> -       __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
>> +       __queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
>>  }
>
> Shouldn't timer migration have sorted this one?

Maybe, yes. Will investigate it further.

Thanks for your early feedback.

--
viresh
Vincent Guittot Sept. 25, 2012, 11:38 a.m. UTC | #3
On 25 September 2012 13:30, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 25 September 2012 16:52, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, 2012-09-25 at 16:06 +0530, Viresh Kumar wrote:
>>> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq,
>>> struct work_struct *work)
>>>  {
>>>         int ret;
>>>
>>> -       ret = queue_work_on(get_cpu(), wq, work);
>>> -       put_cpu();
>>> +       preempt_disable();
>>> +       ret = queue_work_on(wq_select_cpu(), wq, work);
>>> +       preempt_enable();
>>>
>>>         return ret;
>>>  }
>>
>> Right, so the problem I see here is that wq_select_cpu() is horridly
>> expensive..
>
> But this is what we had as the initial idea during LPC. Can you suggest any
> improvements here?

The main outcome of the LPC was that we should be able to select a
CPU other than the local one.
Using the same policy as the timer is a first step to consolidate the
interface. A next step should be to update the policy of the function.
Vincent
>
>>> @@ -1102,7 +1113,7 @@ static void delayed_work_timer_fn(unsigned long
>>> __data)
>>>         struct delayed_work *dwork = (struct delayed_work *)__data;
>>>         struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
>>>
>>> -       __queue_work(smp_processor_id(), cwq->wq, &dwork->work);
>>> +       __queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
>>>  }
>>
>> Shouldn't timer migration have sorted this one?
>
> Maybe, yes. Will investigate it further.
>
> Thanks for your early feedback.
>
> --
> viresh
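
For reference, the timer policy mentioned above is the one used by timer
migration: with CONFIG_NO_HZ and the timer_migration sysctl enabled, timers
armed on an idle CPU get pushed to the CPU returned by
get_nohz_timer_target(). Roughly, paraphrased from kernel/sched/core.c of
that era (details may differ), it walks the sched domains looking for the
first non-idle CPU:

int get_nohz_timer_target(void)
{
	int cpu = smp_processor_id();
	int i;
	struct sched_domain *sd;

	rcu_read_lock();
	for_each_domain(cpu, sd) {
		for_each_cpu(i, sched_domain_span(sd)) {
			if (!idle_cpu(i)) {
				cpu = i;
				goto unlock;
			}
		}
	}
unlock:
	rcu_read_unlock();
	return cpu;
}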
Peter Zijlstra Sept. 25, 2012, 11:40 a.m. UTC | #4
On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
> But this is what we had as the initial idea during LPC.

Yeah.. that's true.

>  Can you suggest any improvements here?

We could uhm... /me tries thinking ... reuse some of the NOHZ magic?
Would that be sufficient, not waking a NOHZ cpu, or do you really want
not waking any idle cpu?
Peter Zijlstra Sept. 25, 2012, 11:46 a.m. UTC | #5
On Tue, 2012-09-25 at 13:40 +0200, Peter Zijlstra wrote:
> On Tue, 2012-09-25 at 17:00 +0530, Viresh Kumar wrote:
> > But this is what we had as the initial idea during LPC.
> 
> Yeah.. that's true.
> 
> >  Can you suggest any improvements here?
> 
> We could uhm... /me tries thinking ... reuse some of the NOHZ magic?
> Would that be sufficient, not waking a NOHZ cpu, or do you really want
> not waking any idle cpu?

Depending on the trade-off we could have the NOHZ stuff track a
non-NOHZ-idle cpu and avoid having to compute one every time we need it.
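
A minimal sketch of that idea, purely as an illustration: the NOHZ idle-exit
path records a known non-idle CPU, and wq_select_cpu() becomes a cheap cached
read with a fallback to the local CPU. The hook nohz_note_nonidle() and the
cached variable are made up for this sketch; idle_cpu(), cpu_online() and
raw_smp_processor_id() are existing helpers.

/* Last CPU known to be out of nohz idle; updated from the idle-exit path. */
static atomic_t nohz_nonidle_cpu = ATOMIC_INIT(0);

static inline void nohz_note_nonidle(int cpu)	/* assumed NOHZ hook */
{
	atomic_set(&nohz_nonidle_cpu, cpu);
}

static int wq_select_cpu(void)
{
	int cpu = atomic_read(&nohz_nonidle_cpu);

	/* The cached candidate may have gone idle or offline; fall back to self. */
	if (!cpu_online(cpu) || idle_cpu(cpu))
		cpu = raw_smp_processor_id();

	return cpu;
}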
Tejun Heo Sept. 25, 2012, 5:56 p.m. UTC | #6
Hello,

On Tue, Sep 25, 2012 at 04:06:08PM +0530, Viresh Kumar wrote:
> +config MIGRATE_WQ
> +	bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
> +	depends on SMP && EXPERIMENTAL
> +	help
> +	  Workqueues queues work on current cpu, if the caller haven't passed a
> +	  preferred cpu. This may wake up an idle CPU, which is actually not
> +	  required. This work can be processed by any CPU and so we must select
> +	  a non-idle CPU here.  This patch adds in support in workqueue
> +	  framework to get preferred CPU details from the scheduler, instead of
> +	  using current CPU.

I don't think it's a good idea to make behavior like this a config
option.  The behavior difference is subtle and may induce incorrect
behavior.

> +/* This enables migration of a work to a non-IDLE cpu instead of current cpu */
> +#ifdef CONFIG_MIGRATE_WQ
> +static int wq_select_cpu(void)
> +{
> +	return sched_select_cpu(SD_NUMA, -1);
> +}
> +#else
> +#define wq_select_cpu()		smp_processor_id()
> +#endif
> +
>  /* Serializes the accesses to the list of workqueues. */
>  static DEFINE_SPINLOCK(workqueue_lock);
>  static LIST_HEAD(workqueues);
> @@ -995,7 +1005,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>  		struct global_cwq *last_gcwq;
>  
>  		if (unlikely(cpu == WORK_CPU_UNBOUND))
> -			cpu = raw_smp_processor_id();
> +			cpu = wq_select_cpu();
>  
>  		/*
>  		 * It's multi cpu.  If @wq is non-reentrant and @work
> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
>  {
>  	int ret;
>  
> -	ret = queue_work_on(get_cpu(), wq, work);
> -	put_cpu();
> +	preempt_disable();
> +	ret = queue_work_on(wq_select_cpu(), wq, work);
> +	preempt_enable();

First of all, I'm not entirely sure this is safe.  queue_work() used
to *guarantee* that the work item would execute on the local CPU.  I
don't think there are many which depend on that but I'd be surprised
if this doesn't lead to some subtle problems somewhere.  It might not
be realistic to audit all users and we might have to just let it
happen and watch for the fallouts.  Dunno, still wanna see some level
of auditing.
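
As an illustration of the kind of caller that could break, consider a
hypothetical work function that updates per-cpu data on the assumption that
queue_work() keeps the work on the queueing CPU; all names below are made up
for the example:

struct my_stats {
	struct work_struct work;
	unsigned long events;
};
static DEFINE_PER_CPU(struct my_stats, my_stats);

static void my_work_fn(struct work_struct *work)
{
	/* Only correct if the work still runs on the CPU that queued it. */
	this_cpu_inc(my_stats.events);
}

static void my_irq_tail(void)
{
	/* INIT_WORK(&stats->work, my_work_fn) assumed to be done elsewhere. */
	queue_work(system_wq, &this_cpu_ptr(&my_stats)->work);
}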

Also, I'm wondering why this is necessary at all for workqueues.  For
schedule/queue_work(), you pretty much know the current cpu is not
idle.  For delayed workqueue, sure but for immediate scheduling, why?

Thanks.
Viresh Kumar Sept. 26, 2012, 11:21 a.m. UTC | #7
On 25 September 2012 23:26, Tejun Heo <tj@kernel.org> wrote:
> On Tue, Sep 25, 2012 at 04:06:08PM +0530, Viresh Kumar wrote:
>> +config MIGRATE_WQ
>> +     bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
>> +     depends on SMP && EXPERIMENTAL
>> +     help
>> +       Workqueues queues work on current cpu, if the caller haven't passed a
>> +       preferred cpu. This may wake up an idle CPU, which is actually not
>> +       required. This work can be processed by any CPU and so we must select
>> +       a non-idle CPU here.  This patch adds in support in workqueue
>> +       framework to get preferred CPU details from the scheduler, instead of
>> +       using current CPU.
>
> I don't think it's a good idea to make behavior like this a config
> option.  The behavior difference is subtle and may induce incorrect
> behavior.

Ok. Will remove it.

>> @@ -1066,8 +1076,9 @@ int queue_work(struct workqueue_struct *wq, struct work_struct *work)
>>  {
>>       int ret;
>>
>> -     ret = queue_work_on(get_cpu(), wq, work);
>> -     put_cpu();
>> +     preempt_disable();
>> +     ret = queue_work_on(wq_select_cpu(), wq, work);
>> +     preempt_enable();
>
> First of all, I'm not entirely sure this is safe.  queue_work() used
> to *guarantee* that the work item would execute on the local CPU.  I
> don't think there are many which depend on that but I'd be surprised
> if this doesn't lead to some subtle problems somewhere.  It might not
> be realistic to audit all users and we might have to just let it
> happen and watch for the fallouts.  Dunno, still wanna see some level
> of auditing.

Ok.

> Also, I'm wondering why this is necessary at all for workqueues.  For
> schedule/queue_work(), you pretty much know the current cpu is not
> idle.  For delayed workqueue, sure but for immediate scheduling, why?

This was done for the below scenario:
- A CPU has programmed a timer and is now IDLE.
- The CPU enters the timer interrupt handler and queues a work item. As the
CPU is otherwise IDLE, we should queue this work on some other CPU.

I know this patch migrates work in all cases. Will fix it in V2 by redirecting
work only in this case.
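
A rough sketch of that V2 behaviour, under the assumption that the redirection
is simply gated on idle_cpu(); wq_select_cpu() and sched_select_cpu() are from
this series, the idle check is the assumed change:

static int wq_select_cpu(void)
{
	int cpu = raw_smp_processor_id();

	/* Local CPU is busy anyway; queueing here does not wake an idle CPU. */
	if (!idle_cpu(cpu))
		return cpu;

	/* In an interrupt on an otherwise idle CPU; ask the scheduler instead. */
	return sched_select_cpu(SD_NUMA, -1);
}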

--
viresh

Patch

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 5944511..da17bd0 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1594,6 +1594,17 @@  config HMP_SLOW_CPU_MASK
 	  Specify the cpuids of the slow CPUs in the system as a list string,
 	  e.g. cpuid 0+1 should be specified as 0-1.
 
+config MIGRATE_WQ
+	bool "(EXPERIMENTAL) Migrate Workqueues to non-idle cpu"
+	depends on SMP && EXPERIMENTAL
+	help
+	  Workqueues queues work on current cpu, if the caller haven't passed a
+	  preferred cpu. This may wake up an idle CPU, which is actually not
+	  required. This work can be processed by any CPU and so we must select
+	  a non-idle CPU here.  This patch adds in support in workqueue
+	  framework to get preferred CPU details from the scheduler, instead of
+	  using current CPU.
+
 config HAVE_ARM_SCU
 	bool
 	help
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692a55b..fd8df4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -456,6 +456,16 @@  static inline void debug_work_activate(struct work_struct *work) { }
 static inline void debug_work_deactivate(struct work_struct *work) { }
 #endif
 
+/* This enables migration of a work to a non-IDLE cpu instead of current cpu */
+#ifdef CONFIG_MIGRATE_WQ
+static int wq_select_cpu(void)
+{
+	return sched_select_cpu(SD_NUMA, -1);
+}
+#else
+#define wq_select_cpu()		smp_processor_id()
+#endif
+
 /* Serializes the accesses to the list of workqueues. */
 static DEFINE_SPINLOCK(workqueue_lock);
 static LIST_HEAD(workqueues);
@@ -995,7 +1005,7 @@  static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		struct global_cwq *last_gcwq;
 
 		if (unlikely(cpu == WORK_CPU_UNBOUND))
-			cpu = raw_smp_processor_id();
+			cpu = wq_select_cpu();
 
 		/*
 		 * It's multi cpu.  If @wq is non-reentrant and @work
@@ -1066,8 +1076,9 @@  int queue_work(struct workqueue_struct *wq, struct work_struct *work)
 {
 	int ret;
 
-	ret = queue_work_on(get_cpu(), wq, work);
-	put_cpu();
+	preempt_disable();
+	ret = queue_work_on(wq_select_cpu(), wq, work);
+	preempt_enable();
 
 	return ret;
 }
@@ -1102,7 +1113,7 @@  static void delayed_work_timer_fn(unsigned long __data)
 	struct delayed_work *dwork = (struct delayed_work *)__data;
 	struct cpu_workqueue_struct *cwq = get_work_cwq(&dwork->work);
 
-	__queue_work(smp_processor_id(), cwq->wq, &dwork->work);
+	__queue_work(wq_select_cpu(), cwq->wq, &dwork->work);
 }
 
 /**
@@ -1158,7 +1169,7 @@  int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 			if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
 				lcpu = gcwq->cpu;
 			else
-				lcpu = raw_smp_processor_id();
+				lcpu = wq_select_cpu();
 		} else
 			lcpu = WORK_CPU_UNBOUND;
 
@@ -2823,8 +2834,8 @@  EXPORT_SYMBOL_GPL(cancel_work_sync);
 static inline void __flush_delayed_work(struct delayed_work *dwork)
 {
 	if (del_timer_sync(&dwork->timer))
-		__queue_work(raw_smp_processor_id(),
-			     get_work_cwq(&dwork->work)->wq, &dwork->work);
+		__queue_work(wq_select_cpu(), get_work_cwq(&dwork->work)->wq,
+			     &dwork->work);
 }
 
 /**