diff mbox

[V4,3/4] block: queue work on unbound wq

Message ID 91239cde99aaba2715f63db1f88241d9f4a36e13.1364740180.git.viresh.kumar@linaro.org
State Accepted
Headers show

Commit Message

Viresh Kumar March 31, 2013, 2:31 p.m. UTC
Block layer uses workqueues for multiple purposes. There is no real dependency
of scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with UNBOUND versions.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
 block/blk-core.c |  3 ++-
 block/blk-ioc.c  |  2 +-
 block/genhd.c    | 10 ++++++----
 3 files changed, 9 insertions(+), 6 deletions(-)

Comments

Tejun Heo March 31, 2013, 6:19 p.m. UTC | #1
Hello, Viresh.

On Sun, Mar 31, 2013 at 08:01:46PM +0530, Viresh Kumar wrote:
> Block layer uses workqueues for multiple purposes. There is no real dependency
> of scheduling these on the cpu which scheduled them.
> 
> On a idle system, it is observed that and idle cpu wakes up many times just to
> service this work. It would be better if we can schedule it on a cpu which the
> scheduler believes to be the most appropriate one.
> 
> This patch replaces normal workqueues with UNBOUND versions.

Hmm.... so, we really don't want to unconditionally convert workqueues
to unbound.  Especially not kblockd.  On configurations with multiple
high iops devices with IRQ routing, having request completion runinng
on the same CPU has significant performance advantages.  We can't
simply switch it to an unbound wokrqueue because it saves power on
small arm configuration.

Plus, I'd much prefer to be clearly marking the workqueues which would
contribute to powersaving when converted to unbound at least until we
can come up with a no-compromise solution which doesn't need to trade
off between cache locality and powersaving.

So, let's please introduce a new flag to mark these workqueues, say,
WQ_UNBOUND_FOR_POWER_SAVE or whatever (please come up with a better
name) and provide a compile time switch with boot time override.

Thanks.
Viresh Kumar April 1, 2013, 6:31 a.m. UTC | #2
Hi Tejun,

On 31 March 2013 23:49, Tejun Heo <tj@kernel.org> wrote:
> So, let's please introduce a new flag to mark these workqueues, say,
> WQ_UNBOUND_FOR_POWER_SAVE or whatever (please come up with a better
> name) and provide a compile time switch with boot time override.

Just wanted to make this clear before writing it:

You want me to do something like (With better names):

int wq_unbound_for_power_save_enabled = 0;

#ifdef CONFIG_WQ_UNBOUND_FOR_POWER_SAVE
#define WQ_UNBOUND_FOR_POWER_SAVE   wq_unbound_for_power_save_enabled
#else
#define WQ_UNBOUND_FOR_POWER_SAVE 0

And provide a call to enable/disable wq_unbound_for_power_save_enabled ??
Tejun Heo April 3, 2013, 9:54 p.m. UTC | #3
Hello, Viresh.

Sorry about the delay.  Lost this one somehow.

On Mon, Apr 01, 2013 at 12:01:05PM +0530, Viresh Kumar wrote:
> Just wanted to make this clear before writing it:
> 
> You want me to do something like (With better names):
> 
> int wq_unbound_for_power_save_enabled = 0;
> 
> #ifdef CONFIG_WQ_UNBOUND_FOR_POWER_SAVE
> #define WQ_UNBOUND_FOR_POWER_SAVE   wq_unbound_for_power_save_enabled
> #else
> #define WQ_UNBOUND_FOR_POWER_SAVE 0
> 
> And provide a call to enable/disable wq_unbound_for_power_save_enabled ??

Not a call, probably a module_param() so that it can be switched
on/off during boot.  You can make the param writable so that it can be
flipped run-time but updating existing workqueues would be non-trivial
and I don't think it's gonna be worthwhile.

Thanks!
Viresh Kumar April 5, 2013, 9:47 a.m. UTC | #4
On 4 April 2013 03:24, Tejun Heo <tj@kernel.org> wrote:
> Not a call, probably a module_param() so that it can be switched
> on/off during boot.  You can make the param writable so that it can be
> flipped run-time but updating existing workqueues would be non-trivial
> and I don't think it's gonna be worthwhile.

module_param()?? We can't compile kernel/workqueue.c as a module and
hence i went with #define + a variable with functions to set/reset it...

I am not looking to update all existing workqueues to use it but workqueues
which are affecting power for us... And if there are some very very
performance critical ones, then we must better use queue_work_on() for
them to make it more clear.
Viresh Kumar April 8, 2013, 10:24 a.m. UTC | #5
On 5 April 2013 17:52, Tejun Heo <tj@kernel.org> wrote:
> On Fri, Apr 5, 2013 at 2:47 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
>> module_param()?? We can't compile kernel/workqueue.c as a module and
>> hence i went with #define + a variable with functions to set/reset it...
>
> module_param works fine for kernel proper.

Got it!! Btw, to which Kconfig should i add WQ config option?
Viresh Kumar April 8, 2013, 11:02 a.m. UTC | #6
On 4 April 2013 03:24, Tejun Heo <tj@kernel.org> wrote:
> Not a call, probably a module_param() so that it can be switched
> on/off during boot.  You can make the param writable so that it can be
> flipped run-time but updating existing workqueues would be non-trivial
> and I don't think it's gonna be worthwhile.

It might be better if we make it read only and so don't try ugly stuff on the
fly. Might be good for testing but not otherwise..
Jens Axboe April 8, 2013, 11:13 a.m. UTC | #7
On Sun, Mar 31 2013, Tejun Heo wrote:
> Hello, Viresh.
> 
> On Sun, Mar 31, 2013 at 08:01:46PM +0530, Viresh Kumar wrote:
> > Block layer uses workqueues for multiple purposes. There is no real dependency
> > of scheduling these on the cpu which scheduled them.
> > 
> > On a idle system, it is observed that and idle cpu wakes up many times just to
> > service this work. It would be better if we can schedule it on a cpu which the
> > scheduler believes to be the most appropriate one.
> > 
> > This patch replaces normal workqueues with UNBOUND versions.
> 
> Hmm.... so, we really don't want to unconditionally convert workqueues
> to unbound.  Especially not kblockd.  On configurations with multiple
> high iops devices with IRQ routing, having request completion runinng
> on the same CPU has significant performance advantages.  We can't
> simply switch it to an unbound wokrqueue because it saves power on
> small arm configuration.

I had the same complaint, when it was posted originally...

> Plus, I'd much prefer to be clearly marking the workqueues which would
> contribute to powersaving when converted to unbound at least until we
> can come up with a no-compromise solution which doesn't need to trade
> off between cache locality and powersaving.
> 
> So, let's please introduce a new flag to mark these workqueues, say,
> WQ_UNBOUND_FOR_POWER_SAVE or whatever (please come up with a better
> name) and provide a compile time switch with boot time override.

And lets please have it off by default. The specialized configs / setups
can turn it on, but we should default to better performance.
Viresh Kumar April 8, 2013, 11:14 a.m. UTC | #8
On 8 April 2013 16:43, Jens Axboe <axboe@kernel.dk> wrote:
> And lets please have it off by default. The specialized configs / setups
> can turn it on, but we should default to better performance.

Yes, we will.
diff mbox

Patch

diff --git a/block/blk-core.c b/block/blk-core.c
index 492242f..91cd486 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3186,7 +3186,8 @@  int __init blk_dev_init(void)
 
 	/* used for unplugging and affects IO latency/throughput - HIGHPRI */
 	kblockd_workqueue = alloc_workqueue("kblockd",
-					    WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
+					    WQ_MEM_RECLAIM | WQ_HIGHPRI |
+					    WQ_UNBOUND, 0);
 	if (!kblockd_workqueue)
 		panic("Failed to create kblockd\n");
 
diff --git a/block/blk-ioc.c b/block/blk-ioc.c
index 9c4bb82..5dd576d 100644
--- a/block/blk-ioc.c
+++ b/block/blk-ioc.c
@@ -144,7 +144,7 @@  void put_io_context(struct io_context *ioc)
 	if (atomic_long_dec_and_test(&ioc->refcount)) {
 		spin_lock_irqsave(&ioc->lock, flags);
 		if (!hlist_empty(&ioc->icq_list))
-			schedule_work(&ioc->release_work);
+			queue_work(system_unbound_wq, &ioc->release_work);
 		else
 			free_ioc = true;
 		spin_unlock_irqrestore(&ioc->lock, flags);
diff --git a/block/genhd.c b/block/genhd.c
index a1ed52a..0f4470a 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1488,9 +1488,10 @@  static void __disk_unblock_events(struct gendisk *disk, bool check_now)
 	intv = disk_events_poll_jiffies(disk);
 	set_timer_slack(&ev->dwork.timer, intv / 4);
 	if (check_now)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, 0);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork, 0);
 	else if (intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork,
+				intv);
 out_unlock:
 	spin_unlock_irqrestore(&ev->lock, flags);
 }
@@ -1533,7 +1534,7 @@  void disk_flush_events(struct gendisk *disk, unsigned int mask)
 	spin_lock_irq(&ev->lock);
 	ev->clearing |= mask;
 	if (!ev->block)
-		mod_delayed_work(system_freezable_wq, &ev->dwork, 0);
+		mod_delayed_work(system_freezable_unbound_wq, &ev->dwork, 0);
 	spin_unlock_irq(&ev->lock);
 }
 
@@ -1626,7 +1627,8 @@  static void disk_check_events(struct disk_events *ev,
 
 	intv = disk_events_poll_jiffies(disk);
 	if (!ev->block && intv)
-		queue_delayed_work(system_freezable_wq, &ev->dwork, intv);
+		queue_delayed_work(system_freezable_unbound_wq, &ev->dwork,
+				intv);
 
 	spin_unlock_irq(&ev->lock);