driver core: Use unbound workqueue for deferred probes

Message ID 1614167749-22005-1-git-send-email-ylal@codeaurora.org
State Superseded

Commit Message

Yogesh Lal Feb. 24, 2021, 11:55 a.m. UTC
Queue deferred driver probes on an unbound workqueue, to allow the
scheduler to better manage the scheduling of long-running probes.

Signed-off-by: Yogesh Lal <ylal@codeaurora.org>
---
 drivers/base/dd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Yogesh Lal Feb. 25, 2021, 10:33 a.m. UTC | #1
Hi Greg,


On 2/24/2021 6:13 PM, Greg KH wrote:
> On Wed, Feb 24, 2021 at 05:25:49PM +0530, Yogesh Lal wrote:
>> Queue deferred driver probes on an unbound workqueue, to allow the
>> scheduler to better manage the scheduling of long-running probes.
>
> Really?  What does this change and help?  What is the visible effect of
> this patch?  What problem does it solve?
>

We observed a boot-time improvement (~400 msec) when the deferred probe work
is made unbound. This is because the scheduler moves the worker running the
deferred probe work to the big CPUs. Without this change, the worker runs on
a LITTLE CPU due to its affinity.

Please let us know if there are any concerns or restrictions requiring that
deferred probe work run only on pinned kworkers. Since this work runs the
deferred probes of several devices, locality may not be that important.

Thanks
Yogesh Lal

> thanks,
>
> greg k-h
>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
Greg Kroah-Hartman Feb. 25, 2021, 11:44 a.m. UTC | #2
On Thu, Feb 25, 2021 at 04:03:50PM +0530, Yogesh Lal wrote:
> Hi Greg,
>
> On 2/24/2021 6:13 PM, Greg KH wrote:
> > On Wed, Feb 24, 2021 at 05:25:49PM +0530, Yogesh Lal wrote:
> > > Queue deferred driver probes on an unbound workqueue, to allow the
> > > scheduler to better manage the scheduling of long-running probes.
> >
> > Really?  What does this change and help?  What is the visible effect of
> > this patch?  What problem does it solve?
> >
>
> We observed a boot-time improvement (~400 msec) when the deferred probe work
> is made unbound. This is because the scheduler moves the worker running the
> deferred probe work to the big CPUs. Without this change, the worker runs on
> a LITTLE CPU due to its affinity.

Why is none of this information in the changelog text?  How are we
supposed to know this?  And is this 400msec out of 10 seconds or
something else?  Also, this sounds like your "little" cpus are really
bad, you might want to look into fixing them first :)

But if you really want to make this go faster, do not defer your probe!
Why not fix that problem in your drivers instead?

> Please let us know if there are any concerns or restrictions requiring that
> deferred probe work run only on pinned kworkers. Since this work runs the
> deferred probes of several devices, locality may not be that important.

Can you prove that it is not important?  I know lots of gyrations are
done in some busses to keep probe happening on the same CPU for very
good reasons.  Changing that should not be done lightly as you will
break this.

thanks,

greg k-h
Yogesh Lal March 15, 2021, 10:45 a.m. UTC | #3
On 2/25/2021 5:14 PM, Greg KH wrote:
> On Thu, Feb 25, 2021 at 04:03:50PM +0530, Yogesh Lal wrote:
>> Hi Greg,
>>
>> On 2/24/2021 6:13 PM, Greg KH wrote:
>>> On Wed, Feb 24, 2021 at 05:25:49PM +0530, Yogesh Lal wrote:
>>>> Queue deferred driver probes on an unbound workqueue, to allow the
>>>> scheduler to better manage the scheduling of long-running probes.
>>>
>>> Really?  What does this change and help?  What is the visible effect of
>>> this patch?  What problem does it solve?
>>>
>>
>> We observed a boot-time improvement (~400 msec) when the deferred probe work
>> is made unbound. This is because the scheduler moves the worker running the
>> deferred probe work to the big CPUs. Without this change, the worker runs on
>> a LITTLE CPU due to its affinity.
>
> Why is none of this information in the changelog text?  How are we
> supposed to know this?  And is this 400msec out of 10 seconds or

We first wanted to understand why the deferred probe work is bound, and
whether that is really required.

> something else?  Also, this sounds like your "little" cpus are really
> bad, you might want to look into fixing them first :)
>

~600ms (deferred probe bound to little core) and ~200ms (deferred probe
queued on unbound wq).

> But if you really want to make this go faster, do not defer your probe!
> Why not fix that problem in your drivers instead?
>

Yes, we are exploring that direction as well, but we want to get upstream's
opinion and understand whether an unbound wq is usable here.

>> Please let us know if there are any concerns or restrictions requiring that
>> deferred probe work run only on pinned kworkers. Since this work runs the
>> deferred probes of several devices, locality may not be that important.
>
> Can you prove that it is not important?  I know lots of gyrations are
> done in some busses to keep probe happening on the same CPU for very
> good reasons.  Changing that should not be done lightly as you will
> break this.

While debugging further and checking whether probes migrate, I found that the
init thread can potentially migrate during driver probe, as its CPU affinity
is set to all CPUs (or is there something that prevents this which I am
missing?). Also, async probes use an unbound workqueue.
So, using an unbound wq for deferred probes looks similar to these
w.r.t. scheduling behavior.

> thanks,
>
> greg k-h
>

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a
member of the Code Aurora Forum, hosted by The Linux Foundation
Greg Kroah-Hartman March 23, 2021, 9:50 a.m. UTC | #4
On Mon, Mar 15, 2021 at 04:15:12PM +0530, Yogesh Lal wrote:
>
> On 2/25/2021 5:14 PM, Greg KH wrote:
> > On Thu, Feb 25, 2021 at 04:03:50PM +0530, Yogesh Lal wrote:
> > > Hi Greg,
> > >
> > > On 2/24/2021 6:13 PM, Greg KH wrote:
> > > > On Wed, Feb 24, 2021 at 05:25:49PM +0530, Yogesh Lal wrote:
> > > > > Queue deferred driver probes on an unbound workqueue, to allow the
> > > > > scheduler to better manage the scheduling of long-running probes.
> > > >
> > > > Really?  What does this change and help?  What is the visible effect of
> > > > this patch?  What problem does it solve?
> > > >
> > >
> > > We observed a boot-time improvement (~400 msec) when the deferred probe work
> > > is made unbound. This is because the scheduler moves the worker running the
> > > deferred probe work to the big CPUs. Without this change, the worker runs on
> > > a LITTLE CPU due to its affinity.
> >
> > Why is none of this information in the changelog text?  How are we
> > supposed to know this?  And is this 400msec out of 10 seconds or
>
> We first wanted to understand why the deferred probe work is bound, and
> whether that is really required.
>
> > something else?  Also, this sounds like your "little" cpus are really
> > bad, you might want to look into fixing them first :)
> >
>
> ~600ms (deferred probe bound to little core) and ~200ms (deferred probe
> queued on unbound wq).
>
> > But if you really want to make this go faster, do not defer your probe!
> > Why not fix that problem in your drivers instead?
> >
>
> Yes, we are exploring that direction as well, but we want to get upstream's
> opinion and understand whether an unbound wq is usable here.
>
> > > Please let us know if there are any concerns or restrictions requiring that
> > > deferred probe work run only on pinned kworkers. Since this work runs the
> > > deferred probes of several devices, locality may not be that important.
> >
> > Can you prove that it is not important?  I know lots of gyrations are
> > done in some busses to keep probe happening on the same CPU for very
> > good reasons.  Changing that should not be done lightly as you will
> > break this.
>
> While debugging further and checking whether probes migrate, I found that the
> init thread can potentially migrate during driver probe, as its CPU affinity
> is set to all CPUs (or is there something that prevents this which I am
> missing?). Also, async probes use an unbound workqueue.
> So, using an unbound wq for deferred probes looks similar to these
> w.r.t. scheduling behavior.

I do not understand anymore, is this patch still needed or not?

And if so, please resubmit with a lot more description in the changelog
text describing all of this...

thanks,

greg k-h
Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 9179825f..c9c174a 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -182,7 +182,7 @@  static void driver_deferred_probe_trigger(void)
 	 * Kick the re-probe thread.  It may already be scheduled, but it is
 	 * safe to kick it again.
 	 */
-	schedule_work(&deferred_probe_work);
+	queue_work(system_unbound_wq, &deferred_probe_work);
 }
 
 /**
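
For readers unfamiliar with the two calls in the hunk above, the following is
a minimal userspace sketch, not kernel code and not part of the patch. It
assumes, as in include/linux/workqueue.h around the time of this thread, that
schedule_work() is a thin wrapper that queues on system_wq (the per-CPU
system workqueue), so the patch only changes which system workqueue receives
deferred_probe_work. The struct layouts, the queued_on and unbound fields,
the probes_run counter and the two trigger_*() helpers are invented for
illustration, and the stand-in queue_work() runs the item inline instead of
handing it to a kworker.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; NOT the real definitions. */
struct workqueue_struct {
	const char *name;
	bool unbound;	/* true: a worker may run on any allowed CPU */
};

struct work_struct {
	void (*func)(struct work_struct *work);
	const struct workqueue_struct *queued_on;	/* model-only bookkeeping */
};

/* Models of the two system workqueues named in the patch. */
static struct workqueue_struct model_system_wq = { "events", false };
static struct workqueue_struct model_system_unbound_wq = { "events_unbound", true };
static struct workqueue_struct *system_wq = &model_system_wq;
static struct workqueue_struct *system_unbound_wq = &model_system_unbound_wq;

/*
 * Model of queue_work(): remember the target queue and run the item inline.
 * The real function only queues the item; a kworker executes it later.
 */
static bool queue_work(struct workqueue_struct *wq, struct work_struct *work)
{
	assert(wq != NULL && work != NULL && work->func != NULL);
	work->queued_on = wq;
	work->func(work);
	return true;
}

/* Assumption stated above: schedule_work() queues on the bound system_wq. */
static bool schedule_work(struct work_struct *work)
{
	return queue_work(system_wq, work);
}

/* Hypothetical counter standing in for the actual re-probe loop. */
int probes_run;

static void deferred_probe_work_func(struct work_struct *work)
{
	(void)work;
	probes_run++;
}

struct work_struct deferred_probe_work = {
	.func = deferred_probe_work_func,
	.queued_on = NULL,
};

/* Behaviour before the patch: the work lands on the per-CPU queue. */
void trigger_before_patch(void)
{
	schedule_work(&deferred_probe_work);
}

/* Behaviour after the patch: the work lands on the unbound queue. */
void trigger_after_patch(void)
{
	queue_work(system_unbound_wq, &deferred_probe_work);
}
```

In this model, trigger_before_patch() leaves deferred_probe_work.queued_on
pointing at the bound system_wq stand-in, and trigger_after_patch() leaves it
pointing at system_unbound_wq. In the real kernel that choice decides whether
the item runs on a kworker tied to the submitting CPU or on one the scheduler
may place on any allowed CPU, which is the behavior debated in the thread.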