diff mbox

driver core: add wait event for deferred probe

Message ID 1360429077-14616-1-git-send-email-haojian.zhuang@linaro.org
State Changes Requested
Headers show

Commit Message

Haojian Zhuang Feb. 9, 2013, 4:57 p.m. UTC
do_initcalls() could call all driver initialization code in kernel_init
thread. It means that probe() function will be also called from that
time. After this, kernel could access console & release __init section
in the same thread.

But if device probe fails and moves into deferred probe list, a new
thread is created to reinvoke probe. If the device is serial console,
kernel has to open console failure & release __init section before
reinvoking failure. Because there's no synchronization mechanism.
Now add wait event to synchronize after do_initcalls().

Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org>
---
 drivers/base/dd.c |    8 +++++---
 init/main.c       |    1 +
 2 files changed, 6 insertions(+), 3 deletions(-)

Comments

Andrew Morton Feb. 11, 2013, 11:10 p.m. UTC | #1
On Sun, 10 Feb 2013 00:57:57 +0800
Haojian Zhuang <haojian.zhuang@linaro.org> wrote:

> do_initcalls() could call all driver initialization code in kernel_init
> thread. It means that probe() function will be also called from that
> time. After this, kernel could access console & release __init section
> in the same thread.
> 
> But if device probe fails and moves into deferred probe list, a new
> thread is created to reinvoke probe. If the device is serial console,
> kernel has to open console failure & release __init section before
> reinvoking failure. Because there's no synchronization mechanism.
> Now add wait event to synchronize after do_initcalls().

It sounds like this:

static int __ref kernel_init(void *unused)
{
	kernel_init_freeable();
	/* need to finish all async __init code before freeing the memory */
	async_synchronize_full();

is designed to prevent the problem you describe?

> --- a/init/main.c
> +++ b/init/main.c
> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
>  	do_ctors();
>  	usermodehelper_enable();
>  	do_initcalls();
> +	wait_for_device_probe();
>  }

Needs a nice comment here explaining what's going on.
Haojian Zhuang Feb. 12, 2013, 2:52 a.m. UTC | #2
On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Sun, 10 Feb 2013 00:57:57 +0800
> Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>
>> do_initcalls() could call all driver initialization code in kernel_init
>> thread. It means that probe() function will be also called from that
>> time. After this, kernel could access console & release __init section
>> in the same thread.
>>
>> But if device probe fails and moves into deferred probe list, a new
>> thread is created to reinvoke probe. If the device is serial console,
>> kernel has to open console failure & release __init section before
>> reinvoking failure. Because there's no synchronization mechanism.
>> Now add wait event to synchronize after do_initcalls().
>
> It sounds like this:
>
> static int __ref kernel_init(void *unused)
> {
>         kernel_init_freeable();
>         /* need to finish all async __init code before freeing the memory */
>         async_synchronize_full();
>
> is designed to prevent the problem you describe?
>
It can't prevent the problem that I described. Because deferred_probe()
is introduced recently.

All synchronization should be finished just after do_initcalls(). Since
load_default_modules() is also called in the end of kernel_init_freeable(),
I'm not sure that whether I could remove async_synchronize_full()
here. So I didn't touch it.

>> --- a/init/main.c
>> +++ b/init/main.c
>> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
>>       do_ctors();
>>       usermodehelper_enable();
>>       do_initcalls();
>> +     wait_for_device_probe();
>>  }
>
> Needs a nice comment here explaining what's going on.

No problem. I'll add comment here.

Regards
Haojian
Grant Likely Feb. 13, 2013, 9:36 p.m. UTC | #3
On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Sun, 10 Feb 2013 00:57:57 +0800
> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >
> >> do_initcalls() could call all driver initialization code in kernel_init
> >> thread. It means that probe() function will be also called from that
> >> time. After this, kernel could access console & release __init section
> >> in the same thread.
> >>
> >> But if device probe fails and moves into deferred probe list, a new
> >> thread is created to reinvoke probe. If the device is serial console,
> >> kernel has to open console failure & release __init section before
> >> reinvoking failure. Because there's no synchronization mechanism.
> >> Now add wait event to synchronize after do_initcalls().
> >
> > It sounds like this:
> >
> > static int __ref kernel_init(void *unused)
> > {
> >         kernel_init_freeable();
> >         /* need to finish all async __init code before freeing the memory */
> >         async_synchronize_full();
> >
> > is designed to prevent the problem you describe?
> >
> It can't prevent the problem that I described. Because deferred_probe()
> is introduced recently.
> 
> All synchronization should be finished just after do_initcalls(). Since
> load_default_modules() is also called in the end of kernel_init_freeable(),
> I'm not sure that whether I could remove async_synchronize_full()
> here. So I didn't touch it.
> 
> >> --- a/init/main.c
> >> +++ b/init/main.c
> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
> >>       do_ctors();
> >>       usermodehelper_enable();
> >>       do_initcalls();
> >> +     wait_for_device_probe();
> >>  }
> >
> > Needs a nice comment here explaining what's going on.
> 
> No problem. I'll add comment here.

Actually, this approach will create new problems. There is no guarantee
that a given device will be able to initialize before exiting
do_basic_setup(). If, for instance, a device depends on a resource
provided by a module, then it will just keep deferring. In that case
you've got a hung kernel.

I think what you really want is the following:

 static int deferred_probe_initcall(void)
 {
	deferred_wq = create_singlethread_workqueue("deferwq");
	if (WARN_ON(!deferred_wq))
		return -ENOMEM;

	driver_deferred_probe_enable = true;
+	deferred_probe_work_func(NULL);
-	driver_deferred_probe_trigger();
	return 0;
 }
 late_initcall(deferred_probe_initcall);

Or something similar. That would guarantee that as many passes as are needed
(which in practical terms only means a couple) for device probing to
settle down before exiting the initcall processing. That should achieve
the effect you desire.

It still masks the __init section issue by making it a lot less likely,
but it does ensure that all of the built-in driver dependency order
issues are processed before continuing on to userspace.

g.
anish singh Feb. 14, 2013, 3:27 a.m. UTC | #4
On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote:
>> > On Sun, 10 Feb 2013 00:57:57 +0800
>> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> >
>> >> do_initcalls() could call all driver initialization code in kernel_init
>> >> thread. It means that probe() function will be also called from that
>> >> time. After this, kernel could access console & release __init section
>> >> in the same thread.
>> >>
>> >> But if device probe fails and moves into deferred probe list, a new
>> >> thread is created to reinvoke probe. If the device is serial console,
>> >> kernel has to open console failure & release __init section before
>> >> reinvoking failure. Because there's no synchronization mechanism.
>> >> Now add wait event to synchronize after do_initcalls().
>> >
>> > It sounds like this:
>> >
>> > static int __ref kernel_init(void *unused)
>> > {
>> >         kernel_init_freeable();
>> >         /* need to finish all async __init code before freeing the memory */
>> >         async_synchronize_full();
>> >
>> > is designed to prevent the problem you describe?
>> >
>> It can't prevent the problem that I described. Because deferred_probe()
>> is introduced recently.
>>
>> All synchronization should be finished just after do_initcalls(). Since
>> load_default_modules() is also called in the end of kernel_init_freeable(),
>> I'm not sure that whether I could remove async_synchronize_full()
>> here. So I didn't touch it.
>>
>> >> --- a/init/main.c
>> >> +++ b/init/main.c
>> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
>> >>       do_ctors();
>> >>       usermodehelper_enable();
>> >>       do_initcalls();
>> >> +     wait_for_device_probe();
>> >>  }
>> >
>> > Needs a nice comment here explaining what's going on.
>>
>> No problem. I'll add comment here.
>
> Actually, this approach will create new problems. There is no guarantee
> that a given device will be able to initialize before exiting
> do_basic_setup(). If, for instance, a device depends on a resource
> provided by a module, then it will just keep deferring. In that case
> you've got a hung kernel.
>
> I think what you really want is the following:
>
>  static int deferred_probe_initcall(void)
>  {
>         deferred_wq = create_singlethread_workqueue("deferwq");
>         if (WARN_ON(!deferred_wq))
>                 return -ENOMEM;
>
>         driver_deferred_probe_enable = true;
> +       deferred_probe_work_func(NULL);
> -       driver_deferred_probe_trigger();
>         return 0;
>  }
>  late_initcall(deferred_probe_initcall);
>
> Or something similar. That would guarantee that as many passes as are needed
> (which in practical terms only means a couple) for device probing to
> settle down before exiting the initcall processing. That should achieve
> the effect you desire.
>
> It still masks the __init section issue by making it a lot less likely,
Grant, Can you please explain me this problem?My understanding is below:
If all the detection of devices with there respective driver is done before
__init section is freed then we will not have the problem mentioned.
However if the driver requests the probing to be deferred then __init section
of the deferred driver will not be freed right?

I am afraid but the patch description is bit cryptic for me specially
this line "kernel has to open console failure & release __init section before
reinvoking failure".

> but it does ensure that all of the built-in driver dependency order
> issues are processed before continuing on to userspace.
>
> g.
>
> --
> Grant Likely, B.Sc, P.Eng.
> Secret Lab Technologies, Ltd.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Arnd Bergmann Feb. 14, 2013, 9:56 a.m. UTC | #5
On Thursday 14 February 2013, anish singh wrote:
> Grant, Can you please explain me this problem?My understanding is below:
> If all the detection of devices with there respective driver is done before
> __init section is freed then we will not have the problem mentioned.
> However if the driver requests the probing to be deferred then __init section
> of the deferred driver will not be freed right?

The kernel has no idea which drivers are deferred at the time when all the
__init sections are freed.

> I am afraid but the patch description is bit cryptic for me specially
> this line "kernel has to open console failure & release __init section before
> reinvoking failure".

I would put it this way: With the introduction of deferred probing, the
rules for the use of __init sections have changed slightly for some
corner cases. While normal device drivers can, as before, not call
__init functions from their .probe() callbacks, we could do that in
drivers as long as they were built-in and did not support hotplug,
and that exception was used in console drivers. This exception has
now become more specific, and those drivers also must not use
deferred probing that depends on other loadable modules or hotpluggable
devices.

Grant's patch fixes the corner case where you have a device whose .probe()
callback gets deferred and waits for some object that is provided
by a different built-in driver for a non-hotpluggable device, by making
sure that this particular deferred probe has completed before the __init
section is freed. Unlike Haojian's patch, it allows other deferred
device probe functions that do not need to call __init functions to be
delayed until much later, when a driver module is loaded or a device
is hotplugged.

	Arnd
Russell King - ARM Linux Feb. 14, 2013, 10:04 a.m. UTC | #6
On Thu, Feb 14, 2013 at 09:56:36AM +0000, Arnd Bergmann wrote:
> I would put it this way: With the introduction of deferred probing, the
> rules for the use of __init sections have changed slightly for some
> corner cases. While normal device drivers can, as before, not call
> __init functions from their .probe() callbacks, we could do that in
> drivers as long as they were built-in and did not support hotplug,
> and that exception was used in console drivers. This exception has
> now become more specific, and those drivers also must not use
> deferred probing that depends on other loadable modules or hotpluggable
> devices.

In the general case, that remains true, but it's still _not_ true for
console drivers.

The console _should_ be initialised before it is attempted to be opened
before passing control to userspace, which happens before the .init
section is freed.

If the console is deferred past that point, then userspace has no console.
The behaviour of userspace in that situation can be very interesting, and
I'd suggest that such is not well tested; consider the effect of not
having fd 0,1,2 connected to something like a console but your filesystem
and something doing a printf().  You can hope that userspace will take
care of that condition, but I personally would not put much faith in it.

With the plethora of 'init' daemon solutions we now have, I have less
faith than I used to that such a condition would be correctly handled
by all of them.
Arnd Bergmann Feb. 14, 2013, 11:08 a.m. UTC | #7
On Thursday 14 February 2013, Russell King - ARM Linux wrote:
> In the general case, that remains true, but it's still not true for
> console drivers.
> 
> The console should be initialised before it is attempted to be opened
> before passing control to userspace, which happens before the .init
> section is freed.

Yes, I forgot about that. This is indeed an additional requirement
besides what I listed. The late_initcall in which Grant was adding
the serialization however is executed just before the /dev/console
gets opened, which seems like an appropriate place.

What might get into the way is that other late_initcalls get
executed after this one and are required for the console. In
that case, we would have to move the deferred_probe_initcall
from a late_initcall to the end of do_basic_setup, after all
the other initcalls.

	Arnd
Haojian Zhuang Feb. 14, 2013, 3:52 p.m. UTC | #8
On 14 February 2013 05:36, Grant Likely <grant.likely@secretlab.ca> wrote:
> On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote:
>> > On Sun, 10 Feb 2013 00:57:57 +0800
>> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
>> >
>> >> do_initcalls() could call all driver initialization code in kernel_init
>> >> thread. It means that probe() function will be also called from that
>> >> time. After this, kernel could access console & release __init section
>> >> in the same thread.
>> >>
>> >> But if device probe fails and moves into deferred probe list, a new
>> >> thread is created to reinvoke probe. If the device is serial console,
>> >> kernel has to open console failure & release __init section before
>> >> reinvoking failure. Because there's no synchronization mechanism.
>> >> Now add wait event to synchronize after do_initcalls().
>> >
>> > It sounds like this:
>> >
>> > static int __ref kernel_init(void *unused)
>> > {
>> >         kernel_init_freeable();
>> >         /* need to finish all async __init code before freeing the memory */
>> >         async_synchronize_full();
>> >
>> > is designed to prevent the problem you describe?
>> >
>> It can't prevent the problem that I described. Because deferred_probe()
>> is introduced recently.
>>
>> All synchronization should be finished just after do_initcalls(). Since
>> load_default_modules() is also called in the end of kernel_init_freeable(),
>> I'm not sure that whether I could remove async_synchronize_full()
>> here. So I didn't touch it.
>>
>> >> --- a/init/main.c
>> >> +++ b/init/main.c
>> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
>> >>       do_ctors();
>> >>       usermodehelper_enable();
>> >>       do_initcalls();
>> >> +     wait_for_device_probe();
>> >>  }
>> >
>> > Needs a nice comment here explaining what's going on.
>>
>> No problem. I'll add comment here.
>
> Actually, this approach will create new problems. There is no guarantee
> that a given device will be able to initialize before exiting
> do_basic_setup(). If, for instance, a device depends on a resource
> provided by a module, then it will just keep deferring. In that case
> you've got a hung kernel.
>
> I think what you really want is the following:
>
>  static int deferred_probe_initcall(void)
>  {
>         deferred_wq = create_singlethread_workqueue("deferwq");
>         if (WARN_ON(!deferred_wq))
>                 return -ENOMEM;
>
>         driver_deferred_probe_enable = true;
> +       deferred_probe_work_func(NULL);
> -       driver_deferred_probe_trigger();

If you can change it into code in below, it could work. Otherwise, it
always fails.
        driver_deferred_probe_enable = true;
        driver_deferred_probe_trigger();
+      deferred_probe_work_func(NULL);
        return 0;

Because deferred_probe_work_func() depends on that deferred_probe is added
into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
first, the deferred uart probe can't be added into active list. So even you call
work_func at here, it doesn't help.

Would you send it again? If so, you can add tested-by with my signature.

>         return 0;
>  }
>  late_initcall(deferred_probe_initcall);
>
> Or something similar. That would guarantee that as many passes as are needed
> (which in practical terms only means a couple) for device probing to
> settle down before exiting the initcall processing. That should achieve
> the effect you desire.
>
> It still masks the __init section issue by making it a lot less likely,
> but it does ensure that all of the built-in driver dependency order
> issues are processed before continuing on to userspace.
>
> g.
>
> --
> Grant Likely, B.Sc, P.Eng.
> Secret Lab Technologies, Ltd.
Arnd Bergmann Feb. 14, 2013, 3:57 p.m. UTC | #9
On Thursday 14 February 2013, Haojian Zhuang wrote:
> If you can change it into code in below, it could work. Otherwise, it
> always fails.
>         driver_deferred_probe_enable = true;
>         driver_deferred_probe_trigger();
> +      deferred_probe_work_func(NULL);
>         return 0;
> 
> Because deferred_probe_work_func() depends on that deferred_probe is added
> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
> first, the deferred uart probe can't be added into active list. So even you call
> work_func at here, it doesn't help.
> 

Would that not cause two instances of the work function to run at the same time?
That sounds like a source for a lot of problems.

	Arnd
Haojian Zhuang Feb. 14, 2013, 4:04 p.m. UTC | #10
On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 14 February 2013, Haojian Zhuang wrote:
>> If you can change it into code in below, it could work. Otherwise, it
>> always fails.
>>         driver_deferred_probe_enable = true;
>>         driver_deferred_probe_trigger();
>> +      deferred_probe_work_func(NULL);
>>         return 0;
>>
>> Because deferred_probe_work_func() depends on that deferred_probe is added
>> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
>> first, the deferred uart probe can't be added into active list. So even you call
>> work_func at here, it doesn't help.
>>
>
> Would that not cause two instances of the work function to run at the same time?
> That sounds like a source for a lot of problems.
>
>         Arnd

Two instances of the work function? I'm sorry that I don't
understanding your meaning.
Could you help explain your question?

Best Regards
Haojian
Grant Likely Feb. 14, 2013, 4:33 p.m. UTC | #11
On Thu, 14 Feb 2013 08:57:17 +0530, anish singh <anish198519851985@gmail.com> wrote:
> On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> >  static int deferred_probe_initcall(void)
> >  {
> >         deferred_wq = create_singlethread_workqueue("deferwq");
> >         if (WARN_ON(!deferred_wq))
> >                 return -ENOMEM;
> >
> >         driver_deferred_probe_enable = true;
> > +       deferred_probe_work_func(NULL);
> > -       driver_deferred_probe_trigger();
> >         return 0;
> >  }
> >  late_initcall(deferred_probe_initcall);
> >
> > Or something similar. That would guarantee that as many passes as are needed
> > (which in practical terms only means a couple) for device probing to
> > settle down before exiting the initcall processing. That should achieve
> > the effect you desire.
> >
> > It still masks the __init section issue by making it a lot less likely,
> Grant, Can you please explain me this problem?My understanding is below:
> If all the detection of devices with there respective driver is done before
> __init section is freed then we will not have the problem mentioned.
> However if the driver requests the probing to be deferred then __init section
> of the deferred driver will not be freed right?
> 
> I am afraid but the patch description is bit cryptic for me specially
> this line "kernel has to open console failure & release __init section before
> reinvoking failure".

Yes I can, but first I'll briefly describe the Linux driver model to make sure
we're talking about the same thing...

drivercore in Linux is oriented around two data structures:
1) Devices (struct device), and
2) Drivers (struct device_driver)

Hardware is modeled with instances of 'struct device'. For each device
that Linux knows about there is one 'struct device'[1]. The devices are
organized into a hierarchical tree, and you can see it by looking in
/sys/devices.

Device drivers are represended by struct device_driver. Each driver,
whether built-in or a module, is responsible to register itself by
embedding a struct device_driver, or a structure that contains a struct
device_driver. For example, struct platform_driver has an embedded
struct device_driver.

The whole purpose of drivercore is to match up drivers to devices. Each
bus_type has its own mechanism for deciding which devices and
device_drivers go together, but it still results in trying to bind a
struct device_driver to each struct device.

An important detail here is that drivercore is entirely asynchronous.
There are no requirements on what order devices and device_drivers are
registered. When a device gets registered, drivercore attempts to bind
it to any device_driver that it already knows about. Similarly, when a
new device_driver gets registered, drivercore will see if there are any
unbound devices that it can bind it to. It is even possible to trigger a
bind attempt sometime after both device and device_driver have been
registered.[2]

This is the reason that deferred_probe is an option. As long as the
kernel keeps track of which device_drivers requested deferred probe, we
can nudge drivercore to reattempt probing. It really doesn't matter what
order or when drivers get bound...

...except when it does. Here's where we get into the issues related to
__init sections and deferred probe. Since the driver model can bind a
driver at any time, including after userspace has started, the
expectation is that none of the code paths associated with probing will
be discarded. That is why .probe hooks cannot be __init annotated. The
problem for consoles is that the console init hook gets called from
the probe path and a lot of console init hooks are __init annotated.

Before deferred probe, this was rarely (if ever) an actual problem. In
general the order of operations during kernel init is:

1) (early boot) create and register a bunch of struct devices
2) (initcalls) register a bunch of struct device_drivers
 - a bunch of binds happen at this point as device drivers get
   registered
 - This is why device initialization order is primarily driven by driver
   link order.
3) discard __init sections
4) userspace.

It's not that bind is guaranteed to occur before __init discard, but
rather nothing prevented it from happening after. Deferred probe changes
that. With deferred probe, Haojian correctly analyzed that driver bind
is getting pushed after __init is discarded.

However, the problem with his solution was that it assumed that *all*
deferred drivers must be resolved before proceeding with init which
cannot be guaranteed. It is absolutely possble for a built-in driver to
depend on something provided by a module. As rmk pointed out, that is
not okay for console devices, so it is still important to make sure
everything needed for the console is built-in, but non-critical devices
may not care.

The difference between my fix and Haojian's is that my fix forces a
single pass of the deferred list in the same context as the initcalls
before the deferred probe workqueue takes over. That will ensure that
all inter-driver dependencies get themselves sorted out before
completing the initcalls, and therefore before __init gets discarded.

Phew, that was a lot more long-winded that I indended. I hope that helps
though.

g.

[1] It's not quite that simple. Some devices have more than one struct
device, but for the purpose of this discussion, one per device is
sufficient.
[2] It is also possible to unbind drivers from devices without
unregistering the device driver, but I don't want to get too complicated
in this description.
Arnd Bergmann Feb. 14, 2013, 4:50 p.m. UTC | #12
On Thursday 14 February 2013, Haojian Zhuang wrote:
> On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 14 February 2013, Haojian Zhuang wrote:
> >> If you can change it into code in below, it could work. Otherwise, it
> >> always fails.
> >>         driver_deferred_probe_enable = true;
> >>         driver_deferred_probe_trigger();
> >> +      deferred_probe_work_func(NULL);
> >>         return 0;
> >>
> >> Because deferred_probe_work_func() depends on that deferred_probe is added
> >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
> >> first, the deferred uart probe can't be added into active list. So even you call
> >> work_func at here, it doesn't help.
> >>
> >
> > Would that not cause two instances of the work function to run at the same time?
> > That sounds like a source for a lot of problems.
> >
> >         Arnd
> 
> Two instances of the work function? I'm sorry that I don't
> understanding your meaning.
> Could you help explain your question?

I mean you end up calling the work function directly, while it gets run as part
of the work queue on a different CPU at the same time. I just noticed that
there is actually locking in place in deferred_probe_work_func that prevents
any actual bugs, but you are still adding extra overhead here.

Maybe just add

	flush_workqueue(deferred_wq);

here?

	Arnd
Haojian Zhuang Feb. 14, 2013, 4:58 p.m. UTC | #13
On 15 February 2013 00:50, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 14 February 2013, Haojian Zhuang wrote:
>> On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Thursday 14 February 2013, Haojian Zhuang wrote:
>> >> If you can change it into code in below, it could work. Otherwise, it
>> >> always fails.
>> >>         driver_deferred_probe_enable = true;
>> >>         driver_deferred_probe_trigger();
>> >> +      deferred_probe_work_func(NULL);
>> >>         return 0;
>> >>
>> >> Because deferred_probe_work_func() depends on that deferred_probe is added
>> >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
>> >> first, the deferred uart probe can't be added into active list. So even you call
>> >> work_func at here, it doesn't help.
>> >>
>> >
>> > Would that not cause two instances of the work function to run at the same time?
>> > That sounds like a source for a lot of problems.
>> >
>> >         Arnd
>>
>> Two instances of the work function? I'm sorry that I don't
>> understanding your meaning.
>> Could you help explain your question?
>
> I mean you end up calling the work function directly, while it gets run as part
> of the work queue on a different CPU at the same time. I just noticed that
> there is actually locking in place in deferred_probe_work_func that prevents
> any actual bugs, but you are still adding extra overhead here.
>
> Maybe just add
>
>         flush_workqueue(deferred_wq);
>
> here?
>
>         Arnd

It's fine to me. Since both of them are flushing workqueue.

Tested-by: <haojian.zhuang@linaro.org>
Grant Likely Feb. 14, 2013, 5:38 p.m. UTC | #14
On Thu, 14 Feb 2013 23:52:14 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> On 14 February 2013 05:36, Grant Likely <grant.likely@secretlab.ca> wrote:
> > On Tue, 12 Feb 2013 10:52:10 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >> On 12 February 2013 07:10, Andrew Morton <akpm@linux-foundation.org> wrote:
> >> > On Sun, 10 Feb 2013 00:57:57 +0800
> >> > Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> >> >
> >> >> do_initcalls() could call all driver initialization code in kernel_init
> >> >> thread. It means that probe() function will be also called from that
> >> >> time. After this, kernel could access console & release __init section
> >> >> in the same thread.
> >> >>
> >> >> But if device probe fails and moves into deferred probe list, a new
> >> >> thread is created to reinvoke probe. If the device is serial console,
> >> >> kernel has to open console failure & release __init section before
> >> >> reinvoking failure. Because there's no synchronization mechanism.
> >> >> Now add wait event to synchronize after do_initcalls().
> >> >
> >> > It sounds like this:
> >> >
> >> > static int __ref kernel_init(void *unused)
> >> > {
> >> >         kernel_init_freeable();
> >> >         /* need to finish all async __init code before freeing the memory */
> >> >         async_synchronize_full();
> >> >
> >> > is designed to prevent the problem you describe?
> >> >
> >> It can't prevent the problem that I described. Because deferred_probe()
> >> is introduced recently.
> >>
> >> All synchronization should be finished just after do_initcalls(). Since
> >> load_default_modules() is also called in the end of kernel_init_freeable(),
> >> I'm not sure that whether I could remove async_synchronize_full()
> >> here. So I didn't touch it.
> >>
> >> >> --- a/init/main.c
> >> >> +++ b/init/main.c
> >> >> @@ -786,6 +786,7 @@ static void __init do_basic_setup(void)
> >> >>       do_ctors();
> >> >>       usermodehelper_enable();
> >> >>       do_initcalls();
> >> >> +     wait_for_device_probe();
> >> >>  }
> >> >
> >> > Needs a nice comment here explaining what's going on.
> >>
> >> No problem. I'll add comment here.
> >
> > Actually, this approach will create new problems. There is no guarantee
> > that a given device will be able to initialize before exiting
> > do_basic_setup(). If, for instance, a device depends on a resource
> > provided by a module, then it will just keep deferring. In that case
> > you've got a hung kernel.
> >
> > I think what you really want is the following:
> >
> >  static int deferred_probe_initcall(void)
> >  {
> >         deferred_wq = create_singlethread_workqueue("deferwq");
> >         if (WARN_ON(!deferred_wq))
> >                 return -ENOMEM;
> >
> >         driver_deferred_probe_enable = true;
> > +       deferred_probe_work_func(NULL);
> > -       driver_deferred_probe_trigger();
> 
> If you can change it into code in below, it could work. Otherwise, it
> always fails.
>         driver_deferred_probe_enable = true;
>         driver_deferred_probe_trigger();
> +      deferred_probe_work_func(NULL);
>         return 0;
> 
> Because deferred_probe_work_func() depends on that deferred_probe is added
> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
> first, the deferred uart probe can't be added into active list. So even you call
> work_func at here, it doesn't help.
> 
> Would you send it again? If so, you can add tested-by with my signature.

Hmmm, that works, but it isn't very elegant. With that solution both the
workqueue and the initcall are processing the deferred devices at the
same time. It quite possibly could result in missed devices when walking
the deferred list. Consider the scenario:

B depends on A, C and D depend on B

1) Inital conditions    pending:A-B-C-D active:empty    WQ:idle  IC:idle
2) Trigger              pending:empty   active:A-B-C-D
3) WQ: pop device       pending:empty   active:B-C-D    WQ:A     IC:idle
4) IC: pop device       pending:empty   active:C-D      WQ:A     IC:B
5) WQ: A probed & pop   pending:empty   active:D        WQ:C     IC:B
6) IC: B defers & pop   pending:B       active:empty    WQ:C     IC:D
   /* B defers because A wasn't ready, but when A did complete B wasn't
    * onthe pending list - this is bad */
7) WQ: C defers         pending:B-C     active:empty    WQ:idle  IC:D
8) IC: D defers         pending:B-C-D   active:empty    WQ:idle  IC:idle

When A successfully probes, everthing in the pending list gets appended
to the active one, but if there are multiple threads poping devices off
the list, then there can be devices that /should/ be moved to the active
list but don't because they've been popped off by a separate thread.

With the current code there really should only be one processor of the
deferred list at a time. Try this instead:


        driver_deferred_probe_enable = true;
        driver_deferred_probe_trigger();
+       flush_workqueue(deferred_wq);
        return 0;

I had tried to keep the first pass of the list within the initcall
context, but it created a lot of special cases in the code that I wasn't
happy with. This is a lot simpler and has exactly the same visible
behaviour.

Let me know if that works for you and I'll send a proper patch to Linus
with your Tested-by.  He may yell at me for sending something so
incredibly late in the v3.8 stablization, but this really is a bug fix.
If he won't take it then I'll ask Greg to put it into v3.8.1.

g.
anish singh Feb. 14, 2013, 5:42 p.m. UTC | #15
On Thu, 2013-02-14 at 16:33 +0000, Grant Likely wrote:
> On Thu, 14 Feb 2013 08:57:17 +0530, anish singh <anish198519851985@gmail.com> wrote:
> > On Thu, Feb 14, 2013 at 3:06 AM, Grant Likely <grant.likely@secretlab.ca> wrote:
> > >  static int deferred_probe_initcall(void)
> > >  {
> > >         deferred_wq = create_singlethread_workqueue("deferwq");
> > >         if (WARN_ON(!deferred_wq))
> > >                 return -ENOMEM;
> > >
> > >         driver_deferred_probe_enable = true;
> > > +       deferred_probe_work_func(NULL);
> > > -       driver_deferred_probe_trigger();
> > >         return 0;
> > >  }
> > >  late_initcall(deferred_probe_initcall);
> > >
> > > Or something similar. That would guarantee that as many passes as are needed
> > > (which in practical terms only means a couple) for device probing to
> > > settle down before exiting the initcall processing. That should achieve
> > > the effect you desire.
> > >
> > > It still masks the __init section issue by making it a lot less likely,
> > Grant, Can you please explain me this problem?My understanding is below:
> > If all the detection of devices with there respective driver is done before
> > __init section is freed then we will not have the problem mentioned.
> > However if the driver requests the probing to be deferred then __init section
> > of the deferred driver will not be freed right?
> > 
> > I am afraid but the patch description is bit cryptic for me specially
> > this line "kernel has to open console failure & release __init section before
> > reinvoking failure".
> 
> Yes I can, but first I'll briefly describe the Linux driver model to make sure
> we're talking about the same thing...
> 
> drivercore in Linux is oriented around two data structures:
> 1) Devices (struct device), and
> 2) Drivers (struct device_driver)
> 
> Hardware is modeled with instances of 'struct device'. For each device
> that Linux knows about there is one 'struct device'[1]. The devices are
> organized into a hierarchical tree, and you can see it by looking in
> /sys/devices.
> 
> Device drivers are represended by struct device_driver. Each driver,
> whether built-in or a module, is responsible to register itself by
> embedding a struct device_driver, or a structure that contains a struct
> device_driver. For example, struct platform_driver has an embedded
> struct device_driver.
> 
> The whole purpose of drivercore is to match up drivers to devices. Each
> bus_type has its own mechanism for deciding which devices and
> device_drivers go together, but it still results in trying to bind a
> struct device_driver to each struct device.
> 
> An important detail here is that drivercore is entirely asynchronous.
> There are no requirements on what order devices and device_drivers are
> registered. When a device gets registered, drivercore attempts to bind
> it to any device_driver that it already knows about. Similarly, when a
> new device_driver gets registered, drivercore will see if there are any
> unbound devices that it can bind it to. It is even possible to trigger a
> bind attempt sometime after both device and device_driver have been
> registered.[2]
> 
> This is the reason that deferred_probe is an option. As long as the
> kernel keeps track of which device_drivers requested deferred probe, we
> can nudge drivercore to reattempt probing. It really doesn't matter what
> order or when drivers get bound...
> 
> ...except when it does. Here's where we get into the issues related to
> __init sections and deferred probe. Since the driver model can bind a
> driver at any time, including after userspace has started, the
> expectation is that none of the code paths associated with probing will
> be discarded. That is why .probe hooks cannot be __init annotated. The
> problem for consoles is that the console init hook gets called from
> the probe path and a lot of console init hooks are __init annotated.
> 
> Before deferred probe, this was rarely (if ever) an actual problem. In
> general the order of operations during kernel init is:
> 
> 1) (early boot) create and register a bunch of struct devices
> 2) (initcalls) register a bunch of struct device_drivers
>  - a bunch of binds happen at this point as device drivers get
>    registered
>  - This is why device initialization order is primarily driven by driver
>    link order.
> 3) discard __init sections
> 4) userspace.
> 
> It's not that bind is guaranteed to occur before __init discard, but
> rather nothing prevented it from happening after. Deferred probe changes
> that. With deferred probe, Haojian correctly analyzed that driver bind
> is getting pushed after __init is discarded.
> 
> However, the problem with his solution was that it assumed that *all*
> deferred drivers must be resolved before proceeding with init which
> cannot be guaranteed. It is absolutely possble for a built-in driver to
> depend on something provided by a module. As rmk pointed out, that is
> not okay for console devices, so it is still important to make sure
> everything needed for the console is built-in, but non-critical devices
> may not care.
> 
> The difference between my fix and Haojian's is that my fix forces a
> single pass of the deferred list in the same context as the initcalls
> before the deferred probe workqueue takes over. That will ensure that
> all inter-driver dependencies get themselves sorted out before
> completing the initcalls, and therefore before __init gets discarded.
> 
> Phew, that was a lot more long-winded that I indended. I hope that helps
> though. 
I hope part of this explanation goes into the patch description or at
least a comment in the code added by you.

I can't imagine a better explanation than this.Thanks to you, Arnd
Bergmann and Russell King.
> 
> g.
> 
> [1] It's not quite that simple. Some devices have more than one struct
> device, but for the purpose of this discussion, one per device is
> sufficient.
> [2] It is also possible to unbind drivers from devices without
> unregistering the device driver, but I don't want to get too complicated
> in this description.
Grant Likely Feb. 14, 2013, 5:42 p.m. UTC | #16
On Thu, 14 Feb 2013 15:57:18 +0000, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 14 February 2013, Haojian Zhuang wrote:
> > If you can change it into code in below, it could work. Otherwise, it
> > always fails.
> >         driver_deferred_probe_enable = true;
> >         driver_deferred_probe_trigger();
> > +      deferred_probe_work_func(NULL);
> >         return 0;
> > 
> > Because deferred_probe_work_func() depends on that deferred_probe is added
> > into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
> > first, the deferred uart probe can't be added into active list. So even you call
> > work_func at here, it doesn't help.
> > 
> 
> Would that not cause two instances of the work function to run at the same time?
> That sounds like a source for a lot of problems.

Yes. Even ignoring the problems with device drivers not handling
multithreaded probing well, the current deferred probe wouldn't handle
it well. I could make it support multithreading, but I don't see a whole
lot of value there.

g.
Grant Likely Feb. 14, 2013, 5:54 p.m. UTC | #17
On Fri, 15 Feb 2013 00:58:23 +0800, Haojian Zhuang <haojian.zhuang@linaro.org> wrote:
> On 15 February 2013 00:50, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Thursday 14 February 2013, Haojian Zhuang wrote:
> >> On 14 February 2013 23:57, Arnd Bergmann <arnd@arndb.de> wrote:
> >> > On Thursday 14 February 2013, Haojian Zhuang wrote:
> >> >> If you can change it into code in below, it could work. Otherwise, it
> >> >> always fails.
> >> >>         driver_deferred_probe_enable = true;
> >> >>         driver_deferred_probe_trigger();
> >> >> +      deferred_probe_work_func(NULL);
> >> >>         return 0;
> >> >>
> >> >> Because deferred_probe_work_func() depends on that deferred_probe is added
> >> >> into deferred_probe_active_list. If driver_deferred_probe_trigger() isn't called
> >> >> first, the deferred uart probe can't be added into active list. So even you call
> >> >> work_func at here, it doesn't help.
> >> >>
> >> >
> >> > Would that not cause two instances of the work function to run at the same time?
> >> > That sounds like a source for a lot of problems.
> >> >
> >> >         Arnd
> >>
> >> Two instances of the work function? I'm sorry that I don't
> >> understanding your meaning.
> >> Could you help explain your question?
> >
> > I mean you end up calling the work function directly, while it gets run as part
> > of the work queue on a different CPU at the same time. I just noticed that
> > there is actually locking in place in deferred_probe_work_func that prevents
> > any actual bugs, but you are still adding extra overhead here.
> >
> > Maybe just add
> >
> >         flush_workqueue(deferred_wq);
> >
> > here?
> >
> >         Arnd
> 
> It's fine to me. Since both of them are flushing workqueue.
> 
> Tested-by: <haojian.zhuang@linaro.org>

Hahaha. I just came to the same conclusion. I'll craft a proper patch
and send it off.

g.
diff mbox

Patch

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 65631015..23db672 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -53,6 +53,9 @@  static LIST_HEAD(deferred_probe_pending_list);
 static LIST_HEAD(deferred_probe_active_list);
 static struct workqueue_struct *deferred_wq;
 
+static atomic_t probe_count = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
+
 /**
  * deferred_probe_work_func() - Retry probing devices in the active list.
  */
@@ -77,6 +80,7 @@  static void deferred_probe_work_func(struct work_struct *work)
 		private = list_first_entry(&deferred_probe_active_list,
 					typeof(*dev->p), deferred_probe);
 		dev = private->device;
+		atomic_dec(&probe_count);
 		list_del_init(&private->deferred_probe);
 
 		get_device(dev);
@@ -257,9 +261,6 @@  int device_bind_driver(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(device_bind_driver);
 
-static atomic_t probe_count = ATOMIC_INIT(0);
-static DECLARE_WAIT_QUEUE_HEAD(probe_waitqueue);
-
 static int really_probe(struct device *dev, struct device_driver *drv)
 {
 	int ret = 0;
@@ -308,6 +309,7 @@  probe_failed:
 		/* Driver requested deferred probing */
 		dev_info(dev, "Driver %s requests probe deferral\n", drv->name);
 		driver_deferred_probe_add(dev);
+		atomic_inc(&probe_count);
 	} else if (ret != -ENODEV && ret != -ENXIO) {
 		/* driver matched but the probe failed */
 		printk(KERN_WARNING
diff --git a/init/main.c b/init/main.c
index 63534a1..033bf5f 100644
--- a/init/main.c
+++ b/init/main.c
@@ -786,6 +786,7 @@  static void __init do_basic_setup(void)
 	do_ctors();
 	usermodehelper_enable();
 	do_initcalls();
+	wait_for_device_probe();
 }
 
 static void __init do_pre_smp_initcalls(void)