Message ID | 20210202043345.3778765-3-saravanak@google.com |
---|---|
State | New |
Headers | show |
Series | Make fw_devlink=on more forgiving | expand |
On Tue, Feb 2, 2021 at 5:33 AM Saravana Kannan <saravanak@google.com> wrote: > > After a deferred probe attempt has exhaused all the devices that can be > bound, any device that remains unbound has one/both of these conditions > true: > > (1) It is waiting on its supplier to bind > (2) It does not have a matching driver > > So, to make fw_devlink=on more forgiving of missing drivers for optional > suppliers, after we've done a full deferred probe attempt, this patch > deletes all device links created by fw_devlink where the supplier hasn't > probed yet and the supplier itself is not waiting on any of its > suppliers. This allows consumers to probe during another deferred probe > attempt if they were waiting on optional suppliers. > > When modules are enabled, we can't differentiate between a driver > that'll never be registered vs a driver that'll be registered soon by > loading a module. So, this patch doesn't do anything for the case where > modules are enabled. > > Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default") > Signed-off-by: Saravana Kannan <saravanak@google.com> > --- > drivers/base/base.h | 2 + > drivers/base/core.c | 104 ++++++++++++++++++++++++++++++++++++-------- > drivers/base/dd.c | 5 +++ > 3 files changed, 94 insertions(+), 17 deletions(-) > > diff --git a/drivers/base/base.h b/drivers/base/base.h > index f5600a83124f..34befe9475cb 100644 > --- a/drivers/base/base.h > +++ b/drivers/base/base.h > @@ -186,6 +186,8 @@ extern void device_links_no_driver(struct device *dev); > extern bool device_links_busy(struct device *dev); > extern void device_links_unbind_consumers(struct device *dev); > > +bool fw_devlink_deferred_probe_retry(void); > + > /* device pm support */ > void device_pm_move_to_tail(struct device *dev); > > diff --git a/drivers/base/core.c b/drivers/base/core.c > index c95b1daabac7..5e53fc6a21ea 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -50,6 +50,7 @@ static LIST_HEAD(deferred_sync); > static unsigned int defer_sync_state_count = 1; > static DEFINE_MUTEX(fwnode_link_lock); > static bool fw_devlink_is_permissive(void); > +static bool fw_devlink_def_probe_retry; > > /** > * fwnode_link_add - Create a link between two fwnode_handles. > @@ -881,6 +882,13 @@ static void device_link_put_kref(struct device_link *link) > WARN(1, "Unable to drop a managed device link reference\n"); > } > > +static void device_link_drop_managed(struct device_link *link) > +{ > + link->flags &= ~DL_FLAG_MANAGED; > + WRITE_ONCE(link->status, DL_STATE_NONE); > + kref_put(&link->kref, __device_link_del); > +} > + > /** > * device_link_del - Delete a stateless link between two devices. > * @link: Device link to delete. > @@ -943,6 +951,29 @@ static void device_links_missing_supplier(struct device *dev) > } > } > > +/** > + * device_links_probe_blocked_by - Return first supplier blocking probe > + * @dev: Consumer device. > + * > + * Checks if the probe of @dev is blocked by a supplier without a driver. If > + * yes, return that supplier dev. Otherwise, return NULL. > + */ > +static struct device *device_links_probe_blocked_by(struct device *dev) > +{ > + struct device_link *link; > + > + list_for_each_entry(link, &dev->links.suppliers, c_node) { > + if (!(link->flags & DL_FLAG_MANAGED) || > + link->flags & DL_FLAG_SYNC_STATE_ONLY) > + continue; > + > + if (link->status != DL_STATE_AVAILABLE) > + return link->supplier; > + } > + > + return NULL; This is slightly confusing, because you don't actually use the returned device pointer, but simply check it against NULL. AFAICS this function can return bool and I'd call it device_links_probe_blocked(). > +} > + > /** > * device_links_check_suppliers - Check presence of supplier drivers. > * @dev: Consumer device. > @@ -961,7 +992,7 @@ static void device_links_missing_supplier(struct device *dev) > */ > int device_links_check_suppliers(struct device *dev) > { > - struct device_link *link; > + struct device_link *link, *tmp; > int ret = 0; > > /* > @@ -982,19 +1013,47 @@ int device_links_check_suppliers(struct device *dev) > > device_links_write_lock(); > > - list_for_each_entry(link, &dev->links.suppliers, c_node) { > + list_for_each_entry_safe(link, tmp, &dev->links.suppliers, c_node) { > if (!(link->flags & DL_FLAG_MANAGED)) > continue; > > - if (link->status != DL_STATE_AVAILABLE && > - !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) { > - device_links_missing_supplier(dev); > - dev_dbg(dev, "probe deferral - supplier %s not ready\n", > - dev_name(link->supplier)); > - ret = -EPROBE_DEFER; > - break; > + > + if (link->status == DL_STATE_AVAILABLE || > + link->flags & DL_FLAG_SYNC_STATE_ONLY) { > + WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); > + continue; > + } > + > + /* > + * After a deferred probe attempt has exhaused all the devices > + * that can be bound, any device that remains unbound has > + * one/both of these conditions true: > + * > + * (1) It is waiting on its supplier to bind > + * (2) It does not have a matching driver > + * > + * If this device is waiting on a supplier to bind to a driver, > + * we make sure condition (1) above is not true for the > + * supplier. In which case, condition (2) has to be true for > + * the supplier. That is, the supplier doesn't have a matching > + * driver. > + * > + * When we find such a supplier, we delete the device link if > + * it was created by fw_devlink. This it to allow the consumer > + * to probe in case the supplier is an optional. > + */ > + if (fw_devlink_def_probe_retry && I would put a IS_ENABLED(CONFIG_MODULES) check here to let the compiler optimize out the code depending on it and make it clear that this is a NOP if there are modules. > + link->flags & DL_FLAG_INFERRED && > + !device_links_probe_blocked_by(link->supplier)) { > + device_link_drop_managed(link); > + continue; > } > - WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); > + > + device_links_missing_supplier(dev); > + dev_dbg(dev, "probe deferral - supplier %s not ready\n", > + dev_name(link->supplier)); > + ret = -EPROBE_DEFER; > + break; > } > dev->links.status = DL_DEV_PROBING; > > @@ -1132,13 +1191,6 @@ static void __device_links_supplier_defer_sync(struct device *sup) > list_add_tail(&sup->links.defer_sync, &deferred_sync); > } > > -static void device_link_drop_managed(struct device_link *link) > -{ > - link->flags &= ~DL_FLAG_MANAGED; > - WRITE_ONCE(link->status, DL_STATE_NONE); > - kref_put(&link->kref, __device_link_del); > -} > - > static ssize_t waiting_for_supplier_show(struct device *dev, > struct device_attribute *attr, > char *buf) > @@ -1597,6 +1649,24 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup) > return ret; > } > > +/** fw_devlink_deferred_probe_retry - Set up fw_devlink for probe retries Kerneldoc format mistake. > + * > + * This function requests fw_devlink to set itself up for a deferred probe > + * retry. This allows fw_devlink to ignore device links it created to > + * suppliers that'll never probe. This is necessary in case some of the > + * suppliers are optional and their consumers can probe without them. > + * > + * Returns true if deferred probe retry is likely to make any difference. > + */ > +bool fw_devlink_deferred_probe_retry(void) > +{ > + if (IS_ENABLED(CONFIG_MODULES)) > + return false; To make the above more visible, I'd fold this function into the caller. > + > + fw_devlink_def_probe_retry = true; > + return fw_devlink_get_flags() && !fw_devlink_is_permissive(); > +} > + > /** > * fw_devlink_create_devlink - Create a device link from a consumer to fwnode > * @con - Consumer device for the device link > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 9179825ff646..11325df2327f 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -317,6 +317,11 @@ static int deferred_probe_initcall(void) > driver_deferred_probe_trigger(); > /* Sort as many dependencies as possible before exiting initcalls */ > flush_work(&deferred_probe_work); > + > + if (fw_devlink_deferred_probe_retry()) { > + driver_deferred_probe_trigger(); > + flush_work(&deferred_probe_work); > + } > initcalls_done = true; > > /* > -- Overall, the "let's do nothing if modules are not enabled" approach is a bit disappointing, because what if somebody builds all of the drivers needed for boot in and enables modules anyway, for example to allow USB drivers to be probed dynamically? They surely can be forgiven for expecting to get the right ordering then, like in the "no modules" case ...
On Tue, Feb 2, 2021 at 8:47 PM Saravana Kannan <saravanak@google.com> wrote: > > On Tue, Feb 2, 2021 at 6:34 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > On Tue, Feb 2, 2021 at 5:33 AM Saravana Kannan <saravanak@google.com> wrote: > > > [cut] > > > > > + * > > > + * This function requests fw_devlink to set itself up for a deferred probe > > > + * retry. This allows fw_devlink to ignore device links it created to > > > + * suppliers that'll never probe. This is necessary in case some of the > > > + * suppliers are optional and their consumers can probe without them. > > > + * > > > + * Returns true if deferred probe retry is likely to make any difference. > > > + */ > > > +bool fw_devlink_deferred_probe_retry(void) > > > +{ > > > + if (IS_ENABLED(CONFIG_MODULES)) > > > + return false; > > > > To make the above more visible, I'd fold this function into the caller. > > I had written it this way because I'm thinking of adding a timeout > heuristic for MODULES in here. I can move it to the caller if you feel > strongly about it. Not really strongly, but then moving it back when you need doesn't sound particularly troublesome to me. :-) > > > > > + > > > + fw_devlink_def_probe_retry = true; > > > + return fw_devlink_get_flags() && !fw_devlink_is_permissive(); > > > +} > > > + > > > /** > > > * fw_devlink_create_devlink - Create a device link from a consumer to fwnode > > > * @con - Consumer device for the device link > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > > index 9179825ff646..11325df2327f 100644 > > > --- a/drivers/base/dd.c > > > +++ b/drivers/base/dd.c > > > @@ -317,6 +317,11 @@ static int deferred_probe_initcall(void) > > > driver_deferred_probe_trigger(); > > > /* Sort as many dependencies as possible before exiting initcalls */ > > > flush_work(&deferred_probe_work); > > > + > > > + if (fw_devlink_deferred_probe_retry()) { > > > + driver_deferred_probe_trigger(); > > > + flush_work(&deferred_probe_work); > > > + } > > > initcalls_done = true; > > > > > > /* > > > -- > > > > Overall, the "let's do nothing if modules are not enabled" approach is > > a bit disappointing, because what if somebody builds all of the > > drivers needed for boot in and enables modules anyway, for example to > > allow USB drivers to be probed dynamically? > > Yeah, I'm disappointed too :( But I'm trying to get it to work for > !MODULES so that we can enable fw_devlink=on by default at least for > !MODULES to make sure drivers don't introduce more issues going > forward. And then I plan to continue working on making it work > correctly for MODULES case too. > > Getting fw_devlink=on to work perfectly for MODULES and !MODULES is > not a problem at all. But it needs fixing a bunch of drivers (mostly > simple fixes like setting the right flag, handling deferred probes > correctly, etc), but I'm hitting a catch-22 here. I can't find the > drivers without setting fw_devlink=on by default. But if I did that, > it's going to break a bunch of boards. > > What's your thought on leaving fw_devlink=on by default on 5.12 and > fixing drivers as issues are reported? If there are any issues known today that need to be addressed, I'd fix them first and then try to enable fw_devlink=on maybe just for !MODULES to start with. > If that's a no, do you have any other ideas on how to deal with this catch-22? Try to enable, fix issues as they show up in linux-next. If there are still outstanding issues before the next release, back off and try in the next cycle. Repeat. This doesn't sound particularly attractive, but I don't have any better idea, sorry.
On Thu, Feb 4, 2021 at 10:41 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Tue, Feb 2, 2021 at 8:47 PM Saravana Kannan <saravanak@google.com> wrote: > > > > On Tue, Feb 2, 2021 at 6:34 AM Rafael J. Wysocki <rafael@kernel.org> wrote: > > > > > > On Tue, Feb 2, 2021 at 5:33 AM Saravana Kannan <saravanak@google.com> wrote: > > > > > > [cut] > > > > > > > > + * > > > > + * This function requests fw_devlink to set itself up for a deferred probe > > > > + * retry. This allows fw_devlink to ignore device links it created to > > > > + * suppliers that'll never probe. This is necessary in case some of the > > > > + * suppliers are optional and their consumers can probe without them. > > > > + * > > > > + * Returns true if deferred probe retry is likely to make any difference. > > > > + */ > > > > +bool fw_devlink_deferred_probe_retry(void) > > > > +{ > > > > + if (IS_ENABLED(CONFIG_MODULES)) > > > > + return false; > > > > > > To make the above more visible, I'd fold this function into the caller. > > > > I had written it this way because I'm thinking of adding a timeout > > heuristic for MODULES in here. I can move it to the caller if you feel > > strongly about it. > > Not really strongly, but then moving it back when you need doesn't > sound particularly troublesome to me. :-) Ok, will move it. I'm also rewriting this patch. So we'll see where this lands. > > > > > > > > + > > > > + fw_devlink_def_probe_retry = true; > > > > + return fw_devlink_get_flags() && !fw_devlink_is_permissive(); > > > > +} > > > > + > > > > /** > > > > * fw_devlink_create_devlink - Create a device link from a consumer to fwnode > > > > * @con - Consumer device for the device link > > > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > > > > index 9179825ff646..11325df2327f 100644 > > > > --- a/drivers/base/dd.c > > > > +++ b/drivers/base/dd.c > > > > @@ -317,6 +317,11 @@ static int deferred_probe_initcall(void) > > > > driver_deferred_probe_trigger(); > > > > /* Sort as many dependencies as possible before exiting initcalls */ > > > > flush_work(&deferred_probe_work); > > > > + > > > > + if (fw_devlink_deferred_probe_retry()) { > > > > + driver_deferred_probe_trigger(); > > > > + flush_work(&deferred_probe_work); > > > > + } > > > > initcalls_done = true; > > > > > > > > /* > > > > -- > > > > > > Overall, the "let's do nothing if modules are not enabled" approach is > > > a bit disappointing, because what if somebody builds all of the > > > drivers needed for boot in and enables modules anyway, for example to > > > allow USB drivers to be probed dynamically? > > > > Yeah, I'm disappointed too :( But I'm trying to get it to work for > > !MODULES so that we can enable fw_devlink=on by default at least for > > !MODULES to make sure drivers don't introduce more issues going > > forward. And then I plan to continue working on making it work > > correctly for MODULES case too. > > > > Getting fw_devlink=on to work perfectly for MODULES and !MODULES is > > not a problem at all. But it needs fixing a bunch of drivers (mostly > > simple fixes like setting the right flag, handling deferred probes > > correctly, etc), but I'm hitting a catch-22 here. I can't find the > > drivers without setting fw_devlink=on by default. But if I did that, > > it's going to break a bunch of boards. > > > > What's your thought on leaving fw_devlink=on by default on 5.12 and > > fixing drivers as issues are reported? > > If there are any issues known today that need to be addressed, I'd fix > them first and then try to enable fw_devlink=on maybe just for > !MODULES to start with. Yeah, that's what I'm thinking of for now. > > If that's a no, do you have any other ideas on how to deal with this catch-22? > > Try to enable, fix issues as they show up in linux-next. If there are > still outstanding issues before the next release, back off and try in > the next cycle. Repeat. If it's just dealing with outstanding issues that are reported, I'm hoping I can do that. The biggest headache right now is dealing with devices that have drivers that directly parse the fwnode AND still have a struct device. So the struct device remains unbound even if the driver has initialized the device. > > This doesn't sound particularly attractive, but I don't have any > better idea, sorry. :'( Yeah, another approach I'm thinking of is to have a separate "strict mode" for fw_devlink=on or above. Where it'll try it's best till kernel late init and then fallback to permissive. But it's becoming a headache to deal with some corner cases. -Saravana
diff --git a/drivers/base/base.h b/drivers/base/base.h index f5600a83124f..34befe9475cb 100644 --- a/drivers/base/base.h +++ b/drivers/base/base.h @@ -186,6 +186,8 @@ extern void device_links_no_driver(struct device *dev); extern bool device_links_busy(struct device *dev); extern void device_links_unbind_consumers(struct device *dev); +bool fw_devlink_deferred_probe_retry(void); + /* device pm support */ void device_pm_move_to_tail(struct device *dev); diff --git a/drivers/base/core.c b/drivers/base/core.c index c95b1daabac7..5e53fc6a21ea 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -50,6 +50,7 @@ static LIST_HEAD(deferred_sync); static unsigned int defer_sync_state_count = 1; static DEFINE_MUTEX(fwnode_link_lock); static bool fw_devlink_is_permissive(void); +static bool fw_devlink_def_probe_retry; /** * fwnode_link_add - Create a link between two fwnode_handles. @@ -881,6 +882,13 @@ static void device_link_put_kref(struct device_link *link) WARN(1, "Unable to drop a managed device link reference\n"); } +static void device_link_drop_managed(struct device_link *link) +{ + link->flags &= ~DL_FLAG_MANAGED; + WRITE_ONCE(link->status, DL_STATE_NONE); + kref_put(&link->kref, __device_link_del); +} + /** * device_link_del - Delete a stateless link between two devices. * @link: Device link to delete. @@ -943,6 +951,29 @@ static void device_links_missing_supplier(struct device *dev) } } +/** + * device_links_probe_blocked_by - Return first supplier blocking probe + * @dev: Consumer device. + * + * Checks if the probe of @dev is blocked by a supplier without a driver. If + * yes, return that supplier dev. Otherwise, return NULL. + */ +static struct device *device_links_probe_blocked_by(struct device *dev) +{ + struct device_link *link; + + list_for_each_entry(link, &dev->links.suppliers, c_node) { + if (!(link->flags & DL_FLAG_MANAGED) || + link->flags & DL_FLAG_SYNC_STATE_ONLY) + continue; + + if (link->status != DL_STATE_AVAILABLE) + return link->supplier; + } + + return NULL; +} + /** * device_links_check_suppliers - Check presence of supplier drivers. * @dev: Consumer device. @@ -961,7 +992,7 @@ static void device_links_missing_supplier(struct device *dev) */ int device_links_check_suppliers(struct device *dev) { - struct device_link *link; + struct device_link *link, *tmp; int ret = 0; /* @@ -982,19 +1013,47 @@ int device_links_check_suppliers(struct device *dev) device_links_write_lock(); - list_for_each_entry(link, &dev->links.suppliers, c_node) { + list_for_each_entry_safe(link, tmp, &dev->links.suppliers, c_node) { if (!(link->flags & DL_FLAG_MANAGED)) continue; - if (link->status != DL_STATE_AVAILABLE && - !(link->flags & DL_FLAG_SYNC_STATE_ONLY)) { - device_links_missing_supplier(dev); - dev_dbg(dev, "probe deferral - supplier %s not ready\n", - dev_name(link->supplier)); - ret = -EPROBE_DEFER; - break; + + if (link->status == DL_STATE_AVAILABLE || + link->flags & DL_FLAG_SYNC_STATE_ONLY) { + WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); + continue; + } + + /* + * After a deferred probe attempt has exhaused all the devices + * that can be bound, any device that remains unbound has + * one/both of these conditions true: + * + * (1) It is waiting on its supplier to bind + * (2) It does not have a matching driver + * + * If this device is waiting on a supplier to bind to a driver, + * we make sure condition (1) above is not true for the + * supplier. In which case, condition (2) has to be true for + * the supplier. That is, the supplier doesn't have a matching + * driver. + * + * When we find such a supplier, we delete the device link if + * it was created by fw_devlink. This it to allow the consumer + * to probe in case the supplier is an optional. + */ + if (fw_devlink_def_probe_retry && + link->flags & DL_FLAG_INFERRED && + !device_links_probe_blocked_by(link->supplier)) { + device_link_drop_managed(link); + continue; } - WRITE_ONCE(link->status, DL_STATE_CONSUMER_PROBE); + + device_links_missing_supplier(dev); + dev_dbg(dev, "probe deferral - supplier %s not ready\n", + dev_name(link->supplier)); + ret = -EPROBE_DEFER; + break; } dev->links.status = DL_DEV_PROBING; @@ -1132,13 +1191,6 @@ static void __device_links_supplier_defer_sync(struct device *sup) list_add_tail(&sup->links.defer_sync, &deferred_sync); } -static void device_link_drop_managed(struct device_link *link) -{ - link->flags &= ~DL_FLAG_MANAGED; - WRITE_ONCE(link->status, DL_STATE_NONE); - kref_put(&link->kref, __device_link_del); -} - static ssize_t waiting_for_supplier_show(struct device *dev, struct device_attribute *attr, char *buf) @@ -1597,6 +1649,24 @@ static int fw_devlink_relax_cycle(struct device *con, void *sup) return ret; } +/** fw_devlink_deferred_probe_retry - Set up fw_devlink for probe retries + * + * This function requests fw_devlink to set itself up for a deferred probe + * retry. This allows fw_devlink to ignore device links it created to + * suppliers that'll never probe. This is necessary in case some of the + * suppliers are optional and their consumers can probe without them. + * + * Returns true if deferred probe retry is likely to make any difference. + */ +bool fw_devlink_deferred_probe_retry(void) +{ + if (IS_ENABLED(CONFIG_MODULES)) + return false; + + fw_devlink_def_probe_retry = true; + return fw_devlink_get_flags() && !fw_devlink_is_permissive(); +} + /** * fw_devlink_create_devlink - Create a device link from a consumer to fwnode * @con - Consumer device for the device link diff --git a/drivers/base/dd.c b/drivers/base/dd.c index 9179825ff646..11325df2327f 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -317,6 +317,11 @@ static int deferred_probe_initcall(void) driver_deferred_probe_trigger(); /* Sort as many dependencies as possible before exiting initcalls */ flush_work(&deferred_probe_work); + + if (fw_devlink_deferred_probe_retry()) { + driver_deferred_probe_trigger(); + flush_work(&deferred_probe_work); + } initcalls_done = true; /*
After a deferred probe attempt has exhaused all the devices that can be bound, any device that remains unbound has one/both of these conditions true: (1) It is waiting on its supplier to bind (2) It does not have a matching driver So, to make fw_devlink=on more forgiving of missing drivers for optional suppliers, after we've done a full deferred probe attempt, this patch deletes all device links created by fw_devlink where the supplier hasn't probed yet and the supplier itself is not waiting on any of its suppliers. This allows consumers to probe during another deferred probe attempt if they were waiting on optional suppliers. When modules are enabled, we can't differentiate between a driver that'll never be registered vs a driver that'll be registered soon by loading a module. So, this patch doesn't do anything for the case where modules are enabled. Fixes: e590474768f1 ("driver core: Set fw_devlink=on by default") Signed-off-by: Saravana Kannan <saravanak@google.com> --- drivers/base/base.h | 2 + drivers/base/core.c | 104 ++++++++++++++++++++++++++++++++++++-------- drivers/base/dd.c | 5 +++ 3 files changed, 94 insertions(+), 17 deletions(-)