Message ID | bdea97a5-93e5-471f-88fc-a3c6ae74970a@hansg.org |
---|---|
State | New |
Headers | show |
Series | [6.9,gpiolib,regression] gpiolib: triggers: kobject: 'gpiochipX' is not, initialized, yet kobject_get() errors | expand |
On Fri, Mar 29, 2024 at 5:16 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > On Fri, 29 Mar 2024 15:11:21 +0100, Hans de Goede <hans@hansg.org> said: ... > Thanks for the report. I hope I'm not being naive here but would the following > one-liner work? > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > index ce94e37bcbee..69f365ccbfd8 100644 > --- a/drivers/gpio/gpiolib.c > +++ b/drivers/gpio/gpiolib.c > @@ -1179,7 +1179,7 @@ struct gpio_device *gpio_device_find(const void *data, > > gc = srcu_dereference(gdev->chip, &gdev->srcu); > > - if (gc && match(gc, data)) > + if (device_is_registered(&gdev->dev) && gc && match(gc, data)) > return gpio_device_get(gdev); In case you are going with this approach, wouldn't be better to if (!device_is_register(...)) continue; gc = ... ? > } > > This would make gpio_device_find() ignore any GPIO device that's not yet > registered on the GPIO bus which is almost the last step of the registration > process right before creating the sysfs attributes.
Hi Bartosz, On 3/29/24 4:16 PM, Bartosz Golaszewski wrote: > On Fri, 29 Mar 2024 15:11:21 +0100, Hans de Goede <hans@hansg.org> said: >> Hi All, >> >> I've already tried to fix this, so let me just copy and paste my half finished patch >> to explain the problem. >> >> I was planning on submitting this as a RFC patch at least, but there are also some >> other new issues with 6.9 on this tablet and I'm not sure how this interacts >> with those issues and I don't have time to work on this any further this weekend. >> >> Anyways below is the patch / bug report. >> >> I'm wondering if a better fix would be to add a "ready" flag to gdev >> and may gpiochip_find ignore not yet ready chips (we need them on >> the list before they are ready to reserve the GPIO numbers) ? >> >> Regards, >> >> Hans >> > > Hi Hans! > > Thanks for the report. I hope I'm not being naive here but would the following > one-liner work? > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > index ce94e37bcbee..69f365ccbfd8 100644 > --- a/drivers/gpio/gpiolib.c > +++ b/drivers/gpio/gpiolib.c > @@ -1179,7 +1179,7 @@ struct gpio_device *gpio_device_find(const void *data, > > gc = srcu_dereference(gdev->chip, &gdev->srcu); > > - if (gc && match(gc, data)) > + if (device_is_registered(&gdev->dev) && gc && match(gc, data)) > return gpio_device_get(gdev); > } > > This would make gpio_device_find() ignore any GPIO device that's not yet > registered on the GPIO bus which is almost the last step of the registration > process right before creating the sysfs attributes. Yes that should work and it has the added advantage that it also waits for things like the irqchip to be setup before gpio_device_find() will find the gpio-device. I cannot trigger the race every boot, but I do hit it quite regularly and with this change I've done 10 successful consecutive boots, so I believe that this indeed fixes the race. I've prepared a patch with this fix now which I'll send out shortly. As for Andy's suggestion I'm not all that familiar with the RCU stuff, but I think that if we were to go that route then the device_is_registered() check should be moved up to above the "guard(srcu)(&gdev->srcu);" line rather then above the "gc = srcu_deref..." line, since in that case we are not using the gdev->chip pointer at all if we bail ? Anyways for now I've just gone with your suggested 1 liner. Regards, Hans p.s. While looking into this I noticed one other possible problem, unless gpiochip_add_data_with_key() and gpiolib_dev_init() are guaranteed to never run at the same time then we may end up calling gpiochip_setup_dev() twice, once from gpiolib_dev_init() and once from gpiochip_add_data_with_key() when the 2 race.
On Tue, Apr 02, 2024 at 03:41:00PM +0200, Hans de Goede wrote: > On 3/29/24 4:16 PM, Bartosz Golaszewski wrote: > > On Fri, 29 Mar 2024 15:11:21 +0100, Hans de Goede <hans@hansg.org> said: ... > > Thanks for the report. I hope I'm not being naive here but would the following > > one-liner work? > > > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > > index ce94e37bcbee..69f365ccbfd8 100644 > > --- a/drivers/gpio/gpiolib.c > > +++ b/drivers/gpio/gpiolib.c > > @@ -1179,7 +1179,7 @@ struct gpio_device *gpio_device_find(const void *data, > > > > gc = srcu_dereference(gdev->chip, &gdev->srcu); > > > > - if (gc && match(gc, data)) > > + if (device_is_registered(&gdev->dev) && gc && match(gc, data)) > > return gpio_device_get(gdev); > > } > > > > This would make gpio_device_find() ignore any GPIO device that's not yet > > registered on the GPIO bus which is almost the last step of the registration > > process right before creating the sysfs attributes. > > Yes that should work and it has the added advantage that it also waits > for things like the irqchip to be setup before gpio_device_find() will > find the gpio-device. > > I cannot trigger the race every boot, but I do hit it quite regularly > and with this change I've done 10 successful consecutive boots, so > I believe that this indeed fixes the race. > > I've prepared a patch with this fix now which I'll send out shortly. > > As for Andy's suggestion I'm not all that familiar with the RCU stuff, > but I think that if we were to go that route then the device_is_registered() > check should be moved up to above the "guard(srcu)(&gdev->srcu);" > line rather then above the "gc = srcu_deref..." line, since in that > case we are not using the gdev->chip pointer at all if we bail ? I believe you are right and we need to move this check out of SRCU scope. (FWIW, I also thought the very same way after I had sent the message and was hesitating to reply with that.)
On Tue, 2 Apr 2024 at 15:54, Andy Shevchenko <andy@kernel.org> wrote: > > On Tue, Apr 02, 2024 at 03:41:00PM +0200, Hans de Goede wrote: > > On 3/29/24 4:16 PM, Bartosz Golaszewski wrote: > > > On Fri, 29 Mar 2024 15:11:21 +0100, Hans de Goede <hans@hansg.org> said: > > ... > > > > Thanks for the report. I hope I'm not being naive here but would the following > > > one-liner work? > > > > > > diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c > > > index ce94e37bcbee..69f365ccbfd8 100644 > > > --- a/drivers/gpio/gpiolib.c > > > +++ b/drivers/gpio/gpiolib.c > > > @@ -1179,7 +1179,7 @@ struct gpio_device *gpio_device_find(const void *data, > > > > > > gc = srcu_dereference(gdev->chip, &gdev->srcu); > > > > > > - if (gc && match(gc, data)) > > > + if (device_is_registered(&gdev->dev) && gc && match(gc, data)) > > > return gpio_device_get(gdev); > > > } > > > > > > This would make gpio_device_find() ignore any GPIO device that's not yet > > > registered on the GPIO bus which is almost the last step of the registration > > > process right before creating the sysfs attributes. > > > > Yes that should work and it has the added advantage that it also waits > > for things like the irqchip to be setup before gpio_device_find() will > > find the gpio-device. > > > > I cannot trigger the race every boot, but I do hit it quite regularly > > and with this change I've done 10 successful consecutive boots, so > > I believe that this indeed fixes the race. > > > > I've prepared a patch with this fix now which I'll send out shortly. > > > > As for Andy's suggestion I'm not all that familiar with the RCU stuff, > > but I think that if we were to go that route then the device_is_registered() > > check should be moved up to above the "guard(srcu)(&gdev->srcu);" > > line rather then above the "gc = srcu_deref..." line, since in that > > case we are not using the gdev->chip pointer at all if we bail ? > > I believe you are right and we need to move this check out of SRCU scope. > (FWIW, I also thought the very same way after I had sent the message and > was hesitating to reply with that.) > Yes, there's no reason to have it inside the SRCU read lock section. It doesn't protect the kobject internals. The variable checked in device_is_registered() is a single bit. Once we see it as set, we can assume the device is registered with its subsystem. Bart
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index ce94e37bcbee..2dbd1183d35c 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -697,13 +697,15 @@ static void gpiodev_release(struct device *dev) struct gpio_device *gdev = to_gpio_device(dev); unsigned int i; - for (i = 0; i < gdev->ngpio; i++) - cleanup_srcu_struct(&gdev->descs[i].srcu); + if (gdev->cleanup_scru_on_release) { + cleanup_srcu_struct(&gdev->srcu); + for (i = 0; i < gdev->ngpio; i++) + cleanup_srcu_struct(&gdev->descs[i].srcu); + } ida_free(&gpio_ida, gdev->id); kfree_const(gdev->label); kfree(gdev->descs); - cleanup_srcu_struct(&gdev->srcu); kfree(gdev); } @@ -729,8 +731,6 @@ static int gpiochip_setup_dev(struct gpio_device *gdev) struct fwnode_handle *fwnode = dev_fwnode(&gdev->dev); int ret; - device_initialize(&gdev->dev); - /* * If fwnode doesn't belong to another device, it's safe to clear its * initialized flag. @@ -927,6 +927,8 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, gdev->ngpio = gc->ngpio; gdev->can_sleep = gc->can_sleep; + device_initialize(&gdev->dev); + scoped_guard(mutex, &gpio_devices_lock) { /* * TODO: this allocates a Linux GPIO number base in the global @@ -941,7 +943,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, if (base < 0) { ret = base; base = 0; - goto err_free_label; + goto err_gpio_device_put; } /* @@ -961,7 +963,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, ret = gpiodev_add_to_list_unlocked(gdev); if (ret) { chip_err(gc, "GPIO integer space overlap, cannot add chip\n"); - goto err_free_label; + goto err_gpio_device_put; } } @@ -1045,6 +1047,14 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, if (ret) goto err_remove_irqchip; } + + /* + * HACK instead initializing all the scru structs should be moved + * to above the device_initialize(&gdev->dev) call, and their cleanup + * on error should be moved accordingly and skipped with the + * goto err_print_message; when error-exiting after device_initialize(). + */ + gdev->cleanup_scru_on_release = true; return 0; err_remove_irqchip: @@ -1067,13 +1077,9 @@ int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data, scoped_guard(mutex, &gpio_devices_lock) list_del_rcu(&gdev->list); synchronize_srcu(&gpio_devices_srcu); - if (gdev->dev.release) { - /* release() has been registered by gpiochip_setup_dev() */ - gpio_device_put(gdev); - goto err_print_message; - } -err_free_label: - kfree_const(gdev->label); +err_gpio_device_put: + gpio_device_put(gdev); + goto err_print_message; err_free_descs: kfree(gdev->descs); err_free_dev_name: diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h index f67d5991ab1c..199b9c49974e 100644 --- a/drivers/gpio/gpiolib.h +++ b/drivers/gpio/gpiolib.h @@ -64,6 +64,7 @@ struct gpio_device { int base; u16 ngpio; bool can_sleep; + bool cleanup_scru_on_release; const char *label; void *data; struct list_head list;