Message ID | ad839a82-8694-4f99-b1c1-0ee53c9d40cf@gmail.com |
---|---|
State | New |
Headers | show |
Series | i2c: Replace lists of special clients with flagging of such clients | expand |
On Fri, Nov 01, 2024 at 11:09:51PM +0100, Heiner Kallweit wrote: > So far a list is used to track auto-detected clients per driver. > The same functionality can be achieved much simpler by flagging > auto-detected clients. > > Two notes regarding the usage of driver_for_each_device: > In our case it can't fail, however the function is annotated __must_check. > So a little workaround is needed to avoid a compiler warning. > Then we may remove nodes from the list over which we iterate. > This is safe, see the explanation at the beginning of lib/klist.c. > > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Applied to for-next, thanks! One minor edit: > +#define I2C_CLIENT_AUTO 0x100 /* for board_info; auto-detected */ This is not for 'board_info'. I changed it to "client was auto-detected".
Hi, Got a deadlock issue with this patch in v6.14-rc1. On Fri, 1 Nov 2024 23:09:51 +0100 Heiner Kallweit <hkallweit1@gmail.com> wrote: > So far a list is used to track auto-detected clients per driver. > The same functionality can be achieved much simpler by flagging > auto-detected clients. > > Two notes regarding the usage of driver_for_each_device: > In our case it can't fail, however the function is annotated __must_check. > So a little workaround is needed to avoid a compiler warning. > Then we may remove nodes from the list over which we iterate. > This is safe, see the explanation at the beginning of lib/klist.c. > > Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> > --- > v3: > - protect client removal with core_lock mutex > --- > drivers/i2c/i2c-core-base.c | 52 ++++++++++++------------------------- > include/linux/i2c.h | 3 +-- > 2 files changed, 17 insertions(+), 38 deletions(-) > ... > @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap) > * we can't remove the dummy devices during the first pass: they > * could have been instantiated by real devices wishing to clean > * them up properly, so we give them a chance to do that first. */ > + mutex_lock(&core_lock); > device_for_each_child(&adap->dev, NULL, __unregister_client); > device_for_each_child(&adap->dev, NULL, __unregister_dummy); > + mutex_unlock(&core_lock); > Calling __unregister_client() with core_lock mutex held leads to a deadlock in my case: # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind [ 242.928264] [ 242.929779] ============================================ [ 242.935092] WARNING: possible recursive locking detected [ 242.940406] 6.14.0-rc1+ #22 Not tainted [ 242.944245] -------------------------------------------- [ 242.949556] sh/299 is trying to acquire lock: [ 242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0 [ 242.961689] [ 242.961689] but task is already holding lock: [ 242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 [ 242.975285] [ 242.975285] other info that might help us debug this: [ 242.981814] Possible unsafe locking scenario: [ 242.981814] [ 242.987732] CPU0 [ 242.990179] ---- [ 242.992625] lock(core_lock); [ 242.995686] lock(core_lock); [ 242.998748] [ 242.998748] *** DEADLOCK *** [ 242.998748] [ 243.004666] May be due to missing lock nesting notation [ 243.004666] [ 243.011455] 5 locks held by sh/299: [ 243.014946] #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398 [ 243.022976] #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8 [ 243.031962] #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 [ 243.041645] #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 [ 243.049845] #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 [ 243.059522] [ 243.059522] stack backtrace: [ 243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22 [ 243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT) [ 243.077032] Call trace: [ 243.079481] show_stack+0x20/0x38 (C) [ 243.083152] dump_stack_lvl+0x90/0xd0 [ 243.086819] dump_stack+0x18/0x28 [ 243.090140] print_deadlock_bug+0x260/0x350 [ 243.094332] __lock_acquire+0x113c/0x2180 [ 243.098346] lock_acquire+0x1c4/0x350 [ 243.102015] __mutex_lock+0x9c/0x500 [ 243.105599] mutex_lock_nested+0x2c/0x40 [ 243.109528] i2c_del_adapter+0x44/0x1b0 [ 243.113371] i2c_mux_del_adapters+0xa0/0x100 [ 243.117649] pca954x_cleanup+0x98/0xd0 [ 243.121406] pca954x_remove+0x38/0x50 [ 243.125078] i2c_device_remove+0x34/0xb8 [ 243.129007] device_remove+0x54/0x90 [ 243.132590] device_release_driver_internal+0x1e8/0x250 [ 243.137824] device_release_driver+0x20/0x38 [ 243.142101] bus_remove_device+0xd4/0x120 [ 243.146116] device_del+0x14c/0x410 [ 243.149612] device_unregister+0x20/0x48 [ 243.153540] i2c_unregister_device.part.0+0x50/0x88 [ 243.158427] __unregister_client+0x74/0x80 [ 243.162530] device_for_each_child+0x68/0xc8 [ 243.166811] i2c_del_adapter+0xb8/0x1b0 [ 243.170653] i2c_imx_remove+0x4c/0x190 [ 243.174412] platform_remove+0x30/0x58 [ 243.178167] device_remove+0x54/0x90 [ 243.181751] device_release_driver_internal+0x1e8/0x250 [ 243.186982] device_driver_detach+0x20/0x38 [ 243.191172] unbind_store+0xbc/0xc8 ... When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called recursively. The first call is for the 30a40000.i2c SoC adapter and the second one for an i2c mux connected on the i2c bus. My device-tree looks like this: i2c@30a40000 { compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c"; ... i2c-mux@70 { compatible = "nxp,pca9543"; ... i2c@0 { ... touchscreen@2a { compatible = "eeti,exc80h60"; ... }; }; i2c@1 { ... }; }; }; Should the core_lock mutex be taken when both __unregister_client() and __unregister_dummy() are called ? Best regards, Hervé Codina
On 04.02.2025 16:24, Herve Codina wrote: > Hi, > > Got a deadlock issue with this patch in v6.14-rc1. > > On Fri, 1 Nov 2024 23:09:51 +0100 > Heiner Kallweit <hkallweit1@gmail.com> wrote: > >> So far a list is used to track auto-detected clients per driver. >> The same functionality can be achieved much simpler by flagging >> auto-detected clients. >> >> Two notes regarding the usage of driver_for_each_device: >> In our case it can't fail, however the function is annotated __must_check. >> So a little workaround is needed to avoid a compiler warning. >> Then we may remove nodes from the list over which we iterate. >> This is safe, see the explanation at the beginning of lib/klist.c. >> >> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> >> --- >> v3: >> - protect client removal with core_lock mutex >> --- >> drivers/i2c/i2c-core-base.c | 52 ++++++++++++------------------------- >> include/linux/i2c.h | 3 +-- >> 2 files changed, 17 insertions(+), 38 deletions(-) >> > ... > >> @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap) >> * we can't remove the dummy devices during the first pass: they >> * could have been instantiated by real devices wishing to clean >> * them up properly, so we give them a chance to do that first. */ >> + mutex_lock(&core_lock); >> device_for_each_child(&adap->dev, NULL, __unregister_client); >> device_for_each_child(&adap->dev, NULL, __unregister_dummy); >> + mutex_unlock(&core_lock); >> > > Calling __unregister_client() with core_lock mutex held leads to a deadlock > in my case: > > # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind > [ 242.928264] > [ 242.929779] ============================================ > [ 242.935092] WARNING: possible recursive locking detected > [ 242.940406] 6.14.0-rc1+ #22 Not tainted > [ 242.944245] -------------------------------------------- > [ 242.949556] sh/299 is trying to acquire lock: > [ 242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0 > [ 242.961689] > [ 242.961689] but task is already holding lock: > [ 242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 > [ 242.975285] > [ 242.975285] other info that might help us debug this: > [ 242.981814] Possible unsafe locking scenario: > [ 242.981814] > [ 242.987732] CPU0 > [ 242.990179] ---- > [ 242.992625] lock(core_lock); > [ 242.995686] lock(core_lock); > [ 242.998748] > [ 242.998748] *** DEADLOCK *** > [ 242.998748] > [ 243.004666] May be due to missing lock nesting notation > [ 243.004666] > [ 243.011455] 5 locks held by sh/299: > [ 243.014946] #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398 > [ 243.022976] #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8 > [ 243.031962] #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 > [ 243.041645] #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0 > [ 243.049845] #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250 > [ 243.059522] > [ 243.059522] stack backtrace: > [ 243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22 > [ 243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT) > [ 243.077032] Call trace: > [ 243.079481] show_stack+0x20/0x38 (C) > [ 243.083152] dump_stack_lvl+0x90/0xd0 > [ 243.086819] dump_stack+0x18/0x28 > [ 243.090140] print_deadlock_bug+0x260/0x350 > [ 243.094332] __lock_acquire+0x113c/0x2180 > [ 243.098346] lock_acquire+0x1c4/0x350 > [ 243.102015] __mutex_lock+0x9c/0x500 > [ 243.105599] mutex_lock_nested+0x2c/0x40 > [ 243.109528] i2c_del_adapter+0x44/0x1b0 > [ 243.113371] i2c_mux_del_adapters+0xa0/0x100 > [ 243.117649] pca954x_cleanup+0x98/0xd0 > [ 243.121406] pca954x_remove+0x38/0x50 > [ 243.125078] i2c_device_remove+0x34/0xb8 > [ 243.129007] device_remove+0x54/0x90 > [ 243.132590] device_release_driver_internal+0x1e8/0x250 > [ 243.137824] device_release_driver+0x20/0x38 > [ 243.142101] bus_remove_device+0xd4/0x120 > [ 243.146116] device_del+0x14c/0x410 > [ 243.149612] device_unregister+0x20/0x48 > [ 243.153540] i2c_unregister_device.part.0+0x50/0x88 > [ 243.158427] __unregister_client+0x74/0x80 > [ 243.162530] device_for_each_child+0x68/0xc8 > [ 243.166811] i2c_del_adapter+0xb8/0x1b0 > [ 243.170653] i2c_imx_remove+0x4c/0x190 > [ 243.174412] platform_remove+0x30/0x58 > [ 243.178167] device_remove+0x54/0x90 > [ 243.181751] device_release_driver_internal+0x1e8/0x250 > [ 243.186982] device_driver_detach+0x20/0x38 > [ 243.191172] unbind_store+0xbc/0xc8 > ... > > When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called > recursively. The first call is for the 30a40000.i2c SoC adapter and the > second one for an i2c mux connected on the i2c bus. > > My device-tree looks like this: > i2c@30a40000 { > compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c"; > ... > i2c-mux@70 { > compatible = "nxp,pca9543"; > ... > i2c@0 { > ... > touchscreen@2a { > compatible = "eeti,exc80h60"; > ... > }; > }; > > i2c@1 { > ... > }; > }; > }; > > > Should the core_lock mutex be taken when both __unregister_client() and > __unregister_dummy() are called ? > Thanks for the report! I just submitted a fix, for now as RFC. If fixes the deadlock and uses a new approach to prevent the race which caused us to acquire the core lock in few place. Could you please re-test? > Best regards, > Hervé Codina Heiner
diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c index 7c810893b..42c62839d 100644 --- a/drivers/i2c/i2c-core-base.c +++ b/drivers/i2c/i2c-core-base.c @@ -1696,23 +1696,6 @@ int i2c_add_numbered_adapter(struct i2c_adapter *adap) } EXPORT_SYMBOL_GPL(i2c_add_numbered_adapter); -static void i2c_do_del_adapter(struct i2c_driver *driver, - struct i2c_adapter *adapter) -{ - struct i2c_client *client, *_n; - - /* Remove the devices we created ourselves as the result of hardware - * probing (using a driver's detect method) */ - list_for_each_entry_safe(client, _n, &driver->clients, detected) { - if (client->adapter == adapter) { - dev_dbg(&adapter->dev, "Removing %s at 0x%x\n", - client->name, client->addr); - list_del(&client->detected); - i2c_unregister_device(client); - } - } -} - static int __unregister_client(struct device *dev, void *dummy) { struct i2c_client *client = i2c_verify_client(dev); @@ -1728,12 +1711,6 @@ static int __unregister_dummy(struct device *dev, void *dummy) return 0; } -static int __process_removed_adapter(struct device_driver *d, void *data) -{ - i2c_do_del_adapter(to_i2c_driver(d), data); - return 0; -} - /** * i2c_del_adapter - unregister I2C adapter * @adap: the adapter being unregistered @@ -1757,11 +1734,6 @@ void i2c_del_adapter(struct i2c_adapter *adap) } i2c_acpi_remove_space_handler(adap); - /* Tell drivers about this removal */ - mutex_lock(&core_lock); - bus_for_each_drv(&i2c_bus_type, NULL, adap, - __process_removed_adapter); - mutex_unlock(&core_lock); /* Remove devices instantiated from sysfs */ mutex_lock_nested(&adap->userspace_clients_lock, @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap) * we can't remove the dummy devices during the first pass: they * could have been instantiated by real devices wishing to clean * them up properly, so we give them a chance to do that first. */ + mutex_lock(&core_lock); device_for_each_child(&adap->dev, NULL, __unregister_client); device_for_each_child(&adap->dev, NULL, __unregister_dummy); + mutex_unlock(&core_lock); /* device name is gone after device_unregister */ dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name); @@ -2001,7 +1975,6 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver) /* add the driver to the list of i2c drivers in the driver core */ driver->driver.owner = owner; driver->driver.bus = &i2c_bus_type; - INIT_LIST_HEAD(&driver->clients); /* When registration returns, the driver core * will have called probe() for all matching-but-unbound devices. @@ -2019,10 +1992,13 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver) } EXPORT_SYMBOL(i2c_register_driver); -static int __process_removed_driver(struct device *dev, void *data) +static int __i2c_unregister_detected_client(struct device *dev, void *argp) { - if (dev->type == &i2c_adapter_type) - i2c_do_del_adapter(data, to_i2c_adapter(dev)); + struct i2c_client *client = i2c_verify_client(dev); + + if (client && client->flags & I2C_CLIENT_AUTO) + i2c_unregister_device(client); + return 0; } @@ -2033,7 +2009,12 @@ static int __process_removed_driver(struct device *dev, void *data) */ void i2c_del_driver(struct i2c_driver *driver) { - i2c_for_each_dev(driver, __process_removed_driver); + mutex_lock(&core_lock); + /* Satisfy __must_check, function can't fail */ + if (driver_for_each_device(&driver->driver, NULL, NULL, + __i2c_unregister_detected_client)) { + } + mutex_unlock(&core_lock); driver_unregister(&driver->driver); pr_debug("driver [%s] unregistered\n", driver->driver.name); @@ -2460,6 +2441,7 @@ static int i2c_detect_address(struct i2c_client *temp_client, /* Finally call the custom detection function */ memset(&info, 0, sizeof(struct i2c_board_info)); info.addr = addr; + info.flags = I2C_CLIENT_AUTO; err = driver->detect(temp_client, &info); if (err) { /* -ENODEV is returned if the detection fails. We catch it @@ -2486,9 +2468,7 @@ static int i2c_detect_address(struct i2c_client *temp_client, dev_dbg(&adapter->dev, "Creating %s at 0x%02x\n", info.type, info.addr); client = i2c_new_client_device(adapter, &info); - if (!IS_ERR(client)) - list_add_tail(&client->detected, &driver->clients); - else + if (IS_ERR(client)) dev_err(&adapter->dev, "Failed creating %s at 0x%02x\n", info.type, info.addr); } diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 388ce71a2..c4c8e841a 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -244,7 +244,6 @@ enum i2c_driver_flags { * @id_table: List of I2C devices supported by this driver * @detect: Callback for device detection * @address_list: The I2C addresses to probe (for detect) - * @clients: List of detected clients we created (for i2c-core use only) * @flags: A bitmask of flags defined in &enum i2c_driver_flags * * The driver.owner field should be set to the module owner of this driver. @@ -299,7 +298,6 @@ struct i2c_driver { /* Device detection callback for automatic device creation */ int (*detect)(struct i2c_client *client, struct i2c_board_info *info); const unsigned short *address_list; - struct list_head clients; u32 flags; }; @@ -334,6 +332,7 @@ struct i2c_client { #define I2C_CLIENT_SLAVE 0x20 /* we are the slave */ #define I2C_CLIENT_HOST_NOTIFY 0x40 /* We want to use I2C host notify */ #define I2C_CLIENT_WAKE 0x80 /* for board_info; true iff can wake */ +#define I2C_CLIENT_AUTO 0x100 /* for board_info; auto-detected */ #define I2C_CLIENT_SCCB 0x9000 /* Use Omnivision SCCB protocol */ /* Must match I2C_M_STOP|IGNORE_NAK */
So far a list is used to track auto-detected clients per driver. The same functionality can be achieved much simpler by flagging auto-detected clients. Two notes regarding the usage of driver_for_each_device: In our case it can't fail, however the function is annotated __must_check. So a little workaround is needed to avoid a compiler warning. Then we may remove nodes from the list over which we iterate. This is safe, see the explanation at the beginning of lib/klist.c. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> --- v3: - protect client removal with core_lock mutex --- drivers/i2c/i2c-core-base.c | 52 ++++++++++++------------------------- include/linux/i2c.h | 3 +-- 2 files changed, 17 insertions(+), 38 deletions(-)