diff mbox series

[v3,2/4] i2c: Replace list-based mechanism for handling auto-detected clients

Message ID ad839a82-8694-4f99-b1c1-0ee53c9d40cf@gmail.com
State New
Headers show
Series i2c: Replace lists of special clients with flagging of such clients | expand

Commit Message

Heiner Kallweit Nov. 1, 2024, 10:09 p.m. UTC
So far a list is used to track auto-detected clients per driver.
The same functionality can be achieved much simpler by flagging
auto-detected clients.

Two notes regarding the usage of driver_for_each_device:
In our case it can't fail, however the function is annotated __must_check.
So a little workaround is needed to avoid a compiler warning.
Then we may remove nodes from the list over which we iterate.
This is safe, see the explanation at the beginning of lib/klist.c.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
v3:
- protect client removal with core_lock mutex
---
 drivers/i2c/i2c-core-base.c | 52 ++++++++++++-------------------------
 include/linux/i2c.h         |  3 +--
 2 files changed, 17 insertions(+), 38 deletions(-)

Comments

Wolfram Sang Jan. 3, 2025, 12:29 p.m. UTC | #1
On Fri, Nov 01, 2024 at 11:09:51PM +0100, Heiner Kallweit wrote:
> So far a list is used to track auto-detected clients per driver.
> The same functionality can be achieved much simpler by flagging
> auto-detected clients.
> 
> Two notes regarding the usage of driver_for_each_device:
> In our case it can't fail, however the function is annotated __must_check.
> So a little workaround is needed to avoid a compiler warning.
> Then we may remove nodes from the list over which we iterate.
> This is safe, see the explanation at the beginning of lib/klist.c.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Applied to for-next, thanks! One minor edit:

> +#define I2C_CLIENT_AUTO		0x100	/* for board_info; auto-detected */

This is not for 'board_info'. I changed it to "client was
auto-detected".
Herve Codina Feb. 4, 2025, 3:24 p.m. UTC | #2
Hi,

Got a deadlock issue with this patch in v6.14-rc1.

On Fri, 1 Nov 2024 23:09:51 +0100
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> So far a list is used to track auto-detected clients per driver.
> The same functionality can be achieved much simpler by flagging
> auto-detected clients.
> 
> Two notes regarding the usage of driver_for_each_device:
> In our case it can't fail, however the function is annotated __must_check.
> So a little workaround is needed to avoid a compiler warning.
> Then we may remove nodes from the list over which we iterate.
> This is safe, see the explanation at the beginning of lib/klist.c.
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> v3:
> - protect client removal with core_lock mutex
> ---
>  drivers/i2c/i2c-core-base.c | 52 ++++++++++++-------------------------
>  include/linux/i2c.h         |  3 +--
>  2 files changed, 17 insertions(+), 38 deletions(-)
> 
...

> @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap)
>  	 * we can't remove the dummy devices during the first pass: they
>  	 * could have been instantiated by real devices wishing to clean
>  	 * them up properly, so we give them a chance to do that first. */
> +	mutex_lock(&core_lock);
>  	device_for_each_child(&adap->dev, NULL, __unregister_client);
>  	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
> +	mutex_unlock(&core_lock);
>  

Calling __unregister_client() with core_lock mutex held leads to a deadlock
in my case:

    # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
    [  242.928264] 
    [  242.929779] ============================================
    [  242.935092] WARNING: possible recursive locking detected
    [  242.940406] 6.14.0-rc1+ #22 Not tainted
    [  242.944245] --------------------------------------------
    [  242.949556] sh/299 is trying to acquire lock:
    [  242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0
    [  242.961689] 
    [  242.961689] but task is already holding lock:
    [  242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
    [  242.975285] 
    [  242.975285] other info that might help us debug this:
    [  242.981814]  Possible unsafe locking scenario:
    [  242.981814] 
    [  242.987732]        CPU0
    [  242.990179]        ----
    [  242.992625]   lock(core_lock);
    [  242.995686]   lock(core_lock);
    [  242.998748] 
    [  242.998748]  *** DEADLOCK ***
    [  242.998748] 
    [  243.004666]  May be due to missing lock nesting notation
    [  243.004666] 
    [  243.011455] 5 locks held by sh/299:
    [  243.014946]  #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398
    [  243.022976]  #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8
    [  243.031962]  #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
    [  243.041645]  #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
    [  243.049845]  #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
    [  243.059522] 
    [  243.059522] stack backtrace:
    [  243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22
    [  243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT)
    [  243.077032] Call trace:
    [  243.079481]  show_stack+0x20/0x38 (C)
    [  243.083152]  dump_stack_lvl+0x90/0xd0
    [  243.086819]  dump_stack+0x18/0x28
    [  243.090140]  print_deadlock_bug+0x260/0x350
    [  243.094332]  __lock_acquire+0x113c/0x2180
    [  243.098346]  lock_acquire+0x1c4/0x350
    [  243.102015]  __mutex_lock+0x9c/0x500
    [  243.105599]  mutex_lock_nested+0x2c/0x40
    [  243.109528]  i2c_del_adapter+0x44/0x1b0
    [  243.113371]  i2c_mux_del_adapters+0xa0/0x100
    [  243.117649]  pca954x_cleanup+0x98/0xd0
    [  243.121406]  pca954x_remove+0x38/0x50
    [  243.125078]  i2c_device_remove+0x34/0xb8
    [  243.129007]  device_remove+0x54/0x90
    [  243.132590]  device_release_driver_internal+0x1e8/0x250
    [  243.137824]  device_release_driver+0x20/0x38
    [  243.142101]  bus_remove_device+0xd4/0x120
    [  243.146116]  device_del+0x14c/0x410
    [  243.149612]  device_unregister+0x20/0x48
    [  243.153540]  i2c_unregister_device.part.0+0x50/0x88
    [  243.158427]  __unregister_client+0x74/0x80
    [  243.162530]  device_for_each_child+0x68/0xc8
    [  243.166811]  i2c_del_adapter+0xb8/0x1b0
    [  243.170653]  i2c_imx_remove+0x4c/0x190
    [  243.174412]  platform_remove+0x30/0x58
    [  243.178167]  device_remove+0x54/0x90
    [  243.181751]  device_release_driver_internal+0x1e8/0x250
    [  243.186982]  device_driver_detach+0x20/0x38
    [  243.191172]  unbind_store+0xbc/0xc8
    ...

When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called
recursively. The first call is for the 30a40000.i2c SoC adapter and the
second one for an i2c mux connected on the i2c bus.

My device-tree looks like this:
	i2c@30a40000 {
		compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c";
		...
		i2c-mux@70 {
			compatible = "nxp,pca9543";
			...
			i2c@0 {
				...
				touchscreen@2a {
					compatible = "eeti,exc80h60";
					...
				};
			};
			
			i2c@1 {
				...
			};
		};
	};


Should the core_lock mutex be taken when both __unregister_client() and
__unregister_dummy() are called ?

Best regards,
Hervé Codina
Heiner Kallweit Feb. 4, 2025, 8 p.m. UTC | #3
On 04.02.2025 16:24, Herve Codina wrote:
> Hi,
> 
> Got a deadlock issue with this patch in v6.14-rc1.
> 
> On Fri, 1 Nov 2024 23:09:51 +0100
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> So far a list is used to track auto-detected clients per driver.
>> The same functionality can be achieved much simpler by flagging
>> auto-detected clients.
>>
>> Two notes regarding the usage of driver_for_each_device:
>> In our case it can't fail, however the function is annotated __must_check.
>> So a little workaround is needed to avoid a compiler warning.
>> Then we may remove nodes from the list over which we iterate.
>> This is safe, see the explanation at the beginning of lib/klist.c.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>> v3:
>> - protect client removal with core_lock mutex
>> ---
>>  drivers/i2c/i2c-core-base.c | 52 ++++++++++++-------------------------
>>  include/linux/i2c.h         |  3 +--
>>  2 files changed, 17 insertions(+), 38 deletions(-)
>>
> ...
> 
>> @@ -1780,8 +1752,10 @@ void i2c_del_adapter(struct i2c_adapter *adap)
>>  	 * we can't remove the dummy devices during the first pass: they
>>  	 * could have been instantiated by real devices wishing to clean
>>  	 * them up properly, so we give them a chance to do that first. */
>> +	mutex_lock(&core_lock);
>>  	device_for_each_child(&adap->dev, NULL, __unregister_client);
>>  	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
>> +	mutex_unlock(&core_lock);
>>  
> 
> Calling __unregister_client() with core_lock mutex held leads to a deadlock
> in my case:
> 
>     # echo 30a40000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
>     [  242.928264] 
>     [  242.929779] ============================================
>     [  242.935092] WARNING: possible recursive locking detected
>     [  242.940406] 6.14.0-rc1+ #22 Not tainted
>     [  242.944245] --------------------------------------------
>     [  242.949556] sh/299 is trying to acquire lock:
>     [  242.953915] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0x44/0x1b0
>     [  242.961689] 
>     [  242.961689] but task is already holding lock:
>     [  242.967524] ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
>     [  242.975285] 
>     [  242.975285] other info that might help us debug this:
>     [  242.981814]  Possible unsafe locking scenario:
>     [  242.981814] 
>     [  242.987732]        CPU0
>     [  242.990179]        ----
>     [  242.992625]   lock(core_lock);
>     [  242.995686]   lock(core_lock);
>     [  242.998748] 
>     [  242.998748]  *** DEADLOCK ***
>     [  242.998748] 
>     [  243.004666]  May be due to missing lock nesting notation
>     [  243.004666] 
>     [  243.011455] 5 locks held by sh/299:
>     [  243.014946]  #0: ffff000079a533f0 (sb_writers#6){.+.+}-{0:0}, at: vfs_write+0x1c4/0x398
>     [  243.022976]  #1: ffff000005c29088 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0xf8/0x1c8
>     [  243.031962]  #2: ffff000000c240f8 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
>     [  243.041645]  #3: ffff8000818b82e0 (core_lock){+.+.}-{4:4}, at: i2c_del_adapter+0xa0/0x1b0
>     [  243.049845]  #4: ffff000079f24908 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x48/0x250
>     [  243.059522] 
>     [  243.059522] stack backtrace:
>     [  243.063883] CPU: 2 UID: 0 PID: 299 Comm: sh Not tainted 6.14.0-rc1+ #22
>     [  243.070502] Hardware name: GE HealthCare Supernova Patient Hub v1 (DT)
>     [  243.077032] Call trace:
>     [  243.079481]  show_stack+0x20/0x38 (C)
>     [  243.083152]  dump_stack_lvl+0x90/0xd0
>     [  243.086819]  dump_stack+0x18/0x28
>     [  243.090140]  print_deadlock_bug+0x260/0x350
>     [  243.094332]  __lock_acquire+0x113c/0x2180
>     [  243.098346]  lock_acquire+0x1c4/0x350
>     [  243.102015]  __mutex_lock+0x9c/0x500
>     [  243.105599]  mutex_lock_nested+0x2c/0x40
>     [  243.109528]  i2c_del_adapter+0x44/0x1b0
>     [  243.113371]  i2c_mux_del_adapters+0xa0/0x100
>     [  243.117649]  pca954x_cleanup+0x98/0xd0
>     [  243.121406]  pca954x_remove+0x38/0x50
>     [  243.125078]  i2c_device_remove+0x34/0xb8
>     [  243.129007]  device_remove+0x54/0x90
>     [  243.132590]  device_release_driver_internal+0x1e8/0x250
>     [  243.137824]  device_release_driver+0x20/0x38
>     [  243.142101]  bus_remove_device+0xd4/0x120
>     [  243.146116]  device_del+0x14c/0x410
>     [  243.149612]  device_unregister+0x20/0x48
>     [  243.153540]  i2c_unregister_device.part.0+0x50/0x88
>     [  243.158427]  __unregister_client+0x74/0x80
>     [  243.162530]  device_for_each_child+0x68/0xc8
>     [  243.166811]  i2c_del_adapter+0xb8/0x1b0
>     [  243.170653]  i2c_imx_remove+0x4c/0x190
>     [  243.174412]  platform_remove+0x30/0x58
>     [  243.178167]  device_remove+0x54/0x90
>     [  243.181751]  device_release_driver_internal+0x1e8/0x250
>     [  243.186982]  device_driver_detach+0x20/0x38
>     [  243.191172]  unbind_store+0xbc/0xc8
>     ...
> 
> When I unbind the i2c SoC adapter driver, i2c_del_adapter() is indeed called
> recursively. The first call is for the 30a40000.i2c SoC adapter and the
> second one for an i2c mux connected on the i2c bus.
> 
> My device-tree looks like this:
> 	i2c@30a40000 {
> 		compatible = "fsl,imx8mp-i2c", "fsl,imx21-i2c";
> 		...
> 		i2c-mux@70 {
> 			compatible = "nxp,pca9543";
> 			...
> 			i2c@0 {
> 				...
> 				touchscreen@2a {
> 					compatible = "eeti,exc80h60";
> 					...
> 				};
> 			};
> 			
> 			i2c@1 {
> 				...
> 			};
> 		};
> 	};
> 
> 
> Should the core_lock mutex be taken when both __unregister_client() and
> __unregister_dummy() are called ?
> 

Thanks for the report! I just submitted a fix, for now as RFC.
If fixes the deadlock and uses a new approach to prevent the race
which caused us to acquire the core lock in few place.
Could you please re-test?

> Best regards,
> Hervé Codina

Heiner
diff mbox series

Patch

diff --git a/drivers/i2c/i2c-core-base.c b/drivers/i2c/i2c-core-base.c
index 7c810893b..42c62839d 100644
--- a/drivers/i2c/i2c-core-base.c
+++ b/drivers/i2c/i2c-core-base.c
@@ -1696,23 +1696,6 @@  int i2c_add_numbered_adapter(struct i2c_adapter *adap)
 }
 EXPORT_SYMBOL_GPL(i2c_add_numbered_adapter);
 
-static void i2c_do_del_adapter(struct i2c_driver *driver,
-			      struct i2c_adapter *adapter)
-{
-	struct i2c_client *client, *_n;
-
-	/* Remove the devices we created ourselves as the result of hardware
-	 * probing (using a driver's detect method) */
-	list_for_each_entry_safe(client, _n, &driver->clients, detected) {
-		if (client->adapter == adapter) {
-			dev_dbg(&adapter->dev, "Removing %s at 0x%x\n",
-				client->name, client->addr);
-			list_del(&client->detected);
-			i2c_unregister_device(client);
-		}
-	}
-}
-
 static int __unregister_client(struct device *dev, void *dummy)
 {
 	struct i2c_client *client = i2c_verify_client(dev);
@@ -1728,12 +1711,6 @@  static int __unregister_dummy(struct device *dev, void *dummy)
 	return 0;
 }
 
-static int __process_removed_adapter(struct device_driver *d, void *data)
-{
-	i2c_do_del_adapter(to_i2c_driver(d), data);
-	return 0;
-}
-
 /**
  * i2c_del_adapter - unregister I2C adapter
  * @adap: the adapter being unregistered
@@ -1757,11 +1734,6 @@  void i2c_del_adapter(struct i2c_adapter *adap)
 	}
 
 	i2c_acpi_remove_space_handler(adap);
-	/* Tell drivers about this removal */
-	mutex_lock(&core_lock);
-	bus_for_each_drv(&i2c_bus_type, NULL, adap,
-			       __process_removed_adapter);
-	mutex_unlock(&core_lock);
 
 	/* Remove devices instantiated from sysfs */
 	mutex_lock_nested(&adap->userspace_clients_lock,
@@ -1780,8 +1752,10 @@  void i2c_del_adapter(struct i2c_adapter *adap)
 	 * we can't remove the dummy devices during the first pass: they
 	 * could have been instantiated by real devices wishing to clean
 	 * them up properly, so we give them a chance to do that first. */
+	mutex_lock(&core_lock);
 	device_for_each_child(&adap->dev, NULL, __unregister_client);
 	device_for_each_child(&adap->dev, NULL, __unregister_dummy);
+	mutex_unlock(&core_lock);
 
 	/* device name is gone after device_unregister */
 	dev_dbg(&adap->dev, "adapter [%s] unregistered\n", adap->name);
@@ -2001,7 +1975,6 @@  int i2c_register_driver(struct module *owner, struct i2c_driver *driver)
 	/* add the driver to the list of i2c drivers in the driver core */
 	driver->driver.owner = owner;
 	driver->driver.bus = &i2c_bus_type;
-	INIT_LIST_HEAD(&driver->clients);
 
 	/* When registration returns, the driver core
 	 * will have called probe() for all matching-but-unbound devices.
@@ -2019,10 +1992,13 @@  int i2c_register_driver(struct module *owner, struct i2c_driver *driver)
 }
 EXPORT_SYMBOL(i2c_register_driver);
 
-static int __process_removed_driver(struct device *dev, void *data)
+static int __i2c_unregister_detected_client(struct device *dev, void *argp)
 {
-	if (dev->type == &i2c_adapter_type)
-		i2c_do_del_adapter(data, to_i2c_adapter(dev));
+	struct i2c_client *client = i2c_verify_client(dev);
+
+	if (client && client->flags & I2C_CLIENT_AUTO)
+		i2c_unregister_device(client);
+
 	return 0;
 }
 
@@ -2033,7 +2009,12 @@  static int __process_removed_driver(struct device *dev, void *data)
  */
 void i2c_del_driver(struct i2c_driver *driver)
 {
-	i2c_for_each_dev(driver, __process_removed_driver);
+	mutex_lock(&core_lock);
+	/* Satisfy __must_check, function can't fail */
+	if (driver_for_each_device(&driver->driver, NULL, NULL,
+				   __i2c_unregister_detected_client)) {
+	}
+	mutex_unlock(&core_lock);
 
 	driver_unregister(&driver->driver);
 	pr_debug("driver [%s] unregistered\n", driver->driver.name);
@@ -2460,6 +2441,7 @@  static int i2c_detect_address(struct i2c_client *temp_client,
 	/* Finally call the custom detection function */
 	memset(&info, 0, sizeof(struct i2c_board_info));
 	info.addr = addr;
+	info.flags = I2C_CLIENT_AUTO;
 	err = driver->detect(temp_client, &info);
 	if (err) {
 		/* -ENODEV is returned if the detection fails. We catch it
@@ -2486,9 +2468,7 @@  static int i2c_detect_address(struct i2c_client *temp_client,
 		dev_dbg(&adapter->dev, "Creating %s at 0x%02x\n",
 			info.type, info.addr);
 		client = i2c_new_client_device(adapter, &info);
-		if (!IS_ERR(client))
-			list_add_tail(&client->detected, &driver->clients);
-		else
+		if (IS_ERR(client))
 			dev_err(&adapter->dev, "Failed creating %s at 0x%02x\n",
 				info.type, info.addr);
 	}
diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index 388ce71a2..c4c8e841a 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -244,7 +244,6 @@  enum i2c_driver_flags {
  * @id_table: List of I2C devices supported by this driver
  * @detect: Callback for device detection
  * @address_list: The I2C addresses to probe (for detect)
- * @clients: List of detected clients we created (for i2c-core use only)
  * @flags: A bitmask of flags defined in &enum i2c_driver_flags
  *
  * The driver.owner field should be set to the module owner of this driver.
@@ -299,7 +298,6 @@  struct i2c_driver {
 	/* Device detection callback for automatic device creation */
 	int (*detect)(struct i2c_client *client, struct i2c_board_info *info);
 	const unsigned short *address_list;
-	struct list_head clients;
 
 	u32 flags;
 };
@@ -334,6 +332,7 @@  struct i2c_client {
 #define I2C_CLIENT_SLAVE	0x20	/* we are the slave */
 #define I2C_CLIENT_HOST_NOTIFY	0x40	/* We want to use I2C host notify */
 #define I2C_CLIENT_WAKE		0x80	/* for board_info; true iff can wake */
+#define I2C_CLIENT_AUTO		0x100	/* for board_info; auto-detected */
 #define I2C_CLIENT_SCCB		0x9000	/* Use Omnivision SCCB protocol */
 					/* Must match I2C_M_STOP|IGNORE_NAK */