lockdep: incorrect deadlock warning with two GPIO expanders

Message ID CAMpxmJW5WNF2zE+fjmpDxubAYA_A67iQhNGTvSAL-MD-6dFTCw@mail.gmail.com
State New
Headers show

Commit Message

Bartosz Golaszewski Sept. 12, 2016, 3:16 p.m.
+ Linus, Alexandre, linux-gpio

2016-09-12 14:09 GMT+02:00 Peter Zijlstra <peterz@infradead.org>:
>2016-09-12 13:51 GMT+02:00 Bartosz Golaszewski <bgolaszewski@baylibre.com>:

>> I'm trying to figure out a way of getting rid of an incorrect lockdep

>> deadlock warning, but the issue is not trivial.

>>

>> In our hardware an I2C multiplexer is controlled by a GPIO provided by

>> an expander. There's a second expander using the same device driver

>> (pca953x) on one of the I2C bus segments. The diagram below presents

>> the setup:

>>

>>                                                - - - - -

>>  -------             ---------  Bus segment 1 |         |

>> |       |           |         |---------------  Devices

>> |       | SCL/SDA   |         |               |         |

>> | Linux |-----------| I2C MUX |                - - - - -

>> |       |    |      |         | Bus segment 2

>> |       |    |      |         |-------------------

>>  -------     |       ---------                    |

>>              |           |                    - - - - -

>>         ------------     | MUX GPIO          |         |

>>        |            |    |                     Devices

>>        |    GPIO    |    |                   |         |

>>        | Expander 1 |----                     - - - - -

>>        |            |                             |

>>         ------------                              | SCL/SDA

>>                                                   |

>>                                              ------------

>>                                             |            |

>>                                             |    GPIO    |

>>                                             | Expander 2 |

>>                                             |            |

>>                                              ------------

>>

>> Lockdep prints a deadlock message when trying to set the direction or

>> get/set the value of any GPIO provided by the second expander.

>>

>> The reason for the warning is that we take the chip->i2c_lock in

>> pca953x_gpio_set/get_value() or pca953x_gpio_direction_input/output()

>> and then come right back to pca953x_gpio_set_value() when the GPIO mux

>> kicks in. The call path is as follows (starting from a fixed regulator

>> controlled by a GPIO provided by the second expander):

>>

>>   regulator_register()

>>   |_ _regulator_do_enable()

>>      |_ pca953x_gpio_set_value() <- mutex_lock

>>         |_ pca953x_write_single()

>>            |_ i2c_smbus_write_byte_data()

>>               |_ i2c_smbus_xfer()

>>                  |_ i2c_transfer()

>>                     |_ __i2c_transfer()

>>                        |_ i2c_mux_master_xfer()

>>                           |_ i2c_mux_gpio_select()

>>                              |_ i2c_mux_gpio_set()

>>                                 |_ pca953x_gpio_set_value() <- 2nd mutex_lock

>>

>> The locks actually protect different expanders, but lockdep doesn't

>> see this and says:

>>

>>   Possible unsafe locking scenario:

>>

>>         CPU0

>>         ----

>>    lock(&chip->i2c_lock);

>>    lock(&chip->i2c_lock);

>>

>>   *** DEADLOCK ***

>>

>>   May be due to missing lock nesting notation

>>

>> Using mutex_lock_nested(&chip->i2c_lock, SINGLE_DEPTH_NESTING) in

>> pca953x_gpio_get_value() and pca953x_gpio_direction_input/output()

>> helps for reading the values or setting the direction, but doesn't do

>> anything if used in pca953x_gpio_set_value() since we still end up

>> taking the lock of the same subclass again.

>>

>> It would require some nasty hacks to figure out that a GPIO is being

>> used by an I2C mux if we wanted to explicitly provide a different

>> sublass in this case, but that would not fix the culprit, since the

>> same problem would occur in other gpio drivers under similar

>> circumstances.

>>

>> It seems the problem is with the way lockdep works, but I lack the

>> knowledge to propose any solution.

>>

>> Any help & ideas are appreciated.

>

> So I'm entirely clueless on how the device model works let alone i2c

> and/or gpio. So I'm going to need some help as well. What's an SCL/SDA

> for instance?

>


SCL/SDA stands for I2C clock and data lines. It's not important for
the problem, I just used it to mark the physical layer between the SoC
and I2C multiplexer.

> So the 'problem' is that pca953x_probe()'s mutex_init() will collapse

> all mutexes it initializes into a single class. It assumes that the

> locking rules for all instances will be the same.

>

> This happens to not be true in this case.

>


Correct.

> The tricky part, and here I have absolutely no clue what so ever, is

> being able to tell at pca953x_probe() time that this is so.

>


AFAIK there is no clean way to tell that a GPIO is used by an I2C
multiplexer at probe time. Linus, Alexandre could you confirm?

> Once we can tell, at probe time, there are two different annotations we

> could use, depending on need.

>


My current workaround is the following patch:


It uses the fact that the two expanders we have are of different type
(pca9534 and pca9535). The id pointer points to per-chip device info
residing in .data which makes it suitable for mutex key.

I don't think such hack is suitable for mainline though.

> I suppose that theoretically you can keep nesting like that ad

> infinitum, but I also expect that its uncommon enough, and maybe not

> practical, to really nest like this -- seeing this is the first instance

> of such issues.

>

> In any case, can you tell at probe time? And how deep a nesting should

> we worry about?

>


In this case we would only need two classes - one for 'regular' GPIOs
and another for the GPIOs used to control an I2C multiplexer.

> Seeing how this lock is specific to the driver, and there is no generic

> infrastructure, I don't see how we could solve it other than on a

> per-driver basis.


I'm planning to convert this driver to using regmap and I'm afraid
we'll run into similar issue (given that regmap doesn't nest the
default mutex nor uses different classes).

Best regards,
Bartosz Golaszewski
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Linus Walleij Sept. 13, 2016, 12:29 p.m. | #1
On Mon, Sep 12, 2016 at 5:33 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Sep 12, 2016 at 05:16:14PM +0200, Bartosz Golaszewski wrote:


>> AFAIK there is no clean way to tell that a GPIO is used by an I2C

>> multiplexer at probe time. Linus, Alexandre could you confirm?


Nominally, the GPIO descriptors are just abstract resources such
as regulators or clocks, they can be used for a lot but just like
a clock, regulator, dma channel etc does not know who is using
it and for what, it does not know this, no.

> You cannot inspect the device tree while probing?


Of course it *can* but we would end up encoding a special
case every time something like this happens, tied to just
device tree, then another bolt-on for ACPI etc.

I have a hard time following the problem really, I'm
afraid I'm simply just not smart enough :(

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Walleij Sept. 15, 2016, 12:41 p.m. | #2
On Thu, Sep 15, 2016 at 9:51 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 13, 2016 at 02:29:24PM +0200, Linus Walleij wrote:

>> On Mon, Sep 12, 2016 at 5:33 PM, Peter Zijlstra <peterz@infradead.org> wrote:

>> > On Mon, Sep 12, 2016 at 05:16:14PM +0200, Bartosz Golaszewski wrote:

>>

>> >> AFAIK there is no clean way to tell that a GPIO is used by an I2C

>> >> multiplexer at probe time. Linus, Alexandre could you confirm?

>>

>> Nominally, the GPIO descriptors are just abstract resources such

>> as regulators or clocks, they can be used for a lot but just like

>> a clock, regulator, dma channel etc does not know who is using

>> it and for what, it does not know this, no.

>>

>> > You cannot inspect the device tree while probing?

>>

>> Of course it *can* but we would end up encoding a special

>> case every time something like this happens, tied to just

>> device tree, then another bolt-on for ACPI etc.

>>

>> I have a hard time following the problem really, I'm

>> afraid I'm simply just not smart enough :(

>

> Why would this be DT or ACPI specific? Linux itself has a tree/graph of

> all busses and devices right? That's what all this drivers/base/ stuff

> is on about.

>

> So can't you walk up that and see if you encounter the exact same driver

> again?

>

> Something like:

>

>         for (nr = 0, parent = dev->parent; parent; parent = parent->parent) {

>                 if (parent->device_driver == &pca953x_driver.driver)

>                         nr++;

>         }


Oh clever. Of course.

Bartosz can you try out this approach?

Yours,
Linus Walleij
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartosz Golaszewski Sept. 15, 2016, 1:20 p.m. | #3
2016-09-15 14:41 GMT+02:00 Linus Walleij <linus.walleij@linaro.org>:
> On Thu, Sep 15, 2016 at 9:51 AM, Peter Zijlstra <peterz@infradead.org> wrote:

>> On Tue, Sep 13, 2016 at 02:29:24PM +0200, Linus Walleij wrote:

>>> On Mon, Sep 12, 2016 at 5:33 PM, Peter Zijlstra <peterz@infradead.org> wrote:

>>> > On Mon, Sep 12, 2016 at 05:16:14PM +0200, Bartosz Golaszewski wrote:

>>>

>>> >> AFAIK there is no clean way to tell that a GPIO is used by an I2C

>>> >> multiplexer at probe time. Linus, Alexandre could you confirm?

>>>

>>> Nominally, the GPIO descriptors are just abstract resources such

>>> as regulators or clocks, they can be used for a lot but just like

>>> a clock, regulator, dma channel etc does not know who is using

>>> it and for what, it does not know this, no.

>>>

>>> > You cannot inspect the device tree while probing?

>>>

>>> Of course it *can* but we would end up encoding a special

>>> case every time something like this happens, tied to just

>>> device tree, then another bolt-on for ACPI etc.

>>>

>>> I have a hard time following the problem really, I'm

>>> afraid I'm simply just not smart enough :(

>>

>> Why would this be DT or ACPI specific? Linux itself has a tree/graph of

>> all busses and devices right? That's what all this drivers/base/ stuff

>> is on about.

>>

>> So can't you walk up that and see if you encounter the exact same driver

>> again?

>>

>> Something like:

>>

>>         for (nr = 0, parent = dev->parent; parent; parent = parent->parent) {

>>                 if (parent->device_driver == &pca953x_driver.driver)

>>                         nr++;

>>         }

>

> Oh clever. Of course.

>

> Bartosz can you try out this approach?

>


I think it may be more complicated than that, depending on the hw
topology, but the general idea seems reasonable. I'll try this.

Thanks,
Bartosz Golaszewski
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartosz Golaszewski Sept. 15, 2016, 3:23 p.m. | #4
2016-09-15 16:38 GMT+02:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Sep 15, 2016 at 04:08:52PM +0200, Bartosz Golaszewski wrote:

>> 2016-09-15 15:39 GMT+02:00 Peter Zijlstra <peterz@infradead.org>:

>

>> > In any case, if this fails, we can always punt and simply count the

>> > total number of instances of this driver on the system and go with that.

>> >

>>

>> But for __mutex_init() to work with the key argument you need to know

>> it at compile time, right?

>

> You can do something like:

>

>         mutex_init(&mutex);

>         lockdep_set_subclass(&mutex, nr);

>

> which will of course fail at runtime the moment nr >= 8, but is that

> really a concern?

>

> Equally you can do:

>

> static struct lock_class_key my_keys[NR];

>

>         mutex_init(&mutex);

>         BUG_ON(nr > NR);

>         lockdep_set_class(&mutex, my_keys + nr);

>

> and have a bigger limit.


Thanks, I was not aware of this API.

Best regards,
Bartosz Golaszewski
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartosz Golaszewski Sept. 16, 2016, 11:14 a.m. | #5
2016-09-16 12:56 GMT+02:00 Peter Zijlstra <peterz@infradead.org>:
> On Thu, Sep 15, 2016 at 03:39:22PM +0200, Peter Zijlstra wrote:

>

>> In any case, if this fails, we can always punt and simply count the

>> total number of instances of this driver on the system and go with that.

>

> One caveat with that approach is that if this is s module, someone doing

> rmmod, insmod a number of times gets tricky.


The whole thing is a bit more complicated - I'm trying to figure out
the code behind it, but the first expander has four parents whose
driver == pca953x, while the second has five of them. There is no
device name, but kobj->name corresponds with the i2c device
sub-directories on correct addresses and busses (1-0021 for the first
one and 4-0020 for the second on my board).

With both devices having pca953x parents I still can't tell one from
another. Still working on it.

Thanks,
Bartosz Golaszewski
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartosz Golaszewski Sept. 16, 2016, noon | #6
2016-09-16 13:14 GMT+02:00 Bartosz Golaszewski <bgolaszewski@baylibre.com>:
>

> The whole thing is a bit more complicated - I'm trying to figure out

> the code behind it, but the first expander has four parents whose

> driver == pca953x, while the second has five of them. There is no

> device name, but kobj->name corresponds with the i2c device

> sub-directories on correct addresses and busses (1-0021 for the first

> one and 4-0020 for the second on my board).


Actually nevermind my last e-mail - I did a braindead mistake when
obtaining logs.

Thanks,
Bartosz
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bartosz Golaszewski Sept. 16, 2016, 2:21 p.m. | #7
2016-09-15 14:41 GMT+02:00 Linus Walleij <linus.walleij@linaro.org>:
>

> Oh clever. Of course.

>

> Bartosz can you try out this approach?

>

> Yours,

> Linus Walleij


Hi Linus,

I submitted a patch reusing a routine already present in i2c.h that
allows to test if an i2c adapter's parent is also an i2c adapter,
which is the case for adapters behind a multiplexer.

Also Cc'ed linux-i2c and Wolfram Sang to let him take a look if the
API is used correctly.

Best regards,
Bartosz Golaszewski
--
To unsubscribe from this list: send the line "unsubscribe linux-gpio" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch hide | download patch | download mbox

diff --git a/drivers/gpio/gpio-pca953x.c b/drivers/gpio/gpio-pca953x.c
index 5e3be32..40b96bf 100644
--- a/drivers/gpio/gpio-pca953x.c
+++ b/drivers/gpio/gpio-pca953x.c
@@ -739,6 +739,7 @@  static const struct of_device_id pca953x_dt_ids[];
 static int pca953x_probe(struct i2c_client *client,
    const struct i2c_device_id *id)
 {
+ struct lock_class_key *lock_key = (struct lock_class_key *)id;
  struct pca953x_platform_data *pdata;
  struct pca953x_chip *chip;
  int irq_base = 0;
@@ -783,7 +784,7 @@  static int pca953x_probe(struct i2c_client *client,

  chip->chip_type = PCA_CHIP_TYPE(chip->driver_data);

- mutex_init(&chip->i2c_lock);
+ __mutex_init(&chip->i2c_lock, "chip->i2c_lock", lock_key);

  /* initialize cached registers from their original values.
  * we can't share this chip with another i2c master.