diff mbox series

gpiolib: tie module references to GPIO devices, not requested descs

Message ID 20230818190108.22031-1-brgl@bgdev.pl
State New
Headers show
Series gpiolib: tie module references to GPIO devices, not requested descs | expand

Commit Message

Bartosz Golaszewski Aug. 18, 2023, 7:01 p.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

After a deeper look at commit 3386fb86ecde ("gpiolib: fix reference
leaks when removing GPIO chips still in use") I'm now convinced that
gpiolib gets module reference counting wrong.

As we only take the reference to the owner module when a descriptor is
requested and put it when it's freed, we can easily trigger a crash by
removing a module which registered a driver bound to a GPIO chip which
is unused as nothing prevents us from doing so.

For correct behavior, we should take the reference to the module when
we're creating a GPIO device and only put it when that device is
released as it's at this point that we can safely remove the module's
code from memory.

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
 drivers/gpio/gpiolib.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

Comments

Andy Shevchenko Aug. 21, 2023, 9:43 a.m. UTC | #1
On Fri, Aug 18, 2023 at 09:01:08PM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> After a deeper look at commit 3386fb86ecde ("gpiolib: fix reference
> leaks when removing GPIO chips still in use") I'm now convinced that
> gpiolib gets module reference counting wrong.
> 
> As we only take the reference to the owner module when a descriptor is
> requested and put it when it's freed, we can easily trigger a crash by
> removing a module which registered a driver bound to a GPIO chip which
> is unused as nothing prevents us from doing so.
> 
> For correct behavior, we should take the reference to the module when
> we're creating a GPIO device and only put it when that device is
> released as it's at this point that we can safely remove the module's
> code from memory.

Two cases to consider:
1) legacy gpio_*() APIs, do they suppose to create a GPIO device?
2) IRQ request without GPIO being requested, is it the case?

Seems to me that the 1) is the case, while 2) is not.
Bartosz Golaszewski Aug. 21, 2023, 10 a.m. UTC | #2
On Mon, Aug 21, 2023 at 11:43 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Fri, Aug 18, 2023 at 09:01:08PM +0200, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > After a deeper look at commit 3386fb86ecde ("gpiolib: fix reference
> > leaks when removing GPIO chips still in use") I'm now convinced that
> > gpiolib gets module reference counting wrong.
> >
> > As we only take the reference to the owner module when a descriptor is
> > requested and put it when it's freed, we can easily trigger a crash by
> > removing a module which registered a driver bound to a GPIO chip which
> > is unused as nothing prevents us from doing so.
> >
> > For correct behavior, we should take the reference to the module when
> > we're creating a GPIO device and only put it when that device is
> > released as it's at this point that we can safely remove the module's
> > code from memory.
>
> Two cases to consider:
> 1) legacy gpio_*() APIs, do they suppose to create a GPIO device?

Legacy uses descriptors under the hood so there must be a GPIO device.

> 2) IRQ request without GPIO being requested, is it the case?

I need to double-check and also test this but it seems to me that
right now if you do this (request an irq from a GPIO irqchip), the
reference count of the module will not be increased. With this change
it will have already been at 1 until the GPIO device backing this irq
will go down. So it should actually fix another use-after-free bug.
But don't take my word for it, I will test it later when I have the
time.

There's another issue that will become visible with this patch -
namely the modules that register devices from their init functions,
will no longer allow unloading until the device is unbound first. This
is not wrong wrong as module's init is not the place to register
devices, platform or otherwise but I'm wondering if it counts as
breaking someone's setup?

Bart

>
> Seems to me that the 1) is the case, while 2) is not.
>
> --
> With Best Regards,
> Andy Shevchenko
>
>
Bartosz Golaszewski Aug. 24, 2023, 12:49 p.m. UTC | #3
On Mon, Aug 21, 2023 at 12:00 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> On Mon, Aug 21, 2023 at 11:43 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> >
> > On Fri, Aug 18, 2023 at 09:01:08PM +0200, Bartosz Golaszewski wrote:
> > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > >
> > > After a deeper look at commit 3386fb86ecde ("gpiolib: fix reference
> > > leaks when removing GPIO chips still in use") I'm now convinced that
> > > gpiolib gets module reference counting wrong.
> > >
> > > As we only take the reference to the owner module when a descriptor is
> > > requested and put it when it's freed, we can easily trigger a crash by
> > > removing a module which registered a driver bound to a GPIO chip which
> > > is unused as nothing prevents us from doing so.
> > >
> > > For correct behavior, we should take the reference to the module when
> > > we're creating a GPIO device and only put it when that device is
> > > released as it's at this point that we can safely remove the module's
> > > code from memory.
> >
> > Two cases to consider:
> > 1) legacy gpio_*() APIs, do they suppose to create a GPIO device?
>
> Legacy uses descriptors under the hood so there must be a GPIO device.
>
> > 2) IRQ request without GPIO being requested, is it the case?
>
> I need to double-check and also test this but it seems to me that
> right now if you do this (request an irq from a GPIO irqchip), the
> reference count of the module will not be increased. With this change
> it will have already been at 1 until the GPIO device backing this irq
> will go down. So it should actually fix another use-after-free bug.
> But don't take my word for it, I will test it later when I have the
> time.
>
> There's another issue that will become visible with this patch -
> namely the modules that register devices from their init functions,
> will no longer allow unloading until the device is unbound first. This
> is not wrong wrong as module's init is not the place to register
> devices, platform or otherwise but I'm wondering if it counts as
> breaking someone's setup?
>
> Bart
>

Ok so just checked in theory and verified in practice: with an irq
request orthogonal to the GPIO descriptor, when the GPIO device goes
down, it destroys the irq domain (side note: gpio-sim now finally
disposes of all existing mappings too which would have been the source
of an error here). When the user calls free_irq(), the underlying
irq_do_desc() calls mtree_load() which now returns NULL (mapping is
gone) and nothing happens.

This change doesn't change that behavior - you can still unbind the
GPIO device at any moment and the irq user will be fine.

The problem is: I can no longer reproduce the crash I saw in KASAN
with current next and I'm thinking I may have mistaken one of the bugs
I recently fixed for the culprit here. What I'm seeing now when a
module is unloaded is: driver gets unregistered, device gets unbound
and that's it, all works fine. So this patch and the libgpiod one may
have been pointless noise. :(

Taking the module reference only when there's a requested descriptor
is in line with what most other frameworks do as well.

I need more coffee but maybe at this point I should switch to
panzerschokolade...

Bart
diff mbox series

Patch

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 76e0c38026c3..cb0072d2d137 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -583,6 +583,7 @@  static void gpiodev_release(struct device *dev)
 	list_del(&gdev->list);
 	spin_unlock_irqrestore(&gpio_lock, flags);
 
+	module_put(gdev->owner);
 	ida_free(&gpio_ida, gdev->id);
 	kfree_const(gdev->label);
 	kfree(gdev->descs);
@@ -753,6 +754,10 @@  int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	else
 		gdev->owner = THIS_MODULE;
 
+	ret = try_module_get(gdev->owner);
+	if (!ret)
+		goto err_free_dev_name;
+
 	/*
 	 * Try the device properties if the driver didn't supply the number
 	 * of GPIO lines.
@@ -769,7 +774,7 @@  int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 			 */
 			ngpios = 0;
 		else if (ret)
-			goto err_free_dev_name;
+			goto err_put_module;
 
 		gc->ngpio = ngpios;
 	}
@@ -777,7 +782,7 @@  int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	if (gc->ngpio == 0) {
 		chip_err(gc, "tried to insert a GPIO chip with zero lines\n");
 		ret = -EINVAL;
-		goto err_free_dev_name;
+		goto err_put_module;
 	}
 
 	if (gc->ngpio > FASTPATH_NGPIO)
@@ -787,7 +792,7 @@  int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	gdev->descs = kcalloc(gc->ngpio, sizeof(*gdev->descs), GFP_KERNEL);
 	if (!gdev->descs) {
 		ret = -ENOMEM;
-		goto err_free_dev_name;
+		goto err_put_module;
 	}
 
 	gdev->label = kstrdup_const(gc->label ?: "unknown", GFP_KERNEL);
@@ -937,6 +942,8 @@  int gpiochip_add_data_with_key(struct gpio_chip *gc, void *data,
 	kfree_const(gdev->label);
 err_free_descs:
 	kfree(gdev->descs);
+err_put_module:
+	module_put(gdev->owner);
 err_free_dev_name:
 	kfree(dev_name(&gdev->dev));
 err_free_ida:
@@ -2101,20 +2108,16 @@  static int validate_desc(const struct gpio_desc *desc, const char *func)
 
 int gpiod_request(struct gpio_desc *desc, const char *label)
 {
-	int ret = -EPROBE_DEFER;
+	int ret;
 
 	VALIDATE_DESC(desc);
 
-	if (try_module_get(desc->gdev->owner)) {
-		ret = gpiod_request_commit(desc, label);
-		if (ret)
-			module_put(desc->gdev->owner);
-		else
-			gpio_device_get(desc->gdev);
-	}
-
+	ret = gpiod_request_commit(desc, label);
 	if (ret)
-		gpiod_dbg(desc, "%s: status %d\n", __func__, ret);
+		return ret;
+
+	gpio_device_get(desc->gdev);
+	gpiod_dbg(desc, "%s: status %d\n", __func__, ret);
 
 	return ret;
 }
@@ -2177,7 +2180,6 @@  void gpiod_free(struct gpio_desc *desc)
 	if (!gpiod_free_commit(desc))
 		WARN_ON(extra_checks);
 
-	module_put(desc->gdev->owner);
 	gpio_device_put(desc->gdev);
 }