diff mbox series

[v3] gpiolib: fix reference leaks when removing GPIO chips still in use

Message ID 20230811193034.59124-1-brgl@bgdev.pl
State New
Headers show
Series [v3] gpiolib: fix reference leaks when removing GPIO chips still in use | expand

Commit Message

Bartosz Golaszewski Aug. 11, 2023, 7:30 p.m. UTC
From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

After we remove a GPIO chip that still has some requested descriptors,
gpiod_free_commit() will fail and we will never put the references to the
GPIO device and the owning module in gpiod_free().

Rework this function to:
- not warn on desc == NULL as this is a use-case on which most free
  functions silently return
- put the references to desc->gdev and desc->gdev->owner unconditionally
  so that the release callback actually gets called when the remaining
  references are dropped by external GPIO users

Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
v1 -> v2:
- add a comment about why we can't use VALIDATE_DESC_VOID()

v2 -> v3:
- we must drop the reference to the owner module before we drop the one
  to the gpio_device as the latter may be removed if this is the last
  reference and we'll end up calling module_put() on freed memory

 drivers/gpio/gpiolib.c | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

Comments

Linus Walleij Aug. 15, 2023, 8:25 a.m. UTC | #1
On Fri, Aug 11, 2023 at 9:30 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:

> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
>
> After we remove a GPIO chip that still has some requested descriptors,
> gpiod_free_commit() will fail and we will never put the references to the
> GPIO device and the owning module in gpiod_free().
>
> Rework this function to:
> - not warn on desc == NULL as this is a use-case on which most free
>   functions silently return
> - put the references to desc->gdev and desc->gdev->owner unconditionally
>   so that the release callback actually gets called when the remaining
>   references are dropped by external GPIO users
>
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
> v1 -> v2:
> - add a comment about why we can't use VALIDATE_DESC_VOID()
>
> v2 -> v3:
> - we must drop the reference to the owner module before we drop the one
>   to the gpio_device as the latter may be removed if this is the last
>   reference and we'll end up calling module_put() on freed memory

v3 looks good to me.
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
Andy Shevchenko Aug. 15, 2023, 9:49 a.m. UTC | #2
On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> After we remove a GPIO chip that still has some requested descriptors,
> gpiod_free_commit() will fail and we will never put the references to the
> GPIO device and the owning module in gpiod_free().
> 
> Rework this function to:
> - not warn on desc == NULL as this is a use-case on which most free
>   functions silently return
> - put the references to desc->gdev and desc->gdev->owner unconditionally
>   so that the release callback actually gets called when the remaining
>   references are dropped by external GPIO users

...

> -	if (desc && desc->gdev && gpiod_free_commit(desc)) {

The commit message doesn't explain disappearing of gdev check.

> -		module_put(desc->gdev->owner);
> -		gpio_device_put(desc->gdev);
> -	} else {
> +	/*
> +	 * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
> +	 * may already be NULL but we still want to put the references.
> +	 */
> +	if (!desc)
> +		return;
> +
> +	if (!gpiod_free_commit(desc))
>  		WARN_ON(extra_checks);
> -	}
> +
> +	module_put(desc->gdev->owner);
> +	gpio_device_put(desc->gdev);
>  }

So, if gdev can be NULL, you will get an Oops with new code.

To keep a status quo this needs to be rewritten.
Linus Walleij Aug. 15, 2023, 11:40 a.m. UTC | #3
On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
> > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> >
> > After we remove a GPIO chip that still has some requested descriptors,
> > gpiod_free_commit() will fail and we will never put the references to the
> > GPIO device and the owning module in gpiod_free().
> >
> > Rework this function to:
> > - not warn on desc == NULL as this is a use-case on which most free
> >   functions silently return
> > - put the references to desc->gdev and desc->gdev->owner unconditionally
> >   so that the release callback actually gets called when the remaining
> >   references are dropped by external GPIO users
>
> ...
>
> > -     if (desc && desc->gdev && gpiod_free_commit(desc)) {
>
> The commit message doesn't explain disappearing of gdev check.
>
> > -             module_put(desc->gdev->owner);
> > -             gpio_device_put(desc->gdev);
> > -     } else {
> > +     /*
> > +      * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
> > +      * may already be NULL but we still want to put the references.
> > +      */
> > +     if (!desc)
> > +             return;
> > +
> > +     if (!gpiod_free_commit(desc))
> >               WARN_ON(extra_checks);
> > -     }
> > +
> > +     module_put(desc->gdev->owner);
> > +     gpio_device_put(desc->gdev);
> >  }
>
> So, if gdev can be NULL, you will get an Oops with new code.

I read it such that gdev->chip can be NULL, but not gdev,
and desc->gdev->owner is fine to reference?

Yours,
Linus Walleij
Andy Shevchenko Aug. 15, 2023, 12:57 p.m. UTC | #4
On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
> > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > >
> > > After we remove a GPIO chip that still has some requested descriptors,
> > > gpiod_free_commit() will fail and we will never put the references to the
> > > GPIO device and the owning module in gpiod_free().
> > >
> > > Rework this function to:
> > > - not warn on desc == NULL as this is a use-case on which most free
> > >   functions silently return
> > > - put the references to desc->gdev and desc->gdev->owner unconditionally
> > >   so that the release callback actually gets called when the remaining
> > >   references are dropped by external GPIO users

...

> > > -     if (desc && desc->gdev && gpiod_free_commit(desc)) {
> >
> > The commit message doesn't explain disappearing of gdev check.
> >
> > > -             module_put(desc->gdev->owner);
> > > -             gpio_device_put(desc->gdev);
> > > -     } else {
> > > +     /*
> > > +      * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
> > > +      * may already be NULL but we still want to put the references.
> > > +      */
> > > +     if (!desc)
> > > +             return;
> > > +
> > > +     if (!gpiod_free_commit(desc))
> > >               WARN_ON(extra_checks);
> > > -     }
> > > +
> > > +     module_put(desc->gdev->owner);
> > > +     gpio_device_put(desc->gdev);
> > >  }
> >
> > So, if gdev can be NULL, you will get an Oops with new code.
> 
> I read it such that gdev->chip can be NULL, but not gdev,
> and desc->gdev->owner is fine to reference?

Basically the Q is
"if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
Linus Walleij Aug. 15, 2023, 1:07 p.m. UTC | #5
On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
> On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
> > > > From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> > > >
> > > > After we remove a GPIO chip that still has some requested descriptors,
> > > > gpiod_free_commit() will fail and we will never put the references to the
> > > > GPIO device and the owning module in gpiod_free().
> > > >
> > > > Rework this function to:
> > > > - not warn on desc == NULL as this is a use-case on which most free
> > > >   functions silently return
> > > > - put the references to desc->gdev and desc->gdev->owner unconditionally
> > > >   so that the release callback actually gets called when the remaining
> > > >   references are dropped by external GPIO users
>
> ...
>
> > > > -     if (desc && desc->gdev && gpiod_free_commit(desc)) {
> > >
> > > The commit message doesn't explain disappearing of gdev check.
> > >
> > > > -             module_put(desc->gdev->owner);
> > > > -             gpio_device_put(desc->gdev);
> > > > -     } else {
> > > > +     /*
> > > > +      * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
> > > > +      * may already be NULL but we still want to put the references.
> > > > +      */
> > > > +     if (!desc)
> > > > +             return;
> > > > +
> > > > +     if (!gpiod_free_commit(desc))
> > > >               WARN_ON(extra_checks);
> > > > -     }
> > > > +
> > > > +     module_put(desc->gdev->owner);
> > > > +     gpio_device_put(desc->gdev);
> > > >  }
> > >
> > > So, if gdev can be NULL, you will get an Oops with new code.
> >
> > I read it such that gdev->chip can be NULL, but not gdev,
> > and desc->gdev->owner is fine to reference?
>
> Basically the Q is
> "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"

gdev->desc is assigned in one single spot, which is in
gpiochip_add_data_with_key():

       for (i = 0; i < gc->ngpio; i++)
                gdev->descs[i].gdev = gdev;

It is never assigned anywhere else, so I guess yes.

We may also ask if it is ever invalid (i.e. if desc->gdev can point to
junk).

A gdev turns to junk when its reference count goes down to zero
and gpiodev_release() is called effectively calling kfree() on the
struct gpio_device *.

But that can only happen as a result of module_put() getting
called, pulling the references down to zero. Which is what we
are discussing. The line after module_put(), desc->gdev
*could* be NULL.

But then we just call gpio_device_put(desc->gdev) which is
just a call to device_put(), which is NULL-tolerant.

Yours,
Linus Walleij
Andy Shevchenko Aug. 15, 2023, 2:43 p.m. UTC | #6
On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote:
> On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko
> <andriy.shevchenko@linux.intel.com> wrote:
> > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> > > <andriy.shevchenko@linux.intel.com> wrote:
> > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:

...

> > > > > +     module_put(desc->gdev->owner);
> > > > > +     gpio_device_put(desc->gdev);
> > > >
> > > > So, if gdev can be NULL, you will get an Oops with new code.
> > >
> > > I read it such that gdev->chip can be NULL, but not gdev,
> > > and desc->gdev->owner is fine to reference?
> >
> > Basically the Q is
> > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
> 
> gdev->desc is assigned in one single spot, which is in
> gpiochip_add_data_with_key():
> 
>        for (i = 0; i < gc->ngpio; i++)
>                 gdev->descs[i].gdev = gdev;
> 
> It is never assigned anywhere else, so I guess yes.
> 
> We may also ask if it is ever invalid (i.e. if desc->gdev can point to
> junk).
> 
> A gdev turns to junk when its reference count goes down to zero
> and gpiodev_release() is called effectively calling kfree() on the
> struct gpio_device *.
> 
> But that can only happen as a result of module_put() getting
> called, pulling the references down to zero. Which is what we
> are discussing. The line after module_put(), desc->gdev
> *could* be NULL.

Yes.

> But then we just call gpio_device_put(desc->gdev) which is
> just a call to device_put(), which is NULL-tolerant.

But gpio_device_put() does not NULL tolerant.
So, oops in this line then.
Andy Shevchenko Aug. 15, 2023, 2:45 p.m. UTC | #7
On Tue, Aug 15, 2023 at 05:43:53PM +0300, Andy Shevchenko wrote:
> On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote:
> > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> > > > <andriy.shevchenko@linux.intel.com> wrote:
> > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:

...

> > > > > > +     module_put(desc->gdev->owner);
> > > > > > +     gpio_device_put(desc->gdev);
> > > > >
> > > > > So, if gdev can be NULL, you will get an Oops with new code.
> > > >
> > > > I read it such that gdev->chip can be NULL, but not gdev,
> > > > and desc->gdev->owner is fine to reference?
> > >
> > > Basically the Q is
> > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
> > 
> > gdev->desc is assigned in one single spot, which is in
> > gpiochip_add_data_with_key():
> > 
> >        for (i = 0; i < gc->ngpio; i++)
> >                 gdev->descs[i].gdev = gdev;
> > 
> > It is never assigned anywhere else, so I guess yes.
> > 
> > We may also ask if it is ever invalid (i.e. if desc->gdev can point to
> > junk).
> > 
> > A gdev turns to junk when its reference count goes down to zero
> > and gpiodev_release() is called effectively calling kfree() on the
> > struct gpio_device *.
> > 
> > But that can only happen as a result of module_put() getting
> > called, pulling the references down to zero. Which is what we
> > are discussing. The line after module_put(), desc->gdev
> > *could* be NULL.
> 
> Yes.
> 
> > But then we just call gpio_device_put(desc->gdev) which is
> > just a call to device_put(), which is NULL-tolerant.
> 
> But gpio_device_put() does not NULL tolerant.
> So, oops in this line then.

That said, this gpiod_free() function needs a lot of additional comments to
explain all this.
Bartosz Golaszewski Aug. 15, 2023, 6:36 p.m. UTC | #8
On Tue, Aug 15, 2023 at 4:43 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> On Tue, Aug 15, 2023 at 03:07:50PM +0200, Linus Walleij wrote:
> > On Tue, Aug 15, 2023 at 2:57 PM Andy Shevchenko
> > <andriy.shevchenko@linux.intel.com> wrote:
> > > On Tue, Aug 15, 2023 at 01:40:22PM +0200, Linus Walleij wrote:
> > > > On Tue, Aug 15, 2023 at 11:50 AM Andy Shevchenko
> > > > <andriy.shevchenko@linux.intel.com> wrote:
> > > > > On Fri, Aug 11, 2023 at 09:30:34PM +0200, Bartosz Golaszewski wrote:
>
> ...
>
> > > > > > +     module_put(desc->gdev->owner);
> > > > > > +     gpio_device_put(desc->gdev);
> > > > >
> > > > > So, if gdev can be NULL, you will get an Oops with new code.
> > > >
> > > > I read it such that gdev->chip can be NULL, but not gdev,
> > > > and desc->gdev->owner is fine to reference?
> > >
> > > Basically the Q is
> > > "if desc is non-NULL, does it guarantee that gdev is non-NULL either?"
> >
> > gdev->desc is assigned in one single spot, which is in
> > gpiochip_add_data_with_key():
> >
> >        for (i = 0; i < gc->ngpio; i++)
> >                 gdev->descs[i].gdev = gdev;
> >
> > It is never assigned anywhere else, so I guess yes.
> >
> > We may also ask if it is ever invalid (i.e. if desc->gdev can point to
> > junk).
> >
> > A gdev turns to junk when its reference count goes down to zero
> > and gpiodev_release() is called effectively calling kfree() on the
> > struct gpio_device *.
> >
> > But that can only happen as a result of module_put() getting
> > called, pulling the references down to zero. Which is what we
> > are discussing. The line after module_put(), desc->gdev
> > *could* be NULL.
>
> Yes.
>
> > But then we just call gpio_device_put(desc->gdev) which is
> > just a call to device_put(), which is NULL-tolerant.
>
> But gpio_device_put() does not NULL tolerant.
> So, oops in this line then.
>

No. struct gpio_device is reference counted and as long as we get the
reference counting right - the descriptor is guaranteed to hold it and
only put it when it itself is being destroyed. In other words:
desc->gdev cannot be NULL here and cannot be junk. If it is, then it's
a programming bug on our side and we do want to crash and burn so that
we can catch and fix it.

If you think more comments are needed here, feel free to add them or
I'll do it at a later time. I don't want to delay this patch any
longer as it's one of the issues we need to fix to make driver unbind
great again. Unless you see an issue with its logic, I want to queue
it tomorrow so that it gets built in next and send it upstream by the
end of this week.

Bart
Bartosz Golaszewski Aug. 16, 2023, 11:37 a.m. UTC | #9
On Fri, Aug 11, 2023 at 9:30 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
>
> After we remove a GPIO chip that still has some requested descriptors,
> gpiod_free_commit() will fail and we will never put the references to the
> GPIO device and the owning module in gpiod_free().
>
> Rework this function to:
> - not warn on desc == NULL as this is a use-case on which most free
>   functions silently return
> - put the references to desc->gdev and desc->gdev->owner unconditionally
>   so that the release callback actually gets called when the remaining
>   references are dropped by external GPIO users
>
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> ---
> v1 -> v2:
> - add a comment about why we can't use VALIDATE_DESC_VOID()
>
> v2 -> v3:
> - we must drop the reference to the owner module before we drop the one
>   to the gpio_device as the latter may be removed if this is the last
>   reference and we'll end up calling module_put() on freed memory
>
>  drivers/gpio/gpiolib.c | 16 +++++++++++-----
>  1 file changed, 11 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
> index 251c875b5c34..76e0c38026c3 100644
> --- a/drivers/gpio/gpiolib.c
> +++ b/drivers/gpio/gpiolib.c
> @@ -2167,12 +2167,18 @@ static bool gpiod_free_commit(struct gpio_desc *desc)
>
>  void gpiod_free(struct gpio_desc *desc)
>  {
> -       if (desc && desc->gdev && gpiod_free_commit(desc)) {
> -               module_put(desc->gdev->owner);
> -               gpio_device_put(desc->gdev);
> -       } else {
> +       /*
> +        * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
> +        * may already be NULL but we still want to put the references.
> +        */
> +       if (!desc)
> +               return;
> +
> +       if (!gpiod_free_commit(desc))
>                 WARN_ON(extra_checks);
> -       }
> +
> +       module_put(desc->gdev->owner);
> +       gpio_device_put(desc->gdev);
>  }
>
>  /**
> --
> 2.39.2
>

Queued for fixes.

Bart
diff mbox series

Patch

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 251c875b5c34..76e0c38026c3 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -2167,12 +2167,18 @@  static bool gpiod_free_commit(struct gpio_desc *desc)
 
 void gpiod_free(struct gpio_desc *desc)
 {
-	if (desc && desc->gdev && gpiod_free_commit(desc)) {
-		module_put(desc->gdev->owner);
-		gpio_device_put(desc->gdev);
-	} else {
+	/*
+	 * We must not use VALIDATE_DESC_VOID() as the underlying gdev->chip
+	 * may already be NULL but we still want to put the references.
+	 */
+	if (!desc)
+		return;
+
+	if (!gpiod_free_commit(desc))
 		WARN_ON(extra_checks);
-	}
+
+	module_put(desc->gdev->owner);
+	gpio_device_put(desc->gdev);
 }
 
 /**