diff mbox series

[1/2] PM: runtime: Add a general runtime get sync operation to deal with usage counter

Message ID 20201109080938.4174745-2-zhangqilong3@huawei.com
State New
Headers show
Series Fix usage counter leak by adding a general sync ops | expand

Commit Message

Zhang Qilong Nov. 9, 2020, 8:09 a.m. UTC
In many case, we need to check return value of pm_runtime_get_sync, but
it brings a trouble to the usage counter processing. Many callers forget
to decrease the usage counter when it failed. It has been discussed a
lot[0][1]. So we add a function to deal with the usage counter for better
coding.

[0]https://lkml.org/lkml/2020/6/14/88
[1]https://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/
Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
---
 include/linux/pm_runtime.h | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

Comments

Rafael J. Wysocki Nov. 9, 2020, 12:59 p.m. UTC | #1
On Mon, Nov 9, 2020 at 9:05 AM Zhang Qilong <zhangqilong3@huawei.com> wrote:
>
> In many case, we need to check return value of pm_runtime_get_sync, but
> it brings a trouble to the usage counter processing. Many callers forget
> to decrease the usage counter when it failed. It has been discussed a
> lot[0][1]. So we add a function to deal with the usage counter for better
> coding.
>
> [0]https://lkml.org/lkml/2020/6/14/88
> [1]https://patchwork.ozlabs.org/project/linux-tegra/patch/20200520095148.10995-1-dinghao.liu@zju.edu.cn/
> Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> ---
>  include/linux/pm_runtime.h | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
>
> diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
> index 4b708f4e8eed..2b0af5b1dffd 100644
> --- a/include/linux/pm_runtime.h
> +++ b/include/linux/pm_runtime.h
> @@ -386,6 +386,38 @@ static inline int pm_runtime_get_sync(struct device *dev)
>         return __pm_runtime_resume(dev, RPM_GET_PUT);
>  }
>
> +/**
> + * gene_pm_runtime_get_sync - Bump up usage counter of a device and resume it.
> + * @dev: Target device.

The force argument is not documented.

> + *
> + * Increase runtime PM usage counter of @dev first, and carry out runtime-resume
> + * of it synchronously. If __pm_runtime_resume return negative value(device is in
> + * error state) or return positive value(the runtime of device is already active)
> + * with force is true, it need decrease the usage counter of the device when
> + * return.
> + *
> + * The possible return values of this function is zero or negative value.
> + * zero:
> + *    - it means success and the status will store the resume operation status
> + *      if needed, the runtime PM usage counter of @dev remains incremented.
> + * negative:
> + *    - it means failure and the runtime PM usage counter of @dev has been
> + *      decreased.
> + * positive:
> + *    - it means the runtime of the device is already active before that. If
> + *      caller set force to true, we still need to decrease the usage counter.

Why is this needed?

> + */
> +static inline int gene_pm_runtime_get_sync(struct device *dev, bool force)

The name is not really a good one and note that pm_runtime_get() has
the same problem as _get_sync() (ie. the usage counter is incremented
regardless of the return value).

> +{
> +       int ret = 0;
> +
> +       ret = __pm_runtime_resume(dev, RPM_GET_PUT);
> +       if (ret < 0 || (ret > 0 && force))
> +               pm_runtime_put_noidle(dev);
> +
> +       return ret;
> +}
> +
>  /**
>   * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0.
>   * @dev: Target device.
> --

Thanks!
Zhang Qilong Nov. 9, 2020, 1:24 p.m. UTC | #2
Hi
> 

> On Mon, Nov 9, 2020 at 9:05 AM Zhang Qilong <zhangqilong3@huawei.com>

> wrote:

> >

> > In many case, we need to check return value of pm_runtime_get_sync,

> > but it brings a trouble to the usage counter processing. Many callers

> > forget to decrease the usage counter when it failed. It has been

> > discussed a lot[0][1]. So we add a function to deal with the usage

> > counter for better coding.

> >

> > [0]https://lkml.org/lkml/2020/6/14/88

> > [1]https://patchwork.ozlabs.org/project/linux-tegra/patch/202005200951

> > 48.10995-1-dinghao.liu@zju.edu.cn/

> > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>

> > ---

> >  include/linux/pm_runtime.h | 32 ++++++++++++++++++++++++++++++++

> >  1 file changed, 32 insertions(+)

> >

> > diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h

> > index 4b708f4e8eed..2b0af5b1dffd 100644

> > --- a/include/linux/pm_runtime.h

> > +++ b/include/linux/pm_runtime.h

> > @@ -386,6 +386,38 @@ static inline int pm_runtime_get_sync(struct device

> *dev)

> >         return __pm_runtime_resume(dev, RPM_GET_PUT);  }

> >

> > +/**

> > + * gene_pm_runtime_get_sync - Bump up usage counter of a device and

> resume it.

> > + * @dev: Target device.

> 

> The force argument is not documented.


(1) Good catch, I will add it in next version.

> 

> > + *

> > + * Increase runtime PM usage counter of @dev first, and carry out

> > + runtime-resume

> > + * of it synchronously. If __pm_runtime_resume return negative

> > + value(device is in

> > + * error state) or return positive value(the runtime of device is

> > + already active)

> > + * with force is true, it need decrease the usage counter of the

> > + device when

> > + * return.

> > + *

> > + * The possible return values of this function is zero or negative value.

> > + * zero:

> > + *    - it means success and the status will store the resume operation

> status

> > + *      if needed, the runtime PM usage counter of @dev remains

> incremented.

> > + * negative:

> > + *    - it means failure and the runtime PM usage counter of @dev has

> been

> > + *      decreased.

> > + * positive:

> > + *    - it means the runtime of the device is already active before that. If

> > + *      caller set force to true, we still need to decrease the usage

> counter.

> 

> Why is this needed?


(2) If caller set force, it means caller will return even the device has already been active
(__pm_runtime_resume return positive value) after calling gene_pm_runtime_get_sync,
we still need to decrease the usage count.

> 

> > + */

> > +static inline int gene_pm_runtime_get_sync(struct device *dev, bool

> > +force)

> 

> The name is not really a good one and note that pm_runtime_get() has the

> same problem as _get_sync() (ie. the usage counter is incremented regardless

> of the return value).

> 


(3) I have not thought a good name now, if you have good ideas, welcome.


Thanks, 
Zhang

> > +{

> > +       int ret = 0;

> > +

> > +       ret = __pm_runtime_resume(dev, RPM_GET_PUT);

> > +       if (ret < 0 || (ret > 0 && force))

> > +               pm_runtime_put_noidle(dev);

> > +

> > +       return ret;

> > +}

> > +

> >  /**

> >   * pm_runtime_put - Drop device usage counter and queue up "idle check"

> if 0.

> >   * @dev: Target device.

> > --

> 

> Thanks!
Rafael J. Wysocki Nov. 9, 2020, 2:07 p.m. UTC | #3
On Mon, Nov 9, 2020 at 2:46 PM zhangqilong <zhangqilong3@huawei.com> wrote:
>
> Hi,
>
> >
> > On Mon, Nov 9, 2020 at 2:24 PM zhangqilong <zhangqilong3@huawei.com>
> > wrote:
> > >
> > > Hi
> > > >
> > > > On Mon, Nov 9, 2020 at 9:05 AM Zhang Qilong
> > > > <zhangqilong3@huawei.com>
> > > > wrote:
> > > > >
> > > > > In many case, we need to check return value of
> > > > > pm_runtime_get_sync, but it brings a trouble to the usage counter
> > > > > processing. Many callers forget to decrease the usage counter when
> > > > > it failed. It has been discussed a lot[0][1]. So we add a function
> > > > > to deal with the usage counter for better coding.
> > > > >
> > > > > [0]https://lkml.org/lkml/2020/6/14/88
> > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/patch/20200520
> > > > > 0951 48.10995-1-dinghao.liu@zju.edu.cn/
> > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>
> > > > > ---
> > > > >  include/linux/pm_runtime.h | 32
> > ++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 32 insertions(+)
> > > > >
> > > > > diff --git a/include/linux/pm_runtime.h
> > > > > b/include/linux/pm_runtime.h index 4b708f4e8eed..2b0af5b1dffd
> > > > > 100644
> > > > > --- a/include/linux/pm_runtime.h
> > > > > +++ b/include/linux/pm_runtime.h
> > > > > @@ -386,6 +386,38 @@ static inline int pm_runtime_get_sync(struct
> > > > > device
> > > > *dev)
> > > > >         return __pm_runtime_resume(dev, RPM_GET_PUT);  }
> > > > >
> > > > > +/**
> > > > > + * gene_pm_runtime_get_sync - Bump up usage counter of a device
> > > > > +and
> > > > resume it.
> > > > > + * @dev: Target device.
> > > >
> > > > The force argument is not documented.
> > >
> > > (1) Good catch, I will add it in next version.
> > >
> > > >
> > > > > + *
> > > > > + * Increase runtime PM usage counter of @dev first, and carry out
> > > > > + runtime-resume
> > > > > + * of it synchronously. If __pm_runtime_resume return negative
> > > > > + value(device is in
> > > > > + * error state) or return positive value(the runtime of device is
> > > > > + already active)
> > > > > + * with force is true, it need decrease the usage counter of the
> > > > > + device when
> > > > > + * return.
> > > > > + *
> > > > > + * The possible return values of this function is zero or negative value.
> > > > > + * zero:
> > > > > + *    - it means success and the status will store the resume operation
> > > > status
> > > > > + *      if needed, the runtime PM usage counter of @dev remains
> > > > incremented.
> > > > > + * negative:
> > > > > + *    - it means failure and the runtime PM usage counter of @dev has
> > > > been
> > > > > + *      decreased.
> > > > > + * positive:
> > > > > + *    - it means the runtime of the device is already active before that.
> > If
> > > > > + *      caller set force to true, we still need to decrease the usage
> > > > counter.
> > > >
> > > > Why is this needed?
> > >
> > > (2) If caller set force, it means caller will return even the device
> > > has already been active (__pm_runtime_resume return positive value)
> > > after calling gene_pm_runtime_get_sync, we still need to decrease the
> > usage count.
> >
> > But who needs this?
> >
> > I don't think that it is a good idea to complicate the API this way.
>
> The callers like:
> ret = pm_runtime_get_sync(dev);
> if (ret) {
>         ...
>         return (xxx);
> }

Which isn't correct really, is it?

If ret is greater than 0, the error should not be returned in the
first place, so you may want the new wrapper to return zero in that
case instead.

> drivers/spi/spi-img-spfi.c:734 img_spfi_resume() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/mfd/arizona-core.c:49 arizona_clk32k_enable() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/usb/dwc3/dwc3-pci.c:212 dwc3_pci_resume_work() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/input/keyboard/omap4-keypad.c:279 omap4_keypad_probe() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/gpu/drm/vc4/vc4_dsi.c:839 vc4_dsi_encoder_enable() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/gpu/drm/i915/selftests/mock_gem_device.c:157 mock_gem_device() warn: 'pm_runtime_get_sync(&pdev->dev)' returns positive and negative
> drivers/watchdog/rti_wdt.c:230 rti_wdt_probe() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/media/platform/exynos4-is/mipi-csis.c:513 s5pcsis_s_stream() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c:89 mtk_vcodec_dec_pw_on() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/media/platform/ti-vpe/cal.c:794 cal_probe() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/media/platform/ti-vpe/vpe.c:2478 vpe_runtime_get() warn: pm_runtime_get_sync() also returns 1 on success
> drivers/media/i2c/smiapp/smiapp-core.c:1529 smiapp_pm_get_init() warn: pm_runtime_get_sync() also returns 1 on success
> ...
> they need it to simplify the function.
>
> If we only want to simplify like
> ret = pm_runtime_get_sync(dev);
> if (ret < 0) {
>         ...
>         Return (xxx)
> }
> The parameter force could be removed.

Which is exactly my point.
Zhang Qilong Nov. 9, 2020, 2:19 p.m. UTC | #4
> On Mon, Nov 9, 2020 at 2:46 PM zhangqilong <zhangqilong3@huawei.com>

> wrote:

> >

> > Hi,

> >

> > >

> > > On Mon, Nov 9, 2020 at 2:24 PM zhangqilong <zhangqilong3@huawei.com>

> > > wrote:

> > > >

> > > > Hi

> > > > >

> > > > > On Mon, Nov 9, 2020 at 9:05 AM Zhang Qilong

> > > > > <zhangqilong3@huawei.com>

> > > > > wrote:

> > > > > >

> > > > > > In many case, we need to check return value of

> > > > > > pm_runtime_get_sync, but it brings a trouble to the usage

> > > > > > counter processing. Many callers forget to decrease the usage

> > > > > > counter when it failed. It has been discussed a lot[0][1]. So

> > > > > > we add a function to deal with the usage counter for better coding.

> > > > > >

> > > > > > [0]https://lkml.org/lkml/2020/6/14/88

> > > > > > [1]https://patchwork.ozlabs.org/project/linux-tegra/patch/2020

> > > > > > 0520

> > > > > > 0951 48.10995-1-dinghao.liu@zju.edu.cn/

> > > > > > Signed-off-by: Zhang Qilong <zhangqilong3@huawei.com>

> > > > > > ---

> > > > > >  include/linux/pm_runtime.h | 32

> > > ++++++++++++++++++++++++++++++++

> > > > > >  1 file changed, 32 insertions(+)

> > > > > >

> > > > > > diff --git a/include/linux/pm_runtime.h

> > > > > > b/include/linux/pm_runtime.h index 4b708f4e8eed..2b0af5b1dffd

> > > > > > 100644

> > > > > > --- a/include/linux/pm_runtime.h

> > > > > > +++ b/include/linux/pm_runtime.h

> > > > > > @@ -386,6 +386,38 @@ static inline int

> > > > > > pm_runtime_get_sync(struct device

> > > > > *dev)

> > > > > >         return __pm_runtime_resume(dev, RPM_GET_PUT);  }

> > > > > >

> > > > > > +/**

> > > > > > + * gene_pm_runtime_get_sync - Bump up usage counter of a

> > > > > > +device and

> > > > > resume it.

> > > > > > + * @dev: Target device.

> > > > >

> > > > > The force argument is not documented.

> > > >

> > > > (1) Good catch, I will add it in next version.

> > > >

> > > > >

> > > > > > + *

> > > > > > + * Increase runtime PM usage counter of @dev first, and carry

> > > > > > + out runtime-resume

> > > > > > + * of it synchronously. If __pm_runtime_resume return

> > > > > > + negative value(device is in

> > > > > > + * error state) or return positive value(the runtime of

> > > > > > + device is already active)

> > > > > > + * with force is true, it need decrease the usage counter of

> > > > > > + the device when

> > > > > > + * return.

> > > > > > + *

> > > > > > + * The possible return values of this function is zero or negative value.

> > > > > > + * zero:

> > > > > > + *    - it means success and the status will store the resume

> operation

> > > > > status

> > > > > > + *      if needed, the runtime PM usage counter of @dev remains

> > > > > incremented.

> > > > > > + * negative:

> > > > > > + *    - it means failure and the runtime PM usage counter of @dev

> has

> > > > > been

> > > > > > + *      decreased.

> > > > > > + * positive:

> > > > > > + *    - it means the runtime of the device is already active before

> that.

> > > If

> > > > > > + *      caller set force to true, we still need to decrease the usage

> > > > > counter.

> > > > >

> > > > > Why is this needed?

> > > >

> > > > (2) If caller set force, it means caller will return even the

> > > > device has already been active (__pm_runtime_resume return

> > > > positive value) after calling gene_pm_runtime_get_sync, we still

> > > > need to decrease the

> > > usage count.

> > >

> > > But who needs this?

> > >

> > > I don't think that it is a good idea to complicate the API this way.

> >

> > The callers like:

> > ret = pm_runtime_get_sync(dev);

> > if (ret) {

> >         ...

> >         return (xxx);

> > }

> 

> Which isn't correct really, is it?

> 

> If ret is greater than 0, the error should not be returned in the first place, so

> you may want the new wrapper to return zero in that case instead.


I get your idea.

> 

> > drivers/spi/spi-img-spfi.c:734 img_spfi_resume() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/mfd/arizona-core.c:49 arizona_clk32k_enable() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/usb/dwc3/dwc3-pci.c:212 dwc3_pci_resume_work() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/input/keyboard/omap4-keypad.c:279 omap4_keypad_probe() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/gpu/drm/vc4/vc4_dsi.c:839 vc4_dsi_encoder_enable() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/gpu/drm/i915/selftests/mock_gem_device.c:157 mock_gem_device()

> > warn: 'pm_runtime_get_sync(&pdev->dev)' returns positive and negative

> > drivers/watchdog/rti_wdt.c:230 rti_wdt_probe() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/media/platform/exynos4-is/mipi-csis.c:513 s5pcsis_s_stream()

> > warn: pm_runtime_get_sync() also returns 1 on success

> > drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c:89

> > mtk_vcodec_dec_pw_on() warn: pm_runtime_get_sync() also returns 1 on

> > success

> > drivers/media/platform/ti-vpe/cal.c:794 cal_probe() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/media/platform/ti-vpe/vpe.c:2478 vpe_runtime_get() warn:

> > pm_runtime_get_sync() also returns 1 on success

> > drivers/media/i2c/smiapp/smiapp-core.c:1529 smiapp_pm_get_init() warn:

> > pm_runtime_get_sync() also returns 1 on success ...

> > they need it to simplify the function.

> >

> > If we only want to simplify like

> > ret = pm_runtime_get_sync(dev);

> > if (ret < 0) {

> >         ...

> >         Return (xxx)

> > }

> > The parameter force could be removed.

> 

> Which is exactly my point.


OK, I re-code it next version.

Thanks,
Zhang
diff mbox series

Patch

diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h
index 4b708f4e8eed..2b0af5b1dffd 100644
--- a/include/linux/pm_runtime.h
+++ b/include/linux/pm_runtime.h
@@ -386,6 +386,38 @@  static inline int pm_runtime_get_sync(struct device *dev)
 	return __pm_runtime_resume(dev, RPM_GET_PUT);
 }
 
+/**
+ * gene_pm_runtime_get_sync - Bump up usage counter of a device and resume it.
+ * @dev: Target device.
+ *
+ * Increase runtime PM usage counter of @dev first, and carry out runtime-resume
+ * of it synchronously. If __pm_runtime_resume return negative value(device is in
+ * error state) or return positive value(the runtime of device is already active)
+ * with force is true, it need decrease the usage counter of the device when
+ * return.
+ *
+ * The possible return values of this function is zero or negative value.
+ * zero:
+ *    - it means success and the status will store the resume operation status
+ *      if needed, the runtime PM usage counter of @dev remains incremented.
+ * negative:
+ *    - it means failure and the runtime PM usage counter of @dev has been
+ *      decreased.
+ * positive:
+ *    - it means the runtime of the device is already active before that. If
+ *      caller set force to true, we still need to decrease the usage counter.
+ */
+static inline int gene_pm_runtime_get_sync(struct device *dev, bool force)
+{
+	int ret = 0;
+
+	ret = __pm_runtime_resume(dev, RPM_GET_PUT);
+	if (ret < 0 || (ret > 0 && force))
+		pm_runtime_put_noidle(dev);
+
+	return ret;
+}
+
 /**
  * pm_runtime_put - Drop device usage counter and queue up "idle check" if 0.
  * @dev: Target device.