mbox series

[v4,00/79] Address some issues with PM runtime at media subsystem

Message ID cover.1619621413.git.mchehab+huawei@kernel.org
Headers show
Series Address some issues with PM runtime at media subsystem | expand

Message

Mauro Carvalho Chehab April 28, 2021, 2:51 p.m. UTC
During the review of the patches from unm.edu, one of the patterns
I noticed is the amount of patches trying to fix pm_runtime_get_sync()
calls.

After analyzing the feedback from version 1 of this series, I noticed
a few other weird behaviors at the PM runtime resume code. So, this
series start addressing some bugs and issues at the current code.
Then, it gets rid of pm_runtime_get_sync() at the media subsystem
(with 2 exceptions).

It should be noticed that
Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
added a new method to does a pm_runtime get, which increments
the usage count only on success.

The rationale of getting rid of pm_runtime_get_sync() is:

1. despite its name, this is actually a PM runtime resume call,
   but some developers didn't seem to realize that, as I got this
   pattern on some drivers:

        pm_runtime_get_sync(&client->dev);
        pm_runtime_disable(&client->dev);
        pm_runtime_set_suspended(&client->dev);
        pm_runtime_put_noidle(&client->dev);

   It makes no sense to resume PM just to suspend it again ;-)

2. Usual *_get() methods only increment their use count on success,
   but pm_runtime_get_sync() increments it unconditionally. Due to
   that, several drivers were mistakenly not calling
   pm_runtime_put_noidle() when it fails;

3. The name of the new variant is a lot clearer:
	pm_runtime_resume_and_get()
    As its same clearly says that this is a PM runtime resume function,
    that also increments the usage counter on success;

4. Consistency: we did similar changes subsystem wide with
   for instance strlcpy() and strcpy() that got replaced by
   strscpy(). Having all drivers using the same known-to-be-safe
   methods is a good thing;

5. Prevent newer drivers to copy-and-paste a code that it would
   be easier to break if they don't truly understand what's behind
   the scenes.

This series replace places  pm_runtime_get_sync(), by calling
pm_runtime_resume_and_get() instead.

This should help to avoid future mistakes like that, as people
tend to use the existing drivers as examples for newer ones.

compile-tested only.

Patches 1 to 7 fix some issues that already exists at the current
PM runtime code;

patches 8 to 20 fix some usage_count problems that still exists
at the media subsystem;

patches 21 to 78 repaces pm_runtime_get_sync() by 
pm_runtime_resume_and_get();

Patch 79 (and a hunk on patch 78) documents the two exceptions
where pm_runtime_get_sync() will still be used for now.

---

v4:
    - Added a couple of additional fixes at existing PM runtime code;
    - Some patches are now more conservative in order to avoid causing
     regressions.
v3:
    - fix a compilation error;
v2:
    - addressed pointed issues and fixed a few other PM issues.


Mauro Carvalho Chehab (79):
  media: venus: fix PM runtime logic at venus_sys_error_handler()
  media: s6p_cec: decrement usage count if disabled
  media: i2c: ccs-core: return the right error code at suspend
  media: i2c: ov7740: don't resume at remove time
  media: i2c: video-i2c: don't resume at remove time
  media: i2c: imx334: fix the pm runtime get logic
  media: exynos-gsc: don't resume at remove time
  media: atmel: properly get pm_runtime
  media: marvel-ccic: fix some issues when getting pm_runtime
  media: mdk-mdp: fix pm_runtime_get_sync() usage count
  media: rcar_fdp1: fix pm_runtime_get_sync() usage count
  media: renesas-ceu: Properly check for PM errors
  media: s5p: fix pm_runtime_get_sync() usage count
  media: am437x: fix pm_runtime_get_sync() usage count
  media: sh_vou: fix pm_runtime_get_sync() usage count
  media: mtk-vcodec: fix pm_runtime_get_sync() usage count
  media: s5p-jpeg: fix pm_runtime_get_sync() usage count
  media: sti/delta: fix pm_runtime_get_sync() usage count
  media: sunxi: fix pm_runtime_get_sync() usage count
  staging: media: rkvdec: fix pm_runtime_get_sync() usage count
  staging: media: atomisp: use pm_runtime_resume_and_get()
  staging: media: imx7-mipi-csis: use pm_runtime_resume_and_get()
  staging: media: ipu3: use pm_runtime_resume_and_get()
  staging: media: cedrus_video: use pm_runtime_resume_and_get()
  staging: media: tegra-vde: use pm_runtime_resume_and_get()
  staging: media: tegra-video: use pm_runtime_resume_and_get()
  media: i2c: ak7375: use pm_runtime_resume_and_get()
  media: i2c: ccs-core: use pm_runtime_resume_and_get()
  media: i2c: dw9714: use pm_runtime_resume_and_get()
  media: i2c: dw9768: use pm_runtime_resume_and_get()
  media: i2c: dw9807-vcm: use pm_runtime_resume_and_get()
  media: i2c: hi556: use pm_runtime_resume_and_get()
  media: i2c: imx214: use pm_runtime_resume_and_get()
  media: i2c: imx219: use pm_runtime_resume_and_get()
  media: i2c: imx258: use pm_runtime_resume_and_get()
  media: i2c: imx274: use pm_runtime_resume_and_get()
  media: i2c: imx290: use pm_runtime_resume_and_get()
  media: i2c: imx319: use pm_runtime_resume_and_get()
  media: i2c: imx355: use pm_runtime_resume_and_get()
  media: i2c: mt9m001: use pm_runtime_resume_and_get()
  media: i2c: ov02a10: use pm_runtime_resume_and_get()
  media: i2c: ov13858: use pm_runtime_resume_and_get()
  media: i2c: ov2659: use pm_runtime_resume_and_get()
  media: i2c: ov2685: use pm_runtime_resume_and_get()
  media: i2c: ov2740: use pm_runtime_resume_and_get()
  media: i2c: ov5647: use pm_runtime_resume_and_get()
  media: i2c: ov5648: use pm_runtime_resume_and_get()
  media: i2c: ov5670: use pm_runtime_resume_and_get()
  media: i2c: ov5675: use pm_runtime_resume_and_get()
  media: i2c: ov5695: use pm_runtime_resume_and_get()
  media: i2c: ov7740: use pm_runtime_resume_and_get()
  media: i2c: ov8856: use pm_runtime_resume_and_get()
  media: i2c: ov8865: use pm_runtime_resume_and_get()
  media: i2c: ov9734: use pm_runtime_resume_and_get()
  media: i2c: tvp5150: use pm_runtime_resume_and_get()
  media: i2c: video-i2c: use pm_runtime_resume_and_get()
  media: rockchip/rga: use pm_runtime_resume_and_get()
  media: sti/hva: use pm_runtime_resume_and_get()
  media: sti/bdisp: use pm_runtime_resume_and_get()
  media: ipu3: use pm_runtime_resume_and_get()
  media: coda: use pm_runtime_resume_and_get()
  media: exynos4-is: use pm_runtime_resume_and_get()
  media: exynos-gsc: use pm_runtime_resume_and_get()
  media: mtk-jpeg: use pm_runtime_resume_and_get()
  media: camss: use pm_runtime_resume_and_get()
  media: venus: use pm_runtime_resume_and_get()
  media: venus: vdec: use pm_runtime_resume_and_get()
  media: venus: venc: use pm_runtime_resume_and_get()
  media: rcar-fcp: use pm_runtime_resume_and_get()
  media: rkisp1: use pm_runtime_resume_and_get()
  media: s3c-camif: use pm_runtime_resume_and_get()
  media: s5p-mfc: use pm_runtime_resume_and_get()
  media: stm32: use pm_runtime_resume_and_get()
  media: sunxi: use pm_runtime_resume_and_get()
  media: ti-vpe: use pm_runtime_resume_and_get()
  media: vsp1: use pm_runtime_resume_and_get()
  media: rcar-vin: use pm_runtime_resume_and_get()
  media: hantro: use pm_runtime_resume_and_get()
  media: hantro: do a PM resume earlier

 drivers/media/cec/platform/s5p/s5p_cec.c      |  7 +++--
 drivers/media/i2c/ak7375.c                    | 10 +------
 drivers/media/i2c/ccs/ccs-core.c              | 18 +++++-------
 drivers/media/i2c/dw9714.c                    | 10 +------
 drivers/media/i2c/dw9768.c                    | 10 +------
 drivers/media/i2c/dw9807-vcm.c                | 10 +------
 drivers/media/i2c/hi556.c                     |  3 +-
 drivers/media/i2c/imx214.c                    |  6 ++--
 drivers/media/i2c/imx219.c                    |  6 ++--
 drivers/media/i2c/imx258.c                    |  6 ++--
 drivers/media/i2c/imx274.c                    |  3 +-
 drivers/media/i2c/imx290.c                    |  6 ++--
 drivers/media/i2c/imx319.c                    |  6 ++--
 drivers/media/i2c/imx334.c                    |  7 +++--
 drivers/media/i2c/imx355.c                    |  6 ++--
 drivers/media/i2c/mt9m001.c                   |  9 ++++--
 drivers/media/i2c/ov02a10.c                   |  6 ++--
 drivers/media/i2c/ov13858.c                   |  6 ++--
 drivers/media/i2c/ov2659.c                    |  6 ++--
 drivers/media/i2c/ov2685.c                    |  7 ++---
 drivers/media/i2c/ov2740.c                    |  6 ++--
 drivers/media/i2c/ov5647.c                    |  9 +++---
 drivers/media/i2c/ov5648.c                    |  6 ++--
 drivers/media/i2c/ov5670.c                    |  6 ++--
 drivers/media/i2c/ov5675.c                    |  3 +-
 drivers/media/i2c/ov5695.c                    |  6 ++--
 drivers/media/i2c/ov7740.c                    |  8 ++---
 drivers/media/i2c/ov8856.c                    |  3 +-
 drivers/media/i2c/ov8865.c                    |  6 ++--
 drivers/media/i2c/ov9734.c                    |  3 +-
 drivers/media/i2c/tvp5150.c                   | 16 ++--------
 drivers/media/i2c/video-i2c.c                 | 14 +++------
 drivers/media/pci/intel/ipu3/ipu3-cio2-main.c |  3 +-
 drivers/media/platform/am437x/am437x-vpfe.c   | 22 ++++++++++----
 drivers/media/platform/atmel/atmel-isc-base.c | 27 ++++++++++++-----
 drivers/media/platform/atmel/atmel-isi.c      | 19 +++++++++---
 drivers/media/platform/coda/coda-common.c     |  7 +++--
 drivers/media/platform/exynos-gsc/gsc-core.c  | 11 +++----
 drivers/media/platform/exynos-gsc/gsc-m2m.c   |  2 +-
 .../media/platform/exynos4-is/fimc-capture.c  |  6 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   |  4 +--
 .../platform/exynos4-is/fimc-isp-video.c      |  3 +-
 drivers/media/platform/exynos4-is/fimc-isp.c  |  7 ++---
 drivers/media/platform/exynos4-is/fimc-lite.c |  5 ++--
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  2 +-
 drivers/media/platform/exynos4-is/media-dev.c |  8 ++---
 drivers/media/platform/exynos4-is/mipi-csis.c |  8 ++---
 .../media/platform/marvell-ccic/mcam-core.c   |  9 ++++--
 .../media/platform/mtk-jpeg/mtk_jpeg_core.c   |  4 +--
 drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c  |  6 ++--
 .../platform/mtk-vcodec/mtk_vcodec_dec_pm.c   |  4 +--
 .../media/platform/qcom/camss/camss-csid.c    |  6 ++--
 .../media/platform/qcom/camss/camss-csiphy.c  |  6 ++--
 .../media/platform/qcom/camss/camss-ispif.c   |  6 ++--
 drivers/media/platform/qcom/camss/camss-vfe.c |  5 ++--
 drivers/media/platform/qcom/venus/core.c      | 28 +++++++++++-------
 .../media/platform/qcom/venus/pm_helpers.c    | 10 +++----
 drivers/media/platform/qcom/venus/vdec.c      |  4 +--
 drivers/media/platform/qcom/venus/venc.c      |  5 ++--
 drivers/media/platform/rcar-fcp.c             |  6 ++--
 drivers/media/platform/rcar-vin/rcar-csi2.c   | 15 ++++++++--
 drivers/media/platform/rcar-vin/rcar-dma.c    |  6 ++--
 drivers/media/platform/rcar-vin/rcar-v4l2.c   |  6 ++--
 drivers/media/platform/rcar_fdp1.c            | 12 ++++++--
 drivers/media/platform/renesas-ceu.c          |  4 +--
 drivers/media/platform/rockchip/rga/rga-buf.c |  3 +-
 drivers/media/platform/rockchip/rga/rga.c     |  4 ++-
 .../platform/rockchip/rkisp1/rkisp1-capture.c |  3 +-
 .../media/platform/s3c-camif/camif-capture.c  |  2 +-
 drivers/media/platform/s3c-camif/camif-core.c |  5 ++--
 drivers/media/platform/s5p-jpeg/jpeg-core.c   |  2 +-
 drivers/media/platform/s5p-mfc/s5p_mfc_pm.c   |  6 ++--
 drivers/media/platform/sh_vou.c               |  6 +++-
 drivers/media/platform/sti/bdisp/bdisp-v4l2.c |  7 +++--
 drivers/media/platform/sti/delta/delta-v4l2.c |  4 +--
 drivers/media/platform/sti/hva/hva-hw.c       | 17 ++++++-----
 drivers/media/platform/stm32/stm32-dcmi.c     |  5 ++--
 .../platform/sunxi/sun4i-csi/sun4i_v4l2.c     |  6 ++--
 .../sunxi/sun8i-rotate/sun8i_rotate.c         |  2 +-
 drivers/media/platform/ti-vpe/cal-video.c     |  4 ++-
 drivers/media/platform/ti-vpe/cal.c           |  8 +++--
 drivers/media/platform/ti-vpe/vpe.c           |  4 +--
 drivers/media/platform/vsp1/vsp1_drv.c        |  6 ++--
 .../staging/media/atomisp/pci/atomisp_fops.c  |  6 ++--
 drivers/staging/media/hantro/hantro_drv.c     | 29 ++++++++++++-------
 drivers/staging/media/imx/imx7-mipi-csis.c    |  7 ++---
 drivers/staging/media/ipu3/ipu3.c             |  3 +-
 drivers/staging/media/rkvdec/rkvdec.c         |  2 +-
 .../staging/media/sunxi/cedrus/cedrus_video.c |  6 ++--
 drivers/staging/media/tegra-vde/vde.c         | 19 ++++++++++--
 drivers/staging/media/tegra-video/csi.c       |  3 +-
 drivers/staging/media/tegra-video/vi.c        |  3 +-
 92 files changed, 335 insertions(+), 347 deletions(-)

Comments

Mauro Carvalho Chehab April 29, 2021, 7:13 a.m. UTC | #1
Em Wed, 28 Apr 2021 14:17:50 -0300
Ezequiel Garcia <ezequiel@collabora.com> escreveu:

> Hi Mauro,

> 

> Thanks a lot for taking care of this.

> 

> On Wed, 2021-04-28 at 16:52 +0200, Mauro Carvalho Chehab wrote:

> > The device_run() first enables the clock and then

> > tries to resume PM runtime, checking for errors.

> > 

> > Well, if for some reason the pm_runtime can not resume,

> > it would be better to detect it beforehand.

> > 

> > So, change the order inside device_run().

> > 

> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>  

> 

> Clocks could be behind power-domains, IIRC, so this change

> is fixing that.

> 

> However, this has ever been a problem for this driver,

> so I don't think it makes sense to bother with Fixes tag.


I would prefer to move this patch to the first part of this
series, together with other fixes, rebasing it to apply cleanly
before the pm_runtime_resume_and_get() patch, with:

    Fixes: 775fec69008d ("media: add Rockchip VPU JPEG encoder driver")

This way, people that could be interested on backporting it will be
capable to apply it as is to stable Kernel releases that came
with this driver.

> 

> Reviewed-by: Ezequiel Garcia <ezequiel@collabora.com>

> 

> Thanks,

> Ezequiel

> 

> > ---

> >  drivers/staging/media/hantro/hantro_drv.c | 8 ++++----

> >  1 file changed, 4 insertions(+), 4 deletions(-)

> > 

> > diff --git a/drivers/staging/media/hantro/hantro_drv.c b/drivers/staging/media/hantro/hantro_drv.c

> > index 25fa36e7e773..67de6b15236d 100644

> > --- a/drivers/staging/media/hantro/hantro_drv.c

> > +++ b/drivers/staging/media/hantro/hantro_drv.c

> > @@ -160,14 +160,14 @@ static void device_run(void *priv)

> >         src = hantro_get_src_buf(ctx);

> >         dst = hantro_get_dst_buf(ctx);

> >  

> > -       ret = clk_bulk_enable(ctx->dev->variant->num_clocks, ctx->dev->clocks);

> > -       if (ret)

> > -               goto err_cancel_job;

> > -

> >         ret = pm_runtime_resume_and_get(ctx->dev->dev);

> >         if (ret < 0)

> >                 goto err_cancel_job;

> >  

> > +       ret = clk_bulk_enable(ctx->dev->variant->num_clocks, ctx->dev->clocks);

> > +       if (ret)

> > +               goto err_cancel_job;

> > +

> >         v4l2_m2m_buf_copy_metadata(src, dst, true);

> >  

> >         ctx->codec_ops->run(ctx);  

> 

> 




Thanks,
Mauro
Mauro Carvalho Chehab April 29, 2021, 7:38 a.m. UTC | #2
Em Wed, 28 Apr 2021 17:09:57 +0200
Johan Hovold <johan@kernel.org> escreveu:

> On Wed, Apr 28, 2021 at 04:51:41PM +0200, Mauro Carvalho Chehab wrote:

> > The pm_runtime_get_sync() internally increments the

> > dev->power.usage_count without decrementing it, even on errors.

> > Replace it by the new pm_runtime_resume_and_get(), introduced by:

> > commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> > in order to properly decrement the usage counter and avoid memory

> > leaks.  

> 

> Again, there is no memory leak here either. Just a potential PM usage

> counter leak.


True. Will fix it at the entire series with:

	FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f --msg-filter "cat|perl -0pe 's/ and avoid memory\n\s*leaks./, avoiding\na potential PM usage counter leak./igs'" BASE..

Regards,
Mauro
Mauro Carvalho Chehab April 29, 2021, 10:18 a.m. UTC | #3
Em Wed, 28 Apr 2021 17:50:08 +0200
Johan Hovold <johan@kernel.org> escreveu:

> On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:

> > 1. despite its name, this is actually a PM runtime resume call,
> >    but some developers didn't seem to realize that, as I got this
> >    pattern on some drivers:
> > 
> >         pm_runtime_get_sync(&client->dev);
> >         pm_runtime_disable(&client->dev);
> >         pm_runtime_set_suspended(&client->dev);
> >         pm_runtime_put_noidle(&client->dev);
> > 
> >    It makes no sense to resume PM just to suspend it again ;-)  
> 
> This is perfectly alright. Take a look at ov7740_remove() for example:
> 
> 	pm_runtime_get_sync(&client->dev);
> 	pm_runtime_disable(&client->dev);
> 	pm_runtime_set_suspended(&client->dev);
> 	pm_runtime_put_noidle(&client->dev);
> 	
> 	ov7740_set_power(ov7740, 0);
> 
> There's an explicit power-off after balancing the PM count and this will
> work regardless of the power state when entering this function.

Ok, but this should equally work:

 	pm_runtime_disable(&client->dev);
 	pm_runtime_set_suspended(&client->dev);
 	
 	ov7740_set_power(ov7740, 0);

as there's no additional cleanup made on this particular driver
between pm_runtime_get_sync() and pm_runtime_put_noidle().

> So this has nothing to do with pm_runtime_get_sync() per se.

Yes, but some patches on this series are cleaning up the driver release
logic.

> 
> > 2. Usual *_get() methods only increment their use count on success,
> >    but pm_runtime_get_sync() increments it unconditionally. Due to
> >    that, several drivers were mistakenly not calling
> >    pm_runtime_put_noidle() when it fails;  
> 
> Sure, but pm_runtime_get_async() also works this way. You just won't be
> notified if the async resume fails.

Granted, it makes sense along the pm_runtime kAPI.

It is inconsistent with the behavior of kobject_get*() and other
*_get*() methods that are based or inspired on it, as, on those, the
operations are atomic: either everything succeeds and it doesn't return
an error, or the usage counter is not incremented and the object
state doesn't change after the call.

> > 3. The name of the new variant is a lot clearer:
> > 	pm_runtime_resume_and_get()
> >     As its same clearly says that this is a PM runtime resume function,
> >     that also increments the usage counter on success;  
> 
> It also introduced an inconsistency in the API and does not pair as well
> with the pm_runtime_put variants.

Agreed. A name that would be more consistent with PM runtime would
probably be:

	pm_runtime_resume_if_get()

as there are already:

	pm_runtime_get_if_in_use()
	pm_runtime_get_if_active()

But any such discussions are out of the scope of this patchset ;-)

> 
> > 4. Consistency: we did similar changes subsystem wide with
> >    for instance strlcpy() and strcpy() that got replaced by
> >    strscpy(). Having all drivers using the same known-to-be-safe
> >    methods is a good thing;  
> 
> It's not known to be safe; there are ways to get also this interface
> wrong as for example this series has shown.

Very true. Yet, it is a lot simpler to use functions that won't change
the state of the objects when returning an error, as this is by far
the most common pattern within the Kernel.

Human brains are trained to identify certain patterns. When there's
something using a similar pattern, but with a different behavior, 
our brains are more subject to fail identifying problems.

> > 5. Prevent newer drivers to copy-and-paste a code that it would
> >    be easier to break if they don't truly understand what's behind
> >    the scenes.  
> 
> Cargo-cult programming always runs that risk.

True.

> > This series replace places  pm_runtime_get_sync(), by calling
> > pm_runtime_resume_and_get() instead.
> > 
> > This should help to avoid future mistakes like that, as people
> > tend to use the existing drivers as examples for newer ones.  
> 
> The only valid point about and use for pm_runtime_resume_and_get() is to
> avoid leaking a PM usage count reference in the unlikely case that
> resume fails (something which hardly any driver implements recovery
> from anyway).
> 
> It's a convenience wrapper that saves you from writing one extra line in
> some cases (depending on how you implement runtime-pm support) and not a
> silver bullet against bugs.
>  
> > compile-tested only.
> > 
> > Patches 1 to 7 fix some issues that already exists at the current
> > PM runtime code;
> > 
> > patches 8 to 20 fix some usage_count problems that still exists
> > at the media subsystem;
> > 
> > patches 21 to 78 repaces pm_runtime_get_sync() by 
> > pm_runtime_resume_and_get();
> > 
> > Patch 79 (and a hunk on patch 78) documents the two exceptions
> > where pm_runtime_get_sync() will still be used for now.
> > 
> > ---
> > 
> > v4:
> >     - Added a couple of additional fixes at existing PM runtime code;
> >     - Some patches are now more conservative in order to avoid causing
> >      regressions.
> > v3:
> >     - fix a compilation error;
> > v2:
> >     - addressed pointed issues and fixed a few other PM issues.  
> 
> This really doesn't say much more than "changed stuff" so kinda hard to
> track if review feedback has been taken into account for example.

I addressed all review feedback I got (as far as I'm aware), and added
all received reviewed-by/acked-by.

Yeah, I could have written a more comprehensive changes description
there.

Thanks,
Mauro
Dmitry Osipenko April 29, 2021, 12:33 p.m. UTC | #4
29.04.2021 13:18, Mauro Carvalho Chehab пишет:
>> This is perfectly alright. Take a look at ov7740_remove() for example:
>>
>> 	pm_runtime_get_sync(&client->dev);
>> 	pm_runtime_disable(&client->dev);
>> 	pm_runtime_set_suspended(&client->dev);
>> 	pm_runtime_put_noidle(&client->dev);
>> 	
>> 	ov7740_set_power(ov7740, 0);
>>
>> There's an explicit power-off after balancing the PM count and this will
>> work regardless of the power state when entering this function.
> Ok, but this should equally work:
> 
>  	pm_runtime_disable(&client->dev);
>  	pm_runtime_set_suspended(&client->dev);
>  	
>  	ov7740_set_power(ov7740, 0);
> 
> as there's no additional cleanup made on this particular driver
> between pm_runtime_get_sync() and pm_runtime_put_noidle().
> 

The pm_runtime_get_sync() turns hardware ON by invoking
ov7740_set_power(ov7740, 1), and thus, the ON->OFF is kept balanced in
both RPM-available and RPM-unavailable cases. The RPM state of device
should be reset after driver removal.

It doesn't look like any additional cleanups are needed by that ov7740
driver. The driver removal is opposite to the probe, hence it should be
correct as-is.
Johan Hovold April 29, 2021, 3:17 p.m. UTC | #5
On Thu, Apr 29, 2021 at 12:18:16PM +0200, Mauro Carvalho Chehab wrote:
> Em Wed, 28 Apr 2021 17:50:08 +0200
> Johan Hovold <johan@kernel.org> escreveu:
> 
> > On Wed, Apr 28, 2021 at 04:51:21PM +0200, Mauro Carvalho Chehab wrote:
> 
> > > 1. despite its name, this is actually a PM runtime resume call,
> > >    but some developers didn't seem to realize that, as I got this
> > >    pattern on some drivers:
> > > 
> > >         pm_runtime_get_sync(&client->dev);
> > >         pm_runtime_disable(&client->dev);
> > >         pm_runtime_set_suspended(&client->dev);
> > >         pm_runtime_put_noidle(&client->dev);
> > > 
> > >    It makes no sense to resume PM just to suspend it again ;-)  
> > 
> > This is perfectly alright. Take a look at ov7740_remove() for example:
> > 
> > 	pm_runtime_get_sync(&client->dev);
> > 	pm_runtime_disable(&client->dev);
> > 	pm_runtime_set_suspended(&client->dev);
> > 	pm_runtime_put_noidle(&client->dev);
> > 	
> > 	ov7740_set_power(ov7740, 0);
> > 
> > There's an explicit power-off after balancing the PM count and this will
> > work regardless of the power state when entering this function.
> 
> Ok, but this should equally work:
> 
>  	pm_runtime_disable(&client->dev);
>  	pm_runtime_set_suspended(&client->dev);
>  	
>  	ov7740_set_power(ov7740, 0);
> 
> as there's no additional cleanup made on this particular driver
> between pm_runtime_get_sync() and pm_runtime_put_noidle().

No, that would break the driver as I pointed out to you yesterday:

	https://lore.kernel.org/r/YImG1klSPkFSaS3a@hovoldconsulting.com

If the device is already suspended when remove is called then you'll
end up with an unbalanced call to ov7740_set_power() that will try to
disable an already disabled clock.

> > So this has nothing to do with pm_runtime_get_sync() per se.
> 
> Yes, but some patches on this series are cleaning up the driver release
> logic.

You mentioned this example as an argument against using
pm_runtime_get_sync(), which I don't think makes sense.

> > > 2. Usual *_get() methods only increment their use count on success,
> > >    but pm_runtime_get_sync() increments it unconditionally. Due to
> > >    that, several drivers were mistakenly not calling
> > >    pm_runtime_put_noidle() when it fails;  
> > 
> > Sure, but pm_runtime_get_async() also works this way. You just won't be
> > notified if the async resume fails.
> 
> Granted, it makes sense along the pm_runtime kAPI.
> 
> It is inconsistent with the behavior of kobject_get*() and other
> *_get*() methods that are based or inspired on it, as, on those, the
> operations are atomic: either everything succeeds and it doesn't return
> an error, or the usage counter is not incremented and the object
> state doesn't change after the call.

Right, and I'm aware that some people have overlooked this. But its not
the end of the world since hardly any driver can handle resume failures
properly anyway. 

This is mostly just an exercise to shut up static checkers.

> > > 3. The name of the new variant is a lot clearer:
> > > 	pm_runtime_resume_and_get()
> > >     As its same clearly says that this is a PM runtime resume function,
> > >     that also increments the usage counter on success;  
> > 
> > It also introduced an inconsistency in the API and does not pair as well
> > with the pm_runtime_put variants.
> 
> Agreed. A name that would be more consistent with PM runtime would
> probably be:
> 
> 	pm_runtime_resume_if_get()

Naw, since the get part always succeeds.

It should start with pm_runtime_get, but pm_runtime_get_sync() is
unfortunately taken.

> as there are already:
> 
> 	pm_runtime_get_if_in_use()
> 	pm_runtime_get_if_active()
> 
> But any such discussions are out of the scope of this patchset ;-)

Right.

> > > 4. Consistency: we did similar changes subsystem wide with
> > >    for instance strlcpy() and strcpy() that got replaced by
> > >    strscpy(). Having all drivers using the same known-to-be-safe
> > >    methods is a good thing;  
> > 
> > It's not known to be safe; there are ways to get also this interface
> > wrong as for example this series has shown.
> 
> Very true. Yet, it is a lot simpler to use functions that won't change
> the state of the objects when returning an error, as this is by far
> the most common pattern within the Kernel.

A resume failure does change the state (and needs to be recovered from),
but I get what you're saying.

> Human brains are trained to identify certain patterns. When there's
> something using a similar pattern, but with a different behavior, 
> our brains are more subject to fail identifying problems.

Sure. But I'm not sure that having two interfaces with different
semantics to do the job is doing us any favours here. But again, that
discussion has already been had.

And I realise that this is partly also your motive here (even if the old
interface isn't going to go away).

> > > compile-tested only.
> > > Patches 1 to 7 fix some issues that already exists at the current
> > > PM runtime code;
> > > 
> > > patches 8 to 20 fix some usage_count problems that still exists
> > > at the media subsystem;
> > > 
> > > patches 21 to 78 repaces pm_runtime_get_sync() by 
> > > pm_runtime_resume_and_get();
> > > 
> > > Patch 79 (and a hunk on patch 78) documents the two exceptions
> > > where pm_runtime_get_sync() will still be used for now.

80 patches in one series (posted to lkml) is a bit excessive. Perhaps
you can break it up in a fixes part and one or more cleanups parts?

Johan
Jonathan Cameron April 30, 2021, 3:54 p.m. UTC | #6
On Wed, 28 Apr 2021 16:51:27 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The PM runtime get logic is currently broken, as it checks if

> ret is zero instead of checking if it is an error code,

> as reported by Dan Carpenter.

> 

> While here, use the pm_runtime_resume_and_get() as added by:

> commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.


Perhaps also call out that a fail of pm_runtime_get_sync in current
code would potentially result in a spurious runtime_suspend() call
whereas using pm_runtime_resume_and_get() will call
pm_runtime_put_noidle()

> 

> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

> Reviewed-by: Daniele Alessandrelli <daniele.alessandrelli@intel.com>

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---

>  drivers/media/i2c/imx334.c | 7 ++++---

>  1 file changed, 4 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/media/i2c/imx334.c b/drivers/media/i2c/imx334.c

> index 047aa7658d21..23f28606e570 100644

> --- a/drivers/media/i2c/imx334.c

> +++ b/drivers/media/i2c/imx334.c

> @@ -717,9 +717,9 @@ static int imx334_set_stream(struct v4l2_subdev *sd, int enable)

>  	}

>  

>  	if (enable) {

> -		ret = pm_runtime_get_sync(imx334->dev);

> -		if (ret)

> -			goto error_power_off;

> +		ret = pm_runtime_resume_and_get(imx334->dev);

> +		if (ret < 0)

> +			goto error_unlock;

>  

>  		ret = imx334_start_streaming(imx334);

>  		if (ret)

> @@ -737,6 +737,7 @@ static int imx334_set_stream(struct v4l2_subdev *sd, int enable)

>  

>  error_power_off:

>  	pm_runtime_put(imx334->dev);

> +error_unlock:

>  	mutex_unlock(&imx334->mutex);

>  

>  	return ret;
Jonathan Cameron April 30, 2021, 4:26 p.m. UTC | #7
On Wed, 28 Apr 2021 16:51:32 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The pm_runtime_get_sync() internally increments the

> dev->power.usage_count without decrementing it, even on errors.

> Replace it by the new pm_runtime_resume_and_get(), introduced by:

> commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> in order to properly decrement the usage counter and avoid memory

> leaks.

> 

> Also, right now, the driver is ignoring any troubles when

> trying to do PM resume. So, add the proper error handling

> for the code.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>


Refactor to move the error handling block to one place perhaps?

Also, I thin more cleanup is needed int he 
> ---

>  drivers/media/platform/rcar_fdp1.c | 12 ++++++++++--

>  1 file changed, 10 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/media/platform/rcar_fdp1.c b/drivers/media/platform/rcar_fdp1.c

> index 01c1fbb97bf6..c32d237af618 100644

> --- a/drivers/media/platform/rcar_fdp1.c

> +++ b/drivers/media/platform/rcar_fdp1.c

> @@ -2140,7 +2140,13 @@ static int fdp1_open(struct file *file)

>  	}

>  

>  	/* Perform any power management required */

> -	pm_runtime_get_sync(fdp1->dev);

> +	ret = pm_runtime_resume_and_get(fdp1->dev);

> +	if (ret < 0) {

> +		v4l2_m2m_ctx_release(ctx->fh.m2m_ctx);


Next few lines repeated enough times that I'd suggest just pulling error handling out
to it's own path.

> +		v4l2_ctrl_handler_free(&ctx->hdl);

> +		kfree(ctx);

> +		goto done;

> +	}

>  

>  	v4l2_fh_add(&ctx->fh);

>  

> @@ -2351,7 +2357,9 @@ static int fdp1_probe(struct platform_device *pdev)

>  

>  	/* Power up the cells to read HW */

>  	pm_runtime_enable(&pdev->dev);

> -	pm_runtime_get_sync(fdp1->dev);

> +	ret = pm_runtime_resume_and_get(fdp1->dev);


From a balance point of view, do you want to all pm_runtime_disable()
if this happens and also I'd guess you need to unregister the device
as done in other error paths in probe

The lack of balance between ordering in probe and remove before
your patch bothers me but I don't know the code well enough to tell
if there are any actual bugs there.


> +	if (ret < 0)

> +		return ret;

>  

>  	hw_version = fdp1_read(fdp1, FD1_IP_INTDATA);

>  	switch (hw_version) {
Jonathan Cameron April 30, 2021, 4:28 p.m. UTC | #8
On Wed, 28 Apr 2021 16:51:33 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Right now, the driver just assumes that PM runtime resume

> worked, but it may fail.

> 

> Well, the pm_runtime_get_sync() internally increments the

> dev->power.usage_count without decrementing it, even on errors.

> 

> So, using it is tricky. Let's replace it by the new

> pm_runtime_resume_and_get(), introduced by:

> commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> and return an error if something bad happens.

> 

> This should ensure that the PM runtime usage_count will be

> properly decremented if an error happens at open time.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

LGTM
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---

>  drivers/media/platform/renesas-ceu.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/media/platform/renesas-ceu.c b/drivers/media/platform/renesas-ceu.c

> index cd137101d41e..17f01b6e3fe0 100644

> --- a/drivers/media/platform/renesas-ceu.c

> +++ b/drivers/media/platform/renesas-ceu.c

> @@ -1099,10 +1099,10 @@ static int ceu_open(struct file *file)

>  

>  	mutex_lock(&ceudev->mlock);

>  	/* Causes soft-reset and sensor power on on first open */

> -	pm_runtime_get_sync(ceudev->dev);

> +	ret = pm_runtime_resume_and_get(ceudev->dev);

>  	mutex_unlock(&ceudev->mlock);

>  

> -	return 0;

> +	return ret;

>  }

>  

>  static int ceu_release(struct file *file)
Jonathan Cameron April 30, 2021, 4:30 p.m. UTC | #9
On Wed, 28 Apr 2021 16:51:31 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The pm_runtime_get_sync() internally increments the

> dev->power.usage_count without decrementing it, even on errors.

> Replace it by the new pm_runtime_resume_and_get(), introduced by:

> commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> in order to properly decrement the usage counter and avoid memory

> leaks.

> 

> While here, fix the return contition of mtk_mdp_m2m_start_streaming(),

> as it doesn't make any sense to return 0 if the PM runtime failed

> to resume.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---

>  drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c | 6 +++---

>  1 file changed, 3 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c b/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c

> index ace4528cdc5e..f14779e7596e 100644

> --- a/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c

> +++ b/drivers/media/platform/mtk-mdp/mtk_mdp_m2m.c

> @@ -391,12 +391,12 @@ static int mtk_mdp_m2m_start_streaming(struct vb2_queue *q, unsigned int count)

>  	struct mtk_mdp_ctx *ctx = q->drv_priv;

>  	int ret;

>  

> -	ret = pm_runtime_get_sync(&ctx->mdp_dev->pdev->dev);

> +	ret = pm_runtime_resume_and_get(&ctx->mdp_dev->pdev->dev);

>  	if (ret < 0)

> -		mtk_mdp_dbg(1, "[%d] pm_runtime_get_sync failed:%d",

> +		mtk_mdp_dbg(1, "[%d] pm_runtime_resume_and_get failed:%d",

>  			    ctx->id, ret);

>  

> -	return 0;

> +	return ret;

>  }

>  

>  static void *mtk_mdp_m2m_buf_remove(struct mtk_mdp_ctx *ctx,
Jonathan Cameron April 30, 2021, 4:44 p.m. UTC | #10
On Wed, 28 Apr 2021 16:51:37 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> The pm_runtime_get_sync() internally increments the

> dev->power.usage_count without decrementing it, even on errors.

> Replace it by the new pm_runtime_resume_and_get(), introduced by:

> commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> in order to properly decrement the usage counter and avoid memory

> leaks.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Without a return value, this seems like it's not complete as driver
will carry on and eventually call mtk_vcode_dec_pw_off() which will assume
it needs to decrement the count.


> ---

>  drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c | 4 ++--

>  1 file changed, 2 insertions(+), 2 deletions(-)

> 

> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c

> index ddee7046ce42..fe096fe61c9d 100644

> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c

> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_pm.c

> @@ -92,9 +92,9 @@ void mtk_vcodec_dec_pw_on(struct mtk_vcodec_pm *pm)

>  {

>  	int ret;

>  

> -	ret = pm_runtime_get_sync(pm->dev);

> +	ret = pm_runtime_resume_and_get(pm->dev);

>  	if (ret)

> -		mtk_v4l2_err("pm_runtime_get_sync fail %d", ret);

> +		mtk_v4l2_err("pm_runtime_resume_and_get fail %d", ret);

>  }

>  

>  void mtk_vcodec_dec_pw_off(struct mtk_vcodec_pm *pm)
Jonathan Cameron April 30, 2021, 4:59 p.m. UTC | #11
On Wed, 28 Apr 2021 16:51:42 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.

> 

> Use the new API, in order to cleanup the error check logic.


I'd mention that the origin error handling order was wrong and you've
also fixed that by moving hm_pool_unregister() later.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> ---

>  drivers/staging/media/atomisp/pci/atomisp_fops.c | 6 +++---

>  1 file changed, 3 insertions(+), 3 deletions(-)

> 

> diff --git a/drivers/staging/media/atomisp/pci/atomisp_fops.c b/drivers/staging/media/atomisp/pci/atomisp_fops.c

> index f1e6b2597853..26d05474a035 100644

> --- a/drivers/staging/media/atomisp/pci/atomisp_fops.c

> +++ b/drivers/staging/media/atomisp/pci/atomisp_fops.c

> @@ -837,7 +837,7 @@ static int atomisp_open(struct file *file)

>  	}

>  

>  	/* runtime power management, turn on ISP */

> -	ret = pm_runtime_get_sync(vdev->v4l2_dev->dev);

> +	ret = pm_runtime_resume_and_get(vdev->v4l2_dev->dev);

>  	if (ret < 0) {

>  		dev_err(isp->dev, "Failed to power on device\n");

>  		goto error;

> @@ -881,9 +881,9 @@ static int atomisp_open(struct file *file)

>  


>  css_error:

>  	atomisp_css_uninit(isp);

> -error:

> -	hmm_pool_unregister(HMM_POOL_TYPE_DYNAMIC);

>  	pm_runtime_put(vdev->v4l2_dev->dev);

> +error:

> +	hmm_pool_unregister(HMM_POOL_TYPE_DYNAMIC);

>  	rt_mutex_unlock(&isp->mutex);

>  	return ret;

>  }
Jonathan Cameron April 30, 2021, 5:03 p.m. UTC | #12
On Wed, 28 Apr 2021 16:51:44 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.

> 

> Use the new API, in order to cleanup the error check logic.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Could add that pm_runtime_put() should have been pm_runtime_put_noidle()
inorder to not potentially result in a call to runtime suspend when
we never resumed in the first place.

Otherwise reasonable cleanup.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---

>  drivers/staging/media/ipu3/ipu3.c | 3 +--

>  1 file changed, 1 insertion(+), 2 deletions(-)

> 

> diff --git a/drivers/staging/media/ipu3/ipu3.c b/drivers/staging/media/ipu3/ipu3.c

> index ee1bba6bdcac..8e1e9e46e604 100644

> --- a/drivers/staging/media/ipu3/ipu3.c

> +++ b/drivers/staging/media/ipu3/ipu3.c

> @@ -392,10 +392,9 @@ int imgu_s_stream(struct imgu_device *imgu, int enable)

>  	}

>  

>  	/* Set Power */

> -	r = pm_runtime_get_sync(dev);

> +	r = pm_runtime_resume_and_get(dev);

>  	if (r < 0) {

>  		dev_err(dev, "failed to set imgu power\n");

> -		pm_runtime_put(dev);

>  		return r;

>  	}

>
Jonathan Cameron April 30, 2021, 5:14 p.m. UTC | #13
On Wed, 28 Apr 2021 16:51:48 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.

> 

> Use the new API, in order to cleanup the error check logic.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---

>  drivers/media/i2c/ak7375.c | 10 +---------

>  1 file changed, 1 insertion(+), 9 deletions(-)

> 

> diff --git a/drivers/media/i2c/ak7375.c b/drivers/media/i2c/ak7375.c

> index e1f94ee0f48f..40b1a4aa846c 100644

> --- a/drivers/media/i2c/ak7375.c

> +++ b/drivers/media/i2c/ak7375.c

> @@ -87,15 +87,7 @@ static const struct v4l2_ctrl_ops ak7375_vcm_ctrl_ops = {

>  

>  static int ak7375_open(struct v4l2_subdev *sd, struct v4l2_subdev_fh *fh)

>  {

> -	int ret;

> -

> -	ret = pm_runtime_get_sync(sd->dev);

> -	if (ret < 0) {

> -		pm_runtime_put_noidle(sd->dev);

> -		return ret;

> -	}

> -

> -	return 0;

> +	return pm_runtime_resume_and_get(sd->dev);

>  }

>  

>  static int ak7375_close(struct v4l2_subdev *sd, struct v4l2_subdev_fh *fh)
Jonathan Cameron April 30, 2021, 5:20 p.m. UTC | #14
On Wed, 28 Apr 2021 16:52:06 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.

> 

> Use the new API, in order to cleanup the error check logic.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Trivial inline.  Otherwise
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>



> ---

>  drivers/media/i2c/ov2740.c | 6 ++----

>  1 file changed, 2 insertions(+), 4 deletions(-)

> 

> diff --git a/drivers/media/i2c/ov2740.c b/drivers/media/i2c/ov2740.c

> index 0f3f17f3c426..54779f720f9d 100644

> --- a/drivers/media/i2c/ov2740.c

> +++ b/drivers/media/i2c/ov2740.c

> @@ -751,9 +751,8 @@ static int ov2740_set_stream(struct v4l2_subdev *sd, int enable)

>  

>  	mutex_lock(&ov2740->mutex);

>  	if (enable) {

> -		ret = pm_runtime_get_sync(&client->dev);

> +		ret = pm_runtime_resume_and_get(&client->dev);

>  		if (ret < 0) {

> -			pm_runtime_put_noidle(&client->dev);

>  			mutex_unlock(&ov2740->mutex);

>  			return ret;

>  		}

> @@ -1049,9 +1048,8 @@ static int ov2740_nvmem_read(void *priv, unsigned int off, void *val,

>  		goto exit;

>  	}

>  

> -	ret = pm_runtime_get_sync(dev);

> +	ret = pm_runtime_resume_and_get(dev);

>  	if (ret < 0) {


Drop the brackets?

> -		pm_runtime_put_noidle(dev);

>  		goto exit;

>  	}

>
Jonathan Cameron April 30, 2021, 5:35 p.m. UTC | #15
On Wed, 28 Apr 2021 16:52:19 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> added pm_runtime_resume_and_get() in order to automatically handle

> dev->power.usage_count decrement on errors.

> 

> While the hva driver does it right, there are lots of errors

> on other drivers due to its misusage. So, let's change

> this driver to also use pm_runtime_resume_and_get(), as we're

> doing similar changes all over the media subsystem.

> 

> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

Bit of a fiddly one, but looks right to me.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>


> ---

>  drivers/media/platform/sti/hva/hva-hw.c | 17 +++++++++--------

>  1 file changed, 9 insertions(+), 8 deletions(-)

> 

> diff --git a/drivers/media/platform/sti/hva/hva-hw.c b/drivers/media/platform/sti/hva/hva-hw.c

> index f59811e27f51..77b8bfa5e0c5 100644

> --- a/drivers/media/platform/sti/hva/hva-hw.c

> +++ b/drivers/media/platform/sti/hva/hva-hw.c

> @@ -270,9 +270,8 @@ static unsigned long int hva_hw_get_ip_version(struct hva_dev *hva)

>  	struct device *dev = hva_to_dev(hva);

>  	unsigned long int version;

>  

> -	if (pm_runtime_get_sync(dev) < 0) {

> +	if (pm_runtime_resume_and_get(dev) < 0) {

>  		dev_err(dev, "%s     failed to get pm_runtime\n", HVA_PREFIX);

> -		pm_runtime_put_noidle(dev);

>  		mutex_unlock(&hva->protect_mutex);

>  		return -EFAULT;

>  	}

> @@ -386,10 +385,10 @@ int hva_hw_probe(struct platform_device *pdev, struct hva_dev *hva)

>  	pm_runtime_set_suspended(dev);

>  	pm_runtime_enable(dev);

>  

> -	ret = pm_runtime_get_sync(dev);

> +	ret = pm_runtime_resume_and_get(dev);

>  	if (ret < 0) {

>  		dev_err(dev, "%s     failed to set PM\n", HVA_PREFIX);

> -		goto err_pm;

> +		goto err_clk;

>  	}

>  

>  	/* check IP hardware version */

> @@ -462,6 +461,7 @@ int hva_hw_execute_task(struct hva_ctx *ctx, enum hva_hw_cmd_type cmd,

>  	u8 client_id = ctx->id;

>  	int ret;

>  	u32 reg = 0;

> +	bool got_pm = false;

>  

>  	mutex_lock(&hva->protect_mutex);

>  

> @@ -469,12 +469,13 @@ int hva_hw_execute_task(struct hva_ctx *ctx, enum hva_hw_cmd_type cmd,

>  	enable_irq(hva->irq_its);

>  	enable_irq(hva->irq_err);

>  

> -	if (pm_runtime_get_sync(dev) < 0) {

> +	if (pm_runtime_resume_and_get(dev) < 0) {

>  		dev_err(dev, "%s     failed to get pm_runtime\n", ctx->name);

>  		ctx->sys_errors++;

>  		ret = -EFAULT;

>  		goto out;

>  	}

> +	got_pm = true;

>  

>  	reg = readl_relaxed(hva->regs + HVA_HIF_REG_CLK_GATING);

>  	switch (cmd) {

> @@ -537,7 +538,8 @@ int hva_hw_execute_task(struct hva_ctx *ctx, enum hva_hw_cmd_type cmd,

>  		dev_dbg(dev, "%s     unknown command 0x%x\n", ctx->name, cmd);

>  	}

>  

> -	pm_runtime_put_autosuspend(dev);

> +	if (got_pm)

> +		pm_runtime_put_autosuspend(dev);

>  	mutex_unlock(&hva->protect_mutex);

>  

>  	return ret;

> @@ -553,9 +555,8 @@ void hva_hw_dump_regs(struct hva_dev *hva, struct seq_file *s)

>  

>  	mutex_lock(&hva->protect_mutex);

>  

> -	if (pm_runtime_get_sync(dev) < 0) {

> +	if (pm_runtime_resume_and_get(dev) < 0) {

>  		seq_puts(s, "Cannot wake up IP\n");

> -		pm_runtime_put_noidle(dev);

>  		mutex_unlock(&hva->protect_mutex);

>  		return;

>  	}
Jonathan Cameron April 30, 2021, 5:49 p.m. UTC | #16
On Wed, 28 Apr 2021 16:52:23 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
> added pm_runtime_resume_and_get() in order to automatically handle
> dev->power.usage_count decrement on errors.
> 
> Use the new API, in order to cleanup the error check logic.
> 
> Reviewed-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  drivers/media/platform/exynos4-is/fimc-capture.c   | 6 ++----
>  drivers/media/platform/exynos4-is/fimc-is.c        | 4 ++--
>  drivers/media/platform/exynos4-is/fimc-isp-video.c | 3 +--
>  drivers/media/platform/exynos4-is/fimc-isp.c       | 7 +++----
>  drivers/media/platform/exynos4-is/fimc-lite.c      | 5 +++--
>  drivers/media/platform/exynos4-is/fimc-m2m.c       | 2 +-
>  drivers/media/platform/exynos4-is/media-dev.c      | 8 +++-----
>  drivers/media/platform/exynos4-is/mipi-csis.c      | 8 +++-----
>  8 files changed, 18 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/media/platform/exynos4-is/fimc-capture.c b/drivers/media/platform/exynos4-is/fimc-capture.c
> index 13c838d3f947..0da36443173c 100644
> --- a/drivers/media/platform/exynos4-is/fimc-capture.c
> +++ b/drivers/media/platform/exynos4-is/fimc-capture.c
> @@ -478,11 +478,9 @@ static int fimc_capture_open(struct file *file)
>  		goto unlock;
>  
>  	set_bit(ST_CAPT_BUSY, &fimc->state);
> -	ret = pm_runtime_get_sync(&fimc->pdev->dev);
> -	if (ret < 0) {
> -		pm_runtime_put_sync(&fimc->pdev->dev);
> +	ret = pm_runtime_resume_and_get(&fimc->pdev->dev);
> +	if (ret < 0)
>  		goto unlock;
> -	}
>  
>  	ret = v4l2_fh_open(file);
>  	if (ret) {
> diff --git a/drivers/media/platform/exynos4-is/fimc-is.c b/drivers/media/platform/exynos4-is/fimc-is.c
> index 972d9601d236..1b24f5bfc4af 100644
> --- a/drivers/media/platform/exynos4-is/fimc-is.c
> +++ b/drivers/media/platform/exynos4-is/fimc-is.c
> @@ -828,9 +828,9 @@ static int fimc_is_probe(struct platform_device *pdev)
>  			goto err_irq;
>  	}
>  
> -	ret = pm_runtime_get_sync(dev);
> +	ret = pm_runtime_resume_and_get(dev);
>  	if (ret < 0)
> -		goto err_pm;
> +		goto err_irq;
>  

I think the pm_runtime_put_noidle() below err_pm: should become
a pm_runtime_put() because otherwise we won't call fimc_is_runtime_suspend()
if runtime pm is enabled (either directly or via runtime pm count becoming zero.

(not I'm not 100% sure of my argument here...)

>  	vb2_dma_contig_set_max_seg_size(dev, DMA_BIT_MASK(32));
>  
> diff --git a/drivers/media/platform/exynos4-is/fimc-isp-video.c b/drivers/media/platform/exynos4-is/fimc-isp-video.c
> index 612b9872afc8..8d9dc597deaa 100644
> --- a/drivers/media/platform/exynos4-is/fimc-isp-video.c
> +++ b/drivers/media/platform/exynos4-is/fimc-isp-video.c
> @@ -275,7 +275,7 @@ static int isp_video_open(struct file *file)
>  	if (ret < 0)
>  		goto unlock;
>  
> -	ret = pm_runtime_get_sync(&isp->pdev->dev);
> +	ret = pm_runtime_resume_and_get(&isp->pdev->dev);
>  	if (ret < 0)
>  		goto rel_fh;
>  
> @@ -293,7 +293,6 @@ static int isp_video_open(struct file *file)
>  	if (!ret)
>  		goto unlock;

So the good path is to jump over the block.  I'd just unlock and return here
to make it more readable but unrelated to this patch.

>  rel_fh:
> -	pm_runtime_put_noidle(&isp->pdev->dev);

Logic flow here isn't very standard, but with this change, you'll not
call anything to unwind a successful call to pm_runtime_resume_and_get if
fmic_pipeline_call() returns and error.
My guess is pm_runtime_put_sync() is needed before rel_fh; to balance this.
 
>  	v4l2_fh_release(file);
>  unlock:
>  	mutex_unlock(&isp->video_lock);
> diff --git a/drivers/media/platform/exynos4-is/fimc-isp.c b/drivers/media/platform/exynos4-is/fimc-isp.c
> index a77c49b18511..74b49d30901e 100644
> --- a/drivers/media/platform/exynos4-is/fimc-isp.c
> +++ b/drivers/media/platform/exynos4-is/fimc-isp.c
> @@ -304,11 +304,10 @@ static int fimc_isp_subdev_s_power(struct v4l2_subdev *sd, int on)
>  	pr_debug("on: %d\n", on);
>  
>  	if (on) {
> -		ret = pm_runtime_get_sync(&is->pdev->dev);
> -		if (ret < 0) {
> -			pm_runtime_put(&is->pdev->dev);
> +		ret = pm_runtime_resume_and_get(&is->pdev->dev);
> +		if (ret < 0)
>  			return ret;
> -		}
> +
>  		set_bit(IS_ST_PWR_ON, &is->state);
>  
>  		ret = fimc_is_start_firmware(is);
> diff --git a/drivers/media/platform/exynos4-is/fimc-lite.c b/drivers/media/platform/exynos4-is/fimc-lite.c
> index fe20af3a7178..4d8b18078ff3 100644
> --- a/drivers/media/platform/exynos4-is/fimc-lite.c
> +++ b/drivers/media/platform/exynos4-is/fimc-lite.c
> @@ -469,9 +469,9 @@ static int fimc_lite_open(struct file *file)
>  	}
>  
>  	set_bit(ST_FLITE_IN_USE, &fimc->state);
> -	ret = pm_runtime_get_sync(&fimc->pdev->dev);
> +	ret = pm_runtime_resume_and_get(&fimc->pdev->dev);
>  	if (ret < 0)
> -		goto err_pm;
> +		goto err_in_use;
>  
>  	ret = v4l2_fh_open(file);
>  	if (ret < 0)
> @@ -499,6 +499,7 @@ static int fimc_lite_open(struct file *file)
>  	v4l2_fh_release(file);
>  err_pm:
>  	pm_runtime_put_sync(&fimc->pdev->dev);
> +err_in_use:
>  	clear_bit(ST_FLITE_IN_USE, &fimc->state);
>  unlock:
>  	mutex_unlock(&fimc->lock);
> diff --git a/drivers/media/platform/exynos4-is/fimc-m2m.c b/drivers/media/platform/exynos4-is/fimc-m2m.c
> index c9704a147e5c..7c1eb05c508f 100644
> --- a/drivers/media/platform/exynos4-is/fimc-m2m.c
> +++ b/drivers/media/platform/exynos4-is/fimc-m2m.c
> @@ -75,7 +75,7 @@ static int start_streaming(struct vb2_queue *q, unsigned int count)
>  	struct fimc_ctx *ctx = q->drv_priv;
>  	int ret;
>  
> -	ret = pm_runtime_get_sync(&ctx->fimc_dev->pdev->dev);
> +	ret = pm_runtime_resume_and_get(&ctx->fimc_dev->pdev->dev);
Use the fact pm_runtime_resume_and_get only returns 0 or < 0 to
make this
return pm_runtime_resume_and_get()

>  	return ret > 0 ? 0 : ret;
>  }
>  
> diff --git a/drivers/media/platform/exynos4-is/media-dev.c b/drivers/media/platform/exynos4-is/media-dev.c
> index 13d192ba4aa6..9346d44a06c2 100644
> --- a/drivers/media/platform/exynos4-is/media-dev.c
> +++ b/drivers/media/platform/exynos4-is/media-dev.c
> @@ -512,11 +512,9 @@ static int fimc_md_register_sensor_entities(struct fimc_md *fmd)
>  	if (!fmd->pmf)
>  		return -ENXIO;
>  
> -	ret = pm_runtime_get_sync(fmd->pmf);
> -	if (ret < 0) {
> -		pm_runtime_put(fmd->pmf);
> +	ret = pm_runtime_resume_and_get(fmd->pmf);
> +	if (ret < 0)
>  		return ret;
> -	}
>  
>  	fmd->num_sensors = 0;
>  
> @@ -1291,7 +1289,7 @@ static int cam_clk_prepare(struct clk_hw *hw)
>  	if (camclk->fmd->pmf == NULL)
>  		return -ENODEV;
>  
> -	ret = pm_runtime_get_sync(camclk->fmd->pmf);
> +	ret = pm_runtime_resume_and_get(camclk->fmd->pmf);
return pm_...

>  	return ret < 0 ? ret : 0;
>  }
>  
> diff --git a/drivers/media/platform/exynos4-is/mipi-csis.c b/drivers/media/platform/exynos4-is/mipi-csis.c
> index 1aac167abb17..2a6afb78a049 100644
> --- a/drivers/media/platform/exynos4-is/mipi-csis.c
> +++ b/drivers/media/platform/exynos4-is/mipi-csis.c
> @@ -494,7 +494,7 @@ static int s5pcsis_s_power(struct v4l2_subdev *sd, int on)
>  	struct device *dev = &state->pdev->dev;
>  
>  	if (on)
> -		return pm_runtime_get_sync(dev);
> +		return pm_runtime_resume_and_get(dev);
>  
>  	return pm_runtime_put_sync(dev);
>  }
> @@ -509,11 +509,9 @@ static int s5pcsis_s_stream(struct v4l2_subdev *sd, int enable)
>  
>  	if (enable) {
>  		s5pcsis_clear_counters(state);
> -		ret = pm_runtime_get_sync(&state->pdev->dev);
> -		if (ret && ret != 1) {
> -			pm_runtime_put_noidle(&state->pdev->dev);
> +		ret = pm_runtime_resume_and_get(&state->pdev->dev);
> +		if (ret && ret != 1)

It won't be equal to 1 as pm_runtime_resume_and_get only returns <= 0

>  			return ret;
> -		}
>  	}
>  
>  	mutex_lock(&state->lock);
Jonathan Cameron April 30, 2021, 5:53 p.m. UTC | #17
On Wed, 28 Apr 2021 16:52:28 +0200
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")
> added pm_runtime_resume_and_get() in order to automatically handle
> dev->power.usage_count decrement on errors.
> 
> Use the new API, in order to cleanup the error check logic.
> 
> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
> ---
>  drivers/media/platform/qcom/venus/vdec.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/vdec.c b/drivers/media/platform/qcom/venus/vdec.c
> index ddb7cd39424e..347e533ea673 100644
> --- a/drivers/media/platform/qcom/venus/vdec.c
> +++ b/drivers/media/platform/qcom/venus/vdec.c
> @@ -568,7 +568,7 @@ static int vdec_pm_get(struct venus_inst *inst)
>  	int ret;
>  
>  	mutex_lock(&core->pm_lock);
> -	ret = pm_runtime_get_sync(dev);
> +	ret = pm_runtime_resume_and_get(dev);
>  	mutex_unlock(&core->pm_lock);
>  
>  	return ret < 0 ? ret : 0;
Don't need this dance any more due to " Return 0 if the runtime PM usage counter of @dev has been
 * incremented or a negative error code otherwise."
return ret;

> @@ -601,7 +601,7 @@ static int vdec_pm_get_put(struct venus_inst *inst)
>  	mutex_lock(&core->pm_lock);
>  
>  	if (pm_runtime_suspended(dev)) {
> -		ret = pm_runtime_get_sync(dev);
> +		ret = pm_runtime_resume_and_get(dev);
>  		if (ret < 0)
>  			goto error;
>
Johan Hovold May 3, 2021, 9:57 a.m. UTC | #18
On Fri, Apr 30, 2021 at 06:03:38PM +0100, Jonathan Cameron wrote:
> On Wed, 28 Apr 2021 16:51:44 +0200

> Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> 

> > Commit dd8088d5a896 ("PM: runtime: Add pm_runtime_resume_and_get to deal with usage counter")

> > added pm_runtime_resume_and_get() in order to automatically handle

> > dev->power.usage_count decrement on errors.

> > 

> > Use the new API, in order to cleanup the error check logic.

> > 

> > Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>

> Could add that pm_runtime_put() should have been pm_runtime_put_noidle()

> inorder to not potentially result in a call to runtime suspend when

> we never resumed in the first place.


No, that would never happen anyway and any pm_runtime_put() will do
even if pm_runtime_put_noidle() is the natural choice in this case to
balance the counter.

> Otherwise reasonable cleanup.

> 

> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> 

> > ---

> >  drivers/staging/media/ipu3/ipu3.c | 3 +--

> >  1 file changed, 1 insertion(+), 2 deletions(-)

> > 

> > diff --git a/drivers/staging/media/ipu3/ipu3.c b/drivers/staging/media/ipu3/ipu3.c

> > index ee1bba6bdcac..8e1e9e46e604 100644

> > --- a/drivers/staging/media/ipu3/ipu3.c

> > +++ b/drivers/staging/media/ipu3/ipu3.c

> > @@ -392,10 +392,9 @@ int imgu_s_stream(struct imgu_device *imgu, int enable)

> >  	}

> >  

> >  	/* Set Power */

> > -	r = pm_runtime_get_sync(dev);

> > +	r = pm_runtime_resume_and_get(dev);

> >  	if (r < 0) {

> >  		dev_err(dev, "failed to set imgu power\n");

> > -		pm_runtime_put(dev);

> >  		return r;

> >  	}


Johan