[v3,0/9] drm/msm+PM+icc: Make job_run() reclaim-safe

Message ID 20230803220202.78036-1-robdclark@gmail.com

Message

Rob Clark Aug. 3, 2023, 10:01 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Inspired by https://lore.kernel.org/dri-devel/20200604081224.863494-10-daniel.vetter@ffwll.ch/,
it seemed like a good idea to get rid of memory allocation in the job_run()
fence-signaling path, and use lockdep annotations to yell at us about
anything that could deadlock against shrinker/reclaim.  Anything that
can trigger reclaim, or block on any other thread that has triggered
reclaim, can block the GPU shrinker from releasing memory if it is
waiting for the job to complete, causing deadlock.

The first two patches decouple allocation from devfreq->lock, and teach
lockdep that devfreq->lock can be acquired in paths that the shrinker
indirectly depends on.
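
As a rough illustration of the "teach lockdep" part, the usual idiom is to
prime the ordering once from a safe context, along these lines (a generic
sketch of the pattern, not necessarily the exact hunk in the patch):

    /*
     * Pretend we are in reclaim, and tell lockdep that devfreq->lock
     * may be acquired in that context, so the fs_reclaim ->
     * devfreq->lock ordering is recorded up front:
     */
    fs_reclaim_acquire(GFP_KERNEL);
    might_lock(&devfreq->lock);
    fs_reclaim_release(GFP_KERNEL);

With that dependency recorded, anyone who later allocates with GFP_KERNEL
while holding devfreq->lock gets a lockdep splat instead of a rare runtime
deadlock.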

The next three patches do the same for PM QoS.  And the next two do a
similar thing for interconnect.

And then finally the last two patches enable the lockdep fence-
signalling annotations.
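
For reference, what the annotation amounts to is wrapping the run_job()
callback in the existing dma_fence begin/end signalling helpers, roughly
like this (sketch only; the exact placement, and how it is made optional,
are what the drm/sched patch adds):

    struct dma_fence *fence;
    bool fence_cookie;

    /*
     * Everything between begin/end is treated as the fence signalling
     * critical path, so lockdep complains if run_job() can trigger
     * reclaim or block on anything that does:
     */
    fence_cookie = dma_fence_begin_signalling();
    fence = sched->ops->run_job(sched_job);
    dma_fence_end_signalling(fence_cookie);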


v2: Switch from embedding hw_fence in submit/job object to preallocating
    the hw_fence.  Rework "fenced unpin" locking to drop obj lock from
    fence signaling path (ie. the part that was still WIP in the first
    iteration of the patchset).  Adds the final patch to enable fence
    signaling annotations now that job_run() and job_free() are safe.
    The PM devfreq/QoS and interconnect patches are unchanged.

v3: Mostly unchanged, but series is much smaller now that drm changes
    have landed, mostly consisting of the remaining devfreq/qos/
    interconnect fixes.

Rob Clark (9):
  PM / devfreq: Drop unneed locking to appease lockdep
  PM / devfreq: Teach lockdep about locking order
  PM / QoS: Fix constraints alloc vs reclaim locking
  PM / QoS: Decouple request alloc from dev_pm_qos_mtx
  PM / QoS: Teach lockdep about dev_pm_qos_mtx locking order
  interconnect: Fix locking for runpm vs reclaim
  interconnect: Teach lockdep about icc_bw_lock order
  drm/sched: Add (optional) fence signaling annotation
  drm/msm: Enable fence signalling annotations

 drivers/base/power/qos.c               | 85 +++++++++++++++++++-------
 drivers/devfreq/devfreq.c              | 52 ++++++++--------
 drivers/gpu/drm/msm/msm_ringbuffer.c   |  1 +
 drivers/gpu/drm/scheduler/sched_main.c |  9 +++
 drivers/interconnect/core.c            | 18 +++++-
 include/drm/gpu_scheduler.h            |  2 +
 6 files changed, 117 insertions(+), 50 deletions(-)

Comments

Rafael J. Wysocki Aug. 4, 2023, 5:07 p.m. UTC | #1
On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> In the process of adding lockdep annotation for drm GPU scheduler's
> job_run() to detect potential deadlock against shrinker/reclaim, I hit
> this lockdep splat:
>
>    ======================================================
>    WARNING: possible circular locking dependency detected
>    6.2.0-rc8-debug+ #558 Tainted: G        W
>    ------------------------------------------------------
>    ring0/125 is trying to acquire lock:
>    ffffffd6d6ce0f28 (dev_pm_qos_mtx){+.+.}-{3:3}, at: dev_pm_qos_update_request+0x38/0x68
>
>    but task is already holding lock:
>    ffffff8087239208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
>
>    which lock already depends on the new lock.
>
>    the existing dependency chain (in reverse order) is:
>
>    -> #4 (&gpu->active_lock){+.+.}-{3:3}:
>           __mutex_lock+0xcc/0x3c8
>           mutex_lock_nested+0x30/0x44
>           msm_gpu_submit+0xec/0x178
>           msm_job_run+0x78/0x150
>           drm_sched_main+0x290/0x370
>           kthread+0xf0/0x100
>           ret_from_fork+0x10/0x20
>
>    -> #3 (dma_fence_map){++++}-{0:0}:
>           __dma_fence_might_wait+0x74/0xc0
>           dma_resv_lockdep+0x1f4/0x2f4
>           do_one_initcall+0x104/0x2bc
>           kernel_init_freeable+0x344/0x34c
>           kernel_init+0x30/0x134
>           ret_from_fork+0x10/0x20
>
>    -> #2 (mmu_notifier_invalidate_range_start){+.+.}-{0:0}:
>           fs_reclaim_acquire+0x80/0xa8
>           slab_pre_alloc_hook.constprop.0+0x40/0x25c
>           __kmem_cache_alloc_node+0x60/0x1cc
>           __kmalloc+0xd8/0x100
>           topology_parse_cpu_capacity+0x8c/0x178
>           get_cpu_for_node+0x88/0xc4
>           parse_cluster+0x1b0/0x28c
>           parse_cluster+0x8c/0x28c
>           init_cpu_topology+0x168/0x188
>           smp_prepare_cpus+0x24/0xf8
>           kernel_init_freeable+0x18c/0x34c
>           kernel_init+0x30/0x134
>           ret_from_fork+0x10/0x20
>
>    -> #1 (fs_reclaim){+.+.}-{0:0}:
>           __fs_reclaim_acquire+0x3c/0x48
>           fs_reclaim_acquire+0x54/0xa8
>           slab_pre_alloc_hook.constprop.0+0x40/0x25c
>           __kmem_cache_alloc_node+0x60/0x1cc
>           kmalloc_trace+0x50/0xa8
>           dev_pm_qos_constraints_allocate+0x38/0x100
>           __dev_pm_qos_add_request+0xb0/0x1e8
>           dev_pm_qos_add_request+0x58/0x80
>           dev_pm_qos_expose_latency_limit+0x60/0x13c
>           register_cpu+0x12c/0x130
>           topology_init+0xac/0xbc
>           do_one_initcall+0x104/0x2bc
>           kernel_init_freeable+0x344/0x34c
>           kernel_init+0x30/0x134
>           ret_from_fork+0x10/0x20
>
>    -> #0 (dev_pm_qos_mtx){+.+.}-{3:3}:
>           __lock_acquire+0xe00/0x1060
>           lock_acquire+0x1e0/0x2f8
>           __mutex_lock+0xcc/0x3c8
>           mutex_lock_nested+0x30/0x44
>           dev_pm_qos_update_request+0x38/0x68
>           msm_devfreq_boost+0x40/0x70
>           msm_devfreq_active+0xc0/0xf0
>           msm_gpu_submit+0x10c/0x178
>           msm_job_run+0x78/0x150
>           drm_sched_main+0x290/0x370
>           kthread+0xf0/0x100
>           ret_from_fork+0x10/0x20
>
>    other info that might help us debug this:
>
>    Chain exists of:
>      dev_pm_qos_mtx --> dma_fence_map --> &gpu->active_lock
>
>     Possible unsafe locking scenario:
>
>           CPU0                    CPU1
>           ----                    ----
>      lock(&gpu->active_lock);
>                                   lock(dma_fence_map);
>                                   lock(&gpu->active_lock);
>      lock(dev_pm_qos_mtx);
>
>     *** DEADLOCK ***
>
>    3 locks held by ring0/123:
>     #0: ffffff8087251170 (&gpu->lock){+.+.}-{3:3}, at: msm_job_run+0x64/0x150
>     #1: ffffffd00b0e57e8 (dma_fence_map){++++}-{0:0}, at: msm_job_run+0x68/0x150
>     #2: ffffff8087251208 (&gpu->active_lock){+.+.}-{3:3}, at: msm_gpu_submit+0xec/0x178
>
>    stack backtrace:
>    CPU: 6 PID: 123 Comm: ring0 Not tainted 6.2.0-rc8-debug+ #559
>    Hardware name: Google Lazor (rev1 - 2) with LTE (DT)
>    Call trace:
>     dump_backtrace.part.0+0xb4/0xf8
>     show_stack+0x20/0x38
>     dump_stack_lvl+0x9c/0xd0
>     dump_stack+0x18/0x34
>     print_circular_bug+0x1b4/0x1f0
>     check_noncircular+0x78/0xac
>     __lock_acquire+0xe00/0x1060
>     lock_acquire+0x1e0/0x2f8
>     __mutex_lock+0xcc/0x3c8
>     mutex_lock_nested+0x30/0x44
>     dev_pm_qos_update_request+0x38/0x68
>     msm_devfreq_boost+0x40/0x70
>     msm_devfreq_active+0xc0/0xf0
>     msm_gpu_submit+0x10c/0x178
>     msm_job_run+0x78/0x150
>     drm_sched_main+0x290/0x370
>     kthread+0xf0/0x100
>     ret_from_fork+0x10/0x20
>
> The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> freq change) path, but it is also held across allocations that could
> recurse into shrinker.
>
> Solve this by changing dev_pm_qos_constraints_allocate() into a function
> that can be called unconditionally before the device qos object is
> needed and before acquiring dev_pm_qos_mtx.  This way the allocations can
> be done without holding the mutex.  In the case that we raced with
> another thread to allocate the qos object, detect this *after* acquiring
> the dev_pm_qos_mtx and simply free the redundant allocations.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
>  1 file changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> index 8e93167f1783..f3e0c6b65635 100644
> --- a/drivers/base/power/qos.c
> +++ b/drivers/base/power/qos.c
> @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
>  }
>
>  /*
> - * dev_pm_qos_constraints_allocate
> + * dev_pm_qos_constraints_ensure_allocated
>   * @dev: device to allocate data for
>   *
> - * Called at the first call to add_request, for constraint data allocation
> - * Must be called with the dev_pm_qos_mtx mutex held
> + * Called to ensure that devices qos is allocated, before acquiring
> + * dev_pm_qos_mtx.
>   */
> -static int dev_pm_qos_constraints_allocate(struct device *dev)
> +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
>  {
>         struct dev_pm_qos *qos;
>         struct pm_qos_constraints *c;
>         struct blocking_notifier_head *n;
>
> +       if (!dev)
> +               return -ENODEV;
> +
> +       if (!IS_ERR_OR_NULL(dev->power.qos))
> +               return 0;
> +
>         qos = kzalloc(sizeof(*qos), GFP_KERNEL);
>         if (!qos)
>                 return -ENOMEM;
> @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
>
>         INIT_LIST_HEAD(&qos->flags.list);
>
> +       mutex_lock(&dev_pm_qos_mtx);
> +
> +       if (!IS_ERR_OR_NULL(dev->power.qos)) {
> +               /*
> +                * We have raced with another task to create the qos.
> +                * No biggie, just free the resources we've allocated
> +                * outside of dev_pm_qos_mtx and move on with life.
> +                */
> +               kfree(n);
> +               kfree(qos);
> +               goto unlock;
> +       }
> +
>         spin_lock_irq(&dev->power.lock);
>         dev->power.qos = qos;
>         spin_unlock_irq(&dev->power.lock);
>
> +unlock:
> +       mutex_unlock(&dev_pm_qos_mtx);
> +
>         return 0;
>  }
>
> @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
>  {
>         int ret = 0;
>
> -       if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> +       if (!req || dev_pm_qos_invalid_req_type(dev, type))
>                 return -EINVAL;
>
>         if (WARN(dev_pm_qos_request_active(req),
>                  "%s() called for already added request\n", __func__))
>                 return -EINVAL;
>
> -       if (IS_ERR(dev->power.qos))
> +       if (IS_ERR_OR_NULL(dev->power.qos))
>                 ret = -ENODEV;
> -       else if (!dev->power.qos)
> -               ret = dev_pm_qos_constraints_allocate(dev);
>
>         trace_dev_pm_qos_add_request(dev_name(dev), type, value);
>         if (ret)
> @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
>  {
>         int ret;
>
> +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> +       if (ret)
> +               return ret;
> +

It is a bit unfortunate that the mutex is dropped and then immediately
re-acquired again.  I don't think that this is strictly necessary.

>         mutex_lock(&dev_pm_qos_mtx);
>         ret = __dev_pm_qos_add_request(dev, req, type, value);
>         mutex_unlock(&dev_pm_qos_mtx);
> @@ -537,15 +561,11 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
>  {
>         int ret = 0;
>
> -       mutex_lock(&dev_pm_qos_mtx);
> -
> -       if (IS_ERR(dev->power.qos))
> -               ret = -ENODEV;
> -       else if (!dev->power.qos)
> -               ret = dev_pm_qos_constraints_allocate(dev);
> -
> +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
>         if (ret)
> -               goto unlock;
> +               return ret;
> +
> +       mutex_lock(&dev_pm_qos_mtx);
>
>         switch (type) {
>         case DEV_PM_QOS_RESUME_LATENCY:
> @@ -565,7 +585,6 @@ int dev_pm_qos_add_notifier(struct device *dev, struct notifier_block *notifier,
>                 ret = -EINVAL;
>         }
>
> -unlock:
>         mutex_unlock(&dev_pm_qos_mtx);
>         return ret;
>  }
> @@ -905,10 +924,13 @@ int dev_pm_qos_update_user_latency_tolerance(struct device *dev, s32 val)
>  {
>         int ret;
>
> +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> +       if (ret)
> +               return ret;
> +
>         mutex_lock(&dev_pm_qos_mtx);
>
> -       if (IS_ERR_OR_NULL(dev->power.qos)
> -           || !dev->power.qos->latency_tolerance_req) {
> +       if (!dev->power.qos->latency_tolerance_req) {
>                 struct dev_pm_qos_request *req;
>
>                 if (val < 0) {
> --
> 2.41.0
>
Rob Clark Aug. 4, 2023, 6:38 p.m. UTC | #2
On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > From: Rob Clark <robdclark@chromium.org>
> >
> > In the process of adding lockdep annotation for drm GPU scheduler's
> > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > this lockdep splat:
> >
> >    [lockdep splat snipped]
> >
> > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > freq change) path, but it is also held across allocations that could
> > recurse into shrinker.
> >
> > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > that can be called unconditionally before the device qos object is
> > needed and before aquiring dev_pm_qos_mtx.  This way the allocations can
> > be done without holding the mutex.  In the case that we raced with
> > another thread to allocate the qos object, detect this *after* acquiring
> > the dev_pm_qos_mtx and simply free the redundant allocations.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> >  1 file changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > index 8e93167f1783..f3e0c6b65635 100644
> > --- a/drivers/base/power/qos.c
> > +++ b/drivers/base/power/qos.c
> > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> >  }
> >
> >  /*
> > - * dev_pm_qos_constraints_allocate
> > + * dev_pm_qos_constraints_ensure_allocated
> >   * @dev: device to allocate data for
> >   *
> > - * Called at the first call to add_request, for constraint data allocation
> > - * Must be called with the dev_pm_qos_mtx mutex held
> > + * Called to ensure that devices qos is allocated, before acquiring
> > + * dev_pm_qos_mtx.
> >   */
> > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> >  {
> >         struct dev_pm_qos *qos;
> >         struct pm_qos_constraints *c;
> >         struct blocking_notifier_head *n;
> >
> > +       if (!dev)
> > +               return -ENODEV;
> > +
> > +       if (!IS_ERR_OR_NULL(dev->power.qos))
> > +               return 0;
> > +
> >         qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> >         if (!qos)
> >                 return -ENOMEM;
> > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> >
> >         INIT_LIST_HEAD(&qos->flags.list);
> >
> > +       mutex_lock(&dev_pm_qos_mtx);
> > +
> > +       if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > +               /*
> > +                * We have raced with another task to create the qos.
> > +                * No biggie, just free the resources we've allocated
> > +                * outside of dev_pm_qos_mtx and move on with life.
> > +                */
> > +               kfree(n);
> > +               kfree(qos);
> > +               goto unlock;
> > +       }
> > +
> >         spin_lock_irq(&dev->power.lock);
> >         dev->power.qos = qos;
> >         spin_unlock_irq(&dev->power.lock);
> >
> > +unlock:
> > +       mutex_unlock(&dev_pm_qos_mtx);
> > +
> >         return 0;
> >  }
> >
> > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> >  {
> >         int ret = 0;
> >
> > -       if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > +       if (!req || dev_pm_qos_invalid_req_type(dev, type))
> >                 return -EINVAL;
> >
> >         if (WARN(dev_pm_qos_request_active(req),
> >                  "%s() called for already added request\n", __func__))
> >                 return -EINVAL;
> >
> > -       if (IS_ERR(dev->power.qos))
> > +       if (IS_ERR_OR_NULL(dev->power.qos))
> >                 ret = -ENODEV;
> > -       else if (!dev->power.qos)
> > -               ret = dev_pm_qos_constraints_allocate(dev);
> >
> >         trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> >         if (ret)
> > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> >  {
> >         int ret;
> >
> > +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > +       if (ret)
> > +               return ret;
> > +
>
> It is a bit unfortunate that the mutex is dropped and then immediately
> re-acquired again.  I don't think that this is strictly necessary.

We could have dev_pm_qos_constraints_ensure_allocated() return with
the lock held in the success case if we had to.. but that seems a bit
funny looking.  And the dev_pm_qos_update_user_latency_tolerance()
path would need to shuffle slightly to move the kzalloc out of the
lock.

BR,
-R


Rafael J. Wysocki Aug. 4, 2023, 7:11 p.m. UTC | #3
On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > this lockdep splat:
> > >
> > >    [lockdep splat snipped]
> > >
> > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > freq change) path, but it is also held across allocations that could
> > > recurse into shrinker.
> > >
> > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > that can be called unconditionally before the device qos object is
> > > needed and before aquiring dev_pm_qos_mtx.  This way the allocations can
> > > be done without holding the mutex.  In the case that we raced with
> > > another thread to allocate the qos object, detect this *after* acquiring
> > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > ---
> > >  drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > >  1 file changed, 41 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > index 8e93167f1783..f3e0c6b65635 100644
> > > --- a/drivers/base/power/qos.c
> > > +++ b/drivers/base/power/qos.c
> > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > >  }
> > >
> > >  /*
> > > - * dev_pm_qos_constraints_allocate
> > > + * dev_pm_qos_constraints_ensure_allocated
> > >   * @dev: device to allocate data for
> > >   *
> > > - * Called at the first call to add_request, for constraint data allocation
> > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > + * Called to ensure that devices qos is allocated, before acquiring
> > > + * dev_pm_qos_mtx.
> > >   */
> > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > >  {
> > >         struct dev_pm_qos *qos;
> > >         struct pm_qos_constraints *c;
> > >         struct blocking_notifier_head *n;
> > >
> > > +       if (!dev)
> > > +               return -ENODEV;
> > > +
> > > +       if (!IS_ERR_OR_NULL(dev->power.qos))
> > > +               return 0;
> > > +
> > >         qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > >         if (!qos)
> > >                 return -ENOMEM;
> > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > >
> > >         INIT_LIST_HEAD(&qos->flags.list);
> > >
> > > +       mutex_lock(&dev_pm_qos_mtx);
> > > +
> > > +       if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > +               /*
> > > +                * We have raced with another task to create the qos.
> > > +                * No biggie, just free the resources we've allocated
> > > +                * outside of dev_pm_qos_mtx and move on with life.
> > > +                */
> > > +               kfree(n);
> > > +               kfree(qos);
> > > +               goto unlock;
> > > +       }
> > > +
> > >         spin_lock_irq(&dev->power.lock);
> > >         dev->power.qos = qos;
> > >         spin_unlock_irq(&dev->power.lock);
> > >
> > > +unlock:
> > > +       mutex_unlock(&dev_pm_qos_mtx);
> > > +
> > >         return 0;
> > >  }
> > >
> > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > >  {
> > >         int ret = 0;
> > >
> > > -       if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > +       if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > >                 return -EINVAL;
> > >
> > >         if (WARN(dev_pm_qos_request_active(req),
> > >                  "%s() called for already added request\n", __func__))
> > >                 return -EINVAL;
> > >
> > > -       if (IS_ERR(dev->power.qos))
> > > +       if (IS_ERR_OR_NULL(dev->power.qos))
> > >                 ret = -ENODEV;
> > > -       else if (!dev->power.qos)
> > > -               ret = dev_pm_qos_constraints_allocate(dev);
> > >
> > >         trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > >         if (ret)
> > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > >  {
> > >         int ret;
> > >
> > > +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > +       if (ret)
> > > +               return ret;
> > > +
> >
> > It is a bit unfortunate that the mutex is dropped and then immediately
> > re-acquired again.  I don't think that this is strictly necessary.
>
> We could have dev_pm_qos_constraints_ensure_allocated() return with
> the lock held in the success case if we had to.. but that seems a bit
> funny looking.  And the dev_pm_qos_update_user_latency_tolerance()
> path would need to shuffle slightly to move the kzalloc out of the
> lock.

Well, what about something like this (modulo whitespace damage by
GMail), attached for completeness:

---
 drivers/base/power/qos.c |   37 +++++++++++++++++++++++++------------
 1 file changed, 25 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/base/power/qos.c
===================================================================
--- linux-pm.orig/drivers/base/power/qos.c
+++ linux-pm/drivers/base/power/qos.c
@@ -186,26 +186,21 @@ static int apply_constraint(struct dev_p

 /*
  * dev_pm_qos_constraints_allocate
- * @dev: device to allocate data for
  *
  * Called at the first call to add_request, for constraint data allocation
  * Must be called with the dev_pm_qos_mtx mutex held
  */
-static int dev_pm_qos_constraints_allocate(struct device *dev)
+static struct dev_pm_qos *dev_pm_qos_constraints_allocate(void)
 {
     struct dev_pm_qos *qos;
     struct pm_qos_constraints *c;
     struct blocking_notifier_head *n;

-    qos = kzalloc(sizeof(*qos), GFP_KERNEL);
+    qos = kzalloc(sizeof(*qos) + 3 * sizeof(*n), GFP_KERNEL);
     if (!qos)
-        return -ENOMEM;
+        return NULL;

-    n = kzalloc(3 * sizeof(*n), GFP_KERNEL);
-    if (!n) {
-        kfree(qos);
-        return -ENOMEM;
-    }
+    n = (struct blocking_notifier_head *)(qos + 1);

     c = &qos->resume_latency;
     plist_head_init(&c->list);
@@ -227,6 +222,20 @@ static int dev_pm_qos_constraints_alloca

     INIT_LIST_HEAD(&qos->flags.list);

+    return qos;
+}
+
+static int dev_pm_qos_constraints_add(struct device *dev,
+                       struct dev_pm_qos *qos)
+{
+    if (!qos)
+        return -ENOMEM;
+
+    if (!IS_ERR_OR_NULL(dev->power.qos)) {
+        kfree(qos);
+        return -ENODEV;
+    }
+
     spin_lock_irq(&dev->power.lock);
     dev->power.qos = qos;
     spin_unlock_irq(&dev->power.lock);
@@ -326,6 +335,7 @@ static bool dev_pm_qos_invalid_req_type(
 }

 static int __dev_pm_qos_add_request(struct device *dev,
+                    struct dev_pm_qos *qos,
                     struct dev_pm_qos_request *req,
                     enum dev_pm_qos_req_type type, s32 value)
 {
@@ -340,8 +350,10 @@ static int __dev_pm_qos_add_request(stru

     if (IS_ERR(dev->power.qos))
         ret = -ENODEV;
-    else if (!dev->power.qos)
-        ret = dev_pm_qos_constraints_allocate(dev);
+    else if (dev->power.qos)
+        kfree(qos);
+    else
+        ret = dev_pm_qos_constraints_add(dev, qos);

     trace_dev_pm_qos_add_request(dev_name(dev), type, value);
     if (ret)
@@ -388,10 +400,11 @@ static int __dev_pm_qos_add_request(stru
 int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
                enum dev_pm_qos_req_type type, s32 value)
 {
+    struct dev_pm_qos *qos = dev_pm_qos_constraints_allocate();
     int ret;

     mutex_lock(&dev_pm_qos_mtx);
-    ret = __dev_pm_qos_add_request(dev, req, type, value);
+    ret = __dev_pm_qos_add_request(dev, qos, req, type, value);
     mutex_unlock(&dev_pm_qos_mtx);
     return ret;
 }
Rob Clark Aug. 4, 2023, 8:37 p.m. UTC | #4
On Fri, Aug 4, 2023 at 12:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <robdclark@gmail.com> wrote:
> > > >
> > > > From: Rob Clark <robdclark@chromium.org>
> > > >
> > > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > > this lockdep splat:
> > > >
> > > >    [lockdep splat snipped]
> > > >
> > > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > > freq change) path, but it is also held across allocations that could
> > > > recurse into shrinker.
> > > >
> > > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > > that can be called unconditionally before the device qos object is
> > > > needed and before aquiring dev_pm_qos_mtx.  This way the allocations can
> > > > be done without holding the mutex.  In the case that we raced with
> > > > another thread to allocate the qos object, detect this *after* acquiring
> > > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > > >
> > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > ---
> > > >  drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > > >  1 file changed, 41 insertions(+), 19 deletions(-)
> > > >
> > > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > > index 8e93167f1783..f3e0c6b65635 100644
> > > > --- a/drivers/base/power/qos.c
> > > > +++ b/drivers/base/power/qos.c
> > > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > > >  }
> > > >
> > > >  /*
> > > > - * dev_pm_qos_constraints_allocate
> > > > + * dev_pm_qos_constraints_ensure_allocated
> > > >   * @dev: device to allocate data for
> > > >   *
> > > > - * Called at the first call to add_request, for constraint data allocation
> > > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > > + * Called to ensure that devices qos is allocated, before acquiring
> > > > + * dev_pm_qos_mtx.
> > > >   */
> > > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > > >  {
> > > >         struct dev_pm_qos *qos;
> > > >         struct pm_qos_constraints *c;
> > > >         struct blocking_notifier_head *n;
> > > >
> > > > +       if (!dev)
> > > > +               return -ENODEV;
> > > > +
> > > > +       if (!IS_ERR_OR_NULL(dev->power.qos))
> > > > +               return 0;
> > > > +
> > > >         qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > > >         if (!qos)
> > > >                 return -ENOMEM;
> > > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > >
> > > >         INIT_LIST_HEAD(&qos->flags.list);
> > > >
> > > > +       mutex_lock(&dev_pm_qos_mtx);
> > > > +
> > > > +       if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > > +               /*
> > > > +                * We have raced with another task to create the qos.
> > > > +                * No biggie, just free the resources we've allocated
> > > > +                * outside of dev_pm_qos_mtx and move on with life.
> > > > +                */
> > > > +               kfree(n);
> > > > +               kfree(qos);
> > > > +               goto unlock;
> > > > +       }
> > > > +
> > > >         spin_lock_irq(&dev->power.lock);
> > > >         dev->power.qos = qos;
> > > >         spin_unlock_irq(&dev->power.lock);
> > > >
> > > > +unlock:
> > > > +       mutex_unlock(&dev_pm_qos_mtx);
> > > > +
> > > >         return 0;
> > > >  }
> > > >
> > > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > > >  {
> > > >         int ret = 0;
> > > >
> > > > -       if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > > +       if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > > >                 return -EINVAL;
> > > >
> > > >         if (WARN(dev_pm_qos_request_active(req),
> > > >                  "%s() called for already added request\n", __func__))
> > > >                 return -EINVAL;
> > > >
> > > > -       if (IS_ERR(dev->power.qos))
> > > > +       if (IS_ERR_OR_NULL(dev->power.qos))
> > > >                 ret = -ENODEV;
> > > > -       else if (!dev->power.qos)
> > > > -               ret = dev_pm_qos_constraints_allocate(dev);
> > > >
> > > >         trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > > >         if (ret)
> > > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > > >  {
> > > >         int ret;
> > > >
> > > > +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > > +       if (ret)
> > > > +               return ret;
> > > > +
> > >
> > > It is a bit unfortunate that the mutex is dropped and then immediately
> > > re-acquired again.  I don't think that this is strictly necessary.
> >
> > We could have dev_pm_qos_constraints_ensure_allocated() return with
> > the lock held in the success case if we had to.. but that seems a bit
> > funny looking.  And the dev_pm_qos_update_user_latency_tolerance()
> > path would need to shuffle slightly to move the kzalloc out of the
> > lock.
>
> Well, what about something like this (modulo whitespace damage by
> GMail), attached for completeness:
>

There is one other path to handle, and some small details, but I think
the approach could work.. let's see..

BR,
-R

Rafael J. Wysocki Aug. 4, 2023, 8:45 p.m. UTC | #5
On Fri, Aug 4, 2023 at 10:38 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Fri, Aug 4, 2023 at 12:11 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Aug 4, 2023 at 8:38 PM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Fri, Aug 4, 2023 at 10:07 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Fri, Aug 4, 2023 at 12:02 AM Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > > From: Rob Clark <robdclark@chromium.org>
> > > > >
> > > > > In the process of adding lockdep annotation for drm GPU scheduler's
> > > > > job_run() to detect potential deadlock against shrinker/reclaim, I hit
> > > > > this lockdep splat:
> > > > >
> > > > >    [lockdep splat snipped]
> > > > >
> > > > > The issue is that dev_pm_qos_mtx is held in the runpm suspend/resume (or
> > > > > freq change) path, but it is also held across allocations that could
> > > > > recurse into shrinker.
> > > > >
> > > > > Solve this by changing dev_pm_qos_constraints_allocate() into a function
> > > > > that can be called unconditionally before the device qos object is
> > > > > needed and before aquiring dev_pm_qos_mtx.  This way the allocations can
> > > > > be done without holding the mutex.  In the case that we raced with
> > > > > another thread to allocate the qos object, detect this *after* acquiring
> > > > > the dev_pm_qos_mtx and simply free the redundant allocations.
> > > > >
> > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > ---
> > > > >  drivers/base/power/qos.c | 60 +++++++++++++++++++++++++++-------------
> > > > >  1 file changed, 41 insertions(+), 19 deletions(-)
> > > > >
> > > > > diff --git a/drivers/base/power/qos.c b/drivers/base/power/qos.c
> > > > > index 8e93167f1783..f3e0c6b65635 100644
> > > > > --- a/drivers/base/power/qos.c
> > > > > +++ b/drivers/base/power/qos.c
> > > > > @@ -185,18 +185,24 @@ static int apply_constraint(struct dev_pm_qos_request *req,
> > > > >  }
> > > > >
> > > > >  /*
> > > > > - * dev_pm_qos_constraints_allocate
> > > > > + * dev_pm_qos_constraints_ensure_allocated
> > > > >   * @dev: device to allocate data for
> > > > >   *
> > > > > - * Called at the first call to add_request, for constraint data allocation
> > > > > - * Must be called with the dev_pm_qos_mtx mutex held
> > > > > + * Called to ensure that devices qos is allocated, before acquiring
> > > > > + * dev_pm_qos_mtx.
> > > > >   */
> > > > > -static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > > +static int dev_pm_qos_constraints_ensure_allocated(struct device *dev)
> > > > >  {
> > > > >         struct dev_pm_qos *qos;
> > > > >         struct pm_qos_constraints *c;
> > > > >         struct blocking_notifier_head *n;
> > > > >
> > > > > +       if (!dev)
> > > > > +               return -ENODEV;
> > > > > +
> > > > > +       if (!IS_ERR_OR_NULL(dev->power.qos))
> > > > > +               return 0;
> > > > > +
> > > > >         qos = kzalloc(sizeof(*qos), GFP_KERNEL);
> > > > >         if (!qos)
> > > > >                 return -ENOMEM;
> > > > > @@ -227,10 +233,26 @@ static int dev_pm_qos_constraints_allocate(struct device *dev)
> > > > >
> > > > >         INIT_LIST_HEAD(&qos->flags.list);
> > > > >
> > > > > +       mutex_lock(&dev_pm_qos_mtx);
> > > > > +
> > > > > +       if (!IS_ERR_OR_NULL(dev->power.qos)) {
> > > > > +               /*
> > > > > +                * We have raced with another task to create the qos.
> > > > > +                * No biggie, just free the resources we've allocated
> > > > > +                * outside of dev_pm_qos_mtx and move on with life.
> > > > > +                */
> > > > > +               kfree(n);
> > > > > +               kfree(qos);
> > > > > +               goto unlock;
> > > > > +       }
> > > > > +
> > > > >         spin_lock_irq(&dev->power.lock);
> > > > >         dev->power.qos = qos;
> > > > >         spin_unlock_irq(&dev->power.lock);
> > > > >
> > > > > +unlock:
> > > > > +       mutex_unlock(&dev_pm_qos_mtx);
> > > > > +
> > > > >         return 0;
> > > > >  }
> > > > >
> > > > > @@ -331,17 +353,15 @@ static int __dev_pm_qos_add_request(struct device *dev,
> > > > >  {
> > > > >         int ret = 0;
> > > > >
> > > > > -       if (!dev || !req || dev_pm_qos_invalid_req_type(dev, type))
> > > > > +       if (!req || dev_pm_qos_invalid_req_type(dev, type))
> > > > >                 return -EINVAL;
> > > > >
> > > > >         if (WARN(dev_pm_qos_request_active(req),
> > > > >                  "%s() called for already added request\n", __func__))
> > > > >                 return -EINVAL;
> > > > >
> > > > > -       if (IS_ERR(dev->power.qos))
> > > > > +       if (IS_ERR_OR_NULL(dev->power.qos))
> > > > >                 ret = -ENODEV;
> > > > > -       else if (!dev->power.qos)
> > > > > -               ret = dev_pm_qos_constraints_allocate(dev);
> > > > >
> > > > >         trace_dev_pm_qos_add_request(dev_name(dev), type, value);
> > > > >         if (ret)
> > > > > @@ -390,6 +410,10 @@ int dev_pm_qos_add_request(struct device *dev, struct dev_pm_qos_request *req,
> > > > >  {
> > > > >         int ret;
> > > > >
> > > > > +       ret = dev_pm_qos_constraints_ensure_allocated(dev);
> > > > > +       if (ret)
> > > > > +               return ret;
> > > > > +
> > > >
> > > > It is a bit unfortunate that the mutex is dropped and then immediately
> > > > re-acquired again.  I don't think that this is strictly necessary.
> > >
> > > We could have dev_pm_qos_constraints_ensure_allocated() return with
> > > the lock held in the success case if we had to.. but that seems a bit
> > > funny looking.  And the dev_pm_qos_update_user_latency_tolerance()
> > > path would need to shuffle slightly to move the kzalloc out of the
> > > lock.
> >
> > Well, what about something like this (modulo whitespace damage by
> > GMail), attached for completeness:
> >
>
> There is one other path to handle, and some small details,

Yes, this was just an illustration of the approach.

> but I think the approach could work.. let's see..

OK