drm/msm: Add fence->wait() op

Message ID	20210720150716.1213775-1-robdclark@gmail.com
State	New
Headers	show Return-Path: <linux-arm-msm-owner@kernel.org> From: Rob Clark <robdclark@gmail.com> To: dri-devel@lists.freedesktop.org Cc: Rob Clark <robdclark@chromium.org>, Rob Clark <robdclark@gmail.com>, Sean Paul <sean@poorly.run>, David Airlie <airlied@linux.ie>, Daniel Vetter <daniel@ffwll.ch>, Sumit Semwal <sumit.semwal@linaro.org>, =?utf-8?q?Christian_K=C3=B6ni?= =?utf-8?q?g?= <christian.koenig@amd.com>, linux-arm-msm@vger.kernel.org (open list:DRM DRIVER FOR MSM ADRENO GPU), freedreno@lists.freedesktop.org (open list:DRM DRIVER FOR MSM ADRENO GPU), linux-kernel@vger.kernel.org (open list), linux-media@vger.kernel.org (open list:DMA BUFFER SHARING FRAMEWORK), linaro-mm-sig@lists.linaro.org (moderated list:DMA BUFFER SHARING FRAMEWORK) Subject: [PATCH] drm/msm: Add fence->wait() op Date: Tue, 20 Jul 2021 08:07:15 -0700 Message-Id: <20210720150716.1213775-1-robdclark@gmail.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	drm/msm: Add fence->wait() op \| expand drm/msm: Add fence->wait() op

Rob Clark July 20, 2021, 3:07 p.m. UTC

From: Rob Clark <robdclark@chromium.org>

Somehow we had neither ->wait() nor dma_fence_signal() calls, and no
one noticed.  Oops.

Note that this removes the !timeout case, which has not been used in
a long time.

Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------
 1 file changed, 34 insertions(+), 25 deletions(-)

Rob Clark July 20, 2021, 6:30 p.m. UTC | #1

On Tue, Jul 20, 2021 at 11:03 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Hi Rob,
>
> Am 20.07.21 um 17:07 schrieb Rob Clark:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Somehow we had neither ->wait() nor dma_fence_signal() calls, and no
> > one noticed.  Oops.
>
>
> I'm not sure if that is a good idea.
>
> The dma_fence->wait() callback is pretty much deprecated and should not
> be used any more.
>
> What exactly do you need that for?

Well, the alternative is to track the set of fences which have
signalling enabled, and then figure out which ones to signal, which
seems like a lot more work, vs just re-purposing the wait
implementation we already have for non-dma_fence cases ;-)

Why is the ->wait() callback (pretty much) deprecated?

BR,
-R

> Regards,
> Christian.
>
> >
> > Note that this removes the !timeout case, which has not been used in
> > a long time.
>
>
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------
> >   1 file changed, 34 insertions(+), 25 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index cd59a5918038..8ee96b90ded6 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc
> >       return (int32_t)(fctx->completed_fence - fence) >= 0;
> >   }
> >
> > -/* legacy path for WAIT_FENCE ioctl: */
> > -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > -             ktime_t *timeout, bool interruptible)
> > +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > +             signed long remaining_jiffies, bool interruptible)
> >   {
> > -     int ret;
> > +     signed long ret;
> >
> >       if (fence > fctx->last_fence) {
> >               DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",
> > @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> >               return -EINVAL;
> >       }
> >
> > -     if (!timeout) {
> > -             /* no-wait: */
> > -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;
> > +     if (interruptible) {
> > +             ret = wait_event_interruptible_timeout(fctx->event,
> > +                     fence_completed(fctx, fence),
> > +                     remaining_jiffies);
> >       } else {
> > -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);
> > -
> > -             if (interruptible)
> > -                     ret = wait_event_interruptible_timeout(fctx->event,
> > -                             fence_completed(fctx, fence),
> > -                             remaining_jiffies);
> > -             else
> > -                     ret = wait_event_timeout(fctx->event,
> > -                             fence_completed(fctx, fence),
> > -                             remaining_jiffies);
> > -
> > -             if (ret == 0) {
> > -                     DBG("timeout waiting for fence: %u (completed: %u)",
> > -                                     fence, fctx->completed_fence);
> > -                     ret = -ETIMEDOUT;
> > -             } else if (ret != -ERESTARTSYS) {
> > -                     ret = 0;
> > -             }
> > +             ret = wait_event_timeout(fctx->event,
> > +                     fence_completed(fctx, fence),
> > +                     remaining_jiffies);
> > +     }
> > +
> > +     if (ret == 0) {
> > +             DBG("timeout waiting for fence: %u (completed: %u)",
> > +                             fence, fctx->completed_fence);
> > +             ret = -ETIMEDOUT;
> > +     } else if (ret != -ERESTARTSYS) {
> > +             ret = 0;
> >       }
> >
> >       return ret;
> >   }
> >
> > +/* legacy path for WAIT_FENCE ioctl: */
> > +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > +             ktime_t *timeout, bool interruptible)
> > +{
> > +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);
> > +}
> > +
> >   /* called from workqueue */
> >   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> >   {
> > @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> >       return fence_completed(f->fctx, f->base.seqno);
> >   }
> >
> > +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,
> > +             signed long timeout)
> > +{
> > +     struct msm_fence *f = to_msm_fence(fence);
> > +
> > +     return wait_fence(f->fctx, fence->seqno, timeout, intr);
> > +}
> > +
> >   static const struct dma_fence_ops msm_fence_ops = {
> >       .get_driver_name = msm_fence_get_driver_name,
> >       .get_timeline_name = msm_fence_get_timeline_name,
> >       .signaled = msm_fence_signaled,
> > +     .wait = msm_fence_wait,
> >   };
> >
> >   struct dma_fence *
>

Daniel Vetter July 20, 2021, 8:55 p.m. UTC | #2

On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Tue, Jul 20, 2021 at 11:03 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Hi Rob,
> >
> > Am 20.07.21 um 17:07 schrieb Rob Clark:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Somehow we had neither ->wait() nor dma_fence_signal() calls, and no
> > > one noticed.  Oops.
> >
> >
> > I'm not sure if that is a good idea.
> >
> > The dma_fence->wait() callback is pretty much deprecated and should not
> > be used any more.
> >
> > What exactly do you need that for?
>
> Well, the alternative is to track the set of fences which have
> signalling enabled, and then figure out which ones to signal, which
> seems like a lot more work, vs just re-purposing the wait
> implementation we already have for non-dma_fence cases ;-)
>
> Why is the ->wait() callback (pretty much) deprecated?

Because if you need it that means for your driver dma_fence_add_cb is
broken, which means a _lot_ of things don't work. Like dma_buf poll
(compositors have patches to start using that), and I think
drm/scheduler also becomes rather unhappy.

It essentially exists only for old drivers where ->enable_signalling
is unreliable and we paper over that with a retry loop in ->wait and
pray no one notices that it's too butchered. The proper fix is to have
a driver thread to guarantee that ->enable_signalling works reliable,
so you don't need a ->wait.

Can you type up a kerneldoc patch for dma_fence_ops->wait to hammer
this in please?
-Daniel

>
> BR,
> -R
>
> > Regards,
> > Christian.
> >
> > >
> > > Note that this removes the !timeout case, which has not been used in
> > > a long time.
> >
> >
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > ---
> > >   drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------
> > >   1 file changed, 34 insertions(+), 25 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > > index cd59a5918038..8ee96b90ded6 100644
> > > --- a/drivers/gpu/drm/msm/msm_fence.c
> > > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > > @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc
> > >       return (int32_t)(fctx->completed_fence - fence) >= 0;
> > >   }
> > >
> > > -/* legacy path for WAIT_FENCE ioctl: */
> > > -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > -             ktime_t *timeout, bool interruptible)
> > > +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > +             signed long remaining_jiffies, bool interruptible)
> > >   {
> > > -     int ret;
> > > +     signed long ret;
> > >
> > >       if (fence > fctx->last_fence) {
> > >               DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",
> > > @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > >               return -EINVAL;
> > >       }
> > >
> > > -     if (!timeout) {
> > > -             /* no-wait: */
> > > -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;
> > > +     if (interruptible) {
> > > +             ret = wait_event_interruptible_timeout(fctx->event,
> > > +                     fence_completed(fctx, fence),
> > > +                     remaining_jiffies);
> > >       } else {
> > > -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);
> > > -
> > > -             if (interruptible)
> > > -                     ret = wait_event_interruptible_timeout(fctx->event,
> > > -                             fence_completed(fctx, fence),
> > > -                             remaining_jiffies);
> > > -             else
> > > -                     ret = wait_event_timeout(fctx->event,
> > > -                             fence_completed(fctx, fence),
> > > -                             remaining_jiffies);
> > > -
> > > -             if (ret == 0) {
> > > -                     DBG("timeout waiting for fence: %u (completed: %u)",
> > > -                                     fence, fctx->completed_fence);
> > > -                     ret = -ETIMEDOUT;
> > > -             } else if (ret != -ERESTARTSYS) {
> > > -                     ret = 0;
> > > -             }
> > > +             ret = wait_event_timeout(fctx->event,
> > > +                     fence_completed(fctx, fence),
> > > +                     remaining_jiffies);
> > > +     }
> > > +
> > > +     if (ret == 0) {
> > > +             DBG("timeout waiting for fence: %u (completed: %u)",
> > > +                             fence, fctx->completed_fence);
> > > +             ret = -ETIMEDOUT;
> > > +     } else if (ret != -ERESTARTSYS) {
> > > +             ret = 0;
> > >       }
> > >
> > >       return ret;
> > >   }
> > >
> > > +/* legacy path for WAIT_FENCE ioctl: */
> > > +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > +             ktime_t *timeout, bool interruptible)
> > > +{
> > > +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);
> > > +}
> > > +
> > >   /* called from workqueue */
> > >   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> > >   {
> > > @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> > >       return fence_completed(f->fctx, f->base.seqno);
> > >   }
> > >
> > > +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,
> > > +             signed long timeout)
> > > +{
> > > +     struct msm_fence *f = to_msm_fence(fence);
> > > +
> > > +     return wait_fence(f->fctx, fence->seqno, timeout, intr);
> > > +}
> > > +
> > >   static const struct dma_fence_ops msm_fence_ops = {
> > >       .get_driver_name = msm_fence_get_driver_name,
> > >       .get_timeline_name = msm_fence_get_timeline_name,
> > >       .signaled = msm_fence_signaled,
> > > +     .wait = msm_fence_wait,
> > >   };
> > >
> > >   struct dma_fence *
> >

Rob Clark July 21, 2021, 4:34 p.m. UTC | #3

On Wed, Jul 21, 2021 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>

> On Wed, Jul 21, 2021 at 12:32 AM Rob Clark <robdclark@gmail.com> wrote:

> >

> > On Tue, Jul 20, 2021 at 1:55 PM Daniel Vetter <daniel@ffwll.ch> wrote:

> > >

> > > On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:

> > > >

> > > > On Tue, Jul 20, 2021 at 11:03 AM Christian König

> > > > <ckoenig.leichtzumerken@gmail.com> wrote:

> > > > >

> > > > > Hi Rob,

> > > > >

> > > > > Am 20.07.21 um 17:07 schrieb Rob Clark:

> > > > > > From: Rob Clark <robdclark@chromium.org>

> > > > > >

> > > > > > Somehow we had neither ->wait() nor dma_fence_signal() calls, and no

> > > > > > one noticed.  Oops.

> > > > >

> > > > >

> > > > > I'm not sure if that is a good idea.

> > > > >

> > > > > The dma_fence->wait() callback is pretty much deprecated and should not

> > > > > be used any more.

> > > > >

> > > > > What exactly do you need that for?

> > > >

> > > > Well, the alternative is to track the set of fences which have

> > > > signalling enabled, and then figure out which ones to signal, which

> > > > seems like a lot more work, vs just re-purposing the wait

> > > > implementation we already have for non-dma_fence cases ;-)

> > > >

> > > > Why is the ->wait() callback (pretty much) deprecated?

> > >

> > > Because if you need it that means for your driver dma_fence_add_cb is

> > > broken, which means a _lot_ of things don't work. Like dma_buf poll

> > > (compositors have patches to start using that), and I think

> > > drm/scheduler also becomes rather unhappy.

> >

> > I'm starting to page back in how this works.. fence cb's aren't broken

> > (which is also why dma_fence_wait() was not completely broken),

> > because in retire_submits() we call

> > dma_fence_is_signaled(submit->hw_fence).

> >

> > But the reason that the custom wait function cleans up a tiny bit of

> > jank is that the wait_queue_head_t gets signaled earlier, before we

> > start iterating the submits and doing all that retire_submit() stuff

> > (unpin/unref bo's, etc).  I suppose I could just split things up to

> > call dma_fence_signal() earlier, and *then* do the retire_submits()

> > stuff.

>

> Yeah reducing the latency there sounds like a good idea.

> -Daniel

>


Hmm, no, turns out that isn't the problem.. or, well, it is probably a
good idea to call drm_fence_signal() earlier.  But it seems like
waking up from wait_event_* is faster than wake_up_state(wait->task,
TASK_NORMAL).  I suppose the wake_up_state() approach still needs for
the scheduler to get around to schedule the runnable task.

So for now, I'm going back to my own wait function (plus earlier
drm_fence_signal())

Before removing dma_fence_opps::wait(), I guess we want to re-think
dma_fence_default_wait().. but I think that would require a
dma_fence_context base class (rather than just a raw integer).

BR,
-R

> >

> > BR,

> > -R

> >

> > > It essentially exists only for old drivers where ->enable_signalling

> > > is unreliable and we paper over that with a retry loop in ->wait and

> > > pray no one notices that it's too butchered. The proper fix is to have

> > > a driver thread to guarantee that ->enable_signalling works reliable,

> > > so you don't need a ->wait.

> > >

> > > Can you type up a kerneldoc patch for dma_fence_ops->wait to hammer

> > > this in please?

> > > -Daniel

> > >

> > > >

> > > > BR,

> > > > -R

> > > >

> > > > > Regards,

> > > > > Christian.

> > > > >

> > > > > >

> > > > > > Note that this removes the !timeout case, which has not been used in

> > > > > > a long time.

> > > > >

> > > > >

> > > > > >

> > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>

> > > > > > ---

> > > > > >   drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------

> > > > > >   1 file changed, 34 insertions(+), 25 deletions(-)

> > > > > >

> > > > > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c

> > > > > > index cd59a5918038..8ee96b90ded6 100644

> > > > > > --- a/drivers/gpu/drm/msm/msm_fence.c

> > > > > > +++ b/drivers/gpu/drm/msm/msm_fence.c

> > > > > > @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc

> > > > > >       return (int32_t)(fctx->completed_fence - fence) >= 0;

> > > > > >   }

> > > > > >

> > > > > > -/* legacy path for WAIT_FENCE ioctl: */

> > > > > > -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > -             ktime_t *timeout, bool interruptible)

> > > > > > +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > +             signed long remaining_jiffies, bool interruptible)

> > > > > >   {

> > > > > > -     int ret;

> > > > > > +     signed long ret;

> > > > > >

> > > > > >       if (fence > fctx->last_fence) {

> > > > > >               DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",

> > > > > > @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > >               return -EINVAL;

> > > > > >       }

> > > > > >

> > > > > > -     if (!timeout) {

> > > > > > -             /* no-wait: */

> > > > > > -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;

> > > > > > +     if (interruptible) {

> > > > > > +             ret = wait_event_interruptible_timeout(fctx->event,

> > > > > > +                     fence_completed(fctx, fence),

> > > > > > +                     remaining_jiffies);

> > > > > >       } else {

> > > > > > -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);

> > > > > > -

> > > > > > -             if (interruptible)

> > > > > > -                     ret = wait_event_interruptible_timeout(fctx->event,

> > > > > > -                             fence_completed(fctx, fence),

> > > > > > -                             remaining_jiffies);

> > > > > > -             else

> > > > > > -                     ret = wait_event_timeout(fctx->event,

> > > > > > -                             fence_completed(fctx, fence),

> > > > > > -                             remaining_jiffies);

> > > > > > -

> > > > > > -             if (ret == 0) {

> > > > > > -                     DBG("timeout waiting for fence: %u (completed: %u)",

> > > > > > -                                     fence, fctx->completed_fence);

> > > > > > -                     ret = -ETIMEDOUT;

> > > > > > -             } else if (ret != -ERESTARTSYS) {

> > > > > > -                     ret = 0;

> > > > > > -             }

> > > > > > +             ret = wait_event_timeout(fctx->event,

> > > > > > +                     fence_completed(fctx, fence),

> > > > > > +                     remaining_jiffies);

> > > > > > +     }

> > > > > > +

> > > > > > +     if (ret == 0) {

> > > > > > +             DBG("timeout waiting for fence: %u (completed: %u)",

> > > > > > +                             fence, fctx->completed_fence);

> > > > > > +             ret = -ETIMEDOUT;

> > > > > > +     } else if (ret != -ERESTARTSYS) {

> > > > > > +             ret = 0;

> > > > > >       }

> > > > > >

> > > > > >       return ret;

> > > > > >   }

> > > > > >

> > > > > > +/* legacy path for WAIT_FENCE ioctl: */

> > > > > > +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > +             ktime_t *timeout, bool interruptible)

> > > > > > +{

> > > > > > +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);

> > > > > > +}

> > > > > > +

> > > > > >   /* called from workqueue */

> > > > > >   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)

> > > > > >   {

> > > > > > @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)

> > > > > >       return fence_completed(f->fctx, f->base.seqno);

> > > > > >   }

> > > > > >

> > > > > > +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,

> > > > > > +             signed long timeout)

> > > > > > +{

> > > > > > +     struct msm_fence *f = to_msm_fence(fence);

> > > > > > +

> > > > > > +     return wait_fence(f->fctx, fence->seqno, timeout, intr);

> > > > > > +}

> > > > > > +

> > > > > >   static const struct dma_fence_ops msm_fence_ops = {

> > > > > >       .get_driver_name = msm_fence_get_driver_name,

> > > > > >       .get_timeline_name = msm_fence_get_timeline_name,

> > > > > >       .signaled = msm_fence_signaled,

> > > > > > +     .wait = msm_fence_wait,

> > > > > >   };

> > > > > >

> > > > > >   struct dma_fence *

> > > > >

> > >

> > >

> > >

> > > --

> > > Daniel Vetter

> > > Software Engineer, Intel Corporation

> > > http://blog.ffwll.ch

>

>

>

> --

> Daniel Vetter

> Software Engineer, Intel Corporation

> http://blog.ffwll.ch

Daniel Vetter July 21, 2021, 7:03 p.m. UTC | #4

On Wed, Jul 21, 2021 at 09:34:43AM -0700, Rob Clark wrote:
> On Wed, Jul 21, 2021 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Jul 21, 2021 at 12:32 AM Rob Clark <robdclark@gmail.com> wrote:
> > >
> > > On Tue, Jul 20, 2021 at 1:55 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jul 20, 2021 at 11:03 AM Christian König
> > > > > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > > > >
> > > > > > Hi Rob,
> > > > > >
> > > > > > Am 20.07.21 um 17:07 schrieb Rob Clark:
> > > > > > > From: Rob Clark <robdclark@chromium.org>
> > > > > > >
> > > > > > > Somehow we had neither ->wait() nor dma_fence_signal() calls, and no
> > > > > > > one noticed.  Oops.
> > > > > >
> > > > > >
> > > > > > I'm not sure if that is a good idea.
> > > > > >
> > > > > > The dma_fence->wait() callback is pretty much deprecated and should not
> > > > > > be used any more.
> > > > > >
> > > > > > What exactly do you need that for?
> > > > >
> > > > > Well, the alternative is to track the set of fences which have
> > > > > signalling enabled, and then figure out which ones to signal, which
> > > > > seems like a lot more work, vs just re-purposing the wait
> > > > > implementation we already have for non-dma_fence cases ;-)
> > > > >
> > > > > Why is the ->wait() callback (pretty much) deprecated?
> > > >
> > > > Because if you need it that means for your driver dma_fence_add_cb is
> > > > broken, which means a _lot_ of things don't work. Like dma_buf poll
> > > > (compositors have patches to start using that), and I think
> > > > drm/scheduler also becomes rather unhappy.
> > >
> > > I'm starting to page back in how this works.. fence cb's aren't broken
> > > (which is also why dma_fence_wait() was not completely broken),
> > > because in retire_submits() we call
> > > dma_fence_is_signaled(submit->hw_fence).
> > >
> > > But the reason that the custom wait function cleans up a tiny bit of
> > > jank is that the wait_queue_head_t gets signaled earlier, before we
> > > start iterating the submits and doing all that retire_submit() stuff
> > > (unpin/unref bo's, etc).  I suppose I could just split things up to
> > > call dma_fence_signal() earlier, and *then* do the retire_submits()
> > > stuff.
> >
> > Yeah reducing the latency there sounds like a good idea.
> > -Daniel
> >
> 
> Hmm, no, turns out that isn't the problem.. or, well, it is probably a
> good idea to call drm_fence_signal() earlier.  But it seems like
> waking up from wait_event_* is faster than wake_up_state(wait->task,
> TASK_NORMAL).  I suppose the wake_up_state() approach still needs for
> the scheduler to get around to schedule the runnable task.
> 
> So for now, I'm going back to my own wait function (plus earlier
> drm_fence_signal())
> 
> Before removing dma_fence_opps::wait(), I guess we want to re-think
> dma_fence_default_wait().. but I think that would require a
> dma_fence_context base class (rather than just a raw integer).

Uh that's not great ... can't we fix this instead of papering over it in
drivers? Aside from maybe different wakeup flags it all is supposed to
work exactly the same underneath, and whether using a wait queue or not
really shouldn't matter.
-Daniel

> 
> BR,
> -R
> 
> > >
> > > BR,
> > > -R
> > >
> > > > It essentially exists only for old drivers where ->enable_signalling
> > > > is unreliable and we paper over that with a retry loop in ->wait and
> > > > pray no one notices that it's too butchered. The proper fix is to have
> > > > a driver thread to guarantee that ->enable_signalling works reliable,
> > > > so you don't need a ->wait.
> > > >
> > > > Can you type up a kerneldoc patch for dma_fence_ops->wait to hammer
> > > > this in please?
> > > > -Daniel
> > > >
> > > > >
> > > > > BR,
> > > > > -R
> > > > >
> > > > > > Regards,
> > > > > > Christian.
> > > > > >
> > > > > > >
> > > > > > > Note that this removes the !timeout case, which has not been used in
> > > > > > > a long time.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > > > > > ---
> > > > > > >   drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------
> > > > > > >   1 file changed, 34 insertions(+), 25 deletions(-)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > > > > > > index cd59a5918038..8ee96b90ded6 100644
> > > > > > > --- a/drivers/gpu/drm/msm/msm_fence.c
> > > > > > > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > > > > > > @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc
> > > > > > >       return (int32_t)(fctx->completed_fence - fence) >= 0;
> > > > > > >   }
> > > > > > >
> > > > > > > -/* legacy path for WAIT_FENCE ioctl: */
> > > > > > > -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > > > > > -             ktime_t *timeout, bool interruptible)
> > > > > > > +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > > > > > +             signed long remaining_jiffies, bool interruptible)
> > > > > > >   {
> > > > > > > -     int ret;
> > > > > > > +     signed long ret;
> > > > > > >
> > > > > > >       if (fence > fctx->last_fence) {
> > > > > > >               DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",
> > > > > > > @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > > > > >               return -EINVAL;
> > > > > > >       }
> > > > > > >
> > > > > > > -     if (!timeout) {
> > > > > > > -             /* no-wait: */
> > > > > > > -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;
> > > > > > > +     if (interruptible) {
> > > > > > > +             ret = wait_event_interruptible_timeout(fctx->event,
> > > > > > > +                     fence_completed(fctx, fence),
> > > > > > > +                     remaining_jiffies);
> > > > > > >       } else {
> > > > > > > -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);
> > > > > > > -
> > > > > > > -             if (interruptible)
> > > > > > > -                     ret = wait_event_interruptible_timeout(fctx->event,
> > > > > > > -                             fence_completed(fctx, fence),
> > > > > > > -                             remaining_jiffies);
> > > > > > > -             else
> > > > > > > -                     ret = wait_event_timeout(fctx->event,
> > > > > > > -                             fence_completed(fctx, fence),
> > > > > > > -                             remaining_jiffies);
> > > > > > > -
> > > > > > > -             if (ret == 0) {
> > > > > > > -                     DBG("timeout waiting for fence: %u (completed: %u)",
> > > > > > > -                                     fence, fctx->completed_fence);
> > > > > > > -                     ret = -ETIMEDOUT;
> > > > > > > -             } else if (ret != -ERESTARTSYS) {
> > > > > > > -                     ret = 0;
> > > > > > > -             }
> > > > > > > +             ret = wait_event_timeout(fctx->event,
> > > > > > > +                     fence_completed(fctx, fence),
> > > > > > > +                     remaining_jiffies);
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     if (ret == 0) {
> > > > > > > +             DBG("timeout waiting for fence: %u (completed: %u)",
> > > > > > > +                             fence, fctx->completed_fence);
> > > > > > > +             ret = -ETIMEDOUT;
> > > > > > > +     } else if (ret != -ERESTARTSYS) {
> > > > > > > +             ret = 0;
> > > > > > >       }
> > > > > > >
> > > > > > >       return ret;
> > > > > > >   }
> > > > > > >
> > > > > > > +/* legacy path for WAIT_FENCE ioctl: */
> > > > > > > +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
> > > > > > > +             ktime_t *timeout, bool interruptible)
> > > > > > > +{
> > > > > > > +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);
> > > > > > > +}
> > > > > > > +
> > > > > > >   /* called from workqueue */
> > > > > > >   void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> > > > > > >   {
> > > > > > > @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> > > > > > >       return fence_completed(f->fctx, f->base.seqno);
> > > > > > >   }
> > > > > > >
> > > > > > > +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,
> > > > > > > +             signed long timeout)
> > > > > > > +{
> > > > > > > +     struct msm_fence *f = to_msm_fence(fence);
> > > > > > > +
> > > > > > > +     return wait_fence(f->fctx, fence->seqno, timeout, intr);
> > > > > > > +}
> > > > > > > +
> > > > > > >   static const struct dma_fence_ops msm_fence_ops = {
> > > > > > >       .get_driver_name = msm_fence_get_driver_name,
> > > > > > >       .get_timeline_name = msm_fence_get_timeline_name,
> > > > > > >       .signaled = msm_fence_signaled,
> > > > > > > +     .wait = msm_fence_wait,
> > > > > > >   };
> > > > > > >
> > > > > > >   struct dma_fence *
> > > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Daniel Vetter
> > > > Software Engineer, Intel Corporation
> > > > http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

Christian König July 22, 2021, 8:42 a.m. UTC | #5

Am 21.07.21 um 21:03 schrieb Daniel Vetter:
> On Wed, Jul 21, 2021 at 09:34:43AM -0700, Rob Clark wrote:

>> On Wed, Jul 21, 2021 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:

>>> On Wed, Jul 21, 2021 at 12:32 AM Rob Clark <robdclark@gmail.com> wrote:

>>>> On Tue, Jul 20, 2021 at 1:55 PM Daniel Vetter <daniel@ffwll.ch> wrote:

>>>>> On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:

>>>>>> On Tue, Jul 20, 2021 at 11:03 AM Christian König

>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:

>>>>>>> Hi Rob,

>>>>>>>

>>>>>>> Am 20.07.21 um 17:07 schrieb Rob Clark:

>>>>>>>> From: Rob Clark <robdclark@chromium.org>

>>>>>>>>

>>>>>>>> Somehow we had neither ->wait() nor dma_fence_signal() calls, and no

>>>>>>>> one noticed.  Oops.

>>>>>>>

>>>>>>> I'm not sure if that is a good idea.

>>>>>>>

>>>>>>> The dma_fence->wait() callback is pretty much deprecated and should not

>>>>>>> be used any more.

>>>>>>>

>>>>>>> What exactly do you need that for?

>>>>>> Well, the alternative is to track the set of fences which have

>>>>>> signalling enabled, and then figure out which ones to signal, which

>>>>>> seems like a lot more work, vs just re-purposing the wait

>>>>>> implementation we already have for non-dma_fence cases ;-)

>>>>>>

>>>>>> Why is the ->wait() callback (pretty much) deprecated?

>>>>> Because if you need it that means for your driver dma_fence_add_cb is

>>>>> broken, which means a _lot_ of things don't work. Like dma_buf poll

>>>>> (compositors have patches to start using that), and I think

>>>>> drm/scheduler also becomes rather unhappy.

>>>> I'm starting to page back in how this works.. fence cb's aren't broken

>>>> (which is also why dma_fence_wait() was not completely broken),

>>>> because in retire_submits() we call

>>>> dma_fence_is_signaled(submit->hw_fence).

>>>>

>>>> But the reason that the custom wait function cleans up a tiny bit of

>>>> jank is that the wait_queue_head_t gets signaled earlier, before we

>>>> start iterating the submits and doing all that retire_submit() stuff

>>>> (unpin/unref bo's, etc).  I suppose I could just split things up to

>>>> call dma_fence_signal() earlier, and *then* do the retire_submits()

>>>> stuff.

>>> Yeah reducing the latency there sounds like a good idea.

>>> -Daniel

>>>

>> Hmm, no, turns out that isn't the problem.. or, well, it is probably a

>> good idea to call drm_fence_signal() earlier.  But it seems like

>> waking up from wait_event_* is faster than wake_up_state(wait->task,

>> TASK_NORMAL).  I suppose the wake_up_state() approach still needs for

>> the scheduler to get around to schedule the runnable task.


As far as I know wake_up_state() tries to run the thread on the CPU it 
was scheduled last, while wait_event_* makes the thread run on the CPU 
who issues the wake by default.

And yes I've also noticed this already and it was one of the reason why 
I suggested to use a wait_queue instead of the hand wired dma_fence_wait 
implementation.

>>

>> So for now, I'm going back to my own wait function (plus earlier

>> drm_fence_signal())

>>

>> Before removing dma_fence_opps::wait(), I guess we want to re-think

>> dma_fence_default_wait().. but I think that would require a

>> dma_fence_context base class (rather than just a raw integer).

> Uh that's not great ... can't we fix this instead of papering over it in

> drivers? Aside from maybe different wakeup flags it all is supposed to

> work exactly the same underneath, and whether using a wait queue or not

> really shouldn't matter.


Well it would have been nicer if we used the existing infrastructure 
instead of re-inventing stuff for dma_fence, but that chance is long gone.

And you don't need a dma_fence_context base class, but rather just a 
flag in the dma_fence_ops if you want to change the behavior.

Regards,
Christian.

> -Daniel

>

>> BR,

>> -R

>>

>>>> BR,

>>>> -R

>>>>

>>>>> It essentially exists only for old drivers where ->enable_signalling

>>>>> is unreliable and we paper over that with a retry loop in ->wait and

>>>>> pray no one notices that it's too butchered. The proper fix is to have

>>>>> a driver thread to guarantee that ->enable_signalling works reliable,

>>>>> so you don't need a ->wait.

>>>>>

>>>>> Can you type up a kerneldoc patch for dma_fence_ops->wait to hammer

>>>>> this in please?

>>>>> -Daniel

>>>>>

>>>>>> BR,

>>>>>> -R

>>>>>>

>>>>>>> Regards,

>>>>>>> Christian.

>>>>>>>

>>>>>>>> Note that this removes the !timeout case, which has not been used in

>>>>>>>> a long time.

>>>>>>>

>>>>>>>> Signed-off-by: Rob Clark <robdclark@chromium.org>

>>>>>>>> ---

>>>>>>>>    drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------

>>>>>>>>    1 file changed, 34 insertions(+), 25 deletions(-)

>>>>>>>>

>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c

>>>>>>>> index cd59a5918038..8ee96b90ded6 100644

>>>>>>>> --- a/drivers/gpu/drm/msm/msm_fence.c

>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_fence.c

>>>>>>>> @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc

>>>>>>>>        return (int32_t)(fctx->completed_fence - fence) >= 0;

>>>>>>>>    }

>>>>>>>>

>>>>>>>> -/* legacy path for WAIT_FENCE ioctl: */

>>>>>>>> -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

>>>>>>>> -             ktime_t *timeout, bool interruptible)

>>>>>>>> +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,

>>>>>>>> +             signed long remaining_jiffies, bool interruptible)

>>>>>>>>    {

>>>>>>>> -     int ret;

>>>>>>>> +     signed long ret;

>>>>>>>>

>>>>>>>>        if (fence > fctx->last_fence) {

>>>>>>>>                DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",

>>>>>>>> @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

>>>>>>>>                return -EINVAL;

>>>>>>>>        }

>>>>>>>>

>>>>>>>> -     if (!timeout) {

>>>>>>>> -             /* no-wait: */

>>>>>>>> -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;

>>>>>>>> +     if (interruptible) {

>>>>>>>> +             ret = wait_event_interruptible_timeout(fctx->event,

>>>>>>>> +                     fence_completed(fctx, fence),

>>>>>>>> +                     remaining_jiffies);

>>>>>>>>        } else {

>>>>>>>> -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);

>>>>>>>> -

>>>>>>>> -             if (interruptible)

>>>>>>>> -                     ret = wait_event_interruptible_timeout(fctx->event,

>>>>>>>> -                             fence_completed(fctx, fence),

>>>>>>>> -                             remaining_jiffies);

>>>>>>>> -             else

>>>>>>>> -                     ret = wait_event_timeout(fctx->event,

>>>>>>>> -                             fence_completed(fctx, fence),

>>>>>>>> -                             remaining_jiffies);

>>>>>>>> -

>>>>>>>> -             if (ret == 0) {

>>>>>>>> -                     DBG("timeout waiting for fence: %u (completed: %u)",

>>>>>>>> -                                     fence, fctx->completed_fence);

>>>>>>>> -                     ret = -ETIMEDOUT;

>>>>>>>> -             } else if (ret != -ERESTARTSYS) {

>>>>>>>> -                     ret = 0;

>>>>>>>> -             }

>>>>>>>> +             ret = wait_event_timeout(fctx->event,

>>>>>>>> +                     fence_completed(fctx, fence),

>>>>>>>> +                     remaining_jiffies);

>>>>>>>> +     }

>>>>>>>> +

>>>>>>>> +     if (ret == 0) {

>>>>>>>> +             DBG("timeout waiting for fence: %u (completed: %u)",

>>>>>>>> +                             fence, fctx->completed_fence);

>>>>>>>> +             ret = -ETIMEDOUT;

>>>>>>>> +     } else if (ret != -ERESTARTSYS) {

>>>>>>>> +             ret = 0;

>>>>>>>>        }

>>>>>>>>

>>>>>>>>        return ret;

>>>>>>>>    }

>>>>>>>>

>>>>>>>> +/* legacy path for WAIT_FENCE ioctl: */

>>>>>>>> +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

>>>>>>>> +             ktime_t *timeout, bool interruptible)

>>>>>>>> +{

>>>>>>>> +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);

>>>>>>>> +}

>>>>>>>> +

>>>>>>>>    /* called from workqueue */

>>>>>>>>    void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)

>>>>>>>>    {

>>>>>>>> @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)

>>>>>>>>        return fence_completed(f->fctx, f->base.seqno);

>>>>>>>>    }

>>>>>>>>

>>>>>>>> +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,

>>>>>>>> +             signed long timeout)

>>>>>>>> +{

>>>>>>>> +     struct msm_fence *f = to_msm_fence(fence);

>>>>>>>> +

>>>>>>>> +     return wait_fence(f->fctx, fence->seqno, timeout, intr);

>>>>>>>> +}

>>>>>>>> +

>>>>>>>>    static const struct dma_fence_ops msm_fence_ops = {

>>>>>>>>        .get_driver_name = msm_fence_get_driver_name,

>>>>>>>>        .get_timeline_name = msm_fence_get_timeline_name,

>>>>>>>>        .signaled = msm_fence_signaled,

>>>>>>>> +     .wait = msm_fence_wait,

>>>>>>>>    };

>>>>>>>>

>>>>>>>>    struct dma_fence *

>>>>>

>>>>>

>>>>> --

>>>>> Daniel Vetter

>>>>> Software Engineer, Intel Corporation

>>>>> http://blog.ffwll.ch

>>>

>>>

>>> --

>>> Daniel Vetter

>>> Software Engineer, Intel Corporation

>>> http://blog.ffwll.ch

Daniel Vetter July 22, 2021, 9:08 a.m. UTC | #6

On Thu, Jul 22, 2021 at 10:42:26AM +0200, Christian König wrote:
> Am 21.07.21 um 21:03 schrieb Daniel Vetter:

> > On Wed, Jul 21, 2021 at 09:34:43AM -0700, Rob Clark wrote:

> > > On Wed, Jul 21, 2021 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:

> > > > On Wed, Jul 21, 2021 at 12:32 AM Rob Clark <robdclark@gmail.com> wrote:

> > > > > On Tue, Jul 20, 2021 at 1:55 PM Daniel Vetter <daniel@ffwll.ch> wrote:

> > > > > > On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:

> > > > > > > On Tue, Jul 20, 2021 at 11:03 AM Christian König

> > > > > > > <ckoenig.leichtzumerken@gmail.com> wrote:

> > > > > > > > Hi Rob,

> > > > > > > > 

> > > > > > > > Am 20.07.21 um 17:07 schrieb Rob Clark:

> > > > > > > > > From: Rob Clark <robdclark@chromium.org>

> > > > > > > > > 

> > > > > > > > > Somehow we had neither ->wait() nor dma_fence_signal() calls, and no

> > > > > > > > > one noticed.  Oops.

> > > > > > > > 

> > > > > > > > I'm not sure if that is a good idea.

> > > > > > > > 

> > > > > > > > The dma_fence->wait() callback is pretty much deprecated and should not

> > > > > > > > be used any more.

> > > > > > > > 

> > > > > > > > What exactly do you need that for?

> > > > > > > Well, the alternative is to track the set of fences which have

> > > > > > > signalling enabled, and then figure out which ones to signal, which

> > > > > > > seems like a lot more work, vs just re-purposing the wait

> > > > > > > implementation we already have for non-dma_fence cases ;-)

> > > > > > > 

> > > > > > > Why is the ->wait() callback (pretty much) deprecated?

> > > > > > Because if you need it that means for your driver dma_fence_add_cb is

> > > > > > broken, which means a _lot_ of things don't work. Like dma_buf poll

> > > > > > (compositors have patches to start using that), and I think

> > > > > > drm/scheduler also becomes rather unhappy.

> > > > > I'm starting to page back in how this works.. fence cb's aren't broken

> > > > > (which is also why dma_fence_wait() was not completely broken),

> > > > > because in retire_submits() we call

> > > > > dma_fence_is_signaled(submit->hw_fence).

> > > > > 

> > > > > But the reason that the custom wait function cleans up a tiny bit of

> > > > > jank is that the wait_queue_head_t gets signaled earlier, before we

> > > > > start iterating the submits and doing all that retire_submit() stuff

> > > > > (unpin/unref bo's, etc).  I suppose I could just split things up to

> > > > > call dma_fence_signal() earlier, and *then* do the retire_submits()

> > > > > stuff.

> > > > Yeah reducing the latency there sounds like a good idea.

> > > > -Daniel

> > > > 

> > > Hmm, no, turns out that isn't the problem.. or, well, it is probably a

> > > good idea to call drm_fence_signal() earlier.  But it seems like

> > > waking up from wait_event_* is faster than wake_up_state(wait->task,

> > > TASK_NORMAL).  I suppose the wake_up_state() approach still needs for

> > > the scheduler to get around to schedule the runnable task.

> 

> As far as I know wake_up_state() tries to run the thread on the CPU it was

> scheduled last, while wait_event_* makes the thread run on the CPU who

> issues the wake by default.

> 

> And yes I've also noticed this already and it was one of the reason why I

> suggested to use a wait_queue instead of the hand wired dma_fence_wait

> implementation.


The first versions had used wait_queue, but iirc we had some issues with
the callbacks and stuff and that was the reasons for hand-rolling. Or
maybe it was the integration of the lockless fastpath for
dma_fence_is_signalled().

> > > So for now, I'm going back to my own wait function (plus earlier

> > > drm_fence_signal())

> > > 

> > > Before removing dma_fence_opps::wait(), I guess we want to re-think

> > > dma_fence_default_wait().. but I think that would require a

> > > dma_fence_context base class (rather than just a raw integer).

> > Uh that's not great ... can't we fix this instead of papering over it in

> > drivers? Aside from maybe different wakeup flags it all is supposed to

> > work exactly the same underneath, and whether using a wait queue or not

> > really shouldn't matter.

> 

> Well it would have been nicer if we used the existing infrastructure instead

> of re-inventing stuff for dma_fence, but that chance is long gone.

> 

> And you don't need a dma_fence_context base class, but rather just a flag in

> the dma_fence_ops if you want to change the behavior.


If there's something broken we should just fix it, not force everyone to
set a random flag. dma_fence work like special wait_queues, so if we
differ then we should go back to that.
-Daniel

> 

> Regards,

> Christian.

> 

> > -Daniel

> > 

> > > BR,

> > > -R

> > > 

> > > > > BR,

> > > > > -R

> > > > > 

> > > > > > It essentially exists only for old drivers where ->enable_signalling

> > > > > > is unreliable and we paper over that with a retry loop in ->wait and

> > > > > > pray no one notices that it's too butchered. The proper fix is to have

> > > > > > a driver thread to guarantee that ->enable_signalling works reliable,

> > > > > > so you don't need a ->wait.

> > > > > > 

> > > > > > Can you type up a kerneldoc patch for dma_fence_ops->wait to hammer

> > > > > > this in please?

> > > > > > -Daniel

> > > > > > 

> > > > > > > BR,

> > > > > > > -R

> > > > > > > 

> > > > > > > > Regards,

> > > > > > > > Christian.

> > > > > > > > 

> > > > > > > > > Note that this removes the !timeout case, which has not been used in

> > > > > > > > > a long time.

> > > > > > > > 

> > > > > > > > > Signed-off-by: Rob Clark <robdclark@chromium.org>

> > > > > > > > > ---

> > > > > > > > >    drivers/gpu/drm/msm/msm_fence.c | 59 +++++++++++++++++++--------------

> > > > > > > > >    1 file changed, 34 insertions(+), 25 deletions(-)

> > > > > > > > > 

> > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c

> > > > > > > > > index cd59a5918038..8ee96b90ded6 100644

> > > > > > > > > --- a/drivers/gpu/drm/msm/msm_fence.c

> > > > > > > > > +++ b/drivers/gpu/drm/msm/msm_fence.c

> > > > > > > > > @@ -38,11 +38,10 @@ static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fenc

> > > > > > > > >        return (int32_t)(fctx->completed_fence - fence) >= 0;

> > > > > > > > >    }

> > > > > > > > > 

> > > > > > > > > -/* legacy path for WAIT_FENCE ioctl: */

> > > > > > > > > -int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > > > > -             ktime_t *timeout, bool interruptible)

> > > > > > > > > +static signed long wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > > > > +             signed long remaining_jiffies, bool interruptible)

> > > > > > > > >    {

> > > > > > > > > -     int ret;

> > > > > > > > > +     signed long ret;

> > > > > > > > > 

> > > > > > > > >        if (fence > fctx->last_fence) {

> > > > > > > > >                DRM_ERROR_RATELIMITED("%s: waiting on invalid fence: %u (of %u)\n",

> > > > > > > > > @@ -50,33 +49,34 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > > > >                return -EINVAL;

> > > > > > > > >        }

> > > > > > > > > 

> > > > > > > > > -     if (!timeout) {

> > > > > > > > > -             /* no-wait: */

> > > > > > > > > -             ret = fence_completed(fctx, fence) ? 0 : -EBUSY;

> > > > > > > > > +     if (interruptible) {

> > > > > > > > > +             ret = wait_event_interruptible_timeout(fctx->event,

> > > > > > > > > +                     fence_completed(fctx, fence),

> > > > > > > > > +                     remaining_jiffies);

> > > > > > > > >        } else {

> > > > > > > > > -             unsigned long remaining_jiffies = timeout_to_jiffies(timeout);

> > > > > > > > > -

> > > > > > > > > -             if (interruptible)

> > > > > > > > > -                     ret = wait_event_interruptible_timeout(fctx->event,

> > > > > > > > > -                             fence_completed(fctx, fence),

> > > > > > > > > -                             remaining_jiffies);

> > > > > > > > > -             else

> > > > > > > > > -                     ret = wait_event_timeout(fctx->event,

> > > > > > > > > -                             fence_completed(fctx, fence),

> > > > > > > > > -                             remaining_jiffies);

> > > > > > > > > -

> > > > > > > > > -             if (ret == 0) {

> > > > > > > > > -                     DBG("timeout waiting for fence: %u (completed: %u)",

> > > > > > > > > -                                     fence, fctx->completed_fence);

> > > > > > > > > -                     ret = -ETIMEDOUT;

> > > > > > > > > -             } else if (ret != -ERESTARTSYS) {

> > > > > > > > > -                     ret = 0;

> > > > > > > > > -             }

> > > > > > > > > +             ret = wait_event_timeout(fctx->event,

> > > > > > > > > +                     fence_completed(fctx, fence),

> > > > > > > > > +                     remaining_jiffies);

> > > > > > > > > +     }

> > > > > > > > > +

> > > > > > > > > +     if (ret == 0) {

> > > > > > > > > +             DBG("timeout waiting for fence: %u (completed: %u)",

> > > > > > > > > +                             fence, fctx->completed_fence);

> > > > > > > > > +             ret = -ETIMEDOUT;

> > > > > > > > > +     } else if (ret != -ERESTARTSYS) {

> > > > > > > > > +             ret = 0;

> > > > > > > > >        }

> > > > > > > > > 

> > > > > > > > >        return ret;

> > > > > > > > >    }

> > > > > > > > > 

> > > > > > > > > +/* legacy path for WAIT_FENCE ioctl: */

> > > > > > > > > +int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,

> > > > > > > > > +             ktime_t *timeout, bool interruptible)

> > > > > > > > > +{

> > > > > > > > > +     return wait_fence(fctx, fence, timeout_to_jiffies(timeout), interruptible);

> > > > > > > > > +}

> > > > > > > > > +

> > > > > > > > >    /* called from workqueue */

> > > > > > > > >    void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)

> > > > > > > > >    {

> > > > > > > > > @@ -114,10 +114,19 @@ static bool msm_fence_signaled(struct dma_fence *fence)

> > > > > > > > >        return fence_completed(f->fctx, f->base.seqno);

> > > > > > > > >    }

> > > > > > > > > 

> > > > > > > > > +static signed long msm_fence_wait(struct dma_fence *fence, bool intr,

> > > > > > > > > +             signed long timeout)

> > > > > > > > > +{

> > > > > > > > > +     struct msm_fence *f = to_msm_fence(fence);

> > > > > > > > > +

> > > > > > > > > +     return wait_fence(f->fctx, fence->seqno, timeout, intr);

> > > > > > > > > +}

> > > > > > > > > +

> > > > > > > > >    static const struct dma_fence_ops msm_fence_ops = {

> > > > > > > > >        .get_driver_name = msm_fence_get_driver_name,

> > > > > > > > >        .get_timeline_name = msm_fence_get_timeline_name,

> > > > > > > > >        .signaled = msm_fence_signaled,

> > > > > > > > > +     .wait = msm_fence_wait,

> > > > > > > > >    };

> > > > > > > > > 

> > > > > > > > >    struct dma_fence *

> > > > > > 

> > > > > > 

> > > > > > --

> > > > > > Daniel Vetter

> > > > > > Software Engineer, Intel Corporation

> > > > > > http://blog.ffwll.ch

> > > > 

> > > > 

> > > > --

> > > > Daniel Vetter

> > > > Software Engineer, Intel Corporation

> > > > http://blog.ffwll.ch

> 


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Christian König July 22, 2021, 9:28 a.m. UTC | #7

Am 22.07.21 um 11:08 schrieb Daniel Vetter:
> [SNIP]
>> As far as I know wake_up_state() tries to run the thread on the CPU it was
>> scheduled last, while wait_event_* makes the thread run on the CPU who
>> issues the wake by default.
>>
>> And yes I've also noticed this already and it was one of the reason why I
>> suggested to use a wait_queue instead of the hand wired dma_fence_wait
>> implementation.
> The first versions had used wait_queue, but iirc we had some issues with
> the callbacks and stuff and that was the reasons for hand-rolling. Or
> maybe it was the integration of the lockless fastpath for
> dma_fence_is_signalled().
>
>> [SNIP]
>> Well it would have been nicer if we used the existing infrastructure instead
>> of re-inventing stuff for dma_fence, but that chance is long gone.
>>
>> And you don't need a dma_fence_context base class, but rather just a flag in
>> the dma_fence_ops if you want to change the behavior.
> If there's something broken we should just fix it, not force everyone to
> set a random flag. dma_fence work like special wait_queues, so if we
> differ then we should go back to that.

Wait a second with that, this is not broken. It's just different 
behavior and there are good arguments for both sides.

If a wait is short you can have situations where you want to start the 
thread on the original CPU.
     This is because you can assume that the caches on that CPU are 
still hot and heating up the caches on the local CPU would take longer 
than an inter CPU interrupt.

But if the wait is long it makes more sense to run the thread on the CPU 
where you noticed the wake up event.
     This is because you can assume that the caches are cold anyway and 
starting the thread on the current CPU (most likely from an interrupt 
handler) gives you the absolutely best latency.
     In other words you usually return from the interrupt handler and 
just directly switch to the now running thread.

I'm not sure if all drivers want the same behavior. Rob here seems to 
prefer number 2, but we have used 1 for dma_fence for a rather long time 
now and it could be that some people start to complain when we switch 
unconditionally.

Regards,
Christian.

> -Daniel
>

Daniel Vetter July 22, 2021, 10:47 a.m. UTC | #8

On Thu, Jul 22, 2021 at 11:28:01AM +0200, Christian König wrote:
> Am 22.07.21 um 11:08 schrieb Daniel Vetter:
> > [SNIP]
> > > As far as I know wake_up_state() tries to run the thread on the CPU it was
> > > scheduled last, while wait_event_* makes the thread run on the CPU who
> > > issues the wake by default.
> > > 
> > > And yes I've also noticed this already and it was one of the reason why I
> > > suggested to use a wait_queue instead of the hand wired dma_fence_wait
> > > implementation.
> > The first versions had used wait_queue, but iirc we had some issues with
> > the callbacks and stuff and that was the reasons for hand-rolling. Or
> > maybe it was the integration of the lockless fastpath for
> > dma_fence_is_signalled().
> > 
> > > [SNIP]
> > > Well it would have been nicer if we used the existing infrastructure instead
> > > of re-inventing stuff for dma_fence, but that chance is long gone.
> > > 
> > > And you don't need a dma_fence_context base class, but rather just a flag in
> > > the dma_fence_ops if you want to change the behavior.
> > If there's something broken we should just fix it, not force everyone to
> > set a random flag. dma_fence work like special wait_queues, so if we
> > differ then we should go back to that.
> 
> Wait a second with that, this is not broken. It's just different behavior
> and there are good arguments for both sides.

Oh I know, but since dma_fence is meant to be a wait_queue with hw
support, they really should work the same and have the same tuning.

> If a wait is short you can have situations where you want to start the
> thread on the original CPU.
>     This is because you can assume that the caches on that CPU are still hot
> and heating up the caches on the local CPU would take longer than an inter
> CPU interrupt.
> 
> But if the wait is long it makes more sense to run the thread on the CPU
> where you noticed the wake up event.
>     This is because you can assume that the caches are cold anyway and
> starting the thread on the current CPU (most likely from an interrupt
> handler) gives you the absolutely best latency.
>     In other words you usually return from the interrupt handler and just
> directly switch to the now running thread.
> 
> I'm not sure if all drivers want the same behavior. Rob here seems to prefer
> number 2, but we have used 1 for dma_fence for a rather long time now and it
> could be that some people start to complain when we switch unconditionally.

I think the defaults are different because usually if you wake up a wait
queue, there's a 1:1 relationship between waker and waiter.

Otoh if you just wake a thread it's probably some kinda of service thread,
so N:1 relationship between waker and waiter. And in that case moving the
waiter is a really bad idea.

I think dma_fence is generally much closer to 1:1 (with the most common
one irq handler -> scheduler thread for that engine), so having the "same
cpu" wake behaviour really sounds like the right thing to do. And not
anything that is specifically an issue with how qualcom gpus work, and
hence should be msm specific.

If it turns out to be the wrong thing, well I guess we'll learn
something. And then maybe we have a different version of dma_fence_wait.
-Daniel

Christian König July 22, 2021, 11:33 a.m. UTC | #9

Am 22.07.21 um 12:47 schrieb Daniel Vetter:
> On Thu, Jul 22, 2021 at 11:28:01AM +0200, Christian König wrote:

>> Am 22.07.21 um 11:08 schrieb Daniel Vetter:

>>> [SNIP]

>>>> As far as I know wake_up_state() tries to run the thread on the CPU it was

>>>> scheduled last, while wait_event_* makes the thread run on the CPU who

>>>> issues the wake by default.

>>>>

>>>> And yes I've also noticed this already and it was one of the reason why I

>>>> suggested to use a wait_queue instead of the hand wired dma_fence_wait

>>>> implementation.

>>> The first versions had used wait_queue, but iirc we had some issues with

>>> the callbacks and stuff and that was the reasons for hand-rolling. Or

>>> maybe it was the integration of the lockless fastpath for

>>> dma_fence_is_signalled().

>>>

>>>> [SNIP]

>>>> Well it would have been nicer if we used the existing infrastructure instead

>>>> of re-inventing stuff for dma_fence, but that chance is long gone.

>>>>

>>>> And you don't need a dma_fence_context base class, but rather just a flag in

>>>> the dma_fence_ops if you want to change the behavior.

>>> If there's something broken we should just fix it, not force everyone to

>>> set a random flag. dma_fence work like special wait_queues, so if we

>>> differ then we should go back to that.

>> Wait a second with that, this is not broken. It's just different behavior

>> and there are good arguments for both sides.

> Oh I know, but since dma_fence is meant to be a wait_queue with hw

> support, they really should work the same and have the same tuning.

>

>> If a wait is short you can have situations where you want to start the

>> thread on the original CPU.

>>      This is because you can assume that the caches on that CPU are still hot

>> and heating up the caches on the local CPU would take longer than an inter

>> CPU interrupt.

>>

>> But if the wait is long it makes more sense to run the thread on the CPU

>> where you noticed the wake up event.

>>      This is because you can assume that the caches are cold anyway and

>> starting the thread on the current CPU (most likely from an interrupt

>> handler) gives you the absolutely best latency.

>>      In other words you usually return from the interrupt handler and just

>> directly switch to the now running thread.

>>

>> I'm not sure if all drivers want the same behavior. Rob here seems to prefer

>> number 2, but we have used 1 for dma_fence for a rather long time now and it

>> could be that some people start to complain when we switch unconditionally.

> I think the defaults are different because usually if you wake up a wait

> queue, there's a 1:1 relationship between waker and waiter.

>

> Otoh if you just wake a thread it's probably some kinda of service thread,

> so N:1 relationship between waker and waiter. And in that case moving the

> waiter is a really bad idea.


Exactly that, yes.

> I think dma_fence is generally much closer to 1:1 (with the most common

> one irq handler -> scheduler thread for that engine), so having the "same

> cpu" wake behaviour really sounds like the right thing to do. And not

> anything that is specifically an issue with how qualcom gpus work, and

> hence should be msm specific.


That's the point I really can't judge. At least for AMD stuff we try 
very hard to avoid waiting for the GPU in the first place.

But yes it might indeed be better to do it like this, but to be honest 
no idea what functions should actually be used for this.

So feel free to investigate further how to improve this.

> If it turns out to be the wrong thing, well I guess we'll learn

> something. And then maybe we have a different version of dma_fence_wait.


Yeah, I would rather try to avoid that.

Christian.

> -Daniel

Rob Clark July 22, 2021, 3:40 p.m. UTC | #10

On Thu, Jul 22, 2021 at 1:42 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>

> Am 21.07.21 um 21:03 schrieb Daniel Vetter:

> > On Wed, Jul 21, 2021 at 09:34:43AM -0700, Rob Clark wrote:

> >> On Wed, Jul 21, 2021 at 12:59 AM Daniel Vetter <daniel@ffwll.ch> wrote:

> >>> On Wed, Jul 21, 2021 at 12:32 AM Rob Clark <robdclark@gmail.com> wrote:

> >>>> On Tue, Jul 20, 2021 at 1:55 PM Daniel Vetter <daniel@ffwll.ch> wrote:

> >>>>> On Tue, Jul 20, 2021 at 8:26 PM Rob Clark <robdclark@gmail.com> wrote:

> >>>>>> On Tue, Jul 20, 2021 at 11:03 AM Christian König

> >>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:

> >>>>>>> Hi Rob,

> >>>>>>>

> >>>>>>> Am 20.07.21 um 17:07 schrieb Rob Clark:

> >>>>>>>> From: Rob Clark <robdclark@chromium.org>

> >>>>>>>>

> >>>>>>>> Somehow we had neither ->wait() nor dma_fence_signal() calls, and no

> >>>>>>>> one noticed.  Oops.

> >>>>>>>

> >>>>>>> I'm not sure if that is a good idea.

> >>>>>>>

> >>>>>>> The dma_fence->wait() callback is pretty much deprecated and should not

> >>>>>>> be used any more.

> >>>>>>>

> >>>>>>> What exactly do you need that for?

> >>>>>> Well, the alternative is to track the set of fences which have

> >>>>>> signalling enabled, and then figure out which ones to signal, which

> >>>>>> seems like a lot more work, vs just re-purposing the wait

> >>>>>> implementation we already have for non-dma_fence cases ;-)

> >>>>>>

> >>>>>> Why is the ->wait() callback (pretty much) deprecated?

> >>>>> Because if you need it that means for your driver dma_fence_add_cb is

> >>>>> broken, which means a _lot_ of things don't work. Like dma_buf poll

> >>>>> (compositors have patches to start using that), and I think

> >>>>> drm/scheduler also becomes rather unhappy.

> >>>> I'm starting to page back in how this works.. fence cb's aren't broken

> >>>> (which is also why dma_fence_wait() was not completely broken),

> >>>> because in retire_submits() we call

> >>>> dma_fence_is_signaled(submit->hw_fence).

> >>>>

> >>>> But the reason that the custom wait function cleans up a tiny bit of

> >>>> jank is that the wait_queue_head_t gets signaled earlier, before we

> >>>> start iterating the submits and doing all that retire_submit() stuff

> >>>> (unpin/unref bo's, etc).  I suppose I could just split things up to

> >>>> call dma_fence_signal() earlier, and *then* do the retire_submits()

> >>>> stuff.

> >>> Yeah reducing the latency there sounds like a good idea.

> >>> -Daniel

> >>>

> >> Hmm, no, turns out that isn't the problem.. or, well, it is probably a

> >> good idea to call drm_fence_signal() earlier.  But it seems like

> >> waking up from wait_event_* is faster than wake_up_state(wait->task,

> >> TASK_NORMAL).  I suppose the wake_up_state() approach still needs for

> >> the scheduler to get around to schedule the runnable task.

>

> As far as I know wake_up_state() tries to run the thread on the CPU it

> was scheduled last, while wait_event_* makes the thread run on the CPU

> who issues the wake by default.

>

> And yes I've also noticed this already and it was one of the reason why

> I suggested to use a wait_queue instead of the hand wired dma_fence_wait

> implementation.

>

> >>

> >> So for now, I'm going back to my own wait function (plus earlier

> >> drm_fence_signal())

> >>

> >> Before removing dma_fence_opps::wait(), I guess we want to re-think

> >> dma_fence_default_wait().. but I think that would require a

> >> dma_fence_context base class (rather than just a raw integer).

> > Uh that's not great ... can't we fix this instead of papering over it in

> > drivers? Aside from maybe different wakeup flags it all is supposed to

> > work exactly the same underneath, and whether using a wait queue or not

> > really shouldn't matter.

>

> Well it would have been nicer if we used the existing infrastructure

> instead of re-inventing stuff for dma_fence, but that chance is long gone.

>

> And you don't need a dma_fence_context base class, but rather just a

> flag in the dma_fence_ops if you want to change the behavior.


Hmm, I was thinking dma_fence_context to have a place for the
wait_queue_head, but I guess that could also be per-dma_fence

Rob Clark July 22, 2021, 3:46 p.m. UTC | #11

On Thu, Jul 22, 2021 at 2:28 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 22.07.21 um 11:08 schrieb Daniel Vetter:
> > [SNIP]
> >> As far as I know wake_up_state() tries to run the thread on the CPU it was
> >> scheduled last, while wait_event_* makes the thread run on the CPU who
> >> issues the wake by default.
> >>
> >> And yes I've also noticed this already and it was one of the reason why I
> >> suggested to use a wait_queue instead of the hand wired dma_fence_wait
> >> implementation.
> > The first versions had used wait_queue, but iirc we had some issues with
> > the callbacks and stuff and that was the reasons for hand-rolling. Or
> > maybe it was the integration of the lockless fastpath for
> > dma_fence_is_signalled().
> >
> >> [SNIP]
> >> Well it would have been nicer if we used the existing infrastructure instead
> >> of re-inventing stuff for dma_fence, but that chance is long gone.
> >>
> >> And you don't need a dma_fence_context base class, but rather just a flag in
> >> the dma_fence_ops if you want to change the behavior.
> > If there's something broken we should just fix it, not force everyone to
> > set a random flag. dma_fence work like special wait_queues, so if we
> > differ then we should go back to that.
>
> Wait a second with that, this is not broken. It's just different
> behavior and there are good arguments for both sides.
>
> If a wait is short you can have situations where you want to start the
> thread on the original CPU.
>      This is because you can assume that the caches on that CPU are
> still hot and heating up the caches on the local CPU would take longer
> than an inter CPU interrupt.
>
> But if the wait is long it makes more sense to run the thread on the CPU
> where you noticed the wake up event.
>      This is because you can assume that the caches are cold anyway and
> starting the thread on the current CPU (most likely from an interrupt
> handler) gives you the absolutely best latency.
>      In other words you usually return from the interrupt handler and
> just directly switch to the now running thread.
>
> I'm not sure if all drivers want the same behavior. Rob here seems to
> prefer number 2, but we have used 1 for dma_fence for a rather long time
> now and it could be that some people start to complain when we switch
> unconditionally.
>

Hmm, I wonder if it would make sense to have a dma_wait_fence() flag
to control the behavior, since it is maybe more about the waiter (and
perhaps how long the waiter expects to wait) than the signaler..

BR,
-R

drm/msm: Add fence->wait() op

Commit Message

Comments

Patch