Message ID: 20220412212558.827289-1-olvaffe@gmail.com
State: New
Series: [v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit
On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>
> In practice, trace_dma_fence_init called from dma_fence_init is good
> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> and virtio both have cases where trace_dma_fence_init and
> trace_dma_fence_emit can be apart. It is easier for visualization tools
> to always use the more correct trace_dma_fence_emit when visualizing
> fence timelines.
>
> v2: improve commit message (Dmitry)
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> Cc: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

This has been reviewed. Should we land it?

(Or, how do I check if it has landed?)
Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>> In practice, trace_dma_fence_init called from dma_fence_init is good
>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>> and virtio both have cases where trace_dma_fence_init and
>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>> to always use the more correct trace_dma_fence_emit when visualizing
>> fence timelines.
>>
>> v2: improve commit message (Dmitry)
>>
>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>> Cc: Rob Clark <robdclark@chromium.org>
>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> This has been reviewed. Should we land it?

No, there are still open discussions about it.

Regards,
Christian.

>
> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 9:42 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> > On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >> In practice, trace_dma_fence_init called from dma_fence_init is good
> >> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >> and virtio both have cases where trace_dma_fence_init and
> >> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >> to always use the more correct trace_dma_fence_emit when visualizing
> >> fence timelines.
> >>
> >> v2: improve commit message (Dmitry)
> >>
> >> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >> Cc: Rob Clark <robdclark@chromium.org>
> >> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > This has been reviewed. Should we land it?
>
> No, there are still open discussions about it.

I think if it is needed for trace visualization, that is justification enough for me.

I don't really see otherwise how a generic trace visualization tool like perfetto would handle the case that some fence timelines have separate events but others do not.

BR,
-R

> Regards,
> Christian.
>
> >
> > (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:05 schrieb Rob Clark:
> On Tue, Apr 26, 2022 at 9:42 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>>>> and virtio both have cases where trace_dma_fence_init and
>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>>>> to always use the more correct trace_dma_fence_emit when visualizing
>>>> fence timelines.
>>>>
>>>> v2: improve commit message (Dmitry)
>>>>
>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>>>> Cc: Rob Clark <robdclark@chromium.org>
>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>> This has been reviewed. Should we land it?
>> No, there are still open discussions about it.
> I think if it is needed for trace visualization, that is justification
> enough for me.
>
> I don't really see otherwise how a generic trace visualization tool
> like perfetto would handle the case that some fence timelines have
> separate events but others do not.

Well, I just sent a patch to completely remove the trace point.

As I said, it absolutely doesn't make sense to use this for visualization; that's what the trace_dma_fence_init trace point is good for.

The only use case is for debugging the GPU scheduler, and we should probably introduce a separate, GPU-scheduler-specific trace point for this instead if we should ever need it.

Regards,
Christian.

> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 10:08 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:05 schrieb Rob Clark:
> > On Tue, Apr 26, 2022 at 9:42 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> >>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >>>> In practice, trace_dma_fence_init called from dma_fence_init is good
> >>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >>>> and virtio both have cases where trace_dma_fence_init and
> >>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >>>> to always use the more correct trace_dma_fence_emit when visualizing
> >>>> fence timelines.
> >>>>
> >>>> v2: improve commit message (Dmitry)
> >>>>
> >>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >>>> Cc: Rob Clark <robdclark@chromium.org>
> >>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >>> This has been reviewed. Should we land it?
> >> No, there are still open discussions about it.
> > I think if it is needed for trace visualization, that is justification
> > enough for me.
> >
> > I don't really see otherwise how a generic trace visualization tool
> > like perfetto would handle the case that some fence timelines have
> > separate events but others do not.
>
> Well, I just sent a patch to completely remove the trace point.
>
> As I said, it absolutely doesn't make sense to use this for
> visualization; that's what the trace_dma_fence_init trace point is good for.
>
> The only use case is for debugging the GPU scheduler, and we should
> probably introduce a separate, GPU-scheduler-specific trace point for
> this instead if we should ever need it.

Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit.. OTOH if using separate events in these special cases is better, then I'm ok with that and can revert this patch. Chia-I is more familiar with the visualization end of it, so I'll let him comment on whether that is a workable approach.

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:16 schrieb Rob Clark:
> On Tue, Apr 26, 2022 at 10:08 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 19:05 schrieb Rob Clark:
>>> On Tue, Apr 26, 2022 at 9:42 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
>>>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>>>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
>>>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>>>>>> and virtio both have cases where trace_dma_fence_init and
>>>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>>>>>> to always use the more correct trace_dma_fence_emit when visualizing
>>>>>> fence timelines.
>>>>>>
>>>>>> v2: improve commit message (Dmitry)
>>>>>>
>>>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>>>>>> Cc: Rob Clark <robdclark@chromium.org>
>>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>>>> This has been reviewed. Should we land it?
>>>> No, there are still open discussions about it.
>>> I think if it is needed for trace visualization, that is justification
>>> enough for me.
>>>
>>> I don't really see otherwise how a generic trace visualization tool
>>> like perfetto would handle the case that some fence timelines have
>>> separate events but others do not.
>> Well, I just sent a patch to completely remove the trace point.
>>
>> As I said, it absolutely doesn't make sense to use this for
>> visualization; that's what the trace_dma_fence_init trace point is good for.
>>
>> The only use case is for debugging the GPU scheduler, and we should
>> probably introduce a separate, GPU-scheduler-specific trace point for
>> this instead if we should ever need it.
> Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit..
> OTOH if using separate events in these special cases is better, then
> I'm ok with that and can revert this patch. Chia-I is more familiar
> with the visualization end of it, so I'll let him comment on whether
> that is a workable approach.

Interesting, I wasn't aware of the virtgpu separation of init and emit.

But yes, if there is really a use case for tracing this timestamp as well, then we should probably make it use-case specific.

I just looked into the scheduler case a bit and found that even there we already have a different trace point for it, which is probably the reason why we never used trace_dma_fence_emit there.

So yes, there really isn't much reason I can see to have two separate trace points for every driver.

Christian.

> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 10:20 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:16 schrieb Rob Clark:
> > On Tue, Apr 26, 2022 at 10:08 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 19:05 schrieb Rob Clark:
> >>> On Tue, Apr 26, 2022 at 9:42 AM Christian König
> >>> <christian.koenig@amd.com> wrote:
> >>>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> >>>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >>>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
> >>>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >>>>>> and virtio both have cases where trace_dma_fence_init and
> >>>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >>>>>> to always use the more correct trace_dma_fence_emit when visualizing
> >>>>>> fence timelines.
> >>>>>>
> >>>>>> v2: improve commit message (Dmitry)
> >>>>>>
> >>>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >>>>>> Cc: Rob Clark <robdclark@chromium.org>
> >>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >>>>> This has been reviewed. Should we land it?
> >>>> No, there are still open discussions about it.
> >>> I think if it is needed for trace visualization, that is justification
> >>> enough for me.
> >>>
> >>> I don't really see otherwise how a generic trace visualization tool
> >>> like perfetto would handle the case that some fence timelines have
> >>> separate events but others do not.
> >> Well, I just sent a patch to completely remove the trace point.
> >>
> >> As I said, it absolutely doesn't make sense to use this for
> >> visualization; that's what the trace_dma_fence_init trace point is good for.

I am a bit confused by this. _emit and _signaled are a great way to see how many fences are pending from the CPU's point of view. How does _emit make no sense while _init is good instead? Or is it just that _init is good enough most of the time?

(More below)

> >>
> >> The only use case is for debugging the GPU scheduler, and we should
> >> probably introduce a separate, GPU-scheduler-specific trace point for
> >> this instead if we should ever need it.
> > Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit..
> > OTOH if using separate events in these special cases is better, then
> > I'm ok with that and can revert this patch. Chia-I is more familiar
> > with the visualization end of it, so I'll let him comment on whether
> > that is a workable approach.
>
> Interesting, I wasn't aware of the virtgpu separation of init and emit.
>
> But yes, if there is really a use case for tracing this timestamp as
> well, then we should probably make it use-case specific.
>
> I just looked into the scheduler case a bit and found that even there we
> already have a different trace point for it, which is probably the
> reason why we never used trace_dma_fence_emit there.

Yeah, I am using drm_sched tracepoints in that case.

>
> So yes, there really isn't much reason I can see to have two separate
> trace points for every driver.

That sounds fair. In any tool, it should be easy to see whether a fence timeline has _emit in addition to _init, and adapt accordingly. We can drop this patch. A clarification that _emit is optional/redundant for most fence timelines would be nicer than removing it, though.

>
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> [SNIP]
>>>> Well, I just sent a patch to completely remove the trace point.
>>>>
>>>> As I said, it absolutely doesn't make sense to use this for
>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> I am a bit confused by this. _emit and _signaled are a great way to
> see how many fences are pending from the CPU's point of view. How does
> _emit make no sense while _init is good instead?

We had exactly that confusion now multiple times, and it's one of the main reasons why I want to remove the _emit trace point.

See, when you want to know how many fences are pending you need to watch out for init/destroy and *NOT* emit.

The reason is that in the special case where emit makes sense (e.g. the GPU scheduler fences) emit comes later than init, but pending on the CPU and taking up resources are all fences, not just the ones emitted to the hardware.

On the other hand, when you want to measure how much time each operation took on the hardware, you need to take a look at the differences of the signal events on each timeline.

So there isn't really any use case for the emit trace point, except when you want to figure out how much latency the scheduler introduces. Then you want to take a look at init and emit, but that isn't really that interesting for performance analyses.

Regards,
Christian.
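The accounting Christian describes above can be sketched with a toy parser: pending fences are counted from init/signaled pairs (not emit), and per-operation GPU time falls out of the gaps between consecutive signaled events on one timeline. The event tuples below are hypothetical and simplified for illustration, not the real ftrace record format:

```python
# Hypothetical, simplified event stream: (timestamp_us, event, fence_id).
events = [
    (100, "init",     1),
    (110, "emit",     1),   # fence 1 handed to the hardware
    (120, "init",     2),   # fence 2 pending on the CPU before emit
    (300, "signaled", 1),
    (310, "emit",     2),
    (500, "signaled", 2),
]

def pending_at(t, events):
    """Fences live (init'ed but not yet signaled) at time t."""
    live = set()
    for ts, ev, fid in events:
        if ts > t:
            break
        if ev == "init":
            live.add(fid)
        elif ev == "signaled":
            live.discard(fid)
    return live

def gpu_durations(events):
    """Per-operation time from consecutive signaled events."""
    sig = [ts for ts, ev, _ in events if ev == "signaled"]
    return [b - a for a, b in zip(sig, sig[1:])]

print(pending_at(150, events))   # {1, 2}: both live, though 2 is not yet emitted
print(gpu_durations(events))     # [200]
```

Note that counting on emit instead of init would miss fence 2 at t=150, which is exactly the confusion being described.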
On Tue, Apr 26, 2022 at 11:02 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > [SNIP]
> >>>> Well, I just sent a patch to completely remove the trace point.
> >>>>
> >>>> As I said, it absolutely doesn't make sense to use this for
> >>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> > I am a bit confused by this. _emit and _signaled are a great way to
> > see how many fences are pending from the CPU's point of view. How does
> > _emit make no sense while _init is good instead?
>
> We had exactly that confusion now multiple times, and it's one of the
> main reasons why I want to remove the _emit trace point.
>
> See, when you want to know how many fences are pending you need to
> watch out for init/destroy and *NOT* emit.
>
> The reason is that in the special case where emit makes sense (e.g. the
> GPU scheduler fences) emit comes later than init, but pending on the CPU
> and taking up resources are all fences, not just the ones emitted to
> the hardware.

I am more interested in pending on the GPU.

>
> On the other hand, when you want to measure how much time each operation
> took on the hardware, you need to take a look at the differences of the
> signal events on each timeline.

_signaled alone is not enough when the GPU is not always busy. After the last pending fence signals but before the following _init/_emit, nothing is pending.

For all drivers except virtio-gpu, _init and "ring head update" always happen close enough that I can see why _emit is redundant. But I like having _emit as a generic tracepoint for timelines where _init and _emit can be apart, instead of requiring a special-case tracepoint for each special-case timeline.

>
> So there isn't really any use case for the emit trace point, except when
> you want to figure out how much latency the scheduler introduces. Then
> you want to take a look at init and emit, but that isn't really that
> interesting for performance analyses.
>
> Regards,
> Christian.
>
Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> On Tue, Apr 26, 2022 at 11:02 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
>>> [SNIP]
>>>>>> Well, I just sent a patch to completely remove the trace point.
>>>>>>
>>>>>> As I said, it absolutely doesn't make sense to use this for
>>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
>>> I am a bit confused by this. _emit and _signaled are a great way to
>>> see how many fences are pending from the CPU's point of view. How does
>>> _emit make no sense while _init is good instead?
>> We had exactly that confusion now multiple times, and it's one of the
>> main reasons why I want to remove the _emit trace point.
>>
>> See, when you want to know how many fences are pending you need to
>> watch out for init/destroy and *NOT* emit.
>>
>> The reason is that in the special case where emit makes sense (e.g. the
>> GPU scheduler fences) emit comes later than init, but pending on the CPU
>> and taking up resources are all fences, not just the ones emitted to
>> the hardware.
> I am more interested in pending on the GPU.
>
>> On the other hand, when you want to measure how much time each operation
>> took on the hardware, you need to take a look at the differences of the
>> signal events on each timeline.
> _signaled alone is not enough when the GPU is not always busy. After
> the last pending fence signals but before the following _init/_emit,
> nothing is pending.

Yeah, I'm perfectly aware of that.

> For all drivers except virtio-gpu, _init and "ring head update" always
> happen close enough that I can see why _emit is redundant. But I like
> having _emit as a generic tracepoint for timelines where _init and
> _emit can be apart, instead of requiring a special-case tracepoint for
> each special-case timeline.

And I'm certainly not going to add _emit to all drivers just because of that. As you said, it is a special case for virtio-gpu and the GPU scheduler.

And as I explained before, the difference between _init and _emit shouldn't matter to your visualization. The background is that as soon as a dma_fence is initialized with _init it is "live" regarding dependency and memory management, and exactly that is what matters for the visualization.

The latency between _init and _emit is just interesting for debugging the scheduler, and surprisingly virtio-gpu as well; for all other use cases it is irrelevant.

Regards,
Christian.

>> So there isn't really any use case for the emit trace point, except when
>> you want to figure out how much latency the scheduler introduces. Then
>> you want to take a look at init and emit, but that isn't really that
>> interesting for performance analyses.
>>
>> Regards,
>> Christian.
>>
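The init-to-emit gap being debated here (scheduler latency, or the guest-to-host path for virtio-gpu) can be computed from the same kind of toy event stream. This is a minimal sketch over hypothetical, simplified (timestamp, event, fence_id) tuples, not any real trace format:

```python
# Hypothetical event tuples: the init -> emit gap is the latency the
# scheduler (or, for virtio-gpu, the guest -> host path) adds before the
# job reaches the hardware; _emit is redundant only when this gap is ~0.
events = [
    (100, "init", 1), (102, "emit", 1),  # near-immediate emit: typical driver
    (200, "init", 2), (350, "emit", 2),  # fence held back before reaching hw
]

def emit_latency(events):
    """Per-fence init -> emit delay, in the stream's time units."""
    init = {fid: ts for ts, ev, fid in events if ev == "init"}
    return {fid: ts - init[fid] for ts, ev, fid in events if ev == "emit"}

print(emit_latency(events))  # {1: 2, 2: 150}
```

Fence 1 shows the common case where the two events are effectively simultaneous; fence 2 shows the special case where only a separate emit event makes the delay visible.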
On Tue, Apr 26, 2022 at 11:20 PM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> > On Tue, Apr 26, 2022 at 11:02 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> >>> [SNIP]
> >>>>>> Well, I just sent a patch to completely remove the trace point.
> >>>>>>
> >>>>>> As I said, it absolutely doesn't make sense to use this for
> >>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> >>> I am a bit confused by this. _emit and _signaled are a great way to
> >>> see how many fences are pending from the CPU's point of view. How does
> >>> _emit make no sense while _init is good instead?
> >> We had exactly that confusion now multiple times, and it's one of the
> >> main reasons why I want to remove the _emit trace point.
> >>
> >> See, when you want to know how many fences are pending you need to
> >> watch out for init/destroy and *NOT* emit.
> >>
> >> The reason is that in the special case where emit makes sense (e.g. the
> >> GPU scheduler fences) emit comes later than init, but pending on the CPU
> >> and taking up resources are all fences, not just the ones emitted to
> >> the hardware.
> > I am more interested in pending on the GPU.
> >
> >> On the other hand, when you want to measure how much time each operation
> >> took on the hardware, you need to take a look at the differences of the
> >> signal events on each timeline.
> > _signaled alone is not enough when the GPU is not always busy. After
> > the last pending fence signals but before the following _init/_emit,
> > nothing is pending.
>
> Yeah, I'm perfectly aware of that.
>
> > For all drivers except virtio-gpu, _init and "ring head update" always
> > happen close enough that I can see why _emit is redundant. But I like
> > having _emit as a generic tracepoint for timelines where _init and
> > _emit can be apart, instead of requiring a special-case tracepoint for
> > each special-case timeline.
>
> And I'm certainly not going to add _emit to all drivers just because of
> that. As you said, it is a special case for virtio-gpu and the GPU scheduler.
>
> And as I explained before, the difference between _init and _emit
> shouldn't matter to your visualization. The background is that as soon
> as a dma_fence is initialized with _init it is "live" regarding
> dependency and memory management, and exactly that is what matters for
> the visualization.
>
> The latency between _init and _emit is just interesting for debugging
> the scheduler, and surprisingly virtio-gpu as well; for all other use
> cases it is irrelevant.

It might actually be *more* interesting for virtio-gpu.. unless there is some other way to link guest and host fences to see what the potential guest->host latency is.

Re: adding the tracepoint to other drivers, I'm fine with folks doing that as needed. Unless you have a better proposal for how to visualize init vs emit latency, I think it's fine to have an extra tracepoint even if it is redundant in some cases. The visualization tool is the customer here; we have to give it what it wants/needs.

BR,
-R

> Regards,
> Christian.
>
> >> So there isn't really any use case for the emit trace point, except when
> >> you want to figure out how much latency the scheduler introduces. Then
> >> you want to take a look at init and emit, but that isn't really that
> >> interesting for performance analyses.
> >>
> >> Regards,
> >> Christian.
> >>
>
On Wed, Apr 27, 2022 at 9:07 AM Rob Clark <robdclark@chromium.org> wrote:
>
> On Tue, Apr 26, 2022 at 11:20 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> > > On Tue, Apr 26, 2022 at 11:02 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > >> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > >>> [SNIP]
> > >>>>>> Well, I just sent a patch to completely remove the trace point.
> > >>>>>>
> > >>>>>> As I said, it absolutely doesn't make sense to use this for
> > >>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> > >>> I am a bit confused by this. _emit and _signaled are a great way to
> > >>> see how many fences are pending from the CPU's point of view. How does
> > >>> _emit make no sense while _init is good instead?
> > >> We had exactly that confusion now multiple times, and it's one of the
> > >> main reasons why I want to remove the _emit trace point.
> > >>
> > >> See, when you want to know how many fences are pending you need to
> > >> watch out for init/destroy and *NOT* emit.
> > >>
> > >> The reason is that in the special case where emit makes sense (e.g. the
> > >> GPU scheduler fences) emit comes later than init, but pending on the CPU
> > >> and taking up resources are all fences, not just the ones emitted to
> > >> the hardware.
> > > I am more interested in pending on the GPU.
> > >
> > >> On the other hand, when you want to measure how much time each operation
> > >> took on the hardware, you need to take a look at the differences of the
> > >> signal events on each timeline.
> > > _signaled alone is not enough when the GPU is not always busy. After
> > > the last pending fence signals but before the following _init/_emit,
> > > nothing is pending.
> >
> > Yeah, I'm perfectly aware of that.
> >
> > > For all drivers except virtio-gpu, _init and "ring head update" always
> > > happen close enough that I can see why _emit is redundant. But I like
> > > having _emit as a generic tracepoint for timelines where _init and
> > > _emit can be apart, instead of requiring a special-case tracepoint for
> > > each special-case timeline.
> >
> > And I'm certainly not going to add _emit to all drivers just because of
> > that. As you said, it is a special case for virtio-gpu and the GPU scheduler.
> >
> > And as I explained before, the difference between _init and _emit
> > shouldn't matter to your visualization. The background is that as soon
> > as a dma_fence is initialized with _init it is "live" regarding
> > dependency and memory management, and exactly that is what matters for
> > the visualization.
> >
> > The latency between _init and _emit is just interesting for debugging
> > the scheduler, and surprisingly virtio-gpu as well; for all other use
> > cases it is irrelevant.
>
> It might actually be *more* interesting for virtio-gpu.. unless there
> is some other way to link guest and host fences to see what the
> potential guest->host latency is.
>
> Re: adding the tracepoint to other drivers, I'm fine with folks doing
> that as needed. Unless you have a better proposal for how to
> visualize init vs emit latency, I think it's fine to have an extra
> tracepoint even if it is redundant in some cases. The visualization
> tool is the customer here; we have to give it what it wants/needs.

As far as perfetto is concerned, I just use either _init or _emit on a per-timeline basis. We can drop this patch for msm, and we do not need to change drivers whose latencies between _init/_emit are negligible.

init vs emit latency is still interesting. I prefer keeping _init/_emit as generic events that tools can parse, rather than adding per-driver special cases that tools need to understand.

> BR,
> -R
>
> > Regards,
> > Christian.
> >
> > >> So there isn't really any use case for the emit trace point, except when
> > >> you want to figure out how much latency the scheduler introduces. Then
> > >> you want to take a look at init and emit, but that isn't really that
> > >> interesting for performance analyses.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> >
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index faf0c242874e..a82193f41ea2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -15,6 +15,7 @@
 #include <linux/string_helpers.h>
 #include <linux/devcoredump.h>
 #include <linux/sched/task.h>
+#include <trace/events/dma_fence.h>
 
 /*
  * Power Management:
@@ -769,6 +770,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	gpu->active_submits++;
 	mutex_unlock(&gpu->active_lock);
 
+	trace_dma_fence_emit(submit->hw_fence);
 	gpu->funcs->submit(gpu, submit);
 
 	gpu->cur_ctx_seqno = submit->queue->ctx->seqno;