Message ID: 20220412212558.827289-1-olvaffe@gmail.com
State: New
Series: [v2] drm/msm: add trace_dma_fence_emit to msm_gpu_submit
On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>
> In practice, trace_dma_fence_init called from dma_fence_init is good
> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> and virtio both have cases where trace_dma_fence_init and
> trace_dma_fence_emit can be apart. It is easier for visualization tools
> to always use the more correct trace_dma_fence_emit when visualizing
> fence timelines.
>
> v2: improve commit message (Dmitry)
>
> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> Cc: Rob Clark <robdclark@chromium.org>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

This has been reviewed. Should we land it?

(Or, how do I check if it has landed?)
Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>> In practice, trace_dma_fence_init called from dma_fence_init is good
>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>> and virtio both have cases where trace_dma_fence_init and
>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>> to always use the more correct trace_dma_fence_emit when visualizing
>> fence timelines.
>>
>> v2: improve commit message (Dmitry)
>>
>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>> Cc: Rob Clark <robdclark@chromium.org>
>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> This has been reviewed. Should we land it?

No, there are still open discussions about it.

Regards,
Christian.

>
> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 9:42 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> > On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >> In practice, trace_dma_fence_init called from dma_fence_init is good
> >> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >> and virtio both have cases where trace_dma_fence_init and
> >> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >> to always use the more correct trace_dma_fence_emit when visualizing
> >> fence timelines.
> >>
> >> v2: improve commit message (Dmitry)
> >>
> >> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >> Cc: Rob Clark <robdclark@chromium.org>
> >> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > This has been reviewed. Should we land it?
>
> No, there are still open discussions about it.

I think if it is needed for trace visualization, that is justification enough for me.

I don't really see otherwise how a generic trace visualization tool like perfetto would handle the case that some fence timelines have separate events but others do not.

BR,
-R

> Regards,
> Christian.
>
> >
> > (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:05 schrieb Rob Clark:
> On Tue, Apr 26, 2022 at 9:42 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>>>> and virtio both have cases where trace_dma_fence_init and
>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>>>> to always use the more correct trace_dma_fence_emit when visualizing
>>>> fence timelines.
>>>>
>>>> v2: improve commit message (Dmitry)
>>>>
>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>>>> Cc: Rob Clark <robdclark@chromium.org>
>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>> This has been reviewed. Should we land it?
>> No, there are still open discussions about it.
> I think if it is needed for trace visualization, that is justification
> enough for me.
>
> I don't really see otherwise how a generic trace visualization tool
> like perfetto would handle the case that some fence timelines have
> separate events but others do not.

Well, I just sent a patch to completely remove the trace point.

As I said, it absolutely doesn't make sense to use this for visualization; that's what the trace_dma_fence_init trace point is good for.

The only use case is for debugging the GPU scheduler, and we should probably introduce a separate, GPU-scheduler-specific trace point for this instead if we should ever need it.

Regards,
Christian.

> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 10:08 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:05 schrieb Rob Clark:
> > On Tue, Apr 26, 2022 at 9:42 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> >>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >>>> In practice, trace_dma_fence_init called from dma_fence_init is good
> >>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >>>> and virtio both have cases where trace_dma_fence_init and
> >>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >>>> to always use the more correct trace_dma_fence_emit when visualizing
> >>>> fence timelines.
> >>>>
> >>>> v2: improve commit message (Dmitry)
> >>>>
> >>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >>>> Cc: Rob Clark <robdclark@chromium.org>
> >>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >>> This has been reviewed. Should we land it?
> >> No, there are still open discussions about it.
> > I think if it is needed for trace visualization, that is justification
> > enough for me.
> >
> > I don't really see otherwise how a generic trace visualization tool
> > like perfetto would handle the case that some fence timelines have
> > separate events but others do not.
>
> Well, I just sent a patch to completely remove the trace point.
>
> As I said, it absolutely doesn't make sense to use this for
> visualization; that's what the trace_dma_fence_init trace point is good for.
>
> The only use case is for debugging the GPU scheduler, and we should
> probably introduce a separate, GPU-scheduler-specific trace point for
> this instead if we should ever need it.

Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit.. OTOH if using separate events in these special cases is better, then I'm ok with that and can revert this patch. Chia-I is more familiar with the visualization end of it, so I'll let him comment on whether that is a workable approach.

BR,
-R

> Regards,
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:16 schrieb Rob Clark:
> On Tue, Apr 26, 2022 at 10:08 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 19:05 schrieb Rob Clark:
>>> On Tue, Apr 26, 2022 at 9:42 AM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
>>>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
>>>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
>>>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
>>>>>> and virtio both have cases where trace_dma_fence_init and
>>>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
>>>>>> to always use the more correct trace_dma_fence_emit when visualizing
>>>>>> fence timelines.
>>>>>>
>>>>>> v2: improve commit message (Dmitry)
>>>>>>
>>>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
>>>>>> Cc: Rob Clark <robdclark@chromium.org>
>>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>>>>> This has been reviewed. Should we land it?
>>>> No, there are still open discussions about it.
>>> I think if it is needed for trace visualization, that is justification
>>> enough for me.
>>>
>>> I don't really see otherwise how a generic trace visualization tool
>>> like perfetto would handle the case that some fence timelines have
>>> separate events but others do not.
>> Well, I just sent a patch to completely remove the trace point.
>>
>> As I said, it absolutely doesn't make sense to use this for
>> visualization; that's what the trace_dma_fence_init trace point is good for.
>>
>> The only use case is for debugging the GPU scheduler, and we should
>> probably introduce a separate, GPU-scheduler-specific trace point for
>> this instead if we should ever need it.
> Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit..
> OTOH if using separate events in these special cases is better, then
> I'm ok with that and can revert this patch. Chia-I is more familiar
> with the visualization end of it, so I'll let him comment on whether
> that is a workable approach.

Interesting, I wasn't aware of the virtgpu separation of init and emit.

But yes, if there is really a use case for tracing this timestamp as well, then we should probably make it use-case specific.

I just looked into the scheduler case a bit and found that even there we already have a different trace point for it, which is probably the reason why we never used trace_dma_fence_emit there.

So yes, there really isn't much reason I can see to have two separate trace points for every driver.

Christian.

> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> BR,
>>> -R
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> (Or, how do I check if it has landed?)
On Tue, Apr 26, 2022 at 10:20 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:16 schrieb Rob Clark:
> > On Tue, Apr 26, 2022 at 10:08 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 19:05 schrieb Rob Clark:
> >>> On Tue, Apr 26, 2022 at 9:42 AM Christian König
> >>> <christian.koenig@amd.com> wrote:
> >>>> Am 26.04.22 um 18:32 schrieb Chia-I Wu:
> >>>>> On Tue, Apr 12, 2022 at 2:26 PM Chia-I Wu <olvaffe@gmail.com> wrote:
> >>>>>> In practice, trace_dma_fence_init called from dma_fence_init is good
> >>>>>> enough and almost no driver calls trace_dma_fence_emit. But drm_sched
> >>>>>> and virtio both have cases where trace_dma_fence_init and
> >>>>>> trace_dma_fence_emit can be apart. It is easier for visualization tools
> >>>>>> to always use the more correct trace_dma_fence_emit when visualizing
> >>>>>> fence timelines.
> >>>>>>
> >>>>>> v2: improve commit message (Dmitry)
> >>>>>>
> >>>>>> Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
> >>>>>> Cc: Rob Clark <robdclark@chromium.org>
> >>>>>> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> >>>>> This has been reviewed. Should we land it?
> >>>> No, there are still open discussions about it.
> >>> I think if it is needed for trace visualization, that is justification
> >>> enough for me.
> >>>
> >>> I don't really see otherwise how a generic trace visualization tool
> >>> like perfetto would handle the case that some fence timelines have
> >>> separate events but others do not.
> >> Well, I just sent a patch to completely remove the trace point.
> >>
> >> As I said, it absolutely doesn't make sense to use this for
> >> visualization; that's what the trace_dma_fence_init trace point is good for.

I am a bit confused by this. _emit and _signaled are a great way to see how many fences are pending from the CPU's point of view. How does _emit make no sense while _init is good instead? Or is it just that _init is good enough most of the time?

(More below)

> >>
> >> The only use case is for debugging the GPU scheduler, and we should
> >> probably introduce a separate, GPU-scheduler-specific trace point for
> >> this instead if we should ever need it.
> > Hmm, AFAIU from Chia-I, virtgpu has a separation of init and emit..
> > OTOH if using separate events in these special cases is better, then
> > I'm ok with that and can revert this patch. Chia-I is more familiar
> > with the visualization end of it, so I'll let him comment on whether
> > that is a workable approach.
>
> Interesting, I wasn't aware of the virtgpu separation of init and emit.
>
> But yes, if there is really a use case for tracing this timestamp as
> well, then we should probably make it use-case specific.
>
> I just looked into the scheduler case a bit and found that even there we
> already have a different trace point for it, which is probably the
> reason why we never used trace_dma_fence_emit there.

Yeah, I am using drm_sched tracepoints in that case.

>
> So yes, there really isn't much reason I can see to have two separate
> trace points for every driver.

That sounds fair. In any tool, it should be easy to see whether a fence timeline has _emit in addition to _init, and adapt accordingly. We can drop this patch. A clarification that _emit is optional/redundant for most fence timelines would be nicer than removing it, though.

>
> Christian.
>
> >
> > BR,
> > -R
> >
> >> Regards,
> >> Christian.
> >>
> >>> BR,
> >>> -R
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> (Or, how do I check if it has landed?)
>
Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> [SNIP]
>>>> Well, I just sent a patch to completely remove the trace point.
>>>>
>>>> As I said, it absolutely doesn't make sense to use this for
>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> I am a bit confused by this. _emit and _signaled are a great way to
> see how many fences are pending from the CPU's point of view. How does
> _emit make no sense while _init is good instead?

We had exactly that confusion now multiple times, and it's one of the main reasons why I want to remove the _emit trace point.

See, when you want to know how many fences are pending you need to watch out for init/destroy and *NOT* emit.

The reason is that in the special case where emit makes sense (e.g. the GPU scheduler fences) emit comes later than init, but pending on the CPU and taking up resources are all fences, not just the ones emitted to the hardware.

On the other hand, when you want to measure how much time each operation took on the hardware, you need to take a look at the differences of the signal events on each timeline.

So there isn't really any use case for the emit trace point, except when you want to figure out how much latency the scheduler introduces. Then you want to take a look at init and emit, but that isn't really that interesting for performance analyses.

Regards,
Christian.
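The accounting Christian describes above can be sketched with a toy parser: pending fences are counted from init/signaled pairs (not emit), and per-operation GPU time falls out of the gaps between consecutive signaled events on one timeline. The event tuples below are hypothetical and simplified for illustration, not the real ftrace record format:

```python
# Hypothetical, simplified event stream: (timestamp_us, event, fence_id).
events = [
    (100, "init",     1),
    (110, "emit",     1),   # fence 1 handed to the hardware
    (120, "init",     2),   # fence 2 pending on the CPU before emit
    (300, "signaled", 1),
    (310, "emit",     2),
    (500, "signaled", 2),
]

def pending_at(t, events):
    """Fences live (init'ed but not yet signaled) at time t."""
    live = set()
    for ts, ev, fid in events:
        if ts > t:
            break
        if ev == "init":
            live.add(fid)
        elif ev == "signaled":
            live.discard(fid)
    return live

def gpu_durations(events):
    """Per-operation time from consecutive signaled events."""
    sig = [ts for ts, ev, _ in events if ev == "signaled"]
    return [b - a for a, b in zip(sig, sig[1:])]

print(pending_at(150, events))   # {1, 2}: both live, though 2 is not yet emitted
print(gpu_durations(events))     # [200]
```

Note that counting on emit instead of init would miss fence 2 at t=150, which is exactly the confusion being described.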
On Tue, Apr 26, 2022 at 11:02 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > [SNIP]
> >>>> Well, I just sent a patch to completely remove the trace point.
> >>>>
> >>>> As I said, it absolutely doesn't make sense to use this for
> >>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> > I am a bit confused by this. _emit and _signaled are a great way to
> > see how many fences are pending from the CPU's point of view. How does
> > _emit make no sense while _init is good instead?
>
> We had exactly that confusion now multiple times, and it's one of the
> main reasons why I want to remove the _emit trace point.
>
> See, when you want to know how many fences are pending you need to
> watch out for init/destroy and *NOT* emit.
>
> The reason is that in the special case where emit makes sense (e.g. the
> GPU scheduler fences) emit comes later than init, but pending on the CPU
> and taking up resources are all fences, not just the ones emitted to
> the hardware.

I am more interested in pending on the GPU.

>
> On the other hand, when you want to measure how much time each operation
> took on the hardware, you need to take a look at the differences of the
> signal events on each timeline.

_signaled alone is not enough when the GPU is not always busy. After the last pending fence signals but before the following _init/_emit, nothing is pending.

For all drivers except virtio-gpu, _init and "ring head update" always happen close enough that I can see why _emit is redundant. But I like having _emit as a generic tracepoint for timelines where _init and _emit can be apart, instead of requiring a special-case tracepoint for each special-case timeline.

>
> So there isn't really any use case for the emit trace point, except when
> you want to figure out how much latency the scheduler introduces. Then
> you want to take a look at init and emit, but that isn't really that
> interesting for performance analyses.
>
> Regards,
> Christian.
>
Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> On Tue, Apr 26, 2022 at 11:02 AM Christian König
> <christian.koenig@amd.com> wrote:
>> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
>>> [SNIP]
>>>>>> Well, I just sent a patch to completely remove the trace point.
>>>>>>
>>>>>> As I said, it absolutely doesn't make sense to use this for
>>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
>>> I am a bit confused by this. _emit and _signaled are a great way to
>>> see how many fences are pending from the CPU's point of view. How does
>>> _emit make no sense while _init is good instead?
>> We had exactly that confusion now multiple times, and it's one of the
>> main reasons why I want to remove the _emit trace point.
>>
>> See, when you want to know how many fences are pending you need to
>> watch out for init/destroy and *NOT* emit.
>>
>> The reason is that in the special case where emit makes sense (e.g. the
>> GPU scheduler fences) emit comes later than init, but pending on the CPU
>> and taking up resources are all fences, not just the ones emitted to
>> the hardware.
> I am more interested in pending on the GPU.
>
>> On the other hand, when you want to measure how much time each operation
>> took on the hardware, you need to take a look at the differences of the
>> signal events on each timeline.
> _signaled alone is not enough when the GPU is not always busy. After
> the last pending fence signals but before the following _init/_emit,
> nothing is pending.

Yeah, I'm perfectly aware of that.

> For all drivers except virtio-gpu, _init and "ring head update" always
> happen close enough that I can see why _emit is redundant. But I like
> having _emit as a generic tracepoint for timelines where _init and
> _emit can be apart, instead of requiring a special-case tracepoint for
> each special-case timeline.

And I'm certainly not going to add _emit to all drivers just because of that. As you said, it is a special case for virtio-gpu and the GPU scheduler.

And as I explained before, the difference between _init and _emit shouldn't matter to your visualization. The background is that as soon as a dma_fence is initialized with _init it is "live" regarding dependency and memory management, and exactly that is what matters for the visualization.

The latency between _init and _emit is just interesting for debugging the scheduler, and surprisingly virtio-gpu as well; for all other use cases it is irrelevant.

Regards,
Christian.

>> So there isn't really any use case for the emit trace point, except when
>> you want to figure out how much latency the scheduler introduces. Then
>> you want to take a look at init and emit, but that isn't really that
>> interesting for performance analyses.
>>
>> Regards,
>> Christian.
>>
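The init-to-emit gap being debated here (scheduler latency, or the guest-to-host path for virtio-gpu) can be computed from the same kind of toy event stream. This is a minimal sketch over hypothetical, simplified (timestamp, event, fence_id) tuples, not any real trace format:

```python
# Hypothetical event tuples: the init -> emit gap is the latency the
# scheduler (or, for virtio-gpu, the guest -> host path) adds before the
# job reaches the hardware; _emit is redundant only when this gap is ~0.
events = [
    (100, "init", 1), (102, "emit", 1),  # near-immediate emit: typical driver
    (200, "init", 2), (350, "emit", 2),  # fence held back before reaching hw
]

def emit_latency(events):
    """Per-fence init -> emit delay, in the stream's time units."""
    init = {fid: ts for ts, ev, fid in events if ev == "init"}
    return {fid: ts - init[fid] for ts, ev, fid in events if ev == "emit"}

print(emit_latency(events))  # {1: 2, 2: 150}
```

Fence 1 shows the common case where the two events are effectively simultaneous; fence 2 shows the special case where only a separate emit event makes the delay visible.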
On Tue, Apr 26, 2022 at 11:20 PM Christian König <christian.koenig@amd.com> wrote:
>
> Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> > On Tue, Apr 26, 2022 at 11:02 AM Christian König
> > <christian.koenig@amd.com> wrote:
> >> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> >>> [SNIP]
> >>>>>> Well, I just sent a patch to completely remove the trace point.
> >>>>>>
> >>>>>> As I said, it absolutely doesn't make sense to use this for
> >>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> >>> I am a bit confused by this. _emit and _signaled are a great way to
> >>> see how many fences are pending from the CPU's point of view. How does
> >>> _emit make no sense while _init is good instead?
> >> We had exactly that confusion now multiple times, and it's one of the
> >> main reasons why I want to remove the _emit trace point.
> >>
> >> See, when you want to know how many fences are pending you need to
> >> watch out for init/destroy and *NOT* emit.
> >>
> >> The reason is that in the special case where emit makes sense (e.g. the
> >> GPU scheduler fences) emit comes later than init, but pending on the CPU
> >> and taking up resources are all fences, not just the ones emitted to
> >> the hardware.
> > I am more interested in pending on the GPU.
> >
> >> On the other hand, when you want to measure how much time each operation
> >> took on the hardware, you need to take a look at the differences of the
> >> signal events on each timeline.
> > _signaled alone is not enough when the GPU is not always busy. After
> > the last pending fence signals but before the following _init/_emit,
> > nothing is pending.
>
> Yeah, I'm perfectly aware of that.
>
> > For all drivers except virtio-gpu, _init and "ring head update" always
> > happen close enough that I can see why _emit is redundant. But I like
> > having _emit as a generic tracepoint for timelines where _init and
> > _emit can be apart, instead of requiring a special-case tracepoint for
> > each special-case timeline.
>
> And I'm certainly not going to add _emit to all drivers just because of
> that. As you said, it is a special case for virtio-gpu and the GPU scheduler.
>
> And as I explained before, the difference between _init and _emit
> shouldn't matter to your visualization. The background is that as soon
> as a dma_fence is initialized with _init it is "live" regarding
> dependency and memory management, and exactly that is what matters for
> the visualization.
>
> The latency between _init and _emit is just interesting for debugging
> the scheduler, and surprisingly virtio-gpu as well; for all other use
> cases it is irrelevant.

It might actually be *more* interesting for virtio-gpu.. unless there is some other way to link guest and host fences to see what the potential guest->host latency is.

Re: adding the tracepoint to other drivers, I'm fine with folks doing that as needed. Unless you have a better proposal for how to visualize init vs emit latency, I think it's fine to have an extra tracepoint even if it is redundant in some cases. The visualization tool is the customer here; we have to give it what it wants/needs.

BR,
-R

> Regards,
> Christian.
>
> >> So there isn't really any use case for the emit trace point, except when
> >> you want to figure out how much latency the scheduler introduces. Then
> >> you want to take a look at init and emit, but that isn't really that
> >> interesting for performance analyses.
> >>
> >> Regards,
> >> Christian.
> >>
>
On Wed, Apr 27, 2022 at 9:07 AM Rob Clark <robdclark@chromium.org> wrote:
>
> On Tue, Apr 26, 2022 at 11:20 PM Christian König
> <christian.koenig@amd.com> wrote:
> >
> > Am 26.04.22 um 20:50 schrieb Chia-I Wu:
> > > On Tue, Apr 26, 2022 at 11:02 AM Christian König
> > > <christian.koenig@amd.com> wrote:
> > >> Am 26.04.22 um 19:40 schrieb Chia-I Wu:
> > >>> [SNIP]
> > >>>>>> Well, I just sent a patch to completely remove the trace point.
> > >>>>>>
> > >>>>>> As I said, it absolutely doesn't make sense to use this for
> > >>>>>> visualization; that's what the trace_dma_fence_init trace point is good for.
> > >>> I am a bit confused by this. _emit and _signaled are a great way to
> > >>> see how many fences are pending from the CPU's point of view. How does
> > >>> _emit make no sense while _init is good instead?
> > >> We had exactly that confusion now multiple times, and it's one of the
> > >> main reasons why I want to remove the _emit trace point.
> > >>
> > >> See, when you want to know how many fences are pending you need to
> > >> watch out for init/destroy and *NOT* emit.
> > >>
> > >> The reason is that in the special case where emit makes sense (e.g. the
> > >> GPU scheduler fences) emit comes later than init, but pending on the CPU
> > >> and taking up resources are all fences, not just the ones emitted to
> > >> the hardware.
> > > I am more interested in pending on the GPU.
> > >
> > >> On the other hand, when you want to measure how much time each operation
> > >> took on the hardware, you need to take a look at the differences of the
> > >> signal events on each timeline.
> > > _signaled alone is not enough when the GPU is not always busy. After
> > > the last pending fence signals but before the following _init/_emit,
> > > nothing is pending.
> >
> > Yeah, I'm perfectly aware of that.
> >
> > > For all drivers except virtio-gpu, _init and "ring head update" always
> > > happen close enough that I can see why _emit is redundant. But I like
> > > having _emit as a generic tracepoint for timelines where _init and
> > > _emit can be apart, instead of requiring a special-case tracepoint for
> > > each special-case timeline.
> >
> > And I'm certainly not going to add _emit to all drivers just because of
> > that. As you said, it is a special case for virtio-gpu and the GPU scheduler.
> >
> > And as I explained before, the difference between _init and _emit
> > shouldn't matter to your visualization. The background is that as soon
> > as a dma_fence is initialized with _init it is "live" regarding
> > dependency and memory management, and exactly that is what matters for
> > the visualization.
> >
> > The latency between _init and _emit is just interesting for debugging
> > the scheduler, and surprisingly virtio-gpu as well; for all other use
> > cases it is irrelevant.
>
> It might actually be *more* interesting for virtio-gpu.. unless there
> is some other way to link guest and host fences to see what the
> potential guest->host latency is.
>
> Re: adding the tracepoint to other drivers, I'm fine with folks doing
> that as needed. Unless you have a better proposal for how to
> visualize init vs emit latency, I think it's fine to have an extra
> tracepoint even if it is redundant in some cases. The visualization
> tool is the customer here; we have to give it what it wants/needs.

As far as perfetto is concerned, I just use either _init or _emit on a per-timeline basis. We can drop this patch for msm, and we do not need to change drivers whose latencies between _init/_emit are negligible.

init vs emit latency is still interesting. I prefer keeping _init/_emit as generic events that tools can parse, rather than adding per-driver special cases that tools need to understand.

> BR,
> -R
>
> > Regards,
> > Christian.
> >
> > >> So there isn't really any use case for the emit trace point, except when
> > >> you want to figure out how much latency the scheduler introduces. Then
> > >> you want to take a look at init and emit, but that isn't really that
> > >> interesting for performance analyses.
> > >>
> > >> Regards,
> > >> Christian.
> > >>
> >
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index faf0c242874e..a82193f41ea2 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -15,6 +15,7 @@
 #include <linux/string_helpers.h>
 #include <linux/devcoredump.h>
 #include <linux/sched/task.h>
+#include <trace/events/dma_fence.h>
 
 /*
  * Power Management:
@@ -769,6 +770,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	gpu->active_submits++;
 	mutex_unlock(&gpu->active_lock);
 
+	trace_dma_fence_emit(submit->hw_fence);
 	gpu->funcs->submit(gpu, submit);
 
 	gpu->cur_ctx_seqno = submit->queue->ctx->seqno;