diff mbox

drm/i915: Before pageflip, also wait for shared dmabuf fences.

Message ID CAHbf0-H0owRAp_VXWd84E03OiPKXSE0jZ1=+OAtvoKuoSyt_=A@mail.gmail.com
State New
Headers show

Commit Message

Mike Lothian Oct. 28, 2016, 6:37 p.m. UTC
Hi Mario

That fixes the tearing, it's been replaced with a strange stutter, like
it's only showing half the number of frames being reported - it's really
noticeable in tomb raider

Thanks for your work on this, the stutter is much more manageable than the
tearing was

I've attached the patch that applies cleanly to 4.10-wip



On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de@gmail.com>
wrote:

>

>

> On 10/28/2016 03:34 AM, Michel Dänzer wrote:

> > On 27/10/16 10:33 PM, Mike Lothian wrote:

> >>

> >> Just another gentle ping to see where you are with this?

> >

> > I haven't got a chance to look into this any further.

> >

> >

>

> Fwiw., as a proof of concept, the attached experimental patch does work

> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and

> DRI3/Present when applied to drm-next (updated from a few days ago).

> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone

> under all loads. The tearing with "windowed" windows now looks as

> expected for regular tearing not related to Prime.

>

> ftrace confirms the i915 driver's pageflip function is waiting on the

> fence in reservation_object_wait_timeout_rcu() as it should.

>

> That entry->tv.shared needs to be set false for such buffers in

> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer

> validation for command stream submission. There are other places in the

> driver where tv.shared is set, which i didn't check so far.

>

> I don't know which of these would need to be updated with a "exported

> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding

> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no

> difference. I assume that makes sense because that functions seems to

> deal with amdgpu internal vm page tables or page table entries for such

> a bo, not with something visible to external clients?

>

> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping

> without causing any obvious new problems.

>

> -mario

>

Comments

Mike Lothian Oct. 29, 2016, 1:58 p.m. UTC | #1
I turned on vsync and everything works great in tomb raider

:D

Thanks again to everyone who made this possible

On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk> wrote:

> Hi Mario

>

> That fixes the tearing, it's been replaced with a strange stutter, like

> it's only showing half the number of frames being reported - it's really

> noticeable in tomb raider

>

> Thanks for your work on this, the stutter is much more manageable than the

> tearing was

>

> I've attached the patch that applies cleanly to 4.10-wip

>

>

>

> On Fri, 28 Oct 2016 at 18:37 Mario Kleiner <mario.kleiner.de@gmail.com>

> wrote:

>

>

>

> On 10/28/2016 03:34 AM, Michel Dänzer wrote:

> > On 27/10/16 10:33 PM, Mike Lothian wrote:

> >>

> >> Just another gentle ping to see where you are with this?

> >

> > I haven't got a chance to look into this any further.

> >

> >

>

> Fwiw., as a proof of concept, the attached experimental patch does work

> as tested on Intel HD Haswell + AMD R9 380 Tonga under amdgpu and

> DRI3/Present when applied to drm-next (updated from a few days ago).

> With DRI_PRIME=1 tearing for page-flipped fullscreen windows is gone

> under all loads. The tearing with "windowed" windows now looks as

> expected for regular tearing not related to Prime.

>

> ftrace confirms the i915 driver's pageflip function is waiting on the

> fence in reservation_object_wait_timeout_rcu() as it should.

>

> That entry->tv.shared needs to be set false for such buffers in

> amdgpu_bo_list_set() makes sense to me, as that is part of the buffer

> validation for command stream submission. There are other places in the

> driver where tv.shared is set, which i didn't check so far.

>

> I don't know which of these would need to be updated with a "exported

> bo" check as well, e.g., for video decoding or maybe gpu compute? Adding

> or removing the check to amdgpu_gem_va_update_vm(), e.g., made no

> difference. I assume that makes sense because that functions seems to

> deal with amdgpu internal vm page tables or page table entries for such

> a bo, not with something visible to external clients?

>

> All i can say is it fixes 3D rendering under DRI3 + Prime + pageflipping

> without causing any obvious new problems.

>

> -mario

>

>
Michel Dänzer Oct. 31, 2016, 6:44 a.m. UTC | #2
On 29/10/16 10:58 PM, Mike Lothian wrote:
> I turned on vsync and everything works great in tomb raider
> 
> :D
> 
> Thanks again to everyone who made this possible
> 
> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
> <mailto:mike@fireburn.co.uk>> wrote:
> 
>     Hi Mario
> 
>     That fixes the tearing, it's been replaced with a strange stutter,
>     like it's only showing half the number of frames being reported -
>     it's really noticeable in tomb raider

I wonder if the stutter might be due to the dGPU writing another frame
before the iGPU is done processing the previous one. Christian, does the
amdgpu scheduler wait for shared fences of shared BOs to signal before
submitting jobs using them to the GPU?
Christian König Oct. 31, 2016, 8 a.m. UTC | #3
Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
> On 29/10/16 10:58 PM, Mike Lothian wrote:
>> I turned on vsync and everything works great in tomb raider
>>
>> :D
>>
>> Thanks again to everyone who made this possible
>>
>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
>> <mailto:mike@fireburn.co.uk>> wrote:
>>
>>      Hi Mario
>>
>>      That fixes the tearing, it's been replaced with a strange stutter,
>>      like it's only showing half the number of frames being reported -
>>      it's really noticeable in tomb raider
> I wonder if the stutter might be due to the dGPU writing another frame
> before the iGPU is done processing the previous one. Christian, does the
> amdgpu scheduler wait for shared fences of shared BOs to signal before
> submitting jobs using them to the GPU?

Yeah, that should work. We wait for both the exclusive as well as all 
shared fences before CS or pageflip.

Only on CS we filter the shared fences so that we don't necessary wait 
for submissions from the same process.

Christian.
Michel Dänzer Oct. 31, 2016, 8:06 a.m. UTC | #4
On 31/10/16 05:00 PM, Christian König wrote:
> Am 31.10.2016 um 07:44 schrieb Michel Dänzer:
>> On 29/10/16 10:58 PM, Mike Lothian wrote:
>>> I turned on vsync and everything works great in tomb raider
>>>
>>> :D
>>>
>>> Thanks again to everyone who made this possible
>>>
>>> On Fri, 28 Oct 2016 at 19:37 Mike Lothian <mike@fireburn.co.uk
>>> <mailto:mike@fireburn.co.uk>> wrote:
>>>
>>>      Hi Mario
>>>
>>>      That fixes the tearing, it's been replaced with a strange stutter,
>>>      like it's only showing half the number of frames being reported -
>>>      it's really noticeable in tomb raider
>> I wonder if the stutter might be due to the dGPU writing another frame
>> before the iGPU is done processing the previous one. Christian, does the
>> amdgpu scheduler wait for shared fences of shared BOs to signal before
>> submitting jobs using them to the GPU?
> 
> Yeah, that should work. We wait for both the exclusive as well as all
> shared fences before CS or pageflip.
> 
> Only on CS we filter the shared fences so that we don't necessary wait
> for submissions from the same process.

Note that it can be the same process (Xorg) in the RandR 1.4 multihead
case. But it sounds like Mike's stutter can't be due to the issue I was
thinking of.
diff mbox

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 217df24..6757b99 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -391,6 +391,7 @@  struct amdgpu_bo {
 	u64				metadata_flags;
 	void				*metadata;
 	u32				metadata_size;
+	bool				prime_exported;
 	/* list of all virtual address to which this bo
 	 * is associated to
 	 */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 651115d..6e1d7b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -132,7 +132,10 @@  static int amdgpu_bo_list_set(struct amdgpu_device *adev,
 		entry->priority = min(info[i].bo_priority,
 				      AMDGPU_BO_LIST_MAX_PRIORITY);
 		entry->tv.bo = &entry->robj->tbo;
-		entry->tv.shared = true;
+		entry->tv.shared = !entry->robj->prime_exported;
+
+		if (entry->robj->prime_exported)
+		    DRM_DEBUG_PRIME("Exclusive fence for exported prime bo %p\n", entry->robj);
 
 		if (entry->robj->prefered_domains == AMDGPU_GEM_DOMAIN_GDS)
 			gds_obj = entry->robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index cd62f6f..54099a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -504,6 +504,12 @@  static void amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
 
 	tv.bo = &bo_va->bo->tbo;
 	tv.shared = true;
+
+	if (bo_va->bo->prime_exported) {
+	    DRM_DEBUG_PRIME("Update for exported prime bo %p\n", bo_va->bo);
+	    /* tv.shared = false; */
+	}
+
 	list_add(&tv.head, &list);
 
 	amdgpu_vm_get_pd_bo(bo_va->vm, &list, &vm_pd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 7700dc2..bfbfeb9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -121,5 +121,8 @@  struct dma_buf *amdgpu_gem_prime_export(struct drm_device *dev,
 	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm))
 		return ERR_PTR(-EPERM);
 
+	bo->prime_exported = true;
+	DRM_DEBUG_PRIME("Exporting prime bo %p\n", bo);
+
 	return drm_gem_prime_export(dev, gobj, flags);
 }