diff mbox series

[v5,02/40] drm/gpuvm: Allow VAs to hold soft reference to BOs

Message ID 20250519175348.11924-3-robdclark@gmail.com
State New
Headers show
Series drm/msm: sparse / "VM_BIND" support | expand

Commit Message

Rob Clark May 19, 2025, 5:51 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

Eases migration for drivers where VAs don't hold hard references to
their associated BO, avoiding reference loops.

In particular, msm uses soft references to optimistically keep around
mappings until the BO is destroyed.  Which obviously won't work if the
VA (the mapping) is holding a reference to the BO.

By making this a per-VM flag, we can use normal hard-references for
mappings in a "VM_BIND" managed VM, but soft references in other cases,
such as kernel-internal VMs (for display scanout, etc).

Cc: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/drm_gpuvm.c | 37 ++++++++++++++++++++++++++++++++-----
 include/drm/drm_gpuvm.h     | 19 +++++++++++++++++--
 2 files changed, 49 insertions(+), 7 deletions(-)

Comments

Danilo Krummrich May 20, 2025, 7:40 a.m. UTC | #1
On Mon, May 19, 2025 at 10:51:25AM -0700, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
> 
> Eases migration for drivers where VAs don't hold hard references to
> their associated BO, avoiding reference loops.
> 
> In particular, msm uses soft references to optimistically keep around
> mappings until the BO is destroyed.  Which obviously won't work if the
> VA (the mapping) is holding a reference to the BO.
> 
> By making this a per-VM flag, we can use normal hard-references for
> mappings in a "VM_BIND" managed VM, but soft references in other cases,
> such as kernel-internal VMs (for display scanout, etc).
> 
> Cc: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/drm_gpuvm.c | 37 ++++++++++++++++++++++++++++++++-----
>  include/drm/drm_gpuvm.h     | 19 +++++++++++++++++--
>  2 files changed, 49 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> index 1e89a98caad4..892b62130ff8 100644
> --- a/drivers/gpu/drm/drm_gpuvm.c
> +++ b/drivers/gpu/drm/drm_gpuvm.c
> @@ -1125,6 +1125,8 @@ __drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
>  	LIST_HEAD(extobjs);
>  	int ret = 0;
>  
> +	WARN_ON(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);

No, here and below, please don't scatter WARN_ON() calls in various code paths
for this hack.

This won't ever be a valid and supported mode, please keep the diff as small as
possible.

<snip>

> diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> index 2a9629377633..652e0fb66413 100644
> --- a/include/drm/drm_gpuvm.h
> +++ b/include/drm/drm_gpuvm.h
> @@ -205,10 +205,25 @@ enum drm_gpuvm_flags {
>  	 */
>  	DRM_GPUVM_RESV_PROTECTED = BIT(0),
>  
> +	/**
> +	 * @DRM_GPUVM_VA_WEAK_REF:
> +	 *
> +	 * Flag indicating that the &drm_gpuva (or more correctly, the
> +	 * &drm_gpuvm_bo) only holds a weak reference to the &drm_gem_object.
> +	 * This mode is intended to ease migration to drm_gpuvm for drivers
> +	 * where the GEM object holds a reference to the VA, rather than the
> +	 * other way around.
> +	 *
> +	 * In this mode, drm_gpuvm does not track evicted or external objects.
> +	 * It is intended for legacy mode, where the needed objects are attached
> +	 * to the command submission ioctl, therefore this tracking is unneeded.
> +	 */
> +	DRM_GPUVM_VA_WEAK_REF = BIT(1),

As mentioned in v4, I do *not* agree with making this a valid and supported
mode. If at all, it should be an exception for MSM, i.e.
DRM_GPUVM_MSM_LEGACY_QUIRK with an explicit approval from Dave / Sima [1].

It invalidates the whole design and makes a lot of functions fundamentally
invalid to call, which is well demonstrated by all the WARN_ON() calls this
patch attempts to add.

> +
>  	/**
>  	 * @DRM_GPUVM_USERBITS: user defined bits
>  	 */
> -	DRM_GPUVM_USERBITS = BIT(1),
> +	DRM_GPUVM_USERBITS = BIT(2),
>  };

[1] https://lore.kernel.org/dri-devel/aCb-72KH-NrzvGXy@pollux/
Rob Clark May 20, 2025, 3:54 p.m. UTC | #2
On Tue, May 20, 2025 at 12:40 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Mon, May 19, 2025 at 10:51:25AM -0700, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > Eases migration for drivers where VAs don't hold hard references to
> > their associated BO, avoiding reference loops.
> >
> > In particular, msm uses soft references to optimistically keep around
> > mappings until the BO is destroyed.  Which obviously won't work if the
> > VA (the mapping) is holding a reference to the BO.
> >
> > By making this a per-VM flag, we can use normal hard-references for
> > mappings in a "VM_BIND" managed VM, but soft references in other cases,
> > such as kernel-internal VMs (for display scanout, etc).
> >
> > Cc: Danilo Krummrich <dakr@kernel.org>
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/gpu/drm/drm_gpuvm.c | 37 ++++++++++++++++++++++++++++++++-----
> >  include/drm/drm_gpuvm.h     | 19 +++++++++++++++++--
> >  2 files changed, 49 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
> > index 1e89a98caad4..892b62130ff8 100644
> > --- a/drivers/gpu/drm/drm_gpuvm.c
> > +++ b/drivers/gpu/drm/drm_gpuvm.c
> > @@ -1125,6 +1125,8 @@ __drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
> >       LIST_HEAD(extobjs);
> >       int ret = 0;
> >
> > +     WARN_ON(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);
>
> No, here and below, please don't scatter WARN_ON() calls in various code paths
> for this hack.

I do prefer WARN_ON()s to make it clear what is or is not proper
usage, but if you really don't want them I can remove them.

> This won't ever be a valid and supported mode, please keep the diff as small as
> possible.
>
> <snip>
>
> > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > index 2a9629377633..652e0fb66413 100644
> > --- a/include/drm/drm_gpuvm.h
> > +++ b/include/drm/drm_gpuvm.h
> > @@ -205,10 +205,25 @@ enum drm_gpuvm_flags {
> >        */
> >       DRM_GPUVM_RESV_PROTECTED = BIT(0),
> >
> > +     /**
> > +      * @DRM_GPUVM_VA_WEAK_REF:
> > +      *
> > +      * Flag indicating that the &drm_gpuva (or more correctly, the
> > +      * &drm_gpuvm_bo) only holds a weak reference to the &drm_gem_object.
> > +      * This mode is intended to ease migration to drm_gpuvm for drivers
> > +      * where the GEM object holds a reference to the VA, rather than the
> > +      * other way around.
> > +      *
> > +      * In this mode, drm_gpuvm does not track evicted or external objects.
> > +      * It is intended for legacy mode, where the needed objects are attached
> > +      * to the command submission ioctl, therefore this tracking is unneeded.
> > +      */
> > +     DRM_GPUVM_VA_WEAK_REF = BIT(1),
>
> As mentioned in v4, I do *not* agree with making this a valid and supported
> mode. If at all, it should be an exception for MSM, i.e.
> DRM_GPUVM_MSM_LEGACY_QUIRK with an explicit approval from Dave / Sima [1].

I can rename it if you really insist, but "legacy" implies it is
something that will go away.

Unfortunately there is too much map/unmap cost for things like
pageflip, so the VM related to scanout will continue to use this mode
forever, even if all of the userspace related VMs are not using this
flag.  If I could opportunistically keep around a mapping while using
gpuvm as it is "intended", I would.  But I don't see how that can work
when the VMA/VM_BO holds a reference to the GEM object.

> It invalidates the whole design and makes a lot of functions fundamentally
> invalid to call, which is well demonstrated by all the WARN_ON() calls this
> patch attempts to add.

I think of it more as adding a different mode of operation.  One
where, perhaps some functions of gpuvm are not available, but that is
fine because they are also unneeded in that mode of operation.  Hence
the WARN_ON()s to make that clear.

BR,
-R

> > +
> >       /**
> >        * @DRM_GPUVM_USERBITS: user defined bits
> >        */
> > -     DRM_GPUVM_USERBITS = BIT(1),
> > +     DRM_GPUVM_USERBITS = BIT(2),
> >  };
>
> [1] https://lore.kernel.org/dri-devel/aCb-72KH-NrzvGXy@pollux/
Danilo Krummrich May 20, 2025, 4:22 p.m. UTC | #3
On Tue, May 20, 2025 at 08:54:53AM -0700, Rob Clark wrote:
> On Tue, May 20, 2025 at 12:40 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > On Mon, May 19, 2025 at 10:51:25AM -0700, Rob Clark wrote:
> > It invalidates the whole design and makes a lot of functions fundamentally
> > invalid to call, which is well demonstrated by all the WARN_ON() calls this
> > patch attempts to add.
> 
> I think of it more as adding a different mode of operation.  One
> where, perhaps some functions of gpuvm are not available, but that is
> fine because they are also unneeded in that mode of operation.  Hence
> the WARN_ON()s to make that clear.

This isn't a different mode of operation. You're breaking the design and
internal guarantees and validity the code relies on. And as a consequence you
have to disable the functions that are obviously broken by scattering it with
WARN_ON() calls.

And for the remaining code that is not disabled we'd have entirely new
requirements on the guarantees the caller must provide in terms of reference
counts.

This is as if I'd try to promote a car with a broken engine control unit and
would tell you "It's just in a different mode of operation, where driving isn't
supported, but you can still heat the cabin and power the radio with the
engine.", hoping that the broken engine control unit has no other side effects.

Sorry, as much as I'd like to help and unblock you, I don't buy it.

Patch

diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 1e89a98caad4..892b62130ff8 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -1125,6 +1125,8 @@  __drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
 	LIST_HEAD(extobjs);
 	int ret = 0;
 
+	WARN_ON(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);
+
 	for_each_vm_bo_in_list(gpuvm, extobj, &extobjs, vm_bo) {
 		ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
 		if (ret)
@@ -1145,6 +1147,8 @@  drm_gpuvm_prepare_objects_locked(struct drm_gpuvm *gpuvm,
 	struct drm_gpuvm_bo *vm_bo;
 	int ret = 0;
 
+	WARN_ON(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);
+
 	drm_gpuvm_resv_assert_held(gpuvm);
 	list_for_each_entry(vm_bo, &gpuvm->extobj.list, list.entry.extobj) {
 		ret = exec_prepare_obj(exec, vm_bo->obj, num_fences);
@@ -1386,6 +1390,7 @@  drm_gpuvm_validate_locked(struct drm_gpuvm *gpuvm, struct drm_exec *exec)
 	struct drm_gpuvm_bo *vm_bo, *next;
 	int ret = 0;
 
+	WARN_ON(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);
 	drm_gpuvm_resv_assert_held(gpuvm);
 
 	list_for_each_entry_safe(vm_bo, next, &gpuvm->evict.list,
@@ -1482,7 +1487,9 @@  drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm,
 
 	vm_bo->vm = drm_gpuvm_get(gpuvm);
 	vm_bo->obj = obj;
-	drm_gem_object_get(obj);
+
+	if (!(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF))
+		drm_gem_object_get(obj);
 
 	kref_init(&vm_bo->kref);
 	INIT_LIST_HEAD(&vm_bo->list.gpuva);
@@ -1504,16 +1511,22 @@  drm_gpuvm_bo_destroy(struct kref *kref)
 	const struct drm_gpuvm_ops *ops = gpuvm->ops;
 	struct drm_gem_object *obj = vm_bo->obj;
 	bool lock = !drm_gpuvm_resv_protected(gpuvm);
+	bool unref = !(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF);
 
 	if (!lock)
 		drm_gpuvm_resv_assert_held(gpuvm);
 
+	if (kref_read(&obj->refcount) > 0) {
+		drm_gem_gpuva_assert_lock_held(obj);
+	} else {
+		WARN_ON(!(gpuvm->flags & DRM_GPUVM_VA_WEAK_REF));
+		WARN_ON(!list_empty(&vm_bo->list.entry.evict));
+		WARN_ON(!list_empty(&vm_bo->list.entry.extobj));
+	}
+
 	drm_gpuvm_bo_list_del(vm_bo, extobj, lock);
 	drm_gpuvm_bo_list_del(vm_bo, evict, lock);
 
-	if (kref_read(&obj->refcount) > 0)
-		drm_gem_gpuva_assert_lock_held(obj);
-
 	list_del(&vm_bo->list.entry.gem);
 
 	if (ops && ops->vm_bo_free)
@@ -1522,7 +1535,8 @@  drm_gpuvm_bo_destroy(struct kref *kref)
 		kfree(vm_bo);
 
 	drm_gpuvm_put(gpuvm);
-	drm_gem_object_put(obj);
+	if (unref)
+		drm_gem_object_put(obj);
 }
 
 /**
@@ -1678,6 +1692,12 @@  drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
 	if (!lock)
 		drm_gpuvm_resv_assert_held(gpuvm);
 
+	/* If the vm_bo doesn't hold a hard reference to the obj, then the
+	 * driver is responsible for object tracking.
+	 */
+	if (gpuvm->flags & DRM_GPUVM_VA_WEAK_REF)
+		return;
+
 	if (drm_gpuvm_is_extobj(gpuvm, vm_bo->obj))
 		drm_gpuvm_bo_list_add(vm_bo, extobj, lock);
 }
@@ -1699,6 +1719,13 @@  drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
 	bool lock = !drm_gpuvm_resv_protected(gpuvm);
 
 	dma_resv_assert_held(obj->resv);
+
+	/* If the vm_bo doesn't hold a hard reference to the obj, then the
+	 * driver must track evictions on its own.
+	 */
+	if (gpuvm->flags & DRM_GPUVM_VA_WEAK_REF)
+		return;
+
 	vm_bo->evicted = evict;
 
 	/* Can't add external objects to the evicted list directly if not using
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index 2a9629377633..652e0fb66413 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -205,10 +205,25 @@  enum drm_gpuvm_flags {
 	 */
 	DRM_GPUVM_RESV_PROTECTED = BIT(0),
 
+	/**
+	 * @DRM_GPUVM_VA_WEAK_REF:
+	 *
+	 * Flag indicating that the &drm_gpuva (or more correctly, the
+	 * &drm_gpuvm_bo) only holds a weak reference to the &drm_gem_object.
+	 * This mode is intended to ease migration to drm_gpuvm for drivers
+	 * where the GEM object holds a reference to the VA, rather than the
+	 * other way around.
+	 *
+	 * In this mode, drm_gpuvm does not track evicted or external objects.
+	 * It is intended for legacy mode, where the needed objects are attached
+	 * to the command submission ioctl, therefore this tracking is unneeded.
+	 */
+	DRM_GPUVM_VA_WEAK_REF = BIT(1),
+
 	/**
 	 * @DRM_GPUVM_USERBITS: user defined bits
 	 */
-	DRM_GPUVM_USERBITS = BIT(1),
+	DRM_GPUVM_USERBITS = BIT(2),
 };
 
 /**
@@ -651,7 +666,7 @@  struct drm_gpuvm_bo {
 
 	/**
 	 * @obj: The &drm_gem_object being mapped in @vm. This is a reference
-	 * counted pointer.
+	 * counted pointer, unless the &DRM_GPUVM_VA_WEAK_REF flag is set.
 	 */
 	struct drm_gem_object *obj;