[v1,0/6] Move all drivers to a common dma-buf locking convention

Message ID 20220715005244.42198-1-dmitry.osipenko@collabora.com

Message

Dmitry Osipenko July 15, 2022, 12:52 a.m. UTC
Hello,

This series moves all drivers to a dynamic dma-buf locking specification.

Comments

Christian König July 15, 2022, 6:50 a.m. UTC | #1
Am 15.07.22 um 02:52 schrieb Dmitry Osipenko:
> The Intel i915 GPU driver uses a wait-wound mutex to lock multiple GEMs on
> attachment to the i915 dma-buf. In order to let all drivers utilize a
> shared wait-wound context during attachment in a general way, make the
> dma-buf core acquire the ww context internally for the attachment operation
> and update the i915 driver to use the importer's ww context instead of the
> internal one.
>
>  From now on all dma-buf exporters shall use the importer's ww context for
> the attachment operation.
>
> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> ---
>   drivers/dma-buf/dma-buf.c                     |  8 +++++-
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  2 +-
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ++---
>   drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
>   drivers/gpu/drm/i915/i915_gem_ww.c            | 26 +++++++++++++++----
>   drivers/gpu/drm/i915/i915_gem_ww.h            | 15 +++++++++--
>   7 files changed, 47 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> index 0ee588276534..37545ecb845a 100644
> --- a/drivers/dma-buf/dma-buf.c
> +++ b/drivers/dma-buf/dma-buf.c
> @@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct dma_buf_attachment *attach,
>    * Optionally this calls &dma_buf_ops.attach to allow device-specific attach
>    * functionality.
>    *
> + * Exporters shall use ww_ctx acquired by this function.
> + *
>    * Returns:
>    *
>    * A pointer to newly created &dma_buf_attachment on success, or a negative
> @@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, struct device *dev,
>   				void *importer_priv)
>   {
>   	struct dma_buf_attachment *attach;
> +	struct ww_acquire_ctx ww_ctx;
>   	int ret;
>   
>   	if (WARN_ON(!dmabuf || !dev))
> @@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, struct device *dev,
>   	attach->importer_ops = importer_ops;
>   	attach->importer_priv = importer_priv;
>   
> -	dma_resv_lock(dmabuf->resv, NULL);
> +	ww_acquire_init(&ww_ctx, &reservation_ww_class);
> +	dma_resv_lock(dmabuf->resv, &ww_ctx);

That won't work like this. The core property of a WW context is that you
need to unwind all the locks and re-acquire them with the contended one
first.

When you statically lock the imported one here, you can't do that any more.

Regards,
Christian.
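
For context, the acquire/backoff pattern Christian is referring to looks
roughly like the sketch below (an illustration of the reservation_ww_class
convention, not code from the patch; the two dma_resv pointers are
hypothetical):

	static int lock_two_resvs(struct dma_resv *a, struct dma_resv *b,
				  struct ww_acquire_ctx *ctx)
	{
		struct dma_resv *contended = NULL;
		int ret;

		ww_acquire_init(ctx, &reservation_ww_class);
	retry:
		/* after a backoff, take the contended lock first (slow path) */
		if (contended)
			dma_resv_lock_slow(contended, ctx);

		if (a != contended) {
			ret = dma_resv_lock(a, ctx);
			if (ret == -EDEADLK) {
				if (contended)
					dma_resv_unlock(contended);
				contended = a;	/* everything unwound */
				goto retry;
			}
		}
		if (b != contended) {
			ret = dma_resv_lock(b, ctx);
			if (ret == -EDEADLK) {
				dma_resv_unlock(a);
				contended = b;
				goto retry;
			}
		}
		ww_acquire_done(ctx);
		return 0;
	}

Because the patch locks dmabuf->resv once, outside any such retry loop, an
exporter that later hits -EDEADLK on one of its own objects has no way to
drop and re-take that lock as part of the backoff.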

>   
>   	if (dmabuf->ops->attach) {
>   		ret = dmabuf->ops->attach(dmabuf, attach);
> @@ -876,11 +880,13 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf *dmabuf, struct device *dev,
>   	}
>   
>   	dma_resv_unlock(dmabuf->resv);
> +	ww_acquire_fini(&ww_ctx);
>   
>   	return attach;
>   
>   err_attach:
>   	dma_resv_unlock(attach->dmabuf->resv);
> +	ww_acquire_fini(&ww_ctx);
>   	kfree(attach);
>   	return ERR_PTR(ret);
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index c199bf71c373..9173f0232b16 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -173,7 +173,7 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
>   	if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
>   		return -EOPNOTSUPP;
>   
> -	for_i915_gem_ww(&ww, err, true) {
> +	for_i915_dmabuf_ww(&ww, dmabuf, err, true) {
>   		err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
>   		if (err)
>   			continue;
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 30fe847c6664..ad7d602fc43a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3409,7 +3409,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   		goto err_vma;
>   	}
>   
> -	ww_acquire_done(&eb.ww.ctx);
> +	ww_acquire_done(eb.ww.ctx);
>   	eb_capture_stage(&eb);
>   
>   	out_fence = eb_requests_create(&eb, in_fence, out_fence_fd);
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index e11d82a9f7c3..5ae38f94a5c7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -178,9 +178,9 @@ static inline int __i915_gem_object_lock(struct drm_i915_gem_object *obj,
>   	int ret;
>   
>   	if (intr)
> -		ret = dma_resv_lock_interruptible(obj->base.resv, ww ? &ww->ctx : NULL);
> +		ret = dma_resv_lock_interruptible(obj->base.resv, ww ? ww->ctx : NULL);
>   	else
> -		ret = dma_resv_lock(obj->base.resv, ww ? &ww->ctx : NULL);
> +		ret = dma_resv_lock(obj->base.resv, ww ? ww->ctx : NULL);
>   
>   	if (!ret && ww) {
>   		i915_gem_object_get(obj);
> @@ -216,7 +216,7 @@ static inline bool i915_gem_object_trylock(struct drm_i915_gem_object *obj,
>   	if (!ww)
>   		return dma_resv_trylock(obj->base.resv);
>   	else
> -		return ww_mutex_trylock(&obj->base.resv->lock, &ww->ctx);
> +		return ww_mutex_trylock(&obj->base.resv->lock, ww->ctx);
>   }
>   
>   static inline void i915_gem_object_unlock(struct drm_i915_gem_object *obj)
> diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
> index f025ee4fa526..047f72e32d47 100644
> --- a/drivers/gpu/drm/i915/i915_gem_evict.c
> +++ b/drivers/gpu/drm/i915/i915_gem_evict.c
> @@ -458,7 +458,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww)
>   			 * need the object ref.
>   			 */
>   			if (dying_vma(vma) ||
> -			    (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == &ww->ctx))) {
> +			    (ww && (dma_resv_locking_ctx(vma->obj->base.resv) == ww->ctx))) {
>   				__i915_vma_pin(vma);
>   				list_add(&vma->evict_link, &locked_eviction_list);
>   				continue;
> diff --git a/drivers/gpu/drm/i915/i915_gem_ww.c b/drivers/gpu/drm/i915/i915_gem_ww.c
> index 3f6ff139478e..c47898993c7d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_ww.c
> +++ b/drivers/gpu/drm/i915/i915_gem_ww.c
> @@ -6,12 +6,20 @@
>   #include "i915_gem_ww.h"
>   #include "gem/i915_gem_object.h"
>   
> -void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr)
> +void i915_gem_ww_ctx_prep(struct i915_gem_ww_ctx *ww,
> +			  struct ww_acquire_ctx *ww_ctx,
> +			  bool intr)
>   {
> -	ww_acquire_init(&ww->ctx, &reservation_ww_class);
>   	INIT_LIST_HEAD(&ww->obj_list);
>   	ww->intr = intr;
>   	ww->contended = NULL;
> +	ww->ctx = ww_ctx;
> +}
> +
> +void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr)
> +{
> +	ww_acquire_init(&ww->ww_ctx, &reservation_ww_class);
> +	i915_gem_ww_ctx_prep(ww, &ww->ww_ctx, intr);
>   }
>   
>   static void i915_gem_ww_ctx_unlock_all(struct i915_gem_ww_ctx *ww)
> @@ -36,7 +44,15 @@ void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ww)
>   {
>   	i915_gem_ww_ctx_unlock_all(ww);
>   	WARN_ON(ww->contended);
> -	ww_acquire_fini(&ww->ctx);
> +
> +	if (ww->ctx == &ww->ww_ctx)
> +		ww_acquire_fini(ww->ctx);
> +}
> +
> +void i915_gem_ww_ctx_fini2(struct i915_gem_ww_ctx *ww)
> +{
> +	i915_gem_ww_ctx_unlock_all(ww);
> +	WARN_ON(ww->contended);
>   }
>   
>   int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww)
> @@ -48,9 +64,9 @@ int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ww)
>   
>   	i915_gem_ww_ctx_unlock_all(ww);
>   	if (ww->intr)
> -		ret = dma_resv_lock_slow_interruptible(ww->contended->base.resv, &ww->ctx);
> +		ret = dma_resv_lock_slow_interruptible(ww->contended->base.resv, ww->ctx);
>   	else
> -		dma_resv_lock_slow(ww->contended->base.resv, &ww->ctx);
> +		dma_resv_lock_slow(ww->contended->base.resv, ww->ctx);
>   
>   	if (!ret)
>   		list_add_tail(&ww->contended->obj_link, &ww->obj_list);
> diff --git a/drivers/gpu/drm/i915/i915_gem_ww.h b/drivers/gpu/drm/i915/i915_gem_ww.h
> index 86f0fe343de6..e9b0fd4debbf 100644
> --- a/drivers/gpu/drm/i915/i915_gem_ww.h
> +++ b/drivers/gpu/drm/i915/i915_gem_ww.h
> @@ -8,13 +8,17 @@
>   #include <drm/drm_drv.h>
>   
>   struct i915_gem_ww_ctx {
> -	struct ww_acquire_ctx ctx;
> +	struct ww_acquire_ctx *ctx;
> +	struct ww_acquire_ctx ww_ctx;
>   	struct list_head obj_list;
>   	struct drm_i915_gem_object *contended;
>   	bool intr;
>   };
>   
> -void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ctx, bool intr);
> +void i915_gem_ww_ctx_prep(struct i915_gem_ww_ctx *ww,
> +			  struct ww_acquire_ctx *ww_ctx,
> +			  bool intr);
> +void i915_gem_ww_ctx_init(struct i915_gem_ww_ctx *ww, bool intr);
>   void i915_gem_ww_ctx_fini(struct i915_gem_ww_ctx *ctx);
>   int __must_check i915_gem_ww_ctx_backoff(struct i915_gem_ww_ctx *ctx);
>   void i915_gem_ww_unlock_single(struct drm_i915_gem_object *obj);
> @@ -38,4 +42,11 @@ static inline int __i915_gem_ww_fini(struct i915_gem_ww_ctx *ww, int err)
>   	for (i915_gem_ww_ctx_init(_ww, _intr), (_err) = -EDEADLK; \
>   	     (_err) == -EDEADLK;				  \
>   	     (_err) = __i915_gem_ww_fini(_ww, _err))
> +
> +#define for_i915_dmabuf_ww(_ww, _dmabuf, _err, _intr)		  \
> +	for (i915_gem_ww_ctx_prep(_ww, dma_resv_locking_ctx((_dmabuf)->resv), _intr), \
> +	     (_err) = -EDEADLK; 				  \
> +	     (_err) == -EDEADLK;				  \
> +	     (_err) = __i915_gem_ww_fini(_ww, _err))
> +
>   #endif
Christian König July 15, 2022, 7:31 a.m. UTC | #2
Am 15.07.22 um 02:52 schrieb Dmitry Osipenko:
> The new common dma-buf locking convention will require buffer importers
> to hold the reservation lock around mapping operations. Make the DRM GEM
> core take the lock around the vmapping operations, and update the QXL and
> i915 drivers to use the locked functions for the cases where the DRM core
> now holds the lock. This patch prepares the DRM core and drivers for the
> transition to the common dma-buf locking convention, where vmapping of
> exported GEMs will be done under the held reservation lock.

Oh ^^ That looks like a bug fix to me!

At least drm_gem_ttm_vmap() and drm_gem_ttm_vunmap() already expect to
be called with the reservation lock held.

Otherwise you could mess up internal structures in the TTM buffer object 
while vmapping it.

I will take a deeper look at this.

Regards,
Christian.
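
For reference, drm_gem_ttm_vmap() is essentially a thin forward into TTM;
from memory (a sketch, not quoted from the tree) it has roughly this shape:

	int drm_gem_ttm_vmap(struct drm_gem_object *gem,
			     struct iosys_map *map)
	{
		struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(gem);

		/* ttm_bo_vmap() walks the BO's current placement, which is
		 * only stable while bo->base.resv is held */
		return ttm_bo_vmap(bo, map);
	}

so reaching it through drm_gem_vmap() without the reservation lock already
violated the TTM side's expectations, which is why the hunk adding
dma_resv_assert_held() to drm_gem_vmap() below reads like a bug fix.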

>
> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
> ---
>   drivers/gpu/drm/drm_client.c                 |  4 +--
>   drivers/gpu/drm/drm_gem.c                    | 28 ++++++++++++++++++++
>   drivers/gpu/drm/drm_gem_framebuffer_helper.c |  6 ++---
>   drivers/gpu/drm/drm_prime.c                  |  4 +--
>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  2 +-
>   drivers/gpu/drm/qxl/qxl_object.c             | 17 ++++++------
>   drivers/gpu/drm/qxl/qxl_prime.c              |  4 +--
>   include/drm/drm_gem.h                        |  3 +++
>   8 files changed, 50 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_client.c b/drivers/gpu/drm/drm_client.c
> index 2b230b4d6942..fbcb1e995384 100644
> --- a/drivers/gpu/drm/drm_client.c
> +++ b/drivers/gpu/drm/drm_client.c
> @@ -323,7 +323,7 @@ drm_client_buffer_vmap(struct drm_client_buffer *buffer,
>   	 * fd_install step out of the driver backend hooks, to make that
>   	 * final step optional for internal users.
>   	 */
> -	ret = drm_gem_vmap(buffer->gem, map);
> +	ret = drm_gem_vmap_unlocked(buffer->gem, map);
>   	if (ret)
>   		return ret;
>   
> @@ -345,7 +345,7 @@ void drm_client_buffer_vunmap(struct drm_client_buffer *buffer)
>   {
>   	struct iosys_map *map = &buffer->map;
>   
> -	drm_gem_vunmap(buffer->gem, map);
> +	drm_gem_vunmap_unlocked(buffer->gem, map);
>   }
>   EXPORT_SYMBOL(drm_client_buffer_vunmap);
>   
> diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
> index eb0c2d041f13..9769c33cad99 100644
> --- a/drivers/gpu/drm/drm_gem.c
> +++ b/drivers/gpu/drm/drm_gem.c
> @@ -1155,6 +1155,8 @@ void drm_gem_print_info(struct drm_printer *p, unsigned int indent,
>   
>   int drm_gem_pin(struct drm_gem_object *obj)
>   {
> +	dma_resv_assert_held(obj->resv);
> +
>   	if (obj->funcs->pin)
>   		return obj->funcs->pin(obj);
>   	else
> @@ -1163,6 +1165,8 @@ int drm_gem_pin(struct drm_gem_object *obj)
>   
>   void drm_gem_unpin(struct drm_gem_object *obj)
>   {
> +	dma_resv_assert_held(obj->resv);
> +
>   	if (obj->funcs->unpin)
>   		obj->funcs->unpin(obj);
>   }
> @@ -1171,6 +1175,8 @@ int drm_gem_vmap(struct drm_gem_object *obj, struct iosys_map *map)
>   {
>   	int ret;
>   
> +	dma_resv_assert_held(obj->resv);
> +
>   	if (!obj->funcs->vmap)
>   		return -EOPNOTSUPP;
>   
> @@ -1186,6 +1192,8 @@ EXPORT_SYMBOL(drm_gem_vmap);
>   
>   void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
>   {
> +	dma_resv_assert_held(obj->resv);
> +
>   	if (iosys_map_is_null(map))
>   		return;
>   
> @@ -1197,6 +1205,26 @@ void drm_gem_vunmap(struct drm_gem_object *obj, struct iosys_map *map)
>   }
>   EXPORT_SYMBOL(drm_gem_vunmap);
>   
> +int drm_gem_vmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	int ret;
> +
> +	dma_resv_lock(obj->resv, NULL);
> +	ret = drm_gem_vmap(obj, map);
> +	dma_resv_unlock(obj->resv);
> +
> +	return ret;
> +}
> +EXPORT_SYMBOL(drm_gem_vmap_unlocked);
> +
> +void drm_gem_vunmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map)
> +{
> +	dma_resv_lock(obj->resv, NULL);
> +	drm_gem_vunmap(obj, map);
> +	dma_resv_unlock(obj->resv);
> +}
> +EXPORT_SYMBOL(drm_gem_vunmap_unlocked);
> +
>   /**
>    * drm_gem_lock_reservations - Sets up the ww context and acquires
>    * the lock on an array of GEM objects.
> diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
> index 880a4975507f..e35e224e6303 100644
> --- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c
> +++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c
> @@ -354,7 +354,7 @@ int drm_gem_fb_vmap(struct drm_framebuffer *fb, struct iosys_map *map,
>   			ret = -EINVAL;
>   			goto err_drm_gem_vunmap;
>   		}
> -		ret = drm_gem_vmap(obj, &map[i]);
> +		ret = drm_gem_vmap_unlocked(obj, &map[i]);
>   		if (ret)
>   			goto err_drm_gem_vunmap;
>   	}
> @@ -376,7 +376,7 @@ int drm_gem_fb_vmap(struct drm_framebuffer *fb, struct iosys_map *map,
>   		obj = drm_gem_fb_get_obj(fb, i);
>   		if (!obj)
>   			continue;
> -		drm_gem_vunmap(obj, &map[i]);
> +		drm_gem_vunmap_unlocked(obj, &map[i]);
>   	}
>   	return ret;
>   }
> @@ -403,7 +403,7 @@ void drm_gem_fb_vunmap(struct drm_framebuffer *fb, struct iosys_map *map)
>   			continue;
>   		if (iosys_map_is_null(&map[i]))
>   			continue;
> -		drm_gem_vunmap(obj, &map[i]);
> +		drm_gem_vunmap_unlocked(obj, &map[i]);
>   	}
>   }
>   EXPORT_SYMBOL(drm_gem_fb_vunmap);
> diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> index b75ef1756873..1bd234fd21a5 100644
> --- a/drivers/gpu/drm/drm_prime.c
> +++ b/drivers/gpu/drm/drm_prime.c
> @@ -678,7 +678,7 @@ int drm_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct iosys_map *map)
>   {
>   	struct drm_gem_object *obj = dma_buf->priv;
>   
> -	return drm_gem_vmap(obj, map);
> +	return drm_gem_vmap_unlocked(obj, map);
>   }
>   EXPORT_SYMBOL(drm_gem_dmabuf_vmap);
>   
> @@ -694,7 +694,7 @@ void drm_gem_dmabuf_vunmap(struct dma_buf *dma_buf, struct iosys_map *map)
>   {
>   	struct drm_gem_object *obj = dma_buf->priv;
>   
> -	drm_gem_vunmap(obj, map);
> +	drm_gem_vunmap_unlocked(obj, map);
>   }
>   EXPORT_SYMBOL(drm_gem_dmabuf_vunmap);
>   
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> index 5ecea7df98b1..cc54a5b1d6ae 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
> @@ -72,7 +72,7 @@ static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf,
>   	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>   	void *vaddr;
>   
> -	vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
> +	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
>   	if (IS_ERR(vaddr))
>   		return PTR_ERR(vaddr);
>   
> diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c
> index 695d9308d1f0..06a58dad5f5c 100644
> --- a/drivers/gpu/drm/qxl/qxl_object.c
> +++ b/drivers/gpu/drm/qxl/qxl_object.c
> @@ -168,9 +168,16 @@ int qxl_bo_vmap_locked(struct qxl_bo *bo, struct iosys_map *map)
>   		bo->map_count++;
>   		goto out;
>   	}
> -	r = ttm_bo_vmap(&bo->tbo, &bo->map);
> +
> +	r = __qxl_bo_pin(bo);
>   	if (r)
>   		return r;
> +
> +	r = ttm_bo_vmap(&bo->tbo, &bo->map);
> +	if (r) {
> +		__qxl_bo_unpin(bo);
> +		return r;
> +	}
>   	bo->map_count = 1;
>   
>   	/* TODO: Remove kptr in favor of map everywhere. */
> @@ -192,12 +199,6 @@ int qxl_bo_vmap(struct qxl_bo *bo, struct iosys_map *map)
>   	if (r)
>   		return r;
>   
> -	r = __qxl_bo_pin(bo);
> -	if (r) {
> -		qxl_bo_unreserve(bo);
> -		return r;
> -	}
> -
>   	r = qxl_bo_vmap_locked(bo, map);
>   	qxl_bo_unreserve(bo);
>   	return r;
> @@ -247,6 +248,7 @@ void qxl_bo_vunmap_locked(struct qxl_bo *bo)
>   		return;
>   	bo->kptr = NULL;
>   	ttm_bo_vunmap(&bo->tbo, &bo->map);
> +	__qxl_bo_unpin(bo);
>   }
>   
>   int qxl_bo_vunmap(struct qxl_bo *bo)
> @@ -258,7 +260,6 @@ int qxl_bo_vunmap(struct qxl_bo *bo)
>   		return r;
>   
>   	qxl_bo_vunmap_locked(bo);
> -	__qxl_bo_unpin(bo);
>   	qxl_bo_unreserve(bo);
>   	return 0;
>   }
> diff --git a/drivers/gpu/drm/qxl/qxl_prime.c b/drivers/gpu/drm/qxl/qxl_prime.c
> index 142d01415acb..9169c26357d3 100644
> --- a/drivers/gpu/drm/qxl/qxl_prime.c
> +++ b/drivers/gpu/drm/qxl/qxl_prime.c
> @@ -59,7 +59,7 @@ int qxl_gem_prime_vmap(struct drm_gem_object *obj, struct iosys_map *map)
>   	struct qxl_bo *bo = gem_to_qxl_bo(obj);
>   	int ret;
>   
> -	ret = qxl_bo_vmap(bo, map);
> +	ret = qxl_bo_vmap_locked(bo, map);
>   	if (ret < 0)
>   		return ret;
>   
> @@ -71,5 +71,5 @@ void qxl_gem_prime_vunmap(struct drm_gem_object *obj,
>   {
>   	struct qxl_bo *bo = gem_to_qxl_bo(obj);
>   
> -	qxl_bo_vunmap(bo);
> +	qxl_bo_vunmap_locked(bo);
>   }
> diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> index 87cffc9efa85..bf3700415229 100644
> --- a/include/drm/drm_gem.h
> +++ b/include/drm/drm_gem.h
> @@ -420,4 +420,7 @@ void drm_gem_unlock_reservations(struct drm_gem_object **objs, int count,
>   int drm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
>   			    u32 handle, u64 *offset);
>   
> +int drm_gem_vmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map);
> +void drm_gem_vunmap_unlocked(struct drm_gem_object *obj, struct iosys_map *map);
> +
>   #endif /* __DRM_GEM_H__ */
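
With the patch above, the intended split is that code which already runs
under the reservation lock keeps calling drm_gem_vmap()/drm_gem_vunmap(),
while everything else goes through the new wrappers. A minimal usage sketch
(hypothetical caller, not from the series):

	struct iosys_map map;
	int ret;

	/* no resv lock held here; the wrapper locks obj->resv internally */
	ret = drm_gem_vmap_unlocked(obj, &map);
	if (ret)
		return ret;

	/* ... access the mapping, e.g. map.vaddr for system memory ... */

	drm_gem_vunmap_unlocked(obj, &map);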
Christian König July 15, 2022, 11:04 a.m. UTC | #3
Am 15.07.22 um 11:31 schrieb Dmitry Osipenko:
> On 7/15/22 10:19, Christian König wrote:
>>> -struct sg_table *dma_buf_map_attachment(struct dma_buf_attachment
>>> *attach,
>>> -                    enum dma_data_direction direction)
>>> +struct sg_table *
>>> +dma_buf_map_attachment_unlocked(struct dma_buf_attachment *attach,
>>> +                enum dma_data_direction direction)
>> The locking state of the mapping and unmapping operations depends on
>> whether the attachment is dynamic or not.
>>
>> So this here is not a good idea at all since it suggests that the
>> function is always called without holding the lock.
> I had the same thought while I was working on this patch and initially was
> thinking about adding an "unlocked" alias to dma_buf_map_attachment().
> In the end I decided that it would create even more confusion and that
> it's simpler just to rename this function here, since there are only two
> drivers using the dynamic mapping.
>
> Do you have suggestions how to improve it?

Just keep it as it is. The ultimate goal is to switch all drivers over
to dynamic mapping, or at least over to the assumption that the map/unmap
callbacks are called with the lock held.

Regards,
Christian.
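
The distinction Christian describes is visible in the map path itself;
condensed from memory (a sketch, not quoted code), the pre-series logic is
roughly:

	if (dma_buf_attachment_is_dynamic(attach))
		/* dynamic importer: the caller must already hold the lock */
		dma_resv_assert_held(attach->dmabuf->resv);

	/* a static importer calls in unlocked and relies on the exporter
	 * (or, after this series, the core) for any locking */
	sg_table = attach->dmabuf->ops->map_dma_buf(attach, direction);

so a blanket "_unlocked" suffix is only accurate for one of the two cases.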
Tomasz Figa July 19, 2022, 9:13 a.m. UTC | #4
On Fri, Jul 15, 2022 at 9:53 AM Dmitry Osipenko
<dmitry.osipenko@collabora.com> wrote:
>
> Hello,
>
> This series moves all drivers to a dynamic dma-buf locking specification.
> From now on all dma-buf importers are made responsible for holding the
> dma-buf's reservation lock around all operations performed on dma-bufs.
> This common locking convention allows us to utilize the reservation lock
> more broadly around the kernel without fear of potential deadlocks.
>
> This patchset passes all i915 selftests. It was also tested using the
> VirtIO, Panfrost, Lima and Tegra drivers. I tested cases of display+GPU,
> display+V4L and GPU+V4L dma-buf sharing, which covers the majority of
> kernel drivers since the rest share the same or similar code paths.
>
> This is a continuation of [1], where Christian König asked to factor out
> the dma-buf locking changes into a separate series.
>
> [1] https://lore.kernel.org/dri-devel/20220526235040.678984-1-dmitry.osipenko@collabora.com/
>
> Dmitry Osipenko (6):
>   dma-buf: Add _unlocked postfix to function names
>   drm/gem: Take reservation lock for vmap/vunmap operations
>   dma-buf: Move all dma-bufs to dynamic locking specification
>   dma-buf: Acquire wait-wound context on attachment
>   media: videobuf2: Stop using internal dma-buf lock
>   dma-buf: Remove internal lock
>
>  drivers/dma-buf/dma-buf.c                     | 198 +++++++++++-------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |   4 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |   4 +-
>  drivers/gpu/drm/armada/armada_gem.c           |  14 +-
>  drivers/gpu/drm/drm_client.c                  |   4 +-
>  drivers/gpu/drm/drm_gem.c                     |  28 +++
>  drivers/gpu/drm/drm_gem_cma_helper.c          |   6 +-
>  drivers/gpu/drm/drm_gem_framebuffer_helper.c  |   6 +-
>  drivers/gpu/drm/drm_gem_shmem_helper.c        |   6 +-
>  drivers/gpu/drm/drm_prime.c                   |  12 +-
>  drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c   |   6 +-
>  drivers/gpu/drm/exynos/exynos_drm_gem.c       |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  20 +-
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_object.h    |   6 +-
>  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  |  20 +-
>  drivers/gpu/drm/i915/i915_gem_evict.c         |   2 +-
>  drivers/gpu/drm/i915/i915_gem_ww.c            |  26 ++-
>  drivers/gpu/drm/i915/i915_gem_ww.h            |  15 +-
>  drivers/gpu/drm/omapdrm/omap_gem_dmabuf.c     |   8 +-
>  drivers/gpu/drm/qxl/qxl_object.c              |  17 +-
>  drivers/gpu/drm/qxl/qxl_prime.c               |   4 +-
>  drivers/gpu/drm/tegra/gem.c                   |  27 +--
>  drivers/infiniband/core/umem_dmabuf.c         |  11 +-
>  .../common/videobuf2/videobuf2-dma-contig.c   |  26 +--
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  23 +-
>  .../common/videobuf2/videobuf2-vmalloc.c      |  17 +-

For the videobuf2 changes:

Acked-by: Tomasz Figa <tfiga@chromium.org>

Best regards,
Tomasz
Dmitry Osipenko July 19, 2022, 8:05 p.m. UTC | #5
On 7/15/22 09:59, Dmitry Osipenko wrote:
> On 7/15/22 09:50, Christian König wrote:
>> Am 15.07.22 um 02:52 schrieb Dmitry Osipenko:
>>> The Intel i915 GPU driver uses a wait-wound mutex to lock multiple GEMs
>>> on attachment to the i915 dma-buf. In order to let all drivers utilize a
>>> shared wait-wound context during attachment in a general way, make the
>>> dma-buf core acquire the ww context internally for the attachment
>>> operation and update the i915 driver to use the importer's ww context
>>> instead of the internal one.
>>>
>>>  From now on all dma-buf exporters shall use the importer's ww context
>>> for the attachment operation.
>>>
>>> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>>> ---
>>>   drivers/dma-buf/dma-buf.c                     |  8 +++++-
>>>   drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  2 +-
>>>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
>>>   drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ++---
>>>   drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
>>>   drivers/gpu/drm/i915/i915_gem_ww.c            | 26 +++++++++++++++----
>>>   drivers/gpu/drm/i915/i915_gem_ww.h            | 15 +++++++++--
>>>   7 files changed, 47 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>> index 0ee588276534..37545ecb845a 100644
>>> --- a/drivers/dma-buf/dma-buf.c
>>> +++ b/drivers/dma-buf/dma-buf.c
>>> @@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct
>>> dma_buf_attachment *attach,
>>>    * Optionally this calls &dma_buf_ops.attach to allow
>>> device-specific attach
>>>    * functionality.
>>>    *
>>> + * Exporters shall use ww_ctx acquired by this function.
>>> + *
>>>    * Returns:
>>>    *
>>>    * A pointer to newly created &dma_buf_attachment on success, or a
>>> negative
>>> @@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>> *dmabuf, struct device *dev,
>>>                   void *importer_priv)
>>>   {
>>>       struct dma_buf_attachment *attach;
>>> +    struct ww_acquire_ctx ww_ctx;
>>>       int ret;
>>>         if (WARN_ON(!dmabuf || !dev))
>>> @@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>> *dmabuf, struct device *dev,
>>>       attach->importer_ops = importer_ops;
>>>       attach->importer_priv = importer_priv;
>>>   -    dma_resv_lock(dmabuf->resv, NULL);
>>> +    ww_acquire_init(&ww_ctx, &reservation_ww_class);
>>> +    dma_resv_lock(dmabuf->resv, &ww_ctx);
>>
>> That won't work like this. The core property of a WW context is that you
>> need to unwind all the locks and re-acquire them with the contended one
>> first.
>>
>> When you statically lock the imported one here, you can't do that any more.
> 
> You're right. I felt that something was missing here, but couldn't
> pin it down. I'll think more about this and enable
> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Thank you!
> 

Christian, do you think we could make an exception for the attach()
callback and make the exporter responsible for taking the resv lock? It
would be inconsistent with the rest of the callbacks, where the importer
takes the lock, but it would be the simplest and least invasive solution.
Cross-driver ww locking is very messy; I don't think it's the right
approach.
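
Concretely, the exception would mean the core stops wrapping the attach()
callback in dma_resv_lock() and each exporter locks for itself; a
hypothetical exporter might then look like this (my_bo and
my_prepare_for_peer_access() are made-up names for illustration):

	static int my_exporter_attach(struct dma_buf *dmabuf,
				      struct dma_buf_attachment *attach)
	{
		struct my_bo *bo = dmabuf->priv;
		int ret;

		/* the exporter takes the lock itself and is free to use
		 * its own ww context when it needs to lock several
		 * objects, as i915 does */
		ret = dma_resv_lock_interruptible(dmabuf->resv, NULL);
		if (ret)
			return ret;

		ret = my_prepare_for_peer_access(bo);

		dma_resv_unlock(dmabuf->resv);
		return ret;
	}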
Christian König July 20, 2022, 8:29 a.m. UTC | #6
Am 19.07.22 um 22:05 schrieb Dmitry Osipenko:
> On 7/15/22 09:59, Dmitry Osipenko wrote:
>> On 7/15/22 09:50, Christian König wrote:
>>> Am 15.07.22 um 02:52 schrieb Dmitry Osipenko:
>>>> The Intel i915 GPU driver uses a wait-wound mutex to lock multiple
>>>> GEMs on attachment to the i915 dma-buf. In order to let all drivers
>>>> utilize a shared wait-wound context during attachment in a general
>>>> way, make the dma-buf core acquire the ww context internally for the
>>>> attachment operation and update the i915 driver to use the importer's
>>>> ww context instead of the internal one.
>>>>
>>>>   From now on all dma-buf exporters shall use the importer's ww context
>>>> for the attachment operation.
>>>>
>>>> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>>>> ---
>>>>    drivers/dma-buf/dma-buf.c                     |  8 +++++-
>>>>    drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  2 +-
>>>>    .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
>>>>    drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ++---
>>>>    drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
>>>>    drivers/gpu/drm/i915/i915_gem_ww.c            | 26 +++++++++++++++----
>>>>    drivers/gpu/drm/i915/i915_gem_ww.h            | 15 +++++++++--
>>>>    7 files changed, 47 insertions(+), 14 deletions(-)
>>>>
>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>> index 0ee588276534..37545ecb845a 100644
>>>> --- a/drivers/dma-buf/dma-buf.c
>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>> @@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct
>>>> dma_buf_attachment *attach,
>>>>     * Optionally this calls &dma_buf_ops.attach to allow
>>>> device-specific attach
>>>>     * functionality.
>>>>     *
>>>> + * Exporters shall use ww_ctx acquired by this function.
>>>> + *
>>>>     * Returns:
>>>>     *
>>>>     * A pointer to newly created &dma_buf_attachment on success, or a
>>>> negative
>>>> @@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>>> *dmabuf, struct device *dev,
>>>>                    void *importer_priv)
>>>>    {
>>>>        struct dma_buf_attachment *attach;
>>>> +    struct ww_acquire_ctx ww_ctx;
>>>>        int ret;
>>>>          if (WARN_ON(!dmabuf || !dev))
>>>> @@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>>> *dmabuf, struct device *dev,
>>>>        attach->importer_ops = importer_ops;
>>>>        attach->importer_priv = importer_priv;
>>>>    -    dma_resv_lock(dmabuf->resv, NULL);
>>>> +    ww_acquire_init(&ww_ctx, &reservation_ww_class);
>>>> +    dma_resv_lock(dmabuf->resv, &ww_ctx);
>>> That won't work like this. The core property of a WW context is that you
>>> need to unwind all the locks and re-acquire them with the contended one
>>> first.
>>>
>>> When you statically lock the imported one here, you can't do that any more.
>> You're right. I felt that something was missing here, but couldn't
>> pin it down. I'll think more about this and enable
>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Thank you!
>>
> Christian, do you think we could make an exception for the attach()
> callback and make the exporter responsible for taking the resv lock? It
> would be inconsistent with the rest of the callbacks, where the importer
> takes the lock, but it would be the simplest and least invasive solution.
> Cross-driver ww locking is very messy; I don't think it's the right
> approach.

So to summarize, the following calls will require that the caller hold
the resv lock:
1. dma_buf_pin()/dma_buf_unpin()
2. dma_buf_map_attachment()/dma_buf_unmap_attachment()
3. dma_buf_vmap()/dma_buf_vunmap()
4. dma_buf_move_notify()

The following calls require that the caller does not hold the resv lock:
1. dma_buf_attach()/dma_buf_dynamic_attach()/dma_buf_detach()
2. dma_buf_export()/dma_buf_fd()
3. dma_buf_get()/dma_buf_put()
4. dma_buf_begin_cpu_access()/dma_buf_end_cpu_access()

If that's correct then that would work for me as well, but we should
probably document this.

Or let me ask the other way around: which calls exactly do you need to
change to solve your original issue? That was vmap/vunmap, wasn't it? If
so, let's concentrate on those for the moment.

Regards,
Christian.
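
Taken together, an importer following the convention summarized above would
look roughly like this (illustrative sketch):

	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	/* attach/detach are called without the resv lock held */
	attach = dma_buf_attach(dmabuf, dev);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* pin/map/unmap/vmap run under the caller-held resv lock */
	dma_resv_lock(dmabuf->resv, NULL);
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);
	if (IS_ERR(sgt)) {
		dma_buf_detach(dmabuf, attach);
		return PTR_ERR(sgt);
	}

	/* ... use the mapping ... */

	dma_resv_lock(dmabuf->resv, NULL);
	dma_buf_unmap_attachment(attach, sgt, DMA_BIDIRECTIONAL);
	dma_resv_unlock(dmabuf->resv);
	dma_buf_detach(dmabuf, attach);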
Dmitry Osipenko July 20, 2022, 12:18 p.m. UTC | #7
On 7/20/22 11:29, Christian König wrote:
> Am 19.07.22 um 22:05 schrieb Dmitry Osipenko:
>> On 7/15/22 09:59, Dmitry Osipenko wrote:
>>> On 7/15/22 09:50, Christian König wrote:
>>>> Am 15.07.22 um 02:52 schrieb Dmitry Osipenko:
>>>>> The Intel i915 GPU driver uses a wait-wound mutex to lock multiple
>>>>> GEMs on attachment to the i915 dma-buf. In order to let all drivers
>>>>> utilize a shared wait-wound context during attachment in a general
>>>>> way, make the dma-buf core acquire the ww context internally for the
>>>>> attachment operation and update the i915 driver to use the importer's
>>>>> ww context instead of the internal one.
>>>>>
>>>>>   From now on all dma-buf exporters shall use the importer's ww
>>>>> context for the attachment operation.
>>>>>
>>>>> Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
>>>>> ---
>>>>>    drivers/dma-buf/dma-buf.c                     |  8 +++++-
>>>>>    drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c    |  2 +-
>>>>>    .../gpu/drm/i915/gem/i915_gem_execbuffer.c    |  2 +-
>>>>>    drivers/gpu/drm/i915/gem/i915_gem_object.h    |  6 ++---
>>>>>    drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
>>>>>    drivers/gpu/drm/i915/i915_gem_ww.c            | 26
>>>>> +++++++++++++++----
>>>>>    drivers/gpu/drm/i915/i915_gem_ww.h            | 15 +++++++++--
>>>>>    7 files changed, 47 insertions(+), 14 deletions(-)
>>>>>
>>>>> diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>>>>> index 0ee588276534..37545ecb845a 100644
>>>>> --- a/drivers/dma-buf/dma-buf.c
>>>>> +++ b/drivers/dma-buf/dma-buf.c
>>>>> @@ -807,6 +807,8 @@ static struct sg_table * __map_dma_buf(struct
>>>>> dma_buf_attachment *attach,
>>>>>     * Optionally this calls &dma_buf_ops.attach to allow
>>>>> device-specific attach
>>>>>     * functionality.
>>>>>     *
>>>>> + * Exporters shall use ww_ctx acquired by this function.
>>>>> + *
>>>>>     * Returns:
>>>>>     *
>>>>>     * A pointer to newly created &dma_buf_attachment on success, or a
>>>>> negative
>>>>> @@ -822,6 +824,7 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>>>> *dmabuf, struct device *dev,
>>>>>                    void *importer_priv)
>>>>>    {
>>>>>        struct dma_buf_attachment *attach;
>>>>> +    struct ww_acquire_ctx ww_ctx;
>>>>>        int ret;
>>>>>          if (WARN_ON(!dmabuf || !dev))
>>>>> @@ -841,7 +844,8 @@ dma_buf_dynamic_attach_unlocked(struct dma_buf
>>>>> *dmabuf, struct device *dev,
>>>>>        attach->importer_ops = importer_ops;
>>>>>        attach->importer_priv = importer_priv;
>>>>>    -    dma_resv_lock(dmabuf->resv, NULL);
>>>>> +    ww_acquire_init(&ww_ctx, &reservation_ww_class);
>>>>> +    dma_resv_lock(dmabuf->resv, &ww_ctx);
>>>> That won't work like this. The core property of a WW context is that
>>>> you need to unwind all the locks and re-acquire them with the
>>>> contended one first.
>>>>
>>>> When you statically lock the imported one here, you can't do that any
>>>> more.
>>> You're right. I felt that something was missing here, but couldn't
>>> pin it down. I'll think more about this and enable
>>> CONFIG_DEBUG_WW_MUTEX_SLOWPATH. Thank you!
>>>
>> Christian, do you think we could make an exception for the attach()
>> callback and make the exporter responsible for taking the resv lock? It
>> would be inconsistent with the rest of the callbacks, where the importer
>> takes the lock, but it would be the simplest and least invasive solution.
>> Cross-driver ww locking is very messy; I don't think it's the right
>> approach.
> 
> So to summarize, the following calls will require that the caller hold
> the resv lock:
> 1. dma_buf_pin()/dma_buf_unpin()
> 2. dma_buf_map_attachment()/dma_buf_unmap_attachment()
> 3. dma_buf_vmap()/dma_buf_vunmap()
> 4. dma_buf_move_notify()
> 
> The following calls require that the caller does not hold the resv lock:
> 1. dma_buf_attach()/dma_buf_dynamic_attach()/dma_buf_detach()
> 2. dma_buf_export()/dma_buf_fd()
> 3. dma_buf_get()/dma_buf_put()
> 4. dma_buf_begin_cpu_access()/dma_buf_end_cpu_access()
> 
> If that's correct then that would work for me as well, but we should
> probably document this.

Looks good, thank you. I'll try this variant.

> Or let me ask the other way around: which calls exactly do you need to
> change to solve your original issue? That was vmap/vunmap, wasn't it? If
> so, let's concentrate on those for the moment.

Originally, Daniel Vetter asked us to sort out the dma-buf locking across
all drivers so that we could replace the custom locks in DRM-SHMEM with the
resv lock; otherwise there were no guarantees that we wouldn't have
deadlocks in the dma-buf code paths.

Vmap/vunmap is one of the paths that needs to be sorted out; there is no
particular issue with it, we just need to specify the convention. Mmapping
was the other questionable path, and we concluded that it's better to
prohibit mmapping of dma-bufs for DRM entirely. Lastly, there is the i915
attach() path, which uses ww locking.