From: Asahi Lina <lina@asahilina.net>
Date: Fri, 14 Jul 2023 17:21:29 +0900
Subject: [PATCH 1/3] drm/scheduler: Add more documentation
Message-Id: <20230714-drm-sched-fixes-v1-1-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal, Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 asahi@lists.linux.dev, Asahi Lina

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina <lina@asahilina.net>
---
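As a rough illustration of the resource-acquisition rules documented below,
a driver's prepare_job() callback could look something like the following
sketch. Everything with a my_ prefix (the job type and the VM ID / queue-slot
helpers) is invented purely for illustration; only the callback shape comes
from struct drm_sched_backend_ops.

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

static struct dma_fence *my_prepare_job(struct drm_sched_job *sched_job,
					struct drm_sched_entity *s_entity)
{
	struct my_job *job = to_my_job(sched_job);
	struct dma_fence *wait;

	/* Acquire resources in a fixed order: VM ID first, then queue slot. */
	wait = my_vm_id_get(job);
	if (wait)
		return wait;	/* not available yet: wait on this, then retry */

	wait = my_queue_slot_get(job);
	if (wait) {
		/* Release in the opposite order when acquisition fails. */
		my_vm_id_put(job);
		return wait;
	}

	return NULL;	/* everything acquired; the job may run */
}

Returning a fence here tells the scheduler to wait for it and ask again
before running the job; returning NULL means all resources are held.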
 drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
  *
  * The jobs in a entity are always scheduled in the order that they were pushed.
  *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.
+ * - The scheduler *may* be destroyed while jobs are still in flight.
+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above). This signals the job
+ *   finished fence with an error flag. This process runs asynchronously
+ *   after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job finished
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else. If these fences are used in any other
+ * context, then the driver *must* signal them, per the usual fence signaling
+ * rules.
+ *
+ * Resource management
+ * -------------------
+ *
+ * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
+ * to run a job. This process must happen during the job's prepare_job()
+ * callback, not in the run_job() callback. If any resource is unavailable at
+ * job prepare time, the driver must return a suitable fence that can be waited
+ * on until the resource (potentially) becomes available.
+ *
+ * In order to avoid deadlocks, drivers must always acquire resources in the
+ * same order, and release them in the opposite order when a job completes or
+ * if resource acquisition fails.
  */
 
 #include
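To make the teardown ordering concrete, a driver shutdown path following the
lifetime rules above would look roughly like the sketch below.
drm_sched_entity_destroy() and drm_sched_fini() are the existing scheduler
entry points; my_device and its members are invented for the example, and the
comments restate the documented behaviour rather than anything this sketch
implements itself.

#include <drm/gpu_scheduler.h>

static void my_device_fini(struct my_device *mdev)
{
	/* The scheduler must outlive its entities, so tear the entities down
	 * first. Jobs that were pushed but never ran are destroyed as part of
	 * this entity cleanup (possibly asynchronously, after the call
	 * returns), and their finished fences are signaled with an error.
	 */
	drm_sched_entity_destroy(&mdev->entity);

	/* The scheduler may be destroyed with jobs still on the hardware:
	 * those are detached from their driver fences, their finished fences
	 * are signaled with an error, and free_job() is called for them as
	 * part of the cleanup.
	 */
	drm_sched_fini(&mdev->sched);

	/* Scheduler fences handed out to other users may still be alive at
	 * this point; they are reference-counted and are only freed when the
	 * last reference is dropped.
	 */
}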
From: Asahi Lina <lina@asahilina.net>
Date: Fri, 14 Jul 2023 17:21:30 +0900
Subject: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
Message-Id: <20230714-drm-sched-fixes-v1-2-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal, Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
 linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
 asahi@lists.linux.dev, Asahi Lina

A signaled scheduler fence can outlive its scheduler, since fences are
independently reference counted. Therefore, we can't reference the
scheduler in the get_timeline_name() implementation.
Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
dma-bufs reference fences from GPU schedulers that no longer exist.

Signed-off-by: Asahi Lina <lina@asahilina.net>
---
 drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
 drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
 include/drm/gpu_scheduler.h              | 5 +++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index b2bbc8a68b30..17f35b0b005a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 
 		/*
 		 * Fence is from the same scheduler, only need to wait for
-		 * it to be scheduled
+		 * it to be scheduled.
+		 *
+		 * Note: s_fence->sched could have been freed and reallocated
+		 * as another scheduler. This false positive case is okay,
+		 * since if the old scheduler was freed, all of its jobs must
+		 * have signaled their completion fences.
 		 */
 		fence = dma_fence_get(&s_fence->scheduled);
 		dma_fence_put(entity->dependency);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index ef120475e7c6..06a0eebcca10 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
 static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
 {
 	struct drm_sched_fence *fence = to_drm_sched_fence(f);
-	return (const char *)fence->sched->name;
+	return (const char *)fence->sched_name;
 }
 
 static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 	unsigned seq;
 
 	fence->sched = entity->rq->sched;
+	strlcpy(fence->sched_name, entity->rq->sched->name,
+		sizeof(fence->sched_name));
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e95b4837e5a3..4fa9523bd47d 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -305,6 +305,11 @@ struct drm_sched_fence {
 	 * @lock: the lock used by the scheduled and the finished fences.
 	 */
 	spinlock_t lock;
+	/**
+	 * @sched_name: the name of the scheduler that owns this fence. We
+	 * keep a copy here since fences can outlive their scheduler.
+	 */
+	char sched_name[16];
 	/**
 	 * @owner: job owner for debugging
 	 */
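The underlying pattern is not specific to the scheduler: any dma_fence
provider whose backing object can be freed while fences are still referenced
(for example through a shared dma-buf) should copy whatever
get_timeline_name() needs into the fence itself at init time, rather than
dereferencing a backpointer that may be dangling by the time debugfs or
another driver asks for the name. A minimal, hypothetical provider following
that pattern could look like this (struct my_fence and my_timeline_fence_init()
are invented for illustration; the dma_fence_ops hooks are the real ones):

#include <linux/dma-fence.h>
#include <linux/spinlock.h>
#include <linux/string.h>

struct my_fence {
	struct dma_fence base;
	spinlock_t lock;
	/* Copied at init time so the name stays valid even after the object
	 * that originally owned the timeline has been freed.
	 */
	char timeline_name[16];
};

static const char *my_fence_get_driver_name(struct dma_fence *f)
{
	return "my-driver";
}

static const char *my_fence_get_timeline_name(struct dma_fence *f)
{
	struct my_fence *fence = container_of(f, struct my_fence, base);

	/* Safe: reads memory embedded in the fence itself, not a backpointer
	 * to a possibly-freed scheduler or queue object.
	 */
	return fence->timeline_name;
}

static const struct dma_fence_ops my_fence_ops = {
	.get_driver_name = my_fence_get_driver_name,
	.get_timeline_name = my_fence_get_timeline_name,
};

static void my_timeline_fence_init(struct my_fence *fence, const char *name,
				   u64 context, u64 seqno)
{
	spin_lock_init(&fence->lock);
	strscpy(fence->timeline_name, name, sizeof(fence->timeline_name));
	dma_fence_init(&fence->base, &my_fence_ops, &fence->lock,
		       context, seqno);
}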