From patchwork Wed Oct 18 19:48:22 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 735401
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Richard Weinberger
Subject: [PATCH RT 01/12] io-mapping: don't disable preempt on RT in
    io_mapping_map_atomic_wc().
Date: Wed, 18 Oct 2023 15:48:22 -0400
Message-Id: <20231018194833.651674-2-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Sebastian Andrzej Siewior

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

io_mapping_map_atomic_wc() disables preemption and pagefaults for
historical reasons. The conversion to io_mapping_map_local_wc(), which
only disables migration, cannot be done wholesale because quite a few
call sites need to be updated to accommodate the changed semantics.

On PREEMPT_RT enabled kernels the io_mapping_map_atomic_wc() semantics
are problematic due to the implicit disabling of preemption, which
makes it impossible to acquire 'sleeping' spinlocks within the mapped
atomic sections.

PREEMPT_RT has replaced preempt_disable() with migrate_disable() for
more than a decade. It could be argued that this is a justification to
do the replacement unconditionally, but PREEMPT_RT covers only a
limited number of architectures and it disables some functionality
which limits the coverage further.

Limit the replacement to PREEMPT_RT for now. This is also how
kmap_atomic() is handled.
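
As a minimal caller-side sketch of why this matters (the mapping,
offset and dev names are illustrative, not from the patch):

	/*
	 * With the old implicit preempt_disable() this block would be
	 * invalid on PREEMPT_RT, because spinlock_t is a sleeping lock
	 * there.  With migrate_disable() the task is pinned to its CPU
	 * but stays preemptible, so taking the sleeping lock is fine.
	 */
	void __iomem *vaddr = io_mapping_map_atomic_wc(mapping, offset);

	spin_lock(&dev->reg_lock);	/* sleeping lock on PREEMPT_RT */
	writel(value, vaddr);
	spin_unlock(&dev->reg_lock);

	io_mapping_unmap_atomic(vaddr);
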
Link: https://lkml.kernel.org/r/20230310162905.O57Pj7hh@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
Reported-by: Richard Weinberger
Link: https://lore.kernel.org/CAFLxGvw0WMxaMqYqJ5WgvVSbKHq2D2xcXTOgMCpgq9nDC-MWTQ@mail.gmail.com
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
(cherry picked from commit 7eb16f23b9a415f062db22739e59bb144e0b24ab)
Signed-off-by: Joseph Salisbury
---
 include/linux/io-mapping.h | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/linux/io-mapping.h b/include/linux/io-mapping.h
index e9743cfd8585..b0f196e51dca 100644
--- a/include/linux/io-mapping.h
+++ b/include/linux/io-mapping.h
@@ -69,7 +69,10 @@ io_mapping_map_atomic_wc(struct io_mapping *mapping,
 	BUG_ON(offset >= mapping->size);
 	phys_addr = mapping->base + offset;
-	preempt_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+	else
+		migrate_disable();
 	pagefault_disable();
 	return __iomap_local_pfn_prot(PHYS_PFN(phys_addr), mapping->prot);
 }
@@ -79,7 +82,10 @@ io_mapping_unmap_atomic(void __iomem *vaddr)
 {
 	kunmap_local_indexed((void __force *)vaddr);
 	pagefault_enable();
-	preempt_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+	else
+		migrate_enable();
 }
 
 static inline void __iomem *
@@ -168,7 +174,10 @@ static inline void __iomem *
 io_mapping_map_atomic_wc(struct io_mapping *mapping,
 			 unsigned long offset)
 {
-	preempt_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+	else
+		migrate_disable();
 	pagefault_disable();
 	return io_mapping_map_wc(mapping, offset, PAGE_SIZE);
 }
@@ -178,7 +187,10 @@ io_mapping_unmap_atomic(void __iomem *vaddr)
 {
 	io_mapping_unmap(vaddr);
 	pagefault_enable();
-	preempt_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+	else
+		migrate_enable();
 }
 
 static inline void __iomem *

From patchwork Wed Oct 18 19:48:23 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 736134
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Mel Gorman, Linus Torvalds
Subject: [PATCH RT 02/12] locking/rwbase: Mitigate indefinite writer starvation
Date: Wed, 18 Oct 2023 15:48:23 -0400
Message-Id: <20231018194833.651674-3-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Sebastian Andrzej Siewior

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

On PREEMPT_RT, rw_semaphore and rwlock_t locks are unfair to writers.
Readers can indefinitely acquire the lock unless the writer has fully
acquired it, which might never happen if there is always a reader in
the critical section owning the lock.

Mel Gorman reported that since LTP-20220121 the dio_truncate test case
went from having 1 reader to having 16 readers, and that number of
readers is sufficient to prevent the down_write from ever succeeding
while readers exist. Eventually the test is killed after 30 minutes as
a failure.

Mel proposed a timeout to limit how long a writer can be blocked until
the reader is forced into the slowpath. Thomas argued that there is no
added value in providing this timeout. From a PREEMPT_RT point of view,
there are no critical rw_semaphore or rwlock_t locks left where the
reader must be preferred.

Mitigate indefinite writer starvation by forcing the READER into the
slowpath once the WRITER attempts to acquire the lock.
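
An illustrative timeline (not from the patch) of how the removed
reader fast path starves a writer as long as readers overlap:

	reader A: down_read()	-> fast path, readers++
	writer:   down_write()	-> blocks while readers != 0
	reader B: down_read()	-> fast path again, readers++
	reader A: up_read()
	reader C: down_read()	-> fast path again, readers++
	...			   the writer never gets through

With the fast path removed, readers arriving after the writer has taken
rtmutex->wait_lock are forced into the slowpath and queue up behind it.
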
Reported-by: Mel Gorman
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Signed-off-by: Ingo Molnar
Acked-by: Mel Gorman
Link: https://lore.kernel.org/877cwbq4cq.ffs@tglx
Link: https://lore.kernel.org/r/20230321161140.HMcQEhHb@linutronix.de
Cc: Linus Torvalds
(cherry picked from commit 286deb7ec03d941664ac3ffaff58814b454adf65)
Signed-off-by: Joseph Salisbury
---
 kernel/locking/rwbase_rt.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 88191f6e252c..a28148a05383 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -73,15 +73,6 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
 	int ret;
 
 	raw_spin_lock_irq(&rtm->wait_lock);
-	/*
-	 * Allow readers, as long as the writer has not completely
-	 * acquired the semaphore for write.
-	 */
-	if (atomic_read(&rwb->readers) != WRITER_BIAS) {
-		atomic_inc(&rwb->readers);
-		raw_spin_unlock_irq(&rtm->wait_lock);
-		return 0;
-	}
 
 	/*
 	 * Call into the slow lock path with the rtmutex->wait_lock

From patchwork Wed Oct 18 19:48:24 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 736133
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Paolo Abeni, Jason Xing, Jakub Kicinski, Eric Dumazet,
    "Paul E. McKenney", Peter Zijlstra, netdev@vger.kernel.org
Subject: [PATCH RT 03/12] Revert "softirq: Let ksoftirqd do its job"
Date: Wed, 18 Oct 2023 15:48:24 -0400
Message-Id: <20231018194833.651674-4-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Paolo Abeni

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

This reverts the following commits:

  4cd13c21b207 ("softirq: Let ksoftirqd do its job")
  3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")
  1342d8080f61 ("softirq: Don't skip softirq execution when softirq thread is parking")

in a single change to avoid known bad intermediate states introduced by
a patch series reverting them individually.

Due to the mentioned commit, when the ksoftirqd threads take charge of
softirq processing, the system can experience high latencies.
In the past a few workarounds have been implemented for specific
side-effects of the initial ksoftirqd enforcement commit:

  commit 1ff688209e2e ("watchdog: core: make sure the watchdog_worker is not deferred")
  commit 8d5755b3f77b ("watchdog: softdog: fire watchdog even if softirqs do not get to run")
  commit 217f69743681 ("net: busy-poll: allow preemption in sk_busy_loop()")
  commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")

But the latency problem still exists in real-life workloads, see the
link below.

The reverted commit intended to solve a live-lock scenario that can now
be addressed with the NAPI threaded mode, introduced with commit
29863d41bb6e ("net: implement threaded-able napi poll loop support"),
which is nowadays in a pretty stable state.

While a complete solution to put softirq processing under nice resource
control would be preferable, that has proven to be a very hard task. In
the short term, remove the main pain point, and also simplify the
current softirq implementation a bit.
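
As a hedged aside on the replacement mechanism: threaded NAPI is opted
into per device. A minimal driver-side sketch, assuming a hypothetical
netdev driver named mydrv (dev_set_threaded() and the sysfs knob exist
upstream since v5.12, but this snippet itself is illustrative, not from
the patch):

	#include <linux/netdevice.h>

	static int mydrv_open(struct net_device *dev)
	{
		/* Move this device's NAPI polling into per-NAPI kthreads,
		 * equivalent to: echo 1 > /sys/class/net/<dev>/threaded */
		dev_set_threaded(dev, true);
		return 0;
	}
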
Signed-off-by: Paolo Abeni
Signed-off-by: Thomas Gleixner
Tested-by: Jason Xing
Reviewed-by: Jakub Kicinski
Reviewed-by: Eric Dumazet
Reviewed-by: Sebastian Andrzej Siewior
Cc: "Paul E. McKenney"
Cc: Peter Zijlstra
Cc: netdev@vger.kernel.org
Link: https://lore.kernel.org/netdev/305d7742212cbe98621b16be782b0562f1012cb6.camel@redhat.com
Link: https://lore.kernel.org/r/57e66b364f1b6f09c9bc0316742c3b14f4ce83bd.1683526542.git.pabeni@redhat.com
(cherry picked from commit d15121be7485655129101f3960ae6add40204463)
Signed-off-by: Joseph Salisbury
---
 kernel/softirq.c | 22 ++--------------------
 1 file changed, 2 insertions(+), 20 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 41f470929e99..398951403331 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -80,21 +80,6 @@ static void wakeup_softirqd(void)
 		wake_up_process(tsk);
 }
 
-/*
- * If ksoftirqd is scheduled, we do not want to process pending softirqs
- * right now. Let ksoftirqd handle this at its own rate, to get fairness,
- * unless we're doing some of the synchronous softirqs.
- */
-#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
-static bool ksoftirqd_running(unsigned long pending)
-{
-	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
-
-	if (pending & SOFTIRQ_NOW_MASK)
-		return false;
-	return tsk && task_is_running(tsk) && !__kthread_should_park(tsk);
-}
-
 #ifdef CONFIG_TRACE_IRQFLAGS
 DEFINE_PER_CPU(int, hardirqs_enabled);
 DEFINE_PER_CPU(int, hardirq_context);
@@ -236,7 +221,7 @@ void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
 		goto out;
 
 	pending = local_softirq_pending();
-	if (!pending || ksoftirqd_running(pending))
+	if (!pending)
 		goto out;
 
 	/*
@@ -419,9 +404,6 @@ static inline bool should_wake_ksoftirqd(void)
 
 static inline void invoke_softirq(void)
 {
-	if (ksoftirqd_running(local_softirq_pending()))
-		return;
-
 	if (!force_irqthreads() || !__this_cpu_read(ksoftirqd)) {
 #ifdef CONFIG_HAVE_IRQ_EXIT_ON_IRQ_STACK
 		/*
@@ -455,7 +437,7 @@ asmlinkage __visible void do_softirq(void)
 
 	pending = local_softirq_pending();
 
-	if (pending && !ksoftirqd_running(pending))
+	if (pending)
 		do_softirq_own_stack();
 
 	local_irq_restore(flags);

From patchwork Wed Oct 18 19:48:25 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 735400
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Ido Schimmel
Subject: [PATCH RT 04/12] debugobject: Ensure pool refill (again)
Date: Wed, 18 Oct 2023 15:48:25 -0400
Message-Id: <20231018194833.651674-5-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Thomas Gleixner

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

The recent fix to ensure atomicity of lookup and allocation
inadvertently broke the pool refill mechanism.

Prior to that change debug_objects_activate() and
debug_objects_assert_init() invoked debug_objects_init() to set up the
tracking object for statically initialized objects. That's no longer
the case and debug_objects_init() is now the only place which does pool
refills.

Depending on the number of statically initialized objects this can be
enough to actually deplete the pool, which was observed by Ido via a
debugobjects OOM warning.

Restore the old behaviour by adding explicit refill opportunities to
debug_objects_activate() and debug_objects_assert_init().

Fixes: 63a759694eed ("debugobject: Prevent init race with static objects")
Reported-by: Ido Schimmel
Signed-off-by: Thomas Gleixner
Tested-by: Ido Schimmel
Link: https://lore.kernel.org/r/871qk05a9d.ffs@tglx
(cherry picked from commit 0af462f19e635ad522f28981238334620881badc)
Signed-off-by: Joseph Salisbury
---
 lib/debugobjects.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 579406c1e9ed..4c39678c03ee 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -590,6 +590,16 @@ static struct debug_obj *lookup_object_or_alloc(void *addr, struct debug_bucket
 	return NULL;
 }
 
+static void debug_objects_fill_pool(void)
+{
+	/*
+	 * On RT enabled kernels the pool refill must happen in preemptible
+	 * context:
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+		fill_pool();
+}
+
 static void
 __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack)
 {
@@ -598,12 +608,7 @@ __debug_object_init(void *addr, const struct debug_obj_descr *descr, int onstack
 	struct debug_obj *obj;
 	unsigned long flags;
 
-	/*
-	 * On RT enabled kernels the pool refill must happen in preemptible
-	 * context:
-	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
-		fill_pool();
+	debug_objects_fill_pool();
 
 	db = get_bucket((unsigned long) addr);
 
@@ -688,6 +693,8 @@ int debug_object_activate(void *addr, const struct debug_obj_descr *descr)
 	if (!debug_objects_enabled)
 		return 0;
 
+	debug_objects_fill_pool();
+
 	db = get_bucket((unsigned long) addr);
 
 	raw_spin_lock_irqsave(&db->lock, flags);
@@ -897,6 +904,8 @@ void debug_object_assert_init(void *addr, const struct debug_obj_descr *descr)
 	if (!debug_objects_enabled)
 		return;
 
+	debug_objects_fill_pool();
+
 	db = get_bucket((unsigned long) addr);
 
 	raw_spin_lock_irqsave(&db->lock, flags);

From patchwork Wed Oct 18 19:48:26 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 735399
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Peter Zijlstra, Vlastimil Babka, Qi Zheng
Subject: [PATCH RT 05/12] debugobjects,locking: Annotate
    debug_object_fill_pool() wait type violation
Date: Wed, 18 Oct 2023 15:48:26 -0400
Message-Id: <20231018194833.651674-6-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Peter Zijlstra

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

There is an explicit wait-type violation in debug_object_fill_pool()
for PREEMPT_RT=n kernels, which allows them to more easily fill the
object pool and reduce the chance of allocation failures.

Lockdep's wait-type checks are designed to check the PREEMPT_RT locking
rules even for PREEMPT_RT=n kernels and object to this, so create a
lockdep annotation to allow this to stand.

Specifically, create a 'lock' type that overrides the inner wait-type
while it is held -- allowing one to temporarily raise it, such that the
violation is hidden.
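
The general shape of the annotation, as a sketch (the my_override name
is illustrative; the macros themselves are introduced by the diff
below). The override map is taken with a trylock so lockdep records it
without linking it into the dependency chains:

	static DEFINE_WAIT_OVERRIDE_MAP(my_override, LD_WAIT_SLEEP);

	lock_map_acquire_try(&my_override);
	/*
	 * Code here may take sleeping (LD_WAIT_SLEEP and below) locks
	 * without tripping the wait-type checks, even where the
	 * surrounding context would normally forbid them.
	 */
	lock_map_release(&my_override);
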
Reported-by: Vlastimil Babka
Reported-by: Qi Zheng
Signed-off-by: Peter Zijlstra (Intel)
Tested-by: Qi Zheng
Link: https://lkml.kernel.org/r/20230429100614.GA1489784@hirez.programming.kicks-ass.net
(cherry picked from commit 0cce06ba859a515bd06224085d3addb870608b6d)
Signed-off-by: Joseph Salisbury
---
 include/linux/lockdep.h       | 14 ++++++++++++++
 include/linux/lockdep_types.h |  1 +
 kernel/locking/lockdep.c      | 28 +++++++++++++++++++++-------
 lib/debugobjects.c            | 15 +++++++++++++--
 4 files changed, 49 insertions(+), 9 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index aa0ecfc6cdb4..d28669208e00 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -340,6 +340,16 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie);
 #define lockdep_repin_lock(l,c)	lock_repin_lock(&(l)->dep_map, (c))
 #define lockdep_unpin_lock(l,c)	lock_unpin_lock(&(l)->dep_map, (c))
 
+/*
+ * Must use lock_map_acquire_try() with override maps to avoid
+ * lockdep thinking they participate in the block chain.
+ */
+#define DEFINE_WAIT_OVERRIDE_MAP(_name, _wait_type)	\
+	struct lockdep_map _name = {			\
+		.name = #_name "-wait-type-override",	\
+		.wait_type_inner = _wait_type,		\
+		.lock_type = LD_LOCK_WAIT_OVERRIDE, }
+
 #else /* !CONFIG_LOCKDEP */
 
 static inline void lockdep_init_task(struct task_struct *task)
@@ -427,6 +437,9 @@ extern int lockdep_is_held(const void *);
 #define lockdep_repin_lock(l, c)	do { (void)(l); (void)(c); } while (0)
 #define lockdep_unpin_lock(l, c)	do { (void)(l); (void)(c); } while (0)
 
+#define DEFINE_WAIT_OVERRIDE_MAP(_name, _wait_type)	\
+	struct lockdep_map __maybe_unused _name = {}
+
 #endif /* !LOCKDEP */
 
 enum xhlock_context_t {
@@ -569,6 +582,7 @@ do {									\
 #define rwsem_release(l, i)		lock_release(l, i)
 
 #define lock_map_acquire(l)		lock_acquire_exclusive(l, 0, 0, NULL, _THIS_IP_)
+#define lock_map_acquire_try(l)		lock_acquire_exclusive(l, 0, 1, NULL, _THIS_IP_)
 #define lock_map_acquire_read(l)	lock_acquire_shared_recursive(l, 0, 0, NULL, _THIS_IP_)
 #define lock_map_acquire_tryread(l)	lock_acquire_shared_recursive(l, 0, 1, NULL, _THIS_IP_)
 #define lock_map_release(l)		lock_release(l, _THIS_IP_)

diff --git a/include/linux/lockdep_types.h b/include/linux/lockdep_types.h
index 3e726ace5c62..a5f1519489df 100644
--- a/include/linux/lockdep_types.h
+++ b/include/linux/lockdep_types.h
@@ -33,6 +33,7 @@ enum lockdep_wait_type {
 enum lockdep_lock_type {
 	LD_LOCK_NORMAL = 0,	/* normal, catch all */
 	LD_LOCK_PERCPU,		/* percpu */
+	LD_LOCK_WAIT_OVERRIDE,	/* annotation */
 	LD_LOCK_MAX,
 };

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ce3c8a4a5506..a3de5f06a2de 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2208,6 +2208,9 @@ static inline bool usage_match(struct lock_list *entry, void *mask)
 
 static inline bool usage_skip(struct lock_list *entry, void *mask)
 {
+	if (entry->class->lock_type == LD_LOCK_NORMAL)
+		return false;
+
 	/*
 	 * Skip local_lock() for irq inversion detection.
 	 *
@@ -2234,14 +2237,16 @@ static inline bool usage_skip(struct lock_list *entry, void *mask)
 	 * As a result, we will skip local_lock(), when we search for irq
 	 * inversion bugs.
 	 */
-	if (entry->class->lock_type == LD_LOCK_PERCPU) {
-		if (DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
-			return false;
+	if (entry->class->lock_type == LD_LOCK_PERCPU &&
+	    DEBUG_LOCKS_WARN_ON(entry->class->wait_type_inner < LD_WAIT_CONFIG))
+		return false;
 
-		return true;
-	}
+	/*
+	 * Skip WAIT_OVERRIDE for irq inversion detection -- it's not actually
+	 * a lock and only used to override the wait_type.
+	 */
 
-	return false;
+	return true;
 }
 
 /*
@@ -4707,7 +4712,8 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
 
 	for (; depth < curr->lockdep_depth; depth++) {
 		struct held_lock *prev = curr->held_locks + depth;
-		u8 prev_inner = hlock_class(prev)->wait_type_inner;
+		struct lock_class *class = hlock_class(prev);
+		u8 prev_inner = class->wait_type_inner;
 
 		if (prev_inner) {
 			/*
@@ -4717,6 +4723,14 @@ static int check_wait_context(struct task_struct *curr, struct held_lock *next)
 			 * Also due to trylocks.
 			 */
 			curr_inner = min(curr_inner, prev_inner);
+
+			/*
+			 * Allow override for annotations -- this is typically
+			 * only valid/needed for code that only exists when
+			 * CONFIG_PREEMPT_RT=n.
+			 */
+			if (unlikely(class->lock_type == LD_LOCK_WAIT_OVERRIDE))
+				curr_inner = prev_inner;
 		}
 	}

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 4c39678c03ee..e6b0cabdcb2c 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -594,10 +594,21 @@ static void debug_objects_fill_pool(void)
 {
 	/*
 	 * On RT enabled kernels the pool refill must happen in preemptible
-	 * context:
+	 * context -- for !RT kernels we rely on the fact that spinlock_t and
+	 * raw_spinlock_t are basically the same type and this lock-type
+	 * inversion works just fine.
 	 */
-	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible())
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
+		/*
+		 * Annotate away the spinlock_t inside raw_spinlock_t warning
+		 * by temporarily raising the wait-type to WAIT_SLEEP, matching
+		 * the preemptible() condition above.
+		 */
+		static DEFINE_WAIT_OVERRIDE_MAP(fill_pool_map, LD_WAIT_SLEEP);
+		lock_map_acquire_try(&fill_pool_map);
 		fill_pool();
+		lock_map_release(&fill_pool_map);
+	}
 }
 
 static void

From patchwork Wed Oct 18 19:48:27 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 736132
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Wander Lairson Costa, Oleg Nesterov, Peter Zijlstra
Subject: [PATCH RT 06/12] sched: avoid false lockdep splat in put_task_struct()
Date: Wed, 18 Oct 2023 15:48:27 -0400
Message-Id: <20231018194833.651674-7-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Wander Lairson Costa

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

In the stock kernel, put_task_struct() indirectly acquires a spin_lock.
When running the kernel in real-time (RT) configuration, the operation
is dispatched to a preemptible context call to ensure guaranteed
preemption.

However, if PROVE_RAW_LOCK_NESTING is enabled and __put_task_struct()
is called while holding a raw_spinlock, lockdep incorrectly reports an
"Invalid lock context" in the stock kernel. This false splat occurs
because lockdep is unaware of the different route taken under RT.

To address this issue, override the inner wait type to prevent the
false lockdep splat.
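
An illustrative shape of the false positive (not a real report; the
some_lock name is made up):

	/* PROVE_RAW_LOCK_NESTING, !PREEMPT_RT build */
	raw_spin_lock(&some_lock);	/* raw spinlock held      */
	put_task_struct(t);
	  __put_task_struct(t);
	    ...
	      spin_lock(...);		/* spinlock_t: a sleeping
					 * lock under RT rules, so
					 * lockdep complains       */

On RT this path is never taken from a non-preemptible context (the free
is deferred via call_rcu() instead), which is why the report is false
and can be annotated away with an LD_WAIT_SLEEP override map.
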
Suggested-by: Oleg Nesterov
Suggested-by: Sebastian Andrzej Siewior
Suggested-by: Peter Zijlstra
Signed-off-by: Wander Lairson Costa
Signed-off-by: Peter Zijlstra (Intel)
Link: https://lore.kernel.org/r/20230614122323.37957-3-wander@redhat.com
(cherry picked from commit 893cdaaa3977be6afb3a7f756fbfd7be83f68d8c)
Signed-off-by: Joseph Salisbury
---
 include/linux/sched/task.h | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h
index 0c2d00809915..75d52a9e7620 100644
--- a/include/linux/sched/task.h
+++ b/include/linux/sched/task.h
@@ -115,6 +115,19 @@ static inline void put_task_struct(struct task_struct *t)
 	if (!refcount_dec_and_test(&t->usage))
 		return;
 
+	/*
+	 * In !RT, it is always safe to call __put_task_struct().
+	 * Under RT, we can only call it in preemptible context.
+	 */
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
+		static DEFINE_WAIT_OVERRIDE_MAP(put_task_map, LD_WAIT_SLEEP);
+
+		lock_map_acquire_try(&put_task_map);
+		__put_task_struct(t);
+		lock_map_release(&put_task_map);
+		return;
+	}
+
 	/*
 	 * under PREEMPT_RT, we can't call put_task_struct
 	 * in atomic context because it will indirectly
@@ -135,10 +148,7 @@ static inline void put_task_struct(struct task_struct *t)
 	 * when it fails to fork a process. Therefore, there is no
 	 * way it can conflict with put_task_struct().
 	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !preemptible())
-		call_rcu(&t->rcu, __put_task_struct_rcu_cb);
-	else
-		__put_task_struct(t);
+	call_rcu(&t->rcu, __put_task_struct_rcu_cb);
 }
 
 static inline void put_task_struct_many(struct task_struct *t, int nr)

From patchwork Wed Oct 18 19:48:28 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 735398
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Tetsuo Handa
Subject: [PATCH RT 07/12] locking/seqlock: Do the lockdep annotation before
    locking in do_write_seqcount_begin_nested()
Date: Wed, 18 Oct 2023 15:48:28 -0400
Message-Id: <20231018194833.651674-8-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Sebastian Andrzej Siewior

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

It was brought up by Tetsuo that the following sequence:

	write_seqlock_irqsave()
	printk_deferred_enter()

could lead to a deadlock if the lockdep annotation within
write_seqlock_irqsave() triggers.

The problem is that the sequence counter is incremented before the
lockdep annotation is performed. The lockdep splat would then attempt
to invoke printk(), but the reader side of the same seqcount could have
tty_port::lock acquired while waiting for the sequence number to become
even again.

The other lockdep annotations come before the actual locking because
"we want to see the locking error before it happens".
There is no reason why seqcount should be different here. Do the
lockdep annotation first, then perform the locking operation (the
sequence increment).
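
An illustrative interleaving of the deadlock this avoids (not from the
patch; port_sl stands for any seqlock whose reader holds
tty_port::lock):

	CPU0					CPU1
	write_seqlock_irqsave(&port_sl)		read path, holds tty_port::lock
	  count++ (now odd)			  spins until count is even
	  seqcount_acquire()
	    -> lockdep splat -> printk()
	       -> wants tty_port::lock		==> deadlock

With the annotation first, a splat's printk() runs before the count
goes odd, so the reader cannot yet be spinning on it.
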
Fixes: 1ca7d67cf5d5a ("seqcount: Add lockdep functionality to seqcount/seqlock structures")
Reported-by: Tetsuo Handa
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Ingo Molnar
Link: https://lore.kernel.org/r/20230920104627._DTHgPyA@linutronix.de
Closes: https://lore.kernel.org/20230621130641.-5iueY1I@linutronix.de
(cherry picked from commit 41b43b6c6e30a832c790b010a06772e793bca193)
Signed-off-by: Joseph Salisbury
---
 include/linux/seqlock.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 37ded6b8fee6..2c5d0102315d 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -516,8 +516,8 @@ do {									\
 
 static inline void do_write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
-	do_raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
+	do_raw_write_seqcount_begin(s);
 }
 
 /**

From patchwork Wed Oct 18 19:48:29 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 736131
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Michal Hocko, David Hildenbrand
Subject: [PATCH RT 08/12] mm/page_alloc: Use write_seqlock_irqsave() instead
    write_seqlock() + local_irq_save().
Date: Wed, 18 Oct 2023 15:48:29 -0400
Message-Id: <20231018194833.651674-9-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Sebastian Andrzej Siewior

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

__build_all_zonelists() acquires zonelist_update_seq by first disabling
interrupts via local_irq_save() and then acquiring the seqlock with
write_seqlock(). This is troublesome and leads to problems on
PREEMPT_RT. The problem is that the inner spinlock_t becomes a sleeping
lock on PREEMPT_RT and must not be acquired with disabled interrupts.

The API provides write_seqlock_irqsave() which does the right thing in
one step. printk_deferred_enter() has to be invoked in non-migratable
context to ensure that deferred printing is enabled and disabled on the
same CPU. This is the case after zonelist_update_seq has been acquired.

There was discussion on the first submission that the order should be:

	local_irq_disable();
	printk_deferred_enter();
	write_seqlock();

to avoid pitfalls like having an unaccounted printk() coming from
write_seqlock_irqsave() before printk_deferred_enter() is invoked. The
only origin of such a printk() can be a lockdep splat because the
lockdep annotation happens after the sequence count is incremented.
This is exceptional and subject to change.

It was also pointed out that PREEMPT_RT can be affected by the printk
problem since its write_seqlock_irqsave() does not really disable
interrupts. This isn't the case because PREEMPT_RT's printk
implementation differs from the mainline implementation in two
important aspects:

- Printing happens in dedicated threads and not during the invocation
  of printk().
- In emergency cases where synchronous printing is used, a different
  driver is used which does not use tty_port::lock.

Acquire zonelist_update_seq with write_seqlock_irqsave() and then defer
printk output.
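
The resulting shape, as a sketch mirroring the diff below:
printk_deferred_enter() is per-CPU, so it must run once migration is
impossible, and write_seqlock_irqsave() has already disabled interrupts
at that point.

	write_seqlock_irqsave(&zonelist_update_seq, flags);
	printk_deferred_enter();	/* stays on this CPU until exit */

	/* ... rebuild the zonelists ... */

	printk_deferred_exit();
	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
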
Fixes: 1007843a91909 ("mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock")
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Link: https://lore.kernel.org/r/20230623201517.yw286Knb@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
(cherry picked from commit 4d1139baae8bc4fff3728d1d204bdb04c13dbe10)
Signed-off-by: Joseph Salisbury
---
 mm/page_alloc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 33355028122a..174bcc23d5fd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6419,19 +6419,17 @@ static void __build_all_zonelists(void *data)
 	unsigned long flags;
 
 	/*
-	 * Explicitly disable this CPU's interrupts before taking seqlock
-	 * to prevent any IRQ handler from calling into the page allocator
-	 * (e.g. GFP_ATOMIC) that could hit zonelist_iter_begin and livelock.
+	 * The zonelist_update_seq must be acquired with irqsave because the
+	 * reader can be invoked from IRQ with GFP_ATOMIC.
 	 */
-	local_irq_save(flags);
+	write_seqlock_irqsave(&zonelist_update_seq, flags);
 	/*
-	 * Explicitly disable this CPU's synchronous printk() before taking
-	 * seqlock to prevent any printk() from trying to hold port->lock, for
+	 * Also disable synchronous printk() to prevent any printk() from
+	 * trying to hold port->lock, for
 	 * tty_insert_flip_string_and_push_buffer() on other CPU might be
 	 * calling kmalloc(GFP_ATOMIC | __GFP_NOWARN) with port->lock held.
 	 */
 	printk_deferred_enter();
-	write_seqlock(&zonelist_update_seq);
 
 #ifdef CONFIG_NUMA
 	memset(node_load, 0, sizeof(node_load));
@@ -6464,9 +6462,8 @@ static void __build_all_zonelists(void *data)
 #endif
 	}
 
-	write_sequnlock(&zonelist_update_seq);
 	printk_deferred_exit();
-	local_irq_restore(flags);
+	write_sequnlock_irqrestore(&zonelist_update_seq, flags);
 }
 
 static noinline void __init

From patchwork Wed Oct 18 19:48:30 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 735397
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Subject: [PATCH RT 09/12] bpf: Remove in_atomic() from bpf_link_put().
Date: Wed, 18 Oct 2023 15:48:30 -0400
Message-Id: <20231018194833.651674-10-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Sebastian Andrzej Siewior

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

bpf_free_inode() is invoked as an RCU callback.
Usually RCU callbacks are invoked within softirq context. By setting
the rcutree.use_softirq=0 boot option, the RCU callbacks will instead
be invoked in a per-CPU kthread with bottom halves disabled, which
implies an RCU read section.

On PREEMPT_RT the context remains fully preemptible. The RCU read
section however does not allow schedule() invocation. The latter
happens in mutex_lock() performed by bpf_trampoline_unlink_prog(),
originating from bpf_link_put().

It was pointed out that the bpf_link_put() invocation should not be
delayed if it originates from close(). It was also pointed out that
other invocations from within a syscall should also avoid the
workqueue. Everyone else should use the workqueue by default to remain
safe in the future (while auditing the code, every caller was
preemptible except for the RCU case).

Let bpf_link_put() use the worker unconditionally. Add
bpf_link_put_direct() which will directly free the resources and is
used by close() and from within __sys_bpf().

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Andrii Nakryiko
Link: https://lore.kernel.org/bpf/20230614083430.oENawF8f@linutronix.de
(cherry picked from commit ab5d47bd41b1db82c295b0e751e2b822b43a4b5a)
Signed-off-by: Joseph Salisbury
---
 kernel/bpf/syscall.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ad41b8230780..bcc01f9881cf 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2454,27 +2454,30 @@ static void bpf_link_put_deferred(struct work_struct *work)
 	bpf_link_free(link);
 }
 
-/* bpf_link_put can be called from atomic context, but ensures that resources
- * are freed from process context
+/* bpf_link_put might be called from atomic context. It needs to be called
+ * from sleepable context in order to acquire sleeping locks during the process.
 */
 void bpf_link_put(struct bpf_link *link)
 {
 	if (!atomic64_dec_and_test(&link->refcnt))
 		return;
 
-	if (in_atomic()) {
-		INIT_WORK(&link->work, bpf_link_put_deferred);
-		schedule_work(&link->work);
-	} else {
-		bpf_link_free(link);
-	}
+	INIT_WORK(&link->work, bpf_link_put_deferred);
+	schedule_work(&link->work);
+}
+
+static void bpf_link_put_direct(struct bpf_link *link)
+{
+	if (!atomic64_dec_and_test(&link->refcnt))
+		return;
+	bpf_link_free(link);
 }
 
 static int bpf_link_release(struct inode *inode, struct file *filp)
 {
 	struct bpf_link *link = filp->private_data;
 
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return 0;
 }
 
@@ -4351,7 +4354,7 @@ static int link_update(union bpf_attr *attr)
 	if (ret)
 		bpf_prog_put(new_prog);
 out_put_link:
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return ret;
 }
 
@@ -4374,7 +4377,7 @@ static int link_detach(union bpf_attr *attr)
 	else
 		ret = -EOPNOTSUPP;
 
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 	return ret;
 }
 
@@ -4425,7 +4428,7 @@ static int bpf_link_get_fd_by_id(const union bpf_attr *attr)
 	fd = bpf_link_new_fd(link);
 	if (fd < 0)
-		bpf_link_put(link);
+		bpf_link_put_direct(link);
 
 	return fd;
 }
@@ -4502,7 +4505,7 @@ static int bpf_iter_create(union bpf_attr *attr)
 		return PTR_ERR(link);
 
 	err = bpf_iter_new_fd(link);
-	bpf_link_put(link);
+	bpf_link_put_direct(link);
 
 	return err;
 }

From patchwork Wed Oct 18 19:48:31 2023
X-Patchwork-Submitter: Joseph Salisbury
X-Patchwork-Id: 736130
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: syzbot+5c54bd3eb218bb595aa9@syzkaller.appspotmail.com, Dmitry Vyukov,
    Frederic Weisbecker
Subject: [PATCH RT 10/12] posix-timers: Ensure timer ID search-loop limit is valid
Date: Wed, 18 Oct 2023 15:48:31 -0400
Message-Id: <20231018194833.651674-11-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Thomas Gleixner

v5.15.133-rt70-rc1 stable review patch.
If anyone has any objections, please let me know.

-----------

posix_timer_add() tries to allocate a posix timer ID by starting from
the cached ID which was stored by the last successful allocation.

This is done in a loop searching the ID space for a free slot one by
one. The loop has to terminate when the search wrapped around to the
starting point.

But that's racy vs. establishing the starting point. That is read out
lockless, which leads to the following problem:

	CPU0				CPU1
	posix_timer_add()
	  start = sig->posix_timer_id;
	  lock(hash_lock);
	  ...				posix_timer_add()
	  if (++sig->posix_timer_id < 0)
					  start = sig->posix_timer_id;
	    sig->posix_timer_id = 0;

So CPU1 can observe a negative start value, i.e. -1, and the loop break
never happens because the condition can never be true:

	if (sig->posix_timer_id == start)
		break;

While this is unlikely to ever turn into an endless loop as the ID
space is huge (INT_MAX), the racy read of the start value caught the
attention of KCSAN and Dmitry unearthed that incorrectness.

Rewrite it so that all id operations are under the hash lock.
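
A worked note on the clamp used in the hunk below (assuming 32-bit int
and the unsigned next_posix_timer_id introduced by the patch):

	(5u + 1)       & INT_MAX  ==  6		/* normal increment   */
	(INT_MAX + 1u) & INT_MAX  ==  0		/* wraps back to zero */

so the cached ID always stays in [0, INT_MAX], and the new loop bound
of cnt <= INT_MAX probes every possible ID exactly once before giving
up with -EAGAIN.
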
From patchwork Wed Oct 18 19:48:32 2023
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Cc: Tvrtko Ursulin, Chris Wilson, Paul Gortmaker
Subject: [PATCH RT 11/12] drm/i915: Do not disable preemption for resets
Date: Wed, 18 Oct 2023 15:48:32 -0400
Message-Id: <20231018194833.651674-12-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

From: Tvrtko Ursulin

v5.15.133-rt70-rc1 stable review patch. If anyone has any objections, please let me know.
-----------

[commit 40cd2835ced288789a685aa4aa7bc04b492dcd45 in linux-rt-devel]

Commit ade8a0f59844 ("drm/i915: Make all GPU resets atomic") added a
preempt-disable section around the hardware reset callback to prepare the
driver for being able to reset from atomic contexts.

In retrospect, the work at the time was about removing struct_mutex from
the reset path. The code base also briefly entertained the idea of doing
the reset under stop_machine in order to serialize userspace mmap and a
temporary glitch in the fence registers (see eb8d0f5af4ec ("drm/i915:
Remove GPU reset dependence on struct_mutex")), but that never
materialized and was soon removed in 2caffbf11762 ("drm/i915: Revoke
mmaps and prevent access to fence registers across reset") and replaced
with an SRCU based solution.

As such, as far as I can see, today we still have a requirement that
resets must not sleep (they are invoked from submission tasklets), but no
need to support invoking them from a truly atomic context.

Given that the preemption section is problematic on RT kernels, since the
uncore lock becomes a sleeping lock and is therefore invalid in such a
section, let's try to remove it.

The potential downside is that our short waits on the GPU to complete the
reset may get extended if CPU scheduling interferes, but in practice that
probably isn't a deal breaker.

In terms of mechanics, since the preemption-disabled block is being
removed we just need to replace a few of the wait_for_atomic macros with
busy-looping versions which will work (and not complain) when called from
non-atomic sections.

Signed-off-by: Tvrtko Ursulin
Cc: Chris Wilson
Cc: Paul Gortmaker
Cc: Sebastian Andrzej Siewior
Acked-by: Sebastian Andrzej Siewior
Link: https://lore.kernel.org/r/20230705093025.3689748-1-tvrtko.ursulin@linux.intel.com
Signed-off-by: Sebastian Andrzej Siewior
[PG: backport from v6.4-rt; minor context fixup caused by b7d70b8b06ed]
Signed-off-by: Paul Gortmaker
Signed-off-by: Clark Williams
(cherry picked from commit 1a80b572f783a15327663bf9e7d71163976e8d6a v6.1-rt)
Signed-off-by: Joseph Salisbury
---
 drivers/gpu/drm/i915/gt/intel_reset.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 9dc244b70ce4..06ab730dc9a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -167,13 +167,13 @@ static int i915_do_reset(struct intel_gt *gt,
 	/* Assert reset for at least 20 usec, and wait for acknowledgement. */
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
 	udelay(50);
-	err = wait_for_atomic(i915_in_reset(pdev), 50);
+	err = _wait_for_atomic(i915_in_reset(pdev), 50, 0);
 
 	/* Clear the reset request. */
 	pci_write_config_byte(pdev, I915_GDRST, 0);
 	udelay(50);
 	if (!err)
-		err = wait_for_atomic(!i915_in_reset(pdev), 50);
+		err = _wait_for_atomic(!i915_in_reset(pdev), 50, 0);
 
 	return err;
 }
@@ -193,7 +193,7 @@ static int g33_do_reset(struct intel_gt *gt,
 	struct pci_dev *pdev = to_pci_dev(gt->i915->drm.dev);
 
 	pci_write_config_byte(pdev, I915_GDRST, GRDOM_RESET_ENABLE);
-	return wait_for_atomic(g4x_reset_complete(pdev), 50);
+	return _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 }
 
 static int g4x_do_reset(struct intel_gt *gt,
@@ -210,7 +210,7 @@ static int g4x_do_reset(struct intel_gt *gt,
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_MEDIA | GRDOM_RESET_ENABLE);
-	ret = wait_for_atomic(g4x_reset_complete(pdev), 50);
+	ret = _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 	if (ret) {
 		GT_TRACE(gt, "Wait for media reset failed\n");
 		goto out;
@@ -218,7 +218,7 @@ static int g4x_do_reset(struct intel_gt *gt,
 
 	pci_write_config_byte(pdev, I915_GDRST,
 			      GRDOM_RENDER | GRDOM_RESET_ENABLE);
-	ret = wait_for_atomic(g4x_reset_complete(pdev), 50);
+	ret = _wait_for_atomic(g4x_reset_complete(pdev), 50, 0);
 	if (ret) {
 		GT_TRACE(gt, "Wait for render reset failed\n");
 		goto out;
@@ -736,9 +736,7 @@ int __intel_gt_reset(struct intel_gt *gt, intel_engine_mask_t engine_mask)
 	intel_uncore_forcewake_get(gt->uncore, FORCEWAKE_ALL);
 	for (retry = 0; ret == -ETIMEDOUT && retry < retries; retry++) {
 		GT_TRACE(gt, "engine_mask=%x\n", engine_mask);
-		preempt_disable();
 		ret = reset(gt, engine_mask, retry);
-		preempt_enable();
 	}
 	intel_uncore_forcewake_put(gt->uncore, FORCEWAKE_ALL);
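The _wait_for_atomic(COND, US, 0) form used above polls the condition in a tight loop until it becomes true or the timeout expires, without sleeping, so per the commit message it works (and does not complain) whether or not the caller is in an atomic section. A rough userspace sketch of that busy-wait-with-timeout shape, not the macro's actual implementation (CLOCK_MONOTONIC stands in for the kernel's clock, and hw_done is a hypothetical status flag):

#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static volatile int hw_done;	/* hypothetical device status flag */

static bool ready(void)
{
	return hw_done != 0;
}

static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/*
 * Spin on ready() for up to timeout_us microseconds without ever
 * sleeping; recheck once after the deadline so a late success still
 * counts. Returns 0 on success, -ETIMEDOUT otherwise.
 */
static int busy_wait_us(int64_t timeout_us)
{
	int64_t deadline = now_ns() + timeout_us * 1000;

	for (;;) {
		if (ready())
			return 0;
		if (now_ns() > deadline)
			return ready() ? 0 : -ETIMEDOUT;
	}
}

int main(void)
{
	hw_done = 1;			/* pretend the reset completed */
	return busy_wait_us(50);	/* 0: condition already true */
}

Because the loop never schedules away, it stays safe inside submission tasklets, while on a preemptible kernel a wait may simply stretch if the CPU is taken away, which is the trade-off the commit message accepts.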
From patchwork Wed Oct 18 19:48:33 2023
From: Joseph Salisbury
To: LKML, linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
    John Kacur, Sebastian Andrzej Siewior, Daniel Wagner, Tom Zanussi,
    Clark Williams, Pavel Machek, Joseph Salisbury
Subject: [PATCH RT 12/12] Linux 5.15.133-rt70
Date: Wed, 18 Oct 2023 15:48:33 -0400
Message-Id: <20231018194833.651674-13-joseph.salisbury@canonical.com>
In-Reply-To: <20231018194833.651674-1-joseph.salisbury@canonical.com>
References: <20231018194833.651674-1-joseph.salisbury@canonical.com>

v5.15.133-rt70-rc1 stable review patch. If anyone has any objections, please let me know.

-----------

Signed-off-by: Joseph Salisbury
---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index 65189810797f..f36b5d418dd8 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt69
+-rt70