From patchwork Mon Oct 31 15:50:04 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 620402
From: Sebastian Andrzej Siewior
To: Daniel Wagner
Cc: linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
 John Kacur, Tom Zanussi, Clark Williams, Pavel Machek
Subject: [PATCH RT 1/3] timers: Prepare support for PREEMPT_RT
Date: Mon, 31 Oct 2022 16:50:04 +0100
Message-Id: <20221031155006.1651995-2-bigeasy@linutronix.de>
In-Reply-To: <20221031155006.1651995-1-bigeasy@linutronix.de>
References: <20221024105416.nflnrqhmzsyqqdzz@carbon.lan>
 <20221031155006.1651995-1-bigeasy@linutronix.de>
X-Mailing-List: linux-rt-users@vger.kernel.org

From: Anna-Maria Gleixner

Upstream commit 030dcdd197d77374879bb5603d091eee7d8aba80

When PREEMPT_RT is enabled, the soft interrupt thread can be preempted. If
the soft interrupt thread is preempted in the middle of a timer callback,
then calling del_timer_sync() can lead to two issues:

  - If the caller is on a remote CPU then it has to spin wait for the timer
    handler to complete. This can result in unbounded priority inversion.

  - If the caller originates from the task which preempted the timer
    handler on the same CPU, then spin waiting for the timer handler to
    complete is never going to end.

To avoid these issues, add a new lock to the timer base which is held
around the execution of the timer callbacks. If del_timer_sync() detects
that the timer callback is currently running, it blocks on the expiry
lock. When the callback is finished, the expiry lock is dropped by the
softirq thread which wakes up the waiter and the system makes progress.
This addresses both the priority inversion and the livelock issues.

This mechanism is not used for timers which are marked IRQSAFE as for
those preemption is disabled across the callback and therefore this
situation cannot happen.
The callbacks for such timers need to be individually audited for RT
compliance.

The same issue can happen in virtual machines when the vCPU which runs a
timer callback is scheduled out. If a second vCPU of the same guest calls
del_timer_sync() it will spin wait for the other vCPU to be scheduled back
in. The expiry lock mechanism would avoid that. It'd be trivial to enable
this when paravirt spinlocks are enabled in a guest, but it's not clear
whether this is an actual problem in the wild, so for now it's an RT only
mechanism.

As the softirq thread can be preempted with PREEMPT_RT=y, the SMP variant
of del_timer_sync() needs to be used on UP as well.

[ tglx: Refactored it for mainline ]

Signed-off-by: Anna-Maria Gleixner
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Acked-by: Peter Zijlstra (Intel)
Link: https://lkml.kernel.org/r/20190726185753.832418500@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/time/timer.c | 144 +++++++++++++++++++++++++++++---------------
 1 file changed, 95 insertions(+), 49 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 0a6d60b3e67cd..b859ecf6424bd 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -198,7 +198,10 @@ EXPORT_SYMBOL(jiffies_64);
 struct timer_base {
 	raw_spinlock_t		lock;
 	struct timer_list	*running_timer;
+#ifdef CONFIG_PREEMPT_RT
 	spinlock_t		expiry_lock;
+	atomic_t		timer_waiters;
+#endif
 	unsigned long		clk;
 	unsigned long		next_expiry;
 	unsigned int		cpu;
@@ -1227,25 +1230,6 @@ int del_timer(struct timer_list *timer)
 }
 EXPORT_SYMBOL(del_timer);
 
-static int __try_to_del_timer_sync(struct timer_list *timer,
-				   struct timer_base **basep)
-{
-	struct timer_base *base;
-	unsigned long flags;
-	int ret = -1;
-
-	debug_assert_init(timer);
-
-	*basep = base = lock_timer_base(timer, &flags);
-
-	if (base->running_timer != timer)
-		ret = detach_if_pending(timer, base, true);
-
-	raw_spin_unlock_irqrestore(&base->lock, flags);
-
-	return ret;
-}
-
 /**
  * try_to_del_timer_sync - Try to deactivate a timer
  * @timer: timer to delete
@@ -1256,34 +1240,94 @@ static int __try_to_del_timer_sync(struct timer_list *timer,
 int try_to_del_timer_sync(struct timer_list *timer)
 {
 	struct timer_base *base;
+	unsigned long flags;
+	int ret = -1;
 
-	return __try_to_del_timer_sync(timer, &base);
+	debug_assert_init(timer);
+
+	base = lock_timer_base(timer, &flags);
+
+	if (base->running_timer != timer)
+		ret = detach_if_pending(timer, base, true);
+
+	raw_spin_unlock_irqrestore(&base->lock, flags);
+
+	return ret;
 }
 EXPORT_SYMBOL(try_to_del_timer_sync);
 
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
-static int __del_timer_sync(struct timer_list *timer)
+#ifdef CONFIG_PREEMPT_RT
+static __init void timer_base_init_expiry_lock(struct timer_base *base)
 {
-	struct timer_base *base;
-	int ret;
+	spin_lock_init(&base->expiry_lock);
+}
 
-	for (;;) {
-		ret = __try_to_del_timer_sync(timer, &base);
-		if (ret >= 0)
-			return ret;
+static inline void timer_base_lock_expiry(struct timer_base *base)
+{
+	spin_lock(&base->expiry_lock);
+}
 
-		if (READ_ONCE(timer->flags) & TIMER_IRQSAFE)
-			continue;
+static inline void timer_base_unlock_expiry(struct timer_base *base)
+{
+	spin_unlock(&base->expiry_lock);
+}
 
-		/*
-		 * When accessing the lock, timers of base are no longer expired
-		 * and so timer is no longer running.
-		 */
-		spin_lock(&base->expiry_lock);
+/*
+ * The counterpart to del_timer_wait_running().
+ *
+ * If there is a waiter for base->expiry_lock, then it was waiting for the
+ * timer callback to finish. Drop expiry_lock and reaquire it. That allows
+ * the waiter to acquire the lock and make progress.
+ */
+static void timer_sync_wait_running(struct timer_base *base)
+{
+	if (atomic_read(&base->timer_waiters)) {
 		spin_unlock(&base->expiry_lock);
+		spin_lock(&base->expiry_lock);
 	}
 }
 
+/*
+ * This function is called on PREEMPT_RT kernels when the fast path
+ * deletion of a timer failed because the timer callback function was
+ * running.
+ *
+ * This prevents priority inversion, if the softirq thread on a remote CPU
+ * got preempted, and it prevents a life lock when the task which tries to
+ * delete a timer preempted the softirq thread running the timer callback
+ * function.
+ */
+static void del_timer_wait_running(struct timer_list *timer)
+{
+	u32 tf;
+
+	tf = READ_ONCE(timer->flags);
+	if (!(tf & TIMER_MIGRATING)) {
+		struct timer_base *base = get_timer_base(tf);
+
+		/*
+		 * Mark the base as contended and grab the expiry lock,
+		 * which is held by the softirq across the timer
+		 * callback. Drop the lock immediately so the softirq can
+		 * expire the next timer. In theory the timer could already
+		 * be running again, but that's more than unlikely and just
+		 * causes another wait loop.
+		 */
+		atomic_inc(&base->timer_waiters);
+		spin_lock_bh(&base->expiry_lock);
+		atomic_dec(&base->timer_waiters);
+		spin_unlock_bh(&base->expiry_lock);
+	}
+}
+#else
+static inline void timer_base_init_expiry_lock(struct timer_base *base) { }
+static inline void timer_base_lock_expiry(struct timer_base *base) { }
+static inline void timer_base_unlock_expiry(struct timer_base *base) { }
+static inline void timer_sync_wait_running(struct timer_base *base) { }
+static inline void del_timer_wait_running(struct timer_list *timer) { }
+#endif
+
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT_FULL)
 /**
  * del_timer_sync - deactivate a timer and wait for the handler to finish.
  * @timer: the timer to be deactivated
@@ -1322,6 +1366,8 @@ static int __del_timer_sync(struct timer_list *timer)
  */
 int del_timer_sync(struct timer_list *timer)
 {
+	int ret;
+
 #ifdef CONFIG_LOCKDEP
 	unsigned long flags;
 
@@ -1339,14 +1385,17 @@ int del_timer_sync(struct timer_list *timer)
 	 * could lead to deadlock.
 	 */
 	WARN_ON(in_irq() && !(timer->flags & TIMER_IRQSAFE));
-	/*
-	 * Must be able to sleep on PREEMPT_RT because of the slowpath in
-	 * __del_timer_sync().
-	 */
-	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(timer->flags & TIMER_IRQSAFE))
-		might_sleep();
 
-	return __del_timer_sync(timer);
+	do {
+		ret = try_to_del_timer_sync(timer);
+
+		if (unlikely(ret < 0)) {
+			del_timer_wait_running(timer);
+			cpu_relax();
+		}
+	} while (ret < 0);
+
+	return ret;
 }
 EXPORT_SYMBOL(del_timer_sync);
 #endif
@@ -1410,15 +1459,12 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 			raw_spin_unlock(&base->lock);
 			call_timer_fn(timer, fn);
 			base->running_timer = NULL;
-			spin_unlock(&base->expiry_lock);
-			spin_lock(&base->expiry_lock);
 			raw_spin_lock(&base->lock);
 		} else {
 			raw_spin_unlock_irq(&base->lock);
 			call_timer_fn(timer, fn);
 			base->running_timer = NULL;
-			spin_unlock(&base->expiry_lock);
-			spin_lock(&base->expiry_lock);
+			timer_sync_wait_running(base);
 			raw_spin_lock_irq(&base->lock);
 		}
 	}
@@ -1715,7 +1761,7 @@ static inline void __run_timers(struct timer_base *base)
 	if (!time_after_eq(jiffies, base->clk))
 		return;
 
-	spin_lock(&base->expiry_lock);
+	timer_base_lock_expiry(base);
 	raw_spin_lock_irq(&base->lock);
 
 	/*
@@ -1743,7 +1789,7 @@ static inline void __run_timers(struct timer_base *base)
 			expire_timers(base, heads + levels);
 	}
 	raw_spin_unlock_irq(&base->lock);
-	spin_unlock(&base->expiry_lock);
+	timer_base_unlock_expiry(base);
 }
 
 /*
@@ -1990,7 +2036,7 @@ static void __init init_timer_cpu(int cpu)
 		base->cpu = cpu;
 		raw_spin_lock_init(&base->lock);
 		base->clk = jiffies;
-		spin_lock_init(&base->expiry_lock);
+		timer_base_init_expiry_lock(base);
 	}
 }
 

From patchwork Mon Oct 31 15:50:05 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 620403
From: Sebastian Andrzej Siewior
To: Daniel Wagner
Cc: linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
 John Kacur, Tom Zanussi, Clark Williams, Pavel Machek
Subject: [PATCH RT 2/3] timers: Move clearing of base::timer_running under
 base:: Lock
Date: Mon, 31 Oct 2022 16:50:05 +0100
Message-Id: <20221031155006.1651995-3-bigeasy@linutronix.de>
In-Reply-To: <20221031155006.1651995-1-bigeasy@linutronix.de>
References: <20221024105416.nflnrqhmzsyqqdzz@carbon.lan>
 <20221031155006.1651995-1-bigeasy@linutronix.de>
X-Mailing-List: linux-rt-users@vger.kernel.org

From: Thomas Gleixner

Upstream commit bb7262b295472eb6858b5c49893954794027cd84

syzbot reported KCSAN data races vs. timer_base::running_timer being set to
NULL without holding base::lock in expire_timers().

This looks innocent and most reads are clearly not problematic, but
Frederic identified an issue which is:

 int data = 0;

 void timer_func(struct timer_list *t)
 {
    data = 1;
 }

 CPU 0                                            CPU 1
 ------------------------------                   --------------------------
 base = lock_timer_base(timer, &flags);           raw_spin_unlock(&base->lock);
 if (base->running_timer != timer)                call_timer_fn(timer, fn, baseclk);
   ret = detach_if_pending(timer, base, true);    base->running_timer = NULL;
 raw_spin_unlock_irqrestore(&base->lock, flags);  raw_spin_lock(&base->lock);

 x = data;

If the timer has previously executed on CPU 1, then CPU 0 can observe
base->running_timer == NULL and return, assuming the timer has completed.
Not all architectures guarantee that the callback's store to data is
visible to CPU 0 at that point, although the comment for del_timer_sync()
makes that guarantee. Moving the assignment under base->lock prevents this.
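
As an aside for readers of this backport, the guarantee can be illustrated
outside the kernel. The following is only a minimal userspace sketch,
assuming POSIX threads; toy_base, toy_expire(), toy_del_sync() and
toy_race_demo() are hypothetical names, not kernel interfaces. Because the
expiry side clears "running" while holding the lock, a deleter which
observes running == false under the same lock is also guaranteed to
observe the callback's store to data.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct timer_base. */
struct toy_base {
	pthread_mutex_t lock;	/* plays the role of base->lock */
	bool running;		/* plays the role of base->running_timer */
	atomic_int started;	/* demo only: set once "running" was set */
	int data;		/* written by the "timer callback" */
};

/* Expiry side: same ordering as expire_timers() after this change. */
static void *toy_expire(void *arg)
{
	struct toy_base *base = arg;

	pthread_mutex_lock(&base->lock);
	base->running = true;
	pthread_mutex_unlock(&base->lock);
	atomic_store(&base->started, 1);

	base->data = 1;			/* the timer callback body */

	pthread_mutex_lock(&base->lock);
	base->running = false;		/* cleared while holding the lock */
	pthread_mutex_unlock(&base->lock);
	return NULL;
}

/* Deleter side: retry until the callback is no longer marked running. */
static void toy_del_sync(struct toy_base *base)
{
	bool running;

	do {
		pthread_mutex_lock(&base->lock);
		running = base->running;
		pthread_mutex_unlock(&base->lock);
		if (running)
			sched_yield();
	} while (running);
}

/* Returns the value of data seen after toy_del_sync(), or -1 on error. */
int toy_race_demo(void)
{
	struct toy_base base = { .lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t thread;
	int x;

	if (pthread_create(&thread, NULL, toy_expire, &base))
		return -1;

	/* Make sure the "callback" is marked running before deleting. */
	while (!atomic_load(&base.started))
		sched_yield();

	toy_del_sync(&base);
	x = base.data;
	assert(x == 1);		/* ordered by the unlock/lock hand-over */

	pthread_join(thread, NULL);
	return x;
}
```

If the store to "running" in toy_expire() were moved in front of the second
pthread_mutex_lock(), the sketch would have the same shape as the code
before this change, and the read of data would no longer be ordered by the
lock.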

For non-RT kernels it is completely irrelevant performance-wise whether the
store happens before or after taking the lock. For an RT kernel moving the
store under the lock requires an extra unlock/lock pair in the case that
there is a waiter for the timer, but that's not the end of the world.

Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com
Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com
Fixes: 030dcdd197d7 ("timers: Prepare support for PREEMPT_RT")
Signed-off-by: Thomas Gleixner
Tested-by: Sebastian Andrzej Siewior
Link: https://lore.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/time/timer.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index b859ecf6424bd..603985720f547 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1282,8 +1282,10 @@ static inline void timer_base_unlock_expiry(struct timer_base *base)
 static void timer_sync_wait_running(struct timer_base *base)
 {
 	if (atomic_read(&base->timer_waiters)) {
+		raw_spin_unlock_irq(&base->lock);
 		spin_unlock(&base->expiry_lock);
 		spin_lock(&base->expiry_lock);
+		raw_spin_lock_irq(&base->lock);
 	}
 }
 
@@ -1458,14 +1460,14 @@ static void expire_timers(struct timer_base *base, struct hlist_head *head)
 		if (timer->flags & TIMER_IRQSAFE) {
 			raw_spin_unlock(&base->lock);
 			call_timer_fn(timer, fn);
-			base->running_timer = NULL;
 			raw_spin_lock(&base->lock);
+			base->running_timer = NULL;
 		} else {
 			raw_spin_unlock_irq(&base->lock);
 			call_timer_fn(timer, fn);
+			raw_spin_lock_irq(&base->lock);
 			base->running_timer = NULL;
 			timer_sync_wait_running(base);
-			raw_spin_lock_irq(&base->lock);
 		}
 	}
 }

From patchwork Mon Oct 31 15:50:06 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Sebastian Andrzej Siewior
X-Patchwork-Id: 620815
From: Sebastian Andrzej Siewior
To: Daniel Wagner
Cc: linux-rt-users, Steven Rostedt, Thomas Gleixner, Carsten Emde,
 John Kacur, Tom Zanussi, Clark Williams, Pavel Machek
Subject: [PATCH RT 3/3] timers: Don't block on ->expiry_lock for
 TIMER_IRQSAFE timers
Date: Mon, 31 Oct 2022 16:50:06 +0100
Message-Id: <20221031155006.1651995-4-bigeasy@linutronix.de>
In-Reply-To: <20221031155006.1651995-1-bigeasy@linutronix.de>
References: <20221024105416.nflnrqhmzsyqqdzz@carbon.lan>
 <20221031155006.1651995-1-bigeasy@linutronix.de>
X-Mailing-List: linux-rt-users@vger.kernel.org

Upstream commit c725dafc95f1b37027840aaeaa8b7e4e9cd20516

PREEMPT_RT does not spin and wait until a running timer completes its
callback but instead it blocks on a sleeping lock to prevent a livelock in
the case that the task waiting for the callback completion preempted the
callback.

This cannot be done for timers flagged with TIMER_IRQSAFE. These timers can
be canceled from an interrupt disabled context even on RT kernels.

The expiry callback of such timers is invoked with interrupts disabled so
there is no need to use the expiry lock mechanism because obviously the
callback cannot be preempted even on RT kernels.

Do not use the timer_base::expiry_lock mechanism when waiting for a running
callback to complete if the timer is flagged with TIMER_IRQSAFE.

Also add a lockdep assertion for RT kernels to validate that the expiry
lock mechanism is always invoked in preemptible context.
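
For readers of the backport, the decision which this change makes in
del_timer_wait_running() can be sketched in plain C. This is only an
illustration under stated assumptions: the TOY_ flag values and
toy_may_block_on_expiry_lock() are hypothetical stand-ins, not the kernel's
definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration; the real flags are defined in
 * include/linux/timer.h and use different bits. */
#define TOY_TIMER_MIGRATING	0x1u
#define TOY_TIMER_IRQSAFE	0x2u

/*
 * Returns true when the waiter may block on the expiry lock and false when
 * it has to fall back to the cpu_relax() retry loop in del_timer_sync().
 */
static bool toy_may_block_on_expiry_lock(uint32_t tf)
{
	return !(tf & (TOY_TIMER_MIGRATING | TOY_TIMER_IRQSAFE));
}

/* Small self-check covering all four flag combinations. */
int toy_flag_demo(void)
{
	assert(toy_may_block_on_expiry_lock(0));
	assert(!toy_may_block_on_expiry_lock(TOY_TIMER_MIGRATING));
	assert(!toy_may_block_on_expiry_lock(TOY_TIMER_IRQSAFE));
	assert(!toy_may_block_on_expiry_lock(TOY_TIMER_MIGRATING |
					     TOY_TIMER_IRQSAFE));
	return 0;
}
```

With this check a TIMER_IRQSAFE timer is treated like a migrating one:
del_timer_wait_running() returns without touching the expiry lock and
del_timer_sync() retries after cpu_relax().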

[ bigeasy: Dropping that lockdep_assert_preemption_enabled() check in
  backport ]

Reported-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/20201103190937.hga67rqhvknki3tp@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior
---
 kernel/time/timer.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index 603985720f547..8c7bfcee9609a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1304,7 +1304,7 @@ static void del_timer_wait_running(struct timer_list *timer)
 	u32 tf;
 
 	tf = READ_ONCE(timer->flags);
-	if (!(tf & TIMER_MIGRATING)) {
+	if (!(tf & (TIMER_MIGRATING | TIMER_IRQSAFE))) {
 		struct timer_base *base = get_timer_base(tf);
 
 		/*
@@ -1388,6 +1388,15 @@ int del_timer_sync(struct timer_list *timer)
 	 */
 	WARN_ON(in_irq() && !(timer->flags & TIMER_IRQSAFE));
 
+	/*
+	 * Must be able to sleep on PREEMPT_RT because of the slowpath in
+	 * del_timer_wait_running().
+	 */
+#if 0
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && !(timer->flags & TIMER_IRQSAFE))
+		lockdep_assert_preemption_enabled();
+#endif
+
 	do {
 		ret = try_to_del_timer_sync(timer);
 