softirq: wake up ktimer thread in softirq context

Message ID 20221208075604.811710-1-junxiao.chang@intel.com
State New

Commit Message

Junxiao Chang Dec. 8, 2022, 7:56 a.m. UTC
Occasionally a timer interrupt might be triggered in softirq context.
On an RT kernel the ktimers thread should be woken up in that case, or
else it might stay asleep although the timer interrupt has been
triggered.

This change fixes a latency issue where the timer handler is delayed by
more than 4ms in a network-related test.

Fixes: 2165d27554e8 ("softirq: Use a dedicated thread for timer wakeups.")
Reported-by: Peh, Hock Zhang <hock.zhang.peh@intel.com>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
---
 kernel/softirq.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

Comments

Junxiao Chang Dec. 20, 2022, 10:44 a.m. UTC | #1
Any comments? This patch is for 6.1-rt; the issue can be reproduced with the 5.19-rt kernel as well.

The issue is easier to reproduce under a heavy network workload, which generates a lot of softirq events. If the hrtimer interrupt is triggered in softirq context, the current RT kernel does not wake up the ktimers thread that handles the hrtimer event, because in __irq_exit_rcu() "in_interrupt()" is true:

static inline void __irq_exit_rcu(void)
{
...
        preempt_count_sub(HARDIRQ_OFFSET);
        if (!in_interrupt()) {
                if (local_softirq_pending())
                        invoke_softirq();

                if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers())
                        wake_timersd();
        }
...
}

The ktimers thread then stays asleep, and the hrtimer function is not called although the hrtimer interrupt has been triggered. The ktimers thread might only be woken up by the next timer interrupt, which introduces a long delay.

Any comments are welcome.

Regards,
Junxiao

Alison Chaiken Dec. 20, 2022, 5:10 p.m. UTC | #2
On Tue, Dec 20, 2022 at 2:44 AM Chang, Junxiao <junxiao.chang@intel.com> wrote:
> Any comment? This patch is for 6.1-rt, issue could be reproduced with 5.19-rt kernel as well.

In

https://lore.kernel.org/linux-rt-users/CAFzL-7v-NSFKAsyhxReEES7bMomSTwYyrZscnjbkydEP3CTXmQ@mail.gmail.com/

we reported an occasional problem with an x86 system entering a deep C
state even though timers were pending.   Perhaps your patch would
prevent this transition.

> This issue is easier to reproduce when there is heavy network workload which introduces a lot of softirq events. If hrtimer interrupt is triggered in softirq context, with current RT kernel, it will not wake up ktimers thread which handles hrtimer event because in function __irq_exit_rcu, "in_interrupt()" is true:
>
> static inline void __irq_exit_rcu(void)
> {
> ...
>         preempt_count_sub(HARDIRQ_OFFSET);
>         if (!in_interrupt()) {
>                 if (local_softirq_pending())
>                         invoke_softirq();
>
>                 if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers())
>                         wake_timersd();
>         }
> ...
> }
>

Isn't removing the check for IS_ENABLED(CONFIG_PREEMPT_RT) inadvisable?

> Then ktimers threads stays in sleep state, hrtimer function will not be called although hrtimer interrupt has been triggered. Ktimers thread might be woken up in next timer interrupt which introduces long delay.
>
> Any comments are welcome.
>
> Regards,
> Junxiao

-- Alison Chaiken
Aurora Innovation


Sebastian Andrzej Siewior Dec. 20, 2022, 6:02 p.m. UTC | #3
On 2022-12-20 10:44:07 [+0000], Chang, Junxiao wrote:
> Any comment? This patch is for 6.1-rt, issue could be reproduced with 5.19-rt kernel as well.

Thanks for the ping. I did see the initial email and I didn't get to it
yet. I need to re-test, confirm and then apply.
The ktimer patch is not in v5.15 and this is currently the latest one
maintained by the stable team. I don't know which one will be the
following LTS kernel but this one needs to have this addressed. The
v5.19 is not receiving any updates.
Given the current timing, I will look into this in January.

> Regards,
> Junxiao

Sebastian
Alison Chaiken Dec. 20, 2022, 7:28 p.m. UTC | #4
On 2022-12-20 10:44:07 [+0000], Chang, Junxiao wrote:
> > Any comment? This patch is for 6.1-rt, issue could be reproduced with 5.19-rt kernel as well.

On Tue, Dec 20, 2022 at 10:03 AM bigeasy@linutronix.de
<bigeasy@linutronix.de> wrote:
> Thanks for the ping. I did see the initial email and I didn't get to it
> yet. I need to re-test, confirm and then apply.
> The ktimer patch is not in v5.15 and this is currently the latest one
> maintained by the stable team. I don't know which one will be the
> following LTS kernel but this one needs to have this addressed. The
> v5.19 is not receiving any updates.

We backported the timersd patch to 5.15, which is where we have
observed the C-state transition despite pending timers. The
reported trace sequence does show that __do_softirq() starts just
before the hrtimer_interrupt when the problem occurs:

When the timers are delayed, the trouble appears to begin when the
hrtimer_interrupt results in execution of hrtimer_run_queues() instead
of raise_hrtimer_softirq():
     <userspace>-13534   [007] 16947.069504: enqueue_hrtimer
          <idle>-0       [007] 16947.069547: hrtimer_next_event_without: 16947.067643167
          <idle>-0       [007] 16947.069553: enqueue_hrtimer
          <idle>-0       [007] 16947.069567: enqueue_hrtimer
          <idle>-0       [007] 16947.069575: hrtimer_next_event_without: 16947.067643167
          <idle>-0       [007] 16947.069579: enqueue_hrtimer
          <idle>-0       [007] 16947.078270: hrtimer_interrupt
          <idle>-0       [007] 16947.078278: hrtimer_run_queues.
          <idle>-0       [007] 16947.078300: enqueue_hrtimer
       ktimers/7-80      [007] 16947.078308: __do_softirq
     ksoftirqd/7-81      [007] 16947.078338: ksoftirqd_should_run 0
          <idle>-0       [007] 16947.078361: hrtimer_next_event_without: 16947.067643167
          <idle>-0       [007] 16947.079323: hrtimer_interrupt
          <idle>-0       [007] 16947.079328: hrtimer_run_queues.
          <idle>-0       [007] 16947.079334: enqueue_hrtimer
     ksoftirqd/7-81      [007] 16947.079359: ksoftirqd_should_run 128
     ksoftirqd/7-81      [007] 16947.079360: __do_softirq
     ksoftirqd/7-81      [007] 16947.079361: hrtimer_interrupt
     ksoftirqd/7-81      [007] 16947.079361: raise_hrtimer_softirq.
     ksoftirqd/7-81      [007] 16947.079364: ksoftirqd_should_run 0
          <idle>-0       [007] 16947.079375: hrtimer_next_event_without: 9223372036854775807
          <idle>-0       [007] 16947.079376: tick_nohz_get_sleep_length: 86.838 ms
          <idle>-0       [007] 16947.079378: enqueue_hrtimer

Note the bogus value printed by hrtimer_next_event_without().

> Given the current timing, I will look into this in January.
>
> > Regards,
> > Junxiao
>
> Sebastian

-- Alison Chaiken
Aurora Innovation
Alison Chaiken Jan. 23, 2023, 6:04 p.m. UTC | #5
On Tue, Dec 20, 2022 at 10:03 AM bigeasy@linutronix.de
<bigeasy@linutronix.de> wrote:
>
> On 2022-12-20 10:44:07 [+0000], Chang, Junxiao wrote:
> > Any comment? This patch is for 6.1-rt, issue could be reproduced with 5.19-rt kernel as well.
>
> Thanks for the ping. I did see the initial email and I didn't get to it
> yet. I need to re-test, confirm and then apply.
> The ktimer patch is not in v5.15 and this is currently the latest one
> maintained by the stable team. I don't know which one will be the
> following LTS kernel but this one needs to have this addressed. The
> v5.19 is not receiving any updates.
> Given the current timing, I will look into this in January.
>
> > Regards,
> > Junxiao
>
> Sebastian

Any further thoughts about Junxiao Chang's patch? We backported the
ktimer patches from Sebastian and Frederic to 5.15 and would like to
apply this fix if it is sensible.

Thanks,
Alison Chaiken
Aurora Innovation
Alison Chaiken Feb. 10, 2023, 3:32 p.m. UTC | #6
On Tue, Dec 20, 2022 at 10:03 AM bigeasy@linutronix.de
<bigeasy@linutronix.de> wrote:
>
> On 2022-12-20 10:44:07 [+0000], Chang, Junxiao wrote:
> > Any comment? This patch is for 6.1-rt, issue could be reproduced with 5.19-rt kernel as well.
>
> Thanks for the ping. I did see the initial email and I didn't get to it
> yet. I need to re-test, confirm and then apply.
> The ktimer patch is not in v5.15 and this is currently the latest one
> maintained by the stable team. I don't know which one will be the
> following LTS kernel but this one needs to have this addressed. The
> v5.19 is not receiving any updates.
> Given the current timing, I will look into this in January.
>
> > Regards,
> > Junxiao
>
> Sebastian

Should users of ktimers adopt Junxiao Chang's patch

https://lore.kernel.org/linux-rt-users/BN9PR11MB5370BA8A506EB8519DC879C1ECEA9@BN9PR11MB5370.namprd11.prod.outlook.com/

from 20 Dec?   It appears to be a bug fix.

Thanks,
Alison Chaiken
Aurora Innovation
Sebastian Andrzej Siewior Feb. 10, 2023, 4:33 p.m. UTC | #7
On 2023-02-10 07:32:26 [-0800], Alison Chaiken wrote:
> 
> Should users of ktimers adopt Junxiao Chang's patch
> 
> https://lore.kernel.org/linux-rt-users/BN9PR11MB5370BA8A506EB8519DC879C1ECEA9@BN9PR11MB5370.namprd11.prod.outlook.com/
> 
> from 20 Dec?   It appears to be a bug fix.

I know. I have a backlog. I looked at this a while ago but got
distracted. I hopefully get it done next week. Now that 6.1 is LTS I
need to bring it in shape before I can hand it over.

> Thanks,
> Alison Chaiken
> Aurora Innovation

Sebastian
Sebastian Andrzej Siewior Feb. 17, 2023, 5:30 p.m. UTC | #8
On 2022-12-08 15:56:04 [+0800], Junxiao Chang wrote:
> Occasionally timer interrupt might be triggered in softirq context,
> ktimer thread should be woken up with RT kernel, or else ktimer
> thread might stay in sleep state although timer interrupt has been
> triggered.
> 
> This change fixes a latency issue that timer handler is delayed for
> more than 4ms in network related test.

Sorry for keeping you waiting. Your observation and patch are correct. I'm
going to apply a slightly modified version of the patch (see below)
after I have reworded the commit message on Monday.

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ab1fe34326bab..82f3e68fbe220 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -664,13 +664,12 @@ static inline void __irq_exit_rcu(void)
 #endif
 	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt()) {
-		if (local_softirq_pending())
-			invoke_softirq();
+	if (!in_interrupt() && local_softirq_pending())
+		invoke_softirq();
 
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers())
-			wake_timersd();
-	}
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers() &&
+	    !(in_nmi() | in_hardirq()))
+		wake_timersd();
 
 	tick_irq_exit();
 }

Sebastian
Junxiao Chang Feb. 20, 2023, 8:55 a.m. UTC | #9
Thank you for handling this, modified patch looks good to me.

Regards,
Junxiao

Sebastian Andrzej Siewior Feb. 20, 2023, 10:53 a.m. UTC | #10
------->8------

From: Junxiao Chang <junxiao.chang@intel.com>
Date: Mon, 20 Feb 2023 09:12:20 +0100
Subject: [PATCH] softirq: Wake ktimers thread also in softirq.

If the hrtimer is raised while a softirq is processed then it does not
wake the corresponding ktimers thread. This is due to the optimisation in the
irq-exit path which is also used to wake the ktimers thread. For the other
softirqs, this is okay because the additional softirq bits will be handled by
the currently running softirq handler.
The timer related softirq bits are added to a different variable and rely on
the ktimers thread.
As a consequence the wake up of ktimersd is delayed until the next timer tick.

Always wake the ktimers thread if a timer related softirq is pending.

Reported-by: Peh, Hock Zhang <hock.zhang.peh@intel.com>
Signed-off-by: Junxiao Chang <junxiao.chang@intel.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
 kernel/softirq.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -664,13 +664,12 @@ static inline void __irq_exit_rcu(void)
 #endif
 	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt()) {
-		if (local_softirq_pending())
-			invoke_softirq();
+	if (!in_interrupt() && local_softirq_pending())
+		invoke_softirq();
 
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers())
-			wake_timersd();
-	}
+	if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers() &&
+	    !(in_nmi() | in_hardirq()))
+		wake_timersd();
 
 	tick_irq_exit();
 }

Patch

diff --git a/kernel/softirq.c b/kernel/softirq.c
index ab1fe34326bab..34ae39e4a3d10 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -664,13 +664,10 @@  static inline void __irq_exit_rcu(void)
 #endif
 	account_hardirq_exit(current);
 	preempt_count_sub(HARDIRQ_OFFSET);
-	if (!in_interrupt()) {
-		if (local_softirq_pending())
-			invoke_softirq();
-
-		if (IS_ENABLED(CONFIG_PREEMPT_RT) && local_pending_timers())
-			wake_timersd();
-	}
+	if (!in_interrupt() && local_softirq_pending())
+		invoke_softirq();
+	if (!(in_nmi() || in_hardirq()) && local_pending_timers())
+		wake_timersd();
 
 	tick_irq_exit();
 }