mbox series

[v4,0/2] Avoid spurious freezer wakeups

Message ID 20230908-avoid-spurious-freezer-wakeups-v4-0-6155aa3dafae@quicinc.com
Headers show
Series Avoid spurious freezer wakeups | expand

Message

Elliot Berman Sept. 8, 2023, 10:49 p.m. UTC
After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN  are
always woken up on the thaw path. Prior to that commit, tasks could ask
freezer to consider them "frozen enough" via freezer_do_not_count(). The
commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
allows freezer to immediately mark the task as TASK_FROZEN without
waking up the task.  This is efficient for the suspend path, but on the
thaw path, the task is always woken up even if the task didn't need to
wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
these tasks are capable of handling of the wakeup, we can observe a
power/perf impact from the extra wakeup.

We observed on Android many tasks wait in the TASK_FREEZABLE state
(particularly due to many of them being binder clients). We observed
nearly 4x the number of tasks and a corresponding linear increase in
latency and power consumption when thawing the system. The latency
increased from ~15ms to ~50ms.

Avoid the spurious wakeups by saving the state of TASK_FREEZABLE tasks.
If the task was running before entering TASK_FROZEN state
(__refrigerator()) or if the task received a wake up for the saved
state, then the task is woken on thaw. saved_state from PREEMPT_RT locks
can be re-used because freezer would not stomp on the rtlock wait flow:
TASK_RTLOCK_WAIT isn't considered freezable.

For testing purposes, I use these commands can help see how many tasks were
woken during thawing:

1. Setup:
   mkdir /sys/kernel/tracing/instances/freezer
   cd /sys/kernel/tracing/instances/freezer 
   echo 0 > tracing_on ; echo > trace
   echo power:suspend_resume > set_event
   echo 'enable_event:sched:sched_wakeup if action == "thaw_processes" && start == 1' > events/power/suspend_resume/trigger
   echo 'traceoff if action == "thaw_processes" && start == 0' > events/power/suspend_resume/trigger
   echo 1 > tracing_on

2. Let kernel go to suspend

3. After kernel's back up:
   cat /sys/kernel/tracing/instances/freezer/trace | grep sched_wakeup | grep -o "pid=[0-9]*" | sort -u | wc -l

Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
---
Changes in v4:
- Split into 2 commits
- Link to v3: https://lore.kernel.org/r/20230908-avoid-spurious-freezer-wakeups-v3-1-d49821fda04d@quicinc.com

Changes in v3:
- Remove #ifdeferry for saved_state in kernel/sched/core.c
- Link to v2: https://lore.kernel.org/r/20230830-avoid-spurious-freezer-wakeups-v2-1-8877245cdbdc@quicinc.com

Changes in v2:
- Rebase to v6.5 which includes commit 1c06918788e8 ("sched: Consider task_struct::saved_state in wait_task_inactive()")
  This allows us to remove the special jobctl handling on thaw.
- Link to v1: https://lore.kernel.org/r/20230828-avoid-spurious-freezer-wakeups-v1-1-8be8cf761472@quicinc.com

---
Elliot Berman (2):
      sched/core: Remove ifdeffery for saved_state
      freezer,sched: Use saved_state to reduce some spurious wakeups

 include/linux/sched.h |  2 --
 kernel/freezer.c      | 41 +++++++++++++++++++----------------------
 kernel/sched/core.c   | 40 +++++++++++++++++-----------------------
 3 files changed, 36 insertions(+), 47 deletions(-)
---
base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
change-id: 20230817-avoid-spurious-freezer-wakeups-9f8619680b3a

Best regards,

Comments

Peter Zijlstra Sept. 11, 2023, 3:49 p.m. UTC | #1
On Fri, Sep 08, 2023 at 03:49:14PM -0700, Elliot Berman wrote:

> Elliot Berman (2):
>       sched/core: Remove ifdeffery for saved_state
>       freezer,sched: Use saved_state to reduce some spurious wakeups

Thanks!
Carlos Llamas Sept. 26, 2023, 4:17 p.m. UTC | #2
On Fri, Sep 08, 2023 at 03:49:14PM -0700, Elliot Berman wrote:
> After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
> tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN  are
> always woken up on the thaw path. Prior to that commit, tasks could ask
> freezer to consider them "frozen enough" via freezer_do_not_count(). The
> commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
> allows freezer to immediately mark the task as TASK_FROZEN without
> waking up the task.  This is efficient for the suspend path, but on the
> thaw path, the task is always woken up even if the task didn't need to
> wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
> these tasks are capable of handling of the wakeup, we can observe a
> power/perf impact from the extra wakeup.

This issue is hurting the performance of our stable 6.1 releases. Does
it make sense to backport these patches into stable branches once they
land in mainline? I would assume we want to fix the perf regression
there too?

--
Carlos Llamas
Peter Zijlstra Sept. 26, 2023, 8:02 p.m. UTC | #3
On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
> On Fri, Sep 08, 2023 at 03:49:14PM -0700, Elliot Berman wrote:
> > After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
> > tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN  are
> > always woken up on the thaw path. Prior to that commit, tasks could ask
> > freezer to consider them "frozen enough" via freezer_do_not_count(). The
> > commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
> > allows freezer to immediately mark the task as TASK_FROZEN without
> > waking up the task.  This is efficient for the suspend path, but on the
> > thaw path, the task is always woken up even if the task didn't need to
> > wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
> > these tasks are capable of handling of the wakeup, we can observe a
> > power/perf impact from the extra wakeup.
> 
> This issue is hurting the performance of our stable 6.1 releases. Does
> it make sense to backport these patches into stable branches once they
> land in mainline? I would assume we want to fix the perf regression
> there too?

Note that these patches are in tip/sched/core, slated for the next merge
window.
Carlos Llamas Sept. 26, 2023, 8:56 p.m. UTC | #4
On Tue, Sep 26, 2023 at 10:02:38PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
> > 
> > This issue is hurting the performance of our stable 6.1 releases. Does
> > it make sense to backport these patches into stable branches once they
> > land in mainline? I would assume we want to fix the perf regression
> > there too?
> 
> Note that these patches are in tip/sched/core, slated for the next merge
> window.

We can wait, no problem. I just wanted to make sure we also patch stable
if needed. Elliot, would you be able to send a backport of your patches
to stable once they land in mainline on the next merge window?

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: stable@vger.kernel.org

--
Carlos Llamas
Elliot Berman Sept. 26, 2023, 10:04 p.m. UTC | #5
On 9/26/2023 1:56 PM, Carlos Llamas wrote:
> On Tue, Sep 26, 2023 at 10:02:38PM +0200, Peter Zijlstra wrote:
>> On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
>>>
>>> This issue is hurting the performance of our stable 6.1 releases. Does
>>> it make sense to backport these patches into stable branches once they
>>> land in mainline? I would assume we want to fix the perf regression
>>> there too?
>>
>> Note that these patches are in tip/sched/core, slated for the next merge
>> window.
> 
> We can wait, no problem. I just wanted to make sure we also patch stable
> if needed. Elliot, would you be able to send a backport of your patches
> to stable once they land in mainline on the next merge window?

Yep, happy to send it. There's a trivial conflict to resolve w/older
kernels not having the new guard(...)(...) macros.

> 
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: stable@vger.kernel.org
> 
> --
> Carlos Llamas
Elliot Berman Sept. 28, 2023, 4:24 p.m. UTC | #6
Hi Peter,

On 9/26/2023 1:02 PM, Peter Zijlstra wrote:
> On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
>> On Fri, Sep 08, 2023 at 03:49:14PM -0700, Elliot Berman wrote:
>>> After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
>>> tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN  are
>>> always woken up on the thaw path. Prior to that commit, tasks could ask
>>> freezer to consider them "frozen enough" via freezer_do_not_count(). The
>>> commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
>>> allows freezer to immediately mark the task as TASK_FROZEN without
>>> waking up the task.  This is efficient for the suspend path, but on the
>>> thaw path, the task is always woken up even if the task didn't need to
>>> wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
>>> these tasks are capable of handling of the wakeup, we can observe a
>>> power/perf impact from the extra wakeup.
>>
>> This issue is hurting the performance of our stable 6.1 releases. Does
>> it make sense to backport these patches into stable branches once they
>> land in mainline? I would assume we want to fix the perf regression
>> there too?
> 
> Note that these patches are in tip/sched/core, slated for the next merge
> window.

Can the changes be scheduled for the next 6.6-rc? I'd like to get the
changes backported to stable sooner since we observed the regression on
real systems.

Thanks,
Elliot
Ingo Molnar Oct. 3, 2023, 1:01 p.m. UTC | #7
* Elliot Berman <quic_eberman@quicinc.com> wrote:

> Hi Peter,
> 
> On 9/26/2023 1:02 PM, Peter Zijlstra wrote:
> > On Tue, Sep 26, 2023 at 04:17:33PM +0000, Carlos Llamas wrote:
> >> On Fri, Sep 08, 2023 at 03:49:14PM -0700, Elliot Berman wrote:
> >>> After commit f5d39b020809 ("freezer,sched: Rewrite core freezer logic"),
> >>> tasks that transition directly from TASK_FREEZABLE to TASK_FROZEN  are
> >>> always woken up on the thaw path. Prior to that commit, tasks could ask
> >>> freezer to consider them "frozen enough" via freezer_do_not_count(). The
> >>> commit replaced freezer_do_not_count() with a TASK_FREEZABLE state which
> >>> allows freezer to immediately mark the task as TASK_FROZEN without
> >>> waking up the task.  This is efficient for the suspend path, but on the
> >>> thaw path, the task is always woken up even if the task didn't need to
> >>> wake up and goes back to its TASK_(UN)INTERRUPTIBLE state. Although
> >>> these tasks are capable of handling of the wakeup, we can observe a
> >>> power/perf impact from the extra wakeup.
> >>
> >> This issue is hurting the performance of our stable 6.1 releases. Does
> >> it make sense to backport these patches into stable branches once they
> >> land in mainline? I would assume we want to fix the perf regression
> >> there too?
> > 
> > Note that these patches are in tip/sched/core, slated for the next merge
> > window.
> 
> Can the changes be scheduled for the next 6.6-rc? I'd like to get the
> changes backported to stable sooner since we observed the regression on
> real systems.

These are pretty risky and go beyond fixes for regressions introduced
recently: the original commit is more than a year old.

But I agree with having the fixes in stable once it hits upstream in the v6.7 
merge window - the difference would only be a couple of days vs. -final.

Thanks,

	Ingo