[v1,0/2] cpuidle: teo: Refine handling of short idle intervals

Message ID 4661520.LvFx2qVVIh@rjwysocki.net

Message

Rafael J. Wysocki April 3, 2025, 7:16 p.m. UTC
Hi Everyone,

This series is intended to address an issue with overly aggressive selection
of idle state 0 (the polling state) in teo on x86 in some cases when timer
wakeups dominate the CPU wakeup pattern.

In those cases, timer wakeups are not taken into account when they are
within the LATENCY_THRESHOLD_NS range and the idle state selection may
be based entirely on non-timer wakeups which may be rare.  This causes
the prediction accuracy to be low and too much energy may be used as
a result.

The first patch is preparatory and it is not expected to make any
functional difference.

The second patch causes teo to take timer wakeups into account if it
is about to skip the tick_nohz_get_sleep_length() invocation, so they
get a chance to influence the idle state selection.
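To illustrate the idea, here is a simplified userspace sketch; it is NOT the
actual kernel code (that lives in drivers/cpuidle/governors/teo.c), and the
helper names, the timers_dominate flag, and the threshold value are all
made up for illustration:

/*
 * Simplified userspace sketch of the idea behind patch 2/2; NOT the
 * actual kernel code.  The helper names and the threshold value below
 * are illustrative.
 */
#include <stdint.h>
#include <stdio.h>

/* Placeholder value; teo.c defines the real LATENCY_THRESHOLD_NS. */
#define LATENCY_THRESHOLD_NS 500000ULL

/* Hypothetical stand-in for the nearest known timer expiry. */
static uint64_t nearest_timer_event_ns(void)
{
	return 450000; /* e.g. a timer due in 450 us */
}

/*
 * The governor is about to skip the tick_nohz_get_sleep_length()
 * invocation.  Before the change, the idle-duration estimate came from
 * non-timer wakeup statistics alone; with the change, a timer due
 * within the latency threshold still gets a chance to set the
 * estimate, so rare non-timer wakeups no longer force the selection
 * of the polling state on their own.
 */
static uint64_t estimate_idle_ns(uint64_t nontimer_predicted_ns,
				 int timers_dominate)
{
	uint64_t timer_ns = nearest_timer_event_ns();

	if (timers_dominate && timer_ns <= LATENCY_THRESHOLD_NS)
		return timer_ns;

	return nontimer_predicted_ns;
}

int main(void)
{
	/* A prediction built from rare non-timer wakeups: very short. */
	uint64_t predicted = 20000; /* 20 us would select state 0 */

	printf("without the change: %llu ns\n",
	       (unsigned long long)estimate_idle_ns(predicted, 0));
	printf("with the change:    %llu ns\n",
	       (unsigned long long)estimate_idle_ns(predicted, 1));
	return 0;
}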

I have been using this series on my systems for several weeks and observed
a significant reduction of the polling state selection rate in multiple
workloads.
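For reference, one way to observe the polling state selection rate is to
sample the per-state "usage" counters that cpuidle exposes in sysfs.  The
minimal sketch below (cpu0 only, state count hardcoded) illustrates the
measurement; it is not part of the series:

#include <stdio.h>
#include <unistd.h>

#define NSTATES 4 /* adjust to the number of idle states on the machine */

/* Read how many times cpu0 has entered the given idle state. */
static long read_usage(int state)
{
	char path[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu0/cpuidle/state%d/usage", state);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(void)
{
	long before[NSTATES], after[NSTATES], total = 0;
	int i;

	for (i = 0; i < NSTATES; i++)
		before[i] = read_usage(i);
	sleep(10); /* run the workload of interest during this window */
	for (i = 0; i < NSTATES; i++) {
		after[i] = read_usage(i);
		total += after[i] - before[i];
	}
	for (i = 0; i < NSTATES; i++)
		printf("state%d: %.1f%% of idle entries\n", i,
		       total ? 100.0 * (after[i] - before[i]) / total : 0.0);
	return 0;
}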

Thanks!

Comments

Artem Bityutskiy April 9, 2025, 6:52 a.m. UTC | #1
On Thu, 2025-04-03 at 21:16 +0200, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This series is intended to address an issue with overly aggressive selection
> of idle state 0 (the polling state) in teo on x86 in some cases when timer
> wakeups dominate the CPU wakeup pattern.

Hi Rafael, I ran SPECjbb2015 with and without these two patches on a Granite
Rapids Xeon (GNR).

Expectation: no measurable difference, because SPECjbb2015 on GNR makes almost
no use of the POLL state.

Result: no measurable difference.

Conclusion: these 2 patches do not introduce a regression as measured by
SPECjbb2015 on GNR.

"No regression" is also a useful piece of information, so reporting.

Thanks, Artem.
Rafael J. Wysocki April 9, 2025, 2:41 p.m. UTC | #2
On Wed, Apr 9, 2025 at 4:36 PM Doug Smythies <dsmythies@telus.net> wrote:
>
> On 2025.04.03 12:16 Rafael J. Wysocki wrote:
>
> > Hi Everyone,
>
> Hi Rafael,
>
> > This series is intended to address an issue with overly aggressive selection
> > of idle state 0 (the polling state) in teo on x86 in some cases when timer
> > wakeups dominate the CPU wakeup pattern.
> >
> > In those cases, timer wakeups are not taken into account when they are
> > within the LATENCY_THRESHOLD_NS range and the idle state selection may
> > be based entirely on non-timer wakeups which may be rare.  This causes
> > the prediction accuracy to be low and too much energy may be used as
> > a result.
> >
> > The first patch is preparatory and it is not expected to make any
> > functional difference.
> >
> > The second patch causes teo to take timer wakeups into account if it
> > is about to skip the tick_nohz_get_sleep_length() invocation, so they
> > get a chance to influence the idle state selection.
> >
> > I have been using this series on my systems for several weeks and observed
> > a significant reduction of the polling state selection rate in multiple
> > workloads.
>
> I ran many tests on this patch set.
> In general, there is nothing significant to report.
>
> There seemed to be a little less power use for the adrestia test and it took a little longer to execute, but the average wakeup latency was the same.
>
> I am still having noise and repeatability issues with my main periodic tests, where CPU load is swept from low to high at several work/sleep frequencies.
> But I didn't observe anything significant.
>
> In order to use more shallow idle states with a periodic workflow, I launched 2000 threads, each at a 113 hertz work/sleep frequency with almost no work to do per work packet.
> The patched version used between 1 and 1.5 watts less processor package power, at around 85 watts.
> The patched version spent about 3.5% in idle state 0 versus about 5% for the unpatched version.
> The patched version spent about 31.8% in idle state 1 versus about 30.2% for the unpatched version.
>
> Test computer:
> Processor: Intel(R) Core(TM) i5-10600K CPU @ 4.10GHz
> Distro: Ubuntu 24.04.1, server, no desktop GUI.
> CPU frequency scaling driver: intel_pstate
> HWP: disabled.
> CPU frequency scaling governor: performance
> Idle driver: intel_idle
> Idle governor: teo
> Idle states: 4 (name: description):
>   state0/name:POLL     desc:CPUIDLE CORE POLL IDLE
>   state1/name:C1_ACPI  desc:ACPI FFH MWAIT 0x0
>   state2/name:C2_ACPI  desc:ACPI FFH MWAIT 0x30
>   state3/name:C3_ACPI  desc:ACPI FFH MWAIT 0x60
>
> ... Doug

Thank you!
Aboorva Devarajan April 14, 2025, 7:15 a.m. UTC | #3
On Thu, 2025-04-03 at 21:16 +0200, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This series is intended to address an issue with overly aggressive selection
> of idle state 0 (the polling state) in teo on x86 in some cases when timer
> wakeups dominate the CPU wakeup pattern.
> 
> In those cases, timer wakeups are not taken into account when they are
> within the LATENCY_THRESHOLD_NS range and the idle state selection may
> be based entirely on non-timer wakeups which may be rare.  This causes
> the prediction accuracy to be low and too much energy may be used as
> a result.
> 
> The first patch is preparatory and it is not expected to make any
> functional difference.
> 
> The second patch causes teo to take timer wakeups into account if it
> is about to skip the tick_nohz_get_sleep_length() invocation, so they
> get a chance to influence the idle state selection.
> 
> I have been using this series on my systems for several weeks and observed
> a significant reduction of the polling state selection rate in multiple
> workloads.
> 
> Thanks!
> 
> 

Hi Rafael,

I'm running some tests and going through the patch.
I haven't noticed any deviations so far and will post the results shortly.

Thanks,
Aboorva