diff mbox series

[1/1] cpuidle/menu: avoid prioritizing physical state over polling state

Message ID 20240809073120.250974-2-aboorvad@linux.ibm.com
State New
Headers show
Series cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state | expand

Commit Message

Aboorva Devarajan Aug. 9, 2024, 7:31 a.m. UTC
Update the cpuidle menu governor to avoid prioritizing physical states
over polling states when predicted idle duration is lesser than the
physical states target residency duration for performance gains.

Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
---
 drivers/cpuidle/governors/menu.c | 11 -----------
 1 file changed, 11 deletions(-)

Comments

Christian Loehle Aug. 13, 2024, 1:09 p.m. UTC | #1
On 8/9/24 08:31, Aboorva Devarajan wrote:
> Update the cpuidle menu governor to avoid prioritizing physical states
> over polling states when predicted idle duration is lesser than the
> physical states target residency duration for performance gains.
> 
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>  drivers/cpuidle/governors/menu.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index f3c9d49f0f2a..cf99ca103f9b 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  			idx = i; /* first enabled state */
>  
>  		if (s->target_residency_ns > predicted_ns) {
> -			/*
> -			 * Use a physical idle state, not busy polling, unless
> -			 * a timer is going to trigger soon enough.
> -			 */
> -			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->exit_latency_ns <= latency_req &&
> -			    s->target_residency_ns <= data->next_timer_ns) {
> -				predicted_ns = s->target_residency_ns;
> -				idx = i;
> -				break;
> -			}
>  			if (predicted_ns < TICK_NSEC)
>  				break;
>  


How about this?
data->next_timer_ns is set to KTIME_MAX in case the last few intervals were
very short, which might be the case for you.

-->8--

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f3c9d49f0f2a..77b40201c446 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -360,6 +360,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
                         */
                        if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
                            s->exit_latency_ns <= latency_req &&
+                           data->next_timer_ns != KTIME_MAX &&
                            s->target_residency_ns <= data->next_timer_ns) {
                                predicted_ns = s->target_residency_ns;
                                idx = i;
Christian Loehle Sept. 19, 2024, 4:08 p.m. UTC | #2
On 8/9/24 08:31, Aboorva Devarajan wrote:
> Update the cpuidle menu governor to avoid prioritizing physical states
> over polling states when predicted idle duration is lesser than the
> physical states target residency duration for performance gains.

I would use something like this as wording:

[PATCH] cpuidle: menu: Select polling on short predicted idle
Select a shallow state matching our predicted_ns even if it is a
polling one.

Additionally to querying the next timer event, menu also employs an
interval tracking strategy of most recent idle durations for workloads
that aren't woken up by predictable timer events. The logic predicts
the next wakeup as predicted_ns, but the logic handling that skipped
any polling states.
In the worst-case on POWER we might only have snooze as polling and
CEDE. This makes the entire logic around it a NOP as it never let's
predicted_ns choose an appropriate idle state since it requires a
non-polling one.
To actually enforce predicted_ns for idle state selection actually use
states even though they are polling ones.

Even for a perfect recent intervals of
[1, 1, 1, 1, 1, 1, 1, 1]
menu previously chose the first non-idle state.


-----

To mitigate potential side-effects since most platforms have a
shallower idle state we could add something based on that, but
really your patch would be the only consistent IMO.

I'm not quite sure why this was put there in the first place.
It was essentially introduced in
commit ("69d25870f20c cpuidle: fix the menu governor to boost IO performance")
with a 20us lower limit.

> 
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>  drivers/cpuidle/governors/menu.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index f3c9d49f0f2a..cf99ca103f9b 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  			idx = i; /* first enabled state */
>  
>  		if (s->target_residency_ns > predicted_ns) {
> -			/*
> -			 * Use a physical idle state, not busy polling, unless
> -			 * a timer is going to trigger soon enough.
> -			 */
> -			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->exit_latency_ns <= latency_req &&
> -			    s->target_residency_ns <= data->next_timer_ns) {
> -				predicted_ns = s->target_residency_ns;
> -				idx = i;
> -				break;
> -			}
>  			if (predicted_ns < TICK_NSEC)
>  				break;
>
Gautam Menghani Oct. 18, 2024, 10:36 a.m. UTC | #3
On Fri, Aug 09, 2024 at 01:01:20PM GMT, Aboorva Devarajan wrote:
> Update the cpuidle menu governor to avoid prioritizing physical states
> over polling states when predicted idle duration is lesser than the
> physical states target residency duration for performance gains.
> 
> Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com>
> ---
>  drivers/cpuidle/governors/menu.c | 11 -----------
>  1 file changed, 11 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
> index f3c9d49f0f2a..cf99ca103f9b 100644
> --- a/drivers/cpuidle/governors/menu.c
> +++ b/drivers/cpuidle/governors/menu.c
> @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  			idx = i; /* first enabled state */
>  
>  		if (s->target_residency_ns > predicted_ns) {
> -			/*
> -			 * Use a physical idle state, not busy polling, unless
> -			 * a timer is going to trigger soon enough.
> -			 */
> -			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
> -			    s->exit_latency_ns <= latency_req &&
> -			    s->target_residency_ns <= data->next_timer_ns) {
> -				predicted_ns = s->target_residency_ns;
> -				idx = i;
> -				break;
> -			}
>  			if (predicted_ns < TICK_NSEC)
>  				break;
>  
> -- 
> 2.39.3
> 

I tried to identify the number of times we enter the snooze and CEDE states
on a pseries machine with and without this patch applied while running
the daytrader database benchmark[1]. The results show that the number of
times we enter CEDE goes down considerably:

// Current upstream 6.11

| STATE | NO. OF TIMES ENTERED 	| TOTAL TIME SPENT IN STATE (US) | AVG TIME SPENT IN STATE (US)	|
|Snooze | 6,389		        | 221,592			 | 34.68			|
| CEDE	| 160,368		| 13,288,855			 | 82.86			|


// Current upstream 6.11 - with above patch applied

| STATE | NO. OF TIMES ENTERED 	| TOTAL TIME SPENT IN STATE (US) | AVG TIME SPENT IN STATE (US)	|
|Snooze | 140,267	        | 6,400,073			 | 45.63			|
| CEDE	| 50,884		| 6,553,641			 | 128.80			|

Above data was gathered with the help of power:cpu_idle tracepoint, and
was recorded for 45 secs while the daytrader benchmark was running.

[1] : https://github.com/WASdev/sample.daytrader7

Thanks,
Gautam
diff mbox series

Patch

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index f3c9d49f0f2a..cf99ca103f9b 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -354,17 +354,6 @@  static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			idx = i; /* first enabled state */
 
 		if (s->target_residency_ns > predicted_ns) {
-			/*
-			 * Use a physical idle state, not busy polling, unless
-			 * a timer is going to trigger soon enough.
-			 */
-			if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) &&
-			    s->exit_latency_ns <= latency_req &&
-			    s->target_residency_ns <= data->next_timer_ns) {
-				predicted_ns = s->target_residency_ns;
-				idx = i;
-				break;
-			}
 			if (predicted_ns < TICK_NSEC)
 				break;