Message ID | 2167194.irdbgypaU6@kreacher |
---|---|
State | New |
Headers | show |
Series | cpuidle: teo: Do not check timers unconditionally every time | expand |
Hi Rafael, On Thu, 3 Aug 2023, Rafael J. Wysocki wrote: > Index: linux-pm/drivers/cpuidle/governors/teo.c > =================================================================== > --- linux-pm.orig/drivers/cpuidle/governors/teo.c > +++ linux-pm/drivers/cpuidle/governors/teo.c > @@ -166,6 +166,12 @@ > */ > #define NR_RECENT 9 > > +/* > + * Idle state target residency threshold used for deciding whether or not to > + * check the time till the closest expected timer event. > + */ > +#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC) > + I would like to understand why this residency threshold is a fixed value and not related to TICK_NSEC. I'm sure there is a reason for it - but for me it is not obvious. Can you please explain it to me? Thanks, Anna-Maria
Hi, On Fri, Aug 11, 2023 at 10:52 AM Anna-Maria Behnsen <anna-maria@linutronix.de> wrote: > > Hi Rafael, > > On Thu, 3 Aug 2023, Rafael J. Wysocki wrote: > > > Index: linux-pm/drivers/cpuidle/governors/teo.c > > =================================================================== > > --- linux-pm.orig/drivers/cpuidle/governors/teo.c > > +++ linux-pm/drivers/cpuidle/governors/teo.c > > @@ -166,6 +166,12 @@ > > */ > > #define NR_RECENT 9 > > > > +/* > > + * Idle state target residency threshold used for deciding whether or not to > > + * check the time till the closest expected timer event. > > + */ > > +#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC) > > + > > I would like to understand why this residency threshold is a fixed value > and not related to TICK_NSEC. I'm sure there is a reason for it - but for > me it is not obvious. Can you please explain it to me? First off, I'm not convinced that there is any direct connection between the TICK_NSEC value and which idle states can be regarded as shallow. HZ may vary between 100 and 1000 (an order of magnitude) and this doesn't affect the target residencies of idle states in any way. Next, the checks involving this value don't influence the tick-stopping decision in any way, so I don't see a reason why they should depend on TICK_NSEC. Finally, it can be observed that ideally, the return value of tick_nohz_get_sleep_length() should always be taken into consideration, because it may influence the idle state selection in any case. However, if the target residency of the idle state selected so far is really small, calling it may be skipped in case it is costly, because its contribution is not likely to be significant. Worst-case we would end up with a CPU wakeup before the target residency of the selected idle state has elapsed, so some energy will be lost and some exit latency will be incurred in vain, so this really should be done when the numbers involved are small enough. Now, what does "small enough" mean? My answer is 15 us.
Index: linux-pm/drivers/cpuidle/governors/teo.c =================================================================== --- linux-pm.orig/drivers/cpuidle/governors/teo.c +++ linux-pm/drivers/cpuidle/governors/teo.c @@ -166,6 +166,12 @@ */ #define NR_RECENT 9 +/* + * Idle state target residency threshold used for deciding whether or not to + * check the time till the closest expected timer event. + */ +#define RESIDENCY_THRESHOLD_NS (15 * NSEC_PER_USEC) + /** * struct teo_bin - Metrics used by the TEO cpuidle governor. * @intercepts: The "intercepts" metric. @@ -542,6 +548,22 @@ static int teo_select(struct cpuidle_dri idx = i; } + /* + * Skip the timers check if state 0 is the current candidate one, + * because an immediate non-timer wakeup is expected in that case. + */ + if (!idx) + goto out_tick; + + /* + * If state 0 is a polling one, check if the target residency of + * the current candidate state is low enough and skip the timers + * check in that case too. + */ + if ((drv->states[0].flags & CPUIDLE_FLAG_POLLING) && + drv->states[idx].target_residency_ns < RESIDENCY_THRESHOLD_NS) + goto out_tick; + duration_ns = tick_nohz_get_sleep_length(&delta_tick); cpu_data->sleep_length_ns = duration_ns;