diff mbox series

[RFC,v3,2/5] cpuidle: Add Cpufreq Active Stats calls tracking idle entry/exit

Message ID 20220406220809.22555-3-lukasz.luba@arm.com
State New
Headers show
Series Introduce Cpufreq Active Stats | expand

Commit Message

Lukasz Luba April 6, 2022, 10:08 p.m. UTC
The Cpufreq Active Stats tracks and accounts the activity of the CPU
for each performance level. It accounts the real residency, when the CPU
was not idle, at a given performance level. This patch adds needed calls
which provide the CPU idle entry/exit events to the Cpufreq Active Stats.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 drivers/cpuidle/cpuidle.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Artem Bityutskiy April 26, 2022, 12:05 p.m. UTC | #1
Hi Lukasz,

On Wed, 2022-04-06 at 23:08 +0100, Lukasz Luba wrote:
> @@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct
> cpuidle_driver *drv,
>         trace_cpu_idle(index, dev->cpu);
>         time_start = ns_to_ktime(local_clock());
>  
> +       cpufreq_active_stats_cpu_idle_enter(time_start);
> +
>         stop_critical_timings();
>         if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>                 rcu_idle_enter();
> @@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct
> cpuidle_driver *drv,
>         time_end = ns_to_ktime(local_clock());
>         trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>  
> +       cpufreq_active_stats_cpu_idle_exit(time_end);
> +

At this point the interrupts are still disabled, and they get enabled later. So
the more code you add here and the longer it executes, the longer you delay the
interrupts. Therefore, you are effectively increasing IRQ latency from idle by
adding more code here.

How much? I do not know, depends on how much code you need to execute. But the
amount of code in functions like this tends to increase over time.

So the risk is that we'll keep making 'cpufreq_active_stats_cpu_idle_exit()',
and (may be unintentionally) increase idle interrupt latency.

This is not ideal.

We use the 'wult' tool (https://github.com/intel/wult) to measure C-states
latency and interrupt latency on Intel platforms, and for fast C-states like
Intel C1, we can see that even the current code between C-state exit and
interrupt re-enabled adds measurable overhead.

I am worried about adding more stuff here.

Please, consider getting the stats after interrupts are re-enabled. You may lose
some "precision" because of that, but it is probably overall better that adding
to idle interrupt latency.

>         /* The cpu is no longer idle or about to enter idle. */
>         sched_idle_set_state(NULL);
Lukasz Luba April 26, 2022, 3:01 p.m. UTC | #2
Hi Artem,

Thanks for comments!

On 4/26/22 13:05, Artem Bityutskiy wrote:
> Hi Lukasz,
> 
> On Wed, 2022-04-06 at 23:08 +0100, Lukasz Luba wrote:
>> @@ -231,6 +232,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct
>> cpuidle_driver *drv,
>>          trace_cpu_idle(index, dev->cpu);
>>          time_start = ns_to_ktime(local_clock());
>>   
>> +       cpufreq_active_stats_cpu_idle_enter(time_start);
>> +
>>          stop_critical_timings();
>>          if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
>>                  rcu_idle_enter();
>> @@ -243,6 +246,8 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct
>> cpuidle_driver *drv,
>>          time_end = ns_to_ktime(local_clock());
>>          trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
>>   
>> +       cpufreq_active_stats_cpu_idle_exit(time_end);
>> +
> 
> At this point the interrupts are still disabled, and they get enabled later. So
> the more code you add here and the longer it executes, the longer you delay the
> interrupts. Therefore, you are effectively increasing IRQ latency from idle by
> adding more code here.

Good point, I've added it just below the trace_cpu_idle().

> 
> How much? I do not know, depends on how much code you need to execute. But the
> amount of code in functions like this tends to increase over time.
> 
> So the risk is that we'll keep making 'cpufreq_active_stats_cpu_idle_exit()',
> and (may be unintentionally) increase idle interrupt latency.
> 
> This is not ideal.

I agree,  I will try to find a better place to put this call.

> 
> We use the 'wult' tool (https://github.com/intel/wult) to measure C-states
> latency and interrupt latency on Intel platforms, and for fast C-states like
> Intel C1, we can see that even the current code between C-state exit and
> interrupt re-enabled adds measurable overhead.

Thanks for the hint and the link. I'll check that tool. I don't know if
that would work with my platforms. I might create some tests for this
latency measurements.

> 
> I am worried about adding more stuff here.
> 
> Please, consider getting the stats after interrupts are re-enabled. You may lose
> some "precision" because of that, but it is probably overall better that adding
> to idle interrupt latency.

Definitely. I don't need such precision, so later when interrupts are
re-enabled is OK for me.

> 
>>          /* The cpu is no longer idle or about to enter idle. */
>>          sched_idle_set_state(NULL);
> 
> 

This new call might be empty for your x86 kernels, since probably
you set the CONFIG_CPU_FREQ_STAT.I can add additional config
so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this
new feature and additional overhead in idle exit when e.g.
CONFIG_CPU_FREQ_ACTIVE_STAT is not set.

The x86 platforms won't use IPA governor, so it's reasonable to
do this way.

Does this sounds good?

Regards,
Lukasz
Artem Bityutskiy April 26, 2022, 4:29 p.m. UTC | #3
On Tue, 2022-04-26 at 16:01 +0100, Lukasz Luba wrote:
> > I am worried about adding more stuff here.
> > 
> > Please, consider getting the stats after interrupts are re-enabled. You may
> > lose
> > some "precision" because of that, but it is probably overall better that
> > adding
> > to idle interrupt latency.
> 
> Definitely. I don't need such precision, so later when interrupts are
> re-enabled is OK for me.

Thanks. That is preferable in general: we do not do things with interrupts
disabled unless there is a very good reason to.

> 
> This new call might be empty for your x86 kernels, since probably
> you set the CONFIG_CPU_FREQ_STAT.I can add additional config
> so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this
> new feature and additional overhead in idle exit when e.g.
> CONFIG_CPU_FREQ_ACTIVE_STAT is not set.
> 
> The x86 platforms won't use IPA governor, so it's reasonable to
> do this way.
> 
> Does this sounds good?

I did not thoroughly read your patches so can't comment on the details.

Just pointing that in general idle path is to be considered the critical path,
especially the part before interrupts are re-enabled. Not only on x86,
but on all platforms using cpuidle. This does not mean we can't read more
statistics there, but it does mean that we should be very careful about added
overhead, keep it under control, etc.

Thank you!
Lukasz Luba April 27, 2022, 1:58 p.m. UTC | #4
On 4/26/22 17:29, Artem Bityutskiy wrote:
> On Tue, 2022-04-26 at 16:01 +0100, Lukasz Luba wrote:
>>> I am worried about adding more stuff here.
>>>
>>> Please, consider getting the stats after interrupts are re-enabled. You may
>>> lose
>>> some "precision" because of that, but it is probably overall better that
>>> adding
>>> to idle interrupt latency.
>>
>> Definitely. I don't need such precision, so later when interrupts are
>> re-enabled is OK for me.
> 
> Thanks. That is preferable in general: we do not do things with interrupts
> disabled unless there is a very good reason to.
> 
>>
>> This new call might be empty for your x86 kernels, since probably
>> you set the CONFIG_CPU_FREQ_STAT.I can add additional config
>> so platforms might still have CONFIG_CPU_FREQ_STAT but avoid this
>> new feature and additional overhead in idle exit when e.g.
>> CONFIG_CPU_FREQ_ACTIVE_STAT is not set.
>>
>> The x86 platforms won't use IPA governor, so it's reasonable to
>> do this way.
>>
>> Does this sounds good?
> 
> I did not thoroughly read your patches so can't comment on the details.
> 
> Just pointing that in general idle path is to be considered the critical path,
> especially the part before interrupts are re-enabled. Not only on x86,
> but on all platforms using cpuidle. This does not mean we can't read more
> statistics there, but it does mean that we should be very careful about added
> overhead, keep it under control, etc.

I totally agree with you. I didn't know that the interrupts were not
enabled at that moment. I'll address it.

Regards,
Lukasz
diff mbox series

Patch

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ef2ea1b12cd8..f19711b95afb 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -16,6 +16,7 @@ 
 #include <linux/notifier.h>
 #include <linux/pm_qos.h>
 #include <linux/cpu.h>
+#include <linux/cpufreq_stats.h>
 #include <linux/cpuidle.h>
 #include <linux/ktime.h>
 #include <linux/hrtimer.h>
@@ -231,6 +232,8 @@  int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	trace_cpu_idle(index, dev->cpu);
 	time_start = ns_to_ktime(local_clock());
 
+	cpufreq_active_stats_cpu_idle_enter(time_start);
+
 	stop_critical_timings();
 	if (!(target_state->flags & CPUIDLE_FLAG_RCU_IDLE))
 		rcu_idle_enter();
@@ -243,6 +246,8 @@  int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv,
 	time_end = ns_to_ktime(local_clock());
 	trace_cpu_idle(PWR_EVENT_EXIT, dev->cpu);
 
+	cpufreq_active_stats_cpu_idle_exit(time_end);
+
 	/* The cpu is no longer idle or about to enter idle. */
 	sched_idle_set_state(NULL);