diff mbox series

[1/2] PM: EM: Add min/max available performance state limits

Message ID 20240403162315.1458337-2-lukasz.luba@arm.com
State New
Headers show
Series Update Energy Model with perfromance limits | expand

Commit Message

Lukasz Luba April 3, 2024, 4:23 p.m. UTC
On some devices there are HW dependencies for shared frequency and voltage
between devices: CPUs and L3 cache. When the L3 cache frequency is
increased, the affected CPUs might run at higher voltage and frequency.
That higher voltage causes higher CPU power and thus more energy is used
for running the tasks.

Add performance state limits which are applied for the device. This allows
the Energy Model (EM) to better reflect the CPU's currently available
performance states. This information is used by Energy Aware Scheduler
(EAS) during task placement to avoid situation when a simulated energy
cost has error due to using wrong Power Domain (PD) frequency.

The function performs only bare minimum checks (and requires EM as
an argument) to reduce the overhead.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h | 22 ++++++++++++++---
 kernel/power/energy_model.c  | 48 ++++++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+), 4 deletions(-)

Comments

Hongyan Xia April 9, 2024, 2:47 p.m. UTC | #1
On 03/04/2024 17:23, Lukasz Luba wrote:
> [...]
>
> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 927cc55ba0b3d..1a8b394251cb2 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -628,6 +628,8 @@ int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
>   		goto unlock;
>   
>   	dev->em_pd->flags |= flags;
> +	dev->em_pd->min_ps = 0;
> +	dev->em_pd->max_ps = nr_states - 1;
>   
>   	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
>   
> @@ -856,3 +858,49 @@ int em_dev_update_chip_binning(struct device *dev)
>   	return em_recalc_and_update(dev, pd, em_table);
>   }
>   EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
> +
> +
> +/**
> + * em_update_performance_limits() - Update Energy Model with performance
> + *				limits information.
> + * @pd			: Performance Domain with EM that has to be updated.
> + * @freq_min_khz	: New minimum allowed frequency for this device.
> + * @freq_max_khz	: New maximum allowed frequency for this device.
> + *
> + * This function allows to update the EM with information about available
> + * performance levels. It takes the minimum and maximum frequency in kHz
> + * and does internal translation to performance levels.
> + * Returns 0 on success or -EINVAL when failed.
> + */
> +int em_update_performance_limits(struct em_perf_domain *pd,
> +		unsigned long freq_min_khz, unsigned long freq_max_khz)
> +{
> +	struct em_perf_state *table;
> +	int min_ps = -1;
> +	int max_ps = -1;
> +	int i;
> +
> +	if (!pd)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	table = em_perf_state_from_pd(pd);
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		if (freq_min_khz == table[i].frequency)
> +			min_ps = i;
> +		if (freq_max_khz == table[i].frequency)
> +			max_ps = i;
> +	}
> +	rcu_read_unlock();
> +
> +	/* Only update when both are found and sane */
> +	if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
> +		return -EINVAL;
> +
> +	pd->min_ps = min_ps;
> +	pd->max_ps = max_ps;

Are we sure we are protected against multiple simultaneous updates? Or 
is this a concern for the caller?

The rest of the patch LGTM.

> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(em_update_performance_limits);
Lukasz Luba April 22, 2024, 7:24 a.m. UTC | #2
On 4/9/24 15:47, Hongyan Xia wrote:
> On 03/04/2024 17:23, Lukasz Luba wrote:
>> [...]
>>
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index 927cc55ba0b3d..1a8b394251cb2 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -628,6 +628,8 @@ int em_dev_register_perf_domain(struct device 
>> *dev, unsigned int nr_states,
>>           goto unlock;
>>       dev->em_pd->flags |= flags;
>> +    dev->em_pd->min_ps = 0;
>> +    dev->em_pd->max_ps = nr_states - 1;
>>       em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
>> @@ -856,3 +858,49 @@ int em_dev_update_chip_binning(struct device *dev)
>>       return em_recalc_and_update(dev, pd, em_table);
>>   }
>>   EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
>> +
>> +
>> +/**
>> + * em_update_performance_limits() - Update Energy Model with performance
>> + *                limits information.
>> + * @pd            : Performance Domain with EM that has to be updated.
>> + * @freq_min_khz    : New minimum allowed frequency for this device.
>> + * @freq_max_khz    : New maximum allowed frequency for this device.
>> + *
>> + * This function allows to update the EM with information about 
>> available
>> + * performance levels. It takes the minimum and maximum frequency in kHz
>> + * and does internal translation to performance levels.
>> + * Returns 0 on success or -EINVAL when failed.
>> + */
>> +int em_update_performance_limits(struct em_perf_domain *pd,
>> +        unsigned long freq_min_khz, unsigned long freq_max_khz)
>> +{
>> +    struct em_perf_state *table;
>> +    int min_ps = -1;
>> +    int max_ps = -1;
>> +    int i;
>> +
>> +    if (!pd)
>> +        return -EINVAL;
>> +
>> +    rcu_read_lock();
>> +    table = em_perf_state_from_pd(pd);
>> +
>> +    for (i = 0; i < pd->nr_perf_states; i++) {
>> +        if (freq_min_khz == table[i].frequency)
>> +            min_ps = i;
>> +        if (freq_max_khz == table[i].frequency)
>> +            max_ps = i;
>> +    }
>> +    rcu_read_unlock();
>> +
>> +    /* Only update when both are found and sane */
>> +    if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
>> +        return -EINVAL;
>> +
>> +    pd->min_ps = min_ps;
>> +    pd->max_ps = max_ps;
> 
> Are we sure we are protected against multiple simultaneous updates? Or 
> is this a concern for the caller?
> 
> The rest of the patch LGTM.
> 

I've tried to make it running fast for only one caller. Although,
if someone would like to use it from many places then locking should be
handled under in function (and I will use the existing mutex for it).

I'll change it. Thanks for the review.
Dietmar Eggemann April 22, 2024, 7:46 a.m. UTC | #3
On 03/04/2024 18:23, Lukasz Luba wrote:
> On some devices there are HW dependencies for shared frequency and voltage
> between devices: CPUs and L3 cache. When the L3 cache frequency is
> increased, the affected CPUs might run at higher voltage and frequency.

IMHO, this is an example where the min Performance State (PS) makes
sense. But what's a use case for the max PS?

> That higher voltage causes higher CPU power and thus more energy is used
> for running the tasks.
> 
> Add performance state limits which are applied for the device. This allows

Regarding device, I thought that this is only applicable for device type
CPU?

> the Energy Model (EM) to better reflect the CPU's currently available
> performance states. This information is used by Energy Aware Scheduler
> (EAS) during task placement to avoid situation when a simulated energy
> cost has error due to using wrong Power Domain (PD) frequency.

Maybe better?

This is important on SoCs with HW dependencies mentioned above so that
the  Energy Aware Scheduler (EAS) does not use performance states
outside the valid min-max range for energy calculation.

> The function performs only bare minimum checks (and requires EM as

s/The function/em_update_performance_limits()

s/EM/PD ... I guess we always pass a PD pointer to all the EM functions.
I guess we can say that an EM consists of at least 2 PDs.
I guess it's valid to say that we limit per PD, e.g. per little CPUs?

[...]

>  /**
>   * em_pd_get_efficient_state() - Get an efficient performance state from the EM
> @@ -189,12 +195,13 @@ int em_dev_update_chip_binning(struct device *dev);
>   */

Missing  min_ps, max_ps description in function header of
em_pd_get_efficient_state().

[...]

> +/**
> + * em_update_performance_limits() - Update Energy Model with performance
> + *				limits information.
> + * @pd			: Performance Domain with EM that has to be updated.
> + * @freq_min_khz	: New minimum allowed frequency for this device.
> + * @freq_max_khz	: New maximum allowed frequency for this device.
> + *
> + * This function allows to update the EM with information about available
> + * performance levels. It takes the minimum and maximum frequency in kHz
> + * and does internal translation to performance levels.
> + * Returns 0 on success or -EINVAL when failed.
> + */
> +int em_update_performance_limits(struct em_perf_domain *pd,
> +		unsigned long freq_min_khz, unsigned long freq_max_khz)
> +{
> +	struct em_perf_state *table;
> +	int min_ps = -1;
> +	int max_ps = -1;
> +	int i;
> +
> +	if (!pd)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	table = em_perf_state_from_pd(pd);
> +
> +	for (i = 0; i < pd->nr_perf_states; i++) {
> +		if (freq_min_khz == table[i].frequency)

So the caller has to know the exact frequency value of the performance
states (PSs)? It's not 'f(PS_n-1) + 1 <= x <= f(PS_n)'.

[...]
diff mbox series

Patch

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index d30d67c2f07cf..feadd0fd6b356 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -55,6 +55,8 @@  struct em_perf_table {
  * struct em_perf_domain - Performance domain
  * @em_table:		Pointer to the runtime modifiable em_perf_table
  * @nr_perf_states:	Number of performance states
+ * @min_ps:		Minimum available performance state index
+ * @max_ps:		Maximum available performance state index
  * @flags:		See "em_perf_domain flags"
  * @cpus:		Cpumask covering the CPUs of the domain. It's here
  *			for performance reasons to avoid potential cache
@@ -70,6 +72,8 @@  struct em_perf_table {
 struct em_perf_domain {
 	struct em_perf_table __rcu *em_table;
 	int nr_perf_states;
+	int min_ps;
+	int max_ps;
 	unsigned long flags;
 	unsigned long cpus[];
 };
@@ -173,6 +177,8 @@  void em_table_free(struct em_perf_table __rcu *table);
 int em_dev_compute_costs(struct device *dev, struct em_perf_state *table,
 			 int nr_states);
 int em_dev_update_chip_binning(struct device *dev);
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz);
 
 /**
  * em_pd_get_efficient_state() - Get an efficient performance state from the EM
@@ -189,12 +195,13 @@  int em_dev_update_chip_binning(struct device *dev);
  */
 static inline int
 em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
-			  unsigned long max_util, unsigned long pd_flags)
+			  unsigned long max_util, unsigned long pd_flags,
+			  int min_ps, int max_ps)
 {
 	struct em_perf_state *ps;
 	int i;
 
-	for (i = 0; i < nr_perf_states; i++) {
+	for (i = min_ps; i <= max_ps; i++) {
 		ps = &table[i];
 		if (ps->performance >= max_util) {
 			if (pd_flags & EM_PERF_DOMAIN_SKIP_INEFFICIENCIES &&
@@ -204,7 +211,7 @@  em_pd_get_efficient_state(struct em_perf_state *table, int nr_perf_states,
 		}
 	}
 
-	return nr_perf_states - 1;
+	return max_ps;
 }
 
 /**
@@ -255,7 +262,8 @@  static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
 	 */
 	em_table = rcu_dereference(pd->em_table);
 	i = em_pd_get_efficient_state(em_table->state, pd->nr_perf_states,
-				      max_util, pd->flags);
+				      max_util, pd->flags, pd->min_ps,
+				      pd->max_ps);
 	ps = &em_table->state[i];
 
 	/*
@@ -392,6 +400,12 @@  static inline int em_dev_update_chip_binning(struct device *dev)
 {
 	return -EINVAL;
 }
+static inline
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 927cc55ba0b3d..1a8b394251cb2 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -628,6 +628,8 @@  int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 		goto unlock;
 
 	dev->em_pd->flags |= flags;
+	dev->em_pd->min_ps = 0;
+	dev->em_pd->max_ps = nr_states - 1;
 
 	em_cpufreq_update_efficiencies(dev, dev->em_pd->em_table->state);
 
@@ -856,3 +858,49 @@  int em_dev_update_chip_binning(struct device *dev)
 	return em_recalc_and_update(dev, pd, em_table);
 }
 EXPORT_SYMBOL_GPL(em_dev_update_chip_binning);
+
+
+/**
+ * em_update_performance_limits() - Update Energy Model with performance
+ *				limits information.
+ * @pd			: Performance Domain with EM that has to be updated.
+ * @freq_min_khz	: New minimum allowed frequency for this device.
+ * @freq_max_khz	: New maximum allowed frequency for this device.
+ *
+ * This function allows to update the EM with information about available
+ * performance levels. It takes the minimum and maximum frequency in kHz
+ * and does internal translation to performance levels.
+ * Returns 0 on success or -EINVAL when failed.
+ */
+int em_update_performance_limits(struct em_perf_domain *pd,
+		unsigned long freq_min_khz, unsigned long freq_max_khz)
+{
+	struct em_perf_state *table;
+	int min_ps = -1;
+	int max_ps = -1;
+	int i;
+
+	if (!pd)
+		return -EINVAL;
+
+	rcu_read_lock();
+	table = em_perf_state_from_pd(pd);
+
+	for (i = 0; i < pd->nr_perf_states; i++) {
+		if (freq_min_khz == table[i].frequency)
+			min_ps = i;
+		if (freq_max_khz == table[i].frequency)
+			max_ps = i;
+	}
+	rcu_read_unlock();
+
+	/* Only update when both are found and sane */
+	if (min_ps < 0 || max_ps < 0 || max_ps < min_ps)
+		return -EINVAL;
+
+	pd->min_ps = min_ps;
+	pd->max_ps = max_ps;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(em_update_performance_limits);