diff mbox series

[v5,11/23] PM: EM: Add API for updating the runtime modifiable EM

Message ID 20231129110853.94344-12-lukasz.luba@arm.com
State New
Headers show
Series Introduce runtime modifiable Energy Model | expand

Commit Message

Lukasz Luba Nov. 29, 2023, 11:08 a.m. UTC
Add API function em_dev_update_perf_domain() which allows to safely
change the EM. The concurrent modifiers are protected by the mutex
to serialize them. Removal of the old memory is asynchronous and
handled by the RCU mechanisms.

Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
---
 include/linux/energy_model.h |  8 +++++++
 kernel/power/energy_model.c  | 46 ++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+)

Comments

Dietmar Eggemann Dec. 12, 2023, 6:50 p.m. UTC | #1
On 29/11/2023 12:08, Lukasz Luba wrote:

[...]

> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
> index 489a358b9a00..614891fde8df 100644
> --- a/kernel/power/energy_model.c
> +++ b/kernel/power/energy_model.c
> @@ -221,6 +221,52 @@ static int em_allocate_perf_table(struct em_perf_domain *pd,
>  	return 0;
>  }
>  
> +/**
> + * em_dev_update_perf_domain() - Update runtime EM table for a device
> + * @dev		: Device for which the EM is to be updated
> + * @table	: The new EM table that is going to used from now

s/going to used/going to be used

> + *
> + * Update EM runtime modifiable table for the @dev using the privided @table.

s/privided/provided

> + *
> + * This function uses mutex to serialize writers, so it must not be called
> + * from non-sleeping context.
> + *
> + * Return 0 on success or a proper error in case of failure.
> + */
> +int em_dev_update_perf_domain(struct device *dev,
> +			      struct em_perf_table __rcu *new_table)
> +{
> +	struct em_perf_table __rcu *old_table;
> +	struct em_perf_domain *pd;
> +
> +	/*
> +	 * The lock serializes update and unregister code paths. When the
> +	 * EM has been unregistered in the meantime, we should capture that
> +	 * when entering this critical section. It also makes sure that

What do you want to capture here? You want to block in this moment,
right? Don't understand the 2. sentence here.

[...]
Lukasz Luba Dec. 20, 2023, 8:06 a.m. UTC | #2
On 12/12/23 18:50, Dietmar Eggemann wrote:
> On 29/11/2023 12:08, Lukasz Luba wrote:
> 
> [...]
> 
>> diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
>> index 489a358b9a00..614891fde8df 100644
>> --- a/kernel/power/energy_model.c
>> +++ b/kernel/power/energy_model.c
>> @@ -221,6 +221,52 @@ static int em_allocate_perf_table(struct em_perf_domain *pd,
>>   	return 0;
>>   }
>>   
>> +/**
>> + * em_dev_update_perf_domain() - Update runtime EM table for a device
>> + * @dev		: Device for which the EM is to be updated
>> + * @table	: The new EM table that is going to used from now
> 
> s/going to used/going to be used
> 
>> + *
>> + * Update EM runtime modifiable table for the @dev using the privided @table.
> 
> s/privided/provided
> 
>> + *
>> + * This function uses mutex to serialize writers, so it must not be called
>> + * from non-sleeping context.
>> + *
>> + * Return 0 on success or a proper error in case of failure.
>> + */
>> +int em_dev_update_perf_domain(struct device *dev,
>> +			      struct em_perf_table __rcu *new_table)
>> +{
>> +	struct em_perf_table __rcu *old_table;
>> +	struct em_perf_domain *pd;
>> +
>> +	/*
>> +	 * The lock serializes update and unregister code paths. When the
>> +	 * EM has been unregistered in the meantime, we should capture that
>> +	 * when entering this critical section. It also makes sure that
> 
> What do you want to capture here? You want to block in this moment,
> right? Don't understand the 2. sentence here.
> 
> [...]

There is general issue with module... they can reload. A driver which
registered EM can than later disappear. I had similar issues for the
devfreq cooling. It can happen at any time. In this scenario let's
consider scenario w/ 2 kernel drivers:
1. Main driver which registered EM, e.g. GPU driver
2. Thermal driver which updates that EM
When 1. starts unload process, it has to make sure that it will
not free the main EM 'pd', because the 2. might try to use e.g.
'pd->nr_perf_states' while doing update at the moment.
Thus, this 'pd' has local mutex, to avoid issues of
module unload vs. EM update. The EM unregister will block on
that mutex and let the background update finish it's critical
section.
Dietmar Eggemann Jan. 4, 2024, 3:45 p.m. UTC | #3
On 20/12/2023 09:06, Lukasz Luba wrote:
> 
> 
> On 12/12/23 18:50, Dietmar Eggemann wrote:
>> On 29/11/2023 12:08, Lukasz Luba wrote:

[...]

>>> +int em_dev_update_perf_domain(struct device *dev,
>>> +                  struct em_perf_table __rcu *new_table)
>>> +{
>>> +    struct em_perf_table __rcu *old_table;
>>> +    struct em_perf_domain *pd;
>>> +
>>> +    /*
>>> +     * The lock serializes update and unregister code paths. When the
>>> +     * EM has been unregistered in the meantime, we should capture that
>>> +     * when entering this critical section. It also makes sure that
>>
>> What do you want to capture here? You want to block in this moment,
>> right? Don't understand the 2. sentence here.
>>
>> [...]
> 
> There is general issue with module... they can reload. A driver which
> registered EM can than later disappear. I had similar issues for the
> devfreq cooling. It can happen at any time. In this scenario let's
> consider scenario w/ 2 kernel drivers:
> 1. Main driver which registered EM, e.g. GPU driver
> 2. Thermal driver which updates that EM
> When 1. starts unload process, it has to make sure that it will
> not free the main EM 'pd', because the 2. might try to use e.g.
> 'pd->nr_perf_states' while doing update at the moment.
> Thus, this 'pd' has local mutex, to avoid issues of
> module unload vs. EM update. The EM unregister will block on
> that mutex and let the background update finish it's critical
> section.

All true but wouldn't

    /* Serialize update/unregister or concurrent updates */

be sufficient as a comment here?
diff mbox series

Patch

diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
index e785211828fe..520a8c8ad849 100644
--- a/include/linux/energy_model.h
+++ b/include/linux/energy_model.h
@@ -183,6 +183,8 @@  struct em_data_callback {
 
 struct em_perf_domain *em_cpu_get(int cpu);
 struct em_perf_domain *em_pd_get(struct device *dev);
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table);
 int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states,
 				struct em_data_callback *cb, cpumask_t *span,
 				bool microwatts);
@@ -379,6 +381,12 @@  struct em_perf_table __rcu *em_allocate_table(struct em_perf_domain *pd)
 	return NULL;
 }
 static inline void em_free_table(struct em_perf_table __rcu *table) {}
+static inline
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table)
+{
+	return -EINVAL;
+}
 #endif
 
 #endif
diff --git a/kernel/power/energy_model.c b/kernel/power/energy_model.c
index 489a358b9a00..614891fde8df 100644
--- a/kernel/power/energy_model.c
+++ b/kernel/power/energy_model.c
@@ -221,6 +221,52 @@  static int em_allocate_perf_table(struct em_perf_domain *pd,
 	return 0;
 }
 
+/**
+ * em_dev_update_perf_domain() - Update runtime EM table for a device
+ * @dev		: Device for which the EM is to be updated
+ * @table	: The new EM table that is going to used from now
+ *
+ * Update EM runtime modifiable table for the @dev using the privided @table.
+ *
+ * This function uses mutex to serialize writers, so it must not be called
+ * from non-sleeping context.
+ *
+ * Return 0 on success or a proper error in case of failure.
+ */
+int em_dev_update_perf_domain(struct device *dev,
+			      struct em_perf_table __rcu *new_table)
+{
+	struct em_perf_table __rcu *old_table;
+	struct em_perf_domain *pd;
+
+	/*
+	 * The lock serializes update and unregister code paths. When the
+	 * EM has been unregistered in the meantime, we should capture that
+	 * when entering this critical section. It also makes sure that
+	 * two concurrent updates will be serialized.
+	 */
+	mutex_lock(&em_pd_mutex);
+
+	if (!dev || !dev->em_pd) {
+		mutex_unlock(&em_pd_mutex);
+		return -EINVAL;
+	}
+	pd = dev->em_pd;
+
+	em_inc_usage(new_table);
+
+	old_table = pd->runtime_table;
+	rcu_assign_pointer(pd->runtime_table, new_table);
+
+	em_cpufreq_update_efficiencies(dev, new_table->state);
+
+	em_dec_usage(old_table);
+
+	mutex_unlock(&em_pd_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(em_dev_update_perf_domain);
+
 static int em_create_runtime_table(struct em_perf_domain *pd)
 {
 	struct em_perf_table __rcu *runtime_table;