diff mbox series

thermal: cpufreq_cooling: freq_qos_update_request() returns < 0 on error

Message ID b2b7e84944937390256669df5a48ce5abba0c1ef.1613540713.git.viresh.kumar@linaro.org
State Accepted
Commit a51afb13311cd85b2f638c691b2734622277d8f5

Commit Message

Viresh Kumar Feb. 17, 2021, 5:48 a.m. UTC
freq_qos_update_request() returns 1 if the effective constraint value
has changed, 0 if the effective constraint value has not changed, or a
negative error code on failures.
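
As a minimal userspace sketch of this return convention (not the kernel implementation), the toy model below aggregates several maximum-frequency requests by taking their minimum and reports whether an update changed the effective value. The names toy_constraints, toy_effective_max() and toy_update_request() are hypothetical and exist only for illustration; the assumption that maximum-frequency requests are combined by taking the lowest one is stated here, not taken from this patch.

```c
#include <errno.h>
#include <limits.h>

#define TOY_MAX_REQS 4

/* Hypothetical stand-in for a list of max-frequency requests (kHz). */
struct toy_constraints {
	int req[TOY_MAX_REQS];	/* one entry per requester */
	int nr;			/* number of valid entries */
};

/* Assumed aggregation rule: the lowest max-frequency request wins. */
static int toy_effective_max(const struct toy_constraints *c)
{
	int i, min = INT_MAX;

	for (i = 0; i < c->nr; i++)
		if (c->req[i] < min)
			min = c->req[i];
	return min;
}

/*
 * Mirrors the tri-state convention described above:
 *   1       the effective value changed,
 *   0       the request was stored but the effective value did not change,
 *   -EINVAL invalid arguments.
 */
static int toy_update_request(struct toy_constraints *c, int idx, int new_khz)
{
	int before;

	if (!c || idx < 0 || idx >= c->nr || new_khz < 0)
		return -EINVAL;

	before = toy_effective_max(c);
	c->req[idx] = new_khz;		/* the request itself is always stored */
	return before != toy_effective_max(c);
}
```

Note that a return value of 0 still means the request was recorded; only a negative value means the update failed.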

The frequency constraints for CPUs can be set by different parts of the
kernel. If the maximum frequency constraint set by other parts of the
kernel is lower than the one corresponding to cooling state 0, then we
will never be able to cool down the system, as
freq_qos_update_request() will keep on returning 0 and we will skip
updating cpufreq_state and thermal pressure.

Fix that by doing the updates even in the case where
freq_qos_update_request() returns 0, as we have effectively set the
constraint to a new value even if the consolidated value of the
actual constraint is unchanged because of external factors.
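
The failure mode and the fix can be illustrated with a small userspace simulation. This is only a sketch under stated assumptions, not the driver code: the frequency table, the 1200000 kHz external cap, the toy_* names, and the caller that always asks for one state deeper than the recorded cpufreq_state are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical cooling states mapped to maximum frequencies in kHz. */
static const int toy_state_khz[] = { 2000000, 1500000, 1000000 };
#define TOY_NR_STATES 3

struct toy_cdev {
	int thermal_req_khz;		/* request owned by the cooling device */
	int other_req_khz;		/* cap set by another part of the system */
	unsigned long cpufreq_state;	/* last state the device recorded */
};

/* Assumed aggregation rule: the lowest max-frequency request wins. */
static int toy_effective_max(const struct toy_cdev *cdev)
{
	return cdev->thermal_req_khz < cdev->other_req_khz ?
	       cdev->thermal_req_khz : cdev->other_req_khz;
}

/* Returns 1 if the effective cap changed, 0 if it did not. */
static int toy_update_request(struct toy_cdev *cdev, int khz)
{
	int before = toy_effective_max(cdev);

	cdev->thermal_req_khz = khz;
	return before != toy_effective_max(cdev);
}

/* 'fixed' selects the new check (ret >= 0) over the old one (ret > 0). */
static int toy_set_cur_state(struct toy_cdev *cdev, unsigned long state,
			     bool fixed)
{
	int ret;

	if (state >= TOY_NR_STATES)
		return -EINVAL;

	ret = toy_update_request(cdev, toy_state_khz[state]);
	if (fixed ? ret >= 0 : ret > 0)
		cdev->cpufreq_state = state;

	return ret;
}

/*
 * Hypothetical caller: on every step it requests one state deeper than the
 * recorded one. Returns the effective cap (kHz) after 'steps' iterations.
 */
static int toy_run(bool fixed, int steps)
{
	struct toy_cdev cdev = {
		.thermal_req_khz = 2000000,
		.other_req_khz = 1200000,	/* below the state 0 frequency */
		.cpufreq_state = 0,
	};
	int i;

	for (i = 0; i < steps; i++) {
		unsigned long next = cdev.cpufreq_state + 1;

		if (next >= TOY_NR_STATES)
			next = TOY_NR_STATES - 1;
		toy_set_cur_state(&cdev, next, fixed);
	}
	return toy_effective_max(&cdev);
}
```

With these made-up numbers, toy_run(false, 5) stays at 1200000 because the recorded state never leaves 0 and the same state 1 request is repeated, while toy_run(true, 5) reaches 1000000 because the state advances even when the update returns 0.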

Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
---
Hi Guys,

This needs to go in 5.12-rc.

Thara, please give this a try and give your tested-by :).

 drivers/thermal/cpufreq_cooling.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Lukasz Luba Feb. 17, 2021, 10:29 a.m. UTC | #1
Hi Viresh,

On 2/17/21 5:48 AM, Viresh Kumar wrote:
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
> 
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel are set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.

To be precise, the thermal pressure signal is not so important in this
mechanism, and 'cpufreq_state' has changed recently:

236761f19a4f373354  thermal/drivers/cpufreq_cooling: Update cpufreq_state only if state has changed

> 
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
> 
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")

I'm not sure if that f12e4f is the root cause.

> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Guys,
> 
> This needs to go in 5.12-rc.
> 
> Thara, please give this a try and give your tested-by :).
> 
>   drivers/thermal/cpufreq_cooling.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)


Anyway, the fix LGTM. I will have to make sure that I'm CC'ed on these
topics, so I can have a look (I somehow missed 236761f19).

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz

Rafael J. Wysocki Feb. 17, 2021, 2:21 p.m. UTC | #2
On Wed, Feb 17, 2021 at 6:50 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
>
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel are set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.
>
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
>
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
> Hi Guys,
>
> This needs to go in 5.12-rc.
>
> Thara, please give this a try and give your tested-by :).
>
>  drivers/thermal/cpufreq_cooling.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index f5af2571f9b7..10af3341e5ea 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -485,7 +485,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>         frequency = get_state_freq(cpufreq_cdev, state);
>
>         ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
> -       if (ret > 0) {
> +       if (ret >= 0) {
>                 cpufreq_cdev->cpufreq_state = state;
>                 cpus = cpufreq_cdev->policy->cpus;
>                 max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
> --
> 2.25.0.rc1.19.g042ed3e048af
>

Thara Gopinath Feb. 17, 2021, 3:38 p.m. UTC | #3
On 2/17/21 12:48 AM, Viresh Kumar wrote:
> freq_qos_update_request() returns 1 if the effective constraint value
> has changed, 0 if the effective constraint value has not changed, or a
> negative error code on failures.
> 
> The frequency constraints for CPUs can be set by different parts of the
> kernel. If the maximum frequency constraint set by other parts of the
> kernel are set at a lower value than the one corresponding to cooling
> state 0, then we will never be able to cool down the system as
> freq_qos_update_request() will keep on returning 0 and we will skip
> updating cpufreq_state and thermal pressure.
> 
> Fix that by doing the updates even in the case where
> freq_qos_update_request() returns 0, as we have effectively set the
> constraint to a new value even if the consolidated value of the
> actual constraint is unchanged because of external factors.
> 
> Cc: v5.7+ <stable@vger.kernel.org> # v5.7+
> Reported-by: Thara Gopinath <thara.gopinath@linaro.org>
> Fixes: f12e4f66ab6a ("thermal/cpu-cooling: Update thermal pressure in case of a maximum frequency capping")
> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
> ---
> Hi Guys,
> 
> This needs to go in 5.12-rc.
> 
> Thara, please give this a try and give your tested-by :).

It fixes the thermal runaway issue on sdm845 that I had reported. So,

Tested-by: Thara Gopinath <thara.gopinath@linaro.org>

> 
>   drivers/thermal/cpufreq_cooling.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
> index f5af2571f9b7..10af3341e5ea 100644
> --- a/drivers/thermal/cpufreq_cooling.c
> +++ b/drivers/thermal/cpufreq_cooling.c
> @@ -485,7 +485,7 @@ static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
>   	frequency = get_state_freq(cpufreq_cdev, state);
>   
>   	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
> -	if (ret > 0) {
> +	if (ret >= 0) {
>   		cpufreq_cdev->cpufreq_state = state;
>   		cpus = cpufreq_cdev->policy->cpus;
>   		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));
>

Patch

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index f5af2571f9b7..10af3341e5ea 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -485,7 +485,7 @@  static int cpufreq_set_cur_state(struct thermal_cooling_device *cdev,
 	frequency = get_state_freq(cpufreq_cdev, state);
 
 	ret = freq_qos_update_request(&cpufreq_cdev->qos_req, frequency);
-	if (ret > 0) {
+	if (ret >= 0) {
 		cpufreq_cdev->cpufreq_state = state;
 		cpus = cpufreq_cdev->policy->cpus;
 		max_capacity = arch_scale_cpu_capacity(cpumask_first(cpus));