[V2] thermal/drivers/hisi: Switch to interrupt mode

Message ID 1506575625-20388-1-git-send-email-daniel.lezcano@linaro.org
State New
Headers show
Series
  • [V2] thermal/drivers/hisi: Switch to interrupt mode
Related show

Commit Message

Daniel Lezcano Sept. 28, 2017, 5:13 a.m.
At this moment, we have both the interrupt setup and the polling enabled. The
interrupt does nothing more than forcing an update while the temperature is
polled every second.

We can do much better than that, threshold is set to 65C in the DT and the
passive cooling device enters in the dance when 75C is reached. We need to
sample the temperature at 65C in order to let the IPA gather enough values for
the PID computation. If the SoC is running at a temperature below 65C, we will
be constantly polling for nothing.

This patch disables the sensor when the temperature is below 65C and enables it
when passing the threshold. It results the thermal sensor driver will have no
activity most of the time.

Cc: Keerthy <j-keerthy@ti.com>
Cc: Leo Yang <leo.yan@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

---
 drivers/thermal/hisi_thermal.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

-- 
2.7.4

Comments

Daniel Lezcano Sept. 28, 2017, 7:32 a.m. | #1
On Thu, Sep 28, 2017 at 02:57:52PM +0800, Leo Yan wrote:
> Hi Daniel,

> 

> On Thu, Sep 28, 2017 at 07:13:44AM +0200, Daniel Lezcano wrote:

> > At this moment, we have both the interrupt setup and the polling enabled. The

> > interrupt does nothing more than forcing an update while the temperature is

> > polled every second.

> > 

> > We can do much better than that, threshold is set to 65C in the DT and the

> > passive cooling device enters in the dance when 75C is reached. We need to

> > sample the temperature at 65C in order to let the IPA gather enough values for

> > the PID computation. If the SoC is running at a temperature below 65C, we will

> > be constantly polling for nothing.

> > 

> > This patch disables the sensor when the temperature is below 65C and enables it

> > when passing the threshold. It results the thermal sensor driver will have no

> > activity most of the time.

> > 

> > Cc: Keerthy <j-keerthy@ti.com>

> > Cc: Leo Yang <leo.yan@linaro.org>

> 

> s/Yang/Yan :) Have tested this patch on Hikey at my side:


Oops sorry :)

> Reviewed-by: Leo Yan <leo.yan@linaro.org>

> Tested-by: Leo Yan <leo.yan@linaro.org>



Great! Thanks for testing.

  -- Daniel
Valentin Schneider Sept. 29, 2017, 11:07 a.m. | #2
Hi,


On 09/28/2017 06:13 AM, Daniel Lezcano wrote:
> At this moment, we have both the interrupt setup and the polling enabled. The

> interrupt does nothing more than forcing an update while the temperature is

> polled every second.

>

> We can do much better than that, threshold is set to 65C in the DT and the

> passive cooling device enters in the dance when 75C is reached. We need to

> sample the temperature at 65C in order to let the IPA gather enough values for

> the PID computation.

Sample collection is a valid point, but passive cooling should become 
active as soon
as 65C is reached (at least that's the case with IPA). Furthermore, 
IPA's PID controller
is reset when the temperature drops below the first trip-point 
(threshold) - as such,
I believe the PID's behavior should be the same now as it with polling.

I think the main selling point of interrupt-based updates is a faster 
reaction time.
On the HiKey960, we can go from below the threshold temperature to over 
the control temperature
in less than a second (default polling rate is 1s). In this situation, 
IPA's PID starts accumulating error
while already overshooting, which isn't optimal. On top of that, the 
passive cooling reacts
too slowly.
>   If the SoC is running at a temperature below 65C, we will

> be constantly polling for nothing.

>

> This patch disables the sensor when the temperature is below 65C and enables it

> when passing the threshold. It results the thermal sensor driver will have no

> activity most of the time.

>

> Cc: Keerthy <j-keerthy@ti.com>

> Cc: Leo Yang <leo.yan@linaro.org>

> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

I've tested this on HiKey960 (Android 4.9 + upstream patches to apply 
your thermal/drivers/hisi series + Kevin's hi3600 support).
I ran a video workload, and noticed I get several interrupts while 
passive cooling is already in effect
(I might move part of this discussion to Kevin's posting, but I think 
it's still relevant to be here):

[  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000
[  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000
[  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000
[  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000
[  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000

This isn't optimal for IPA, as the PID is supposed to use a specific 
sampling rate, but those interrupts
forced a re-trigger of power_allocator_throttle which changes the PID's 
actual sampling rate.
IPA isn't expecting this kind of scenarios, as I can see a 
tz->passive_delay in the computation
of the derivative term (although the derivative coefficient defaults to 
0...).

In a perfect world I would see those interrupts being toggled by the 
thermal governor, as that is
where we know what to do with each trip point - we could still want the 
interrupts, but in the case
of IPA we'd like to disable them while the PID controller is active, and 
we would know when to re-enable them
(as soon as IPA is toggled off).
> ---

>   drivers/thermal/hisi_thermal.c | 24 ++++++++++++++----------

>   1 file changed, 14 insertions(+), 10 deletions(-)

>

> diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c

> index 39f4627..74ea70d 100644

> --- a/drivers/thermal/hisi_thermal.c

> +++ b/drivers/thermal/hisi_thermal.c

> @@ -218,6 +218,15 @@ static int hisi_thermal_get_temp(void *__data, int *temp)

>   	return 0;

>   }

>   

> +static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor,

> +				       bool on)

> +{

> +	struct thermal_zone_device *tzd = sensor->tzd;

> +

> +	tzd->ops->set_mode(tzd,

> +		on ? THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED);

> +}

> +

>   static const struct thermal_zone_of_device_ops hisi_of_thermal_ops = {

>   	.get_temp = hisi_thermal_get_temp,

>   };

> @@ -236,12 +245,16 @@ static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev)

>   		dev_crit(&data->pdev->dev, "THERMAL ALARM: %d > %d\n",

>   			 temp, sensor->thres_temp);

>   

> +		hisi_thermal_toggle_sensor(&data->sensor, true);

> +

>   		thermal_zone_device_update(data->sensor.tzd,

>   					   THERMAL_EVENT_UNSPECIFIED);

>   

>   	} else if (temp < sensor->thres_temp) {

>   		dev_crit(&data->pdev->dev, "THERMAL ALARM stopped: %d < %d\n",

>   			 temp, sensor->thres_temp);

> +

> +		hisi_thermal_toggle_sensor(&data->sensor, false);

>   	}

>   

>   	return IRQ_HANDLED;

> @@ -286,15 +299,6 @@ static const struct of_device_id of_hisi_thermal_match[] = {

>   };

>   MODULE_DEVICE_TABLE(of, of_hisi_thermal_match);

>   

> -static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor,

> -				       bool on)

> -{

> -	struct thermal_zone_device *tzd = sensor->tzd;

> -

> -	tzd->ops->set_mode(tzd,

> -		on ? THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED);

> -}

> -

>   static int hisi_thermal_setup(struct hisi_thermal_data *data)

>   {

>   	struct hisi_thermal_sensor *sensor;

> @@ -393,7 +397,7 @@ static int hisi_thermal_probe(struct platform_device *pdev)

>   		return ret;

>   	}

>   

> -	hisi_thermal_toggle_sensor(&data->sensor, true);

> +	hisi_thermal_toggle_sensor(&data->sensor, false);

>   

>   	return 0;

>   }
Daniel Lezcano Sept. 29, 2017, 4:46 p.m. | #3
Hi Valentin,

On 29/09/2017 13:07, Valentin Schneider wrote:

[ ... ]

> I've tested this on HiKey960 (Android 4.9 + upstream patches to apply

> your thermal/drivers/hisi series + Kevin's hi3600 support).

> I ran a video workload, and noticed I get several interrupts while

> passive cooling is already in effect

> (I might move part of this discussion to Kevin's posting, but I think

> it's still relevant to be here):

> 

> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000

> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000

> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000

> 

> This isn't optimal for IPA, as the PID is supposed to use a specific

> sampling rate, but those interrupts

> forced a re-trigger of power_allocator_throttle which changes the PID's

> actual sampling rate.

> IPA isn't expecting this kind of scenarios, as I can see a

> tz->passive_delay in the computation

> of the derivative term (although the derivative coefficient defaults to

> 0...).


Interesting. This patch actually is for the hikey, not for the hikey960.
I didn't tested with Kevin's patches.

The thermal characteristics are very different on hi960. For example,
the temperature increase is must faster on hikey960. So I guess, the
driver itself should be fine tuned to prevent this kind of scenario.

> In a perfect world I would see those interrupts being toggled by the

> thermal governor, as that is

> where we know what to do with each trip point - we could still want the

> interrupts, but in the case

> of IPA we'd like to disable them while the PID controller is active, and

> we would know when to re-enable them

> (as soon as IPA is toggled off).

Yes, that is true the thermal framework could be enhanced to support this.

Do you have an hikey (not 960) to test this patch?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Valentin Schneider Sept. 29, 2017, 5:26 p.m. | #4
On 09/29/2017 05:46 PM, Daniel Lezcano wrote:
> Hi Valentin,

>

> On 29/09/2017 13:07, Valentin Schneider wrote:

>

> [ ... ]

>

>> I've tested this on HiKey960 (Android 4.9 + upstream patches to apply

>> your thermal/drivers/hisi series + Kevin's hi3600 support).

>> I ran a video workload, and noticed I get several interrupts while

>> passive cooling is already in effect

>> (I might move part of this discussion to Kevin's posting, but I think

>> it's still relevant to be here):

>>

>> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

>> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000

>> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

>> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000

>> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000

>>

>> This isn't optimal for IPA, as the PID is supposed to use a specific

>> sampling rate, but those interrupts

>> forced a re-trigger of power_allocator_throttle which changes the PID's

>> actual sampling rate.

>> IPA isn't expecting this kind of scenarios, as I can see a

>> tz->passive_delay in the computation

>> of the derivative term (although the derivative coefficient defaults to

>> 0...).

> Interesting. This patch actually is for the hikey, not for the hikey960.

> I didn't tested with Kevin's patches.

>

> The thermal characteristics are very different on hi960. For example,

> the temperature increase is must faster on hikey960. So I guess, the

> driver itself should be fine tuned to prevent this kind of scenario.


True

>> In a perfect world I would see those interrupts being toggled by the

>> thermal governor, as that is

>> where we know what to do with each trip point - we could still want the

>> interrupts, but in the case

>> of IPA we'd like to disable them while the PID controller is active, and

>> we would know when to re-enable them

>> (as soon as IPA is toggled off).

> Yes, that is true the thermal framework could be enhanced to support this.


If you're up for it, it could be an interesting discussion. In any case, 
I might spend a little bit of time looking into making such an 
interrupt-toggling work (at least as a prototype just on the HiKey960).

>

> Do you have an hikey (not 960) to test this patch?


I do have one but I'm currently lending it to someone else in my team, 
I'll have to track it down (probably next week).
I'm actually quite interested in having IPA triggered by interrupt 
because of the HiKey960's tendency to heat up insanely fast, which leads 
to IPA starting when the temperature is already way too high. I started 
toying around with Kevin's patches but saw your own patch and thought it 
was related, so I figured I would share some of my thoughts before 
starting to work on something someone else might already be working on.

I should perhaps move some of those comments to Kevin's posting - sorry 
for the noise.

>

>
Daniel Lezcano Oct. 2, 2017, 8:02 p.m. | #5
Oh, I noticed Kevin is not in the loop, Cc'ing him.

Sorry Kevin.

On 29/09/2017 19:26, Valentin Schneider wrote:
> 

> 

> On 09/29/2017 05:46 PM, Daniel Lezcano wrote:

>> Hi Valentin,

>>

>> On 29/09/2017 13:07, Valentin Schneider wrote:

>>

>> [ ... ]

>>

>>> I've tested this on HiKey960 (Android 4.9 + upstream patches to apply

>>> your thermal/drivers/hisi series + Kevin's hi3600 support).

>>> I ran a video workload, and noticed I get several interrupts while

>>> passive cooling is already in effect

>>> (I might move part of this discussion to Kevin's posting, but I think

>>> it's still relevant to be here):

>>>

>>> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>>> 65000

>>> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 >

>>> 65000

>>> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>>> 65000

>>> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 >

>>> 65000

>>> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 >

>>> 65000

>>>

>>> This isn't optimal for IPA, as the PID is supposed to use a specific

>>> sampling rate, but those interrupts

>>> forced a re-trigger of power_allocator_throttle which changes the PID's

>>> actual sampling rate.

>>> IPA isn't expecting this kind of scenarios, as I can see a

>>> tz->passive_delay in the computation

>>> of the derivative term (although the derivative coefficient defaults to

>>> 0...).

>> Interesting. This patch actually is for the hikey, not for the hikey960.

>> I didn't tested with Kevin's patches.

>>

>> The thermal characteristics are very different on hi960. For example,

>> the temperature increase is must faster on hikey960. So I guess, the

>> driver itself should be fine tuned to prevent this kind of scenario.

> 

> True

> 

>>> In a perfect world I would see those interrupts being toggled by the

>>> thermal governor, as that is

>>> where we know what to do with each trip point - we could still want the

>>> interrupts, but in the case

>>> of IPA we'd like to disable them while the PID controller is active, and

>>> we would know when to re-enable them

>>> (as soon as IPA is toggled off).

>> Yes, that is true the thermal framework could be enhanced to support

>> this.

> 

> If you're up for it, it could be an interesting discussion. In any case,

> I might spend a little bit of time looking into making such an

> interrupt-toggling work (at least as a prototype just on the HiKey960).

> 

>>

>> Do you have an hikey (not 960) to test this patch?

> 

> I do have one but I'm currently lending it to someone else in my team,

> I'll have to track it down (probably next week).

> I'm actually quite interested in having IPA triggered by interrupt

> because of the HiKey960's tendency to heat up insanely fast, which leads

> to IPA starting when the temperature is already way too high. I started

> toying around with Kevin's patches but saw your own patch and thought it

> was related, so I figured I would share some of my thoughts before

> starting to work on something someone else might already be working on.

> 

> I should perhaps move some of those comments to Kevin's posting - sorry

> for the noise.

> 

>>

>>

> 



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Daniel Lezcano Oct. 10, 2017, 4:51 p.m. | #6
On 29/09/2017 13:07, Valentin Schneider wrote:
> Hi,

> 

> 

> On 09/28/2017 06:13 AM, Daniel Lezcano wrote:

>> At this moment, we have both the interrupt setup and the polling

>> enabled. The

>> interrupt does nothing more than forcing an update while the

>> temperature is

>> polled every second.

>>

>> We can do much better than that, threshold is set to 65C in the DT and

>> the

>> passive cooling device enters in the dance when 75C is reached. We

>> need to

>> sample the temperature at 65C in order to let the IPA gather enough

>> values for

>> the PID computation.

> Sample collection is a valid point, but passive cooling should become

> active as soon

> as 65C is reached (at least that's the case with IPA). Furthermore,

> IPA's PID controller

> is reset when the temperature drops below the first trip-point

> (threshold) - as such,

> I believe the PID's behavior should be the same now as it with polling.

> 

> I think the main selling point of interrupt-based updates is a faster

> reaction time.

> On the HiKey960, we can go from below the threshold temperature to over

> the control temperature

> in less than a second (default polling rate is 1s). In this situation,

> IPA's PID starts accumulating error

> while already overshooting, which isn't optimal. On top of that, the

> passive cooling reacts

> too slowly.

>>   If the SoC is running at a temperature below 65C, we will

>> be constantly polling for nothing.

>>

>> This patch disables the sensor when the temperature is below 65C and

>> enables it

>> when passing the threshold. It results the thermal sensor driver will

>> have no

>> activity most of the time.

>>

>> Cc: Keerthy <j-keerthy@ti.com>

>> Cc: Leo Yang <leo.yan@linaro.org>

>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> I've tested this on HiKey960 (Android 4.9 + upstream patches to apply

> your thermal/drivers/hisi series + Kevin's hi3600 support).

> I ran a video workload, and noticed I get several interrupts while

> passive cooling is already in effect

> (I might move part of this discussion to Kevin's posting, but I think

> it's still relevant to be here):

> 

> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000

> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000

> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000



Ok, so apparently there are multiple alarms level in the driver for the
hikey960 [1]. So I prefer to drop this patch for now and take the
hikey960 thermal support first and we can sort out the issue later.

For my information, can you show me the DT snippet you have for the
thermal zones?

  -- Daniel

[1] https://patchwork.kernel.org/patch/9965743/


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Valentin Schneider Oct. 10, 2017, 5:01 p.m. | #7
On 10/10/2017 05:51 PM, Daniel Lezcano wrote:
> Ok, so apparently there are multiple alarms level in the driver for the

> hikey960 [1]. So I prefer to drop this patch for now and take the

> hikey960 thermal support first and we can sort out the issue later.

>

> For my information, can you show me the DT snippet you have for the

> thermal zones?


Sure thing:

     thermal-zones {

             cls0: cls0 {
                 polling-delay = <1000>;
                 polling-delay-passive = <100>;
                 sustainable-power = <4500>;

                 /* sensor ID */
                 thermal-sensors = <&tsensor 1>;

                 trips {
                     threshold: trip-point@0 {
                         temperature = <65000>;
                         hysteresis = <1000>;
                         type = "passive";
                     };

                     target: trip-point@1 {
                         temperature = <75000>;
                         hysteresis = <1000>;
                         type = "passive";
                     };
                 };

                 cooling-maps {
                     map0 {
                         trip = <&target>;
                         contribution = <1024>;
                         cooling-device = <&cpu0 THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
                     };
                     map1 {
                         trip = <&target>;
                         contribution = <512>;
                         cooling-device = <&cpu4 THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
                     };
                     map2 {
                         trip = <&target>;
                         contribution = <1024>;
                         cooling-device = <&gpu THERMAL_NO_LIMIT 
THERMAL_NO_LIMIT>;
                     };
                 };
             };
         };

It's exactly what is on the Android 4.9 kernel 
(https://android.googlesource.com/kernel/hikey-linaro/+/android-hikey-linaro-4.9/arch/arm64/boot/dts/hisilicon/hi3660.dtsi#1272), 
with the tsensor index changed from 4 to 1 to work with Kevin's patches.

>

>    -- Daniel

>

> [1] https://patchwork.kernel.org/patch/9965743/

>

>
Daniel Lezcano Oct. 10, 2017, 5:13 p.m. | #8
On 10/10/2017 19:01, Valentin Schneider wrote:
> On 10/10/2017 05:51 PM, Daniel Lezcano wrote:

>> Ok, so apparently there are multiple alarms level in the driver for the

>> hikey960 [1]. So I prefer to drop this patch for now and take the

>> hikey960 thermal support first and we can sort out the issue later.

>>

>> For my information, can you show me the DT snippet you have for the

>> thermal zones?

> 

> Sure thing:

> 

>     thermal-zones {

> 

>             cls0: cls0 {

>                 polling-delay = <1000>;

>                 polling-delay-passive = <100>;

>                 sustainable-power = <4500>;

> 

>                 /* sensor ID */

>                 thermal-sensors = <&tsensor 1>;

> 

>                 trips {

>                     threshold: trip-point@0 {

>                         temperature = <65000>;

>                         hysteresis = <1000>;

>                         type = "passive";

>                     };

> 

>                     target: trip-point@1 {

>                         temperature = <75000>;

>                         hysteresis = <1000>;

>                         type = "passive";

>                     };

>                 };


That's strange, regarding your traces:

"
[  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000
[  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000
[  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000
[  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000
[  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000
"

I was expecting to see more trip points. Did you test the driver with a
70000 trip point?


[ ... ]


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Valentin Schneider Oct. 10, 2017, 5:19 p.m. | #9
On 10/10/2017 06:13 PM, Daniel Lezcano wrote:
> On 10/10/2017 19:01, Valentin Schneider wrote:

>> On 10/10/2017 05:51 PM, Daniel Lezcano wrote:

>>> Ok, so apparently there are multiple alarms level in the driver for the

>>> hikey960 [1]. So I prefer to drop this patch for now and take the

>>> hikey960 thermal support first and we can sort out the issue later.

>>>

>>> For my information, can you show me the DT snippet you have for the

>>> thermal zones?

>> Sure thing:

>>

>>      thermal-zones {

>>

>>              cls0: cls0 {

>>                  polling-delay = <1000>;

>>                  polling-delay-passive = <100>;

>>                  sustainable-power = <4500>;

>>

>>                  /* sensor ID */

>>                  thermal-sensors = <&tsensor 1>;

>>

>>                  trips {

>>                      threshold: trip-point@0 {

>>                          temperature = <65000>;

>>                          hysteresis = <1000>;

>>                          type = "passive";

>>                      };

>>

>>                      target: trip-point@1 {

>>                          temperature = <75000>;

>>                          hysteresis = <1000>;

>>                          type = "passive";

>>                      };

>>                  };

> That's strange, regarding your traces:

>

> "

> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 > 65000

> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 > 65000

> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 > 65000

> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 > 65000

> "

>

> I was expecting to see more trip points. Did you test the driver with a

> 70000 trip point?


No, I didn't change any setting other than the tsensor index to make 
things work. Mind you, in Kevin's patch series the thermal alarm is 
setup with a 4 degrees "lag", i.e. alarms will only be re-triggered if 
temperature increases/decreases of at least 4 degrees (which explains 
the traces).

>

>

> [ ... ]

>

>
Daniel Lezcano Oct. 10, 2017, 5:28 p.m. | #10
On 10/10/2017 19:19, Valentin Schneider wrote:
> 

> 

> On 10/10/2017 06:13 PM, Daniel Lezcano wrote:

>> On 10/10/2017 19:01, Valentin Schneider wrote:

>>> On 10/10/2017 05:51 PM, Daniel Lezcano wrote:

>>>> Ok, so apparently there are multiple alarms level in the driver for the

>>>> hikey960 [1]. So I prefer to drop this patch for now and take the

>>>> hikey960 thermal support first and we can sort out the issue later.

>>>>

>>>> For my information, can you show me the DT snippet you have for the

>>>> thermal zones?

>>> Sure thing:

>>>

>>>      thermal-zones {

>>>

>>>              cls0: cls0 {

>>>                  polling-delay = <1000>;

>>>                  polling-delay-passive = <100>;

>>>                  sustainable-power = <4500>;

>>>

>>>                  /* sensor ID */

>>>                  thermal-sensors = <&tsensor 1>;

>>>

>>>                  trips {

>>>                      threshold: trip-point@0 {

>>>                          temperature = <65000>;

>>>                          hysteresis = <1000>;

>>>                          type = "passive";

>>>                      };

>>>

>>>                      target: trip-point@1 {

>>>                          temperature = <75000>;

>>>                          hysteresis = <1000>;

>>>                          type = "passive";

>>>                      };

>>>                  };

>> That's strange, regarding your traces:

>>

>> "

>> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>> 65000

>> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 >

>> 65000

>> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>> 65000

>> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 >

>> 65000

>> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 >

>> 65000

>> "

>>

>> I was expecting to see more trip points. Did you test the driver with a

>> 70000 trip point?

> 

> No, I didn't change any setting other than the tsensor index to make

> things work. Mind you, in Kevin's patch series the thermal alarm is

> setup with a 4 degrees "lag", i.e. alarms will only be re-triggered if

> temperature increases/decreases of at least 4 degrees (which explains

> the traces).


Mmh, the behavior regarding the interrupt is slightly different with the
hi960, perhaps a bit fuzzy regarding how it is handled now. Anyway, we
can live with that now and go further to fix that later, the result is
the same.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Wangtao (Kevin, Kirin) Oct. 11, 2017, 1:54 a.m. | #11
On 2017/10/11 1:28, Daniel Lezcano wrote:
> On 10/10/2017 19:19, Valentin Schneider wrote:

>>

>>

>> On 10/10/2017 06:13 PM, Daniel Lezcano wrote:

>>> On 10/10/2017 19:01, Valentin Schneider wrote:

>>>> On 10/10/2017 05:51 PM, Daniel Lezcano wrote:

>>>>> Ok, so apparently there are multiple alarms level in the driver for the

>>>>> hikey960 [1]. So I prefer to drop this patch for now and take the

>>>>> hikey960 thermal support first and we can sort out the issue later.

>>>>>

>>>>> For my information, can you show me the DT snippet you have for the

>>>>> thermal zones?

>>>> Sure thing:

>>>>

>>>>       thermal-zones {

>>>>

>>>>               cls0: cls0 {

>>>>                   polling-delay = <1000>;

>>>>                   polling-delay-passive = <100>;

>>>>                   sustainable-power = <4500>;

>>>>

>>>>                   /* sensor ID */

>>>>                   thermal-sensors = <&tsensor 1>;

>>>>

>>>>                   trips {

>>>>                       threshold: trip-point@0 {

>>>>                           temperature = <65000>;

>>>>                           hysteresis = <1000>;

>>>>                           type = "passive";

>>>>                       };

>>>>

>>>>                       target: trip-point@1 {

>>>>                           temperature = <75000>;

>>>>                           hysteresis = <1000>;

>>>>                           type = "passive";

>>>>                       };

>>>>                   };

>>> That's strange, regarding your traces:

>>>

>>> "

>>> [  118.107357] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>>> 65000

>>> [  119.182531] hisi_thermal fff30000.tsensor: THERMAL ALARM: 76235 >

>>> 65000

>>> [  119.361964] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70495 >

>>> 65000

>>> [  119.907865] hisi_thermal fff30000.tsensor: THERMAL ALARM: 75620 >

>>> 65000

>>> [  119.959076] hisi_thermal fff30000.tsensor: THERMAL ALARM: 70700 >

>>> 65000

>>> "

>>>

>>> I was expecting to see more trip points. Did you test the driver with a

>>> 70000 trip point?

>>

>> No, I didn't change any setting other than the tsensor index to make

>> things work. Mind you, in Kevin's patch series the thermal alarm is

>> setup with a 4 degrees "lag", i.e. alarms will only be re-triggered if

>> temperature increases/decreases of at least 4 degrees (which explains

>> the traces).

This should be a trip point 75000, cross 75000 triger an interrupt and
drop below 71000 triger another interrupt, the multi alarm interrupt is
not suitable for ipa as you discussed before, we should drop the patch
of multi alarm support.
> 

> Mmh, the behavior regarding the interrupt is slightly different with the

> hi960, perhaps a bit fuzzy regarding how it is handled now. Anyway, we

> can live with that now and go further to fix that later, the result is

> the same.

> 

>
Eduardo Valentin Dec. 5, 2017, 2 a.m. | #12
Hello,

On Thu, Sep 28, 2017 at 09:32:20AM +0200, Daniel Lezcano wrote:
> On Thu, Sep 28, 2017 at 02:57:52PM +0800, Leo Yan wrote:

> > Hi Daniel,

> > 

> > On Thu, Sep 28, 2017 at 07:13:44AM +0200, Daniel Lezcano wrote:

> > > At this moment, we have both the interrupt setup and the polling enabled. The

> > > interrupt does nothing more than forcing an update while the temperature is

> > > polled every second.

> > > 

> > > We can do much better than that, threshold is set to 65C in the DT and the

> > > passive cooling device enters in the dance when 75C is reached. We need to

> > > sample the temperature at 65C in order to let the IPA gather enough values for

> > > the PID computation. If the SoC is running at a temperature below 65C, we will

> > > be constantly polling for nothing.

> > > 

> > > This patch disables the sensor when the temperature is below 65C and enables it

> > > when passing the threshold. It results the thermal sensor driver will have no

> > > activity most of the time.

> > > 

> > > Cc: Keerthy <j-keerthy@ti.com>

> > > Cc: Leo Yang <leo.yan@linaro.org>

> > 

> > s/Yang/Yan :) Have tested this patch on Hikey at my side:

> 

> Oops sorry :)

> 

> > Reviewed-by: Leo Yan <leo.yan@linaro.org>

> > Tested-by: Leo Yan <leo.yan@linaro.org>

> 


Is this still needed after the latest rework done?

> 

> Great! Thanks for testing.

> 

>   -- Daniel
Daniel Lezcano Dec. 5, 2017, 6:49 a.m. | #13
On 05/12/2017 03:00, Eduardo Valentin wrote:
> Hello,

> 

> On Thu, Sep 28, 2017 at 09:32:20AM +0200, Daniel Lezcano wrote:

>> On Thu, Sep 28, 2017 at 02:57:52PM +0800, Leo Yan wrote:

>>> Hi Daniel,

>>>

>>> On Thu, Sep 28, 2017 at 07:13:44AM +0200, Daniel Lezcano wrote:

>>>> At this moment, we have both the interrupt setup and the polling enabled. The

>>>> interrupt does nothing more than forcing an update while the temperature is

>>>> polled every second.

>>>>

>>>> We can do much better than that, threshold is set to 65C in the DT and the

>>>> passive cooling device enters in the dance when 75C is reached. We need to

>>>> sample the temperature at 65C in order to let the IPA gather enough values for

>>>> the PID computation. If the SoC is running at a temperature below 65C, we will

>>>> be constantly polling for nothing.

>>>>

>>>> This patch disables the sensor when the temperature is below 65C and enables it

>>>> when passing the threshold. It results the thermal sensor driver will have no

>>>> activity most of the time.

>>>>

>>>> Cc: Keerthy <j-keerthy@ti.com>

>>>> Cc: Leo Yang <leo.yan@linaro.org>

>>>

>>> s/Yang/Yan :) Have tested this patch on Hikey at my side:

>>

>> Oops sorry :)

>>

>>> Reviewed-by: Leo Yan <leo.yan@linaro.org>

>>> Tested-by: Leo Yan <leo.yan@linaro.org>

>>

> 

> Is this still needed after the latest rework done?


No longer needed.



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

Patch

diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c
index 39f4627..74ea70d 100644
--- a/drivers/thermal/hisi_thermal.c
+++ b/drivers/thermal/hisi_thermal.c
@@ -218,6 +218,15 @@  static int hisi_thermal_get_temp(void *__data, int *temp)
 	return 0;
 }
 
+static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor,
+				       bool on)
+{
+	struct thermal_zone_device *tzd = sensor->tzd;
+
+	tzd->ops->set_mode(tzd,
+		on ? THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED);
+}
+
 static const struct thermal_zone_of_device_ops hisi_of_thermal_ops = {
 	.get_temp = hisi_thermal_get_temp,
 };
@@ -236,12 +245,16 @@  static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev)
 		dev_crit(&data->pdev->dev, "THERMAL ALARM: %d > %d\n",
 			 temp, sensor->thres_temp);
 
+		hisi_thermal_toggle_sensor(&data->sensor, true);
+
 		thermal_zone_device_update(data->sensor.tzd,
 					   THERMAL_EVENT_UNSPECIFIED);
 
 	} else if (temp < sensor->thres_temp) {
 		dev_crit(&data->pdev->dev, "THERMAL ALARM stopped: %d < %d\n",
 			 temp, sensor->thres_temp);
+
+		hisi_thermal_toggle_sensor(&data->sensor, false);
 	}
 
 	return IRQ_HANDLED;
@@ -286,15 +299,6 @@  static const struct of_device_id of_hisi_thermal_match[] = {
 };
 MODULE_DEVICE_TABLE(of, of_hisi_thermal_match);
 
-static void hisi_thermal_toggle_sensor(struct hisi_thermal_sensor *sensor,
-				       bool on)
-{
-	struct thermal_zone_device *tzd = sensor->tzd;
-
-	tzd->ops->set_mode(tzd,
-		on ? THERMAL_DEVICE_ENABLED : THERMAL_DEVICE_DISABLED);
-}
-
 static int hisi_thermal_setup(struct hisi_thermal_data *data)
 {
 	struct hisi_thermal_sensor *sensor;
@@ -393,7 +397,7 @@  static int hisi_thermal_probe(struct platform_device *pdev)
 		return ret;
 	}
 
-	hisi_thermal_toggle_sensor(&data->sensor, true);
+	hisi_thermal_toggle_sensor(&data->sensor, false);
 
 	return 0;
 }