diff mbox series

[v1,6/7] arm64: dts: sdm845: Increase alert trip point to 95 degrees

Message ID 041258d65883df964890249a24d2a4788c419304.1547078153.git.amit.kucheria@linaro.org
State New
Headers show
Series None | expand

Commit Message

Amit Kucheria Jan. 10, 2019, midnight UTC
75 degrees is too aggressive for throttling the CPU. After speaking to
Qualcomm engineers, increase it to 95 degrees.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

---
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

-- 
2.17.1

Comments

Stephen Boyd Jan. 10, 2019, 12:29 a.m. UTC | #1
Quoting Amit Kucheria (2019-01-09 16:00:55)
> 75 degrees is too aggressive for throttling the CPU. After speaking to

> Qualcomm engineers, increase it to 95 degrees.

> 

> Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

> ---

>  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------


Is the plan that these are some defaults that would be adjusted by board
variants? Just curious why we have anything in here and don't punt it
all to each board dts file.
Matthias Kaehlcke Jan. 10, 2019, 2:15 a.m. UTC | #2
On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:
> Hi Amit,

> 

> On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:

> > 75 degrees is too aggressive for throttling the CPU. After speaking to

> > Qualcomm engineers, increase it to 95 degrees.

> > 

> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

> > ---

> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------

> >  1 file changed, 8 insertions(+), 8 deletions(-)

> > 

> > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > index c27cbd3bcb0a..29e823b0caf4 100644

> > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > @@ -1692,7 +1692,7 @@

> >  

> >  			trips {

> >  				cpu_alert0: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1713,7 +1713,7 @@

> >  

> >  			trips {

> >  				cpu_alert1: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1734,7 +1734,7 @@

> >  

> >  			trips {

> >  				cpu_alert2: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1755,7 +1755,7 @@

> >  

> >  			trips {

> >  				cpu_alert3: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1776,7 +1776,7 @@

> >  

> >  			trips {

> >  				cpu_alert4: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1797,7 +1797,7 @@

> >  

> >  			trips {

> >  				cpu_alert5: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1818,7 +1818,7 @@

> >  

> >  			trips {

> >  				cpu_alert6: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> > @@ -1839,7 +1839,7 @@

> >  

> >  			trips {

> >  				cpu_alert7: trip0 {

> > -					temperature = <75000>;

> > +					temperature = <95000>;

> >  					hysteresis = <2000>;

> >  					type = "passive";

> >  				};

> 

> The change itself looks good to me, however I wonder if it would be

> worth to eliminate redundancy and merge the current 8 thermal zones

> into 2, one for the Silver and one for the Gold cluster (as done by

> http://crrev.com/c/1381752). There is a single cooling device for

> each cluster, so it's not clear to me if there is any gain from having

> a separate thermal zone for each CPU. If it is important to monitor

> the temperatures of the individual cores this can still be done by

> configuring the thermal zone of the cluster with multiple thermal

> sensors.


I see your idea is to have a cooling device per CPU ("arm64: dts:
sdm845: wireup the thermal trip points to cpufreq" /
https://lore.kernel.org/patchwork/patch/1030742/), however that
doesn't work as intended. Only two cpufreq 'devices' are created,
one for CPU0 and one for CPU4. In consequence cpufreq->ready() only
runs for these cores and only two cooling devices are
registered. Since the cores of a cluster all run at the same
frequency I also doubt if having multiple cooling devices would
bring any benefits.

Cheers

Matthias
Matthias Kaehlcke Jan. 10, 2019, 8 p.m. UTC | #3
On Fri, Jan 11, 2019 at 01:15:09AM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 7:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:

> >

> > On Wed, Jan 09, 2019 at 05:15:33PM -0800, Matthias Kaehlcke wrote:

> > > Hi Amit,

> > >

> > > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:

> > > > 75 degrees is too aggressive for throttling the CPU. After speaking to

> > > > Qualcomm engineers, increase it to 95 degrees.

> > > >

> > > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

> > > > ---

> > > >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------

> > > >  1 file changed, 8 insertions(+), 8 deletions(-)

> > > >

> > > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > > index c27cbd3bcb0a..29e823b0caf4 100644

> > > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > > @@ -1692,7 +1692,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert0: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1713,7 +1713,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert1: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1734,7 +1734,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert2: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1755,7 +1755,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert3: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1776,7 +1776,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert4: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1797,7 +1797,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert5: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1818,7 +1818,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert6: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > > > @@ -1839,7 +1839,7 @@

> > > >

> > > >                     trips {

> > > >                             cpu_alert7: trip0 {

> > > > -                                   temperature = <75000>;

> > > > +                                   temperature = <95000>;

> > > >                                     hysteresis = <2000>;

> > > >                                     type = "passive";

> > > >                             };

> > >

> > > The change itself looks good to me, however I wonder if it would be

> > > worth to eliminate redundancy and merge the current 8 thermal zones

> > > into 2, one for the Silver and one for the Gold cluster (as done by

> > > http://crrev.com/c/1381752). There is a single cooling device for

> > > each cluster, so it's not clear to me if there is any gain from having

> > > a separate thermal zone for each CPU. If it is important to monitor

> > > the temperatures of the individual cores this can still be done by

> > > configuring the thermal zone of the cluster with multiple thermal

> > > sensors.

> >

> > I see your idea is to have a cooling device per CPU ("arm64: dts:

> > sdm845: wireup the thermal trip points to cpufreq" /

> > https://lore.kernel.org/patchwork/patch/1030742/), however that

> > doesn't work as intended. Only two cpufreq 'devices' are created,

> > one for CPU0 and one for CPU4. In consequence cpufreq->ready() only

> > runs for these cores and only two cooling devices are

> > registered. Since the cores of a cluster all run at the same

> > frequency I also doubt if having multiple cooling devices would

> > bring any benefits.

> 

> I actually only intended for two cooling devices - one for each

> frequency domain. I'll clarify it better in the patch description.


Viresh helped me understand that we currently need to add cooling
device entries for all CPUs to the DT, even though at most one will be
active per freq domain at any time (I wonder if this could be changed
though).

Independently of the cooling devices, is there any value in having a
thermal zone for each CPU instead of having only one per freq domain?

Thanks

Matthias
Amit Kucheria Jan. 10, 2019, 8:06 p.m. UTC | #4
On Thu, Jan 10, 2019 at 5:59 AM Stephen Boyd <swboyd@chromium.org> wrote:
>

> Quoting Amit Kucheria (2019-01-09 16:00:55)

> > 75 degrees is too aggressive for throttling the CPU. After speaking to

> > Qualcomm engineers, increase it to 95 degrees.

> >

> > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

> > ---

> >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------

>

> Is the plan that these are some defaults that would be adjusted by board

> variants? Just curious why we have anything in here and don't punt it

> all to each board dts file.


These values are meant to be safe values for silicon operation as
characterised[1] by the HW team. So I'd consider them a more than just
default values. It also gives future boards a safe starting point.

IMHO it would be better to have the characterized values merged
upstream and if really required, those values can be tweaked in the
board-specific DTS.

[1] Caveat: The characterisation probably focused on mobile devices
and might change depending on form-factor, active cooling and heat
dissipation design but those differences should only impact the skin
temperature, not the operation of the SoC itself.
Viresh Kumar Jan. 11, 2019, 3:32 a.m. UTC | #5
On 10-01-19, 12:00, Matthias Kaehlcke wrote:
> Viresh helped me understand that we currently need to add cooling

> device entries for all CPUs to the DT, even though at most one will be

> active per freq domain at any time (I wonder if this could be changed

> though).


Actually we were only adding cooling-cells in CPU0 until now and I
fixed that, so that is going to stay :)

The idea is that the hardware should be described properly and not
partially. Even if all the CPUs are part of the same freq-domain, they
are all capable of being a cooling device here and the DT should
describe that. Kernel will ofcourse create a single cooling device.

-- 
viresh
Matthias Kaehlcke Jan. 11, 2019, 6:30 p.m. UTC | #6
On Fri, Jan 11, 2019 at 03:54:23PM +0530, Amit Kucheria wrote:
> On Thu, Jan 10, 2019 at 6:45 AM Matthias Kaehlcke <mka@chromium.org> wrote:

> >

> > Hi Amit,

> >

> > On Thu, Jan 10, 2019 at 05:30:55AM +0530, Amit Kucheria wrote:

> > > 75 degrees is too aggressive for throttling the CPU. After speaking to

> > > Qualcomm engineers, increase it to 95 degrees.

> > >

> > > Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>

> > > ---

> > >  arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++--------

> > >  1 file changed, 8 insertions(+), 8 deletions(-)

> > >

> > > diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > index c27cbd3bcb0a..29e823b0caf4 100644

> > > --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi

> > > @@ -1692,7 +1692,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert0: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1713,7 +1713,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert1: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1734,7 +1734,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert2: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1755,7 +1755,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert3: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1776,7 +1776,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert4: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1797,7 +1797,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert5: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1818,7 +1818,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert6: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> > > @@ -1839,7 +1839,7 @@

> > >

> > >                       trips {

> > >                               cpu_alert7: trip0 {

> > > -                                     temperature = <75000>;

> > > +                                     temperature = <95000>;

> > >                                       hysteresis = <2000>;

> > >                                       type = "passive";

> > >                               };

> >

> > The change itself looks good to me, however I wonder if it would be

> > worth to eliminate redundancy and merge the current 8 thermal zones

> > into 2, one for the Silver and one for the Gold cluster (as done by

> > http://crrev.com/c/1381752). There is a single cooling device for

> > each cluster, so it's not clear to me if there is any gain from having

> > a separate thermal zone for each CPU. If it is important to monitor

> > the temperatures of the individual cores this can still be done by

> > configuring the thermal zone of the cluster with multiple thermal

> > sensors.

> 

> Reducing the number of thermal zones to 2 (by grouping 4 sensors per

> zone) is not possible due a limitation of the thermal framework[1]. It

> is something that we want to address. Previous attempts to fix this

> were rejected for various reasons. Eduardo was going to share a way to

> have more flexible mapping between sensors and zones after discussions

> at LPC.


I wasn't aware of this limitation, thanks for the clarification! With
this I understand that for now we indeed need the 8 thermal zones with
all the redundant information :(

> <nag> Eduardo, do you have anything we can review? </nag> :-)

> 

> Having said that, we'll need some aggregation functions when we add

> multiple sensors to a zone (e.g. max, mean) to reflect the zone. This

> will lose information about hotspots and prevent things like idle

> injection on a particular CPU that is causing most of the heat in the

> aggregated zone. So IMHO, it might be useful to have information about

> the hotspots (i.e TZ per sensor) and aggregated values (ambient

> temperature) that can be fed to the thermal policy.


Ok, it seems for now we need the 8 thermal zones in any case, when
support for multiple sensors becomes available we can evaluate whether
it's worth to change that or not.

Cheers

Matthias
diff mbox series

Patch

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index c27cbd3bcb0a..29e823b0caf4 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -1692,7 +1692,7 @@ 
 
 			trips {
 				cpu_alert0: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1713,7 +1713,7 @@ 
 
 			trips {
 				cpu_alert1: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1734,7 +1734,7 @@ 
 
 			trips {
 				cpu_alert2: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1755,7 +1755,7 @@ 
 
 			trips {
 				cpu_alert3: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1776,7 +1776,7 @@ 
 
 			trips {
 				cpu_alert4: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1797,7 +1797,7 @@ 
 
 			trips {
 				cpu_alert5: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1818,7 +1818,7 @@ 
 
 			trips {
 				cpu_alert6: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};
@@ -1839,7 +1839,7 @@ 
 
 			trips {
 				cpu_alert7: trip0 {
-					temperature = <75000>;
+					temperature = <95000>;
 					hysteresis = <2000>;
 					type = "passive";
 				};