diff mbox series

[4/4] cpu_cooling: Drop static-power related stuff

Message ID fcca009e0e10840ff5ee31c88b618a212ab69201.1510735482.git.viresh.kumar@linaro.org
State Superseded
Headers show
Series cpu_cooling: cooling dev registration cleanups | expand

Commit Message

Viresh Kumar Nov. 15, 2017, 9:19 a.m. UTC
No one has used it for the last two and half years (since it was
introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the
power cooling device API")), get rid of it.

Cc: Javi Merino <javi.merino@arm.com>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

---
 drivers/thermal/cpu_cooling.c  | 106 +++++------------------------------------
 include/linux/cpu_cooling.h    |   3 --
 include/trace/events/thermal.h |  10 ++--
 3 files changed, 16 insertions(+), 103 deletions(-)

-- 
2.15.0.194.g9af6a3dea062

Comments

Viresh Kumar Nov. 15, 2017, 11:25 a.m. UTC | #1
On 15-11-17, 11:18, Daniel Lezcano wrote:
> Even if I agree that is not used to in the mainstream kernel, it is part

> of the EAS which is currently merged in Android.


Okay, I had chat with Daniel offline after looking at Android sources
and here are the key-points:
- No one is using this feature currently in Mainline or Android kernel
  tree.
- There may be some SoC vendors who are using it on their local repos,
  but we don't have much details on that.
- For EAS (Energy aware scheduling), we want to take static power of
  the CPUs into account and that is a TODO thing right now.

What worries me is that this code got merged almost 2.5 years back and
no one used it. Yeah, it looks interesting stuff but what's the
guarantee that anyone is going to use it in near future ?

@Javi, @Puneet, @Lukasz: As it was added by Javi (from ARM), can
someone from ARM write a patch for the arm big LITTLE cpufreq driver
(Juno?) and use this thing?

Else I am not sure how long we want to keep code in kernel that no one
touches.

And of course, we can revert this commit later on if required without
much trouble hopefully.

-- 
viresh
Javi Merino Nov. 15, 2017, 11:31 a.m. UTC | #2
On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote:
> No one has used it for the last two and half years (since it was

> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

> power cooling device API")), get rid of it.


Linaro used it in lsk 3.18 for the cpufreq driver for Juno.  The cpufreq
driver was converted to the generic one from dt in lsk 4.4, but the
generic cpufreq driver does not support static power because everything
has to come from device tree and we don't have a way to specify it there.

Cheers,
Javi
Daniel Lezcano Nov. 15, 2017, 11:39 a.m. UTC | #3
On 15/11/2017 12:31, Javi Merino wrote:
> On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote:

>> No one has used it for the last two and half years (since it was

>> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

>> power cooling device API")), get rid of it.

> 

> Linaro used it in lsk 3.18 for the cpufreq driver for Juno.  The cpufreq

> driver was converted to the generic one from dt in lsk 4.4, but the

> generic cpufreq driver does not support static power because everything

> has to come from device tree and we don't have a way to specify it there.


Are in favor of removing it or improving the code ?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Eduardo Valentin Nov. 15, 2017, 3:09 p.m. UTC | #4
On Wed, Nov 15, 2017 at 12:39:36PM +0100, Daniel Lezcano wrote:
> On 15/11/2017 12:31, Javi Merino wrote:

> > On Wed, Nov 15, 2017 at 02:49:48PM +0530, Viresh Kumar wrote:

> >> No one has used it for the last two and half years (since it was

> >> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

> >> power cooling device API")), get rid of it.

> > 

> > Linaro used it in lsk 3.18 for the cpufreq driver for Juno.  The cpufreq

> > driver was converted to the generic one from dt in lsk 4.4, but the

> > generic cpufreq driver does not support static power because everything

> > has to come from device tree and we don't have a way to specify it there.

> 

> Are in favor of removing it or improving the code ?


Yes, to what I can remember, Juno driver originally used this and it was
supposed to be a reference on who this stuff gets to be used. So, this
is more like "the code never made mainline" instead of no one is using.

And to be frank, this is the API that represents static power, which is
the component that differentiate things while using IPA. Maybe 

So yes, my suggestion is to put effort to get the juno code that uses
the static power back to mainline.

> 

> 

> -- 

>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

> 

> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |

> <http://twitter.com/#!/linaroorg> Twitter |

> <http://www.linaro.org/linaro-blog/> Blog

>
Eduardo Valentin Nov. 15, 2017, 3:43 p.m. UTC | #5
On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote:
> On 15/11/2017 10:19, Viresh Kumar wrote:

> > No one has used it for the last two and half years (since it was

> > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

> > power cooling device API")), get rid of it.

> > 

> > Cc: Javi Merino <javi.merino@arm.com>

> > Cc: Punit Agrawal <punit.agrawal@arm.com>

> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> > ---

> 

> Even if I agree that is not used to in the mainstream kernel, it is part

> of the EAS which is currently merged in Android.

> 


Even though we really should care about stuff that is in mainline, this
specific case is about a piece of code that never made mainline, or got
lost on translation from one version to another. So, I am currently
nacking this patch and asking ARM/linaro folks to upstream the juno
implementation that uses static power.

> -- 

>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

> 

> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |

> <http://twitter.com/#!/linaroorg> Twitter |

> <http://www.linaro.org/linaro-blog/> Blog

>
Rafael J. Wysocki Nov. 15, 2017, 6:17 p.m. UTC | #6
On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote:
> On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote:

>> On 15/11/2017 10:19, Viresh Kumar wrote:

>> > No one has used it for the last two and half years (since it was

>> > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

>> > power cooling device API")), get rid of it.

>> >

>> > Cc: Javi Merino <javi.merino@arm.com>

>> > Cc: Punit Agrawal <punit.agrawal@arm.com>

>> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

>> > ---

>>

>> Even if I agree that is not used to in the mainstream kernel, it is part

>> of the EAS which is currently merged in Android.

>>

>

> Even though we really should care about stuff that is in mainline, this

> specific case is about a piece of code that never made mainline, or got

> lost on translation from one version to another. So, I am currently

> nacking this patch and asking ARM/linaro folks to upstream the juno

> implementation that uses static power.


However, I would like to see a clear declaration from whoever is
maintaining that code today that there is a plan in place to upstream
it and that this plan will actually be acted on.  And, better yet,
*when* that can be expected to happen.

Without such a declaration I'm afraid there is no point for the
mainline to carry the unused code.  Which apparently gets in the way
somehow, or Viresh wouldn't have taken the time to attempt to remove
it I suppose?

Thanks,
Rafael
Eduardo Valentin Nov. 15, 2017, 6:20 p.m. UTC | #7
On Wed, Nov 15, 2017 at 07:17:49PM +0100, Rafael J. Wysocki wrote:
> On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote:

> > On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote:

> >> On 15/11/2017 10:19, Viresh Kumar wrote:

> >> > No one has used it for the last two and half years (since it was

> >> > introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

> >> > power cooling device API")), get rid of it.

> >> >

> >> > Cc: Javi Merino <javi.merino@arm.com>

> >> > Cc: Punit Agrawal <punit.agrawal@arm.com>

> >> > Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

> >> > ---

> >>

> >> Even if I agree that is not used to in the mainstream kernel, it is part

> >> of the EAS which is currently merged in Android.

> >>

> >

> > Even though we really should care about stuff that is in mainline, this

> > specific case is about a piece of code that never made mainline, or got

> > lost on translation from one version to another. So, I am currently

> > nacking this patch and asking ARM/linaro folks to upstream the juno

> > implementation that uses static power.

> 

> However, I would like to see a clear declaration from whoever is

> maintaining that code today that there is a plan in place to upstream

> it and that this plan will actually be acted on.  And, better yet,

> *when* that can be expected to happen.

> 

> Without such a declaration I'm afraid there is no point for the

> mainline to carry the unused code.  Which apparently gets in the way

> somehow, or Viresh wouldn't have taken the time to attempt to remove

> it I suppose?



I agree here. This is mostly a code maintained by the linaro folks at
this moment (daniel, please chime in if I am wrong). If no effort is
done to get the code into mainline, there is no point in keeping the
static component as a dead code in our tree.

> 

> Thanks,

> Rafael
Viresh Kumar Nov. 16, 2017, 2:47 a.m. UTC | #8
On 15-11-17, 19:17, Rafael J. Wysocki wrote:
> However, I would like to see a clear declaration from whoever is

> maintaining that code today that there is a plan in place to upstream

> it and that this plan will actually be acted on.  And, better yet,

> *when* that can be expected to happen.


Exactly what I have been advocating.

And there is bunch of other places where such examples can be seen.
For example multiple regulator support in the OPP framework, which I
added an year ago hasn't seen a user yet. And I am pushing the TI guys
(who wanted it badly) to upstream their code before we remove that as
well :)

Again, someone has to come up and take responsibility of getting
static power platform support upstream in a definite amount of time.

-- 
viresh
Ionela Voinescu Nov. 16, 2017, 3:02 p.m. UTC | #9
Hi guys,

On 15/11/17 18:20, Eduardo Valentin wrote:
> On Wed, Nov 15, 2017 at 07:17:49PM +0100, Rafael J. Wysocki wrote:

>> On Wed, Nov 15, 2017 at 4:43 PM, Eduardo Valentin <edubezval@gmail.com> wrote:

>>> On Wed, Nov 15, 2017 at 11:18:03AM +0100, Daniel Lezcano wrote:

>>>> On 15/11/2017 10:19, Viresh Kumar wrote:

>>>>> No one has used it for the last two and half years (since it was

>>>>> introduced by commit c36cf0717631 ("thermal: cpu_cooling: implement the

>>>>> power cooling device API")), get rid of it.

>>>>>

>>>>> Cc: Javi Merino <javi.merino@arm.com>

>>>>> Cc: Punit Agrawal <punit.agrawal@arm.com>

>>>>> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

>>>>> ---

>>>>

>>>> Even if I agree that is not used to in the mainstream kernel, it is part

>>>> of the EAS which is currently merged in Android.

>>>>

>>>

>>> Even though we really should care about stuff that is in mainline, this

>>> specific case is about a piece of code that never made mainline, or got

>>> lost on translation from one version to another. So, I am currently

>>> nacking this patch and asking ARM/linaro folks to upstream the juno

>>> implementation that uses static power.

>>

>> However, I would like to see a clear declaration from whoever is

>> maintaining that code today that there is a plan in place to upstream

>> it and that this plan will actually be acted on.  And, better yet,

>> *when* that can be expected to happen.

>>

>> Without such a declaration I'm afraid there is no point for the

>> mainline to carry the unused code.  Which apparently gets in the way

>> somehow, or Viresh wouldn't have taken the time to attempt to remove

>> it I suppose?

> 

> 

> I agree here. This is mostly a code maintained by the linaro folks at

> this moment (daniel, please chime in if I am wrong). If no effort is

> done to get the code into mainline, there is no point in keeping the

> static component as a dead code in our tree.

> 


When it was added in lsk 3.18 in what was then a thermal driver for Juno
it was believed to have an effect in thermal mitigation, but that was
not proven later as to justify posting it upstream, and that is why the
code never made it in mainline.
The code added there can be found at:
https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

As for removing this code now, I would not want to make that decision without
spending more time to check if it impacts other customer codelines.
I'll come back with a reply to this in the next couple of days.

Thank you,
Ionela.

>>

>> Thanks,

>> Rafael

>
Viresh Kumar Nov. 16, 2017, 3:20 p.m. UTC | #10
On 16-11-17, 15:02, Ionela Voinescu wrote:
> When it was added in lsk 3.18 in what was then a thermal driver for Juno

> it was believed to have an effect in thermal mitigation, but that was

> not proven later as to justify posting it upstream, and that is why the

> code never made it in mainline.

> The code added there can be found at:

> https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

> 

> As for removing this code now, I would not want to make that decision without

> spending more time to check if it impacts other customer codelines.

> I'll come back with a reply to this in the next couple of days.


Sure, we can wait for few days definitely :)

-- 
viresh
Rafael J. Wysocki Nov. 16, 2017, 11:31 p.m. UTC | #11
On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote:
> On 16-11-17, 15:02, Ionela Voinescu wrote:

> > When it was added in lsk 3.18 in what was then a thermal driver for Juno

> > it was believed to have an effect in thermal mitigation, but that was

> > not proven later as to justify posting it upstream, and that is why the

> > code never made it in mainline.

> > The code added there can be found at:

> > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

> > 

> > As for removing this code now, I would not want to make that decision without

> > spending more time to check if it impacts other customer codelines.

> >

> > I'll come back with a reply to this in the next couple of days.

> 

> Sure, we can wait for few days definitely :)


While the above certainly is true, it doesn't matter whether or not any
out-of-the-tree code will be affected by removing this from the mainline.

What matters is *only* whether or not anyone is going to add anything
depending on it to the mainline any time soon.  If that's not the case,
the stuff goes away (and may be added back in the future if need be).

To avoid delaying this indefinitely, let's make a deal as follows.

Unless anyone with some changes targeted at the mainline and depending on the
code in question shows up before the end of the merge window currently under
way, I will queue up the patches from Viresh for 4.16.  Then, there will be
8 weeks (or more) of time before the 4.16 merge window opens to drop them if
any new material depending on the code removed by them materializes in the
meantime.

Thanks,
Rafael
Eduardo Valentin Nov. 16, 2017, 11:44 p.m. UTC | #12
Hello,

On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote:
> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote:

> > On 16-11-17, 15:02, Ionela Voinescu wrote:

> > > When it was added in lsk 3.18 in what was then a thermal driver for Juno

> > > it was believed to have an effect in thermal mitigation, but that was

> > > not proven later as to justify posting it upstream, and that is why the

> > > code never made it in mainline.



Really?  Do you have data that can be shared to back up the above
statement? 

Last time I checked, not only in Juno, static power does have
a non-negligible contribution. Neglecting static power can essentially
make IPA to undershoot in many cases to eventually overshoot. And that
is what I recollect from the data that I was presented, even for getting
this code reviewed. Just do not recollect to have the link to it.

> > > The code added there can be found at:

> > > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

> > > 

> > > As for removing this code now, I would not want to make that decision without

> > > spending more time to check if it impacts other customer codelines.

> > >


Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to
really publish static power to mainline Linux instead of really having
the benefit of modeling it? Even simple models based on temperature
ranges would be better than only using dynamic power model.

> > > I'll come back with a reply to this in the next couple of days.

> > 

> > Sure, we can wait for few days definitely :)

> 

> While the above certainly is true, it doesn't matter whether or not any

> out-of-the-tree code will be affected by removing this from the mainline.

> 

> What matters is *only* whether or not anyone is going to add anything

> depending on it to the mainline any time soon.  If that's not the case,

> the stuff goes away (and may be added back in the future if need be).

> 

> To avoid delaying this indefinitely, let's make a deal as follows.

> 

> Unless anyone with some changes targeted at the mainline and depending on the

> code in question shows up before the end of the merge window currently under

> way, I will queue up the patches from Viresh for 4.16.  Then, there will be

> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if

> any new material depending on the code removed by them materializes in the

> meantime.



Sure, as I mentioned before, if we failed to convince SoC manufacturers
to provide valid models, makes no sense to keep dead code in the tree,
you have my support and you can add my:

Acked-by: Eduardo Valentin <edubezval@gmail.com>


I am just not convinced that this is really about the static power
being negligible for IPA.

> 

> Thanks,

> Rafael

>
Daniel Lezcano Nov. 17, 2017, 7:55 a.m. UTC | #13
On 16/11/2017 03:47, Viresh Kumar wrote:
> On 15-11-17, 19:17, Rafael J. Wysocki wrote:

>> However, I would like to see a clear declaration from whoever is

>> maintaining that code today that there is a plan in place to upstream

>> it and that this plan will actually be acted on.  And, better yet,

>> *when* that can be expected to happen.

> 

> Exactly what I have been advocating.

> 

> And there is bunch of other places where such examples can be seen.

> For example multiple regulator support in the OPP framework, which I

> added an year ago hasn't seen a user yet. And I am pushing the TI guys

> (who wanted it badly) to upstream their code before we remove that as

> well :)

> 

> Again, someone has to come up and take responsibility of getting

> static power platform support upstream in a definite amount of time.


Instead of removing entirely the code, why not convert this to a DT
based info and put the Juno values back ?


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Punit Agrawal Nov. 17, 2017, 11:02 a.m. UTC | #14
Hi Eduardo,

Eduardo Valentin <edubezval@gmail.com> writes:

> Hello,

>

> On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote:

>> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote:

>> > On 16-11-17, 15:02, Ionela Voinescu wrote:

>> > > When it was added in lsk 3.18 in what was then a thermal driver for Juno

>> > > it was believed to have an effect in thermal mitigation, but that was

>> > > not proven later as to justify posting it upstream, and that is why the

>> > > code never made it in mainline.

>

>

> Really?  Do you have data that can be shared to back up the above

> statement?

>

> Last time I checked, not only in Juno, static power does have

> a non-negligible contribution. Neglecting static power can essentially

> make IPA to undershoot in many cases to eventually overshoot. And that

> is what I recollect from the data that I was presented, even for getting

> this code reviewed. Just do not recollect to have the link to it.


Just to make sure we are on the same page - static power can be a
significant portion of SoC power consumption. It varies widely across
SoCs and from experience depends on things like fabrication process,
ambient temperature, applied voltage/current drain, etc. There are many
SoCs where static power is a significant part of power consumption and
needs to be modelled for good thermal control.

Specifically in the case of Juno - we'd done some thermal and
performance benchmarking when working on IPA. This included implementing
a static power model to understand and test it's impact. If memory
serves me right static power was approximately in the 10-15%
range. Unfortunately, I can't find any datasets to support or disprove
this claim.

My take away from the Juno experiment was that it is not a particularly
thermally constrained system. At the least it took many 10s of seconds
running at max frequency (both clusters and the GPU) with the case fan
turned off for it to see a 10C increase. So the lack of a static power
model wasn't affecting it's thermal control.

But this situation is only true for Juno. More below...

>

>> > > The code added there can be found at:

>> > > https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

>> > > 

>> > > As for removing this code now, I would not want to make that decision without

>> > > spending more time to check if it impacts other customer codelines.

>> > >

>

> Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to

> really publish static power to mainline Linux instead of really having

> the benefit of modeling it? Even simple models based on temperature

> ranges would be better than only using dynamic power model.


I know of a few SoCs implementing static power model in their kernel
(not mainline). It would be great for that code to hit mainline. But as
with all the out of tree code, I'm not sure have much influence we have
in making that happen.

Again, I don't think we are arguing about the importance of static power
in a power model based thermal control strategy like IPA.

>

>> > > I'll come back with a reply to this in the next couple of days.

>> > 

>> > Sure, we can wait for few days definitely :)

>> 

>> While the above certainly is true, it doesn't matter whether or not any

>> out-of-the-tree code will be affected by removing this from the mainline.

>> 

>> What matters is *only* whether or not anyone is going to add anything

>> depending on it to the mainline any time soon.  If that's not the case,

>> the stuff goes away (and may be added back in the future if need be).

>> 

>> To avoid delaying this indefinitely, let's make a deal as follows.

>> 

>> Unless anyone with some changes targeted at the mainline and depending on the

>> code in question shows up before the end of the merge window currently under

>> way, I will queue up the patches from Viresh for 4.16.  Then, there will be

>> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if

>> any new material depending on the code removed by them materializes in the

>> meantime.

>

>

> Sure, as I mentioned before, if we failed to convince SoC manufacturers

> to provide valid models, makes no sense to keep dead code in the tree,

> you have my support and you can add my:

>

> Acked-by: Eduardo Valentin <edubezval@gmail.com>

>

> I am just not convinced that this is really about the static power

> being negligible for IPA.


Just to reiterate once more, we are in agreement here. :)

I'd like to think this patchset has come out of a plan to develop
functionality on top but I don't know if that is the case.

>

>> 

>> Thanks,

>> Rafael

>>
Viresh Kumar Nov. 17, 2017, 11:06 a.m. UTC | #15
On 17-11-17, 11:02, Punit Agrawal wrote:
> I'd like to think this patchset has come out of a plan to develop

> functionality on top but I don't know if that is the case.


If you are talking about my patch series (?), then no, I don't have
any plans on top of it. I just cleaned things up a bit. Nothing more
:)

-- 
viresh
Ionela Voinescu Nov. 21, 2017, 11:30 a.m. UTC | #16
Hi guys,

On 17/11/17 11:02, Punit Agrawal wrote:
> Hi Eduardo,

> 

> Eduardo Valentin <edubezval@gmail.com> writes:

> 

>> Hello,

>>

>> On Fri, Nov 17, 2017 at 12:31:41AM +0100, Rafael J. Wysocki wrote:

>>> On Thursday, November 16, 2017 4:20:58 PM CET Viresh Kumar wrote:

>>>> On 16-11-17, 15:02, Ionela Voinescu wrote:

>>>>> When it was added in lsk 3.18 in what was then a thermal driver for Juno

>>>>> it was believed to have an effect in thermal mitigation, but that was

>>>>> not proven later as to justify posting it upstream, and that is why the

>>>>> code never made it in mainline.

>>

>>

>> Really?  Do you have data that can be shared to back up the above

>> statement?

>>

>> Last time I checked, not only in Juno, static power does have

>> a non-negligible contribution. Neglecting static power can essentially

>> make IPA to undershoot in many cases to eventually overshoot. And that

>> is what I recollect from the data that I was presented, even for getting

>> this code reviewed. Just do not recollect to have the link to it.

> 

> Just to make sure we are on the same page - static power can be a

> significant portion of SoC power consumption. It varies widely across

> SoCs and from experience depends on things like fabrication process,

> ambient temperature, applied voltage/current drain, etc. There are many

> SoCs where static power is a significant part of power consumption and

> needs to be modelled for good thermal control.

> 

> Specifically in the case of Juno - we'd done some thermal and

> performance benchmarking when working on IPA. This included implementing

> a static power model to understand and test it's impact. If memory

> serves me right static power was approximately in the 10-15%

> range. Unfortunately, I can't find any datasets to support or disprove

> this claim.

> 

> My take away from the Juno experiment was that it is not a particularly

> thermally constrained system. At the least it took many 10s of seconds

> running at max frequency (both clusters and the GPU) with the case fan

> turned off for it to see a 10C increase. So the lack of a static power

> model wasn't affecting it's thermal control.

> 

> But this situation is only true for Juno. More below...

> 


Sorry, I did not mean to say that static power is irrelevant. I only
specified that for Juno, the values used in the lsk3.18 kernel did not
have a significant effect in thermal mitigation, as Punit detailed here
and below.

Talking generically, IPA uses approximate values for dynamic power
consumption (due to approximate values for the dynamic power
coefficient, frequency at the end of the analysis window, an approximate
value for utilization), approximate values for sustainable power in
within a thermal zone, and approximate values for static power. A lot of
my own experiments so far showed that IPA can compensate very well for
inaccuracies in all of these due to the PID control loop but it pays the
price in the stability of its signal when they are way off.

As you also mention, when the accuracy of these values is neglected this
can result in an inefficient ramp up and seesaw effects (undershoots and
overshoots). That's why is very important for IPA to be properly tuned
and I would not suggest that any of these should be neglected, but to be
as accurate as possible.

But there is only so much accuracy that you can achieve given the high
cost of added complexity: static power and the dynamic power coefficient
depend on PVT which sometimes cannot easily be factored in, the
utilization that scales the dynamic power cannot be easily tracked
especially for clusters of CPUs and given inaccurate estimations of idle
states, etc.

This being said, I believe there are platforms out there where the
static power cost might be much too expensive to model for the gain it
brings in the stability and accuracy of IPA power request estimation and
allocation. Also, as you pointed out, there is reluctance in sharing
these models.

>>

>>>>> The code added there can be found at:

>>>>> https://git.linaro.org/landing-teams/working/arm/kernel-release.git/tree/drivers/thermal/scpi-thermal.c?h=lsk-3.18-armlt#n247

>>>>>

>>>>> As for removing this code now, I would not want to make that decision without

>>>>> spending more time to check if it impacts other customer codelines.

>>>>>

>>

>> Maybe this is a matter of Linaro/ARM failing to convince SoC vendors to

>> really publish static power to mainline Linux instead of really having

>> the benefit of modeling it? Even simple models based on temperature

>> ranges would be better than only using dynamic power model.

> 

> I know of a few SoCs implementing static power model in their kernel

> (not mainline). It would be great for that code to hit mainline. But as

> with all the out of tree code, I'm not sure have much influence we have

> in making that happen.

> 

> Again, I don't think we are arguing about the importance of static power

> in a power model based thermal control strategy like IPA.

> 

>>

>>>>> I'll come back with a reply to this in the next couple of days.

>>>>

>>>> Sure, we can wait for few days definitely :)

>>>

>>> While the above certainly is true, it doesn't matter whether or not any

>>> out-of-the-tree code will be affected by removing this from the mainline.

>>>

>>> What matters is *only* whether or not anyone is going to add anything

>>> depending on it to the mainline any time soon.  If that's not the case,

>>> the stuff goes away (and may be added back in the future if need be).

>>>

>>> To avoid delaying this indefinitely, let's make a deal as follows.

>>>

>>> Unless anyone with some changes targeted at the mainline and depending on the

>>> code in question shows up before the end of the merge window currently under

>>> way, I will queue up the patches from Viresh for 4.16.  Then, there will be

>>> 8 weeks (or more) of time before the 4.16 merge window opens to drop them if

>>> any new material depending on the code removed by them materializes in the

>>> meantime.

>>

>>

>> Sure, as I mentioned before, if we failed to convince SoC manufacturers

>> to provide valid models, makes no sense to keep dead code in the tree,

>> you have my support and you can add my:

>>

>> Acked-by: Eduardo Valentin <edubezval@gmail.com>

>>

>> I am just not convinced that this is really about the static power

>> being negligible for IPA.

> 


Static power is definitely significant for IPA as is the accuracy of
all other values that contribute to the power request and allocation
calculations. 

For me, part the decision or whether to remove or to keep this code is
about how much use we can make of it now or in the future, as it stands.

Information on static power, depending on platform and achievable
accuracy, can be as simple as a DT value (I would definitely not
recommend this to be the only supported source), a more complex callback
or maybe a value provided by firmware where the mechanism to obtain that
value is hidden.
A DT model would be easy to support with the current code but it would
be very inaccurate.
More complex algorithms provided as callbacks should give the possibility 
for platform specific customization which lacks at the moment. But even
if it was supported, SoC manufactures are usually reluctant to share that 
information.
Passing information from firmware would allow for platform specific 
customization as well as hiding the mechanism through which is obtained,
but there's no standard way to obtain this value at the moment (probably
can be added to the OPP framework in the future).

Another factor to consider is the imbalance between cpufreq and devfreq
cooling devices (devfreq cooling devices are still able to provide
static power information) that removing this code creates.

Therefore, I would rather see this interface extended and not removed
completely, in order to give users the possibility to link a source of
information more appropriate for them. And it should all start with 
support for one platform. But I can't volunteer my time in short term 
to make these changes. So I can ack these patches for now, as the 
justification for cleaning this code is correct, but sooner or later 
better support for providing static power information should and will 
be added.

Best regards,
Ionela.

> Just to reiterate once more, we are in agreement here. :)

> 

> I'd like to think this patchset has come out of a plan to develop

> functionality on top but I don't know if that is the case.

> 

>>

>>>

>>> Thanks,

>>> Rafael

>>>

>
Lukasz Luba Nov. 21, 2017, 3:56 p.m. UTC | #17
On 21/11/17 14:06, Daniel Lezcano wrote:
> On 21/11/2017 12:30, Ionela Voinescu wrote:

>

> [ ... ]

>

>> A DT model would be easy to support with the current code but it would

>> be very inaccurate.

>

> Why ?

>

> [ ... ]

>

Hi all,

The DT solution won't fly, the reason can be found below.

I agree with Ionela and Punit that the Juno board is not
the best platform to test the static power impact on IPA.
In some other platforms the static power can be 50% or more
of the total power, so it cannot be neglected.

These are the issues.
The static power equation is complicated, here is one known to me.
The leakage function is exponentially influenced by current circuit
supply voltage, body-bias and some constants K_{4,5}.

P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+| 
V_{bs}|*I_{Ju}

It can also vary depending on technology (CMOS, FinFET, etc).

It would be really hard to approximate by i.e. a polynomial
function with inputs from DT. One size does not fit all.

The equation can also tell you some interesting things about
the manufacturing process. Exposing such information might be the last
thing the vendors want to.
That's why the vendors might want to implement whole
thermal management in the firmware or skip static power and
rely on IPA adaptation.
They can also use a different api in IPA, when they have some mechanism
to measure power in firmware, it can be feed into IPA.

Anyway, I would recommend to keep it as is, to have a complete
power model in the kernel.
The code without static power routines looks awkward to me.
 From my side - NACK for the patch which removes static power.

Regards,
Lukasz Luba
Eduardo Valentin Nov. 21, 2017, 4:57 p.m. UTC | #18
Hello folks,

On Tue, Nov 21, 2017 at 05:08:34PM +0100, Vincent Guittot wrote:
> Hi Lukasz

> 

> On 21 November 2017 at 16:56, Lukasz Luba <llu.ker.dev@gmail.com> wrote:

> > On 21/11/17 14:06, Daniel Lezcano wrote:

> >>

> >> On 21/11/2017 12:30, Ionela Voinescu wrote:

> >>

> >> [ ... ]

> >>

> >>> A DT model would be easy to support with the current code but it would

> >>> be very inaccurate.

> >>

> >>

> >> Why ?

> >>

> >> [ ... ]

> >>

> > Hi all,

> >

> > The DT solution won't fly, the reason can be found below.


The APIs being removed by this patch is exactly to cover for the
difficulty to model all static power cases. 


> >

> > I agree with Ionela and Punit that the Juno board is not

> > the best platform to test the static power impact on IPA.

> > In some other platforms the static power can be 50% or more

> > of the total power, so it cannot be neglected.

> >

> > These are the issues.

> > The static power equation is complicated, here is one known to me.

> > The leakage function is exponentially influenced by current circuit

> > supply voltage, body-bias and some constants K_{4,5}.

> >

> > P_{leak} = L_{g}*V_{dd}*K_{3}*e^{K_{4}*V_{dd}}*e^{K_{5}*V_{bs}}+|

> > V_{bs}|*I_{Ju}

> 

> You forgot one main contributor of static leakage: the temperature

> 



We all agree that it is hard to model static power, specially
considering all variables. And that today, ARM/Linaro failed to
convince vendors to expose this in mainline. So, ...

> >

> > It can also vary depending on technology (CMOS, FinFET, etc).

> >

> > It would be really hard to approximate by i.e. a polynomial

> > function with inputs from DT. One size does not fit all.

> 

> But can't we linearized around the target temp ? that were we want to

> be accurate

> 

> Regards,

> >

> > The equation can also tell you some interesting things about

> > the manufacturing process. Exposing such information might be the last

> > thing the vendors want to.

> > That's why the vendors might want to implement whole

> > thermal management in the firmware or skip static power and

> > rely on IPA adaptation.

> > They can also use a different api in IPA, when they have some mechanism

> > to measure power in firmware, it can be feed into IPA.

> >



The lack of code in mainline, is not really because the API would not
help IPA, but because 2.5y after this has been merged, vendors were not
convinced to push a model, even if simple, to mainline.

> > Anyway, I would recommend to keep it as is, to have a complete

> > power model in the kernel.

> > The code without static power routines looks awkward to me.

> > From my side - NACK for the patch which removes static power.

> >



However, we cannot NACK just because we like the code :-). Nor we can
NACK because vendors keep their code in Android tree somewhere, or in
any other tree. Viresh has a point, if one looks at this code today in
mainline, no one is using, it is a dead code, doesn't matter what is out
of the tree using it.

As I said before, the minimal you guys (ARM and Linaro) can do is to at
least upstream the Juno code! as a reference. Come on guys?  what is
preventing you to upstream Juno model? As already discussed in this
thread, we know Juno won't be the best platform to benefit for it, but
it has a static power component, and for sure behaves better with the
static power model than only with dynamic power.

> > Regards,

> > Lukasz Luba
Daniel Lezcano Nov. 21, 2017, 5:49 p.m. UTC | #19
On 21/11/2017 18:09, Eduardo Valentin wrote:

[ ... ]

> Well, I really do not see the point of a ask to extend the current API

> if have no single user of it. What are the current users problems with

> the API? What needs to be improved? What are the problems? We cannot

> tell, guess why? We have no users of it!

> 

> Once again, do we have a reference platform? Oh, yes, that is Juno, and

> not even that is in mainline code.

> 

> Folks, this can be a very nice discussion on how we can over engineer

> this API, but being pragmatic and avoiding wasting our time here, we all

> know what needs to be done. Dead code in mainline is hard to maintain.

> API to support out of the tree code is even harder. I am surprised to

> see this code was able to sustain itself in mainline with none

> challenging it for 2.5 y. So, we either get you guys to upstream at

> least one user of it or we just move on and remove the API, and keep

> mainline with only dynamic power, with periodic undershoots and

> overshoots. It is a simple decision: you either mainline it or keep IPA

> MORE inaccurate and take the burden to keep vendor own implementation

> of APIs for static power model on each product cycle, you choose.


+1 to remove it.


-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Javi Merino Nov. 21, 2017, 6 p.m. UTC | #20
Hi,

On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

[snip]

> As I said before, the minimal you guys (ARM and Linaro) can do is to at

> least upstream the Juno code! as a reference. Come on guys?  what is

> preventing you to upstream Juno model?


As Ionela pointed out earlier in the thread, the cpufreq driver for Juno
was not acceptable for mainline because it used platform specific code.
When it was converted to cpufreq-dt, the static power was left behind
because it can't be represented in device tree.  This is because there
isn't a function that works for every SoC, different process nodes
(among other things) will need different functions.  So it can't be just
a bunch of coefficients in DT, we need a function.  Hence the callback.

In a nutshell, mainline does not want platform specific code, but we
haven't figured out how to calculate static power without platform
specific code.

Cheers,
Javi
Daniel Lezcano Nov. 21, 2017, 6:05 p.m. UTC | #21
On 21/11/2017 19:00, Javi Merino wrote:
> Hi,

> 

> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

> 

> [snip]

> 

>> As I said before, the minimal you guys (ARM and Linaro) can do is to at

>> least upstream the Juno code! as a reference. Come on guys?  what is

>> preventing you to upstream Juno model?

> 

> As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

> was not acceptable for mainline because it used platform specific code.

> When it was converted to cpufreq-dt, the static power was left behind

> because it can't be represented in device tree.  This is because there

> isn't a function that works for every SoC, different process nodes

> (among other things) will need different functions.  So it can't be just

> a bunch of coefficients in DT, we need a function.  Hence the callback.


The DT could contain the coef and a compatible string for a specific
polynomial computation callback. I imagine we should not have a lot of
different equations, no ?



-- 
 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog
Eduardo Valentin Nov. 21, 2017, 6:12 p.m. UTC | #22
On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote:
> Hi,

> 

> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

> 

> [snip]

> 

> > As I said before, the minimal you guys (ARM and Linaro) can do is to at

> > least upstream the Juno code! as a reference. Come on guys?  what is

> > preventing you to upstream Juno model?

> 

> As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

> was not acceptable for mainline because it used platform specific code.

> When it was converted to cpufreq-dt, the static power was left behind

> because it can't be represented in device tree.  This is because there


Or we still haven't figure what to represent. Once decided what to
represent, then we can claim if is doable in DT or not.


> isn't a function that works for every SoC, different process nodes

> (among other things) will need different functions.  So it can't be just

> a bunch of coefficients in DT, we need a function.  Hence the callback.


Well, DT is full of vendor specific stuff.

> 

> In a nutshell, mainline does not want platform specific code, but we

> haven't figured out how to calculate static power without platform

> specific code.


To, that is still fine to have it as a callback, as long as you have at
least one user! I still do not understand why Juno static power cannot
go as platform code that register the callback to implement the static
power model.

> 

> Cheers,

> Javi
Eduardo Valentin Nov. 21, 2017, 6:13 p.m. UTC | #23
On Tue, Nov 21, 2017 at 07:05:46PM +0100, Daniel Lezcano wrote:
> On 21/11/2017 19:00, Javi Merino wrote:

> > Hi,

> > 

> > On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

> > 

> > [snip]

> > 

> >> As I said before, the minimal you guys (ARM and Linaro) can do is to at

> >> least upstream the Juno code! as a reference. Come on guys?  what is

> >> preventing you to upstream Juno model?

> > 

> > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

> > was not acceptable for mainline because it used platform specific code.

> > When it was converted to cpufreq-dt, the static power was left behind

> > because it can't be represented in device tree.  This is because there

> > isn't a function that works for every SoC, different process nodes

> > (among other things) will need different functions.  So it can't be just

> > a bunch of coefficients in DT, we need a function.  Hence the callback.

> 

> The DT could contain the coef and a compatible string for a specific

> polynomial computation callback. I imagine we should not have a lot of

> different equations, no ?

> 


Yeah, that would be another way of doing it. If there is no equation
that correlates all processes, then we need a vendor specific entry, or
a compatible string, as Daniel said.

> 

> 

> -- 

>  <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

> 

> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |

> <http://twitter.com/#!/linaroorg> Twitter |

> <http://www.linaro.org/linaro-blog/> Blog

>
Lukasz Luba Nov. 21, 2017, 11:32 p.m. UTC | #24
On 21/11/17 19:13, Eduardo Valentin wrote:
> On Tue, Nov 21, 2017 at 07:05:46PM +0100, Daniel Lezcano wrote:

>> On 21/11/2017 19:00, Javi Merino wrote:

>>> Hi,

>>>

>>> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

>>>

>>> [snip]

>>>

>>>> As I said before, the minimal you guys (ARM and Linaro) can do is to at

>>>> least upstream the Juno code! as a reference. Come on guys?  what is

>>>> preventing you to upstream Juno model?

>>>

>>> As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

>>> was not acceptable for mainline because it used platform specific code.

>>> When it was converted to cpufreq-dt, the static power was left behind

>>> because it can't be represented in device tree.  This is because there

>>> isn't a function that works for every SoC, different process nodes

>>> (among other things) will need different functions.  So it can't be just

>>> a bunch of coefficients in DT, we need a function.  Hence the callback.

>>

>> The DT could contain the coef and a compatible string for a specific

>> polynomial computation callback. I imagine we should not have a lot of

>> different equations, no ?

>>

>

> Yeah, that would be another way of doing it. If there is no equation

> that correlates all processes, then we need a vendor specific entry, or

> a compatible string, as Daniel said.

>

So we have ~8 weeks (before it will vanish from mainline) to come up 
with ideas
or to show that it is needed and used by some platform.
Let's see...

Regards,
Lukasz
Viresh Kumar Nov. 22, 2017, 1:25 a.m. UTC | #25
On 21-11-17, 18:00, Javi Merino wrote:
> As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

> was not acceptable for mainline because it used platform specific code.


Can we get a link to that thread? I don't remember what I have commented earlier
but the above doesn't seem to be entirely true.

The basic idea is to use as much common stuff as possible and so to use
cpufreq-dt.c if possible. Its not that we are against platform specific bits,
they are fine if they are really required.

> When it was converted to cpufreq-dt, the static power was left behind

> because it can't be represented in device tree.  This is because there

> isn't a function that works for every SoC, different process nodes

> (among other things) will need different functions.  So it can't be just

> a bunch of coefficients in DT, we need a function.  Hence the callback.


Sure thing. And we can make this happen if we need. We aren't blocking it.

> In a nutshell, mainline does not want platform specific code, but we


Not really. We don't want platform specific code in arch/arm64, but we can have
that in drivers/opp/ for example if required.

Please start a discussion (in a separate thread if you want) and we can get
cpufreq support updated for Juno very easily.

And don't worry about this patch here. We can surely drop the patch if someone
is serious enough to start using it. But there needs to be a commitment, nothing
more.

-- 
viresh
Sudeep Holla Nov. 22, 2017, 10:59 a.m. UTC | #26
(sorry for chiming in quite late)

On 21/11/17 18:12, Eduardo Valentin wrote:
> On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote:

>> Hi,

>>

>> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

>>


[...]

>>

>> In a nutshell, mainline does not want platform specific code, but we

>> haven't figured out how to calculate static power without platform

>> specific code.>

> To, that is still fine to have it as a callback, as long as you have at

> least one user! I still do not understand why Juno static power cannot

> go as platform code that register the callback to implement the static

> power model.

> 


1. It was proved not so useful(anyone can prove otherwise ?)
2. I am told static power is negligible compared to dynamic power with
   new fab processes.
3. It's very hard to even test IPA on Juno as it doesn't reach the
   required critical temperature easily. So as Juno platform maintainer
   I want a test case to test regression before we merge anything.

IMO, if the $subject code is expected to be used on Juno, then my answer
is no if one can't test it reliably and also prove that static power
really matters on Juno. So far, I have heard both the above is not
possible. So please delete the code if Juno is the only user in
short and mid term. We can get the code back if we find any users in
longer term.

-- 
Regards,
Sudeep
Viresh Kumar Nov. 22, 2017, 11:08 a.m. UTC | #27
On 22-11-17, 06:55, Viresh Kumar wrote:
> On 21-11-17, 18:00, Javi Merino wrote:

> > As Ionela pointed out earlier in the thread, the cpufreq driver for Juno

> > was not acceptable for mainline because it used platform specific code.

> 

> Can we get a link to that thread? I don't remember what I have commented earlier

> but the above doesn't seem to be entirely true.

> 

> The basic idea is to use as much common stuff as possible and so to use

> cpufreq-dt.c if possible. Its not that we are against platform specific bits,

> they are fine if they are really required.


Just to correct everyone here, Juno doesn't use the cpufreq-dt driver but
scpi-cpufreq.c :)

--
viresh
Viresh Kumar Nov. 22, 2017, 11:10 a.m. UTC | #28
On 22-11-17, 10:59, Sudeep Holla wrote:
> IMO, if the $subject code is expected to be used on Juno, then my answer

> is no if one can't test it reliably and also prove that static power

> really matters on Juno. So far, I have heard both the above is not

> possible. So please delete the code if Juno is the only user in

> short and mid term. We can get the code back if we find any users in

> longer term.


Its funny that the $subject patch receives few Acks and Nacks every day now :)

-- 
viresh
Sudeep Holla Nov. 22, 2017, 11:17 a.m. UTC | #29
On 22/11/17 11:10, Viresh Kumar wrote:
> On 22-11-17, 10:59, Sudeep Holla wrote:

>> IMO, if the $subject code is expected to be used on Juno, then my answer

>> is no if one can't test it reliably and also prove that static power

>> really matters on Juno. So far, I have heard both the above is not

>> possible. So please delete the code if Juno is the only user in

>> short and mid term. We can get the code back if we find any users in

>> longer term.

> 

> Its funny that the $subject patch receives few Acks and Nacks every day now :)

> 


If Nacks are depending on Juno, then you can convert them to Ack on my
behalf ;) as that won't happen unless someone disproves all the known
facts so far and provides a solid test case we can depend on to test
regressions.

-- 
Regards,
Sudeep
Eduardo Valentin Nov. 22, 2017, 3:34 p.m. UTC | #30
On Wed, Nov 22, 2017 at 10:59:21AM +0000, Sudeep Holla wrote:
> (sorry for chiming in quite late)

> 

> On 21/11/17 18:12, Eduardo Valentin wrote:

> > On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote:

> >> Hi,

> >>

> >> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

> >>

> 

> [...]

> 

> >>

> >> In a nutshell, mainline does not want platform specific code, but we

> >> haven't figured out how to calculate static power without platform

> >> specific code.>

> > To, that is still fine to have it as a callback, as long as you have at

> > least one user! I still do not understand why Juno static power cannot

> > go as platform code that register the callback to implement the static

> > power model.

> > 

> 

> 1. It was proved not so useful(anyone can prove otherwise ?)


Can anyone prove it does not have static power?

> 2. I am told static power is negligible compared to dynamic power with

>    new fab processes.


I am told quantum computer is out there :-), does it mean we should drop the
maintenance of everything else?


> 3. It's very hard to even test IPA on Juno as it doesn't reach the

>    required critical temperature easily. So as Juno platform maintainer

>    I want a test case to test regression before we merge anything.

> 

> IMO, if the $subject code is expected to be used on Juno, then my answer

> is no if one can't test it reliably and also prove that static power

> really matters on Juno. So far, I have heard both the above is not

> possible. So please delete the code if Juno is the only user in

> short and mid term. We can get the code back if we find any users in

> longer term.


Yeah, the fact that Juno takes time to reach crit temperature does not
necessarily imply it does not have static power consumption, or that its
static power consumption is negligible. Now, if you want to ignore it,
because it is not the best example to show usefulness of IPA, that is a
different story.

Eduardo

> 

> -- 

> Regards,

> Sudeep
Eduardo Valentin Nov. 22, 2017, 3:38 p.m. UTC | #31
On Wed, Nov 22, 2017 at 04:40:13PM +0530, Viresh Kumar wrote:
> On 22-11-17, 10:59, Sudeep Holla wrote:

> > IMO, if the $subject code is expected to be used on Juno, then my answer

> > is no if one can't test it reliably and also prove that static power

> > really matters on Juno. So far, I have heard both the above is not

> > possible. So please delete the code if Juno is the only user in

> > short and mid term. We can get the code back if we find any users in

> > longer term.

> 

> Its funny that the $subject patch receives few Acks and Nacks every day now :)


I am still in favor to removed it, my ack still sustains,
as my original motivation to keep this is pointless as people responsible
for the code is not willing upstream it nor have the time nor will get
further vendors to use it.

> 

> -- 

> viresh
Sudeep Holla Nov. 22, 2017, 4:04 p.m. UTC | #32
On 22/11/17 15:34, Eduardo Valentin wrote:
> On Wed, Nov 22, 2017 at 10:59:21AM +0000, Sudeep Holla wrote:

>> (sorry for chiming in quite late)

>>

>> On 21/11/17 18:12, Eduardo Valentin wrote:

>>> On Tue, Nov 21, 2017 at 06:00:07PM +0000, Javi Merino wrote:

>>>> Hi,

>>>>

>>>> On Tue, Nov 21, 2017 at 08:57:06AM -0800, Eduardo Valentin wrote:

>>>>

>>

>> [...]

>>

>>>>

>>>> In a nutshell, mainline does not want platform specific code, but we

>>>> haven't figured out how to calculate static power without platform

>>>> specific code.>

>>> To, that is still fine to have it as a callback, as long as you have at

>>> least one user! I still do not understand why Juno static power cannot

>>> go as platform code that register the callback to implement the static

>>> power model.

>>>

>>

>> 1. It was proved not so useful(anyone can prove otherwise ?)

> 

> Can anyone prove it does not have static power?

> 


I didn't claim that. I said it has been found that it's negligible based
on experiments.

>> 2. I am told static power is negligible compared to dynamic power with

>>    new fab processes.

> 

> I am told quantum computer is out there :-), does it mean we should drop the

> maintenance of everything else?

> 


Good comparison :)

Anyways all I told is if this code expects Juno to be user, then no as
there's no reliable way to test it.

It's entirely up to you if you want to support it or delete it as you
are the maintainer. I am bit confused here as you seem to imply you want
to continue supporting it here with you quantum comparison but say you
are fine to delete in other thread.

> 

>> 3. It's very hard to even test IPA on Juno as it doesn't reach the

>>    required critical temperature easily. So as Juno platform maintainer

>>    I want a test case to test regression before we merge anything.

>>

>> IMO, if the $subject code is expected to be used on Juno, then my answer

>> is no if one can't test it reliably and also prove that static power

>> really matters on Juno. So far, I have heard both the above is not

>> possible. So please delete the code if Juno is the only user in

>> short and mid term. We can get the code back if we find any users in

>> longer term.

> 

> Yeah, the fact that Juno takes time to reach crit temperature does not

> necessarily imply it does not have static power consumption, or that its

> static power consumption is negligible. Now, if you want to ignore it,

> because it is not the best example to show usefulness of IPA, that is a

> different story.

> 


OK, I agree that I want to ignore the usefulness of static power on Juno
as no one provides a reliable way to see that and test that regularly. I
am open to change my mind if circumstances change.

-- 
Regards,
Sudeep
diff mbox series

Patch

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 55d6b9fb909d..f102ad6127a4 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -88,7 +88,6 @@  struct time_in_idle {
  * @policy: cpufreq policy.
  * @node: list_head to link all cpufreq_cooling_device together.
  * @idle_time: idle time stats
- * @plat_get_static_power: callback to calculate the static power
  *
  * This structure is required for keeping information of each registered
  * cpufreq_cooling_device.
@@ -104,7 +103,6 @@  struct cpufreq_cooling_device {
 	struct cpufreq_policy *policy;
 	struct list_head node;
 	struct time_in_idle *idle_time;
-	get_static_t plat_get_static_power;
 };
 
 static DEFINE_IDA(cpufreq_ida);
@@ -318,60 +316,6 @@  static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu,
 	return load;
 }
 
-/**
- * get_static_power() - calculate the static power consumed by the cpus
- * @cpufreq_cdev:	struct &cpufreq_cooling_device for this cpu cdev
- * @tz:		thermal zone device in which we're operating
- * @freq:	frequency in KHz
- * @power:	pointer in which to store the calculated static power
- *
- * Calculate the static power consumed by the cpus described by
- * @cpu_actor running at frequency @freq.  This function relies on a
- * platform specific function that should have been provided when the
- * actor was registered.  If it wasn't, the static power is assumed to
- * be negligible.  The calculated static power is stored in @power.
- *
- * Return: 0 on success, -E* on failure.
- */
-static int get_static_power(struct cpufreq_cooling_device *cpufreq_cdev,
-			    struct thermal_zone_device *tz, unsigned long freq,
-			    u32 *power)
-{
-	struct dev_pm_opp *opp;
-	unsigned long voltage;
-	struct cpufreq_policy *policy = cpufreq_cdev->policy;
-	struct cpumask *cpumask = policy->related_cpus;
-	unsigned long freq_hz = freq * 1000;
-	struct device *dev;
-
-	if (!cpufreq_cdev->plat_get_static_power) {
-		*power = 0;
-		return 0;
-	}
-
-	dev = get_cpu_device(policy->cpu);
-	WARN_ON(!dev);
-
-	opp = dev_pm_opp_find_freq_exact(dev, freq_hz, true);
-	if (IS_ERR(opp)) {
-		dev_warn_ratelimited(dev, "Failed to find OPP for frequency %lu: %ld\n",
-				     freq_hz, PTR_ERR(opp));
-		return -EINVAL;
-	}
-
-	voltage = dev_pm_opp_get_voltage(opp);
-	dev_pm_opp_put(opp);
-
-	if (voltage == 0) {
-		dev_err_ratelimited(dev, "Failed to get voltage for frequency %lu\n",
-				    freq_hz);
-		return -EINVAL;
-	}
-
-	return cpufreq_cdev->plat_get_static_power(cpumask, tz->passive_delay,
-						  voltage, power);
-}
-
 /**
  * get_dynamic_power() - calculate the dynamic power
  * @cpufreq_cdev:	&cpufreq_cooling_device for this cdev
@@ -491,8 +435,8 @@  static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 				       u32 *power)
 {
 	unsigned long freq;
-	int i = 0, cpu, ret;
-	u32 static_power, dynamic_power, total_load = 0;
+	int i = 0, cpu;
+	u32 total_load = 0;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
 	struct cpufreq_policy *policy = cpufreq_cdev->policy;
 	u32 *load_cpu = NULL;
@@ -522,22 +466,15 @@  static int cpufreq_get_requested_power(struct thermal_cooling_device *cdev,
 
 	cpufreq_cdev->last_load = total_load;
 
-	dynamic_power = get_dynamic_power(cpufreq_cdev, freq);
-	ret = get_static_power(cpufreq_cdev, tz, freq, &static_power);
-	if (ret) {
-		kfree(load_cpu);
-		return ret;
-	}
+	*power = get_dynamic_power(cpufreq_cdev, freq);
 
 	if (load_cpu) {
 		trace_thermal_power_cpu_get_power(policy->related_cpus, freq,
-						  load_cpu, i, dynamic_power,
-						  static_power);
+						  load_cpu, i, *power);
 
 		kfree(load_cpu);
 	}
 
-	*power = static_power + dynamic_power;
 	return 0;
 }
 
@@ -561,8 +498,6 @@  static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 			       unsigned long state, u32 *power)
 {
 	unsigned int freq, num_cpus;
-	u32 static_power, dynamic_power;
-	int ret;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
 
 	/* Request state should be less than max_level */
@@ -572,13 +507,9 @@  static int cpufreq_state2power(struct thermal_cooling_device *cdev,
 	num_cpus = cpumask_weight(cpufreq_cdev->policy->cpus);
 
 	freq = cpufreq_cdev->freq_table[state].frequency;
-	dynamic_power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
-	ret = get_static_power(cpufreq_cdev, tz, freq, &static_power);
-	if (ret)
-		return ret;
+	*power = cpu_freq_to_power(cpufreq_cdev, freq) * num_cpus;
 
-	*power = static_power + dynamic_power;
-	return ret;
+	return 0;
 }
 
 /**
@@ -606,21 +537,14 @@  static int cpufreq_power2state(struct thermal_cooling_device *cdev,
 			       unsigned long *state)
 {
 	unsigned int cur_freq, target_freq;
-	int ret;
-	s32 dyn_power;
-	u32 last_load, normalised_power, static_power;
+	u32 last_load, normalised_power;
 	struct cpufreq_cooling_device *cpufreq_cdev = cdev->devdata;
 	struct cpufreq_policy *policy = cpufreq_cdev->policy;
 
 	cur_freq = cpufreq_quick_get(policy->cpu);
-	ret = get_static_power(cpufreq_cdev, tz, cur_freq, &static_power);
-	if (ret)
-		return ret;
-
-	dyn_power = power - static_power;
-	dyn_power = dyn_power > 0 ? dyn_power : 0;
+	power = power > 0 ? power : 0;
 	last_load = cpufreq_cdev->last_load ?: 1;
-	normalised_power = (dyn_power * 100) / last_load;
+	normalised_power = (power * 100) / last_load;
 	target_freq = cpu_power_to_freq(cpufreq_cdev, normalised_power);
 
 	*state = get_level(cpufreq_cdev, target_freq);
@@ -671,8 +595,6 @@  static unsigned int find_next_max(struct cpufreq_frequency_table *table,
  * @policy: cpufreq policy
  * Normally this should be same as cpufreq policy->related_cpus.
  * @capacitance: dynamic power coefficient for these cpus
- * @plat_static_func: function to calculate the static power consumed by these
- *                    cpus (optional)
  *
  * This interface function registers the cpufreq cooling device with the name
  * "thermal-cpufreq-%x". This api can support multiple instances of cpufreq
@@ -684,8 +606,7 @@  static unsigned int find_next_max(struct cpufreq_frequency_table *table,
  */
 static struct thermal_cooling_device *
 __cpufreq_cooling_register(struct device_node *np,
-			struct cpufreq_policy *policy, u32 capacitance,
-			get_static_t plat_static_func)
+			struct cpufreq_policy *policy, u32 capacitance)
 {
 	struct thermal_cooling_device *cdev;
 	struct cpufreq_cooling_device *cpufreq_cdev;
@@ -755,8 +676,6 @@  __cpufreq_cooling_register(struct device_node *np,
 	}
 
 	if (capacitance) {
-		cpufreq_cdev->plat_get_static_power = plat_static_func;
-
 		ret = update_freq_table(cpufreq_cdev, capacitance);
 		if (ret) {
 			cdev = ERR_PTR(ret);
@@ -813,7 +732,7 @@  __cpufreq_cooling_register(struct device_node *np,
 struct thermal_cooling_device *
 cpufreq_cooling_register(struct cpufreq_policy *policy)
 {
-	return __cpufreq_cooling_register(NULL, policy, 0, NULL);
+	return __cpufreq_cooling_register(NULL, policy, 0);
 }
 EXPORT_SYMBOL_GPL(cpufreq_cooling_register);
 
@@ -853,8 +772,7 @@  of_cpufreq_cooling_register(struct cpufreq_policy *policy)
 		of_property_read_u32(np, "dynamic-power-coefficient",
 				     &capacitance);
 
-		cdev = __cpufreq_cooling_register(np, policy, capacitance,
-						  NULL);
+		cdev = __cpufreq_cooling_register(np, policy, capacitance);
 		if (IS_ERR(cdev)) {
 			pr_err("cpu_cooling: cpu%d is not running as cooling device: %ld\n",
 			       policy->cpu, PTR_ERR(cdev));
diff --git a/include/linux/cpu_cooling.h b/include/linux/cpu_cooling.h
index a0204c58d269..22d7364cbc16 100644
--- a/include/linux/cpu_cooling.h
+++ b/include/linux/cpu_cooling.h
@@ -30,9 +30,6 @@ 
 
 struct cpufreq_policy;
 
-typedef int (*get_static_t)(cpumask_t *cpumask, int interval,
-			    unsigned long voltage, u32 *power);
-
 #ifdef CONFIG_CPU_THERMAL
 /**
  * cpufreq_cooling_register - function to create cpufreq cooling device.
diff --git a/include/trace/events/thermal.h b/include/trace/events/thermal.h
index 466c09d882ad..52424cf13408 100644
--- a/include/trace/events/thermal.h
+++ b/include/trace/events/thermal.h
@@ -93,9 +93,9 @@  TRACE_EVENT(thermal_zone_trip,
 
 TRACE_EVENT(thermal_power_cpu_get_power,
 	TP_PROTO(const struct cpumask *cpus, unsigned long freq, u32 *load,
-		size_t load_len, u32 dynamic_power, u32 static_power),
+		size_t load_len, u32 dynamic_power),
 
-	TP_ARGS(cpus, freq, load, load_len, dynamic_power, static_power),
+	TP_ARGS(cpus, freq, load, load_len, dynamic_power),
 
 	TP_STRUCT__entry(
 		__bitmask(cpumask, num_possible_cpus())
@@ -103,7 +103,6 @@  TRACE_EVENT(thermal_power_cpu_get_power,
 		__dynamic_array(u32,   load, load_len)
 		__field(size_t,        load_len      )
 		__field(u32,           dynamic_power )
-		__field(u32,           static_power  )
 	),
 
 	TP_fast_assign(
@@ -114,13 +113,12 @@  TRACE_EVENT(thermal_power_cpu_get_power,
 			load_len * sizeof(*load));
 		__entry->load_len = load_len;
 		__entry->dynamic_power = dynamic_power;
-		__entry->static_power = static_power;
 	),
 
-	TP_printk("cpus=%s freq=%lu load={%s} dynamic_power=%d static_power=%d",
+	TP_printk("cpus=%s freq=%lu load={%s} dynamic_power=%d",
 		__get_bitmask(cpumask), __entry->freq,
 		__print_array(__get_dynamic_array(load), __entry->load_len, 4),
-		__entry->dynamic_power, __entry->static_power)
+		__entry->dynamic_power)
 );
 
 TRACE_EVENT(thermal_power_cpu_limit,