diff mbox series

[v8,09/26] kernel/cpu_pm: Manage runtime PM in the idle path for CPUs

Message ID 20180620172226.15012-10-ulf.hansson@linaro.org
State New
Headers show
Series PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM) | expand

Commit Message

Ulf Hansson June 20, 2018, 5:22 p.m. UTC
To allow CPUs to be power managed by PM domains, let's deploy support for
runtime PM for the CPU's corresponding struct device.

More precisely, at the point when the CPU is about to enter an idle state,
decrease the runtime PM usage count for its corresponding struct device, by
calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
resumes from idle, increase the runtime PM usage count again, by calling
pm_runtime_get_sync().

Cc: Lina Iyer <ilina@codeaurora.org>
Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 kernel/cpu_pm.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

-- 
2.17.1

Comments

Rafael J. Wysocki July 19, 2018, 10:12 a.m. UTC | #1
On Wednesday, July 18, 2018 12:11:06 PM CEST Rafael J. Wysocki wrote:
> On Wednesday, June 20, 2018 7:22:09 PM CEST Ulf Hansson wrote:
> > To allow CPUs being power managed by PM domains, let's deploy support for
> > runtime PM for the CPU's corresponding struct device.
> > 
> > More precisely, at the point when the CPU is about to enter an idle state,
> > decrease the runtime PM usage count for its corresponding struct device,
> > via calling pm_runtime_put_sync_suspend(). Then, at the point when the CPU
> > resumes from idle, let's increase the runtime PM usage count, via calling
> > pm_runtime_get_sync().
> > 
> > Cc: Lina Iyer <ilina@codeaurora.org>
> > Co-developed-by: Lina Iyer <lina.iyer@linaro.org>
> > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> 
> I finally got to this one, sorry for the huge delay.
> 
> Let me confirm that I understand the code flow correctly.
> 
> > ---
> >  kernel/cpu_pm.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
> > index 67b02e138a47..492d4a83dca0 100644
> > --- a/kernel/cpu_pm.c
> > +++ b/kernel/cpu_pm.c
> > @@ -16,9 +16,11 @@
> >   */
> >  
> >  #include <linux/kernel.h>
> > +#include <linux/cpu.h>
> >  #include <linux/cpu_pm.h>
> >  #include <linux/module.h>
> >  #include <linux/notifier.h>
> > +#include <linux/pm_runtime.h>
> >  #include <linux/spinlock.h>
> >  #include <linux/syscore_ops.h>
> >  
> > @@ -91,6 +93,7 @@ int cpu_pm_enter(void)
> 
> This is called from a cpuidle driver's ->enter callback for the target state
> selected by the idle governor ->
> 
> >  {
> >  	int nr_calls;
> >  	int ret = 0;
> > +	struct device *dev = get_cpu_device(smp_processor_id());
> >  
> >  	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
> >  	if (ret)
> > @@ -100,6 +103,9 @@ int cpu_pm_enter(void)
> >  		 */
> >  		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
> >  
> > +	if (!ret && dev && dev->pm_domain)
> > +		pm_runtime_put_sync_suspend(dev);
> 
> -> so this is going to invoke genpd_runtime_suspend() if the usage
> counter of dev is 0.
> 
> That will cause cpu_power_down_ok() to be called (because this is
> a CPU domain) and that will walk the domain cpumask and compute the
> estimated idle duration as the minimum of tick_nohz_get_next_wakeup()
> values over the CPUs in that cpumask.  [Note that the weight of the
> cpumask must be seriously limited for that to actually work, as this
> happens in the idle path.]  Next, it will return "true" if it can
> find a domain state with residency within the estimated idle
> duration.  [Note that this sort of overlaps with the idle governor's
> job.]

> 
> Next, __genpd_runtime_suspend() will be invoked to run the device-specific
> callback if any [Note that this has to be suitable for the idle path if
> present.] and genpd_stop_dev() runs (which, again, may invoke a callback)
> and genpd_power_off() runs under the domain lock (which must be a spinlock
> then).
> 
> > +
> >  	return ret;
> >  }
> >  EXPORT_SYMBOL_GPL(cpu_pm_enter);
> > @@ -118,6 +124,11 @@ EXPORT_SYMBOL_GPL(cpu_pm_enter);
> >   */
> >  int cpu_pm_exit(void)
> >  {
> > +	struct device *dev = get_cpu_device(smp_processor_id());
> > +
> > +	if (dev && dev->pm_domain)
> > +		pm_runtime_get_sync(dev);
> > +
> >  	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(cpu_pm_exit);
> > 
> 
> And this is called on wakeup when the cpuidle driver's ->enter callback
> is about to return and it reverses the suspend flow (except that the
> governor doesn't need to be called now).
> 
> Have I got that right?


Assuming that I have got that right, there are concerns, mostly regarding
patch [07/26], but I will reply to that directly.

The $subject patch is fine by me by itself, but it obviously depends on the
previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
particularly useful without the rest of the series.

As far as patches [10-26/26] go, I'd like to see some review comments and/or
tags from the people with vested interest in there, in particular from Daniel
on patch [12/26] and from Sudeep on the PSCI ones.

Thanks,
Rafael
Rafael J. Wysocki Aug. 6, 2018, 9:37 a.m. UTC | #2
On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> [...]
>
>>>
>>> Assuming that I have got that right, there are concerns, mostly regarding
>>> patch [07/26], but I will reply to that directly.
>>
>> Well, I haven't got that right, so never mind.
>>
>> There are a few minor things to address, but apart from that the general
>> genpd patches look ready.
>
> Alright, thanks!
>
> I will re-spin the series and post a new version once 4.19 rc1 is out.
> Hopefully we can queue it up early in next cycle to get it tested in
> next for a while.
>
>>
>>> The $subject patch is fine by me by itself, but it obviously depends on the
>>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>>> particularly useful without the rest of the series.
>>>
>>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>>> tags from the people with vested interest in there, in particular from Daniel
>>> on patch [12/26] and from Sudeep on the PSCI ones.
>>
>> But this still holds.
>
> Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
> on patch 12.
>
> In regards to the rest of the series, some of the PSCI/ARM changes
> have been reviewed by Mark Rutland, however several changes have not
> been acked.
>
> On the other hand, one can also interpret the long silence in regards
> to PSCI/ARM changes as they are good to go. :-)


Well, in that case giving an ACK to them should not be an issue for
the people with a vested interest I suppose.
Lina Iyer Aug. 8, 2018, 6:02 p.m. UTC | #3
On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>> On Fri, Aug 3, 2018 at 1:42 PM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> > [...]
>> >
>> >>>
>> >>> Assuming that I have got that right, there are concerns, mostly regarding
>> >>> patch [07/26], but I will reply to that directly.
>> >>
>> >> Well, I haven't got that right, so never mind.
>> >>
>> >> There are a few minor things to address, but apart from that the general
>> >> genpd patches look ready.
>> >
>> > Alright, thanks!
>> >
>> > I will re-spin the series and post a new version once 4.19 rc1 is out.
>> > Hopefully we can queue it up early in next cycle to get it tested in
>> > next for a while.
>> >
>> >>
>> >>> The $subject patch is fine by me by itself, but it obviously depends on the
>> >>> previous ones.  Patches [01-02/26] are fine too, but they don't seem to be
>> >>> particularly useful without the rest of the series.
>> >>>
>> >>> As far as patches [10-26/26] go, I'd like to see some review comments and/or
>> >>> tags from the people with vested interest in there, in particular from Daniel
>> >>> on patch [12/26] and from Sudeep on the PSCI ones.
>> >>
>> >> But this still holds.
>> >
>> > Actually, patch 10 and patch11 is ready to go as well. I ping Daniel
>> > on patch 12.
>> >
>> > In regards to the rest of the series, some of the PSCI/ARM changes
>> > have been reviewed by Mark Rutland, however several changes have not
>> > been acked.
>> >
>> > On the other hand, one can also interpret the long silence in regards
>> > to PSCI/ARM changes as they are good to go. :-)
>>
>> Well, in that case giving an ACK to them should not be an issue for
>> the people with a vested interest I suppose.
>
>Apologies to everyone for the delay in replying.
>
>Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>s2RAM/IDLE, you know that but I just wanted to mention it to compound
>the discussion.
>
>As for PSCI patches I do not personally think PSCI OSI enablement is
>beneficial (and my position has always been the same since PSCI OSI was
>added to the specification, I am not even talking about this patchset)
>and Arm Trusted Firmware does not currently support it for the same
>reason.
>
>We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>definitive and constructive answer to *why* we have to do that that is
>not a dogmatic "the kernel knows better" but rather a comprehensive
>power benchmark evaluation - I thought that was the agreement reached
>at OSPM but apparently I was mistaken.
>

I will not speak to any comparison of benchmarks between OSI and PC.
AFAIK, there are no platforms supporting both.

But, the OSI feature is critical for QCOM mobile platforms. The
last man activities during cpuidle save quite a lot of power. Powering
off the clocks, busses, regulators and even the oscillator is very
important to have a reasonable battery life when using the phone.
Platform coordinated approach falls quite short of the needs of a
powerful processor with a desired battery efficiency.

-- Lina

>As a reminder - PSCI firmware implementation has to have state machines
>and locking to guarantee safe power down operations (and to flush caches
>only if necessary - which requires cpu masks for power domains) and
>that's true whether we enable PSCI OSI or not, the coordination logic
>must be in firmware/hardware _already_ - the cpumasks, the power domain
>topology, etc.
>
>I agree with the power-domains representation of idle-states (since
>that's the correct HW description) and I thought and hoped that runtime
>PM could help _remove_ the CPU PM notifiers (by making the notifiers
>callbacks a runtime PM one) even though I have to say that's quite
>complex, given that only few (ie one instance :)) CPU PM notifiers
>callbacks are backed by a struct device (eg an ARM PMU is a device but
>for instance the GIC is not a device so its save/restore code I am not
>sure it can be implemented with runtime PM callbacks).
>
>Lorenzo
Sudeep Holla Aug. 9, 2018, 9:58 a.m. UTC | #4
On Wed, Aug 08, 2018 at 12:02:48PM -0600, Lina Iyer wrote:

[...]

> I will not speak to any comparison of benchmarks between OSI and PC.
> AFAIK, there are no platforms supporting both.
>

That's the fundamental issue here: we have never done a proper
comparison.

> But, the OSI feature is critical for QCOM mobile platforms. The
> last man activities during cpuidle save quite a lot of power. Powering
> off the clocks, busses, regulators and even the oscillator is very
> important to have a reasonable battery life when using the phone.
> Platform coordinated approach falls quite short of the needs of a
> powerful processor with a desired battery efficiency.
>


As mentioned above, without an actual comparison it's hard to justify
that. While there are corner cases where OSI can make a better
judgement, maybe we can add ways to deal with them in the firmware
with PC mode; have we explored that before adding complexity to the OSPM?
Since the firmware complexity with OSI remains the same as with PC mode,
isn't it worth checking whether the corner cases we are talking about
here can be handled in the firmware?

--
Regards,
Sudeep
Lina Iyer Aug. 10, 2018, 8:36 p.m. UTC | #5
On Thu, Aug 09 2018 at 02:16 -0600, Rafael J. Wysocki wrote:
>On Wed, Aug 8, 2018 at 8:02 PM, Lina Iyer <ilina@codeaurora.org> wrote:
>> On Wed, Aug 08 2018 at 04:56 -0600, Lorenzo Pieralisi wrote:
>>>
>>> On Mon, Aug 06, 2018 at 11:37:55AM +0200, Rafael J. Wysocki wrote:
>>>>

>>>> [...]

>>>
>>> Apologies to everyone for the delay in replying.
>>>
>>> Side note: cpu_pm_enter()/exit() are also called through syscore ops in
>>> s2RAM/IDLE, you know that but I just wanted to mention it to compound
>>> the discussion.
>>>
>>> As for PSCI patches I do not personally think PSCI OSI enablement is
>>> beneficial (and my position has always been the same since PSCI OSI was
>>> added to the specification, I am not even talking about this patchset)
>>> and Arm Trusted Firmware does not currently support it for the same
>>> reason.
>>>
>>> We (if Mark and Sudeep agree) will enable PSCI OSI if and when we have a
>>> definitive and constructive answer to *why* we have to do that that is
>>> not a dogmatic "the kernel knows better" but rather a comprehensive
>>> power benchmark evaluation - I thought that was the agreement reached
>>> at OSPM but apparently I was mistaken.
>>>
>> I will not speak to any comparison of benchmarks between OSI and PC.
>> AFAIK, there are no platforms supporting both.
>>
>> But, the OSI feature is critical for QCOM mobile platforms. The
>> last man activities during cpuidle save quite a lot of power. Powering
>> off the clocks, busses, regulators and even the oscillator is very
>> important to have a reasonable battery life when using the phone.
>> Platform coordinated approach falls quite short of the needs of a
>> powerful processor with a desired battery efficiency.
>
>Even so, you still need firmware (or hardware) to do the right thing
>in the concurrent wakeup via an edge-triggered interrupt case AFAICS.
>That is, you need the domain to be prevented from being turned off if
>one of the CPUs in it has just been woken up and the interrupt is
>still pending.

Yes, that is true and we have been doing this on pretty much every QC
SoC there is, for CPU domains. Generally, there is a handshake of sorts
with the power domain controller when the core executes WFI. It
decrements the reference on the controller when going down and
increments when coming up. The controller is only turned off when the
reference count is 0 and is turned back on before the CPU is ready to
exit the WFI.

What we are doing here is hand the domain's ->power_off and ->power_on
over to the platform firmware, which needs to make sure the races are
handled correctly either in h/w or through mechanisms like MCPM or in
the firmware. I would consider what happens during the power on/off of
the domains beyond the realm of the genpd at least for CPU specific PM
domains.

Thanks,
Lina
Ulf Hansson Aug. 24, 2018, 12:24 p.m. UTC | #6
Lorenzo, Sudeep, Mark

On 15 August 2018 at 12:44, Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> wrote:
> On Fri, Aug 10, 2018 at 02:18:15PM -0600, Lina Iyer wrote:
>
> [...]
>
>> >>But, the OSI feature is critical for QCOM mobile platforms. The
>> >>last man activities during cpuidle save quite a lot of power.
>> >
>> >What I expressed above was that, in PSCI based systems (OSI or PC
>> >alike), it is up to firmware/hardware to detect "the last man" not
>> >the kernel.
>> >
>> >I need to understand what you mean by "last man activities" to
>> >provide feedback here.
>> >
>> When the last CPU goes down during deep sleep, the following would be
>> done
>> - Lower resource requirements for shared resources such as clocks,
>>  busses and regulators that were used by drivers in AP. These shared
>>  resources when not used by other processors in the SoC may be turned
>>  off and put in low power state by a remote processor. [1][2]
>> - Enable and setup wakeup capable interrupts on an always-on interrupt
>>  controller, so the GIC and the GPIO controllers may be put in low
>>  power state. [3][4]
>> - Write next known wakeup value to the timer, so the blocks that were
>>  powered off, may be brought back into operational before the wakeup.
>>  [4][5]
>>
>> These are commonly done during suspend, but to achieve a good power
>> efficiency, we have to do this when all the CPUs are just executing CPU
>> idle. Also, they cannot be done from the firmware (because the data
>> required for all this is part of Linux). OSI plays a crucial role in
>> determining when to do all this.
>
> No it does not. It is the power domain cpumasks that allow this code to
> make an educated guess on the last cpu running (the kernel), PSCI OSI is
> not crucial at all (it is crucial in QC platforms because that's the
> only mode supported but that's not a reason I accept as valid since it
> does not comply with the PSCI specifications).


We can keep arguing about this back and forth, but it seems to lead
nowhere. As a matter of fact, I am also surprised that this kind of
discussion pops up again. I thought we had sorted this out, especially
since we have also met face to face, discussing this in detail, several
times by now. Well, well, let's try again. :-)

First, in regards to complying with the PSCI spec, to me, that sounds
like nonsense, sorry! Does the spec state that the PSCI FW needs to
support all the idle states in PC mode, when the optional OSI mode is
also supported? To me, it looks like the QCOM PSCI FW supports PC
mode, but in that mode only a subset of the idle states can be
reached, so that should be fine, no?

Moving forward, I am wondering if a more detailed technical
description, comparing the benefits of OSI mode vs the benefits of
PC mode, could help. Or is that just a waste of everybody's time, as
you all already know this? Anyway, I am willing to try; just tell me
and I will provide you with the best details I can give about why OSI
is better suited for these kinds of QCOM SoCs. I trust Lina to help
fill in, if/when needed.

Why? Simply because I doubt we will ever see the QCOM FW for these
battery-driven embedded devices support all idle states in PC mode,
so doing a comparison on, for example, the 410c platform just doesn't
seem to be possible, sorry!

I also have another, quite important, concern. ARM decided to put the
OSI mode into the PSCI spec, and I assume there were reasons for it.
Yet now, when the ARM community wants to implement support for OSI
mode, you are requiring us to prove the justification of what is in
the spec. To me, that is, nicely stated, weird. :-) It also worries
me, because ARM vendors observe this behavior.

That said, in the end we are discussing a quite limited amount of
code to support PSCI OSI (some of which may not even be considered
OSI specific). It's ~200 lines of code, where most of it lives in a
separate new C file (psci_pm_domain.c). Additionally, existing
PC-mode-only platforms should still work as before, without drawbacks.

Really, why are we arguing about this at all?

>
> As I mentioned in another thread[1] the generic part of this
> series may be applicable in a platform agnostic way to the
> CPUidle framework, whether that's beneficial it has to be proven
> and it is benchmark specific anyway.


I don't think this can be made fully platform agnostic. Or maybe you
are suggesting another helper layer above the new genpd
infrastructure?

Anyway, my point is, the genpd backend driver requires knowledge about
the FW and the last man standing algorithm, hence a platform agnostic
backend doesn't sound feasible to me.

>
> Lorenzo
>
> [1]: https://marc.info/?l=linux-pm&m=153382916513032&w=2


Kind regards
Uffe

Patch

diff --git a/kernel/cpu_pm.c b/kernel/cpu_pm.c
index 67b02e138a47..492d4a83dca0 100644
--- a/kernel/cpu_pm.c
+++ b/kernel/cpu_pm.c
@@ -16,9 +16,11 @@ 
  */
 
 #include <linux/kernel.h>
+#include <linux/cpu.h>
 #include <linux/cpu_pm.h>
 #include <linux/module.h>
 #include <linux/notifier.h>
+#include <linux/pm_runtime.h>
 #include <linux/spinlock.h>
 #include <linux/syscore_ops.h>
 
@@ -91,6 +93,7 @@  int cpu_pm_enter(void)
 {
 	int nr_calls;
 	int ret = 0;
+	struct device *dev = get_cpu_device(smp_processor_id());
 
 	ret = cpu_pm_notify(CPU_PM_ENTER, -1, &nr_calls);
 	if (ret)
@@ -100,6 +103,9 @@  int cpu_pm_enter(void)
 		 */
 		cpu_pm_notify(CPU_PM_ENTER_FAILED, nr_calls - 1, NULL);
 
+	if (!ret && dev && dev->pm_domain)
+		pm_runtime_put_sync_suspend(dev);
+
 	return ret;
 }
 EXPORT_SYMBOL_GPL(cpu_pm_enter);
@@ -118,6 +124,11 @@  EXPORT_SYMBOL_GPL(cpu_pm_enter);
  */
 int cpu_pm_exit(void)
 {
+	struct device *dev = get_cpu_device(smp_processor_id());
+
+	if (dev && dev->pm_domain)
+		pm_runtime_get_sync(dev);
+
 	return cpu_pm_notify(CPU_PM_EXIT, -1, NULL);
 }
 EXPORT_SYMBOL_GPL(cpu_pm_exit);