[v11,0/8] PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)

Message ID 20190226145415.19411-1-ulf.hansson@linaro.org
Headers show
Series
  • PM / Domains: Support hierarchical CPU arrangement (PSCI/ARM)
Related show

Message

Ulf Hansson Feb. 26, 2019, 2:54 p.m.
Changes in v11:
 - This version contains only the infrastructure changes that is needed for
deployment. The PSCI/ARM changes have also been updated and tested, but I will
post them separately. Still, to provide completeness, I have published a branch
containing everything to a git tree [1], feel free to have a look and test.
 - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",
which has been replaced by a couple of new patches, whom reworks the existing
tick_nohz_get_sleep_length() function, to provide the next timer expiration
instead of the duration.
 - More changelogs are available per patch.

Changes in v10:
 - Quite significant changes have been to the PSCI driver deployment. According
   to an agreement with Lorenzo, the hierarchical CPU layout for PSCI should be
   orthogonal to whether the PSCI FW supports OSI or not. This has been taken
   care of in this version.
 - Drop the generic attach/detach helpers of CPUs to genpd, instead make that
   related code internal to PSCI, for now.
 - Fix "BUG: sleeping for invalid context" for hotplug, as reported by Raju.
 - Addressed various comments from version 8 and 9.
 - Clarified changelogs and re-wrote the cover-letter to better explain the
   motivations behind these changes.

Background:

For ARM64/ARM based platforms CPUs are often arranged in a hierarchical manner.
From a CPU idle state perspective, this means some states may be shared among a
group of CPUs (aka CPU cluster).

To deal with idle management of a group of CPUs, sometimes the kernel needs to
be involved to manage the last-man standing algorithm, simply because it can't
rely solely on power management FWs to deal with this. Depending on the
platform, of course.

There are a couple of typical scenarios for when the kernel needs to be in
control, dealing with synchronization of when the last CPU in a cluster is about
to enter a deep idle state.

1)
The kernel needs to carry out so called last-man activities before the
CPU cluster can enter a deep idle state. This may for example involve to
configure external logics for wakeups, as the GIC may no longer be functional
once a deep cluster idle state have been entered. Likewise, these operations
may need to be restored, when the first CPU wakes up.

2)
Other more generic I/O devices, such as an MMC controller for example, may be a
part of the same power domain as the CPU cluster, due to a shared power-rail.
For these scenarios, when the MMC controller is in use dealing with an MMC
request, a deeper idle state of the CPU cluster may needs to be temporarily
disabled. This is needed to retain the MMC controller in a functional state,
else it may loose its register-context in the middle of serving a request.

In this series, we are extending the generic PM domain (aka genpd) to be used
for also CPU devices. Hence the goal is to re-use much of its current code to
help us manage the last-man standing synchronization. Moreover, as we already
use genpd to model power domains for generic I/O devices, both 1) and 2) can be
address with its help.

Moreover, to address these problems for ARM64 DT based platforms, we are
deploying support for genpd and runtime PM to the PSCI FW driver - and finally
we make some updates to two ARM64 DTBs, as to deploy the new PSCI CPU topology
layout.

The series has been tested on a Qcom 410c dragonboard and on a Hisilicon Hikey
board. The first one uses PSCI OS-initiated mode, while the second uses the PSCI
Platform-Coordinated mode.

Kind regards
Ulf Hansson

[1]
git.linaro.org/people/ulf.hansson/linux-pm.git next_v11

Daniel Lezcano (4):
  time: tick-sched: Provide helpers to get the next timer expiration
  cpuidle: menu: Convert to tick_nohz_get_next_timer|hrtimer()
  cpuidle: teo: Convert to tick_nohz_get_next_timer()
  time: tick-sched: Remove tick_nohz_get_sleep_length()

Ulf Hansson (4):
  PM / Domains: Add a generic data pointer to the genpd_power_state
    struct
  PM / Domains: Add support for CPU devices to genpd
  cpuidle: Pre-store next timer/tick before selecting an idle state
  PM / Domains: Add genpd governor for CPUs

 drivers/base/power/domain.c          | 78 ++++++++++++++++++++++++++--
 drivers/base/power/domain_governor.c | 62 +++++++++++++++++++++-
 drivers/cpuidle/cpuidle.c            |  2 +
 drivers/cpuidle/governors/menu.c     | 14 ++++-
 drivers/cpuidle/governors/teo.c      |  4 +-
 include/linux/cpuidle.h              |  2 +
 include/linux/pm_domain.h            | 20 ++++++-
 include/linux/tick.h                 | 11 ++--
 kernel/time/tick-sched.c             | 71 +++++++++++++++++--------
 9 files changed, 229 insertions(+), 35 deletions(-)

-- 
2.17.1

Comments

Rafael J. Wysocki Feb. 26, 2019, 5:50 p.m. | #1
On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>

> Changes in v11:

>  - This version contains only the infrastructure changes that is needed for

> deployment. The PSCI/ARM changes have also been updated and tested, but I will

> post them separately. Still, to provide completeness, I have published a branch

> containing everything to a git tree [1], feel free to have a look and test.

>  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",

> which has been replaced by a couple of new patches, whom reworks the existing

> tick_nohz_get_sleep_length() function, to provide the next timer expiration

> instead of the duration.

>  - More changelogs are available per patch.


NAK for patches [4-6/8].

The code as is specifically avoids calling ktime_get() from the
governors as that can be quite expensive, so these patches potentially
make things worse.
Ulf Hansson Feb. 26, 2019, 9:31 p.m. | #2
On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:
>

> On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> >

> > Changes in v11:

> >  - This version contains only the infrastructure changes that is needed for

> > deployment. The PSCI/ARM changes have also been updated and tested, but I will

> > post them separately. Still, to provide completeness, I have published a branch

> > containing everything to a git tree [1], feel free to have a look and test.

> >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",

> > which has been replaced by a couple of new patches, whom reworks the existing

> > tick_nohz_get_sleep_length() function, to provide the next timer expiration

> > instead of the duration.

> >  - More changelogs are available per patch.

>

> NAK for patches [4-6/8].

>

> The code as is specifically avoids calling ktime_get() from the

> governors as that can be quite expensive, so these patches potentially

> make things worse.


Yeah, good point! What do you think about folding in a patch into the
series, like below, and then let the cpuidle governors use it?

One questions about when CONFIG_NO_HZ_COMMON is unset for the below
suggested code, does it make sense to "return -1" for that case, or
should I return ktime_get()? Does it matter?

Thanks for reviewing!

Kind regards
Uffe

From: Ulf Hansson <ulf.hansson@linaro.org>

Date: Tue, 26 Feb 2019 21:43:46 +0100
Subject: [PATCH] time: tick-sched: Add a helper function returning the idle
 entry time

To avoid calling ktime_get() unnecessary times during the idle path, let's
export the timestamp we stored in the per CPU variable,
tick_cpu_sched.idle_entrytime, at the point when we entered idle.

Following changes to the cpuidle governors makes use of this.

Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---
 include/linux/tick.h     |  2 ++
 kernel/time/tick-sched.c | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 5b10a0e4acbb..b641f6e4a50f 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -126,6 +126,7 @@ extern void tick_nohz_irq_exit(void);
 extern bool tick_nohz_idle_got_tick(void);
 extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
 extern ktime_t tick_nohz_get_next_timer(void);
+extern ktime_t tick_nohz_get_idle_entrytime(void);
 extern unsigned long tick_nohz_get_idle_calls(void);
 extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
 extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
@@ -158,6 +159,7 @@ static inline ktime_t
tick_nohz_get_sleep_length(ktime_t *delta_next)
        *delta_next = TICK_NSEC;
        return *delta_next;
 }
+static inline ktime_t tick_nohz_get_idle_entrytime(void) { return -1; }
 static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
 static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }

diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 9966be665074..e5d66b618bfa 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -1127,6 +1127,16 @@ ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
        return ktime_sub(next_event, now);
 }

+/**
+ * tick_nohz_get_idle_entrytime - return the time when we entered idle
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_idle_entrytime(void)
+{
+       return __this_cpu_read(tick_cpu_sched.idle_entrytime);
+}
+
 /**
  * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value
  * for a particular CPU.
-- 
2.17.1
Rafael J. Wysocki Feb. 26, 2019, 9:52 p.m. | #3
On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>

> On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:

> >

> > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> > >

> > > Changes in v11:

> > >  - This version contains only the infrastructure changes that is needed for

> > > deployment. The PSCI/ARM changes have also been updated and tested, but I will

> > > post them separately. Still, to provide completeness, I have published a branch

> > > containing everything to a git tree [1], feel free to have a look and test.

> > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",

> > > which has been replaced by a couple of new patches, whom reworks the existing

> > > tick_nohz_get_sleep_length() function, to provide the next timer expiration

> > > instead of the duration.

> > >  - More changelogs are available per patch.

> >

> > NAK for patches [4-6/8].

> >

> > The code as is specifically avoids calling ktime_get() from the

> > governors as that can be quite expensive, so these patches potentially

> > make things worse.

>

> Yeah, good point! What do you think about folding in a patch into the

> series, like below, and then let the cpuidle governors use it?


It is not objectionable as it stands, but that also depends on what
the new function is used for.

In particular, I don't really think that the menu and teo governors
need to call it at all.

> One questions about when CONFIG_NO_HZ_COMMON is unset for the below

> suggested code, does it make sense to "return -1" for that case, or

> should I return ktime_get()? Does it matter?


Again, it may or may not matter depending on what the caller is going
to do with the value returned by it.
Ulf Hansson Feb. 26, 2019, 10:06 p.m. | #4
On Tue, 26 Feb 2019 at 22:52, Rafael J. Wysocki <rafael@kernel.org> wrote:
>

> On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> >

> > On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:

> > >

> > > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> > > >

> > > > Changes in v11:

> > > >  - This version contains only the infrastructure changes that is needed for

> > > > deployment. The PSCI/ARM changes have also been updated and tested, but I will

> > > > post them separately. Still, to provide completeness, I have published a branch

> > > > containing everything to a git tree [1], feel free to have a look and test.

> > > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",

> > > > which has been replaced by a couple of new patches, whom reworks the existing

> > > > tick_nohz_get_sleep_length() function, to provide the next timer expiration

> > > > instead of the duration.

> > > >  - More changelogs are available per patch.

> > >

> > > NAK for patches [4-6/8].

> > >

> > > The code as is specifically avoids calling ktime_get() from the

> > > governors as that can be quite expensive, so these patches potentially

> > > make things worse.

> >

> > Yeah, good point! What do you think about folding in a patch into the

> > series, like below, and then let the cpuidle governors use it?

>

> It is not objectionable as it stands, but that also depends on what

> the new function is used for.

>

> In particular, I don't really think that the menu and teo governors

> need to call it at all.


Well, if we are going to re-work the code as suggested in the series,
then how would you suggest to get rid of the calls to ktime_get() that
is introduced in patch 4 and patch5?

BTW, at a closer look I am even tempted to squash patch3 to patch6
(including the part I attached earlier) as this part of the series is
really a re-work of tick_nohz_get_sleep_length() and its users.

>

> > One questions about when CONFIG_NO_HZ_COMMON is unset for the below

> > suggested code, does it make sense to "return -1" for that case, or

> > should I return ktime_get()? Does it matter?

>

> Again, it may or may not matter depending on what the caller is going

> to do with the value returned by it.


Well, so far this is only for teo/menu - instead of calling ktime_get().

Any other suggestions of how to move forward here is welcome, of course!

Kind regards
Uffe
Rafael J. Wysocki Feb. 26, 2019, 10:11 p.m. | #5
On Tue, Feb 26, 2019 at 11:06 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>

> On Tue, 26 Feb 2019 at 22:52, Rafael J. Wysocki <rafael@kernel.org> wrote:

> >

> > On Tue, Feb 26, 2019 at 10:31 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> > >

> > > On Tue, 26 Feb 2019 at 18:50, Rafael J. Wysocki <rafael@kernel.org> wrote:

> > > >

> > > > On Tue, Feb 26, 2019 at 3:55 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:

> > > > >

> > > > > Changes in v11:

> > > > >  - This version contains only the infrastructure changes that is needed for

> > > > > deployment. The PSCI/ARM changes have also been updated and tested, but I will

> > > > > post them separately. Still, to provide completeness, I have published a branch

> > > > > containing everything to a git tree [1], feel free to have a look and test.

> > > > >  - The v10 series contained a patch, "timer: Export next wakeup time of a CPU",

> > > > > which has been replaced by a couple of new patches, whom reworks the existing

> > > > > tick_nohz_get_sleep_length() function, to provide the next timer expiration

> > > > > instead of the duration.

> > > > >  - More changelogs are available per patch.

> > > >

> > > > NAK for patches [4-6/8].

> > > >

> > > > The code as is specifically avoids calling ktime_get() from the

> > > > governors as that can be quite expensive, so these patches potentially

> > > > make things worse.

> > >

> > > Yeah, good point! What do you think about folding in a patch into the

> > > series, like below, and then let the cpuidle governors use it?

> >

> > It is not objectionable as it stands, but that also depends on what

> > the new function is used for.

> >

> > In particular, I don't really think that the menu and teo governors

> > need to call it at all.

>

> Well, if we are going to re-work the code as suggested in the series,

> then how would you suggest to get rid of the calls to ktime_get() that

> is introduced in patch 4 and patch5?

>

> BTW, at a closer look I am even tempted to squash patch3 to patch6

> (including the part I attached earlier) as this part of the series is

> really a re-work of tick_nohz_get_sleep_length() and its users.


Which is unnecessary IMO - see my reply to patch [7/8].