diff mbox series

[v2,4/4] cpuidle: psci: Allow WFI to be the only state for the hierarchical topology

Message ID 20200303203559.23995-5-ulf.hansson@linaro.org
State New
Headers show
Series cpuidle: psci: Some fixes when using the hierarchical layout | expand

Commit Message

Ulf Hansson March 3, 2020, 8:35 p.m. UTC
It's possible that only the WFI state is supported for the CPU, while also
a shared idle state exists for a group of CPUs.

When the hierarchical topology is used, the shared idle state may not be
compatible with arm,idle-state, rather with "domain-idle-state", which
makes dt_init_idle_driver() to return zero. This leads to that the
cpuidle-psci driver bails out during initialization, avoiding to register a
cpuidle driver and instead relies on the default architectural back-end
(called via cpu_do_idle()). In other words, the shared idle state becomes
unused.

Let's fix this behaviour, by allowing the dt_init_idle_driver() to return 0
and then continue with the initialization. If it turns out that the
hierarchical topology is used and we have some additional states to manage,
then continue with the cpuidle driver registration, otherwise bail out as
before.

Reported-by: Benjamin Gaignard <benjamin.gaignard@st.com>
Fixes: a65a397f2451 ("cpuidle: psci: Add support for PM domains by using genpd")
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>

---

Changes in v2:
	- Convert the error code returned from psci_cpu_suspend_enter() into an
	expected error code by cpuidle core.

---
 drivers/cpuidle/cpuidle-psci.c | 48 +++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 18 deletions(-)

-- 
2.20.1

Comments

Ulf Hansson March 6, 2020, 9:28 a.m. UTC | #1
On Thu, 5 Mar 2020 at 17:23, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
> On Thu, Mar 05, 2020 at 03:17:42PM +0100, Ulf Hansson wrote:
> > On Wed, 4 Mar 2020 at 13:23, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > >
> > > The $subject is bit confusing. IIUC, if there are no idle states to
> > > manage including hierarchical domain states you will not register the driver
> > > right ? If so, you are not allowing WFI to be the only state, hence my
> > > concern with $subject.
> >
> > I agree that's not so clear, but it wasn't easy to fit everything I
> > wanted to say in one line. :-)
> >
>
> No worries, just wanted to clarified. Looking at the patch, lot of things
> got clarified but thought we can always improve.
>
> > Is this below better and okay for you?
> >
> > "cpuidle: psci: Update condition when avoiding driver registration".
> >
>
> Definitely better than $subject :)

Great, then I switch to that.

>
> > >
> > > On Tue, Mar 03, 2020 at 09:35:59PM +0100, Ulf Hansson wrote:
> > > > It's possible that only the WFI state is supported for the CPU, while also
> > > > a shared idle state exists for a group of CPUs.
> > > >
> > > > When the hierarchical topology is used, the shared idle state may not be
> > > > compatible with arm,idle-state, rather with "domain-idle-state", which
> > > > makes dt_init_idle_driver() to return zero. This leads to that the
> > > > cpuidle-psci driver bails out during initialization, avoiding to register a
> > > > cpuidle driver and instead relies on the default architectural back-end
> > > > (called via cpu_do_idle()). In other words, the shared idle state becomes
> > > > unused.
> > > >
> > > > Let's fix this behaviour, by allowing the dt_init_idle_driver() to return 0
> > > > and then continue with the initialization. If it turns out that the
> > > > hierarchical topology is used and we have some additional states to manage,
> > > > then continue with the cpuidle driver registration, otherwise bail out as
> > > > before.
> > > >
> > > > Reported-by: Benjamin Gaignard <benjamin.gaignard@st.com>
> > > > Fixes: a65a397f2451 ("cpuidle: psci: Add support for PM domains by using genpd")
> > > > Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
> > > > ---
> > > >
> > > > Changes in v2:
> > > >       - Convert the error code returned from psci_cpu_suspend_enter() into an
> > > >       expected error code by cpuidle core.
> > > >
> > > > ---
> > > >  drivers/cpuidle/cpuidle-psci.c | 48 +++++++++++++++++++++-------------
> > > >  1 file changed, 30 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> > > > index bae9140a65a5..ae0fabec2742 100644
> > > > --- a/drivers/cpuidle/cpuidle-psci.c
> > > > +++ b/drivers/cpuidle/cpuidle-psci.c
> > > > @@ -56,16 +56,19 @@ static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
> > > >       u32 *states = data->psci_states;
> > > >       struct device *pd_dev = data->dev;
> > > >       u32 state;
> > > > -     int ret;
> > > > +     int ret = 0;
> > > >
> > > >       /* Do runtime PM to manage a hierarchical CPU toplogy. */
> > > >       pm_runtime_put_sync_suspend(pd_dev);
> > > >
> > > >       state = psci_get_domain_state();
> > > > -     if (!state)
> > > > +     if (!state && states)
> > > >               state = states[idx];
> > > >
> > > > -     ret = psci_enter_state(idx, state);
> > > > +     if (state)
> > > > +             ret = psci_cpu_suspend_enter(state) ? -1 : idx;
> > > > +     else
> > > > +             cpu_do_idle();
> > >
> > > May be, I haven't followed this completely yet, but I don't want to be
> > > in the position to replicated default arch idle hook. Just use the one
> > > that exist by simply not registering the driver.
> >
> > That doesn't work for the configuration I am solving.
> >
> > Assume this scenario: We have WFI and a domain/cluster idle state.
> > From the cpuidle governor point of view, it always selects the WFI
> > state, which means idx is zero.
> >
>
> OK. The only state that cluster can enter when CPUs are in WFI are
> cluster WFI and most hardware can handle it automatically. I don't see
> the need to do any extra work for that.

This isn't about cluster WFI, but about deeper cluster states, such as
a cluster-clock-gated-state and a cluster-power-off-state. It's an ST
platform, which Benjamin is working on.

>
> > Then, after we have called pm_runtime_put_sync_suspend() a few lines
> > above, we may potentially have a "domain state" to use, instead of the
> > WFI state.
> >
>
> Are they any platforms with this potential "domain state" to use with
> CPU WFI. I want to understand this better.
>
> > In this case, if we would have called psci_enter_state(), that would
> > lead us to calling cpu_do_idle() from the __CPU_PM_CPU_IDLE_ENTER()
> > macro, becuase idx is zero. In other words, the domain state would
> > become unused.
> >
>
> For a domain state to become unused with WFI, it needs to be available
> and I am not 100% sure of that.

With these changes from the series, we can fully conform to the
hierarchical DT bindings for PSCI.

I am not sure I understand your concern, is there a cost involved by
applying this?

Kind regards
Uffe
Sudeep Holla March 6, 2020, 12:06 p.m. UTC | #2
On Fri, Mar 06, 2020 at 11:47:40AM +0100, Benjamin Gaignard wrote:
> Le ven. 6 mars 2020 à 11:04, Sudeep Holla <sudeep.holla@arm.com> a écrit :
> >
> > On Fri, Mar 06, 2020 at 10:28:10AM +0100, Ulf Hansson wrote:
> > > On Thu, 5 Mar 2020 at 17:23, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > >
> >
> > [...]
> >
> > > > OK. The only state that cluster can enter when CPUs are in WFI are
> > > > cluster WFI and most hardware can handle it automatically. I don't see
> > > > the need to do any extra work for that.
> > >
> > > This isn't about cluster WFI, but about deeper cluster states, such as
> > > a cluster-clock-gated-state and a cluster-power-off-state. It's an ST
> > > platform, which Benjamin is working on.
> > >
> >
> > Then definitely something is completely wrong. You can't enter deeper
> > cluster states(clock-gated and power-off to be specific) with CPU in
> > just WFI state. So, if the attempt here is to enter those states, I
> > disagree with the change.
> >
> > Benjamin, please share the complete hierarchical topology for your platform.
>
> The platform is stm32mp157 SoC which embedded two Cortex A7 in one cluster.

Hang on a minute, is this the same platform where you wanted high
resolution timer and were hacking moving dirty tricks around[1]. Now I think
you have landed here.

> I would like to be able to put the system in a state where clocks of CPUs and
> hardware blocks are gated. In this state local timer are off.

Sure, please create a deeper CPU state than WFI and enter so that the CPU
state is saved and restored correctly. What is the problem doing that ?

> The platform should be allowed to go in this state when the devices
> within the power domain are pm_runtime_suspend and the CPUs in WFI.

Nope, we don't save and restore state when we enter/exit WFI. And hence
we can't allow deeper idle states in the hierarchy. No more discussion
on that.

> In DT I have one system power domain where the hardware blocks (i2,
> uart; spi, etc..) are attached + a power per CPU.

You really need a CPU idle state here.

--
Regards,
Sudeep

[1] https://lore.kernel.org/linux-arm-kernel/a42dd20677cddd8d09ea91a369a4e10b@www.loen.fr/
Benjamin Gaignard March 6, 2020, 12:32 p.m. UTC | #3
Le ven. 6 mars 2020 à 13:06, Sudeep Holla <sudeep.holla@arm.com> a écrit :
>
> On Fri, Mar 06, 2020 at 11:47:40AM +0100, Benjamin Gaignard wrote:
> > Le ven. 6 mars 2020 à 11:04, Sudeep Holla <sudeep.holla@arm.com> a écrit :
> > >
> > > On Fri, Mar 06, 2020 at 10:28:10AM +0100, Ulf Hansson wrote:
> > > > On Thu, 5 Mar 2020 at 17:23, Sudeep Holla <sudeep.holla@arm.com> wrote:
> > > > >
> > >
> > > [...]
> > >
> > > > > OK. The only state that cluster can enter when CPUs are in WFI are
> > > > > cluster WFI and most hardware can handle it automatically. I don't see
> > > > > the need to do any extra work for that.
> > > >
> > > > This isn't about cluster WFI, but about deeper cluster states, such as
> > > > a cluster-clock-gated-state and a cluster-power-off-state. It's an ST
> > > > platform, which Benjamin is working on.
> > > >
> > >
> > > Then definitely something is completely wrong. You can't enter deeper
> > > cluster states(clock-gated and power-off to be specific) with CPU in
> > > just WFI state. So, if the attempt here is to enter those states, I
> > > disagree with the change.
> > >
> > > Benjamin, please share the complete hierarchical topology for your platform.
> >
> > The platform is stm32mp157 SoC which embedded two Cortex A7 in one cluster.
>
> Hang on a minute, is this the same platform where you wanted high
> resolution timer and were hacking moving dirty tricks around[1]. Now I think
> you have landed here.

yes it has been fixed in this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm/kernel/time.c?h=v5.6-rc4&id=022eb8ae8b5ee8c5c813923c69b5ebb1e9612c4c

>
> > I would like to be able to put the system in a state where clocks of CPUs and
> > hardware blocks are gated. In this state local timer are off.
>
> Sure, please create a deeper CPU state than WFI and enter so that the CPU
> state is saved and restored correctly. What is the problem doing that ?

This state stop the clocks for all the hardware blocks and not only the CPUs
so we can't go on it while devices aren't suspended.
I may have missed something but I don't believe that I could add this kind of
conditions in a cpu idle state, right ?
In this state I need to be able to enable the wake up sources because
it is the only
for hardware block used as broadcast timer to wake up the system.

>
> > The platform should be allowed to go in this state when the devices
> > within the power domain are pm_runtime_suspend and the CPUs in WFI.
>
> Nope, we don't save and restore state when we enter/exit WFI. And hence
> we can't allow deeper idle states in the hierarchy. No more discussion
> on that.

>
> > In DT I have one system power domain where the hardware blocks (i2,
> > uart; spi, etc..) are attached + a power per CPU.
>
> You really need a CPU idle state here.
>
> --
> Regards,
> Sudeep
>
> [1] https://lore.kernel.org/linux-arm-kernel/a42dd20677cddd8d09ea91a369a4e10b@www.loen.fr/
Benjamin Gaignard March 6, 2020, 2:44 p.m. UTC | #4
Le ven. 6 mars 2020 à 15:23, Sudeep Holla <sudeep.holla@arm.com> a écrit :
>
> On Fri, Mar 06, 2020 at 01:32:59PM +0100, Benjamin Gaignard wrote:
> > Le ven. 6 mars 2020 à 13:06, Sudeep Holla <sudeep.holla@arm.com> a écrit :
> > >
>
> [...]
>
> > > Sure, please create a deeper CPU state than WFI and enter so that the CPU
> > > state is saved and restored correctly. What is the problem doing that ?
> >
> > This state stop the clocks for all the hardware blocks and not only the CPUs
> > so we can't go on it while devices aren't suspended.
> > I may have missed something but I don't believe that I could add this kind of
> > conditions in a cpu idle state, right ?
> > In this state I need to be able to enable the wake up sources because
> > it is the only
> > for hardware block used as broadcast timer to wake up the system.
> >
>
> We have discussed this in past in the thread I mentioned and may be
> others too. It sounds like a broken hardware, sorry if I am wrong.
> But this $subject patch is a hack to solve that and I am NACK-ing this
> now. Please fix it adding another CPU level idle state, we are not
> supporting without that and there is absolutely no need to.

A CPU idle state only take care of CPU activities, right ? but before going in
the targeting state I need to be sure that the other hardware blocks
are suspended.
Is it possible to describe that in an idle state ?
What sound broken ? is it because we need to setup the wake up sources ?

>
> --
> Regards,
> Sudeep
diff mbox series

Patch

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index bae9140a65a5..ae0fabec2742 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -56,16 +56,19 @@  static int psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	u32 *states = data->psci_states;
 	struct device *pd_dev = data->dev;
 	u32 state;
-	int ret;
+	int ret = 0;
 
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
 	pm_runtime_put_sync_suspend(pd_dev);
 
 	state = psci_get_domain_state();
-	if (!state)
+	if (!state && states)
 		state = states[idx];
 
-	ret = psci_enter_state(idx, state);
+	if (state)
+		ret = psci_cpu_suspend_enter(state) ? -1 : idx;
+	else
+		cpu_do_idle();
 
 	pm_runtime_get_sync(pd_dev);
 
@@ -180,7 +183,7 @@  static int __init psci_dt_cpu_init_topology(struct cpuidle_driver *drv,
 	drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
 	psci_cpuidle_use_cpuhp = true;
 
-	return 0;
+	return 1;
 }
 
 static int __init psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
@@ -192,6 +195,13 @@  static int __init psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
 	struct device_node *state_node;
 	struct psci_cpuidle_data *data = per_cpu_ptr(&psci_cpuidle_data, cpu);
 
+	/*
+	 * Special case when WFI is the only state, as we may still need to
+	 * initialize data, if the hierarchical topology is used.
+	 */
+	if (!state_count)
+		return psci_dt_cpu_init_topology(drv, data, 1, cpu);
+
 	state_count++; /* Add WFI state too */
 	psci_states = kcalloc(state_count, sizeof(*psci_states), GFP_KERNEL);
 	if (!psci_states)
@@ -223,7 +233,7 @@  static int __init psci_dt_cpu_init_idle(struct cpuidle_driver *drv,
 
 	/* Idle states parsed correctly, store them in the per-cpu struct. */
 	data->psci_states = psci_states;
-	return 0;
+	return state_count;
 
 free_mem:
 	kfree(psci_states);
@@ -282,33 +292,35 @@  static int __init psci_idle_init_cpu(int cpu)
 		return -ENOMEM;
 
 	drv->cpumask = (struct cpumask *)cpumask_of(cpu);
+	drv->state_count = 1;
 
 	/*
-	 * Initialize idle states data, starting at index 1, since
-	 * by default idle state 0 is the quiescent state reached
-	 * by the cpu by executing the wfi instruction.
-	 *
-	 * If no DT idle states are detected (ret == 0) let the driver
-	 * initialization fail accordingly since there is no reason to
-	 * initialize the idle driver if only wfi is supported, the
-	 * default archictectural back-end already executes wfi
-	 * on idle entry.
+	 * Initialize idle states data, starting at index 1, since by default
+	 * idle state 0 is the quiescent state reached by the cpu by executing
+	 * the wfi instruction. If no DT idle states are detected (ret == 0),
+	 * we may still use the hierarchical topology.
 	 */
 	ret = dt_init_idle_driver(drv, psci_idle_state_match, 1);
-	if (ret <= 0) {
-		ret = ret ? : -ENODEV;
+	if (ret < 0)
 		goto out_kfree_drv;
-	}
 
 	/*
 	 * Initialize PSCI idle states.
 	 */
 	ret = psci_cpu_init_idle(drv, cpu, ret);
-	if (ret) {
+	if (ret < 0) {
 		pr_err("CPU %d failed to PSCI idle\n", cpu);
 		goto out_kfree_drv;
 	}
 
+	/* If there are no idle states to manage, but the wfi state and we also
+	 * don't use the hierarchical topology, let the driver initialization
+	 * fail. Instead, let's rely on the default architectural back-end to
+	 * execute wfi on idle entry.
+	 */
+	if (!ret)
+		goto out_kfree_drv;
+
 	ret = cpuidle_register(drv, NULL);
 	if (ret)
 		goto out_kfree_drv;