[v3] cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT

Message ID 20230119184228.683892-1-krzysztof.kozlowski@linaro.org
State New

Commit Message

Krzysztof Kozlowski Jan. 19, 2023, 6:42 p.m. UTC
The runtime Power Management of CPU topology is not compatible with
PREEMPT_RT:
1. Core cpuidle path disables IRQs.
2. Core cpuidle calls cpuidle-psci.
3. cpuidle-psci in __psci_enter_domain_idle_state() calls
   pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
   spinlocks (which are sleeping on PREEMPT_RT).

Deep sleep modes are not a priority of Realtime kernels because the
latencies might become unpredictable.  On the other hand the PSCI CPU
idle power domain is a parent of other devices and power domain
controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).

Disable the runtime PM calls from cpuidle-psci, which effectively stops
suspending the cpuidle PSCI domain.  This is a trade-off between making
PREEMPT_RT work and still having a proper power domain hierarchy in
the system.

Cc: Adrien Thierry <athierry@redhat.com>
Cc: Brian Masney <bmasney@redhat.com>
Cc: linux-rt-users@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>

---

Changes since v1:
1. Re-work commit msg.
2. Add note to Kconfig.

Several other patches were dropped, as this is the only one actually
needed.  It effectively stops PSCI cpuidle power domains from suspending
thus solving all other issues I experienced.
---
 drivers/cpuidle/Kconfig.arm    | 3 +++
 drivers/cpuidle/cpuidle-psci.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

Comments

Ulf Hansson Jan. 24, 2023, 10:33 a.m. UTC | #1
On Thu, 19 Jan 2023 at 19:42, Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> The runtime Power Management of CPU topology is not compatible with
> PREEMPT_RT:
> 1. Core cpuidle path disables IRQs.
> 2. Core cpuidle calls cpuidle-psci.
> 3. cpuidle-psci in __psci_enter_domain_idle_state() calls
>    pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
>    spinlocks (which are sleeping on PREEMPT_RT).
>
> Deep sleep modes are not a priority of Realtime kernels because the
> latencies might become unpredictable.  On the other hand the PSCI CPU
> idle power domain is a parent of other devices and power domain
> controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).
>
> Disable the runtime PM calls from cpuidle-psci, which effectively stops
> suspending the cpuidle PSCI domain.  This is a trade-off between making
> PREEMPT_RT working and still having a proper power domain hierarchy in
> the system.

I think this sounds like a reasonable compromise, at least at this point.

>
> Cc: Adrien Thierry <athierry@redhat.com>
> Cc: Brian Masney <bmasney@redhat.com>
> Cc: linux-rt-users@vger.kernel.org
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>
> ---
>
> Changes since v1:
> 1. Re-work commit msg.
> 2. Add note to Kconfig.
>
> Several other patches were dropped, as this is the only one actually
> needed.  It effectively stops PSCI cpuidle power domains from suspending
> thus solving all other issues I experienced.

I like this approach better, thanks!

> ---
>  drivers/cpuidle/Kconfig.arm    | 3 +++
>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
> index 747aa537389b..24429b5bfd1c 100644
> --- a/drivers/cpuidle/Kconfig.arm
> +++ b/drivers/cpuidle/Kconfig.arm
> @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE
>           It provides an idle driver that is capable of detecting and
>           managing idle states through the PSCI firmware interface.
>
> +         The driver is not yet compatible with PREEMPT_RT: no idle states will
> +         be entered by CPUs on such kernel.

This isn't entirely correct. In principle your suggested change ends
up providing the below updated behaviour for PREEMPT_RT.

*) If the idle states are described with the non-hierarchical layout,
all idle states are still available.
**) If the idle states are described with the hierarchical layout,
only the idle states defined per CPU are available, but not the ones
being shared among a group of CPUs (aka cluster idle states).

Perhaps there is an easier way to summarize what I stated above?
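
One possible way to summarize the two cases is as a small decision function. This is only a user-space C model of the behaviour described above, not kernel code; the enum and function names are hypothetical and exist nowhere in the driver.

```c
#include <stdbool.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum dt_layout {
    LAYOUT_NON_HIERARCHICAL,   /* all states listed per CPU */
    LAYOUT_HIERARCHICAL        /* cluster states attached to power domains */
};

enum state_scope {
    SCOPE_PER_CPU,             /* state affects one CPU only */
    SCOPE_CLUSTER              /* state shared among a group of CPUs */
};

/*
 * Returns true if an idle state is still available with the patch applied.
 * Without PREEMPT_RT nothing changes. With PREEMPT_RT, only cluster states
 * described through the hierarchical layout are lost.
 */
bool idle_state_available(bool preempt_rt, enum dt_layout layout,
                          enum state_scope scope)
{
    if (!preempt_rt)
        return true;
    if (layout == LAYOUT_NON_HIERARCHICAL)
        return true;
    return scope == SCOPE_PER_CPU;
}
```

In this model the only combination that returns false is PREEMPT_RT with the hierarchical layout and a cluster idle state.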

> +
>  config ARM_PSCI_CPUIDLE_DOMAIN
>         bool "PSCI CPU idle Domain"
>         depends on ARM_PSCI_CPUIDLE
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 312a34ef28dc..c25592718984 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -66,7 +66,7 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
>         /* Do runtime PM to manage a hierarchical CPU toplogy. */
>         if (s2idle)
>                 dev_pm_genpd_suspend(pd_dev);
> -       else
> +       else if (!IS_ENABLED(CONFIG_PREEMPT_RT))

Rather than doing this (and the below) in
__psci_enter_domain_idle_state(), I suggest replacing this with a
bailout point in psci_dt_cpu_init_topology(). That would prevent the
__psci_enter_domain_idle_state() from being called altogether, which
is really what we need.

Moreover, I think it would make sense to set the GENPD_FLAG_ALWAYS_ON
for the corresponding genpd, when CONFIG_PREEMPT_RT is set. See
psci_pd_init().
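
To illustrate the shape of the two suggestions, here is a minimal user-space C model. It is a sketch under stated assumptions, not the actual kernel change: the struct layouts, the MODEL_FLAG_ALWAYS_ON bit and the model_* function names are all hypothetical stand-ins, and the preempt_rt parameter stands in for IS_ENABLED(CONFIG_PREEMPT_RT).

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real GENPD_FLAG_ALWAYS_ON value is not assumed. */
#define MODEL_FLAG_ALWAYS_ON (1U << 0)

/* Hypothetical stand-ins for the genpd and the per-CPU driver data. */
struct model_genpd {
    unsigned int flags;
};

struct model_cpu {
    bool uses_domain_idle;   /* true if the domain enter callback is used */
};

/* Models the suggestion for psci_pd_init(): keep the domain always on. */
void model_pd_init(struct model_genpd *pd, bool preempt_rt)
{
    pd->flags = 0;
    if (preempt_rt)
        pd->flags |= MODEL_FLAG_ALWAYS_ON;
}

/*
 * Models the suggested bailout in psci_dt_cpu_init_topology(): on an RT
 * kernel, return early so the domain idle-state callback is never installed
 * and therefore never called from the idle path.
 */
int model_cpu_init_topology(struct model_cpu *cpu, bool preempt_rt)
{
    cpu->uses_domain_idle = false;
    if (preempt_rt)
        return 0;            /* bail out, plain per-CPU idle states only */
    cpu->uses_domain_idle = true;
    return 0;
}
```

The point of bailing out at init time rather than in the enter path is that the runtime PM calls become unreachable on RT by construction, instead of being skipped on every idle entry.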

>                 pm_runtime_put_sync_suspend(pd_dev);
>
>         state = psci_get_domain_state();
> @@ -77,7 +77,7 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
>
>         if (s2idle)
>                 dev_pm_genpd_resume(pd_dev);
> -       else
> +       else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>                 pm_runtime_get_sync(pd_dev);
>
>         cpu_pm_exit();
> --

Kind regards
Uffe

Sudeep Holla Jan. 24, 2023, 3:34 p.m. UTC | #2
On Thu, Jan 19, 2023 at 07:42:28PM +0100, Krzysztof Kozlowski wrote:
> The runtime Power Management of CPU topology is not compatible with
> PREEMPT_RT:
> 1. Core cpuidle path disables IRQs.
> 2. Core cpuidle calls cpuidle-psci.
> 3. cpuidle-psci in __psci_enter_domain_idle_state() calls
>    pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
>    spinlocks (which are sleeping on PREEMPT_RT).
> 
> Deep sleep modes are not a priority of Realtime kernels because the
> latencies might become unpredictable.  On the other hand the PSCI CPU
> idle power domain is a parent of other devices and power domain
> controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).
> 
> Disable the runtime PM calls from cpuidle-psci, which effectively stops
> suspending the cpuidle PSCI domain.  This is a trade-off between making
> PREEMPT_RT working and still having a proper power domain hierarchy in
> the system.
> 
> Cc: Adrien Thierry <athierry@redhat.com>
> Cc: Brian Masney <bmasney@redhat.com>
> Cc: linux-rt-users@vger.kernel.org
> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> 
> ---
> 
> Changes since v1:
> 1. Re-work commit msg.
> 2. Add note to Kconfig.
> 
> Several other patches were dropped, as this is the only one actually
> needed.  It effectively stops PSCI cpuidle power domains from suspending
> thus solving all other issues I experienced.
> ---
>  drivers/cpuidle/Kconfig.arm    | 3 +++
>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
> index 747aa537389b..24429b5bfd1c 100644
> --- a/drivers/cpuidle/Kconfig.arm
> +++ b/drivers/cpuidle/Kconfig.arm
> @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE
>  	  It provides an idle driver that is capable of detecting and
>  	  managing idle states through the PSCI firmware interface.
>  
> +	  The driver is not yet compatible with PREEMPT_RT: no idle states will
> +	  be entered by CPUs on such kernel.
> +

Any particular reason for even compiling this file in or allowing the
ARM_PSCI_CPUIDLE when PREEMPT_RT=y ? If we can't enter idle states, we
can as well compile this file out ?

Krzysztof Kozlowski Jan. 25, 2023, 7:42 a.m. UTC | #3
On 24/01/2023 16:34, Sudeep Holla wrote:
> On Thu, Jan 19, 2023 at 07:42:28PM +0100, Krzysztof Kozlowski wrote:
>> The runtime Power Management of CPU topology is not compatible with
>> PREEMPT_RT:
>> 1. Core cpuidle path disables IRQs.
>> 2. Core cpuidle calls cpuidle-psci.
>> 3. cpuidle-psci in __psci_enter_domain_idle_state() calls
>>    pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
>>    spinlocks (which are sleeping on PREEMPT_RT).
>>
>> Deep sleep modes are not a priority of Realtime kernels because the
>> latencies might become unpredictable.  On the other hand the PSCI CPU
>> idle power domain is a parent of other devices and power domain
>> controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).
>>
>> Disable the runtime PM calls from cpuidle-psci, which effectively stops
>> suspending the cpuidle PSCI domain.  This is a trade-off between making
>> PREEMPT_RT working and still having a proper power domain hierarchy in
>> the system.
>>
>> Cc: Adrien Thierry <athierry@redhat.com>
>> Cc: Brian Masney <bmasney@redhat.com>
>> Cc: linux-rt-users@vger.kernel.org
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>
>> ---
>>
>> Changes since v1:
>> 1. Re-work commit msg.
>> 2. Add note to Kconfig.
>>
>> Several other patches were dropped, as this is the only one actually
>> needed.  It effectively stops PSCI cpuidle power domains from suspending
>> thus solving all other issues I experienced.
>> ---
>>  drivers/cpuidle/Kconfig.arm    | 3 +++
>>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
>> index 747aa537389b..24429b5bfd1c 100644
>> --- a/drivers/cpuidle/Kconfig.arm
>> +++ b/drivers/cpuidle/Kconfig.arm
>> @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE
>>  	  It provides an idle driver that is capable of detecting and
>>  	  managing idle states through the PSCI firmware interface.
>>  
>> +	  The driver is not yet compatible with PREEMPT_RT: no idle states will
>> +	  be entered by CPUs on such kernel.
>> +
> 
> Any particular reason for even compiling this file in or allowing the
> ARM_PSCI_CPUIDLE when PREEMPT_RT=y ? If we can't enter idle states, we
> can as well compile this file out ?

It's the power domain used for other devices, so we need it. Otherwise
other devices will keep waiting for this missing power domain provider.

Best regards,
Krzysztof

Ulf Hansson Jan. 25, 2023, 10:08 a.m. UTC | #4
On Wed, 25 Jan 2023 at 08:43, Krzysztof Kozlowski
<krzysztof.kozlowski@linaro.org> wrote:
>
> On 24/01/2023 16:34, Sudeep Holla wrote:
> > On Thu, Jan 19, 2023 at 07:42:28PM +0100, Krzysztof Kozlowski wrote:
> >> The runtime Power Management of CPU topology is not compatible with
> >> PREEMPT_RT:
> >> 1. Core cpuidle path disables IRQs.
> >> 2. Core cpuidle calls cpuidle-psci.
> >> 3. cpuidle-psci in __psci_enter_domain_idle_state() calls
> >>    pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
> >>    spinlocks (which are sleeping on PREEMPT_RT).
> >>
> >> Deep sleep modes are not a priority of Realtime kernels because the
> >> latencies might become unpredictable.  On the other hand the PSCI CPU
> >> idle power domain is a parent of other devices and power domain
> >> controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).
> >>
> >> Disable the runtime PM calls from cpuidle-psci, which effectively stops
> >> suspending the cpuidle PSCI domain.  This is a trade-off between making
> >> PREEMPT_RT working and still having a proper power domain hierarchy in
> >> the system.
> >>
> >> Cc: Adrien Thierry <athierry@redhat.com>
> >> Cc: Brian Masney <bmasney@redhat.com>
> >> Cc: linux-rt-users@vger.kernel.org
> >> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> >>
> >> ---
> >>
> >> Changes since v1:
> >> 1. Re-work commit msg.
> >> 2. Add note to Kconfig.
> >>
> >> Several other patches were dropped, as this is the only one actually
> >> needed.  It effectively stops PSCI cpuidle power domains from suspending
> >> thus solving all other issues I experienced.
> >> ---
> >>  drivers/cpuidle/Kconfig.arm    | 3 +++
> >>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
> >>  2 files changed, 5 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
> >> index 747aa537389b..24429b5bfd1c 100644
> >> --- a/drivers/cpuidle/Kconfig.arm
> >> +++ b/drivers/cpuidle/Kconfig.arm
> >> @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE
> >>        It provides an idle driver that is capable of detecting and
> >>        managing idle states through the PSCI firmware interface.
> >>
> >> +      The driver is not yet compatible with PREEMPT_RT: no idle states will
> >> +      be entered by CPUs on such kernel.
> >> +
> >
> > Any particular reason for even compiling this file in or allowing the
> > ARM_PSCI_CPUIDLE when PREEMPT_RT=y ? If we can't enter idle states, we
> > can as well compile this file out ?
>
> It's the power domain used for other devices, so we need it. Otherwise
> other devices will keep waiting for this missing power domain provider.

Yes.

And we are still able to use those idle states that are solely per
CPU, which is probably nice to have. No?

Kind regards
Uffe

Krzysztof Kozlowski Jan. 25, 2023, 10:44 a.m. UTC | #5
On 24/01/2023 11:33, Ulf Hansson wrote:
> On Thu, 19 Jan 2023 at 19:42, Krzysztof Kozlowski
> <krzysztof.kozlowski@linaro.org> wrote:
>>
>> The runtime Power Management of CPU topology is not compatible with
>> PREEMPT_RT:
>> 1. Core cpuidle path disables IRQs.
>> 2. Core cpuidle calls cpuidle-psci.
>> 3. cpuidle-psci in __psci_enter_domain_idle_state() calls
>>    pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
>>    spinlocks (which are sleeping on PREEMPT_RT).
>>
>> Deep sleep modes are not a priority of Realtime kernels because the
>> latencies might become unpredictable.  On the other hand the PSCI CPU
>> idle power domain is a parent of other devices and power domain
>> controllers, thus it cannot be simply skipped (e.g. on Qualcomm SM8250).
>>
>> Disable the runtime PM calls from cpuidle-psci, which effectively stops
>> suspending the cpuidle PSCI domain.  This is a trade-off between making
>> PREEMPT_RT working and still having a proper power domain hierarchy in
>> the system.
> 
> I think this sounds like a reasonable compromise, at least at this point.
> 
>>
>> Cc: Adrien Thierry <athierry@redhat.com>
>> Cc: Brian Masney <bmasney@redhat.com>
>> Cc: linux-rt-users@vger.kernel.org
>> Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
>>
>> ---
>>
>> Changes since v1:
>> 1. Re-work commit msg.
>> 2. Add note to Kconfig.
>>
>> Several other patches were dropped, as this is the only one actually
>> needed.  It effectively stops PSCI cpuidle power domains from suspending
>> thus solving all other issues I experienced.
> 
> I like this approach better, thanks!
> 
>> ---
>>  drivers/cpuidle/Kconfig.arm    | 3 +++
>>  drivers/cpuidle/cpuidle-psci.c | 4 ++--
>>  2 files changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
>> index 747aa537389b..24429b5bfd1c 100644
>> --- a/drivers/cpuidle/Kconfig.arm
>> +++ b/drivers/cpuidle/Kconfig.arm
>> @@ -24,6 +24,9 @@ config ARM_PSCI_CPUIDLE
>>           It provides an idle driver that is capable of detecting and
>>           managing idle states through the PSCI firmware interface.
>>
>> +         The driver is not yet compatible with PREEMPT_RT: no idle states will
>> +         be entered by CPUs on such kernel.
> 
> This isn't entirely correct. In principle your suggested change ends
> up providing the below updated behaviour for PREEMPT_RT.
> 
> *) If the idle states are described with the non-hierarchical layout,
> all idle states are still available.
> **) If the idle states are described with the hierarchical layout,
> only the idle states defined per CPU are available, but not the ones
> being shared among a group of CPUs (aka cluster idle states).
> 
> Perhaps there is an easier way to summarize what I stated above?

Yes, I'll correct the message.

> 
>> +
>>  config ARM_PSCI_CPUIDLE_DOMAIN
>>         bool "PSCI CPU idle Domain"
>>         depends on ARM_PSCI_CPUIDLE
>> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
>> index 312a34ef28dc..c25592718984 100644
>> --- a/drivers/cpuidle/cpuidle-psci.c
>> +++ b/drivers/cpuidle/cpuidle-psci.c
>> @@ -66,7 +66,7 @@ static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
>>         /* Do runtime PM to manage a hierarchical CPU toplogy. */
>>         if (s2idle)
>>                 dev_pm_genpd_suspend(pd_dev);
>> -       else
>> +       else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> 
> Rather than doing this (and the below) in
> __psci_enter_domain_idle_state(), I suggest replacing this with a
> bailout point in psci_dt_cpu_init_topology(). That would prevent the
> __psci_enter_domain_idle_state() from being called altogether, which
> is really what we need.

Ack

> 
> Moreover, I think it would make sense to set the GENPD_FLAG_ALWAYS_ON
> for the corresponding genpd, when CONFIG_PREEMPT_RT is set. See
> psci_pd_init().

Makes sense.


Best regards,
Krzysztof

Sudeep Holla Jan. 25, 2023, 11:19 a.m. UTC | #6
On Wed, Jan 25, 2023 at 11:08:04AM +0100, Ulf Hansson wrote:
> On Wed, 25 Jan 2023 at 08:43, Krzysztof Kozlowski
> <krzysztof.kozlowski@linaro.org> wrote:
> >
> > On 24/01/2023 16:34, Sudeep Holla wrote:

[...]

> > > Any particular reason for even compiling this file in or allowing the
> > > ARM_PSCI_CPUIDLE when PREEMPT_RT=y ? If we can't enter idle states, we
> > > can as well compile this file out ?
> >
> > It's the power domain used for other devices, so we need it. Otherwise
> > other devices will keep waiting for this missing power domain provider.
> 
> Yes.
> 
> And we are still able to use those idle states that are solely per
> CPU, which is probably nice to have. No?
> 

Makes sense, thanks for the explanation. The other discussions also
probably clear up the questions I had.

Sebastian Andrzej Siewior Jan. 30, 2023, 10:04 a.m. UTC | #7
On 2023-01-25 11:08:04 [+0100], Ulf Hansson wrote:
> > It's the power domain used for other devices, so we need it. Otherwise
> > other devices will keep waiting for this missing power domain provider.
> 
Ach, this explains the other question I had.
> 
> And we are still able to use those idle states that are solely per
> CPU, which is probably nice to have. No?

If the entry/exit latency is not known, it can be dangerous.

> Kind regards
> Uffe

Sebastian
Patch

diff --git a/drivers/cpuidle/Kconfig.arm b/drivers/cpuidle/Kconfig.arm
index 747aa537389b..24429b5bfd1c 100644
--- a/drivers/cpuidle/Kconfig.arm
+++ b/drivers/cpuidle/Kconfig.arm
@@ -24,6 +24,9 @@  config ARM_PSCI_CPUIDLE
 	  It provides an idle driver that is capable of detecting and
 	  managing idle states through the PSCI firmware interface.
 
+	  The driver is not yet compatible with PREEMPT_RT: no idle states will
+	  be entered by CPUs on such kernel.
+
 config ARM_PSCI_CPUIDLE_DOMAIN
 	bool "PSCI CPU idle Domain"
 	depends on ARM_PSCI_CPUIDLE
diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 312a34ef28dc..c25592718984 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -66,7 +66,7 @@  static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	/* Do runtime PM to manage a hierarchical CPU toplogy. */
 	if (s2idle)
 		dev_pm_genpd_suspend(pd_dev);
-	else
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		pm_runtime_put_sync_suspend(pd_dev);
 
 	state = psci_get_domain_state();
@@ -77,7 +77,7 @@  static __cpuidle int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 
 	if (s2idle)
 		dev_pm_genpd_resume(pd_dev);
-	else
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		pm_runtime_get_sync(pd_dev);
 
 	cpu_pm_exit();