diff mbox series

[v2,3/5] cpuidle: psci: Do not suspend topology CPUs on PREEMPT_RT

Message ID 20221219151503.385816-4-krzysztof.kozlowski@linaro.org
State New
Headers show
Series PM: Fixes for Realtime systems | expand

Commit Message

Krzysztof Kozlowski Dec. 19, 2022, 3:15 p.m. UTC
Realtime kernels are not supposed to go into deep sleep modes (neither
system nor CPUs) because the latencies might become unpredictable.
Therefore actual power suspending of CPU topology is not that important.

On the other hand, this runtime PM of CPU topology is not compatible
with PREEMPT_RT:
1. Core cpuidle path disables IRQs.
2. Core cpuidle calls cpuidle-psci.
3. cpuidle-psci in __psci_enter_domain_idle_state() calls
   pm_runtime_put_sync_suspend() and pm_runtime_get_sync() which use
   spinlocks (which are sleeping on PREEMPT_RT):

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
  in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/0
  preempt_count: 1, expected: 0
  RCU nest depth: 0, expected: 0
  1 lock held by swapper/0/0:
   #0: ffff20524613c1a0 (&dev->power.lock){+.+.}-{3:3}, at: __pm_runtime_suspend+0x30/0xac
  irq event stamp: 18776
  hardirqs last  enabled at (18775): [<ffffa4853720ac80>] tick_nohz_idle_enter+0x7c/0x17c
  hardirqs last disabled at (18776): [<ffffa4853717ae00>] do_idle+0xe0/0x300
  softirqs last  enabled at (4310): [<ffffa48537126398>] __local_bh_enable_ip+0x98/0x280
  softirqs last disabled at (4288): [<ffffa4853721c250>] cgroup_idr_alloc.constprop.0+0x0/0xd0
  Preemption disabled at:
  [<ffffa485381e55a0>] schedule_preempt_disabled+0x20/0x30
  CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.1.0-rt5-00322-gb49b67f1d8dc #1
  Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
  Call trace:
   dump_backtrace.part.0+0xe0/0xf0
   show_stack+0x18/0x40
   dump_stack_lvl+0x8c/0xb8
   dump_stack+0x18/0x34
   __might_resched+0x17c/0x214
   rt_spin_lock+0x5c/0x100
   __pm_runtime_suspend+0x30/0xac
   __psci_enter_domain_idle_state.constprop.0+0x60/0x104
   psci_enter_domain_idle_state+0x18/0x2c
   cpuidle_enter_state+0x220/0x37c
   cpuidle_enter+0x38/0x50
   do_idle+0x258/0x300
   cpu_startup_entry+0x28/0x30
   rest_init+0x104/0x180
   arch_post_acpi_subsys_init+0x0/0x18
   start_kernel+0x6fc/0x73c
   __primary_switched+0xbc/0xc4

Cc: Adrien Thierry <athierry@redhat.com>
Cc: Brian Masney <bmasney@redhat.com>
Cc: linux-rt-users@vger.kernel.org
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
 drivers/cpuidle/cpuidle-psci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Sebastian Andrzej Siewior Jan. 12, 2023, 11:09 a.m. UTC | #1
On 2022-12-19 16:15:01 [+0100], Krzysztof Kozlowski wrote:
> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> index 57bc3e3ae391..9d971cc4b12b 100644
> --- a/drivers/cpuidle/cpuidle-psci.c
> +++ b/drivers/cpuidle/cpuidle-psci.c
> @@ -72,7 +72,7 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
>  	ct_irq_enter_irqson();
>  	if (s2idle)
>  		dev_pm_genpd_suspend(pd_dev);
> -	else
> +	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>  		pm_runtime_put_sync_suspend(pd_dev);

So based on the commit description you run into a sleeping lock in
pm_runtime_put_sync_suspend() while the CPU is in an IRQ-off region.
Why is it okay to skip it on PREEMPT_RT?

>  	ct_irq_exit_irqson();
>  

Sebastian
Krzysztof Kozlowski Jan. 12, 2023, 11:34 a.m. UTC | #2
On 12/01/2023 12:09, Sebastian Andrzej Siewior wrote:
> On 2022-12-19 16:15:01 [+0100], Krzysztof Kozlowski wrote:
>> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
>> index 57bc3e3ae391..9d971cc4b12b 100644
>> --- a/drivers/cpuidle/cpuidle-psci.c
>> +++ b/drivers/cpuidle/cpuidle-psci.c
>> @@ -72,7 +72,7 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
>>  	ct_irq_enter_irqson();
>>  	if (s2idle)
>>  		dev_pm_genpd_suspend(pd_dev);
>> -	else
>> +	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>>  		pm_runtime_put_sync_suspend(pd_dev);
> 
> So based on the commit description you run into a sleeping lock in
> pm_runtime_put_sync_suspend() while the CPU is in an IRQ-off region.
> Why is it okay to skip it on PREEMPT_RT?

It is okay to skip it everywhere, you just don't get a suspended CPU.
Why PREEMPT_RT is different here - having suspended CPU is a great way
to have longer or even unpredictable (as it goes to firmware which is
out of control of Linux) latencies.

Best regards,
Krzysztof
Sebastian Andrzej Siewior Jan. 30, 2023, 9:51 a.m. UTC | #3
On 2023-01-12 12:34:35 [+0100], Krzysztof Kozlowski wrote:
> On 12/01/2023 12:09, Sebastian Andrzej Siewior wrote:
> > On 2022-12-19 16:15:01 [+0100], Krzysztof Kozlowski wrote:
> >> diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
> >> index 57bc3e3ae391..9d971cc4b12b 100644
> >> --- a/drivers/cpuidle/cpuidle-psci.c
> >> +++ b/drivers/cpuidle/cpuidle-psci.c
> >> @@ -72,7 +72,7 @@ static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
> >>  	ct_irq_enter_irqson();
> >>  	if (s2idle)
> >>  		dev_pm_genpd_suspend(pd_dev);
> >> -	else
> >> +	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> >>  		pm_runtime_put_sync_suspend(pd_dev);
> > 
> > So based on the commit description you run into a sleeping lock in
> > pm_runtime_put_sync_suspend() while the CPU is in an IRQ-off region.
> > Why is it okay to skip it on PREEMPT_RT?
> 
> It is okay to skip it everywhere, you just don't get a suspended CPU.
> Why PREEMPT_RT is different here - having suspended CPU is a great way
> to have longer or even unpredictable (as it goes to firmware which is
> out of control of Linux) latencies.

On X86 C1 has a latency of less than 5us and this is exposed by the
firmware. C1E and everything that follows has a much higher entry/ exit
latency which makes not usable.
The entry/exit latency seems not to be exposed by PSCI. My understanding
is that the driver is now enabled but not doing any suspending, right?
If so, why isn't it completely disabled?

> Best regards,
> Krzysztof

Sebastian
diff mbox series

Patch

diff --git a/drivers/cpuidle/cpuidle-psci.c b/drivers/cpuidle/cpuidle-psci.c
index 57bc3e3ae391..9d971cc4b12b 100644
--- a/drivers/cpuidle/cpuidle-psci.c
+++ b/drivers/cpuidle/cpuidle-psci.c
@@ -72,7 +72,7 @@  static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	ct_irq_enter_irqson();
 	if (s2idle)
 		dev_pm_genpd_suspend(pd_dev);
-	else
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		pm_runtime_put_sync_suspend(pd_dev);
 	ct_irq_exit_irqson();
 
@@ -85,7 +85,7 @@  static int __psci_enter_domain_idle_state(struct cpuidle_device *dev,
 	ct_irq_enter_irqson();
 	if (s2idle)
 		dev_pm_genpd_resume(pd_dev);
-	else
+	else if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		pm_runtime_get_sync(pd_dev);
 	ct_irq_exit_irqson();