
[RFC,PATCH,2/3] idle: store the idle state the cpu is

Message ID 1396009796-31598-3-git-send-email-daniel.lezcano@linaro.org
State New

Commit Message

Daniel Lezcano March 28, 2014, 12:29 p.m. UTC
When the cpu enters idle, it stores the cpuidle power info in the struct
rq, which in turn can be used to make the right decision when balancing
a task.

As soon as the cpu exits the idle state, the pointer is set back to NULL.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c  |   17 +++++++++++++++--
 kernel/sched/sched.h |    5 +++++
 2 files changed, 20 insertions(+), 2 deletions(-)
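
A rough sketch of how the scheduler side could consume this pointer when
balancing (illustration only, not part of the posted series; the helper
below is hypothetical, only rq->power itself comes from this patch):

/*
 * rq->power is NULL while the cpu is running and points at the power
 * info of the current idle state while the cpu sits in idle, so a
 * balance path can use it to choose between otherwise equivalent
 * idle cpus.
 */
static inline struct cpuidle_power *idle_power_of(int cpu)
{
	return cpu_rq(cpu)->power;
}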

Comments

Peter Zijlstra April 15, 2014, 12:43 p.m. UTC | #1
On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>  			if (!ret) {
>  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>  
> +				*power = &drv->states[next_state].power;
> +
> +				wmb();
> +

I very much suspect you meant: smp_wmb(), as I don't see the hardware
reading that pointer, therefore UP wouldn't care. Also, any and all
barriers should come with a comment that describes the data ordering and
points to the matching barriers.
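
For illustration, the kind of commented, paired barrier being asked for
could look roughly like this (the wording and the reader side named here
are placeholders, not something the patch currently documents):

	*power = &drv->states[next_state].power;

	/*
	 * Order the rq->power update against <whatever must come
	 * after it>; pairs with the matching smp_rmb() (or RCU read
	 * side) in <the reader, e.g. the find_idlest_cpu() path>.
	 */
	smp_wmb();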
Peter Zijlstra April 15, 2014, 12:44 p.m. UTC | #2
On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> > @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> >  			if (!ret) {
> >  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >  
> > +				*power = &drv->states[next_state].power;
> > +
> > +				wmb();
> > +
> 
> I very much suspect you meant: smp_wmb(), as I don't see the hardware
> reading that pointer, therefore UP wouldn't care. Also, any and all
> barriers should come with a comment that describes the data ordering and
> points to the matching barriers.

Furthermore, this patch fails to describe the life-time rules of the
object placed there. Can the object pointed to ever disappear?
Daniel Lezcano April 15, 2014, 2:17 p.m. UTC | #3
On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>>   			if (!ret) {
>>>   				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>
>>> +				*power = &drv->states[next_state].power;
>>> +
>>> +				wmb();
>>> +
>>
>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>> reading that pointer, therefore UP wouldn't care. Also, any and all
>> barriers should come with a comment that describes the data ordering and
>> points to the matching barriers.
>
> Furthermore, this patch fails to describe the life-time rules of the
> object placed there. Can the object pointed to ever disappear?

Hi Peter,

thanks for reviewing the patches.

There are a couple of situations where a cpuidle state can disappear:

1. For x86/acpi with dynamic c-states, when a laptop switches from
battery to AC, that could result in the deeper idle state being removed.
The acpi driver triggers:

'acpi_processor_cst_has_changed', which will call
'cpuidle_pause_and_lock'. This one will call
'cpuidle_uninstall_idle_handler', which in turn calls 'kick_all_cpus_sync'.

All cpus will exit their idle state and the pointer will be set back
to NULL.

2. The cpuidle driver is unloaded. Logically that could happen, but not
in practice, because the drivers are always compiled in and 95% of the
drivers are not coded to unregister the driver. Anyway ...

The unloading code must call 'cpuidle_unregister_device', which calls
'cpuidle_pause_and_lock', leading to 'kick_all_cpus_sync'.

IIUC, the race can happen if we take the pointer and then one of these
two situations occurs at the same moment.

As the function 'find_idlest_cpu' runs inside an rcu_read_lock section,
maybe an rcu_barrier in 'cpuidle_pause_and_lock' or
'cpuidle_uninstall_idle_handler' would suffice, no?

Thanks

   -- Daniel
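
A minimal sketch of the synchronization Daniel suggests above (illustration
only; the existing body of cpuidle_uninstall_idle_handler() is paraphrased,
and note that synchronize_rcu() is what waits for rcu_read_lock() sections,
whereas rcu_barrier() only waits for pending RCU callbacks):

void cpuidle_uninstall_idle_handler(void)
{
	if (enabled_devices) {
		initialized = 0;
		kick_all_cpus_sync();
	}

	/*
	 * Wait for scheduler-side rcu_read_lock() sections (such as
	 * find_idlest_cpu()) that may still be dereferencing a
	 * rq->power pointer into drv->states[].
	 */
	synchronize_rcu();
}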
Peter Zijlstra April 15, 2014, 2:33 p.m. UTC | #4
On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
> >On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
> >>On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
> >>>@@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
> >>>  			if (!ret) {
> >>>  				trace_cpu_idle_rcuidle(next_state, dev->cpu);
> >>>
> >>>+				*power = &drv->states[next_state].power;
> >>>+
> >>>+				wmb();
> >>>+
> >>
> >>I very much suspect you meant: smp_wmb(), as I don't see the hardware
> >>reading that pointer, therefore UP wouldn't care. Also, any and all
> >>barriers should come with a comment that describes the data ordering and
> >>points to the matching barriers.
> >
> >Furthermore, this patch fails to describe the life-time rules of the
> >object placed there. Can the object pointed to ever disappear?
> 
> Hi Peter,
> 
> thanks for reviewing the patches.
> 
> There are a couple of situations where a cpuidle state can disappear:
> 
> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
> to AC, that could result in the deeper idle state being removed. The acpi
> driver triggers:
> 
> 'acpi_processor_cst_has_changed', which will call 'cpuidle_pause_and_lock'.
> This one will call 'cpuidle_uninstall_idle_handler', which in turn calls
> 'kick_all_cpus_sync'.
> 
> All cpus will exit their idle state and the pointer will be set back to
> NULL.
> 
> 2. The cpuidle driver is unloaded. Logically that could happen, but not in
> practice, because the drivers are always compiled in and 95% of the drivers
> are not coded to unregister the driver. Anyway ...
> 
> The unloading code must call 'cpuidle_unregister_device', which calls
> 'cpuidle_pause_and_lock', leading to 'kick_all_cpus_sync'.
> 
> IIUC, the race can happen if we take the pointer and then one of these two
> situations occurs at the same moment.
> 
> As the function 'find_idlest_cpu' runs inside an rcu_read_lock section,
> maybe an rcu_barrier in 'cpuidle_pause_and_lock' or
> 'cpuidle_uninstall_idle_handler' would suffice, no?

Indeed. But be sure to document this.
Daniel Lezcano April 15, 2014, 2:39 p.m. UTC | #5
On 04/15/2014 04:33 PM, Peter Zijlstra wrote:
> On Tue, Apr 15, 2014 at 04:17:36PM +0200, Daniel Lezcano wrote:
>> On 04/15/2014 02:44 PM, Peter Zijlstra wrote:
>>> On Tue, Apr 15, 2014 at 02:43:30PM +0200, Peter Zijlstra wrote:
>>>> On Fri, Mar 28, 2014 at 01:29:55PM +0100, Daniel Lezcano wrote:
>>>>> @@ -143,6 +145,10 @@ static int cpuidle_idle_call(void)
>>>>>   			if (!ret) {
>>>>>   				trace_cpu_idle_rcuidle(next_state, dev->cpu);
>>>>>
>>>>> +				*power = &drv->states[next_state].power;
>>>>> +
>>>>> +				wmb();
>>>>> +
>>>>
>>>> I very much suspect you meant: smp_wmb(), as I don't see the hardware
>>>> reading that pointer, therefore UP wouldn't care. Also, any and all
>>>> barriers should come with a comment that describes the data ordering and
>>>> points to the matching barriers.
>>>
>>> Furthermore, this patch fails to describe the life-time rules of the
>>> object placed there. Can the object pointed to ever disappear?
>>
>> Hi Peter,
>>
>> thanks for reviewing the patches.
>>
>> There are a couple of situations where a cpuidle state can disappear:
>>
>> 1. For x86/acpi with dynamic c-states, when a laptop switches from battery
>> to AC, that could result in the deeper idle state being removed. The acpi
>> driver triggers:
>>
>> 'acpi_processor_cst_has_changed', which will call 'cpuidle_pause_and_lock'.
>> This one will call 'cpuidle_uninstall_idle_handler', which in turn calls
>> 'kick_all_cpus_sync'.
>>
>> All cpus will exit their idle state and the pointer will be set back to
>> NULL.
>>
>> 2. The cpuidle driver is unloaded. Logically that could happen, but not in
>> practice, because the drivers are always compiled in and 95% of the drivers
>> are not coded to unregister the driver. Anyway ...
>>
>> The unloading code must call 'cpuidle_unregister_device', which calls
>> 'cpuidle_pause_and_lock', leading to 'kick_all_cpus_sync'.
>>
>> IIUC, the race can happen if we take the pointer and then one of these two
>> situations occurs at the same moment.
>>
>> As the function 'find_idlest_cpu' runs inside an rcu_read_lock section,
>> maybe an rcu_barrier in 'cpuidle_pause_and_lock' or
>> 'cpuidle_uninstall_idle_handler' would suffice, no?
>
> Indeed. But be sure to document this.

Yes, sure. Thanks for pointing this out.

   -- Daniel

Patch

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 8f4390a..5c32c11 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -12,6 +12,8 @@ 
 
 #include <trace/events/power.h>
 
+#include "sched.h"
+
 static int __read_mostly cpu_idle_force_poll;
 
 void cpu_idle_poll_ctrl(bool enable)
@@ -69,7 +71,7 @@  void __weak arch_cpu_idle(void)
  * NOTE: no locks or semaphores should be used here
  * return non-zero on failure
  */
-static int cpuidle_idle_call(void)
+static int cpuidle_idle_call(struct cpuidle_power **power)
 {
 	struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
 	struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
@@ -143,6 +145,10 @@  static int cpuidle_idle_call(void)
 			if (!ret) {
 				trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
+				*power = &drv->states[next_state].power;
+
+				wmb();
+
 				/*
 				 * Enter the idle state previously
 				 * returned by the governor
@@ -154,6 +160,10 @@  static int cpuidle_idle_call(void)
 				entered_state = cpuidle_enter(drv, dev,
 							      next_state);
 
+				*power = NULL;
+
+				wmb();
+
 				trace_cpu_idle_rcuidle(PWR_EVENT_EXIT,
 						       dev->cpu);
 
@@ -198,6 +208,9 @@  static int cpuidle_idle_call(void)
  */
 static void cpu_idle_loop(void)
 {
+	struct rq *rq = this_rq();
+	struct cpuidle_power **power = &rq->power;
+
 	while (1) {
 		tick_nohz_idle_enter();
 
@@ -223,7 +236,7 @@  static void cpu_idle_loop(void)
 			if (cpu_idle_force_poll || tick_check_broadcast_expired())
 				cpu_idle_poll();
 			else
-				cpuidle_idle_call();
+				cpuidle_idle_call(power);
 
 			arch_cpu_idle_exit();
 		}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1929deb..1bcac35 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -14,6 +14,7 @@ 
 #include "cpuacct.h"
 
 struct rq;
+struct cpuidle_power;
 
 extern __read_mostly int scheduler_running;
 
@@ -632,6 +633,10 @@  struct rq {
 #ifdef CONFIG_SMP
 	struct llist_head wake_list;
 #endif
+
+#ifdef CONFIG_CPU_IDLE
+	struct cpuidle_power *power;
+#endif
 };
 
 static inline int cpu_of(struct rq *rq)