
[V3,1/6] sched: idle: Add a weak arch_cpu_idle_poll function

Message ID 1415370687-18688-2-git-send-email-daniel.lezcano@linaro.org
State New

Commit Message

Daniel Lezcano Nov. 7, 2014, 2:31 p.m. UTC
The poll function is called when a timer expires or when polling is forced via
the cpu_idle_force_poll option.

The poll function does:

       local_irq_enable();
       while (!tif_need_resched())
               cpu_relax();

This default poll function suits the x86 arch because of its "rep; nop"
hardware power optimization. On other archs this optimization does not
exist, so no power is saved while polling. The arch-specific code may want
to optimize this loop with its own implementation.

Give the different platforms an opportunity to specify their own polling
loop by adding a weak arch_cpu_idle_poll function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 kernel/sched/idle.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

Comments

Preeti U Murthy Nov. 8, 2014, 10:39 a.m. UTC | #1
On 11/07/2014 08:01 PM, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
> 
> The poll function does:
> 
>        local_irq_enable();
>        while (!tif_need_resched())
>                cpu_relax();
> 
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.
> 
> Give an opportunity to the different platform to specify their own polling
> loop by adding a weak cpu_idle_poll_loop function.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  kernel/sched/idle.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index c47fce7..ea65bbae 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -42,18 +42,6 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>  __setup("hlt", cpu_idle_nopoll_setup);
>  #endif
> 
> -static inline int cpu_idle_poll(void)
> -{
> -	rcu_idle_enter();
> -	trace_cpu_idle_rcuidle(0, smp_processor_id());
> -	local_irq_enable();
> -	while (!tif_need_resched())
> -		cpu_relax();
> -	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> -	rcu_idle_exit();
> -	return 1;
> -}
> -
>  /* Weak implementations for optional arch specific functions */
>  void __weak arch_cpu_idle_prepare(void) { }
>  void __weak arch_cpu_idle_enter(void) { }
> @@ -65,6 +53,23 @@ void __weak arch_cpu_idle(void)
>  	local_irq_enable();
>  }
> 
> +void __weak arch_cpu_idle_poll(void)
> +{
> +	local_irq_enable();
> +	while (!tif_need_resched())
> +		cpu_relax();
> +}
> +
> +static inline int cpu_idle_poll(void)
> +{
> +	rcu_idle_enter();
> +	trace_cpu_idle_rcuidle(0, smp_processor_id());
> +	arch_cpu_idle_poll();
> +	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> +	rcu_idle_exit();
> +	return 1;
> +}
> +
>  /**
>   * cpuidle_idle_call - the main idle function
>   *
> 

Thanks Daniel!

Reviewed-by: Preeti U. Murthy <preeti@linux.vnet.ibm.com>
Peter Zijlstra Nov. 10, 2014, 12:29 p.m. UTC | #2
On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> The poll function is called when a timer expired or if we force to poll when
> the cpu_idle_force_poll option is set.
> 
> The poll function does:
> 
>        local_irq_enable();
>        while (!tif_need_resched())
>                cpu_relax();
> 
> This default poll function suits for the x86 arch because its rep; nop;
> hardware power optimization. But on other archs, this optimization does not
> exists and we are not saving power. The arch specific bits may want to
> optimize this loop by adding their own optimization.

This doesn't make sense to me; should an arch not either implement an
actual idle driver or implement cpu_relax() properly, why allow for a
third weird option?
Preeti U Murthy Nov. 10, 2014, 2:20 p.m. UTC | #3
Hi Peter,

On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>> The poll function is called when a timer expired or if we force to poll when
>> the cpu_idle_force_poll option is set.
>>
>> The poll function does:
>>
>>        local_irq_enable();
>>        while (!tif_need_resched())
>>                cpu_relax();
>>
>> This default poll function suits for the x86 arch because its rep; nop;
>> hardware power optimization. But on other archs, this optimization does not
>> exists and we are not saving power. The arch specific bits may want to
>> optimize this loop by adding their own optimization.
> 
> This doesn't make sense to me; should an arch not either implement an
> actual idle driver or implement cpu_relax() properly, why allow for a
> third weird option?
> 

The previous version of this patch simply invoked cpu_idle_loop() when
latency_req was 0. This would have changed the behavior on PowerPC, where
the 0th idle index was returned instead; that state is also a polling loop,
but it differs from cpu_idle_loop() in two ways:

a. It polls at a relatively lower power state than cpu_relax().
b. We set certain registers to indicate that the cpu is idle.

Hence for all cases where the cpu is required to poll while idle (force
poll, a broadcast IPI expected to arrive soon, or latency_req = 0), we
should call into cpuidle_idle_loop() only if the cpuidle driver's 0th idle
state has an exit_latency > 0. (The 0th idle state is expected to be a
polling loop with exit_latency = 0.)

Otherwise, it would mean the driver has an optimized polling loop for idle.
But instead of adding the logic to check the exit_latency, we thought it
would be simpler to call into an arch-defined polling idle loop under the
above circumstances; if that is no better, we could fall back to
cpuidle_idle_loop().

Regards
Preeti U Murthy
Peter Zijlstra Nov. 10, 2014, 3:17 p.m. UTC | #4
On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
> Hi Peter,
> 
> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
> > On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
> >> The poll function is called when a timer expired or if we force to poll when
> >> the cpu_idle_force_poll option is set.
> >>
> >> The poll function does:
> >>
> >>        local_irq_enable();
> >>        while (!tif_need_resched())
> >>                cpu_relax();
> >>
> >> This default poll function suits for the x86 arch because its rep; nop;
> >> hardware power optimization. But on other archs, this optimization does not
> >> exists and we are not saving power. The arch specific bits may want to
> >> optimize this loop by adding their own optimization.
> > 
> > This doesn't make sense to me; should an arch not either implement an
> > actual idle driver or implement cpu_relax() properly, why allow for a
> > third weird option?
> > 
> 
> The previous version of this patch simply invoked cpu_idle_loop() for
> cases where latency_req was 0. This would have changed the behavior
> on PowerPC wherein earlier the 0th idle index was returned which is also
> a polling loop but differs from cpu_idle_loop() in two ways:
> 
> a. It polls at a relatively lower power state than cpu_relax().
> b. We set certain registers to indicate that the cpu is idle.

So I'm confused; the current code runs the generic cpu_relax idle poll
loop for the broadcast case. I suppose you want to retain this because
not doing your a-b above will indeed give you a lower latency.

Therefore one could argue that latency_req==0 should indeed use this,
and your a-b idle state should be latency_req==1 or higher.

Thus yes it changes behaviour, but I think it actually fixes something.
You cannot have a latency_req==0 state which has higher latency than the
actual polling loop, as you appear to have.

> Hence for all such cases wherein the cpu is required to poll while idle
> (only for cases such as force poll, broadcast ipi to arrive soon and
> latency_req = 0), we should be able to call into cpuidle_idle_loop()
> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
> (The 0th idle state is expected to be a polling loop with
> exit_latency = 0).
> 
> If otherwise, it would mean the driver has an optimized polling loop
> when idle. But instead of adding in the logic of checking the
> exit_latency, we thought it would be simpler to call into an arch
> defined polling idle loop under the above circumstances. If that is no
> better we could fall back to cpuidle_idle_loop().

That still doesn't make sense to me; suppose the implementation of this
special poll state differs on different uarchs for the same arch, then
we'll end up with another registration and selection interface parallel
to cpuidle.
Preeti U Murthy Nov. 11, 2014, 11 a.m. UTC | #5
On 11/10/2014 08:47 PM, Peter Zijlstra wrote:
> On Mon, Nov 10, 2014 at 07:50:22PM +0530, Preeti U Murthy wrote:
>> Hi Peter,
>>
>> On 11/10/2014 05:59 PM, Peter Zijlstra wrote:
>>> On Fri, Nov 07, 2014 at 03:31:22PM +0100, Daniel Lezcano wrote:
>>>> The poll function is called when a timer expired or if we force to poll when
>>>> the cpu_idle_force_poll option is set.
>>>>
>>>> The poll function does:
>>>>
>>>>        local_irq_enable();
>>>>        while (!tif_need_resched())
>>>>                cpu_relax();
>>>>
>>>> This default poll function suits for the x86 arch because its rep; nop;
>>>> hardware power optimization. But on other archs, this optimization does not
>>>> exists and we are not saving power. The arch specific bits may want to
>>>> optimize this loop by adding their own optimization.
>>>
>>> This doesn't make sense to me; should an arch not either implement an
>>> actual idle driver or implement cpu_relax() properly, why allow for a
>>> third weird option?
>>>
>>
>> The previous version of this patch simply invoked cpu_idle_loop() for
>> cases where latency_req was 0. This would have changed the behavior
>> on PowerPC wherein earlier the 0th idle index was returned which is also
>> a polling loop but differs from cpu_idle_loop() in two ways:
>>
>> a. It polls at a relatively lower power state than cpu_relax().
>> b. We set certain registers to indicate that the cpu is idle.
> 
> So I'm confused; the current code runs the generic cpu_relax idle poll
> loop for the broadcast case. I suppose you want to retain this because
> not doing your a-b above will indeed give you a lower latency.
> 
> Therefore one could argue that latency_req==0 should indeed use this,
> and your a-b idle state should be latency_req==1 or higher.
> 
> Thus yes it changes behaviour, but I think it actually fixes something.
> You cannot have a latency_req==0 state which has higher latency than the
> actual polling loop, as you appear to have.

Yes, you are right. This fixes the current behavior, so we should be
good to call cpuidle_idle_loop() when latency_req = 0.

> 
>> Hence for all such cases wherein the cpu is required to poll while idle
>> (only for cases such as force poll, broadcast ipi to arrive soon and
>> latency_req = 0), we should be able to call into cpuidle_idle_loop()
>> only if the cpuidle driver's 0th idle state has an exit_latency > 0.
>> (The 0th idle state is expected to be a polling loop with
>> exit_latency = 0).
>>
>> If otherwise, it would mean the driver has an optimized polling loop
>> when idle. But instead of adding in the logic of checking the
>> exit_latency, we thought it would be simpler to call into an arch
>> defined polling idle loop under the above circumstances. If that is no
>> better we could fall back to cpuidle_idle_loop().
> 
> That still doesn't make sense to me; suppose the implementation of this
> special poll state differs on different uarchs for the same arch, then
> we'll end up with another registration and selection interface parallel
> to cpuidle.

Yeah, this will only get complicated. I was trying to see how to keep the
current behavior unchanged, but it looks like it is not worth it.

Thanks

Regards
Preeti U Murthy

Patch

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index c47fce7..ea65bbae 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -42,18 +42,6 @@  static int __init cpu_idle_nopoll_setup(char *__unused)
 __setup("hlt", cpu_idle_nopoll_setup);
 #endif
 
-static inline int cpu_idle_poll(void)
-{
-	rcu_idle_enter();
-	trace_cpu_idle_rcuidle(0, smp_processor_id());
-	local_irq_enable();
-	while (!tif_need_resched())
-		cpu_relax();
-	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
-	rcu_idle_exit();
-	return 1;
-}
-
 /* Weak implementations for optional arch specific functions */
 void __weak arch_cpu_idle_prepare(void) { }
 void __weak arch_cpu_idle_enter(void) { }
@@ -65,6 +53,23 @@  void __weak arch_cpu_idle(void)
 	local_irq_enable();
 }
 
+void __weak arch_cpu_idle_poll(void)
+{
+	local_irq_enable();
+	while (!tif_need_resched())
+		cpu_relax();
+}
+
+static inline int cpu_idle_poll(void)
+{
+	rcu_idle_enter();
+	trace_cpu_idle_rcuidle(0, smp_processor_id());
+	arch_cpu_idle_poll();
+	trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
+	rcu_idle_exit();
+	return 1;
+}
+
 /**
  * cpuidle_idle_call - the main idle function
  *