[tip/core/rcu,55/55] powerpc: Work around tracing from dyntick-idle mode

Message ID 1315332049-2604-55-git-send-email-paulmck@linux.vnet.ibm.com
State New
Headers show

Commit Message

Paul E. McKenney Sept. 6, 2011, 6 p.m.
PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a
point where RCU has been told that the CPU is in dyntick-idle mode.
Because event tracing uses RCU, this can result in failures.

A correct fix would arrange for RCU to be told about dyntick-idle
mode after tracing had completed, however, this will require some care
because it appears that __trace_hcall_exit() can also be called from
non-dyntick-idle mode.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Cc: paulus@samba.org
---
 arch/powerpc/platforms/pseries/lpar.c |    6 ++++++
 1 files changed, 6 insertions(+), 0 deletions(-)

Comments

Benjamin Herrenschmidt Sept. 7, 2011, 10 a.m. | #1
On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote:
> PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a
> point where RCU has been told that the CPU is in dyntick-idle mode.
> Because event tracing uses RCU, this can result in failures.
> 
> A correct fix would arrange for RCU to be told about dyntick-idle
> mode after tracing had completed, however, this will require some care
> because it appears that __trace_hcall_exit() can also be called from
> non-dyntick-idle mode.

This obviously needs to be fixed properly. hcall tracing is very useful
and if I understand your patch properly, it just comments it out :-)

I'm not sure what the best approach is, maybe have the hcall tracing
test for the dyntick-idle mode and skip tracing in that case ?

Cheers,
Ben.

> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> Cc: anton@samba.org
> Cc: benh@kernel.crashing.org
> Cc: paulus@samba.org
> ---
>  arch/powerpc/platforms/pseries/lpar.c |    6 ++++++
>  1 files changed, 6 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> index 39e6e0a..668f300 100644
> --- a/arch/powerpc/platforms/pseries/lpar.c
> +++ b/arch/powerpc/platforms/pseries/lpar.c
> @@ -715,12 +715,14 @@ EXPORT_SYMBOL(arch_free_page);
>  /* NB: reg/unreg are called while guarded with the tracepoints_mutex */
>  extern long hcall_tracepoint_refcount;
>  
> +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
>  /* 
>   * Since the tracing code might execute hcalls we need to guard against
>   * recursion. One example of this are spinlocks calling H_YIELD on
>   * shared processor partitions.
>   */
>  static DEFINE_PER_CPU(unsigned int, hcall_trace_depth);
> +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
>  
>  void hcall_tracepoint_regfunc(void)
>  {
> @@ -734,6 +736,7 @@ void hcall_tracepoint_unregfunc(void)
>  
>  void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
>  {
> +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
>  	unsigned long flags;
>  	unsigned int *depth;
>  
> @@ -750,11 +753,13 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
>  
>  out:
>  	local_irq_restore(flags);
> +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
>  }
>  
>  void __trace_hcall_exit(long opcode, unsigned long retval,
>  			unsigned long *retbuf)
>  {
> +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
>  	unsigned long flags;
>  	unsigned int *depth;
>  
> @@ -771,6 +776,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval,
>  
>  out:
>  	local_irq_restore(flags);
> +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
>  }
>  #endif
>
Paul E. McKenney Sept. 7, 2011, 1:44 p.m. | #2
On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote:
> On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote:
> > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a
> > point where RCU has been told that the CPU is in dyntick-idle mode.
> > Because event tracing uses RCU, this can result in failures.
> > 
> > A correct fix would arrange for RCU to be told about dyntick-idle
> > mode after tracing had completed, however, this will require some care
> > because it appears that __trace_hcall_exit() can also be called from
> > non-dyntick-idle mode.
> 
> This obviously needs to be fixed properly. hcall tracing is very useful
> and if I understand your patch properly, it just comments it out :-)

That is exactly what it does, and I completely agree that this patch
is nothing but a short-term work-around to allow my RCU tests to find
other bugs.

> I'm not sure what the best approach is, maybe have the hcall tracing
> test for the dyntick-idle mode and skip tracing in that case ?

Another approach would be to update Frederic Weisbecker's patch at:

	https://lkml.org/lkml/2011/8/20/83

so that powerpc does tick_nohz_enter_idle(false), and then uses
rcu_enter_nohz() explicitly just after doing the hcall tracing.
If pseries is the only powerpc architecture requiring this, then
the argument to tick_nohz_enter_idle() could depend on the powerpc
sub-architecture.

The same thing would be needed for tick_nohz_exit_idle() and
rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after
gaining control from the hypervisor but before doing its first tracing,
and then it would need the idle loop to to tick_nohz_exit_idle(false).
Again, if pseries is the only powerpc architecture requiring this,
the argument to tick_nohz_exit_idle() could depend on the architecture.

Would this approach work?

							Thanx, Paul

> Cheers,
> Ben.
> 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > Cc: anton@samba.org
> > Cc: benh@kernel.crashing.org
> > Cc: paulus@samba.org
> > ---
> >  arch/powerpc/platforms/pseries/lpar.c |    6 ++++++
> >  1 files changed, 6 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
> > index 39e6e0a..668f300 100644
> > --- a/arch/powerpc/platforms/pseries/lpar.c
> > +++ b/arch/powerpc/platforms/pseries/lpar.c
> > @@ -715,12 +715,14 @@ EXPORT_SYMBOL(arch_free_page);
> >  /* NB: reg/unreg are called while guarded with the tracepoints_mutex */
> >  extern long hcall_tracepoint_refcount;
> >  
> > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
> >  /* 
> >   * Since the tracing code might execute hcalls we need to guard against
> >   * recursion. One example of this are spinlocks calling H_YIELD on
> >   * shared processor partitions.
> >   */
> >  static DEFINE_PER_CPU(unsigned int, hcall_trace_depth);
> > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
> >  
> >  void hcall_tracepoint_regfunc(void)
> >  {
> > @@ -734,6 +736,7 @@ void hcall_tracepoint_unregfunc(void)
> >  
> >  void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
> >  {
> > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
> >  	unsigned long flags;
> >  	unsigned int *depth;
> >  
> > @@ -750,11 +753,13 @@ void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
> >  
> >  out:
> >  	local_irq_restore(flags);
> > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
> >  }
> >  
> >  void __trace_hcall_exit(long opcode, unsigned long retval,
> >  			unsigned long *retbuf)
> >  {
> > +#if 0 /* work around buggy use of RCU from dyntick-idle mode */
> >  	unsigned long flags;
> >  	unsigned int *depth;
> >  
> > @@ -771,6 +776,7 @@ void __trace_hcall_exit(long opcode, unsigned long retval,
> >  
> >  out:
> >  	local_irq_restore(flags);
> > +#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
> >  }
> >  #endif
> >  
> 
>
Frederic Weisbecker Sept. 13, 2011, 7:13 p.m. | #3
On Wed, Sep 07, 2011 at 06:44:00AM -0700, Paul E. McKenney wrote:
> On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote:
> > On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote:
> > > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a
> > > point where RCU has been told that the CPU is in dyntick-idle mode.
> > > Because event tracing uses RCU, this can result in failures.
> > > 
> > > A correct fix would arrange for RCU to be told about dyntick-idle
> > > mode after tracing had completed, however, this will require some care
> > > because it appears that __trace_hcall_exit() can also be called from
> > > non-dyntick-idle mode.
> > 
> > This obviously needs to be fixed properly. hcall tracing is very useful
> > and if I understand your patch properly, it just comments it out :-)
> 
> That is exactly what it does, and I completely agree that this patch
> is nothing but a short-term work-around to allow my RCU tests to find
> other bugs.
> 
> > I'm not sure what the best approach is, maybe have the hcall tracing
> > test for the dyntick-idle mode and skip tracing in that case ?
> 
> Another approach would be to update Frederic Weisbecker's patch at:
> 
> 	https://lkml.org/lkml/2011/8/20/83
> 
> so that powerpc does tick_nohz_enter_idle(false), and then uses
> rcu_enter_nohz() explicitly just after doing the hcall tracing.
> If pseries is the only powerpc architecture requiring this, then
> the argument to tick_nohz_enter_idle() could depend on the powerpc
> sub-architecture.

I'm trying to fix this but I need a bit of help to understand the
pseries cpu sleeping.

In pseries_dedicated_idle_sleep(), what is the function that does
the real sleeping? Is it cede_processor()?

> 
> The same thing would be needed for tick_nohz_exit_idle() and
> rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after
> gaining control from the hypervisor but before doing its first tracing,
> and then it would need the idle loop to to tick_nohz_exit_idle(false).
> Again, if pseries is the only powerpc architecture requiring this,
> the argument to tick_nohz_exit_idle() could depend on the architecture.
> 
> Would this approach work?

Sounds like we really need that.

Thanks.
Paul E. McKenney Sept. 13, 2011, 7:50 p.m. | #4
On Tue, Sep 13, 2011 at 09:13:21PM +0200, Frederic Weisbecker wrote:
> On Wed, Sep 07, 2011 at 06:44:00AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 07, 2011 at 07:00:22AM -0300, Benjamin Herrenschmidt wrote:
> > > On Tue, 2011-09-06 at 11:00 -0700, Paul E. McKenney wrote:
> > > > PowerPC LPAR's __trace_hcall_exit() can invoke event tracing at a
> > > > point where RCU has been told that the CPU is in dyntick-idle mode.
> > > > Because event tracing uses RCU, this can result in failures.
> > > > 
> > > > A correct fix would arrange for RCU to be told about dyntick-idle
> > > > mode after tracing had completed, however, this will require some care
> > > > because it appears that __trace_hcall_exit() can also be called from
> > > > non-dyntick-idle mode.
> > > 
> > > This obviously needs to be fixed properly. hcall tracing is very useful
> > > and if I understand your patch properly, it just comments it out :-)
> > 
> > That is exactly what it does, and I completely agree that this patch
> > is nothing but a short-term work-around to allow my RCU tests to find
> > other bugs.
> > 
> > > I'm not sure what the best approach is, maybe have the hcall tracing
> > > test for the dyntick-idle mode and skip tracing in that case ?
> > 
> > Another approach would be to update Frederic Weisbecker's patch at:
> > 
> > 	https://lkml.org/lkml/2011/8/20/83
> > 
> > so that powerpc does tick_nohz_enter_idle(false), and then uses
> > rcu_enter_nohz() explicitly just after doing the hcall tracing.
> > If pseries is the only powerpc architecture requiring this, then
> > the argument to tick_nohz_enter_idle() could depend on the powerpc
> > sub-architecture.
> 
> I'm trying to fix this but I need a bit of help to understand the
> pseries cpu sleeping.
> 
> In pseries_dedicated_idle_sleep(), what is the function that does
> the real sleeping? Is it cede_processor()?

As I understand it, cede_processor()'s call to plpar_hcall_norets()
results in the hypervisor being invoked, and could give up the CPU.
And yes, in this case, RCU needs to stop paying attention to this CPU.
And pseries_shared_idle_sleep() also invokes cede_proceessor().

Gah...  And there also appear to be some assembly-language functions
that can be invoked via the ppc_md.power_save() call from cpu_idle():
ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle().
There is also a power7_idle(), but it does not appear to be used anywhere.

Plus there are the C-language ppc44x_idle(), beat_power_save(),
cbe_power_save(), ps3_power_save(), and cpm_idle().

> > The same thing would be needed for tick_nohz_exit_idle() and
> > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after
> > gaining control from the hypervisor but before doing its first tracing,
> > and then it would need the idle loop to to tick_nohz_exit_idle(false).
> > Again, if pseries is the only powerpc architecture requiring this,
> > the argument to tick_nohz_exit_idle() could depend on the architecture.
> > 
> > Would this approach work?
> 
> Sounds like we really need that.

Sounds like an arch-dependent config symbol that is defined for the
pseries targets, but not for the other powerpc architectures.

Not clear to me what to do about power4_idle(), though.

							Thanx, Paul
Benjamin Herrenschmidt Sept. 13, 2011, 8:49 p.m. | #5
> As I understand it, cede_processor()'s call to plpar_hcall_norets()
> results in the hypervisor being invoked, and could give up the CPU.
> And yes, in this case, RCU needs to stop paying attention to this CPU.
> And pseries_shared_idle_sleep() also invokes cede_proceessor().
> 
> Gah...  And there also appear to be some assembly-language functions
> that can be invoked via the ppc_md.power_save() call from cpu_idle():
> ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle().
> There is also a power7_idle(), but it does not appear to be used anywhere.
> 
> Plus there are the C-language ppc44x_idle(), beat_power_save(),
> cbe_power_save(), ps3_power_save(), and cpm_idle().
> 
> > > The same thing would be needed for tick_nohz_exit_idle() and
> > > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after
> > > gaining control from the hypervisor but before doing its first tracing,
> > > and then it would need the idle loop to to tick_nohz_exit_idle(false).
> > > Again, if pseries is the only powerpc architecture requiring this,
> > > the argument to tick_nohz_exit_idle() could depend on the architecture.
> > > 
> > > Would this approach work?
> > 
> > Sounds like we really need that.
> 
> Sounds like an arch-dependent config symbol that is defined for the
> pseries targets, but not for the other powerpc architectures.
> 
> Not clear to me what to do about power4_idle(), though.

I don't totally follow, too many things to deal with right now, but keep
in mind that we build multiplatform kernels, so you can have powermac,
cell, pseries, etc... all in one kernel binary (including power7 idle).

Shouldn't we instead change the plpar trace call to skip the tracing
when not safe to do so ?

Cheers,
Ben.
Frederic Weisbecker Sept. 15, 2011, 2:53 p.m. | #6
On Tue, Sep 13, 2011 at 05:49:53PM -0300, Benjamin Herrenschmidt wrote:
> 
> > As I understand it, cede_processor()'s call to plpar_hcall_norets()
> > results in the hypervisor being invoked, and could give up the CPU.
> > And yes, in this case, RCU needs to stop paying attention to this CPU.
> > And pseries_shared_idle_sleep() also invokes cede_proceessor().
> > 
> > Gah...  And there also appear to be some assembly-language functions
> > that can be invoked via the ppc_md.power_save() call from cpu_idle():
> > ppc6xx_idle(), power4_idle(), idle_spin(), idle_doze(), and book3e_idle().
> > There is also a power7_idle(), but it does not appear to be used anywhere.
> > 
> > Plus there are the C-language ppc44x_idle(), beat_power_save(),
> > cbe_power_save(), ps3_power_save(), and cpm_idle().
> > 
> > > > The same thing would be needed for tick_nohz_exit_idle() and
> > > > rcu_exit_nohz(): powerpc would need to invoke rcu_exit_nohz() after
> > > > gaining control from the hypervisor but before doing its first tracing,
> > > > and then it would need the idle loop to to tick_nohz_exit_idle(false).
> > > > Again, if pseries is the only powerpc architecture requiring this,
> > > > the argument to tick_nohz_exit_idle() could depend on the architecture.
> > > > 
> > > > Would this approach work?
> > > 
> > > Sounds like we really need that.
> > 
> > Sounds like an arch-dependent config symbol that is defined for the
> > pseries targets, but not for the other powerpc architectures.
> > 
> > Not clear to me what to do about power4_idle(), though.
> 
> I don't totally follow, too many things to deal with right now, but keep
> in mind that we build multiplatform kernels, so you can have powermac,
> cell, pseries, etc... all in one kernel binary (including power7 idle).
> 
> Shouldn't we instead change the plpar trace call to skip the tracing
> when not safe to do so ?

Or may be we can have a plpar_hcall_norets_notrace() for this specific case?
Lemme try something.

Patch

diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 39e6e0a..668f300 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -715,12 +715,14 @@  EXPORT_SYMBOL(arch_free_page);
 /* NB: reg/unreg are called while guarded with the tracepoints_mutex */
 extern long hcall_tracepoint_refcount;
 
+#if 0 /* work around buggy use of RCU from dyntick-idle mode */
 /* 
  * Since the tracing code might execute hcalls we need to guard against
  * recursion. One example of this are spinlocks calling H_YIELD on
  * shared processor partitions.
  */
 static DEFINE_PER_CPU(unsigned int, hcall_trace_depth);
+#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
 
 void hcall_tracepoint_regfunc(void)
 {
@@ -734,6 +736,7 @@  void hcall_tracepoint_unregfunc(void)
 
 void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
 {
+#if 0 /* work around buggy use of RCU from dyntick-idle mode */
 	unsigned long flags;
 	unsigned int *depth;
 
@@ -750,11 +753,13 @@  void __trace_hcall_entry(unsigned long opcode, unsigned long *args)
 
 out:
 	local_irq_restore(flags);
+#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
 }
 
 void __trace_hcall_exit(long opcode, unsigned long retval,
 			unsigned long *retbuf)
 {
+#if 0 /* work around buggy use of RCU from dyntick-idle mode */
 	unsigned long flags;
 	unsigned int *depth;
 
@@ -771,6 +776,7 @@  void __trace_hcall_exit(long opcode, unsigned long retval,
 
 out:
 	local_irq_restore(flags);
+#endif /* #if 0 work around buggy use of RCU from dyntick-idle mode */
 }
 #endif