[RFC,tip/core/rcu,05/28] lockdep: Update documentation for lock-class leak detection

Message ID 1320265849-5744-5-git-send-email-paulmck@linux.vnet.ibm.com
State New
Headers show

Commit Message

Paul E. McKenney Nov. 2, 2011, 8:30 p.m.
There are a number of bugs that can leak or overuse lock classes,
which can cause the maximum number of lock classes (currently 8191)
to be exceeded.  However, the documentation does not tell you how to
track down these problems.  This commit addresses this shortcoming.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---
 Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 61 insertions(+), 0 deletions(-)

Comments

Josh Triplett Nov. 3, 2011, 2:57 a.m. | #1
On Wed, Nov 02, 2011 at 01:30:26PM -0700, Paul E. McKenney wrote:
> There are a number of bugs that can leak or overuse lock classes,
> which can cause the maximum number of lock classes (currently 8191)
> to be exceeded.  However, the documentation does not tell you how to
> track down these problems.  This commit addresses this shortcoming.
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
>  Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
>  1 files changed, 61 insertions(+), 0 deletions(-)
> 
> diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
> index abf768c..383bb23 100644
> --- a/Documentation/lockdep-design.txt
> +++ b/Documentation/lockdep-design.txt
> @@ -221,3 +221,64 @@ when the chain is validated for the first time, is then put into a hash
>  table, which hash-table can be checked in a lockfree manner. If the
>  locking chain occurs again later on, the hash table tells us that we
>  dont have to validate the chain again.
> +
> +Troubleshooting:
> +----------------
> +
> +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
> +Exceeding this number will trigger the following lockdep warning:
> +
> +	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
> +
> +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
> +desktop systems have less than 1,000 lock classes, so this warning
> +normally results from lock-class leakage or failure to properly
> +initialize locks.  These two problems are illustrated below:
> +
> +1.	Repeated module loading and unloading while running the validator
> +	will result in lock-class leakage.  The issue here is that each
> +	load of the module will create a new set of lock classes for that
> +	module's locks, but module unloading does not remove old classes.

I'd explicitly add a parenthetical here: (see below about reusing lock
classes for why).  I stared at this for a minute trying to think about
why the old classes couldn't go away, before realizing this fell into
the case you described below: removing them would require cleaning up
any dependency chains involving them.

> +	Therefore, if that module is loaded and unloaded repeatedly,
> +	the number of lock classes will eventually reach the maximum.
> +
> +2.	Using structures such as arrays that have large numbers of
> +	locks that are not explicitly initialized.  For example,
> +	a hash table with 8192 buckets where each bucket has its
> +	own spinlock_t will consume 8192 lock classes -unless- each
> +	spinlock is initialized, for example, using spin_lock_init().
> +	Failure to properly initialize the per-bucket spinlocks would
> +	guarantee lock-class overflow.	In contrast, a loop that called
> +	spin_lock_init() on each lock would place all 8192 locks into a
> +	single lock class.
> +
> +	The moral of this story is that you should always explicitly
> +	initialize your locks.

Spin locks *require* initialization, right?  Doesn't this constitute a
bug regardless of lockdep?

If so, could we simply arrange to have lockdep scream when it encounters
an uninitialized spinlock?

> +One might argue that the validator should be modified to allow lock
> +classes to be reused.  However, if you are tempted to make this argument,
> +first review the code and think through the changes that would be
> +required, keeping in mind that the lock classes to be removed are likely
> +to be linked into the lock-dependency graph.  This turns out to be a
> +harder to do than to say.

Typo fix: s/to be a harder/to be harder/.

> +Of course, if you do run out of lock classes, the next thing to do is
> +to find the offending lock classes.  First, the following command gives
> +you the number of lock classes currently in use along with the maximum:
> +
> +	grep "lock-classes" /proc/lockdep_stats
> +
> +This command produces the following output on a modest Power system:
> +
> +	 lock-classes:                          748 [max: 8191]

Does Power matter here?  Could this just say "a modest system"?

> +If the number allocated (748 above) increases continually over time,
> +then there is likely a leak.  The following command can be used to
> +identify the leaking lock classes:
> +
> +	grep "BD" /proc/lockdep
> +
> +Run the command and save the output, then compare against the output
> +from a later run of this command to identify the leakers.  This same
> +output can also help you find situations where lock initialization
> +has been omitted.

You might consider giving an example of what a lack of lock
initialization would look like here.

- Josh Triplett
Paul E. McKenney Nov. 3, 2011, 7:42 p.m. | #2
On Wed, Nov 02, 2011 at 07:57:16PM -0700, Josh Triplett wrote:
> On Wed, Nov 02, 2011 at 01:30:26PM -0700, Paul E. McKenney wrote:
> > There are a number of bugs that can leak or overuse lock classes,
> > which can cause the maximum number of lock classes (currently 8191)
> > to be exceeded.  However, the documentation does not tell you how to
> > track down these problems.  This commit addresses this shortcoming.
> > 
> > Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > ---
> >  Documentation/lockdep-design.txt |   61 ++++++++++++++++++++++++++++++++++++++
> >  1 files changed, 61 insertions(+), 0 deletions(-)
> > 
> > diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
> > index abf768c..383bb23 100644
> > --- a/Documentation/lockdep-design.txt
> > +++ b/Documentation/lockdep-design.txt
> > @@ -221,3 +221,64 @@ when the chain is validated for the first time, is then put into a hash
> >  table, which hash-table can be checked in a lockfree manner. If the
> >  locking chain occurs again later on, the hash table tells us that we
> >  dont have to validate the chain again.
> > +
> > +Troubleshooting:
> > +----------------
> > +
> > +The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
> > +Exceeding this number will trigger the following lockdep warning:
> > +
> > +	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
> > +
> > +By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
> > +desktop systems have less than 1,000 lock classes, so this warning
> > +normally results from lock-class leakage or failure to properly
> > +initialize locks.  These two problems are illustrated below:
> > +
> > +1.	Repeated module loading and unloading while running the validator
> > +	will result in lock-class leakage.  The issue here is that each
> > +	load of the module will create a new set of lock classes for that
> > +	module's locks, but module unloading does not remove old classes.
> 
> I'd explicitly add a parenthetical here: (see below about reusing lock
> classes for why).  I stared at this for a minute trying to think about
> why the old classes couldn't go away, before realizing this fell into
> the case you described below: removing them would require cleaning up
> any dependency chains involving them.

Done!

> > +	Therefore, if that module is loaded and unloaded repeatedly,
> > +	the number of lock classes will eventually reach the maximum.
> > +
> > +2.	Using structures such as arrays that have large numbers of
> > +	locks that are not explicitly initialized.  For example,
> > +	a hash table with 8192 buckets where each bucket has its
> > +	own spinlock_t will consume 8192 lock classes -unless- each
> > +	spinlock is initialized, for example, using spin_lock_init().
> > +	Failure to properly initialize the per-bucket spinlocks would
> > +	guarantee lock-class overflow.	In contrast, a loop that called
> > +	spin_lock_init() on each lock would place all 8192 locks into a
> > +	single lock class.
> > +
> > +	The moral of this story is that you should always explicitly
> > +	initialize your locks.
> 
> Spin locks *require* initialization, right?  Doesn't this constitute a
> bug regardless of lockdep?
> 
> If so, could we simply arrange to have lockdep scream when it encounters
> an uninitialized spinlock?

I reworded to distinguish between compile-time initialization (which will
cause lockdep to have a separate class per instance) and run-time
initialization (which will cause lockdep to have one class total).

Making lockdep scream in this case might be useful, but if I understand
correctly, that would give false positives for compile-time initialized
global locks.

> > +One might argue that the validator should be modified to allow lock
> > +classes to be reused.  However, if you are tempted to make this argument,
> > +first review the code and think through the changes that would be
> > +required, keeping in mind that the lock classes to be removed are likely
> > +to be linked into the lock-dependency graph.  This turns out to be a
> > +harder to do than to say.
> 
> Typo fix: s/to be a harder/to be harder/.

Fixed.

> > +Of course, if you do run out of lock classes, the next thing to do is
> > +to find the offending lock classes.  First, the following command gives
> > +you the number of lock classes currently in use along with the maximum:
> > +
> > +	grep "lock-classes" /proc/lockdep_stats
> > +
> > +This command produces the following output on a modest Power system:
> > +
> > +	 lock-classes:                          748 [max: 8191]
> 
> Does Power matter here?  Could this just say "a modest system"?

Good point -- true but irrelevant.  Removed "Power".

> > +If the number allocated (748 above) increases continually over time,
> > +then there is likely a leak.  The following command can be used to
> > +identify the leaking lock classes:
> > +
> > +	grep "BD" /proc/lockdep
> > +
> > +Run the command and save the output, then compare against the output
> > +from a later run of this command to identify the leakers.  This same
> > +output can also help you find situations where lock initialization
> > +has been omitted.
> 
> You might consider giving an example of what a lack of lock
> initialization would look like here.

Hopefully the compile-time vs. run-time clears this up.

								Thanx, Paul
Peter Zijlstra Nov. 9, 2011, 2:02 p.m. | #3
On Thu, 2011-11-03 at 12:42 -0700, Paul E. McKenney wrote:
> > If so, could we simply arrange to have lockdep scream when it encounters
> > an uninitialized spinlock?
> 
> I reworded to distinguish between compile-time initialization (which will
> cause lockdep to have a separate class per instance) and run-time
> initialization (which will cause lockdep to have one class total).

Right, runtime init will key off of the call-site, compile-time init
will key off of the static data address.

> Making lockdep scream in this case might be useful, but if I understand
> correctly, that would give false positives for compile-time initialized
> global locks. 

Yeah, that's going to bring a lot of pain with it, in particular all the
early stuff like the init task etc. are all statically initialized.
Paul E. McKenney Nov. 10, 2011, 5:22 p.m. | #4
On Wed, Nov 09, 2011 at 03:02:08PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-11-03 at 12:42 -0700, Paul E. McKenney wrote:
> > > If so, could we simply arrange to have lockdep scream when it encounters
> > > an uninitialized spinlock?
> > 
> > I reworded to distinguish between compile-time initialization (which will
> > cause lockdep to have a separate class per instance) and run-time
> > initialization (which will cause lockdep to have one class total).
> 
> Right, runtime init will key off of the call-site, compile-time init
> will key off of the static data address.
> 
> > Making lockdep scream in this case might be useful, but if I understand
> > correctly, that would give false positives for compile-time initialized
> > global locks. 
> 
> Yeah, that's going to bring a lot of pain with it, in particular all the
> early stuff like the init task etc. are all statically initialized.

OK, will stick with the current approach, then.

								Thanx, Paul

Patch

diff --git a/Documentation/lockdep-design.txt b/Documentation/lockdep-design.txt
index abf768c..383bb23 100644
--- a/Documentation/lockdep-design.txt
+++ b/Documentation/lockdep-design.txt
@@ -221,3 +221,64 @@  when the chain is validated for the first time, is then put into a hash
 table, which hash-table can be checked in a lockfree manner. If the
 locking chain occurs again later on, the hash table tells us that we
 dont have to validate the chain again.
+
+Troubleshooting:
+----------------
+
+The validator tracks a maximum of MAX_LOCKDEP_KEYS number of lock classes.
+Exceeding this number will trigger the following lockdep warning:
+
+	(DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
+
+By default, MAX_LOCKDEP_KEYS is currently set to 8191, and typical
+desktop systems have less than 1,000 lock classes, so this warning
+normally results from lock-class leakage or failure to properly
+initialize locks.  These two problems are illustrated below:
+
+1.	Repeated module loading and unloading while running the validator
+	will result in lock-class leakage.  The issue here is that each
+	load of the module will create a new set of lock classes for that
+	module's locks, but module unloading does not remove old classes.
+	Therefore, if that module is loaded and unloaded repeatedly,
+	the number of lock classes will eventually reach the maximum.
+
+2.	Using structures such as arrays that have large numbers of
+	locks that are not explicitly initialized.  For example,
+	a hash table with 8192 buckets where each bucket has its
+	own spinlock_t will consume 8192 lock classes -unless- each
+	spinlock is initialized, for example, using spin_lock_init().
+	Failure to properly initialize the per-bucket spinlocks would
+	guarantee lock-class overflow.	In contrast, a loop that called
+	spin_lock_init() on each lock would place all 8192 locks into a
+	single lock class.
+
+	The moral of this story is that you should always explicitly
+	initialize your locks.
+
+One might argue that the validator should be modified to allow lock
+classes to be reused.  However, if you are tempted to make this argument,
+first review the code and think through the changes that would be
+required, keeping in mind that the lock classes to be removed are likely
+to be linked into the lock-dependency graph.  This turns out to be a
+harder to do than to say.
+
+Of course, if you do run out of lock classes, the next thing to do is
+to find the offending lock classes.  First, the following command gives
+you the number of lock classes currently in use along with the maximum:
+
+	grep "lock-classes" /proc/lockdep_stats
+
+This command produces the following output on a modest Power system:
+
+	 lock-classes:                          748 [max: 8191]
+
+If the number allocated (748 above) increases continually over time,
+then there is likely a leak.  The following command can be used to
+identify the leaking lock classes:
+
+	grep "BD" /proc/lockdep
+
+Run the command and save the output, then compare against the output
+from a later run of this command to identify the leakers.  This same
+output can also help you find situations where lock initialization
+has been omitted.