diff mbox series

drivers/core: Replace lockdep_set_novalidate_class() with unique class keys

Message ID 1ad499bb-0c53-7529-ff00-e4328823f6fa@I-love.SAKURA.ne.jp
State New
Headers show
Series drivers/core: Replace lockdep_set_novalidate_class() with unique class keys | expand

Commit Message

Tetsuo Handa Feb. 8, 2023, 10:30 a.m. UTC
Commit 1704f47b50b5 ("lockdep: Add novalidate class for dev->mutex
conversion") made it impossible to find real deadlocks unless timing
dependent testings manage to trigger hung task like [1] and [2]. And
lockdep_set_novalidate_class() remained for more than one decade due to
a fear of false positives [3]. But not sharing mutex_init() could make
it possible to find real deadlocks without triggering hung task [4].
Thus, let's assign a unique class key on each "struct device"->mutex.

Link: https://syzkaller.appspot.com/bug?extid=2d6ac90723742279e101 [1]
Link: https://syzkaller.appspot.com/bug?extid=2e39bc6569d281acbcfb [2]
Link: https://lkml.kernel.org/r/Y98FLlr7jkiFlV0k@rowland.harvard.edu [3]
Link: https://lkml.kernel.org/r/827177aa-bb64-87a9-e1af-dfe070744045@I-love.SAKURA.ne.jp [4]
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Co-developed-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Co-developed-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
Hello, syzkaller users.

We made a patch that keeps lockdep validation enabled on "struct dev->mutex".
Will you try this patch and see if this patch causes boot failures and/or
too frequent crashes to continue testing.

 drivers/base/core.c      | 7 ++++++-
 include/linux/device.h   | 1 +
 include/linux/lockdep.h  | 6 ++++++
 kernel/locking/lockdep.c | 7 +++++++
 4 files changed, 20 insertions(+), 1 deletion(-)

Comments

Kent Overstreet Feb. 11, 2023, 11:06 p.m. UTC | #1
On 2/11/23 16:51, Linus Torvalds wrote:
> On Sat, Feb 11, 2023 at 1:41 PM Alan Stern <stern@rowland.harvard.edu> wrote:
>>
>> @@ -2941,7 +2944,10 @@ void device_initialize(struct device *de
>>          kobject_init(&dev->kobj, &device_ktype);
>>          INIT_LIST_HEAD(&dev->dma_pools);
>>          mutex_init(&dev->mutex);
>> -       lockdep_set_novalidate_class(&dev->mutex);
>> +       if (!lockdep_static_obj(dev)) {
>> +               lockdep_register_key(&dev->mutex_key);
>> +               lockdep_set_class(&dev->mutex, &dev->mutex_key);
>> +       }
>>          spin_lock_init(&dev->devres_lock);
>>          INIT_LIST_HEAD(&dev->devres_head);
>>          device_pm_init(dev);
> 
> So I think this is the right thing to do, but I note that while that
> lockdep_set_novalidate_class() was "documented" to only be for
> 'dev->mutex' by scripts/checkpatch.pl, that horrific thing is also
> used by md/bcache/btree.c for the mca_bucket_alloc().
> 
> Can we *please* get rid of it there too (it was added by the initial
> code, and never had any explicit excuse for it), possibly by using the
> same model.
> 
> And then we could get rid of lockdep_set_novalidate_class() entirely.
> That would be a good thing.

Yeah, what bcache really needs (and presumably dev->mutex as well) is 
just to disable lockdep checking for self-deadlock of that lock type, 
since it's got its own deadlock avoidance and the subclass thing isn't 
good enough.

I've got a patch that should do what we want, replying from my other 
account with it.
Kent Overstreet Feb. 11, 2023, 11:24 p.m. UTC | #2
On Sat, Feb 11, 2023 at 06:06:28PM -0500, Kent Overstreet wrote:
> On 2/11/23 16:51, Linus Torvalds wrote:
> > On Sat, Feb 11, 2023 at 1:41 PM Alan Stern <stern@rowland.harvard.edu> wrote:
> > > 
> > > @@ -2941,7 +2944,10 @@ void device_initialize(struct device *de
> > >          kobject_init(&dev->kobj, &device_ktype);
> > >          INIT_LIST_HEAD(&dev->dma_pools);
> > >          mutex_init(&dev->mutex);
> > > -       lockdep_set_novalidate_class(&dev->mutex);
> > > +       if (!lockdep_static_obj(dev)) {
> > > +               lockdep_register_key(&dev->mutex_key);
> > > +               lockdep_set_class(&dev->mutex, &dev->mutex_key);
> > > +       }
> > >          spin_lock_init(&dev->devres_lock);
> > >          INIT_LIST_HEAD(&dev->devres_head);
> > >          device_pm_init(dev);
> > 
> > So I think this is the right thing to do, but I note that while that
> > lockdep_set_novalidate_class() was "documented" to only be for
> > 'dev->mutex' by scripts/checkpatch.pl, that horrific thing is also
> > used by md/bcache/btree.c for the mca_bucket_alloc().
> > 
> > Can we *please* get rid of it there too (it was added by the initial
> > code, and never had any explicit excuse for it), possibly by using the
> > same model.
> > 
> > And then we could get rid of lockdep_set_novalidate_class() entirely.
> > That would be a good thing.
> 
> Yeah, what bcache really needs (and presumably dev->mutex as well) is just
> to disable lockdep checking for self-deadlock of that lock type, since it's
> got its own deadlock avoidance and the subclass thing isn't good enough.
> 
> I've got a patch that should do what we want, replying from my other account
> with it.

After scanning the rest of the thread: I don't think you want to create
separate lockdep classes for each bus and device type, that's defeating
how lockdep works. Maybe if it was only a small, _static_ number of new
classes, but the basic premesis of lockdep is that there are static
human understandable lock ordering rules, so lockdep figures out what
they are and checks them: if you create a bunch of dynamic classes, the
classes are going to be different for everyone in practice and won't
have any real bearing on the structure of the code - that is, given a
lockdep splat, you won't be able to do anything with it.

If static lock ordering rules aren't working (say, because the lock
ordering rules are determined by hardware relationships or what
userspace is doing), then you have to do something more sophisticated.

Wait/wound mutexes would be the next thing to look at, and DRM ended up
needing them for similar reasons as what you're running up against so I
think they bear serious consideration.

ww mutexes are the simple version of dynamic deadlock avoidance -
instead of doing full cycle detection they just compare transaction
start times, so they come at the cost of more frequent aborts. If this
is an issue for you, here's what full cycle detection looks like:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/btree_locking.c#n53
Kent Overstreet Feb. 12, 2023, 2:46 a.m. UTC | #3
On Sat, Feb 11, 2023 at 09:40:58PM -0500, Alan Stern wrote:
> On Sat, Feb 11, 2023 at 06:24:42PM -0500, Kent Overstreet wrote:
> > After scanning the rest of the thread: I don't think you want to create
> > separate lockdep classes for each bus and device type, that's defeating
> > how lockdep works.
> 
> Not at all.  In fact, exactly the opposite: lockdep works by creating a 
> class for each lock-inside-a-data-structure-type combination.  A struct 
> device-bus_type/device_type combination is pretty much the same kind of 
> thing.
> 
> >  Maybe if it was only a small, _static_ number of new
> > classes,
> 
> The collection of bus_types and device_types _is_ static, in the sense 
> that each one is a structure defined in a driver source file.  Whether 
> the number is "small" depends on your tolerance for large numbers; the 
> kernel has a lot of source files.  :-)
> 
> Mind you, I'm not saying that having lockdep classes for each bus_type 
> or device_type is always the right thing to do.  There definitely are 
> cases where it wouldn't do what we want.  But perhaps in some cases it 
> would work.
> 
> > but the basic premesis of lockdep is that there are static
> > human understandable lock ordering rules, so lockdep figures out what
> > they are and checks them: if you create a bunch of dynamic classes, the
> > classes are going to be different for everyone in practice and won't
> > have any real bearing on the structure of the code
> 
> As a rule, bus_type's and device_type's aren't dynamic.  Maybe Greg KH 
> once published an example of such a thing; IIRC it was more like a 
> proof-of-principle rather than a serious recommendation on how to write 
> drivers.  (Or else I'm misremembering and it was actually an example of 
> creating dynamic sysfs attributes.)
> 
> Or maybe you're referring to what this patch does?  It does indeed 
> create a bunch of dynamic classes -- one for each struct device.  The 
> ordering rules derived by lockdep will be somewhat arbitrary, as you 
> say.  But some of them certainly will be related to the structure of the 
> source code.

I could be :) I haven't been able to find the patch in question - have a
link?

If you're talking about making lock_class_key dynamic, I think I stand
by what I said though - OTOH, if all you're doing is lifting that to the
caller of the device object init function, so it'll still be a static
object in the driver, that would be totally fine.

I probably should've found the patch before commenting :)
Kent Overstreet Feb. 12, 2023, 3:10 a.m. UTC | #4
On Sat, Feb 11, 2023 at 10:03:11PM -0500, Alan Stern wrote:
> On Sat, Feb 11, 2023 at 09:46:42PM -0500, Kent Overstreet wrote:
> > On Sat, Feb 11, 2023 at 09:40:58PM -0500, Alan Stern wrote:
> > > Or maybe you're referring to what this patch does?  It does indeed 
> > > create a bunch of dynamic classes -- one for each struct device.  The 
> > > ordering rules derived by lockdep will be somewhat arbitrary, as you 
> > > say.  But some of them certainly will be related to the structure of the 
> > > source code.
> > 
> > I could be :) I haven't been able to find the patch in question - have a
> > link?
> 
> It was earlier in this email thread.  Here's a link:
> 
> https://lore.kernel.org/r/Y+gLd78vChQERZ6A@rowland.harvard.edu/
> 
> > If you're talking about making lock_class_key dynamic, I think I stand
> > by what I said though - OTOH, if all you're doing is lifting that to the
> > caller of the device object init function, so it'll still be a static
> > object in the driver, that would be totally fine.
> 
> The patch does the first, not the second.  Feel free to object some 
> more...  :-)

So IMO the more correct way to do this would be to change
device_initialize() to __device_initialize(), have it take a
lock_class_key as a parameter, and then use __mutex_init() instead of
mutex_init().

But let's think about this more. Will there ever be situations where
lock ordering is dependent on what hardware is plugged into what, or
what hardware is plugged in first?
Kent Overstreet Feb. 12, 2023, 8:51 p.m. UTC | #5
On Sun, Feb 12, 2023 at 03:19:16PM -0500, Alan Stern wrote:
> Maybe I didn't understand your suggestion.  Did you mean that all 
> callers of device_initialize() (or whatever) should be able to specify 
> their own lock_class_key?  Or were you just trying to avoid the single 
> static allocation of a lock_class_key in device_initialize() done as a 
> side-effect of the mutex_init() call?
> 
> > And whatever effect
> > would only be when lockdep is enabled, so not a concern.
> 
> Not if a new function is created (i.e., __device_initialize()).

Follow the same pattern as mutex_init() - have a look over that code.

> > But consider that the name of a lock as registered with lockdep really
> > should correspond to a source code location - i.e. it should be
> > something you can grep for. (We should probably consider adding file and
> > line number to that string, since current names are not unambiguous).
> > 
> > Whereas in your pass, you're calling lockdep_set_class(), which
> > generates a class name via stringifcation - with the same name every
> > time.
> > 
> > Definitely _don't_ do that. With your patch, the generated lockdep
> > traces will be useless.
> 
> I'll revise the patch to use the device's name for the class.  While it 
> may not be globally unique, it should be sufficiently specific.
> 
> (Device names are often set after the device is initialized.  Does 
> lockdep mind if a lock_class_key's name is changed after it has been 
> registered?)

The device name should _not_ be something dynamic, it should be
something easily tied back to a source code location - i.e. related to
the driver name, not the device name.

That's how people read and use lockdep reports!

Do it exactly the same way mutex_init() does it, just lift it up a level
to a wrapper around device_initialize() - stringify the pointer to the
mutex (embedded in struct device, embedded in what-have-you driver code)
and use that.

> You're right.  There are no explicitly documented rules for device 
> locking as far as I'm aware.  Each subsystem handles its own locking 
> independently (but self-consistently, we hope).  That's one of the 
> reasons that creating lockdep rules for devices is so difficult.
> 
> The business about not locking a parent if you already own the child's 
> lock is perhaps the only universal -- and I don't know that it's written 
> down anywhere.

Yeah that's sketchy; if the rules are too complicated to be written
down, they're too complicated.

One thing that could be contemplated is adding support for user-defined
comparison functions to lockdep, to define a lock ordering within a
class when subclass isn't sufficient.

That would work for bcache - for bcache the lock ordering is parent
nodes before children, and if multiple nodes are locked at the same
level they have to be locked in natural key order.

But, this would add a lot of complexity to lockdep, and this is the sort
of situation where if you have a bug in the comparison function (i.e. it
doesn't define a total ordering) it'll break things in terribly fun
ways.

> > The patch I posted would be an improvement over the current situation,
> > because it'd get you checking w.r.t. other lock types - but with that
> > you would still have to have your own deadlock avoidance strategy, and
> > you'd have to be _really_ clear on what it is and how you know that
> > you're getting it right - you're still opting out of checking.
> 
> Same with the patch I posted, except that it opts back into checking.
> 
> > I think you should really be investigating wait/wound mutexes here.
> 
> At this stage, converting would be most impractical.  And I don't think 
> it's really needed.

They do make you deal with lock restarts; unwinding typical stateful
kernel code is not necessarily super practical :)

Anyways, it sounds like the lockdep-class-per-driver approach will get
you more information, that's certainly a reasonable approach for now.

Cheers,
-Kent
Kent Overstreet Feb. 13, 2023, 2:21 a.m. UTC | #6
On Sun, Feb 12, 2023 at 08:23:34PM -0500, Alan Stern wrote:
> I really don't think that's a good idea here.  When you've got a bus 
> containing multiple devices, typically all those device structures are 
> created by the same line of code.  So knowing the source code location 
> won't tell you _which_ device structure is involved in the locking 
> cycle or what driver it's using.

Yeah, I was thinking about this more and realized it'd be insufficient.

> By contrast, knowing the device name 
> would.
> 
> Furthermore, to the extent that the device's name identifies what kind 
> of device it is, the name would tell you what where the structure was 
> created and which driver it is using.

OTOH, with the device name, it seems like you'll additionally need the
full device topology to be able to do anything with lockdep splats, no?

What if we just added a way to set a comparison function for a lockdep
class? I'm looking at the lockdep code now, and I think I could do that
for you.
Peter Zijlstra Feb. 13, 2023, 9:27 a.m. UTC | #7
On Sun, Feb 12, 2023 at 03:19:16PM -0500, Alan Stern wrote:

> (Device names are often set after the device is initialized.  Does 
> lockdep mind if a lock_class_key's name is changed after it has been 
> registered?)

It does, althought I don't at the moment recall how hard it would be to
change that.
Alan Stern Feb. 13, 2023, 3:25 p.m. UTC | #8
On Mon, Feb 13, 2023 at 10:24:13AM +0100, Peter Zijlstra wrote:
> On Sun, Feb 12, 2023 at 10:23:44AM -0500, Alan Stern wrote:
> > Provided it acquires the parent device's lock first, this is 
> > utterly safe no matter what order the children are locked in.  Try 
> > telling that to lockdep! 
> 
> mutex_lock_next_lock(child->lock, parent->lock) is there to express this
> exact pattern, it allows taking multiple child->lock class locks (in any
> order) provided parent->lock is held.

Ah, this is news to me.  Is this sort of thing documented somewhere?

Alan Stern
Peter Zijlstra Feb. 14, 2023, 10:48 a.m. UTC | #9
On Mon, Feb 13, 2023 at 05:51:11PM -0800, Boqun Feng wrote:
> On Mon, Feb 13, 2023 at 05:29:49PM +0100, Peter Zijlstra wrote:
> > On Mon, Feb 13, 2023 at 10:25:59AM -0500, Alan Stern wrote:
> > > On Mon, Feb 13, 2023 at 10:24:13AM +0100, Peter Zijlstra wrote:
> > > > On Sun, Feb 12, 2023 at 10:23:44AM -0500, Alan Stern wrote:
> > > > > Provided it acquires the parent device's lock first, this is 
> > > > > utterly safe no matter what order the children are locked in.  Try 
> > > > > telling that to lockdep! 
> > > > 
> > > > mutex_lock_next_lock(child->lock, parent->lock) is there to express this
> > > > exact pattern, it allows taking multiple child->lock class locks (in any
> > > > order) provided parent->lock is held.
> > > 
> > > Ah, this is news to me.  Is this sort of thing documented somewhere?
> 
> Basically if you have two lock instances A and B with the same class,
> and you know that locking ordering is always A -> B, then you can do
> 
> 	mutex_lock(A);
> 	mutex_lock_nest_lock(B, A); // lock B.
> 

No, this isn't quite right, You need at least 3 locks and 2 classes.

P, C1, C2

Then:

	mutex_lock(P)
	mutex_lock_next_lock(C1, P)
	mutex_lock_next_lock(C2, P)

And it will accept any order of Cn -- since it assumes that any
multi-lock of Cn will always hold P, therefore the ordering fully given
by P.
Peter Zijlstra Feb. 14, 2023, 11:05 a.m. UTC | #10
On Mon, Feb 13, 2023 at 01:46:11PM -0500, Kent Overstreet wrote:
> On Mon, Feb 13, 2023 at 10:24:13AM +0100, Peter Zijlstra wrote:
> > On Sun, Feb 12, 2023 at 10:23:44AM -0500, Alan Stern wrote:
> > > Provided it acquires the parent device's lock first, this is 
> > > utterly safe no matter what order the children are locked in.  Try 
> > > telling that to lockdep! 
> > 
> > mutex_lock_next_lock(child->lock, parent->lock) is there to express this
> > exact pattern, it allows taking multiple child->lock class locks (in any
> > order) provided parent->lock is held.
> 
> Perhaps I'm stupid, but I've never understood how subclasses - or this -
> are supposed to work.
> 
> Locks don't get a fixed subclass, so what's to prevent some code from
> going

So there's two annotations here, the nest_lock thing and subclasses,
they're distinct things.

Every class gets a fixed 8 subclasses (0-7) given by the unique byte
addresses inside the actual key object.

Subclasses will let you create nesting order of the same class that are
acceptable. Typically lock/1 nests inside lock/0, but that's not
hard-coded, simply convention.

The way it is used is given an external lock order, say the CPU number
for the runqueue locks, you do things like:

void double_rq_lock(struct rq *rq1, struct rq *r2)
{
	lockdep_assert_irqs_disabled();

	if (rq_order_less(r2, rq1))
		swap(rq1, rq2);

	raw_spin_rq_lock(rq1);
	if (__rq_lockp(rq1) != __rq_lock(rq2))
		raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);

	...
}

(which is more complicated than it needs to be due to the whole
core-scheduling mess, but should still be readable I suppose).

Basically we make sure rq1 and rq2 are in the correct order and acquire
them with subclass 0 (the default) and subcless 1 (SINGLE_DEPTH_NESTING)
resp. dictating the subclass order.

This is lock order per decree, if you get the order function wrong
lockdep will not see the inversion but you *will* deadlock.


Then there's that nesting lock, that requires two classes and at least 3
locks to make sense:

  P, C1, C2

Where we posit that any multi-lock of Cn is fully serialized by P and it
is used like:

	mutex_lock(P);
	mutex_lock_nest_lock(C1, P);
	mutex_lock_nest_lock(C2, P);

Where any order of Cn is acceptable, because fully ordered by P.

If you were to combine this with subclass on Cn to allow multi-lock
instances not order by P, you get to keep the pieces :-)
Boqun Feng Feb. 14, 2023, 4:22 p.m. UTC | #11
On Tue, Feb 14, 2023 at 11:48:07AM +0100, Peter Zijlstra wrote:
> On Mon, Feb 13, 2023 at 05:51:11PM -0800, Boqun Feng wrote:
> > On Mon, Feb 13, 2023 at 05:29:49PM +0100, Peter Zijlstra wrote:
> > > On Mon, Feb 13, 2023 at 10:25:59AM -0500, Alan Stern wrote:
> > > > On Mon, Feb 13, 2023 at 10:24:13AM +0100, Peter Zijlstra wrote:
> > > > > On Sun, Feb 12, 2023 at 10:23:44AM -0500, Alan Stern wrote:
> > > > > > Provided it acquires the parent device's lock first, this is 
> > > > > > utterly safe no matter what order the children are locked in.  Try 
> > > > > > telling that to lockdep! 
> > > > > 
> > > > > mutex_lock_next_lock(child->lock, parent->lock) is there to express this
> > > > > exact pattern, it allows taking multiple child->lock class locks (in any
> > > > > order) provided parent->lock is held.
> > > > 
> > > > Ah, this is news to me.  Is this sort of thing documented somewhere?
> > 
> > Basically if you have two lock instances A and B with the same class,
> > and you know that locking ordering is always A -> B, then you can do
> > 
> > 	mutex_lock(A);
> > 	mutex_lock_nest_lock(B, A); // lock B.
> > 
> 
> No, this isn't quite right, You need at least 3 locks and 2 classes.
> 
> P, C1, C2
> 
> Then:
> 
> 	mutex_lock(P)
> 	mutex_lock_next_lock(C1, P)
> 	mutex_lock_next_lock(C2, P)
> 
> And it will accept any order of Cn -- since it assumes that any
> multi-lock of Cn will always hold P, therefore the ordering fully given
> by P.

Ah, right, I was missing the fact that it works with 2 classes...

But I think with only one class, the nest_lock() still works, right?
In other words, if P and Cn are the same lock class in your example.

Also seems I gave a wrong answer to Alan, just to clarify, the following
is not a deadlock to lockdep:

T1:
	mutex_lock(P)
	mutex_lock_next_lock(C1, P)
	mutex_lock_next_lock(C2, P)
	mutex_lock(B)

T2:
	mutex_lock(P)
	mutex_lock(B)
	mutex_lock_next_lock(C1, P)
	mutex_lock_next_lock(C2, P)

Because of any pair of

	mutex_lock(L);
	... // other locks maybe
	mutex_lock_nest_lock(M, L);

lockdep will not add M into the dependency graph, since it's nested and
should be serialized by L.

Regards,
Boqun
Alan Stern Feb. 14, 2023, 8:05 p.m. UTC | #12
On Tue, Feb 14, 2023 at 12:05:27PM +0100, Peter Zijlstra wrote:
> Every class gets a fixed 8 subclasses (0-7) given by the unique byte
> addresses inside the actual key object.
> 
> Subclasses will let you create nesting order of the same class that are
> acceptable. Typically lock/1 nests inside lock/0, but that's not
> hard-coded, simply convention.

Can you explain in more detail how this works in the lockdep checking 
algorithm?  (For simplicity, let's leave out issues of interrupt status 
and other environmental things.)

I've been assuming that lockdep builds up a set of links between the 
classes -- namely, a link is created from A to B whenever a thread holds 
a lock of class A while acquiring a lock of class B.  The checking part 
would then amount to just making sure that these links don't form any 
cycles.

So then how do subclasses fit into the picture?  Is it just that now the 
links are between subclasses rather than classes, so it's not 
automatically wrong to hold a lock while acquiring another lock of the 
same class as long as the two acquisitions are in different subclasses?  
But you can still provoke a violation if there's a cycle among the 
subclasses?

> Then there's that nesting lock, that requires two classes and at least 3
> locks to make sense:
> 
>   P, C1, C2
> 
> Where we posit that any multi-lock of Cn is fully serialized by P

Assuming the speculations above are correct, how does the algorithm take 
lockdep nesting into account?  Does it simply avoid creating a link from 
subclass C to itself if both C1 and C2 were acquired while holding a 
lock of the parent subclass and both acquisitions were annotated with 
mutex_lock_next_lock()?  Or is there more to it than that?  (And should 
I have written class rather than subclass?)

And Kent, how does your proposed lockdep_set_no_check_recursion() work?  
Does it just prevent lockdep from creating a link between any two 
subclasses of the specified class?  Does it do anything more?

Alan
Kent Overstreet Feb. 14, 2023, 8:16 p.m. UTC | #13
On Tue, Feb 14, 2023 at 12:05:27PM +0100, Peter Zijlstra wrote:
> This is lock order per decree, if you get the order function wrong
> lockdep will not see the inversion but you *will* deadlock.

Yeah, that's what I mean. Given that a subclass isn't a fixed thing you
assign to a lock, just something you magic up as needed - I just don't
see what this gets us?

Why not just tell lockdep what the order function is directly?

(I know, I've been saying I'd write a patch for this, I'll get around to
it, I swear :)
Peter Zijlstra Feb. 15, 2023, 10:33 a.m. UTC | #14
On Tue, Feb 14, 2023 at 03:05:42PM -0500, Alan Stern wrote:
> On Tue, Feb 14, 2023 at 12:05:27PM +0100, Peter Zijlstra wrote:
> > Every class gets a fixed 8 subclasses (0-7) given by the unique byte
> > addresses inside the actual key object.
> > 
> > Subclasses will let you create nesting order of the same class that are
> > acceptable. Typically lock/1 nests inside lock/0, but that's not
> > hard-coded, simply convention.
> 
> Can you explain in more detail how this works in the lockdep checking 
> algorithm?  (For simplicity, let's leave out issues of interrupt status 
> and other environmental things.)
> 
> I've been assuming that lockdep builds up a set of links between the 
> classes -- namely, a link is created from A to B whenever a thread holds 
> a lock of class A while acquiring a lock of class B.  The checking part 
> would then amount to just making sure that these links don't form any 
> cycles.
> 
> So then how do subclasses fit into the picture?  Is it just that now the 
> links are between subclasses rather than classes, so it's not 
> automatically wrong to hold a lock while acquiring another lock of the 
> same class as long as the two acquisitions are in different subclasses?  
> But you can still provoke a violation if there's a cycle among the 
> subclasses?

For all intents and purposes the subclasses are fully distinct classes
from the validation pov.

	mutex_lock(L);
	mutex_lock_nested(L, 0);

are equivalent (ignoring lockdep_set_subclass()), and

	mutex_lock_nested(L, 1);

is a distinct class, validation wise. So if you write:

	mutex_lock(L1);
	mutex_lock_nested(L2, 1);

you explicitly create a lock order between the distinct validation
classes: L/0,  L/1

> > Then there's that nesting lock, that requires two classes and at least 3
> > locks to make sense:
> > 
> >   P, C1, C2
> > 
> > Where we posit that any multi-lock of Cn is fully serialized by P
> 
> Assuming the speculations above are correct, how does the algorithm take 
> lockdep nesting into account?  Does it simply avoid creating a link from 
> subclass C to itself if both C1 and C2 were acquired while holding a 
> lock of the parent subclass and both acquisitions were annotated with 
> mutex_lock_next_lock()?

Basically this; it will explicitly ignore the nesting.

Given:

	mutex_lock(P);
	mutex_lock_nest_lock(C1, P);
	mutex_lock_nest_lock(C2, P);

mutex_lock_nest_lock() basically does:

 - validate that the instance of P is actually held.
   (as such, mutex_lock_nest_lock(C1, P1); mutex_lock_nest_lock(C2, P2);
   will cause objections).

 - either:

   * establish P->C in the held-lock stack
     and update the graph if so required

   * find the existing C in the held-lock stack
     and instead of complaining about class recursion, increment a
     refcount, and leave the held-stack and thus the graph unmodified.

subsequent mutex_unlock() will decrement the refcount and only when 0
'pop' the actual entry from the held stack.
Peter Zijlstra Feb. 15, 2023, 10:45 a.m. UTC | #15
On Tue, Feb 14, 2023 at 08:22:28AM -0800, Boqun Feng wrote:

> Ah, right, I was missing the fact that it works with 2 classes...
> 
> But I think with only one class, the nest_lock() still works, right?
> In other words, if P and Cn are the same lock class in your example.

I don't think so, but I don't think I've carefully considered that case.

> Also seems I gave a wrong answer to Alan, just to clarify, the following
> is not a deadlock to lockdep:
> 
> T1:
> 	mutex_lock(P)
> 	mutex_lock_next_lock(C1, P)
> 	mutex_lock_next_lock(C2, P)
> 	mutex_lock(B)
> 
> T2:
> 	mutex_lock(P)
> 	mutex_lock(B)
> 	mutex_lock_next_lock(C1, P)
> 	mutex_lock_next_lock(C2, P)
> 

This should in fact complain about a CB-BC deadlock, (but I've not
tested it, just going on memories of how I implemented it).

> Because of any pair of
> 
> 	mutex_lock(L);
> 	... // other locks maybe
> 	mutex_lock_nest_lock(M, L);
> 
> lockdep will not add M into the dependency graph, since it's nested and
> should be serialized by L.

We do enter M into the dependency graph, but instead ignore M-M
recursion. Specifically so that we might catch the above deadlock vs B.
Boqun Feng Feb. 20, 2023, 5:32 p.m. UTC | #16
On Wed, Feb 15, 2023 at 11:45:03AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 14, 2023 at 08:22:28AM -0800, Boqun Feng wrote:
> 
> > Ah, right, I was missing the fact that it works with 2 classes...
> > 
> > But I think with only one class, the nest_lock() still works, right?
> > In other words, if P and Cn are the same lock class in your example.

After playing with some self test cases, I found I was wrong again ;-(

> 
> I don't think so, but I don't think I've carefully considered that case.
> 

You are right, the same class case will trigger a DEBUG_LOCKS_WARN_ON()
in the match_held_lock() when releasing the locks.

> > Also seems I gave a wrong answer to Alan, just to clarify, the following
> > is not a deadlock to lockdep:
> > 
> > T1:
> > 	mutex_lock(P)
> > 	mutex_lock_next_lock(C1, P)
> > 	mutex_lock_next_lock(C2, P)
> > 	mutex_lock(B)
> > 
> > T2:
> > 	mutex_lock(P)
> > 	mutex_lock(B)
> > 	mutex_lock_next_lock(C1, P)
> > 	mutex_lock_next_lock(C2, P)
> > 
> 
> This should in fact complain about a CB-BC deadlock, (but I've not
> tested it, just going on memories of how I implemented it).
> 

Yes, confirmed by a selftest.

> > Because of any pair of
> > 
> > 	mutex_lock(L);
> > 	... // other locks maybe
> > 	mutex_lock_nest_lock(M, L);
> > 
> > lockdep will not add M into the dependency graph, since it's nested and
> > should be serialized by L.
> 
> We do enter M into the dependency graph, but instead ignore M-M
> recursion. Specifically so that we might catch the above deadlock vs B.

Right, I mis-read the code, which suggests I should improve it to help
the future me ;-)

FWIW, the selftests I used are as follow:

Regards,
Boqun

------------------------------->8
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 8d24279fad05..6aadebad68c1 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -60,6 +60,7 @@ __setup("debug_locks_verbose=", setup_debug_locks_verbose);
 #define LOCKTYPE_RTMUTEX 0x20
 #define LOCKTYPE_LL	0x40
 #define LOCKTYPE_SPECIAL 0x80
+#define LOCKTYPE_NEST	0x100
 
 static struct ww_acquire_ctx t, t2;
 static struct ww_mutex o, o2, o3;
@@ -2091,14 +2092,14 @@ static void ww_test_edeadlk_acquire_wrong_slow(void)
 	ww_mutex_lock_slow(&o3, &t);
 }
 
-static void ww_test_spin_nest_unlocked(void)
+static void nest_test_spin_nest_unlocked(void)
 {
 	spin_lock_nest_lock(&lock_A, &o.base);
 	U(A);
 }
 
 /* This is not a deadlock, because we have X1 to serialize Y1 and Y2 */
-static void ww_test_spin_nest_lock(void)
+static void nest_test_spin_nest_lock(void)
 {
 	spin_lock(&lock_X1);
 	spin_lock_nest_lock(&lock_Y1, &lock_X1);
@@ -2110,6 +2111,33 @@ static void ww_test_spin_nest_lock(void)
 	spin_unlock(&lock_X1);
 }
 
+static void nest_test_spin_nest_lock_deadlock(void)
+{
+	nest_test_spin_nest_lock();
+
+	/*
+	 * Although above is not a deadlokc, but with the following code, Y1 and
+	 * A create a ABBA deadlock.
+	 */
+	spin_lock(&lock_X1);
+	spin_lock(&lock_A);
+	spin_lock_nest_lock(&lock_Y1, &lock_X1);
+	spin_lock_nest_lock(&lock_Y2, &lock_X1);
+	spin_unlock(&lock_A);
+	spin_unlock(&lock_Y2);
+	spin_unlock(&lock_Y1);
+	spin_unlock(&lock_X1);
+}
+
+/* Not the supported usage */
+static void nest_test_spin_nest_lock_same_class(void)
+{
+	spin_lock(&lock_X1);
+	spin_lock_nest_lock(&lock_X2, &lock_X1);
+	spin_unlock(&lock_X2);
+	spin_unlock(&lock_X1);
+}
+
 static void ww_test_unneeded_slow(void)
 {
 	WWAI(&t);
@@ -2323,14 +2351,6 @@ static void ww_tests(void)
 	dotest(ww_test_edeadlk_acquire_wrong_slow, FAILURE, LOCKTYPE_WW);
 	pr_cont("\n");
 
-	print_testname("spinlock nest unlocked");
-	dotest(ww_test_spin_nest_unlocked, FAILURE, LOCKTYPE_WW);
-	pr_cont("\n");
-
-	print_testname("spinlock nest test");
-	dotest(ww_test_spin_nest_lock, SUCCESS, LOCKTYPE_WW);
-	pr_cont("\n");
-
 	printk("  -----------------------------------------------------\n");
 	printk("                                 |block | try  |context|\n");
 	printk("  -----------------------------------------------------\n");
@@ -2360,6 +2380,27 @@ static void ww_tests(void)
 	pr_cont("\n");
 }
 
+static void nest_tests(void)
+{
+	printk("  --------------------------------------------------------------------------\n");
+	printk("  | nest lock tests |\n");
+	printk("  -------------------\n");
+	print_testname("spinlock nest unlocked");
+	dotest(nest_test_spin_nest_unlocked, FAILURE, LOCKTYPE_NEST);
+	pr_cont("\n");
+
+	print_testname("spinlock nest test");
+	dotest(nest_test_spin_nest_lock, SUCCESS, LOCKTYPE_NEST);
+	pr_cont("\n");
+	print_testname("spinlock nest test dead lock");
+	dotest(nest_test_spin_nest_lock_deadlock, FAILURE, LOCKTYPE_NEST);
+	pr_cont("\n");
+	print_testname("spinlock nest test dead lock");
+	dotest(nest_test_spin_nest_lock_same_class, FAILURE, LOCKTYPE_NEST);
+	pr_cont("\n");
+
+}
+
 
 /*
  * <in hardirq handler>
@@ -2966,6 +3007,8 @@ void locking_selftest(void)
 
 	ww_tests();
 
+	nest_tests();
+
 	force_read_lock_recursive = 0;
 	/*
 	 * queued_read_lock() specific test cases can be put here
diff mbox series

Patch

diff --git a/drivers/base/core.c b/drivers/base/core.c
index a3e14143ec0c..c30ecbc4d60e 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2322,6 +2322,9 @@  static void device_release(struct kobject *kobj)
 	devres_release_all(dev);
 
 	kfree(dev->dma_range_map);
+	mutex_destroy(&dev->mutex);
+	if (!lockdep_static_obj(dev))
+		lockdep_unregister_key(&dev->mutex_key);
 
 	if (dev->release)
 		dev->release(dev);
@@ -2941,7 +2944,9 @@  void device_initialize(struct device *dev)
 	kobject_init(&dev->kobj, &device_ktype);
 	INIT_LIST_HEAD(&dev->dma_pools);
 	mutex_init(&dev->mutex);
-	lockdep_set_novalidate_class(&dev->mutex);
+	if (!lockdep_static_obj(dev))
+		lockdep_register_key(&dev->mutex_key);
+	lockdep_set_class(&dev->mutex, &dev->mutex_key);
 	spin_lock_init(&dev->devres_lock);
 	INIT_LIST_HEAD(&dev->devres_head);
 	device_pm_init(dev);
diff --git a/include/linux/device.h b/include/linux/device.h
index 44e3acae7b36..bdaca9f54dc2 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -570,6 +570,7 @@  struct device {
 	struct mutex		mutex;	/* mutex to synchronize calls to
 					 * its driver.
 					 */
+	struct lock_class_key	mutex_key;	/* Unique key for each device */
 
 	struct dev_links_info	links;
 	struct dev_pm_info	power;
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1f1099dac3f0..5afc999a7e56 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -172,6 +172,7 @@  do {							\
 	current->lockdep_recursion -= LOCKDEP_OFF;	\
 } while (0)
 
+extern int lockdep_static_obj(const void *obj);
 extern void lockdep_register_key(struct lock_class_key *key);
 extern void lockdep_unregister_key(struct lock_class_key *key);
 
@@ -391,6 +392,11 @@  static inline void lockdep_set_selftest_task(struct task_struct *task)
 # define lockdep_free_key_range(start, size)	do { } while (0)
 # define lockdep_sys_exit() 			do { } while (0)
 
+static inline int lockdep_static_obj(const void *obj)
+{
+	return 0;
+}
+
 static inline void lockdep_register_key(struct lock_class_key *key)
 {
 }
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index e3375bc40dad..74c0113646f1 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -857,6 +857,13 @@  static int static_obj(const void *obj)
 	 */
 	return is_module_address(addr) || is_module_percpu_address(addr);
 }
+
+int lockdep_static_obj(const void *obj)
+{
+	return static_obj(obj);
+}
+EXPORT_SYMBOL_GPL(lockdep_static_obj);
+
 #endif
 
 /*