diff mbox

[v5,2/9] IB/core: Replace semaphore sm_sem with an atomic wait

Message ID 1479708496-9828-3-git-send-email-binoy.jayan@linaro.org
State New
Headers show

Commit Message

Binoy Jayan Nov. 21, 2016, 6:08 a.m. UTC
The semaphore 'sm_sem' is used for an exclusive ownership of the device
so model the same as an atomic variable with an associated wait_event.
Semaphores are going away in the future.

Signed-off-by: Binoy Jayan <binoy.jayan@linaro.org>

---
 drivers/infiniband/core/user_mad.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

Comments

Arnd Bergmann Nov. 21, 2016, 4:52 p.m. UTC | #1
On Monday, November 21, 2016 7:57:53 AM CET Linus Torvalds wrote:
> Don't do this.

> 

> Never ever do your own locking primitives. You will get the memory ordering

> wrong. And even if you get it right, why do it?

> 

> If you want to get rid of semaphores, and replace them with a mutex, that's

> OK. But don't replace them with something more complex like an open coded

> waiting model.


I think a mutex would't work here, since fops->open() and fops->close()
are not called from the same context and lockdep will complain
about that.

Version of the series had replaced the semaphore with a completion
here, which worked correctly, but one reviewer suggested using
the wait_event() instead since it's confusing to have a completion
starting out in 'completed' state.

	Arnd
Christoph Hellwig Nov. 21, 2016, 4:57 p.m. UTC | #2
On Mon, Nov 21, 2016 at 05:52:27PM +0100, Arnd Bergmann wrote:
> I think a mutex would't work here, since fops->open() and fops->close()

> are not called from the same context and lockdep will complain

> about that.

> 

> Version of the series had replaced the semaphore with a completion

> here, which worked correctly, but one reviewer suggested using

> the wait_event() instead since it's confusing to have a completion

> starting out in 'completed' state.


On the other hand the existing semaphore works perfectly fine, and
is way easier to understand than any of the other models, which also
don't provide any benefit at all.

Please stop messing with perfectly fine semaphores now - there are
plenty that are much better replaced with mutexes or completions,
but this (and various others are not).  Leave them alone and do
something useful instead.
Linus Torvalds Nov. 21, 2016, 5:57 p.m. UTC | #3
On Mon, Nov 21, 2016 at 8:52 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>

> Version of the series had replaced the semaphore with a completion

> here, which worked correctly, but one reviewer suggested using

> the wait_event() instead since it's confusing to have a completion

> starting out in 'completed' state.


Quite frankly, in that case just keep it as a semaphore.

So we have a few cases:

 - completions are *slow*.

   They are designed to be entirely synchronous, exactly so that you
can wait for a completion to finish, and then immediately free the
memory that the completion is in.

 - completions used fro mutual exclusion are *confusing*.

   Yes, they end up working as a counting semaphore, and yes, that's
by design, but at the same time, they are not _meant_ to be used as a
semaphore. The name matters. They are meant to be used the way they
are named: to be a "I'm done now" event. Don't use them for anything
else, because you *will* be confusing everybody else.

 - open-coding wait-queues are really fragile and are *not* meant for exclusion.

   They don't have lockdep checking, but more importantly, people
always invariably get the memory ordering wrong. And by "wrong" I mean
both "doesn't work" (because of insufficient memory ordering) and
"slow" (due to excessive memory ordering).

 - mutexes are good for exclusion.

   mutexes are both high-performance and non-confusing. Yes, they get
lockdep warnings, but that's usually a *good* thing. If you get
lockdep warnings from mutex use, you're almost certainly doing
something iffy.

   mutexes do have a subtle downside: exactly because they are fast,
they are not entirely synchronous. You can't use them for completion
style notifications if you are going to release the memory they are
allocated in.

   So mutexes need to be either statically allocated, or in
reference-counted allocations. That's so that the lifetime of the
memory is guaranteed to be cover all the users (including the wakers).

 - semaphores are "old-fashioned mutexes". A mutex is better than a
semaphore, but a semaphore is better than just about all the other
alternatives. There's nothing _wrong_ with using a semaphore per se.

In this case, either use a semaphore or a mutex. If you are doing
mutual exclusion, those are really the only two acceptable sleeping
models.

                Linus
Arnd Bergmann Nov. 21, 2016, 10:51 p.m. UTC | #4
On Monday, November 21, 2016 9:57:51 AM CET Linus Torvalds wrote:
> 

>  - semaphores are "old-fashioned mutexes". A mutex is better than a

> semaphore, but a semaphore is better than just about all the other

> alternatives. There's nothing _wrong_ with using a semaphore per se.

> 

> In this case, either use a semaphore or a mutex. If you are doing

> mutual exclusion, those are really the only two acceptable sleeping

> models.


The main problem with semaphores is that they are slowly spreading
into areas that really should be mutexes or completions. A couple
of years ago, we had only around 30 semaphores left in the kernel
and while a lot of those have been removed in the meantime, over
100 new ones have come in, the majority of them in the category
that can be trivially converted to a mutex or semaphore.

This in turn is not much of a problem, except to a certain degree
for preempt-rt users.

I suggested to Binoy that he could look into replacing the existing
semaphores one subsystem at a time under the assumption that we
could find a relatively easy alternative for every one of them
and then remove the implementation completely.

Christoph's suggestion is probably more productive here: let's
remove the ones that are obviously wrong or inferior, and only
once they have been taken care of we can look into whether it's
worth doing something about the rest or not.

	Arnd
diff mbox

Patch

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 415a318..6101c0a 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -67,6 +67,8 @@  enum {
 	IB_UMAD_MINOR_BASE = 0
 };
 
+#define UMAD_F_CLAIM	0x01
+
 /*
  * Our lifetime rules for these structs are the following:
  * device special file is opened, we take a reference on the
@@ -87,7 +89,8 @@  struct ib_umad_port {
 
 	struct cdev           sm_cdev;
 	struct device	      *sm_dev;
-	struct semaphore       sm_sem;
+	wait_queue_head_t     wq;
+	unsigned long         flags;
 
 	struct mutex	       file_mutex;
 	struct list_head       file_list;
@@ -1030,12 +1033,14 @@  static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 	port = container_of(inode->i_cdev, struct ib_umad_port, sm_cdev);
 
 	if (filp->f_flags & O_NONBLOCK) {
-		if (down_trylock(&port->sm_sem)) {
+		if (test_and_set_bit(UMAD_F_CLAIM, &port->flags)) {
 			ret = -EAGAIN;
 			goto fail;
 		}
 	} else {
-		if (down_interruptible(&port->sm_sem)) {
+		if (wait_event_interruptible(port->wq,
+					     !test_and_set_bit(UMAD_F_CLAIM,
+					     &port->flags))) {
 			ret = -ERESTARTSYS;
 			goto fail;
 		}
@@ -1060,7 +1065,8 @@  static int ib_umad_sm_open(struct inode *inode, struct file *filp)
 	ib_modify_port(port->ib_dev, port->port_num, 0, &props);
 
 err_up_sem:
-	up(&port->sm_sem);
+	clear_bit(UMAD_F_CLAIM, &port->flags);
+	wake_up(&port->wq);
 
 fail:
 	return ret;
@@ -1079,7 +1085,8 @@  static int ib_umad_sm_close(struct inode *inode, struct file *filp)
 		ret = ib_modify_port(port->ib_dev, port->port_num, 0, &props);
 	mutex_unlock(&port->file_mutex);
 
-	up(&port->sm_sem);
+	clear_bit(UMAD_F_CLAIM, &port->flags);
+	wake_up(&port->wq);
 
 	kobject_put(&port->umad_dev->kobj);
 
@@ -1177,7 +1184,8 @@  static int ib_umad_init_port(struct ib_device *device, int port_num,
 
 	port->ib_dev   = device;
 	port->port_num = port_num;
-	sema_init(&port->sm_sem, 1);
+	init_waitqueue_head(&port->wq);
+	__clear_bit(UMAD_F_CLAIM, &port->flags);
 	mutex_init(&port->file_mutex);
 	INIT_LIST_HEAD(&port->file_list);