[2/2] New pthread rwlock that is more scalable.

Message ID 1483385174.13143.124.camel@redhat.com
State New
Headers show

Commit Message

Torvald Riegel Jan. 2, 2017, 7:26 p.m.
On Wed, 2016-07-27 at 23:44 +0200, Torvald Riegel wrote:
> This replaces the pthread rwlock with a new implementation that uses a

> more scalable algorithm (primarily through not using a critical section

> anymore to make state changes).  The fast path for rdlock acquisition

> and release is now basically a single atomic read-modify write or CAS

> and a few branches.  See nptl/pthread_rwlock_common.c for details.


I have noticed two small oversights, which are taken care of in the
attached patch.  The first is a mssign overflow check (a lock acquired
too often as a reader) in one of the tryrdlock branches.  The second is
a that I had forgotten to apply a cleanup (no correctness change; the
former code did more than it had to).

Comments

Carlos O'Donell Jan. 10, 2017, 8:45 a.m. | #1
On 01/02/2017 02:26 PM, Torvald Riegel wrote:
> On Wed, 2016-07-27 at 23:44 +0200, Torvald Riegel wrote:

>> This replaces the pthread rwlock with a new implementation that uses a

>> more scalable algorithm (primarily through not using a critical section

>> anymore to make state changes).  The fast path for rdlock acquisition

>> and release is now basically a single atomic read-modify write or CAS

>> and a few branches.  See nptl/pthread_rwlock_common.c for details.

> I have noticed two small oversights, which are taken care of in the

> attached patch.  The first is a mssign overflow check (a lock acquired

> too often as a reader) in one of the tryrdlock branches.  The second is

> a that I had forgotten to apply a cleanup (no correctness change; the

> former code did more than it had to).


These look good to me.

> 

> commit 59c2c0dafb1c1460a457037f222032ade9fc5a74

> Author: Torvald Riegel <triegel@redhat.com>

> Date:   Mon Jan 2 17:50:37 2017 +0100

> 

>     Fix a minor issue and an oversight (not a correctness bug) in tryrdlock

> 

> diff --git a/nptl/pthread_rwlock_tryrdlock.c b/nptl/pthread_rwlock_tryrdlock.c

> index e002f15..6c3014c 100644

> --- a/nptl/pthread_rwlock_tryrdlock.c

> +++ b/nptl/pthread_rwlock_tryrdlock.c

> @@ -51,12 +51,6 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)

>  		  == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP))

>  	    return EBUSY;

>  	  rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);

> -	  /* If we could have caused an overflow or take effect during an

> -	     overflow, we just can / need to return EAGAIN.  There is no need

> -	     to have modified the number of readers because we could have

> -	     done that and cleaned up immediately.  */

> -	  if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)

> -	    return EAGAIN;

>  	}

>        else

>  	{

> @@ -72,6 +66,12 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)

>  		  ^ PTHREAD_RWLOCK_WRPHASE;

>  	    }

>  	}

> +      /* If we could have caused an overflow or take effect during an

> +	 overflow, we just can / need to return EAGAIN.  There is no need to

> +	 have actually modified the number of readers because we could have

> +	 done that and cleaned up immediately.  */

> +      if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)

> +	return EAGAIN;


OK. Move the overflow check later in the code to catch all cases.

>      }

>    /* If the CAS fails, we retry; this prevents that tryrdlock fails spuriously

>       (i.e., fails to acquire the lock although there is no writer), which is

> @@ -84,16 +84,25 @@ __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)

>       readers or writers that acquire and release in the meantime.  Using

>       randomized exponential back-off to make a live-lock unlikely should be

>       sufficient.

> +     TODO Back-off.

>       Acquire MO so we synchronize with prior writers.  */

>    while (!atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,

>        &r, rnew));

>  

>    if ((r & PTHREAD_RWLOCK_WRPHASE) != 0)

>      {

> -      //FIXME / TODO same as in rdlock_full

> -      int private = __pthread_rwlock_get_private (rwlock);

> -      atomic_store_release (&rwlock->__data.__wrphase_futex, 0);

> -      futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);

> +      /* Same as in __pthread_rwlock_rdlock_full:

> +	 We started the read phase, so we are also responsible for

> +	 updating the write-phase futex.  Relaxed MO is sufficient.

> +	 Note that there can be no other reader that we have to wake

> +	 because all other readers will see the read phase started by us

> +	 (or they will try to start it themselves); if a writer started

> +	 the read phase, we cannot have started it.  Furthermore, we

> +	 cannot discard a PTHREAD_RWLOCK_FUTEX_USED flag because we will

> +	 overwrite the value set by the most recent writer (or the readers

> +	 before it in case of explicit hand-over) and we know that there

> +	 are no waiting readers.  */

> +      atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);


OK.

>      }

>  

>    return 0;

Patch hide | download patch | download mbox

commit 59c2c0dafb1c1460a457037f222032ade9fc5a74
Author: Torvald Riegel <triegel@redhat.com>
Date:   Mon Jan 2 17:50:37 2017 +0100

    Fix a minor issue and an oversight (not a correctness bug) in tryrdlock

diff --git a/nptl/pthread_rwlock_tryrdlock.c b/nptl/pthread_rwlock_tryrdlock.c
index e002f15..6c3014c 100644
--- a/nptl/pthread_rwlock_tryrdlock.c
+++ b/nptl/pthread_rwlock_tryrdlock.c
@@ -51,12 +51,6 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
 		  == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP))
 	    return EBUSY;
 	  rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);
-	  /* If we could have caused an overflow or take effect during an
-	     overflow, we just can / need to return EAGAIN.  There is no need
-	     to have modified the number of readers because we could have
-	     done that and cleaned up immediately.  */
-	  if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
-	    return EAGAIN;
 	}
       else
 	{
@@ -72,6 +66,12 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
 		  ^ PTHREAD_RWLOCK_WRPHASE;
 	    }
 	}
+      /* If we could have caused an overflow or take effect during an
+	 overflow, we just can / need to return EAGAIN.  There is no need to
+	 have actually modified the number of readers because we could have
+	 done that and cleaned up immediately.  */
+      if (rnew >= PTHREAD_RWLOCK_READER_OVERFLOW)
+	return EAGAIN;
     }
   /* If the CAS fails, we retry; this prevents that tryrdlock fails spuriously
      (i.e., fails to acquire the lock although there is no writer), which is
@@ -84,16 +84,25 @@  __pthread_rwlock_tryrdlock (pthread_rwlock_t *rwlock)
      readers or writers that acquire and release in the meantime.  Using
      randomized exponential back-off to make a live-lock unlikely should be
      sufficient.
+     TODO Back-off.
      Acquire MO so we synchronize with prior writers.  */
   while (!atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
       &r, rnew));
 
   if ((r & PTHREAD_RWLOCK_WRPHASE) != 0)
     {
-      //FIXME / TODO same as in rdlock_full
-      int private = __pthread_rwlock_get_private (rwlock);
-      atomic_store_release (&rwlock->__data.__wrphase_futex, 0);
-      futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);
+      /* Same as in __pthread_rwlock_rdlock_full:
+	 We started the read phase, so we are also responsible for
+	 updating the write-phase futex.  Relaxed MO is sufficient.
+	 Note that there can be no other reader that we have to wake
+	 because all other readers will see the read phase started by us
+	 (or they will try to start it themselves); if a writer started
+	 the read phase, we cannot have started it.  Furthermore, we
+	 cannot discard a PTHREAD_RWLOCK_FUTEX_USED flag because we will
+	 overwrite the value set by the most recent writer (or the readers
+	 before it in case of explicit hand-over) and we know that there
+	 are no waiting readers.  */
+      atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);
     }
 
   return 0;