[v2] PowerPC: Fix a race condition when eliding a lock

Message ID 55DC9C24.5090401@linaro.org
State New

Commit Message

Adhemerval Zanella Aug. 25, 2015, 4:47 p.m. UTC
On 25-08-2015 13:37, Torvald Riegel wrote:
> On Tue, 2015-08-25 at 10:15 -0400, Carlos O'Donell wrote:
>> As a concrete example the structure element that is accessed atomically is
>> rwlock->__data.__lock. We access it atomically in lll_lock and also
>> in the txnal region (is region the right word?).
> 
> Txnal region is used by some, so I think this works.  Just transaction
> would work as well I think.
> 
>> The access is part of:
>>
>> nptl/pthread_rwlock_wrlock.c 
>>
>> 100   if (ELIDE_LOCK (rwlock->__data.__rwelision,
>> 101                   rwlock->__data.__lock == 0
>> 102                   && rwlock->__data.__writer == 0
>> 103                   && rwlock->__data.__nr_readers == 0))
>> 104     return 0;
>>
>> With the intent being for the expression to be evaluated inside of the
>> transaction and thus abort if another thread has touched any of those
>> fields.
>>
>> That entire expression expands into the usage of is_lock_free. Therefore
>> shouldn't the change be at the caller site?
> 
> That would be an option, or we should pass in a function or something.
> 
>> e.g.
>>
>> 100   if (ELIDE_LOCK (rwlock->__data.__rwelision,
>> 101                   atomic_load_relaxed (rwlock->__data.__lock) == 0
>> 102                   && rwlock->__data.__writer == 0
>> 103                   && rwlock->__data.__nr_readers == 0))
>> 104     return 0;
>>
>> Since the powerpc elision backend doesn't know anything about the types
>> that went into the evaluation of is_lock_free?
>>
>> If anything I think *we* have the onus to fix these cases of missing
>> atomic_load_relaxed not IBM?
> 
> Somebody should do it :)  I hadn't thought too much about for whom it
> would be most convenient to do.  As I wrote in the review for Tulio's
> patch, IMO passing in an expression into a macro that has to be
> evaluated in there is pretty ugly and seems to be at least related to
> the bug Tulio is fixing.  Maybe we can pass in a function that is
> inlined by the compiler.
> I do agree that IBM just used what the Intel implementation did, and
> thus didn't create that in the first place.
> 
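
(For illustration only: the "pass in a function that is inlined by the compiler" idea mentioned above could look roughly like the sketch below.  The names __elide_lock_pred, elide_is_free_fn and ELIDE_ABORT_LOCK_BUSY are made up, the retry/adaptation logic of the real __elide_lock is omitted, and it assumes a POWER compiler with HTM builtins, i.e. -mhtm, which defines __HTM__.)

#include <stdbool.h>
#include <stdint.h>

#define ELIDE_ABORT_LOCK_BUSY 0x34	/* Arbitrary abort reason code.  */

/* Predicate supplied by the caller; it has to be inlineable for the
   elision path to stay cheap.  */
typedef bool (*elide_is_free_fn) (const void *lock);

static inline bool
__elide_lock_pred (uint8_t *adapt_count, elide_is_free_fn is_lock_free,
		   const void *lock)
{
  /* Skip elision while the lock was recently found busy.  */
  if (*adapt_count > 0)
    {
      (*adapt_count)--;
      return false;
    }

#ifdef __HTM__
  if (__builtin_tbegin (0))
    {
      /* The predicate is evaluated inside the transaction, so a concurrent
	 write to any lock word it reads aborts the transaction instead of
	 racing with it.  */
      if (is_lock_free (lock))
	return true;
      __builtin_tabort (ELIDE_ABORT_LOCK_BUSY);
    }
#endif
  return false;
}

The call site would pass a small always_inline predicate over pthread_rwlock_t, which keeps the pthread internals out of the elision backend at the cost of relying on the compiler to inline the indirection away.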

Indeed, Intel was the first arch to enable lock elision and the other
arches just followed that implementation.  I also agree that the macro is
not the best approach; in earlier iterations of this patch I proposed that,
instead of passing a function, we use the addresses of the arguments:

Patch

diff --git a/nptl/pthread_rwlock_wrlock.c b/nptl/pthread_rwlock_wrlock.c
index 60fa909..0161876 100644
--- a/nptl/pthread_rwlock_wrlock.c
+++ b/nptl/pthread_rwlock_wrlock.c
@@ -98,9 +98,9 @@  __pthread_rwlock_wrlock (pthread_rwlock_t *rwlock)
   LIBC_PROBE (wrlock_entry, 1, rwlock);
 
   if (ELIDE_LOCK (rwlock->__data.__rwelision,
-		  rwlock->__data.__lock == 0
-		  && rwlock->__data.__writer == 0
-		  && rwlock->__data.__nr_readers == 0))
+		  rwlock->__data.__lock,
+		  rwlock->__data.__writer,
+		  rwlock->__data.__nr_readers))

diff --git a/sysdeps/generic/elide.h b/sysdeps/generic/elide.h
index 80a3a03..64d9cce 100644
--- a/sysdeps/generic/elide.h
+++ b/sysdeps/generic/elide.h
@@ -15,11 +15,12 @@ 
    You should have received a copy of the GNU Lesser General Public
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
+
 #ifndef ELIDE_H
 #define ELIDE_H 1
 
-#define ELIDE_LOCK(adapt_count, is_lock_free) 0
-#define ELIDE_TRYLOCK(adapt_count, is_lock_free, write) 0
-#define ELIDE_UNLOCK(is_lock_free) 0
+#define ELIDE_LOCK(adapt_count, lock, writer, readers) 0
+#define ELIDE_TRYLOCK(adapt_count, lock, writer, readers, write) 0
+#define ELIDE_UNLOCK(writer, readers) 0
 
 #endif
diff --git a/sysdeps/powerpc/nptl/elide.h b/sysdeps/powerpc/nptl/elide.h
index 389f5a5..b3cc50a 100644
--- a/sysdeps/powerpc/nptl/elide.h
+++ b/sysdeps/powerpc/nptl/elide.h
@@ -27,7 +27,7 @@ 
    ADAPT_COUNT is a pointer to per-lock state variable. */
 
 static inline bool
-__elide_lock (uint8_t *adapt_count, int is_lock_free)
+__elide_lock (uint8_t *adapt_count, int *lock, int *writer, unsigned int *readers)
 {
   if (*adapt_count > 0)
     {
@@ -39,7 +39,10 @@  __elide_lock (uint8_t *adapt_count, int is_lock_free)
     {
       if (__builtin_tbegin (0))
 	{
-	  if (is_lock_free)
+	  /* The compiler barrier is required because some GCC versions might
+	     reorder the lock reads before the transaction begin builtin.  */
+	  asm volatile("" ::: "memory");
+	  if ((*lock == 0) && (*writer == 0) && (*readers == 0))
 	    return true;
 	  /* Lock was busy.  */
 	  __builtin_tabort (_ABORT_LOCK_BUSY);
@@ -66,30 +69,31 @@  __elide_lock (uint8_t *adapt_count, int is_lock_free)
   return false;
 }

I do not know which approach is better, since this one ties the ELIDE_LOCK
implementation to the current internal pthread definitions.
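
(For completeness, the reordering hazard both variants guard against can be shown outside the glibc tree.  The stand-alone sketch below mirrors the barrier approach from the patch; everything in it is illustrative rather than a glibc interface, and the HTM path only compiles with a POWER compiler using -mhtm.  The alternative raised earlier in the thread is to perform the same reads with relaxed atomic loads at the call site.)

#include <stdbool.h>
#include <stdint.h>

/* If the plain reads of the lock words were hoisted above __builtin_tbegin,
   they would be evaluated outside the transaction and a concurrent writer
   would no longer force an abort.  */

static inline bool
elide_try (uint8_t *adapt_count, int *lock, int *writer,
	   unsigned int *readers)
{
  if (*adapt_count > 0)
    {
      (*adapt_count)--;
      return false;
    }

#ifdef __HTM__
  if (__builtin_tbegin (0))
    {
      /* Compiler barrier, as in the patch above: keeps the reads below from
	 being scheduled before the tbegin.  */
      __asm__ volatile ("" ::: "memory");

      if (*lock == 0 && *writer == 0 && *readers == 0)
	return true;

      /* Lock words were busy; abort and fall back to the real lock.  */
      __builtin_tabort (0);
    }
#endif
  return false;
}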