[BZ,#16892] Check value of futex before updating in __lll_timedlock

Message ID 1D01BCB7-8B02-415F-9244-58C15296B799@linaro.org
State New
Headers show

Commit Message

Bernie Ogden Aug. 5, 2014, 3:43 p.m.
__lll_timedlock in generic lowlevellock.h sets futex to 1
without checking that it is already 0. This can create 2
performance problems (analysis by Carlos O'Donell):

1) Up to N threads can fail to sleep when they ought to have done,
where N is the number of threads expecting futex==2. For example:

  * T1 calls __lll_timedlock setting futex to 1 and taking the lock.
  * T2 calls __lll_timedlock setting futex to 1 but does not take the lock.
  * T2 calls __lll_timedlock_wait and sets the futex to 2 and does not
    gain the lock.
  * T3 calls __lll_timedlock setting futex to 1 but does not take the lock.
  * T2 calls lll_futex_time_wait but fails with -EWOULDBLOCK because T3 reset futex to 1.
    -> One inflight thread (T2), and one spurious failed futex wait syscall.
  * T2 again sets the futex to 2 and does not gain the lock.
  * ... T2 and T3 go on to call futex wait syscall and both sleep.

2) __lll_unlock only wakes if futex was > 1 prior to release. Thus
it can happen that __lll_timedlock keeps setting futex from 2 to 1
just prior to __lll_unlock calls, preventing waiters from being
awoken. This certainly affects m68k, arm and aarch64 - sh may also
be affected but it's a little harder to tell as its written in asm.

We fix this by changing an atomic_exchange_acq to
atomic_compare_and_exchange_bool_acq.

This bug previously affected arm, aarch64, m68k and sh/sh4. Since mips switched
to the generic lowlevellock.h, it is also affected. Applying this patch will fix arm, aarch64
and mips. m68k and sh would need to switch to the generic header to get the fix.

Change is pretty simple, but has been built and tested on arm.

OK?


2014-08-05  Bernard Ogden  <bernie.ogden@linaro.org>

        [BZ #16892]
        * sysdeps/nptl/lowlevellock.h: Use
        atomic_compare_and_exchange_bool_acq rather than atomic_exchange_acq
        in __lll_timedlock.

---

Comments

Roland McGrath Aug. 6, 2014, 9:27 p.m. | #1
> This bug previously affected arm, aarch64, m68k and sh/sh4. Since mips
> switched to the generic lowlevellock.h, it is also affected. Applying
> this patch will fix arm, aarch64 and mips. m68k and sh would need to
> switch to the generic header to get the fix.

m68k uses the generic code now.

> Change is pretty simple, but has been built and tested on arm.

> 2014-08-05  Bernard Ogden  <bernie.ogden@linaro.org>
> 
>         [BZ #16892]
>         * sysdeps/nptl/lowlevellock.h: Use
>         atomic_compare_and_exchange_bool_acq rather than atomic_exchange_acq
>         in __lll_timedlock.

This should look like:

	[BZ #16892]
	* sysdeps/nptl/lowlevellock.h (__lll_timedlock): Use
	atomic_compare_and_exchange_bool_acq rather than atomic_exchange_acq.

1. Use 1 tab to indent, not 8 spaces.
2. Put the name of the affected macro/function/variable in parens after the
   file name, not just in regular English.

> --- a/sysdeps/nptl/lowlevellock.h
> +++ b/sysdeps/nptl/lowlevellock.h
> @@ -93,7 +93,8 @@ extern int __lll_robust_timedlock_wait (int *futex, const struct timespec *,
>      int *__futex = (futex);                                     \
>      int __val = 0;                                              \
>                                                                  \
> -    if (__glibc_unlikely (atomic_exchange_acq (__futex, 1)))    \
> +    if (__glibc_unlikely                                        \
> +        (atomic_compare_and_exchange_bool_acq (__futex, 1, 0))) \
>        __val = __lll_timedlock_wait (__futex, abstime, private); \
>      __val;                                                      \
>    })

Please add a comment explaining what the code is doing.  The rest of the
file needs more comments like this and I'm not asking you to add all those
(unless you want to!).  But where you're touching it, and especially
replacing one magic atomic operation with a different magic atomic
operation to fix a bug, a comment is really warranted.  If there had been a
good comment on the original code, it probably would have prevented us from
letting the bug go by in the first place.


Thanks,
Roland

Patch

diff --git a/sysdeps/nptl/lowlevellock.h b/sysdeps/nptl/lowlevellock.h
index 548a9c8..06ccde5 100644
--- a/sysdeps/nptl/lowlevellock.h
+++ b/sysdeps/nptl/lowlevellock.h
@@ -93,7 +93,8 @@  extern int __lll_robust_timedlock_wait (int *futex, const struct timespec *,
     int *__futex = (futex);                                     \
     int __val = 0;                                              \
                                                                 \
-    if (__glibc_unlikely (atomic_exchange_acq (__futex, 1)))    \
+    if (__glibc_unlikely                                        \
+        (atomic_compare_and_exchange_bool_acq (__futex, 1, 0))) \
       __val = __lll_timedlock_wait (__futex, abstime, private); \
     __val;                                                      \
   })