diff mbox series

Documentation: locking: update libc support status of PI futexes

Message ID 20241228181546.1315328-1-alison@she-devel.com
State New

Commit Message

Alison Chaiken Dec. 28, 2024, 6:15 p.m. UTC
From: Alison Chaiken <alison@she-devel.com>

Update the text of futex-requeue-pi.rst to explain that, because of a
conflict between POSIX requirements and ABI constraints, glibc does
not support requeueing of PI futexes.  Add some information about
librtpi, a library which provides an implementation of condition
variables which supports priority inheritance.

Signed-off-by: Alison Chaiken <alison@she-devel.com>
---
 Documentation/locking/futex-requeue-pi.rst | 47 +++++++++++++++++++---
 1 file changed, 42 insertions(+), 5 deletions(-)

Comments

Sebastian Andrzej Siewior Jan. 7, 2025, 3:31 p.m. UTC | #1
On 2024-12-28 10:15:46 [-0800], alison@she-devel.com wrote:
> From: Alison Chaiken <alison@she-devel.com>
> 
> Update the text of futex-requeue-pi.rst to explain that, because of a
> conflict between POSIX requirements and ABI constraints, glibc does
> not support requeueing of PI futexes.  Add some information about
> librtpi, a library which provides an implementation of condition
> variables which supports priority inheritance.

Are you sure? My memory is that glibc avoided using the internal mutex.
The old problem should be gone and pthread_cond_signal() and
pthread_cond_wait() should work.

> Signed-off-by: Alison Chaiken <alison@she-devel.com>

Sebastian
Alison Chaiken Jan. 11, 2025, 6:55 p.m. UTC | #2
On 2025-01-07 07:31, Sebastian Andrzej Siewior wrote:
> On 2024-12-28 10:15:46 [-0800], alison@she-devel.com wrote:
>> From: Alison Chaiken <alison@she-devel.com>
>> 
>> Update the text of futex-requeue-pi.rst to explain that, because of a
>> conflict between POSIX requirements and ABI constraints, glibc does
>> not support requeueing of PI futexes.  Add some information about
>> librtpi, a library which provides an implementation of condition
>> variables which supports priority inheritance.
> 
> Are you sure? My memory is that glibc avoided using the internal mutex.
> The old problem should be gone and pthread_cond_signal() and
> pthread_cond_wait() should work.

Ignoring support for 64-bit time, the last substantive change to 
pthread_cond_wait() and pthread_cond_signal() was Torvald Riegel's  
commit ed19993b5b0d05d62cc883571519a67dae481a14 "New condvar 
implementation that provides stronger ordering guarantees," which fixed 
problems with waking of ineligible futex waiters and with ABA issues 
concerning the futex word.    What the patch does not do is made clear 
by the commit message:

      This condvar doesn't yet use a requeue optimization (ie, on a
      broadcast, waking just one thread and requeueing all others on
      the futex of the mutex supplied by the program).

What futex-requeue-pi.rst directs is

      In order to support PI-aware pthread_condvar's, the kernel needs to
      be able to requeue tasks to PI futexes.

Riegel and Darren Hart discussed Riegel's patch at length at the 2016
RT Summit:

     https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule

The related glibc bug report by Darren may be found at

     https://sourceware.org/bugzilla/show_bug.cgi?id=11588

The last comment on the bug from 2017 is by Riegel:

     So far, there is no known solution for how to achieve PI support
     given the current constraints we have (eg, available futex
     operations, POSIX requirements, ...).

I ran the bug reproducer posted by Darren in QEMU and found that it did
not fail.  I'm not sure whether the result is valid given the
peculiarities of QEMU, or whether I made some other mistake.

> 
>> Signed-off-by: Alison Chaiken <alison@she-devel.com>
> 
> Sebastian

-- Alison Chaiken
     Aurora Innovation
Sebastian Andrzej Siewior May 23, 2025, 3 p.m. UTC | #3
On 2025-01-11 10:55:55 [-0800], Alison Chaiken wrote:
> > Are you sure? My memory is that glibc avoided using the internal mutex.
> > The old problem should be gone and pthread_cond_signal() and
> > pthread_cond_wait() should work.
> 
> Ignoring support for 64-bit time, the last substantive change to
> pthread_cond_wait() and pthread_cond_signal() was Torvald Riegel's  commit
> ed19993b5b0d05d62cc883571519a67dae481a14 "New condvar implementation that
> provides stronger ordering guarantees," which fixed problems with waking of
> ineligible futex waiters and with ABA issues concerning the futex word.
> What the patch does not do is made clear by the commit message:
> 
>      This condvar doesn't yet use a requeue optimization (ie, on a
>      broadcast, waking just one thread and requeueing all others on
>      the futex of the mutex supplied by the program).
> 
> What futex-requeue-pi.rst directs is
> 
>      In order to support PI-aware pthread_condvar's, the kernel needs to
>      be able to requeue tasks to PI futexes.
> 
> Riegel and Darren Hart discussed Riegel's patch in at length at the 2016 RT
> Summit:
> 
> https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
> 
> The related glibc bug report by Darren may be found at
> 
>     https://sourceware.org/bugzilla/show_bug.cgi?id=11588
> 
> The last comment on the bug from 2017 is by Riegel:
> 
>     So far, there is no known solution for how to achieve PI support
>     given the current constraints we have (eg, available futex
>     operations, POSIX requirements, ...).
> 
> I ran the bug reproducer posted by Darren in Qemu and found that it did not
> fail.   I'm not sure if the result is valid given the peculiarities of Qemu,
> or whether I made some other mistake.

I've been looking at this again for other reasons and looked at the
code again…

Back then we used the futex requeue API, which required both futex
objects to have the PI bit set. This wasn't the case originally, hence
the patch by Darren, which did not make it into the official libc.

With the rework by Riegel, the mutex within pthread's condvar
implementation is gone, and so is the use of the requeue API. The
pthread_cond_wait()/pthread_cond_signal() API is back to using futex
wait/wake.
The glibc comments mention important ordering constraints.
The futex wait enqueues the waiter according to its priority, so the
task with the highest priority always gets a front seat. The futex wake
function always wakes the first waiter in the queue.
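
For illustration, the plain wait/wake calls described above can be
sketched in C as follows. This is a minimal sketch under stated
assumptions, not glibc's code: futex_wait() and futex_wake() are
hypothetical wrapper names, and FUTEX_PRIVATE_FLAG assumes a
process-private futex word.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: block while *uaddr still equals 'expected'.
 * Returns 0 when woken, or -1 with errno set (EAGAIN if the word
 * no longer holds 'expected', EINTR if interrupted by a signal).
 */
static inline long futex_wait(uint32_t *uaddr, uint32_t expected)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
		       expected, NULL, NULL, 0);
}

/*
 * Hypothetical wrapper: wake at most 'nr' waiters queued on uaddr.
 * Returns the number of waiters woken, or -1 with errno set.
 */
static inline long futex_wake(uint32_t *uaddr, int nr)
{
	return syscall(SYS_futex, uaddr, FUTEX_WAKE | FUTEX_PRIVATE_FLAG,
		       nr, NULL, NULL, 0);
}
```

With no waiters queued, futex_wake() returns 0, and futex_wait()
returns -1 with errno set to EAGAIN when the word does not hold the
expected value.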

With all this I would say that glibc's condvar implementation has not
had any issues since the rework.
There were a few loops with a retry counter of 0 (basically dead code),
and they have been removed.

Sebastian
Jan Kiszka May 23, 2025, 4:11 p.m. UTC | #4
On 23.05.25 17:00, Sebastian Andrzej Siewior wrote:
> On 2025-01-11 10:55:55 [-0800], Alison Chaiken wrote:
>>> Are you sure? My memory is that glibc avoided using the internal mutex.
>>> The old problem should be gone and pthread_cond_signal() and
>>> pthread_cond_wait() should work.
>>
>> Ignoring support for 64-bit time, the last substantive change to
>> pthread_cond_wait() and pthread_cond_signal() was Torvald Riegel's  commit
>> ed19993b5b0d05d62cc883571519a67dae481a14 "New condvar implementation that
>> provides stronger ordering guarantees," which fixed problems with waking of
>> ineligible futex waiters and with ABA issues concerning the futex word.
>> What the patch does not do is made clear by the commit message:
>>
>>      This condvar doesn't yet use a requeue optimization (ie, on a
>>      broadcast, waking just one thread and requeueing all others on
>>      the futex of the mutex supplied by the program).
>>
>> What futex-requeue-pi.rst directs is
>>
>>      In order to support PI-aware pthread_condvar's, the kernel needs to
>>      be able to requeue tasks to PI futexes.
>>
>> Riegel and Darren Hart discussed Riegel's patch in at length at the 2016 RT
>> Summit:
>>
>> https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
>>
>> The related glibc bug report by Darren may be found at
>>
>>     https://sourceware.org/bugzilla/show_bug.cgi?id=11588
>>
>> The last comment on the bug from 2017 is by Riegel:
>>
>>     So far, there is no known solution for how to achieve PI support
>>     given the current constraints we have (eg, available futex
>>     operations, POSIX requirements, ...).
>>
>> I ran the bug reproducer posted by Darren in Qemu and found that it did not
>> fail.   I'm not sure if the result is valid given the peculiarities of Qemu,
>> or whether I made some other mistake.
> 
> I've been looking at this again for other reasons and looked at the
> code again…
> 
> Back then we use futex-requeue API and required both futex-object to
> have the PI bit set. This wasn't the case originally, hence the patch by
> Darren which did not make it into the official libc.
> 
> With the rework by Riegel, the mutex within pthread's condvar
> implementation is gone also the usage of the requeue API. The
> pthread_cond_wait()/ pthread_cond_signal() API is back to use futex'
> wait/ wake.
> The glibc comments write something about important ordering constrains.
> The futex wait enqueues the waiter according to its priority. So the
> task with highest priority gets always a front seat. The futex wake
> function wakes always the first waiter in the queue.
> 
> With all this I would say that the glib'c condvar implementation does
> not have any issues since the rework.
> There were a few loops, with a 0 retry counter (basically dead) and they
> have been removed.
> 

That would be good news.

Which would be the minimal glibc version needed then, already 2.25? And
could we ensure that future versions will maintain these properties by
sneaking some related testcase(s) into glibc?

Jan
Alison Chaiken May 24, 2025, 10:09 p.m. UTC | #5
On 2025-05-23 08:00, Sebastian Andrzej Siewior wrote:
> On 2025-01-11 10:55:55 [-0800], Alison Chaiken wrote:
>> > Are you sure? My memory is that glibc avoided using the internal mutex.
>> > The old problem should be gone and pthread_cond_signal() and
>> > pthread_cond_wait() should work.
>> 
>> Ignoring support for 64-bit time, the last substantive change to
>> pthread_cond_wait() and pthread_cond_signal() was Torvald Riegel's
>> commit ed19993b5b0d05d62cc883571519a67dae481a14 "New condvar
>> implementation that provides stronger ordering guarantees," which
>> fixed problems with waking of ineligible futex waiters and with ABA
>> issues concerning the futex word.
>> What the patch does not do is made clear by the commit message:
>> 
>>      This condvar doesn't yet use a requeue optimization (ie, on a
>>      broadcast, waking just one thread and requeueing all others on
>>      the futex of the mutex supplied by the program).
>> 
>> What futex-requeue-pi.rst directs is
>> 
>>      In order to support PI-aware pthread_condvar's, the kernel needs
>>      to be able to requeue tasks to PI futexes.
>> 
>> Riegel and Darren Hart discussed Riegel's patch in at length at the
>> 2016 RT Summit:
>> 
>> https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/schedule
>> 
>> The related glibc bug report by Darren may be found at
>> 
>>     https://sourceware.org/bugzilla/show_bug.cgi?id=11588
>> 
>> The last comment on the bug from 2017 is by Riegel:
>> 
>>     So far, there is no known solution for how to achieve PI support
>>     given the current constraints we have (eg, available futex
>>     operations, POSIX requirements, ...).
>> 
>> I ran the bug reproducer posted by Darren in Qemu and found that it
>> did not fail.   I'm not sure if the result is valid given the
>> peculiarities of Qemu, or whether I made some other mistake.
> 
> I've been looking at this again for other reasons and looked at the
> code again…
> 
> Back then we use futex-requeue API and required both futex-object to
> have the PI bit set. This wasn't the case originally, hence the patch
> by Darren which did not make it into the official libc.
> 
> With the rework by Riegel, the mutex within pthread's condvar
> implementation is gone also the usage of the requeue API. The
> pthread_cond_wait()/ pthread_cond_signal() API is back to use futex'
> wait/ wake.
> The glibc comments write something about important ordering constrains.
> The futex wait enqueues the waiter according to its priority. So the
> task with highest priority gets always a front seat. The futex wake
> function wakes always the first waiter in the queue.
> 
> With all this I would say that the glib'c condvar implementation does
> not have any issues since the rework.
> There were a few loops, with a 0 retry counter (basically dead) and
> they have been removed.
> 
> Sebastian

Thanks, Sebastian, for looking into this question.

Torvald Riegel's last patch to pthread_cond_wait.c:

$ git log -n 1 --author=riegel -- pthread_cond_wait.c
commit ed19993b5b0d05d62cc883571519a67dae481a14
Author: Torvald Riegel <triegel@redhat.com>
Date:   Wed May 25 23:43:36 2016 +0200
     New condvar implementation that provides stronger ordering guarantees.

Speaking of ordering, the 2016 Linux Realtime Summit took place after
that commit, on 11 October.  Torvald and Darren co-presented a talk
about condition variables:

https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/pthread-condvars

In his half of the talk, Torvald discusses the POSIX requirement that
necessitated a change to condvars, and his redesign.  In the video at
30:50:

---

Zijlstra:  Even for FIFO, in the previous slides, S2 will only wake W2,
because W3 was not yet eligible, but W3 might be the highest-priority
waiter.  Strictly speaking, W3 was eligible at S2.
[p. 9, W3 was in G2, not G1, but "happened before" S2].  At S2, the
only possible wakeup was W2, even though W3 might be the
highest-priority waiter.

Hart: Correct.  Not in this scheme.

Zijlstra: Sequence-wise, it's correct,

Hart: it's mathematically correct.

Zijlstra: But it's not the one we want to wake according to PI rules.

Hart: Yep.

Zijlstra: This scheme does not permit us doing so.

Hart: Noted.

---

Darren and Torvald agree that glibc cannot make pthread condvars 
PI-aware without breaking ABI.   Am I missing something?

Thanks,
Alison

---
Alison Chaiken                   alison@she-devel.com
https://she-devel.com
"What respite from her thrilling toil did Beauty ever take — But Work 
might be Electric Rest To those that Magic make" -- Emily Dickinson

Patch

diff --git a/Documentation/locking/futex-requeue-pi.rst b/Documentation/locking/futex-requeue-pi.rst
index dd4ecf4528a4..6ad7f0c9ea4b 100644
--- a/Documentation/locking/futex-requeue-pi.rst
+++ b/Documentation/locking/futex-requeue-pi.rst
@@ -54,7 +54,7 @@  In order to support PI-aware pthread_condvar's, the kernel needs to
 be able to requeue tasks to PI futexes.  This support implies that
 upon a successful futex_wait system call, the caller would return to
 user space already holding the PI futex.  The glibc implementation
-would be modified as follows::
+would need to be modified as follows::
 
 
 	/* caller must lock mutex */
@@ -78,10 +78,20 @@  would be modified as follows::
 		futex_requeue_pi(cond->data.__futex, cond->mutex);
 	}
 
-The actual glibc implementation will likely test for PI and make the
-necessary changes inside the existing calls rather than creating new
-calls for the PI cases.  Similar changes are needed for
-pthread_cond_timedwait() and pthread_cond_signal().
+The actual glibc libpthread implementation has not made these changes,
+nor has it made similar changes needed for pthread_cond_timedwait()
+and pthread_cond_signal().  The reason is that POSIX has a strict
+notion of "eligible" waiters on a condition variable: the set of
+waiters blocked before a given signal is sent.  Because userspace has
+no atomic way to perform lock operations together with the futex
+system call, the implementation must also carefully guard against lost
+wakeups on a multicore system.  These constraints mean that the
+libpthread condition variable would need an ABI break in order to
+support requeueing.  The fundamental underlying difficulty stems from
+the limited size of the futex word, which is 32 bits even on 64-bit
+systems.  See
+https://wiki.linuxfoundation.org/realtime/events/rt-summit2016/pthread-condvars
+for details.
 
 Implementation
 --------------
@@ -130,3 +140,30 @@  either pthread_cond_broadcast() or pthread_cond_signal() acquire the
 mutex prior to making the call. FUTEX_CMP_REQUEUE_PI requires that
 nr_wake=1.  nr_requeue should be INT_MAX for broadcast and 0 for
 signal.
+
+librtpi
+--------------
+
+librtpi (https://gitlab.com/linux-rt/librtpi) implements condition
+variables which closely follow the guidance above.  The librtpi
+implementation adds a new mutex parameter to the waiting and signaling
+functions in order to support the requirement that mutexes with the
+PTHREAD_PRIO_INHERIT attribute always have an owner::
+
+	int pi_cond_wait(pi_cond_t *cond, pi_mutex_t *mutex);
+	int pi_cond_signal(pi_cond_t *cond, pi_mutex_t *mutex);
+
+Realtime userspace applications which rely on librtpi must therefore
+make code changes.
+
+librtpi works with the kernel scheduler to wake the highest-priority
+waiters on a futex in FIFO order.  The code is much simpler than
+glibc's at the cost of omitting some POSIX-mandated features.  librtpi
+has no notion of POSIX's eligible waiters, and it does not support
+robust, process-private or PTHREAD_PRIO_PROTECT mutexes.
+
+Other C libraries
+-----------------
+
+Like glibc's NPTL, other prominent threading libraries such as musl,
+Threading Building Blocks and Boost do not implement futex requeueing.
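
As an illustration of the nr_wake/nr_requeue rule quoted in the last
hunk above, the FUTEX_CMP_REQUEUE_PI call could be issued roughly as
below. This is a sketch under stated assumptions, not code from glibc
or librtpi: pi_nr_requeue() and futex_cmp_requeue_pi() are hypothetical
helper names, FUTEX_PRIVATE_FLAG assumes process-private futexes, and
the waiters are assumed to have blocked with FUTEX_WAIT_REQUEUE_PI.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* nr_requeue per the rule above: INT_MAX for broadcast, 0 for signal. */
static inline int pi_nr_requeue(int broadcast)
{
	return broadcast ? INT_MAX : 0;
}

/*
 * Hypothetical helper: wake one waiter on cond_futex and requeue the
 * rest onto the PI futex mutex_futex, provided *cond_futex still
 * equals 'expected'.  Returns -1 with errno set on failure.
 */
static inline long futex_cmp_requeue_pi(uint32_t *cond_futex,
					uint32_t *mutex_futex,
					uint32_t expected, int broadcast)
{
	/* For requeue operations the fourth argument carries nr_requeue. */
	return syscall(SYS_futex, cond_futex,
		       FUTEX_CMP_REQUEUE_PI | FUTEX_PRIVATE_FLAG,
		       1 /* nr_wake must be 1 */,
		       (unsigned long)pi_nr_requeue(broadcast),
		       mutex_futex, expected);
}
```

The parameter choice is split into its own helper so that it can be
checked without issuing the system call.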