mbox series

[v2,0/2] Make abort AS-safe

Message ID 20240723134235.1520483-1-adhemerval.zanella@linaro.org
Headers show
Series Make abort AS-safe | expand

Message

Adhemerval Zanella Netto July 23, 2024, 1:41 p.m. UTC
POSIX states that abort should be AS-safe, and Rust also had an open PR
about it [1] (it was closed with a different fix).

The main issue is the recursive lock used on abort does not synchronize
with new process creation (either by fork-like interfaces or posix_spawn
ones), nor it is reinitialized after fork.

Also, the SIGABRT unblock before raise shows another race condition,
where a fork or posix_spawn call by another thread just after
the recursive lock release and before raising SIGABRT might create
a new process with a non-expected signal mask.

To fix the AS-safe, the raise is issued without changing the process
signal mask, and an AS-safe lock is used if a SIGABRT is installed or
the process is blocked or ignored.  With the signal mask change removal,
there is no need to use a recursive lock.  The lock is also used on
both _Fork and posix_spawn, to avoid the spawn process to see the
abort handler as SIG_DFL.

This change was not fully possible with previous POSIX standard (2008),
where it stated that:

  The abort() function shall override blocking or ignoring the SIGABRT
  signal.

And

  The SIGABRT signal shall be sent to the calling process as if by means
  of raise() with the argument SIGABRT.

The later has been changed with a new clarification [3]:

  The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as if by
  means of raise() with the argument SIGABRT. [CX]If this signal does not
  terminate the process (for example, if the signal is caught and the handler
  returns), abort() may change the disposition of SIGABRT to SIG_DFL and send
  the signal (in the same way) again. If a second signal is sent and it does
  not terminate the process, the behavior is unspecified, except that the
  abort() call shall not return.

The clone is also subjected to this issue, but since glibc does not
do any internal metadata setup (as for fork-like function), this patch
does not handle it for the symbol.

I have not added a regression tests because, from previous Carlos's
patch [2], hitting the code path to trigger the potential issue
(fork just after abort has acquired the lock and reset SIGABRT handler)
is not deterministic and it would generate a lot of development
overhead.

[1] https://github.com/rust-lang/rust/issues/73894#issuecomment-673478761
[2] https://sourceware.org/pipermail/libc-alpha/2020-September/117934.html
[3] https://austingroupbugs.net/view.php?id=906#c5851

Changes from v1:
- Rename de signal block and lock to __abort_lock_lock.
- Improve comments on both abort, where the signal disposition can not
  be changed, and on posix_spawn on why it needs to take the abort lock.
- Use gettid() on __pthread_raise_internal.
- Added a NEWS entry for the setjmp fix.

Adhemerval Zanella (2):
  setjmp: Use BSD sematic as default for setjmp
  stdlib: Make abort/_Exit AS-safe (BZ 26275)

 NEWS                                       |   4 +-
 include/bits/unistd_ext.h                  |   3 +
 include/stdlib.h                           |   6 +
 manual/setjmp.texi                         |  14 +--
 manual/startup.texi                        |   5 +-
 nptl/pthread_create.c                      |   3 +-
 nptl/pthread_kill.c                        |  11 ++
 posix/fork.c                               |   2 +
 setjmp/setjmp.h                            |   5 -
 signal/sigaction.c                         |  15 ++-
 stdlib/abort.c                             | 131 ++++++++-------------
 sysdeps/generic/internal-signals.h         |  27 ++++-
 sysdeps/generic/internal-sigset.h          |  26 ++++
 sysdeps/htl/pthreadP.h                     |   2 +
 sysdeps/nptl/_Fork.c                       |   9 ++
 sysdeps/nptl/libc_start_call_main.h        |   3 +-
 sysdeps/nptl/pthreadP.h                    |   1 +
 sysdeps/unix/sysv/linux/internal-signals.h |   9 ++
 sysdeps/unix/sysv/linux/internal-sigset.h  |   2 +-
 sysdeps/unix/sysv/linux/spawni.c           |   8 +-
 20 files changed, 174 insertions(+), 112 deletions(-)
 create mode 100644 sysdeps/generic/internal-sigset.h

Comments

Zack Weinberg July 24, 2024, 2:40 p.m. UTC | #1
Continuing the discussion from part 1 of your patch here as this message
provides important context:

On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
> POSIX states that abort should be AS-safe
...
[and now also states]
>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>   does not terminate the process (for example, if the signal is caught
>   and the handler returns), abort() may change the disposition of
>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>   second signal is sent and it does not terminate the process, the
>   behavior is unspecified, except that the abort() call shall not
>   return.

It occurs to me that the new language does not require us to _succeed_
in changing the disposition of SIGABRT, which means "but what if other
threads race with us to change the disposition of SIGABRT?" is moot.
The only hard requirements that survive are:

- abort shall cause abnormal termination of the process
- abort's first attempt to terminate the process shall be by sending
  the calling thread a catchable and blockable SIGABRT
- under no circumstances shall abort return to its caller
- abort shall be AS-safe

And that means we shouldn't need any locking or new interfaces or anything.
I claim this is a fully conformant implementation of abort.
Change my mind.

#include <pthread.h>
#include <signal.h>
#include <stdnoreturn.h>
#include <sysexits.h>
#include <unistd.h>

noreturn void
abort (void)
{
  // could attempt to flush open stdio streams here, but it's
  // not required; if an attempt is made, locked streams should be
  // skipped rather than waiting for them

  pthread_kill (pthread_self (), SIGABRT);
  // if we get here, SIGABRT is blocked or has a handler that returns

#if 1 // this block is not required nor is anything in it required to work
  // unblocking SIGABRT in this thread is sufficient for its default
  // disposition to terminate the whole process
  sigset_t ss_SIGABRT;
  sigemptyset (&ss_SIGABRT);
  sigaddset (&ss_SIGABRT, SIGABRT);
  pthread_sigmask (SIG_UNBLOCK, &ss_SIGABRT, 0);
  signal (SIGABRT, SIG_DFL);
  pthread_kill (pthread_self (), SIGABRT);
#endif

  for (;;)
    _exit (EX_SOFTWARE);
}

zw

p.s. It also occurs to me that a hypothetical _exit_with_signal
syscall would have other uses than replacing the #if 1 block above:
for instance, it would permit a wrapper program to forward a signal
exit from the wrapped program to its parent with substantially less
fuss than is possible today.
Adhemerval Zanella Netto July 24, 2024, 3:25 p.m. UTC | #2
On 24/07/24 11:40, Zack Weinberg wrote:
> Continuing the discussion from part 1 of your patch here as this message
> provides important context:
> 
> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>> POSIX states that abort should be AS-safe
> ...
> [and now also states]
>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>   does not terminate the process (for example, if the signal is caught
>>   and the handler returns), abort() may change the disposition of
>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>   second signal is sent and it does not terminate the process, the
>>   behavior is unspecified, except that the abort() call shall not
>>   return.
> 
> It occurs to me that the new language does not require us to _succeed_
> in changing the disposition of SIGABRT, which means "but what if other
> threads race with us to change the disposition of SIGABRT?" is moot.
> The only hard requirements that survive are:
> 
> - abort shall cause abnormal termination of the process
> - abort's first attempt to terminate the process shall be by sending
>   the calling thread a catchable and blockable SIGABRT
> - under no circumstances shall abort return to its caller
> - abort shall be AS-safe
> 
> And that means we shouldn't need any locking or new interfaces or anything.
> I claim this is a fully conformant implementation of abort.
> Change my mind.
> 
> #include <pthread.h>
> #include <signal.h>
> #include <stdnoreturn.h>
> #include <sysexits.h>
> #include <unistd.h>
> 
> noreturn void
> abort (void)
> {
>   // could attempt to flush open stdio streams here, but it's
>   // not required; if an attempt is made, locked streams should be
>   // skipped rather than waiting for them
> 
>   pthread_kill (pthread_self (), SIGABRT);
>   // if we get here, SIGABRT is blocked or has a handler that returns
> 
> #if 1 // this block is not required nor is anything in it required to work
>   // unblocking SIGABRT in this thread is sufficient for its default
>   // disposition to terminate the whole process
>   sigset_t ss_SIGABRT;
>   sigemptyset (&ss_SIGABRT);
>   sigaddset (&ss_SIGABRT, SIGABRT);
>   pthread_sigmask (SIG_UNBLOCK, &ss_SIGABRT, 0);
>   signal (SIGABRT, SIG_DFL);
>   pthread_kill (pthread_self (), SIGABRT);
> #endif
> 
>   for (;;)
>     _exit (EX_SOFTWARE);
> }
> 
> zw

This does not solve the issue of concurrent sigaction (SIGABRT), which is
the main point of the locking (and on my patch the locks is taken iff for 
SIGABRT disposition change).

> 
> p.s. It also occurs to me that a hypothetical _exit_with_signal
> syscall would have other uses than replacing the #if 1 block above:
> for instance, it would permit a wrapper program to forward a signal
> exit from the wrapped program to its parent with substantially less
> fuss than is possible today.

A syscall would be a nice thing to have, but not really required.  Also,
making abort async-signal-safe only for newly kernel is far fol ideal.
Zack Weinberg July 24, 2024, 3:30 p.m. UTC | #3
On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
> On 24/07/24 11:40, Zack Weinberg wrote:
>> Continuing the discussion from part 1 of your patch here as this message
>> provides important context:
>> 
>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>> POSIX states that abort should be AS-safe
>> ...
>> [and now also states]
>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>   does not terminate the process (for example, if the signal is caught
>>>   and the handler returns), abort() may change the disposition of
>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>   second signal is sent and it does not terminate the process, the
>>>   behavior is unspecified, except that the abort() call shall not
>>>   return.
>> 
>> It occurs to me that the new language does not require us to _succeed_
>> in changing the disposition of SIGABRT, which means "but what if other
>> threads race with us to change the disposition of SIGABRT?" is moot.
>> The only hard requirements that survive are:
...
> This does not solve the issue of concurrent sigaction (SIGABRT), which is
> the main point of the locking (and on my patch the locks is taken iff for 
> SIGABRT disposition change).

Uh, my whole point was that the new language permits us to ignore that issue.

zw
Adhemerval Zanella Netto July 24, 2024, 3:59 p.m. UTC | #4
On 24/07/24 12:30, Zack Weinberg wrote:
> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 11:40, Zack Weinberg wrote:
>>> Continuing the discussion from part 1 of your patch here as this message
>>> provides important context:
>>>
>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>>> POSIX states that abort should be AS-safe
>>> ...
>>> [and now also states]
>>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>>   does not terminate the process (for example, if the signal is caught
>>>>   and the handler returns), abort() may change the disposition of
>>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>>   second signal is sent and it does not terminate the process, the
>>>>   behavior is unspecified, except that the abort() call shall not
>>>>   return.
>>>
>>> It occurs to me that the new language does not require us to _succeed_
>>> in changing the disposition of SIGABRT, which means "but what if other
>>> threads race with us to change the disposition of SIGABRT?" is moot.
>>> The only hard requirements that survive are:
> ...
>> This does not solve the issue of concurrent sigaction (SIGABRT), which is
>> the main point of the locking (and on my patch the locks is taken iff for 
>> SIGABRT disposition change).
> 
> Uh, my whole point was that the new language permits us to ignore that issue.

Yes, I get it; but again this is nice optimization, but not available *today*.
Zack Weinberg July 24, 2024, 4:01 p.m. UTC | #5
On Wed, Jul 24, 2024, at 11:59 AM, Adhemerval Zanella Netto wrote:
> On 24/07/24 12:30, Zack Weinberg wrote:
>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>>> On 24/07/24 11:40, Zack Weinberg wrote:
>>>> Continuing the discussion from part 1 of your patch here as this message
>>>> provides important context:
>>>>
>>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>>>> POSIX states that abort should be AS-safe
>>>> ...
>>>> [and now also states]
>>>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>>>   does not terminate the process (for example, if the signal is caught
>>>>>   and the handler returns), abort() may change the disposition of
>>>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>>>   second signal is sent and it does not terminate the process, the
>>>>>   behavior is unspecified, except that the abort() call shall not
>>>>>   return.
>>>>
>>>> It occurs to me that the new language does not require us to _succeed_
>>>> in changing the disposition of SIGABRT, which means "but what if other
>>>> threads race with us to change the disposition of SIGABRT?" is moot.
>>>> The only hard requirements that survive are:
>> ...
>>> This does not solve the issue of concurrent sigaction (SIGABRT), which is
>>> the main point of the locking (and on my patch the locks is taken iff for 
>>> SIGABRT disposition change).
>> 
>> Uh, my whole point was that the new language permits us to ignore that issue.
>
> Yes, I get it; but again this is nice optimization, but not available *today*.

Now I'm really confused. Were you not implementing the new language?
Adhemerval Zanella Netto July 24, 2024, 4:05 p.m. UTC | #6
On 24/07/24 13:01, Zack Weinberg wrote:
> On Wed, Jul 24, 2024, at 11:59 AM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 12:30, Zack Weinberg wrote:
>>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>>>> On 24/07/24 11:40, Zack Weinberg wrote:
>>>>> Continuing the discussion from part 1 of your patch here as this message
>>>>> provides important context:
>>>>>
>>>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>>>>> POSIX states that abort should be AS-safe
>>>>> ...
>>>>> [and now also states]
>>>>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>>>>   does not terminate the process (for example, if the signal is caught
>>>>>>   and the handler returns), abort() may change the disposition of
>>>>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>>>>   second signal is sent and it does not terminate the process, the
>>>>>>   behavior is unspecified, except that the abort() call shall not
>>>>>>   return.
>>>>>
>>>>> It occurs to me that the new language does not require us to _succeed_
>>>>> in changing the disposition of SIGABRT, which means "but what if other
>>>>> threads race with us to change the disposition of SIGABRT?" is moot.
>>>>> The only hard requirements that survive are:
>>> ...
>>>> This does not solve the issue of concurrent sigaction (SIGABRT), which is
>>>> the main point of the locking (and on my patch the locks is taken iff for 
>>>> SIGABRT disposition change).
>>>
>>> Uh, my whole point was that the new language permits us to ignore that issue.
>>
>> Yes, I get it; but again this is nice optimization, but not available *today*.
> 
> Now I'm really confused. Were you not implementing the new language? 

My understanding is the new language does not allow the implementation to 
ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return
to the caller in this case.

What I meant as a nice optimization would be a possible exit_signal feature from
the kernel.
Zack Weinberg July 24, 2024, 4:09 p.m. UTC | #7
On Wed, Jul 24, 2024, at 12:05 PM, Adhemerval Zanella Netto wrote:
> On 24/07/24 13:01, Zack Weinberg wrote: 
>> Now I'm really confused. Were you not implementing the new language? 
>
> My understanding is the new language does not allow the implementation to 
> ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return
> to the caller in this case.

That's why I had a for (;;) _exit(nonzero) backstop.

zw
Adhemerval Zanella Netto July 24, 2024, 4:18 p.m. UTC | #8
On 24/07/24 13:09, Zack Weinberg wrote:
> 
> 
> On Wed, Jul 24, 2024, at 12:05 PM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 13:01, Zack Weinberg wrote: 
>>> Now I'm really confused. Were you not implementing the new language? 
>>
>> My understanding is the new language does not allow the implementation to 
>> ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return
>> to the caller in this case.
> 
> That's why I had a for (;;) _exit(nonzero) backstop.

The current code also has a exit and ABORT_INSTRUCTION; my understanding it is
a last resort since the parent will see a different termination state.  The lock
give us a better QoI, which process will be always terminated with SIGABRT.
Florian Weimer July 24, 2024, 4:37 p.m. UTC | #9
* Zack Weinberg:

> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 11:40, Zack Weinberg wrote:
>>> Continuing the discussion from part 1 of your patch here as this message
>>> provides important context:
>>> 
>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>>> POSIX states that abort should be AS-safe
>>> ...
>>> [and now also states]
>>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>>   does not terminate the process (for example, if the signal is caught
>>>>   and the handler returns), abort() may change the disposition of
>>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>>   second signal is sent and it does not terminate the process, the
>>>>   behavior is unspecified, except that the abort() call shall not
>>>>   return.
>>> 
>>> It occurs to me that the new language does not require us to _succeed_
>>> in changing the disposition of SIGABRT, which means "but what if other
>>> threads race with us to change the disposition of SIGABRT?" is moot.
>>> The only hard requirements that survive are:
> ...
>> This does not solve the issue of concurrent sigaction (SIGABRT), which is
>> the main point of the locking (and on my patch the locks is taken iff for 
>> SIGABRT disposition change).
>
> Uh, my whole point was that the new language permits us to ignore that issue.

Indeed.  And it seems to suggest that it's okay if a concurrently
running execve picks up the SIG_DFL change.  The old POSIX language
doesn't mention the SIG_DFL change, so I suppose the current
abort/execve behavior (and the new one after this patch) was not allowed
before.

Thanks,
Florian
Adhemerval Zanella Netto July 24, 2024, 4:57 p.m. UTC | #10
On 24/07/24 13:37, Florian Weimer wrote:
> * Zack Weinberg:
> 
>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>>> On 24/07/24 11:40, Zack Weinberg wrote:
>>>> Continuing the discussion from part 1 of your patch here as this message
>>>> provides important context:
>>>>
>>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote:
>>>>> POSIX states that abort should be AS-safe
>>>> ...
>>>> [and now also states]
>>>>>   The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as
>>>>>   if by means of raise() with the argument SIGABRT. [CX]If this signal
>>>>>   does not terminate the process (for example, if the signal is caught
>>>>>   and the handler returns), abort() may change the disposition of
>>>>>   SIGABRT to SIG_DFL and send the signal (in the same way) again. If a
>>>>>   second signal is sent and it does not terminate the process, the
>>>>>   behavior is unspecified, except that the abort() call shall not
>>>>>   return.
>>>>
>>>> It occurs to me that the new language does not require us to _succeed_
>>>> in changing the disposition of SIGABRT, which means "but what if other
>>>> threads race with us to change the disposition of SIGABRT?" is moot.
>>>> The only hard requirements that survive are:
>> ...
>>> This does not solve the issue of concurrent sigaction (SIGABRT), which is
>>> the main point of the locking (and on my patch the locks is taken iff for 
>>> SIGABRT disposition change).
>>
>> Uh, my whole point was that the new language permits us to ignore that issue.
> 
> Indeed.  And it seems to suggest that it's okay if a concurrently
> running execve picks up the SIG_DFL change.  The old POSIX language
> doesn't mention the SIG_DFL change, so I suppose the current
> abort/execve behavior (and the new one after this patch) was not allowed
> before.

So should we ignore concurrent sigaction for abort?  It does simplify the
implementation without the lock, although I am not sure if this is really
an improvement over current code regarding how parent seems the termination
state.
Zack Weinberg July 24, 2024, 5:19 p.m. UTC | #11
On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote:
> On 24/07/24 13:37, Florian Weimer wrote:
>> * Zack Weinberg:
>>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>>>> This does not solve the issue of concurrent sigaction (SIGABRT),
>>>> which is the main point of the locking (and on my patch the locks
>>>> is taken iff for SIGABRT disposition change).
>>>
>>> Uh, my whole point was that the new language permits us to ignore
>>> that issue.
>>
>> Indeed.  And it seems to suggest that it's okay if a concurrently
>> running execve picks up the SIG_DFL change.  The old POSIX language
>> doesn't mention the SIG_DFL change, so I suppose the current
>> abort/execve behavior (and the new one after this patch) was not
>> allowed before.
>
> So should we ignore concurrent sigaction for abort?  It does simplify
> the implementation without the lock, although I am not sure if this is
> really an improvement over current code regarding how parent seems the
> termination state.

Just my opinion, but we know for a fact that abort not being AS-safe is
a problem, and we know that in the absence of a new system call we
cannot avoid *some* infelicities in behavior when abort races with fork,
execve, or sigaction.  But handling, ignoring, or blocking SIGABRT is
rare, and people writing multithreaded programs should know they have to
be careful about changing signal dispositions in the middle of the run.

I think we don't need to spend a lot of time worrying about exactly what
happens when there's a race, as long as the process that called abort()
definitely does exit.  In particular I really don't think it matters if
the termination state indicates a nonzero _exit rather than a signal under
these rare circumstances.  And I think avoiding all locks in abort is
the best way to make it AS-safe without causing other problems.

zw
Adhemerval Zanella Netto July 24, 2024, 5:45 p.m. UTC | #12
On 24/07/24 14:19, Zack Weinberg wrote:
> On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 13:37, Florian Weimer wrote:
>>> * Zack Weinberg:
>>>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
>>>>> This does not solve the issue of concurrent sigaction (SIGABRT),
>>>>> which is the main point of the locking (and on my patch the locks
>>>>> is taken iff for SIGABRT disposition change).
>>>>
>>>> Uh, my whole point was that the new language permits us to ignore
>>>> that issue.
>>>
>>> Indeed.  And it seems to suggest that it's okay if a concurrently
>>> running execve picks up the SIG_DFL change.  The old POSIX language
>>> doesn't mention the SIG_DFL change, so I suppose the current
>>> abort/execve behavior (and the new one after this patch) was not
>>> allowed before.
>>
>> So should we ignore concurrent sigaction for abort?  It does simplify
>> the implementation without the lock, although I am not sure if this is
>> really an improvement over current code regarding how parent seems the
>> termination state.
> 
> Just my opinion, but we know for a fact that abort not being AS-safe is
> a problem, and we know that in the absence of a new system call we
> cannot avoid *some* infelicities in behavior when abort races with fork,
> execve, or sigaction.  But handling, ignoring, or blocking SIGABRT is
> rare, and people writing multithreaded programs should know they have to
> be careful about changing signal dispositions in the middle of the run.
> 
> I think we don't need to spend a lot of time worrying about exactly what
> happens when there's a race, as long as the process that called abort()
> definitely does exit.  In particular I really don't think it matters if
> the termination state indicates a nonzero _exit rather than a signal under
> these rare circumstances.  And I think avoiding all locks in abort is
> the best way to make it AS-safe without causing other problems.

Another issue is abort() will usually generate a coredump, and this is the
main reason we add a reference of __abort_msg on abort TU.  Removing the
signal disposition change along with the lock and just calling _exit
means not only a change in parent termination view, but termination without
a core for calls that caller might expect one.

This is the current behavior, I am not sure if changing for a more relaxed
one so we can remove a lock is really an improvement here.
enh July 24, 2024, 5:49 p.m. UTC | #13
fwiw, the BSDs (and thus Android) basically just unblock SIGABRT,
raise, restore the default disposition, raise, then exit. Android's
had no bug reports from that. (though full disclosure: Android's a bit
weird in that by default all processes have a crashdump-generating
SIGABRT handler installed.)

(if i were going to change anything in bionic, it would probably be to
try the raise before unblocking SIGABRT, on the assumption it's not
usually blocked...)

On Wed, Jul 24, 2024 at 1:20 PM Zack Weinberg <zack@owlfolio.org> wrote:
>
> On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote:
> > On 24/07/24 13:37, Florian Weimer wrote:
> >> * Zack Weinberg:
> >>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote:
> >>>> This does not solve the issue of concurrent sigaction (SIGABRT),
> >>>> which is the main point of the locking (and on my patch the locks
> >>>> is taken iff for SIGABRT disposition change).
> >>>
> >>> Uh, my whole point was that the new language permits us to ignore
> >>> that issue.
> >>
> >> Indeed.  And it seems to suggest that it's okay if a concurrently
> >> running execve picks up the SIG_DFL change.  The old POSIX language
> >> doesn't mention the SIG_DFL change, so I suppose the current
> >> abort/execve behavior (and the new one after this patch) was not
> >> allowed before.
> >
> > So should we ignore concurrent sigaction for abort?  It does simplify
> > the implementation without the lock, although I am not sure if this is
> > really an improvement over current code regarding how parent seems the
> > termination state.
>
> Just my opinion, but we know for a fact that abort not being AS-safe is
> a problem, and we know that in the absence of a new system call we
> cannot avoid *some* infelicities in behavior when abort races with fork,
> execve, or sigaction.  But handling, ignoring, or blocking SIGABRT is
> rare, and people writing multithreaded programs should know they have to
> be careful about changing signal dispositions in the middle of the run.
>
> I think we don't need to spend a lot of time worrying about exactly what
> happens when there's a race, as long as the process that called abort()
> definitely does exit.  In particular I really don't think it matters if
> the termination state indicates a nonzero _exit rather than a signal under
> these rare circumstances.  And I think avoiding all locks in abort is
> the best way to make it AS-safe without causing other problems.
>
> zw
Zack Weinberg July 24, 2024, 5:57 p.m. UTC | #14
On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote:
> On 24/07/24 14:19, Zack Weinberg wrote:
>> I think we don't need to spend a lot of time worrying about exactly
>> what happens when there's a race, as long as the process that called
>> abort() definitely does exit.  In particular I really don't think it
>> matters if the termination state indicates a nonzero _exit rather
>> than a signal under these rare circumstances.  And I think avoiding
>> all locks in abort is the best way to make it AS-safe without causing
>> other problems.
>
> Another issue is abort() will usually generate a coredump, and this is
> the main reason we add a reference of __abort_msg on abort TU.
> Removing the signal disposition change along with the lock and just
> calling _exit means not only a change in parent termination view, but
> termination without a core for calls that caller might expect one.

Well, again, this is under rare circumstances.  To hit the _exit, the
program has to suppress the initial raise(SIGABRT) and then win a race
against the abort implementation trying to reset the mask and disposition.
(To be clear, I am *not* proposing we drop the attempt to reset the mask
and disposition, I am only proposing that we not bother with any locking
around that attempt.)  It is unusual to suppress SIGABRT in the first
place; it is unusual to change signal dispositions in the middle of the
program; abort is only called when the program is already
malfunctioning; the cumulative probability of hitting the _exit should
be so small, both before and after the change, that we can live with not
getting core dumps when it does happen.

(It's *not* unusual to change the *mask* in the middle of the program,
but pthread_sigmask lets us unblock SIGABRT for just the thread that's
calling abort, and any multithreaded program that uses sigprocmask while
multiple threads are active has only itself to blame.)

zw
Adhemerval Zanella Netto July 24, 2024, 6:16 p.m. UTC | #15
On 24/07/24 14:57, Zack Weinberg wrote:
> On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 14:19, Zack Weinberg wrote:
>>> I think we don't need to spend a lot of time worrying about exactly
>>> what happens when there's a race, as long as the process that called
>>> abort() definitely does exit.  In particular I really don't think it
>>> matters if the termination state indicates a nonzero _exit rather
>>> than a signal under these rare circumstances.  And I think avoiding
>>> all locks in abort is the best way to make it AS-safe without causing
>>> other problems.
>>
>> Another issue is abort() will usually generate a coredump, and this is
>> the main reason we add a reference of __abort_msg on abort TU.
>> Removing the signal disposition change along with the lock and just
>> calling _exit means not only a change in parent termination view, but
>> termination without a core for calls that caller might expect one.
> 
> Well, again, this is under rare circumstances.  To hit the _exit, the
> program has to suppress the initial raise(SIGABRT) and then win a race
> against the abort implementation trying to reset the mask and disposition.
> (To be clear, I am *not* proposing we drop the attempt to reset the mask
> and disposition, I am only proposing that we not bother with any locking
> around that attempt.)  It is unusual to suppress SIGABRT in the first
> place; it is unusual to change signal dispositions in the middle of the
> program; abort is only called when the program is already
> malfunctioning; the cumulative probability of hitting the _exit should
> be so small, both before and after the change, that we can live with not
> getting core dumps when it does happen.
> 
> (It's *not* unusual to change the *mask* in the middle of the program,
> but pthread_sigmask lets us unblock SIGABRT for just the thread that's
> calling abort, and any multithreaded program that uses sigprocmask while
> multiple threads are active has only itself to blame.)

That is essentially the BSD strategy, as noted by enh; where it is still racy
but I give that the possible windows would quite narrow.

The main problem with the BSD approach for glibc is in fact setjmp, as I 
noted in the first patch.  It means that usages like we do on
debug/tst-fortify.c will always abort:


  static volatile int chk_fail_ok;
  static jmp_buf chk_fail_buf;

  static void
  handler (int sig)
  {
    if (chk_fail_ok)
      {
        chk_fail_ok = 0;
        longjmp (chk_fail_buf, 1);
      }
    else
      _exit (127);
  }
  [...]
  signal (SIGABRT, handler);
  [....]
  chk_fail_ok = 1;
  if (! setjmp (chk_fail_buf))
    {
      // Something that can call abort, like a failed fortify function.
      chk_fail_ok = 0;
      printf ("FAIL\n");
    }
  
We can ignore it and suggest possible callers that to just use sigsetjmp 
instead.  

So the options we have are:

  1. This patch approach, which enforces the SIGABRT termination semantic
     and uses a lock.  The setjmp change is optional.

  2. The BSD way, which is similar to 1. with the only difference it does
     not use a lock.  There still a race windows, albeit small.

  3. Your suggestion to just unblock the signal and raise SIGABRT; and 
     fallback to exit if the handler returns.  There is no lock, but there
     is not guarantee that parent will see a abnormal terminal also.

I still prefer 1. since it does not have any race condition and the lock is
only exercised on abnormal program termination anyway.  I don't have a strong
opinion about the setjmp semantic change, and I can drop the first patch is
we are ok about it.
enh July 24, 2024, 6:33 p.m. UTC | #16
On Wed, Jul 24, 2024 at 2:17 PM Adhemerval Zanella Netto
<adhemerval.zanella@linaro.org> wrote:
>
>
>
> On 24/07/24 14:57, Zack Weinberg wrote:
> > On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote:
> >> On 24/07/24 14:19, Zack Weinberg wrote:
> >>> I think we don't need to spend a lot of time worrying about exactly
> >>> what happens when there's a race, as long as the process that called
> >>> abort() definitely does exit.  In particular I really don't think it
> >>> matters if the termination state indicates a nonzero _exit rather
> >>> than a signal under these rare circumstances.  And I think avoiding
> >>> all locks in abort is the best way to make it AS-safe without causing
> >>> other problems.
> >>
> >> Another issue is abort() will usually generate a coredump, and this is
> >> the main reason we add a reference of __abort_msg on abort TU.
> >> Removing the signal disposition change along with the lock and just
> >> calling _exit means not only a change in parent termination view, but
> >> termination without a core for calls that caller might expect one.
> >
> > Well, again, this is under rare circumstances.  To hit the _exit, the
> > program has to suppress the initial raise(SIGABRT) and then win a race
> > against the abort implementation trying to reset the mask and disposition.
> > (To be clear, I am *not* proposing we drop the attempt to reset the mask
> > and disposition, I am only proposing that we not bother with any locking
> > around that attempt.)  It is unusual to suppress SIGABRT in the first
> > place; it is unusual to change signal dispositions in the middle of the
> > program; abort is only called when the program is already
> > malfunctioning; the cumulative probability of hitting the _exit should
> > be so small, both before and after the change, that we can live with not
> > getting core dumps when it does happen.
> >
> > (It's *not* unusual to change the *mask* in the middle of the program,
> > but pthread_sigmask lets us unblock SIGABRT for just the thread that's
> > calling abort, and any multithreaded program that uses sigprocmask while
> > multiple threads are active has only itself to blame.)
>
> That is essentially the BSD strategy, as noted by enh; where it is still racy
> but I give that the possible windows would quite narrow.
>
> The main problem with the BSD approach for glibc is in fact setjmp, as I
> noted in the first patch.

yeah, that's actually another BSD difference that Android shares ---
for us, setjmp() == sigsetjmp(..., 1) (and there's a
_setjmp()/_longjmp() if you _don't_ want the signal mask).

>  It means that usages like we do on
> debug/tst-fortify.c will always abort:
>
>
>   static volatile int chk_fail_ok;
>   static jmp_buf chk_fail_buf;
>
>   static void
>   handler (int sig)
>   {
>     if (chk_fail_ok)
>       {
>         chk_fail_ok = 0;
>         longjmp (chk_fail_buf, 1);
>       }
>     else
>       _exit (127);
>   }
>   [...]
>   signal (SIGABRT, handler);
>   [....]
>   chk_fail_ok = 1;
>   if (! setjmp (chk_fail_buf))
>     {
>       // Something that can call abort, like a failed fortify function.
>       chk_fail_ok = 0;
>       printf ("FAIL\n");
>     }
>
> We can ignore it and suggest possible callers that to just use sigsetjmp
> instead.
>
> So the options we have are:
>
>   1. This patch approach, which enforces the SIGABRT termination semantic
>      and uses a lock.  The setjmp change is optional.
>
>   2. The BSD way, which is similar to 1. with the only difference it does
>      not use a lock.  There still a race windows, albeit small.
>
>   3. Your suggestion to just unblock the signal and raise SIGABRT; and
>      fallback to exit if the handler returns.  There is no lock, but there
>      is not guarantee that parent will see a abnormal terminal also.
>
> I still prefer 1. since it does not have any race condition and the lock is
> only exercised on abnormal program termination anyway.  I don't have a strong
> opinion about the setjmp semantic change, and I can drop the first patch is
> we are ok about it.
Zack Weinberg July 24, 2024, 6:39 p.m. UTC | #17
On Wed, Jul 24, 2024, at 2:16 PM, Adhemerval Zanella Netto wrote:
...
> The main problem with the BSD approach for glibc is in fact setjmp, as I 
> noted in the first patch.  It means that usages like we do on
> debug/tst-fortify.c will always abort:
...

I think I'm okay with saying that any  longjmp out of a signal handler really ought to be sigsetjmp/siglongjmp. Unlike Florian I would also be OK with changing plain setjmp/longjmp to restore the signal mask; as I said before, I think that's more likely to make existing problems go away than introduce new ones. Or we could split the difference and introduce a new feature test macro,  _LONGJMP_RESTORE_SIGMASK perhaps.

...
>   2. The BSD way, which is similar to 1. with the only difference it does
>      not use a lock.  There still a race windows, albeit small.
>
>   3. Your suggestion to just unblock the signal and raise SIGABRT; and 
>      fallback to exit if the handler returns.  There is no lock, but there
>      is not guarantee that parent will see a abnormal terminal also.

Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far.

zw
Adhemerval Zanella Netto July 24, 2024, 6:52 p.m. UTC | #18
On 24/07/24 15:39, Zack Weinberg wrote:
> On Wed, Jul 24, 2024, at 2:16 PM, Adhemerval Zanella Netto wrote:
> ...
>> The main problem with the BSD approach for glibc is in fact setjmp, as I 
>> noted in the first patch.  It means that usages like we do on
>> debug/tst-fortify.c will always abort:
> ...
> 
> I think I'm okay with saying that any  longjmp out of a signal handler really ought to be sigsetjmp/siglongjmp. Unlike Florian I would also be OK with changing plain setjmp/longjmp to restore the signal mask; as I said before, I think that's more likely to make existing problems go away than introduce new ones. Or we could split the difference and introduce a new feature test macro,  _LONGJMP_RESTORE_SIGMASK perhaps.
> 
> ...
>>   2. The BSD way, which is similar to 1. with the only difference it does
>>      not use a lock.  There still a race windows, albeit small.
>>
>>   3. Your suggestion to just unblock the signal and raise SIGABRT; and 
>>      fallback to exit if the handler returns.  There is no lock, but there
>>      is not guarantee that parent will see a abnormal terminal also.
> 
> Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far.

Right, so essentially current patch without the lock.  I still think the lock
is really cheap and it helps on other very unlikely corner cases (like the
posix_spawn just after the new signal disposition), and make abort without
any race condition.
Zack Weinberg July 24, 2024, 9:01 p.m. UTC | #19
On Wed, Jul 24, 2024, at 2:52 PM, Adhemerval Zanella Netto wrote:
> On 24/07/24 15:39, Zack Weinberg wrote:
>> Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far.
>
> Right, so essentially current patch without the lock.  I still think the lock
> is really cheap and it helps on other very unlikely corner cases

But the lock is the thing that makes abort not AS-safe!

zw
Adhemerval Zanella Netto July 24, 2024, 9:35 p.m. UTC | #20
> On 24 Jul 2024, at 18:01, Zack Weinberg <zack@owlfolio.org> wrote:
> On Wed, Jul 24, 2024, at 2:52 PM, Adhemerval Zanella Netto wrote:
>> On 24/07/24 15:39, Zack Weinberg wrote:
>>> Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far.
>> 
>> Right, so essentially current patch without the lock.  I still think the lock
>> is really cheap and it helps on other very unlikely corner cases
> 
> But the lock is the thing that makes abort not AS-safe!
> 
> zw

The *recursive* lock is the main issue, where the current implementation tries to mitigate with a somewhat complex state machine. By blocking all signals and then taking the locking, we make sure there is no need to handle reentrancy.