Message ID | 20240723134235.1520483-1-adhemerval.zanella@linaro.org |
---|---|
Headers | show |
Series | Make abort AS-safe | expand |
Continuing the discussion from part 1 of your patch here as this message provides important context: On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: > POSIX states that abort should be AS-safe ... [and now also states] > The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as > if by means of raise() with the argument SIGABRT. [CX]If this signal > does not terminate the process (for example, if the signal is caught > and the handler returns), abort() may change the disposition of > SIGABRT to SIG_DFL and send the signal (in the same way) again. If a > second signal is sent and it does not terminate the process, the > behavior is unspecified, except that the abort() call shall not > return. It occurs to me that the new language does not require us to _succeed_ in changing the disposition of SIGABRT, which means "but what if other threads race with us to change the disposition of SIGABRT?" is moot. The only hard requirements that survive are: - abort shall cause abnormal termination of the process - abort's first attempt to terminate the process shall be by sending the calling thread a catchable and blockable SIGABRT - under no circumstances shall abort return to its caller - abort shall be AS-safe And that means we shouldn't need any locking or new interfaces or anything. I claim this is a fully conformant implementation of abort. Change my mind. #include <pthread.h> #include <signal.h> #include <stdnoreturn.h> #include <sysexits.h> #include <unistd.h> noreturn void abort (void) { // could attempt to flush open stdio streams here, but it's // not required; if an attempt is made, locked streams should be // skipped rather than waiting for them pthread_kill (pthread_self (), SIGABRT); // if we get here, SIGABRT is blocked or has a handler that returns #if 1 // this block is not required nor is anything in it required to work // unblocking SIGABRT in this thread is sufficient for its default // disposition to terminate the whole process sigset_t ss_SIGABRT; sigemptyset (&ss_SIGABRT); sigaddset (&ss_SIGABRT, SIGABRT); pthread_sigmask (SIG_UNBLOCK, &ss_SIGABRT, 0); signal (SIGABRT, SIG_DFL); pthread_kill (pthread_self (), SIGABRT); #endif for (;;) _exit (EX_SOFTWARE); } zw p.s. It also occurs to me that a hypothetical _exit_with_signal syscall would have other uses than replacing the #if 1 block above: for instance, it would permit a wrapper program to forward a signal exit from the wrapped program to its parent with substantially less fuss than is possible today.
On 24/07/24 11:40, Zack Weinberg wrote: > Continuing the discussion from part 1 of your patch here as this message > provides important context: > > On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >> POSIX states that abort should be AS-safe > ... > [and now also states] >> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >> if by means of raise() with the argument SIGABRT. [CX]If this signal >> does not terminate the process (for example, if the signal is caught >> and the handler returns), abort() may change the disposition of >> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >> second signal is sent and it does not terminate the process, the >> behavior is unspecified, except that the abort() call shall not >> return. > > It occurs to me that the new language does not require us to _succeed_ > in changing the disposition of SIGABRT, which means "but what if other > threads race with us to change the disposition of SIGABRT?" is moot. > The only hard requirements that survive are: > > - abort shall cause abnormal termination of the process > - abort's first attempt to terminate the process shall be by sending > the calling thread a catchable and blockable SIGABRT > - under no circumstances shall abort return to its caller > - abort shall be AS-safe > > And that means we shouldn't need any locking or new interfaces or anything. > I claim this is a fully conformant implementation of abort. > Change my mind. > > #include <pthread.h> > #include <signal.h> > #include <stdnoreturn.h> > #include <sysexits.h> > #include <unistd.h> > > noreturn void > abort (void) > { > // could attempt to flush open stdio streams here, but it's > // not required; if an attempt is made, locked streams should be > // skipped rather than waiting for them > > pthread_kill (pthread_self (), SIGABRT); > // if we get here, SIGABRT is blocked or has a handler that returns > > #if 1 // this block is not required nor is anything in it required to work > // unblocking SIGABRT in this thread is sufficient for its default > // disposition to terminate the whole process > sigset_t ss_SIGABRT; > sigemptyset (&ss_SIGABRT); > sigaddset (&ss_SIGABRT, SIGABRT); > pthread_sigmask (SIG_UNBLOCK, &ss_SIGABRT, 0); > signal (SIGABRT, SIG_DFL); > pthread_kill (pthread_self (), SIGABRT); > #endif > > for (;;) > _exit (EX_SOFTWARE); > } > > zw This does not solve the issue of concurrent sigaction (SIGABRT), which is the main point of the locking (and on my patch the locks is taken iff for SIGABRT disposition change). > > p.s. It also occurs to me that a hypothetical _exit_with_signal > syscall would have other uses than replacing the #if 1 block above: > for instance, it would permit a wrapper program to forward a signal > exit from the wrapped program to its parent with substantially less > fuss than is possible today. A syscall would be a nice thing to have, but not really required. Also, making abort async-signal-safe only for newly kernel is far fol ideal.
On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: > On 24/07/24 11:40, Zack Weinberg wrote: >> Continuing the discussion from part 1 of your patch here as this message >> provides important context: >> >> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>> POSIX states that abort should be AS-safe >> ... >> [and now also states] >>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>> does not terminate the process (for example, if the signal is caught >>> and the handler returns), abort() may change the disposition of >>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>> second signal is sent and it does not terminate the process, the >>> behavior is unspecified, except that the abort() call shall not >>> return. >> >> It occurs to me that the new language does not require us to _succeed_ >> in changing the disposition of SIGABRT, which means "but what if other >> threads race with us to change the disposition of SIGABRT?" is moot. >> The only hard requirements that survive are: ... > This does not solve the issue of concurrent sigaction (SIGABRT), which is > the main point of the locking (and on my patch the locks is taken iff for > SIGABRT disposition change). Uh, my whole point was that the new language permits us to ignore that issue. zw
On 24/07/24 12:30, Zack Weinberg wrote: > On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >> On 24/07/24 11:40, Zack Weinberg wrote: >>> Continuing the discussion from part 1 of your patch here as this message >>> provides important context: >>> >>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>>> POSIX states that abort should be AS-safe >>> ... >>> [and now also states] >>>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>>> does not terminate the process (for example, if the signal is caught >>>> and the handler returns), abort() may change the disposition of >>>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>>> second signal is sent and it does not terminate the process, the >>>> behavior is unspecified, except that the abort() call shall not >>>> return. >>> >>> It occurs to me that the new language does not require us to _succeed_ >>> in changing the disposition of SIGABRT, which means "but what if other >>> threads race with us to change the disposition of SIGABRT?" is moot. >>> The only hard requirements that survive are: > ... >> This does not solve the issue of concurrent sigaction (SIGABRT), which is >> the main point of the locking (and on my patch the locks is taken iff for >> SIGABRT disposition change). > > Uh, my whole point was that the new language permits us to ignore that issue. Yes, I get it; but again this is nice optimization, but not available *today*.
On Wed, Jul 24, 2024, at 11:59 AM, Adhemerval Zanella Netto wrote: > On 24/07/24 12:30, Zack Weinberg wrote: >> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >>> On 24/07/24 11:40, Zack Weinberg wrote: >>>> Continuing the discussion from part 1 of your patch here as this message >>>> provides important context: >>>> >>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>>>> POSIX states that abort should be AS-safe >>>> ... >>>> [and now also states] >>>>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>>>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>>>> does not terminate the process (for example, if the signal is caught >>>>> and the handler returns), abort() may change the disposition of >>>>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>>>> second signal is sent and it does not terminate the process, the >>>>> behavior is unspecified, except that the abort() call shall not >>>>> return. >>>> >>>> It occurs to me that the new language does not require us to _succeed_ >>>> in changing the disposition of SIGABRT, which means "but what if other >>>> threads race with us to change the disposition of SIGABRT?" is moot. >>>> The only hard requirements that survive are: >> ... >>> This does not solve the issue of concurrent sigaction (SIGABRT), which is >>> the main point of the locking (and on my patch the locks is taken iff for >>> SIGABRT disposition change). >> >> Uh, my whole point was that the new language permits us to ignore that issue. > > Yes, I get it; but again this is nice optimization, but not available *today*. Now I'm really confused. Were you not implementing the new language?
On 24/07/24 13:01, Zack Weinberg wrote: > On Wed, Jul 24, 2024, at 11:59 AM, Adhemerval Zanella Netto wrote: >> On 24/07/24 12:30, Zack Weinberg wrote: >>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >>>> On 24/07/24 11:40, Zack Weinberg wrote: >>>>> Continuing the discussion from part 1 of your patch here as this message >>>>> provides important context: >>>>> >>>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>>>>> POSIX states that abort should be AS-safe >>>>> ... >>>>> [and now also states] >>>>>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>>>>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>>>>> does not terminate the process (for example, if the signal is caught >>>>>> and the handler returns), abort() may change the disposition of >>>>>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>>>>> second signal is sent and it does not terminate the process, the >>>>>> behavior is unspecified, except that the abort() call shall not >>>>>> return. >>>>> >>>>> It occurs to me that the new language does not require us to _succeed_ >>>>> in changing the disposition of SIGABRT, which means "but what if other >>>>> threads race with us to change the disposition of SIGABRT?" is moot. >>>>> The only hard requirements that survive are: >>> ... >>>> This does not solve the issue of concurrent sigaction (SIGABRT), which is >>>> the main point of the locking (and on my patch the locks is taken iff for >>>> SIGABRT disposition change). >>> >>> Uh, my whole point was that the new language permits us to ignore that issue. >> >> Yes, I get it; but again this is nice optimization, but not available *today*. > > Now I'm really confused. Were you not implementing the new language? My understanding is the new language does not allow the implementation to ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return to the caller in this case. What I meant as a nice optimization would be a possible exit_signal feature from the kernel.
On Wed, Jul 24, 2024, at 12:05 PM, Adhemerval Zanella Netto wrote: > On 24/07/24 13:01, Zack Weinberg wrote: >> Now I'm really confused. Were you not implementing the new language? > > My understanding is the new language does not allow the implementation to > ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return > to the caller in this case. That's why I had a for (;;) _exit(nonzero) backstop. zw
On 24/07/24 13:09, Zack Weinberg wrote: > > > On Wed, Jul 24, 2024, at 12:05 PM, Adhemerval Zanella Netto wrote: >> On 24/07/24 13:01, Zack Weinberg wrote: >>> Now I'm really confused. Were you not implementing the new language? >> >> My understanding is the new language does not allow the implementation to >> ignore the issue of concurrent sigaction (SIGABRT), because abort *might* return >> to the caller in this case. > > That's why I had a for (;;) _exit(nonzero) backstop. The current code also has a exit and ABORT_INSTRUCTION; my understanding it is a last resort since the parent will see a different termination state. The lock give us a better QoI, which process will be always terminated with SIGABRT.
* Zack Weinberg: > On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >> On 24/07/24 11:40, Zack Weinberg wrote: >>> Continuing the discussion from part 1 of your patch here as this message >>> provides important context: >>> >>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>>> POSIX states that abort should be AS-safe >>> ... >>> [and now also states] >>>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>>> does not terminate the process (for example, if the signal is caught >>>> and the handler returns), abort() may change the disposition of >>>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>>> second signal is sent and it does not terminate the process, the >>>> behavior is unspecified, except that the abort() call shall not >>>> return. >>> >>> It occurs to me that the new language does not require us to _succeed_ >>> in changing the disposition of SIGABRT, which means "but what if other >>> threads race with us to change the disposition of SIGABRT?" is moot. >>> The only hard requirements that survive are: > ... >> This does not solve the issue of concurrent sigaction (SIGABRT), which is >> the main point of the locking (and on my patch the locks is taken iff for >> SIGABRT disposition change). > > Uh, my whole point was that the new language permits us to ignore that issue. Indeed. And it seems to suggest that it's okay if a concurrently running execve picks up the SIG_DFL change. The old POSIX language doesn't mention the SIG_DFL change, so I suppose the current abort/execve behavior (and the new one after this patch) was not allowed before. Thanks, Florian
On 24/07/24 13:37, Florian Weimer wrote: > * Zack Weinberg: > >> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >>> On 24/07/24 11:40, Zack Weinberg wrote: >>>> Continuing the discussion from part 1 of your patch here as this message >>>> provides important context: >>>> >>>> On Tue, Jul 23, 2024, at 9:41 AM, Adhemerval Zanella wrote: >>>>> POSIX states that abort should be AS-safe >>>> ... >>>> [and now also states] >>>>> The SIGABRT signal shall be sent to the calling [CX]thread[/CX] as >>>>> if by means of raise() with the argument SIGABRT. [CX]If this signal >>>>> does not terminate the process (for example, if the signal is caught >>>>> and the handler returns), abort() may change the disposition of >>>>> SIGABRT to SIG_DFL and send the signal (in the same way) again. If a >>>>> second signal is sent and it does not terminate the process, the >>>>> behavior is unspecified, except that the abort() call shall not >>>>> return. >>>> >>>> It occurs to me that the new language does not require us to _succeed_ >>>> in changing the disposition of SIGABRT, which means "but what if other >>>> threads race with us to change the disposition of SIGABRT?" is moot. >>>> The only hard requirements that survive are: >> ... >>> This does not solve the issue of concurrent sigaction (SIGABRT), which is >>> the main point of the locking (and on my patch the locks is taken iff for >>> SIGABRT disposition change). >> >> Uh, my whole point was that the new language permits us to ignore that issue. > > Indeed. And it seems to suggest that it's okay if a concurrently > running execve picks up the SIG_DFL change. The old POSIX language > doesn't mention the SIG_DFL change, so I suppose the current > abort/execve behavior (and the new one after this patch) was not allowed > before. So should we ignore concurrent sigaction for abort? It does simplify the implementation without the lock, although I am not sure if this is really an improvement over current code regarding how parent seems the termination state.
On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote: > On 24/07/24 13:37, Florian Weimer wrote: >> * Zack Weinberg: >>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >>>> This does not solve the issue of concurrent sigaction (SIGABRT), >>>> which is the main point of the locking (and on my patch the locks >>>> is taken iff for SIGABRT disposition change). >>> >>> Uh, my whole point was that the new language permits us to ignore >>> that issue. >> >> Indeed. And it seems to suggest that it's okay if a concurrently >> running execve picks up the SIG_DFL change. The old POSIX language >> doesn't mention the SIG_DFL change, so I suppose the current >> abort/execve behavior (and the new one after this patch) was not >> allowed before. > > So should we ignore concurrent sigaction for abort? It does simplify > the implementation without the lock, although I am not sure if this is > really an improvement over current code regarding how parent seems the > termination state. Just my opinion, but we know for a fact that abort not being AS-safe is a problem, and we know that in the absence of a new system call we cannot avoid *some* infelicities in behavior when abort races with fork, execve, or sigaction. But handling, ignoring, or blocking SIGABRT is rare, and people writing multithreaded programs should know they have to be careful about changing signal dispositions in the middle of the run. I think we don't need to spend a lot of time worrying about exactly what happens when there's a race, as long as the process that called abort() definitely does exit. In particular I really don't think it matters if the termination state indicates a nonzero _exit rather than a signal under these rare circumstances. And I think avoiding all locks in abort is the best way to make it AS-safe without causing other problems. zw
On 24/07/24 14:19, Zack Weinberg wrote: > On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote: >> On 24/07/24 13:37, Florian Weimer wrote: >>> * Zack Weinberg: >>>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: >>>>> This does not solve the issue of concurrent sigaction (SIGABRT), >>>>> which is the main point of the locking (and on my patch the locks >>>>> is taken iff for SIGABRT disposition change). >>>> >>>> Uh, my whole point was that the new language permits us to ignore >>>> that issue. >>> >>> Indeed. And it seems to suggest that it's okay if a concurrently >>> running execve picks up the SIG_DFL change. The old POSIX language >>> doesn't mention the SIG_DFL change, so I suppose the current >>> abort/execve behavior (and the new one after this patch) was not >>> allowed before. >> >> So should we ignore concurrent sigaction for abort? It does simplify >> the implementation without the lock, although I am not sure if this is >> really an improvement over current code regarding how parent seems the >> termination state. > > Just my opinion, but we know for a fact that abort not being AS-safe is > a problem, and we know that in the absence of a new system call we > cannot avoid *some* infelicities in behavior when abort races with fork, > execve, or sigaction. But handling, ignoring, or blocking SIGABRT is > rare, and people writing multithreaded programs should know they have to > be careful about changing signal dispositions in the middle of the run. > > I think we don't need to spend a lot of time worrying about exactly what > happens when there's a race, as long as the process that called abort() > definitely does exit. In particular I really don't think it matters if > the termination state indicates a nonzero _exit rather than a signal under > these rare circumstances. And I think avoiding all locks in abort is > the best way to make it AS-safe without causing other problems. Another issue is abort() will usually generate a coredump, and this is the main reason we add a reference of __abort_msg on abort TU. Removing the signal disposition change along with the lock and just calling _exit means not only a change in parent termination view, but termination without a core for calls that caller might expect one. This is the current behavior, I am not sure if changing for a more relaxed one so we can remove a lock is really an improvement here.
fwiw, the BSDs (and thus Android) basically just unblock SIGABRT, raise, restore the default disposition, raise, then exit. Android's had no bug reports from that. (though full disclosure: Android's a bit weird in that by default all processes have a crashdump-generating SIGABRT handler installed.) (if i were going to change anything in bionic, it would probably be to try the raise before unblocking SIGABRT, on the assumption it's not usually blocked...) On Wed, Jul 24, 2024 at 1:20 PM Zack Weinberg <zack@owlfolio.org> wrote: > > On Wed, Jul 24, 2024, at 12:57 PM, Adhemerval Zanella Netto wrote: > > On 24/07/24 13:37, Florian Weimer wrote: > >> * Zack Weinberg: > >>> On Wed, Jul 24, 2024, at 11:25 AM, Adhemerval Zanella Netto wrote: > >>>> This does not solve the issue of concurrent sigaction (SIGABRT), > >>>> which is the main point of the locking (and on my patch the locks > >>>> is taken iff for SIGABRT disposition change). > >>> > >>> Uh, my whole point was that the new language permits us to ignore > >>> that issue. > >> > >> Indeed. And it seems to suggest that it's okay if a concurrently > >> running execve picks up the SIG_DFL change. The old POSIX language > >> doesn't mention the SIG_DFL change, so I suppose the current > >> abort/execve behavior (and the new one after this patch) was not > >> allowed before. > > > > So should we ignore concurrent sigaction for abort? It does simplify > > the implementation without the lock, although I am not sure if this is > > really an improvement over current code regarding how parent seems the > > termination state. > > Just my opinion, but we know for a fact that abort not being AS-safe is > a problem, and we know that in the absence of a new system call we > cannot avoid *some* infelicities in behavior when abort races with fork, > execve, or sigaction. But handling, ignoring, or blocking SIGABRT is > rare, and people writing multithreaded programs should know they have to > be careful about changing signal dispositions in the middle of the run. > > I think we don't need to spend a lot of time worrying about exactly what > happens when there's a race, as long as the process that called abort() > definitely does exit. In particular I really don't think it matters if > the termination state indicates a nonzero _exit rather than a signal under > these rare circumstances. And I think avoiding all locks in abort is > the best way to make it AS-safe without causing other problems. > > zw
On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote: > On 24/07/24 14:19, Zack Weinberg wrote: >> I think we don't need to spend a lot of time worrying about exactly >> what happens when there's a race, as long as the process that called >> abort() definitely does exit. In particular I really don't think it >> matters if the termination state indicates a nonzero _exit rather >> than a signal under these rare circumstances. And I think avoiding >> all locks in abort is the best way to make it AS-safe without causing >> other problems. > > Another issue is abort() will usually generate a coredump, and this is > the main reason we add a reference of __abort_msg on abort TU. > Removing the signal disposition change along with the lock and just > calling _exit means not only a change in parent termination view, but > termination without a core for calls that caller might expect one. Well, again, this is under rare circumstances. To hit the _exit, the program has to suppress the initial raise(SIGABRT) and then win a race against the abort implementation trying to reset the mask and disposition. (To be clear, I am *not* proposing we drop the attempt to reset the mask and disposition, I am only proposing that we not bother with any locking around that attempt.) It is unusual to suppress SIGABRT in the first place; it is unusual to change signal dispositions in the middle of the program; abort is only called when the program is already malfunctioning; the cumulative probability of hitting the _exit should be so small, both before and after the change, that we can live with not getting core dumps when it does happen. (It's *not* unusual to change the *mask* in the middle of the program, but pthread_sigmask lets us unblock SIGABRT for just the thread that's calling abort, and any multithreaded program that uses sigprocmask while multiple threads are active has only itself to blame.) zw
On 24/07/24 14:57, Zack Weinberg wrote: > On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote: >> On 24/07/24 14:19, Zack Weinberg wrote: >>> I think we don't need to spend a lot of time worrying about exactly >>> what happens when there's a race, as long as the process that called >>> abort() definitely does exit. In particular I really don't think it >>> matters if the termination state indicates a nonzero _exit rather >>> than a signal under these rare circumstances. And I think avoiding >>> all locks in abort is the best way to make it AS-safe without causing >>> other problems. >> >> Another issue is abort() will usually generate a coredump, and this is >> the main reason we add a reference of __abort_msg on abort TU. >> Removing the signal disposition change along with the lock and just >> calling _exit means not only a change in parent termination view, but >> termination without a core for calls that caller might expect one. > > Well, again, this is under rare circumstances. To hit the _exit, the > program has to suppress the initial raise(SIGABRT) and then win a race > against the abort implementation trying to reset the mask and disposition. > (To be clear, I am *not* proposing we drop the attempt to reset the mask > and disposition, I am only proposing that we not bother with any locking > around that attempt.) It is unusual to suppress SIGABRT in the first > place; it is unusual to change signal dispositions in the middle of the > program; abort is only called when the program is already > malfunctioning; the cumulative probability of hitting the _exit should > be so small, both before and after the change, that we can live with not > getting core dumps when it does happen. > > (It's *not* unusual to change the *mask* in the middle of the program, > but pthread_sigmask lets us unblock SIGABRT for just the thread that's > calling abort, and any multithreaded program that uses sigprocmask while > multiple threads are active has only itself to blame.) That is essentially the BSD strategy, as noted by enh; where it is still racy but I give that the possible windows would quite narrow. The main problem with the BSD approach for glibc is in fact setjmp, as I noted in the first patch. It means that usages like we do on debug/tst-fortify.c will always abort: static volatile int chk_fail_ok; static jmp_buf chk_fail_buf; static void handler (int sig) { if (chk_fail_ok) { chk_fail_ok = 0; longjmp (chk_fail_buf, 1); } else _exit (127); } [...] signal (SIGABRT, handler); [....] chk_fail_ok = 1; if (! setjmp (chk_fail_buf)) { // Something that can call abort, like a failed fortify function. chk_fail_ok = 0; printf ("FAIL\n"); } We can ignore it and suggest possible callers that to just use sigsetjmp instead. So the options we have are: 1. This patch approach, which enforces the SIGABRT termination semantic and uses a lock. The setjmp change is optional. 2. The BSD way, which is similar to 1. with the only difference it does not use a lock. There still a race windows, albeit small. 3. Your suggestion to just unblock the signal and raise SIGABRT; and fallback to exit if the handler returns. There is no lock, but there is not guarantee that parent will see a abnormal terminal also. I still prefer 1. since it does not have any race condition and the lock is only exercised on abnormal program termination anyway. I don't have a strong opinion about the setjmp semantic change, and I can drop the first patch is we are ok about it.
On Wed, Jul 24, 2024 at 2:17 PM Adhemerval Zanella Netto <adhemerval.zanella@linaro.org> wrote: > > > > On 24/07/24 14:57, Zack Weinberg wrote: > > On Wed, Jul 24, 2024, at 1:45 PM, Adhemerval Zanella Netto wrote: > >> On 24/07/24 14:19, Zack Weinberg wrote: > >>> I think we don't need to spend a lot of time worrying about exactly > >>> what happens when there's a race, as long as the process that called > >>> abort() definitely does exit. In particular I really don't think it > >>> matters if the termination state indicates a nonzero _exit rather > >>> than a signal under these rare circumstances. And I think avoiding > >>> all locks in abort is the best way to make it AS-safe without causing > >>> other problems. > >> > >> Another issue is abort() will usually generate a coredump, and this is > >> the main reason we add a reference of __abort_msg on abort TU. > >> Removing the signal disposition change along with the lock and just > >> calling _exit means not only a change in parent termination view, but > >> termination without a core for calls that caller might expect one. > > > > Well, again, this is under rare circumstances. To hit the _exit, the > > program has to suppress the initial raise(SIGABRT) and then win a race > > against the abort implementation trying to reset the mask and disposition. > > (To be clear, I am *not* proposing we drop the attempt to reset the mask > > and disposition, I am only proposing that we not bother with any locking > > around that attempt.) It is unusual to suppress SIGABRT in the first > > place; it is unusual to change signal dispositions in the middle of the > > program; abort is only called when the program is already > > malfunctioning; the cumulative probability of hitting the _exit should > > be so small, both before and after the change, that we can live with not > > getting core dumps when it does happen. > > > > (It's *not* unusual to change the *mask* in the middle of the program, > > but pthread_sigmask lets us unblock SIGABRT for just the thread that's > > calling abort, and any multithreaded program that uses sigprocmask while > > multiple threads are active has only itself to blame.) > > That is essentially the BSD strategy, as noted by enh; where it is still racy > but I give that the possible windows would quite narrow. > > The main problem with the BSD approach for glibc is in fact setjmp, as I > noted in the first patch. yeah, that's actually another BSD difference that Android shares --- for us, setjmp() == sigsetjmp(..., 1) (and there's a _setjmp()/_longjmp() if you _don't_ want the signal mask). > It means that usages like we do on > debug/tst-fortify.c will always abort: > > > static volatile int chk_fail_ok; > static jmp_buf chk_fail_buf; > > static void > handler (int sig) > { > if (chk_fail_ok) > { > chk_fail_ok = 0; > longjmp (chk_fail_buf, 1); > } > else > _exit (127); > } > [...] > signal (SIGABRT, handler); > [....] > chk_fail_ok = 1; > if (! setjmp (chk_fail_buf)) > { > // Something that can call abort, like a failed fortify function. > chk_fail_ok = 0; > printf ("FAIL\n"); > } > > We can ignore it and suggest possible callers that to just use sigsetjmp > instead. > > So the options we have are: > > 1. This patch approach, which enforces the SIGABRT termination semantic > and uses a lock. The setjmp change is optional. > > 2. The BSD way, which is similar to 1. with the only difference it does > not use a lock. There still a race windows, albeit small. > > 3. Your suggestion to just unblock the signal and raise SIGABRT; and > fallback to exit if the handler returns. There is no lock, but there > is not guarantee that parent will see a abnormal terminal also. > > I still prefer 1. since it does not have any race condition and the lock is > only exercised on abnormal program termination anyway. I don't have a strong > opinion about the setjmp semantic change, and I can drop the first patch is > we are ok about it.
On Wed, Jul 24, 2024, at 2:16 PM, Adhemerval Zanella Netto wrote: ... > The main problem with the BSD approach for glibc is in fact setjmp, as I > noted in the first patch. It means that usages like we do on > debug/tst-fortify.c will always abort: ... I think I'm okay with saying that any longjmp out of a signal handler really ought to be sigsetjmp/siglongjmp. Unlike Florian I would also be OK with changing plain setjmp/longjmp to restore the signal mask; as I said before, I think that's more likely to make existing problems go away than introduce new ones. Or we could split the difference and introduce a new feature test macro, _LONGJMP_RESTORE_SIGMASK perhaps. ... > 2. The BSD way, which is similar to 1. with the only difference it does > not use a lock. There still a race windows, albeit small. > > 3. Your suggestion to just unblock the signal and raise SIGABRT; and > fallback to exit if the handler returns. There is no lock, but there > is not guarantee that parent will see a abnormal terminal also. Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far. zw
On 24/07/24 15:39, Zack Weinberg wrote: > On Wed, Jul 24, 2024, at 2:16 PM, Adhemerval Zanella Netto wrote: > ... >> The main problem with the BSD approach for glibc is in fact setjmp, as I >> noted in the first patch. It means that usages like we do on >> debug/tst-fortify.c will always abort: > ... > > I think I'm okay with saying that any longjmp out of a signal handler really ought to be sigsetjmp/siglongjmp. Unlike Florian I would also be OK with changing plain setjmp/longjmp to restore the signal mask; as I said before, I think that's more likely to make existing problems go away than introduce new ones. Or we could split the difference and introduce a new feature test macro, _LONGJMP_RESTORE_SIGMASK perhaps. > > ... >> 2. The BSD way, which is similar to 1. with the only difference it does >> not use a lock. There still a race windows, albeit small. >> >> 3. Your suggestion to just unblock the signal and raise SIGABRT; and >> fallback to exit if the handler returns. There is no lock, but there >> is not guarantee that parent will see a abnormal terminal also. > > Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far. Right, so essentially current patch without the lock. I still think the lock is really cheap and it helps on other very unlikely corner cases (like the posix_spawn just after the new signal disposition), and make abort without any race condition.
On Wed, Jul 24, 2024, at 2:52 PM, Adhemerval Zanella Netto wrote: > On 24/07/24 15:39, Zack Weinberg wrote: >> Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far. > > Right, so essentially current patch without the lock. I still think the lock > is really cheap and it helps on other very unlikely corner cases But the lock is the thing that makes abort not AS-safe! zw
> On 24 Jul 2024, at 18:01, Zack Weinberg <zack@owlfolio.org> wrote: > On Wed, Jul 24, 2024, at 2:52 PM, Adhemerval Zanella Netto wrote: >> On 24/07/24 15:39, Zack Weinberg wrote: >>> Let me clarify that what I was actually proposing is (2). I think the new POSIX language *permits* (3) but I don't think it would be a good idea to go that far. >> >> Right, so essentially current patch without the lock. I still think the lock >> is really cheap and it helps on other very unlikely corner cases > > But the lock is the thing that makes abort not AS-safe! > > zw The *recursive* lock is the main issue, where the current implementation tries to mitigate with a somewhat complex state machine. By blocking all signals and then taking the locking, we make sure there is no need to handle reentrancy.