diff mbox series

[v2,1/2] setjmp: Use BSD sematic as default for setjmp

Message ID 20240723134235.1520483-2-adhemerval.zanella@linaro.org
State New
Headers show
Series Make abort AS-safe | expand

Commit Message

Adhemerval Zanella July 23, 2024, 1:41 p.m. UTC
POSIX relaxed the relation of setjmp/longjmp and the signal mask
save/restore, meaning that setjmp does not require to be routed to
_setjmp to be standard compliant.

This is done to avoid breakage of SIGABRT handlers, since to fully
make abort AS-safe, it is required to remove the recurisve lock
used to unblock SIGABRT prior raised the signal.

Also, it allows caller to actually use setjmp, since from
7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
routed to _setjmp.

Checked on x86_64-linux-gnu.
---
 NEWS                                |  4 +++-
 manual/setjmp.texi                  | 14 ++++----------
 nptl/pthread_create.c               |  3 ++-
 setjmp/setjmp.h                     |  5 -----
 sysdeps/nptl/libc_start_call_main.h |  3 ++-
 5 files changed, 11 insertions(+), 18 deletions(-)

Comments

Florian Weimer July 23, 2024, 1:49 p.m. UTC | #1
* Adhemerval Zanella:

> POSIX relaxed the relation of setjmp/longjmp and the signal mask
> save/restore, meaning that setjmp does not require to be routed to
> _setjmp to be standard compliant.
>
> This is done to avoid breakage of SIGABRT handlers, since to fully
> make abort AS-safe, it is required to remove the recurisve lock
> used to unblock SIGABRT prior raised the signal.
>
> Also, it allows caller to actually use setjmp, since from
> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
> routed to _setjmp.

I still don't think making a compatibility-impacting change to fix such
an obscure bug is a good trade-off.

The kernel providing something like exit_signal (terminate the process
even if a signal handler is installed) would be a better way forward, in
my opinion.

Thanks,
Florian
Adhemerval Zanella July 23, 2024, 4:04 p.m. UTC | #2
On 23/07/24 10:49, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>> POSIX relaxed the relation of setjmp/longjmp and the signal mask
>> save/restore, meaning that setjmp does not require to be routed to
>> _setjmp to be standard compliant.
>>
>> This is done to avoid breakage of SIGABRT handlers, since to fully
>> make abort AS-safe, it is required to remove the recurisve lock
>> used to unblock SIGABRT prior raised the signal.
>>
>> Also, it allows caller to actually use setjmp, since from
>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
>> routed to _setjmp.
> 
> I still don't think making a compatibility-impacting change to fix such
> an obscure bug is a good trade-off.
> 
> The kernel providing something like exit_signal (terminate the process
> even if a signal handler is installed) would be a better way forward, in
> my opinion.

What do you mean by obscure bug? The abort not being async-signal-safe is
being triggered by users and incurring in workarounds and time consuming
analysis.

And I don't see the need of a new kernel interface for an issue that can
be complete solved in userspace (and I think kernel developers would say
the same thing, since this most likely will need a new syscall to solve a
very specific userland issue).

And user can still use the SysV setjmp, they just need to explicit say so.
I am not really following the reluctant to fix abort() async-signal-safeness.
Zack Weinberg July 23, 2024, 5:26 p.m. UTC | #3
On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote:
> On 23/07/24 10:49, Florian Weimer wrote:
>> * Adhemerval Zanella:
>> 
>>> POSIX relaxed the relation of setjmp/longjmp and the signal mask
>>> save/restore, meaning that setjmp does not require to be routed to
>>> _setjmp to be standard compliant.
>>>
>>> This is done to avoid breakage of SIGABRT handlers, since to fully
>>> make abort AS-safe, it is required to remove the recurisve lock
>>> used to unblock SIGABRT prior raised the signal.
>>>
>>> Also, it allows caller to actually use setjmp, since from
>>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
>>> routed to _setjmp.
>> 
>> I still don't think making a compatibility-impacting change to fix such
>> an obscure bug is a good trade-off.

I'm inclined to think that making setjmp be sigsetjmp by default is more
likely to _hide_ bugs than _expose_ them, but ...

>> The kernel providing something like exit_signal (terminate the process
>> even if a signal handler is installed) would be a better way forward, in
>> my opinion.
>
> And I don't see the need of a new kernel interface for an issue that can
> be complete solved in userspace (and I think kernel developers would say
> the same thing, since this most likely will need a new syscall to solve a
> very specific userland issue).

... adding exit_signal() would mean that abort() didn't need to do locking at all,
which might actually be a strong enough argument for the kernel people.

zw
Zack Weinberg July 23, 2024, 5:34 p.m. UTC | #4
On Tue, Jul 23, 2024, at 1:26 PM, Zack Weinberg wrote:
>> And I don't see the need of a new kernel interface for an issue that
>> can be complete solved in userspace (and I think kernel developers
>> would say the same thing, since this most likely will need a new
>> syscall to solve a very specific userland issue).
>
> ... adding exit_signal() would mean that abort() didn't need to do
> locking at all, which might actually be a strong enough argument for
> the kernel people.

I still kinda like the exit_signal idea, but here's another idea that
might make abort AS-safe without needing to change setjmp or add a
new kernel interface: add another internal cross-thread broadcast
message (like the existing one for euid changes) that means "force
all other threads to exit [not cancel!] immediately".  We'd do that
if the initial raise(SIGABRT) fails, thus permitting us to reset the
SIGABRT disposition and not have to worry about another thread changing
it back, which I believe is the only reason for the locking?

zw
Adhemerval Zanella July 23, 2024, 5:35 p.m. UTC | #5
On 23/07/24 14:26, Zack Weinberg wrote:
> On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote:
>> On 23/07/24 10:49, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>>
>>>> POSIX relaxed the relation of setjmp/longjmp and the signal mask
>>>> save/restore, meaning that setjmp does not require to be routed to
>>>> _setjmp to be standard compliant.
>>>>
>>>> This is done to avoid breakage of SIGABRT handlers, since to fully
>>>> make abort AS-safe, it is required to remove the recurisve lock
>>>> used to unblock SIGABRT prior raised the signal.
>>>>
>>>> Also, it allows caller to actually use setjmp, since from
>>>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
>>>> routed to _setjmp.
>>>
>>> I still don't think making a compatibility-impacting change to fix such
>>> an obscure bug is a good trade-off.
> 
> I'm inclined to think that making setjmp be sigsetjmp by default is more
> likely to _hide_ bugs than _expose_ them, but ...
> 
>>> The kernel providing something like exit_signal (terminate the process
>>> even if a signal handler is installed) would be a better way forward, in
>>> my opinion.
>>
>> And I don't see the need of a new kernel interface for an issue that can
>> be complete solved in userspace (and I think kernel developers would say
>> the same thing, since this most likely will need a new syscall to solve a
>> very specific userland issue).
> 
> ... adding exit_signal() would mean that abort() didn't need to do locking at all,
> which might actually be a strong enough argument for the kernel people.

Besides the time required to propagate this, and the issue that abort() would be
async-signal-safe only for certain kernels, I am not fully sure that exit_signal
make sense when kernel already provides exit_group.  The whole idea of abort()
is to execute default action of abnormal termination, otherwise one can just
call _Exit/_exit.
Adhemerval Zanella July 23, 2024, 5:41 p.m. UTC | #6
On 23/07/24 14:34, Zack Weinberg wrote:
> On Tue, Jul 23, 2024, at 1:26 PM, Zack Weinberg wrote:
>>> And I don't see the need of a new kernel interface for an issue that
>>> can be complete solved in userspace (and I think kernel developers
>>> would say the same thing, since this most likely will need a new
>>> syscall to solve a very specific userland issue).
>>
>> ... adding exit_signal() would mean that abort() didn't need to do
>> locking at all, which might actually be a strong enough argument for
>> the kernel people.
> 
> I still kinda like the exit_signal idea, but here's another idea that
> might make abort AS-safe without needing to change setjmp or add a
> new kernel interface: add another internal cross-thread broadcast
> message (like the existing one for euid changes) that means "force
> all other threads to exit [not cancel!] immediately".  We'd do that
> if the initial raise(SIGABRT) fails, thus permitting us to reset the
> SIGABRT disposition and not have to worry about another thread changing
> it back, which I believe is the only reason for the locking?

This is *way* more complex, since it requires register a new internal
signal (or remap one like we do for SIGTIMER, but this also adds more
complexity) on abort anyway as we on pthread_cancel.  It also requires
to synchronize on each thread for abnormal end of execution, which raises
the question on how to handle newly created threads during the signaling
(which most likely would likely required even more synchronization in
pthread_create).
Maciej W. Rozycki July 24, 2024, 8:47 a.m. UTC | #7
On Tue, 23 Jul 2024, Zack Weinberg wrote:

> >> The kernel providing something like exit_signal (terminate the process
> >> even if a signal handler is installed) would be a better way forward, in
> >> my opinion.
> >
> > And I don't see the need of a new kernel interface for an issue that can
> > be complete solved in userspace (and I think kernel developers would say
> > the same thing, since this most likely will need a new syscall to solve a
> > very specific userland issue).
> 
> ... adding exit_signal() would mean that abort() didn't need to do locking at all,
> which might actually be a strong enough argument for the kernel people.

 And FWIW this interface already exists internally in the kernel, just not 
exposed for general use to the userland (it's `force_exit_sig').

  Maciej
Florian Weimer July 24, 2024, 9:29 a.m. UTC | #8
* Zack Weinberg:

> On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote:
>> On 23/07/24 10:49, Florian Weimer wrote:
>>> * Adhemerval Zanella:
>>> 
>>>> POSIX relaxed the relation of setjmp/longjmp and the signal mask
>>>> save/restore, meaning that setjmp does not require to be routed to
>>>> _setjmp to be standard compliant.
>>>>
>>>> This is done to avoid breakage of SIGABRT handlers, since to fully
>>>> make abort AS-safe, it is required to remove the recurisve lock
>>>> used to unblock SIGABRT prior raised the signal.
>>>>
>>>> Also, it allows caller to actually use setjmp, since from
>>>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally
>>>> routed to _setjmp.
>>> 
>>> I still don't think making a compatibility-impacting change to fix such
>>> an obscure bug is a good trade-off.
>
> I'm inclined to think that making setjmp be sigsetjmp by default is more
> likely to _hide_ bugs than _expose_ them, but ...

It's the impact of the extra system call that I'm worried about.  It may
preclude using setjmp for using setjmp/longjmp exception handling, for
performance reasons.

It only impacts applications as they are recompiled, but still.  People
will have to change their sources and switch to sigsetjmp.

Thanks,
Floria
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 2f67f3f315..a30fb748d6 100644
--- a/NEWS
+++ b/NEWS
@@ -17,7 +17,9 @@  Deprecated and removed features, and other changes affecting compatibility:
 
 Changes to build and runtime requirements:
 
-  [Add changes to build and runtime requirements here]
+* The setjmp now uses the BSD semantic as default, where the signal mask
+  is saved.  This is required to avoid the break of SIGABRT handler for
+  the async-signal-safe abort implementation.
 
 Security related changes:
 
diff --git a/manual/setjmp.texi b/manual/setjmp.texi
index 7092a0dde2..f2d82a2f33 100644
--- a/manual/setjmp.texi
+++ b/manual/setjmp.texi
@@ -189,16 +189,10 @@  them @code{volatile}.
 @section Non-Local Exits and Signals
 
 In BSD Unix systems, @code{setjmp} and @code{longjmp} also save and
-restore the set of blocked signals; see @ref{Blocking Signals}.  However,
-the POSIX.1 standard requires @code{setjmp} and @code{longjmp} not to
-change the set of blocked signals, and provides an additional pair of
-functions (@code{sigsetjmp} and @code{siglongjmp}) to get the BSD
-behavior.
-
-The behavior of @code{setjmp} and @code{longjmp} in @theglibc{} is
-controlled by feature test macros; see @ref{Feature Test Macros}.  The
-default in @theglibc{} is the POSIX.1 behavior rather than the BSD
-behavior.
+restore the set of blocked signals; see @ref{Blocking Signals}, while
+on @w{System V} they will not.  POSIX does not specify the relation of
+@code{setjmp} and @code{longjmp} to signal mask.  The default in
+@theglibc{} is the BSD behavior.
 
 The facilities in this section are declared in the header file
 @file{setjmp.h}.
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 1d3665d5ed..527cb9017d 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -402,7 +402,8 @@  start_thread (void *arg)
      the saved signal mask), so that is a false positive.  */
   DIAG_IGNORE_NEEDS_COMMENT (11, "-Wstringop-overflow=");
 #endif
-  not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
+  not_first_call = __sigsetjmp (
+    (struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf, 0);
   DIAG_POP_NEEDS_COMMENT;
 
   /* No previous handlers.  NB: This must be done after setjmp since the
diff --git a/setjmp/setjmp.h b/setjmp/setjmp.h
index 1309c6210d..98efa3e863 100644
--- a/setjmp/setjmp.h
+++ b/setjmp/setjmp.h
@@ -44,11 +44,6 @@  extern int __sigsetjmp (struct __jmp_buf_tag __env[1], int __savemask) __THROWNL
    Return 0.  */
 extern int _setjmp (struct __jmp_buf_tag __env[1]) __THROWNL;
 
-/* Do not save the signal mask.  This is equivalent to the `_setjmp'
-   BSD function.  */
-#define setjmp(env)	_setjmp (env)
-
-
 /* Jump to the environment saved in ENV, making the
    `setjmp' call there return VAL, or 1 if VAL is 0.  */
 extern void longjmp (struct __jmp_buf_tag __env[1], int __val)
diff --git a/sysdeps/nptl/libc_start_call_main.h b/sysdeps/nptl/libc_start_call_main.h
index 26ac8dd53b..c41ca6843b 100644
--- a/sysdeps/nptl/libc_start_call_main.h
+++ b/sysdeps/nptl/libc_start_call_main.h
@@ -41,7 +41,8 @@  __libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
      the saved signal mask), so that is a false positive.  */
   DIAG_IGNORE_NEEDS_COMMENT (11, "-Wstringop-overflow=");
 #endif
-  not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
+  not_first_call = __sigsetjmp (
+     (struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf, 0);
   DIAG_POP_NEEDS_COMMENT;
   if (__glibc_likely (! not_first_call))
     {