Message ID | 20240723134235.1520483-2-adhemerval.zanella@linaro.org |
---|---|
State | New |
Headers | show |
Series | Make abort AS-safe | expand |
* Adhemerval Zanella: > POSIX relaxed the relation of setjmp/longjmp and the signal mask > save/restore, meaning that setjmp does not require to be routed to > _setjmp to be standard compliant. > > This is done to avoid breakage of SIGABRT handlers, since to fully > make abort AS-safe, it is required to remove the recurisve lock > used to unblock SIGABRT prior raised the signal. > > Also, it allows caller to actually use setjmp, since from > 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally > routed to _setjmp. I still don't think making a compatibility-impacting change to fix such an obscure bug is a good trade-off. The kernel providing something like exit_signal (terminate the process even if a signal handler is installed) would be a better way forward, in my opinion. Thanks, Florian
On 23/07/24 10:49, Florian Weimer wrote: > * Adhemerval Zanella: > >> POSIX relaxed the relation of setjmp/longjmp and the signal mask >> save/restore, meaning that setjmp does not require to be routed to >> _setjmp to be standard compliant. >> >> This is done to avoid breakage of SIGABRT handlers, since to fully >> make abort AS-safe, it is required to remove the recurisve lock >> used to unblock SIGABRT prior raised the signal. >> >> Also, it allows caller to actually use setjmp, since from >> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally >> routed to _setjmp. > > I still don't think making a compatibility-impacting change to fix such > an obscure bug is a good trade-off. > > The kernel providing something like exit_signal (terminate the process > even if a signal handler is installed) would be a better way forward, in > my opinion. What do you mean by obscure bug? The abort not being async-signal-safe is being triggered by users and incurring in workarounds and time consuming analysis. And I don't see the need of a new kernel interface for an issue that can be complete solved in userspace (and I think kernel developers would say the same thing, since this most likely will need a new syscall to solve a very specific userland issue). And user can still use the SysV setjmp, they just need to explicit say so. I am not really following the reluctant to fix abort() async-signal-safeness.
On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote: > On 23/07/24 10:49, Florian Weimer wrote: >> * Adhemerval Zanella: >> >>> POSIX relaxed the relation of setjmp/longjmp and the signal mask >>> save/restore, meaning that setjmp does not require to be routed to >>> _setjmp to be standard compliant. >>> >>> This is done to avoid breakage of SIGABRT handlers, since to fully >>> make abort AS-safe, it is required to remove the recurisve lock >>> used to unblock SIGABRT prior raised the signal. >>> >>> Also, it allows caller to actually use setjmp, since from >>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally >>> routed to _setjmp. >> >> I still don't think making a compatibility-impacting change to fix such >> an obscure bug is a good trade-off. I'm inclined to think that making setjmp be sigsetjmp by default is more likely to _hide_ bugs than _expose_ them, but ... >> The kernel providing something like exit_signal (terminate the process >> even if a signal handler is installed) would be a better way forward, in >> my opinion. > > And I don't see the need of a new kernel interface for an issue that can > be complete solved in userspace (and I think kernel developers would say > the same thing, since this most likely will need a new syscall to solve a > very specific userland issue). ... adding exit_signal() would mean that abort() didn't need to do locking at all, which might actually be a strong enough argument for the kernel people. zw
On Tue, Jul 23, 2024, at 1:26 PM, Zack Weinberg wrote: >> And I don't see the need of a new kernel interface for an issue that >> can be complete solved in userspace (and I think kernel developers >> would say the same thing, since this most likely will need a new >> syscall to solve a very specific userland issue). > > ... adding exit_signal() would mean that abort() didn't need to do > locking at all, which might actually be a strong enough argument for > the kernel people. I still kinda like the exit_signal idea, but here's another idea that might make abort AS-safe without needing to change setjmp or add a new kernel interface: add another internal cross-thread broadcast message (like the existing one for euid changes) that means "force all other threads to exit [not cancel!] immediately". We'd do that if the initial raise(SIGABRT) fails, thus permitting us to reset the SIGABRT disposition and not have to worry about another thread changing it back, which I believe is the only reason for the locking? zw
On 23/07/24 14:26, Zack Weinberg wrote: > On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote: >> On 23/07/24 10:49, Florian Weimer wrote: >>> * Adhemerval Zanella: >>> >>>> POSIX relaxed the relation of setjmp/longjmp and the signal mask >>>> save/restore, meaning that setjmp does not require to be routed to >>>> _setjmp to be standard compliant. >>>> >>>> This is done to avoid breakage of SIGABRT handlers, since to fully >>>> make abort AS-safe, it is required to remove the recurisve lock >>>> used to unblock SIGABRT prior raised the signal. >>>> >>>> Also, it allows caller to actually use setjmp, since from >>>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally >>>> routed to _setjmp. >>> >>> I still don't think making a compatibility-impacting change to fix such >>> an obscure bug is a good trade-off. > > I'm inclined to think that making setjmp be sigsetjmp by default is more > likely to _hide_ bugs than _expose_ them, but ... > >>> The kernel providing something like exit_signal (terminate the process >>> even if a signal handler is installed) would be a better way forward, in >>> my opinion. >> >> And I don't see the need of a new kernel interface for an issue that can >> be complete solved in userspace (and I think kernel developers would say >> the same thing, since this most likely will need a new syscall to solve a >> very specific userland issue). > > ... adding exit_signal() would mean that abort() didn't need to do locking at all, > which might actually be a strong enough argument for the kernel people. Besides the time required to propagate this, and the issue that abort() would be async-signal-safe only for certain kernels, I am not fully sure that exit_signal make sense when kernel already provides exit_group. The whole idea of abort() is to execute default action of abnormal termination, otherwise one can just call _Exit/_exit.
On 23/07/24 14:34, Zack Weinberg wrote: > On Tue, Jul 23, 2024, at 1:26 PM, Zack Weinberg wrote: >>> And I don't see the need of a new kernel interface for an issue that >>> can be complete solved in userspace (and I think kernel developers >>> would say the same thing, since this most likely will need a new >>> syscall to solve a very specific userland issue). >> >> ... adding exit_signal() would mean that abort() didn't need to do >> locking at all, which might actually be a strong enough argument for >> the kernel people. > > I still kinda like the exit_signal idea, but here's another idea that > might make abort AS-safe without needing to change setjmp or add a > new kernel interface: add another internal cross-thread broadcast > message (like the existing one for euid changes) that means "force > all other threads to exit [not cancel!] immediately". We'd do that > if the initial raise(SIGABRT) fails, thus permitting us to reset the > SIGABRT disposition and not have to worry about another thread changing > it back, which I believe is the only reason for the locking? This is *way* more complex, since it requires register a new internal signal (or remap one like we do for SIGTIMER, but this also adds more complexity) on abort anyway as we on pthread_cancel. It also requires to synchronize on each thread for abnormal end of execution, which raises the question on how to handle newly created threads during the signaling (which most likely would likely required even more synchronization in pthread_create).
On Tue, 23 Jul 2024, Zack Weinberg wrote: > >> The kernel providing something like exit_signal (terminate the process > >> even if a signal handler is installed) would be a better way forward, in > >> my opinion. > > > > And I don't see the need of a new kernel interface for an issue that can > > be complete solved in userspace (and I think kernel developers would say > > the same thing, since this most likely will need a new syscall to solve a > > very specific userland issue). > > ... adding exit_signal() would mean that abort() didn't need to do locking at all, > which might actually be a strong enough argument for the kernel people. And FWIW this interface already exists internally in the kernel, just not exposed for general use to the userland (it's `force_exit_sig'). Maciej
* Zack Weinberg: > On Tue, Jul 23, 2024, at 12:04 PM, Adhemerval Zanella Netto wrote: >> On 23/07/24 10:49, Florian Weimer wrote: >>> * Adhemerval Zanella: >>> >>>> POSIX relaxed the relation of setjmp/longjmp and the signal mask >>>> save/restore, meaning that setjmp does not require to be routed to >>>> _setjmp to be standard compliant. >>>> >>>> This is done to avoid breakage of SIGABRT handlers, since to fully >>>> make abort AS-safe, it is required to remove the recurisve lock >>>> used to unblock SIGABRT prior raised the signal. >>>> >>>> Also, it allows caller to actually use setjmp, since from >>>> 7011c2622fe3e10a29dbe74f06aaebd07710127d the symbol is unconditionally >>>> routed to _setjmp. >>> >>> I still don't think making a compatibility-impacting change to fix such >>> an obscure bug is a good trade-off. > > I'm inclined to think that making setjmp be sigsetjmp by default is more > likely to _hide_ bugs than _expose_ them, but ... It's the impact of the extra system call that I'm worried about. It may preclude using setjmp for using setjmp/longjmp exception handling, for performance reasons. It only impacts applications as they are recompiled, but still. People will have to change their sources and switch to sigsetjmp. Thanks, Floria
diff --git a/NEWS b/NEWS index 2f67f3f315..a30fb748d6 100644 --- a/NEWS +++ b/NEWS @@ -17,7 +17,9 @@ Deprecated and removed features, and other changes affecting compatibility: Changes to build and runtime requirements: - [Add changes to build and runtime requirements here] +* The setjmp now uses the BSD semantic as default, where the signal mask + is saved. This is required to avoid the break of SIGABRT handler for + the async-signal-safe abort implementation. Security related changes: diff --git a/manual/setjmp.texi b/manual/setjmp.texi index 7092a0dde2..f2d82a2f33 100644 --- a/manual/setjmp.texi +++ b/manual/setjmp.texi @@ -189,16 +189,10 @@ them @code{volatile}. @section Non-Local Exits and Signals In BSD Unix systems, @code{setjmp} and @code{longjmp} also save and -restore the set of blocked signals; see @ref{Blocking Signals}. However, -the POSIX.1 standard requires @code{setjmp} and @code{longjmp} not to -change the set of blocked signals, and provides an additional pair of -functions (@code{sigsetjmp} and @code{siglongjmp}) to get the BSD -behavior. - -The behavior of @code{setjmp} and @code{longjmp} in @theglibc{} is -controlled by feature test macros; see @ref{Feature Test Macros}. The -default in @theglibc{} is the POSIX.1 behavior rather than the BSD -behavior. +restore the set of blocked signals; see @ref{Blocking Signals}, while +on @w{System V} they will not. POSIX does not specify the relation of +@code{setjmp} and @code{longjmp} to signal mask. The default in +@theglibc{} is the BSD behavior. The facilities in this section are declared in the header file @file{setjmp.h}. diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c index 1d3665d5ed..527cb9017d 100644 --- a/nptl/pthread_create.c +++ b/nptl/pthread_create.c @@ -402,7 +402,8 @@ start_thread (void *arg) the saved signal mask), so that is a false positive. */ DIAG_IGNORE_NEEDS_COMMENT (11, "-Wstringop-overflow="); #endif - not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf); + not_first_call = __sigsetjmp ( + (struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf, 0); DIAG_POP_NEEDS_COMMENT; /* No previous handlers. NB: This must be done after setjmp since the diff --git a/setjmp/setjmp.h b/setjmp/setjmp.h index 1309c6210d..98efa3e863 100644 --- a/setjmp/setjmp.h +++ b/setjmp/setjmp.h @@ -44,11 +44,6 @@ extern int __sigsetjmp (struct __jmp_buf_tag __env[1], int __savemask) __THROWNL Return 0. */ extern int _setjmp (struct __jmp_buf_tag __env[1]) __THROWNL; -/* Do not save the signal mask. This is equivalent to the `_setjmp' - BSD function. */ -#define setjmp(env) _setjmp (env) - - /* Jump to the environment saved in ENV, making the `setjmp' call there return VAL, or 1 if VAL is 0. */ extern void longjmp (struct __jmp_buf_tag __env[1], int __val) diff --git a/sysdeps/nptl/libc_start_call_main.h b/sysdeps/nptl/libc_start_call_main.h index 26ac8dd53b..c41ca6843b 100644 --- a/sysdeps/nptl/libc_start_call_main.h +++ b/sysdeps/nptl/libc_start_call_main.h @@ -41,7 +41,8 @@ __libc_start_call_main (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL), the saved signal mask), so that is a false positive. */ DIAG_IGNORE_NEEDS_COMMENT (11, "-Wstringop-overflow="); #endif - not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf); + not_first_call = __sigsetjmp ( + (struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf, 0); DIAG_POP_NEEDS_COMMENT; if (__glibc_likely (! not_first_call)) {