diff mbox series

[RFC] linux-user: trap internal SIGABRT's

Message ID 20220209112207.3368139-1-alex.bennee@linaro.org
State New
Headers show
Series [RFC] linux-user: trap internal SIGABRT's | expand

Commit Message

Alex Bennée Feb. 9, 2022, 11:22 a.m. UTC
linux-user wants to trap all signals in case they are related to the
guest. This however results in less than helpful core dumps when the
error is internal to QEMU. We can detect when an assert failure is in
progress by examining __glib_assert_msg and fall through to
cpu_abort() which will pretty print something before restoring the
default SIGABRT behaviour and dumping core.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
 linux-user/signal.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Peter Maydell Feb. 9, 2022, 11:38 a.m. UTC | #1
On Wed, 9 Feb 2022 at 11:35, Alex Bennée <alex.bennee@linaro.org> wrote:
> linux-user wants to trap all signals in case they are related to the
> guest. This however results in less than helpful core dumps when the
> error is internal to QEMU. We can detect when an assert failure is in
> progress by examining __glib_assert_msg and fall through to
> cpu_abort() which will pretty print something before restoring the
> default SIGABRT behaviour and dumping core.

There is definitely a problem here that it would be nice to
fix, but __glib_assert_msg is as far as I can tell not a
documented public-facing glib API, and in any case it won't
catch assertions via plain old assert() or abort() or for
that matter SIGSEGVs and other kinds of crash in QEMU's own code.

thanks
-- PMM
Alex Bennée Feb. 9, 2022, 1:12 p.m. UTC | #2
Peter Maydell <peter.maydell@linaro.org> writes:

> On Wed, 9 Feb 2022 at 11:35, Alex Bennée <alex.bennee@linaro.org> wrote:
>> linux-user wants to trap all signals in case they are related to the
>> guest. This however results in less than helpful core dumps when the
>> error is internal to QEMU. We can detect when an assert failure is in
>> progress by examining __glib_assert_msg and fall through to
>> cpu_abort() which will pretty print something before restoring the
>> default SIGABRT behaviour and dumping core.
>
> There is definitely a problem here that it would be nice to
> fix, but __glib_assert_msg is as far as I can tell not a
> documented public-facing glib API,

Yeah it's in an odd position - it is explicitly exported but not
documented as an API but for use by crash tools:

  https://gitlab.gnome.org/GNOME/glib/-/issues/712

> and in any case it won't
> catch assertions via plain old assert() or abort() or for

libc does provide an a private __abort_msg but that is explicitly
private and I guess would break against a non-gnu libc (do we support
that?).

Explicit aborts() in linux-user code should probably be converted to
cpu_abort as it does the right thing. asserts() can be converted to
g_assert() given as glib is a absolute requirement for building.

> that matter SIGSEGVs and other kinds of crash in QEMU's own code.

There is some checking in the host_signal_handler that could be a bit
cleverer. We currently check for h2g_valid(host_addr) but we could
expand that to cover QEMU's own address space and behave appropriately.

>
> thanks
> -- PMM
Peter Maydell Feb. 9, 2022, 1:27 p.m. UTC | #3
On Wed, 9 Feb 2022 at 13:23, Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Peter Maydell <peter.maydell@linaro.org> writes:
>
> > On Wed, 9 Feb 2022 at 11:35, Alex Bennée <alex.bennee@linaro.org> wrote:
> >> linux-user wants to trap all signals in case they are related to the
> >> guest. This however results in less than helpful core dumps when the
> >> error is internal to QEMU. We can detect when an assert failure is in
> >> progress by examining __glib_assert_msg and fall through to
> >> cpu_abort() which will pretty print something before restoring the
> >> default SIGABRT behaviour and dumping core.
> >
> > There is definitely a problem here that it would be nice to
> > fix, but __glib_assert_msg is as far as I can tell not a
> > documented public-facing glib API,
>
> Yeah it's in an odd position - it is explicitly exported but not
> documented as an API but for use by crash tools:
>
>   https://gitlab.gnome.org/GNOME/glib/-/issues/712

Mmm. I think if glib specifically mark it as "not part of our API"
then we should not be touching it.

-- PMM
Richard Henderson Feb. 9, 2022, 9:56 p.m. UTC | #4
On 2/9/22 22:22, Alex Bennée wrote:
> linux-user wants to trap all signals in case they are related to the
> guest. This however results in less than helpful core dumps when the
> error is internal to QEMU. We can detect when an assert failure is in
> progress by examining __glib_assert_msg and fall through to
> cpu_abort() which will pretty print something before restoring the
> default SIGABRT behaviour and dumping core.
> 
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
>   linux-user/signal.c | 6 ++++++
>   1 file changed, 6 insertions(+)
> 
> diff --git a/linux-user/signal.c b/linux-user/signal.c
> index 32854bb375..8ecc1215f7 100644
> --- a/linux-user/signal.c
> +++ b/linux-user/signal.c
> @@ -809,6 +809,8 @@ static inline void rewind_if_in_safe_syscall(void *puc)
>       }
>   }
>   
> +GLIB_VAR char *__glib_assert_msg;
> +
>   static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
>   {
>       CPUArchState *env = thread_cpu->env_ptr;
> @@ -821,6 +823,10 @@ static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
>       uintptr_t pc = 0;
>       bool sync_sig = false;
>   
> +    if (__glib_assert_msg) {
> +        cpu_abort(cpu, "internal QEMU error, aborting...");
> +    }

I think we should not be trapping SIGABRT.  I think we can preserve all guest behaviour 
wrt SIGABRT by stealing another SIGRTMIN value, and remapping the guest signal number.  We 
can produce the correct result for the system by mapping it back to host SIGABRT in 
core_dump_and_abort().


r~
Peter Maydell Feb. 11, 2022, 11:46 a.m. UTC | #5
On Wed, 9 Feb 2022 at 22:12, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 2/9/22 22:22, Alex Bennée wrote:
> > linux-user wants to trap all signals in case they are related to the
> > guest. This however results in less than helpful core dumps when the
> > error is internal to QEMU. We can detect when an assert failure is in
> > progress by examining __glib_assert_msg and fall through to
> > cpu_abort() which will pretty print something before restoring the
> > default SIGABRT behaviour and dumping core.
> >
> > Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> > ---
> >   linux-user/signal.c | 6 ++++++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/linux-user/signal.c b/linux-user/signal.c
> > index 32854bb375..8ecc1215f7 100644
> > --- a/linux-user/signal.c
> > +++ b/linux-user/signal.c
> > @@ -809,6 +809,8 @@ static inline void rewind_if_in_safe_syscall(void *puc)
> >       }
> >   }
> >
> > +GLIB_VAR char *__glib_assert_msg;
> > +
> >   static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
> >   {
> >       CPUArchState *env = thread_cpu->env_ptr;
> > @@ -821,6 +823,10 @@ static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
> >       uintptr_t pc = 0;
> >       bool sync_sig = false;
> >
> > +    if (__glib_assert_msg) {
> > +        cpu_abort(cpu, "internal QEMU error, aborting...");
> > +    }
>
> I think we should not be trapping SIGABRT.  I think we can preserve all guest behaviour
> wrt SIGABRT by stealing another SIGRTMIN value, and remapping the guest signal number.  We
> can produce the correct result for the system by mapping it back to host SIGABRT in
> core_dump_and_abort().

Stealing signal values is awkward, because you don't know what the guest
(either application or libc) might be trying to do with them. I think
we should prefer not to steal them, especially in this case where the
only reason for taking a signal value for our own use is to deal with
a "should never happen anyway" case. We've already had a few bugs/awkwardnesses
with the current level of signal stealing/rearranging we have to do.

thanks
-- PMM
diff mbox series

Patch

diff --git a/linux-user/signal.c b/linux-user/signal.c
index 32854bb375..8ecc1215f7 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -809,6 +809,8 @@  static inline void rewind_if_in_safe_syscall(void *puc)
     }
 }
 
+GLIB_VAR char *__glib_assert_msg;
+
 static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
 {
     CPUArchState *env = thread_cpu->env_ptr;
@@ -821,6 +823,10 @@  static void host_signal_handler(int host_sig, siginfo_t *info, void *puc)
     uintptr_t pc = 0;
     bool sync_sig = false;
 
+    if (__glib_assert_msg) {
+        cpu_abort(cpu, "internal QEMU error, aborting...");
+    }
+
     /*
      * Non-spoofed SIGSEGV and SIGBUS are synchronous, and need special
      * handling wrt signal blocking and unwinding.