[v2,3/3] gdb/nat/linux: Fix attaching to process when it has zombie threads

Message ID 20240420055652.819024-4-thiago.bauermann@linaro.org
State Superseded
Series Fix attaching to process when it has zombie threads

Commit Message

Thiago Jung Bauermann April 20, 2024, 5:56 a.m. UTC
When GDB attaches to a multi-threaded process, it calls
linux_proc_attach_tgid_threads () to go through all threads found in
/proc/PID/task/ and call attach_proc_task_lwp_callback () on each of
them.  If it does that twice without the callback reporting that a new
thread was found, then it considers that all inferior threads have been
found and returns.

The problem is that the callback considers any thread that it hasn't
attached to yet as new.  This breaks down if the process has one or
more zombie threads: GDB can't attach to them, so on every pass the
loop "finds" a new thread (the zombie one) and never reaches two
consecutive passes without new threads.  It is stuck in an infinite
loop.
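
The failure mode described above can be modelled outside GDB.  The
sketch below is a simplified simulation, not the code in
linux-procfs.c: `task', `scan_until_quiet' and the `max_passes' limit
are hypothetical names introduced only for illustration, and the
callback behaviour is taken from the description in this commit
message.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for one entry in /proc/PID/task/.
struct task
{
  unsigned long lwp;
  bool zombie;
};

// Models the scan as described above: stop after two consecutive passes
// that report no new thread.  Returns the number of passes used, or -1
// if MAX_PASSES is reached first (the loop being modelled has no limit).
int
scan_until_quiet (const std::vector<task> &tasks, int max_passes)
{
  std::set<unsigned long> attached;
  int quiet_passes = 0;

  for (int pass = 1; pass <= max_passes; pass++)
    {
      bool new_threads_found = false;

      for (const task &t : tasks)
        {
          if (attached.count (t.lwp) != 0)
            continue;           // Already attached: not new.

          // Any thread not yet attached is reported as new, even when
          // the attach cannot succeed because the thread is a zombie.
          new_threads_found = true;
          if (!t.zombie)
            attached.insert (t.lwp);
        }

      quiet_passes = new_threads_found ? 0 : quiet_passes + 1;
      if (quiet_passes == 2)
        return pass;
    }

  return -1;
}
```

With only live threads the model settles after three passes.  With a
single zombie in the list it runs until the artificial limit is hit,
which corresponds to the hang seen in the testcase.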

This is easy to trigger (at least on aarch64-linux and powerpc64le-linux)
with the gdb.threads/attach-many-short-lived-threads.exp testcase, because
its test program constantly creates and finishes joinable threads so the
chance of having zombie threads is high.

This problem causes the following failures:

FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: attach (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: no new threads (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: set breakpoint always-inserted on (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break break_fn (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 1 (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 2 (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: break at break_fn: 3 (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: reset timer in the inferior (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: print seconds_left (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: detach (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: set breakpoint always-inserted off (timeout)
FAIL: gdb.threads/attach-many-short-lived-threads.exp: iter 8: delete all breakpoints, watchpoints, tracepoints, and catchpoints in delete_breakpoints (timeout)
ERROR: breakpoints not deleted

The iteration number is random, and all tests in the subsequent iterations
fail too, because GDB is stuck in the attach command at the beginning of
the iteration.

The solution is to make linux_proc_attach_tgid_threads () remember when it
has already processed a given LWP and skip it in the subsequent iterations.
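
As a rough illustration of the idea (again a simulation under stated
assumptions, not the patch itself), the sketch below keys a visited
set on the LWP id together with its start time, so that a reused id
with a different start time still counts as a new thread.  The names
`task' and `scan_with_visited_set' are hypothetical, and std::set is
used here only to keep the sketch free of a custom hash function; the
patch uses std::unordered_set.

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for one entry in /proc/PID/task/.
struct task
{
  unsigned long lwp;
  std::uint64_t starttime;      // Start time read from the stat file.
  bool zombie;                  // Unused by the scan: zombies need no
                                // special handling once they are visited.
};

// Same termination rule as before (two consecutive passes without a new
// thread), but each (LWP, start time) pair is processed at most once.
// Returns the number of passes used, or -1 if MAX_PASSES is reached.
int
scan_with_visited_set (const std::vector<task> &tasks, int max_passes)
{
  std::set<std::pair<unsigned long, std::uint64_t>> visited;
  int quiet_passes = 0;

  for (int pass = 1; pass <= max_passes; pass++)
    {
      bool new_threads_found = false;

      for (const task &t : tasks)
        {
          auto key = std::make_pair (t.lwp, t.starttime);

          // insert () returns {iterator, false} if KEY was already there.
          if (!visited.insert (key).second)
            continue;           // Seen in an earlier pass: skip it.

          // First visit: report it, whether or not the attach works.
          new_threads_found = true;
        }

      quiet_passes = new_threads_found ? 0 : quiet_passes + 1;
      if (quiet_passes == 2)
        return pass;
    }

  return -1;
}
```

In this model a zombie is reported once, on the pass that first sees
it, and is skipped afterwards, so the scan terminates whether or not
the list contains zombies.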

PR testsuite/31312
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31312

Reviewed-By: Luis Machado <luis.machado@arm.com>
---
 gdb/nat/linux-procfs.c | 53 ++++++++++++++++++++++++++++++++++++++++++
 gdb/nat/linux-procfs.h |  1 +
 2 files changed, 54 insertions(+)

Changes in v2:
- Added macro for field index in /proc/PID/stat (Suggested by Luis).
- Moved linux_get_starttime to linux-procfs.c and changed its prefix
  to linux_proc (Suggested by Pedro).
- Changed visited_lwps from std::set to std::unordered_set. Had to add a
  hash function (Suggested by Pedro).
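
For readers unfamiliar with the stat field numbering, here is a
minimal sketch of how a numbered field such as starttime (22) can be
pulled out of one line of a stat file.  It is not GDB's
linux_proc_get_stat_field; `stat_field' and the two constants are
hypothetical names, with the constants mirroring the field numbers
given in procfs(5).

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Field numbers as given in procfs(5).
constexpr int STAT_STATE = 3;
constexpr int STAT_STARTTIME = 22;

// Return field FIELD (numbered from 1, FIELD >= 3) of LINE, the contents
// of a stat file without its trailing newline, or an empty optional if
// the field is not present.
std::optional<std::string>
stat_field (const std::string &line, int field)
{
  if (field < STAT_STATE)
    return std::nullopt;

  // Field 2 (comm) is parenthesised and may itself contain spaces or
  // parentheses, so locate its end by searching from the back.
  std::size_t pos = line.rfind (')');
  if (pos == std::string::npos)
    return std::nullopt;
  pos++;                        // Now at the space before field 3.

  for (int i = STAT_STATE; ; i++)
    {
      if (pos >= line.size () || line[pos] != ' ')
        return std::nullopt;    // Ran out of fields.

      std::size_t start = pos + 1;
      std::size_t end = line.find (' ', start);
      if (end == std::string::npos)
        end = line.size ();

      if (i == field)
        {
          if (start == end)
            return std::nullopt;
          return line.substr (start, end - start);
        }

      pos = end;
    }
}
```

Searching for the last ')' rather than splitting on spaces from the
start matters because a thread name such as "my prog" would otherwise
shift every later field by one.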

Comments

Pedro Alves April 22, 2024, 7:59 p.m. UTC | #1
On 2024-04-20 06:56, Thiago Jung Bauermann wrote:
> [...]
> 
> Reviewed-By: Luis Machado <luis.machado@arm.com>

Approved-By: Pedro Alves <pedro@palves.net>

BTW, after seeing the other patches after patch #1, I do agree with giving names
to the stat fields.

Thiago Jung Bauermann April 24, 2024, 11:15 p.m. UTC | #2
Pedro Alves <pedro@palves.net> writes:

> On 2024-04-20 06:56, Thiago Jung Bauermann wrote:
>> [...]
>> 
>> Reviewed-By: Luis Machado <luis.machado@arm.com>
>
> Approved-By: Pedro Alves <pedro@palves.net>

Thank you!

> BTW, after seeing the other patches after patch #1, I do agree with giving names
> to the stat fields.

Great. You did bring up interesting points about them though, so it was
an interesting exercise.

Patch

diff --git a/gdb/nat/linux-procfs.c b/gdb/nat/linux-procfs.c
index 23231c301a3f..2ed7b36ecd55 100644
--- a/gdb/nat/linux-procfs.c
+++ b/gdb/nat/linux-procfs.c
@@ -20,6 +20,8 @@ 
 #include "gdbsupport/filestuff.h"
 #include <dirent.h>
 #include <sys/stat.h>
+#include <unordered_set>
+#include <utility>
 
 /* Return the TGID of LWPID from /proc/pid/status.  Returns -1 if not
    found.  */
@@ -271,6 +273,29 @@  linux_proc_get_stat_field (ptid_t ptid, int field)
     return content->substr (pos, end_pos - pos);
 }
 
+/* Get the start time of thread PTID.  */
+
+static std::optional<ULONGEST>
+linux_proc_get_starttime (ptid_t ptid)
+{
+  std::optional<std::string> field
+    = linux_proc_get_stat_field (ptid, LINUX_PROC_STAT_STARTTIME);
+
+  if (!field.has_value ())
+    return {};
+
+  errno = 0;
+  const char *trailer;
+  ULONGEST starttime = strtoulst (field->c_str (), &trailer, 10);
+  if (starttime == ULONGEST_MAX && errno == ERANGE)
+    return {};
+  else if (*trailer != '\0')
+    /* There were unexpected characters.  */
+    return {};
+
+  return starttime;
+}
+
 /* See linux-procfs.h.  */
 
 const char *
@@ -332,6 +357,21 @@  linux_proc_attach_tgid_threads (pid_t pid,
       return;
     }
 
+  /* Callable object to hash elements in visited_lwps.  */
+  struct pair_hash
+  {
+    std::size_t operator() (const std::pair<unsigned long, ULONGEST> &v) const
+    {
+      return (std::hash<unsigned long>() (v.first)
+	      ^ std::hash<ULONGEST>() (v.second));
+    }
+  };
+
+  /* Keeps track of the LWPs we have already visited in /proc,
+     identified by their PID and starttime to detect PID reuse.  */
+  std::unordered_set<std::pair<unsigned long, ULONGEST>,
+		     pair_hash> visited_lwps;
+
   /* Scan the task list for existing threads.  While we go through the
      threads, new threads may be spawned.  Cycle through the list of
      threads until we have done two iterations without finding new
@@ -350,6 +390,19 @@  linux_proc_attach_tgid_threads (pid_t pid,
 	  if (lwp != 0)
 	    {
 	      ptid_t ptid = ptid_t (pid, lwp);
+	      std::optional<ULONGEST> starttime
+		= linux_proc_get_starttime (ptid);
+
+	      if (starttime.has_value ())
+		{
+		  std::pair<unsigned long, ULONGEST> key (lwp, *starttime);
+
+		  /* If we already visited this LWP, skip it this time.  */
+		  if (visited_lwps.find (key) != visited_lwps.cend ())
+		    continue;
+
+		  visited_lwps.insert (key);
+		}
 
 	      if (attach_lwp (ptid))
 		new_threads_found = 1;
diff --git a/gdb/nat/linux-procfs.h b/gdb/nat/linux-procfs.h
index ec1f37651fbf..64224801c8f2 100644
--- a/gdb/nat/linux-procfs.h
+++ b/gdb/nat/linux-procfs.h
@@ -56,6 +56,7 @@  extern int linux_proc_pid_is_gone (pid_t pid);
 
 /* Index of fields of interest in /proc/PID/stat, from procfs(5) man page.  */
 #define LINUX_PROC_STAT_STATE 3
+#define LINUX_PROC_STAT_STARTTIME 22
 #define LINUX_PROC_STAT_PROCESSOR 39
 
 /* Returns FIELD (as numbered in procfs(5) man page) of