[net-next] selftests: net: exit cleanly on SIGTERM / timeout

Message ID 20250425151757.1652517-1-kuba@kernel.org
State Superseded

Commit Message

Jakub Kicinski April 25, 2025, 3:17 p.m. UTC
The ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
Handle this similarly to how we handle SIGINT: clean up and stop
running further tests.

Because we get 2 signals, we need a bit of logic to ignore
the second one; they come immediately one after the other
(due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
to runner child")).

This change makes sure we run cleanup (scheduled defer()s)
and also print a stack trace on SIGTERM, which doesn't happen
by default. Tests occasionally hang in NIPA and it's impossible
to tell what they are waiting for or doing.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: petrm@nvidia.com
CC: willemb@google.com
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/lib/py/ksft.py | 27 +++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski April 28, 2025, 8:24 p.m. UTC | #1
On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
> > @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
> >      return env
> >  
> >  
> > +term_cnt = 0
> > +  
> 
> A bit ugly to initialize this here. Also, it already is initialized
> below.

We need a global so that the signal handler can access it.
Python doesn't have syntax to define a variable without a value.
Or do you suggest term_cnt = None ?

The whole term_cnt dance is super ugly, couldn't think of a cleaner way.
It's really annoying that ksft infra sends 2 terminating signals one
immediately after the other :|

> > +def _ksft_intr(signum, frame):
> > +    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
> > +    # if we don't ignore the second one it will stop us from handling cleanup
> > +    global term_cnt
> > +    term_cnt += 1
> > +    if term_cnt == 1:
> > +        raise KsftTerminate()
> > +    else:
> > +        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
> > +
> > +
> >  def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
> >      cases = cases or []
> >  
> > @@ -205,6 +223,10 @@ KSFT_DISRUPTIVE = True
> >                      cases.append(value)
> >                      break
> >  
> > +    global term_cnt
> > +    term_cnt = 0
> > +    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
> > +
> >      totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
> >  
> >      print("TAP version 13")
> > @@ -229,11 +251,12 @@ KSFT_DISRUPTIVE = True
> >              cnt_key = 'xfail'
> >          except BaseException as e:
> >              stop |= isinstance(e, KeyboardInterrupt)
> > +            stop |= isinstance(e, KsftTerminate)
> >              tb = traceback.format_exc()
> >              for line in tb.strip().split('\n'):
> >                  ksft_pr("Exception|", line)
> >              if stop:
> > -                ksft_pr("Stopping tests due to KeyboardInterrupt.")
> > +                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
> >              KSFT_RESULT = False
> >              cnt_key = 'fail'
> >  
> > @@ -248,6 +271,8 @@ KSFT_DISRUPTIVE = True
> >          if stop:
> >              break
> >  
> > +    signal.signal(signal.SIGTERM, prev_sigterm)
> > +  
> 
> Why is prev_sigterm saved and reassigned as handler here?

Because we ignore every SIGTERM once cnt >= 2, I didn't want to keep our
handler installed, just in case something after ksft_run() hangs.
It should be equivalent to

	signal.signal(signal.SIGTERM, signal.SIG_DFL)

if the prev is of concern. Then again, keeping prev doesn't change #LOC.
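
For readers following the thread, the counting handler and the save/restore of the previous handler can be tried in isolation. This is only a minimal sketch, not the ksft.py code: Terminate, on_sigterm, events and run_demo are hypothetical names, and the two os.kill() calls merely stand in for the two SIGTERMs that runner.sh sends. It assumes a POSIX system and that it runs in the main thread, since Python only allows installing signal handlers there.

```python
import os
import signal


class Terminate(KeyboardInterrupt):
    """Hypothetical stand-in for the patch's KsftTerminate."""


term_cnt = 0
events = []  # records what happened, in place of ksft_pr() output


def on_sigterm(signum, frame):
    # Only the first SIGTERM interrupts the running code; later ones are
    # recorded and ignored so that cleanup can finish.
    global term_cnt
    term_cnt += 1
    if term_cnt == 1:
        raise Terminate()
    events.append(f"ignored SIGTERM #{term_cnt}")


def run_demo():
    global term_cnt
    term_cnt = 0
    events.clear()
    # signal.signal() returns the handler that was installed before.
    prev = signal.signal(signal.SIGTERM, on_sigterm)
    try:
        try:
            # First SIGTERM: simulates the runner's timeout hitting a test.
            os.kill(os.getpid(), signal.SIGTERM)
            events.append("test finished")  # not expected to be reached
        except Terminate:
            events.append("terminated")
        finally:
            # Second SIGTERM: arrives while cleanup is in progress.
            os.kill(os.getpid(), signal.SIGTERM)
            events.append("cleanup done")
    finally:
        # Do not leave the ignoring handler installed after the run.
        signal.signal(signal.SIGTERM, prev)
    return list(events)


if __name__ == "__main__":
    print(run_demo())
```

Restoring prev rather than hard-coding signal.SIG_DFL keeps whatever handler the caller had installed, which matters only if something other than the default was in place before the run.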
Paolo Abeni April 29, 2025, 2:49 p.m. UTC | #2
On 4/29/25 3:27 AM, Willem de Bruijn wrote:
> Reviewed-by: Willem de Bruijn <willemb@google.com>
> 
> Jakub Kicinski wrote:
>> On Sat, 26 Apr 2025 11:15:34 -0400 Willem de Bruijn wrote:
>>>> @@ -193,6 +198,19 @@ KSFT_DISRUPTIVE = True
>>>>      return env
>>>>  
>>>>  
>>>> +term_cnt = 0
>>>> +  
>>>
>>> A bit ugly to initialize this here. Also, it already is initialized
>>> below.
>>
>> We need a global so that the signal handler can access it.
>> Python doesn't have syntax to define a variable without a value.
>> Or do you suggest term_cnt = None ?
> 
> I meant that the "global term_cnt" in ksft_run below already creates
> the global var, and is guaranteed to do so before _ksft_intr, so no
> need to also define it outside a function.
> 
> Obviously not very important, don't mean to ask for a respin. LGTM.

FWIW I think it's better to avoid the unneeded assignment in global
scope, so I would suggest either a follow-up or a v2, whichever is simpler.

Thanks,

Paolo
Jakub Kicinski April 29, 2025, 5:07 p.m. UTC | #3
On Mon, 28 Apr 2025 21:27:32 -0400 Willem de Bruijn wrote:
> > > A bit ugly to initialize this here. Also, it already is initialized
> > > below.  
> > 
> > We need a global so that the signal handler can access it.
> > Python doesn't have syntax to define a variable without a value.
> > Or do you suggest term_cnt = None ?  
> 
> I meant that the "global term_cnt" in ksft_run below already creates
> the global var, and is guaranteed to do so before _ksft_intr, so no
> need to also define it outside a function.
> 
> Obviously not very important, don't mean to ask for a respin. LGTM.

Oh wow, thanks! totally didn't know that using the global is enough
to add something to the global scope.
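
A small sketch of the point discussed above, with hypothetical names (demo_cnt, init_cnt, bump_cnt, demo). Strictly speaking, the global statement only declares which scope the name binds in; it is the assignment that follows it which creates the module-level variable. Reading the name before any assignment has run still raises NameError, which is why the ordering guarantee Willem mentions matters.

```python
def init_cnt():
    # `global` declares the scope; the assignment creates the module-level name.
    global demo_cnt
    demo_cnt = 0


def bump_cnt():
    global demo_cnt
    demo_cnt += 1  # NameError if init_cnt() has not run yet
    return demo_cnt


def demo():
    globals().pop("demo_cnt", None)  # start from a clean module scope
    before = "demo_cnt" in globals()
    try:
        bump_cnt()
        early = "no error"
    except NameError:
        early = "NameError"
    init_cnt()
    after = "demo_cnt" in globals()
    return before, early, after, bump_cnt()


if __name__ == "__main__":
    print(demo())
```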

Patch

diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 3cfad0fd4570..73710634d457 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -3,6 +3,7 @@ 
 import builtins
 import functools
 import inspect
+import signal
 import sys
 import time
 import traceback
@@ -26,6 +27,10 @@  KSFT_DISRUPTIVE = True
     pass
 
 
+class KsftTerminate(KeyboardInterrupt):
+    pass
+
+
 def ksft_pr(*objs, **kwargs):
     print("#", *objs, **kwargs)
 
@@ -193,6 +198,19 @@  KSFT_DISRUPTIVE = True
     return env
 
 
+term_cnt = 0
+
+def _ksft_intr(signum, frame):
+    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
+    # if we don't ignore the second one it will stop us from handling cleanup
+    global term_cnt
+    term_cnt += 1
+    if term_cnt == 1:
+        raise KsftTerminate()
+    else:
+        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
@@ -205,6 +223,10 @@  KSFT_DISRUPTIVE = True
                     cases.append(value)
                     break
 
+    global term_cnt
+    term_cnt = 0
+    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
+
     totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
 
     print("TAP version 13")
@@ -229,11 +251,12 @@  KSFT_DISRUPTIVE = True
             cnt_key = 'xfail'
         except BaseException as e:
             stop |= isinstance(e, KeyboardInterrupt)
+            stop |= isinstance(e, KsftTerminate)
             tb = traceback.format_exc()
             for line in tb.strip().split('\n'):
                 ksft_pr("Exception|", line)
             if stop:
-                ksft_pr("Stopping tests due to KeyboardInterrupt.")
+                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
             KSFT_RESULT = False
             cnt_key = 'fail'
 
@@ -248,6 +271,8 @@  KSFT_DISRUPTIVE = True
         if stop:
             break
 
+    signal.signal(signal.SIGTERM, prev_sigterm)
+
     print(
         f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
     )
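
As a side note on the except block in the patch, the following sketch shows how the exception hierarchy interacts with the stop logic. It is an illustration under assumptions, not the ksft.py code: classify and raiser are hypothetical helpers, and KsftTerminate is redefined here as a stand-in. Because KsftTerminate derives from KeyboardInterrupt, the existing isinstance(e, KeyboardInterrupt) check already matches it, so the extra check added by the patch appears redundant but harmless; type(e).__name__ is what tells the two cases apart in the log line.

```python
class KsftTerminate(KeyboardInterrupt):
    """Stand-in mirroring the class added by the patch."""


def classify(case):
    """Run case() and return (stop, message), loosely following ksft_run()."""
    stop = False
    msg = "ok"
    try:
        case()
    except BaseException as e:
        # KsftTerminate is a KeyboardInterrupt, so this one check matches both.
        stop |= isinstance(e, KeyboardInterrupt)
        if stop:
            msg = f"Stopping tests due to {type(e).__name__}."
        else:
            msg = f"Exception| {type(e).__name__}"
    return stop, msg


def raiser(exc_type):
    """Build a test case that raises the given exception type."""
    def case():
        raise exc_type()
    return case


if __name__ == "__main__":
    for exc in (KsftTerminate, KeyboardInterrupt, ValueError):
        print(classify(raiser(exc)))
```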