Message ID | 20250429170804.2649622-1-kuba@kernel.org |
---|---|
State | Superseded |
Series | [net-next,v2] selftests: net: exit cleanly on SIGTERM / timeout |

Jakub Kicinski wrote:
> ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
> Handle this in a similar way we handle SIGINT - cleanup and stop
> running further tests.
>
> Because we get 2 signals we need a bit of logic to ignore
> the subsequent one, they come immediately one after the other
> (due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
> to runner child")).
>
> This change makes sure we run cleanup (scheduled defer()s)
> and also print a stack trace on SIGTERM, which doesn't happen
> by default. Tests occasionally hang in NIPA and it's impossible
> to tell what they are waiting from or doing.
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Reviewed-by: Willem de Bruijn <willemb@google.com>

On 29/04/2025 18:08, Jakub Kicinski wrote:
> +class KsftTerminate(KeyboardInterrupt):
> +    pass
...
> @@ -229,11 +249,12 @@ KSFT_DISRUPTIVE = True
>              cnt_key = 'xfail'
>          except BaseException as e:
>              stop |= isinstance(e, KeyboardInterrupt)
> +            stop |= isinstance(e, KsftTerminate)

The first isinstance() will return True for a KsftTerminate as it's a
subclass of KeyboardInterrupt, and thus the second line isn't needed.
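
The subclass point can be checked in isolation. The following is a minimal standalone sketch, not part of the patch: `should_stop()` and the `results` dict are hypothetical helpers added for illustration, and only the `KsftTerminate` class definition is taken from the diff.

```python
# Standalone illustration; should_stop() and results are hypothetical
# names that do not exist in ksft.py.
class KsftTerminate(KeyboardInterrupt):
    pass


def should_stop(exc):
    # A single check against the base class also matches every subclass.
    return isinstance(exc, KeyboardInterrupt)


results = {}
for exc_type in (KeyboardInterrupt, KsftTerminate, ValueError):
    try:
        raise exc_type()
    except BaseException as e:
        results[type(e).__name__] = should_stop(e)

print(results)
```

Because the base-class check already covers `KsftTerminate`, OR-ing in a second `isinstance(e, KsftTerminate)` cannot change the value of `stop`.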

On 4/30/25 8:03 PM, Edward Cree wrote:
> On 29/04/2025 18:08, Jakub Kicinski wrote:
>> +class KsftTerminate(KeyboardInterrupt):
>> +    pass
> ...
>> @@ -229,11 +249,12 @@ KSFT_DISRUPTIVE = True
>>              cnt_key = 'xfail'
>>          except BaseException as e:
>>              stop |= isinstance(e, KeyboardInterrupt)
>> +            stop |= isinstance(e, KsftTerminate)
>
> The first isinstance() will return True for a KsftTerminate as it's a
> subclass of KeyboardInterrupt, and thus the second line isn't needed.

@Jakub: I'm using the selftests code to refresh my rather rusty python
skills, I think it would be good to address the above and keep the
codebase clean.

Thanks,

Paolo

On Fri, 2 May 2025 15:05:48 +0200 Paolo Abeni wrote:
> >> @@ -229,11 +249,12 @@ KSFT_DISRUPTIVE = True
> >>              cnt_key = 'xfail'
> >>          except BaseException as e:
> >>              stop |= isinstance(e, KeyboardInterrupt)
> >> +            stop |= isinstance(e, KsftTerminate)
> >
> > The first isinstance() will return True for a KsftTerminate as it's a
> > subclass of KeyboardInterrupt, and thus the second line isn't needed.
>
> @Jakub: I'm using the selftests code to refresh my rather rusty python
> skills, I think it would be good to address the above and keep the
> codebase clean.

Right, right, I was just distracted the last 3 days, wasn't trying
to ignore :) Will respin shortly, good catch indeed.

diff --git a/tools/testing/selftests/net/lib/py/ksft.py b/tools/testing/selftests/net/lib/py/ksft.py
index 3cfad0fd4570..1b815768bf8a 100644
--- a/tools/testing/selftests/net/lib/py/ksft.py
+++ b/tools/testing/selftests/net/lib/py/ksft.py
@@ -3,6 +3,7 @@
 import builtins
 import functools
 import inspect
+import signal
 import sys
 import time
 import traceback
@@ -26,6 +27,10 @@ KSFT_DISRUPTIVE = True
     pass
 
 
+class KsftTerminate(KeyboardInterrupt):
+    pass
+
+
 def ksft_pr(*objs, **kwargs):
     print("#", *objs, **kwargs)
 
@@ -193,6 +198,17 @@ KSFT_DISRUPTIVE = True
     return env
 
 
+def _ksft_intr(signum, frame):
+    # ksft runner.sh sends 2 SIGTERMs in a row on a timeout
+    # if we don't ignore the second one it will stop us from handling cleanup
+    global term_cnt
+    term_cnt += 1
+    if term_cnt == 1:
+        raise KsftTerminate()
+    else:
+        ksft_pr(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")
+
+
 def ksft_run(cases=None, globs=None, case_pfx=None, args=()):
     cases = cases or []
 
@@ -205,6 +221,10 @@ KSFT_DISRUPTIVE = True
                     cases.append(value)
                     break
 
+    global term_cnt
+    term_cnt = 0
+    prev_sigterm = signal.signal(signal.SIGTERM, _ksft_intr)
+
     totals = {"pass": 0, "fail": 0, "skip": 0, "xfail": 0}
 
     print("TAP version 13")
@@ -229,11 +249,12 @@ KSFT_DISRUPTIVE = True
             cnt_key = 'xfail'
         except BaseException as e:
             stop |= isinstance(e, KeyboardInterrupt)
+            stop |= isinstance(e, KsftTerminate)
             tb = traceback.format_exc()
             for line in tb.strip().split('\n'):
                 ksft_pr("Exception|", line)
             if stop:
-                ksft_pr("Stopping tests due to KeyboardInterrupt.")
+                ksft_pr(f"Stopping tests due to {type(e).__name__}.")
             KSFT_RESULT = False
             cnt_key = 'fail'
 
@@ -248,6 +269,8 @@ KSFT_DISRUPTIVE = True
         if stop:
             break
 
+    signal.signal(signal.SIGTERM, prev_sigterm)
+
     print(
         f"# Totals: pass:{totals['pass']} fail:{totals['fail']} xfail:{totals['xfail']} xpass:0 skip:{totals['skip']} error:0"
     )

ksft runner sends 2 SIGTERMs in a row if a test runs out of time.
Handle this in a similar way we handle SIGINT - cleanup and stop
running further tests.

Because we get 2 signals we need a bit of logic to ignore
the subsequent one, they come immediately one after the other
(due to commit 9616cb34b08e ("kselftest/runner.sh: Propagate SIGTERM
to runner child")).

This change makes sure we run cleanup (scheduled defer()s)
and also print a stack trace on SIGTERM, which doesn't happen
by default. Tests occasionally hang in NIPA and it's impossible
to tell what they are waiting from or doing.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
v2:
 - remove declaration at the global scope
v1: https://lore.kernel.org/20250425151757.1652517-1-kuba@kernel.org

CC: petrm@nvidia.com
CC: willemb@google.com
CC: sdf@fomichev.me
CC: linux-kselftest@vger.kernel.org
---
 tools/testing/selftests/net/lib/py/ksft.py | 25 +++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)
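
For readers who want to try the "raise on the first SIGTERM, ignore the second" idea outside the kernel tree, here is a rough standalone sketch. It is not the ksft.py code. `run_with_timeout_handling()`, `messages`, and `cleaned_up` are hypothetical names, the message is collected in a list rather than printed via `ksft_pr()`, and `signal.raise_signal()` (Python 3.8+) stands in for the two signals that runner.sh would send from outside the process.

```python
import signal


class KsftTerminate(KeyboardInterrupt):
    """Raised from the SIGTERM handler so that cleanup code gets to run."""


term_cnt = 0
messages = []  # hypothetical stand-in for ksft_pr() output


def _on_sigterm(signum, frame):
    # Raise only on the first SIGTERM; later ones are recorded and ignored
    # so they cannot interrupt cleanup that is already in progress.
    global term_cnt
    term_cnt += 1
    if term_cnt == 1:
        raise KsftTerminate()
    messages.append(f"Ignoring SIGTERM (cnt: {term_cnt}), already exiting...")


def run_with_timeout_handling():
    global term_cnt
    term_cnt = 0
    done = False
    prev = signal.signal(signal.SIGTERM, _on_sigterm)
    try:
        try:
            # Assumption: simulate the runner's first SIGTERM from inside
            # the process instead of waiting for a real timeout.
            signal.raise_signal(signal.SIGTERM)
        except KsftTerminate:
            # The second SIGTERM arrives while cleanup is under way.
            signal.raise_signal(signal.SIGTERM)
            done = True  # cleanup ran to completion
    finally:
        # Restore whatever handler was installed before, as the patch does.
        signal.signal(signal.SIGTERM, prev)
    return done


cleaned_up = run_with_timeout_handling()
```

In this sketch the second signal only adds an entry to `messages`, so the cleanup path after it still completes. This is the behaviour the commit message describes for scheduled defer()s.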