diff mbox series

[net,10/10] selftests: mptcp: explicitly trigger the listener diag code-path

Message ID 20240223-upstream-net-20240223-misc-fixes-v1-10-162e87e48497@kernel.org
State New
Headers show
Series mptcp: more misc. fixes for v6.8 | expand

Commit Message

Matthieu Baerts (NGI0) Feb. 23, 2024, 4:14 p.m. UTC
From: Paolo Abeni <pabeni@redhat.com>

The mptcp diag interface already experienced a few locking bugs
that lockdep and appropriate coverage have detected in advance.

Let's add a test-case triggering the relevant code path, to prevent
similar issues in the future.

Be careful to cope with very slow environments.

Note that we don't need an explicit timeout on the mptcp_connect
subprocess to cope with eventual bug/hang-up as the final cleanup
terminating the child processes will take care of that.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
---
 tools/testing/selftests/net/mptcp/diag.sh | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski March 1, 2024, 2:37 p.m. UTC | #1
On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> 
> The mptcp diag interface already experienced a few locking bugs
> that lockdep and appropriate coverage have detected in advance.
> 
> Let's add a test-case triggering the relevant code path, to prevent
> similar issues in the future.
> 
> Be careful to cope with very slow environments.
> 
> Note that we don't need an explicit timeout on the mptcp_connect
> subprocess to cope with eventual bug/hang-up as the final cleanup
> terminating the child processes will take care of that.

Hi!

There's a failure in CI under debug after merging net and net-next 
in diag.sh. Maybe because of the patch which lowered timeout?
https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
Matthieu Baerts (NGI0) March 1, 2024, 3:10 p.m. UTC | #2
Hi Jakub,

On 01/03/2024 15:37, Jakub Kicinski wrote:
> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>> From: Paolo Abeni <pabeni@redhat.com>
>>
>> The mptcp diag interface already experienced a few locking bugs
>> that lockdep and appropriate coverage have detected in advance.
>>
>> Let's add a test-case triggering the relevant code path, to prevent
>> similar issues in the future.
>>
>> Be careful to cope with very slow environments.
>>
>> Note that we don't need an explicit timeout on the mptcp_connect
>> subprocess to cope with eventual bug/hang-up as the final cleanup
>> terminating the child processes will take care of that.
> 
> Hi!
> 
> There's a failure in CI under debug after merging net and net-next 
> in diag.sh. Maybe because of the patch which lowered timeout?
> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/

Thank you for this message!

I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
do on top of the debug kconfig, but I see I can reproduce it on slower
environments. Indeed, it looks like it can be caused by that
modification. I will send a fix ASAP!

Cheers,
Matt
Matthieu Baerts (NGI0) March 1, 2024, 6:24 p.m. UTC | #3
Hi Jakub,

On 01/03/2024 16:10, Matthieu Baerts wrote:
> On 01/03/2024 15:37, Jakub Kicinski wrote:
>> On Fri, 23 Feb 2024 17:14:20 +0100 Matthieu Baerts (NGI0) wrote:
>>> From: Paolo Abeni <pabeni@redhat.com>
>>>
>>> The mptcp diag interface already experienced a few locking bugs
>>> that lockdep and appropriate coverage have detected in advance.
>>>
>>> Let's add a test-case triggering the relevant code path, to prevent
>>> similar issues in the future.
>>>
>>> Be careful to cope with very slow environments.
>>>
>>> Note that we don't need an explicit timeout on the mptcp_connect
>>> subprocess to cope with eventual bug/hang-up as the final cleanup
>>> terminating the child processes will take care of that.
>>
>> Hi!
>>
>> There's a failure in CI under debug after merging net and net-next 
>> in diag.sh. Maybe because of the patch which lowered timeout?
>> https://lore.kernel.org/all/20240223-upstream-net-next-20240223-misc-improvements-v1-8-b6c8a10396bd@kernel.org/
> 
> Thank you for this message!
> 
> I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> do on top of the debug kconfig, but I see I can reproduce it on slower
> environments. Indeed, it looks like it can be caused by that
> modification. I will send a fix ASAP!

The following patch fixes the issue on my side (when using 'renice 20'
and stress-ng in parallel :) ):

https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

I queued it for -net, but the issue should only be visible in net-next
due to the reduction of the timeout, as you suggested. This patch can
also be applied in net-next if preferred, it will not cause any
conflicts between net and net-next.

Cheers,
Matt
Jakub Kicinski March 1, 2024, 6:55 p.m. UTC | #4
On Fri, 1 Mar 2024 19:24:53 +0100 Matthieu Baerts wrote:
> > Thank you for this message!
> > 
> > I didn't have this error on my side, even without '-d SLUB_DEBUG_ON' we
> > do on top of the debug kconfig, but I see I can reproduce it on slower
> > environments. Indeed, it looks like it can be caused by that
> > modification. I will send a fix ASAP!  
> 
> The following patch fixes the issue on my side (when using 'renice 20'
> and stress-ng in parallel :) ):
> 
> https://lore.kernel.org/netdev/20240301-upstream-net-20240301-selftests-mptcp-diag-exit-timeout-v1-2-07cb2fa15c06@kernel.org/T/

Nice! Note only a fix but also lowers runtime, if I'm reading right!

> I queued it for -net, but the issue should only be visible in net-next
> due to the reduction of the timeout, as you suggested. This patch can
> also be applied in net-next if preferred, it will not cause any
> conflicts between net and net-next.

No strong preference, net sounds good.

Thanks for a quick turnaround!
diff mbox series

Patch

diff --git a/tools/testing/selftests/net/mptcp/diag.sh b/tools/testing/selftests/net/mptcp/diag.sh
index 0a58ebb8b04c..f300f4e1eb59 100755
--- a/tools/testing/selftests/net/mptcp/diag.sh
+++ b/tools/testing/selftests/net/mptcp/diag.sh
@@ -20,7 +20,7 @@  flush_pids()
 
 	ip netns pids "${ns}" | xargs --no-run-if-empty kill -SIGUSR1 &>/dev/null
 
-	for _ in $(seq 10); do
+	for _ in $(seq $((timeout_poll * 10))); do
 		[ -z "$(ip netns pids "${ns}")" ] && break
 		sleep 0.1
 	done
@@ -91,6 +91,15 @@  chk_msk_nr()
 	__chk_msk_nr "grep -c token:" "$@"
 }
 
+chk_listener_nr()
+{
+	local expected=$1
+	local msg="$2"
+
+	__chk_nr "ss -inmlHMON $ns | wc -l" "$expected" "$msg - mptcp" 0
+	__chk_nr "ss -inmlHtON $ns | wc -l" "$expected" "$msg - subflows"
+}
+
 wait_msk_nr()
 {
 	local condition="grep -c token:"
@@ -289,5 +298,24 @@  flush_pids
 chk_msk_inuse 0 "many->0"
 chk_msk_cestab 0 "many->0"
 
+chk_listener_nr 0 "no listener sockets"
+NR_SERVERS=100
+for I in $(seq 1 $NR_SERVERS); do
+	ip netns exec $ns ./mptcp_connect -p $((I + 20001)) \
+		-t ${timeout_poll} -l 0.0.0.0 >/dev/null 2>&1 &
+done
+
+for I in $(seq 1 $NR_SERVERS); do
+	mptcp_lib_wait_local_port_listen $ns $((I + 20001))
+done
+
+chk_listener_nr $NR_SERVERS "many listener sockets"
+
+# graceful termination
+for I in $(seq 1 $NR_SERVERS); do
+	echo a | ip netns exec $ns ./mptcp_connect -p $((I + 20001)) 127.0.0.1 >/dev/null 2>&1 &
+done
+flush_pids
+
 mptcp_lib_result_print_all_tap
 exit $ret