[net-next,0/3] selftests: openvswitch: Address some flakes in the CI environment

Message ID 20240702132830.213384-1-aconole@redhat.com
Series selftests: openvswitch: Address some flakes in the CI environment

Message

Aaron Conole July 2, 2024, 1:28 p.m. UTC
These patches aim to make using the openvswitch testsuite more reliable.
These should address the major sources of flakiness in the openvswitch
test suite allowing the CI infrastructure to exercise the openvswitch
module for patch series.  There should be no change for users who simply
run the tests (except that patch 3/3 does make some of the debugging a bit
easier by making some output more verbose).

Aaron Conole (3):
  selftests: openvswitch: Bump timeout to 15 minutes.
  selftests: openvswitch: Attempt to autoload module.
  selftests: openvswitch: Be more verbose with selftest debugging.

 .../selftests/net/openvswitch/openvswitch.sh  | 23 ++++++++++++-------
 .../selftests/net/openvswitch/settings        |  1 +
 2 files changed, 16 insertions(+), 8 deletions(-)
 create mode 100644 tools/testing/selftests/net/openvswitch/settings
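Patch 2/3 tries to load the openvswitch module automatically before the tests run. The patch itself changes openvswitch.sh and its exact logic is not reproduced in this thread, so the following is only a minimal Python sketch of the general idea. It assumes that a `/sys/module/openvswitch` directory means the module is available and that `modprobe` is on the PATH. The helper name `ensure_module` is hypothetical.

```python
import os
import subprocess
from typing import Callable


def ensure_module(
    name: str,
    sysfs_root: str = "/sys/module",
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True if kernel module `name` appears to be available.

    Hypothetical helper: checks sysfs first, then falls back to modprobe.
    A module built into the kernel may have no sysfs directory, in which
    case modprobe's exit status decides the result.
    """
    if os.path.isdir(os.path.join(sysfs_root, name)):
        return True  # already loaded (or built in and visible in sysfs)
    try:
        result = run(["modprobe", "-q", name], check=False)
    except OSError:
        return False  # modprobe missing or not executable here
    return result.returncode == 0
```

A caller would typically treat `False` as a reason to skip rather than fail the test (kselftest uses exit code 4 for a skip), so machines without the module do not report spurious failures.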

Comments

Simon Horman July 3, 2024, 4:55 p.m. UTC | #1
On Tue, Jul 02, 2024 at 09:28:28AM -0400, Aaron Conole wrote:
> We found that since some tests rely on the TCP SYN timeouts to cause flow
> misses, the default test suite timeout of 45 seconds is quick to be
> exceeded.  Bump the timeout to 15 minutes.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Simon Horman <horms@kernel.org>

FWIIW, locally I had been using a timeout of 720s.
So 900 seems entirely reasonable to me.
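The kselftest runner reads an optional per-directory `settings` file, and the one added by patch 1/3 presumably holds a single `timeout=900` line (15 minutes, the value discussed above). To see why the 45-second default is easy to exceed when a test waits on SYN timeouts, here is a rough Python calculation. It assumes the Linux defaults of `net.ipv4.tcp_syn_retries = 6` and a 1-second initial retransmission timeout that doubles on each retry; the selftests may not rely on these exact values, and the helper names are illustrative.

```python
def syn_give_up_seconds(syn_retries: int = 6, initial_rto: float = 1.0) -> float:
    """Approximate time until connect() gives up on an unanswered SYN.

    The SYN is sent at t=0 and retransmitted `syn_retries` times; each wait
    doubles the previous one, and after the last retransmission the stack
    waits one more (doubled) interval before failing the connection.
    """
    return sum(initial_rto * 2**i for i in range(syn_retries + 1))


def read_timeout(settings_text: str, default: int = 45) -> int:
    """Return the timeout=<seconds> value from kselftest-style settings text."""
    for line in settings_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "timeout":
            return int(value.strip())
    return default


per_attempt = syn_give_up_seconds()  # 1+2+4+8+16+32+64 seconds
print(per_attempt, per_attempt > read_timeout(""), read_timeout("timeout=900\n"))
# prints: 127.0 True 900
```

Under these assumptions, a test that waits for even one connection attempt to time out completely already outlasts the old 45-second limit, while 900 seconds leaves room for several such waits plus setup.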

Simon Horman July 3, 2024, 4:55 p.m. UTC | #2
On Tue, Jul 02, 2024 at 09:28:30AM -0400, Aaron Conole wrote:
> The openvswitch selftest is difficult to debug for anyone that isn't
> directly familiar with the openvswitch module and the specifics of the
> test cases.  Many times when something fails, the debug log will be
> sparsely populated and it takes some time to understand where a failure
> occurred.
> 
> Increase the amount of details logged to the debug log by trapping all
> 'info' logs, and all 'ovs_sbx' commands.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>

Reviewed-by: Simon Horman <horms@kernel.org>
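Patch 3/3 makes the debug log more useful by recording every 'info' message and every 'ovs_sbx' command. The real change lives in the shell script; the sketch below only illustrates the pattern in Python, assuming each command line, its output, and its exit status are appended to an open log file. `run_logged` is a hypothetical name, and unlike the sandbox helper it does not enter a network namespace.

```python
import shlex
import subprocess
from typing import Sequence, TextIO


def run_logged(cmd: Sequence[str], log: TextIO) -> int:
    """Run `cmd`, recording the command, its output and its status in `log`."""
    log.write("+ " + shlex.join(cmd) + "\n")         # what was run
    proc = subprocess.run(list(cmd), capture_output=True, text=True)
    log.write(proc.stdout)                           # what it printed
    log.write(proc.stderr)
    log.write(f"# exit status {proc.returncode}\n")  # how it ended
    return proc.returncode
```

With every step logged this way, someone unfamiliar with the test cases can find the last command that ran before a failure without rerunning the suite locally.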

Jakub Kicinski July 5, 2024, 1:28 p.m. UTC | #3
On Tue,  2 Jul 2024 09:28:27 -0400 Aaron Conole wrote:
> These patches aim to make using the openvswitch testsuite more reliable.
> These should address the major sources of flakiness in the openvswitch
> test suite allowing the CI infrastructure to exercise the openvswitch
> module for patch series.  There should be no change for users who simply
> run the tests (except that patch 3/3 does make some of the debugging a bit
> easier by making some output more verbose).

Hi Aaron!

The results look solid on normal builds now, but with a debug kernel
the test is failing consistently:

https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh

Aaron Conole July 5, 2024, 1:49 p.m. UTC | #4
Jakub Kicinski <kuba@kernel.org> writes:

> On Tue,  2 Jul 2024 09:28:27 -0400 Aaron Conole wrote:
>> These patches aim to make using the openvswitch testsuite more reliable.
>> These should address the major sources of flakiness in the openvswitch
>> test suite allowing the CI infrastructure to exercise the openvswitch
>> module for patch series.  There should be no change for users who simply
>> run the tests (except that patch 3/3 does make some of the debugging a bit
>> easier by making some output more verbose).
>
> Hi Aaron!
>
> The results look solid on normal builds now, but with a debug kernel
> the test is failing consistently:
>
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh

Yes - it shows a test case issue with the upcall and psample tests.

Adrian and I discussed the correct approach would be using a wait_for
instead of just sleeping, because it seems the dbg environment might be
too racy.  I think he is working on a follow up to submit after the
psample work gets merged - we were hoping not to hold that patch series
up with more potential conflicts or merge issues if that's okay.
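The approach described above replaces fixed sleeps with polling. Rather than sleeping for a set time and assuming the expected result is ready (which a slow debug kernel may not satisfy), the test repeatedly checks for it until a deadline passes. A minimal Python sketch of such a helper follows; the name mirrors the wait_for mentioned in the thread, but its signature and defaults here are assumptions, not the selftest's own helper.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float = 10.0,
             interval: float = 0.1) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the condition holds, so fast machines do not
    pay the full timeout, and False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The design choice is that the timeout becomes an upper bound rather than a fixed cost: a debug kernel gets the full window, while a normal build returns as soon as the condition holds. A test would fail only when `wait_for` returns False.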

Jakub Kicinski July 5, 2024, 1:53 p.m. UTC | #5
On Fri, 05 Jul 2024 09:49:12 -0400 Aaron Conole wrote:
> > The results look solid on normal builds now, but with a debug kernel
> > the test is failing consistently:
> >
> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh  
> 
> Yes - it shows a test case issue with the upcall and psample tests.
> 
> Adrian and I discussed the correct approach would be using a wait_for
> instead of just sleeping, because it seems the dbg environment might be
> too racy.  I think he is working on a follow up to submit after the
> psample work gets merged - we were hoping not to hold that patch series
> up with more potential conflicts or merge issues if that's okay.

Makes sense, thanks!

Adrián Moreno July 5, 2024, 2:01 p.m. UTC | #6
On Fri, Jul 05, 2024 at 09:49:12AM GMT, Aaron Conole wrote:
> Jakub Kicinski <kuba@kernel.org> writes:
>
> > On Tue,  2 Jul 2024 09:28:27 -0400 Aaron Conole wrote:
> >> These patches aim to make using the openvswitch testsuite more reliable.
> >> These should address the major sources of flakiness in the openvswitch
> >> test suite allowing the CI infrastructure to exercise the openvswitch
> >> module for patch series.  There should be no change for users who simply
> >> run the tests (except that patch 3/3 does make some of the debugging a bit
> >> easier by making some output more verbose).
> >
> > Hi Aaron!
> >
> > The results look solid on normal builds now, but with a debug kernel
> > the test is failing consistently:
> >
> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-net-dbg&test=openvswitch-sh
>
> Yes - it shows a test case issue with the upcall and psample tests.
>
> Adrian and I discussed the correct approach would be using a wait_for
> instead of just sleeping, because it seems the dbg environment might be
> too racy.  I think he is working on a follow up to submit after the
> psample work gets merged - we were hoping not to hold that patch series
> up with more potential conflicts or merge issues if that's okay.
>

Yes. I am working on a patch to solve the failures in slow systems.

Thanks.
Adrián