diff mbox series

[bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected()

Message ID 20210519204132.107247-1-xiyou.wangcong@gmail.com
State New
Headers show
Series [bpf] selftests/bpf: Retry for EAGAIN in udp_redir_to_connected() | expand

Commit Message

Cong Wang May 19, 2021, 8:41 p.m. UTC
From: Cong Wang <cong.wang@bytedance.com>

We use non-blocking sockets for testing sockmap redirections,
and got some random EAGAIN errors from UDP tests.

There is no guarantee the packet would be immediately available
to receive as soon as it is sent out, even on the local host.
For UDP, this is especially true because it does not lock the
sock during BH (unlike the TCP path). This is probably why we
only saw this error in UDP cases.

No matter how hard we try to make the queue empty check accurate,
it is always possible for recvmsg() to beat ->sk_data_ready().
Therefore, we should just retry in case of EAGAIN.

Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
Reported-by: Jiang Wang <jiang.wang@bytedance.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Sitnicki <jakub@cloudflare.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
---
 tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Cong Wang May 20, 2021, 8 p.m. UTC | #1
On Thu, May 20, 2021 at 10:47 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > >
> > > Cong Wang wrote:
> > > > From: Cong Wang <cong.wang@bytedance.com>
> > > >
> > > > We use non-blocking sockets for testing sockmap redirections,
> > > > and got some random EAGAIN errors from UDP tests.
> > > >
> > > > There is no guarantee the packet would be immediately available
> > > > to receive as soon as it is sent out, even on the local host.
> > > > For UDP, this is especially true because it does not lock the
> > > > sock during BH (unlike the TCP path). This is probably why we
> > > > only saw this error in UDP cases.
> > > >
> > > > No matter how hard we try to make the queue empty check accurate,
> > > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > > Therefore, we should just retry in case of EAGAIN.
> > > >
> > > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > > ---
> > > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > index 648d9ae898d2..b1ed182c4720 100644
> > > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > > >       if (pass != 1)
> > > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > > >
> > > > +again:
> > > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > > -     if (n < 0)
> > > > +     if (n < 0) {
> > > > +             if (errno == EAGAIN)
> > > > +                     goto again;
> > > >               FAIL_ERRNO("%s: read", log_prefix);
> > >
> > > Needs a counter and abort logic we don't want to loop forever in the
> > > case the packet is lost.
> >
> > It should not be lost because selftests must be self-contained,
> > if the selftests could not even predict whether its own packet is
> > lost or not, we would have a much bigger trouble than just this
> > infinite loop.
> >
> > Thanks.
>
> Add the retry counter its maybe 4 lines of code. When I run this in a container

Sure, then the next question is how many times are you going to retry?
Let's say, 10. Then the next question would be is 10 sufficient? Clearly
not, because if the packet can be dropped (let's say by firewall), it can
be delayed too (let's say by netem).

Really, are we going to handle all of such cases? Or if we simply
assume the environment is self-contained and not hostile, none of
the above would happen hence nothing needs to be checked.

> and my memcg kicks in for some unknown reason I don't want it to loop
> forever I want it to fail gracefully. Plus it just looks bad to encode
> a non-terminating loop, in my opinion, so I want the counter.

There could be hundreds of reasons to drop a packet, or delay a
packet. How many of them do you want to consider and how many
of them do you not consider? Please draw a boundary.

The boundary I drew is very clear: we just assume the selftests
environment is not hostile. In fact, I believe selftests should setup
such an environment before running any test, for example, by creating
a container. And I believe this would make everyone happy.

Thanks.
Andrii Nakryiko May 21, 2021, 5:12 a.m. UTC | #2
On Thu, May 20, 2021 at 1:44 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>

> On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko

> <andrii.nakryiko@gmail.com> wrote:

> >

> > Bugs do happen though, so if you can detect some error condition

> > instead of having an infinite loop, then do it.

>

> You both are underestimating the problem. There are two different things

> to consider here:

>

> 1) Kernel bugs: This is known unknown, we certainly do not know

> how many bugs we have, otherwise they would have been fixed

> already. So we can not predict the consequence of the bug either,

> assuming a bug could only cause packet drop is underestimated.

>

> 2) Configurations: For instance, firewall rules. If the selftests are run

> in a weird firewall setup which drops all UDP packets, there is nothing

> we can do in the test itself. If we have to detect this, then we would

> have to detect netem cases too where packets can be held indefinitely

> or reordered arbitrarily. The possibilities here are too many to detect,

> hence I argue the selftests should setup its own non-hostile environment,

> which has nothing to do with any specific program.

>

> This is why I ask you to draw a boundary: what we can assume and

> what we can't. My boundary is obviously clear: we just assume the

> environment is non-hostile and we can't predict any kernel bugs,

> nor their consequences.


It doesn't matter. Instead of having an endless loop, please add a
counter and limit the number of iterations to some reasonable number.
That's all, no one is asking you to do something impossible, just
prevent looping forever in some unforeseen situations and just fail
the test instead.

>

> Thanks.
Cong Wang May 21, 2021, 9:49 p.m. UTC | #3
On Thu, May 20, 2021 at 10:12 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
> It doesn't matter. Instead of having an endless loop, please add a

> counter and limit the number of iterations to some reasonable number.

> That's all, no one is asking you to do something impossible, just

> prevent looping forever in some unforeseen situations and just fail

> the test instead.


Well, in a non-hostile environment, the packet will not be dropped
as it is sent via a loopback, hence the loop is not endless.

"my memcg kicks in for some unknown reason" is pretty much
a hostile environment, for which we should not consider. The counter
argument is simply not to run selftests in such a hostile environment.

Thanks.
Andrii Nakryiko May 22, 2021, 12:12 a.m. UTC | #4
On Fri, May 21, 2021 at 12:31 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, May 20, 2021 at 10:47 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > On Wed, May 19, 2021 at 2:56 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > >
> > > > Cong Wang wrote:
> > > > > From: Cong Wang <cong.wang@bytedance.com>
> > > > >
> > > > > We use non-blocking sockets for testing sockmap redirections,
> > > > > and got some random EAGAIN errors from UDP tests.
> > > > >
> > > > > There is no guarantee the packet would be immediately available
> > > > > to receive as soon as it is sent out, even on the local host.
> > > > > For UDP, this is especially true because it does not lock the
> > > > > sock during BH (unlike the TCP path). This is probably why we
> > > > > only saw this error in UDP cases.
> > > > >
> > > > > No matter how hard we try to make the queue empty check accurate,
> > > > > it is always possible for recvmsg() to beat ->sk_data_ready().
> > > > > Therefore, we should just retry in case of EAGAIN.
> > > > >
> > > > > Fixes: d6378af615275 ("selftests/bpf: Add a test case for udp sockmap")
> > > > > Reported-by: Jiang Wang <jiang.wang@bytedance.com>
> > > > > Cc: John Fastabend <john.fastabend@gmail.com>
> > > > > Cc: Daniel Borkmann <daniel@iogearbox.net>
> > > > > Cc: Jakub Sitnicki <jakub@cloudflare.com>
> > > > > Cc: Lorenz Bauer <lmb@cloudflare.com>
> > > > > Signed-off-by: Cong Wang <cong.wang@bytedance.com>
> > > > > ---
> > > > >  tools/testing/selftests/bpf/prog_tests/sockmap_listen.c | 6 +++++-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > index 648d9ae898d2..b1ed182c4720 100644
> > > > > --- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
> > > > > @@ -1686,9 +1686,13 @@ static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
> > > > >       if (pass != 1)
> > > > >               FAIL("%s: want pass count 1, have %d", log_prefix, pass);
> > > > >
> > > > > +again:
> > > > >       n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
> > > > > -     if (n < 0)
> > > > > +     if (n < 0) {
> > > > > +             if (errno == EAGAIN)
> > > > > +                     goto again;
> > > > >               FAIL_ERRNO("%s: read", log_prefix);
> > > >
> > > > Needs a counter and abort logic we don't want to loop forever in the
> > > > case the packet is lost.
> > >
> > > It should not be lost because selftests must be self-contained,
> > > if the selftests could not even predict whether its own packet is
> > > lost or not, we would have a much bigger trouble than just this
> > > infinite loop.
> > >
> > > Thanks.
> >
> > Add the retry counter its maybe 4 lines of code. When I run this in a container
>
> Sure, then the next question is how many times are you going to retry?
> Let's say, 10. Then the next question would be is 10 sufficient? Clearly
> not, because if the packet can be dropped (let's say by firewall), it can
> be delayed too (let's say by netem).

10 is fine. If something unexpected happens (whatever hostility of the
environment), getting stuck is much-much worse than erroring out. So
please add the counter and be done with it.

>
> Really, are we going to handle all of such cases? Or if we simply
> assume the environment is self-contained and not hostile, none of
> the above would happen hence nothing needs to be checked.

We have many different environments in which selftests are running. We
shouldn't have an infinite loop in any of them, even if some selftests
can't succeed in some of them. Not in all environments it is possible
to do Ctrl-C (e.g., CIs).

>
> > and my memcg kicks in for some unknown reason I don't want it to loop
> > forever I want it to fail gracefully. Plus it just looks bad to encode
> > a non-terminating loop, in my opinion, so I want the counter.
>
> There could be hundreds of reasons to drop a packet, or delay a
> packet. How many of them do you want to consider and how many
> of them do you not consider? Please draw a boundary.
>
> The boundary I drew is very clear: we just assume the selftests

See above, the minimum bar is that selftest shouldn't get stuck, no matter what.

> environment is not hostile. In fact, I believe selftests should setup
> such an environment before running any test, for example, by creating
> a container. And I believe this would make everyone happy.
>
> Thanks.
Jakub Sitnicki June 3, 2021, 8:10 a.m. UTC | #5
On Thu, May 20, 2021 at 10:44 PM CEST, Cong Wang wrote:
> On Thu, May 20, 2021 at 1:14 PM Andrii Nakryiko

> <andrii.nakryiko@gmail.com> wrote:

>>

>> Bugs do happen though, so if you can detect some error condition

>> instead of having an infinite loop, then do it.

>

> You both are underestimating the problem. There are two different things

> to consider here:

>

> 1) Kernel bugs: This is known unknown, we certainly do not know

> how many bugs we have, otherwise they would have been fixed

> already. So we can not predict the consequence of the bug either,

> assuming a bug could only cause packet drop is underestimated.

>

> 2) Configurations: For instance, firewall rules. If the selftests are run

> in a weird firewall setup which drops all UDP packets, there is nothing

> we can do in the test itself. If we have to detect this, then we would

> have to detect netem cases too where packets can be held indefinitely

> or reordered arbitrarily. The possibilities here are too many to detect,

> hence I argue the selftests should setup its own non-hostile environment,

> which has nothing to do with any specific program.

>

> This is why I ask you to draw a boundary: what we can assume and

> what we can't. My boundary is obviously clear: we just assume the

> environment is non-hostile and we can't predict any kernel bugs,

> nor their consequences.

>

> Thanks.


(Sorry for the delay in reviews. I've been out.)

In my mind uAPI tests should not be tailored to the underlying
implementation (non-blocking read after write over loopback succeeds for
TCP), or the environment they run in (packets don't get dropped due to
OOM, signals don't interrupt syscalls).

If it's a non-blocking socket, then EAGAIN can happen. That's the
contract between the kernel and the user-space.

There is already a helper in this test case for polling and reading with
a timeout (see recv_timeout()). IMO we should be using it in all tests
that use non-blocking I/O.

If it's not being used already, that is most likely my fault.
diff mbox series

Patch

diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
index 648d9ae898d2..b1ed182c4720 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_listen.c
@@ -1686,9 +1686,13 @@  static void udp_redir_to_connected(int family, int sotype, int sock_mapfd,
 	if (pass != 1)
 		FAIL("%s: want pass count 1, have %d", log_prefix, pass);
 
+again:
 	n = read(mode == REDIR_INGRESS ? p0 : c0, &b, 1);
-	if (n < 0)
+	if (n < 0) {
+		if (errno == EAGAIN)
+			goto again;
 		FAIL_ERRNO("%s: read", log_prefix);
+	}
 	if (n == 0)
 		FAIL("%s: incomplete read", log_prefix);