
[bpf-next,4/8] bpf: add PROG_TEST_RUN support for sk_lookup programs

Message ID 20210216105713.45052-5-lmb@cloudflare.com
State New
Series PROG_TEST_RUN support for sk_lookup programs

Commit Message

Lorenz Bauer Feb. 16, 2021, 10:57 a.m. UTC
Allow passing (multiple) sk_lookup programs to PROG_TEST_RUN.
User space provides the full bpf_sk_lookup struct as context.
Since the context includes a socket pointer that can't be exposed
to user space, we define that PROG_TEST_RUN returns the cookie
of the selected socket, or zero, in place of the socket pointer.

We don't support testing programs that select a reuseport socket,
since this would mean running another (unrelated) BPF program
from the sk_lookup test handler.
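
For illustration, a minimal sketch of driving this from user space,
assuming libbpf's bpf_prog_test_run_opts() wrapper (field values are
made up, error handling elided):

#include <sys/socket.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <linux/bpf.h>
#include <bpf/bpf.h>

static __u64 test_run_sk_lookup(int prog_fd)
{
	struct bpf_sk_lookup ctx = {
		.family      = AF_INET,
		.protocol    = IPPROTO_TCP,
		.local_ip4   = htonl(INADDR_LOOPBACK),
		.local_port  = 8080,          /* host byte order */
		.remote_ip4  = htonl(INADDR_LOOPBACK),
		.remote_port = htons(60123),  /* network byte order */
	};
	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
		.ctx_in = &ctx,
		.ctx_size_in = sizeof(ctx),
		.ctx_out = &ctx,
		.ctx_size_out = sizeof(ctx),
	);

	if (bpf_prog_test_run_opts(prog_fd, &opts))
		return 0;

	/* opts.retval holds the verdict (SK_PASS/SK_DROP); ctx.cookie
	 * is non-zero iff the program selected a socket.
	 */
	return ctx.cookie;
}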

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/linux/bpf.h            | 10 ++++
 include/uapi/linux/bpf.h       |  5 +-
 net/bpf/test_run.c             | 93 ++++++++++++++++++++++++++++++++++
 net/core/filter.c              |  1 +
 tools/include/uapi/linux/bpf.h |  5 +-
 5 files changed, 112 insertions(+), 2 deletions(-)

Comments

Alexei Starovoitov Feb. 23, 2021, 1:11 a.m. UTC | #1
On Tue, Feb 16, 2021 at 10:57:09AM +0000, Lorenz Bauer wrote:
> +	user_ctx = bpf_ctx_init(kattr, sizeof(*user_ctx));
> +	if (IS_ERR(user_ctx))
> +		return PTR_ERR(user_ctx);
> +
> +	if (!user_ctx)
> +		return -EINVAL;
> +
> +	if (user_ctx->sk)
> +		goto out;
> +
> +	if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx)))
> +		goto out;
> +
> +	if (user_ctx->local_port > U16_MAX || user_ctx->remote_port > U16_MAX) {
> +		ret = -ERANGE;
> +		goto out;
> +	}
> +
> +	ctx.family = user_ctx->family;
> +	ctx.protocol = user_ctx->protocol;
> +	ctx.dport = user_ctx->local_port;
> +	ctx.sport = user_ctx->remote_port;
> +
> +	switch (ctx.family) {
> +	case AF_INET:
> +		ctx.v4.daddr = user_ctx->local_ip4;
> +		ctx.v4.saddr = user_ctx->remote_ip4;
> +		break;
> +
> +#if IS_ENABLED(CONFIG_IPV6)
> +	case AF_INET6:
> +		ctx.v6.daddr = (struct in6_addr *)user_ctx->local_ip6;
> +		ctx.v6.saddr = (struct in6_addr *)user_ctx->remote_ip6;
> +		break;
> +#endif
> +
> +	default:
> +		ret = -EAFNOSUPPORT;
> +		goto out;
> +	}
> +
> +	while (t_check(&t, repeat, &ret, &duration)) {
> +		ctx.selected_sk = NULL;
> +		retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, BPF_PROG_RUN);
> +	}
> +
> +	if (ret < 0)
> +		goto out;
> +
> +	user_ctx->cookie = 0;
> +	if (ctx.selected_sk) {
> +		if (ctx.selected_sk->sk_reuseport && !ctx.no_reuseport) {
> +			ret = -EOPNOTSUPP;
> +			goto out;
> +		}
> +
> +		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
> +	}

I'm struggling to come up with a case where running N sk_lookup progs
like this cannot be done by running them one by one.
It looks to me that this N prog_fds api is not really about running and
testing the progs, but about testing the BPF_PROG_SK_LOOKUP_RUN_ARRAY()
SK_PASS vs SK_DROP logic.
So it's more of a kernel infra test than a program test.
Are you suggesting that the sequence of sk_lookup progs is so delicate
that they are aware of each other and _have_ to be tested together
with the gluing logic that the macro provides?
But if that is so, then testing the progs one by one would be better,
because test_run would be able to check each individual prog's return
code instead of the implicit BPF_PROG_SK_LOOKUP_RUN_ARRAY logic.
It feels less like a unit test and more like a full stack test,
but if so, the lack of a cookie on input is questionable.
The progs can only examine port/ip/family data.
So testing them individually would give a more accurate picture of
what the progs are doing, and potential bugs can be found sooner than
by testing the sequence of progs. In a sequence one prog could have
been buggy, but the final cookie still came out correct.
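
For reference, the gluing logic in question is roughly the following
(a simplified sketch of the BPF_PROG_SK_LOOKUP_RUN_ARRAY macro from
include/linux/filter.h, with migrate_disable()/migrate_enable() and
other details elided):

static u32 sk_lookup_run_array_sketch(struct bpf_prog_array *array,
				      struct bpf_sk_lookup_kern *ctx)
{
	struct bpf_prog_array_item *item = &array->items[0];
	struct sock *selected_sk = NULL;
	bool no_reuseport = false;
	bool all_pass = true;
	struct bpf_prog *prog;
	u32 ret;

	while ((prog = READ_ONCE(item->prog))) {
		/* each prog sees the selection made by earlier progs */
		ctx->selected_sk = selected_sk;
		ctx->no_reuseport = no_reuseport;

		ret = BPF_PROG_RUN(prog, ctx);
		if (ret == SK_PASS && ctx->selected_sk) {
			/* remember the most recent selection */
			selected_sk = ctx->selected_sk;
			no_reuseport = ctx->no_reuseport;
		} else if (ret == SK_DROP) {
			all_pass = false;
		}
		item++;
	}

	ctx->selected_sk = selected_sk;
	ctx->no_reuseport = no_reuseport;
	/* pass if every prog passed or some prog selected a socket */
	return all_pass || selected_sk ? SK_PASS : SK_DROP;
}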

Looking at patch 7 it seems the unit test framework will be comparing
the cookies for your production tests, but then the netns argument
in the cover letter is surprising. If the tests are run in the init_netns
then the selected sockets will be just as special as in patch 7.
So it's not a full stack kind of test.

In other words, I'm struggling with the in-between state of the api.
test_run with N fds is not really a full test, but not a unit test either.
Lorenz Bauer Feb. 23, 2021, 10:10 a.m. UTC | #2
On Tue, 23 Feb 2021 at 01:11, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> I'm struggling to come up with a case where running N sk_lookup progs
> like this cannot be done by running them one by one.
> It looks to me that this N prog_fds api is not really about running and
> testing the progs, but about testing the BPF_PROG_SK_LOOKUP_RUN_ARRAY()
> SK_PASS vs SK_DROP logic.

In a way that is true, yes. TBH I figured that my patch set would be
rejected if I just implemented a single-program test run, since that
doesn't allow exercising the full sk_lookup test run semantics.

> So it's more of a kernel infra test than a program test.
> Are you suggesting that the sequence of sk_lookup progs is so delicate
> that they are aware of each other and _have_ to be tested together
> with the gluing logic that the macro provides?

We currently don't have a case like that.

> But if that is so, then testing the progs one by one would be better,
> because test_run would be able to check each individual prog's return
> code instead of the implicit BPF_PROG_SK_LOOKUP_RUN_ARRAY logic.

That means emulating the subtle BPF_PROG_SK_LOOKUP_RUN_ARRAY logic in
user space, which isn't trivial and is a source of bugs.

For example, we rely on having multiple programs attached when
"upgrading" from old to new BPF. Here we care mostly that we don't drop
lookups on the floor, and the behaviour is tightly coupled to the
in-kernel implementation. It's not much use to cobble together my own
implementation of SK_LOOKUP_RUN_ARRAY here; I would rather use multi
progs to test this. Of course we can also already spawn a netns and
test it that way, so not much is lost if there is no multi prog test run.
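
Roughly, with libbpf that upgrade looks like this (a sketch assuming
bpf_program__attach_netns(); variable names are made up, error handling
elided):

#include <bpf/libbpf.h>

static int upgrade_sk_lookup(struct bpf_program *new_prog, int netns_fd,
			     struct bpf_link **cur_link)
{
	struct bpf_link *new_link;

	/* attach the new prog while the old one is still attached:
	 * both are now in the run array, so no lookup is dropped
	 * during the switch */
	new_link = bpf_program__attach_netns(new_prog, netns_fd);
	if (libbpf_get_error(new_link))
		return -1;

	/* detach the old prog; from here on only the new one runs */
	bpf_link__destroy(*cur_link);
	*cur_link = new_link;
	return 0;
}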

> It feels less like a unit test and more like a full stack test,
> but if so, the lack of a cookie on input is questionable.

I'm not sure what you mean by "the lack of a cookie on input is
questionable", can you rephrase?

> In other words, I'm struggling with the in-between state of the api.
> test_run with N fds is not really a full test, but not a unit test either.

If I understand you correctly, a "full" API would expose the
intermediate results from individual programs as well as the final
selection? That sounds quite complicated, and as you point out, most
of the benefits can be had from running single programs.

I'm happy to drop the multiple programs bit; like I mentioned, I did it
for completeness' sake. I care about being able to test or benchmark a
single sk_lookup program.

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com
Alexei Starovoitov Feb. 24, 2021, 6:11 a.m. UTC | #3
On Tue, Feb 23, 2021 at 10:10:44AM +0000, Lorenz Bauer wrote:
> On Tue, 23 Feb 2021 at 01:11, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > I'm struggling to come up with a case where running N sk_lookup progs
> > like this cannot be done by running them one by one.
> > It looks to me that this N prog_fds api is not really about running and
> > testing the progs, but about testing the BPF_PROG_SK_LOOKUP_RUN_ARRAY()
> > SK_PASS vs SK_DROP logic.
>
> In a way that is true, yes. TBH I figured that my patch set would be
> rejected if I just implemented a single-program test run, since that
> doesn't allow exercising the full sk_lookup test run semantics.
>
> > So it's more of a kernel infra test than a program test.
> > Are you suggesting that the sequence of sk_lookup progs is so delicate
> > that they are aware of each other and _have_ to be tested together
> > with the gluing logic that the macro provides?
>
> We currently don't have a case like that.
>
> > But if that is so, then testing the progs one by one would be better,
> > because test_run would be able to check each individual prog's return
> > code instead of the implicit BPF_PROG_SK_LOOKUP_RUN_ARRAY logic.
>
> That means emulating the subtle BPF_PROG_SK_LOOKUP_RUN_ARRAY logic in
> user space, which isn't trivial and is a source of bugs.

I'm not suggesting emulating it in user space at all.

> For example, we rely on having multiple programs attached when
> "upgrading" from old to new BPF. Here we care mostly that we don't drop
> lookups on the floor, and the behaviour is tightly coupled to the
> in-kernel implementation. It's not much use to cobble together my own
> implementation of SK_LOOKUP_RUN_ARRAY here; I would rather use multi
> progs to test this. Of course we can also already spawn a netns and
> test it that way, so not much is lost if there is no multi prog test
> run.

I mean that to test the whole setup close to production, the netns is
probably needed, because the sockets would mess with init_netns.
But to test each individual bpf prog there is no need for RUN_ARRAY.
Each prog can be more accurately tested in isolation.
RUN_ARRAY adds, as you said, the subtle details of the RUN_ARRAY macro.

> > It feels less like a unit test and more like a full stack test,
> > but if so, the lack of a cookie on input is questionable.
>
> I'm not sure what you mean by "the lack of a cookie on input is
> questionable", can you rephrase?
>
> > In other words, I'm struggling with the in-between state of the api.
> > test_run with N fds is not really a full test, but not a unit test either.
>
> If I understand you correctly, a "full" API would expose the
> intermediate results from individual programs as well as the final
> selection? That sounds quite complicated, and as you point out, most
> of the benefits can be had from running single programs.

I'm not suggesting to return intermediate results either.
I'm looking at test_run as a facility to test one individual program
at a time. As with tc, cgroups, and tracing, we can have multiple progs
attached to one place, and the final verdict will depend on what each
prog returns. But there is no need to test them all together through
BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY.
Each prog is more accurately validated independently.
Hence I'm puzzled why sk_lookup's RUN_ARRAY is special.
Its drop/pass/selected sk logic is more or less the same complexity
as CGROUP_INET_EGRESS_RUN_ARRAY.
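
For instance, validating each prog in isolation could look roughly like
this (a sketch; prog_fds and ctx are made-up names, error handling
elided):

	for (int i = 0; i < nr_progs; i++) {
		DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
			.ctx_in = &ctx,
			.ctx_size_in = sizeof(ctx),
			.ctx_out = &ctx,
			.ctx_size_out = sizeof(ctx),
		);

		err = bpf_prog_test_run_opts(prog_fds[i], &opts);
		/* check opts.retval (SK_PASS/SK_DROP) and ctx.cookie
		 * for this prog alone, not the combined verdict */
	}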

Patch

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 67c21c8ba7cc..d251db1354ec 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1472,6 +1472,9 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 			     const union bpf_attr *kattr,
 			     union bpf_attr __user *uattr);
+int bpf_prog_test_run_sk_lookup(struct bpf_prog_array *progs,
+				const union bpf_attr *kattr,
+				union bpf_attr __user *uattr);
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info);
@@ -1672,6 +1675,13 @@ static inline int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	return -ENOTSUPP;
 }
 
+static inline int bpf_prog_test_run_sk_lookup(struct bpf_prog_array *progs,
+					      const union bpf_attr *kattr,
+					      union bpf_attr __user *uattr)
+{
+	return -ENOTSUPP;
+}
+
 static inline void bpf_map_put(struct bpf_map *map)
 {
 }
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b37a0f39b95f..078ad0b8d1a7 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5210,7 +5210,10 @@ struct bpf_pidns_info {
 
 /* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
 struct bpf_sk_lookup {
-	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+	union {
+		__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+		__u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */
+	};
 
 	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
 	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 33bd2f67e259..932c8e036b0a 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -10,8 +10,10 @@ 
 #include <net/bpf_sk_storage.h>
 #include <net/sock.h>
 #include <net/tcp.h>
+#include <net/net_namespace.h>
 #include <linux/error-injection.h>
 #include <linux/smp.h>
+#include <linux/sock_diag.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/bpf_test_run.h>
@@ -781,3 +783,94 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	kfree(data);
 	return ret;
 }
+
+int bpf_prog_test_run_sk_lookup(struct bpf_prog_array *progs, const union bpf_attr *kattr,
+				union bpf_attr __user *uattr)
+{
+	struct test_timer t = { NO_PREEMPT };
+	struct bpf_sk_lookup_kern ctx = {};
+	u32 repeat = kattr->test.repeat;
+	struct bpf_sk_lookup *user_ctx;
+	u32 retval, duration;
+	int ret = -EINVAL;
+
+	if (bpf_prog_array_length(progs) >= BPF_SK_LOOKUP_MAX_PROGS)
+		return -E2BIG;
+
+	if (kattr->test.flags || kattr->test.cpu)
+		return -EINVAL;
+
+	if (kattr->test.data_in || kattr->test.data_size_in || kattr->test.data_out ||
+	    kattr->test.data_size_out)
+		return -EINVAL;
+
+	if (!repeat)
+		repeat = 1;
+
+	user_ctx = bpf_ctx_init(kattr, sizeof(*user_ctx));
+	if (IS_ERR(user_ctx))
+		return PTR_ERR(user_ctx);
+
+	if (!user_ctx)
+		return -EINVAL;
+
+	if (user_ctx->sk)
+		goto out;
+
+	if (!range_is_zero(user_ctx, offsetofend(typeof(*user_ctx), local_port), sizeof(*user_ctx)))
+		goto out;
+
+	if (user_ctx->local_port > U16_MAX || user_ctx->remote_port > U16_MAX) {
+		ret = -ERANGE;
+		goto out;
+	}
+
+	ctx.family = user_ctx->family;
+	ctx.protocol = user_ctx->protocol;
+	ctx.dport = user_ctx->local_port;
+	ctx.sport = user_ctx->remote_port;
+
+	switch (ctx.family) {
+	case AF_INET:
+		ctx.v4.daddr = user_ctx->local_ip4;
+		ctx.v4.saddr = user_ctx->remote_ip4;
+		break;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	case AF_INET6:
+		ctx.v6.daddr = (struct in6_addr *)user_ctx->local_ip6;
+		ctx.v6.saddr = (struct in6_addr *)user_ctx->remote_ip6;
+		break;
+#endif
+
+	default:
+		ret = -EAFNOSUPPORT;
+		goto out;
+	}
+
+	while (t_check(&t, repeat, &ret, &duration)) {
+		ctx.selected_sk = NULL;
+		retval = BPF_PROG_SK_LOOKUP_RUN_ARRAY(progs, ctx, BPF_PROG_RUN);
+	}
+
+	if (ret < 0)
+		goto out;
+
+	user_ctx->cookie = 0;
+	if (ctx.selected_sk) {
+		if (ctx.selected_sk->sk_reuseport && !ctx.no_reuseport) {
+			ret = -EOPNOTSUPP;
+			goto out;
+		}
+
+		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
+	}
+
+	ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration);
+	if (!ret)
+		ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
+
+out:
+	kfree(user_ctx);
+	return ret;
+}
diff --git a/net/core/filter.c b/net/core/filter.c
index 7059cf604d94..978cea941268 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10451,6 +10451,7 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
 }
 
 const struct bpf_prog_ops sk_lookup_prog_ops = {
+	.test_run_array = bpf_prog_test_run_sk_lookup,
 };
 
 const struct bpf_verifier_ops sk_lookup_verifier_ops = {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b37a0f39b95f..078ad0b8d1a7 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5210,7 +5210,10 @@ struct bpf_pidns_info {
 
 /* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
 struct bpf_sk_lookup {
-	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+	union {
+		__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+		__u64 cookie; /* Non-zero if socket was selected in PROG_TEST_RUN */
+	};
 
 	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
 	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */