diff mbox series

[bpf-next,v2,1/5] bpf: expose is_mptcp flag to bpf_tcp_sock

Message ID 20200911143022.414783-1-nicolas.rybowski@tessares.net
State Superseded
Headers show
Series [bpf-next,v2,1/5] bpf: expose is_mptcp flag to bpf_tcp_sock | expand

Commit Message

Nicolas Rybowski Sept. 11, 2020, 2:30 p.m. UTC
is_mptcp is a field from struct tcp_sock used to indicate that the
current tcp_sock is part of the MPTCP protocol.

In this protocol, a first socket (mptcp_sock) is created with
sk_protocol set to IPPROTO_MPTCP (=262) for control purpose but it
isn't directly on the wire. This is the role of the subflow (kernel)
sockets which are classical tcp_sock with sk_protocol set to
IPPROTO_TCP. The only way to differentiate such sockets from plain TCP
sockets is the is_mptcp field from tcp_sock.

Such an exposure in BPF is thus required to be able to differentiate
plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS
programs.

The choice has been made to silently pass the case when CONFIG_MPTCP is
unset by defaulting is_mptcp to 0 in order to make BPF independent of
the MPTCP configuration. Another solution is to make the verifier fail
in 'bpf_tcp_sock_is_valid_ctx_access' but this will add an additional
'#ifdef CONFIG_MPTCP' in the BPF code and a same injected BPF program
will not run if MPTCP is not set.

An example use-case is provided in
https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples

Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
---
 include/uapi/linux/bpf.h       | 1 +
 net/core/filter.c              | 9 ++++++++-
 tools/include/uapi/linux/bpf.h | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

Comments

Nicolas Rybowski Sept. 11, 2020, 2:30 p.m. UTC | #1
Previously it was not possible to make a distinction between plain TCP
sockets and MPTCP subflow sockets on the BPF_PROG_TYPE_SOCK_OPS hook.

This patch series now enables a fine control of subflow sockets. In its
current state, it allows to put different sockopt on each subflow from a
same MPTCP connection (socket mark, TCP congestion algorithm, ...) using
BPF programs.

It should also be the basis of exposing MPTCP-specific fields through BPF.

v1 -> v2:
- add basic mandatory selftests for the new helper and is_mptcp field (Alexei)
- rebase on latest bpf-next

Nicolas Rybowski (5):
  bpf: expose is_mptcp flag to bpf_tcp_sock
  mptcp: attach subflow socket to parent cgroup
  bpf: add 'bpf_mptcp_sock' structure and helper
  bpf: selftests: add MPTCP test base
  bpf: selftests: add bpf_mptcp_sock() verifier tests

 include/linux/bpf.h                           |  33 +++++
 include/uapi/linux/bpf.h                      |  15 +++
 kernel/bpf/verifier.c                         |  30 +++++
 net/core/filter.c                             |  13 +-
 net/mptcp/Makefile                            |   2 +
 net/mptcp/bpf.c                               |  72 +++++++++++
 net/mptcp/subflow.c                           |  27 ++++
 scripts/bpf_helpers_doc.py                    |   2 +
 tools/include/uapi/linux/bpf.h                |  15 +++
 tools/testing/selftests/bpf/config            |   1 +
 tools/testing/selftests/bpf/network_helpers.c |  37 +++++-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 119 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/mptcp.c     |  48 +++++++
 tools/testing/selftests/bpf/verifier/sock.c   |  63 ++++++++++
 15 files changed, 474 insertions(+), 6 deletions(-)
 create mode 100644 net/mptcp/bpf.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c
Song Liu Sept. 14, 2020, 6:08 p.m. UTC | #2
On Fri, Sep 11, 2020 at 9:46 AM Nicolas Rybowski
<nicolas.rybowski@tessares.net> wrote:
>
> This patch adds verifier side tests for the new bpf_mptcp_sock() helper.
>
> Here are the new tests :
> - NULL bpf_sock is correctly handled
> - We cannot access a field from bpf_mptcp_sock if the latter is NULL
> - We can access a field from bpf_mptcp_sock if the latter is not NULL
> - We cannot modify a field from bpf_mptcp_sock.
>
> Note that "token" is currently the only field in bpf_mptcp_sock.
>
> Currently, there is no easy way to test the token field since we cannot
> get back the mptcp_sock in userspace, this could be a future amelioration.
>
> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
> Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>

Acked-by: Song Liu <songliubraving@fb.com>
Song Liu Sept. 14, 2020, 6:20 p.m. UTC | #3
On Fri, Sep 11, 2020 at 8:07 AM Nicolas Rybowski
<nicolas.rybowski@tessares.net> wrote:
>

> is_mptcp is a field from struct tcp_sock used to indicate that the

> current tcp_sock is part of the MPTCP protocol.

>

> In this protocol, a first socket (mptcp_sock) is created with

> sk_protocol set to IPPROTO_MPTCP (=262) for control purpose but it

> isn't directly on the wire. This is the role of the subflow (kernel)

> sockets which are classical tcp_sock with sk_protocol set to

> IPPROTO_TCP. The only way to differentiate such sockets from plain TCP

> sockets is the is_mptcp field from tcp_sock.

>

> Such an exposure in BPF is thus required to be able to differentiate

> plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS

> programs.

>

> The choice has been made to silently pass the case when CONFIG_MPTCP is

> unset by defaulting is_mptcp to 0 in order to make BPF independent of

> the MPTCP configuration. Another solution is to make the verifier fail

> in 'bpf_tcp_sock_is_valid_ctx_access' but this will add an additional

> '#ifdef CONFIG_MPTCP' in the BPF code and a same injected BPF program

> will not run if MPTCP is not set.

>

> An example use-case is provided in

> https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples

>

> Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>

> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>

> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

> Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>

> ---

>  include/uapi/linux/bpf.h       | 1 +

>  net/core/filter.c              | 9 ++++++++-

>  tools/include/uapi/linux/bpf.h | 1 +

>  3 files changed, 10 insertions(+), 1 deletion(-)

>

> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> index 7dd314176df7..7d179eada1c3 100644

> --- a/include/uapi/linux/bpf.h

> +++ b/include/uapi/linux/bpf.h

> @@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {

>         __u32 delivered;        /* Total data packets delivered incl. rexmits */

>         __u32 delivered_ce;     /* Like the above but only ECE marked packets */

>         __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */

> +       __u32 is_mptcp;         /* Is MPTCP subflow? */


Shall we have an __u32 flags, and make is_mptcp a bit of it?

Thanks,
Song
[...]
Song Liu Sept. 14, 2020, 9 p.m. UTC | #4
On Fri, Sep 11, 2020 at 8:15 AM Nicolas Rybowski
<nicolas.rybowski@tessares.net> wrote:
>

> In order to precisely identify the parent MPTCP connection of a subflow,

> it is required to access the mptcp_sock's token which uniquely identify a

> MPTCP connection.

>

> This patch adds a new structure 'bpf_mptcp_sock' exposing the 'token' field

> of the 'mptcp_sock' extracted from a subflow's 'tcp_sock'. It also adds the

> declaration of a new BPF helper of the same name to expose the newly

> defined structure in the userspace BPF API.

>

> This is the foundation to expose more MPTCP-specific fields through BPF.

>

> Currently, it is limited to the field 'token' of the msk but it is

> easily extensible.

>

> Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>

> Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

> Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>


Acked-by: Song Liu <songliubraving@fb.com>
Andrii Nakryiko Sept. 14, 2020, 10:13 p.m. UTC | #5
On Mon, Sep 14, 2020 at 11:21 AM Song Liu <song@kernel.org> wrote:
>

> On Fri, Sep 11, 2020 at 8:07 AM Nicolas Rybowski

> <nicolas.rybowski@tessares.net> wrote:

> >

> > is_mptcp is a field from struct tcp_sock used to indicate that the

> > current tcp_sock is part of the MPTCP protocol.

> >

> > In this protocol, a first socket (mptcp_sock) is created with

> > sk_protocol set to IPPROTO_MPTCP (=262) for control purpose but it

> > isn't directly on the wire. This is the role of the subflow (kernel)

> > sockets which are classical tcp_sock with sk_protocol set to

> > IPPROTO_TCP. The only way to differentiate such sockets from plain TCP

> > sockets is the is_mptcp field from tcp_sock.

> >

> > Such an exposure in BPF is thus required to be able to differentiate

> > plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS

> > programs.

> >

> > The choice has been made to silently pass the case when CONFIG_MPTCP is

> > unset by defaulting is_mptcp to 0 in order to make BPF independent of

> > the MPTCP configuration. Another solution is to make the verifier fail

> > in 'bpf_tcp_sock_is_valid_ctx_access' but this will add an additional

> > '#ifdef CONFIG_MPTCP' in the BPF code and a same injected BPF program

> > will not run if MPTCP is not set.

> >

> > An example use-case is provided in

> > https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples

> >

> > Suggested-by: Matthieu Baerts <matthieu.baerts@tessares.net>

> > Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>

> > Acked-by: Mat Martineau <mathew.j.martineau@linux.intel.com>

> > Signed-off-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>

> > ---

> >  include/uapi/linux/bpf.h       | 1 +

> >  net/core/filter.c              | 9 ++++++++-

> >  tools/include/uapi/linux/bpf.h | 1 +

> >  3 files changed, 10 insertions(+), 1 deletion(-)

> >

> > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h

> > index 7dd314176df7..7d179eada1c3 100644

> > --- a/include/uapi/linux/bpf.h

> > +++ b/include/uapi/linux/bpf.h

> > @@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {

> >         __u32 delivered;        /* Total data packets delivered incl. rexmits */

> >         __u32 delivered_ce;     /* Like the above but only ECE marked packets */

> >         __u32 icsk_retransmits; /* Number of unrecovered [RTO] timeouts */

> > +       __u32 is_mptcp;         /* Is MPTCP subflow? */

>

> Shall we have an __u32 flags, and make is_mptcp a bit of it?

>


Bitfields are slow and more annoying to rewrite in verifier, so having
an __u32 field is actually good.

> Thanks,

> Song

> [...]
diff mbox series

Patch

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@  struct bpf_tcp_sock {
 	__u32 delivered;	/* Total data packets delivered incl. rexmits */
 	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
+	__u32 is_mptcp;		/* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index d266c6941967..dab48528dceb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5837,7 +5837,7 @@  bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 				  struct bpf_insn_access_aux *info)
 {
 	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock,
-					  icsk_retransmits))
+					  is_mptcp))
 		return false;
 
 	if (off % size != 0)
@@ -5971,6 +5971,13 @@  u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_tcp_sock, icsk_retransmits):
 		BPF_INET_SOCK_GET_COMMON(icsk_retransmits);
 		break;
+	case offsetof(struct bpf_tcp_sock, is_mptcp):
+#ifdef CONFIG_MPTCP
+		BPF_TCP_SOCK_GET_COMMON(is_mptcp);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@  struct bpf_tcp_sock {
 	__u32 delivered;	/* Total data packets delivered incl. rexmits */
 	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
+	__u32 is_mptcp;		/* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {