From patchwork Fri Sep 11 14:30:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Rybowski
X-Patchwork-Id: 261055
From: Nicolas Rybowski
To: Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, "David S.
 Miller", Jakub Kicinski
Cc: Nicolas Rybowski, Matthieu Baerts, Mat Martineau, netdev@vger.kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH bpf-next v2 1/5] bpf: expose is_mptcp flag to bpf_tcp_sock
Date: Fri, 11 Sep 2020 16:30:16 +0200
Message-Id: <20200911143022.414783-1-nicolas.rybowski@tessares.net>
X-Mailer: git-send-email 2.28.0
X-Mailing-List: netdev@vger.kernel.org

is_mptcp is a field of struct tcp_sock that indicates that the current
tcp_sock is part of an MPTCP connection.

In this protocol, a first socket (mptcp_sock) is created with sk_protocol
set to IPPROTO_MPTCP (=262) for control purposes, but it isn't directly on
the wire. That is the role of the subflow (kernel) sockets, which are
classical tcp_sock with sk_protocol set to IPPROTO_TCP. The only way to
differentiate such sockets from plain TCP sockets is the is_mptcp field
of tcp_sock.

Exposing this field in BPF is thus required to be able to differentiate
plain TCP sockets from MPTCP subflow sockets in BPF_PROG_TYPE_SOCK_OPS
programs.

The choice has been made to silently pass the case when CONFIG_MPTCP is
unset by defaulting is_mptcp to 0, in order to make BPF independent of the
MPTCP configuration. Another solution is to make the verifier fail in
'bpf_tcp_sock_is_valid_access', but this would add an additional
'#ifdef CONFIG_MPTCP' in the BPF code, and the same injected BPF program
would not run if MPTCP is not set.
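To illustrate why the bound of the access check has to move when a field is
appended, here is a minimal userspace sketch. It is not the kernel function:
`demo_tcp_sock`, `demo_offsetofend` and the `demo_*` helpers are hypothetical
names, the struct only mirrors the last three fields, and the per-field size
checks of the real verifier callback are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the tail of struct bpf_tcp_sock. */
struct demo_tcp_sock {
	uint32_t delivered_ce;
	uint32_t icsk_retransmits;
	uint32_t is_mptcp;
};

/* Offset of the first byte past MEMBER. */
#define demo_offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Simplified model of the check: offset in range and aligned to size. */
static bool demo_is_valid_access(int off, int size, size_t last_end)
{
	assert(size > 0);
	if (off < 0 || (size_t)off >= last_end)
		return false;
	return off % size == 0;
}

/* Bound as it was before the patch: the struct ends at icsk_retransmits. */
static bool demo_access_before_patch(int off, int size)
{
	return demo_is_valid_access(off, size,
		demo_offsetofend(struct demo_tcp_sock, icsk_retransmits));
}

/* Bound after the patch: is_mptcp is the new last field. */
static bool demo_access_after_patch(int off, int size)
{
	return demo_is_valid_access(off, size,
		demo_offsetofend(struct demo_tcp_sock, is_mptcp));
}
```

With the old bound, a 4-byte read at the offset of is_mptcp is rejected; with
the new bound it is accepted, while reads past the struct are still refused.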
An example use-case is provided in
https://github.com/multipath-tcp/mptcp_net-next/tree/scripts/bpf/examples

Suggested-by: Matthieu Baerts
Acked-by: Matthieu Baerts
Acked-by: Mat Martineau
Signed-off-by: Nicolas Rybowski
Acked-by: Song Liu
---
 include/uapi/linux/bpf.h       | 1 +
 net/core/filter.c              | 9 ++++++++-
 tools/include/uapi/linux/bpf.h | 1 +
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {
 	__u32 delivered;	/* Total data packets delivered incl. rexmits */
 	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
+	__u32 is_mptcp;		/* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {
diff --git a/net/core/filter.c b/net/core/filter.c
index d266c6941967..dab48528dceb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5837,7 +5837,7 @@ bool bpf_tcp_sock_is_valid_access(int off, int size, enum bpf_access_type type,
 				  struct bpf_insn_access_aux *info)
 {
 	if (off < 0 || off >= offsetofend(struct bpf_tcp_sock,
-					  icsk_retransmits))
+					  is_mptcp))
 		return false;
 
 	if (off % size != 0)
@@ -5971,6 +5971,13 @@ u32 bpf_tcp_sock_convert_ctx_access(enum bpf_access_type type,
 	case offsetof(struct bpf_tcp_sock, icsk_retransmits):
 		BPF_INET_SOCK_GET_COMMON(icsk_retransmits);
 		break;
+	case offsetof(struct bpf_tcp_sock, is_mptcp):
+#ifdef CONFIG_MPTCP
+		BPF_TCP_SOCK_GET_COMMON(is_mptcp);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7dd314176df7..7d179eada1c3 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4060,6 +4060,7 @@ struct bpf_tcp_sock {
 	__u32 delivered;	/* Total data packets delivered incl. rexmits */
 	__u32 delivered_ce;	/* Like the above but only ECE marked packets */
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
+	__u32 is_mptcp;		/* Is MPTCP subflow? */
 };
 
 struct bpf_sock_tuple {

From patchwork Fri Sep 11 14:30:17 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Rybowski
X-Patchwork-Id: 261047
From: Nicolas Rybowski
To: Mat Martineau, Matthieu Baerts, "David S.
 Miller", Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh
Cc: Nicolas Rybowski, Paolo Abeni, netdev@vger.kernel.org, mptcp@lists.01.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH bpf-next v2 2/5] mptcp: attach subflow socket to parent cgroup
Date: Fri, 11 Sep 2020 16:30:17 +0200
Message-Id: <20200911143022.414783-2-nicolas.rybowski@tessares.net>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200911143022.414783-1-nicolas.rybowski@tessares.net>
References: <20200911143022.414783-1-nicolas.rybowski@tessares.net>
X-Mailing-List: netdev@vger.kernel.org

It has been observed that the kernel sockets created for the subflows
(except the first one) are not in the same cgroup as their parents. That's
because the additional subflows are created by kernel workers.

This is a problem because eBPF programs attached to the parent's cgroup
won't be executed for the children. The same holds for any other cgroup
feature linked to a sk.

This patch fixes this behaviour.

As the subflow sockets are created by the kernel, we can't use
'mem_cgroup_sk_alloc', because the current context is that of the kworker.
This is why we have to do low-level memcg manipulation, if required.
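The idea of re-parenting the child socket can be sketched in plain userspace C
as "drop the reference taken in the worker's context, copy the parent's group,
take a new reference". This is only a toy model under stated assumptions:
`demo_group`, `demo_sock` and the `demo_*` functions are hypothetical, groups
are compared by pointer rather than by cgroup id, and no kernel API is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted group, standing in for a cgroup. */
struct demo_group {
	int id;
	int refs;
};

/* Hypothetical socket that holds one reference on its group. */
struct demo_sock {
	struct demo_group *grp;
};

static void demo_group_get(struct demo_group *g)
{
	if (g)
		g->refs++;
}

static void demo_group_put(struct demo_group *g)
{
	if (g) {
		assert(g->refs > 0);
		g->refs--;
	}
}

/* Move child into parent's group, keeping the reference counts balanced. */
static void demo_attach_group(const struct demo_sock *parent,
			      struct demo_sock *child)
{
	if (parent->grp == child->grp)
		return;			/* already in the same group */

	demo_group_put(child->grp);	/* release the worker's group */
	child->grp = parent->grp;	/* inherit the parent's group */
	demo_group_get(child->grp);	/* child holds its own reference */
}
```

Calling `demo_attach_group()` a second time is a no-op, which mirrors the
"only the additional subflows have to be modified" condition in the patch.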
Suggested-by: Matthieu Baerts
Suggested-by: Paolo Abeni
Acked-by: Matthieu Baerts
Reviewed-by: Mat Martineau
Signed-off-by: Nicolas Rybowski
Acked-by: Song Liu
---
 net/mptcp/subflow.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index e8cac2655c82..535a3f9f8cfc 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1130,6 +1130,30 @@ int __mptcp_subflow_connect(struct sock *sk, int ifindex,
 	return err;
 }
 
+static void mptcp_attach_cgroup(struct sock *parent, struct sock *child)
+{
+#ifdef CONFIG_SOCK_CGROUP_DATA
+	struct sock_cgroup_data *parent_skcd = &parent->sk_cgrp_data,
+				*child_skcd = &child->sk_cgrp_data;
+
+	/* only the additional subflows created by kworkers have to be modified */
+	if (cgroup_id(sock_cgroup_ptr(parent_skcd)) !=
+	    cgroup_id(sock_cgroup_ptr(child_skcd))) {
+#ifdef CONFIG_MEMCG
+		struct mem_cgroup *memcg = parent->sk_memcg;
+
+		mem_cgroup_sk_free(child);
+		if (memcg && css_tryget(&memcg->css))
+			child->sk_memcg = memcg;
+#endif /* CONFIG_MEMCG */
+
+		cgroup_sk_free(child_skcd);
+		*child_skcd = *parent_skcd;
+		cgroup_sk_clone(child_skcd);
+	}
+#endif /* CONFIG_SOCK_CGROUP_DATA */
+}
+
 int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
 {
 	struct mptcp_subflow_context *subflow;
@@ -1150,6 +1174,9 @@ int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock)
 
 	lock_sock(sf->sk);
 
+	/* the newly created socket has to be in the same cgroup as its parent */
+	mptcp_attach_cgroup(sk, sf->sk);
+
 	/* kernel sockets do not by default acquire net ref, but TCP timer
 	 * needs it.
 	 */

From patchwork Fri Sep 11 14:30:19 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Nicolas Rybowski
X-Patchwork-Id: 261060
From: Nicolas Rybowski
To: Shuah Khan, Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh
Cc: Nicolas Rybowski, Matthieu Baerts, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org
Subject: [PATCH bpf-next v2 4/5] bpf: selftests: add MPTCP test base
Date: Fri, 11 Sep 2020 16:30:19 +0200
Message-Id: <20200911143022.414783-4-nicolas.rybowski@tessares.net>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200911143022.414783-1-nicolas.rybowski@tessares.net>
References: <20200911143022.414783-1-nicolas.rybowski@tessares.net>
X-Mailing-List: netdev@vger.kernel.org

This patch adds a base for MPTCP specific tests.

It is currently limited to checking the is_mptcp field for a plain TCP
connection, because for the moment there is no easy way to get the subflow
sk from an msk in userspace. This implies that we cannot look up the
sk_storage attached to the subflow sk in the sockops program.

Acked-by: Matthieu Baerts
Signed-off-by: Nicolas Rybowski
Acked-by: Song Liu
---

Notes:
    v1 -> v2:
    - new patch: mandatory selftests (Alexei)

 tools/testing/selftests/bpf/config            |   1 +
 tools/testing/selftests/bpf/network_helpers.c |  37 +++++-
 tools/testing/selftests/bpf/network_helpers.h |   3 +
 .../testing/selftests/bpf/prog_tests/mptcp.c  | 119 ++++++++++++++++++
 tools/testing/selftests/bpf/progs/mptcp.c     |  48 +++++++
 5 files changed, 203 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/mptcp.c
 create mode 100644 tools/testing/selftests/bpf/progs/mptcp.c

diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config
index 2118e23ac07a..8377836ea976 100644
--- a/tools/testing/selftests/bpf/config
+++ b/tools/testing/selftests/bpf/config
@@ -39,3 +39,4 @@ CONFIG_BPF_JIT=y
 CONFIG_BPF_LSM=y
 CONFIG_SECURITY=y
 CONFIG_LIRC=y
+CONFIG_MPTCP=y
diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index 12ee40284da0..85cbb683965c 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -14,6 +14,10 @@
 #include "bpf_util.h"
 #include "network_helpers.h"
 
+#ifndef IPPROTO_MPTCP
+#define IPPROTO_MPTCP 262
+#endif
+
 #define clean_errno() (errno == 0 ? "None" : strerror(errno))
 #define log_err(MSG, ...) ({						\
 			int __save = errno;				\
@@ -66,8 +70,8 @@ static int settimeo(int fd, int timeout_ms)
 
 #define save_errno_close(fd) ({ int __save = errno; close(fd); errno = __save; })
 
-int start_server(int family, int type, const char *addr_str, __u16 port,
-		 int timeout_ms)
+static int start_server_proto(int family, int type, int protocol,
+			      const char *addr_str, __u16 port, int timeout_ms)
 {
 	struct sockaddr_storage addr = {};
 	socklen_t len;
@@ -76,7 +80,7 @@ int start_server(int family, int type, const char *addr_str, __u16 port,
 	if (make_sockaddr(family, addr_str, port, &addr, &len))
 		return -1;
 
-	fd = socket(family, type, 0);
+	fd = socket(family, type, protocol);
 	if (fd < 0) {
 		log_err("Failed to create server socket");
 		return -1;
@@ -104,6 +108,19 @@ int start_server(int family, int type, const char *addr_str, __u16 port,
 	return -1;
 }
 
+int start_server(int family, int type, const char *addr_str, __u16 port,
+		 int timeout_ms)
+{
+	return start_server_proto(family, type, 0, addr_str, port, timeout_ms);
+}
+
+int start_mptcp_server(int family, const char *addr_str, __u16 port,
+		       int timeout_ms)
+{
+	return start_server_proto(family, SOCK_STREAM, IPPROTO_MPTCP, addr_str,
+				  port, timeout_ms);
+}
+
 int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 		     int timeout_ms)
 {
@@ -153,7 +170,7 @@ static int connect_fd_to_addr(int fd,
 	return 0;
 }
 
-int connect_to_fd(int server_fd, int timeout_ms)
+static int connect_to_fd_proto(int server_fd, int protocol, int timeout_ms)
 {
 	struct sockaddr_storage addr;
 	struct sockaddr_in *addr_in;
@@ -173,7 +190,7 @@ int connect_to_fd(int server_fd, int timeout_ms)
 	}
 
 	addr_in = (struct sockaddr_in *)&addr;
-	fd = socket(addr_in->sin_family, type, 0);
+	fd = socket(addr_in->sin_family, type, protocol);
 	if (fd < 0) {
 		log_err("Failed to create client socket");
 		return -1;
@@ -192,6 +209,16 @@ int connect_to_fd(int server_fd, int timeout_ms)
 	return -1;
 }
 
+int connect_to_fd(int server_fd, int timeout_ms)
+{
+	return connect_to_fd_proto(server_fd, 0, timeout_ms);
+}
+
+int connect_to_mptcp_fd(int server_fd, int timeout_ms)
+{
+	return connect_to_fd_proto(server_fd, IPPROTO_MPTCP, timeout_ms);
+}
+
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
 {
 	struct sockaddr_storage addr;
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index 7205f8afdba1..336423a789e9 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -35,7 +35,10 @@ extern struct ipv6_packet pkt_v6;
 
 int start_server(int family, int type, const char *addr, __u16 port,
 		 int timeout_ms);
+int start_mptcp_server(int family, const char *addr, __u16 port,
+		       int timeout_ms);
 int connect_to_fd(int server_fd, int timeout_ms);
+int connect_to_mptcp_fd(int server_fd, int timeout_ms);
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
 int fastopen_connect(int server_fd, const char *data, unsigned int data_len,
 		     int timeout_ms);
diff --git a/tools/testing/selftests/bpf/prog_tests/mptcp.c b/tools/testing/selftests/bpf/prog_tests/mptcp.c
new file mode 100644
index 000000000000..0e65d64868e9
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/mptcp.c
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+
+struct mptcp_storage {
+	__u32 invoked;
+	__u32 is_mptcp;
+};
+
+static int verify_sk(int map_fd, int client_fd, const char *msg, __u32 is_mptcp)
+{
+	int err = 0, cfd = client_fd;
+	struct mptcp_storage val;
+
+	/* Currently there is no easy way to get back the subflow sk from the MPTCP
+	 * sk, thus we cannot access here the sk_storage associated to the subflow
+	 * sk. Also, there is no sk_storage associated with the MPTCP sk since it
+	 * does not trigger sockops events.
+	 * We silently pass this situation at the moment.
+	 */
+	if (is_mptcp == 1)
+		return 0;
+
+	if (CHECK_FAIL(bpf_map_lookup_elem(map_fd, &cfd, &val) < 0)) {
+		perror("Failed to read socket storage");
+		return -1;
+	}
+
+	if (val.invoked != 1) {
+		log_err("%s: unexpected invoked count %d != %d",
+			msg, val.invoked, 1);
+		err++;
+	}
+
+	if (val.is_mptcp != is_mptcp) {
+		log_err("%s: unexpected bpf_tcp_sock.is_mptcp %d != %d",
+			msg, val.is_mptcp, is_mptcp);
+		err++;
+	}
+
+	return err;
+}
+
+static int run_test(int cgroup_fd, int server_fd, bool is_mptcp)
+{
+	int client_fd, prog_fd, map_fd, err;
+	struct bpf_object *obj;
+	struct bpf_map *map;
+
+	struct bpf_prog_load_attr attr = {
+		.prog_type = BPF_PROG_TYPE_SOCK_OPS,
+		.file = "./mptcp.o",
+		.expected_attach_type = BPF_CGROUP_SOCK_OPS,
+	};
+
+	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);
+	if (err) {
+		log_err("Failed to load BPF object");
+		return -1;
+	}
+
+	map = bpf_map__next(NULL, obj);
+	map_fd = bpf_map__fd(map);
+
+	err = bpf_prog_attach(prog_fd, cgroup_fd, BPF_CGROUP_SOCK_OPS, 0);
+	if (err) {
+		log_err("Failed to attach BPF program");
+		goto close_bpf_object;
+	}
+
+	client_fd = is_mptcp ? connect_to_mptcp_fd(server_fd, 0) :
+			       connect_to_fd(server_fd, 0);
+	if (client_fd < 0) {
+		err = -1;
+		goto close_client_fd;
+	}
+
+	err += is_mptcp ? verify_sk(map_fd, client_fd, "MPTCP subflow socket", 1) :
+			  verify_sk(map_fd, client_fd, "plain TCP socket", 0);
+
+close_client_fd:
+	close(client_fd);
+
+close_bpf_object:
+	bpf_object__close(obj);
+	return err;
+}
+
+void test_mptcp(void)
+{
+	int server_fd, cgroup_fd;
+
+	cgroup_fd = test__join_cgroup("/mptcp");
+	if (CHECK_FAIL(cgroup_fd < 0))
+		return;
+
+	/* without MPTCP */
+	server_fd = start_server(AF_INET, SOCK_STREAM, NULL, 0, 0);
+	if (CHECK_FAIL(server_fd < 0))
+		goto with_mptcp;
+
+	CHECK_FAIL(run_test(cgroup_fd, server_fd, false));
+
+	close(server_fd);
+
+with_mptcp:
+	/* with MPTCP */
+	server_fd = start_mptcp_server(AF_INET, NULL, 0, 0);
+	if (CHECK_FAIL(server_fd < 0))
+		goto close_cgroup_fd;
+
+	CHECK_FAIL(run_test(cgroup_fd, server_fd, true));
+
+	close(server_fd);
+
+close_cgroup_fd:
+	close(cgroup_fd);
+}
diff --git a/tools/testing/selftests/bpf/progs/mptcp.c b/tools/testing/selftests/bpf/progs/mptcp.c
new file mode 100644
index 000000000000..be5ee8dac2b3
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/mptcp.c
@@ -0,0 +1,48 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
+
+struct mptcp_storage {
+	__u32 invoked;
+	__u32 is_mptcp;
+};
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SK_STORAGE);
+	__uint(map_flags, BPF_F_NO_PREALLOC);
+	__type(key, int);
+	__type(value, struct mptcp_storage);
+} socket_storage_map SEC(".maps");
+
+SEC("sockops")
+int _sockops(struct bpf_sock_ops *ctx)
+{
+	struct mptcp_storage *storage;
+	struct bpf_tcp_sock *tcp_sk;
+	int op = (int)ctx->op;
+	struct bpf_sock *sk;
+
+	sk = ctx->sk;
+	if (!sk)
+		return 1;
+
+	storage = bpf_sk_storage_get(&socket_storage_map, sk, 0,
+				     BPF_SK_STORAGE_GET_F_CREATE);
+	if (!storage)
+		return 1;
+
+	if (op != BPF_SOCK_OPS_TCP_CONNECT_CB)
+		return 1;
+
+	tcp_sk = bpf_tcp_sock(sk);
+	if (!tcp_sk)
+		return 1;
+
+	storage->invoked++;
+	storage->is_mptcp = tcp_sk->is_mptcp;
+
+	return 1;
+}
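
The helpers added above select MPTCP by passing IPPROTO_MPTCP as the protocol
argument of socket(). As a small illustration outside the selftests, an
application could try MPTCP first and fall back to plain TCP. The sketch below
is not part of the patch: `demo_open_stream`, `demo_socket_fn` and
`demo_socket_no_mptcp` are hypothetical names, and the errno returned by the
test double is an assumption about how a kernel without MPTCP would answer.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Signature shared by socket(2) and the test double below. */
typedef int (*demo_socket_fn)(int family, int type, int protocol);

/* Hypothetical test double: refuses MPTCP, accepts everything else. */
static int demo_socket_no_mptcp(int family, int type, int protocol)
{
	(void)family;
	(void)type;
	if (protocol == IPPROTO_MPTCP) {
		errno = EPROTONOSUPPORT;	/* assumed error code */
		return -1;
	}
	return 3;	/* arbitrary fake descriptor */
}

/*
 * Try an MPTCP stream socket first, then fall back to plain TCP.
 * *used_mptcp reports which protocol was actually selected.
 */
static int demo_open_stream(demo_socket_fn open_fn, int family, int *used_mptcp)
{
	int fd;

	assert(open_fn != NULL && used_mptcp != NULL);

	fd = open_fn(family, SOCK_STREAM, IPPROTO_MPTCP);
	*used_mptcp = fd >= 0;
	if (fd < 0)
		fd = open_fn(family, SOCK_STREAM, IPPROTO_TCP);
	return fd;
}
```

In a real program the first argument would simply be `socket`, for example
`demo_open_stream(socket, AF_INET, &used_mptcp)`; the function pointer only
exists so that the fallback path can be exercised without a kernel that lacks
MPTCP.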