From patchwork Wed Sep 9 18:15:53 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neal Cardwell
X-Patchwork-Id: 261231
Date: Wed, 9 Sep 2020 14:15:53 -0400
In-Reply-To: <20200909181556.2945496-1-ncardwell@google.com>
Message-Id: <20200909181556.2945496-2-ncardwell@google.com>
References: <20200909181556.2945496-1-ncardwell@google.com>
X-Mailer: git-send-email 2.28.0.526.ge36021eeef-goog
Subject: [PATCH net-next 1/4] tcp: only init congestion control if not initialized already
From: Neal Cardwell
To: David Miller
Cc: netdev@vger.kernel.org, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet, Lawrence Brakmo
X-Mailing-List: netdev@vger.kernel.org

Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This approach has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.

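For illustration only, the guard described above can be modelled in user space. This is a minimal sketch, not kernel code: `struct model_sock` and every `model_*` function are hypothetical stand-ins, and it assumes that a BPF program calling setsockopt(TCP_CONGESTION) ends up running the CC init path (and so setting the flag), as the commit message implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for the per-socket CC state. */
struct model_sock {
	unsigned int ca_initialized:1;	/* models icsk_ca_initialized */
	int init_calls;			/* counts CC init invocations */
};

/* Models tcp_init_congestion_control(): run init, then set the flag. */
static void model_init_cc(struct model_sock *sk)
{
	assert(sk != NULL);
	sk->init_calls++;
	sk->ca_initialized = 1;
}

/*
 * Models the BPF hook. When set_cc is true it stands for a program
 * that picks a new CC, which (by assumption) initializes that CC.
 */
static void model_bpf_established(struct model_sock *sk, bool set_cc)
{
	if (set_cc)
		model_init_cc(sk);
}

/* Models the patched tcp_init_transfer(): clear, run BPF, init if needed. */
static void model_init_transfer(struct model_sock *sk, bool bpf_sets_cc)
{
	sk->ca_initialized = 0;
	model_bpf_established(sk, bpf_sets_cc);
	if (!sk->ca_initialized)
		model_init_cc(sk);
}
```

In this model, calling `model_init_transfer()` on a zeroed `struct model_sock` leaves `init_calls` at 1 whether or not the BPF step chose a CC, which is the "no double initialization" property from point (3).
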
Signed-off-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Signed-off-by: Eric Dumazet
Cc: Lawrence Brakmo
---
 include/net/inet_connection_sock.h | 3 ++-
 net/ipv4/tcp.c                     | 1 +
 net/ipv4/tcp_cong.c                | 3 ++-
 net/ipv4/tcp_input.c               | 4 +++-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index c738abeb3265..dc763ca9413c 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -96,7 +96,8 @@ struct inet_connection_sock {
 	void (*icsk_clean_acked)(struct sock *sk, u32 acked_seq);
 	struct hlist_node         icsk_listen_portaddr_node;
 	unsigned int		  (*icsk_sync_mss)(struct sock *sk, u32 pmtu);
-	__u8			  icsk_ca_state:6,
+	__u8			  icsk_ca_state:5,
+				  icsk_ca_initialized:1,
 				  icsk_ca_setsockopt:1,
 				  icsk_ca_dst_locked:1;
 	__u8			  icsk_retransmits;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 57a568875539..7360d3db2b61 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2698,6 +2698,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	if (icsk->icsk_ca_ops->release)
 		icsk->icsk_ca_ops->release(sk);
 	memset(icsk->icsk_ca_priv, 0, sizeof(icsk->icsk_ca_priv));
+	icsk->icsk_ca_initialized = 0;
 	tcp_set_ca_state(sk, TCP_CA_Open);
 	tp->is_sack_reneg = 0;
 	tcp_clear_retrans(tp);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 62878cf26d9c..d18d7a1ce4ce 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -176,7 +176,7 @@ void tcp_assign_congestion_control(struct sock *sk)
 
 void tcp_init_congestion_control(struct sock *sk)
 {
-	const struct inet_connection_sock *icsk = inet_csk(sk);
+	struct inet_connection_sock *icsk = inet_csk(sk);
 
 	tcp_sk(sk)->prior_ssthresh = 0;
 	if (icsk->icsk_ca_ops->init)
@@ -185,6 +185,7 @@ void tcp_init_congestion_control(struct sock *sk)
 		INET_ECN_xmit(sk);
 	else
 		INET_ECN_dontxmit(sk);
+	icsk->icsk_ca_initialized = 1;
 }
 
 static void tcp_reinit_congestion_control(struct sock *sk,
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4337841faeff..0e5ac0d33fd3 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5894,8 +5894,10 @@ void tcp_init_transfer(struct sock *sk, int bpf_op, struct sk_buff *skb)
 		tp->snd_cwnd = tcp_init_cwnd(tp, __sk_dst_get(sk));
 	tp->snd_cwnd_stamp = tcp_jiffies32;
 
+	icsk->icsk_ca_initialized = 0;
 	bpf_skops_established(sk, bpf_op, skb);
-	tcp_init_congestion_control(sk);
+	if (!icsk->icsk_ca_initialized)
+		tcp_init_congestion_control(sk);
 	tcp_init_buffer_space(sk);
 }
 

From patchwork Wed Sep 9 18:15:55 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Neal Cardwell
X-Patchwork-Id: 261230
Date: Wed, 9 Sep 2020 14:15:55 -0400
In-Reply-To: <20200909181556.2945496-1-ncardwell@google.com>
Message-Id: <20200909181556.2945496-4-ncardwell@google.com>
References: <20200909181556.2945496-1-ncardwell@google.com>
X-Mailer: git-send-email 2.28.0.526.ge36021eeef-goog
Subject: [PATCH net-next 3/4] tcp: simplify tcp_set_congestion_control(): always reinitialize
From: Neal Cardwell
To: David Miller
Cc: netdev@vger.kernel.org, Neal Cardwell, Yuchung Cheng, Kevin Yang, Eric Dumazet, Lawrence Brakmo
X-Mailing-List: netdev@vger.kernel.org

Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.

Signed-off-by: Neal Cardwell
Acked-by: Yuchung Cheng
Acked-by: Kevin Yang
Signed-off-by: Eric Dumazet
Cc: Lawrence Brakmo
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  3 +--
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_cong.c | 11 ++---------
 4 files changed, 5 insertions(+), 13 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e85d564446c6..f857146c17a5 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1104,7 +1104,7 @@ void tcp_get_available_congestion_control(char *buf, size_t len);
 void tcp_get_allowed_congestion_control(char *buf, size_t len);
 int tcp_set_allowed_congestion_control(char *allowed);
 int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
-			       bool reinit, bool cap_net_admin);
+			       bool cap_net_admin);
 u32 tcp_slow_start(struct tcp_sock *tp, u32 acked);
 void tcp_cong_avoid_ai(struct tcp_sock *tp, u32 w, u32 acked);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index b26c04924fa3..0bd0a97ee951 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4451,8 +4451,7 @@ static int _bpf_setsockopt(struct sock *sk, int level, int optname,
 			strncpy(name, optval, min_t(long, optlen,
 						    TCP_CA_NAME_MAX-1));
 			name[TCP_CA_NAME_MAX-1] = 0;
-			ret = tcp_set_congestion_control(sk, name, false,
-							 true, true);
+			ret = tcp_set_congestion_control(sk, name, false, true);
 		} else {
 			struct inet_connection_sock *icsk = inet_csk(sk);
 			struct tcp_sock *tp = tcp_sk(sk);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 7360d3db2b61..e58ab9db73ff 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3050,7 +3050,7 @@ static int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 		name[val] = 0;
 
 		lock_sock(sk);
-		err = tcp_set_congestion_control(sk, name, true, true,
+		err = tcp_set_congestion_control(sk, name, true,
 						 ns_capable(sock_net(sk)->user_ns,
 							    CAP_NET_ADMIN));
 		release_sock(sk);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index d18d7a1ce4ce..a9b0fb52a1ec 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -341,7 +341,7 @@ int tcp_set_allowed_congestion_control(char *val)
  * already initialized.
  */
 int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
-			       bool reinit, bool cap_net_admin)
+			       bool cap_net_admin)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	const struct tcp_congestion_ops *ca;
@@ -365,15 +365,8 @@ int tcp_set_congestion_control(struct sock *sk, const char *name, bool load,
 	if (!ca) {
 		err = -ENOENT;
 	} else if (!load) {
-		const struct tcp_congestion_ops *old_ca = icsk->icsk_ca_ops;
-
 		if (bpf_try_module_get(ca, ca->owner)) {
-			if (reinit) {
-				tcp_reinit_congestion_control(sk, ca);
-			} else {
-				icsk->icsk_ca_ops = ca;
-				bpf_module_put(old_ca, old_ca->owner);
-			}
+			tcp_reinit_congestion_control(sk, ca);
 		} else {
 			err = -EBUSY;
 		}
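
For illustration only, the control flow that remains in the `!load` branch once `reinit` is gone can be modelled in user space. This is a minimal sketch under stated assumptions: `struct model_ca`, `struct model_sk`, and the `model_*` functions are hypothetical, and the `module_available` field merely stands in for the result of bpf_try_module_get().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a CC algorithm descriptor. */
struct model_ca {
	const char *name;
	bool module_available;	/* models bpf_try_module_get() succeeding */
};

/* Hypothetical stand-in for the socket's CC bookkeeping. */
struct model_sk {
	const struct model_ca *ca_ops;
	int reinit_calls;
};

/* Models tcp_reinit_congestion_control(): switch ops and reinitialize. */
static void model_reinit_cc(struct model_sk *sk, const struct model_ca *ca)
{
	assert(sk != NULL && ca != NULL);
	sk->ca_ops = ca;
	sk->reinit_calls++;
}

/* Models the simplified branch: no reinit flag, a single success path. */
static int model_set_cc(struct model_sk *sk, const struct model_ca *ca)
{
	if (!ca)
		return -ENOENT;		/* unknown algorithm */
	if (!ca->module_available)
		return -EBUSY;		/* module reference not obtained */
	model_reinit_cc(sk, ca);	/* always reinitialize */
	return 0;
}
```

With the flag removed, every successful call in this model goes through `model_reinit_cc()`, so there is no longer a path that swaps `ca_ops` without reinitializing.
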