From patchwork Thu Nov 19 19:45:58 2020
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 328944
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, kuba@kernel.org, mptcp@lists.01.org, Geliang Tang, Mat Martineau
Subject: [PATCH net-next 05/10] mptcp: keep unaccepted MPC subflow into join list
Date: Thu, 19 Nov 2020 11:45:58 -0800
Message-Id: <20201119194603.103158-6-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

This simplifies all operations dealing with subflows before accept
time (e.g. data fin processing, add_addr): the join list is already
flushed by mptcp_stream_accept() before returning the newly created
msk to user space.

This also fixes a potential bug present in the old code: conn_list
was manipulated without holding the msk lock in mptcp_stream_accept().
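In isolation, the scheme the changelog describes is: subflows created
before accept() only ever touch a dedicated, spinlock-protected pending
list, and the owner migrates them in one step while holding its own
lock. Below is a compilable userspace sketch of just that pattern; the
struct, the hand-rolled lists and the pthread mutexes are illustrative
stand-ins for the kernel's list_head, spin_lock_bh() and msk socket
locking, not the actual MPTCP API.

/*
 * Park-then-flush sketch. add_pending_subflow() mirrors
 * mptcp_add_pending_subflow(): only the dedicated lock is needed, so
 * it is safe from contexts that cannot take the owner's lock.
 * flush_join_list() mirrors __mptcp_flush_join_list(): it runs with
 * the owner's lock held (e.g. from accept) and migrates everything.
 */
#include <pthread.h>
#include <stddef.h>

struct subflow {
	int id;
	struct subflow *next;
};

static struct subflow *join_list;	/* subflows created before accept() */
static struct subflow *conn_list;	/* subflows owned by the msk */
static pthread_mutex_t join_list_lock = PTHREAD_MUTEX_INITIALIZER;

static void add_pending_subflow(struct subflow *sf)
{
	pthread_mutex_lock(&join_list_lock);
	sf->next = join_list;
	join_list = sf;
	pthread_mutex_unlock(&join_list_lock);
}

static void flush_join_list(void)
{
	pthread_mutex_lock(&join_list_lock);
	while (join_list) {
		struct subflow *sf = join_list;

		join_list = sf->next;
		sf->next = conn_list;
		conn_list = sf;
	}
	pthread_mutex_unlock(&join_list_lock);
}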
Tested-by: Geliang Tang
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/protocol.c | 24 ++++++++----------------
 net/mptcp/protocol.h |  9 +++++++++
 net/mptcp/subflow.c  | 10 +++++-----
 3 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 806c0658e42f..0e83887efbc8 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2342,7 +2342,6 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,
 	if (sk_is_mptcp(newsk)) {
 		struct mptcp_subflow_context *subflow;
 		struct sock *new_mptcp_sock;
-		struct sock *ssk = newsk;

 		subflow = mptcp_subflow_ctx(newsk);
 		new_mptcp_sock = subflow->conn;
@@ -2357,22 +2356,8 @@ static struct sock *mptcp_accept(struct sock *sk, int flags, int *err,

 		/* acquire the 2nd reference for the owning socket */
 		sock_hold(new_mptcp_sock);
-
-		local_bh_disable();
-		bh_lock_sock(new_mptcp_sock);
-		msk = mptcp_sk(new_mptcp_sock);
-		msk->first = newsk;

 		newsk = new_mptcp_sock;
-		mptcp_copy_inaddrs(newsk, ssk);
-		list_add(&subflow->node, &msk->conn_list);
-		sock_hold(ssk);
-
-		mptcp_rcv_space_init(msk, ssk);
-		bh_unlock_sock(new_mptcp_sock);
-
-		__MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEPASSIVEACK);
-		local_bh_enable();
+		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEPASSIVEACK);
 	} else {
 		MPTCP_INC_STATS(sock_net(sk),
 				MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
@@ -2823,6 +2808,12 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 	if (err == 0 && !mptcp_is_tcpsk(newsock->sk)) {
 		struct mptcp_sock *msk = mptcp_sk(newsock->sk);
 		struct mptcp_subflow_context *subflow;
+		struct sock *newsk = newsock->sk;
+		bool slowpath;
+
+		slowpath = lock_sock_fast(newsk);
+		mptcp_copy_inaddrs(newsk, msk->first);
+		mptcp_rcv_space_init(msk, msk->first);

 		/* set ssk->sk_socket of accept()ed flows to mptcp socket.
 		 * This is needed so NOSPACE flag can be set from tcp stack.
@@ -2834,6 +2825,7 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
 			if (!ssk->sk_socket)
 				mptcp_sock_graft(ssk, newsock);
 		}
+		unlock_sock_fast(newsk, slowpath);
 	}

 	if (inet_csk_listen_poll(ssock->sk))
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 10fffc5de9e4..7affaf0b1941 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -403,6 +403,15 @@ mptcp_subflow_get_mapped_dsn(const struct mptcp_subflow_context *subflow)
 	return subflow->map_seq + mptcp_subflow_get_map_offset(subflow);
 }

+static inline void mptcp_add_pending_subflow(struct mptcp_sock *msk,
+					     struct mptcp_subflow_context *subflow)
+{
+	sock_hold(mptcp_subflow_tcp_sock(subflow));
+	spin_lock_bh(&msk->join_list_lock);
+	list_add_tail(&subflow->node, &msk->join_list);
+	spin_unlock_bh(&msk->join_list_lock);
+}
+
 int mptcp_is_enabled(struct net *net);
 unsigned int mptcp_get_add_addr_timeout(struct net *net);
 void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow,
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 794259789194..d3c6b3a5ad55 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -578,6 +578,10 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
 		 */
 		inet_sk_state_store((void *)new_msk, TCP_ESTABLISHED);

+		/* link the newly created socket to the msk */
+		mptcp_add_pending_subflow(mptcp_sk(new_msk), ctx);
+		WRITE_ONCE(mptcp_sk(new_msk)->first, child);
+
 		/* new mpc subflow takes ownership of the newly
 		 * created mptcp socket
 		 */
@@ -1124,11 +1128,7 @@ int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc,
 	if (err && err != -EINPROGRESS)
 		goto failed;

-	sock_hold(ssk);
-	spin_lock_bh(&msk->join_list_lock);
-	list_add_tail(&subflow->node, &msk->join_list);
-	spin_unlock_bh(&msk->join_list_lock);
-
+	mptcp_add_pending_subflow(msk, subflow);
 	return err;

 failed:
From patchwork Thu Nov 19 19:45:59 2020
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 328946
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Geliang Tang, kuba@kernel.org, mptcp@lists.01.org, Paolo Abeni, Mat Martineau
Subject: [PATCH net-next 06/10] mptcp: change add_addr_signal type
Date: Thu, 19 Nov 2020 11:45:59 -0800
Message-Id: <20201119194603.103158-7-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com>

From: Geliang Tang

This patch changes the 'add_addr_signal' type from bool to u8, so that
the address type can be encoded there as individual flag bits.
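The encoding is the standard flags-in-one-word pattern: each former
boolean becomes a bit, so one plain load observes a consistent snapshot
of all of them and one store updates them together. A compilable
userspace sketch of that pattern follows; the constants mirror the
patch, while READ_ONCE()/WRITE_ONCE() are elided and announce_addr() is
a simplified stand-in for mptcp_pm_announce_addr().

#include <stdint.h>
#include <stdio.h>

#define BIT(n)	(1u << (n))

enum { MPTCP_ADD_ADDR_SIGNAL, MPTCP_ADD_ADDR_ECHO };

static uint8_t add_addr_signal;		/* was: two separate bools */

static void announce_addr(int echo)
{
	uint8_t v = add_addr_signal;	/* kernel: READ_ONCE() */

	v |= BIT(MPTCP_ADD_ADDR_SIGNAL);
	if (echo)
		v |= BIT(MPTCP_ADD_ADDR_ECHO);
	add_addr_signal = v;		/* kernel: WRITE_ONCE() */
}

int main(void)
{
	announce_addr(1);
	printf("signal=%d echo=%d\n",
	       !!(add_addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL)),
	       !!(add_addr_signal & BIT(MPTCP_ADD_ADDR_ECHO)));
	return 0;
}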
Suggested-by: Paolo Abeni
Acked-by: Paolo Abeni
Signed-off-by: Geliang Tang
Signed-off-by: Mat Martineau
---
 net/mptcp/pm.c       | 15 +++++++++------
 net/mptcp/protocol.h | 15 ++++++++++++---
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index f9c88e2abb8e..c2c12f02a263 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -16,11 +16,15 @@ int mptcp_pm_announce_addr(struct mptcp_sock *msk,
 			   const struct mptcp_addr_info *addr,
 			   bool echo)
 {
+	u8 add_addr = READ_ONCE(msk->pm.add_addr_signal);
+
 	pr_debug("msk=%p, local_id=%d", msk, addr->id);

 	msk->pm.local = *addr;
-	WRITE_ONCE(msk->pm.add_addr_echo, echo);
-	WRITE_ONCE(msk->pm.add_addr_signal, true);
+	add_addr |= BIT(MPTCP_ADD_ADDR_SIGNAL);
+	if (echo)
+		add_addr |= BIT(MPTCP_ADD_ADDR_ECHO);
+	WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
 	return 0;
 }
@@ -182,13 +186,13 @@ bool mptcp_pm_add_addr_signal(struct mptcp_sock *msk, unsigned int remaining,
 	if (!mptcp_pm_should_add_signal(msk))
 		goto out_unlock;

-	*echo = READ_ONCE(msk->pm.add_addr_echo);
+	*echo = mptcp_pm_should_add_signal_echo(msk);

 	if (remaining < mptcp_add_addr_len(msk->pm.local.family, *echo))
 		goto out_unlock;

 	*saddr = msk->pm.local;
-	WRITE_ONCE(msk->pm.add_addr_signal, false);
+	WRITE_ONCE(msk->pm.add_addr_signal, 0);
 	ret = true;

 out_unlock:
@@ -232,11 +236,10 @@ void mptcp_pm_data_init(struct mptcp_sock *msk)
 	msk->pm.subflows = 0;
 	msk->pm.rm_id = 0;
 	WRITE_ONCE(msk->pm.work_pending, false);
-	WRITE_ONCE(msk->pm.add_addr_signal, false);
+	WRITE_ONCE(msk->pm.add_addr_signal, 0);
 	WRITE_ONCE(msk->pm.rm_addr_signal, false);
 	WRITE_ONCE(msk->pm.accept_addr, false);
 	WRITE_ONCE(msk->pm.accept_subflow, false);
-	WRITE_ONCE(msk->pm.add_addr_echo, false);
 	msk->pm.status = 0;

 	spin_lock_init(&msk->pm.lock);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 7affaf0b1941..a62ffae621c2 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -165,6 +165,11 @@ enum mptcp_pm_status {
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
 };

+enum mptcp_add_addr_status {
+	MPTCP_ADD_ADDR_SIGNAL,
+	MPTCP_ADD_ADDR_ECHO,
+};
+
 struct mptcp_pm_data {
 	struct mptcp_addr_info local;
 	struct mptcp_addr_info remote;
@@ -172,13 +177,12 @@ struct mptcp_pm_data {

 	spinlock_t	lock;		/*protects the whole PM data */

-	bool		add_addr_signal;
+	u8		add_addr_signal;
 	bool		rm_addr_signal;
 	bool		server_side;
 	bool		work_pending;
 	bool		accept_addr;
 	bool		accept_subflow;
-	bool		add_addr_echo;
 	u8		add_addr_signaled;
 	u8		add_addr_accepted;
 	u8		local_addr_used;
@@ -516,7 +520,12 @@ int mptcp_pm_remove_subflow(struct mptcp_sock *msk, u8 local_id);

 static inline bool mptcp_pm_should_add_signal(struct mptcp_sock *msk)
 {
-	return READ_ONCE(msk->pm.add_addr_signal);
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_SIGNAL);
+}
+
+static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
 }

 static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)

From patchwork Thu Nov 19 19:46:00 2020
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 328942
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Geliang Tang, kuba@kernel.org, mptcp@lists.01.org, Paolo Abeni, Mat Martineau
Subject: [PATCH net-next 07/10] mptcp: send out dedicated ADD_ADDR packet
Date: Thu, 19 Nov 2020 11:46:00 -0800
Message-Id: <20201119194603.103158-8-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com>

From: Geliang Tang

When the ADD_ADDR suboption includes an IPv6 address, its size is 28
octets. It will not fit when other MPTCP suboptions are included in
the packet, e.g. DSS. So send out a dedicated packet that carries only
the ADD_ADDR suboption and no other MPTCP suboptions.

In mptcp_pm_announce_addr, check whether this is an IPv6 ADD_ADDR. If
it is, set the MPTCP_ADD_ADDR_IPV6 flag, then call
mptcp_pm_nl_add_addr_send_ack to send out a new pure ACK packet.

In mptcp_established_options_add_addr, check whether this is a pure
ACK packet for ADD_ADDR. If it is, drop all other MPTCP suboptions in
this packet and put only the ADD_ADDR suboption in it.
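The 28-octet figure follows from the RFC 8684 layout of ADD_ADDR with
an IPv6 address and a truncated HMAC, and a quick bit of option-space
arithmetic shows why nothing else fits once the usual timestamp option
is present. A back-of-the-envelope sketch (the 12-octet timestamp
figure assumes the common NOP-padded timestamp option; 40 is
MAX_TCP_OPTION_SPACE):

#include <stdio.h>

int main(void)
{
	/* ADD_ADDR6: kind + length + subtype/ipver + address ID
	 * + IPv6 address + truncated HMAC (RFC 8684 layout)
	 */
	int add_addr6 = 1 + 1 + 1 + 1 + 16 + 8;	/* = 28 octets */
	int opt_space = 40;			/* MAX_TCP_OPTION_SPACE */
	int timestamps = 12;			/* TS option + NOP padding */

	printf("ADD_ADDR6 = %d octets, room left for DSS = %d\n",
	       add_addr6, opt_space - timestamps - add_addr6);
	return 0;
}

With timestamps in use, the IPv6 ADD_ADDR alone exhausts the remaining
option space, which is why the patch reserves a dedicated pure ACK for
it.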
Suggested-by: Paolo Abeni
Acked-by: Paolo Abeni
Signed-off-by: Geliang Tang
Signed-off-by: Mat Martineau
---
 net/mptcp/options.c    | 25 ++++++++++++++++++++++---
 net/mptcp/pm.c         | 16 ++++++++++++++--
 net/mptcp/pm_netlink.c | 29 +++++++++++++++++++++++++++++
 net/mptcp/protocol.c   |  6 +++++-
 net/mptcp/protocol.h   | 10 ++++++++++
 5 files changed, 80 insertions(+), 6 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index f2d1e27a2bc1..d7afffcc87c1 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -242,7 +242,9 @@ static void mptcp_parse_option(const struct sk_buff *skb,
 		mp_opt->add_addr = 1;
 		mp_opt->addr_id = *ptr++;
-		pr_debug("ADD_ADDR: id=%d, echo=%d", mp_opt->addr_id, mp_opt->echo);
+		pr_debug("ADD_ADDR%s: id=%d, echo=%d",
+			 (mp_opt->family == MPTCP_ADDR_IPVERSION_6) ? "6" : "",
+			 mp_opt->addr_id, mp_opt->echo);
 		if (mp_opt->family == MPTCP_ADDR_IPVERSION_4) {
 			memcpy((u8 *)&mp_opt->addr.s_addr, (u8 *)ptr, 4);
 			ptr += 4;
@@ -573,17 +575,27 @@ static u64 add_addr6_generate_hmac(u64 key1, u64 key2, u8 addr_id,
 }
 #endif

-static bool mptcp_established_options_add_addr(struct sock *sk,
+static bool mptcp_established_options_add_addr(struct sock *sk, struct sk_buff *skb,
 					       unsigned int *size,
 					       unsigned int remaining,
 					       struct mptcp_out_options *opts)
 {
 	struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk);
 	struct mptcp_sock *msk = mptcp_sk(subflow->conn);
+	bool drop_other_suboptions = false;
+	unsigned int opt_size = *size;
 	struct mptcp_addr_info saddr;
 	bool echo;
 	int len;

+	if (mptcp_pm_should_add_signal_ipv6(msk) &&
+	    skb && skb_is_tcp_pure_ack(skb)) {
+		pr_debug("drop other suboptions");
+		opts->suboptions = 0;
+		remaining += opt_size;
+		drop_other_suboptions = true;
+	}
+
 	if (!mptcp_pm_should_add_signal(msk) ||
 	    !(mptcp_pm_add_addr_signal(msk, remaining, &saddr, &echo)))
 		return false;
@@ -593,6 +605,8 @@ static bool mptcp_established_options_add_addr(struct sock *sk,
 		return false;

 	*size = len;
+	if (drop_other_suboptions)
+		*size -= opt_size;
 	opts->addr_id = saddr.id;
 	if (saddr.family == AF_INET) {
 		opts->suboptions |= OPTION_MPTCP_ADD_ADDR;
@@ -678,7 +692,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 		*size += opt_size;
 		remaining -= opt_size;

-	if (mptcp_established_options_add_addr(sk, &opt_size, remaining, opts)) {
+	if (mptcp_established_options_add_addr(sk, skb, &opt_size, remaining, opts)) {
 		*size += opt_size;
 		remaining -= opt_size;
 		ret = true;
@@ -759,6 +773,11 @@ static bool check_fully_established(struct mptcp_sock *msk, struct sock *ssk,
 		goto fully_established;
 	}

+	if (mp_opt->add_addr) {
+		WRITE_ONCE(msk->fully_established, true);
+		return true;
+	}
+
 	/* If the first established packet does not contain MP_CAPABLE + data
 	 * then fallback to TCP. Fallback scenarios requires a reset for
 	 * MP_JOIN subflows.
diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c
index c2c12f02a263..75c5040e8d5d 100644
--- a/net/mptcp/pm.c
+++ b/net/mptcp/pm.c
@@ -24,6 +24,8 @@ int mptcp_pm_announce_addr(struct mptcp_sock *msk,
 	add_addr |= BIT(MPTCP_ADD_ADDR_SIGNAL);
 	if (echo)
 		add_addr |= BIT(MPTCP_ADD_ADDR_ECHO);
+	if (addr->family == AF_INET6)
+		add_addr |= BIT(MPTCP_ADD_ADDR_IPV6);
 	WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
 	return 0;
 }
@@ -153,14 +155,24 @@ void mptcp_pm_add_addr_received(struct mptcp_sock *msk,

 	spin_lock_bh(&pm->lock);

-	if (!READ_ONCE(pm->accept_addr))
+	if (!READ_ONCE(pm->accept_addr)) {
 		mptcp_pm_announce_addr(msk, addr, true);
-	else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED))
+		mptcp_pm_add_addr_send_ack(msk);
+	} else if (mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_RECEIVED)) {
 		pm->remote = *addr;
+	}

 	spin_unlock_bh(&pm->lock);
 }

+void mptcp_pm_add_addr_send_ack(struct mptcp_sock *msk)
+{
+	if (!mptcp_pm_should_add_signal_ipv6(msk))
+		return;
+
+	mptcp_pm_schedule_work(msk, MPTCP_PM_ADD_ADDR_SEND_ACK);
+}
+
 void mptcp_pm_rm_addr_received(struct mptcp_sock *msk, u8 rm_id)
 {
 	struct mptcp_pm_data *pm = &msk->pm;
diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c
index f8a9d82a0ea8..03f2c28f11f5 100644
--- a/net/mptcp/pm_netlink.c
+++ b/net/mptcp/pm_netlink.c
@@ -228,6 +228,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer)
 	if (!mptcp_pm_should_add_signal(msk)) {
 		pr_debug("retransmit ADD_ADDR id=%d", entry->addr.id);
 		mptcp_pm_announce_addr(msk, &entry->addr, false);
+		mptcp_pm_add_addr_send_ack(msk);
 		entry->retrans_times++;
 	}
@@ -328,6 +329,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk)
 		if (mptcp_pm_alloc_anno_list(msk, local)) {
 			msk->pm.add_addr_signaled++;
 			mptcp_pm_announce_addr(msk, &local->addr, false);
+			mptcp_pm_nl_add_addr_send_ack(msk);
 		}
 	} else {
 		/* pick failed, avoid fourther attempts later */
@@ -398,6 +400,33 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk)

 	spin_lock_bh(&msk->pm.lock);
 	mptcp_pm_announce_addr(msk, &remote, true);
+	mptcp_pm_nl_add_addr_send_ack(msk);
+}
+
+void mptcp_pm_nl_add_addr_send_ack(struct mptcp_sock *msk)
+{
+	struct mptcp_subflow_context *subflow;
+
+	if (!mptcp_pm_should_add_signal_ipv6(msk))
+		return;
+
+	__mptcp_flush_join_list(msk);
+	subflow = list_first_entry_or_null(&msk->conn_list, typeof(*subflow), node);
+	if (subflow) {
+		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+		u8 add_addr;
+
+		spin_unlock_bh(&msk->pm.lock);
+		pr_debug("send ack for add_addr6");
+		lock_sock(ssk);
+		tcp_send_ack(ssk);
+		release_sock(ssk);
+		spin_lock_bh(&msk->pm.lock);
+
+		add_addr = READ_ONCE(msk->pm.add_addr_signal);
+		add_addr &= ~BIT(MPTCP_ADD_ADDR_IPV6);
+		WRITE_ONCE(msk->pm.add_addr_signal, add_addr);
+	}
 }

 void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 0e83887efbc8..c2629190b1ce 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -690,7 +690,7 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 		sk->sk_data_ready(sk);
 }

-static void __mptcp_flush_join_list(struct mptcp_sock *msk)
+void __mptcp_flush_join_list(struct mptcp_sock *msk)
 {
 	if (likely(list_empty(&msk->join_list)))
 		return;
@@ -1807,6 +1807,10 @@ static void pm_work(struct mptcp_sock *msk)
 		pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_RECEIVED);
 		mptcp_pm_nl_add_addr_received(msk);
 	}
+	if (pm->status & BIT(MPTCP_PM_ADD_ADDR_SEND_ACK)) {
+		pm->status &= ~BIT(MPTCP_PM_ADD_ADDR_SEND_ACK);
+		mptcp_pm_nl_add_addr_send_ack(msk);
+	}
 	if (pm->status & BIT(MPTCP_PM_RM_ADDR_RECEIVED)) {
 		pm->status &= ~BIT(MPTCP_PM_RM_ADDR_RECEIVED);
 		mptcp_pm_nl_rm_addr_received(msk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index a62ffae621c2..4d6dd4adaaaf 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -160,6 +160,7 @@ struct mptcp_addr_info {

 enum mptcp_pm_status {
 	MPTCP_PM_ADD_ADDR_RECEIVED,
+	MPTCP_PM_ADD_ADDR_SEND_ACK,
 	MPTCP_PM_RM_ADDR_RECEIVED,
 	MPTCP_PM_ESTABLISHED,
 	MPTCP_PM_SUBFLOW_ESTABLISHED,
@@ -168,6 +169,7 @@ enum mptcp_pm_status {
 enum mptcp_add_addr_status {
 	MPTCP_ADD_ADDR_SIGNAL,
 	MPTCP_ADD_ADDR_ECHO,
+	MPTCP_ADD_ADDR_IPV6,
 };

 struct mptcp_pm_data {
@@ -466,6 +468,7 @@ bool mptcp_schedule_work(struct sock *sk);
 void mptcp_data_acked(struct sock *sk);
 void mptcp_subflow_eof(struct sock *sk);
 bool mptcp_update_rcv_data_fin(struct mptcp_sock *msk, u64 data_fin_seq, bool use_64bit);
+void __mptcp_flush_join_list(struct mptcp_sock *msk);

 static inline bool mptcp_data_fin_enabled(const struct mptcp_sock *msk)
 {
 	return READ_ONCE(msk->snd_data_fin_enable) &&
@@ -506,6 +509,7 @@ void mptcp_pm_subflow_established(struct mptcp_sock *msk,
 void mptcp_pm_subflow_closed(struct mptcp_sock *msk, u8 id);
 void mptcp_pm_add_addr_received(struct mptcp_sock *msk,
 				const struct mptcp_addr_info *addr);
+void mptcp_pm_add_addr_send_ack(struct mptcp_sock *msk);
 void mptcp_pm_rm_addr_received(struct mptcp_sock *msk, u8 rm_id);
 void mptcp_pm_free_anno_list(struct mptcp_sock *msk);
 struct mptcp_pm_add_entry *
@@ -528,6 +532,11 @@ static inline bool mptcp_pm_should_add_signal_echo(struct mptcp_sock *msk)
 	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_ECHO);
 }

+static inline bool mptcp_pm_should_add_signal_ipv6(struct mptcp_sock *msk)
+{
+	return READ_ONCE(msk->pm.add_addr_signal) & BIT(MPTCP_ADD_ADDR_IPV6);
+}
+
 static inline bool mptcp_pm_should_rm_signal(struct mptcp_sock *msk)
 {
 	return READ_ONCE(msk->pm.rm_addr_signal);
@@ -552,6 +561,7 @@ void mptcp_pm_nl_data_init(struct mptcp_sock *msk);
 void mptcp_pm_nl_fully_established(struct mptcp_sock *msk);
 void mptcp_pm_nl_subflow_established(struct mptcp_sock *msk);
 void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk);
+void mptcp_pm_nl_add_addr_send_ack(struct mptcp_sock *msk);
 void mptcp_pm_nl_rm_addr_received(struct mptcp_sock *msk);
 void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, u8 rm_id);
 int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc);
"EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727920AbgKSTqN (ORCPT ); Thu, 19 Nov 2020 14:46:13 -0500 IronPort-SDR: bRvEhjL2IbjQiy34vxPuURAlhrPY6xbhrV6MSRkmX1maK+k8B7npEYGetqVAk3d3WKRBgPRNZm Jadie7Z8z4/Q== X-IronPort-AV: E=McAfee;i="6000,8403,9810"; a="171451136" X-IronPort-AV: E=Sophos;i="5.78,354,1599548400"; d="scan'208";a="171451136" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Nov 2020 11:46:09 -0800 IronPort-SDR: coZ2WVjKfSXP57B7U/rof2lcw7Zbfuv+u3ijnWBXE5/U6Pmlo9kFk+vN4CZ3mi/ba4saxi+b97 SiYDxEtIC7+g== X-IronPort-AV: E=Sophos;i="5.78,354,1599548400"; d="scan'208";a="476940489" Received: from mjmartin-nuc02.amr.corp.intel.com ([10.255.229.232]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 19 Nov 2020 11:46:09 -0800 From: Mat Martineau To: netdev@vger.kernel.org Cc: Florian Westphal , kuba@kernel.org, mptcp@lists.01.org, Paolo Abeni , Mat Martineau Subject: [PATCH net-next 09/10] mptcp: track window announced to peer Date: Thu, 19 Nov 2020 11:46:02 -0800 Message-Id: <20201119194603.103158-10-mathew.j.martineau@linux.intel.com> X-Mailer: git-send-email 2.29.2 In-Reply-To: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com> References: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com> MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org From: Florian Westphal OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. 
Acked-by: Paolo Abeni
Signed-off-by: Florian Westphal
Signed-off-by: Mat Martineau
---
 include/net/mptcp.h   |  3 ++-
 net/ipv4/tcp_output.c | 11 +++++++----
 net/mptcp/options.c   | 22 +++++++++++++++++++++-
 net/mptcp/protocol.c  | 12 +++++++-----
 net/mptcp/protocol.h  |  1 +
 5 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 6e706d838e4e..b6cf07143a8a 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -88,7 +88,8 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       struct mptcp_out_options *opts);
 void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);

-void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts);
+void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+			 struct mptcp_out_options *opts);

 /* move the skb extension owership, with the assumption that 'to' is
  * newly allocated
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 99905bc01d40..41880d3521ed 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -445,11 +445,12 @@ struct tcp_out_options {
 	struct mptcp_out_options mptcp;
 };

-static void mptcp_options_write(__be32 *ptr, struct tcp_out_options *opts)
+static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+				struct tcp_out_options *opts)
 {
 #if IS_ENABLED(CONFIG_MPTCP)
 	if (unlikely(OPTION_MPTCP & opts->options))
-		mptcp_write_options(ptr, &opts->mptcp);
+		mptcp_write_options(ptr, tp, &opts->mptcp);
 #endif
 }
@@ -701,7 +702,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,

 	smc_options_write(ptr, &options);

-	mptcp_options_write(ptr, opts);
+	mptcp_options_write(ptr, tp, opts);
 }

 static void smc_set_option(const struct tcp_sock *tp,
@@ -1346,7 +1347,6 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		}
 	}

-	tcp_options_write((__be32 *)(th + 1), tp, &opts);
 	skb_shinfo(skb)->gso_type = sk->sk_gso_type;
 	if (likely(!(tcb->tcp_flags & TCPHDR_SYN))) {
 		th->window      = htons(tcp_select_window(sk));
@@ -1357,6 +1357,9 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		 */
 		th->window	= htons(min(tp->rcv_wnd, 65535U));
 	}
+
+	tcp_options_write((__be32 *)(th + 1), tp, &opts);
+
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
 	if (md5) {
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index d7afffcc87c1..248e3930c0cb 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1010,7 +1010,24 @@ void mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	}
 }

-void mptcp_write_options(__be32 *ptr, struct mptcp_out_options *opts)
+static void mptcp_set_rwin(const struct tcp_sock *tp)
+{
+	const struct sock *ssk = (const struct sock *)tp;
+	const struct mptcp_subflow_context *subflow;
+	struct mptcp_sock *msk;
+	u64 ack_seq;
+
+	subflow = mptcp_subflow_ctx(ssk);
+	msk = mptcp_sk(subflow->conn);
+
+	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+
+	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
+		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+}
+
+void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+			 struct mptcp_out_options *opts)
 {
 	if ((OPTION_MPTCP_MPC_SYN | OPTION_MPTCP_MPC_SYNACK |
 	     OPTION_MPTCP_MPC_ACK) & opts->suboptions) {
@@ -1167,4 +1184,7 @@ void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
 			      TCPOPT_NOP << 8 | TCPOPT_NOP, ptr);
 		}
 	}
+
+	if (tp)
+		mptcp_set_rwin(tp);
 }
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index c2629190b1ce..4ae2c4a30e44 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -168,19 +168,19 @@ static void mptcp_data_queue_ofo(struct mptcp_sock *msk, struct sk_buff *skb)
 	struct rb_node **p, *parent;
 	u64 seq, end_seq, max_seq;
 	struct sk_buff *skb1;
-	int space;

 	seq = MPTCP_SKB_CB(skb)->map_seq;
 	end_seq = MPTCP_SKB_CB(skb)->end_seq;
-	space = tcp_space(sk);
-	max_seq = space > 0 ? space + msk->ack_seq : msk->ack_seq;
+	max_seq = READ_ONCE(msk->rcv_wnd_sent);

 	pr_debug("msk=%p seq=%llx limit=%llx empty=%d", msk, seq, max_seq,
 		 RB_EMPTY_ROOT(&msk->out_of_order_queue));
-	if (after64(seq, max_seq)) {
+	if (after64(end_seq, max_seq)) {
 		/* out of window */
 		mptcp_drop(sk, skb);
-		pr_debug("oow by %ld", (unsigned long)seq - (unsigned long)max_seq);
+		pr_debug("oow by %lld, rcv_wnd_sent %llu\n",
+			 (unsigned long long)end_seq - (unsigned long)max_seq,
+			 (unsigned long long)msk->rcv_wnd_sent);
 		MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_NODSSWINDOW);
 		return;
 	}
@@ -2294,6 +2294,7 @@ struct sock *mptcp_sk_clone(const struct sock *sk,
 		mptcp_crypto_key_sha(msk->remote_key, NULL, &ack_seq);
 		ack_seq++;
 		WRITE_ONCE(msk->ack_seq, ack_seq);
+		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
 	}

 	sock_reset_flag(nsk, SOCK_RCU_FREE);
@@ -2586,6 +2587,7 @@ void mptcp_finish_connect(struct sock *ssk)
 	WRITE_ONCE(msk->write_seq, subflow->idsn + 1);
 	WRITE_ONCE(msk->snd_nxt, msk->write_seq);
 	WRITE_ONCE(msk->ack_seq, ack_seq);
+	WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
 	WRITE_ONCE(msk->can_ack, 1);
 	atomic64_set(&msk->snd_una, msk->write_seq);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 4d6dd4adaaaf..67f86818203f 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -216,6 +216,7 @@ struct mptcp_sock {
 	u64		write_seq;
 	u64		snd_nxt;
 	u64		ack_seq;
+	u64		rcv_wnd_sent;
 	u64		rcv_data_fin_seq;
 	struct sock	*last_snd;
 	int		snd_burst;
From patchwork Thu Nov 19 19:46:03 2020
X-Patchwork-Submitter: Mat Martineau
X-Patchwork-Id: 328945
From: Mat Martineau
To: netdev@vger.kernel.org
Cc: Paolo Abeni, kuba@kernel.org, mptcp@lists.01.org, Mat Martineau
Subject: [PATCH net-next 10/10] mptcp: refine MPTCP-level ack scheduling
Date: Thu, 19 Nov 2020 11:46:03 -0800
Message-Id: <20201119194603.103158-11-mathew.j.martineau@linux.intel.com>
In-Reply-To: <20201119194603.103158-1-mathew.j.martineau@linux.intel.com>

From: Paolo Abeni

Sending a timely MPTCP-level ack is somewhat difficult when the
insertion into the msk receive queue is performed by the worker. It
needs a TCP-level dup-ack to notify the MPTCP-level ack_seq increase,
as both the TCP-level ack seq and the rcv window are unchanged.

We can actually avoid processing incoming data with the worker, and
let the subflow or recvmsg() send the ack as needed.

When recvmsg() moves the skbs inside the msk receive queue, the msk
space is still unchanged, so tcp_cleanup_rbuf() could end up skipping
TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked,
a known amount of bytes is going to be consumed soon: we update the
rcv wnd computation taking them into account.

Additionally we need to explicitly trigger tcp_cleanup_rbuf() when
recvmsg() consumes a significant amount of the receive buffer.
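The last point uses a cheap heuristic: recvmsg() re-acks only when the
free receive space has at least doubled since the space advertised
with the previous ack (stored in msk->old_wspace in the diff below). A
tiny standalone sketch of just that predicate; the values are
arbitrary:

#include <stdio.h>

static int old_wspace = 4096;	/* space advertised with the last ack */

/* mirrors the check added to mptcp_recvmsg() below */
static int should_ack(int cur_space)
{
	return cur_space - old_wspace >= old_wspace;
}

int main(void)
{
	printf("%d\n", should_ack(6000));	/* 0: grew, but not enough */
	printf("%d\n", should_ack(8192));	/* 1: at least doubled */
	return 0;
}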
Signed-off-by: Paolo Abeni
Signed-off-by: Mat Martineau
---
 net/mptcp/options.c  |   1 +
 net/mptcp/protocol.c | 105 +++++++++++++++++++++----------------------
 net/mptcp/protocol.h |   8 ++++
 net/mptcp/subflow.c  |   4 +-
 4 files changed, 61 insertions(+), 57 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 248e3930c0cb..8a59b3e44599 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -530,6 +530,7 @@ static bool mptcp_established_options_dss(struct sock *sk, struct sk_buff *skb,
 		opts->ext_copy.ack64 = 0;
 	}
 	opts->ext_copy.use_ack = 1;
+	WRITE_ONCE(msk->old_wspace, __mptcp_space((struct sock *)msk));

 	/* Add kind/length/subtype/flag overhead if mapping is not populated */
 	if (dss_size == 0)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 4ae2c4a30e44..748343f1a968 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -407,16 +407,42 @@ static void mptcp_set_timeout(const struct sock *sk, const struct sock *ssk)
 	mptcp_sk(sk)->timer_ival = tout > 0 ? tout : TCP_RTO_MIN;
 }

-static void mptcp_send_ack(struct mptcp_sock *msk)
+static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
+{
+	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
+
+	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
+	if (subflow->request_join && !subflow->fully_established)
+		return false;
+
+	/* only send if our side has not closed yet */
+	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
+}
+
+static void mptcp_send_ack(struct mptcp_sock *msk, bool force)
 {
 	struct mptcp_subflow_context *subflow;
+	struct sock *pick = NULL;

 	mptcp_for_each_subflow(msk, subflow) {
 		struct sock *ssk = mptcp_subflow_tcp_sock(subflow);

-		lock_sock(ssk);
-		tcp_send_ack(ssk);
-		release_sock(ssk);
+		if (force) {
+			lock_sock(ssk);
+			tcp_send_ack(ssk);
+			release_sock(ssk);
+			continue;
+		}
+
+		/* if the hintes ssk is still active, use it */
+		pick = ssk;
+		if (ssk == msk->ack_hint)
+			break;
+	}
+	if (!force && pick) {
+		lock_sock(pick);
+		tcp_cleanup_rbuf(pick, 1);
+		release_sock(pick);
 	}
 }
@@ -468,7 +494,7 @@ static bool mptcp_check_data_fin(struct sock *sk)
 			ret = true;
 			mptcp_set_timeout(sk, NULL);
-			mptcp_send_ack(msk);
+			mptcp_send_ack(msk, true);
 			mptcp_close_wake_up(sk);
 		}
 	return ret;
@@ -483,7 +509,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	unsigned int moved = 0;
 	bool more_data_avail;
 	struct tcp_sock *tp;
-	u32 old_copied_seq;
 	bool done = false;
 	int sk_rbuf;
@@ -500,7 +525,6 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 	pr_debug("msk=%p ssk=%p", msk, ssk);
 	tp = tcp_sk(ssk);
-	old_copied_seq = tp->copied_seq;
 	do {
 		u32 map_remaining, offset;
 		u32 seq = tp->copied_seq;
@@ -564,11 +588,9 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk,
 			break;
 		}
 	} while (more_data_avail);
+	msk->ack_hint = ssk;

 	*bytes += moved;
-	if (tp->copied_seq != old_copied_seq)
-		tcp_cleanup_rbuf(ssk, 1);
-
 	return done;
 }
@@ -672,19 +694,8 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk)
 	if (atomic_read(&sk->sk_rmem_alloc) > sk_rbuf)
 		goto wake;

-	if (move_skbs_to_msk(msk, ssk))
-		goto wake;
+	move_skbs_to_msk(msk, ssk);

-	/* mptcp socket is owned, release_cb should retry */
-	if (!test_and_set_bit(TCP_DELACK_TIMER_DEFERRED,
-			      &sk->sk_tsq_flags)) {
-		sock_hold(sk);
-
-		/* need to try again, its possible release_cb() has already
-		 * been called after the test_and_set_bit() above.
-		 */
-		move_skbs_to_msk(msk, ssk);
-	}
 wake:
 	if (wake)
 		sk->sk_data_ready(sk);
@@ -1095,18 +1106,6 @@ static void mptcp_nospace(struct mptcp_sock *msk)
 	mptcp_clean_una((struct sock *)msk);
 }

-static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow)
-{
-	struct sock *ssk = mptcp_subflow_tcp_sock(subflow);
-
-	/* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */
-	if (subflow->request_join && !subflow->fully_established)
-		return false;
-
-	/* only send if our side has not closed yet */
-	return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT));
-}
-
 #define MPTCP_SEND_BURST_SIZE		((1 << 16) - \
 					 sizeof(struct tcphdr) - \
 					 MAX_TCP_OPTION_SPACE - \
@@ -1533,7 +1532,7 @@ static void mptcp_rcv_space_adjust(struct mptcp_sock *msk, int copied)
 	msk->rcvq_space.time = mstamp;
 }

-static bool __mptcp_move_skbs(struct mptcp_sock *msk)
+static bool __mptcp_move_skbs(struct mptcp_sock *msk, unsigned int rcv)
 {
 	unsigned int moved = 0;
 	bool done;
@@ -1552,12 +1551,16 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk)
 		slowpath = lock_sock_fast(ssk);
 		done = __mptcp_move_skbs_from_subflow(msk, ssk, &moved);
+		if (moved && rcv) {
+			WRITE_ONCE(msk->rmem_pending, min(rcv, moved));
+			tcp_cleanup_rbuf(ssk, 1);
+			WRITE_ONCE(msk->rmem_pending, 0);
+		}
 		unlock_sock_fast(ssk, slowpath);
 	} while (!done);

 	if (mptcp_ofo_queue(msk) || moved > 0) {
-		if (!mptcp_check_data_fin((struct sock *)msk))
-			mptcp_send_ack(msk);
+		mptcp_check_data_fin((struct sock *)msk);
 		return true;
 	}
 	return false;
@@ -1581,8 +1584,8 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 	target = sock_rcvlowat(sk, flags & MSG_WAITALL, len);
 	__mptcp_flush_join_list(msk);

-	while (len > (size_t)copied) {
-		int bytes_read;
+	for (;;) {
+		int bytes_read, old_space;

 		bytes_read = __mptcp_recvmsg_mskq(msk, msg, len - copied);
 		if (unlikely(bytes_read < 0)) {
@@ -1594,9 +1597,14 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		copied += bytes_read;

 		if (skb_queue_empty(&sk->sk_receive_queue) &&
-		    __mptcp_move_skbs(msk))
+		    __mptcp_move_skbs(msk, len - copied))
 			continue;

+		/* be sure to advertise window change */
+		old_space = READ_ONCE(msk->old_wspace);
+		if ((tcp_space(sk) - old_space) >= old_space)
+			mptcp_send_ack(msk, false);
+
 		/* only the master socket status is relevant here. The exit
 		 * conditions mirror closely tcp_recvmsg()
 		 */
@@ -1649,7 +1657,7 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
 		/* .. race-breaker: ssk might have gotten new data
 		 * after last __mptcp_move_skbs() returned false.
 		 */
-		if (unlikely(__mptcp_move_skbs(msk)))
+		if (unlikely(__mptcp_move_skbs(msk, 0)))
 			set_bit(MPTCP_DATA_READY, &msk->flags);
 	} else if (unlikely(!test_bit(MPTCP_DATA_READY, &msk->flags))) {
 		/* data to read but mptcp_wait_data() cleared DATA_READY */
@@ -1880,7 +1888,6 @@ static void mptcp_worker(struct work_struct *work)
 	if (test_and_clear_bit(MPTCP_WORK_CLOSE_SUBFLOW, &msk->flags))
 		__mptcp_close_subflow(msk);

-	__mptcp_move_skbs(msk);

 	if (mptcp_send_head(sk))
 		mptcp_push_pending(sk, 0);
@@ -1964,6 +1971,7 @@ static int __mptcp_init_sock(struct sock *sk)
 	msk->out_of_order_queue = RB_ROOT;
 	msk->first_pending = NULL;

+	msk->ack_hint = NULL;
 	msk->first = NULL;
 	inet_csk(sk)->icsk_sync_mss = mptcp_sync_mss;
@@ -2499,8 +2507,7 @@ static int mptcp_getsockopt(struct sock *sk, int level, int optname,
 	return -EOPNOTSUPP;
 }

-#define MPTCP_DEFERRED_ALL (TCPF_DELACK_TIMER_DEFERRED | \
-			    TCPF_WRITE_TIMER_DEFERRED)
+#define MPTCP_DEFERRED_ALL (TCPF_WRITE_TIMER_DEFERRED)

 /* this is very alike tcp_release_cb() but we must handle differently a
  * different set of events
@@ -2518,16 +2525,6 @@ static void mptcp_release_cb(struct sock *sk)

 	sock_release_ownership(sk);

-	if (flags & TCPF_DELACK_TIMER_DEFERRED) {
-		struct mptcp_sock *msk = mptcp_sk(sk);
-		struct sock *ssk;
-
-		ssk = mptcp_subflow_recv_lookup(msk);
-		if (!ssk || sk->sk_state == TCP_CLOSE ||
-		    !schedule_work(&msk->work))
-			__sock_put(sk);
-	}
-
 	if (flags & TCPF_WRITE_TIMER_DEFERRED) {
 		mptcp_retransmit_handler(sk);
 		__sock_put(sk);
diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h
index 67f86818203f..82d5626323b1 100644
--- a/net/mptcp/protocol.h
+++ b/net/mptcp/protocol.h
@@ -220,10 +220,12 @@ struct mptcp_sock {
 	u64		rcv_data_fin_seq;
 	struct sock	*last_snd;
 	int		snd_burst;
+	int		old_wspace;
 	atomic64_t	snd_una;
 	atomic64_t	wnd_end;
 	unsigned long	timer_ival;
 	u32		token;
+	int		rmem_pending;
 	unsigned long	flags;
 	bool		can_ack;
 	bool		fully_established;
@@ -231,6 +233,7 @@ struct mptcp_sock {
 	bool		snd_data_fin_enable;
 	bool		use_64bit_ack; /* Set when we received a 64-bit DSN */
 	spinlock_t	join_list_lock;
+	struct sock	*ack_hint;
 	struct work_struct work;
 	struct sk_buff  *ooo_last_skb;
 	struct rb_root  out_of_order_queue;
@@ -258,6 +261,11 @@ static inline struct mptcp_sock *mptcp_sk(const struct sock *sk)
 	return (struct mptcp_sock *)sk;
 }

+static inline int __mptcp_space(const struct sock *sk)
+{
+	return tcp_space(sk) + READ_ONCE(mptcp_sk(sk)->rmem_pending);
+}
+
 static inline struct mptcp_data_frag *mptcp_send_head(const struct sock *sk)
 {
 	const struct mptcp_sock *msk = mptcp_sk(sk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index d3c6b3a5ad55..4d8abff1be18 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -850,8 +850,6 @@ static void mptcp_subflow_discard_data(struct sock *ssk, struct sk_buff *skb,
 	sk_eat_skb(ssk, skb);
 	if (mptcp_subflow_get_map_offset(subflow) >= subflow->map_data_len)
 		subflow->map_valid = 0;
-	if (incr)
-		tcp_cleanup_rbuf(ssk, incr);
 }

 static bool subflow_check_data_avail(struct sock *ssk)
@@ -973,7 +971,7 @@ void mptcp_space(const struct sock *ssk, int *space, int *full_space)
 	const struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk);
 	const struct sock *sk = subflow->conn;

-	*space = tcp_space(sk);
+	*space = __mptcp_space(sk);
 	*full_space = tcp_full_space(sk);
 }
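A note on the rmem_pending/__mptcp_space() pair introduced above:
while __mptcp_move_skbs() knows that recvmsg() is about to consume
`rcv` bytes, those bytes still sit in the receive queue, so the window
computed by tcp_cleanup_rbuf() would look artificially small.
Crediting them back restores the "soon to be free" space. A simplified
userspace model of just that accounting; tcp_space() is stubbed and
all names are illustrative:

#include <stdio.h>

static int rmem_pending;	/* bytes recvmsg() is committed to consume */

static int tcp_space_stub(void)
{
	return 8192;		/* pretend free receive space */
}

/* mirrors __mptcp_space(): free space plus what is already spoken for */
static int mptcp_space_model(void)
{
	return tcp_space_stub() + rmem_pending;
}

int main(void)
{
	printf("space normally:            %d\n", mptcp_space_model());
	rmem_pending = 2048;	/* set around tcp_cleanup_rbuf() */
	printf("space seen by rbuf clean:  %d\n", mptcp_space_model());
	rmem_pending = 0;
	return 0;
}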