From patchwork Mon Sep 14 08:01:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260940 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8924DC433E2 for ; Mon, 14 Sep 2020 08:01:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 200D4217BA for ; Mon, 14 Sep 2020 08:01:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="dJBORjB0" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726159AbgINIBs (ORCPT ); Mon, 14 Sep 2020 04:01:48 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:59540 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726068AbgINIBo (ORCPT ); Mon, 14 Sep 2020 04:01:44 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070501; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=OYahyQwmyzH+Plf0ElS0PkRUHmzKDTWB5mBkiNsuwgI=; b=dJBORjB0n65Bsuvr2NIMpnmKFhfk6AyP8jsGVEJ6k83rKuIsxDh7zP2L5IcZNA2fPbciXI B1yj8AD8spGMjsB4onLiRwyCriCZHEY+aqTcG/GHQPDNiCF60qS1PDbZ8tHZUqmQRP4RRc k7dtPHKcSk6dG2wm7PVUNmoGk235e/A= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-391-KrS0_P81MWeDS4bp8XD-AA-1; Mon, 14 Sep 2020 04:01:39 -0400 X-MC-Unique: KrS0_P81MWeDS4bp8XD-AA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 6F94210BBEDA; Mon, 14 Sep 2020 08:01:38 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id 3049319C66; Mon, 14 Sep 2020 08:01:36 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 01/13] mptcp: rethink 'is writable' conditional Date: Mon, 14 Sep 2020 10:01:07 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Currently, when checking for the 'msk is writable' condition, we look at the individual subflows write space. That works well while we send data via a single subflow, but will not as soon as we will enable concurrent xmit on multiple subflows. With this change msk becomes writable when the following conditions hold: - the socket has some free write space - there is at least a subflow with write free space Additionally we need to set the NOSPACE bit on all subflows before blocking. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/protocol.c | 71 ++++++++++++++++++++++++++++---------------- net/mptcp/subflow.c | 6 ++-- 2 files changed, 50 insertions(+), 27 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 683196225f91..854a8b3b9ecd 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -472,7 +472,7 @@ void mptcp_data_acked(struct sock *sk) { mptcp_reset_timer(sk); - if ((!sk_stream_is_writeable(sk) || + if ((!test_bit(MPTCP_SEND_SPACE, &mptcp_sk(sk)->flags) || (inet_sk_state_load(sk) != TCP_ESTABLISHED)) && schedule_work(&mptcp_sk(sk)->work)) sock_hold(sk); @@ -567,6 +567,20 @@ static void dfrag_clear(struct sock *sk, struct mptcp_data_frag *dfrag) put_page(dfrag->page); } +static bool mptcp_is_writeable(struct mptcp_sock *msk) +{ + struct mptcp_subflow_context *subflow; + + if (!sk_stream_is_writeable((struct sock *)msk)) + return false; + + mptcp_for_each_subflow(msk, subflow) { + if (sk_stream_is_writeable(subflow->tcp_sock)) + return true; + } + return false; +} + static void mptcp_clean_una(struct sock *sk) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -609,8 +623,15 @@ static void mptcp_clean_una(struct sock *sk) sk_mem_reclaim_partial(sk); /* Only wake up writers if a subflow is ready */ - if (test_bit(MPTCP_SEND_SPACE, &msk->flags)) + if (mptcp_is_writeable(msk)) { + set_bit(MPTCP_SEND_SPACE, &mptcp_sk(sk)->flags); + smp_mb__after_atomic(); + + /* set SEND_SPACE before sk_stream_write_space clears + * NOSPACE + */ sk_stream_write_space(sk); + } } } @@ -801,21 +822,31 @@ static int mptcp_sendmsg_frag(struct sock *sk, struct sock *ssk, return ret; } -static void mptcp_nospace(struct mptcp_sock *msk, struct socket *sock) +static void mptcp_nospace(struct mptcp_sock *msk) { + struct mptcp_subflow_context *subflow; + clear_bit(MPTCP_SEND_SPACE, &msk->flags); smp_mb__after_atomic(); /* msk->flags is changed by write_space cb */ - /* enables sk->write_space() callbacks */ - set_bit(SOCK_NOSPACE, &sock->flags); + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + struct socket *sock = READ_ONCE(ssk->sk_socket); + + /* enables ssk->write_space() callbacks */ + if (sock) + set_bit(SOCK_NOSPACE, &sock->flags); + } } static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) { struct mptcp_subflow_context *subflow; + struct sock *sk = (struct sock *)msk; struct sock *backup = NULL; + bool free; - sock_owned_by_me((const struct sock *)msk); + sock_owned_by_me(sk); if (!mptcp_ext_cache_refill(msk)) return NULL; @@ -823,12 +854,9 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) mptcp_for_each_subflow(msk, subflow) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - if (!sk_stream_memory_free(ssk)) { - struct socket *sock = ssk->sk_socket; - - if (sock) - mptcp_nospace(msk, sock); - + free = sk_stream_is_writeable(subflow->tcp_sock); + if (!free) { + mptcp_nospace(msk); return NULL; } @@ -845,16 +873,10 @@ static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk) return backup; } -static void ssk_check_wmem(struct mptcp_sock *msk, struct sock *ssk) +static void ssk_check_wmem(struct mptcp_sock *msk) { - struct socket *sock; - - if (likely(sk_stream_is_writeable(ssk))) - return; - - sock = READ_ONCE(ssk->sk_socket); - if (sock) - mptcp_nospace(msk, sock); + if (unlikely(!mptcp_is_writeable(msk))) + mptcp_nospace(msk); } static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) @@ -907,6 +929,7 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_reset_timer(sk); } + mptcp_nospace(msk); ret = sk_stream_wait_memory(sk, &timeo); if (ret) goto out; @@ -945,7 +968,6 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (!sk_stream_memory_free(ssk) || !mptcp_page_frag_refill(ssk, pfrag) || !mptcp_ext_cache_refill(msk)) { - set_bit(SOCK_NOSPACE, &sk->sk_socket->flags); tcp_push(ssk, msg->msg_flags, mss_now, tcp_sk(ssk)->nonagle, size_goal); mptcp_set_timeout(sk, ssk); @@ -993,9 +1015,9 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) mptcp_reset_timer(sk); } - ssk_check_wmem(msk, ssk); release_sock(ssk); out: + ssk_check_wmem(msk); release_sock(sk); return copied ? : ret; } @@ -2291,8 +2313,7 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, if (state != TCP_SYN_SENT && state != TCP_SYN_RECV) { mask |= mptcp_check_readable(msk); - if (sk_stream_is_writeable(sk) && - test_bit(MPTCP_SEND_SPACE, &msk->flags)) + if (test_bit(MPTCP_SEND_SPACE, &msk->flags)) mask |= EPOLLOUT | EPOLLWRNORM; } if (sk->sk_shutdown & RCV_SHUTDOWN) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index e8cac2655c82..7ae1d3604047 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -996,8 +996,10 @@ static void subflow_write_space(struct sock *sk) struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct sock *parent = subflow->conn; - sk_stream_write_space(sk); - if (sk_stream_is_writeable(sk)) { + if (!sk_stream_is_writeable(sk)) + return; + + if (sk_stream_is_writeable(parent)) { set_bit(MPTCP_SEND_SPACE, &mptcp_sk(parent)->flags); smp_mb__after_atomic(); /* set SEND_SPACE before sk_stream_write_space clears NOSPACE */ From patchwork Mon Sep 14 08:01:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260939 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, UNWANTED_LANGUAGE_BODY autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 78AC1C433E2 for ; Mon, 14 Sep 2020 08:02:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 2809A21741 for ; Mon, 14 Sep 2020 08:02:51 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="JJZwwcLq" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726045AbgINICg (ORCPT ); Mon, 14 Sep 2020 04:02:36 -0400 Received: from us-smtp-2.mimecast.com ([205.139.110.61]:58787 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726112AbgINIBr (ORCPT ); Mon, 14 Sep 2020 04:01:47 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070503; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=WqCMOiS2VshMlLkrkKtWQH98sRJoAhZ9ktcEKlFUYOE=; b=JJZwwcLqJ7ahu3uLUuTe4V4ZSzbxTZlTryAeZ5y17Aq8KMgf/p0yla4pfjh0qvtQOgqMLK UP+/P5thGlv12C5fjxUjYNxV/n3agXLYZk16Cia6IXAXKtBW60CxX2FrFlQDOkKz+1K+TZ nmuTRGiRqiYnXgZBL7m94fMYtB6KAEg= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-478-WqkJkk8eOO-KDyzAMv9lSg-1; Mon, 14 Sep 2020 04:01:41 -0400 X-MC-Unique: WqkJkk8eOO-KDyzAMv9lSg-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1BDB881CAF9; Mon, 14 Sep 2020 08:01:40 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id D7BF019C66; Mon, 14 Sep 2020 08:01:38 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 02/13] mptcp: set data_ready status bit in subflow_check_data_avail() Date: Mon, 14 Sep 2020 10:01:08 +0200 Message-Id: <10b65904849e08dfccab8da33219d2081341f581.1599854632.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This simplify mptcp_subflow_data_available() and will made follow-up patches simpler. Additionally remove the unneeded checks on subflow copied_seq: we always whole skbs out of subflows. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/subflow.c | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 7ae1d3604047..53b455c3c229 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -825,6 +825,8 @@ static bool subflow_check_data_avail(struct sock *ssk) pr_debug("msk=%p ssk=%p data_avail=%d skb=%p", subflow->conn, ssk, subflow->data_avail, skb_peek(&ssk->sk_receive_queue)); + if (!skb_peek(&ssk->sk_receive_queue)) + subflow->data_avail = 0; if (subflow->data_avail) return true; @@ -849,6 +851,7 @@ static bool subflow_check_data_avail(struct sock *ssk) subflow->map_data_len = skb->len; subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset; + subflow->data_avail = 1; return true; } @@ -876,8 +879,10 @@ static bool subflow_check_data_avail(struct sock *ssk) ack_seq = mptcp_subflow_get_mapped_dsn(subflow); pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack, ack_seq); - if (ack_seq == old_ack) + if (ack_seq == old_ack) { + subflow->data_avail = 1; break; + } /* only accept in-sequence mapping. Old values are spurious * retransmission; we can hit "future" values on active backup @@ -922,13 +927,13 @@ static bool subflow_check_data_avail(struct sock *ssk) ssk->sk_error_report(ssk); tcp_set_state(ssk, TCP_CLOSE); tcp_send_active_reset(ssk, GFP_ATOMIC); + subflow->data_avail = 0; return false; } bool mptcp_subflow_data_available(struct sock *sk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); - struct sk_buff *skb; /* check if current mapping is still valid */ if (subflow->map_valid && @@ -941,15 +946,7 @@ bool mptcp_subflow_data_available(struct sock *sk) subflow->map_data_len); } - if (!subflow_check_data_avail(sk)) { - subflow->data_avail = 0; - return false; - } - - skb = skb_peek(&sk->sk_receive_queue); - subflow->data_avail = skb && - before(tcp_sk(sk)->copied_seq, TCP_SKB_CB(skb)->end_seq); - return subflow->data_avail; + return subflow_check_data_avail(sk); } /* If ssk has an mptcp parent socket, use the mptcp rcvbuf occupancy, From patchwork Mon Sep 14 08:01:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260934 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 93285C43461 for ; Mon, 14 Sep 2020 08:04:33 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 38DE721741 for ; Mon, 14 Sep 2020 08:04:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="cKRcFhn8" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726124AbgINIEF (ORCPT ); Mon, 14 Sep 2020 04:04:05 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:54030 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726153AbgINIBu (ORCPT ); Mon, 14 Sep 2020 04:01:50 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070507; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=urUPLgbKNth9XJjzTJHliRwM930KyzGBQhBugoA0S6Y=; b=cKRcFhn8e4aS8JPeeW+/FR41OVcgAF9mGhutQ70Af21//vLRIFNpS38SqvYQGCRe2oYO1x SKjGBaMlJRhaQXNmSaWjPnm6NJIbbUw/gezDGGvv6lhh4uemHfTCtb2lWYW1foVBKIdozr wVxH8vda4TK/dtOgvAoXBN9kZPCmtjU= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-326-USdezrf6Ppq05516MzTaEw-1; Mon, 14 Sep 2020 04:01:43 -0400 X-MC-Unique: USdezrf6Ppq05516MzTaEw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id C353B81CAF8; Mon, 14 Sep 2020 08:01:41 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7532D19C66; Mon, 14 Sep 2020 08:01:40 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 03/13] mptcp: trigger msk processing even for OoO data Date: Mon, 14 Sep 2020 10:01:09 +0200 Message-Id: <2a05880d191b0859f52bbcc002cd694603cd36b8.1599854632.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org This is a prerequisite to allow receiving data from multiple subflows without re-injection. Instead of dropping the OoO - "future" data in subflow_check_data_avail(), call into __mptcp_move_skbs() and let the msk drop that. To avoid code duplication factor out the mptcp_subflow_discard_data() helper. Note that __mptcp_move_skbs() can now find multiple subflows with data avail (comprising to-be-discarded data), so must update the byte counter incrementally. v1 -> v2: - fix checkpatch issues (unsigned -> unsigned int) Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/protocol.c | 33 +++++++++++++++---- net/mptcp/protocol.h | 9 ++++- net/mptcp/subflow.c | 78 ++++++++++++++++++++++++-------------------- 3 files changed, 78 insertions(+), 42 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 854a8b3b9ecd..95573c6f7762 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -167,7 +167,8 @@ static bool mptcp_subflow_dsn_valid(const struct mptcp_sock *msk, return true; subflow->data_avail = 0; - return mptcp_subflow_data_available(ssk); + mptcp_subflow_data_available(ssk); + return subflow->data_avail == MPTCP_SUBFLOW_DATA_AVAIL; } static void mptcp_check_data_fin_ack(struct sock *sk) @@ -313,11 +314,18 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, struct tcp_sock *tp; bool done = false; - if (!mptcp_subflow_dsn_valid(msk, ssk)) { - *bytes = 0; + pr_debug("msk=%p ssk=%p data avail=%d valid=%d empty=%d", + msk, ssk, subflow->data_avail, + mptcp_subflow_dsn_valid(msk, ssk), + !skb_peek(&ssk->sk_receive_queue)); + if (subflow->data_avail == MPTCP_SUBFLOW_OOO_DATA) { + mptcp_subflow_discard_data(ssk, subflow->map_data_len); return false; } + if (!mptcp_subflow_dsn_valid(msk, ssk)) + return false; + tp = tcp_sk(ssk); do { u32 map_remaining, offset; @@ -376,7 +384,7 @@ static bool __mptcp_move_skbs_from_subflow(struct mptcp_sock *msk, } } while (more_data_avail); - *bytes = moved; + *bytes += moved; /* If the moves have caught up with the DATA_FIN sequence number * it's time to ack the DATA_FIN and change socket state, but @@ -415,9 +423,17 @@ static bool move_skbs_to_msk(struct mptcp_sock *msk, struct sock *ssk) void mptcp_data_ready(struct sock *sk, struct sock *ssk) { + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); struct mptcp_sock *msk = mptcp_sk(sk); + bool wake; - set_bit(MPTCP_DATA_READY, &msk->flags); + /* move_skbs_to_msk below can legitly clear the data_avail flag, + * but we will need later to properly woke the reader, cache its + * value + */ + wake = subflow->data_avail == MPTCP_SUBFLOW_DATA_AVAIL; + if (wake) + set_bit(MPTCP_DATA_READY, &msk->flags); if (atomic_read(&sk->sk_rmem_alloc) < READ_ONCE(sk->sk_rcvbuf) && move_skbs_to_msk(msk, ssk)) @@ -438,7 +454,8 @@ void mptcp_data_ready(struct sock *sk, struct sock *ssk) move_skbs_to_msk(msk, ssk); } wake: - sk->sk_data_ready(sk); + if (wake) + sk->sk_data_ready(sk); } static void __mptcp_flush_join_list(struct mptcp_sock *msk) @@ -1281,6 +1298,9 @@ static int mptcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, set_bit(MPTCP_DATA_READY, &msk->flags); } out_err: + pr_debug("msk=%p data_ready=%d rx queue empty=%d copied=%d", + msk, test_bit(MPTCP_DATA_READY, &msk->flags), + skb_queue_empty(&sk->sk_receive_queue), copied); mptcp_rcv_space_adjust(msk, copied); release_sock(sk); @@ -2308,6 +2328,7 @@ static __poll_t mptcp_poll(struct file *file, struct socket *sock, sock_poll_wait(file, sock, wait); state = inet_sk_state_load(sk); + pr_debug("msk=%p state=%d flags=%lx", msk, state, msk->flags); if (state == TCP_LISTEN) return mptcp_check_readable(msk); diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 60b27d44c184..794997852c5e 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -268,6 +268,12 @@ mptcp_subflow_rsk(const struct request_sock *rsk) return (struct mptcp_subflow_request_sock *)rsk; } +enum mptcp_data_avail { + MPTCP_SUBFLOW_NODATA, + MPTCP_SUBFLOW_DATA_AVAIL, + MPTCP_SUBFLOW_OOO_DATA +}; + /* MPTCP subflow context */ struct mptcp_subflow_context { struct list_head node;/* conn_list of subflows */ @@ -292,10 +298,10 @@ struct mptcp_subflow_context { map_valid : 1, mpc_map : 1, backup : 1, - data_avail : 1, rx_eof : 1, use_64bit_ack : 1, /* Set when we received a 64-bit DSN */ can_ack : 1; /* only after processing the remote a key */ + enum mptcp_data_avail data_avail; u32 remote_nonce; u64 thmac; u32 local_nonce; @@ -347,6 +353,7 @@ int mptcp_is_enabled(struct net *net); void mptcp_subflow_fully_established(struct mptcp_subflow_context *subflow, struct mptcp_options_received *mp_opt); bool mptcp_subflow_data_available(struct sock *sk); +int mptcp_subflow_discard_data(struct sock *sk, unsigned int limit); void __init mptcp_subflow_init(void); /* called with sk socket lock held */ diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 53b455c3c229..071ee54b3c9f 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -816,6 +816,40 @@ static int subflow_read_actor(read_descriptor_t *desc, return copy_len; } +int mptcp_subflow_discard_data(struct sock *ssk, unsigned int limit) +{ + struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); + u32 map_remaining; + size_t delta; + + map_remaining = subflow->map_data_len - + mptcp_subflow_get_map_offset(subflow); + delta = min_t(size_t, limit, map_remaining); + + /* discard mapped data */ + pr_debug("discarding %zu bytes, current map len=%d", delta, + map_remaining); + if (delta) { + read_descriptor_t desc = { + .count = delta, + }; + int ret; + + ret = tcp_read_sock(ssk, &desc, subflow_read_actor); + if (ret < 0) { + ssk->sk_err = -ret; + return ret; + } + if (ret < delta) + return 0; + if (delta == map_remaining) { + subflow->data_avail = 0; + subflow->map_valid = 0; + } + } + return 0; +} + static bool subflow_check_data_avail(struct sock *ssk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(ssk); @@ -832,8 +866,6 @@ static bool subflow_check_data_avail(struct sock *ssk) msk = mptcp_sk(subflow->conn); for (;;) { - u32 map_remaining; - size_t delta; u64 ack_seq; u64 old_ack; @@ -851,7 +883,7 @@ static bool subflow_check_data_avail(struct sock *ssk) subflow->map_data_len = skb->len; subflow->map_subflow_seq = tcp_sk(ssk)->copied_seq - subflow->ssn_offset; - subflow->data_avail = 1; + subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL; return true; } @@ -880,43 +912,19 @@ static bool subflow_check_data_avail(struct sock *ssk) pr_debug("msk ack_seq=%llx subflow ack_seq=%llx", old_ack, ack_seq); if (ack_seq == old_ack) { - subflow->data_avail = 1; + subflow->data_avail = MPTCP_SUBFLOW_DATA_AVAIL; + break; + } else if (after64(ack_seq, old_ack)) { + subflow->data_avail = MPTCP_SUBFLOW_OOO_DATA; break; } /* only accept in-sequence mapping. Old values are spurious - * retransmission; we can hit "future" values on active backup - * subflow switch, we relay on retransmissions to get - * in-sequence data. - * Cuncurrent subflows support will require subflow data - * reordering + * retransmission */ - map_remaining = subflow->map_data_len - - mptcp_subflow_get_map_offset(subflow); - if (before64(ack_seq, old_ack)) - delta = min_t(size_t, old_ack - ack_seq, map_remaining); - else - delta = min_t(size_t, ack_seq - old_ack, map_remaining); - - /* discard mapped data */ - pr_debug("discarding %zu bytes, current map len=%d", delta, - map_remaining); - if (delta) { - read_descriptor_t desc = { - .count = delta, - }; - int ret; - - ret = tcp_read_sock(ssk, &desc, subflow_read_actor); - if (ret < 0) { - ssk->sk_err = -ret; - goto fatal; - } - if (ret < delta) - return false; - if (delta == map_remaining) - subflow->map_valid = 0; - } + if (mptcp_subflow_discard_data(ssk, old_ack - ack_seq)) + goto fatal; + return false; } return true; From patchwork Mon Sep 14 08:01:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260935 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D413AC433E2 for ; Mon, 14 Sep 2020 08:04:00 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6C99521741 for ; Mon, 14 Sep 2020 08:04:00 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="eiZ9zmkw" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726207AbgINIDq (ORCPT ); Mon, 14 Sep 2020 04:03:46 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:51007 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726180AbgINIBv (ORCPT ); Mon, 14 Sep 2020 04:01:51 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070510; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=bgw7GjGyCIzt0zpVv4jK5j2Dm7NXsSA2o9iy1jtIdpo=; b=eiZ9zmkweMXbA1dJGdYgb8dK1/iILmsv74QfiYiVmG5SvcXNEmualycccJLWzttxFDQMRu vT9htxX4E2RLXkYe34q0tyBLvzzGS+oKNlBJ0vLMghk5NVcaQfsFPKnFCnDrieUDmfJ/5I JM7O0m9DPa5i5bXJ28CZLgSbeAe2AsE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-361-TJmbGrvMOvOcZUZoCb48xA-1; Mon, 14 Sep 2020 04:01:46 -0400 X-MC-Unique: TJmbGrvMOvOcZUZoCb48xA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1718C10BBECF; Mon, 14 Sep 2020 08:01:45 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id CA33C19C66; Mon, 14 Sep 2020 08:01:43 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 05/13] mptcp: introduce and use mptcp_try_coalesce() Date: Mon, 14 Sep 2020 10:01:11 +0200 Message-Id: <39b392036de70a64726b19e7bb851a53ec90f134.1599854632.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Factor-out existing code, will be re-used by the next patch. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/protocol.c | 31 +++++++++++++++++++------------ 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 4f12a8ce0ddd..5a2ff333e426 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -110,6 +110,22 @@ static int __mptcp_socket_create(struct mptcp_sock *msk) return 0; } +static bool mptcp_try_coalesce(struct sock *sk, struct sk_buff *to, + struct sk_buff *from) +{ + bool fragstolen; + int delta; + + if (MPTCP_SKB_CB(from)->offset || + !skb_try_coalesce(to, from, &fragstolen, &delta)) + return false; + + kfree_skb_partial(from, fragstolen); + atomic_add(delta, &sk->sk_rmem_alloc); + sk_mem_charge(sk, delta); + return true; +} + static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, struct sk_buff *skb, unsigned int offset, size_t copy_len) @@ -121,24 +137,15 @@ static void __mptcp_move_skb(struct mptcp_sock *msk, struct sock *ssk, skb_ext_reset(skb); skb_orphan(skb); + MPTCP_SKB_CB(skb)->offset = offset; msk->ack_seq += copy_len; tail = skb_peek_tail(&sk->sk_receive_queue); - if (offset == 0 && tail) { - bool fragstolen; - int delta; - - if (skb_try_coalesce(tail, skb, &fragstolen, &delta)) { - kfree_skb_partial(skb, fragstolen); - atomic_add(delta, &sk->sk_rmem_alloc); - sk_mem_charge(sk, delta); - return; - } - } + if (tail && mptcp_try_coalesce(sk, tail, skb)) + return; skb_set_owner_r(skb, sk); __skb_queue_tail(&sk->sk_receive_queue, skb); - MPTCP_SKB_CB(skb)->offset = offset; } static void mptcp_stop_timer(struct sock *sk) From patchwork Mon Sep 14 08:01:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260937 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 23F05C433E2 for ; Mon, 14 Sep 2020 08:03:17 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id DA9DC21741 for ; Mon, 14 Sep 2020 08:03:16 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="L7xdrixV" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726200AbgINIDO (ORCPT ); Mon, 14 Sep 2020 04:03:14 -0400 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:24030 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726185AbgINIB5 (ORCPT ); Mon, 14 Sep 2020 04:01:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070515; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=P6RFEizugWaLAVgXhU5WmJ2jWQXy4ECB+4kljefa604=; b=L7xdrixVePffGTggOE8/e1KVU7YdyLx1gHQkpvJI6kbFEKoawQN5eCKGzTZHJCt/tfaemA DcxqO2gh14zCNZyHm75HVkOtHg9nYhACjIXasB6gFj26+MbZfzo6xB5AP3n0GfiaPmv4T4 WB2gFvRAe0SIc0tOnrgO+7ZCrr3LE/c= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-371-v5OEeaRkOsCsIzy6A7_dNA-1; Mon, 14 Sep 2020 04:01:53 -0400 X-MC-Unique: v5OEeaRkOsCsIzy6A7_dNA-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id BFFDF10BBEE3; Mon, 14 Sep 2020 08:01:51 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id 78C8C19C66; Mon, 14 Sep 2020 08:01:50 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 09/13] mptcp: move address attribute into mptcp_addr_info Date: Mon, 14 Sep 2020 10:01:15 +0200 Message-Id: <16790a7f9e2919325d3f2e684597337c5e0833d9.1599854632.git.pabeni@redhat.com> In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org So that can be accessed easily from the subflow creation helper. No functional change intended. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/pm_netlink.c | 39 ++++++++++++++++++++------------------- net/mptcp/protocol.h | 5 +++-- net/mptcp/subflow.c | 5 ++--- 3 files changed, 25 insertions(+), 24 deletions(-) diff --git a/net/mptcp/pm_netlink.c b/net/mptcp/pm_netlink.c index 2c208d2e65cd..6947f4fee6b9 100644 --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -23,8 +23,6 @@ static int pm_nl_pernet_id; struct mptcp_pm_addr_entry { struct list_head list; - unsigned int flags; - int ifindex; struct mptcp_addr_info addr; struct rcu_head rcu; }; @@ -119,7 +117,7 @@ select_local_address(const struct pm_nl_pernet *pernet, rcu_read_lock(); spin_lock_bh(&msk->join_list_lock); list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) + if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW)) continue; /* avoid any address already in use by subflows and @@ -150,7 +148,7 @@ select_signal_address(struct pm_nl_pernet *pernet, unsigned int pos) * can lead to additional addresses not being announced. */ list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { - if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) + if (!(entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue; if (i++ == pos) { ret = entry; @@ -210,8 +208,7 @@ static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) msk->pm.subflows++; check_work_pending(msk); spin_unlock_bh(&msk->pm.lock); - __mptcp_subflow_connect(sk, local->ifindex, - &local->addr, &remote); + __mptcp_subflow_connect(sk, &local->addr, &remote); spin_lock_bh(&msk->pm.lock); return; } @@ -257,13 +254,13 @@ void mptcp_pm_nl_add_addr_received(struct mptcp_sock *msk) local.family = remote.family; spin_unlock_bh(&msk->pm.lock); - __mptcp_subflow_connect((struct sock *)msk, 0, &local, &remote); + __mptcp_subflow_connect((struct sock *)msk, &local, &remote); spin_lock_bh(&msk->pm.lock); } static bool address_use_port(struct mptcp_pm_addr_entry *entry) { - return (entry->flags & + return (entry->addr.flags & (MPTCP_PM_ADDR_FLAG_SIGNAL | MPTCP_PM_ADDR_FLAG_SUBFLOW)) == MPTCP_PM_ADDR_FLAG_SIGNAL; } @@ -293,9 +290,9 @@ static int mptcp_pm_nl_append_new_local_addr(struct pm_nl_pernet *pernet, goto out; } - if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL) pernet->add_addr_signal_max++; - if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) pernet->local_addr_max++; entry->addr.id = pernet->next_id++; @@ -345,8 +342,9 @@ int mptcp_pm_nl_get_local_id(struct mptcp_sock *msk, struct sock_common *skc) if (!entry) return -ENOMEM; - entry->flags = 0; entry->addr = skc_local; + entry->addr.ifindex = 0; + entry->addr.flags = 0; ret = mptcp_pm_nl_append_new_local_addr(pernet, entry); if (ret < 0) kfree(entry); @@ -460,14 +458,17 @@ static int mptcp_pm_parse_addr(struct nlattr *attr, struct genl_info *info, entry->addr.addr.s_addr = nla_get_in_addr(tb[addr_addr]); skip_family: - if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) - entry->ifindex = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]); + if (tb[MPTCP_PM_ADDR_ATTR_IF_IDX]) { + u32 val = nla_get_s32(tb[MPTCP_PM_ADDR_ATTR_IF_IDX]); + + entry->addr.ifindex = val; + } if (tb[MPTCP_PM_ADDR_ATTR_ID]) entry->addr.id = nla_get_u8(tb[MPTCP_PM_ADDR_ATTR_ID]); if (tb[MPTCP_PM_ADDR_ATTR_FLAGS]) - entry->flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]); + entry->addr.flags = nla_get_u32(tb[MPTCP_PM_ADDR_ATTR_FLAGS]); return 0; } @@ -535,9 +536,9 @@ static int mptcp_nl_cmd_del_addr(struct sk_buff *skb, struct genl_info *info) ret = -EINVAL; goto out; } - if (entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SIGNAL) pernet->add_addr_signal_max--; - if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) + if (entry->addr.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) pernet->local_addr_max--; pernet->addrs--; @@ -593,10 +594,10 @@ static int mptcp_nl_fill_addr(struct sk_buff *skb, goto nla_put_failure; if (nla_put_u8(skb, MPTCP_PM_ADDR_ATTR_ID, addr->id)) goto nla_put_failure; - if (nla_put_u32(skb, MPTCP_PM_ADDR_ATTR_FLAGS, entry->flags)) + if (nla_put_u32(skb, MPTCP_PM_ADDR_ATTR_FLAGS, entry->addr.flags)) goto nla_put_failure; - if (entry->ifindex && - nla_put_s32(skb, MPTCP_PM_ADDR_ATTR_IF_IDX, entry->ifindex)) + if (entry->addr.ifindex && + nla_put_s32(skb, MPTCP_PM_ADDR_ATTR_IF_IDX, entry->addr.ifindex)) goto nla_put_failure; if (addr->family == AF_INET && diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index 26f5f81f3f4c..cfa5e1b9521b 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -140,6 +140,8 @@ struct mptcp_addr_info { sa_family_t family; __be16 port; u8 id; + u8 flags; + int ifindex; union { struct in_addr addr; #if IS_ENABLED(CONFIG_MPTCP_IPV6) @@ -358,8 +360,7 @@ bool mptcp_subflow_data_available(struct sock *sk); void __init mptcp_subflow_init(void); /* called with sk socket lock held */ -int __mptcp_subflow_connect(struct sock *sk, int ifindex, - const struct mptcp_addr_info *loc, +int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, const struct mptcp_addr_info *remote); int mptcp_subflow_create_socket(struct sock *sk, struct socket **new_sock); diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c index 8c9418d1901b..ae3eeb9bb191 100644 --- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -1035,8 +1035,7 @@ static void mptcp_info2sockaddr(const struct mptcp_addr_info *info, #endif } -int __mptcp_subflow_connect(struct sock *sk, int ifindex, - const struct mptcp_addr_info *loc, +int __mptcp_subflow_connect(struct sock *sk, const struct mptcp_addr_info *loc, const struct mptcp_addr_info *remote) { struct mptcp_sock *msk = mptcp_sk(sk); @@ -1080,7 +1079,7 @@ int __mptcp_subflow_connect(struct sock *sk, int ifindex, if (loc->family == AF_INET6) addrlen = sizeof(struct sockaddr_in6); #endif - ssk->sk_bound_dev_if = ifindex; + ssk->sk_bound_dev_if = loc->ifindex; err = kernel_bind(sf, (struct sockaddr *)&addr, addrlen); if (err) goto failed; From patchwork Mon Sep 14 08:01:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260938 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 745E7C433E2 for ; Mon, 14 Sep 2020 08:03:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1FA13217BA for ; Mon, 14 Sep 2020 08:03:08 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="aamc4/Ch" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726131AbgINIC7 (ORCPT ); Mon, 14 Sep 2020 04:02:59 -0400 Received: from us-smtp-delivery-124.mimecast.com ([63.128.21.124]:42942 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726187AbgINICA (ORCPT ); Mon, 14 Sep 2020 04:02:00 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070519; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=riPQRKiBy26FP3cXcnTrrUWn6eSdzpOcSSCaNh5OETs=; b=aamc4/ChtsOeFu6t7e/fDH+W2J6Qyl/TfNXUfM/gKvmLeFbhTebn0R061hv99YHgOqomGP WUug8X5EOxp9aETyawiJhoSzQelcozEmbVjiXZWLGM0s7PbF4hqubSwa7HkMVdGVezj5Pl Sbgz14Bbrnevt9Lq6bglgvi756Mr5Iw= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-332-VHukLZcFMBG1B6B6FmBvYw-1; Mon, 14 Sep 2020 04:01:56 -0400 X-MC-Unique: VHukLZcFMBG1B6B6FmBvYw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 2B02B10051AE; Mon, 14 Sep 2020 08:01:55 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id CCFEB19C66; Mon, 14 Sep 2020 08:01:53 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 11/13] mptcp: allow picking different xmit subflows Date: Mon, 14 Sep 2020 10:01:17 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Update the scheduler to less trivial heuristic: cache the last used subflow, and try to send on it a reasonably long burst of data. When the burst or the subflow send space is exhausted, pick the subflow with the lower ratio between write space and send buffer - that is, the subflow with the greater relative amount of free space. v1 -> v2: - fix 32 bit build breakage due to 64bits div - fix checkpath issues (uint64_t -> u64) Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- net/mptcp/protocol.c | 111 ++++++++++++++++++++++++++++++++++++------- net/mptcp/protocol.h | 6 ++- 2 files changed, 99 insertions(+), 18 deletions(-) diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index ec9c38d3acc7..d7af96a900c4 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -1031,41 +1031,105 @@ static void mptcp_nospace(struct mptcp_sock *msk) } } +static bool mptcp_subflow_active(struct mptcp_subflow_context *subflow) +{ + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + + /* can't send if JOIN hasn't completed yet (i.e. is usable for mptcp) */ + if (subflow->request_join && !subflow->fully_established) + return false; + + /* only send if our side has not closed yet */ + return ((1 << ssk->sk_state) & (TCPF_ESTABLISHED | TCPF_CLOSE_WAIT)); +} + +#define MPTCP_SEND_BURST_SIZE ((1 << 16) - \ + sizeof(struct tcphdr) - \ + MAX_TCP_OPTION_SPACE - \ + sizeof(struct ipv6hdr) - \ + sizeof(struct frag_hdr)) + +struct subflow_send_info { + struct sock *ssk; + u64 ratio; +}; + static struct sock *mptcp_subflow_get_send(struct mptcp_sock *msk, u32 *sndbuf) { + struct subflow_send_info send_info[2]; struct mptcp_subflow_context *subflow; - struct sock *sk = (struct sock *)msk; - struct sock *backup = NULL; - bool free; + int i, nr_active = 0; + struct sock *ssk; + u64 ratio; + u32 pace; - sock_owned_by_me(sk); + sock_owned_by_me((struct sock *)msk); *sndbuf = 0; if (!mptcp_ext_cache_refill(msk)) return NULL; - mptcp_for_each_subflow(msk, subflow) { - struct sock *ssk = mptcp_subflow_tcp_sock(subflow); - - free = sk_stream_is_writeable(subflow->tcp_sock); - if (!free) { - mptcp_nospace(msk); + if (__mptcp_check_fallback(msk)) { + if (!msk->first) return NULL; + *sndbuf = msk->first->sk_sndbuf; + return sk_stream_memory_free(msk->first) ? msk->first : NULL; + } + + /* re-use last subflow, if the burst allow that */ + if (msk->last_snd && msk->snd_burst > 0 && + sk_stream_memory_free(msk->last_snd) && + mptcp_subflow_active(mptcp_subflow_ctx(msk->last_snd))) { + mptcp_for_each_subflow(msk, subflow) { + ssk = mptcp_subflow_tcp_sock(subflow); + *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf); } + return msk->last_snd; + } + + /* pick the subflow with the lower wmem/wspace ratio */ + for (i = 0; i < 2; ++i) { + send_info[i].ssk = NULL; + send_info[i].ratio = -1; + } + mptcp_for_each_subflow(msk, subflow) { + ssk = mptcp_subflow_tcp_sock(subflow); + if (!mptcp_subflow_active(subflow)) + continue; + nr_active += !subflow->backup; *sndbuf = max(tcp_sk(ssk)->snd_wnd, *sndbuf); - if (subflow->backup) { - if (!backup) - backup = ssk; + if (!sk_stream_memory_free(subflow->tcp_sock)) + continue; + pace = READ_ONCE(ssk->sk_pacing_rate); + if (!pace) continue; - } - return ssk; + ratio = div_u64((u64)READ_ONCE(ssk->sk_wmem_queued) << 32, + pace); + if (ratio < send_info[subflow->backup].ratio) { + send_info[subflow->backup].ssk = ssk; + send_info[subflow->backup].ratio = ratio; + } } - return backup; + pr_debug("msk=%p nr_active=%d ssk=%p:%lld backup=%p:%lld", + msk, nr_active, send_info[0].ssk, send_info[0].ratio, + send_info[1].ssk, send_info[1].ratio); + + /* pick the best backup if no other subflow is active */ + if (!nr_active) + send_info[0].ssk = send_info[1].ssk; + + if (send_info[0].ssk) { + msk->last_snd = send_info[0].ssk; + msk->snd_burst = min_t(int, MPTCP_SEND_BURST_SIZE, + sk_stream_wspace(msk->last_snd)); + return msk->last_snd; + } + return NULL; } static void ssk_check_wmem(struct mptcp_sock *msk) @@ -1160,6 +1224,10 @@ static int mptcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) break; } + /* burst can be negative, we will try move to the next subflow + * at selection time, if possible. + */ + msk->snd_burst -= ret; copied += ret; tx_ok = msg_data_left(msg); @@ -1375,6 +1443,11 @@ static bool __mptcp_move_skbs(struct mptcp_sock *msk) unsigned int moved = 0; bool done; + /* avoid looping forever below on racing close */ + if (((struct sock *)msk)->sk_state == TCP_CLOSE) + return false; + + __mptcp_flush_join_list(msk); do { struct sock *ssk = mptcp_subflow_recv_lookup(msk); @@ -1539,9 +1612,15 @@ static struct sock *mptcp_subflow_get_retrans(const struct mptcp_sock *msk) sock_owned_by_me((const struct sock *)msk); + if (__mptcp_check_fallback(msk)) + return msk->first; + mptcp_for_each_subflow(msk, subflow) { struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + if (!mptcp_subflow_active(subflow)) + continue; + /* still data outstanding at TCP level? Don't retransmit. */ if (!tcp_write_queue_empty(ssk)) return NULL; diff --git a/net/mptcp/protocol.h b/net/mptcp/protocol.h index cfa5e1b9521b..493bd2c13bc6 100644 --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -196,6 +196,8 @@ struct mptcp_sock { u64 write_seq; u64 ack_seq; u64 rcv_data_fin_seq; + struct sock *last_snd; + int snd_burst; atomic64_t snd_una; unsigned long timer_ival; u32 token; @@ -473,12 +475,12 @@ static inline bool before64(__u64 seq1, __u64 seq2) void mptcp_diag_subflow_init(struct tcp_ulp_ops *ops); -static inline bool __mptcp_check_fallback(struct mptcp_sock *msk) +static inline bool __mptcp_check_fallback(const struct mptcp_sock *msk) { return test_bit(MPTCP_FALLBACK_DONE, &msk->flags); } -static inline bool mptcp_check_fallback(struct sock *sk) +static inline bool mptcp_check_fallback(const struct sock *sk) { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct mptcp_sock *msk = mptcp_sk(subflow->conn); From patchwork Mon Sep 14 08:01:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Paolo Abeni X-Patchwork-Id: 260936 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.6 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH, MAILING_LIST_MULTI, SIGNED_OFF_BY, SPF_HELO_NONE, SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 379CFC43461 for ; Mon, 14 Sep 2020 08:03:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EB4AF20719 for ; Mon, 14 Sep 2020 08:03:33 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="Tut6E2x5" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726174AbgINIDL (ORCPT ); Mon, 14 Sep 2020 04:03:11 -0400 Received: from us-smtp-delivery-1.mimecast.com ([205.139.110.120]:41743 "EHLO us-smtp-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726192AbgINICR (ORCPT ); Mon, 14 Sep 2020 04:02:17 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1600070534; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=f3eqWtX0kgLTIo9jcq0rospXGntUvfTO6+F+EGu261Q=; b=Tut6E2x55hWvod4HYLjutRPDs+GfB5XB+Gt2X8opQvnoeqLsrJWmu3Bs6rRd9JrFhy2QMb 3mKE0Y39rjugTalbfe2vQEcAkElUMu+0VEMM+psrt0lT6NYQGXEu/ZUaOPh0DGPoWeJA6F y5qRusJWKLdiLhi/yP4ntWN1U1YgvGE= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-558-dYlD-F-xPvynJGU8rcyQUw-1; Mon, 14 Sep 2020 04:01:59 -0400 X-MC-Unique: dYlD-F-xPvynJGU8rcyQUw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.phx2.redhat.com [10.5.11.23]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 88E3A10059AB; Mon, 14 Sep 2020 08:01:58 +0000 (UTC) Received: from linux.fritz.box.com (ovpn-112-96.ams2.redhat.com [10.36.112.96]) by smtp.corp.redhat.com (Postfix) with ESMTP id 46E2E19C66; Mon, 14 Sep 2020 08:01:57 +0000 (UTC) From: Paolo Abeni To: netdev@vger.kernel.org Cc: "David S. Miller" , Eric Dumazet , mptcp@lists.01.org Subject: [PATCH net-next v2 13/13] mptcp: simult flow self-tests Date: Mon, 14 Sep 2020 10:01:19 +0200 Message-Id: In-Reply-To: References: MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Add a bunch of test-cases for multiple subflow xmit: create multiple subflows simulating different links condition via netem and verify that the msk is able to use completely the aggregated bandwidth. Signed-off-by: Paolo Abeni Reviewed-by: Mat Martineau --- tools/testing/selftests/net/mptcp/Makefile | 3 +- .../selftests/net/mptcp/simult_flows.sh | 293 ++++++++++++++++++ 2 files changed, 295 insertions(+), 1 deletion(-) create mode 100755 tools/testing/selftests/net/mptcp/simult_flows.sh diff --git a/tools/testing/selftests/net/mptcp/Makefile b/tools/testing/selftests/net/mptcp/Makefile index aa254aefc2c3..00bb158b4a5d 100644 --- a/tools/testing/selftests/net/mptcp/Makefile +++ b/tools/testing/selftests/net/mptcp/Makefile @@ -5,7 +5,8 @@ KSFT_KHDR_INSTALL := 1 CFLAGS = -Wall -Wl,--no-as-needed -O2 -g -I$(top_srcdir)/usr/include -TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh +TEST_PROGS := mptcp_connect.sh pm_netlink.sh mptcp_join.sh diag.sh \ + simult_flows.sh TEST_GEN_FILES = mptcp_connect pm_nl_ctl diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh new file mode 100755 index 000000000000..0d88225daa02 --- /dev/null +++ b/tools/testing/selftests/net/mptcp/simult_flows.sh @@ -0,0 +1,293 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +rndh=$(printf %x $sec)-$(mktemp -u XXXXXX) +ns1="ns1-$rndh" +ns2="ns2-$rndh" +ns3="ns3-$rndh" +capture=false +ksft_skip=4 +timeout=30 +test_cnt=1 +ret=0 +bail=0 + +usage() { + echo "Usage: $0 [ -b ] [ -c ] [ -d ]" + echo -e "\t-b: bail out after first error, otherwise runs al testcases" + echo -e "\t-c: capture packets for each test using tcpdump (default: no capture)" + echo -e "\t-d: debug this script" +} + +cleanup() +{ + rm -f "$cin" "$cout" + rm -f "$sin" "$sout" + rm -f "$capout" + + local netns + for netns in "$ns1" "$ns2" "$ns3";do + ip netns del $netns + done +} + +ip -Version > /dev/null 2>&1 +if [ $? -ne 0 ];then + echo "SKIP: Could not run test without ip tool" + exit $ksft_skip +fi + +# "$ns1" ns2 ns3 +# ns1eth1 ns2eth1 ns2eth3 ns3eth1 +# netem +# ns1eth2 ns2eth2 +# netem + +setup() +{ + large=$(mktemp) + small=$(mktemp) + sout=$(mktemp) + cout=$(mktemp) + capout=$(mktemp) + size=$((2048 * 4096)) + dd if=/dev/zero of=$small bs=4096 count=20 >/dev/null 2>&1 + dd if=/dev/zero of=$large bs=4096 count=$((size / 4096)) >/dev/null 2>&1 + + trap cleanup EXIT + + for i in "$ns1" "$ns2" "$ns3";do + ip netns add $i || exit $ksft_skip + ip -net $i link set lo up + done + + ip link add ns1eth1 netns "$ns1" type veth peer name ns2eth1 netns "$ns2" + ip link add ns1eth2 netns "$ns1" type veth peer name ns2eth2 netns "$ns2" + ip link add ns2eth3 netns "$ns2" type veth peer name ns3eth1 netns "$ns3" + + ip -net "$ns1" addr add 10.0.1.1/24 dev ns1eth1 + ip -net "$ns1" addr add dead:beef:1::1/64 dev ns1eth1 nodad + ip -net "$ns1" link set ns1eth1 up mtu 1500 + ip -net "$ns1" route add default via 10.0.1.2 + ip -net "$ns1" route add default via dead:beef:1::2 + + ip -net "$ns1" addr add 10.0.2.1/24 dev ns1eth2 + ip -net "$ns1" addr add dead:beef:2::1/64 dev ns1eth2 nodad + ip -net "$ns1" link set ns1eth2 up mtu 1500 + ip -net "$ns1" route add default via 10.0.2.2 metric 101 + ip -net "$ns1" route add default via dead:beef:2::2 metric 101 + + ip netns exec "$ns1" ./pm_nl_ctl limits 1 1 + ip netns exec "$ns1" ./pm_nl_ctl add 10.0.2.1 dev ns1eth2 flags subflow + ip netns exec "$ns1" sysctl -q net.ipv4.conf.all.rp_filter=0 + + ip -net "$ns2" addr add 10.0.1.2/24 dev ns2eth1 + ip -net "$ns2" addr add dead:beef:1::2/64 dev ns2eth1 nodad + ip -net "$ns2" link set ns2eth1 up mtu 1500 + + ip -net "$ns2" addr add 10.0.2.2/24 dev ns2eth2 + ip -net "$ns2" addr add dead:beef:2::2/64 dev ns2eth2 nodad + ip -net "$ns2" link set ns2eth2 up mtu 1500 + + ip -net "$ns2" addr add 10.0.3.2/24 dev ns2eth3 + ip -net "$ns2" addr add dead:beef:3::2/64 dev ns2eth3 nodad + ip -net "$ns2" link set ns2eth3 up mtu 1500 + ip netns exec "$ns2" sysctl -q net.ipv4.ip_forward=1 + ip netns exec "$ns2" sysctl -q net.ipv6.conf.all.forwarding=1 + + ip -net "$ns3" addr add 10.0.3.3/24 dev ns3eth1 + ip -net "$ns3" addr add dead:beef:3::3/64 dev ns3eth1 nodad + ip -net "$ns3" link set ns3eth1 up mtu 1500 + ip -net "$ns3" route add default via 10.0.3.2 + ip -net "$ns3" route add default via dead:beef:3::2 + + ip netns exec "$ns3" ./pm_nl_ctl limits 1 1 +} + +# $1: ns, $2: port +wait_local_port_listen() +{ + local listener_ns="${1}" + local port="${2}" + + local port_hex i + + port_hex="$(printf "%04X" "${port}")" + for i in $(seq 10); do + ip netns exec "${listener_ns}" cat /proc/net/tcp* | \ + awk "BEGIN {rc=1} {if (\$2 ~ /:${port_hex}\$/ && \$4 ~ /0A/) {rc=0; exit}} END {exit rc}" && + break + sleep 0.1 + done +} + +do_transfer() +{ + local cin=$1 + local sin=$2 + local max_time=$3 + local port + port=$((10000+$test_cnt)) + test_cnt=$((test_cnt+1)) + + :> "$cout" + :> "$sout" + :> "$capout" + + local addr_port + addr_port=$(printf "%s:%d" ${connect_addr} ${port}) + + if $capture; then + local capuser + if [ -z $SUDO_USER ] ; then + capuser="" + else + capuser="-Z $SUDO_USER" + fi + + local capfile="${rndh}-${port}" + local capopt="-i any -s 65535 -B 32768 ${capuser}" + + ip netns exec ${ns3} tcpdump ${capopt} -w "${capfile}-listener.pcap" >> "${capout}" 2>&1 & + local cappid_listener=$! + + ip netns exec ${ns1} tcpdump ${capopt} -w "${capfile}-connector.pcap" >> "${capout}" 2>&1 & + local cappid_connector=$! + + sleep 1 + fi + + ip netns exec ${ns3} ./mptcp_connect -jt $timeout -l -p $port 0.0.0.0 < "$sin" > "$sout" & + local spid=$! + + wait_local_port_listen "${ns3}" "${port}" + + local start + start=$(date +%s%3N) + ip netns exec ${ns1} ./mptcp_connect -jt $timeout -p $port 10.0.3.3 < "$cin" > "$cout" & + local cpid=$! + + wait $cpid + local retc=$? + wait $spid + local rets=$? + + local stop + stop=$(date +%s%3N) + + if $capture; then + sleep 1 + kill ${cappid_listener} + kill ${cappid_connector} + fi + + local duration + duration=$((stop-start)) + + cmp $sin $cout > /dev/null 2>&1 + local cmps=$? + cmp $cin $sout > /dev/null 2>&1 + local cmpc=$? + + printf "%16s" "$duration max $max_time " + if [ $retc -eq 0 ] && [ $rets -eq 0 ] && \ + [ $cmpc -eq 0 ] && [ $cmps -eq 0 ] && \ + [ $duration -lt $max_time ]; then + echo "[ OK ]" + cat "$capout" + return 0 + fi + + echo " [ fail ]" + echo "client exit code $retc, server $rets" 1>&2 + echo "\nnetns ${ns3} socket stat for $port:" 1>&2 + ip netns exec ${ns3} ss -nita 1>&2 -o "sport = :$port" + echo "\nnetns ${ns1} socket stat for $port:" 1>&2 + ip netns exec ${ns1} ss -nita 1>&2 -o "dport = :$port" + ls -l $sin $cout + ls -l $cin $sout + + cat "$capout" + return 1 +} + +run_test() +{ + local rate1=$1 + local rate2=$2 + local delay1=$3 + local delay2=$4 + local lret + local dev + shift 4 + local msg=$* + + [ $delay1 -gt 0 ] && delay1="delay $delay1" || delay1="" + [ $delay2 -gt 0 ] && delay2="delay $delay2" || delay2="" + + for dev in ns1eth1 ns1eth2; do + tc -n $ns1 qdisc del dev $dev root >/dev/null 2>&1 + done + for dev in ns2eth1 ns2eth2; do + tc -n $ns2 qdisc del dev $dev root >/dev/null 2>&1 + done + tc -n $ns1 qdisc add dev ns1eth1 root netem rate ${rate1}mbit $delay1 + tc -n $ns1 qdisc add dev ns1eth2 root netem rate ${rate2}mbit $delay2 + tc -n $ns2 qdisc add dev ns2eth1 root netem rate ${rate1}mbit $delay1 + tc -n $ns2 qdisc add dev ns2eth2 root netem rate ${rate2}mbit $delay2 + + # time is measure in ms + local time=$((size * 8 * 1000 / (( $rate1 + $rate2) * 1024 *1024) )) + + # mptcp_connect will do some sleeps to allow the mp_join handshake + # completion + time=$((time + 1350)) + + printf "%-50s" "$msg" + do_transfer $small $large $((time * 11 / 10)) + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + [ $bail -eq 0 ] || exit $ret + fi + + printf "%-50s" "$msg - reverse direction" + do_transfer $large $small $((time * 11 / 10)) + lret=$? + if [ $lret -ne 0 ]; then + ret=$lret + [ $bail -eq 0 ] || exit $ret + fi +} + +while getopts "bcdh" option;do + case "$option" in + "h") + usage $0 + exit 0 + ;; + "b") + bail=1 + ;; + "c") + capture=true + ;; + "d") + set -x + ;; + "?") + usage $0 + exit 1 + ;; + esac +done + +setup +run_test 10 10 0 0 "balanced bwidth" +run_test 10 10 1 50 "balanced bwidth with unbalanced delay" + +# we still need some additional infrastructure to pass the following test-cases +# run_test 30 10 0 0 "unbalanced bwidth" +# run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay" +# run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay" +exit $ret